<a href="https://colab.research.google.com/github/priyalimbu246/Valson-course/blob/main/class2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Python Bootcamp Notebook #2
In this notebook we will cover the following:
- A brief review of Python lists
- `numpy` - an essential Python library for manipulating numeric data
- `matplotlib` - a core Python library for making plots and graphs

### A Brief Review of Python lists

Recall that lists are **ordered, indexed, mutable, and allow duplicates**. We begin by defining a list with an integer, a float, and a string.

In [None]:
a_list = [1, 1.0, '1']

If we construct another list with the same items in the same order, the two lists are equal.

In [None]:
# ordered
b_list = [1, 1.0, '1']
print(a_list == b_list)

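However, if the items are not in the same order, the lists are not equal. Here we move the string to the front. (Swapping only the first two items would not work: Python considers `1 == 1.0` to be `True`, so `[1.0, 1, '1'] == [1, 1.0, '1']` is also `True`.)

In [None]:
c_list = ['1', 1, 1.0]
print(c_list == a_list)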

Lists are indexed, starting from **0**.

In [None]:
print(a_list[1])

Lists are mutable, meaning they can be changed.

In [None]:
a_list.append("5")
print(a_list)

Lists can contain duplicates.

In [None]:
a_list.append(1)
print(a_list)

In addition to `append`, which we saw above, Python provides several other tools for reordering lists:
- `reverse` - a list method that reverses a list in place
- `sort` - a list method that sorts a list in place
- `sorted` - a built-in function that returns a new, sorted list and leaves the original unchanged
- `reversed` - a built-in function that returns an iterator; wrapping it in `list` gives a reversed copy
  
Try a few of these functions in the cell below.

In [None]:
unsorted_list = [10, 4, 3, 2, 18, 4]
sorted_list = list(reversed(sorted(unsorted_list)))
print(sorted_list)
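To make the difference between the in-place methods and the copy-returning functions concrete, here is a short sketch. The list name `nums` is just for illustration. Note that `.sort()` and `.reverse()` return `None`, so their result should not be assigned to a variable.

```python
nums = [10, 4, 3, 2, 18, 4]

# sorted() returns a NEW list; nums itself is unchanged
ascending = sorted(nums)
print(ascending)   # [2, 3, 4, 4, 10, 18]
print(nums)        # [10, 4, 3, 2, 18, 4]

# reversed() returns an iterator; list() turns it into a list
backwards = list(reversed(nums))
print(backwards)   # [4, 18, 2, 3, 4, 10]

# .sort() and .reverse() modify nums in place and return None
result = nums.sort()
print(result)      # None
print(nums)        # [2, 3, 4, 4, 10, 18]
nums.reverse()
print(nums)        # [18, 10, 4, 4, 3, 2]
```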

### Numpy

[`numpy`](https://numpy.org/) is an extremely useful software library for storing, manipulating, reading, and writing numeric data. All of this is built around its central object, the numpy array. Numpy also allows for efficient calculations through a process called **vectorization**, which will be explained shortly. Arrays can be created from lists, as we will show soon, generated automatically with functions defined in the `numpy` library, or read from previously saved files. For these reasons, `numpy` has become a *de facto* essential library for scientific computing, and for coding in Python in general.

#### **Defining Numpy Arrays**
As mentioned above, a `numpy` array can be defined from a list. It's important to note that, unlike the elements of a list, all elements of a numpy array share a single data type. By convention, numpy is imported with `import numpy as np`.

In [None]:
import numpy as np

int_array = np.array([1, 2, 3])
float_array = np.array([1., 2., 3.])
str_array = np.array(['1', '2', '3'])
print(int_array)
print(float_array)
print(str_array)

[1 2 3]
[1. 2. 3.]
['1' '2' '3']


In [None]:
print(type(int_array), type(float_array), type(str_array))

In [None]:
print(int_array.dtype, float_array.dtype, str_array.dtype)

#### Numpy arrays

Numpy arrays are ordered, indexed, mutable, and allow duplicates *(the same as lists)*.

However, arrays differ from lists in 2 key ways:     
**1. All elements should be of the same data type**


Here all of the array elements are strings.

In [None]:
str_array = np.array(['1', '2', '3'])
print(str_array)
print(str_array.dtype)

In this case, two of the array elements are integers and one is a string. What happens to the array?

In [None]:
int_str_array = np.array([1, 2, '3'])
print(int_str_array)
# 'minority rules'
print(int_str_array.dtype)

In this case, we have two integers and a float. What happens to the array? Numpy converts every element to the most general of the types present, in the order **int → float → string**.

In [None]:
int_float_array = np.array([1, 2, 3.0])
print(int_float_array)
print(int_float_array.dtype)

Now we have an integer, a float, and a string.  What happens to the array?

In [None]:
int_float_str_array = np.array([1, 2., '3.0'])
print(int_float_str_array)
print(int_float_str_array.dtype)
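Instead of relying on automatic conversion, you can also choose the data type yourself. The sketch below uses the `dtype` keyword of `np.array` and the `astype` method (which returns a new array); the variable names are illustrative only.

```python
import numpy as np

# numpy picks one dtype that can hold every element
mixed = np.array([1, 2, 3.0])
print(mixed.dtype)     # float64

# request a dtype explicitly when creating the array ...
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)        # [1. 2. 3.]

# ... or convert an existing array with astype (returns a new array)
strings = np.array(['1', '2', '3'])
numbers = strings.astype(int)
print(numbers + 1)     # [2 3 4]
```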

**2. Numpy arrays support element-wise mathematical operations.**    
These behave differently from the same operators applied to lists. First, let's look at what happens when we add two lists.

In [None]:
a_list = [1, 1, 1]
b_list = [2, 2, 2]

# adding lists is concatenation
add_list = a_list + b_list

print(add_list)

We can write a loop to sequentially add the elements of `a_list` and `b_list`.

In [None]:
add_list = []

for i in range(len(a_list)):
    add_list.append(a_list[i] + b_list[i])

print(add_list)

Numpy arrays can be added in a single operation.
```
a_array + b_array is the same as [a_array[0] + b_array[0], a_array[1] + b_array[1], a_array[2] + b_array[2]]
```

In [None]:
a_array = np.array(a_list)
b_array = np.array(b_list)

add_array = a_array + b_array

print(add_array)
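As a quick check that the single `+` really computes the same thing as the loop, here is a small sketch comparing the two on larger arrays. The array size of 1000 is arbitrary; the vectorized version simply hands the whole loop to numpy.

```python
import numpy as np

a_big = np.arange(1000)
b_big = np.arange(1000) * 2

# explicit Python loop, one element at a time
loop_sum = np.array([a_big[i] + b_big[i] for i in range(len(a_big))])

# vectorized: a single numpy operation over the whole array
vec_sum = a_big + b_big

print(np.array_equal(loop_sum, vec_sum))  # True
```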

Recall from above that the **+** operator concatenates lists.  To do the same with a numpy array, we can use the [`np.concatenate`](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html) function.

In [None]:
print(np.concatenate([a_array, b_array]))

 #### <span style="color:red">**Exercise**</span>
Create two numpy arrays `x` with [10,20,30] and `y` with [11,12,13].
- Create an array `z` that is the sum of `x` and `y`
- Create an array `m` that is the concatenation of `x` and `y`

#### Working with Numpy Arrays

Besides addition, which we showed above, numpy also supports element-wise subtraction, multiplication, and division.

In [None]:
a_array = np.array([1, 2, 3])

In [None]:
# array of same size
b_array = np.array([4, 5, 6])

print('element-wise addition', a_array + b_array)
print('element-wise subtraction', a_array - b_array)
print('element-wise multiplication', a_array * b_array)
print('element-wise division', a_array / b_array)

Numpy also supports element-wise comparisons (e.g. >, <, ==). To demonstrate this, we'll use [`np.random.rand`](https://numpy.org/doc/2.1/reference/random/generated/numpy.random.rand.html), a function that generates numpy arrays of random numbers drawn uniformly from [0, 1). To better understand this function, we can look at its documentation.

In [None]:
help(np.random.rand)

To begin, we'll generate a numpy array with 4 rows and 3 columns of random values.

In [None]:
c_array = np.random.rand(4,3)
print(c_array)

To see which of these values are greater than 0.5, we can run the code below. The result is a boolean array of the same shape, which can be used as a **mask**.

In [None]:
# test an element-wise condition
print(c_array > 0.5)

When we apply the mask, we get a 1-dimensional array containing the values > 0.5.

In [None]:
# grab all the elements of the array with values greater than 0.5
print(c_array[c_array > 0.5])
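Masks can also be combined. The sketch below uses `&` (and) and `|` (or); the parentheses around each comparison are required because these operators bind more tightly than `>` and `<`. It uses `np.random.default_rng` with an arbitrary seed of 0 only so the result is reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded random generator (seed chosen arbitrarily)
vals = rng.random((4, 3))

# values strictly between 0.25 and 0.75
middle = vals[(vals > 0.25) & (vals < 0.75)]
# all remaining values: at most 0.25 or at least 0.75
extremes = vals[(vals <= 0.25) | (vals >= 0.75)]

# the two masks are complements, so every element lands in exactly one group
print(middle.size + extremes.size)  # 12
```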

Numpy can also perform element-wise comparisons of two arrays. Let's define another random 4x3 array.

In [None]:
d_array = np.random.rand(4,3)
print(d_array)

Now we can compare the corresponding elements of the two arrays.

In [None]:
# compare each element of c_array with the corresponding element of d_array
print(c_array > d_array)

 #### <span style="color:red">**Exercise**</span>

 Print out all the elements of `c_array` and `d_array` with values less than or equal to 0.25. Then, use Python to count the number of elements in each array with values greater than or equal to 0.25.

#### Indexing numpy arrays

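`numpy` arrays are indexed as [row, column].

Diagrammatically, for an array `c`:
```
c = np.array([
    [c11, c12, c13],
    [c21, c22, c23],
    [c31, c32, c33]
])
```

Because indexing starts at 0, the element `c11` is accessed as `c[0, 0]`, and `c23` as `c[1, 2]`.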

Again we start with `c_array`.

In [None]:
print(c_array)

Print the element in the first row and first column of `c_array`.

In [None]:
print(c_array[0,0])

Try a few more.  What happens if we request a value beyond the bounds of the array?

In [None]:
c_array[5,5]

The **:** is a shorthand for all elements along an axis.

In [None]:
# slice out the first row of the array
# the colon (:) is a shorthand instruction for telling
# Python to return all elements along the given (or sliced)
# axis

# here, we are telling Python to return, in the first row, all elements,
# i.e. first row, all columns of this two-dimensional array
print(c_array[0,:])

In [None]:
# slice out everything except the first row and the first column
print(c_array[1:,1:])

How can I get the first column of the numpy array?

In [None]:
# slice out all rows, first column of the array c_array
first_columns = c_array[:, 0]
first_columns

 #### <span style="color:red">**Exercise**</span>

Construct a three-dimensional array using `np.random.rand`, and use slicing notation to get the elements in the first row and second column, taking all elements along the third axis.

### Broadcasting
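The term 'broadcasting' refers to how numpy performs arithmetic on arrays of different shapes: the smaller array is (conceptually) stretched across the larger one so that the operation applies to every element. Note that the `shape` attribute gives the size of an array along each dimension (for a 2-D array, the number of rows and columns).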

In [None]:
# begin with a list
example_lst = [[4, 10, 2], [3, 8, 1]]
# transform the list into an array
example_arr = np.array(example_lst)
print(example_arr)
print(example_arr.shape)

In [None]:
print(a_array)
print(a_array.shape)

In [None]:
c_array.shape

Any numpy array, regardless of shape, can be added to, subtracted from, multiplied by, and divided by a scalar value.

In [None]:
# broadcasting with a scalar
# combine the scalar 2 with each element of a_array
b = 2

print('scalar addition', a_array + b)
print('scalar subtraction', a_array - b)
print('scalar multiplication', a_array * b)
print('scalar division', a_array / b)

`a_array.shape` is (3,) and `c_array.shape` is (4,3).  Can we add these?

In [None]:
print(c_array)
print(c_array.shape)

In [None]:
# broadcasting with an array
# a_array has shape (3,) and c_array has shape (4, 3): the trailing dimensions
# match (3 and 3), so a_array is added to each of the 4 rows of c_array
print(c_array + a_array)

In [None]:
c_array

The `reshape` method can be used to change the shape of a numpy array, as long as the total number of elements stays the same.

In [None]:
# reshaping array
reshaped_c_array = c_array.reshape(6,2)
print("before reshape")
print(c_array)
print(c_array.shape)
print("after reshape")
print(reshaped_c_array)
print(reshaped_c_array.shape)

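The shapes are now (6, 2) and (3,). Their trailing dimensions (2 and 3) differ and neither is 1, so the arrays cannot be broadcast together and numpy raises a `ValueError`.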

In [None]:
# raises ValueError: shapes (6,2) and (3,) cannot be broadcast together
print(reshaped_c_array + a_array)

If we reshape `reshaped_c_array` back to shape (4, 3), its trailing dimension matches that of `a_array` and the addition works.

In [None]:
# reshaping back to (4, 3) makes the shapes compatible again
print(reshaped_c_array.reshape(4,3) + a_array)
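The general rule numpy uses is to compare shapes starting from the last dimension: two dimensions are compatible if they are equal or if one of them is 1 (a missing dimension counts as 1). The sketch below shows a (4, 1) column and a (3,) row broadcasting to (4, 3), and uses `np.broadcast_shapes` (available in recent numpy versions) to check shapes without building any arrays.

```python
import numpy as np

col = np.array([[0], [10], [20], [30]])   # shape (4, 1)
row = np.array([1, 2, 3])                 # shape (3,)

# from the right: 1 vs 3 -> stretched to 3; then 4 vs (missing) -> 4
table = col + row
print(table.shape)   # (4, 3)
print(table)
# [[ 1  2  3]
#  [11 12 13]
#  [21 22 23]
#  [31 32 33]]

# check compatibility from shapes alone
print(np.broadcast_shapes((4, 3), (3,)))   # (4, 3)

try:
    np.broadcast_shapes((6, 2), (3,))      # trailing 2 vs 3: incompatible
except ValueError as err:
    print("incompatible:", err)
```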

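In some cases, we may need to transform a numpy array into a list. We can do this in either of two ways:
- pass the array to the built-in `list` function (the elements remain numpy types, and the rows of a 2-D array remain numpy arrays)
- use the `tolist()` method of the array (the elements become plain Python numbers, and a 2-D array becomes a nested list)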

In [None]:
# converting array back to list
p_arr = np.array([10, 3, 2])
p_lst = list(p_arr) # p_arr.tolist()
p_lst.append(7)
p_arr = np.array(p_lst)
p_arr
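The difference between the two conversions is easiest to see with a 2-D array. In this sketch (the name `arr2d` is illustrative), `list` only unpacks the outer dimension, while `tolist()` converts all the way down to plain Python values.

```python
import numpy as np

arr2d = np.array([[1, 2], [3, 4]])

via_list = list(arr2d)       # a list of 1-D numpy arrays (one per row)
via_tolist = arr2d.tolist()  # nested Python lists of plain ints

print(type(via_list[0]))       # <class 'numpy.ndarray'>
print(via_tolist)              # [[1, 2], [3, 4]]
print(type(via_tolist[0][0]))  # <class 'int'>
```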

### Important numpy functions and operations
In this section, we will cover a few of the key numpy functions and operations.    
First, [`np.linspace`](https://numpy.org/doc/2.1/reference/generated/numpy.linspace.html), which creates an array of evenly spaced floating-point numbers over a specified interval.
```python
np.linspace(start,stop,num)
```
`start` - starting value   
`stop` - ending value (included by default)   
`num` - total number of values, including `start` and `stop` (default 50)   

In [None]:
example1 = np.linspace(10, 100, 10)
print(example1)
print(len(example1))

In [None]:
example2 = np.linspace(-100, 100, 5)
print(example2)
print(len(example2))
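Because both endpoints are included, `num` values leave `num - 1` gaps, so the spacing is `(stop - start) / (num - 1)`. This sketch checks that using the `retstep` option, which makes `np.linspace` also return the spacing, and shows `endpoint=False`, which leaves out `stop`.

```python
import numpy as np

values, step = np.linspace(10, 100, 10, retstep=True)
print(step)                      # 10.0
print((100 - 10) / (10 - 1))     # 10.0, the same spacing computed by hand

# endpoint=False excludes stop, so the spacing becomes (stop - start) / num
no_end = np.linspace(0, 1, 5, endpoint=False)
print(no_end)                    # 0, 0.2, 0.4, 0.6, 0.8
```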

 #### <span style="color:red">**Exercise**</span>
Use `np.linspace` to create this array. Notice that the values in the array are **integers**.
```
array([ 2,  4,  6,  8, 10, 12, 14, 16])
```

### Getting minimum and maximum of an array
numpy has many functions, such as `np.min` and `np.max`, for calculating properties of a whole array (global) or of each row or column (local).

In [None]:
arr = np.array([1,5,6,6,2])
print(np.min(arr))
print(np.max(arr))

The `np.min` and `np.max` functions can operate on an entire array or along a single axis.

In [None]:
arr = np.random.rand(4, 3)
print(arr)
# find the overall minimum of the array
print(np.min(arr))
# find the minimum value in each row. If we set axis=0, we get the minimum of each column.
print(np.min(arr, axis=1))

#### Getting unique values

In [None]:
arr = np.array([1,1,2,3,3,4,5,6,8,6,2])
print(np.unique(arr))

# we can even get the frequency!
unique_elements, counts = np.unique(arr, return_counts=True)
print(counts)

# Can we print the number of times 6 appears in this array?
counts[unique_elements == 6]
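If you need to look up many counts, one option is to pair the two arrays returned by `np.unique` into a plain Python dictionary. This is a small sketch of that idea; `freq` is just an illustrative name.

```python
import numpy as np

arr = np.array([1, 1, 2, 3, 3, 4, 5, 6, 8, 6, 2])
unique_elements, counts = np.unique(arr, return_counts=True)

# pair each value with its count
freq = dict(zip(unique_elements.tolist(), counts.tolist()))
print(freq)      # {1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 2, 8: 1}
print(freq[6])   # 2
```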

#### Array creation
`numpy` contains lots of functions for automatic array creation:

In [None]:
ones = np.ones((2, 2)) # create an array of the given size filled with 1's
zeros = np.zeros((2, 2)) # create an array of the given size filled with 0's
identity = np.eye(2) # create an identity matrix (1's on the diagonal, 0's elsewhere) of the given size

print(ones)
print(zeros)
print(identity)

`np.diag` builds a 2-D array with the given values on its main diagonal and zeros elsewhere.

In [None]:
np.diag((1,1,1))

The `k` option in `np.diag` places the values on a diagonal above the main diagonal (positive `k`) or below it (negative `k`).   

In [None]:
np.diag([1,1],k=1)
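Arrays built with different `k` values can be added together, and `np.diag` also works in reverse: given a 2-D array, it extracts a diagonal. A short sketch, with arbitrary example values:

```python
import numpy as np

upper = np.diag([1, 1], k=1)    # values one step above the main diagonal
lower = np.diag([7, 7], k=-1)   # values one step below it
print(upper + lower)
# [[0 1 0]
#  [7 0 1]
#  [0 7 0]]

# given a 2-D array, np.diag extracts a diagonal instead
m = np.arange(9).reshape(3, 3)
print(np.diag(m))        # [0 4 8]
print(np.diag(m, k=1))   # [1 5]
```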

#### <span style="color:red">**Exercise**</span>

Generate the following array using only automatic `numpy` array creation operations (e.g. `np.eye`,`np.ones`,`np.diag`,...):

```
array([[0,1,2],
       [2,0,1],
       [2,1,0]])
```

### Array axes

See the [numpy documentation](https://numpy.org/doc/stable/user/basics.indexing.html) for fuller explanations.

Conceptually, arrays can be thought of as matrices of arbitrary size and dimensionality. A two-dimensional array is essentially a matrix, with the first index corresponding to the rows and the second index corresponding to the columns.

In [None]:
print(c_array)

If we don't specify an axis, the `np.mean` function calculates the mean for the entire array.

In [None]:
print(np.mean(c_array)) # global mean

If we specify `axis=0`, the mean is taken down the rows, giving one value for each column.

In [None]:
print(np.mean(c_array, axis=0)) # mean along the columns

If we specify `axis=1`, the mean is taken across the columns, giving one value for each row.  

I remember this as **columns come alphabetically before rows, 0 comes numerically before 1.**

In [None]:
print(np.mean(c_array, axis=1)) # mean along the rows
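Using a small array with known values makes the axis behaviour easy to verify. The sketch also shows the `keepdims=True` option, which keeps the reduced axis with length 1 so that the result broadcasts back against the original array (here, to subtract each row's mean).

```python
import numpy as np

m = np.array([[1., 2., 3.],
              [4., 5., 6.]])     # 2 rows, 3 columns

col_means = np.mean(m, axis=0)   # one value per column
row_means = np.mean(m, axis=1)   # one value per row
print(col_means)   # [2.5 3.5 4.5]
print(row_means)   # [2. 5.]

# keepdims=True gives shape (2, 1), which broadcasts against m's (2, 3)
centered = m - np.mean(m, axis=1, keepdims=True)
print(centered)
```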

#### Playing with N-Dimensional Arrays

Let's say we have an array of shape 2 by 2 (2 rows and 2 columns)

In [None]:
A = np.random.randint(10, size=(2, 2))
print(A)

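We can flatten it to a 1-D array. Passing `-1` to `reshape` tells numpy to infer the size of that dimension from the total number of elements.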

In [None]:
flatten = A.reshape(-1)
print(flatten)
print(flatten.shape)

If we want the result to stay two-dimensional, we can reshape it to a 4 x 1 array (column vector).

In [None]:
flatten = A.reshape(-1, 1)
print(flatten)
print(flatten.shape)

Or we can have it as a 1 x 4 array (row vector)

In [None]:
flatten = A.reshape(1, -1)
print(flatten)
print(flatten.shape)

#### Adding and removing dimensions from arrays
Sometimes we need to add dimensions to, or remove dimensions from, our arrays. Let's start with a one-dimensional array of 5 elements (shape `(5,)`).

In [None]:
A = np.random.randint(10, size=(5))
print(A, A.shape)

The [`np.expand_dims`](https://numpy.org/doc/stable/reference/generated/numpy.expand_dims.html) function can be used to add a dimension to an array. Here we insert a new axis at position 1, turning `A` into a 5 x 1 two-dimensional array.

In [None]:
A_exp = np.expand_dims(A, axis=1)
print(A_exp, A_exp.shape)
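An equivalent and commonly used shorthand is indexing with `np.newaxis` (an alias for `None`), which inserts a length-1 axis wherever it appears. A minimal sketch, using an illustrative array `v`:

```python
import numpy as np

v = np.arange(5)                # shape (5,)

as_column = v[:, np.newaxis]    # shape (5, 1), like np.expand_dims(v, axis=1)
as_row = v[np.newaxis, :]       # shape (1, 5), like np.expand_dims(v, axis=0)

print(as_column.shape, as_row.shape)                          # (5, 1) (1, 5)
print(np.array_equal(as_column, np.expand_dims(v, axis=1)))   # True
```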

The [`np.squeeze`](https://numpy.org/doc/2.2/reference/generated/numpy.squeeze.html) function does the opposite: it removes dimensions of length 1 from an array. Let's begin with a 3D array whose last dimension has length 1.

In [None]:
A_3d = np.random.randint(10, size=(3, 3, 1))
print(A_3d)
print(A_3d.shape)

Now let's squeeze the 3D array to a 2D array.

In [None]:
A_2d = A_3d.squeeze(axis=2)
print(A_2d)
print(A_2d.shape)

 #### <span style="color:red">**Exercise**</span>

"Unsqueeze" `A_2d` back into `A_3d`

Better yet, write a function that will take a generic two-dimensional array and unsqueeze it to a three-dimensional array with dimension one along the third axis.

### Matrix operations
Numpy can also perform matrix operations.

#### Matrix Inner (dot) product

Given two vectors, u and v, with components in an n-dimensional space:
```
u = <u_1, u_2, ..., u_n>     
v = <v_1, v_2, ..., v_n>
```

Their dot product is calculated by multiplying their corresponding components and then summing the results:

```
u . v = u_1 * v_1 + u_2 * v_2 + ... + u_n * v_n
```

Example: For vectors u = <1, 2, 3> and v = <4, 5, 6>:  
```
u . v = (1)*(4) + (2)*(5) + (3)*(6) = 4 + 10 + 18 = 32
```

In [None]:
# Inner Product
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Dot Product: ", np.dot(a, b))
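To connect the code with the formula above, this sketch computes the same dot product in two more ways: multiplying element-wise and summing, which mirrors the formula term by term, and using the `@` operator, which also computes the dot product for 1-D arrays.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

elementwise = u * v            # [ 4 10 18]
by_hand = np.sum(elementwise)  # 4 + 10 + 18
print(by_hand)                 # 32
print(u @ v)                   # 32
```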

#### Matrix transpose

Let's take a matrix A:
```
A = [ 1 2 3 ]
    [ 4 5 6 ]
```

Here, A is a 2 x 3 matrix.

To find its transpose, A^T, we turn its rows into columns:

The first row [1 2 3] becomes the first column of A^T.
The second row [4 5 6] becomes the second column of A^T.

So, A^T is:

```
A^T = [ 1 4 ]
      [ 2 5 ]
      [ 3 6 ]
```

Notice that A^T is a 3 x 2 matrix.

In [None]:
# Transpose
a = np.array([[1, 2, 3], [4, 5, 6]])
print("a =")
print(a)
print("a transposed =")
print(a.transpose())

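Note that the output of `transpose` is different from that of `reshape`: `reshape(3, 2)` keeps the elements in their original row-by-row order and simply regroups them into 3 rows of 2.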

In [None]:
a.reshape(3, 2)
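Side by side, the difference looks like this. The sketch also uses `.T`, the attribute shorthand for `.transpose()`.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.T)              # rows become columns
# [[1 4]
#  [2 5]
#  [3 6]]

print(a.reshape(3, 2))  # same row-major order, regrouped
# [[1 2]
#  [3 4]
#  [5 6]]

print(np.array_equal(a.T, a.reshape(3, 2)))  # False
```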

#### Matrix Multiplication
The `@` operator can be used to multiply two matrices. If we multiply two matrices `A` and `B`, the number of columns in `A` must equal the number of rows in `B`.

Let's consider two matrices, A and B:
```
A = [ 1 2 ]
    [ 3 4 ]

B = [ 5 6 ]
    [ 7 8 ]
```

Here, A is a 2 x 2 matrix and B is a 2 x 2 matrix. Since the number of columns in A (2) equals the number of rows in B (2), their product AB is defined and will be a 2 x 2 matrix.

Let C = AB. We calculate each element c_ij:

- c_11 (1st row of A, 1st column of B): `(1)*(5) + (2)*(7) = 5 + 14 = 19`
- c_12 (1st row of A, 2nd column of B): `(1)*(6) + (2)*(8) = 6 + 16 = 22`
- c_21 (2nd row of A, 1st column of B): `(3)*(5) + (4)*(7) = 15 + 28 = 43`
- c_22 (2nd row of A, 2nd column of B): `(3)*(6) + (4)*(8) = 18 + 32 = 50`

So, the resulting matrix C is:
```
C = [ 19 22 ]
    [ 43 50 ]
```

In [None]:
a = np.array([[1, 2], [3, 4]])
print("a =")
print(a)
print("a.shape =",a.shape)

b = np.array([[5, 6], [7, 8]])
print("b =")
print(b)
print("b.shape = ",b.shape)

product = a @ b
print("product =")
print(product)
print("product.shape = ",product.shape)

Equivalently, we could use `np.matmul(a, b)`.

In [None]:
np.matmul(a, b)
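A common mistake is to use `*` when a matrix product is intended. For arrays, `*` multiplies matching elements (the element-wise or Hadamard product), while `@` computes the matrix product worked out above. A quick comparison with the same `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a * b)   # element-wise (Hadamard) product
# [[ 5 12]
#  [21 32]]

print(a @ b)   # matrix product
# [[19 22]
#  [43 50]]
```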

Matrix multiplication fails when the shapes are incompatible. In the example below, `b` has an extra leading dimension of length 1 (its shape is (1, 3, 2)), and, more importantly, the number of columns in `a` (2) is not the same as the number of rows in `b` (3). Consequently, the matrix product is undefined and numpy raises a `ValueError`.

In [None]:
a = np.array([[1, 2], [3, 4], [5, 6]])
print(a)
print("a.shape = ",a.shape)

b = np.array([[[0, 8], [5, 9], [3, 2]]])
print("b =")
print(b)
print("b.shape = ",b.shape)

a @ b

To fix this problem, we can remove the extra dimension with `np.squeeze` and transpose `b`, giving it shape (2, 3).

In [None]:
# How do I compute matrix multiplication between A and B now?
b_reshaped = np.squeeze(b, axis=0).transpose()
print("b reshaped: ", b_reshaped.shape)

product = a @ b_reshaped
print("product =")
print(product)
print(product.shape)

### Data handling

Storing data and the results of your programming efforts is important for working over multiple sessions and sharing your results with collaborators. When a Python session ends, all the variables in memory are lost, so data must be stored in the file system.

To work with text files, we use the built-in `open` function, which returns a file object. It is commonly called with two arguments:

```
f = open(filename, mode)
```

`f` is the returned file object. `filename` is a string giving the location of the file you want to open, and `mode` is a string of a few characters describing how the file will be used. The common modes are:



- `'r'`, the default mode, opens a file for reading.
- `'w'` opens a file for writing. If the file does not exist, a new file is created; if it does exist, its contents are discarded.
- `'a'` opens a file in append mode, adding data to the end of the file. If the file does not exist, a new file is created.
- `'b'` opens a file in binary mode; it is combined with another mode, e.g. `'rb'` or `'wb'`.
- `'r+'` opens an existing file (without creating it) for reading and writing.
- `'w+'` opens or creates a file for writing and reading, discarding any existing contents.
- `'a+'` opens or creates a file for reading and writing, appending data to the end of the file.


Write to a file. The second argument, `'w'`, indicates that the file should be opened in write mode. If the file exists, it will be overwritten.

In [None]:
f = open('test.txt', 'w')
for i in range(5):
    f.write(f"This is line {i}\n")

f.close()

Append to an existing file. The second argument, `'a'`, indicates that the file should be opened in append mode.  

In [None]:
f = open('test.txt', 'a')
f.write(f"This is another line\n")
f.close()

Read a file. The second argument `r` indicates that the file should be opened in `read` mode.

In [None]:
f = open('./test.txt', 'r')
content = f.read()
f.close()
print(content)

We can use this I/O technique to read files into numpy arrays. Note how we use the `with` statement to open the file. When we do this, the file is closed automatically at the end of the `with` block.

In [None]:
dummy = []
with open('test.txt', 'r') as f:
    for line in f:
        dummy.append(line)
dummy = np.array(dummy)
print(dummy)
print(dummy.shape)

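Each element of `dummy` is one line of the file, including its trailing newline character. Let's look at the first element and verify that it is a string. (Its type is `numpy.str_`, numpy's string type, which is a subclass of Python's `str`.)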

In [None]:
content = dummy[0]
content

In [None]:
type(content)

Often we want to read a file's contents line by line into a Python list. We can use `f.readlines()` to achieve this.

In [None]:
f = open('./test.txt', 'r')
contents = f.readlines()
f.close()
print(contents)
print(type(contents))

In [None]:
contents[0]
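Each string returned by `readlines()` still ends with `'\n'`. A common pattern, sketched below, combines the `with` statement with `str.rstrip` to get clean lines. To keep it self-contained it writes its own file, `lines_demo.txt`, a hypothetical name used only here.

```python
path = 'lines_demo.txt'   # hypothetical file name for this example

# write three lines
with open(path, 'w') as f:
    for i in range(3):
        f.write(f"This is line {i}\n")

# read them back, stripping the trailing newline from each
with open(path, 'r') as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)   # ['This is line 0', 'This is line 1', 'This is line 2']
```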

When we work with numbers or arrays, we can use the numpy package to directly save/read an array.

In [None]:
arr = np.array([[1.20, 2.20, 3.00], [4.14, 5.65, 6.42]])

We can save a numpy array to disk using the `np.savetxt` function.

In [None]:
np.savetxt?

Let's save the array `arr` to disk. The first argument is the file name, the second is the array to save, the `fmt` keyword sets the output format (`'%.2f'` means two decimal places), and the `header` keyword sets a header line (written with a leading `# `, so `np.loadtxt` skips it when reading).

In [None]:
np.savetxt('my_arr.txt', arr, fmt='%.2f', header = 'Col1 Col2 Col3')

We can load the array back into memory as follows.

In [None]:
my_arr = np.loadtxt('my_arr.txt')
my_arr

After reading from disk, the array is ready to use.

In [None]:
my_arr.dot(my_arr.T)

#### Reading and writing CSV files
Scientific data are sometimes stored in the comma-separated values (CSV) file format, a delimited text file that uses a comma to separate values. It is a very useful format that can store large tables of data (numbers and text) in plain text. Each line (row) in the data is one data record, and each record consists of one or more fields, separated by commas. CSV files can also be opened in spreadsheet programs such as Microsoft Excel.

Python's standard library has a `csv` module that can read and write CSV files, but for numeric data we can also use numpy.

In [None]:
data = np.random.random((100,5))
# the delimiter argument instructs the function to place the indicated delimiter
# (in this case a comma) between each of the values instead of the default space
np.savetxt('test.csv', data, fmt = '%.2f', delimiter=',', header = 'c1, c2, c3, c4, c5')
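Reading the file back only needs the same `delimiter`. The header line written by `np.savetxt` starts with `#`, which `np.loadtxt` treats as a comment by default. This sketch writes its own file, `demo.csv` (a hypothetical name), so it can run on its own, and checks that the values survived the round trip up to the two-decimal rounding.

```python
import numpy as np

data = np.random.random((100, 5))
np.savetxt('demo.csv', data, fmt='%.2f', delimiter=',', header='c1, c2, c3, c4, c5')

# the header was written as "# c1, c2, ...", which loadtxt skips as a comment
loaded = np.loadtxt('demo.csv', delimiter=',')
print(loaded.shape)   # (100, 5)

# values were rounded to 2 decimals on the way out, so they agree to within 0.005
print(np.allclose(loaded, data, atol=0.005))   # True
```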

We can also load data directly from a CSV file hosted online.

In [None]:
# np.loadtxt can be used to read in CSV files
import numpy as np
url = 'https://raw.githubusercontent.com/hsiav2000/simple-regression/master/Salary_Data.csv'
data = np.loadtxt(url, delimiter = ',', skiprows = 1)

#### A preview of the next section
Now, we can plot and manipulate the visualization of the data using another Python library called [`matplotlib`](https://matplotlib.org/). More on this in the next section...

In [None]:
import matplotlib.pyplot as plt
plt.scatter(data[:,0], data[:,1])
plt.xlabel('years of experience')
plt.ylabel('salary')
plt.tight_layout()
plt.show()
plt.close()

 #### <span style="color:red">**Exercise**</span>

Look up `np.genfromtxt` and load the CSV file using this function instead of `np.loadtxt`.

### Visualizing Data with Matplotlib

Visualizing data is usually the best way to convey important engineering and science ideas and information, especially if the information is made up of many numbers. The ability to visualize and plot data quickly and in many different ways is one of Python's most powerful features.

Python's plotting libraries provide functions to display plots, surfaces, volumes, vector fields, histograms, animations, and many other kinds of figures. The most common package for visualization in Python is [matplotlib](https://matplotlib.org/).

Have a look at the [matplotlib gallery](https://matplotlib.org/stable/gallery/index.html) and get a sense of what could be done there. We'll cover the basic syntax for plotting here.

#### Basic plotting

The `pyplot` module of matplotlib is conventionally imported as `plt`. It provides a simple plotting interface that works well in Jupyter notebooks.

In [None]:
import matplotlib.pyplot as plt

Given the lists x = [0, 1, 2, 3] and y = [0, 1, 4, 9], use the plot function to produce a plot of x versus y.    
Note that `plt.plot` creates a line plot. If we change this to `plt.scatter`, we get a scatter plot.

In [None]:
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]
plt.plot(x, y) # plot the data
plt.xlabel('x') # set the label of the x-axis
plt.ylabel('y') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

By default, each point is connected with a blue line. To make the function look smooth, use a finer grid for the x-axis.

Let's plot the parabolic function $y = x^2$ on the domain $x\in[-5,5]$.

Try changing the number of values to 10 or 5 and look at the impact on the curve.

In [None]:
x = np.linspace(-5,5,100) # generate a grid of x values
y = x**2 # evaluate y = f(x) = x**2 at every x value; this broadcasts
         # the squaring operation over the whole grid of x values

plt.plot(x,y)

plt.xlabel('x') # set the label of the x-axis
plt.ylabel('y') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

You can play around with various customizations. Plot the sine function with green dashed lines and a star marking data points.
The argument `ls` sets the line style. Try `help(plt.plot)` to see the available options.

In [None]:
x = np.linspace(-np.pi, np.pi, 100)
y = np.sin(x)

In [None]:
plt.plot(x[::10], y[::10], color = 'green', ls = '--', marker = '*')

plt.xlabel('x') # set the label of the x-axis
plt.ylabel('sin(x)') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

You can choose from predefined styles for your plots.

In [None]:
print(plt.style.available)
# plt.style.use('ggplot')  # apply one of the style names printed above

In [None]:
plt.plot(x,np.sin(x), color='tab:blue',  linestyle='--', linewidth=2, label=r'$\sin x$')
plt.plot(x,np.cos(x), color='tab:orange', linestyle='-.',linewidth=6, label=r'$\cos x$')
plt.title('Phase-shifted waves')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.legend(loc='lower right')
plt.ylim(-1,1)
plt.xlim(-2,2)
plt.show()
plt.close()

Scatter plots work much like the line plots above, except that by default the points are not connected. The same kind of plot can also be made with `plt.plot` by passing the `'o'` marker format, as in the second cell below.

In [None]:
# Generate 20, normally distributed, random points
x, y = np.random.randn(2, 20)

plt.scatter(x, y)
plt.xlabel('x') # set the label of the x-axis
plt.ylabel('y') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

In [None]:
plt.plot(x, y,'o', color='tab:blue')
plt.xlabel('x') # set the label of the x-axis
plt.ylabel('y') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

Data points with a linear relationship, plus random noise:

In [None]:
x = np.arange(100)
delta = np.random.poisson(40, size=100)

# y is a linear function of x, with some added noise
y = 0.8*x + 5 + delta

plt.scatter(x, y)

plt.xlabel('x') # set the label of the x-axis
plt.ylabel('y') # set the label of the y-axis
plt.show() # display the plot
plt.close() # close the plotting object after displaying the plot

There are several other plotting functions that plot x versus y data, including:
- `bar` - plots bars centered at x with height y
- `loglog` - plots the data with both the x and y axes on a log scale
- `semilogx` - plots the data with the x axis on a log scale and the y axis on a linear scale
- `semilogy` - plots the data with the y axis on a log scale and the x axis on a linear scale

In [None]:
x = np.arange(1, 11)  # start at 1: zero cannot be shown on a log axis
y = x**2

plt.figure(figsize = (14, 8))

plt.subplot(2, 3, 1)
plt.plot(x,y)
plt.title('Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid()

plt.subplot(2, 3, 2)
plt.scatter(x,y)
plt.title('Scatter')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid()

plt.subplot(2, 3, 3)
plt.bar(x,y)
plt.title('Bar')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid()

plt.subplot(2, 3, 4)
plt.loglog(x,y)
plt.title('Loglog')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(which='both')

plt.subplot(2, 3, 5)
plt.semilogx(x,y)
plt.title('Semilogx')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(which='both')

plt.subplot(2, 3, 6)
plt.semilogy(x,y)
plt.title('Semilogy')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid()

plt.tight_layout()
plt.show()
plt.close()


`plt.tight_layout()` adjusts the spacing so that the subplots do not overlap with each other.

Sometimes you want to save figures in a specific format, such as PDF, JPEG, or PNG. You can do this with `plt.savefig`, which infers the format from the file extension.

In [None]:
plt.figure(figsize = (8,6))
plt.plot(x,y)
plt.xlabel('x')
plt.ylabel('y')
plt.savefig('image.png')
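`plt.savefig` accepts a few keyword arguments that are often useful: `dpi` sets the resolution of raster formats such as PNG, and `bbox_inches='tight'` trims surplus whitespace around the figure. The sketch below saves the same plot twice; the file names `image_hires.png` and `image.pdf` are hypothetical, and the figure is closed afterwards to free memory.

```python
import os
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 11)
plt.figure(figsize=(8, 6))
plt.plot(x, x**2)
plt.xlabel('x')
plt.ylabel('y')

plt.savefig('image_hires.png', dpi=200)         # PNG at 200 dots per inch
plt.savefig('image.pdf', bbox_inches='tight')   # PDF with whitespace trimmed
plt.close()

print(os.path.exists('image_hires.png'), os.path.exists('image.pdf'))  # True True
```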


#### "Object-oriented" plotting (optional)

So far, we have used the procedural (`pyplot`) interface to make plots. You can get more granular control of your plot by using the object-oriented interface.

You start by creating a figure object.

In [None]:
fig = plt.figure()

In [None]:
type(fig)

In [None]:
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# Set the title of plot
ax.set_title("Empty plot")

In [None]:
fig = plt.figure()

# Generate a grid of 2x2 subplots
# Axes object for 1st location
ax1 = fig.add_subplot(2,2,1)
ax1.set_title('First Location')

# Axes object for 2nd location
ax2 = fig.add_subplot(2,2,2)
ax2.set_title('Second Location')

# Axes object for 3rd location
ax3 = fig.add_subplot(2,2,3)
ax3.set_xlabel('Third Location')

# Axes object for 4th location
ax4 = fig.add_subplot(2,2,4)
ax4.set_xlabel('Fourth Location')

# Nice layout and display
plt.tight_layout()
plt.show()

In [None]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
ax1.set_title('First Location')
ax2.set_title('Second Location')
ax3.set_xlabel('Third Location')
ax4.set_xlabel('Fourth Location')
plt.tight_layout()

In [None]:
# Import ticker to control tick labels and positions
import matplotlib.ticker as tck

# Generate the x-axes grid
x = np.linspace(-np.pi, np.pi, 100, endpoint=True)

# Figure
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True)
fig.set_dpi(100)
fig.set_size_inches(10,6)

# First plot - sine
ax1.plot(x, np.sin(x))
ax1.set_title("sin")
ax1.set_ylabel("y")

# Second plot - cosine
ax2.plot(x, np.cos(x))
ax2.set_title("cos")
ax2.set_ylabel("y")

# Third plot - tangent
ax3.plot(x, np.tan(x))
ax3.set_title("tan")
ax3.set_ylabel("y")
ax3.set_xlabel("x")

# Set ticks for the shared axes on the bottom of the plot
x_ticks = np.linspace(-np.pi, np.pi, 5)  # five ticks: -pi, -pi/2, 0, pi/2, pi
ax3.set_xticks(x_ticks, [r'$-\pi$', r'$-\frac{\pi}{2}$', r'$0$', r'$\frac{\pi}{2}$', r'$\pi$'])

plt.tight_layout()

#### <span style="color:red">**Exercises**</span>

1. Plot the functions $y_1(x)=3+e^{-x}\sin(6x)$ and $y_2(x)=4+e^{-x}\cos(6x)$ for $0\leq x \leq 5$ on a single axis. Give the plot axis labels, a title, and a legend.
2. A cycloid is the curve traced by a point located on the edge of a wheel rolling along a flat surface. The $(x,y)$ coordinates of a cycloid generated from a wheel with radius $r$ can be described by the parametric equations:
$$ x = r (\phi - \sin \phi), \qquad  y = r (1-\cos\phi) $$
where $\phi$ is the angle, in radians, through which the wheel has rolled. Generate a plot of the cycloid for $\phi \in [0,2\pi]$ using 1000 increments and $r=3$. Give your plot a title and labels. Turn the grid on and modify the axis limits to make the plot neat.

3. Generate 1000 normally distributed random numbers using the `np.random.randn` function. Use the `plt.hist` function to plot a histogram of the randomly generated numbers, distributing them into 10 bins. Then, using the bin counts and bin edges returned by `plt.hist`, create a bar graph with the `plt.bar` function. It should look very similar to the plot produced by `plt.hist`.
4. Another Python library commonly used in scientific computing is the [scipy library](https://scipy.org/). One useful tool it provides is the ability to fit data to arbitrary, continuous functions using the `curve_fit` function. For this exercise, import the `curve_fit` function from scipy using the following command: `from scipy.optimize import curve_fit`. Use `help(curve_fit)` to figure out what arguments the function takes and in what order. Then, use the following commands from the cell above to generate a set of (x, y) ordered pairs and see if the `curve_fit` function can reproduce the expected slope and intercept of a linear fit. Keep in mind that the noise `delta` has a mean of 40, which shifts the expected intercept.
```
x = np.arange(100)
delta = np.random.poisson(40, size=100)
y = 0.8*x + 5 + delta
```