# Introduction to Scientific Programming in Python - 2

Created by Vahid Rostami. Accessed via the Social ComQuant project (https://socialcomquant.ku.edu.tr/) and revised by M. Fuat Kına.

Frequently used functions are stored in libraries. On this second day we introduce two important Python libraries, namely **Numpy** and **Matplotlib**.

## Numpy

Numpy (Numerical Python) is a core Python library for numerical calculation. It provides multidimensional arrays and a rich collection of functions for applying mathematical operations to those arrays.

In [None]:
# Let's look at nested lists

my_list = [1,2,5,[0,1],[2,3,5]]

In [None]:
my_list[4]

### Numpy Arrays

Numpy arrays are a great alternative to Python lists. For scientific computing and numerical calculation in particular, numpy arrays are much easier to handle and to compute with.
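As a minimal sketch of what "easier to compute with" means, the snippet below doubles every value of a list and of an array. The variable names are illustrative only.

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array([1, 2, 3])

# With a list, arithmetic on every element needs a loop or a comprehension
doubled_list = [2 * v for v in values_list]

# With an array, the operation is applied element by element automatically
doubled_arr = 2 * values_arr

print(doubled_list)      # [2, 4, 6]
print(doubled_arr)       # [2 4 6]
print(values_list * 2)   # [1, 2, 3, 1, 2, 3] (list repetition, not arithmetic)
```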

How to create a numpy array:

In [None]:
# First we import numpy. Since we are going to call numpy again and again,
# it is convenient to give it a short name such as np
import numpy as np

arr1d = np.array([1,2,3]) # Create a one dimensional array
arr2d = np.array( [ [1,2,3] , [4,5,6] ] ) # Create two dimensional array

In [None]:
type(arr1d)

In [None]:
print(arr1d.shape)
print(arr2d.shape)

In [None]:
arr2d

###### Note: An n × m matrix A is a rectangular array of numbers with n rows and m columns.

Numpy also provides many functions to create arrays:

In [None]:
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

In [None]:
b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

In [None]:
c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

In [None]:
d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

In [None]:
e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"

### Array indexing

To access elements of a numpy array we use indices, much as with Python lists, except that there is one index per dimension of the array. For example, the second value in the first row of `arr2d` is `arr2d[0,1]`, because indices start at 0.

Example:

In [None]:
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 3  7  3  4]
#  [ 5  6  7  2]
#  [ 2  1  1  1]]
a = np.array([[3,7,3,4], [5,6,7,2], [2,1,1,1]])

# Print the full array before slicing it
print(a)

In [None]:
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is an array of shape (2, 2).
# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
b = a[:2, 1:3]
print(b)

In [None]:
print(a[0, 1])   # Prints "7"

In [None]:
b[0, 0] = 3     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "3"

In [None]:
print(b)

In [None]:
print(a)
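If the original array should stay untouched, one option is to take an explicit copy of the slice. The sketch below contrasts a view with a copy; the names `view` and `independent` are only illustrative.

```python
import numpy as np

a = np.array([[3, 7, 3, 4], [5, 6, 7, 2], [2, 1, 1, 1]])

view = a[:2, 1:3]                # a view: shares memory with a
independent = a[:2, 1:3].copy()  # a copy: owns its own data

independent[0, 0] = 100          # does not touch a
print(a[0, 1])                   # 7

view[0, 0] = 100                 # writes through to a
print(a[0, 1])                   # 100
```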

There are many useful functions in Numpy. Here we mention some of them:

`append`, `where`, `add`, `random`, `reshape`, `vstack`, `mean`, `median`, `std`, `isnan`

### Exercises (numpy)

1- Create a 3×3 numpy null array (all the values are zeros)

In [None]:
np.zeros((3,3))

2- Extract all odd numbers from an array

hint: input: [0,1,2,3,4,5,6,7] and desired output: [1,3,5,7]

In [None]:
x = np.array([0,1,2,3,4,5,6,7])
y = np.array([])

for ele in x:
    if ele % 2 != 0:
        y = np.append(y, ele)
    else:
        continue
        
y
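The loop above works, but the same result can be obtained without a loop by indexing with a boolean mask. This is a sketch of that alternative; `is_odd` and `odd_values` are illustrative names. Unlike appending to an empty array (which is created as a float array), the mask keeps the integer type of `x`.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7])

is_odd = x % 2 == 1      # boolean array, True where the element is odd
odd_values = x[is_odd]   # keep only the elements where the mask is True

print(is_odd)
print(odd_values)        # [1 3 5 7]
```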

3- Convert a 1D array to a 2D array with 2 rows

hint: input: [0,1,2,3,4,5,6,7], desired output: array([[0,1,2,3], [4,5,6,7]])

In [None]:
x = np.array([0,1,2,3,4,5,6,7])

x.shape

In [None]:
x.reshape(2,4)

In [None]:
new_column = len(x) // 2  # integer division gives the number of columns directly

In [None]:
x_revised = x.reshape(2,new_column) # reshape returns a new array; x itself is left unchanged
print(x_revised)
# Alternatively: x_revised = np.reshape(x,(2,new_column))

In [None]:
x_revised = np.reshape(x,(2,-1)) # -1 lets numpy infer that dimension from the array size
print(x_revised)

4- Stack arrays `first_array` and `second_array` vertically.

```
first_array = np.arange(10).reshape(2,-1)
second_array = np.repeat(1, 10).reshape(2,-1)
```

In [None]:
first_array = np.arange(10)
print(first_array)
print(first_array.shape)

In [None]:
first_array = first_array.reshape(2,-1)
print(first_array)
print(first_array.shape)

In [None]:
second_array = np.repeat(1, 10)
print(second_array)
print(second_array.shape)

In [None]:
second_array = second_array.reshape(2,-1)
print(second_array)
print(second_array.shape)

In [None]:
stack = np.append(first_array, second_array)
print(stack)
print(stack.shape)

In [None]:
stack = np.reshape(stack,(4,-1))
print(stack)
print(stack.shape)
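Appending and then reshaping gives the right answer here because both arrays have the same number of columns. A more direct route, sketched below, is `np.vstack`, which takes a tuple of arrays and stacks them row-wise. The name `stacked` is illustrative.

```python
import numpy as np

first_array = np.arange(10).reshape(2, -1)
second_array = np.repeat(1, 10).reshape(2, -1)

# vstack stacks the arrays on top of each other (along the first axis)
stacked = np.vstack((first_array, second_array))

print(stacked)
print(stacked.shape)  # (4, 5)
```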

5- Get the items of the array `a` that are larger than 5 and smaller than 10.

```
a = np.arange(0,15,2)
```

In [None]:
a = np.arange(0,15,2)
a

In [None]:
a = np.arange(0,15,2)
b = np.array([])

for ele in a:
    if ele < 10 and ele > 5:
        b = np.append(b,ele)
        
print(b)
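As a loop-free sketch of the same selection, two conditions can be combined with `&` into a single boolean mask. Each condition needs its own parentheses, because `&` binds more tightly than the comparison operators. The name `selected` is illustrative.

```python
import numpy as np

a = np.arange(0, 15, 2)            # [ 0  2  4  6  8 10 12 14]

# Element-wise "and" of the two conditions
selected = a[(a > 5) & (a < 10)]

print(selected)                    # [6 8]
```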

6- Find the mean, median and standard deviation of a.

In [None]:
print(np.mean(a))
print(np.median(a))
print(np.std(a))

7- What are the indices of the elements of `a` that are larger than 0.5? Store these indices in a new variable called `idx`.

```
a = np.random.rand(1000)
```

In [None]:
a = np.random.random(1000)
idx = np.array([])

for i in range(len(a)):
    ele = a[i]
    if ele > 0.5:
        idx = np.append(idx,i)

print(len(idx))
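A possible alternative to the loop is `np.where`, one of the functions listed earlier. Called with a single condition, it returns a tuple containing one index array per dimension, so for a 1D array we take element `[0]`. The indices come back as integers, so they can be used directly to index `a`, which is not true of the float array built with `np.append` above.

```python
import numpy as np

a = np.random.rand(1000)

# np.where(condition) returns a tuple; its first entry holds the indices for a 1D array
idx = np.where(a > 0.5)[0]

print(len(idx))
print(a[idx][:5])   # the first few selected values, all larger than 0.5
```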

8- Reverse a vector (first element becomes last).

In [None]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7])

rev = np.flip(x)
rev

In [None]:
# OR

x[::-1]

9- Create a random vector of size 10 and insert 0 before the maximum value.

In [None]:
vector = np.random.rand(10)
print(vector)

In [None]:
print(np.argmax(vector))

In [None]:
vector_rev = np.insert(vector, np.argmax(vector), 0)
print(vector_rev)

10- Subtract the mean of each row of a matrix.

In [None]:
x = np.reshape(vector,(2,5))
print(x)

In [None]:
mean = np.mean(x) # without an axis argument, the mean is taken over all elements
print(mean)

row_means = np.mean(x, axis=1) # axis=1 averages across the columns, giving one mean per row (axis=0 gives one per column)
print(row_means)

In [None]:
column_means = np.mean(x, axis = 0)
print(column_means)

In [None]:
row_means = np.mean(x, axis = 1)
row_means

In [None]:
np.repeat(row_means, 5).shape

In [None]:
row_means_array = np.repeat(row_means, 5).reshape(2,5)
row_means_array

In [None]:
final = x - row_means_array
print(final)
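The repeat-and-reshape step can also be left to numpy's broadcasting. The sketch below assumes a small example matrix instead of the random `x` above and uses `keepdims=True` so that the row means keep shape `(2, 1)` and line up with the rows of the matrix. The name `centered` is illustrative.

```python
import numpy as np

x = np.arange(10, dtype=float).reshape(2, 5)   # example matrix, rows 0..4 and 5..9

# keepdims=True keeps the result two dimensional: shape (2, 1) instead of (2,)
row_means = x.mean(axis=1, keepdims=True)

# Broadcasting stretches the (2, 1) column of means across the 5 columns of x
centered = x - row_means

print(row_means.shape)   # (2, 1)
print(centered)
```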

How does the repeat function work?

In [None]:
print(np.repeat(3, 4))
x = np.array([[1,2],[3,4]])
print(x)
print(np.repeat(x, 2))
print(np.repeat(x, 3, axis=0))
print(np.repeat(x, 3, axis=1))
print(np.repeat(x, [1, 2], axis=0))

11- According to the worldometers.info website, the world population on 17 February 2019 at 11:00 am was ~7684621550 and the average growth rate per day is ~107000.
```
# World population
world_pop = 7684621550
# average growth rate per day
growth_rate = 107000
```
Create an array (call it `world_pop_arr`) that shows the world population from 17 February until 26 February. Then create a dictionary with all this information and save the dictionary to a file using `numpy.save`.


###### Note: the zip() function returns a zip object, which is an iterator of tuples in which the first items of the passed iterables are paired together, then the second items are paired together, and so on.
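One possible approach to exercise 11 is sketched below. It assumes the population grows by exactly `growth_rate` every day, uses hypothetical date strings as dictionary keys, and writes to a temporary folder with a made-up file name. Because a dictionary is stored as a pickled object, loading it back needs `allow_pickle=True`, and `.item()` recovers the dictionary itself.

```python
import os
import tempfile
import numpy as np

world_pop = 7684621550
growth_rate = 107000

# Days 0..9 after 17 February (17th to 26th inclusive), assuming constant daily growth
days = np.arange(10, dtype=np.int64)
world_pop_arr = world_pop + growth_rate * days

# Hypothetical date labels, paired with the populations using zip()
dates = [f"2019-02-{17 + int(d):02d}" for d in days]
pop_by_date = dict(zip(dates, world_pop_arr.tolist()))

# Saved into a temporary folder here; any writable path would do
path = os.path.join(tempfile.mkdtemp(), "world_pop.npy")
np.save(path, pop_by_date)

# Load the dictionary back from the file
loaded = np.load(path, allow_pickle=True).item()
print(loaded["2019-02-26"])   # 7685584550
```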

12- Write a _time_resolved function_ for an arbitrary array and an arbitrary step (or window size) that you define.

The function will take three objects as inputs:
`time_resolved(arr, w_size, shift)`

For example, printing `time_resolved(arr,3,4)` for `arr=np.array([1,2,6,4,5,4,2,7,8,9])`
should give `[4, 5, 4]`: a window of 3 elements starting at the 4th element.

## Matplotlib

Visualization is one of the most important parts of data analysis. It gives us visual access to huge amounts of data in a simple and powerful way. The core Python library for visualization is matplotlib. Its development started with the idea of emulating Matlab commands in Python. By importing the following module,
```
import matplotlib.pyplot as plt
```
one gains access to many plotting functions that are similar to Matlab's. Let's create some simple plots to learn the most important syntax.

In [None]:
import matplotlib.pyplot as plt 

# data to be plotted
X = [1,2,3,4]
Y = [10,22,30,40]

# plot 
plt.plot(X, Y, 'r--1')
# define xlabel
plt.xlabel('X label')
# define ylabel
plt.ylabel('Y label')
# show the plotted figure
plt.show()

Check the following code and the produced figures. Can you tell what each line of code is doing? 

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# evenly sampled time at 200ms intervals
t = np.arange(0, 5, 0.2)

# thick red line, blue dashed line with triangles and green line with circles
plt.plot(t, t, 'r', label='line one', linewidth=5)
plt.plot(t, t**2, 'b--^',label='line two', linewidth=2)
plt.plot(t,t**3, 'g-o', label='marker', markersize = 4)
plt.legend()
plt.grid(True, alpha=0.5 ,color='k')
plt.show()

In [None]:
import matplotlib.pyplot as plt
population_age = [22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]

In [None]:
bins = [0,10,20,30,40,50,60,70,80,90,100]

plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()

What you just did with the last two figures is one of the best ways to learn matplotlib syntax and to produce whatever figure you like. You probably inspected the figure's axes, colors, lines, labels, bars, etc., and tried to work out which line of code is responsible for each feature. This is a very efficient way to create figures and add the details you want. One of the best collections of such examples can be found at the following link:

[https://matplotlib.org/gallery.html](https://matplotlib.org/gallery.html)

Click on the link and find the figure you are interested in and check the code for that figure.

As a good exercise, try to find out how to draw multiple subplots in one figure.
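One way to do this, shown here only as a starting point, is `plt.subplots`, which returns a figure and an array of axes. Each axes object has its own `plot` and `set_title` methods. The data and titles below are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 5, 0.2)

# One row, two columns of subplots inside a single figure
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(t, t, 'r')          # left panel
axes[0].set_title('linear')

axes[1].plot(t, t**2, 'b--')     # right panel
axes[1].set_title('quadratic')

fig.tight_layout()               # avoid overlapping labels
plt.show()
```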


### Exercises (matplotlib)

1- Plot two or more lines with legends, different widths and colors.

In [None]:
import matplotlib.pyplot as plt 

# data to be plotted
X = [1,2,3,4]
Y = [10,22,30,40]
Z = [3,5,7,9]
Q = [3,15,27,39]

plt.plot(X, Y, 'r-o', label="Y", alpha=0.4, linewidth=2)
plt.plot(X, Z, 'k--^', label="Z", linewidth=1)
plt.plot(X, Q, 'g-+', label="Q", linewidth=1)
# define xlabel
plt.xlabel('X label')
# define ylabel
plt.ylabel('Y label')
plt.grid(True)

plt.title("Title")

plt.legend(loc=2)

plt.show()

2- Generate random values from 0 to 1000 and plot these values. For the color of the plot, choose blue color for the values below 500 and red for those above 500.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

data=np.random.randint(0,1000,50)
x = np.arange(50)

data_red=np.array([])
data_red_index=np.array([])

for i in range(len(data)):
    if data[i]>500:
        data_red = np.append(data_red, data[i])
        data_red_index = np.append(data_red_index, i)

data_blue=np.array([])
data_blue_index=np.array([])

for i in range(len(data)):
    if data[i]<=500:
        data_blue = np.append(data_blue, data[i])
        data_blue_index = np.append(data_blue_index, i)
plt.plot(x,data,"k--", alpha=0.5)        
plt.plot(data_blue_index,data_blue,"b*")
plt.plot(data_red_index,data_red,"r^")
plt.show()
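The two loops can be replaced by a single boolean mask, as sketched below. The mask `is_red` (an illustrative name) marks the values above 500, and `~is_red` selects the rest, so the same mask picks both the x positions and the values.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.randint(0, 1000, 50)
x = np.arange(50)

is_red = data > 500                        # True for values above 500

plt.plot(x, data, "k--", alpha=0.5)        # all values as a faint dashed line
plt.plot(x[~is_red], data[~is_red], "b*")  # values of 500 or less in blue
plt.plot(x[is_red], data[is_red], "r^")    # values above 500 in red
plt.show()
```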

3- Now apply the _time_resolved analysis_ function that you wrote during numpy exercise on the data from exercise 4 and plot it in a new plot.

In [None]:
def time_resolved(arr,w_size,shift):
    sub_arr=np.array([])
    for i in range(shift-1,shift+w_size-1):
        sub_arr=np.append(sub_arr,arr[i])
    return sub_arr
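The same window can be taken with a single slice instead of a loop. The sketch below uses a hypothetical name, `time_resolved_slice`, to keep it apart from the function above. Note that a slice is a view of `arr` and keeps its integer type, whereas the loop version builds a new float array.

```python
import numpy as np

def time_resolved_slice(arr, w_size, shift):
    # shift counts from 1, so the window starts at index shift - 1
    start = shift - 1
    return arr[start:start + w_size]

arr = np.array([1, 2, 6, 4, 5, 4, 2, 7, 8, 9])
print(time_resolved_slice(arr, 3, 4))   # [4 5 4]
```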

In [None]:
import matplotlib.pyplot as plt 

rand_val=np.random.randint(0,1000,50)
print(rand_val)
nr=np.arange(1,51)
print(nr)
plt.plot(nr, rand_val, 'r')

w_size = int(input())
shift = int(input())

sub_arr = time_resolved(rand_val,w_size,shift)
nr2=np.arange(shift,w_size+shift)

print(sub_arr)
plt.plot(nr2,sub_arr,'b')
plt.show()