This tutorial covers the basics of Python programming. It introduces some commonly used Python modules and shows how to use them.

Author: Collab.

# The Numpy Array Object

Numpy is a fundamental scientific package (module) for working with multi-dimensional arrays. Arrays are similar to lists, but the elements of an array must all be of the same type, e.g. float, int or string. To begin, we need to import the numpy package. By convention, np is used as an abbreviation for numpy.

In [None]:
import numpy as np

## Creating Numpy Arrays

Numpy arrays can be created in various ways. They can be created manually or initialised using built-in functions, e.g. np.array(), np.zeros(), np.ones(), np.arange(), np.linspace(), np.indices(). np.array takes an array-like object, such as a (nested) list, and returns an array with the same contents; passing an empty list gives an empty array. Given the desired shape, np.zeros returns an array filled with zeros. For instance, np.zeros((2,2)) returns a 2×2 array filled with zeros. Optional parameters such as dtype and order can also be supplied. The same applies to np.ones, except that the array is filled with ones. np.arange takes start, stop and step values as parameters and returns an array of evenly spaced values from start up to, but not including, stop; the default start is 0 and the default step is 1. np.linspace returns a given number of evenly spaced elements over a specified interval, including both endpoints by default. np.indices returns an array representing the indices of a grid of the given shape.

Let us give it a try!

In [None]:
A= np.array([[2,3,4],[6,7,8]],dtype='int')
A

In [None]:
B = np.zeros((2,3))
B

In [None]:
C = np.ones((2,4),dtype='float')
C

In [None]:
D = np.arange(2,10)
D

In [None]:
E = np.linspace(0,1,30)
E

In [None]:
F = np.indices((3,3))
F
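
As a small sketch of the difference between np.arange and np.linspace (the variable names here are illustrative only): np.arange is controlled by the step and leaves out the stop value, whereas np.linspace is controlled by the number of points and includes both endpoints by default.

```python
import numpy as np

# np.arange(start, stop, step): the stop value itself is not included
evens = np.arange(2, 10, 2)      # values 2, 4, 6, 8

# np.linspace(start, stop, num): both endpoints are included by default
grid = np.linspace(0, 1, 5)      # values 0, 0.25, 0.5, 0.75, 1

print(evens)
print(grid)
```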

Once we have initialised an array, we can inspect its shape, its number of dimensions and the type of its elements.

A.shape: shape of array A

A.ndim: number of dimensions (axes) of array A

A.dtype: type of the elements in array A

In [None]:
A.shape, A.ndim, A.dtype

## Array Manipulation

### Basic Operations

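Numpy arrays behave similarly to matrices in many respects, but they are not matrices: operators such as * act element by element. Basic operations such as addition, subtraction and multiplication can be easily performed on them. Note that for a binary operation, the numpy arrays involved must have the same shape, or shapes that are compatible under numpy's broadcasting rules (for example, an array and a scalar).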

In [None]:
A = np.array([(1,7,3),(4,5,6)])
B = np.array([(8,12,1),(6,9,11)])
A ** 2 # squaring the elements of the array

In [None]:
A + B # Adding two numpy arrays

In [None]:
A - B # Subtracting two numpy arrays

In [None]:
A * B # Elementwise multiplication of two numpy arrays

In [None]:
np.dot(A,B.T) # matrix multiplication; B.T returns the transpose of B
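
Strictly speaking, the operands of a binary operation do not need identical shapes: numpy also accepts shapes that are compatible under its broadcasting rules. The following is a minimal sketch of that behaviour; the names row, shifted and doubled are made up for this illustration.

```python
import numpy as np

A = np.array([(1, 7, 3), (4, 5, 6)])   # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,)

# The 1-D array is broadcast across both rows of A
shifted = A + row

# A scalar is broadcast to every element
doubled = 2 * A

print(shifted)
print(doubled)
```

Shapes that cannot be matched this way, such as (2, 3) and (2,), raise a ValueError.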

### Shape Manipulation

The shape of a numpy array is given by the number of elements along each axis, and it can be manipulated through various commands.

In [None]:
A = np.arange(8) # one dimensional array of shape (8,)

In [None]:
B = np.arange(2,10) # one dimensional array of shape (8,)

In [None]:
A.reshape(4,2) #returns A reshaped to (4,2); A itself is unchanged

In [None]:
np.concatenate((A,B),axis=0) #concatenating two one-dimensional numpy arrays

In [None]:
np.hstack((A,B)) #stack arrays in sequence horizontally (column wise); similar to concatenate

In [None]:
np.vstack((A,B)) #stack arrays in sequence vertically (row wise)

In [None]:
np.dstack((A,B)) #stack arrays in sequence depth wise (along third dimension)

In [None]:
C = A.reshape(2,4) #reshaping the array
print (C)
np.vsplit(C,2) # splitting the array row-wise into 2 subarrays

In [None]:
np.append(A,17) #adding a value (17) at the end of array A

In [None]:
np.append(17,A) #adding a value (17) at the beginning of array A

In [None]:
np.append(C,10) #appending an element to a (2,4) array. This returns a 1-dimensional array
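
The flattening seen in the last cell can be avoided by passing the axis argument to np.append. The sketch below uses an example row chosen only for illustration; the appended values must match the shape of the array along the other axes.

```python
import numpy as np

C = np.arange(8).reshape(2, 4)

# Without axis, the inputs are flattened first, so the result is 1-dimensional
flat = np.append(C, 10)

# With axis=0, a row of matching length is added and the result stays 2-dimensional
with_row = np.append(C, [[10, 11, 12, 13]], axis=0)

print(flat.shape, with_row.shape)  # (9,) (3, 4)
```

In both cases np.append returns a new array and leaves C untouched.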

### Indexing and Slicing

Indexing refers to accessing an element of the array by its position (index). Slicing is taking out a desired portion, or subarray, of the array. The basic slice syntax is $i:j:k$ where $i$ is the starting index, $j$ is the stopping index (not included in the result), and $k$ is the step ($k≠0$). Indices start at zero, so the last valid index along an axis of length $n$ is $n−1$.

In [None]:
A = np.array([(2,4,5,6),(7,10,23,13),(45,0,23,5)])

In [None]:
A[0] # returns the first row of the array

In [None]:
A[0,:] #returns the first row of the array

In [None]:
A[:,1] #returns the second column of the array

In [None]:
A[:,-1] #returns the last column of the array

In [None]:
A[0,0] #returns the first element of the array

In [None]:
A[0:2,1:3] # returns subarray ranging from first to second row and second to third column  
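
The cells above all use the default step of 1. As a brief sketch of the full $i:j:k$ syntax (the array v and the result names are introduced here purely for illustration):

```python
import numpy as np

v = np.arange(10)          # 0, 1, ..., 9

every_second = v[1:8:2]    # start at index 1, stop before index 8, step 2 -> 1, 3, 5, 7
reversed_v = v[::-1]       # a negative step walks the array backwards

A = np.array([(2, 4, 5, 6), (7, 10, 23, 13), (45, 0, 23, 5)])
corners = A[::2, ::3]      # rows 0 and 2, columns 0 and 3

print(every_second)
print(reversed_v)
print(corners)
```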

The number of dimensions of an array can be increased by inserting a new axis of length 1 with np.newaxis.

In [None]:
a = A[np.newaxis,:]
b = A[:,np.newaxis]
a.shape, b.shape

### Statistical Functions

In [None]:
np.average(A) #returns the average value

In [None]:
np.mean(A) #returns the mean value

In [None]:
np.median(A) #returns the median

In [None]:
np.var(A) # returns the variance

In [None]:
np.std(A) #returns the standard deviation

In [None]:
np.cov(A) #returns the covariance matrix (each row of A is treated as a variable)
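
The functions above reduce over the whole array by default. Most of them also accept an axis argument; the following sketch, using the same array A as in the slicing section, shows the effect and also checks how np.cov interprets its input. The result names are illustrative.

```python
import numpy as np

A = np.array([(2, 4, 5, 6), (7, 10, 23, 13), (45, 0, 23, 5)])

col_means = np.mean(A, axis=0)   # one mean per column, shape (4,)
row_means = np.mean(A, axis=1)   # one mean per row, shape (3,)

# np.cov treats each row as a variable and each column as an observation,
# so a (3, 4) input gives a (3, 3) covariance matrix
cov = np.cov(A)

print(col_means, row_means)
print(cov.shape)
```

The diagonal of cov holds the sample variance of each row, which matches np.var(A, axis=1, ddof=1).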

### Linear Algebra Functions

In [None]:
a = np.array([(2,3),(4,5)])
b = np.array([(7,1),(3,9)])

In [None]:
np.inner(a,b) #returns the inner product of two arrays

In [None]:
np.kron(a,b) #Kronecker product of two arrays

In [None]:
np.linalg.eig(a) #returns the eigenvalues and right eigenvectors of a square array

In [None]:
np.linalg.eigh(a) #eigenvalues and eigenvectors, assuming a Hermitian or symmetric matrix; a here is not symmetric, so only its lower triangle is used

In [None]:
np.linalg.eigvals(a) #returns the eigenvalues of a matrix

In [None]:
np.linalg.eigvalsh(a) #eigenvalues, assuming a Hermitian or symmetric matrix; a here is not symmetric, so only its lower triangle is used

In [None]:
np.linalg.det(a) #computes the determinant of the matrix

In [None]:
np.trace(a) #returns the trace of the matrix

In [None]:
np.linalg.inv(a) #returns the inverse of the matrix
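
A quick way to build confidence in these routines is to check the defining identities numerically. The sketch below does this for the inverse and the eigen-decomposition of the same matrix a; the names identity_check and residual are introduced only for this example.

```python
import numpy as np

a = np.array([(2, 3), (4, 5)])

# A matrix times its inverse should give the identity (up to round-off)
a_inv = np.linalg.inv(a)
identity_check = a @ a_inv

# Each eigenpair satisfies a @ v = lambda * v; the eigenvectors are the columns
eigvals, eigvecs = np.linalg.eig(a)
residual = a @ eigvecs - eigvecs * eigvals

print(np.allclose(identity_check, np.eye(2)))  # True
print(np.allclose(residual, 0))                # True
```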

### Other Functions

In [None]:
A = np.array([3,6,3,1,7,4,2,8])
np.copy(A) # returns a copy of the array

In [None]:
np.sort(A) # sorting the array in ascending order

In [None]:
np.sort(A)[::-1] # sorting the array in descending order

In [None]:
np.argsort(A) #returns the indices that would sort an array.

In [None]:
np.min(A) #returns the minimum element

In [None]:
np.max(A) #returns the maximum element

In [None]:
np.argmax(A) #returns the index of the maximum element

In [None]:
np.argwhere(A>3) #returns all the indices whose values are greater than 3 

In [None]:
np.asmatrix(A) #returns the array as a matrix (np.mat, the older alias, is not available in NumPy 2.0 and later)

In [None]:
np.sum(A) #returns the sum of all the elements

In [None]:
np.cumsum(A) #returns the cumulative (running) sum of the elements

In [None]:
np.prod(A) #returns the product of all the elements
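
Two of these functions are easier to understand next to their simpler counterparts. The following sketch relates np.argsort to np.sort and np.cumsum to np.sum, using the same array A; the result names are illustrative.

```python
import numpy as np

A = np.array([3, 6, 3, 1, 7, 4, 2, 8])

order = np.argsort(A)      # indices that sort A
sorted_A = A[order]        # same values as np.sort(A)

running = np.cumsum(A)     # running[i] is the sum of A[0] .. A[i]

print(sorted_A)
print(running)             # the last entry equals np.sum(A)
```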

# Matplotlib Pyplot - 2D Plotting

Matplotlib is a Python plotting library with excellent 2D and 3D graphics.

To use it, we first need to import its pyplot interface.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

## Simple Plots

In [None]:
x = np.linspace(0,3,100)
y = np.sin(x)
plt.xlabel('x')
plt.ylabel('sin x')
plt.plot(x,y)
plt.show()

The sine curve can be styled in various ways. By default, the color is blue and the curve is drawn as a solid line with no markers. The basic built-in colors in matplotlib are: b-blue, g-green, r-red, c-cyan, m-magenta, y-yellow, k-black, w-white. For more information, visit http://matplotlib.org/api/colors_api.html.
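
As a minimal sketch of how the color codes are used, the third argument of plt.plot accepts a short format string that combines a color letter with a line style or marker. The cosine curve and the variable names are added here only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 100)

# 'r--' = red dashed line; 'g:' = green dotted line
line_sin, = plt.plot(x, np.sin(x), 'r--', label='sin x')
line_cos, = plt.plot(x, np.cos(x), 'g:', label='cos x')
plt.xlabel('x')
plt.legend()
plt.show()
```

A marker code such as '*' or 'o' in place of the line style plots the individual points instead of a connected line.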

In [None]:
# Comparing the distance covered by Mark and Sam over an interval of time
time = np.arange(0,50,2)
dMark = np.array([0,20,40,59,76,93,109,125,139,152,162,172,181,189,197,205,213,220,227,233,239,243,247,251,253])
dSam = np.array([0,25,49,71,91,110,129,150,169,188,205,222,237,249,261,271,281,290,298,306,313,320,325,329,331])
plt.plot(time,dMark,'b*',label='Mark')
plt.plot(time,dSam,'r*',label='Sam')
plt.xlabel('Time in minutes')
plt.ylabel('Distance covered in metres')
plt.legend(loc='upper left')
plt.xlim(0,50)
plt.ylim(0,350)
plt.show()

## Scatter Plots

In [None]:
# Making a scatter plot from random distributions 
x = np.arange(20)
y = np.random.rand(20)
z = np.random.rand(20)
plt.scatter(x,y,s=50,facecolor='b',edgecolor='k',alpha=0.5,marker='d',label='Battlefield1')
plt.scatter(x,z,s=120,facecolor='r',edgecolor='k',alpha=0.3,marker='d',label='Battlefield2')
plt.legend(loc='upper left')
plt.ylim(-0.2,1.2)
plt.show()

In [None]:
N = 10
data = np.random.random((N,4)) # N points; columns are x, y, color value and size
labels = ['{0}'.format(i) for i in range(N)]
plt.subplots_adjust(bottom = 0.1)
plt.scatter(
    data[:, 0], data[:, 1], marker = 'o', c = data[:, 2], s = data[:, 3]*1500,
    cmap = plt.get_cmap('cool'))

for label, x, y in zip(labels, data[:, 0], data[:, 1]):
    plt.annotate(label, xy = (x, y), xytext = (-10, 10),
        textcoords = 'offset points', ha = 'right', va = 'bottom',
        bbox=dict(boxstyle='round,pad=0.3',fc='red', alpha=0.4),arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
    
plt.show()

## Histograms

In [None]:
from numpy.random import normal,poisson,uniform
f, ax = plt.subplots(3)
gaussian_numbers = normal(size=1000)
poisson_numbers = poisson(4,1000)
uniform_numbers = uniform(low=-2, high=2,size=1000)
ax[0].hist(gaussian_numbers,bins=50,facecolor='b',alpha=0.7)
ax[1].hist(poisson_numbers,bins=50,facecolor='g',alpha=0.7)
ax[2].hist(uniform_numbers,bins=50,facecolor='r',alpha=0.7)
ax[0].set_title("Gaussian Histogram")
ax[1].set_title("Poisson Histogram")
ax[2].set_title("Uniform Histogram")
ax[0].set_ylabel("Frequency")
ax[1].set_ylabel("Frequency")
ax[2].set_ylabel("Frequency")
plt.tight_layout()
plt.show()

When we increase the value of lambda in the Poisson distribution, the histogram approaches the shape of the Gaussian. Give it a try!
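
One way to put a number on this, offered here as a rough sketch rather than a formal test, is the skewness: it is 0 for a symmetric distribution such as the Gaussian, and for a Poisson distribution it equals $1/\sqrt{\lambda}$. The helper sample_skewness is a hypothetical function written for this example, and the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

def sample_skewness(values):
    """Third standardised moment; 0 for a perfectly symmetric distribution."""
    centred = values - values.mean()
    return np.mean(centred**3) / values.std()**3

skew_small = sample_skewness(rng.poisson(4, 100_000))    # theory: 1/sqrt(4) = 0.5
skew_large = sample_skewness(rng.poisson(100, 100_000))  # theory: 1/sqrt(100) = 0.1

print(skew_small, skew_large)
```

The skewness shrinks as lambda grows, which is the numerical counterpart of the histogram looking more and more Gaussian.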

Now let us overlay the Gaussian and uniform histograms, since their values lie in a similar range.

In [None]:
# density=True normalises each histogram to unit area
plt.hist(gaussian_numbers,bins=20,facecolor='b',alpha=0.7,histtype='stepfilled', density=True,label='Gaussian')
plt.hist(uniform_numbers,bins=20,facecolor='r',alpha=0.5,histtype='stepfilled', density=True,label='Uniform')
plt.title("Gaussian/Uniform Histogram")
plt.xlabel("Value")
plt.ylabel("Probability density")
plt.legend()
plt.show()

## Bar Charts

In [None]:
# Showing the average percentage pass rate for the past 5 years at Hudson Park Primary School, by gender
MScores = [72.2,78.9,75.3,80,81.4]
FScores = [77.2,80.9,78.1,84,86.2]
MStd = [3,5,4,3,5]
FStd = [5,4,3,6,4]
ind = np.arange(5)
year = ['2009','2010','2011','2012','2013','2014']
plt.bar(ind,MScores,width=0.3,color='y',yerr=MStd,edgecolor='r',linewidth=2,label='Male')
plt.bar(ind+0.3,FScores,width=0.3,color='c',yerr=FStd,edgecolor='r',linewidth=2,label='Female')
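plt.xticks(ind,year[:len(ind)],rotation=45) # year holds six labels but there are only five bars; using the first five is an assumption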
plt.title("Average passing rate for past 5 years")
plt.ylabel(r"Percentage Success ($\%$)") # raw string so that \% is not read as a Python escape sequence
plt.ylim(0,120)
plt.legend()
plt.show()

# SciPy Module

SciPy builds on the Numpy package. It is a collection of mathematical algorithms. Here we will have a look at some of the more advanced scipy functions that are not available in the Numpy package. As with numpy, we need to import scipy.

In [None]:
import scipy

## Numerical Integration and ODEs

Suppose we want to compute the integral
\begin{equation}
I = \int_0^1 2e^{-2\pi i t}~dt.
\end{equation}
First, we need to import the function quad from the scipy.integrate sub-module.

In [None]:
from scipy.integrate import quad
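f = lambda t: 2*np.exp(-2*1j*np.pi*t)
# by default quad expects a real-valued integrand, so the real and imaginary parts are integrated separately
quad(lambda t: f(t).real, 0, 1), quad(lambda t: f(t).imag, 0, 1)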

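quad returns two values: the estimate of the integral and an estimate of the absolute error of the computation. The exact value of this integral is 0, so a result such as 8.326672684688674e-17 is zero up to floating-point round-off.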

Consider the following ordinary differential equation (ODE):
\begin{equation}
\frac{dy}{dt} = y.
\end{equation}
The initial condition is $y(0)=1$, and the ODE is solved at 10 evenly spaced points between 0 and 1.

In [None]:
from scipy.integrate import odeint
odeint(lambda y, t: y, 1, np.linspace(0,1,10))
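
Since this ODE has the known solution $y(t)=e^{t}$, the numerical result can be checked directly. The following is a small sketch of that comparison; the names numeric, exact and max_error are introduced only for this example.

```python
import numpy as np
from scipy.integrate import odeint

t = np.linspace(0, 1, 10)

# odeint expects the right-hand side as f(y, t); here dy/dt = y with y(0) = 1
numeric = odeint(lambda y, t: y, 1, t)   # shape (10, 1): one column per unknown

exact = np.exp(t)
max_error = np.max(np.abs(numeric[:, 0] - exact))
print(max_error)
```

The error should be tiny compared with the solution itself, reflecting the default tolerances of odeint.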

## Interpolation

There are in-built functions in scipy that allow us to perform univariate and multivariate interpolation easily.

In [None]:
from scipy import interpolate
x = np.arange(4, 20)
y = np.exp(-x/3.0)
f = interpolate.interp1d(x, y) # interpolating a 1D function

xnew = np.linspace(4,19,1000)
plt.plot(x,y,'ro',label='data point')
plt.plot(xnew,f(xnew),'g--',label='interpolated') # f(xnew) gives the interpolated points
plt.legend()
plt.show()
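
interp1d interpolates linearly by default; its kind argument selects other schemes. As a hedged sketch, the comparison below measures how far linear and cubic interpolation are from the true function on the same data. The error variables are illustrative names, and the comparison is only meaningful because the underlying function is known here.

```python
import numpy as np
from scipy import interpolate

x = np.arange(4, 20)
y = np.exp(-x / 3.0)
xnew = np.linspace(4, 19, 1000)
exact = np.exp(-xnew / 3.0)

f_linear = interpolate.interp1d(x, y)                 # default: kind='linear'
f_cubic = interpolate.interp1d(x, y, kind='cubic')    # cubic spline through the points

err_linear = np.max(np.abs(f_linear(xnew) - exact))
err_cubic = np.max(np.abs(f_cubic(xnew) - exact))
print(err_linear, err_cubic)
```

For a smooth function like this one, the cubic version is expected to give the smaller error.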


Next, we are going to fit a univariate spline. A spline is a piecewise polynomial whose pieces join smoothly at the nodes (knots). Note that UnivariateSpline with a positive smoothing factor s fits a smoothing spline, which need not pass exactly through every data point.

In [None]:
from scipy.interpolate import UnivariateSpline
x = np.linspace(-4, 4, 100)
y = np.sin(x**2) + np.random.randn(100)/10
s = UnivariateSpline(x, y, s=1)
xnew = np.linspace(-4, 4, 1000)
plt.plot(x, y, '.',label='data points')
plt.plot(xnew, s(xnew),'g',label='spline interpolated')
plt.ylim(-1.5,2.0)
plt.legend()
plt.show()    

# Fast Fourier Transform (FFT)

A fast Fourier transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. The fft function in numpy computes the 1-dimensional fourier transform of n discrete points.

Let us compute the Fourier transform of $f(t)=\cos(2t)$, defined as
\begin{equation}
F(\nu) = \int_{-\infty}^{\infty} f(t)e^{-2 \pi i \nu t}~dt.
\end{equation}

In [None]:
t = np.arange(100)
sp = np.fft.fft(np.cos(2*t))
freq = np.fft.fftfreq(t.shape[0])
plt.plot(freq, sp.real,label='real') # real part against the frequency bins
plt.plot(freq,sp.imag,label='imaginary')
plt.legend()
plt.show()
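
The plot can be cross-checked against what we expect analytically. $\cos(2t)$ has an angular frequency of 2 radians per sample, i.e. $2/(2\pi)=1/\pi\approx0.318$ cycles per sample, so the spectrum should peak near that value. The sketch below locates the peak; half and peak_freq are names introduced for this example.

```python
import numpy as np

t = np.arange(100)
signal = np.cos(2 * t)

sp = np.fft.fft(signal)
freq = np.fft.fftfreq(t.shape[0])   # in cycles per sample

# Look only at the non-negative frequencies and find the largest magnitude
half = t.shape[0] // 2
peak_freq = freq[np.argmax(np.abs(sp[:half]))]

print(peak_freq, 1 / np.pi)
```

The two numbers agree only to within the bin spacing of 1/100, because the true frequency does not fall exactly on a frequency bin.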

In [None]:
# 2-dimensional FFT
a = np.mgrid[0:4, 0:4][0]
np.fft.fft2(a)

# Curve Fitting

Curve fitting is very common when dealing with astronomical data. The most popular fitting algorithm is least squares, but we are going to get acquainted with other ways as well.

In [None]:
x = np.arange(10)
y = np.array([0.2,0.3,0.5,0.8,0.95,0.9,0.7,0.5,0.3,0.1])
# constructing polynomial
z = np.polyfit(x,y,3)
p = np.poly1d(z)

# generating new x's and y's
xnew = np.linspace(x[0],x[-1],30)
ynew = p(xnew)

plt.plot(x,y,'go')
plt.plot(xnew,ynew,'r-')
plt.show()

In [None]:
# using least squares to fit a straight line
x = np.arange(10)
y = np.array([1.2,1.0,0.95,0.9,0.8,0.7,0.5,0.3,0.1,0.09])

A = np.vstack([x, np.ones(len(x))]).T
m, c = np.linalg.lstsq(A, y, rcond=None)[0] # rcond=None selects the current default cutoff and avoids a warning on older numpy
plt.plot(x,y,'bo')
plt.plot(x,m*x+c,'r-')
plt.show()

Another way of fitting data is the curve_fit function found in the scipy.optimize sub-module.

In [None]:
from scipy.optimize import curve_fit

# Defining the function to be fit to the data
def fitFunc(x,m,c):
    return m*x + c


params, cov = curve_fit(fitFunc, x, y) # returns the fitted parameters and their covariance matrix

plt.plot(x,y,'go')
plt.plot(x,fitFunc(x,*params),'r-')
plt.show()
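
The covariance matrix returned by curve_fit is often used to estimate parameter uncertainties. The following sketch, under the assumption that the square roots of the diagonal are read as one-sigma errors, also cross-checks the fit against np.polyfit; the function name line is a stand-in for fitFunc so that the block runs on its own.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(10)
y = np.array([1.2, 1.0, 0.95, 0.9, 0.8, 0.7, 0.5, 0.3, 0.1, 0.09])

def line(x, m, c):
    """Straight-line model, same form as fitFunc above."""
    return m * x + c

params, cov = curve_fit(line, x, y)
errors = np.sqrt(np.diag(cov))        # one-sigma uncertainties on m and c

# For a linear model, np.polyfit of degree 1 should give the same m and c
m_poly, c_poly = np.polyfit(x, y, 1)

print(params, errors)
print(m_poly, c_poly)
```

For a straight line all three approaches shown in this section solve the same least-squares problem, so their slopes and intercepts should agree up to numerical precision.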

This cheat sheet walks us through the basics of the numpy, scipy and matplotlib libraries. For more details, have a look at
http://docs.scipy.org/doc .

### COMPLETED!