Python for Machine Learning
========================
In this tutorial we explore a few Python packages that are useful for machine learning: NumPy, SciPy, Matplotlib, Pandas and scikit-learn.

NumPy
-----
NumPy provides fast creation, storage and manipulation of N-dimensional arrays. The NumPy user guide is available at: http://docs.scipy.org/doc/numpy/user/index.html

### Array Creation ###

In [None]:
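import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # 3x3 array from nested lists
b = np.random.rand(5, 1)                          # 5x1 array of uniform samples in [0, 1)
c = np.zeros(shape=(5, 2))                        # 5x2 array of zeros
a.T                                               # transpose of a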
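
A few other constructors come up constantly. The sketch below uses only standard NumPy functions and checks the shape and dtype of each result. It swaps `np.random.rand` for the generator API `np.random.default_rng` (NumPy 1.17+) only so that the random values are reproducible; that substitution is our choice, not part of the original cell.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(a.shape, a.dtype)        # (3, 3) plus a platform-dependent integer dtype

# The same matrix built from a range of integers, reshaped to 3x3
r = np.arange(1, 10).reshape(3, 3)
print(np.array_equal(a, r))    # True

z = np.zeros((5, 2))           # float64 zeros, shape (5, 2)
e = np.eye(3)                  # 3x3 identity matrix
lin = np.linspace(0, 1, 5)     # 5 evenly spaced points: 0, 0.25, 0.5, 0.75, 1

# Seeded generator; rng.random((5, 1)) plays the role of np.random.rand(5, 1)
rng = np.random.default_rng(seed=0)
b = rng.random((5, 1))         # uniform samples in [0, 1), shape (5, 1)

# Row 0 of the transpose is column 0 of a
print(a.T[0])                  # [1 4 7]
```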

In [None]:
a[0:2, 1:3]  # rows 0-1 and columns 1-2 of a (the stop index is excluded)
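
One detail of slicing that often surprises newcomers: basic slicing returns a *view* that shares memory with the original array, so writing into the slice changes the original. The sketch below uses a separate array `m` (a name chosen here so the notebook's `a` is not modified) and `.copy()` to get an independent sub-array.

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)   # same values as a above: [[1 2 3] [4 5 6] [7 8 9]]

sub = m[0:2, 1:3]            # rows 0-1 and columns 1-2
print(sub)                   # [[2 3]
                             #  [5 6]]

sub_copy = m[0:2, 1:3].copy()  # independent copy, unaffected by later writes

sub[0, 0] = 99               # basic slicing returns a view, not a copy...
print(m[0, 1])               # ...so this write is visible in m: 99
print(sub_copy[0, 0])        # the copy still holds the original value: 2
```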

### Sums and Products ###

In [None]:
a = np.ones((3, 3))
b = np.random.rand(3, 3)
c = a + b  # element-wise sum of two 3x3 arrays
c

In [None]:
a = np.ones((3, 2))
b = np.array([1, 2])
print('shape of a', a.shape)
print('shape of b', b.shape)

p = a.dot(b)  # matrix-vector product: (3, 2) times (2,) gives shape (3,)
print('shape of p', p.shape)
print(p)
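
The `*` operator and the matrix product are easy to confuse. The sketch below contrasts them on the same `a` and `b`: `*` is element-wise and relies on broadcasting, while `@` (equivalent to `a.dot(b)` here) is the matrix-vector product. The last lines show that broadcasting fails when the trailing dimensions do not match.

```python
import numpy as np

a = np.ones((3, 2))
b = np.array([1, 2])

# `*` is element-wise; b (shape (2,)) is broadcast across each row of a
elementwise = a * b      # shape (3, 2): every row is [1., 2.]

# `@` is the matrix-vector product: each row of a dotted with b
matvec = a @ b           # shape (3,): [3., 3., 3.]

# Broadcasting needs compatible trailing dimensions; (3, 2) and (3,) are not
try:
    a * np.array([1, 2, 3])
except ValueError as err:
    print('broadcast error:', err)
```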

Matplotlib
----------
Visualizing data plays a key role in machine learning. In Python, plotting is usually done with the Matplotlib package.

In [None]:
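# In Jupyter, uncomment the next line to display figures inside the notebook
# %matplotlib inline
from matplotlib import pyplot as plt
x = np.linspace(0, 20, 200)
y1 = np.exp(-0.1*x) * np.sin(x)  # slowly decaying sine
y2 = np.exp(-0.3*x) * np.sin(x)  # faster decaying sine
plt.plot(x, y1)
plt.plot(x, y2)
plt.title('Just enough!')
plt.show()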

In [None]:
plt.plot(x, y1, label='original', linewidth=4, linestyle='-')
plt.plot(x, y2, label='predicted', linewidth=4, linestyle='--')
plt.xlabel('time in seconds', fontsize=12)
plt.ylabel('some important quantity', fontsize=12)
plt.title('This is better!', fontsize=20)
plt.legend()
plt.show()
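
The cells above use the `pyplot` state-machine interface. Matplotlib also offers an object-oriented interface in which you hold explicit `Figure` and `Axes` objects; this is handy once a figure has several panels. The sketch below redraws the same plot that way (same data and labels as above) and reads the legend labels back from the axes.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 20, 200)
y1 = np.exp(-0.1 * x) * np.sin(x)
y2 = np.exp(-0.3 * x) * np.sin(x)

fig, ax = plt.subplots()           # one Figure containing one Axes
ax.plot(x, y1, label='original', linewidth=4, linestyle='-')
ax.plot(x, y2, label='predicted', linewidth=4, linestyle='--')
ax.set_xlabel('time in seconds', fontsize=12)
ax.set_ylabel('some important quantity', fontsize=12)
ax.set_title('This is better!', fontsize=20)
ax.legend()

# The Axes object keeps track of what was drawn on it
handles, labels = ax.get_legend_handles_labels()
print(labels)                      # ['original', 'predicted']
plt.show()
```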

SciPy
-----
* A collection of mathematical algorithms and convenience functions built on NumPy
* Gives Python capabilities similar to those of MATLAB
* Organized into submodules for different domains, such as integration, statistics and signal processing
* We will see examples from the `linalg` and `optimize` submodules
* For details: http://docs.scipy.org/doc/scipy/reference/tutorial/index.html


### `linalg`: Linear Algebra submodule ###
The linear algebra submodule provides routines for matrix computations. For example, to find the inverse of the matrix $A$:

$$
A = \left[\begin{array}{ccc} 
5 & 3 & 5\\
2 & 2 & 0\\
1 & 3 & 1
\end{array}\right]
$$ 

In [None]:
from scipy import linalg as la
A = np.array([[5, 3, 5],
              [2, 2, 0],
              [1, 3, 1]])
iA = la.inv(A)
print(iA)
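
A quick sanity check for any computed inverse is that $A A^{-1}$ and $A^{-1} A$ give the identity matrix, up to floating-point round-off (hence `np.allclose` rather than exact equality). The inverse exists because $\det A = 5(2\cdot1-0\cdot3) - 3(2\cdot1-0\cdot1) + 5(2\cdot3-2\cdot1) = 10 - 6 + 20 = 24 \neq 0$, which `la.det` confirms.

```python
import numpy as np
from scipy import linalg as la

A = np.array([[5, 3, 5],
              [2, 2, 0],
              [1, 3, 1]])
iA = la.inv(A)

# Compare with a tolerance: the products equal I only up to round-off
print(np.allclose(A @ iA, np.eye(3)))   # True
print(np.allclose(iA @ A, np.eye(3)))   # True

# Non-zero determinant, so A is invertible (about 24 for this matrix)
print(la.det(A))
```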

Solving a system of linear equations $Ax=b$:
$$
\left[\begin{array}{ccc} 
5 & 3 & 5\\
2 & 2 & 0\\
1 & 3 & 1
\end{array}\right]
\left[\begin{array}{c} 
x_1 \\
x_2 \\
x_3 
\end{array}\right] 
=\left[\begin{array}{c} 
2 \\
5 \\
1
\end{array}\right]
$$ 


In [None]:
A = np.array([[5, 3, 5],
              [2, 2, 0],
              [1, 3, 1]])
b = np.array([2, 5, 1])
x = la.solve(A, b)
print('Solution:', x)
# x = la.inv(A).dot(b)  # same result here, but forms the inverse explicitly
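
`la.solve` is generally preferred over multiplying by the inverse: for a general matrix it solves the system from a factorization of $A$ (LU with partial pivoting) instead of forming $A^{-1}$ first, which is cheaper and usually more accurate. The sketch below checks both routes. By substitution, the exact solution of this system is $x = (2.25,\ 0.25,\ -2)$: the second equation gives $x_1 = 2.5 - x_2$, the third gives $x_3 = -1.5 - 2x_2$, and the first then reduces to $5 - 12x_2 = 2$.

```python
import numpy as np
from scipy import linalg as la

A = np.array([[5, 3, 5],
              [2, 2, 0],
              [1, 3, 1]])
b = np.array([2, 5, 1])

x_solve = la.solve(A, b)    # solves from a factorization of A
x_inv = la.inv(A) @ b       # forms the inverse explicitly, then multiplies

# Both satisfy A x = b up to round-off; x is about [2.25, 0.25, -2.0]
print(np.allclose(A @ x_solve, b), np.allclose(x_solve, x_inv))
```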


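Matrix decomposition: the LU decomposition factors a matrix into a lower triangular matrix $L$ and an upper triangular matrix $U$: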
$$
\left[\begin{array}{ccc} 
a_{11} & a_{12} & a_{13}\\
a_{21} & a_{22} & a_{23}\\
a_{31} & a_{32} & a_{33}
\end{array}\right] =
\left[\begin{array}{ccc} 
l_{11} & 0 & 0\\
l_{21} & l_{22} & 0\\
l_{31} & l_{32} & l_{33}
\end{array}\right]
\left[\begin{array}{ccc} 
u_{11} & u_{12} & u_{13}\\
0 & u_{22} & u_{23}\\
0 & 0 & u_{33}
\end{array}\right]
$$ 

In [None]:
# la.lu pivots rows for numerical stability and returns P, L, U with A = P @ L @ U
p, l, u = la.lu(A)
print('P = \n', p)
print('L = \n', l)
print('U = \n', u)
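
The factors can be checked directly: because SciPy pivots rows, the identity that holds is $A = PLU$, with $L$ unit lower triangular and $U$ upper triangular. A practical payoff of the decomposition is that one factorization can be reused for many right-hand sides; `la.lu_factor` and `la.lu_solve` do exactly that. The second right-hand side below is an arbitrary example.

```python
import numpy as np
from scipy import linalg as la

A = np.array([[5, 3, 5],
              [2, 2, 0],
              [1, 3, 1]], dtype=float)

p, l, u = la.lu(A)
print(np.allclose(p @ l @ u, A))                               # True: A = P L U
print(np.allclose(l, np.tril(l)), np.allclose(np.diag(l), 1))  # L is unit lower triangular
print(np.allclose(u, np.triu(u)))                              # U is upper triangular

# Factor once, then solve cheaply for several right-hand sides
lu_piv = la.lu_factor(A)
rhs_list = [np.array([2.0, 5.0, 1.0]), np.array([1.0, 0.0, 0.0])]
solutions = [la.lu_solve(lu_piv, rhs) for rhs in rhs_list]
```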

### `optimize`: Optimization submodule ###
`optimize` implements several optimization algorithms. Optimization means finding the minimum or maximum value of a function. In this demonstration we find the minimum of the Levy function:
$$f(x,y)=\sin^{2}(3\pi x)+(x-1)^{2}\left(1+\sin^{2}(3\pi y)\right)+(y-1)^{2}\left(1+\sin^{2}(2\pi y)\right)$$

In [None]:
def obj(x):
    """Levy function evaluated at the point x = [x, y]."""
    f = (np.sin(3*np.pi*x[0]))**2 \
        + (x[0]-1)**2 * (1 + (np.sin(3*np.pi*x[1]))**2) \
        + (x[1]-1)**2 * (1 + (np.sin(2*np.pi*x[1]))**2)
    return f

### Visualizing the objective function ###

In [None]:
# Same function with x and y as separate arguments; used only for the plot
def obj1(x, y):
    f = (np.sin(3*np.pi*x))**2 \
        + (x-1)**2 * (1 + (np.sin(3*np.pi*y))**2) \
        + (y-1)**2 * (1 + (np.sin(2*np.pi*y))**2)
    return f

from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
from matplotlib import cm
fig = plt.figure()
# Recent Matplotlib releases no longer accept fig.gca(projection='3d')
ax = fig.add_subplot(projection='3d')
X = np.arange(-10, 10, 0.3)
Y = np.arange(-10, 10, 0.3)
X, Y = np.meshgrid(X, Y)
Z = obj1(X, Y)
surf = ax.plot_surface(X, Y, Z, cmap=cm.jet, rstride=1, cstride=1)
plt.show()

### Minimizing the objective function ###

In [None]:
# minimizing function
from scipy import optimize as opt
res = opt.minimize(obj, x0=[0.85, 1.2], method='Nelder-Mead',
                   options={'maxfev': 1000000, 'maxiter': 1000000})
print('Minimum value: ', res.fun)
print('At x: ', res.x)
print('Analytical global minimum is at x = [1, 1] with value 0')
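
Nelder-Mead is a local method, and the surface plot shows that the Levy function has many local minima, so the result depends on the starting point `x0`. A simple remedy is a multistart: run the same local search from many starting points and keep the best result. The sketch below assumes 30 uniformly random starts in $[-10, 10]^2$ (an arbitrary choice) plus the original `x0`. It only guarantees a result at least as good as the single run, not the global minimum.

```python
import numpy as np
from scipy import optimize as opt

def obj(x):
    """Levy function; its global minimum is f(1, 1) = 0."""
    return (np.sin(3*np.pi*x[0]))**2 \
        + (x[0]-1)**2 * (1 + (np.sin(3*np.pi*x[1]))**2) \
        + (x[1]-1)**2 * (1 + (np.sin(2*np.pi*x[1]))**2)

print(obj([1.0, 1.0]))   # ~0 (sin(3*pi) is zero only up to round-off)

# Multistart: the original x0 first, then 30 random starting points
rng = np.random.default_rng(seed=0)
starts = np.vstack([[0.85, 1.2], rng.uniform(-10, 10, size=(30, 2))])
results = [opt.minimize(obj, x0=x0, method='Nelder-Mead') for x0 in starts]

best = min(results, key=lambda r: r.fun)
print('Best value found:', best.fun, 'at', best.x)
```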

Pandas
------

`pandas` provides easy-to-use data structures and data analysis tools for Python.
A good reference for Pandas is the cookbook available at: http://pandas.pydata.org/pandas-docs/stable/cookbook.html

The design matrix contains features as columns and examples as rows. In `pandas`, a design matrix is stored as a data frame (`DataFrame`). Each column of a data frame is a `Series`, and a single row retrieved with `df.loc[...]` is also returned as a `Series`.

$$
D=\begin{pmatrix}
  &length & width & \cdots & type  \\
  S_1&80 & 25 & \cdots & 0 \\
  S_2&130 & 65 & \cdots & 1 \\
  \vdots&\vdots  & \vdots  & \ddots & \vdots  \\
  S_m&110 & 29 & \cdots & 0 
 \end{pmatrix}
$$
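
The matrix $D$ maps directly onto a `DataFrame`. The sketch below uses only the three rows and three columns whose values appear in $D$ (the elided columns are left out) and shows that both a column and a row come back as a `Series`. The names `X_D` and `y_D` are our own, chosen for the usual split into a feature matrix and a label vector.

```python
import pandas as pd

# Rows S_1, S_2 and S_m of D; the elided columns are omitted
D = pd.DataFrame({'length': [80, 130, 110],
                  'width':  [25, 65, 29],
                  'type':   [0, 1, 0]},
                 index=['S_1', 'S_2', 'S_m'])

feature = D['length']     # one column (a feature) is a Series
example = D.loc['S_2']    # one row (an example) is also returned as a Series
print(type(feature).__name__, type(example).__name__)   # Series Series

# Feature matrix and label vector as NumPy arrays
X_D = D[['length', 'width']].to_numpy()
y_D = D['type'].to_numpy()
```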

### Data Frame Creation ###

In [None]:
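import pandas as pd
from matplotlib import pyplot as plt
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# iris.data has no header line, so pass the column names explicitly;
# otherwise read_csv would use the first sample as the header
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'flower_type']
df = pd.read_csv(url, header=None, names=columns)
# df = pd.read_csv('iris.data', header=None, names=columns)  # from a local copy
# Encode the three species names as integer categories 0, 1, 2
df['flower_type'] = df['flower_type'].astype('category')
df['flower_type'] = df['flower_type'].cat.rename_categories([0, 1, 2])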
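
The raw `iris.data` file has no header line, and `read_csv` assumes one by default, so a plain `pd.read_csv(url)` silently turns the first sample into column names and loses it. The sketch below demonstrates this without a network connection, using three rows written in the same format as `iris.data` and held in memory with `io.StringIO`. It also shows that category codes follow the alphabetical order of the species names.

```python
import io
import pandas as pd

# Three rows in the iris.data format (no header line)
raw = ("5.1,3.5,1.4,0.2,Iris-setosa\n"
       "7.0,3.2,4.7,1.4,Iris-versicolor\n"
       "6.3,3.3,6.0,2.5,Iris-virginica\n")
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'flower_type']

wrong = pd.read_csv(io.StringIO(raw))                             # first sample used as header
right = pd.read_csv(io.StringIO(raw), header=None, names=columns)
print(len(wrong), len(right))   # 2 3

# Codes follow the sorted category names: setosa=0, versicolor=1, virginica=2
codes = right['flower_type'].astype('category').cat.codes
```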

### Basic Analysis ###

In [None]:
df.head()

In [None]:
df.dtypes

In [None]:
df.describe()

In [None]:
df['flower_type'].describe()

### Data Frame Visualization ###

In [None]:
df.hist()
plt.show()

In [None]:
pd.plotting.scatter_matrix(df, diagonal='kde')
plt.show()

### Operations on the Data Frame ###

In [None]:
df = df.sort_values(by='sepal_width')
df.head()

In [None]:
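# Mean normalization of the four numeric features: (f - mean) / (max - min).
# The flower_type label column is kept unchanged.
features = df.columns[0:4]
df[features] = df[features].apply(lambda f: (f - f.mean()) / (f.max() - f.min()))

df.hist()
plt.show()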
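
Mean normalization maps each feature to $(x - \bar{x}) / (\max x - \min x)$, so every normalized column has mean 0 and a range of exactly 1. For the hypothetical column $[1, 2, 3, 4]$ the mean is 2.5 and the range is 3, giving $[-0.5, -1/6, 1/6, 0.5]$. When the data will later be split into training and test sets, the statistics should be computed on the training rows only and then applied unchanged to the test rows, so that no information from the test set leaks into preprocessing. The helper names `fit_mean_normalizer` and `apply_mean_normalizer` are our own, not a pandas API.

```python
import pandas as pd

# Hypothetical toy data: two numeric features and a label column
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                    'b': [10.0, 10.0, 20.0, 40.0],
                    'label': [0, 0, 1, 1]})
toy_features = ['a', 'b']

def fit_mean_normalizer(frame):
    """Column means and ranges (max - min) used by mean normalization."""
    return frame.mean(), frame.max() - frame.min()

def apply_mean_normalizer(frame, center, scale):
    """(x - center) / scale, aligned column by column."""
    return (frame - center) / scale

normalized = toy.copy()
center, scale = fit_mean_normalizer(toy[toy_features])
normalized[toy_features] = apply_mean_normalizer(toy[toy_features], center, scale)
print(normalized['a'].tolist())   # approximately [-0.5, -0.167, 0.167, 0.5]

# With a split: fit on the training rows, reuse those statistics on the test rows
train_part, test_part = toy[toy_features].iloc[:3], toy[toy_features].iloc[3:]
train_center, train_scale = fit_mean_normalizer(train_part)
test_scaled = apply_mean_normalizer(test_part, train_center, train_scale)
```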

In [None]:
# Shuffle the rows: sampling all of them (frac=1.0) without replacement returns them in random order
df = df.sample(frac=1.0)
df.head()
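
`sample(frac=1.0)` returns every row exactly once, only reordered, and each row keeps its original index label. Passing `random_state` makes the shuffle reproducible. The sketch below checks both properties on a hypothetical 10-row frame named `toy` (so the notebook's `df` is left alone) and resets the labels afterwards.

```python
import pandas as pd

toy = pd.DataFrame({'x': range(10)})   # hypothetical 10-row frame

shuffled = toy.sample(frac=1.0, random_state=0)   # every row once, in random order
print(sorted(shuffled.index) == list(toy.index))  # True: same rows, new order

again = toy.sample(frac=1.0, random_state=0)      # same seed gives the same order
print(shuffled.index.equals(again.index))         # True

# Rows keep their original labels; drop them if only the new order matters
shuffled = shuffled.reset_index(drop=True)
```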

In [None]:
# Split the data set into a training set (80% of rows) and a test set (the remaining 20%)
train=df.sample(frac=0.8,random_state=123)
test=df.drop(train.index)
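
The split works because `df.drop(train.index)` removes, by label, every row drawn into `train`, so the two sets partition the data. This relies on the index labels being unique. If they were not, `drop` would remove every row sharing a label. The sketch below checks the partition on a hypothetical stand-in frame with the same number of rows as the iris data (150).

```python
import pandas as pd

toy = pd.DataFrame({'x': range(150)})   # stand-in with 150 uniquely labeled rows

toy_train = toy.sample(frac=0.8, random_state=123)
toy_test = toy.drop(toy_train.index)    # every row not drawn into the training set

print(len(toy_train), len(toy_test))                        # 120 30
print(toy_train.index.intersection(toy_test.index).empty)   # True: no row in both sets
```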

### Read/Write ###

In [None]:
df.to_csv('iris_normalized.csv', index=False)  # index=False: do not write the row index as a column
new_df = pd.read_csv('iris_normalized.csv')
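
By default `to_csv` also writes the row index, which `read_csv` then reads back as an extra column named `Unnamed: 0`. The sketch below shows the problem and two fixes: write with `index=False`, or read with `index_col=0`. It uses a small hypothetical frame and a temporary directory so that no existing file is overwritten. The float values are chosen to be exactly representable, which makes the round-trip comparison exact.

```python
import os
import tempfile
import pandas as pd

toy = pd.DataFrame({'sepal_length': [0.5, -0.25, 0.125], 'flower_type': [0, 1, 2]})
folder = tempfile.mkdtemp()   # scratch directory

# Default: the index is written and comes back as an extra column
with_index = os.path.join(folder, 'with_index.csv')
toy.to_csv(with_index)
print(pd.read_csv(with_index).columns.tolist())   # ['Unnamed: 0', 'sepal_length', 'flower_type']

# Fix 1: do not write the index at all
no_index = os.path.join(folder, 'no_index.csv')
toy.to_csv(no_index, index=False)
round_trip = pd.read_csv(no_index)

# Fix 2: tell read_csv that the first column holds the index
restored = pd.read_csv(with_index, index_col=0)
```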

scikit-learn
-------------

Scikit-learn builds on NumPy and SciPy and implements many classification, regression and clustering algorithms. For details: http://scikit-learn.org/stable/tutorial/basic/tutorial.html

### Import the dataset ###

In [None]:
from sklearn import svm
from sklearn import datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target


### Train the classifier ###

In [None]:
clf = svm.SVC()
clf.fit(X, y) 

### Make predictions ###

In [None]:
# Predict the class of every 25th sample (these samples were also used for training)
clf.predict(X[::25])

In [None]:
# True classes of the same samples
y[::25]
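
The predictions above are made on samples the classifier has already seen during training, so they say little about how it performs on new data. A standard check is to hold out part of the data, train on the rest and measure accuracy on the held-out part. The sketch below does this with scikit-learn's `train_test_split` and `accuracy_score`. The 20% test size and `random_state=123` are arbitrary choices. `stratify=y` keeps the three classes in the same proportions in both parts.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = datasets.load_iris(return_X_y=True)

# Hold out 20% of the samples, with the same class proportions in each part
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y)

clf = svm.SVC()
clf.fit(X_train, y_train)            # the classifier never sees X_test
y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Held-out accuracy:', acc)
```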