# Mathematical Principles for Data Science

This notebook reviews essential mathematics used in data science and works through example applications of the concepts covered in the DSGT lecture presentation.

Areas of Focus: 
- Statistics and Probability 
- Linear Algebra
- Calculus     

<img src="https://media.licdn.com/dms/image/C4E0BAQGZ-7dAEaqmCg/company-logo_200_200/0?e=2159024400&v=beta&t=-9_7r8w3C8umvoQ8-67w1FcfzHdGQympxHup_2CPof8" style="height:100px">  

## Load Libraries and Datasets

In [None]:
#Import all libraries needed:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt

#Datasets and preprocessing:
from sklearn import datasets
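#Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
#so this import requires scikit-learn < 1.2.
from sklearn.datasets import load_boston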
from sklearn.preprocessing import StandardScaler

#Principal Component Analysis
from sklearn.decomposition import PCA

#Neural Network example:
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report,confusion_matrix

#Suppress warnings
import warnings
warnings.filterwarnings('ignore')

## Statistics and Probability

In [None]:
#Loading the Boston House Price dataset
boston = load_boston()

In [None]:
#Converting the data into a dataframe
bostonDF = pd.DataFrame(data = boston.data, columns=boston.feature_names)

In [None]:
#Creating a price column and adding to dataframe
bostonDF["Price"] = boston.target

In [None]:
#        - CRIM     per capita crime rate by town
#        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
#        - INDUS    proportion of non-retail business acres per town
#        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
#        - NOX      nitric oxides concentration (parts per 10 million)
#        - RM       average number of rooms per dwelling
#        - AGE      proportion of owner-occupied units built prior to 1940
#        - DIS      weighted distances to five Boston employment centres
#        - RAD      index of accessibility to radial highways
#        - TAX      full-value property-tax rate per $10,000
#        - PTRATIO  pupil-teacher ratio by town
#        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
#        - LSTAT    % lower status of the population
#        - MEDV     Median value of owner-occupied homes in $1000's (stored as the Price column here)
bostonDF.head()

Let's start by looking at the distribution of home prices.

In [None]:
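#distplot is deprecated in recent seaborn releases; histplot with a KDE overlay gives a similar plot
sb.histplot(bostonDF['Price'], kde=True, stat="density")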

Next, let's see which variables correlate with price.

In [None]:
#Recent seaborn releases require x and y to be passed as keyword arguments
sb.scatterplot(x=bostonDF['Price'], y=bostonDF['LSTAT'])

Create a scatterplot in this cell to show the relationship between Average Rooms and Home Price


#### Time to run some basic probability calculations. 

What is the probability that a region has more than 6 rooms per home on average?

In [None]:
totalHomes = len(bostonDF)
largeHomes = len(bostonDF[bostonDF["RM"] > 6.0])
print(largeHomes/totalHomes)

#### What about conditional probability? 

What's the probability that a home has more than 6 rooms given that its price is over $25K (Price is recorded in $1000s)?

In [None]:
expensiveHomes = bostonDF[bostonDF["Price"] > 25]
totalExpensiveHomes = len(expensiveHomes)
#Build the mask from expensiveHomes itself so its index matches the rows being filtered
largeExpensiveHomes = len(expensiveHomes[expensiveHomes["RM"] > 6.0])
print(largeExpensiveHomes / totalExpensiveHomes)
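
The calculation above is an instance of the definition P(A | B) = P(A and B) / P(B). As a minimal sketch, the same idea can be written with boolean masks, where the mean of a mask is the fraction of rows that satisfy it. The arrays `rooms` and `price` below are made-up illustrative values, not rows from the Boston dataset.

```python
import numpy as np

# Hypothetical toy values (not taken from the Boston dataset)
rooms = np.array([5.5, 6.2, 6.8, 5.9, 7.1, 6.4, 5.2, 6.9])
price = np.array([18, 22, 30, 26, 35, 27, 15, 24])  # in $1000s

large = rooms > 6.0      # event A: more than 6 rooms
expensive = price > 25   # event B: price over $25K

# The mean of a boolean mask is the fraction of True entries
p_b = expensive.mean()
p_a_and_b = (large & expensive).mean()
p_a_given_b = p_a_and_b / p_b

# Equivalent shortcut: restrict to event B first, then take the mean of A
p_direct = large[expensive].mean()

print(p_a_given_b, p_direct)
```

Both routes give the same number; the second mirrors the filter-then-count approach used in the cell above.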

Calculate the probability that a home has more than 6 rooms given that the LSTAT is > 10%

Checking variables one pair at a time is tedious. A heatmap of the correlation matrix shows every pairwise relationship at once.

In [None]:
sb.heatmap(bostonDF.corr())
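
Each cell of the heatmap is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. The sketch below computes one such coefficient by hand and compares it with `np.corrcoef`. The two arrays are made-up illustrative values, not data from the notebook.

```python
import numpy as np

# Hypothetical values for illustration only
rooms = np.array([5.5, 6.2, 6.8, 5.9, 7.1, 6.4])
price = np.array([18.0, 22.0, 30.0, 20.0, 35.0, 27.0])

# Pearson r: covariance of the two variables divided by the product
# of their standard deviations (the 1/n factors cancel)
dx = rooms - rooms.mean()
dy = price - price.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# NumPy returns the full 2x2 correlation matrix; take the off-diagonal entry
r_numpy = np.corrcoef(rooms, price)[0, 1]

print(r_manual, r_numpy)
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.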

## Linear Algebra Basics with NumPy:
##### Some examples for vectors

In [None]:
# Create two arrays:
x = np.array([1,2,3])
y = np.array([4,5,6])

# Scalar operations:
print(10*x)

# Elementwise operations:
# Addition:
print(x+y)

# Subtraction:
print(x-y)

# Division:
print(x/y)

# Multiplication (Hadamard product):
print(x*y)

#Dot Product:
print(np.dot(y,x))
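
To connect the elementwise product and the dot product above: the dot product is the sum of the elementwise products. The sketch below checks this on the same two vectors and also uses the dot product to compute the cosine of the angle between them, which is one common use of it.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Sum of elementwise products: 1*4 + 2*5 + 3*6 = 32
dot_manual = (x * y).sum()
dot_numpy = np.dot(x, y)

# Cosine of the angle between x and y: dot product divided by the
# product of the vector lengths (Euclidean norms)
cos_theta = dot_numpy / (np.linalg.norm(x) * np.linalg.norm(y))

print(dot_manual, dot_numpy, cos_theta)
```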

##### Some examples for Matrices

In [None]:
# Matrices:
a = np.array([
    [1,2,3],
    [4,5,6]
])

b = np.array([[1,2,3]])

c = np.array([
    [6,5,4],
    [3,2,1]
])

d = np.array([
    [1,2],
    [3,4]
])
# Check sizes:
print(a.shape)
print(b.shape)
print(c.shape)
print(d.shape)

# Scalar operations:
print(a+1)

# Elementwise operations:
# Addition:
print(a+c)

# Subtraction:
print(a-c)

# Multiplication (Hadamard product):
print(a*c)

# Matrix multiplication (not elementwise):
print(np.dot(d,a))

# Transpose:
print(d.T)

# Inverse:
print(np.linalg.inv(d))
print(np.dot(np.linalg.inv(d),d)) #Check to see if product is I
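
Because of floating-point rounding, the product of a matrix and its inverse is usually only approximately the identity, so a tolerance-based check is safer than comparing by eye. The sketch below does that for the same matrix `d`, and also solves a small linear system with `np.linalg.solve`, which avoids forming the inverse explicitly. The right-hand side `rhs` is a made-up example.

```python
import numpy as np

d = np.array([[1, 2],
              [3, 4]])
d_inv = np.linalg.inv(d)

# Compare against the identity within floating-point tolerance
is_identity = np.allclose(d_inv @ d, np.eye(2))

# Solve d @ v = rhs directly (rhs is a hypothetical example vector)
rhs = np.array([5, 11])
v = np.linalg.solve(d, rhs)

print(is_identity)
print(v)
```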

## Principal Component Analysis:

In [None]:
#Load the iris data, standardize the features, and display the raw values:
iris = datasets.load_iris()
x = iris.data
x = StandardScaler().fit_transform(x)
y = iris.target
df = pd.DataFrame(data = iris.data, columns=iris.feature_names)
df.head()

As shown below, there are many ways to visualize the iris dataset:
<img src="Resource/Iris_dataset_scatterplot.png" width="500">

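PCA simplifies this by projecting the four standardized features onto a smaller number of uncorrelated components that retain as much of the variance as possible. Here we keep two components.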

In [None]:
pca = PCA(n_components = 2)
components = pca.fit_transform(x)
#Show principal components in a table:
PC = pd.DataFrame(data = components, columns = ['PC 1', 'PC 2'])
PC.head()
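
To show the linear algebra behind this step, here is a minimal sketch of PCA as an eigendecomposition of the covariance matrix, written with NumPy only. It is not scikit-learn's implementation, and the random `data` array is a made-up stand-in for a real dataset. Component signs may differ from what a library returns, since an eigenvector can point either way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 3 features, two of them strongly correlated
base = rng.normal(size=100)
data = np.column_stack([
    base,
    2 * base + 0.1 * rng.normal(size=100),
    rng.normal(size=100),
])

# Standardize each feature to zero mean and unit variance
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Eigendecomposition of the covariance matrix (eigh returns ascending order)
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# Sort the components from largest to smallest variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first two principal directions
projected = z @ eigvecs[:, :2]

# Share of the total variance captured by each component
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

The explained-variance shares indicate how much information is lost by keeping only the first two components.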

In [None]:
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('PCA for first and second components', fontsize = 20)

for name, label in [('Setosa', 0), ('Versicolour', 1), ('Virginica', 2)]:
    ax.text(components[y == label, 0].mean(),
              components[y == label, 1].mean(), name,
              horizontalalignment='center',
              bbox=dict(alpha=.5, edgecolor='w', facecolor='w'))
ax.scatter(components[:,0],
            components[:,1],
            c = y,
            s = 50)
ax.grid()

## Neural Network:

Neural networks are computational models in which many simple units work in parallel. The network learns by updating the weights on the connections between these units.

Components of a neural network:
- Neurons
- Layers
- Connections

<img src="Resource/nn_Patterson_Gibson_DeepLearning.png" width="500">

A single layer of such a network can be modeled by the equation X*w + b = y, where X is a feature matrix, w is the vector of weights, b is a bias, and y is the output.

A perceptron is a linear binary classifier with a simple input/output relationship: each input is multiplied by its associated weight, the weighted inputs are summed, and the output is determined from that sum.
A multilayer perceptron extends this idea to a network with one or more hidden layers.
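
As a minimal sketch of the perceptron described above, the weighted sum X*w + b can be computed for several inputs at once with a matrix-vector product, followed by a step function that turns each sum into a 0/1 output. The inputs, weights, and bias below are made-up values chosen for illustration, not learned parameters.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples with 2 features each
X = np.array([[1.0, 2.0],
              [3.0, -1.0],
              [-2.0, 0.5]])
w = np.array([0.5, -1.0])  # hypothetical weights, one per feature
b = 0.25                   # hypothetical bias

# Weighted sum for every sample
z = X @ w + b

# Step activation: output 1 when the weighted sum is positive, else 0
y = (z > 0).astype(int)

print(z)
print(y)
```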

Below, we use sklearn's multi-layer perceptron classifier to classify the species of an iris from its petal and sepal measurements, based on our iris dataset:

In [None]:
#Split into training and testing sets:
x_train, x_test, y_train, y_test = train_test_split(x, y)

#Train a Multi-Layer Perceptron Classifier (Neural Network):
mlp = MLPClassifier()
mlp.fit(x_train,y_train)

### Points of interest:
- Notice the solver used is 'adam'. This is a stochastic gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba. What if we changed to plain stochastic gradient descent ('sgd')?
- There is one hidden layer. What happens when you change its size? How about adding more layers?
- How does changing the solver used for weight optimization affect the number of iterations needed to converge (you might need to turn warnings back on to see what happens)?
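
To give some intuition for what a stochastic gradient descent solver does, here is a minimal sketch on a one-feature linear model with squared error. It is not scikit-learn's implementation: the synthetic data, the learning rate, and the number of epochs are all assumptions chosen for illustration. Each update nudges the weight and bias against the gradient of the error on a single randomly chosen sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated from y = 3x + 1 plus a little noise
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 1.0 + 0.05 * rng.normal(size=200)

w, b = 0.0, 0.0
learning_rate = 0.1  # assumed step size

for epoch in range(50):
    # Visit the samples in a fresh random order each epoch
    for i in rng.permutation(len(x)):
        error = (w * x[i] + b) - y[i]
        # Gradient of 0.5 * error**2 with respect to w and b
        w -= learning_rate * error * x[i]
        b -= learning_rate * error

print(w, b)
```

The learned values should land close to the slope and intercept used to generate the data. Optimizers such as 'adam' build on this update by adapting the step size per parameter.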

### Evaluating our model:
Below are evaluations of our test results.

An explanation of each measurement (let TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives). With the three iris classes, classification_report computes these per class, treating each class in turn as the positive one:

Accuracy: The fraction of all predictions that are correct
- Accuracy = (TP+TN)/(TP+FP+FN+TN)

Precision: The fraction of predicted positives that are actually positive
- Precision = TP/(TP+FP)

Recall (TP rate, Sensitivity): The fraction of actual positives that the model identifies, i.e. how well it avoids false negatives
- Recall = TP/(TP+FN)

F1: The harmonic mean of precision and recall
- F1 = 2TP/(2TP+FP+FN)
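
As a worked example of the formulas above, the sketch below plugs in a set of made-up counts for a binary classifier. The counts are hypothetical and do not come from the iris model. It also checks that the F1 formula agrees with the harmonic mean of precision and recall.

```python
# Hypothetical counts for a binary classifier (illustrative only)
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# F1 written as the harmonic mean of precision and recall
f1_harmonic = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```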

In [None]:
predictions = mlp.predict(x_test)
print(classification_report(y_test,predictions))

### Confusion Matrix:

In [None]:
cm = confusion_matrix(y_test,predictions)
accuracy = accuracy_score(y_test, predictions)

plt.figure(figsize=(9,9))
sb.heatmap(cm, annot=True, fmt=".0f", linewidths=0.5, square=True, cmap="Blues_r")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.title("Accuracy Score: {0}%".format(round(accuracy * 100, 2)), size=15)
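
To make the layout of the confusion matrix concrete, the sketch below builds one by hand with NumPy, using the same convention as the plot above (rows are actual classes, columns are predicted classes). The label arrays are made-up values, not the model's predictions. Correct predictions sit on the diagonal, so accuracy and the per-class scores can be read directly from the matrix.

```python
import numpy as np

# Hypothetical labels for a 3-class problem (illustrative only)
actual = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])
predicted = np.array([0, 0, 1, 2, 1, 2, 2, 1, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
for a, p in zip(actual, predicted):
    cm[a, p] += 1  # row = actual class, column = predicted class

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()

# Per-class recall divides by row totals, precision by column totals
recall_per_class = np.diag(cm) / cm.sum(axis=1)
precision_per_class = np.diag(cm) / cm.sum(axis=0)

print(cm)
print(accuracy, recall_per_class, precision_per_class)
```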