## Vector and Matrix Calculus

Let $w = [w_1, w_2, w_3]^T$. \\
Represent the following functions in the form $w^T A w$. \\
(i) $g(w) = 5w_1^2 + w_2^2 + 5w_3^2 + 4w_1w_2 - 8w_1w_3 - 4w_2w_3$ \\
(ii) $g(w) = 3w_1^2 + w_2^2 + 5w_3^2 + 4w_1w_2 - 6w_1w_3 - 4w_2w_3$ \\
Find the Hessian of $g(w)$.

Approach: \\
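* Extract $A$ from $g(w)$. For a symmetric $A$, the coefficient of each squared term $w_i^2$ goes on the diagonal, and the coefficient of each cross term $w_iw_j$ is split equally between $A_{ij}$ and $A_{ji}$.
* Then recall: \\
$\frac{\partial^2 (w^TAw)}{\partial w^2} = A + A^T$, which equals $2A$ when $A$ is symmetric.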
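
The approach can be checked numerically. The sketch below is illustrative only and covers example (i), assuming its squared terms are $w_1^2$, $w_2^2$ and $w_3^2$. The helper names `g` and `quadratic_form` are made up for this example. It builds a symmetric $A$, confirms that $w^TAw$ reproduces $g(w)$ at a random point, and forms the Hessian as $A + A^T$.

```python
import numpy as np

# Symmetric A for example (i), ASSUMING the squared terms are w_1^2, w_2^2, w_3^2.
# Diagonal entries are the coefficients of w_i^2; each cross-term coefficient
# is split equally between A[i, j] and A[j, i].
A = np.array([
    [ 5.0,  2.0, -4.0],
    [ 2.0,  1.0, -2.0],
    [-4.0, -2.0,  5.0],
])

def g(w):
    """Example (i) written out term by term (hypothetical helper)."""
    w1, w2, w3 = w
    return 5*w1**2 + w2**2 + 5*w3**2 + 4*w1*w2 - 8*w1*w3 - 4*w2*w3

def quadratic_form(A, w):
    """Evaluate w^T A w (hypothetical helper)."""
    return w @ A @ w

# Hessian of w^T A w is A + A^T (constant, independent of w)
hessian = A + A.T

# Sanity check at a random point
rng = np.random.default_rng(0)
w = rng.normal(size=3)
print(np.isclose(g(w), quadratic_form(A, w)))
print(hessian)
```

Example (ii) follows the same pattern with its own coefficients.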

Concepts: \\

* Multi-variate systems
* Differentiation
* Dot products
* Hessians and Jacobians
* Other extremely useful concepts in linear algebra for Machine Learning - see reference

Useful intuitions on [multi-variate calculus](https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives) from Khan Academy. 

This [article](https://www.khanacademy.org/math/multivariable-calculus/applications-of-multivariable-derivatives/modal/a/quadratic-approximation) explains why we seek the Hessian in quadratic form: \\
$ax^2 + 2bxy + cy^2$ is represented in matrix form as \\
$z^TMz$ where $z = [x, y]^T$ and $M = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$

## Lagrange 

Let $w = [w_1, w_2]^T$. \\
$J(w) = 8w_1^2 + 7w_2^2 + 2w_1w_2$ \\
Use the Lagrange multiplier method to find the minimum of $J(w)$, subject to $h(w) = 2w_1 + w_2 - 2 = 0$

Concepts: \\

* Optimisation with constraints
* Solving linear systems of equations
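
As a rough sketch of how these concepts fit together, the stationarity conditions of the Lagrangian can be written as a linear system and solved with NumPy. This assumes the objective is $J(w) = 8w_1^2 + 7w_2^2 + 2w_1w_2$, that is, both leading terms are squared. If the intended objective differs, the matrix below changes accordingly. The names `K`, `rhs`, `lam` and `J` are illustrative only.

```python
import numpy as np

# ASSUMPTION: J(w) = 8*w1**2 + 7*w2**2 + 2*w1*w2 (both leading terms squared).
# Lagrangian: L(w, lam) = J(w) + lam * (2*w1 + w2 - 2)
# Setting the partial derivatives to zero gives a linear system:
#   dL/dw1  = 16*w1 +  2*w2 + 2*lam = 0
#   dL/dw2  =  2*w1 + 14*w2 +   lam = 0
#   dL/dlam =  2*w1 +    w2         = 2
K = np.array([
    [16.0,  2.0, 2.0],
    [ 2.0, 14.0, 1.0],
    [ 2.0,  1.0, 0.0],
])
rhs = np.array([0.0, 0.0, 2.0])

w1, w2, lam = np.linalg.solve(K, rhs)

def J(w1, w2):
    """Objective under the stated assumption."""
    return 8*w1**2 + 7*w2**2 + 2*w1*w2

print("w1 =", w1, "w2 =", w2, "lambda =", lam)
print("J at the constrained stationary point:", J(w1, w2))
print("constraint residual:", 2*w1 + w2 - 2)
```

Under this assumption the Hessian of $J$ is positive definite, so the stationary point is the constrained minimum.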


## Parzen Windows - Density Estimation

In [0]:
import numpy as np
import matplotlib.pyplot as plt

# Sample points: one Gaussian kernel is centred on each.
# Uncomment one of the alternatives to try a different data set.
#c=np.array([2, 2.5, 3, 1, 6])
c=np.array([-1, -1, 0, 0, 1])
#c=np.array([1, 2,3,4,5])
#c=np.array([-1, 0.5, -1, 0, 0.5])

# Evaluation grid: 1000 points from -10 to 9.98 in steps of 0.02
x = (np.arange(1000) - 500) / 50

def gaussian(x, centre):
    # Unit-variance normal density centred on `centre`
    return np.exp(-(x - centre)**2 / 2) / np.sqrt(2 * np.pi)

# One kernel per sample point
p1 = gaussian(x, c[0])
p2 = gaussian(x, c[1])
p3 = gaussian(x, c[2])
p4 = gaussian(x, c[3])
p5 = gaussian(x, c[4])

# Parzen estimate: the average of the individual kernels
p = (p1 + p2 + p3 + p4 + p5) / 5

In [0]:
# Individual kernels (dashed) and the combined Parzen estimate (solid black)
plt.plot(x,p1, linestyle='dashed')
plt.plot(x,p2, linestyle='dashed')
plt.plot(x,p3, linestyle='dashed')
plt.plot(x,p4, linestyle='dashed')
plt.plot(x,p5, linestyle='dashed')
plt.plot(x,p, color='black')
plt.legend(['P1', 'P2', 'P3', 'P4', 'P5', 'P'], loc='upper left')
plt.xlabel('x')
plt.ylabel('Density')
plt.show()

In [0]:
# Inspect a few values starting at x = 2.0 (index 600)
print(x[600:605])
print(p1[600:605])
print(p[600:605])
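
The cells above hard-code five kernels with unit width. A more general version might look like the sketch below. The function name `parzen_density` and the bandwidth parameter `h` are assumptions introduced for illustration; they do not appear in the cells above. With `h=1.0` it should reproduce the same estimate, and the final lines approximate the area under the curve, which should be close to 1 for a valid density.

```python
import numpy as np

def parzen_density(x, samples, h=1.0):
    """Parzen-window estimate with a Gaussian kernel.

    `h` is an assumed bandwidth parameter; h=1.0 gives unit-variance kernels.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Scaled differences between every grid point and every sample,
    # shape (len(x), len(samples)), built by broadcasting.
    u = (x[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u**2) / (h * np.sqrt(2 * np.pi))
    # Average the kernels over the samples
    return kernels.mean(axis=1)

grid = (np.arange(1000) - 500) / 50          # same grid as above, step 0.02
density = parzen_density(grid, [-1, -1, 0, 0, 1])

# Riemann-sum approximation of the integral of the estimate
area = density.sum() * 0.02
print(area)
```

A smaller `h` gives a spikier estimate and a larger `h` a smoother one.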

## Clustering

### Nearest Neighbours

The key thing to remember is the distance measure, here the squared Euclidean distance: \\
$d(x,x_k) = (x_1-x_{k1})^2 + (x_2-x_{k2})^2$ where $x = [x_1, x_2]^T$ and $x_k = [x_{k1}, x_{k2}]^T$

Calculate the distance between the new point and every existing point, then pick the existing point with the smallest distance.
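
A minimal sketch of this search, using made-up 2-D points and a hypothetical helper `nearest_neighbour`:

```python
import numpy as np

def nearest_neighbour(x_new, points):
    """Return the index of the closest point and all squared distances."""
    points = np.asarray(points, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Squared Euclidean distance from x_new to every row of `points`
    d = np.sum((points - x_new)**2, axis=1)
    return int(np.argmin(d)), d

# Made-up example data
points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
idx, d = nearest_neighbour([0.9, 1.2], points)
print(idx, d)
```

Because the square root is monotonic, ranking by squared distance selects the same neighbour as ranking by the Euclidean distance itself.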

### k-NN 

Same as above, but instead of using only the single closest point, take the $k$ existing points closest to the new point and assign it the most common label among them.
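
The following sketch illustrates one way to do this with a majority vote. The function name `knn_predict`, the data and the labels are all made up for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, points, labels, k=3):
    """Label x_new by majority vote among its k nearest points."""
    points = np.asarray(points, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Squared Euclidean distance to every existing point
    d = np.sum((points - x_new)**2, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(d)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up example data with two labelled groups
points = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5]]
labels = ["a", "a", "a", "b", "b"]

pred_near_origin = knn_predict([0.5, 0.5], points, labels, k=3)
pred_far = knn_predict([5, 6], points, labels, k=3)
print(pred_near_origin, pred_far)
```

An odd `k` avoids tied votes when there are two classes.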

### k-means

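Each cluster is represented by its centroid $c_k$.
Online k-means update: $c_k^{new} = c_k^{old} + \eta(x_j-c_k^{old})$, where $x_j$ is the new sample, $c_k$ is the centroid closest to it and $\eta$ is the learning rate.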
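
One step of the online update might be sketched as follows. It assumes the centroid to update is the one closest to the incoming sample (by squared Euclidean distance). The function name `online_kmeans_step` and the example values are illustrative.

```python
import numpy as np

def online_kmeans_step(centroids, x, eta=0.1):
    """Move the centroid nearest to x a fraction eta of the way towards x.

    Updates `centroids` in place and returns the index of the updated centroid.
    ASSUMPTION: the winning centroid is the nearest one.
    """
    d = np.sum((centroids - x)**2, axis=1)
    k = int(np.argmin(d))
    # c_k_new = c_k_old + eta * (x - c_k_old)
    centroids[k] += eta * (x - centroids[k])
    return k

# Made-up example: two centroids and one new sample
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])
k = online_kmeans_step(centroids, np.array([1.0, 1.0]), eta=0.5)
print(k, centroids)
```

With `eta=1` the centroid jumps onto the sample, and with a small `eta` it drifts slowly, so `eta` trades responsiveness against stability.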

# General Resources: 

## ML Theory: 

* Free copy of Bishop's Pattern Recognition and Machine Learning: https://www.microsoft.com/en-us/research/people/cmbishop/#!prml-book
* Intro to ML - Andrew Ng: https://www.coursera.org/learn/machine-learning

## Mathematics for ML: 
* https://mml-book.github.io/
* https://www.coursera.org/specializations/mathematics-machine-learning

## Probability and Stats
* An Introduction to Statistical Learning: http://www-bcf.usc.edu/~gareth/ISL/
* Khan Academy

## Linear Algebra
* We also recommend working through Introduction to Linear Algebra by Gilbert Strang: https://www.amazon.co.uk/Introduction-Linear-Algebra-Gilbert-Strang/dp/0980232775/ref=dp_ob_title_bk \\
Video lectures are available here: \\
https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/
* Khan Academy: https://www.khanacademy.org/math/multivariable-calculus
