# Homework Problem

Let $p(x)=a_nx^n+\dots+a_0$ and $q(x)=b_lx^l+\dots+b_0$.

Find $p(x)q(x)$.

$p(x)q(x)=\sum_{i=0}^{n}a_i x^i q(x)$

Each product $a_iq(x)$ can be computed by scalar multiplication of the coefficients of $q(x)$.

If the coefficients of $q(x)$ are $[b_0,\dots,b_l]$, then the coefficients of $x^iq(x)$ are $[0,0,\dots,0,b_0,\dots,b_l]$, with $i$ zeros in front of $b_0$.

For example, $x^2(2x+3)=2x^3+3x^2$.

The coefficients of $2x+3$ are $[3,2]$.

The coefficients of $2x^3+3x^2$ are $[0,0,3,2]$.

To finish the problem, find the coefficients of $a_ix^iq(x)$, use them to form a polynomial $r_i$, and then sum the polynomials $r_i$ to obtain the product of $p$ and $q$.
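A minimal Python sketch of this shift-and-add idea, assuming each polynomial is stored as a list of coefficients in ascending order ($[b_0,\dots,b_l]$, as above). The helper names `scale`, `shift`, `add` and `poly_mul_shift_add` are hypothetical, chosen only for illustration.

```python
def scale(coeffs, c):
    """Multiply every coefficient by the scalar c (scalar multiplication)."""
    return [c * b for b in coeffs]


def shift(coeffs, i):
    """Multiply by x^i: prepend i zeros to the ascending coefficient list."""
    return [0] * i + list(coeffs)


def add(u, v):
    """Add two polynomials given as ascending coefficient lists of any lengths."""
    m = max(len(u), len(v))
    u = list(u) + [0] * (m - len(u))
    v = list(v) + [0] * (m - len(v))
    return [s + t for s, t in zip(u, v)]


def poly_mul_shift_add(p, q):
    """Compute p(x)q(x) as the sum of r_i = a_i x^i q(x), one term of p at a time."""
    result = [0]
    for i, a_i in enumerate(p):
        r_i = shift(scale(q, a_i), i)  # coefficients of a_i x^i q(x)
        result = add(result, r_i)
    return result


print(shift([3, 2], 2))                          # x^2 (2x+3) -> [0, 0, 3, 2]
print(poly_mul_shift_add([1, 3, 0, 2], [3, 2]))  # (2x^3+3x+1)(2x+3) -> [3, 11, 6, 6, 4]
```

Each pass of the loop builds one $r_i$ and accumulates it, so the work is proportional to $(n+1)(l+1)$ multiplications.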

## Alternative Solution

The degree of $p(x)q(x)$ is $n+l$ (assuming $a_n\neq 0$ and $b_l\neq 0$), so the product has $n+l+1$ coefficients. Extend the coefficient lists of $p(x)$ and $q(x)$ with zeros until each has length $n+l+1$, e.g. $[a_0,\dots,a_n,0,\dots,0]$.

The coefficient of $x^i$, for $i$ between $0$ and $n+l$, is then given by $s_i=\sum_{j=0}^{i}a_{j}b_{i-j}$, and the product polynomial is $\sum_{i=0}^{n+l}s_ix^i$.

For example, take $(2x^3+3x+1)(2x+3)$, so $[a_0,a_1,a_2,a_3]=[1,3,0,2]$ and $[b_0,b_1]=[3,2]$. By the formula, the coefficient of the $x$ term is $s_1=a_0b_1+a_1b_0=1\cdot 2+3\cdot 3=11$.
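The formula for $s_i$ translates directly into code. This is a minimal sketch, again assuming ascending coefficient lists; `poly_mul_convolve` is a hypothetical name.

```python
def poly_mul_convolve(a, b):
    """Coefficients s_i = sum_{j=0}^{i} a_j b_{i-j} of p(x)q(x).

    a = [a_0, ..., a_n] and b = [b_0, ..., b_l] are ascending coefficient lists.
    """
    n, l = len(a) - 1, len(b) - 1
    size = n + l + 1                     # degrees 0, 1, ..., n+l
    a_pad = a + [0] * (size - len(a))    # extend with zeros to length n+l+1
    b_pad = b + [0] * (size - len(b))
    return [sum(a_pad[j] * b_pad[i - j] for j in range(i + 1)) for i in range(size)]


s = poly_mul_convolve([1, 3, 0, 2], [3, 2])  # (2x^3+3x+1)(2x+3)
print(s)     # [3, 11, 6, 6, 4], i.e. 4x^4 + 6x^3 + 6x^2 + 11x + 3
print(s[1])  # coefficient of x: 11
```

Padding with zeros means the inner sum never indexes past the end of either list. The sequence $s_i$ is the discrete convolution of the two coefficient sequences, which is also what NumPy's `np.convolve` computes.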


# Central Limit Theorem

Suppose $x_1,x_2,\dots$ are independent draws from a probability distribution with mean $\mu$ and finite standard deviation $\sigma$, and let

$y_n=\frac{\sum_{i=1}^nx_i}{n}$

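Then $\frac{y_n-\mu}{\sigma/\sqrt{n}}$ converges in distribution to $N(0,1)$ as $n\rightarrow \infty$. Informally, for large $n$, $y_n$ is approximately $N(\mu,\frac{\sigma}{\sqrt{n}})$, where the second parameter is the standard deviation.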
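A quick numerical sketch of this statement, assuming an exponential distribution with scale 1 (so $\mu=\sigma=1$) as the non-normal example. The sample sizes and seed are arbitrary choices, and this uses NumPy's `default_rng` generator rather than the `np.random.chisquare` interface used below.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
mu, sigma = 1.0, 1.0            # exponential(scale=1) has mean 1 and std 1
n, trials = 500, 4000

# each row is one experiment: n draws from a skewed, non-normal distribution
samples = rng.exponential(scale=1.0, size=(trials, n))
y = samples.mean(axis=1)        # one sample mean y_n per experiment

# standardize: if the CLT approximation holds, z is roughly N(0, 1)
z = (y - mu) / (sigma / np.sqrt(n))
frac_within = np.mean(np.abs(z) < 1.96)  # about 0.95 for a standard normal

print(y.mean(), y.std(), sigma / np.sqrt(n))
print(frac_within)
```

The spread of the sample means, `y.std()`, should come out close to $\sigma/\sqrt{n}\approx 0.045$, and roughly 95% of the standardized means should fall within $\pm 1.96$.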

## Problem
We can sample from the $\chi_N^2$ distribution using np.random.chisquare(N, size). Using this function with various values of $N$, estimate the mean and standard deviation of $\chi_N^2$ using the central limit theorem.

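```python
import numpy as np

# 2000 x 2000 draws from the chi-squared distribution with N = 10
x = np.random.chisquare(10, (2000, 2000))
# column means: y holds 2000 sample means, each over 2000 draws
y = np.sum(x, 0) / 2000

stddev = np.std(y)
mean = np.mean(y)
```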

```python
# This should be approximately the mean of the chi-squared distribution with N = 10
mean
```

```
10.001868539356256
```

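```python
# By the CLT, std(y) is about sigma/sqrt(2000), so this should approximate
# the standard deviation of the chi-squared distribution with N = 10
stddev * np.sqrt(2000)
```

```
2.8482740678914169
```

This printed value does not match the $N=10$ samples drawn above: in the original notebook this cell was executed (execution count 16) before the sampling cell (count 17), so it reflects an earlier run. For $\chi_{10}^2$ the estimate should be close to $\sqrt{2\cdot 10}\approx 4.47$.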

It turns out that the mean of $\chi_N^2$ is $N$. The standard deviation of $\chi_N^2$ is $\sqrt{2 N}$.
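One way to see this, using the standard definition of $\chi_N^2$ as the sum of squares of $N$ independent standard normal variables $Z_1,\dots,Z_N$ (a definition assumed here, not stated in the notebook), together with $E[Z_k^4]=3$ for a standard normal:

```latex
\begin{aligned}
\chi_N^2 &= \sum_{k=1}^{N} Z_k^2, \qquad Z_k \sim N(0,1) \text{ independent},\\
E[Z_k^2] &= \operatorname{Var}(Z_k) + E[Z_k]^2 = 1,\\
\operatorname{Var}(Z_k^2) &= E[Z_k^4] - E[Z_k^2]^2 = 3 - 1 = 2,\\
E[\chi_N^2] &= \sum_{k=1}^{N} E[Z_k^2] = N,\\
\operatorname{Var}(\chi_N^2) &= \sum_{k=1}^{N} \operatorname{Var}(Z_k^2) = 2N
\quad\Longrightarrow\quad \operatorname{std}(\chi_N^2) = \sqrt{2N}.
\end{aligned}
```

The variances add because the $Z_k$ are independent.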

```python
n = 10000
for i in range(1, 7):
    # n x n draws from chi-squared with i degrees of freedom
    # (a 10000 x 10000 float64 array needs about 800 MB of memory)
    x = np.random.chisquare(i, (n, n))
    # column means: n sample means, each over n draws
    y = np.sum(x, 0) / n
    print('sample mean', np.mean(y), 'actual mean', i)
    # CLT: std(y) is about sigma/sqrt(n), so sqrt(n)*std(y) estimates sigma
    print('population standard deviation of chi square', np.sqrt(n)*np.std(y),
          'actual standard deviation', np.sqrt(2*i))
```

```
sample mean 0.999897054139 actual mean 1
population standard deviation of chi square 1.40599091804 actual standard deviation 1.41421356237
sample mean 1.99998558519 actual mean 2
population standard deviation of chi square 1.99670834727 actual standard deviation 2.0
sample mean 3.00034110708 actual mean 3
population standard deviation of chi square 2.45305683353 actual standard deviation 2.44948974278
sample mean 3.99955019058 actual mean 4
population standard deviation of chi square 2.81693504773 actual standard deviation 2.82842712475
sample mean 5.00004327468 actual mean 5
population standard deviation of chi square 3.15795031254 actual standard deviation 3.16227766017
sample mean 5.99962833144 actual mean 6
population standard deviation of chi square 3.49133901342 actual standard deviation 3.46410161514
```


# Function Optimization

Many real-life applications require minimizing functions of many variables. For example, in machine learning, training a neural network involves choosing the parameters that minimize the network's error.

# How to Minimize Multidimensional Functions

## Analytical Approach

Consider the points where the gradient $\nabla f=\left\langle \frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right\rangle$ is $\mathbf{0}$.

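If the gradient is $\mathbf{0}$, the point is a critical point, which may be a local minimum, a local maximum, or a saddle point. To find the minimum value of the function, evaluate $f$ at each critical point and compare the values (this assumes a minimum exists; on a bounded domain the boundary must be checked as well).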
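As a worked example, take the quadratic $f(x,y,z)=x^2+y^2+z^2+3x+2y+z$ that is minimized numerically below. Setting the gradient to zero gives a single critical point, and completing the square shows it is the global minimum:

```latex
\begin{aligned}
\nabla f &= \langle 2x+3,\ 2y+2,\ 2z+1\rangle = \mathbf{0}
\quad\Longrightarrow\quad (x,y,z) = \left(-\tfrac{3}{2},\,-1,\,-\tfrac{1}{2}\right),\\
f(x,y,z) &= \left(x+\tfrac{3}{2}\right)^2 + (y+1)^2 + \left(z+\tfrac{1}{2}\right)^2 - \tfrac{7}{2}
\ \ge\ -\tfrac{7}{2}.
\end{aligned}
```

So the minimum value is $f(-\tfrac{3}{2},-1,-\tfrac{1}{2})=-\tfrac{7}{2}=-3.5$.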

## Numerical Approach

At any point where $\nabla f\neq\mathbf{0}$, the gradient vector points in the direction in which $f$ increases most rapidly.

The negative gradient therefore points in the direction of steepest decrease.

Repeatedly taking small steps in that direction gives gradient descent.

Algorithm

$\mathbf{x}\leftarrow \mathbf{x}-\eta \nabla f(\mathbf{x})$

where $\eta$ is a small positive number called the learning rate (or step size).
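A minimal vectorized sketch of this update rule with NumPy, assuming the gradient is available as a function of a NumPy array. `gradient_descent` and the test function $\|\mathbf{v}-\mathbf{c}\|^2$ are hypothetical, chosen for illustration; the sketch uses a fixed learning rate and a fixed number of steps, with no stopping criterion.

```python
import numpy as np


def gradient_descent(grad, x0, eta=0.001, num_steps=20000):
    """Repeat x <- x - eta * grad(x) for num_steps steps and return the final x.

    grad: function mapping a NumPy array x to the gradient of f at x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad(x)  # step against the gradient (steepest decrease)
    return x


# Example: f(v) = ||v - c||^2 has gradient 2 (v - c) and its minimum at v = c.
c = np.array([1.0, -2.0])
x_min = gradient_descent(lambda v: 2 * (v - c), x0=[0.0, 0.0], eta=0.1, num_steps=200)
print(x_min)  # close to [1, -2]
```

With $\eta=0.1$, each step multiplies the distance to $\mathbf{c}$ by $1-2\eta=0.8$, so 200 steps bring it essentially to zero. For this function a learning rate of $\eta\ge 1$ would keep the iterates from converging, which is why $\eta$ must be small.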

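```python
# f(x, y, z) = x^2 + y^2 + z^2 + 3x + 2y + z
f = lambda x, y, z: x*x + y*y + z*z + 2*y + z + 3*x
```

```python
# partial derivatives of f
dfdx = lambda x, y, z: 2*x + 3
dfdy = lambda x, y, z: 2*y + 2
dfdz = lambda x, y, z: 2*z + 1
```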

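```python
# initial guess (fixed here rather than chosen randomly)
x, y, z = .3, .3, .3

eta = .001  # learning rate
x_vals = []
y_vals = []
z_vals = []
f_vals = []

num_steps = 20000
for i in range(num_steps):
    # evaluate the whole gradient at the current point before updating any coordinate
    gx, gy, gz = dfdx(x, y, z), dfdy(x, y, z), dfdz(x, y, z)
    x = x - eta*gx
    y = y - eta*gy
    z = z - eta*gz
    x_vals.append(x)
    y_vals.append(y)
    z_vals.append(z)
    f_vals.append(f(x, y, z))

print(x_vals[-1], y_vals[-1], z_vals[-1], f_vals[-1])
```

```
-1.4999999999999445 -0.9999999999999722 -0.4999999999999861 -3.4999999999999996
```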
