<div style="text-align:center"><span style="font-size:2em; font-weight: bold;">Lecture 10—Neural nets</span></div>

# Linear algebra: Mathematics

Basic model:
$$\hat y=F\left(Zw_z\right)$$
$$Z = G\left(XW_x\right)$$

Common linking function pairs:
Identity:
$$F(x)=x$$
Relu:
$$G(x)=\max(x,0)$$

Softmax:
$$F(x)=\frac{1}{1+e^{-x}}$$
Relu:
$$G(x)=\max(x,0)$$

Minimization problem:
$$\min_{w_z,W_x}\frac{1}{n}\sum_{i=1}^n (y_i-\hat y_i)^2$$

# Data science: Programming neural nets

In [3]:
import numpy as np
import pandas as pd
from scipy.stats import iqr
from scipy.stats import zscore
from itertools import product

relu = lambda x: np.maximum(x,0)
softmax = lambda x: np.exp(x-np.log(np.exp(x).sum(1).reshape(-1,1)))
identity = lambda x: x

drelu = lambda x: (x>0).astype(int)
dsoftmax = lambda x: softmax(x)*(1-softmax(x))
didentity = lambda x: np.ones(x.shape)

In [4]:
def generate_layers(x,r_vec,linking_function):
    z = x.copy()
    layers = []
    weights = []
    for i in range(len(r_vec)):
        xmat = np.hstack([np.ones((z.shape[0],1)),z])
        w = np.random.uniform(size=(xmat.shape[1],r_vec[i]))-0.5
        z = linking_function(xmat@w)
        weights += [w]
        layers += [z]
    return layers,weights
def generate_mvt_normal(n,r,means):
    if r == 1: return np.random.normal(loc=means,size=(n,1))
    P = np.array([[1,1]])
    for i in range(2,r):
        ones = np.ones([i,1])
        zeros = np.zeros([P.shape[0],1])
        ident = np.eye(i)
        upper = np.hstack([ones,ident])
        lower = np.hstack([zeros,P])
        P = np.vstack([upper,lower])
    covariates = np.random.normal(size=(n,P.shape[0]))
    idiosyncratics = np.random.normal(loc=means,size=(n,P.shape[1]))
    covariate_loadings = np.random.uniform(size=P.shape[0])*3-1
    return covariates@np.diagflat(covariate_loadings)@P+idiosyncratics
def calc_layers(x,weights,linking_function):
    z = x.copy()
    layers = []
    for i in weights:
        xmat = np.hstack([np.ones((z.shape[0],1)),z])
        z = linking_function(xmat@i)
        layers += [z]
    return layers
def calc_gradient(x,y,layers,weights,dlinking_funcs):
    inside = np.diagflat((y-layers[-1]).reshape(-1))
    wmat = np.ones((1,1))
    gradvec = []
    for i in range(len(weights)-1,-1,-1):
        w = weights[i]
        link = dlinking_funcs[i]
        layer = layers[i-1] if i-1>=0 else x
        xmat = np.hstack([np.ones((layer.shape[0],1)),layer])
        result1 = link(xmat@w)
        result2 = result1@wmat
        wmat = w[1:,:]@wmat
        grad = -2*xmat.T@inside@result1/y.shape[0]
        inside = inside@np.diagflat(result2.reshape(-1))
        gradvec = [grad]+gradvec
    return gradvec


In [5]:
# Build a random neural network
# Data generating process

r_x = 3
r_e = 1
r_xe = r_x+r_e
r_z = (20,10,5)
r_y = 1
n = 10000

x = zscore(generate_mvt_normal(n,r_x,np.random.uniform(size=r_x)))
e = zscore(generate_mvt_normal(n,r_e,np.zeros(r_e)))
xe = np.hstack([x,e])
layers,weights = generate_layers(xe,r_z,relu)
fin_layer,fin_weight = generate_layers(layers[-1],(r_y,),identity)
y = zscore(fin_layer[0])
y

array([[ 0.52136691],
       [ 0.5710091 ],
       [ 0.1962905 ],
       ...,
       [-0.24479498],
       [-1.16471404],
       [ 0.0525731 ]])

In [6]:
# Build an estimated neural network
# Initial guess

r_ez = (10,)

est_layers,est_weights = generate_layers(x,r_ez,relu)
est_fin_layer,est_fin_weight = generate_layers(est_layers[-1],(r_y,),identity)
mspe = ((y-est_fin_layer[0])**2).mean()
print(-1,np.sqrt(mspe))

-1 1.1887537305548945


In [7]:
# Gradient descent

learning_rate = 0.01
iterations = 20

for j in range(iterations):
    grad = calc_gradient(x,y,
                  est_layers+est_fin_layer,
                  list(est_weights)+list(est_fin_weight),
                  [drelu]*len(est_weights)+[didentity])
    for i in range(len(grad)-1):
        est_weights[i] -= grad[i]*learning_rate
    est_fin_weight -= grad[-1]*learning_rate
    est_layers = calc_layers(x,est_weights,relu)
    est_fin_layer = calc_layers(est_layers[-1],est_fin_weight,identity)
    mspe = ((y-est_fin_layer[0])**2).mean()
    print(j,np.sqrt(mspe))

0 1.1755167847320904
1 1.1635780448459108
2 1.152844203439707
3 1.1431157385936455
4 1.134274580682947
5 1.1262339681273814
6 1.1188908959544517
7 1.1121031514673048
8 1.1058826985460721
9 1.1001716421789995
10 1.0949323131637885


KeyboardInterrupt: 