In [1]:
import numpy as np
import matplotlib.pyplot as plt

# Problem 1
## Dataset Generation

Write a function to **generate a training set** of size $m$:
- randomly generate a weight vector $w \in \mathbb{R}^{10}$ and normalize it to unit length
- generate a training set $\{(x_i , y_i)\}$ of size $m$
  - $x_i$: random vector in $\mathbb{R}^{10}$ drawn from $\mathcal{N}(0, I)$
  - $y_i \in \{0, 1\}$ with $P[y_i = 1] = \sigma(w \cdot x_i)$ and $P[y_i = 0] = 1 - \sigma(w \cdot x_i)$, where $\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function

In [74]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def generate_data(m):
    # Returns the true (unit-norm) w together with the training set.
    # Each row of training_set is [x_1, ..., x_10, y].

    # Random weight vector (components uniform in [0, 1)), scaled to unit length.
    w = np.random.rand(10)
    norm = np.linalg.norm(w)
    normalized_w = w / norm

    training_set = np.zeros((m, 11))

    for i in range(m):
        # x ~ N(0, I) in R^10
        x = np.random.normal(0, 1, 10)
        sigma = sigmoid(np.dot(normalized_w, x))
        # y = 1 with probability sigma, y = 0 otherwise
        y = np.random.choice(a=[0, 1], p=[1 - sigma, sigma])
        training_set[i] = np.append(x, y)
    return normalized_w, training_set

m_array = [50, 100, 150, 200, 250]


## Algorithm 1: logistic regression

The goal is to learn $w$. Algorithm 1 is logistic regression. You may use scikit-learn's built-in `LogisticRegression` for this, with `max_iter=1000`.
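
As a rough illustration of Algorithm 1, the sketch below fits `LogisticRegression` and reads the estimate $\hat{w}$ from `coef_`. The helper name `fit_logistic_regression`, the choice `fit_intercept=False` (the data model has no bias term), and the small synthetic data set are assumptions made for this example; the default L2 penalty is left on, so $\hat{w}$ is slightly shrunk toward zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_logistic_regression(X, y):
    """Estimate w with scikit-learn's LogisticRegression (hypothetical helper)."""
    # Assumption: no intercept, because labels are drawn from sigma(w . x).
    # The default L2 regularization (C=1.0) is kept as is.
    model = LogisticRegression(max_iter=1000, fit_intercept=False)
    model.fit(X, y)
    return model.coef_.ravel()  # shape (d,)

# Small synthetic check with an illustrative unit-norm w_true.
rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(int)

w_hat = fit_logistic_regression(X, y)
print(np.linalg.norm(w_true - w_hat))
```

If the unregularized maximum-likelihood estimate is wanted instead, the regularization strength can be weakened by passing a large `C`.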

## Algorithm 2: gradient descent with square loss

Define the square loss on training point $(x_i, y_i)$ as
$$L_i(w^{(t)}) = \frac{1}{2} \left( \sigma(w^{(t)} \cdot x_i) - y_i \right)^2$$

Algorithm 2 is gradient descent with respect to the square loss. Code this up yourself: run it for 1000 iterations with step size $\eta = 0.01$.
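
One possible sketch of Algorithm 2 follows. By the chain rule, with $p_i = \sigma(w \cdot x_i)$, the per-sample gradient is $\nabla L_i(w) = (p_i - y_i)\, p_i (1 - p_i)\, x_i$. The code assumes the total loss is the sum of the $L_i$ over the training set and that $w$ starts at zero; neither choice is fixed by the problem statement, and the function name is hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_square_loss(X, y, eta=0.01, n_iters=1000):
    """Gradient descent on sum_i 0.5 * (sigmoid(w . x_i) - y_i)^2."""
    w = np.zeros(X.shape[1])  # assumption: zero initialization
    for _ in range(n_iters):
        p = sigmoid(X @ w)                      # predictions, shape (m,)
        grad = X.T @ ((p - y) * p * (1.0 - p))  # sum of per-sample gradients
        w -= eta * grad
    return w

# Small synthetic check with an illustrative unit-norm w_true.
rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

w_hat = gradient_descent_square_loss(X, y)
print(np.linalg.norm(w_true - w_hat))
```

Averaging the gradient over the $m$ points instead of summing is equally defensible; it simply rescales the effective step size by $1/m$.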

## Algorithm 3: stochastic gradient descent with square loss
Similar to gradient descent, except that each iteration uses the gradient at a single randomly chosen training point.
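
A minimal sketch of the stochastic variant is shown below. It reuses the per-sample gradient $(p_i - y_i)\, p_i (1 - p_i)\, x_i$ with $p_i = \sigma(w \cdot x_i)$. The iteration count and step size are borrowed from Algorithm 2, and the zero initialization, the `seed` argument, and the function name are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_square_loss(X, y, eta=0.01, n_iters=1000, seed=0):
    """SGD on the square loss, one uniformly sampled point per step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])  # assumption: zero initialization
    for _ in range(n_iters):
        i = rng.integers(X.shape[0])  # index of the sampled training point
        p = sigmoid(X[i] @ w)
        w -= eta * (p - y[i]) * p * (1.0 - p) * X[i]
    return w

# Small synthetic check with an illustrative unit-norm w_true.
rng = np.random.default_rng(1)
w_true = rng.normal(size=10)
w_true /= np.linalg.norm(w_true)
X = rng.normal(size=(200, 10))
y = (rng.random(200) < sigmoid(X @ w_true)).astype(float)

w_hat = sgd_square_loss(X, y)
print(np.linalg.norm(w_true - w_hat))
```

Each step costs $O(d)$ rather than $O(md)$, which is why this variant is expected to run much faster than full gradient descent for the same number of iterations.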

## Evaluation

Measure the error $\|w - \hat{w}\|_2$ for each method at different sample sizes. For any fixed value of $m$, choose many different $w$'s and average the values of $\|w - \hat{w}\|_2$ for Algorithms 1, 2 and 3. Plot the results for each algorithm as $m$ grows (use $m = 50, 100, 150, 200, 250$). Also record, for each algorithm, the time taken to run the overall experiment.
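
The evaluation loop could be organized along the following lines. This is only a sketch: `sample_problem`, `average_error`, the number of trials (`n_trials=20`), and drawing $w$ from a normalized Gaussian are all assumptions, and `zero_estimator` is a placeholder that stands in for Algorithms 1, 2 and 3 so the block runs on its own. Note that the measured time includes data generation.

```python
import time
import numpy as np

def sample_problem(m, d=10, rng=None):
    """Draw a unit-norm w and m labeled points with P[y = 1] = sigmoid(w . x)."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(size=d)  # assumption: Gaussian direction, then normalized
    w /= np.linalg.norm(w)
    X = rng.normal(size=(m, d))
    y = (rng.random(m) < 1.0 / (1.0 + np.exp(-X @ w))).astype(float)
    return w, X, y

def average_error(fit, m_values, n_trials=20, seed=0):
    """Return (mean ||w - w_hat||_2 for each m, total wall-clock seconds)."""
    rng = np.random.default_rng(seed)
    start = time.perf_counter()
    errors = []
    for m in m_values:
        trial_errors = []
        for _ in range(n_trials):
            w, X, y = sample_problem(m, rng=rng)
            trial_errors.append(np.linalg.norm(w - fit(X, y)))
        errors.append(np.mean(trial_errors))
    return np.array(errors), time.perf_counter() - start

def zero_estimator(X, y):
    """Placeholder estimator: always returns w_hat = 0."""
    return np.zeros(X.shape[1])

m_values = [50, 100, 150, 200, 250]
errors, seconds = average_error(zero_estimator, m_values)
print(errors, seconds)
```

Calling `average_error` once per algorithm yields one error curve and one timing per method; the curves can then be drawn against `m_values` with `plt.plot`.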

# Problem 2

In [3]:
from sklearn import datasets

In [4]:
cancer = datasets.load_breast_cancer()

For each depth in $1, \dots, 5$, instantiate an AdaBoost classifier with the base learner set to be a decision tree of that depth (set `n_estimators=10` and `learning_rate=1`), and then record the 10-fold cross-validated accuracy (equivalently, one minus the error) on the entire breast cancer data set. Plot the resulting curve of accuracy against base classifier depth. Use `random_state=101` for both the base learner and the AdaBoost classifier every time.
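
One way the experiment might be set up is sketched below, assuming `cross_val_score` with `cv=10` is an acceptable way to obtain the 10-fold cross-validated accuracy (for classifiers its default score is accuracy). The base learner is passed positionally because its keyword name changed between scikit-learn versions (`base_estimator` became `estimator`).

```python
from sklearn import datasets
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

cancer = datasets.load_breast_cancer()

depths = list(range(1, 6))
accuracies = []
for depth in depths:
    base = DecisionTreeClassifier(max_depth=depth, random_state=101)
    clf = AdaBoostClassifier(
        base, n_estimators=10, learning_rate=1.0, random_state=101
    )
    # Mean accuracy over the 10 folds.
    scores = cross_val_score(clf, cancer.data, cancer.target, cv=10)
    accuracies.append(scores.mean())

for depth, acc in zip(depths, accuracies):
    print(depth, round(acc, 4), round(1 - acc, 4))  # depth, accuracy, error
```

The curve can then be drawn with `plt.plot(depths, accuracies)`, labeling the axes as base classifier depth and cross-validated accuracy.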