# Machine Learning Problem Set #2: Discriminative and Generative Classifiers

In this problem set, you will explore discriminative and generative classifiers.

## Problem 1: Generate an interesting data set

In class, we considered a synthetic problem in which a training data set was sampled from two classes. In this problem, you will generate a data set with characteristics similar to those of the data set discussed in class.

**Class 1:** Two features $x_1$ and $x_2$ jointly distributed as a two-dimensional spherical Gaussian with parameters

$$\mu = \begin{bmatrix} x_{1c} \\ x_{2c} \end{bmatrix},
\Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_1^2 \end{bmatrix}.$$

**Class 2:** Two features $x_1$ and $x_2$. The data are generated as follows. First, sample an angle $\theta$ uniformly from $[0, 2\pi)$. Next, sample a distance $d$ from a one-dimensional Gaussian with mean $(3\sigma_1)^2$ and variance $(\frac{1}{2}\sigma_1)^2$. Finally, output the point
$$\textbf{x} = \begin{bmatrix} x_{1c} + d \cos\theta \\ x_{2c} + d \sin\theta \end{bmatrix}.$$

Place your code to generate 100 samples from each of the classes and plot them in the cell below.


```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed so the data set is reproducible

# Shared center (x_1c, x_2c) of both classes and the class-1 spread sigma_1
mu_1 = np.array([1.0, 3.0])
sigma_1 = 1.0
num_sample = 100

# Class 1: spherical Gaussian with covariance sigma_1^2 * I
cov_mat = sigma_1**2 * np.eye(2)
X1 = rng.multivariate_normal(mean=mu_1, cov=cov_mat, size=num_sample)

# Class 2: uniform angle, Gaussian distance with mean (3 sigma_1)^2 and
# variance (sigma_1 / 2)^2; normal() takes the standard deviation, not the variance
angle = rng.uniform(0, 2 * np.pi, num_sample)
d = rng.normal(loc=(3 * sigma_1)**2, scale=0.5 * sigma_1, size=num_sample)

# Offset each class-2 point from the center (x_1c, x_2c), not from a class-1 sample
X2 = np.column_stack([mu_1[0] + d * np.cos(angle), mu_1[1] + d * np.sin(angle)])

plt.plot(X1[:, 0], X1[:, 1], 'r.', label='Class 1')
plt.plot(X2[:, 0], X2[:, 1], 'b.', label='Class 2')
plt.axis('equal')
plt.legend()
plt.show()
```

## Problem 2: Discriminative classification with logistic regression

Split the dataset into 80 patterns per class for training and 20 patterns per class for
validation/testing.

Perform three experiments with logistic regression: batch gradient ascent on the log likelihood,
stochastic gradient ascent on the log likelihood, and batch Newton's method.
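
As a rough illustration of the first two methods, here is a minimal sketch, not a reference solution. It relies on the fact that the gradient of the log likelihood $\ell(\theta) = \sum_i [y_i \log h_i + (1 - y_i)\log(1 - h_i)]$ is $X^T(\textbf{y} - \textbf{h})$. The function names, the zero initialization, and the step sizes are all assumptions. `y` is assumed to be a 1-D array of 0/1 labels (for example `y_train.ravel()`), and both routines return one parameter vector per iteration so the curves can be plotted afterwards.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; clipping z keeps np.exp from overflowing
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def add_bias(X):
    # Prepend a column of ones so theta[0] acts as the intercept
    return np.column_stack([np.ones(len(X)), X])

def batch_gradient_ascent(X, y, lr=0.001, n_iter=100):
    """One full-batch step per iteration; returns theta after every step."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    history = []
    for _ in range(n_iter):
        # Gradient of the log likelihood over the whole batch: X^T (y - h)
        grad = Xb.T @ (y - sigmoid(Xb @ theta))
        theta = theta + lr * grad
        history.append(theta.copy())
    return history

def stochastic_gradient_ascent(X, y, lr=0.001, n_epochs=100, seed=0):
    """One shuffled pass through the training set per iteration; returns theta after every pass."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    history = []
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xb)):
            # Single-example gradient: x_i (y_i - h(x_i))
            theta = theta + lr * (y[i] - sigmoid(Xb[i] @ theta)) * Xb[i]
        history.append(theta.copy())
    return history
```

The small default step size is a guess. With features as large as the ring radius, a step that is too large makes batch ascent oscillate instead of climbing.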

For each method, plot log likelihood and classification accuracy on the test set and
training set as a function of iteration (one batch or pass through the training set per iteration).
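
The per-iteration bookkeeping could look like the following sketch. It assumes each training routine returns a list holding one parameter vector per iteration, with `theta[0]` as the bias. The helper names `log_likelihood`, `accuracy`, and `plot_curves` are hypothetical. `np.logaddexp` evaluates $\log(1 + e^{\pm z})$ without overflow.

```python
import numpy as np
import matplotlib.pyplot as plt

def log_likelihood(theta, X, y):
    """Bernoulli log likelihood of 0/1 labels y under weights theta (theta[0] = bias)."""
    z = theta[0] + X @ theta[1:]
    # log sigma(z) = -log(1 + e^{-z}) and log(1 - sigma(z)) = -log(1 + e^{z})
    return np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z))

def accuracy(theta, X, y):
    """Fraction of points where the predicted label (z > 0) matches y."""
    z = theta[0] + X @ theta[1:]
    return np.mean((z > 0) == (y == 1))

def plot_curves(history, X_train, y_train, X_test, y_test, title=""):
    """history: list of theta vectors, one per iteration."""
    iters = np.arange(1, len(history) + 1)
    fig, (ax_ll, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for name, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        ax_ll.plot(iters, [log_likelihood(t, Xs, ys) for t in history], label=name)
        ax_acc.plot(iters, [accuracy(t, Xs, ys) for t in history], label=name)
    ax_ll.set_xlabel("iteration")
    ax_ll.set_ylabel("log likelihood")
    ax_acc.set_xlabel("iteration")
    ax_acc.set_ylabel("accuracy")
    ax_ll.legend()
    ax_acc.legend()
    fig.suptitle(title)
    return fig
```

Because the training and test sets have different sizes, it may be easier to compare their log likelihoods after dividing by the number of points.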

After showing your code and results, in the cell(s) below, briefly discuss your results. Are all
three methods converging to the same maximum? Is the Hessian always invertible? Plot the test data
and decision boundary for at least one of the solutions.
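
Below is a sketch of batch Newton's method, written under the same assumptions as above (1-D 0/1 labels, bias in `theta[0]`, hypothetical function names). The Hessian of the log likelihood is $H = -X^T S X$ with $S = \mathrm{diag}(h_i(1 - h_i))$. In exact arithmetic every $h_i$ lies strictly between 0 and 1, so $H$ is invertible whenever the design matrix has full column rank. In floating point, however, $H$ becomes ill-conditioned when the predictions saturate toward 0 or 1, as happens on separable data. The sketch therefore records the condition number at each step and solves for the step with `np.linalg.lstsq` instead of an explicit inverse. `boundary_x2` solves $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$ for $x_2$, which gives the line for a decision-boundary plot.

```python
import numpy as np

def newton_logistic(X, y, n_iter=10):
    """Batch Newton's method for logistic regression; returns (thetas, Hessian condition numbers)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    theta = np.zeros(Xb.shape[1])
    history, conds = [], []
    for _ in range(n_iter):
        h = 1.0 / (1.0 + np.exp(-np.clip(Xb @ theta, -500, 500)))
        grad = Xb.T @ (y - h)                    # gradient of the log likelihood
        H = -(Xb.T * (h * (1 - h))) @ Xb         # Hessian: -X^T S X
        conds.append(np.linalg.cond(H))          # large values warn of near-singularity
        # lstsq still returns a (minimum-norm) step if H is numerically singular
        step = np.linalg.lstsq(H, grad, rcond=None)[0]
        theta = theta - step                     # Newton update: theta - H^{-1} grad
        history.append(theta.copy())
    return history, conds

def boundary_x2(theta, x1):
    """x2 coordinates of the decision boundary theta0 + theta1*x1 + theta2*x2 = 0."""
    return -(theta[0] + theta[1] * x1) / theta[2]
```

For a decision-boundary plot, evaluate `boundary_x2` on a grid of $x_1$ values spanning the test data and draw the result over a scatter plot of the test points.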

```python
# Stack both classes; label class 1 as 0 and class 2 as 1 (Y is a column vector)
X = np.r_[X1, X2]
Y = np.array([np.r_[np.zeros(len(X1)), np.ones(len(X2))]]).T
X.shape
```

```
(200, 2)
```

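```python
from sklearn.model_selection import train_test_split

# Stratify on the labels so each class is split 80/20 (80 train, 20 test per class)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y.ravel())

# Features and labels side by side: 160 training rows, 2 features + 1 label
train = np.c_[X_train, y_train]
train.shape
```

```
(160, 3)
```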

## Problem 3: Transformation of the feature space

Perform a polar transform on the data and re-run one of your logistic regression models (whichever you prefer).
As before, plot the log likelihood and accuracy on the training set and test set as a function of iteration.
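
A minimal sketch of the polar transform follows. The choice of center is an assumption. In this synthetic setup the true center `mu_1` is known, and the training-set mean (e.g. `X_train.mean(axis=0)`) is a data-driven stand-in. In the transformed space, class 1 should sit at small radius and class 2 near the ring radius, so a threshold on $r$ (a linear boundary) should separate them. `polar_transform` is a hypothetical helper name.

```python
import numpy as np

def polar_transform(X, center):
    """Map 2-D points to (radius, angle) measured about `center`."""
    X = np.asarray(X, dtype=float)
    dx = X[:, 0] - center[0]
    dy = X[:, 1] - center[1]
    r = np.hypot(dx, dy)       # distance from the center
    phi = np.arctan2(dy, dx)   # angle in (-pi, pi]
    return np.column_stack([r, phi])
```

The transformed features can be passed straight to any of the logistic regression routines above. The angle column probably carries little class information here, but keeping it lets the model decide.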

Comment on your results. Obviously, this is an example contrived to illustrate the importance of data representation
and how much of a difference a simple transformation of the feature space can make. But can you think of some
real world problems that would be similarly difficult?


## Problem 4: Derive maximum likelihood parameter estimation method

We already know that the maximum likelihood estimates of the mean and covariance of the
Gaussian-distributed class are the mean and covariance of the sample. But let's derive
a maximum likelihood estimator for the parameters of the second class.
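
One way to set up the likelihood is sketched below. It is not necessarily the derivation from class. It treats $r$ and $\sigma$ as the mean and standard deviation of the distance $d$, and it assumes $r \gg \sigma$, so that $d > 0$ with overwhelming probability. Write $\textbf{c} = [x_{1c}, x_{2c}]^T$ and $d_i = \lVert \textbf{x}^{(i)} - \textbf{c} \rVert$. The angle is uniform, and changing variables from $(d, \theta)$ to $\textbf{x}$ introduces the polar Jacobian $d$, so

$$p(\textbf{x}) = \frac{1}{2\pi d} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(d - r)^2}{2\sigma^2}\right).$$

For $n$ samples the log likelihood is

$$\ell(\textbf{c}, r, \sigma) = \sum_{i=1}^n \left[ -\log(2\pi d_i) - \frac{1}{2}\log(2\pi\sigma^2) - \frac{(d_i - r)^2}{2\sigma^2} \right].$$

Hold $\textbf{c}$ fixed and set $\partial \ell / \partial r = \sum_i (d_i - r)/\sigma^2 = 0$ and $\partial \ell / \partial \sigma^2 = \sum_i \left[ -\frac{1}{2\sigma^2} + \frac{(d_i - r)^2}{2\sigma^4} \right] = 0$. This gives

$$\hat{r} = \frac{1}{n}\sum_{i=1}^n d_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (d_i - \hat{r})^2.$$

The center has no closed-form estimate. Substituting $\hat{r}$ and $\hat{\sigma}^2$ back into $\ell$ leaves the profile log likelihood

$$\ell_p(\textbf{c}) = -\sum_{i=1}^n \log d_i(\textbf{c}) - \frac{n}{2}\log \hat{\sigma}^2(\textbf{c}) + \text{const},$$

which can be maximized numerically. One caveat: $p(\textbf{x}) \to \infty$ as $\textbf{x} \to \textbf{c}$, so $\ell_p$ is unbounded as $\textbf{c}$ approaches any individual data point. The estimate of interest is the local maximum near the middle of the ring, not the global supremum.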

In class, we outlined a procedure for estimating the parameters $x_{1c}$, $x_{2c}$, $r$, and $\sigma$ of
a generative model for points in the annulus-shaped class. Complete the exercise. What are the maximum likelihood
estimates of the four parameters for a particular data set?
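
A numerical sketch of that estimator, assuming the change-of-variables likelihood with a Gaussian on the distance and a uniform angle. It uses the closed forms for $r$ and $\sigma$ given the center, and maximizes the profile log likelihood over the center with SciPy's Nelder-Mead. `annulus_mle` is a hypothetical name. The sample mean serves as the starting point because the unbounded spikes at the data points make a local search from near the ring's middle preferable.

```python
import numpy as np
from scipy.optimize import minimize

def annulus_mle(X, c0=None):
    """Estimate center c, radius r, and spread sigma of the annulus model.

    r and sigma use their closed forms given c; c maximizes the profile
    log likelihood, found locally with Nelder-Mead.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if c0 is None:
        c0 = X.mean(axis=0)  # roughly the ring's middle, inside the basin we want

    def neg_profile_ll(c):
        d = np.linalg.norm(X - c, axis=1)
        r = d.mean()
        var = np.mean((d - r) ** 2)
        # Negative of  -sum(log d_i) - (n/2) log sigma^2  (constants dropped)
        return np.sum(np.log(d)) + 0.5 * n * np.log(var)

    res = minimize(neg_profile_ll, c0, method="Nelder-Mead")
    c_hat = res.x
    d = np.linalg.norm(X - c_hat, axis=1)
    r_hat = d.mean()
    sigma_hat = np.sqrt(np.mean((d - r_hat) ** 2))
    return c_hat, r_hat, sigma_hat
```

On the data generated earlier, `annulus_mle(X2)` would return estimates of $(x_{1c}, x_{2c})$, $r$, and $\sigma$. With only 100 points they will be noisier than with larger samples.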


## Problem 5: Generative classifier

Based on the parameter estimation method you derived in Problem 4, build a maximum a posteriori classifier for the data based on the generative model
$$p(y \mid \textbf{x}) \propto p(\textbf{x} \mid y) p(y).$$
Show your results and compare to the results of Problems 2 and 3. Which approach is best for this data set?
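
A minimal sketch of the MAP decision rule follows. It compares $\log p(\textbf{x} \mid y) + \log p(y)$ for the two classes. The class-1 parameters are assumed to be the sample mean and covariance. The annulus parameters are assumed to come from an estimator such as the one for Problem 4, with density $\mathcal{N}(d; r, \sigma^2) / (2\pi d)$. The default prior of 0.5 is also an assumption, chosen because both classes have 100 samples. Labels follow the convention used for `Y` above (0 for the Gaussian class, 1 for the annulus). The function names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def annulus_logpdf(X, c, r, sigma):
    """log p(x | annulus): Gaussian distance, uniform angle, polar Jacobian 1/d."""
    d = np.linalg.norm(np.asarray(X, dtype=float) - c, axis=1)
    d = np.maximum(d, 1e-12)  # the density is unbounded at the exact center; avoid log(0)
    return -np.log(2 * np.pi * d) + norm.logpdf(d, loc=r, scale=sigma)

def map_classify(X, gauss_params, annulus_params, prior_gauss=0.5):
    """Return 0 (Gaussian class) or 1 (annulus class) per row, by maximum posterior."""
    mu, cov = gauss_params
    log_post_0 = multivariate_normal.logpdf(X, mean=mu, cov=cov) + np.log(prior_gauss)
    log_post_1 = annulus_logpdf(X, *annulus_params) + np.log(1 - prior_gauss)
    return (log_post_1 > log_post_0).astype(int)
```

For example, `map_classify(X_test, (X1.mean(axis=0), np.cov(X1.T)), annulus_mle(X2))` would classify the test points, assuming the `annulus_mle` sketch from Problem 4. Ideally both parameter sets would be fit on the training split only. Comparing its test accuracy with the curves from Problems 2 and 3 addresses the last question.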