In [7]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import math
import torch.nn
import torch.optim
import torch.utils.data

# Problem 1

Use `np.histogram` to calculate the [probability density](https://en.wikipedia.org/wiki/Probability_density_function) that values in an arbitrary input data array fall within user-specified bins. Hint: `np.histogram` does all the work for you with the correct arguments.
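As a quick illustration of what `density=True` computes, the sketch below compares it with the manual formula $\rho_i = n_i / (N \, \Delta w_i)$, where $n_i$ is the count in bin $i$ and $N$ is the total count inside the bin range. The toy data and the unequal-width bins are made up for this example and are not part of the assignment.

```python
import numpy as np

# Hypothetical toy data and unequal-width bins, chosen only for illustration.
data = np.array([0.1, 0.2, 0.3, 0.6, 0.9])
bins = np.array([0.0, 0.5, 1.0, 2.0])

counts, _ = np.histogram(data, bins)             # raw counts per bin
rho, _ = np.histogram(data, bins, density=True)  # normalized densities

widths = np.diff(bins)
manual = counts / (counts.sum() * widths)        # n_i / (N * width_i)

print(rho)                   # densities per bin
print(np.sum(rho * widths))  # integrates to 1 over the binned range
```

Dividing by the bin width is what makes this a density rather than a probability per bin, which matters as soon as the bins are not all the same size.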

In [2]:
def estimate_probability_density(data, bins):
    """Estimate the probability density of arbitrary data.
    
    Parameters
    ----------
    data : array
        1D numpy array of random values.
    bins : array
        1D numpy array of N+1 bin edges to use. Must be increasing.

    Returns
    -------
    array
        1D numpy array of N probability densities.
    """
    assert np.all(np.diff(bins) > 0)

    # density=True normalizes the counts so that sum(rho * bin widths) == 1.
    rho, _ = np.histogram(data, bins, density=True)
    return rho

In [3]:
# A correct solution should pass these tests.
generator = np.random.RandomState(seed=123)
data = generator.uniform(size=100)
bins = np.linspace(0., 1., 11)
rho = estimate_probability_density(data, bins)
assert np.allclose(rho, [ 0.6,  0.8,  0.7,  1.7,  1.1,  1.3,  1.6,  0.9,  0.8,  0.5]) #measure
data = generator.uniform(size=1000)
bins = np.linspace(0., 1., 101)
rho = estimate_probability_density(data, bins)
dx = bins[1] - bins[0]
assert np.allclose(dx * rho.sum(), 1.)

# Problem 2

Define a function to calculate the [entropy](https://en.wikipedia.org/wiki/Entropy_estimation) $H(\rho)$ of a binned probability density, defined as:

$$H(\rho) \equiv -\sum_i \rho_i \log(\rho_i) \Delta w_i$$

where $\rho_i$ is the binned density in bin $i$ and $\Delta w_i$ is the width of that bin.
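Two limiting cases help build intuition for this formula. The sketch below uses a hypothetical helper, `binned_entropy_safe`, which is not part of the assignment: it evaluates the same sum but skips empty bins, on the assumption that $0 \log 0$ should be treated as $0$. A uniform density on $[0, 1]$ has $\rho_i = 1$ everywhere and therefore $H = 0$, while a density concentrated in a single bin gives a negative entropy.

```python
import numpy as np

def binned_entropy_safe(rho, bins):
    """Hypothetical variant of the binned entropy that skips empty bins."""
    rho = np.asarray(rho, dtype=float)
    widths = np.diff(bins)
    filled = rho > 0  # treat 0 * log(0) as 0 by leaving those bins out
    return -np.sum(rho[filled] * np.log(rho[filled]) * widths[filled])

bins = np.linspace(0.0, 1.0, 5)          # four bins of width 0.25
uniform = np.ones(4)                     # rho_i = 1 in every bin
peaked = np.array([4.0, 0.0, 0.0, 0.0])  # all probability in the first bin

H_uniform = binned_entropy_safe(uniform, bins)
H_peaked = binned_entropy_safe(peaked, bins)
print(H_uniform, H_peaked)
```

For the peaked case the sum reduces to $-4 \log(4) \times 0.25 = -\log 4$, using the natural logarithm as `np.log` does.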

In [4]:
def binned_entropy(rho, bins):
    """Calculate the binned entropy.
    
    Parameters
    ----------
    rho : array
        1D numpy array of densities, e.g., calculated by the previous function.
    bins : array
        1D numpy array of N+1 bin edges to use. Must be increasing.

    Returns
    -------
    float
        Value of the binned entropy.
    """
    assert np.all(np.diff(bins) > 0)
    
    widths = np.diff(bins)  # width of each bin
    # Note: a bin with rho == 0 makes the result nan, because 0 * log(0) evaluates to nan.
    return -np.sum(rho * np.log(rho) * widths)

In [5]:
# A correct solution should pass these tests.
generator = np.random.RandomState(seed=123)
data1 = generator.uniform(size=10000)
data2 = generator.uniform(size=10000) ** 4
bins = np.linspace(0., 1., 11)
rho1 = estimate_probability_density(data1, bins)
rho2 = estimate_probability_density(data2, bins)
H1 = binned_entropy(rho1, bins)
H2 = binned_entropy(rho2, bins)
assert np.allclose(H1, -0.000801544)
assert np.allclose(H2, -0.699349908)

# Problem 3

We're going to implement a network that will do multi-category classification.  We'll base this on the tools we developed in the [NeuralNetworks2](https://git.ucsc.edu/mhance/phys152/-/blob/master/Notebooks/NeuralNetworks2.ipynb) notebook from class.

First let's load in the data file.  You can find the data file in the HW area of my gitlab repo: [HW2_data.h5](https://git.ucsc.edu/mhance/phys152/-/blob/master/HW/HW2_data.h5)

In [20]:
import h5py
h5f = h5py.File('HW2_data.h5','r')
x_train= h5f['x_train'][:]
y_train= h5f['y_train'][:]
x_test= h5f['x_test'][:]
y_test= h5f['y_test'][:]
h5f.close()

torch.manual_seed(123)

<torch._C.Generator at 0x1279a0e10>

In [21]:
x_train = torch.Tensor(x_train) #ndarray to tensor
y_train = torch.Tensor(y_train) #ndarray to tensor
x_test = torch.Tensor(x_test) #ndarray to tensor
y_test = torch.Tensor(y_test) #ndarray to tensor

In [22]:
y_train.type()

'torch.FloatTensor'

In [23]:
net = torch.nn.Sequential(
    torch.nn.Linear(4, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 25),
    torch.nn.ReLU(),
    torch.nn.Linear(25, 3),
) 

torch.save(net.state_dict(), 'net_HW2.pth')

In [24]:
loss_fn = torch.nn.CrossEntropyLoss() #Define Loss Function
optimizer = torch.optim.Adam(net.parameters(), lr=0.1) #Define Adam Optimizer

In [25]:
xy_train = torch.utils.data.TensorDataset(x_train, y_train)
#loader = torch.utils.data.DataLoader(xy_train, batch_size=5, shuffle=True)

In [16]:
net.load_state_dict(torch.load('net_HW2.pth'))
# batch_size=200 is meant to split the training data into 5 batches (this assumes 1000 training samples)
loader = torch.utils.data.DataLoader(xy_train, batch_size=200, shuffle=True)
losses = []  # training loss per epoch (taken from the last batch of the epoch)
losses_t = []  # test loss per epoch
y_test = y_test.type(torch.LongTensor)  # CrossEntropyLoss expects integer class labels

for epoch in range(1000):
    net.train()  # switch back to training mode at the start of every epoch
    for x_batch, y_batch in loader:
        y_batch = y_batch.type(torch.LongTensor)
        y_pred = net(x_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    losses.append(loss.item())

    net.eval()  # evaluate using the current weights
    with torch.no_grad():  # no gradients are needed for the test loss
        y_pred = net(x_test)
        losses_t.append(loss_fn(y_pred, y_test).item())
    
plt.plot(losses, '.', label="TRAIN (B=200,L.R.=0.1)")
plt.plot(losses_t, '.',label="TEST")
plt.legend()
plt.xlabel('Training Epoch')
plt.ylabel('CrossEntropyLoss')
plt.title("Loss vs Epoch")
plt.yscale('log')


In [12]:
import pandas as pd
prediction = net(x_test)  # use the trained network to calculate outputs
# argmax finds the index of the largest output for each event; with classes 0, 1, 2 the index is the class
pred_labels = torch.argmax(prediction, dim=1).numpy()
true_labels = y_test.numpy().astype(int)

table = np.zeros((3, 3))  # rows: predicted label, columns: true label
for pred, true in zip(pred_labels, true_labels):
    table[pred, true] += 1  # count each (predicted, true) pair in its own cell

df = pd.DataFrame(table, columns=[0, 1, 2], index=[0, 1, 2])  # DataFrame for the confusion matrix
Acc = np.trace(table) / table.sum()  # correct predictions lie on the diagonal
Err_Rate = (table.sum() - np.trace(table)) / table.sum()  # everything off the diagonal
print("Confusion Matrix \nX-Axis = True Value \nY-Axis = Predicted Value \nAccuracy = {0} \nError Rate = {1}".format(Acc, Err_Rate))  # print axis labels and summary
df

Confusion Matrix 
X-Axis = True Value 
Y-Axis = Predicted Value 
Accuracy = 0.78 
Error Rate = 0.22


      0     1     2
0  32.0   0.0   4.0
1   4.0  24.0   7.0
2   2.0   5.0  22.0


Now the fun part.  Let's construct a similar network to what we used in class: 
- 2 hidden layers, with 20 and 25 nodes respectively, using ReLU activation functions.  
- The number of input nodes should correspond to the number of features in the training/test data, and the number of output nodes should correspond to the number of categories in the training/test data.  In this case, the `y_train` and `y_test` arrays give the category label as an integer from `0` to `C-1`, where `C` is the total number of categories.  You should inspect the training/test data to figure out the number of features and the number of categories.
- For the loss function, we should use `torch.nn.CrossEntropyLoss()`.  It applies the `softmax` function to the output layer internally (in log form), so the network itself should end with a plain linear layer.
- Let's use the Adam optimizer, starting with a learning rate of 0.1.
- Break the training data into 5 batches and run for 1000 epochs.
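The setup described in the list above can be sketched as follows. Since the contents of `HW2_data.h5` cannot be reproduced here, the sketch uses hypothetical stand-in arrays (`x_demo`, `y_demo`) with 1000 samples, 4 features and 3 categories. Those sizes are assumptions chosen to match the layer shapes and batch size used in the solution cells, and all names ending in `_demo` are illustrative only.

```python
import math
import numpy as np
import torch
import torch.utils.data

# Hypothetical stand-in arrays; the real ones come from HW2_data.h5.
rng = np.random.RandomState(0)
x_demo = rng.normal(size=(1000, 4)).astype(np.float32)
y_demo = rng.randint(0, 3, size=1000).astype(np.int64)

n_features = x_demo.shape[1]             # number of input nodes
n_classes = len(np.unique(y_demo))       # number of output nodes
batch_size = math.ceil(len(x_demo) / 5)  # 5 batches per epoch

net_demo = torch.nn.Sequential(
    torch.nn.Linear(n_features, 20),
    torch.nn.ReLU(),
    torch.nn.Linear(20, 25),
    torch.nn.ReLU(),
    torch.nn.Linear(25, n_classes),      # raw scores; no softmax layer here
)

dataset_demo = torch.utils.data.TensorDataset(
    torch.from_numpy(x_demo), torch.from_numpy(y_demo))
loader_demo = torch.utils.data.DataLoader(
    dataset_demo, batch_size=batch_size, shuffle=True)

print(n_features, n_classes, batch_size, len(loader_demo))
```

Deriving the layer sizes and batch size from the arrays, rather than hard-coding them, keeps the network consistent with the data if the file changes.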

Then you can train the network!  Keep track of the loss values for the training and test data, and plot them vs the epoch number after the training is completed.
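One way to keep track of both loss curves is sketched below on a small, made-up dataset with a smaller hypothetical network, so that it runs in a fraction of a second. It records the sample-weighted mean training loss per epoch, which is one possible choice and not the only one, and evaluates the test loss with gradients disabled.

```python
import torch
import torch.utils.data

torch.manual_seed(0)
# Hypothetical toy data: 60 training and 30 test samples, 4 features, 3 classes.
x_tr, y_tr = torch.randn(60, 4), torch.randint(0, 3, (60,))
x_te, y_te = torch.randn(30, 4), torch.randint(0, 3, (30,))

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3))
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x_tr, y_tr), batch_size=20, shuffle=True)

train_losses, test_losses = [], []
for epoch in range(10):
    model.train()
    running = 0.0
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        running += loss.item() * len(xb)      # weight each batch by its size
    train_losses.append(running / len(x_tr))  # mean training loss this epoch

    model.eval()
    with torch.no_grad():                     # no gradients needed for evaluation
        test_losses.append(loss_fn(model(x_te), y_te).item())
```

Both lists then hold one plain Python float per epoch and can be passed straight to `plt.plot`.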

The predictions of the network will be a set of `C` numbers for each event, one score per category; applying `softmax` to them gives the probabilities that the event belongs to each of the `C` different categories.  Since `softmax` preserves the ordering of the scores, we can take the index of the maximum value in each set of `C` numbers as the category label.  So if the output for event 0 looks like `[0.8, 0.6, 0.95, 0.10]` and the categories are numbered `0..3` then the predicted label would be `2`.
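The example output above can be checked directly. This short sketch takes the same four numbers, picks the label with `torch.argmax`, and also confirms that `torch.nn.CrossEntropyLoss` applied to the raw outputs equals the negative log of the softmax probability of the true class. The true label `2` used for the loss is an assumption made only for the illustration.

```python
import torch

outputs = torch.tensor([[0.8, 0.6, 0.95, 0.10]])  # one event, four categories

probs = torch.softmax(outputs, dim=1)             # sums to 1 along dim=1
label_from_scores = torch.argmax(outputs, dim=1)
label_from_probs = torch.argmax(probs, dim=1)     # same label either way

# Assumed true label for this event, used only to illustrate the loss.
target = torch.tensor([2])
loss = torch.nn.CrossEntropyLoss()(outputs, target)
manual = -torch.log(probs[0, 2])

print(label_from_scores.item(), loss.item(), manual.item())
```

Because the argmax of the raw outputs and of the softmax probabilities agree, there is no need to apply softmax before extracting the predicted labels.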

Once the network is trained, let's inspect the output.  Using the predictions for the training data, construct a "confusion matrix", showing the true labels on one axis and the predicted labels on the other.
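A compact way to build such a matrix is sketched below with a hypothetical helper, `confusion_matrix`, applied to made-up label arrays. It follows the convention of rows for predicted labels and columns for true labels; the transposed convention is equally common, so the axes should always be stated.

```python
import numpy as np

def confusion_matrix(pred, true, n_classes):
    """Hypothetical helper: rows index the predicted label, columns the true label."""
    table = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (pred, true) pair repeats.
    np.add.at(table, (np.asarray(pred), np.asarray(true)), 1)
    return table

# Made-up labels for illustration only.
pred = np.array([0, 1, 2, 2, 1, 0])
true = np.array([0, 1, 1, 2, 0, 0])

cm = confusion_matrix(pred, true, 3)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

Indexing the table directly with the label pair avoids a long chain of per-pair branches, where a single mistyped index silently moves counts into the wrong cell.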

I can offer more clarifications on this as the HW period goes on, so feel free to ask questions as you start on this one!