## Mini Project_00: 1/0

In [None]:
import numpy as np
import tensorflow as tf
# keras.utils.np_utils was removed in recent Keras releases; to_categorical now lives
# directly in keras.utils, so this alias keeps the np_utils.to_categorical(...) calls below working.
from keras import utils as np_utils

In [None]:
# Set random seed
np.random.seed(42)
# Our data
X = np.array([[0,0],[0,1],[1,0],[1,1]]).astype('float32')
y = np.array([[0],[1],[1],[0]]).astype('float32') 

In [None]:
# One-hot encoding the output
y = np_utils.to_categorical(y); y
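For intuition, one-hot encoding can be reproduced in plain NumPy by indexing rows of an identity matrix. This is a minimal sketch: `one_hot` is a hypothetical helper for illustration, not the Keras `to_categorical` function, and it assumes the labels are whole numbers from 0 to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: map integer class labels to one-hot rows."""
    labels = np.asarray(labels).astype(int).ravel()   # e.g. shape (4, 1) -> (4,)
    return np.eye(num_classes, dtype='float32')[labels]

# The XOR targets from the cell above, as a column of floats
y_xor = np.array([[0], [1], [1], [0]], dtype='float32')
encoded = one_hot(y_xor, 2)
print(encoded)   # class 0 -> [1, 0], class 1 -> [0, 1]
```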

In [None]:
# Initial Setup for Keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.optimizers import SGD

In [None]:
# Building the model
xor = Sequential()
xor.add(Dense(8, input_dim=2))
xor.add(Activation("sigmoid"))
xor.add(Dense(2))
xor.add(Activation("sigmoid"))

xor.compile(loss="categorical_crossentropy", optimizer="adam", metrics = ['accuracy'])

In [None]:
# print the model architecture
xor.summary()
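The parameter counts that `summary()` reports for `Dense` layers can be checked by hand: each unit has one weight per input plus one bias. A small sketch of that arithmetic for the XOR model above (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

hidden = dense_params(2, 8)   # Dense(8, input_dim=2)
output = dense_params(8, 2)   # Dense(2)
total = hidden + output       # Activation layers add no parameters
print(hidden, output, total)  # 24 18 42
```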

In [None]:
# Fitting the model
history = xor.fit(X, y, epochs=50, verbose=0)

# Scoring the model
score = xor.evaluate(X, y)
print("\nAccuracy: ", score[-1])

# Checking the predictions (predict returns the output-layer activations;
# predict_proba has been removed from recent Keras versions)
print("\nPredictions:")
print(xor.predict(X))

In [None]:
score

With only 50 epochs, the model may classify only some of the 4 input points correctly (for example, 3 of them). Let's change some parameters to improve this; for example, you can increase the number of epochs. Can you reach 100%?
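To see what the accuracy number means, here is a sketch of argmax-based accuracy for two-column outputs: take the most likely class in each row and compare it with the class encoded in the one-hot targets. The probabilities below are made up for illustration, not real model output.

```python
import numpy as np

# Made-up predicted probabilities for the 4 XOR inputs
# (columns = [P(class 0), P(class 1)]); NOT actual model output
probs = np.array([[0.7, 0.3],
                  [0.4, 0.6],
                  [0.6, 0.4],
                  [0.8, 0.2]])
one_hot_targets = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

predicted = np.argmax(probs, axis=1)          # most likely class per input
actual = np.argmax(one_hot_targets, axis=1)   # true class per input
accuracy = np.mean(predicted == actual)
print(predicted, actual, accuracy)  # [0 1 0 0] [0 1 1 0] 0.75
```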

---
## Mini Project_01: Student Admissions
Build a neural network that analyzes a dataset of student admissions at UCLA.

We predict student admissions to graduate school at UCLA based on three pieces of data:
- GRE Scores (Test)
- GPA Scores (Grades)
- Class rank (1-4)

The dataset originally came from here: http://www.ats.ucla.edu/

## Loading the data
To load the data and format it nicely, we will use two very useful packages, Pandas and NumPy. You can read the documentation here:
- https://pandas.pydata.org/pandas-docs/stable/
- https://docs.scipy.org/

In [None]:
# Importing pandas and numpy
import pandas as pd
import numpy as np

# Reading the csv file into a pandas DataFrame
data = pd.read_csv('student_data.csv')

# Printing out the first 10 rows of our data
data[:10]
# Here we can see that the first column is the label y, which corresponds to acceptance/rejection. Namely, a label of 1 means 
#the student got accepted, and a label of 0 means the student got rejected.

In [None]:
# First let's make a plot of our data to see how it looks. In order to have a 2D plot, let's ignore the rank.
import matplotlib.pyplot as plt
%matplotlib inline

# Function to help us plot
def plot_points(data):
    X = np.array(data[["gre","gpa"]])
    y = np.array(data["admit"])
    admitted = X[y == 1]   # rows of admitted students, shape (n, 2)
    rejected = X[y == 0]   # rows of rejected students, shape (m, 2)
    plt.scatter(rejected[:, 0], rejected[:, 1], s = 25, color = 'red', edgecolor = 'k')
    plt.scatter(admitted[:, 0], admitted[:, 1], s = 25, color = 'cyan', edgecolor = 'k')
    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')
    
# Plotting the points
plot_points(data)
plt.show()
# the data is not as nicely separable as we'd hope...

In [None]:
# Roughly, it looks like the students with high grades and test scores were admitted, while the ones with low scores weren't,
# but the data is not as nicely separable as we hoped. Maybe it would help to take the rank into account? Let's make
# 4 plots, one for each rank.

# Separating the ranks
data_rank1 = data[data["rank"]==1]
data_rank2 = data[data["rank"]==2]
data_rank3 = data[data["rank"]==3]
data_rank4 = data[data["rank"]==4]

# Plotting the graphs: one subplot per rank
fig = plt.figure(figsize = (11,8))

for i, rank_data in enumerate([data_rank1,data_rank2,data_rank3,data_rank4]):
    ax = fig.add_subplot(2, 2, i+1)
    plot_points(rank_data)
    ax.set_title('Rank-%s'%(i+1), fontsize = 14)
    
fig.tight_layout()
plt.show()


# This looks more promising: it seems that the lower the rank number, the higher the acceptance rate. Let's use the rank
# as one of our inputs. In order to do this, we should one-hot encode it.
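One way to check the "lower rank number, higher acceptance rate" impression numerically is to compute the acceptance rate per rank with a pandas `groupby`. The sketch below runs on a tiny made-up DataFrame (`sample` is hypothetical, not the real UCLA data); on the real data the same expression would be applied to `data`.

```python
import pandas as pd

# Tiny made-up sample (NOT the real UCLA data) with the same column names
sample = pd.DataFrame({
    'admit': [1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
    'rank':  [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
})

# Fraction of admitted students within each rank
acceptance_rate = sample.groupby('rank')['admit'].mean()
print(acceptance_rate)
```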

## TODO: One-hot encoding the rank
It seems like the better the student's grades and test scores, the more likely they are to be accepted, and the rank also plays a role. So we'll one-hot encode the rank, and our 6 input variables will be:
- Test (GRE)
- Grades (GPA)
- Rank 1
- Rank 2
- Rank 3
- Rank 4

The last 4 inputs are binary variables that have a value of 1 if the student has that rank, or 0 otherwise.

In [None]:
# Use the get_dummies function in pandas in order to one-hot encode the data.

# Make dummy variables for rank. Passing columns=['rank'] replaces the original
# rank column with rank_1 ... rank_4, so no separate drop step is needed.
one_hot_data = pd.get_dummies(data, columns=['rank'])

# Print the first 5 rows of our data
one_hot_data[:5]
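To see exactly which columns `pd.get_dummies(..., columns=['rank'])` produces, here is a sketch on a tiny made-up DataFrame with the same column names. Passing `dtype=int` is an assumption for readability; the cell above uses the default dtype, which is bool in pandas 2.x (uint8 in earlier versions).

```python
import pandas as pd

# Made-up rows (NOT from student_data.csv) with the same columns
sample = pd.DataFrame({
    'admit': [0, 1, 1, 0],
    'gre':   [520, 660, 800, 400],
    'gpa':   [2.9, 3.67, 4.0, 3.1],
    'rank':  [3, 2, 1, 4],
})

# 'rank' is replaced by one indicator column per rank value
encoded = pd.get_dummies(sample, columns=['rank'], dtype=int)
print(encoded)
```

Each row has exactly one `1` among the `rank_*` columns. Note that a rank value missing from the data would simply get no column.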

## TODO: Scaling the data
The next step is to scale the data. Notice that the **grades** range from 1.0 to 4.0, whereas the **test scores** range roughly from 200 to 800, which is much larger. Features on such different scales make it harder for a neural network to train well. A common remedy is to normalize the features so **they lie between 0 and 1**. Let's fit our two features into a range of 0-1 by dividing the grades by 4.0 and the test scores by 800.
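A quick sketch of the arithmetic on made-up values: dividing by the maximum maps each feature into (0, 1]. For comparison, it also shows min-max scaling, which assumes the 200-800 GRE range stated above and maps that range exactly onto [0, 1]; the notebook itself only divides by the maximum.

```python
import numpy as np

gre = np.array([200.0, 520.0, 800.0])   # made-up test scores
gpa = np.array([1.0, 3.0, 4.0])         # made-up grades

# Scaling used in this notebook: divide by the maximum possible value
gre_scaled = gre / 800   # 0.25, 0.65, 1.0
gpa_scaled = gpa / 4     # 0.25, 0.75, 1.0

# Alternative (not used here): min-max scaling over an assumed 200-800 range
gre_minmax = (gre - 200) / (800 - 200)   # 0.0, ~0.533, 1.0
print(gre_scaled, gpa_scaled, gre_minmax)
```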

In [None]:
# Making a copy of our data (so scaling doesn't modify one_hot_data)
processed_data = one_hot_data.copy()

# Scale the columns to the 0-1 range
processed_data['gre'] = processed_data['gre']/800
processed_data['gpa'] = processed_data['gpa']/4

# Printing the first 5 rows of our processed data
processed_data[:5]

## Splitting the data into Training and Testing
In order to test our algorithm, we'll split the data into a Training and a Testing set. The size of the testing set will be 10% of the total data.
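The split idea can be sketched on a stand-in index: sample 90% of the row labels without replacement for training and keep the rest for testing, so the two sets never overlap. This sketch assumes 100 rows and uses NumPy's `Generator` API (`np.random.default_rng`) instead of the legacy `np.random.choice` used in the cell below; the idea is the same.

```python
import numpy as np

rng = np.random.default_rng(42)
index = np.arange(100)   # stand-in for processed_data.index (100 rows assumed)

# 90% of the labels for training, sampled without replacement
train_idx = rng.choice(index, size=int(len(index) * 0.9), replace=False)
# Everything not sampled for training goes to testing
test_idx = np.setdiff1d(index, train_idx)
print(len(train_idx), len(test_idx))   # 90 10
```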

In [None]:
sample_90 = np.random.choice(processed_data.index, size=int(len(processed_data)*0.9), replace=False)
# sample_90 holds index labels, so select with .loc (drop also works on labels)
train_data, test_data = processed_data.loc[sample_90], processed_data.drop(sample_90)

print("Number of training samples is", len(train_data))
print("Number of testing samples is", len(test_data))
print(train_data[:5])
print(test_data[:5])

## Splitting the data into features and targets (labels)
As a final step before training, we split the data into features (X) and targets (y). In Keras, we also need to one-hot encode the output so it appears as two classes (accepted and not accepted); we'll do this with the `to_categorical` function.
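One practical pitfall when turning the DataFrame into arrays: in pandas 2.x, `get_dummies` returns boolean columns by default, and converting a DataFrame that mixes float and boolean columns with `np.array` gives an `object` array, which Keras cannot train on. A minimal sketch on a made-up frame, showing an explicit numeric conversion with `DataFrame.to_numpy(dtype=...)`:

```python
import numpy as np
import pandas as pd

# Made-up frame mixing float columns with boolean dummy columns
df = pd.DataFrame({'gre': [0.8, 0.5], 'gpa': [0.9, 0.7],
                   'rank_1': [True, False], 'rank_2': [False, True]})

as_object = np.array(df)                 # mixed float/bool -> object dtype
as_float = df.to_numpy(dtype='float32')  # explicit numeric array (True -> 1.0)
print(as_object.dtype, as_float.dtype)   # object float32
```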

In [None]:
# Separate features and labels, and one-hot encode the output.
# Note: we convert the features to float32 NumPy arrays in order to train the model in Keras
# (get_dummies may create boolean columns, which would otherwise give an object-dtype array).
train_X = train_data.drop('admit', axis=1).to_numpy(dtype='float32')
train_y = np_utils.to_categorical(train_data['admit'], 2)

test_X = test_data.drop('admit', axis=1).to_numpy(dtype='float32')
test_y = np_utils.to_categorical(test_data['admit'], 2)

print(test_X[:5])
print(test_y[:5])

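## Building the 2-layer Neural Network
The following cell builds the neural network in Keras: two hidden ReLU layers (128 and 64 units) with dropout, followed by a 2-unit output layer, compiled with categorical cross-entropy loss and the Adam optimizer.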

In [None]:

'''
model = Sequential()
model.add(Dense(128, input_dim=6))
model.add(Activation('sigmoid'))
model.add(Dense(32))
model.add(Activation('sigmoid'))
model.add(Dense(2))
model.add(Activation('sigmoid'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
'''

# Building the model
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(6,)))
model.add(Dropout(.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(.1))
model.add(Dense(2, activation='softmax'))  # softmax makes the 2 outputs a probability distribution, the usual pairing with categorical_crossentropy

# Compiling the model
model.compile(loss = 'categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
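Why softmax rather than sigmoid for a 2-unit output trained with `categorical_crossentropy`? Sigmoid squashes each unit independently, so the two outputs need not sum to 1; softmax turns them into a probability distribution over the two classes. A NumPy sketch on made-up pre-activation values (the helper names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0])   # made-up pre-activations of the 2 output units
p_sigmoid = sigmoid(logits)     # each unit squashed on its own
p_softmax = softmax(logits)     # a distribution over the 2 classes
print(p_sigmoid, p_sigmoid.sum())
print(p_softmax, p_softmax.sum())
```

With two classes, softmax of `[a, b]` gives `sigmoid(a - b)` for the first class, so a single sigmoid unit with binary cross-entropy is an equivalent alternative design.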

## Training and Scoring the model

In [None]:
# Training the model
model.fit(train_X, train_y, epochs=200, batch_size=100, verbose=0)

# Evaluating the model on the training and testing set
score = model.evaluate(train_X, train_y)
print("\n Training Accuracy:", score[1])
print("\nPredictions:")
print("\nProb:", model.predict(train_X)[:8])

score = model.evaluate(test_X, test_y)
print("\n Testing Accuracy:", score[1])
print("\nPredictions:")
print("\nProb:", model.predict(test_X)[:8])

## Challenge: Play with the parameters!
You can see that we made several decisions in our training. For instance, the number of layers, the sizes of the layers, the number of epochs, etc.
It's your turn to play with parameters! Can you improve the accuracy? The following are other suggestions for these parameters. We'll learn the definitions later in the class:
- Activation function: relu and sigmoid
- Loss function: categorical_crossentropy, mean_squared_error
- Optimizer: rmsprop, adam, adagrad, adadelta
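To get a feel for the two suggested loss functions, here is a sketch computing both by hand for one sample with made-up predictions. Cross-entropy grows without bound as the predicted probability of the true class goes to 0, while mean squared error stays below 1, so cross-entropy penalizes confident mistakes far more strongly.

```python
import numpy as np

y_true = np.array([0.0, 1.0])      # one-hot target: class 1

y_pred = np.array([0.2, 0.8])      # made-up, mostly right
cce = -np.sum(y_true * np.log(y_pred))    # categorical cross-entropy = -ln(0.8) ~ 0.223
mse = np.mean((y_true - y_pred) ** 2)     # mean squared error = 0.04

y_bad = np.array([0.99, 0.01])     # made-up, confidently wrong
cce_bad = -np.sum(y_true * np.log(y_bad)) # -ln(0.01) ~ 4.605
mse_bad = np.mean((y_true - y_bad) ** 2)  # 0.9801
print(cce, mse, cce_bad, mse_bad)
```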

# Without Keras
## TODO: Backpropagate the error
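Now it's your turn to shine. Write the error term. For a single output unit with $h = w \cdot x$ and $\hat{y} = \sigma(h)$, it is given by the equation $$ (y-\hat{y})\,\sigma'(h) = (y-\hat{y})\,\hat{y}\,(1-\hat{y}) $$ This is the negative gradient of the squared error $\tfrac{1}{2}(y-\hat{y})^2$ with respect to $h$, which is why the training loop adds `learnrate * del_w` to the weights.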
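A quick numerical check of the sign, assuming the squared error $E = \tfrac{1}{2}(y-\hat{y})^2$ for one record: the quantity $(y-\hat{y})\,\sigma'(h)\,x$ should equal $-\partial E / \partial w$, so adding it to the weights moves downhill. The sketch compares it with a central finite-difference estimate; the inputs, weights, and helper names are made up for illustration.

```python
import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

def squared_error(w, x, y):
    # E(w) = 1/2 * (y - sigmoid(w . x))^2 for a single record
    return 0.5 * (y - sigmoid(np.dot(x, w))) ** 2

def error_term(y, output):
    # (y - y_hat) * sigma'(h), using sigma'(h) = y_hat * (1 - y_hat)
    return (y - output) * output * (1 - output)

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # made-up input record
w = rng.normal(size=3)   # made-up weights
y = 1.0                  # target

output = sigmoid(np.dot(x, w))
step = error_term(y, output) * x   # the quantity train_nn accumulates into del_w

# Central finite differences: numeric estimate of dE/dw, one coordinate at a time
eps = 1e-6
numeric_grad = np.array([
    (squared_error(w + eps * e, x, y) - squared_error(w - eps * e, x, y)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(step, -numeric_grad))   # True: the error term is the negative gradient
```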

In [None]:
# Activation (sigmoid) function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_prime(x):
    return sigmoid(x) * (1-sigmoid(x))
def error_formula(y, output):
    return - y*np.log(output) - (1 - y) * np.log(1-output)


# Error term: (y - output) * sigma'(h), where sigma'(h) = output * (1 - output)
def error_term_formula(y, output):
    return (y - output) * output * (1 - output)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

# Training function
def train_nn(features, targets, epochs, learnrate):
    
    # Use the same seed to make debugging easier
    np.random.seed(42)

    n_records, n_features = features.shape
    last_loss = None

    # Initialize weights
    weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

    for e in range(epochs):
        del_w = np.zeros(weights.shape)
        for x, y in zip(features.values, targets):
            # Loop through all records, x is the input, y is the target

            # Activation of the output unit
            #   Notice we multiply the inputs and the weights here 
            #   rather than storing h as a separate variable 
            output = sigmoid(np.dot(x, weights))

            # The error (cross-entropy) of the network output
            error = error_formula(y, output)

            # The error term
            #   Notice we calculate f'(h) from the output instead of calling
            #   sigmoid_prime. This just makes it faster because we
            #   can re-use the result of the sigmoid function stored in
            #   the output variable
            error_term = error_term_formula(y, output)

            # The gradient descent step, the error times the gradient times the inputs
            del_w += error_term * x

        # Update the weights here. The learning rate times the 
        # change in weights, divided by the number of records to average
        weights += learnrate * del_w / n_records

        # Printing out the mean square error on the training set
        if e % (epochs / 10) == 0:
            out = sigmoid(np.dot(features, weights))
            loss = np.mean((out - targets) ** 2)
            print("Epoch:", e)
            if last_loss and last_loss < loss:
                print("Train loss: ", loss, "  WARNING - Loss Increasing")
            else:
                print("Train loss: ", loss)
            last_loss = loss
            print("=========")
    print("Finished training!")
    return weights
    
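# Features and targets from the train/test split above (no one-hot encoding of the output here).
# Casting to float keeps the NumPy math working if get_dummies created boolean columns.
features = train_data.drop('admit', axis=1).astype(float)
targets = train_data['admit']
features_test = test_data.drop('admit', axis=1).astype(float)
targets_test = test_data['admit']

weights = train_nn(features, targets, epochs, learnrate)

# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))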

---
## Mini Project_02. Analyzing IMDB Data in Keras
We will analyze a dataset from IMDB and use it to predict the sentiment of a review.

In [None]:
# Imports
import numpy as np
import keras
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.preprocessing.text import Tokenizer
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(42)

## 1. Loading the data
This lab uses the Keras IMDB dataset: 25,000 training reviews and another 25,000 for testing. Each review comes with a label. A label of 0 is given to a negative review, and a label of 1 is given to a positive review. The goal of this lab is to create a model that will predict the sentiment of a review, **based on the words** in it.

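Now, the input already comes preprocessed for convenience. Each review is encoded as a sequence of indexes, corresponding to the words in the review. **The words are ordered by frequency**, so in the word index the integer 1 corresponds to the most frequent word ("the"), the integer 2 to the second most frequent word, etc. When loading the data, Keras shifts these indices by `index_from=3` (the default) and reserves the small integers: 0 for padding, 1 for the start-of-sequence marker (`start_char`) and 2 for unknown or out-of-vocabulary words (`oov_char`). So in the loaded reviews, "the" appears as 4.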

Then, the sentence is turned into a vector by simply concatenating these integers. For instance, if the sentence is "To be or not to be." and the indices of the words are as follows:
- "to": 5
- "be": 8
- "or": 21
- "not": 3

Then the sentence gets encoded as the vector [5,8,21,3,5,8].
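The encoding step above can be sketched in a few lines: lowercase the sentence, split it into words, and look each word up in the dictionary. The dictionary is the toy one from the example, not the real IMDB word index, and `encode` is a hypothetical helper; words missing from this toy vocabulary are simply skipped.

```python
import re

# Toy word indices from the example above (not the real IMDB index)
word_index = {'to': 5, 'be': 8, 'or': 21, 'not': 3}

def encode(sentence, word_index):
    # Lowercase, keep runs of letters/apostrophes, then look up each word;
    # words missing from this toy vocabulary are skipped
    words = re.findall(r"[a-z']+", sentence.lower())
    return [word_index[w] for w in words if w in word_index]

print(encode("To be or not to be.", word_index))   # [5, 8, 21, 3, 5, 8]
```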

This dataset comes preloaded with Keras, so one simple command gets us the training and testing data. There is a parameter for how many words we want to look at. We've set it at 1000, but feel free to experiment.
- **num_words**: Number of most frequent words to consider. This is useful if you don't want to consider very obscure words such as "Ultracrepidarian".
- **skip_top**: Number of most frequent words to ignore. This is useful if you don't want to consider the most common words. For example, the word "the" adds little information to a review, so we can skip it with skip_top.

In [None]:
#(x_train, y_train), (x_test, y_test) = imdb.load_data(path="imdb.npz",
#                                                     num_words=None,
#                                                     skip_top=0,
#                                                     maxlen=None,
#                                                     seed=113,
#                                                     start_char=1,
#                                                     oov_char=2,
#                                                     index_from=3)

# Loading the data (it's preloaded in Keras)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=1000)

print(x_train.shape)
print(x_test.shape)

## 2. Examining the data
Notice that the data has already been preprocessed: every word is replaced by its index, and each review comes in as a list of the indices of the words it contains, in order. For example, if the word 'the' has index 4, each occurrence of 'the' in a review shows up as a 4 in that review's list.

The output comes as a vector of 1's and 0's, where 1 is a positive sentiment for the review, and 0 is negative.

In [None]:
print(x_test[:2])
y_test[0:2]

In [None]:
print(x_train[0])
print(y_train[0])

## 3. One-hot encoding the input / output
Here, we'll turn the **input** vectors into (0,1)-vectors of length 1000. For example, if the preprocessed vector contains the number 14, then in the processed vector the entry at index 14 will be 1. An entry is 1 whether the word appears once or many times in the review.
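A minimal NumPy sketch of this binary ("multi-hot") encoding, written to mirror what `Tokenizer.sequences_to_matrix(..., mode='binary')` does as far as I understand it: indices at or above `num_words` are ignored and repeats still give a single 1. `multi_hot` and the sample reviews are hypothetical, for illustration only.

```python
import numpy as np

def multi_hot(sequences, num_words):
    """Binary bag-of-words: row i has a 1 at every index that appears in sequences[i]."""
    matrix = np.zeros((len(sequences), num_words), dtype='float32')
    for i, seq in enumerate(sequences):
        kept = [j for j in seq if j < num_words]   # indices >= num_words are ignored
        matrix[i, kept] = 1.0
    return matrix

reviews = [[1, 14, 22, 14], [1, 5, 999, 7]]   # made-up encoded reviews
x = multi_hot(reviews, num_words=20)
print(np.flatnonzero(x[0]), np.flatnonzero(x[1]))   # [ 1 14] [1 5 7]
```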

In [None]:
# Multi-hot encoding the input reviews into binary vectors, each of length 1000
r'''
keras.preprocessing.text.Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
                                   lower=True, split=' ', char_level=False, oov_token=None)

Tokenizer turns each text into either a sequence of integers (each integer being the index of a token in a dictionary) or into a
vector where the coefficient for each token could be binary, based on word count, based on tf-idf, ...

- num_words: the maximum number of words to keep, based on word frequency. Only the most common num_words words will be kept.
- filters: a string where each element is a character that will be filtered from the texts. The default is all punctuation, 
plus tabs and line breaks, minus the ' character.
- lower: boolean. Whether to convert the texts to lowercase.
- split: str. Separator for word splitting.
- char_level: if True, every character will be treated as a token.
- oov_token: if given, it will be added to word_index and used to replace out-of-vocabulary words during texts_to_sequences calls

By default, all punctuation is removed, turning the texts into space-separated sequences of words (words may include the 
' character). These sequences are then split into lists of tokens. They will then be indexed or vectorized.

0 is a reserved index that won't be assigned to any word.
'''
tokenizer = Tokenizer(num_words=1000)

x_train = tokenizer.sequences_to_matrix(x_train, mode='binary')
x_test = tokenizer.sequences_to_matrix(x_test, mode='binary')
print(x_train[0])
print(x_test[0])

And we'll also one-hot encode the **output**.

In [None]:
# One-hot encoding the output
num_classes = 2

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print(y_train.shape)
print(y_test.shape)

## 4. Building the  model architecture
Build a model here using Sequential. Feel free to experiment with different layers and sizes! Also, experiment with adding dropout to reduce overfitting.

In [None]:
# TODO: Build the model architecture
model = Sequential()
model.add(Dense(512, activation='relu', input_dim=1000))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))  # softmax: the 2 outputs form a probability distribution for categorical_crossentropy
model.summary()

# TODO: Compile the model using a loss function and an optimizer.
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

## 5. Training the model
Run the model here. Experiment with different batch_size, and number of epochs!

In [None]:
# TODO: Run the model. Feel free to experiment with different batch sizes and number of epochs.
hist = model.fit(x_train, y_train,
          batch_size=32,
          epochs=10,
          validation_data=(x_test, y_test), 
          verbose=2)

## 6. Evaluating the model
This will give you the accuracy of the model, as evaluated on the testing set. Can you get something over 85%?

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Accuracy: ", score[1])