<a href="https://colab.research.google.com/github/sudosadia/demo-repo2/blob/master/Keras_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Keras Tutorial Project**

In this project, we create a tutorial for learning Keras. We use the Poker Hand dataset, which is available at this link: http://archive.ics.uci.edu 
The tutorial includes the following topics: 

1. Loading Data
2. Creating Model using Keras
3. Compiling Model
4. Training Model
5. Evaluating Model
6. Creating Complex Keras Model
7. Tuning Parameters


 **Data Description:** 
 
The Poker Hand database consists of 1,025,010 instances of poker hands. Each instance is a hand of five cards drawn from a standard deck of 52 cards. Each card is described by two attributes (suit and rank), for a total of 10 features. One class attribute describes the poker hand. The order of the cards matters, which is why there are 480 possible Royal Flush hands rather than 4 (one for each suit): the five cards of each suit can be arranged in 5! = 120 orders, and 4 × 120 = 480. The attributes are:

* S1 - Suit of card 1: Ordinal (1-4) representing: Hearts=1, Spades=2, Diamonds=3, Clubs=4
* C1 - Rank of card 1: Numerical (1-13) representing: Ace=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack=11, Queen=12, King=13
* S2 - Suit of card 2: Ordinal (1-4) representing: Hearts=1, Spades=2, Diamonds=3, Clubs=4
* C2 - Rank of card 2: Numerical (1-13) representing: Ace=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack=11, Queen=12, King=13
* S3 - Suit of card 3: Ordinal (1-4) representing: Hearts=1, Spades=2, Diamonds=3, Clubs=4
* C3 - Rank of card 3: Numerical (1-13) representing: Ace=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack=11, Queen=12, King=13
* S4 - Suit of card 4: Ordinal (1-4) representing: Hearts=1, Spades=2, Diamonds=3, Clubs=4
* C4 - Rank of card 4: Numerical (1-13) representing: Ace=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack=11, Queen=12, King=13
* S5 - Suit of card 5: Ordinal (1-4) representing: Hearts=1, Spades=2, Diamonds=3, Clubs=4
* C5 - Rank of card 5: Numerical (1-13) representing: Ace=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack=11, Queen=12, King=13

CLASS Poker Hand:  Ordinal (0-9)
* 0 - Nothing in hand; not a recognized poker hand 
* 1 - One pair; one pair of equal ranks within five cards
* 2 - Two pairs; two pairs of equal ranks within five cards
* 3 - Three of a kind; three equal ranks within five cards
* 4 - Straight; five cards, sequentially ranked with no gaps
* 5 - Flush; five cards with the same suit
* 6 - Full house; pair + different rank three of a kind
* 7 - Four of a kind; four equal ranks within five cards
* 8 - Straight flush; straight + flush
* 9 - Royal flush; {Ace, King, Queen, Jack, Ten} + flush
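
The class definitions above can be sketched as a small rule-based labeler. This is only an illustration and not part of the original notebook: the name `classify_hand` is hypothetical, and treating the ace as both low (A-2-3-4-5) and high (10-J-Q-K-A) in a straight is an assumption. The last lines also check the count of 480 ordered Royal Flush hands.

```python
from collections import Counter
from math import factorial


def classify_hand(row):
    """Return the class (0-9) for a row [S1, C1, S2, C2, ..., S5, C5]."""
    suits = row[0::2]
    ranks = sorted(row[1::2])
    # Sizes of the groups of equal ranks, largest first, e.g. [3, 2] for a full house
    counts = sorted(Counter(ranks).values(), reverse=True)
    is_flush = len(set(suits)) == 1
    is_royal_ranks = ranks == [1, 10, 11, 12, 13]
    # Assumption: the ace may close a straight at either end
    is_straight = len(set(ranks)) == 5 and (ranks[4] - ranks[0] == 4 or is_royal_ranks)

    if is_flush and is_royal_ranks:
        return 9
    if is_flush and is_straight:
        return 8
    if counts == [4, 1]:
        return 7
    if counts == [3, 2]:
        return 6
    if is_flush:
        return 5
    if is_straight:
        return 4
    if counts == [3, 1, 1]:
        return 3
    if counts == [2, 2, 1]:
        return 2
    if counts == [2, 1, 1, 1]:
        return 1
    return 0


# Each of the 4 suits has one royal flush, and its 5 cards can appear in 5! orders
ORDERED_ROYAL_FLUSHES = 4 * factorial(5)

print(classify_hand([1, 10, 1, 11, 1, 13, 1, 12, 1, 1]))  # 9
print(ORDERED_ROYAL_FLUSHES)  # 480
```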

 

**How to load data:** 

There are multiple ways to load data. We use the pandas library. First, we retrieve the data files from the URL using the urlretrieve() function and save them as CSV files. Then we read the CSV files with pandas and add column headers, which name the suit and rank of each card and the poker hand. There are 11 columns in total. The head() method shows the first rows of the resulting table.

The next step is to prepare the training and testing data. The input features do not include the last column, which holds the poker hand, so we drop that column to construct trainX and testX. trainY and testY hold the output, which consists only of the last column (the poker hand). We convert the output to one-hot vectors because this is a multiclass classification problem. 

In [None]:
# load libraries
from pandas import Series, DataFrame
import pandas as pd
import urllib.request

# read poker training and test data from the url and save the file to current directory
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand-training-true.data", "poker_train.csv")
urllib.request.urlretrieve("http://archive.ics.uci.edu/ml/machine-learning-databases/poker/poker-hand-testing.data", "poker_test.csv")

# read the data in and add column names
poker_train = pd.read_csv("poker_train.csv", header=None, names=['S1', 'C1', 'S2', 'C2', 'S3', 'C3','S4', 'C4', 'S5', 'C5', 'hand'])
poker_test = pd.read_csv("poker_test.csv", header=None, names=['S1', 'C1', 'S2', 'C2', 'S3', 'C3','S4', 'C4', 'S5', 'C5', 'hand'])

#Explore the data
poker_train.head()

Unnamed: 0,S1,C1,S2,C2,S3,C3,S4,C4,S5,C5,hand
0,1,10,1,11,1,13,1,12,1,1,9
1,2,11,2,13,2,10,2,12,2,1,9
2,3,12,3,11,3,13,3,10,3,1,9
3,4,10,4,11,4,1,4,13,4,12,9
4,4,1,4,13,4,12,4,11,4,10,9


In [None]:
#Separating features and output from train and test data
trainY=poker_train['hand']
testY=poker_test['hand']

#Create one hot vector of poker hands 
trainY=pd.get_dummies(trainY)
testY=pd.get_dummies(testY)
trainX = poker_train.drop(['hand'],axis=1)
testX = poker_test.drop(['hand'],axis=1)

#Show the column and row numbers train and test data
print('Shape of Training Set:',trainX.shape)
print('Shape of Testing Set:',testX.shape)

trainX.head()

Shape of Training Set: (25010, 10)
Shape of Testing Set: (1000000, 10)


Unnamed: 0,S1,C1,S2,C2,S3,C3,S4,C4,S5,C5
0,1,10,1,11,1,13,1,12,1,1
1,2,11,2,13,2,10,2,12,2,1
2,3,12,3,11,3,13,3,10,3,1
3,4,10,4,11,4,1,4,13,4,12
4,4,1,4,13,4,12,4,11,4,10


In [None]:
trainY.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,1
3,0,0,0,0,0,0,0,0,0,1
4,0,0,0,0,0,0,0,0,0,1
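
The one-hot rows above can be reproduced by hand with a few lines of plain Python. The helper below is a hypothetical sketch, not part of the notebook. It always produces a vector of fixed width, whereas `pd.get_dummies` creates one column per distinct label that actually occurs in the data, so the two agree only when every class from 0 to 9 is present.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a fixed-width one-hot list."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range 0..num_classes-1")
    vector = [0] * num_classes
    vector[label] = 1
    return vector


print(one_hot(9))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```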


**2. CREATING MODEL**

We will create a simple neural network model with Keras. A neural network is inspired by a biological neural network but here the connections between neurons are modeled by weights. Neural networks are organized into layers of nodes. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data.

**2.1 A Simple Neural Network Model:**

We will create a simple neural network with two hidden layers of 15 neurons each. First we create a sequential model, which means that the output of each layer is passed as input to the next layer. Layers are added with the model.add() method; adding layers is like stacking Lego blocks one by one. We use Dense layers, which are fully connected layers. We need to specify the input dimension for the first layer, the number of neurons in each hidden layer, and the output dimension in the output layer. Our simple neural network takes 10 input features, has 15 neurons in each of the 1st and 2nd hidden layers, and has 10 neurons in the output layer, one for each poker hand. The illustration is shown below:

We also need to specify the activation function. The hidden layers use ReLU (rectified linear unit). The output layer uses a different activation function, softmax, because this is a multiclass classification problem. 
Later we will show a more complex neural network with more layers and neurons, and how that affects the accuracy of the model.
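
To make the two activation functions concrete, here is a minimal plain-Python sketch of what they compute on a list of numbers. The function names are chosen for illustration; Keras applies its own implementations when we pass `activation='relu'` or `activation='softmax'`.

```python
import math


def relu(values):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.5, 3.0]))  # [0.0, 0.5, 3.0]
probabilities = softmax([1.0, 2.0, 3.0])
print(sum(probabilities))  # approximately 1.0; the largest score gets the largest probability
```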


In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

#Model 1 with two hidden layers, 15 neurons in each hidden layer
poker_model1 = Sequential()
poker_model1.add(Dense(15,input_shape=(10,), activation='relu')) #1st hidden layer; input_shape declares the 10 input features
poker_model1.add(Dense(15, activation='relu')) #2nd hidden layer
poker_model1.add(Dense(10, activation='softmax')) #output layer, one neuron per poker hand class

**3. COMPILING MODEL**

Now we will compile our model using the model.compile() method of Keras. We have to specify the loss function, optimizer, and metrics to be used. We use the categorical_crossentropy loss since we have multiple classes (10 poker hand classes) encoded as one-hot vectors. 
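
As a rough illustration of the loss named above, categorical cross-entropy for a single sample is the negative log of the probability that the model assigns to the true class. The function below is a simplified sketch for one sample; the name and the clipping constant `eps` are assumptions, and Keras averages the loss over all samples in a batch.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip probabilities away from 0 so that log() is always defined
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 0, 1]
print(categorical_crossentropy(y_true, [0.1, 0.1, 0.8]))  # small loss: confident and correct
print(categorical_crossentropy(y_true, [0.8, 0.1, 0.1]))  # larger loss: confident and wrong
```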

The optimizer that we use is Adam. Optimizers are methods that update the attributes of a neural network, such as its weights and learning rate, in order to reduce the loss. Adam is an adaptive learning rate method. 
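
The idea of an adaptive learning rate can be sketched on a single parameter. The code below follows the update rule from the Adam paper cited in this section, applied to a toy one-dimensional function; the function name, the toy objective, and the chosen learning rate are assumptions for illustration, and Keras uses its own implementation and default values.

```python
import math


def adam_minimize(grad_fn, x0, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled per parameter by the recent gradient magnitude
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # should be close to 3
```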

Finally, we use accuracy as the metric to judge the performance of our model. The model.summary() method shows a summary of the whole model, including the output shape and parameter count of each layer. 

References: Diederik P. Kingma and Jimmy Lei Ba. Adam: A Method for Stochastic Optimization. 2014. arXiv:1412.6980v9


In [None]:
#Compile model
poker_model1.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy']) 
#Show model summary 
poker_model1.summary()

Model: "sequential_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_83 (Dense)             (None, 15)                165       
_________________________________________________________________
dense_84 (Dense)             (None, 15)                240       
_________________________________________________________________
dense_85 (Dense)             (None, 10)                160       
Total params: 565
Trainable params: 565
Non-trainable params: 0
_________________________________________________________________
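
The parameter counts in the summary above can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per neuron. The short sketch below redoes that arithmetic; the helper name `dense_params` is made up for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# 10 input features -> 15 neurons -> 15 neurons -> 10 output classes
layer_sizes = [10, 15, 15, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)       # [165, 240, 160]
print(sum(per_layer))  # 565
```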


**4. FITTING MODEL**

The model.fit() method trains the model on the input data and the output labels. We can also pass the number of epochs and the batch_size as arguments. The number of epochs is the number of complete passes over the training data. The training data is processed in batches of batch_size samples, and the weights are updated after each batch. If the shuffle argument is set to True, the training data is shuffled before each epoch. The verbose argument controls how much progress is reported: with verbose=0 nothing is shown, with verbose=1 a progress bar is shown for each epoch, and with verbose=2 one line is printed per epoch, including the loss and accuracy. 

Model performance depends on different parameters and hyperparameters, as we will see in section 6. 
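
To see how epochs, batches, and shuffling relate, the following plain-Python sketch splits sample indices into batches the way a training loop conceptually does. It is a simplified illustration with hypothetical helper names, not the code that Keras runs internally.

```python
import math
import random


def steps_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)


def iterate_batches(n_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


# The training set in this tutorial has 25010 rows
print(steps_per_epoch(25010, 100))  # 251
print(steps_per_epoch(25010, 50))   # 501

batches = list(iterate_batches(25010, 100))
print(len(batches[-1]))  # 10, because the last batch holds the leftover rows
```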

In [None]:
#Fit model and save history in h1
h1=poker_model1.fit(trainX, trainY, epochs = 50, batch_size = 100, verbose=0,shuffle=True)

**5. EVALUATING MODEL**

We can get the testing or training accuracy with the model.evaluate() method. This method takes the input data and the expected output as arguments. The batch_size argument only controls how many samples are processed at a time during evaluation. The evaluate method returns the loss and accuracy of the model. The loss measures how far the model's predictions are from the true labels (lower is better), while the accuracy is the fraction of hands that are classified correctly. Here, we see that our model achieves about 57.6% accuracy on the test set, with a loss of about 0.92.

In [None]:
#Calculate loss and accuracy of model 1
loss1,accuracy1= poker_model1.evaluate(testX,testY,batch_size=100)
print()
print("Model 1: two hidden layers, adam optimizer, 100 batch size and 50 epochs")
print("Testing loss=%f accuracy=%f" % (loss1,accuracy1))


Model 1: two hidden layers, adam optimizer, 100 batch size and 50 epochs
Testing loss=0.916785 accuracy=0.576310
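
The accuracy reported above is the fraction of samples whose most probable predicted class matches the true class. The sketch below computes it by hand on made-up toy data with three classes, and also reports accuracy per class, since a single overall number can hide weak results on rare hands. All names and values here are illustrative assumptions, not results from the poker models.

```python
from collections import defaultdict


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true, y_prob):
    """Fraction of samples where the top predicted class is the true class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def per_class_accuracy(y_true, y_prob):
    """Accuracy computed separately for each true class."""
    total, correct = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_prob):
        label = argmax(t)
        total[label] += 1
        correct[label] += int(argmax(p) == label)
    return {label: correct[label] / total[label] for label in sorted(total)}


# Toy one-hot labels and predicted probabilities (not real model output)
toy_true = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
toy_prob = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]

print(accuracy(toy_true, toy_prob))            # 0.75
print(per_class_accuracy(toy_true, toy_prob))  # {0: 1.0, 1: 0.0, 2: 1.0}
```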


**6. TUNING MODEL**

A model can be tuned in multiple ways. Here we will show how different parameters affect a neural network's performance. We will tune our model by changing the following parameters:
* Number of hidden layers and neurons
* Batch size
* Number of epochs
* Optimizer

We will compare each model's accuracy with the base model that we created earlier in section 2. 
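
The comparison plan above changes one setting at a time relative to the base model. A small sketch of that bookkeeping is shown here; the dictionary keys and the helper name are hypothetical and chosen only to mirror the settings used in this section.

```python
# Settings of the base model (model 1)
BASE_CONFIG = {"hidden_units": (15, 15), "batch_size": 100, "epochs": 50, "optimizer": "adam"}

# The single setting that each later model changes
CHANGES = {"hidden_units": (50, 50, 50), "batch_size": 50, "epochs": 500, "optimizer": "sgd"}


def one_factor_variants(base, changes):
    """Build one configuration per change, leaving every other setting at its base value."""
    variants = []
    for key, value in changes.items():
        config = dict(base)
        config[key] = value
        variants.append(config)
    return variants


for config in one_factor_variants(BASE_CONFIG, CHANGES):
    changed = [key for key in BASE_CONFIG if config[key] != BASE_CONFIG[key]]
    print(changed, config)
```

Keeping all other settings fixed makes it easier to attribute a change in accuracy to the one setting that differs.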


**6.1 Changing No. of Hidden Layers:**

We will create a new model with one more hidden layer and more neurons in each hidden layer. We call this model model2. It is sequential as before, takes 10 input features, and has 10 output neurons, one for each poker hand class; each of its three hidden layers has 50 neurons. The activation functions are the same as in our first model, and so are the batch size, optimizer, and number of epochs, because we want to compare with respect to the number of hidden layers and neurons only. 

We see that the model's accuracy changes drastically. Our base model, with two hidden layers of 15 neurons each, has about 57.6% accuracy, whereas the second model, with three hidden layers of 50 neurons each, has about 74% accuracy. 

In [None]:
#Model 2
poker_model2 = Sequential()
poker_model2.add(Dense(50,input_shape=(10,), activation='relu')) #1st hidden layer (takes the 10 input features)
poker_model2.add(Dense(50, activation='relu')) #2nd hidden layer
poker_model2.add(Dense(50, activation='relu')) #3rd hidden layer
poker_model2.add(Dense(10, activation='softmax')) #output layer
poker_model2.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy']) 
#Save model history in h2 
h2=poker_model2.fit(trainX, trainY, epochs = 50, batch_size = 100, verbose=0,shuffle=True)
#Evaluate model
loss2,accuracy2 = poker_model2.evaluate(testX,testY,batch_size=100) 
print()
print("Model 2: three hidden layers, adam optimizer, 100 batch size and 50 epochs")
print("Testing loss=%f accuracy=%f" % (loss2,accuracy2))


Model 2: three hidden layers, adam optimizer, 100 batch size and 50 epochs
Testing loss=0.640213 accuracy=0.739647


**6.2 Changing Batch size:**

Depending on the model and dataset, different parameters can have different impacts. Now we will see how batch size affects the model's performance. Remember that our base model (model 1) has about 57.6% accuracy with batch size 100. Here we use the same layers and parameters as the base model except for the batch size, which is now 50. So the training data is divided into smaller batches than before, and the weights are updated more often in each epoch. 

We see that the model's accuracy improves from about 57.6% to about 60.4%, and the loss drops from about 0.92 to about 0.88. 

In [None]:
#Model 3
poker_model3 = Sequential()
poker_model3.add(Dense(15,input_shape=(10,), activation='relu')) #1st hidden layer (takes the 10 input features)
poker_model3.add(Dense(15, activation='relu')) #2nd hidden layer
poker_model3.add(Dense(10, activation='softmax')) #output layer
poker_model3.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])  
h3=poker_model3.fit(trainX, trainY, epochs = 50, batch_size = 50, verbose=0,shuffle=True)
loss3,accuracy3 = poker_model3.evaluate(testX,testY,batch_size=50) 

print()
print("Model 3: two hidden layers, adam optimizer, 50 batch size and 50 epochs")
print("Testing loss=%f accuracy=%f" % (loss3,accuracy3))


Model 3: two hidden layers, adam optimizer, 50 batch size and 50 epochs
Testing loss=0.879772 accuracy=0.603581


**6.3 Changing Epochs:**

We create our 4th model, where the number of epochs is 500 and all other parameters are the same as in our first model. 

We see that the model's accuracy improves from about 57.6% to about 70.2%. 

In [None]:
#Model 4
poker_model4 = Sequential()
poker_model4.add(Dense(15,input_shape=(10,), activation='relu')) #1st hidden layer (takes the 10 input features)
poker_model4.add(Dense(15, activation='relu')) #2nd hidden layer
poker_model4.add(Dense(10, activation='softmax')) #output layer
poker_model4.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])  
h4=poker_model4.fit(trainX, trainY, epochs = 500, batch_size = 100, verbose=0,shuffle=True)
loss4,accuracy4 = poker_model4.evaluate(testX,testY,batch_size=100) 

print()
print("Model 4: two hidden layers, adam optimizer, 100 batch size and 500 epochs")
print("Testing loss=%f accuracy=%f" % (loss4,accuracy4))


Model 4: two hidden layers, adam optimizer, 100 batch size and 500 epochs
Testing loss=0.713568 accuracy=0.702335


**6.4 Changing Optimizer:**

We will use a different optimizer in our 5th model: the SGD (Stochastic Gradient Descent) optimizer. Gradient descent updates each parameter of a model in the direction that lowers the objective function, and keeps iterating until the objective function converges to a minimum. SGD is a variant of gradient descent that computes each update on a small batch of data instead of considering the whole dataset. 

We see that the model's accuracy drops from about 57.6% to about 52.4%, and the loss rises from about 0.92 to about 0.97. 

Reference: https://medium.com/syncedreview/iclr-2019-fast-as-adam-good-as-sgd-new-optimizer-has-both-78e37e8f9a34
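
The mini-batch idea behind SGD can be sketched on a much simpler problem than the poker network. The example below fits a straight line to noise-free toy data by repeatedly taking a small batch, computing the gradient of the mean squared error on that batch, and stepping against it. The function name, data, learning rate, and batch size are all assumptions chosen for illustration.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.1, epochs=500, batch_size=4, seed=0):
    """Fit y = w * x + b with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradient of the mean squared error over this batch only
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 3x + 1
xs = [i / 19 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(w, b)  # should be close to 3 and 1
```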

In [None]:
from tensorflow.keras.optimizers import SGD

#Model 5
poker_model5 = Sequential()
poker_model5.add(Dense(15,input_shape=(10,), activation='relu')) #1st hidden layer (takes the 10 input features)
poker_model5.add(Dense(15, activation='relu')) #2nd hidden layer
poker_model5.add(Dense(10, activation='softmax')) #output layer
poker_model5.compile(optimizer = SGD(), loss = 'categorical_crossentropy', metrics = ['accuracy'])  #SGD with its default settings
h5=poker_model5.fit(trainX, trainY, epochs = 50, batch_size = 100, verbose=0,shuffle=True)
loss5,accuracy5 = poker_model5.evaluate(testX,testY,batch_size=100) 

print()
print("Model 5: two hidden layers, SGD optimizer, 100 batch size and 50 epochs")
print("Testing loss=%f accuracy=%f" % (loss5,accuracy5))


Model 5: two hidden layers, SGD optimizer, 100 batch size and 50 epochs
Testing loss=0.974669 accuracy=0.523946
