# Welcome to A Gentle Introduction to Keras 

This course focuses on a specific sub-field of machine learning called **predictive modeling.**

Within predictive modeling is a specialty, or further sub-field, called **deep learning.**

We will be crafting deep learning models with a library called Keras. 

>**Predictive modeling** is focused on developing models that make accurate predictions at the expense of explaining why predictions are made. 

You and I don't need to be able to write a binary classification model from scratch. We need to know how to use the model and interpret its results.

**Where does machine learning fit into data science?**

Data science is a much broader discipline. 

> Data Scientists take the raw data, analyse it, connect the dots and tell a story often via several visualizations. They usually have a broader range of skill-set and may not have too much depth into more than one or two. They are more on the creative side. Like an Artist. An Engineer, on the other hand, is someone who looks at the data as something they have to take in and churn out an output in some appropriate form in the most efficient way possible. The implementation details and other efficiency hacks are usually on the tip of their fingers. There can be a lot of overlap between the two but it is more like A Data Scientist is a Machine Learning Engineer but not the other way round. -- Ria Chakraborty, Data Scientist





# Step 1. Import our modules

Two important points here. First, **from** means we aren't importing the entire library, only a specific name from one of its modules (here, the Sequential model and the Dense layer). Second, notice we **are** importing the entire numpy library.

> If you get a message that states: WARNING (theano.configdefaults): g++ not detected, blah... blah. Run this in your Anaconda prompt. 

conda install mingw libpython



In [0]:
from keras.models import Sequential
from keras.layers import Dense
import numpy
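
To make the difference between the two import styles concrete, here is a minimal sketch. It uses the standard-library math module as a stand-in, which is an assumption for illustration and not part of this course's code.

```python
# Style 1: pull one name out of a module, as we do with Sequential and Dense.
from math import sqrt

# Style 2: import the whole module and reach into it with a dot, as we do with numpy.
import math

print(sqrt(16))       # called directly by its own name
print(math.sqrt(16))  # called through the module name
```

Both calls run the same function. The only difference is how much of the library is brought into your notebook and what you have to type to reach it.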

# Step 2.  Set our random seed


Run an algorithm on a dataset and you've built a great model. Can you produce the same model again given the same data?
You should be able to. It should be a requirement that is high on the list for your modeling project.

> We achieve reproducibility in applied machine learning by using the exact same code, data and sequence of random numbers.

Random numbers are created using a random number generator. It’s a simple program that generates a sequence of numbers that are random enough for most applications.

The generator is a deterministic math function. If it starts from the same starting point, called the seed, it will give the same sequence of random numbers.

Hold on... what does **deterministic** mean? 

> "a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states"

Let's apply an English translator to this: 

> The **only purpose of seeding** is to make sure that you get the **exact same result** when you run this code many times on the exact same data.

In [0]:
seed = 9
numpy.random.seed(seed)
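
You can check the effect of seeding yourself. The sketch below is a small illustration, not part of the model code: the helper name draw and the second seed value 10 are made up for the example.

```python
import numpy

def draw(seed, n=5):
    """Reset the generator to `seed`, then draw n random numbers."""
    numpy.random.seed(seed)
    return numpy.random.rand(n)

first = draw(9)
second = draw(9)   # same seed, so the same sequence comes back
other = draw(10)   # different seed, so a different sequence

print(numpy.array_equal(first, second))  # True
print(numpy.array_equal(first, other))   # False
```

Same seed, same numbers. That is all the seed cell above is doing for us.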

# Step 3.  Import our data set


Let's import the function called read_csv from pandas. 

We define a variable called filename and put the name of our data set file in it. 

The last line does the work. It uses the function called **read_csv** to load the contents of our data set into a variable called dataframe. 

In [0]:
from pandas import read_csv
filename = 'BBCN.csv'
dataframe = read_csv(filename)
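
If you don't have the file handy, you can still see what read_csv does. This is a minimal sketch that reads a tiny made-up CSV from a string instead of from disk. The Age and Income columns and their values are invented for illustration. Only the BikeBuyer column name comes from our data set.

```python
import io
from pandas import read_csv

# A hypothetical two-row file held in memory; the real data set is much larger.
csv_text = "Age,Income,BikeBuyer\n34,52000,1\n51,38000,0\n"

toy_frame = read_csv(io.StringIO(csv_text))

print(toy_frame.shape)          # (number of rows, number of columns)
print(list(toy_frame.columns))  # column names taken from the first line
```

The first line of the CSV becomes the column names and every following line becomes a row.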

# Step 4.  Split the Output Variables


The first thing we need to do is put our data in an array. 

> An array is a data structure that stores values of the **same data type**. 

In Python, this is the main difference between arrays and lists. While Python lists can contain values of different data types, a NumPy array holds values of a single data type.

In [0]:
array = dataframe.values
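
Here is a quick way to see the one-data-type rule in action. It is a small sketch with made-up values, separate from our data set.

```python
import numpy

mixed_list = [1, 2.5, 3]            # a list happily keeps an int next to a float
as_array = numpy.array(mixed_list)  # the array converts everything to one type

print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'int']
print(as_array.dtype)                          # float64
```

The list keeps each value's own type, while the array upgrades the integers to floats so that every element matches.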

The code below is the trickiest part of the exercise. We are assigning X as our input variables and Y as our output variable.

> That looks pretty easy, but keep in mind that array indexing starts at 0 and that a slice leaves out its end index. 

If you take a look at the shape of our dataframe (shape means the number of rows and columns) you can see we have 12 columns. 

On the X array below we are saying... include all rows, and the columns from index 0 up to but not including 11. That is the first 11 columns. 

On the Y array below we are saying... just use the column at **index 11**, which is the 12th and last column. The **BikeBuyer** column. 

> We already put the data in an array above, so X and Y are just slices of that array. 




In [0]:
X = array[:,0:11] 
Y = array[:,11]
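
If the slicing still feels slippery, try it on a toy array first. The sketch below assumes a made-up array with 2 rows and 12 columns filled with the numbers 0 to 23, so you can see exactly which columns land where.

```python
import numpy

# Hypothetical stand-in for our data: 2 rows, 12 columns, values 0..23.
toy = numpy.arange(24).reshape(2, 12)

X_toy = toy[:, 0:11]  # all rows, columns 0 through 10 (index 11 is left out)
Y_toy = toy[:, 11]    # all rows, only the column at index 11

print(toy.shape)    # (2, 12)
print(X_toy.shape)  # (2, 11)
print(Y_toy)        # [11 23]
```

The last value of each row ends up in Y_toy and everything before it ends up in X_toy, which is the same split we apply to the real array.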

In [0]:
dataframe.head()

# Step 5.  Build the Model


We can piece it all together by adding each layer. 

> The first hidden layer has 12 neurons and expects 11 input variables. 

The second hidden layer has 8 neurons.

The output layer has 1 neuron to predict the class. 

How many hidden layers are in our model? 

In [0]:
model = Sequential()
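# Keras 2 and later use kernel_initializer in place of the older init argument
model.add(Dense(12, input_dim=11, kernel_initializer='random_uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='random_uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='random_uniform', activation='sigmoid'))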
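
To get a feel for the size of this network, you can count its trainable weights by hand. This is a back-of-the-envelope sketch, assuming each Dense layer keeps its bias term (the Keras default) and using the layer sizes from the code above: 11 inputs, then 12, 8 and 1 neurons. The helper name dense_params is made up for the example.

```python
def dense_params(n_inputs, n_units):
    """One weight per input per neuron, plus one bias per neuron."""
    return (n_inputs + 1) * n_units

# (inputs, neurons) for each Dense layer in the model above
layer_sizes = [(11, 12), (12, 8), (8, 1)]

per_layer = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
total_params = sum(per_layer)

print(per_layer)     # [144, 104, 9]
print(total_params)  # 257
```

Training the model means finding good values for these 257 numbers.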

# Step 6.  Compile the Model

A metric is a function that is used to judge the performance of your model. Metric functions are supplied in the metrics parameter when a model is compiled.

> The (binary) cross-entropy is just the technical term for the **cost function** in logistic regression, and the categorical cross-entropy is its generalization for multi-class predictions via softmax.

Binary classification models are models which predict just one of two outcomes: positive or negative. These models are very well suited to drive decisions, such as whether to administer a certain drug to a patient or to include a lead in a targeted marketing campaign.

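> Accuracy is perhaps the most intuitive performance measure. **It is simply the ratio of correctly predicted observations to the total number of observations.**

Accuracy is most trustworthy for balanced data sets, where the class distribution is close to 50/50 and the costs of false positives and false negatives are roughly the same. On a heavily imbalanced data set, a model that always predicts the majority class can score high accuracy while being useless. In our case our classes are balanced. 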
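
The sketch below computes accuracy by hand and shows why it can mislead when the classes are not balanced. The labels are made up for illustration and the accuracy helper is our own, not a Keras function.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the real labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Balanced toy labels: 4 of the 5 predictions are right.
balanced_score = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])

# Imbalanced toy labels: 95 zeros and 5 ones, and a "model" that always says 0.
imbalanced_true = [0] * 95 + [1] * 5
always_zero = [0] * 100
imbalanced_score = accuracy(imbalanced_true, always_zero)

print(balanced_score)    # 0.8
print(imbalanced_score)  # 0.95
```

The second "model" never finds a single positive case, yet it scores 95%. That is why accuracy is a fair measure only when the classes are roughly balanced.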

Whenever you train a model on your data, it produces new (predicted) values for the target variable. However, the data set already holds the real values for that target. 

> We know that the closer the predicted values are to their corresponding real values, the better the model.

We use a cost function to measure **how close the predicted values are to their corresponding real values.**

So, for our model we choose binary_crossentropy. 

In [0]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
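
To see what the loss is measuring, here is a hand-rolled version of binary cross-entropy. It is a simplified sketch, not Keras's own implementation: it assumes every predicted probability is strictly between 0 and 1, and the labels and probabilities are made up.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0]
close_loss = binary_crossentropy(labels, [0.9, 0.1])  # confident and right
far_loss = binary_crossentropy(labels, [0.4, 0.6])    # leaning the wrong way

print(round(close_loss, 4))  # 0.1054
print(round(far_loss, 4))    # 0.9163
```

Predictions that sit close to the real labels give a small loss, and predictions that lean the wrong way give a large one. Training tries to push this number down.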

# Step 7.  Fit the Model

**Epoch:** A full pass over all of your training data.

For example, let's say you have 1213 observations. So an epoch concludes when it has finished a training pass over all 1213 of your observations.

> What you'd expect to see from running fit on your Keras model is a decrease in loss over the epochs.

batch_size is the number of training samples (e.g. 100 out of 1000) that the network processes before it updates its weights. 

Batches are fed to the network one after another, and each batch starts from the weights updated by the previous batch. 

>Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
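
The arithmetic behind epochs and batches can be sketched in a few lines. This assumes the last, smaller batch of an epoch is still used for an update. The 1213 figure is the example count from above, not necessarily the size of our data set, and the helper name is made up.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (weight updates) needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(1000, 500))  # 2

# With 1213 observations and the batch size of 30 used in the fit call:
per_epoch = updates_per_epoch(1213, 30)
total_updates = per_epoch * 200      # 200 epochs, as in the fit call

print(per_epoch)      # 41
print(total_updates)  # 8200
```

A smaller batch size means more weight updates per epoch, and more epochs means more passes over the same data.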



In [0]:
# Keras 2 and later use epochs in place of the older nb_epoch argument
model.fit(X, Y, epochs=200, batch_size=30)

# Step 8.  Score the Model

In [0]:
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
