# Introduction to Keras (acc 81.20%)

# The Perceptron
The building block of a neural network is the artificial neuron. At its most basic level it has data inputs, a processing stage (the large grey circle) and an output.

Here is an example of how the neuron processes its inputs. The inputs are data points (orange arrows).

1) All the inputs are weighted. When a signal (data point) arrives, it is multiplied by a weight value. If a neuron has three inputs, like ours, it has three weights. During the learning phase, the network adjusts the weights based on the errors of its previous results.

2) In the next step, all the signals are summed. The summation symbol stands for this step: the weighted input signals are added together into a single value. In this step an offset is also added to the sum. This offset is called the bias. The neural network also adjusts the bias during the learning phase.
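
Steps 1 and 2 can be sketched in NumPy for a single neuron with three inputs. The input values, weights and bias below are made up for illustration; each unit of a Keras `Dense` layer computes the same kind of weighted sum plus bias.

```python
import numpy as np

# Made-up values: three input signals, one weight per input and a bias.
x = np.array([0.5, -1.0, 2.0])   # input signals (data points)
w = np.array([0.8, 0.2, -0.5])   # weights, one per input
b = 0.1                          # bias: the offset added to the sum

weighted = x * w                 # step 1: multiply each input by its weight
z = weighted.sum() + b           # step 2: add up the weighted inputs and the bias

print(weighted)
print(z)
```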

This is where the learning takes place. At the start, every neuron has random weights and a random bias. After each learning iteration, the weights and biases are changed slightly so that the next result is a little closer to the desired result. In this way the neural network gradually moves towards a state where the desired patterns have been "learned".
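
Here is a minimal sketch of that gradual adjustment for one neuron. It assumes a sigmoid activation, binary cross-entropy as the error measure and plain gradient descent with a hypothetical learning rate of 0.1. Keras optimizers such as Adam, which is used later, build on this basic update rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # binary cross-entropy of a single prediction
    p = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
w = rng.normal(size=3)           # random initial weights
b = rng.normal()                 # random initial bias
x = np.array([0.5, -1.0, 2.0])   # made-up input
y = 1.0                          # desired output
lr = 0.1                         # hypothetical learning rate

before = loss(w, b, x, y)
for _ in range(20):              # 20 learning iterations
    p = sigmoid(np.dot(w, x) + b)
    grad_z = p - y               # d(loss)/dz for a sigmoid + binary cross-entropy
    w = w - lr * grad_z * x      # move each weight against its gradient
    b = b - lr * grad_z          # move the bias against its gradient
after = loss(w, b, x, y)

print(before, after)
```

The target is 1 and the input is fixed, so each step raises the weighted sum. The error therefore shrinks a little with every iteration.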

3) Finally, the result of the neuron's calculation is turned into an output signal. This is done by feeding the result into an activation function (also called a transfer function).
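
The sketch below applies a few common activation functions to the same made-up sums. It compares a hard threshold (as in the classic perceptron) with smooth functions. ReLU is included because the Keras model later in this notebook uses it for its hidden layers.

```python
import numpy as np

z = np.array([-2.0, 0.0, 2.0])      # made-up weighted sums

step = np.where(z > 0, 1.0, 0.0)    # hard threshold: output is either 0 or 1
sigmoid = 1.0 / (1.0 + np.exp(-z))  # smooth, squashes into (0, 1)
tanh = np.tanh(z)                   # smooth, squashes into (-1, 1)
relu = np.maximum(0.0, z)           # zero for negative sums, identity otherwise

print(step, sigmoid, tanh, relu)
```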

Note: a distinction should be made between the artificial neuron and the perceptron, although most of the literature uses the two terms interchangeably. The following answer is taken from a Quora question on the topic.

"Here is the best way I have found to describe the difference:

Perceptrons came first, in the 1950s, and use a brittle activation function to do classification: if w * x is greater than some value, they predict positive, otherwise negative.

Neurons use a smoother activation function, such as a sigmoid, a tanh or another activation function, to pass values on to other neurons in the network.

So perceptrons are not used in a network configuration; they classify on their own, which is why they cannot classify XOR. Neurons can, because they all contribute to the final output: using a more complicated structure (i.e. multiple layers in the network), they are able to classify XOR and other complicated problems."

- Bexian Xiong, B.S. in Information Science, Machine Learning and Data Mining practice
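
The XOR point can be checked with a short sketch. No single straight line separates the XOR classes, so a lone step-activation perceptron cannot get all four cases right, whatever weights it learns. Two layers with hand-picked weights, by contrast, can. The weights in the two-layer part are chosen by hand for illustration, not learned.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])          # XOR truth table

def step(z):
    return (z > 0).astype(int)

# 1) A single perceptron trained with the classic perceptron learning rule.
w, b = np.zeros(2), 0.0
for _ in range(100):                # 100 passes over the four examples
    for xi, yi in zip(X, y):
        err = yi - step(np.dot(w, xi) + b)
        w += err * xi
        b += err
perceptron_acc = np.mean(step(X @ w + b) == y)

# 2) Two layers with hand-picked weights: the hidden units compute OR and AND,
#    and the output fires for "OR and not AND", which is exactly XOR.
W1 = np.array([[1, 1], [1, 1]])     # column 0: OR unit, column 1: AND unit
b1 = np.array([-0.5, -1.5])
W2 = np.array([1, -1])
b2 = -0.5
hidden = step(X @ W1 + b1)
mlp_acc = np.mean(step(hidden @ W2 + b2) == y)

print(perceptron_acc, mlp_acc)
```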

# Step 1. Import our modules

Two important points here. First, **from** means we aren't importing the entire library, only specific names from one of its modules. Second, notice that we **are** importing the entire numpy library.

> If you get a message like `WARNING (theano.configdefaults): g++ not detected ...`, run this in your Anaconda prompt:

`conda install mingw libpython`



In [7]:
# Standalone Keras:
# from keras.models import Sequential
# from keras.layers import Dense
## or with TensorFlow's bundled Keras (public tensorflow.keras API)
from tensorflow.keras import optimizers
# lets us build sequential neural networks (a linear stack of layers)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dropout, Flatten, Dense, Activation

import numpy

# Step 2.  Set our random seed


Run an algorithm on a dataset and you've built a great model. Can you produce the same model again given the same data?
You should be able to. It should be a requirement that is high on the list for your modeling project.

> We achieve reproducibility in applied machine learning by using the exact same code, data and sequence of random numbers.

Random numbers are created using a random number generator. It’s a simple program that generates a sequence of numbers that are random enough for most applications.

This math function is deterministic. If it uses the same starting point called a seed number, it will give the same sequence of random numbers.

Hold on... what does **deterministic** mean?

> "a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states"

In plain English:

> The **only purpose of seeding** is to make sure that you get the **exact same result** when you run this code many times on the exact same data.

In [None]:
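import tensorflow as tf
seed = 9
numpy.random.seed(seed)   # seeds NumPy's global generator
tf.random.set_seed(seed)  # seeds TensorFlow's generator, which Keras uses to initialise weights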
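
NumPy alone is enough to check that the same seed gives the same sequence. In this small sketch the seed values are arbitrary.

```python
import numpy

numpy.random.seed(9)
first = numpy.random.rand(3)

numpy.random.seed(9)        # same seed: the generator restarts from the same state
second = numpy.random.rand(3)

numpy.random.seed(10)       # different seed: a different sequence
third = numpy.random.rand(3)

print(first, second, third)
```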

# Step 3.  Import our data set


Let's import the function called read_csv from pandas.

We define a variable called filename and put the name of our dataset file in it.

The last line does the work. It uses the function called **read_csv** to load the contents of our dataset into a variable called dataframe.

In [2]:
from pandas import read_csv
filename = 'BBCN.csv'
dataframe = read_csv(filename)
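
The following sketch runs `read_csv` on two of the rows that `dataframe.head()` displays later in this notebook, read from an in-memory string instead of `BBCN.csv`. It assumes the file was saved together with its row index, so its first header cell is empty. pandas then names that column `Unnamed: 0`, and `index_col=0` turns it into the index instead.

```python
from io import StringIO
from pandas import read_csv

# Two sample rows; the empty first header cell is an assumption about how the CSV was saved.
csv_text = (
    ",MaritalStatus,Gender,YearlyIncome,TotalChildren,NumberChildrenAtHome,"
    "EnglishEducation,HouseOwnerFlag,NumberCarsOwned,CommuteDistance,Region,Age,BikeBuyer\n"
    "0,5,1,9.0,2,0,5,1,0,2,2,5,1\n"
    "1,5,1,6.0,3,3,5,0,1,1,2,4,1\n"
)

df = read_csv(StringIO(csv_text))
print(df.columns[0], df.shape)      # the unnamed column becomes 'Unnamed: 0'

df_indexed = read_csv(StringIO(csv_text), index_col=0)  # use it as the row index instead
print(df_indexed.shape)
```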

# Step 4.  Split the Output Variables


The first thing we need to do is put our data in an array. 

> An array is a data structure that stores values of **same data type**. 

In Python, this is the main difference between arrays and lists: a Python list can hold values of different data types, while an array (such as the NumPy array returned by `dataframe.values`) holds values of a single data type.
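
This short NumPy sketch contrasts the two. NumPy picks one data type for the whole array and converts values to fit it. The values are made up.

```python
import numpy as np

mixed_list = [1, 2.5, "three"]        # a list happily mixes types
print([type(v).__name__ for v in mixed_list])

nums = np.array([1, 2.5, 3])          # the int values are upcast to one common dtype
print(nums.dtype)                     # float64

mixed = np.array([1, 2.5, "three"])   # every element is converted to a string
print(mixed.dtype)
```

All the columns in our dataset are numeric, and YearlyIncome holds floats, so `dataframe.values` should come back as a single float array.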

In [3]:
array = dataframe.values

The code below is the trickiest part of the exercise. Here we assign the input variables to X and the output (target) variable to Y.

> That looks pretty easy, but keep in mind that array indices start at 0.

If you look at the shape of our dataframe (the number of rows and columns), you can see that we have 12 columns.

For the X array below we are saying: take every row, and the columns from index 0 up to, but not including, index 11 (that is, columns 0 to 10).

For the Y array below we are saying: take every row, but only the column at **index 11**, the **BikeBuyer** column.

> Note that we converted the dataframe into an array before splitting X and Y out.
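
The slicing rules can be checked on a made-up array. This sketch assumes the 12 columns described above, with the label in the last column. It uses a 5×12 integer array as a stand-in for `dataframe.values`.

```python
import numpy as np

# Made-up stand-in for dataframe.values: 5 rows, 12 columns.
array = np.arange(5 * 12).reshape(5, 12)

X = array[:, 0:11]   # all rows, columns 0..10 (the stop index 11 is excluded)
Y = array[:, 11]     # all rows, column 11 only (the 12th column)

print(X.shape, Y.shape)  # (5, 11) (5,)
```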




In [4]:
X = array[:,0:11] 
Y = array[:,11]

In [5]:
dataframe.head()

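| Unnamed: 0 | MaritalStatus | Gender | YearlyIncome | TotalChildren | NumberChildrenAtHome | EnglishEducation | HouseOwnerFlag | NumberCarsOwned | CommuteDistance | Region | Age | BikeBuyer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 1 | 9.0 | 2 | 0 | 5 | 1 | 0 | 2 | 2 | 5 | 1 |
| 1 | 5 | 1 | 6.0 | 3 | 3 | 5 | 0 | 1 | 1 | 2 | 4 | 1 |
| 2 | 5 | 1 | 6.0 | 3 | 3 | 5 | 1 | 1 | 5 | 2 | 4 | 1 |
| 3 | 5 | 2 | 7.0 | 0 | 0 | 5 | 0 | 1 | 10 | 2 | 5 | 1 |
| 4 | 5 | 2 | 8.0 | 5 | 5 | 5 | 1 | 4 | 2 | 2 | 5 | 1 |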
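
The `head()` output above lists 13 columns, including a leading `Unnamed: 0` column. That column is typically a row index saved into the CSV. If that column is present, position 11 holds `Age` rather than `BikeBuyer`. Selecting columns by name avoids depending on positions. The sketch below rebuilds the five rows shown above by hand. It assumes `Unnamed: 0` carries no useful information and drops it along with the label.

```python
import pandas as pd

# The five rows shown by dataframe.head() above, rebuilt by hand.
columns = ["Unnamed: 0", "MaritalStatus", "Gender", "YearlyIncome", "TotalChildren",
           "NumberChildrenAtHome", "EnglishEducation", "HouseOwnerFlag",
           "NumberCarsOwned", "CommuteDistance", "Region", "Age", "BikeBuyer"]
rows = [[0, 5, 1, 9.0, 2, 0, 5, 1, 0, 2, 2, 5, 1],
        [1, 5, 1, 6.0, 3, 3, 5, 0, 1, 1, 2, 4, 1],
        [2, 5, 1, 6.0, 3, 3, 5, 1, 1, 5, 2, 4, 1],
        [3, 5, 2, 7.0, 0, 0, 5, 0, 1, 10, 2, 5, 1],
        [4, 5, 2, 8.0, 5, 5, 5, 1, 4, 2, 2, 5, 1]]
dataframe = pd.DataFrame(rows, columns=columns)

# Select by name: drop the saved-index column and the label to get the 11 features.
X = dataframe.drop(columns=["Unnamed: 0", "BikeBuyer"]).values
Y = dataframe["BikeBuyer"].values
print(X.shape, Y.shape)        # (5, 11) (5,)

# On this frame, position 11 is a different column:
print(dataframe.columns[11])   # Age
```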


# Step 5.  Build the Model


We can piece it all together by adding each layer. 

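> The first layer has 12 neurons and expects 11 input variables.

The second hidden layer has 8 neurons.

A third hidden layer with 4 neurons is commented out in the code. The note next to it records 74%, presumably the accuracy obtained with that layer enabled.

The output layer has 1 neuron to predict the class.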

How many hidden layers are in our model? 

In [27]:
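model = Sequential()
model.add(Dense(12, input_dim=11, activation='relu'))  # first hidden layer: 11 inputs -> 12 ReLU units
model.add(Dense(8, activation='relu'))                 # second hidden layer: 8 ReLU units
#model.add(Dense(4, activation='relu')) #74%           # optional third hidden layer
model.add(Dense(1, activation='sigmoid'))              # output: probability of the positive class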
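
To see what these layers compute, here is a NumPy sketch of the same 11 -> 12 -> 8 -> 1 network with random, untrained weights. The weight initialisation is arbitrary and is not Keras's. Each `Dense` layer holds an inputs×units weight matrix plus one bias per unit, so the parameter count should match the total that `model.summary()` reports for the model above.

```python
import numpy as np

# Layer sizes from the Sequential model above: 11 inputs -> 12 -> 8 -> 1.
sizes = [11, 12, 8, 1]
rng = np.random.default_rng(9)
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = x @ W + b
        last = i == len(weights) - 1
        x = 1 / (1 + np.exp(-z)) if last else np.maximum(0, z)  # ReLU hidden, sigmoid output
    return x

n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(n_params)      # 11*12+12 + 12*8+8 + 8*1+1 = 257

batch = rng.normal(size=(4, 11))   # 4 made-up rows with 11 features each
out = forward(batch)
print(out.shape)     # (4, 1)
```

The two ReLU layers are the hidden layers. The sigmoid layer produces one probability per row.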

# Step 6.  Compile the Model

A metric is a function used to judge the performance of your model. Metric functions are passed in the metrics parameter when the model is compiled.

> Lastly, we set the cost (or loss) function. The (binary) cross-entropy is the technical term for the **cost function** used in logistic regression, and categorical cross-entropy is its generalization to multi-class predictions via softmax.

Binary learning models are models which just predict one of two outcomes: positive or negative. These models are very well suited to drive decisions, such as whether to administer a patient a certain drug or to include a lead in a targeted marketing campaign.

> Accuracy is perhaps the most intuitive performance measure. **It is simply the ratio of correctly predicted observations to the total number of observations.**

Accuracy is a reliable measure mainly for symmetric datasets, where the class distribution is close to 50/50 and the costs of false positives and false negatives are roughly the same. In our case the classes are balanced.
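
A quick sketch shows why accuracy can mislead on imbalanced data. The labels are made up: 90 negatives and 10 positives. A "model" that never predicts the positive class still scores 90%.

```python
import numpy as np

def accuracy(y_true, y_pred):
    # ratio of correctly predicted observations to all observations
    return np.mean(y_true == y_pred)

y_true = np.array([0] * 90 + [1] * 10)    # made-up, imbalanced labels
always_negative = np.zeros_like(y_true)   # never predicts the positive class

acc = accuracy(y_true, always_negative)
print(acc)   # 0.9, although it never finds a single positive
```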

Whenever you train a model on your data, you produce new (predicted) values for a specific feature, the target. That feature already has real values in the dataset.

> We know that the closer the predicted values are to their corresponding real values, the better the model.

We use a cost function to measure **how close the predicted values are to their corresponding real values.**

So, for our model we choose binary_crossentropy. 
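
For one sigmoid output, binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples. The sketch below computes it by hand on made-up labels and predicted probabilities. The clipping only guards against `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.6, 0.4])           # made-up sigmoid outputs
p_better = np.array([0.99, 0.01, 0.99, 0.99])  # predictions closer to the real values

print(round(binary_crossentropy(y_true, p), 4))  # 0.4389
print(binary_crossentropy(y_true, p_better))     # smaller cost
```

The closer the predictions are to the real labels, the lower the cost. This is the quantity the optimizer drives down during training.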

In [23]:
# learning rate: smaller values mean smaller, slower weight updates
lr = 0.0004

In [28]:
#model.compile(loss='binary_crossentropy', optimizer=optimizers.Adam(learning_rate=lr), metrics=['accuracy']) ##70~74
model.compile(loss='binary_crossentropy', optimizer="adam", metrics=['accuracy']) # 80

# Step 7.  Fit the Model

**Epoch:** A full pass over all of your training data.

For example, let's say you have 1213 observations. So an epoch concludes when it has finished a training pass over all 1213 of your observations.

> When you run fit on your Keras model, you should expect the loss to decrease over the epochs.

batch_size is the number of training samples (e.g. 100 out of 1000) that the network processes before each update of its weights during training.

Batches are processed one after another, and each batch starts from the weights updated by the previous batch.

>Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
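
The number of iterations (weight updates) per epoch is the number of samples divided by the batch size, rounded up, because the last batch may be smaller. The sketch below checks the example above and the 1213-observation case with the batch size of 30 used in the next cell.

```python
import math

def iterations_per_epoch(n_samples, batch_size):
    # the last batch may be partial, so round up
    return math.ceil(n_samples / batch_size)

print(iterations_per_epoch(1000, 500))   # 2
print(iterations_per_epoch(1213, 30))    # 41

# The same split done explicitly over row indices:
indices = list(range(1213))
batches = [indices[i:i + 30] for i in range(0, len(indices), 30)]
print(len(batches), len(batches[-1]))    # 41 13 (40 full batches plus one of 13)
```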



In [32]:
model.fit(X, Y, epochs=1000, batch_size=30)  # 1000 passes over the data, in batches of 30 rows

# Step 8.  Score the Model

In [30]:
scores = model.evaluate(X, Y)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))


acc: 81.20%
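
`model.evaluate(X, Y)` scores the model on the same data that was passed to `fit`, so 81.20% is a training-set accuracy. To estimate performance on unseen rows, you can hold some rows out before fitting. Keras's `fit` also accepts a `validation_split` argument for this. Below is a minimal NumPy sketch of an 80/20 holdout split. The helper `train_test_split_indices` is hypothetical and not part of Keras, and the data is made up.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=9):
    """Hypothetical helper: shuffle row indices reproducibly and hold out a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]   # (train indices, test indices)

# Made-up stand-ins for the notebook's X and Y (100 rows, 11 features).
X = np.arange(100 * 11).reshape(100, 11)
Y = np.arange(100) % 2

train_idx, test_idx = train_test_split_indices(len(X))
X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]
print(X_train.shape, X_test.shape)   # (80, 11) (20, 11)

# With the Keras model from this notebook, one would then fit and score separately:
# model.fit(X_train, Y_train, epochs=1000, batch_size=30)
# model.evaluate(X_test, Y_test)
```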


> Great course. I managed to get to 85.8% using the 'rmsprop' optimizer, 20,000 epochs and a batch_size of 1000. This is computationally expensive, though.