# Introduction to Python
The goal is to discover the basic principles of Python.
Python is an interpreted, structured, object-oriented language. Because it is one of the most popular languages, extensive documentation is available.

In [None]:
import pandas as pd
import numpy as np

# Create a dataframe
colonne1 = range(4)
colonne2 = ["a"]*2+["b"]*2

df = pd.DataFrame({"col1" : colonne1, "col2": colonne2})

print(colonne1, "\n", colonne2, "\n")
print(df)

range(0, 4) 
 ['a', 'a', 'b', 'b'] 

   col1 col2
0     0    a
1     1    a
2     2    b
3     3    b


In [None]:
# Get the dimensions of the dataframe as (rows, columns)
df.shape


In [None]:
# Get the structure of the dataframe (column names, non-null counts, dtypes)
df.info()


In [None]:
# Define a function
def verifier_pair_impair(num):
  # Return the number together with a text saying whether it is even or odd
  if (num % 2) == 0:
    text = "is even"
  else:
    text = "is odd"
  return num, text

In [None]:
verifier_pair_impair(5)

(5, 'is odd')

### MNIST dataset (sample)

### Loading the MNIST dataset in Keras

In [None]:
# MNIST comes pre-loaded with Keras
from keras.datasets import mnist

# We load it into four Numpy arrays
# train_labels[i] is the label of train_images[i]
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Let’s take a look at the training data.

In [None]:
print("Shape of training images:", train_images.shape)
print("Length of training labels:", len(train_labels))
print("Sample of training labels:", train_labels)
print("Same using Numpy repr:", repr(train_labels))

Shape of training images: (60000, 28, 28)
Length of training labels: 60000
Sample of training labels: [5 0 4 ... 5 6 8]
Same using Numpy repr: array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)


A similar look at the test data:

In [None]:
print("Shape of test images:", test_images.shape)
print("Length of test labels:", len(test_labels))
print("Sample of test labels:", repr(test_labels))

Shape of test images: (10000, 28, 28)
Length of test labels: 10000
Sample of test labels: array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)
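
Beyond shapes and a few sample values, it can be useful to check how often each digit occurs. The following is a minimal sketch using `np.bincount`. To stay runnable without downloading MNIST it works on a small hypothetical stand-in array `labels`; with the real data one would pass `train_labels` or `test_labels` instead.

```python
import numpy as np

# Hypothetical stand-in for train_labels (the real array has 60000 entries).
labels = np.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4], dtype=np.uint8)

# Count how often each digit 0..9 occurs; minlength pads absent classes with 0.
counts = np.bincount(labels, minlength=10)

for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} sample(s)")
```

A strongly unbalanced count would be a reason to look at more than plain accuracy later on.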


## Basic workflow

### The network architecture

The code below builds our network. Don't worry if you don't understand everything about this example yet.

In [None]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

### Preparing for training: the compilation step

Loss functions and optimizers will be discussed in more detail later on.

In [None]:
network.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy'])
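
To make the architecture and the loss a little less opaque, here is a rough NumPy sketch of what a 784-512-10 network with ReLU and softmax computes, together with a plain categorical cross-entropy. It is not how Keras implements these internally; the names `W1`, `b1`, `W2`, `b2`, `forward` and `categorical_crossentropy` are illustrative, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random parameters with the same shapes as the two Dense layers (assumption:
# small Gaussian weights and zero biases, chosen only for illustration).
W1, b1 = rng.normal(scale=0.01, size=(784, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, 10)), np.zeros(10)

def forward(x):
    """Forward pass: Dense(512, relu) followed by Dense(10, softmax)."""
    hidden = np.maximum(0.0, x @ W1 + b1)          # ReLU
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)    # softmax: each row sums to 1

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Batch mean of -sum(y_true * log(y_pred)) over the classes."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

x = rng.random((3, 784))                 # three fake flattened "images"
y_true = np.eye(10)[[5, 0, 4]]           # one-hot targets for classes 5, 0, 4
probs = forward(x)
loss = categorical_crossentropy(y_true, probs)

n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, n_params, loss)
```

With untrained weights the predicted probabilities are close to uniform, so the loss lands near `log(10)`. The parameter count is 784 * 512 + 512 + 512 * 10 + 10 = 407050.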

### Preparing the image data

I have modified the book code a bit so that the preprocessed data is stored in separate arrays (instead of overwriting the original ones). In my opinion it is usually a good idea to keep the original data untouched, unless memory becomes an issue.

In [None]:
train_data = train_images.reshape((60000, 28 * 28))
train_data = train_data.astype('float32') / 255
test_data = test_images.reshape((10000, 28 * 28))
test_data = test_data.astype('float32') / 255
# some sanity checks
print(train_data.shape, test_data.shape)

(60000, 784) (10000, 784)
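
The sketch below illustrates why the original arrays stay untouched with this approach: `astype` returns a new array, so the scaled data does not share memory with the source. It uses a small hypothetical stand-in `images` in place of `train_images`.

```python
import numpy as np

# Hypothetical stand-in for train_images: 4 fake 28x28 uint8 "images".
images = np.full((4, 28, 28), 255, dtype=np.uint8)

data = images.reshape((4, 28 * 28))     # may be a view sharing memory with images
data = data.astype('float32') / 255     # astype returns a new array (a copy)

print(images.dtype, images.max())       # the original is still uint8 in [0, 255]
print(data.dtype, data.min(), data.max())
print(np.shares_memory(images, data))   # the scaled copy is independent
```

The price is memory: the `float32` copy takes four times the space of the `uint8` original.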


### Preparing the labels

The Keras routine `to_categorical` takes care of encoding our labels as so-called one-hot vectors.

In [None]:
from tensorflow.keras.utils import to_categorical

train_labels_enc = to_categorical(train_labels)
test_labels_enc = to_categorical(test_labels)
# sanity checks
print(train_labels.shape, train_labels_enc.shape)
print(train_labels[0], 'is encoded as ', train_labels_enc[0])

(60000,) (60000, 10)
5 is encoded as  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
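
For intuition, the same encoding can be sketched in plain NumPy by indexing into an identity matrix. This is only an illustration of the idea, not the Keras implementation; `labels` and `num_classes` are example values.

```python
import numpy as np

labels = np.array([5, 0, 4])            # example integer labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot[0])

# argmax along the class axis recovers the original integer labels.
decoded = one_hot.argmax(axis=1)
print(decoded)
```

The `argmax` step is also how a predicted probability vector is turned back into a digit.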


### Training the network

`batch_size` defines the size of the data lumps (mini-batches) and `epochs` the number of passes over the full training set.

Note the quantities displayed during training:
- epoch counter / number of epochs
- data counter / size of data
- training time per epoch
- average training time per "data lump"
- average loss (over one epoch)
- average accuracy (ditto)

The call returns a `History` object containing the loss and the metrics defined in the earlier `compile` call, recorded once per epoch.

In [None]:
history = network.fit(train_data, train_labels_enc, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
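
The relation between `batch_size`, `epochs` and the number of weight updates is simple arithmetic, sketched below for the values used here. Depending on the Keras version, the progress counter shows either samples or batches per epoch.

```python
import math

n_samples = 60000    # size of the training set
batch_size = 128
epochs = 5

# Each epoch is split into batches; the last batch may be smaller.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```

So one epoch consists of 469 batches, the last of which holds only 96 samples.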


In [None]:
print(history.history)

{'loss': [0.25307697057724, 0.10016775131225586, 0.06752956658601761, 0.04908774420619011, 0.03677215799689293], 'accuracy': [0.9273666739463806, 0.9704833626747131, 0.9796833395957947, 0.9852499961853027, 0.9884166717529297]}
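
The raw dictionary is hard to read. A small sketch of how one might tabulate it per epoch follows. The numbers are copied from the output above and rounded to four decimals; `history_dict` is a stand-in for `history.history`.

```python
# Values copied from the printed history above, rounded to four decimals.
history_dict = {
    'loss': [0.2531, 0.1002, 0.0675, 0.0491, 0.0368],
    'accuracy': [0.9274, 0.9705, 0.9797, 0.9852, 0.9884],
}

for epoch, (loss, acc) in enumerate(
        zip(history_dict['loss'], history_dict['accuracy']), start=1):
    print(f"epoch {epoch}: loss={loss:.4f}  accuracy={acc:.4f}")

# Gain in training accuracy from one epoch to the next.
accs = history_dict['accuracy']
gains = [later - earlier for earlier, later in zip(accs, accs[1:])]
print([round(g, 4) for g in gains])
```

The gains shrink with every epoch, which is the usual picture of diminishing returns during training.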


### Testing the network

A call to `evaluate` returns the loss and the metrics we defined when compiling the model (averaged over the test set).

In [None]:
test_loss, test_acc = network.evaluate(test_data, test_labels_enc)
print('Test accuracy:', test_acc)

Test accuracy: 0.9761999845504761
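
What the accuracy metric measures can be sketched by hand: take the most probable class for each sample and compare it with the true label. The arrays `probs` and `y_true` below are made-up toy values with three classes, not model output.

```python
import numpy as np

# Hypothetical predicted probabilities for 4 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
# One-hot encoded true labels for the same samples (classes 0, 1, 2, 1).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

predicted = probs.argmax(axis=1)    # most probable class per sample
actual = y_true.argmax(axis=1)      # integer label from the one-hot row
accuracy = float(np.mean(predicted == actual))
print(accuracy)
```

Three of the four toy predictions are correct, giving an accuracy of 0.75.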


### Some testing on preprocessing 

In [None]:
# stupid test: do the same without scaling the data

train_data = train_images.reshape((60000, 28 * 28))
train_data = train_data.astype('float32')
test_data = test_images.reshape((10000, 28 * 28))
test_data = test_data.astype('float32')
# some sanity checks
print(train_data.shape, test_data.shape)

# rebuild network (we want to start training from scratch)

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

(60000, 784) (10000, 784)


In [None]:
# Here we train for 20 epochs: the metrics start out poor and improve only slowly
history = network.fit(train_data, train_labels_enc, epochs=20, batch_size=128)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
test_loss, test_acc = network.evaluate(test_data, test_labels_enc)
print('Test accuracy:', test_acc)

Test accuracy: 0.975600004196167


In [None]:
# less stupid test: scale input to [-1, 1]

train_data = train_images.reshape((60000, 28 * 28))
train_data = train_data.astype('float32') / 255
train_data = (2 * train_data - 1)

test_data = test_images.reshape((10000, 28 * 28))
test_data = test_data.astype('float32') / 255
test_data = (2 * test_data - 1)

# some sanity checks
print(train_data.shape, test_data.shape)

# rebuild network (we want to start training from scratch)

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

(60000, 784) (10000, 784)
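
The two scaling steps above can be folded into one small helper. The function `scale_to_range` is a hypothetical convenience, not part of Keras or NumPy, and assumes `uint8` pixel values in [0, 255].

```python
import numpy as np

def scale_to_range(images, low=-1.0, high=1.0):
    """Map uint8 pixel values in [0, 255] linearly onto [low, high]."""
    unit = images.astype('float32') / 255     # first map to [0, 1]
    return low + (high - low) * unit          # then stretch and shift

pixels = np.array([0, 255], dtype=np.uint8)
to_symmetric = scale_to_range(pixels)             # target range [-1, 1]
to_unit = scale_to_range(pixels, 0.0, 1.0)        # target range [0, 1]
print(to_symmetric, to_unit)
```

With such a helper, the preprocessing variants compared in this section differ only in the `low` and `high` arguments.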


In [None]:
history = network.fit(train_data, train_labels_enc, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
test_loss, test_acc = network.evaluate(test_data, test_labels_enc)
print('Test accuracy:', test_acc)

Test accuracy: 0.9717000126838684
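
To compare the three preprocessing variants side by side, the test accuracies printed above can be collected in a small table. The values below are copied from those outputs and rounded to four decimals. Each comes from a single run with random initialization, and the unscaled run trained for 20 epochs instead of 5, so the differences should be read with caution.

```python
# Test accuracies copied from the three runs above (one run each).
results = {
    'scaled to [0, 1], 5 epochs': 0.9762,
    'unscaled [0, 255], 20 epochs': 0.9756,
    'scaled to [-1, 1], 5 epochs': 0.9717,
}

best = max(results, key=results.get)
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    gap = results[best] - acc
    print(f"{name:30s} {acc:.4f}  (gap to best: {gap:.4f})")
```

All three results lie within half a percentage point of each other. Repeating each run several times would be needed to tell real effects from noise.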
