# How to use the `dlt` package

## Who created this?

[David Walz](https://github.com/DavidWalz/dlipr) at RWTH Aachen University originally created the `dlipr` package. I learned deep learning by taking his lecture 'Deep Learning in Physics Research'. The software used in the lecture was meant to run only on the university's cluster, so I customized it into `dlt` (Deep Learning Tools) so that anyone can use it in their own environment.

#### About this note
In this article, I explain how to use `dlt`, using the **[Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist/blob/master/README.ja.md)** dataset as an example to show how `dlt` works. You do not need to understand deep learning in advance. If you run into parts you cannot follow, feel free to skip them and just look at the images you get.

- Places where the package is used are marked with ★.

## How can we use it?

### Preparation

First, here are the versions of the libraries used in this article:

```python
import numpy
numpy.__version__
```
```
'1.13.3'
```

```python
import matplotlib.pyplot
matplotlib.__version__
```
```
'2.0.2'
```

```python
import tensorflow
tensorflow.__version__
```
```
'1.4.1'
```

```python
import keras
keras.__version__
```
```
Using TensorFlow backend.

'2.1.2'
```

To install the `dlt` package, run:

```
pip install dlt
```

### Deep Learning - Fashion-MNIST

The model below is based on [the Keras MNIST CNN example](https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py).

```python
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
import os
import numpy as np

import dlt
```

#### ★ Reading the dataset

The Fashion-MNIST dataset can be loaded as follows:

```python
data = dlt.fashion_mnist.load_data()
```
```
Downloading Fashion-MNIST dataset
```


The following datasets are available in `dlt`: `CIFAR-10`, `CIFAR-100`, `MNIST`, and `FASHION-MNIST`. Each has its own module with a `load_data` method, as in the example above.

Fashion-MNIST uses the same format as the well-known MNIST dataset: 28×28 grayscale images with integer labels from 0 to 9. The raw arrays can be accessed as follows.

##### Training data

```python
X_train = data.train_images
y_train = data.train_labels
```

```python
X_train.shape
```
```
(60000, 28, 28)
```

```python
y_train.shape
```
```
(60000,)
```

##### Test data

```python
X_test = data.test_images
y_test = data.test_labels

X_test.shape
```
```
(10000, 28, 28)
```

```python
y_test.shape
```
```
(10000,)
```

```python
print(data.classes)
```
```
['T-short/top' 'Trouser' 'Pullover' 'Dress' 'Coat' 'Sandal' 'Shirt'
 'Sneaker' 'Bag' 'Ankle boot']
```
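
The labels in `y_train` and `y_test` are integers from 0 to 9 that index into this class list. Judging by how it prints, `data.classes` is a NumPy array, so integer labels can presumably be mapped to names with fancy indexing. A minimal sketch, using a hypothetical `CLASS_NAMES` array spelled as in the Fashion-MNIST README and a few made-up labels:

```python
import numpy as np

# Hypothetical stand-in for data.classes, spelled as in the Fashion-MNIST README
CLASS_NAMES = np.array(['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'])

# Made-up integer labels in the same format as y_train / y_test
labels = np.array([9, 0, 0, 3, 6])

# Fancy indexing maps every integer label to its class name at once
names = CLASS_NAMES[labels]
print(names)
```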


★ You can check how the true labels are distributed in `y_train` and `y_test`. Standard benchmark datasets usually contain the same number of examples for every label so that the classes are balanced. When building your own dataset, you should balance the classes as much as possible; otherwise the model may become biased toward the more frequent classes.

```python
dlt.utils.plot_distribution_data(Y=data.train_labels,     # labels to examine
                                 dataset_name='y_train',  # name shown in the plot
                                 classes=data.classes,    # class names
                                 fname='dist_train.png')  # output filename
```
```
Mean Value: 6000
Median Value: 6000.0
Variance: 0
Standard Deviation: 0.0
```


<img src="dist_train.png">

```python
dlt.utils.plot_distribution_data(Y=data.test_labels,
                                 dataset_name='y_test',
                                 classes=data.classes,
                                 fname='dist_test.png')
```
```
Mean Value: 1000
Median Value: 1000.0
Variance: 0
Standard Deviation: 0.0
```


<img src="dist_test.png">
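
The statistics printed by `plot_distribution_data` appear to summarize the number of examples per class. The sketch below reproduces that kind of summary with plain NumPy; it is an assumption about what the function computes, not its actual source, and `label_count_stats` is a hypothetical helper. The labels are synthetic and balanced, mimicking the 6,000 examples per class of the training set:

```python
import numpy as np

def label_count_stats(labels, num_classes=10):
    """Count the examples per class and summarize those counts."""
    counts = np.bincount(labels, minlength=num_classes)  # examples per class
    return {
        'counts': counts,
        'mean': counts.mean(),
        'median': np.median(counts),
        'variance': counts.var(),
        'std': counts.std(),
    }

# Synthetic balanced labels: 6,000 examples for each of the 10 classes
y = np.repeat(np.arange(10), 6000)
stats = label_count_stats(y)
print(stats['mean'], stats['median'], stats['variance'], stats['std'])
```

A nonzero variance or standard deviation means that some classes have more examples than others.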

#### ★ Visualizing sample images

Before training, you may want to look at some sample images:

```python
dlt.utils.plot_examples(data=data,
                        num_examples=5,  # images per row (one row per class)
                        fname='fashion_mnist_examples.png')  # output filename
```

<img src="fashion_mnist_examples.png">
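
If you want a similar grid without `dlt`, a rough matplotlib sketch follows. It assumes images of shape `(N, 28, 28)` and integer labels, as above; the function name `plot_examples_grid` and its layout (one row per class) are my own choices and not necessarily what `dlt.utils.plot_examples` does internally:

```python
import matplotlib
matplotlib.use('Agg')  # render to files without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_examples_grid(images, labels, classes, num_examples=5, fname='examples.png'):
    """Save a grid with one row per class and num_examples images per row."""
    num_classes = len(classes)
    fig, axes = plt.subplots(num_classes, num_examples,
                             figsize=(num_examples, num_classes), squeeze=False)
    for c in range(num_classes):
        idx = np.where(labels == c)[0][:num_examples]  # first examples of class c
        for j in range(num_examples):
            ax = axes[c, j]
            ax.axis('off')
            if j < len(idx):
                ax.imshow(images[idx[j]], cmap='gray')
        axes[c, 0].set_title(classes[c], fontsize=8, loc='left')  # class name per row
    fig.tight_layout()
    fig.savefig(fname)
    plt.close(fig)
```

With the data above, it would be called as `plot_examples_grid(data.train_images, data.train_labels, data.classes)`.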

Next, reshape and normalize the data, then build and train the convolutional network:

```python
# add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
X_train = X_train.reshape([-1, 28, 28, 1])
X_test = X_test.reshape([-1, 28, 28, 1])
# scale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# two convolutional layers, max pooling, then a dense classifier with dropout
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

fit = model.fit(X_train, y_train,
                batch_size=128,
                epochs=12,
                verbose=1,
                validation_data=(X_test, y_test))
```

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
```
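
In the training cell above, `keras.utils.to_categorical` turns each integer label into a one-hot vector of length 10, which is the target format the `categorical_crossentropy` loss expects. The same transformation can be written in NumPy by indexing rows of an identity matrix; this is an equivalent sketch for illustration (`one_hot` is a hypothetical helper), not Keras's own implementation:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a float32 array with a 1 at each label's index and 0 elsewhere."""
    return np.eye(num_classes, dtype='float32')[labels]

y = np.array([9, 0, 3])
Y = one_hot(y, 10)
print(Y.shape)                # (3, 10)
print(np.argmax(Y, axis=1))   # [9 0 3], argmax recovers the original labels
```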


```python
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```
```
Test loss: 0.224214684319
Test accuracy: 0.9207
```
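
To see how the 28×28 input shrinks through the model trained above, it helps to do the shape and parameter arithmetic by hand. A 3×3 convolution without padding (Keras's default `padding='valid'`) reduces each side by 2, and 2×2 max pooling halves it; a layer's parameter count is its weights plus one bias per output unit. The helpers `conv_params` and `dense_params` below are hypothetical names for this sketch:

```python
def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

side = 28                      # input: 28 x 28 x 1
side = side - 3 + 1            # Conv2D(32, 3x3), valid padding -> 26
p1 = conv_params(3, 1, 32)
side = side - 3 + 1            # Conv2D(64, 3x3), valid padding -> 24
p2 = conv_params(3, 32, 64)
side = side // 2               # MaxPooling2D(2x2) -> 12
flat = side * side * 64        # Flatten -> 9216 features
p3 = dense_params(flat, 128)   # Dense(128)
p4 = dense_params(128, 10)     # Dense(10)

total = p1 + p2 + p3 + p4
print(flat, p1, p2, p3, p4, total)  # 9216 320 18496 1179776 1290 1199882
```

Dropout, pooling, and flatten layers have no trainable parameters, so the total should match what `model.summary()` reports. Note that almost all of the parameters sit in the first `Dense` layer.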


★ The learning curves, i.e. the loss and accuracy over the training epochs, can be plotted as follows:

```python
dlt.utils.plot_loss_and_accuracy(fit,  # History object returned by model.fit
                                 fname='loss_and_accuracy_graph.png')  # path of the saved image
```

<img src="loss_and_accuracy_graph.png">
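
`model.fit` returns a `History` object whose `history` attribute is a dict of per-epoch metric lists. A minimal matplotlib sketch of a comparable plot is below. It assumes the Keras 2.1 key names `'loss'`, `'val_loss'`, `'acc'` and `'val_acc'` (later Keras versions use `'accuracy'` and `'val_accuracy'`, hence the `acc_key` parameter); `plot_history` is a hypothetical helper, and the numbers in the usage line are dummy values, not the results above:

```python
import matplotlib
matplotlib.use('Agg')  # render to a file without a display
import matplotlib.pyplot as plt

def plot_history(history, acc_key='acc', fname='history.png'):
    """Plot training and validation loss and accuracy per epoch, side by side."""
    epochs = range(1, len(history['loss']) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(epochs, history['loss'], label='train')
    ax_loss.plot(epochs, history['val_loss'], label='validation')
    ax_loss.set_xlabel('epoch')
    ax_loss.set_ylabel('loss')
    ax_loss.legend()
    ax_acc.plot(epochs, history[acc_key], label='train')
    ax_acc.plot(epochs, history['val_' + acc_key], label='validation')
    ax_acc.set_xlabel('epoch')
    ax_acc.set_ylabel('accuracy')
    ax_acc.legend()
    fig.tight_layout()
    fig.savefig(fname)
    plt.close(fig)

# Dummy values for illustration; with Keras, pass fit.history instead
dummy = {'loss': [0.6, 0.4, 0.3], 'val_loss': [0.45, 0.35, 0.3],
         'acc': [0.78, 0.85, 0.89], 'val_acc': [0.83, 0.87, 0.89]}
plot_history(dummy, fname='history_example.png')
```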

```python
# predicted probabilities for the test set
Yp = model.predict(X_test)
# predicted labels: index of the most probable class for each image
yp = np.argmax(Yp, axis=1)
```
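
`model.predict` returns one row of softmax probabilities per test image (10 columns that sum to 1), and `np.argmax(..., axis=1)` picks the most probable class in each row. Comparing those predictions with the true labels gives the accuracy that `model.evaluate` reports. A small sketch with made-up probabilities for three images and three classes:

```python
import numpy as np

# Made-up softmax outputs for 3 images and 3 classes (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])
y_true = np.array([0, 2, 1])

pred = np.argmax(probs, axis=1)      # most probable class per image: [0 2 0]
accuracy = np.mean(pred == y_true)   # fraction of correct predictions: 2/3
print(pred, accuracy)
```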

#### ★ Class probabilities for each test image

In a classification task, we sometimes want to see how confidently each image is assigned to each class, but it is not obvious how to visualize this. `dlt` has a convenient method that produces a clear plot:

```python
# plot the first 10 test images
for i in range(10):
    dlt.utils.plot_prediction(
        Yp=Yp[i],                 # predicted class probabilities for test image i
        X=data.test_images[i],    # the test image itself
        y=data.test_labels[i],    # the correct label
        classes=data.classes,     # the class names
        top_n=False,              # number of top-ranked classes to show; False shows all classes
        fname='test-%i.png' % i)  # output filepath
```

<img src="test-0.png">
<img src="test-1.png">
<img src="test-2.png">
<img src="test-3.png">
<img src="test-4.png">

The orange (blue) bars show the predicted probability of the correct (wrong) labels.

Taking the last image as an example, the model assigns `Shirt` a probability above 90%, but `T-shirt/top` a probability of around 5%.
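
According to its comment above, the `top_n` argument of `plot_prediction` limits the plot to the most probable classes. Picking the top-n classes for one image amounts to sorting its probability vector. A minimal NumPy sketch, with a hypothetical helper `top_n_classes` and a made-up probability vector loosely shaped like the `Shirt` example:

```python
import numpy as np

def top_n_classes(probs, classes, n=3):
    """Return the n most probable (class name, probability) pairs, highest first."""
    order = np.argsort(probs)[::-1][:n]   # indices sorted by descending probability
    return [(classes[i], float(probs[i])) for i in order]

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
           'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
# Made-up prediction for a single image (sums to 1)
probs = np.array([0.05, 0.0, 0.02, 0.0, 0.01, 0.0, 0.92, 0.0, 0.0, 0.0])
print(top_n_classes(probs, classes, n=3))
```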

To see the results for the whole test set at once, a confusion matrix is helpful.

#### ★ Confusion matrix

```python
dlt.utils.plot_confusion_matrix(test_labels=data.test_labels,  # correct labels (integers, not one-hot)
                                y_pred=yp,                     # predicted labels (Yp after np.argmax)
                                classes=data.classes,          # class names
                                title='confusion matrix',      # title of the graph
                                fname='confusion_matrix.png')  # output filename
```

<img src="confusion_matrix.png">

The vertical axis represents the correct label and the horizontal axis the predicted label. Reading along the `Shirt` row, for example, 67.80% of the shirts were correctly classified as `Shirt`, while 6.50% were misclassified as `T-shirt/top`.
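
A confusion matrix counts, for every true class (row), how often each class was predicted (column). The percentages in the plot appear to be normalized per row, so that each row sums to 100%; the sketch below assumes that normalization, uses a hypothetical helper `confusion_matrix_percent`, and runs on toy labels rather than the real predictions:

```python
import numpy as np

def confusion_matrix_percent(y_true, y_pred, num_classes):
    """Rows: true class, columns: predicted class, values: % of each true class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)             # count each (true, predicted) pair
    row_sums = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.maximum(row_sums, 1)    # guard against empty rows

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
print(confusion_matrix_percent(y_true, y_pred, 3))
```

With the real data it would be called as `confusion_matrix_percent(data.test_labels, yp, 10)`; the diagonal then holds the per-class accuracy.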