## Fashion-MNIST

Your task is to solve a classification problem on the Fashion-MNIST dataset, which you can download here: [Fashion-MNIST on Kaggle](https://www.kaggle.com/zalando-research/fashionmnist).

This is a learning project, and its goal is to practice working with different machine learning models. You do not have to build the best possible model or reach the best score on the test data. What matters most is learning to train models and analyze the results. Along the way you will learn to prepare and use data for a classification task. You will also get hands-on experience with several machine learning algorithms: logistic regression, fully connected neural networks, and convolutional neural networks. Finally, you will learn to train these models and analyze how they perform on new data.

Fashion-MNIST is a dataset of 70,000 grayscale images of clothing, each 28x28 pixels. The training set contains 60,000 of them and the test set 10,000. The dataset is intended as an alternative to the classic MNIST dataset of handwritten digits. There were several motivations for creating it.

Fashion-MNIST contains 10 classes of clothing, by analogy with the 10 digits in MNIST. A detailed description of the dataset follows.

--------

## Fashion-MNIST description

**Content**

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. The training and test data sets have 785 columns. The first column consists of the class labels (see the Labels list below) and represents the article of clothing. The rest of the columns contain the pixel-values of the associated image.

To locate a pixel on the image, suppose that we have decomposed the pixel index x as x = i * 28 + j, where i and j are integers between 0 and 27. The pixel is located on row i and column j of a 28 x 28 matrix.
For example, x = 31 decomposes as 1 * 28 + 3, so that pixel is in the second row from the top and the fourth column from the left. This indexing is zero-based; the CSV columns printed later in this notebook are named pixel1 through pixel784, so for those names x is the column number minus 1.
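The row and column arithmetic above can be sketched in a few lines. This is an illustrative helper, not part of the original notebook: the name `pixel_to_row_col` is hypothetical, and the sketch assumes the zero-based index x described above and row-major flattening of the 28 x 28 image.

```python
import numpy as np

IMAGE_SIDE = 28  # Fashion-MNIST images are 28 x 28 pixels


def pixel_to_row_col(x, side=IMAGE_SIDE):
    """Decompose a zero-based pixel index x = i * side + j into (row i, column j)."""
    if not 0 <= x < side * side:
        raise ValueError(f"pixel index {x} is outside 0..{side * side - 1}")
    return divmod(x, side)


# x = 31 -> row 1 (second from the top), column 3 (fourth from the left)
row, col = pixel_to_row_col(31)
print(row, col)

# Reshaping a flat vector of 784 values restores the same row/column layout.
flat = np.arange(IMAGE_SIDE * IMAGE_SIDE)
image = flat.reshape(IMAGE_SIDE, IMAGE_SIDE)
print(image[row, col])  # the value stored at flat index 31
```

If the column names are one-based, subtract 1 from the column number before calling the helper.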


**Labels**

Each training and test example is assigned to one of the following labels:

0. T-shirt/top
1. Trouser
2. Pullover
3. Dress
4. Coat
5. Sandal
6. Shirt
7. Sneaker
8. Bag
9. Ankle boot
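For readable plots and reports it can help to keep this list as a lookup table. The sketch below simply restates the labels listed above; the names `CLASS_NAMES` and `label_name` are hypothetical and do not appear in the notebook.

```python
# Label ids and names exactly as listed above.
CLASS_NAMES = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot",
}


def label_name(label_id):
    """Return the clothing class name for an integer label (0..9)."""
    return CLASS_NAMES[int(label_id)]


print(label_name(9))
```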


**TL;DR**

- Each row is a separate image.
- Column 1 is the class label.
- Remaining columns are pixel numbers (784 total).
- Each value is the darkness of the pixel (0 to 255).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

dir_path = "/content/drive/My Drive/Colab Notebooks/Part-5/data/"

Mounted at /content/drive


## 1. Loading the source data and preparing it for classification

In [None]:
#import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf

#%matplotlib inline
np.set_printoptions(precision=3)
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (8, 6)

In [None]:
train = pd.read_csv(dir_path+"fashion-mnist_train.csv", sep=',', header= 0)
# train = pd.read_csv("./data/fashion-mnist_train.csv", sep=',')
train.shape

(60000, 785)

In [None]:
test = pd.read_csv(dir_path+"fashion-mnist_test.csv", sep=',', header= 0)
# test = pd.read_csv("./data/fashion-mnist_test.csv", sep=',', header= 0)
test.shape

(10000, 785)

In [None]:
# Preview the test features without the label column.
# drop() returns a new DataFrame; `test` itself is left unchanged.
test.drop(labels="label", axis=1)

## 2. Preparing the data for processing
----------

In [None]:
# split labels and images in the training set
# X_train = train.drop(labels=0, axis=1)
X_train = train.drop(labels="label", axis=1)
y_train = train.iloc[:, 0]
X_train.shape, y_train.shape

((60000, 784), (60000,))

In [None]:
X_train

Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,5,0,0,...,0,0,0,30,43,0,0,0,0,0
3,0,0,0,1,2,0,0,0,0,0,...,3,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59995,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
59996,0,0,0,0,0,0,0,0,0,0,...,73,0,0,0,0,0,0,0,0,0
59997,0,0,0,0,0,0,0,0,0,0,...,160,162,163,135,94,0,0,0,0,0
59998,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [None]:
# split labels and images in the test set
# X_test = test.drop(labels=0, axis=1)
X_test = test.drop(labels="label", axis=1)
y_test = test.iloc[:, 0]
X_test.shape, y_test.shape

((10000, 784), (10000,))

In [None]:
X_test

Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0,0,0,0,0,0,0,9,8,0,...,103,87,56,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,34,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,14,53,99,17,...,0,0,0,0,63,53,31,0,0,0
3,0,0,0,0,0,0,0,0,0,161,...,137,126,140,0,133,224,222,56,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,0,0,0,0,0,0,0,0,0,37,...,32,23,14,20,0,0,1,0,0,0
9996,0,0,0,0,0,0,0,0,0,0,...,0,0,0,2,52,23,28,0,0,0
9997,0,0,0,0,0,0,0,0,0,0,...,175,172,172,182,199,222,42,0,1,0
9998,0,1,3,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0


In [None]:
# data normalization (the "YOUR CODE HERE" part of the assignment):
# scale pixel values from [0, 255] to [0, 1]

x_train = X_train.astype('float32')
x_test = X_test.astype('float32')

x_train /= 255
x_test /= 255

print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train

60000 train samples
10000 test samples


Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.019608,0.0,0.0,...,0.000000,0.000000,0.000000,0.117647,0.168627,0.000000,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.003922,0.007843,0.0,0.0,0.000000,0.0,0.0,...,0.011765,0.000000,0.000000,0.000000,0.000000,0.003922,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59995,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0
59996,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.286275,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0
59997,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.627451,0.635294,0.639216,0.529412,0.368627,0.000000,0.0,0.0,0.0,0.0
59998,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0


In [None]:
x_train.head(3)

Unnamed: 0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,pixel10,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,...,0.0,0.0,0.0,0.117647,0.168627,0.0,0.0,0.0,0.0,0.0
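The division by 255 above maps integer pixel values into the range [0, 1]. A minimal NumPy sketch of the same step on a tiny array: the values 5, 30 and 43 are taken from row 2 of the raw table shown earlier, while 0 and 255 are added here as the range endpoints. The array name `raw` is an assumption made for the illustration.

```python
import numpy as np

# A few raw pixel values: 5, 30, 43 come from the printed table, 0 and 255 are the endpoints.
raw = np.array([0, 5, 30, 43, 255], dtype=np.uint8)

# Cast to float32 before dividing so the result stays float32
# (dividing a uint8 array by 255 directly would produce float64).
scaled = raw.astype("float32") / 255

print(scaled.dtype, scaled.min(), scaled.max())
print(np.round(scaled, 6))
```

The scaled values for 5, 30 and 43 agree with the 0.019608, 0.117647 and 0.168627 entries in the normalized table above.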




------

In [None]:
# one-hot encode the labels for the classification task
num_classes = 10
print(y_train[0])

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)


print(y_train[0])

2
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]


In [None]:
y_train[0]

array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
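To make the encoding concrete, here is a small NumPy sketch that produces the same kind of one-hot rows and converts them back to integer labels. It is an illustration only, not the Keras implementation; the function name `one_hot` is hypothetical, and apart from the first label (2, as printed above) the sample labels are made up.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode a 1-D array of integer labels as rows with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


labels = np.array([2, 9, 0])  # 2 matches y_train[0] above; 9 and 0 are made up
encoded = one_hot(labels, num_classes=10)
print(encoded[0])

# argmax along the class axis recovers the original integer labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```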

## 2. Building a fully connected neural network
-------------

### **Designing the network**

We define a neural network made of fully connected layers.

--------

In [None]:
from tensorflow.keras.layers import Dense, Dropout

In [None]:
# v1.0
# initialize the network model
model = tf.keras.models.Sequential()

# Build a network of 2 fully connected layers (plus the input layer)
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

In [None]:
# v2.0 - with dropout and 512 neurons in the first layer
# initialize the network model
model = tf.keras.models.Sequential()

# Build a network of 2 fully connected layers (plus the input layer)
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.25))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 512)               401920    
                                                                 
 dropout_3 (Dropout)         (None, 512)               0         
                                                                 
 dense_9 (Dense)             (None, 10)                5130      
                                                                 
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
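The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout has no parameters. A short sketch of that arithmetic follows; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


hidden = dense_params(784, 512)  # first Dense layer: 784 pixels -> 512 units
output = dense_params(512, 10)   # output Dense layer: 512 units -> 10 classes

print(hidden, output, hidden + output)
```

The three numbers match the 401920, 5130 and 407,050 reported in the summary above.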


In [None]:
model.summary()
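The `Dropout(0.25)` layer in the v2.0 model randomly zeroes a quarter of the hidden activations during training and rescales the rest so the expected activation is unchanged. The NumPy sketch below illustrates that idea (often called inverted dropout). It is not the Keras implementation, and the function name and the all-ones input are assumptions made for the example.

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`, scale survivors by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones((1000, 512), dtype=np.float32)  # made-up hidden activations
dropped = inverted_dropout(activations, rate=0.25, rng=rng)

print((dropped == 0).mean())  # fraction of zeroed units, close to 0.25
print(dropped.mean())         # mean activation stays close to 1.0
```

At inference time dropout is switched off, so no masking or scaling is applied.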

In [None]:
# Compile the model for training
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)
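The `categorical_crossentropy` loss compares the one-hot target with the softmax output: for each sample it is the negative log of the probability assigned to the true class. The NumPy sketch below shows the computation on made-up scores for three classes (the notebook itself uses ten). The function names and the `eps` clipping constant are assumptions for the illustration, not the Keras internals.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob)); eps guards against log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())


logits = np.array([[2.0, 0.5, 0.1], [0.2, 0.2, 3.0]])  # made-up scores
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # one-hot targets

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(loss)
```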

### **Training the network and analyzing its accuracy**

We set the training parameters and train the network.

--------

In [None]:
model.reset_metrics()
# train the model
batch_size = 256
epochs = 20

_ = model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test) 
)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
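For a sense of scale, the number of weight updates follows directly from the settings above. This is plain arithmetic on the values used in the notebook (60000 training samples, `batch_size = 256`, `epochs = 20`), assuming the final partial batch of each epoch is kept rather than dropped.

```python
import math

n_train = 60000   # training samples
batch_size = 256
epochs = 20

# Batches per epoch, counting the final partial batch.
steps_per_epoch = math.ceil(n_train / batch_size)
# Size of that final batch.
last_batch = n_train - (steps_per_epoch - 1) * batch_size
# Total number of gradient updates over the whole run.
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)
```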


At the 17th epoch (this was the second training session), overfitting set in: accuracy on the test set began to drop and fluctuate while accuracy on the training set kept rising. There is therefore no point in training any further.
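This stopping decision can be automated. Keras provides a callback for it (`tf.keras.callbacks.EarlyStopping`). The plain-Python sketch below only mimics the underlying idea and is not that callback's implementation. The function name is hypothetical, and the validation accuracies are made up for illustration; they are not the results of this notebook.

```python
def best_epoch_with_patience(val_scores, patience):
    """Return (stop_epoch, best_epoch), both one-based.

    Training stops once `patience` epochs have passed without a new best score.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_scores), best_epoch


# Made-up validation accuracies, for illustration only.
val_acc = [0.84, 0.86, 0.88, 0.89, 0.895, 0.893, 0.894, 0.891]
stop_epoch, best_epoch = best_epoch_with_patience(val_acc, patience=3)
print(stop_epoch, best_epoch)
```

In this toy history the best score occurs at epoch 5, and training would stop three epochs later.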

-----------

In [None]:
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

Test loss: 0.26948121190071106
Test accuracy: 0.9003999829292297
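Overall accuracy hides which classes are confused with one another. A possible next step is a confusion matrix and per-class accuracy. The sketch below uses NumPy on made-up one-hot targets and predicted probabilities for three classes. In the notebook the probabilities would presumably come from `model.predict(x_test)`; the function name `per_class_accuracy` is hypothetical.

```python
import numpy as np


def per_class_accuracy(y_true_onehot, y_prob, num_classes):
    """Return (confusion matrix, per-class accuracy); rows are true classes."""
    true_ids = y_true_onehot.argmax(axis=1)
    pred_ids = y_prob.argmax(axis=1)
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(confusion, (true_ids, pred_ids), 1)  # count each (true, predicted) pair
    totals = confusion.sum(axis=1)
    accuracy = np.divide(
        confusion.diagonal(), totals,
        out=np.zeros(num_classes), where=totals > 0,
    )
    return confusion, accuracy


# Made-up example: 4 samples, 3 classes.
y_true = np.eye(3)[[0, 0, 1, 2]]
y_prob = np.array([
    [0.7, 0.2, 0.1],   # predicted 0 (correct)
    [0.1, 0.8, 0.1],   # predicted 1 (true class is 0)
    [0.2, 0.6, 0.2],   # predicted 1 (correct)
    [0.1, 0.1, 0.8],   # predicted 2 (correct)
])
confusion, accuracy = per_class_accuracy(y_true, y_prob, num_classes=3)
print(confusion)
print(accuracy)
```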


**CONCLUSION**:



------------

## 3. Building a convolutional neural network
-------------