![](https://i.ibb.co/3vF9yD8/Screenshot-from-2019-05-29-21-23-47.png)

**MNIST ("Modified National Institute of Standards and Technology")** is the de facto “hello world” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.


This notebook builds two baselines on a GPU: a CNN in Keras and a plain DNN in TensorFlow (v1 API). Feel free to experiment by adding more layers or changing the learning rate. Thank you for reading.


In [None]:
import pandas as pd
import numpy as np


from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, AvgPool2D, BatchNormalization, Dropout
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
from datetime import datetime

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()


from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")
sns.set_style('darkgrid')
%matplotlib inline
np.random.seed(42)

In [None]:
device_name = tf.test.gpu_device_name()
# Report the device only when one was actually found
if "GPU" not in device_name:
    print("GPU device not found")
else:
    print('Found GPU at: {}'.format(device_name))

In [None]:
train = pd.read_csv('../input/digit-recognizer/train.csv')

In [None]:
train.head()

In [None]:
test = pd.read_csv('../input/digit-recognizer/test.csv')

In [None]:
test.head()

In [None]:
features = train.drop('label', axis=1)

In [None]:
target = train['label']

In [None]:
X_ = np.array(features)

In [None]:
X_test = np.array(test)

In [None]:
X_train = X_.reshape(X_.shape[0], 28, 28)

In [None]:
X_train.shape
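
To make the reshape concrete, here is a minimal sketch on made-up data (the array `flat_rows` is a hypothetical stand-in for the pixel table, not the real dataset). It assumes NumPy's default row-major order, in which each run of 28 consecutive values becomes one image row.

```python
import numpy as np

# Hypothetical stand-in for two rows of the pixel table: 784 values per image.
flat_rows = np.arange(2 * 784).reshape(2, 784)

# Row-major reshape: every 28 consecutive values form one row of the image.
images = flat_rows.reshape(flat_rows.shape[0], 28, 28)

# Pixel (row r, column c) of image n comes from flat position r * 28 + c.
n, r, c = 1, 3, 5
print(images.shape)
print(images[n, r, c] == flat_rows[n, r * 28 + c])
```

The same indexing rule is what lets `plt.imshow` show a digit instead of a 784-pixel strip.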

Let us take a look at our objects

In [None]:
fig = plt.figure(figsize=(10,5))

for i in range(16):
    fig.add_subplot(4, 4, i+1)
    
    plt.imshow(X_train[i], cmap='gray')
    
    plt.xticks([])
    plt.yticks([])
    plt.tight_layout()
    plt.title('Digit: ' + str(target[i]))


Now we check how the labels are distributed and how many distinct classes there are. That count sets the size of our output layer

In [None]:
target.value_counts(normalize=True)

In [None]:
len(target.value_counts())
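
The same two checks can be sketched without pandas. The label array `demo_labels` below is invented for illustration; only the idea (class shares, then the number of distinct classes) mirrors the cells above.

```python
import numpy as np

# Invented labels, only to illustrate the two checks.
demo_labels = np.array([1, 0, 1, 4, 0, 0, 7, 3, 5, 3, 8, 9, 2, 6])

# Distinct classes and how often each occurs.
classes, counts = np.unique(demo_labels, return_counts=True)

# Share of each class, comparable to value_counts(normalize=True).
shares = counts / counts.sum()

# The number of distinct classes is the width of the softmax output layer.
n_classes = len(classes)
print(dict(zip(classes.tolist(), shares.round(3).tolist())))
print(n_classes)
```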

## CNN


We have to reshape our data

In [None]:
X_train = X_.reshape(X_.shape[0], 28, 28, 1)
X_test_reshape = X_test.reshape(X_test.shape[0], 28, 28, 1)
X_tr, X_val, y_tr, y_val = train_test_split(X_train, target, test_size=0.25, 
                                            random_state=42)
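
For intuition, a shuffled hold-out split can be sketched in plain NumPy. The helper name `holdout_split` is hypothetical and the data is synthetic. It mimics the idea behind `train_test_split`, but it will not reproduce the exact rows that scikit-learn selects.

```python
import numpy as np

def holdout_split(X, y, test_size=0.25, seed=42):
    """Shuffle row indices once, then cut them into validation and train parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(np.ceil(test_size * len(X)))
    val_idx, tr_idx = idx[:n_val], idx[n_val:]
    return X[tr_idx], X[val_idx], y[tr_idx], y[val_idx]

# Synthetic images with a trailing channel axis, as the CNN expects.
X_demo = np.zeros((100, 28, 28, 1), dtype=np.float32)
y_demo = np.arange(100) % 10

X_tr_d, X_val_d, y_tr_d, y_val_d = holdout_split(X_demo, y_demo)
print(X_tr_d.shape, X_val_d.shape)
```

Images and labels are indexed with the same permutation, so each image keeps its label.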

Let us build our CNN. We do not need millions of parameters

In [None]:
model = Sequential()


model.add(Conv2D(filters = 64, 
                 kernel_size = (5,5), 
                 padding = 'same', 
                 activation ='elu', 
                 input_shape = (28,28,1)))

model.add(BatchNormalization())
model.add(Dropout(0.3))

# input_shape is only needed on the first layer, so it is omitted here
model.add(Conv2D(filters = 64, 
                 kernel_size = (5,5), 
                 padding = 'same', 
                 activation ='elu'))

model.add(BatchNormalization())
model.add(Dropout(0.3))

model.add(AvgPool2D(pool_size=(2,2)))

model.add(Conv2D(filters = 64, 
                 kernel_size = (5,5), 
                 padding = 'valid', 
                 activation ='elu'))

model.add(BatchNormalization())
model.add(Dropout(0.3))

model.add(Conv2D(filters = 32, 
                 kernel_size = (3,3), 
                 padding = 'valid', 
                 activation ='elu'))

model.add(BatchNormalization())
model.add(Dropout(0.3))

model.add(AvgPool2D(pool_size=(2,2)))

model.add(BatchNormalization())
model.add(Dropout(0.3))

model.add(Flatten())

model.add(Dense(300, activation = "elu"))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(200, activation = "elu"))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(100, activation = "elu"))
model.add(BatchNormalization())
model.add(Dropout(0.3))

model.add(Dense(10, activation = "softmax"))


model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', 
              metrics=['acc'])

In [None]:
model.summary()
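
As a cross-check on the summary, the parameter count can be worked out by hand. The sketch below assumes the layer list exactly as written above and the usual Keras conventions: one bias per filter or unit, and four values per channel in `BatchNormalization`, two of them non-trainable. The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(n_in, n_out):
    # One weight per input per unit, plus one bias per unit.
    return (n_in + 1) * n_out

def batchnorm_params(channels):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable).
    return 4 * channels

# Spatial size: 28 -> 28 -> 28 (same) -> 14 (pool) -> 10 -> 8 (valid) -> 4 (pool)
flat_size = 4 * 4 * 32

layer_params = [
    conv2d_params(5, 1, 64), batchnorm_params(64),
    conv2d_params(5, 64, 64), batchnorm_params(64),
    conv2d_params(5, 64, 64), batchnorm_params(64),
    conv2d_params(3, 64, 32), batchnorm_params(32),
    batchnorm_params(32),  # the extra BatchNormalization after the second pooling
    dense_params(flat_size, 300), batchnorm_params(300),
    dense_params(300, 200), batchnorm_params(200),
    dense_params(200, 100), batchnorm_params(100),
    dense_params(100, 10),
]
total_params = sum(layer_params)
print(total_params)  # 463690
```

Under these assumptions the network has fewer than half a million parameters, and the two 5x5 convolutions with 64 input channels plus the first dense layer account for most of them.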

In [None]:
plot_model(model, show_shapes=True, show_layer_names=False)

In [None]:
early_stopping = EarlyStopping(
    min_delta=0.0002,
    mode='min', 
    patience=20,
    restore_best_weights=True,
)
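
The stopping rule can be sketched in plain Python. This is a simplified model of the callback's patience logic for `mode='min'`, not the Keras implementation itself. The function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, min_delta=0.0002, patience=20):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    An epoch counts as an improvement only if it beats the best loss so far
    by more than min_delta; training stops after `patience` epochs in a row
    without such an improvement.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch  # the rule never triggered

# Made-up losses; a short patience keeps the example small.
demo_losses = [0.5, 0.4, 0.39995, 0.41, 0.42]
print(early_stopping_epoch(demo_losses, patience=2))  # (3, 1)
```

The third value is lower than the second, but by less than `min_delta`, so it does not reset the counter. With `restore_best_weights=True`, the weights from the best epoch are the ones kept.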

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
history = model.fit(X_tr, y_tr, 
          validation_data=(X_val, y_val),
          verbose=1, epochs=75, batch_size=16,
          callbacks=[early_stopping])

history_df = pd.DataFrame(history.history)
history_df.loc[:, ['loss', 'val_loss']].plot()
print("Minimum Validation Loss: {:0.4f}".format(history_df['val_loss'].min()))

In [None]:
X_test_reshape.shape

In [None]:
preds = np.argmax(model.predict(X_test_reshape), axis=1)

In [None]:
submission = pd.read_csv('../input/digit-recognizer/sample_submission.csv')

In [None]:
submission.shape

In [None]:
submission['Label'] = preds
submission.to_csv('my_submission_keras.csv',index=False)


submission.head()

## TensorFlow

Let us solve the same task with TensorFlow alone. This is just for practice, and I want to share some hints along the way.  

Thanks to my favourite book, Hands-On Machine Learning with Scikit-Learn and TensorFlow. I love this book

We specify the number of inputs, the number of neurons in each hidden layer, and the number of outputs

In [None]:
tf.reset_default_graph()

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
INPUTS = 28 * 28 # MNIST size

HIDDEN_1 = 300
HIDDEN_2 = 100
OUTPUTS = 10

Create placeholder nodes

In [None]:
X = tf.placeholder(tf.float32, shape=(None, INPUTS), name='X')
y = tf.placeholder(tf.int32, shape=(None,), name='y')  # 1-D vector of class ids

Now we can create layers

In [None]:
with tf.name_scope('DNN'):
  hidden1 = tf.layers.dense(X, HIDDEN_1, name='hidden_1', 
                    activation = tf.nn.leaky_relu)
  hidden2 = tf.layers.dense(hidden1, HIDDEN_2, name='hidden_2', 
                    activation = tf.nn.leaky_relu)
  logits = tf.layers.dense(hidden2, OUTPUTS, name='outputs')

`logits` is the raw output of the last layer, before it is passed through softmax
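
A small NumPy sketch shows how logits relate to softmax probabilities. The values are invented, and the helper `softmax` is written here only for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

demo_logits = np.array([[2.0, 1.0, 0.1],
                        [1000.0, 1001.0, 999.0]])  # large values stay finite
demo_probs = softmax(demo_logits)

print(demo_probs.sum(axis=1))          # each row sums to 1
print(np.argmax(demo_logits, axis=1))  # predicted class from logits
print(np.argmax(demo_probs, axis=1))   # same class from probabilities
```

Softmax preserves the ordering within each row, so taking `argmax` of the logits gives the same prediction as taking it of the probabilities.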

In [None]:
with tf.name_scope('loss_func'):
  cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, 
                                                              logits=logits)
  loss = tf.reduce_mean(tf.cast(cross_entr, tf.float32))
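
For reference, the quantity this loss computes can be sketched in NumPy. For each sample it is the negative log of the softmax probability assigned to the true class, and the final loss is the mean over the batch. The function name and the inputs below are illustrative and are not part of the TensorFlow API.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Per-sample loss: -log softmax(logits)[label], via a stable log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

# With all-zero logits every class gets probability 1/10, so the loss is log(10).
uniform_logits = np.zeros((4, 10))
demo_labels = np.array([3, 1, 4, 1])
per_sample = sparse_softmax_cross_entropy(uniform_logits, demo_labels)
mean_loss = per_sample.mean()
print(mean_loss, np.log(10))
```

"Sparse" means the labels are integer class ids rather than one-hot vectors, which matches the `int32` placeholder `y`.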

In [None]:
with tf.name_scope('train'):
  optimizer = tf.train.RMSPropOptimizer(learning_rate=0.01)
  training = optimizer.minimize(loss)

In [None]:
with tf.name_scope('eval'):
  valid = tf.nn.in_top_k(logits, y, 1)
  accuracy_score = tf.reduce_mean(tf.cast(valid, tf.float32))
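
With `k = 1`, the check reduces to asking whether the largest logit belongs to the true class. A NumPy sketch of that idea, on invented numbers and ignoring how ties are handled:

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of rows whose largest logit sits at the true class index."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

demo_logits = np.array([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 3.0],
                        [0.9, 0.8, 0.7]])
demo_labels = np.array([1, 0, 2, 1])

print(top1_accuracy(demo_logits, demo_labels))  # 0.75
```

Three of the four rows peak at the labelled class, so the accuracy is 0.75.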

Training (with GPU). We use only 5 epochs to get a result faster

In [None]:
epochs = 5
batch_size = 50

In [None]:
X_train = features.to_numpy().astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = test.to_numpy().astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = target.to_numpy().astype(np.int32)
X_train.shape, y_train.shape

In [None]:
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]

In [None]:
X_valid.shape

This function will help us make batches for training

In [None]:
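def shuffle(X, y, batch_size):
    # Shuffle all sample indices once per epoch
    idx = np.random.permutation(len(X))
    rnd_batches = len(X) // batch_size
    for batch_idx in np.array_split(idx, rnd_batches):
        # Index with batch_idx (not idx) so each batch holds only its own samples
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch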
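
A standalone sketch of the mini-batch idea, on toy data, checks that one pass over the generator visits every sample exactly once. The name `iter_minibatches` and the explicit `rng` argument are choices made for this illustration.

```python
import numpy as np

def iter_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches that together cover the data exactly once."""
    idx = rng.permutation(len(X))
    n_batches = max(1, len(X) // batch_size)
    for batch_idx in np.array_split(idx, n_batches):
        yield X[batch_idx], y[batch_idx]

rng = np.random.default_rng(0)
X_toy = np.arange(23).reshape(23, 1)
y_toy = np.arange(23)

toy_batches = list(iter_minibatches(X_toy, y_toy, 5, rng))
batch_sizes = [len(yb) for _, yb in toy_batches]
seen = np.sort(np.concatenate([yb for _, yb in toy_batches]))

print(batch_sizes)                             # [6, 6, 6, 5]
print(np.array_equal(seen, np.arange(23)))     # every sample appears once
```

Because `np.array_split` spreads the remainder over the first batches, some batches are one sample larger than `batch_size` when the data does not divide evenly.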

In [None]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

To make sure we use the GPU

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

In [None]:
%%time

now = datetime.utcnow().strftime('%Y%m%d %H%M%S')  # %Y gives the four-digit year
root_logdir = 'tf_logs'
logdir = '{}/run-{}/'.format(root_logdir, now)

tf.debugging.set_log_device_placement(True)
file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

with tf.Session() as sess:

  init.run()

  for epoch in range(epochs):

      for X_batch, y_batch in shuffle(X_train, y_train, batch_size):
        sess.run(training, feed_dict={X: X_batch, y: y_batch})

      if epoch % 1 == 0:

        acc_train = accuracy_score.eval(feed_dict={X: X_batch, y: y_batch})
        acc_valid = accuracy_score.eval(feed_dict={X: X_valid, y: y_valid})

        print(epoch, 'Accuracy on training', acc_train, 
                     'Accuracy on validation', acc_valid)

  save_path = saver.save(sess, './model.ckpt')

Results from Google Colab:

![](https://i.ibb.co/4s4K6Xf/2021-09-03-23-04-10.png)

In [None]:
file_writer.close()

This is for running TensorBoard, but support is currently disabled

In [None]:
# %load_ext tensorboard
# %tensorboard --logdir tf_logs --bind_all

Restore our model and predict on the test set

In [None]:
with tf.Session() as sess:
    saver.restore(sess, './model.ckpt')
    Z = logits.eval(feed_dict={X: X_test})
    y_pred = np.argmax(Z, axis=1)

In [None]:
submission = pd.read_csv('../input/digit-recognizer/sample_submission.csv')

In [None]:
submission['Label'] = y_pred
submission.to_csv('my_submission_tensorflow.csv',index=False)


submission.head()

Thank you for reading. Good luck with learning. You can add some layers to the CNN, which may improve the score.