# Introduction


For this project, we need to classify images of handwritten digits in Kannada, a language spoken predominantly in the state of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written in the Kannada script (see Wikipedia).

The dataset follows the MNIST format: each image is stored as one row of 784 pixel values (28x28).

Packages and Libraries:

For this project, we use the tensorflow, keras, and scikit-learn libraries for machine learning, and NumPy and pandas for scientific computing and data manipulation. Seaborn and Matplotlib are used for visualization and exploratory data analysis (EDA).



* tensorflow==2.2.0-rc2
* keras==2.3.0
* scikit-learn==0.22.1
* numpy==1.18.2
* pandas==0.25.3
* seaborn==0.10.0
* matplotlib==3.2.1

# Exploratory Data Analysis

In [0]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import sklearn
import tensorflow as tf


from sklearn import cluster
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import scale
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression
from pandas.plotting import scatter_matrix
from tensorflow.keras.preprocessing.image import ImageDataGenerator



from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score



# load training & test datasets
train = pd.read_csv('../input/Kannada-MNIST/train.csv')
test = pd.read_csv('../input/Kannada-MNIST/test.csv')
train.columns

In [0]:
train.head()

In [0]:
test.head()

In [0]:
# Shape and value range of the pixel data for one sample row
sample_pixels = train.loc[2, 'pixel0':]
print('Shape of the data is:', sample_pixels.shape)
print('Max pixel value:', sample_pixels.max())
print('Min pixel value:', sample_pixels.min())

The training set has a `label` column holding the digit class, followed by the pixel data.
There are 784 pixel columns (28*28), starting at `pixel0`, with values ranging from 0 to 255.
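
To make the row layout concrete, here is a minimal sketch of how one flat row maps onto a 28x28 image. The row below is randomly generated stand-in data, not a real Kannada-MNIST sample, and the row-major ordering is the one `reshape` assumes.

```python
import numpy as np

# Hypothetical stand-in for one row of train.csv: a label followed by 784 pixel values
rng = np.random.default_rng(0)
row = np.concatenate([[3], rng.integers(0, 256, size=784)])

label = int(row[0])
pixels = row[1:]                  # flat vector of 784 values
image = pixels.reshape(28, 28)    # row-major: flat index k lands at (k // 28, k % 28)

# The flat index and the 2D position address the same value
k = 100
same_value = image[k // 28, k % 28] == pixels[k]
```

This is the same reshape the visualization cells apply to `train.loc[i, 'pixel0':].values`.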

### Visualizing Kannada digits

In [0]:
plt.figure()
plt.imshow(train.loc[2,'pixel0':].values.reshape((28,28)))
plt.colorbar()
plt.grid(False)

In [0]:
plt.figure(figsize=(10,10))
for i in range(25):
	plt.subplot(5,5,i+1)
	plt.xticks([])
	plt.yticks([])
	plt.grid(False)
	plt.imshow(train.loc[i,'pixel0':].values.reshape((28,28)) ,  cmap=plt.cm.binary)
	plt.xlabel(train.loc[i,'label'])

### Visualizing Embeddings with t-SNE

t-SNE is a dimensionality reduction algorithm that is often used for visualization. It learns a mapping from a set of high-dimensional vectors to a space with fewer dimensions (usually 2) that, ideally, keeps points that are close together in the high-dimensional space close together in the embedding.
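
The following is a simplified sketch of the quantity t-SNE tries to minimize: the KL divergence between neighbor affinities in the original space and in the embedding. It is not scikit-learn's implementation. It assumes a single fixed Gaussian bandwidth `sigma` instead of the per-point, perplexity-calibrated bandwidths that t-SNE uses, and it runs on toy data. All function names are illustrative.

```python
import numpy as np

def high_dim_affinities(X, sigma=1.0):
    """Gaussian affinities between rows of X, normalised to sum to 1 (fixed bandwidth)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)      # a point is not its own neighbour
    return P / P.sum()

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) affinities between rows of Y."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    Q = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q): lower means the embedding preserves the neighbourhoods better."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

rng = np.random.default_rng(0)
# Toy data: two well-separated clusters in 50 dimensions
X = np.vstack([rng.normal(0.0, 1.0, (20, 50)), rng.normal(8.0, 1.0, (20, 50))])
P = high_dim_affinities(X, sigma=5.0)

# One embedding keeps the two clusters apart, the other scatters points at random
good_Y = np.vstack([rng.normal(-5.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
bad_Y = rng.normal(0.0, 5.0, (40, 2))

kl_good = kl_divergence(P, low_dim_affinities(good_Y))
kl_bad = kl_divergence(P, low_dim_affinities(bad_Y))
print(f"KL for cluster-preserving embedding: {kl_good:.3f}")
print(f"KL for random embedding:             {kl_bad:.3f}")
```

The actual algorithm starts from an initial embedding and moves the low-dimensional points by gradient descent to reduce this divergence.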

In [0]:
from sklearn.manifold import TSNE

# Sample from the training set
sample_size = 8000

np.random.seed(2018)
# Draw row indices from the actual number of training rows rather than a hard-coded count
idx = np.random.choice(len(train), size=sample_size, replace=False)
train_sample = train.loc[idx,'pixel0':]
label_sample = train.loc[idx,'label']

# Generate 2D embedding with TSNE
embeddings = TSNE(verbose=2).fit_transform(train_sample)

In [0]:

# Visualize TSNE embedding
vis_x = embeddings[:, 0]
vis_y = embeddings[:, 1]

plt.figure(figsize=(10,7))
plt.scatter(vis_x, vis_y, c=label_sample, cmap=plt.cm.get_cmap("jet", 10), marker='.')
plt.colorbar(ticks=range(10))
plt.clim(-0.5, 9.5)
plt.show()

In [0]:
y_train = train['label']

y_train

In [0]:
train['label'].value_counts()

In [0]:
# Pass the labels by keyword so the call does not depend on positional-argument order
g = sns.countplot(x=y_train)

y_train.value_counts()

**Classification Problem**

* For this problem, we decided to use a Convolutional Neural Network (CNN). CNNs are a class of deep neural networks with many proven applications in analyzing visual imagery, particularly image classification. The name indicates that the network uses convolution, a specialized kind of linear operation, instead of general matrix multiplication in at least one of its layers.
* A neural network is classified as convolutional if it has at least one convolutional layer. A typical architecture includes convolutional layers, pooling layers, fully connected layers and normalization layers.
* The convolutional filters extract features from the image, and the fully connected layers classify the image according to the extracted features.
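
To illustrate what a single convolutional filter computes, here is a minimal NumPy sketch. It is a naive loop-based version for one channel and one filter, not how TensorFlow implements `Conv2D`, and the edge-detecting kernel is hand-picked for the example (a trained network learns its kernels).

```python
import numpy as np

def conv2d(image, kernel, padding="valid"):
    """Slide `kernel` over `image` (cross-correlation, as deep learning libraries do)."""
    kh, kw = kernel.shape
    if padding == "same":
        # Zero-pad so the output keeps the input's height and width (odd kernel sizes)
        image = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 28x28 image with a vertical edge: left half dark, right half bright
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# A 3x3 filter that responds to left-to-right increases in brightness
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

valid = conv2d(image, vertical_edge, padding="valid")  # output shrinks to 26x26
same = conv2d(image, vertical_edge, padding="same")    # output stays 28x28
print(valid.shape, same.shape)
```

The filter output is zero in the flat regions and non-zero only around the edge, which is the sense in which filters "extract features". The two padding modes correspond to the `padding` options explored later in the notebook.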

In [0]:
import pandas as pd
import numpy as np
import seaborn as sns
%matplotlib inline

import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical

np.random.seed(2)

from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import confusion_matrix
import itertools

sns.set(style='white', context='notebook', palette='deep')

In [0]:
train

In [0]:
test

In [0]:
# Separating the features and the target for the training data
train_data = train.loc[:,'pixel0':]
train_label = train.loc[:,'label']
print(f"train_data shape :{train_data.shape}")
print(f"train_label shape :{train_label.shape}")

In [0]:
# Normalize pixel values from [0, 255] to [0, 1]
X = train_data.values / 255.0
X_test = test.loc[:,'pixel0':].values / 255.0
y = train_label

In [0]:
X_train,X_val,y_train,y_val = train_test_split(X,y,test_size = 0.1)

In [0]:
X_train.shape

In [0]:
#input reshape
input_shape = (-1,28,28,1)
X_train = X_train.reshape(input_shape)
X_val = X_val.reshape(input_shape)

In [0]:
X_test = X_test.reshape(-1,28,28,1)
# result = model.predict_classes(X_test)

In [0]:
#Now let us encode our labels
y_train = to_categorical(y_train)
y_val = to_categorical(y_val)

In [0]:
# The labels are now one-hot encoded: one column per digit class
print(y_train.shape)
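
For reference, the one-hot encoding produced by `to_categorical` can be sketched in plain NumPy as follows. The `one_hot` helper and the demo labels are illustrative and not part of the notebook.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an array of shape (len(labels), num_classes) with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y_demo = [0, 3, 9]                         # hypothetical digit labels
y_encoded = one_hot(y_demo, num_classes=10)

# argmax along the class axis recovers the original integer labels
decoded = np.argmax(y_encoded, axis=-1)
print(y_encoded.shape, decoded)
```

The `argmax` step is the same one used later to turn `y_val` back into class indices for the confusion matrix.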

**Building the Model**

We followed an experimental approach to this problem: we tried different numbers of filters in different layers of the network (the `filter1_size` and `filter2_size` arguments) and several combinations of activations, optimizers and paddings.

In [0]:
def DL_Model(filter1_size=64, filter2_size=32 , activation='relu', optimizer='Adam', padding='Same'):
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Conv2D(filter1_size,(3,3),padding = padding,activation = activation,
                                  input_shape = (28,28,1)))
  model.add(tf.keras.layers.Conv2D(filter1_size,(3,3),padding = padding,activation=activation))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
  model.add(tf.keras.layers.Conv2D(filter2_size,(3,3),padding = padding,activation=activation))
  model.add(tf.keras.layers.Conv2D(filter2_size,(3,3),padding = padding,activation=activation))
  model.add(tf.keras.layers.Dropout(0.25))
  model.add(tf.keras.layers.MaxPool2D(pool_size=(2,2)))
  model.add(tf.keras.layers.Flatten())
  model.add(tf.keras.layers.Dense(128,activation=activation))
  model.add(tf.keras.layers.Dropout(0.25))
  model.add(tf.keras.layers.Dense(256,activation=activation))
  model.add(tf.keras.layers.Dropout(0.25))
  model.add(tf.keras.layers.Dense(10,activation='softmax'))
  model.compile(optimizer = optimizer , loss = "categorical_crossentropy", metrics=["accuracy"])
  return model

In [0]:
# activation = ['relu','softmax', 'tanh', 'sigmoid', 'linear']
# padding = ['Valid','Same']
# optimizer = ['Adam', 'SGD',  'Adamax','RMSprop']

filter1_size = [64]
filter2_size = [64]
activation = ['relu','tanh','sigmoid']
padding = ['Same']
optimizer = ['Adam','RMSprop']

GridSearch was used to find the combination of hyperparameters that gave the best accuracy on the training data. Each model was trained for 10 epochs using a batch size of 128.
The search reports the combination of parameters with the best score. Candidates are ranked by their mean cross-validated accuracy, which is the default score of a `KerasClassifier`.
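
As a rough illustration of the size of this search, the sketch below enumerates the grid with the standard library only. The grid values are the ones defined in the cell above. The fold count of 5 is an assumption based on the default `cv` of `GridSearchCV` in scikit-learn 0.22.

```python
from itertools import product

param_grid = {
    "filter1_size": [64],
    "filter2_size": [64],
    "activation": ["relu", "tanh", "sigmoid"],
    "padding": ["Same"],
    "optimizer": ["Adam", "RMSprop"],
}

# Every combination of one value per hyperparameter
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumption: default cross-validation folds in scikit-learn 0.22
total_fits = len(combinations) * n_folds

for combo in combinations:
    print(combo)
print(f"{len(combinations)} combinations x {n_folds} folds = {total_fits} model fits")
```

Because the number of fits grows multiplicatively with each added option, the grid was kept small.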

In [0]:
# param_grid = dict(filter1_size = filter1_size , filter2_size = filter2_size , activation = activation, padding = padding, optimizer = optimizer)

# clf = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn= DL_Model, epochs= 10, batch_size=128, verbose= 3)
# model = GridSearchCV(estimator= clf, param_grid=param_grid, n_jobs=-1, verbose=3)

# epochs = 10
# batch_size = 128
# model.fit(X_train,y_train,
#           validation_data=(X_val,y_val),
#           epochs=epochs,
#           batch_size=batch_size)

# print("Max Accuracy Registered: {} using {}".format(round(model.best_score_,3),
#                                                    model.best_params_))

Using **relu** activation, **64** filters in every convolutional layer, the **Adam** optimizer and **Same** padding gave the highest accuracy within the specified number of epochs. This combination shows potential to produce accurate predictions and was carried forward for further training.
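
As a rough check on the size of the selected configuration, the sketch below counts trainable parameters by hand. It assumes the layer order of `DL_Model`, 3x3 kernels, and 'Same' padding, so that only the two 2x2 pooling layers shrink the feature maps. The helper names are illustrative.

```python
def conv2d_params(in_channels, filters, kernel=3):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """One weight per input-output pair plus one bias per output unit."""
    return in_units * out_units + out_units

filters1, filters2 = 64, 64
side = 28                       # input images are 28x28 with 1 channel

layers = [
    ("conv1", conv2d_params(1, filters1)),
    ("conv2", conv2d_params(filters1, filters1)),
]
side //= 2                      # first max pool: 28 -> 14
layers += [
    ("conv3", conv2d_params(filters1, filters2)),
    ("conv4", conv2d_params(filters2, filters2)),
]
side //= 2                      # second max pool: 14 -> 7
flat = side * side * filters2   # size of the flattened feature vector
layers += [
    ("dense1", dense_params(flat, 128)),
    ("dense2", dense_params(128, 256)),
    ("output", dense_params(256, 10)),
]

total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name:>7}: {count:>8,}")
print(f"  total: {total:>8,}")
```

Dropout and pooling layers add no parameters. The result can be compared against the output of `model.summary()`.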

In [0]:
model=DL_Model(64,64,'relu','Adam','Same')

In [0]:
epochs = 40
batch_size = 128
model.fit(X_train,y_train,
          validation_data=(X_val,y_val),
          epochs=epochs,
          batch_size=batch_size)

Training for more epochs did not improve the training accuracy, which only oscillated between 0.998 and 0.9992. Similarly, the validation accuracy did not exceed 0.9978 at any point.

In [0]:
# Evaluate the model on the validation set
model.evaluate(X_val,y_val)

# Data Augmentation and Learning Rate Reduction


**Data Augmentation**
* In image classification tasks, data augmentation is a common way to improve how well the model generalizes.
* Keras's `ImageDataGenerator` makes this easy: here it applies small random rotations, zooms and shifts to the training images.
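
To show the idea behind one of these augmentations, here is a minimal NumPy sketch of a random shift. It is a simplification and not what `ImageDataGenerator` does internally: it shifts by whole pixels and fills the exposed border with zeros. The `random_shift` helper and the toy image are illustrative.

```python
import numpy as np

def random_shift(image, max_fraction, rng):
    """Shift a 2D image by a random whole number of pixels, zero-filling the gap."""
    h, w = image.shape
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))   # positive = down
    dx = int(rng.integers(-max_dx, max_dx + 1))   # positive = right
    shifted = np.zeros_like(image)
    # Copy the part of the image that stays inside the frame after the shift
    shifted[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        image[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return shifted, (dy, dx)

rng = np.random.default_rng(0)
image = np.zeros((28, 28))
image[12:16, 12:16] = 1.0          # toy "digit": a bright square in the centre

# Each call yields a slightly different version of the same image
augmented = [random_shift(image, 0.1, rng) for _ in range(5)]
for shifted, (dy, dx) in augmented:
    print(f"shift dy={dy:+d}, dx={dx:+d}, pixel sum={shifted.sum():.0f}")
```

The label stays the same for every shifted copy, so the model sees more varied examples of each digit without any new labeling effort.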


**Learning Rate Reduction**

* We added a `ReduceLROnPlateau` callback, which monitors the validation accuracy. If no improvement is seen for `patience` epochs (3 here), the learning rate is multiplied by `factor` (0.5 here), down to a minimum of `min_lr`.
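
The plateau rule can be sketched in a few lines of plain Python. This is a simplified model of the callback's behavior, not the Keras source: it ignores the `cooldown` and `min_delta` options, the starting rate of 1e-3 is an assumption, and the validation accuracies are made up for illustration.

```python
def simulate_reduce_on_plateau(val_accuracies, lr=1e-3, factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect during each epoch (simplified plateau rule)."""
    best = float("-inf")
    wait = 0
    schedule = []
    for acc in val_accuracies:
        schedule.append(lr)               # rate used while training this epoch
        if acc > best:
            best, wait = acc, 0           # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:          # no improvement for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule

# Hypothetical validation accuracies, one per epoch
val_acc = [0.990, 0.995, 0.996, 0.996, 0.995, 0.996, 0.997, 0.997, 0.997, 0.997]
schedule = simulate_reduce_on_plateau(val_acc)
for epoch, (acc, lr) in enumerate(zip(val_acc, schedule)):
    print(f"epoch {epoch}: val_accuracy={acc:.3f}, lr={lr:.6f}")
```

In this made-up run the rate is halved after epoch 5, the third epoch in a row without a new best accuracy, so epochs 6 onward train at the lower rate.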

In [0]:
model=DL_Model(64,64,'relu','Adam','Same')

In [0]:
early_stopping_monitor = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    min_delta=0,
    patience=6,
    verbose=0,
    mode='auto',
    baseline=None,
    restore_best_weights=True
)

In [0]:
# Halve the learning rate when validation accuracy has not improved for 3 epochs
learning_rate_reduction = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_accuracy', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)

In [0]:
epochs = 30 
batch_size = 86

In [0]:
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image 
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images


datagen.fit(X_train)

In [0]:
X_train.shape

In [0]:
history = model.fit(datagen.flow(X_train,y_train, batch_size=batch_size),
                              epochs = epochs, validation_data = (X_val,y_val),
                              verbose = 2, steps_per_epoch=X_train.shape[0] // batch_size
                              , callbacks=[learning_rate_reduction])

In [0]:
model.evaluate(X_val,y_val)

In [0]:
%matplotlib inline
def PlotLoss(his, epoch):
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(np.arange(0, epoch), his.history["loss"], label="train_loss")
    plt.plot(np.arange(0, epoch), his.history["val_loss"], label="val_loss")
    plt.title("Training Loss")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss")
    plt.legend(loc="upper right")
    plt.show()

def PlotAcc(his, epoch):
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(np.arange(0, epoch), his.history["accuracy"], label="train_acc")
    plt.plot(np.arange(0, epoch), his.history["val_accuracy"], label="val_accuracy")
    plt.title("Training Accuracy")
    plt.xlabel("Epoch #")
    plt.ylabel("Accuracy")
    plt.legend(loc="upper right")
    plt.show()

### Plotting Loss and Accuracy

The loss and accuracy improve smoothly over the epochs. The model converges gradually and plateaus near the end.

In [0]:
PlotLoss(history, 30)
PlotAcc(history, 30)

In [0]:
from sklearn.metrics import confusion_matrix
y_val_predicted = model.predict_classes(X_val)
y_val_actual=np.argmax(y_val, axis=-1)
cm = confusion_matrix(y_val_actual, y_val_predicted)

In [0]:
f, ax = plt.subplots(figsize=(10,10))
sns.heatmap(cm,fmt=".0f", annot=True,linewidths=0.1, linecolor="purple", ax=ax)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
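
For readers unfamiliar with how the matrix above is built, here is a minimal NumPy sketch of the same computation together with per-class recall. The labels are made up for illustration and the helper name is hypothetical. The notebook itself uses `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_matrix_np(actual, predicted, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 at (actual, predicted) for every sample, including repeated pairs
    np.add.at(cm, (np.asarray(actual), np.asarray(predicted)), 1)
    return cm

actual_demo = [0, 0, 1, 1, 2, 2]        # hypothetical true labels
predicted_demo = [0, 1, 1, 1, 2, 0]     # hypothetical model predictions

cm_demo = confusion_matrix_np(actual_demo, predicted_demo, num_classes=3)
accuracy = np.trace(cm_demo) / cm_demo.sum()            # correct / total
recall_per_class = np.diag(cm_demo) / cm_demo.sum(axis=1)

print(cm_demo)
print("accuracy:", accuracy)
print("recall per class:", recall_per_class)
```

Off-diagonal cells show which digits are confused with which, which is the main thing to look for in the heatmap.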

In [0]:
result = model.predict_classes(X_test)

In [0]:
result

In [0]:
# Copy the id column so that adding 'label' does not write to a slice of `test`
sub_df = test[['id']].copy()
sub_df['label'] = result

In [0]:
sub_df.to_csv('submission.csv',index=False)

# Results of the final model
* The accuracy of the model on the validation dataset was 99.80%.
* Its accuracy on the unseen Kaggle dataset was 98.16%.

The model generalized adequately and did not appear to overfit the training dataset.

# Conclusion

Convolutional Neural Networks proved to be a suitable choice for this type of image classification task. By experimenting with GridSearch and adjusting hyperparameters such as the activation, number of filters and optimizer, I was able to reach a high accuracy on the training data. Using data augmentation and learning rate reduction increased the validation accuracy from 99.78% to 99.80%.

However, I believe that the model's performance could be improved by restoring the weights from the epoch with the best validation score instead of using the weights from the final epoch, which may have started to overfit the training data. The `EarlyStopping` callback defined above with `restore_best_weights=True` is meant for this, but it was not passed to `model.fit`.
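
The idea of keeping the best epoch can be sketched in plain Python. This is a simplified model of early stopping with a patience counter, not the Keras implementation, and the validation losses are made up for illustration.

```python
def best_epoch_with_patience(val_losses, patience=6):
    """Return (stopped_epoch, best_epoch) for a simplified early-stopping rule."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0   # new best: remember this epoch
        else:
            wait += 1
            if wait >= patience:                           # give up after `patience` bad epochs
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch; a short patience keeps the demo small
val_losses = [0.30, 0.20, 0.15, 0.16, 0.17, 0.14]
stopped_epoch, best_epoch = best_epoch_with_patience(val_losses, patience=2)
print(f"stopped at epoch {stopped_epoch}, best weights come from epoch {best_epoch}")
```

Here training stops at epoch 4 and the weights from epoch 2 would be kept. The later, lower loss at epoch 5 is never seen, which is the trade-off of a short patience.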