# Transfer Learning MNIST

* Train a simple convnet on the first 5 digits [0..4] of the MNIST dataset.
* Freeze convolutional layers and fine-tune dense layers for the classification of digits [5..9].

## 1. Import necessary libraries for the model

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
from keras.datasets import mnist

Using TensorFlow backend.


## 2. Import MNIST data and create 2 datasets, one with digits 0 to 4 and the other with digits 5 to 9

In [3]:
(trainx,trainy),(testx,testy) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [4]:
trainy

array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

In [0]:
trainx0_4 = trainx[trainy<5]
trainy0_4 = trainy[trainy<5]
testx0_4 = testx[testy<5]
testy0_4 = testy[testy<5]

In [6]:
print(trainx0_4.shape)
print(trainy0_4.shape)
print(testx0_4.shape)
print(testy0_4.shape)

(30596, 28, 28)
(30596,)
(5139, 28, 28)
(5139,)


In [0]:
trainx5_9 = trainx[trainy>=5]
trainy5_9 = trainy[trainy>=5] - 5
testx5_9 = testx[testy>=5]
testy5_9 = testy[testy>=5] - 5

In [43]:
print(trainx5_9.shape)
print(trainy5_9.shape)
print(testx5_9.shape)
print(testy5_9.shape)

(29404, 28, 28)
(29404,)
(4861, 28, 28)
(4861,)
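
The split above relies on NumPy boolean masking, and the labels of the second dataset are shifted down by 5 so that both tasks use class indices 0 to 4. A minimal sketch on toy data (the arrays `images` and `labels` are made-up stand-ins, not the MNIST arrays):

```python
import numpy as np

# Toy stand-ins for the MNIST arrays: 6 "images" of 2x2 pixels and their labels.
images = np.arange(6 * 2 * 2).reshape(6, 2, 2)
labels = np.array([5, 0, 4, 8, 6, 1], dtype=np.uint8)

# A boolean mask selects the matching entries along the first axis.
low_mask = labels < 5
images_low, labels_low = images[low_mask], labels[low_mask]

# Digits 5..9 are shifted down by 5 so they become class indices 0..4.
images_high, labels_high = images[~low_mask], labels[~low_mask] - 5

print(labels_low)   # labels of the 0-4 subset
print(labels_high)  # shifted labels of the 5-9 subset
```

Because the same mask indexes both the images and the labels, the two arrays stay aligned row by row.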


## 3. Print x_train, y_train, x_test and y_test for both the datasets

In [9]:
print(trainx0_4.shape)
print(trainy0_4.shape)
print(testx0_4.shape)
print(testy0_4.shape)

(30596, 28, 28)
(30596,)
(5139, 28, 28)
(5139,)


In [10]:
print(trainx5_9.shape)
print(trainy5_9.shape)
print(testx5_9.shape)
print(testy5_9.shape)

(29404, 28, 28)
(29404,)
(4861, 28, 28)
(4861,)


## 4. Let us take only the dataset (x_train, y_train, x_test, y_test) for Integers 0 to 4 in MNIST
## Reshape x_train and x_test to a 4 Dimensional array (channel = 1) to pass it into a Conv2D layer

In [0]:
trainx0_4 = trainx0_4.reshape(trainx0_4.shape[0],28,28,1)

In [0]:
testx0_4 = testx0_4.reshape(testx0_4.shape[0],28,28,1)

## 5. Normalize x_train and x_test by dividing it by 255

In [0]:
trainx0_4 = trainx0_4.astype('float32')
testx0_4 = testx0_4.astype('float32')

In [0]:
trainx0_4 = trainx0_4/255
testx0_4 = testx0_4/255
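
The reshape and normalization steps can be checked on a small random batch. This is a sketch with a made-up array `batch`, assuming the same channels-last layout `(samples, height, width, channels)` used above:

```python
import numpy as np

# Toy batch of 3 grayscale "images" stored as uint8, like the raw MNIST arrays.
batch = np.random.randint(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: (samples, height, width, channels).
batch_4d = batch.reshape(batch.shape[0], 28, 28, 1)

# Cast to float32 first, then rescale pixel values from [0, 255] to [0, 1].
batch_scaled = batch_4d.astype('float32') / 255

print(batch_4d.shape, batch_scaled.dtype)
print(batch_scaled.min(), batch_scaled.max())
```

The reshape only adds an axis of length 1, so no pixel values are changed or reordered.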

## 6. Use One-hot encoding to divide y_train and y_test into required no of output classes

In [0]:
import keras
import keras.utils

numclass = 5  # number of output classes in each task (digits 0-4 or 5-9)

In [0]:
trainy0_4 = keras.utils.to_categorical(trainy0_4,numclass)
testy0_4 = keras.utils.to_categorical(testy0_4,numclass)
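
One-hot encoding turns each integer label into a vector with a single 1 at the class index. A NumPy-only sketch of the same idea (this is an illustration, not the internals of `to_categorical`; `num_classes` and `labels` are made-up names):

```python
import numpy as np

num_classes = 5
labels = np.array([0, 3, 4, 1])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot)

# argmax along the class axis recovers the original integer labels.
recovered = one_hot.argmax(axis=1)
print(recovered)
```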


## 7. Build a sequential model with 2 Convolutional layers with 32 and 64 kernels of size (3,3) followed by a Max pooling layer of size (2,2) followed by a dropout layer to be trained for classification of digits 0-4

In [0]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.callbacks import ModelCheckpoint, EarlyStopping
input_shape = (28,28,1)

In [18]:
#Initialize the model
model = Sequential()

#Add a Convolutional Layer with 32 filters of size 3X3 and activation function as 'ReLU' 
model.add(Conv2D(32, kernel_size=(3,3),
                 activation='relu',
                 input_shape=input_shape))

#Add a Convolutional Layer with 64 filters of size 3X3 and activation function as 'ReLU' 
model.add(Conv2D(64, (3, 3), activation='relu'))

#Add a MaxPooling Layer of size 2X2 
model.add(MaxPooling2D(pool_size=(2, 2)))

#Apply Dropout with 0.25 probability 
model.add(Dropout(0.25))


## 8. Then flatten the data and add 2 Dense layers, with 128 neurons and neurons = output classes and activation = 'relu' and 'softmax' respectively. Add a dropout layer in between if necessary

In [0]:
#Flatten the layer
model.add(Flatten())

#Add Fully Connected Layer with 128 units and activation function as 'ReLU'
model.add(Dense(128, activation='relu'))

#Apply Dropout with 0.5 probability 
model.add(Dropout(0.5))

#Add Fully Connected Layer with numclass (5) units and activation function as 'softmax'
model.add(Dense(numclass, activation='softmax'))
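
To see what the stack above produces, the feature-map sizes and weight counts can be worked out by hand. The sketch below assumes unpadded ('valid') convolutions with stride 1 and a pooling stride equal to the pool size, which are the Keras defaults for the calls used here; the dictionary keys are illustrative labels, not the layer names Keras assigns:

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling step."""
    return (size - kernel) // stride + 1

size = 28
size = conv_output_size(size, 3)            # first 3x3 convolution  -> 26
size = conv_output_size(size, 3)            # second 3x3 convolution -> 24
size = conv_output_size(size, 2, stride=2)  # 2x2 max pooling        -> 12
flat_units = size * size * 64               # Flatten over 64 feature maps

# Weights plus biases per layer (dropout, pooling and flatten have none).
params = {
    'conv_32': (3 * 3 * 1) * 32 + 32,
    'conv_64': (3 * 3 * 32) * 64 + 64,
    'dense_128': flat_units * 128 + 128,
    'dense_out': 128 * 5 + 5,
}
total = sum(params.values())
print(size, flat_units)
print(params, total)
```

Under these assumptions roughly 98% of the weights sit in the two dense layers, which is worth keeping in mind when only those layers are fine-tuned later.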

In [20]:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])



## 9. Print the training and test accuracy

In [21]:
model.fit(trainx0_4, trainy0_4, batch_size=128, epochs=5, verbose=1, validation_data=(testx0_4, testy0_4))


Train on 30596 samples, validate on 5139 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fe9567e52e8>

In [0]:
score = model.evaluate(testx0_4, testy0_4, verbose=0)

In [23]:
#Test loss
print(score[0])

0.061142996826346316


In [24]:
#Accuracy
print(score[1])

0.9805409613577024


## 10. Make only the dense layers to be trainable and convolutional layers to be non-trainable

In [25]:
model.layers

[<keras.layers.convolutional.Conv2D at 0x7fe95715d780>,
 <keras.layers.convolutional.Conv2D at 0x7fe95715d978>,
 <keras.layers.pooling.MaxPooling2D at 0x7fe957175550>,
 <keras.layers.core.Dropout at 0x7fe95715db38>,
 <keras.layers.core.Flatten at 0x7fe95715d4a8>,
 <keras.layers.core.Dense at 0x7fe95715d400>,
 <keras.layers.core.Dropout at 0x7fe95715d5f8>,
 <keras.layers.core.Dense at 0x7fe9568ddf60>]

In [26]:
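for layer in model.layers:
  # Freeze every layer that is not a Dense layer (conv, pooling, dropout, flatten)
  if 'dense' not in layer.name:
    layer.trainable = False
  else:
    print(layer.name)

# Re-compile so that the updated trainable flags take effect
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])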

dense_1
dense_2
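
Conceptually, freezing means that gradient updates are applied only to the trainable part of the network. The NumPy sketch below is a toy analogy rather than the Keras mechanism: a fixed random projection (`W_frozen`) plays the role of the pretrained convolutional layers and a softmax layer (`W_head`) plays the role of the dense head. All names and data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 20 input features, 5 classes.
X = rng.normal(size=(200, 20))
y = rng.integers(0, 5, size=200)
Y = np.eye(5)[y]

# "Pretrained" feature extractor (frozen) and a fresh classification head (trainable).
W_frozen = rng.normal(size=(20, 16)) / np.sqrt(20)
W_head = np.zeros((16, 5))

def forward(inputs):
    features = np.maximum(inputs @ W_frozen, 0.0)   # ReLU features from the frozen layer
    logits = features @ W_head
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return features, probs

def cross_entropy(probs):
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

W_frozen_before = W_frozen.copy()
initial_loss = cross_entropy(forward(X)[1])

for _ in range(100):
    features, probs = forward(X)
    grad_head = features.T @ (probs - Y) / len(X)  # gradient w.r.t. the head only
    W_head -= 0.1 * grad_head                      # the frozen weights get no update

final_loss = cross_entropy(forward(X)[1])
print(initial_loss, final_loss)
```

Only `W_head` changes during the loop, while the features computed by `W_frozen` are reused as they are.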


## 11. Use the model trained on 0 to 4 digit classification and train it on the dataset which has digits 5 to 9  (Using Transfer learning keeping only the dense layers to be trainable)

In [0]:
trainx5_9 = trainx5_9.reshape(trainx5_9.shape[0],28,28,1)
testx5_9 = testx5_9.reshape(testx5_9.shape[0],28,28,1)

In [0]:
trainx5_9 = trainx5_9.astype('float32')
testx5_9 = testx5_9.astype('float32')

trainx5_9 = trainx5_9/255
testx5_9 = testx5_9/255

In [46]:
np.unique(trainy5_9)

array([0, 1, 2, 3, 4], dtype=uint8)

In [0]:
trainy5_9 = keras.utils.to_categorical(trainy5_9,numclass)
testy5_9 = keras.utils.to_categorical(testy5_9,numclass)

## 12. Print the accuracy for classification of digits 5 to 9

In [48]:
model.fit(trainx5_9, trainy5_9, batch_size=128, epochs=10, verbose=1, validation_data=(testx5_9, testy5_9))

Train on 29404 samples, validate on 4861 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fe97ea900f0>

In [49]:
model.evaluate(testx5_9, testy5_9)



[0.08406314167591236, 0.9720222177855691]

## Sentiment analysis

The objective of the second problem is to perform sentiment analysis on tweets that users posted about various mobile devices.
Based on the text of a tweet, we will classify whether the sentiment the user expresses toward a particular mobile device is positive or not.

### 13. Read the dataset (tweets.csv) and drop the NA's while reading the dataset

In [0]:
df_tweets = pd.read_excel('tweets.xls')

In [76]:
df_tweets.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |


### 14. Preprocess the text and add the preprocessed text in a column with name `text` in the dataframe.

In [0]:
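def preprocess(text):
    # Keep the tweet only if it is pure ASCII; otherwise return an empty string.
    # (str has no .decode() in Python 3, so the check is done with .encode().)
    try:
        text.encode('ascii')
        return text
    except (AttributeError, UnicodeError):
        return ""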

In [0]:
#df_tweets['text'] = [preprocess(text) for text in df_tweets.tweet_text]

In [78]:
df_tweets.shape

(9093, 3)

### 15. Consider only rows having Positive emotion and Negative emotion and remove other rows from the dataframe.

In [0]:
sentiments = ['Positive emotion', 'Negative emotion']
# .copy() gives an independent dataframe, so columns can be added later without a SettingWithCopyWarning
df_tweets = df_tweets[df_tweets['is_there_an_emotion_directed_at_a_brand_or_product'].isin(sentiments)].copy()

In [80]:
df_tweets.shape

(3548, 3)

### 16. Represent text as numerical data using `CountVectorizer` and get the document term frequency matrix

#### Use `vect` as the variable name for initialising CountVectorizer.

In [0]:
from sklearn.feature_extraction.text import CountVectorizer

In [0]:
cv = CountVectorizer()

In [83]:
cv.fit(df_tweets['tweet_text'])

CountVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(1, 1), preprocessor=None, stop_words=None,
                strip_accents=None, token_pattern='(?u)\\b\\w\\w+\\b',
                tokenizer=None, vocabulary=None)

In [0]:
dtm = cv.transform(df_tweets['tweet_text'])

In [85]:
dtm.shape

(3548, 6020)

In [0]:
dtm1 = dtm.toarray()

In [87]:
dtm1.shape

(3548, 6020)
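
The document-term matrix has one row per tweet and one column per vocabulary term. A pure-Python sketch of how such a matrix is built, using the token pattern and lowercasing shown in the `CountVectorizer` settings above (the corpus `docs` is made up, and this is an illustration rather than scikit-learn's implementation):

```python
import re
from collections import Counter

# Same pattern as in the CountVectorizer repr: tokens of 2+ word characters.
TOKEN_PATTERN = re.compile(r'(?u)\b\w\w+\b')

docs = ["I love my iPhone", "My iPad is great, great!", "iPhone battery is bad"]

# Lowercase, then tokenize each document.
tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]

# The vocabulary is the sorted set of all tokens; each term gets a column index.
vocabulary = sorted({tok for toks in tokenized for tok in toks})
index = {term: j for j, term in enumerate(vocabulary)}

# Fill the dense document-term matrix with per-document term counts.
dtm_toy = [[0] * len(vocabulary) for _ in docs]
for i, toks in enumerate(tokenized):
    for term, count in Counter(toks).items():
        dtm_toy[i][index[term]] = count

print(vocabulary)
for row in dtm_toy:
    print(row)
```

Single-character tokens such as "I" are dropped by the pattern, and most entries are zero, which is why the real matrix is stored in sparse form.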

### 17. Find number of different words in vocabulary

In [89]:
print(dtm)

  (0, 91)	1
  (0, 252)	1
  (0, 463)	2
  (0, 1391)	1
  (0, 2461)	1
  (0, 2609)	1
  (0, 2831)	1
  (0, 2855)	1
  (0, 3547)	1
  (0, 3972)	1
  (0, 4433)	1
  (0, 4939)	1
  (0, 5096)	1
  (0, 5351)	1
  (0, 5484)	1
  (0, 5574)	1
  (0, 5728)	1
  (0, 5775)	1
  (1, 173)	1
  (1, 312)	1
  (1, 389)	1
  (1, 410)	1
  (1, 463)	1
  (1, 524)	1
  (1, 1455)	1
  :	:
  (3546, 318)	1
  (3546, 338)	2
  (3546, 389)	1
  (3546, 869)	1
  (3546, 871)	1
  (3546, 1958)	1
  (3546, 2090)	2
  (3546, 2450)	1
  (3546, 2671)	1
  (3546, 2821)	1
  (3546, 2831)	1
  (3546, 2855)	1
  (3546, 3444)	1
  (3546, 3521)	1
  (3546, 4494)	1
  (3546, 4915)	1
  (3546, 5033)	1
  (3546, 5096)	1
  (3546, 5110)	1
  (3546, 5594)	1
  (3546, 5625)	1
  (3547, 1833)	1
  (3547, 2821)	1
  (3547, 3121)	1
  (3547, 5096)	1
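
Printing the sparse matrix lists its non-zero entries but does not state the vocabulary size directly. One way to read it off is the fitted `vocabulary_` attribute, whose length matches the number of columns of the matrix. A small sketch on a made-up corpus (the variable names are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love my iPhone", "my iPad is great", "iPhone battery is bad"]

vect = CountVectorizer()
toy_dtm = vect.fit_transform(docs)

# vocabulary_ maps each term to its column index in the document-term matrix.
vocab_size = len(vect.vocabulary_)
print(vocab_size, toy_dtm.shape)
```

Applied to the fitted `cv` object above, the same expression should agree with the second dimension of `dtm.shape`.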


#### Tip: To see all available functions for an Object use dir

In [90]:
dir(cv)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_char_ngrams',
 '_char_wb_ngrams',
 '_check_stop_words_consistency',
 '_check_vocabulary',
 '_count_vocab',
 '_get_param_names',
 '_get_tags',
 '_limit_features',
 '_more_tags',
 '_sort_features',
 '_stop_words_id',
 '_validate_custom_analyzer',
 '_validate_params',
 '_validate_vocabulary',
 '_white_spaces',
 '_word_ngrams',
 'analyzer',
 'binary',
 'build_analyzer',
 'build_preprocessor',
 'build_tokenizer',
 'decode',
 'decode_error',
 'dtype',
 'encoding',
 'fit',
 'fit_transform',
 'fixed_vocabulary_',
 'get_feature_names',
 'get_params',
 'get_stop_words',
 'input',
 'inverse_transf

### 18. Find out how many Positive and Negative emotions are there.

Hint: Use value_counts on that column

In [91]:
df_tweets['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

Positive emotion    2978
Negative emotion     570
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64
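
The two classes are clearly imbalanced, so it helps to know what accuracy a trivial model would reach by always predicting the majority class. The sketch below computes that baseline from the counts printed above; it uses the full filtered dataset, so the figure for a particular test split may differ slightly:

```python
# Class counts taken from the value_counts output above.
counts = {'Positive emotion': 2978, 'Negative emotion': 570}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a classifier that always predicts the majority class.
baseline_accuracy = counts[majority_class] / total
print(majority_class, round(baseline_accuracy, 4))
```

Any accuracy score on this data is best read against this baseline of about 0.84 rather than against 0.5.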

### 19. Change the labels for Positive and Negative emotions to 1 and 0 respectively and store them in a new column named `label` in the same dataframe

Hint: use map on that column and give labels

In [0]:
df_tweets['label'] = df_tweets.is_there_an_emotion_directed_at_a_brand_or_product.map({'Positive emotion':1, 'Negative emotion':0})

In [94]:
df_tweets.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product | label |
|---|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion | 0 |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion | 1 |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion | 1 |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion | 0 |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion | 1 |


### 20. Define the feature set (independent variable or X) to be the `tweet_text` column and `label` as the target (or dependent variable), and divide into train and test datasets

In [0]:
X = df_tweets['tweet_text']
Y = df_tweets['label']

In [0]:
from sklearn.model_selection import train_test_split

In [0]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=1)

In [111]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(2661,)
(887,)
(2661,)
(887,)
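
The split sizes follow from the default behaviour of `train_test_split`, which holds out 25% of the samples when no size is given. A quick arithmetic check of the shapes printed above:

```python
import math

n_samples = 3548
test_size = 0.25  # default used by train_test_split when no size is passed

# The test set size is rounded up and the remainder goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)
```

Given the class imbalance in this dataset, passing `stratify=Y` to `train_test_split` is an option worth considering so that both splits keep a similar class ratio.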


## 21. **Predicting the sentiment:**


### Use Naive Bayes and Logistic Regression to predict the sentiment of the given text and compare their accuracy scores

In [0]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [0]:
vectorizer = CountVectorizer()

In [0]:
x_train_dtm = vectorizer.fit_transform(x_train)
x_test_dtm = vectorizer.transform(x_test)

In [115]:
print(x_train_dtm.shape)
print(x_test_dtm.shape)

(2661, 5200)
(887, 5200)


In [0]:
nb = MultinomialNB()

In [117]:
nb.fit(x_train_dtm,y_train)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [0]:
y_pred = nb.predict(x_test_dtm)

In [120]:
print(metrics.accuracy_score(y_test, y_pred))

0.8669673055242391


In [121]:
lr = LogisticRegression()
lr.fit(x_train_dtm,y_train)
y_pred_lr = lr.predict(x_test_dtm)



In [122]:
print(metrics.accuracy_score(y_test, y_pred_lr))

0.8624577226606539
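
To make the Naive Bayes step less of a black box, here is a compact NumPy sketch of multinomial Naive Bayes with additive smoothing (`alpha=1.0`, matching the setting printed above). It is a simplified illustration on a made-up count matrix, not scikit-learn's implementation, and the function names are hypothetical:

```python
import numpy as np

def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed log term probabilities per class."""
    classes = np.unique(y)
    log_prior = np.log(np.array([(y == c).mean() for c in classes]))
    # Per-class term counts with additive (Laplace) smoothing.
    counts = np.array([X[y == c].sum(axis=0) for c in classes]) + alpha
    log_likelihood = np.log(counts / counts.sum(axis=1, keepdims=True))
    return classes, log_prior, log_likelihood

def predict_multinomial_nb(model, X):
    """Pick the class with the highest log posterior (up to a constant)."""
    classes, log_prior, log_likelihood = model
    scores = X @ log_likelihood.T + log_prior
    return classes[scores.argmax(axis=1)]

# Toy count matrix; columns stand for the terms "great", "love", "bad", "crash".
X_toy = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 2, 1],
                  [0, 1, 1, 2]])
y_toy = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative

toy_model = fit_multinomial_nb(X_toy, y_toy)
X_new = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1]])
toy_pred = predict_multinomial_nb(toy_model, X_new)
print(toy_pred)
```

The smoothing term keeps a word that never occurred in one class from forcing that class's probability to zero.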


## 22. Create a function called `tokenize_test` which takes a count vectorizer object as input and prints the number of features and the accuracy for x (text) and y (labels)

In [0]:
def tokenize_test(vect):
    x_train_dtm = vect.fit_transform(x_train)
    print('Features: ', x_train_dtm.shape[1])
    x_test_dtm = vect.transform(x_test)
    nb = MultinomialNB()
    nb.fit(x_train_dtm, y_train)
    y_pred_class = nb.predict(x_test_dtm)
    print('Accuracy: ', metrics.accuracy_score(y_test, y_pred_class))

### Create a count vectorizer with `ngram_range=(1, 2)` and pass it to the `tokenize_test` function to print the accuracy score

In [128]:
cvect = CountVectorizer(ngram_range=(1,2))
tokenize_test(cvect)

Features:  26625
Accuracy:  0.874859075535513


### Create a count vectorizer with `stop_words='english'` and pass it to the `tokenize_test` function to print the accuracy score

In [129]:
cvect = CountVectorizer(stop_words='english')
tokenize_test(cvect)

Features:  4959
Accuracy:  0.8658399098083427


### Create a count vectorizer with `stop_words='english'` and `max_features=300` and pass it to the `tokenize_test` function to print the accuracy score

In [130]:
cvect = CountVectorizer(stop_words='english',max_features=300)
tokenize_test(cvect)

Features:  300
Accuracy:  0.8083427282976324


### Create a count vectorizer with `ngram_range=(1, 2)` and `max_features=15000` and pass it to the `tokenize_test` function to print the accuracy score

In [131]:
cvect = CountVectorizer(ngram_range=(1,2),max_features=15000)
tokenize_test(cvect)

Features:  15000
Accuracy:  0.8804960541149943


### Create a count vectorizer with `ngram_range=(1, 2)` that keeps only terms appearing in at least 2 documents (`min_df=2`) and pass it to the `tokenize_test` function to print the accuracy score

In [132]:
cvect = CountVectorizer(ngram_range=(1,2),min_df=2)
tokenize_test(cvect)

Features:  8674
Accuracy:  0.8680947012401353
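
The feature counts above change because `ngram_range=(1, 2)` adds word pairs to the vocabulary, while `min_df=2` drops terms that occur in fewer than 2 documents. A pure-Python sketch of both effects on a made-up, already tokenized corpus (an illustration of the idea, not scikit-learn's code; `word_ngrams` is a hypothetical helper):

```python
from collections import Counter

def word_ngrams(tokens, ngram_range=(1, 2)):
    """Return all word n-grams of the requested lengths, joined by spaces."""
    lo, hi = ngram_range
    return [' '.join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]

docs = [["ipad", "is", "great"],
        ["iphone", "is", "great"],
        ["ipad", "battery", "died"]]

# Document frequency: in how many documents each term appears at least once.
doc_terms = [set(word_ngrams(doc)) for doc in docs]
document_frequency = Counter(term for terms in doc_terms for term in terms)

vocab_all = sorted(document_frequency)
vocab_min_df_2 = sorted(t for t, df in document_frequency.items() if df >= 2)

print(len(vocab_all), vocab_all)
print(len(vocab_min_df_2), vocab_min_df_2)
```

Adding bigrams grows the vocabulary quickly, and the `min_df` filter then removes the many terms that occur in only one document.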
