# Sentiment Classification Using DistilBERT 

We will use the IMDB Movie Reviews dataset.

# What is `DistilBERT`

BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.

![alt text](https://miro.medium.com/max/2000/1*IFVX74cEe8U5D1GveL1uZA.png)
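To make the idea of distillation more concrete, here is a minimal sketch of a generic soft-target loss: the student is penalized when its temperature-softened output distribution differs from the teacher's. This is not DistilBERT's exact training objective (the paper combines a distillation term with other losses); the function names, the temperature of 2.0, and the logits are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) between softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical logits for a two-class problem (not real model output)
teacher = [2.0, -1.0]
close_student = [1.8, -0.9]   # roughly agrees with the teacher
far_student = [-1.0, 2.0]     # disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)  # the agreeing student gets the smaller loss
```

A higher temperature flattens both distributions, so the student also learns from the relative scores the teacher gives to the less likely classes.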

## Why `DistilBERT`



*   Retains over 95% of the original BERT model's performance on GLUE
*   Runs 60% faster
*   Has 40% fewer parameters
*   Can run on a CPU

### Additional Reading

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

https://arxiv.org/abs/1910.01108

Video Lecture: BERT NLP Tutorial 1- Introduction | BERT Machine Learning | KGP Talkie

https://www.youtube.com/watch?v=h_U27jBNYI4

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805

Understanding searches better than ever before:

https://www.blog.google/products/search/search-language-understanding-bert/

A good resource for reading more about BERT:

http://jalammar.github.io/illustrated-bert/

Visual Guide to Using BERT:
 
http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/

---------------------------------------------------

# What is `ktrain`

ktrain is a library that helps build, train, debug, and deploy neural networks in the Keras deep learning framework. (ktrain uses tf.keras in TensorFlow instead of standalone Keras.) Inspired by the fastai library, ktrain lets you do the following with only a few lines of code:


*   estimate an optimal learning rate for your model given your data using a learning rate finder
*   employ learning rate schedules such as the triangular learning rate policy, 1cycle policy, and SGDR to more effectively train your model
*   employ fast and easy-to-use pre-canned models for both text classification (e.g., NBSVM, fastText, GRU with pretrained word embeddings) and image classification (e.g., ResNet, Wide Residual Networks, Inception)
*   load and preprocess text and image data from a variety of formats
*   inspect data points that were misclassified to help improve your model
*   leverage a simple prediction API for saving and deploying both models and data-preprocessing steps to make predictions on new raw data

ktrain GitHub: https://github.com/amaiya/ktrain

# Notebook Setup

In [3]:
!pip install ktrain



In [4]:
pwd

'C:\\Users\\mnadd\\Falco'

In [5]:
!git clone https://github.com/laxmimerit/IMDB-Movie-Reviews-Large-Dataset-50k.git

fatal: destination path 'IMDB-Movie-Reviews-Large-Dataset-50k' already exists and is not an empty directory.


In [6]:
# /content/IMDB-Movie-Reviews-Large-Dataset-50k

In [7]:
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf

In [8]:
# Paths are relative to the working directory, where the repository was cloned
data_test = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype=str)
data_train = pd.read_excel('IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype=str)


In [10]:
data_train.sample(5)

Unnamed: 0,Reviews,Sentiment
6503,"Tierney's an authentic tough guy, but this mov...",neg
21523,I created my own reality by walking out of the...,neg
1202,I have seen about a thousand horror films. (my...,neg
22919,Wow! Only a movie this ludicrously awful could...,neg
7634,i thought this movie was wonderfully plotted i...,pos


In [11]:
text.print_text_classifiers()

fasttext: a fastText-like model [http://arxiv.org/pdf/1607.01759.pdf]
logreg: logistic regression using a trainable Embedding layer
nbsvm: NBSVM model [http://www.aclweb.org/anthology/P12-2018]
bigru: Bidirectional GRU with pretrained fasttext word vectors [https://fasttext.cc/docs/en/crawl-vectors.html]
standard_gru: simple 2-layer GRU with randomly initialized embeddings
bert: Bidirectional Encoder Representations from Transformers (BERT) from keras_bert [https://arxiv.org/abs/1810.04805]
distilbert: distilled, smaller, and faster BERT from Hugging Face transformers [https://arxiv.org/abs/1910.01108]


In [12]:
(train, val, preproc) = text.texts_from_df(train_df=data_train,
                                           text_column='Reviews',
                                           label_columns='Sentiment',
                                           val_df=data_test,
                                           maxlen=400,
                                           preprocess_mode='distilbert')

['neg', 'pos']
   neg  pos
0  1.0  0.0
1  1.0  0.0
2  1.0  0.0
3  1.0  0.0
4  1.0  0.0
['neg', 'pos']
   neg  pos
0  0.0  1.0
1  0.0  1.0
2  1.0  0.0
3  0.0  1.0
4  1.0  0.0
preprocessing train...
language: en
train sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913


Is Multi-Label? False
preprocessing test...
language: en
test sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913
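The log above reports a mean length of 234 and a 95th percentile of 598, while `maxlen` is 400, so the longest reviews presumably do not fit in full. The sketch below shows, in simplified form, what fixing a maximum length means: long sequences are cut and short ones are padded. It is not ktrain's actual preprocessing (which tokenizes the text and adds special tokens); the helper names, the pad id of 0, and the sample lengths are assumptions for illustration.

```python
def pad_or_truncate(token_ids, maxlen, pad_id=0):
    """Return exactly maxlen ids: cut long sequences, right-pad short ones."""
    clipped = token_ids[:maxlen]
    return clipped + [pad_id] * (maxlen - len(clipped))

def fraction_truncated(lengths, maxlen):
    """Share of sequences that are longer than maxlen."""
    return sum(1 for n in lengths if n > maxlen) / len(lengths)

# Hypothetical sequence lengths, not taken from the dataset
lengths = [120, 234, 380, 598, 913]
print(fraction_truncated(lengths, maxlen=400))  # 2 of the 5 exceed 400

short = pad_or_truncate([5, 6, 7], maxlen=5)
long = pad_or_truncate(list(range(10)), maxlen=5)
print(short)  # padded on the right
print(long)   # cut to the first 5 ids
```

Raising `maxlen` keeps more of each review at the cost of memory and training time, which is the trade-off behind choosing a value between the mean and the 95th percentile.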


In [13]:
model = text.text_classifier(name = 'distilbert', train_data = train, preproc=preproc)

Is Multi-Label? False
maxlen is 400
done.


In [14]:
learner = ktrain.get_learner(model = model,
                             train_data = train,
                             val_data = val,
                             batch_size = 6)

In [15]:
learner.fit_onecycle(lr = 2e-5, epochs=2)



begin training using onecycle policy with max lr of 2e-05...
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x23079752be0>
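For intuition about what a one-cycle policy does with the maximum learning rate of 2e-5, here is a simplified triangular schedule: the rate ramps up linearly to the maximum over the first half of training and back down over the second half. This is a sketch of the general idea, not ktrain's implementation; the function name, the starting ratio of 0.1, and the step count are assumptions.

```python
def onecycle_lr(step, total_steps, max_lr, min_lr_ratio=0.1):
    """Triangular schedule: linear warm-up to max_lr, then linear decay."""
    min_lr = max_lr * min_lr_ratio
    half = total_steps / 2
    if step <= half:
        frac = step / half                  # rising half of the cycle
    else:
        frac = (total_steps - step) / half  # falling half of the cycle
    return min_lr + (max_lr - min_lr) * frac

total_steps = 1000  # hypothetical number of training batches
for step in (0, 250, 500, 750, 1000):
    print(step, onecycle_lr(step, total_steps, max_lr=2e-5))
```

The peak is reached at the midpoint, so the model sees small updates at the start and end of training and the largest updates in between.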

In [16]:
predictor = ktrain.get_predictor(learner.model, preproc)

In [17]:
# Colab only: mounting Google Drive fails on a local machine, as the error below shows
from google.colab import drive
drive.mount('/content/drive')

ModuleNotFoundError: No module named 'google.colab'

In [None]:
# This path assumes Google Drive is mounted in Colab; use a local directory when running elsewhere
predictor.save('/content/drive/My Drive/distilbert')

In [None]:
data = ['this movie was really bad. acting was also bad. I will not watch again',
        'the movie was really great. I will see it again',
        'another great movie. must watch to everyone']

In [None]:
predictor.predict(data)

In [None]:
predictor.get_classes()

In [None]:
predictor.predict(data, return_proba=True)
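When probabilities are returned instead of labels, each row can be mapped back to a class name by taking the highest-scoring entry. The sketch below assumes one probability per class in the same order as the class list (`['neg', 'pos']`, as printed during preprocessing); the helper name and the probability values are hypothetical, not real model output.

```python
def top_class(probabilities, classes):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return classes[best], probabilities[best]

classes = ['neg', 'pos']  # label order assumed to match the preprocessing output
# Hypothetical probabilities for three reviews, not real model output
probs = [[0.97, 0.03], [0.02, 0.98], [0.10, 0.90]]

for row in probs:
    print(top_class(row, classes))
```

Keeping the probability alongside the label makes it easy to flag low-confidence predictions for manual review.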