# Sentiment Classification Using DistilBERT 

# What is `DistilBERT`

BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.
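
To make "distilling" more concrete, here is a minimal sketch of the generic soft-target loss used in knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution. The logits, the temperature of 2.0, and the function names are illustrative assumptions, and the actual DistilBERT training objective combines a loss of this kind with further terms, so this is not a reproduction of it.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close < loss_far)  # a student closer to the teacher gets a lower loss
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to the wrong classes.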

# What is `ktrain`

ktrain is a library that helps build, train, debug, and deploy neural networks in Keras, the deep learning framework. This notebook uses its `text` module to fine-tune DistilBERT on IMDB movie reviews.

# Notebook Setup

In [None]:
!pip install ktrain

Collecting ktrain
  Downloading ktrain-0.18.4.tar.gz (25.2 MB)
Collecting tensorflow==2.1.0
  Downloading tensorflow-2.1.0-cp37-cp37m-manylinux2010_x86_64.whl (421.8 MB)
Collecting scikit-learn==0.21.3
  Downloading scikit_learn-0.21.3-cp37-cp37m-manylinux1_x86_64.whl (6.7 MB)
Collecting keras_bert>=0.81.0
  Downloading keras-bert-0.85.0.tar.gz (26 kB)
Collecting langdetect
  Downloading langdetect-1.0.8.tar.gz (981 kB)
Collecting cchardet==2.1.5
  Downloading cchardet-2.1.5-cp37-cp37m-manylinux1_x86_64.whl (241 kB)
Collecting seqeval
  Downloading seqeval-0.0.12.tar.gz (21 kB)
Collecting syntok
  Downloading syntok-1.3.1.tar.gz (23 kB)
Collecting whoosh
  Dow

In [None]:
!git clone https://github.com/sarthak-sriw/IMDB-Movie-Reviews-Large-Dataset-50k.git

Cloning into 'IMDB-Movie-Reviews-Large-Dataset-50k'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 1), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (10/10), done.


In [None]:
# /content/IMDB-Movie-Reviews-Large-Dataset-50k

In [None]:
import pandas as pd
import numpy as np
import ktrain
from ktrain import text
import tensorflow as tf



In [None]:
# Load the training and test splits; dtype=str keeps reviews and labels as text.
data_train = pd.read_excel('./IMDB-Movie-Reviews-Large-Dataset-50k/train.xlsx', dtype=str)
data_test = pd.read_excel('./IMDB-Movie-Reviews-Large-Dataset-50k/test.xlsx', dtype=str)

In [None]:
data_train.sample(5)

Unnamed: 0,Reviews,Sentiment
18490,"This movie is bizarre. Better put, it's ""freak...",neg
20313,Hobgoblins... what a concept. Rick Sloan was a...,neg
7200,"Wonderful film, one of the best horror films o...",pos
10818,"I may be getting ahead of myself here, but alt...",pos
12539,This film is wonderful in every way that moder...,pos
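
Before preprocessing, it can be worth confirming that the label column contains only the expected values. The helper below is a hypothetical sketch, not part of ktrain or pandas; the example labels are the five shown in the sample above, and the expected label set `('neg', 'pos')` is an assumption based on that sample.

```python
from collections import Counter

def label_counts(labels, expected=("neg", "pos")):
    """Count labels and fail loudly if an unexpected value appears."""
    counts = Counter(labels)
    unexpected = set(counts) - set(expected)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    return dict(counts)

# Labels of the five sampled rows shown above.
example_labels = ["neg", "neg", "pos", "pos", "pos"]
counts = label_counts(example_labels)
print(counts)  # {'neg': 2, 'pos': 3}
```

In the notebook, the same helper could be applied to the full column with `label_counts(data_train['Sentiment'])`.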


In [None]:
text.print_text_classifiers()

fasttext: a fastText-like model [http://arxiv.org/pdf/1607.01759.pdf]
logreg: logistic regression using a trainable Embedding layer
nbsvm: NBSVM model [http://www.aclweb.org/anthology/P12-2018]
bigru: Bidirectional GRU with pretrained fasttext word vectors [https://fasttext.cc/docs/en/crawl-vectors.html]
standard_gru: simple 2-layer GRU with randomly initialized embeddings
bert: Bidirectional Encoder Representations from Transformers (BERT) [https://arxiv.org/abs/1810.04805]
distilbert: distilled, smaller, and faster BERT from Hugging Face [https://arxiv.org/abs/1910.01108]


In [None]:
# Tokenize the reviews for DistilBERT; sequences longer than maxlen are truncated.
(train, val, preproc) = text.texts_from_df(train_df=data_train,
                                           text_column='Reviews',
                                           label_columns='Sentiment',
                                           val_df=data_test,
                                           maxlen=400,
                                           preprocess_mode='distilbert')

preprocessing train...
language: en
train sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913


Is Multi-Label? False
preprocessing test...
language: en
test sequence lengths:
	mean : 234
	95percentile : 598
	99percentile : 913
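
The reported lengths (mean 234, 95th percentile 598) against `maxlen = 400` suggest that a portion of the reviews is truncated. The sketch below estimates that share for any list of texts. It counts whitespace-separated words, which is only an approximation: the DistilBERT tokenizer splits text into subword pieces and will usually produce more tokens. The function name and the sample texts are hypothetical.

```python
def fraction_truncated(texts, maxlen=400):
    """Share of texts whose whitespace word count exceeds maxlen."""
    if not texts:
        raise ValueError("texts must not be empty")
    lengths = [len(t.split()) for t in texts]
    too_long = sum(1 for n in lengths if n > maxlen)
    return too_long / len(lengths)

# Three synthetic "reviews" of 10, 500, and 400 words.
sample_texts = ["good " * 10, "bad " * 500, "fine " * 400]
share = fraction_truncated(sample_texts, maxlen=400)
print(round(share, 3))  # 0.333: only the 500-word text exceeds the limit
```

In the notebook, `fraction_truncated(data_train['Reviews'].tolist(), maxlen=400)` would give a rough figure for the training set.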


In [None]:
model = text.text_classifier(name = 'distilbert', train_data = train, preproc=preproc)

Is Multi-Label? False
maxlen is 400




done.


In [None]:
learner = ktrain.get_learner(model = model,
                             train_data = train,
                             val_data = val,
                             batch_size = 6)

In [None]:
learner.fit_onecycle(lr = 2e-5, epochs=2)



begin training using onecycle policy with max lr of 2e-05...
Train for 4167 steps, validate for 782 steps
Epoch 1/2
Epoch 2/2


<tensorflow.python.keras.callbacks.History at 0x7fc8832f00d0>
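
The step counts in the log follow from dividing the number of samples by the batch size and rounding up. The sketch below reproduces them under two assumptions that the notebook does not state: each split holds 25,000 reviews, and validation runs with a batch size of 32.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(num_samples / batch_size)

# Assumed split size of 25,000; the training batch_size of 6 is from get_learner.
train_steps = steps_per_epoch(25_000, 6)
# Assumed evaluation batch size of 32.
val_steps = steps_per_epoch(25_000, 32)

print(train_steps)  # 4167
print(val_steps)    # 782
```

With two epochs, training therefore performs roughly 8,334 weight updates under these assumptions.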

In [None]:
predictor = ktrain.get_predictor(learner.model, preproc)

In [None]:
predictor.save('./')
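
A saved predictor can later be restored with `ktrain.load_predictor`, for example in a separate serving script. The wrapper below is a minimal sketch: the function name `reload_and_predict` is hypothetical, and it assumes ktrain is installed and that `model_dir` is the directory passed to `predictor.save`.

```python
def reload_and_predict(model_dir, texts):
    """Load a predictor saved with predictor.save() and classify texts."""
    import ktrain  # imported lazily so the function can be defined without ktrain loaded

    predictor = ktrain.load_predictor(model_dir)
    return predictor.predict(texts)

# Example call (requires the files written by predictor.save('./')):
# reload_and_predict('./', ['the movie was really great. I will see it again'])
```

Saving to a dedicated directory rather than `'./'` keeps the model files separate from the notebook and the cloned dataset.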

In [None]:
data = ['this movie was really bad. acting was also bad. I will not watch again',
        'the movie was really great. I will see it again',
        'another great movie. must watch to everyone']

In [None]:
predictor.predict(data)

['neg', 'pos', 'pos']

In [None]:
predictor.get_classes()

['neg', 'pos']

In [None]:
predictor.predict(data, return_proba=True)

array([[0.994142  , 0.00585801],
       [0.00469813, 0.99530184],
       [0.00349588, 0.9965042 ]], dtype=float32)
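
Each row of the probability array lines up with the order returned by `get_classes()`, so the predicted label is the class with the highest probability in that row. The following sketch decodes the values shown above in plain Python; the helper name is hypothetical.

```python
def probabilities_to_labels(probabilities, classes):
    """Pair each row's most probable class with its probability."""
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        decoded.append((classes[best], row[best]))
    return decoded

classes = ['neg', 'pos']  # order from predictor.get_classes()
probabilities = [[0.994142, 0.00585801],
                 [0.00469813, 0.99530184],
                 [0.00349588, 0.9965042]]

decoded = probabilities_to_labels(probabilities, classes)
print([label for label, _ in decoded])  # ['neg', 'pos', 'pos']
```

Keeping the probability alongside the label makes it possible to flag low-confidence predictions for manual review.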