## Sentiment Classification using fastai

* fastai is a library built on top of PyTorch that simplifies training fast and accurate neural networks.
* fastai provides application modules for text, vision, tabular data, and collaborative filtering (`collab`).

### `text` module
The `text` module of the fastai library contains the functions needed to define a dataset suitable for various NLP tasks:
* `text.transform` contains the scripts to preprocess your data, from raw text to token ids,
* `text.data` contains the definition of `TextDataBunch`, which is the main class you'll need in NLP,
* `text.learner` contains helper functions to quickly create a language model or an RNN classifier.
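
To make "from raw text to token ids" concrete, here is a minimal sketch in plain Python. It is not fastai's implementation: the functions `tokenize`, `build_vocab` and `numericalize` are hypothetical helpers, and the capitalization rule is a simplified assumption modeled on the `xxbos`, `xxmaj` and `xxunk` markers that appear in the batch output later in this notebook.

```python
import re
from collections import Counter

# Special tokens modeled on the markers visible in the show_batch output.
BOS, UNK, MAJ = "xxbos", "xxunk", "xxmaj"

def tokenize(text):
    """Split raw text into lowercase tokens; flag capitalized words with xxmaj."""
    tokens = [BOS]  # mark the beginning of the text
    for word in re.findall(r"\w+|[^\w\s]", text):
        # Assumed rule: a word such as "This" becomes "xxmaj", "this".
        if len(word) > 1 and word[0].isupper() and word[1:].islower():
            tokens.append(MAJ)
        tokens.append(word.lower())
    return tokens

def build_vocab(token_lists, min_freq=1):
    """Return the id -> token list (itos); id 0 is reserved for unknown tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    kept = sorted(tok for tok, n in counts.items() if n >= min_freq and tok != UNK)
    return [UNK] + kept

def numericalize(tokens, itos):
    """Replace each token by its id; tokens outside the vocabulary map to 0."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [stoi.get(tok, 0) for tok in tokens]

reviews = ["This movie was great.", "This movie was bad!"]
tokenized = [tokenize(r) for r in reviews]
itos = build_vocab(tokenized)
ids = [numericalize(t, itos) for t in tokenized]

print(tokenized[0])
print(ids[0])
```

Mapping the ids back through `itos` recovers the token sequence, which is the property the later stages rely on.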

In [1]:
import torch
import torch.nn as nn
import fastai
from fastai.text import *
import pandas as pd

import os
print(os.listdir('../input/'))

['IMDB Dataset.csv']


Dataset: a small sample of the IMDB dataset containing 1,000 movie reviews, each labeled positive or negative.

In [2]:
path = untar_data(URLs.IMDB_SAMPLE)
path

PosixPath('/tmp/.fastai/data/imdb_sample')

In [3]:
df = pd.read_csv(path/'texts.csv')
df.head()

Unnamed: 0,label,text,is_valid
0,negative,Un-bleeping-believable! Meg Ryan doesn't even ...,False
1,positive,This is a extremely well-made film. The acting...,False
2,negative,Every once in a long while a movie will come a...,False
3,positive,Name just says it all. I watched this movie wi...,False
4,negative,This movie succeeds at being one of the most u...,False


## Preparing Data for modelling
* fastai has a `TextDataBunch` class that takes care of loading the data, splitting it into training and validation sets, and preprocessing it (tokenizing, creating the vocabulary, etc.).
* `TextDataBunch` has two subclasses: `TextLMDataBunch` (language model data) and `TextClasDataBunch` (text classifier data).
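
As a rough illustration of the splitting step, the sketch below holds out a random fraction of the rows. The `random_split` helper and its `valid_pct` and `seed` parameters are hypothetical and only illustrate the idea; they are not fastai's code, and the rows are synthetic placeholders rather than real reviews.

```python
import random

def random_split(rows, valid_pct=0.2, seed=42):
    """Shuffle the row indices and hold out a fraction of the rows."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(valid_pct * len(rows))
    valid = [rows[i] for i in idx[:cut]]
    train = [rows[i] for i in idx[cut:]]
    return train, valid

# Synthetic stand-in for the 1,000 (label, text) rows of the sample.
rows = [("negative" if i % 2 else "positive", f"review {i}") for i in range(1000)]
train, valid = random_split(rows)
print(len(train), len(valid))
```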

For the classifier, we also pass the vocabulary (the mapping from ids to words) that we want to use. This ensures that `data_clas` uses the same dictionary as `data_lm`.
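
The small sketch below suggests why the shared vocabulary matters. The two vocabularies and the `numericalize` helper are made up for illustration. The point is that an encoder fine-tuned on ids from one vocabulary would see different ids for the same words if the classifier data built its own.

```python
def numericalize(tokens, itos):
    """Map each token to its position in itos; unknown tokens map to 0."""
    stoi = {tok: i for i, tok in enumerate(itos)}
    return [stoi.get(tok, 0) for tok in tokens]

# Hypothetical vocabulary built from the language-model data.
lm_itos = ["xxunk", "movie", "great", "bad", "the"]
# Hypothetical vocabulary built independently from the classifier data.
clas_itos = ["xxunk", "bad", "the", "movie", "great"]

tokens = ["the", "movie", "great"]
ids_shared = numericalize(tokens, lm_itos)   # ids the fine-tuned encoder expects
ids_own = numericalize(tokens, clas_itos)    # ids from a mismatched vocabulary

print(ids_shared)
print(ids_own)
# Decoding the mismatched ids with the language-model vocabulary gives other words.
print([lm_itos[i] for i in ids_own])
```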

In [4]:
# Language model data
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
# Classifier model data
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab, bs=32)

Save the data

In [5]:
data_lm.save('../data_lm_export.pkl')
data_clas.save('../data_clas_export.pkl')

In [6]:
data_lm = load_data(path, '../data_lm_export.pkl')
data_clas = load_data(path, '../data_clas_export.pkl', bs=16)

## Fine-tuning a language model
* fastai provides the AWD-LSTM architecture.
* We can create a learner object that builds the model, downloads the pretrained weights, and is ready for fine-tuning.

In [7]:
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5, model_dir="../")
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,4.503142,3.99624,0.281744,06:29
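
`fit_one_cycle(1, 1e-2)` trains for one epoch with a peak learning rate of `1e-2` under a one-cycle policy: the learning rate rises to its peak and then decays. The sketch below shows one way such a schedule can be computed. The cosine shape and the values of `pct_start`, `div_factor` and `final_div` are assumptions chosen for illustration, so check the fastai documentation for the library's actual defaults.

```python
import math

def cosine_anneal(start, end, pct):
    """Move from start to end along half a cosine wave as pct goes from 0 to 1."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lr(step, total_steps, lr_max=1e-2,
                 pct_start=0.3, div_factor=25.0, final_div=1e4):
    """Learning rate at a given step of an assumed one-cycle schedule."""
    warmup_steps = int(total_steps * pct_start)
    lr_start = lr_max / div_factor   # assumed starting learning rate
    lr_end = lr_max / final_div      # assumed final learning rate
    if step < warmup_steps:
        return cosine_anneal(lr_start, lr_max, step / warmup_steps)
    decay_pct = (step - warmup_steps) / (total_steps - warmup_steps)
    return cosine_anneal(lr_max, lr_end, decay_pct)

total_steps = 100
schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps + 1)]
peak_step = schedule.index(max(schedule))
print(peak_step, schedule[0], max(schedule), schedule[-1])
```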


### Evaluate the language model

In [8]:
learn.predict("This is a review about", n_words=10)

'This is a review about their training type , painful product difference and lack of'

Save the encoder to use it for classification.

In [9]:
learn.save_encoder('../ft_enc')

Use the `data_clas` object we created earlier to build a classifier with our fine-tuned encoder.

In [10]:
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, model_dir="../")
learn.load_encoder('../ft_enc')

In [11]:
data_clas.show_batch()


text,target
"xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n \n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , steaming bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and gooey xxmaj raising xxmaj",negative
"xxbos xxmaj to review this movie , i without any doubt would have to quote that memorable scene in xxmaj tarantino 's "" xxmaj pulp xxmaj fiction "" ( xxunk ) when xxmaj jules and xxmaj vincent are talking about xxmaj mia xxmaj wallace and what she does for a living . xxmaj jules tells xxmaj vincent that the "" xxmaj only thing she did worthwhile was pilot "" .",negative
"xxbos xxmaj how viewers react to this new "" adaption "" of xxmaj shirley xxmaj jackson 's book , which was promoted as xxup not being a remake of the original xxunk movie ( true enough ) , will be based , i suspect , on the following : those who were big fans of either the book or original movie are not going to think much of this one",negative
"xxbos xxmaj the year 2005 saw no xxunk than 3 filmed productions of xxup h. xxup g. xxmaj wells ' great novel , "" xxmaj war of the xxmaj worlds "" . xxmaj this is perhaps the least well - known and very probably the best of them . xxmaj no other version of xxunk has ever attempted not only to present the story very much as xxmaj wells wrote",positive
"xxbos xxmaj well , what can i say . \n \n "" xxmaj what the xxmaj bleep do we xxmaj know "" has achieved the nearly impossible - leaving behind such masterpieces of the genre as "" xxmaj the xxmaj postman "" , "" xxmaj the xxmaj dungeon xxmaj master "" , "" xxmaj merlin "" , and so fourth , it will go down in history as the",negative


Fine-tune the model

In [12]:
learn.fit_one_cycle(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.645534,0.557595,0.756219,06:38


### Evaluate

In [13]:
learn.predict("This was a bad movie!")

(Category positive, tensor(1), tensor([0.2592, 0.7408]))
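
The returned tuple holds the predicted category, its class index, and the per-class probabilities. The sketch below reproduces that reading in plain Python, assuming the class order `negative`, `positive` (consistent with `tensor(1)` mapping to `positive` above). Note that the input is a clearly negative review, yet the model assigns about 0.74 to `positive`. With a validation accuracy of roughly 0.756 after a single epoch, such errors are to be expected.

```python
# Assumed class order, consistent with tensor(1) -> positive in the output above.
classes = ["negative", "positive"]
# Probabilities copied from the prediction output above.
probs = [0.2592, 0.7408]

# The predicted class is the one with the highest probability.
pred_idx = max(range(len(probs)), key=probs.__getitem__)
pred_class = classes[pred_idx]
print(pred_class, pred_idx, probs[pred_idx])
```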