# ULMFiT
For more information I recommend this [excellent post](https://github.com/prrao87/tweet-stance-prediction/blob/master/ulmfit.ipynb).

ULMFiT consists of three steps:
1. Training the language model on a general-domain corpus that captures high-level natural language features
1. Fine-tuning the pre-trained language model on target task data
1. Fine-tuning the classifier on target task data

![ulmfit 3 steps](https://miro.medium.com/max/1616/1*w_qNXVr7N2OPCK5iMnHAVQ.png)

Step 1 requires a lot of resources and was already performed by the creators of the FastAI package. For this task we will perform steps 2 and 3.

ULMFiT combines several techniques for training neural networks:
1. Detection of the initial learning rate. The learning rate is swept over a wide range during the first training steps, and the rate that produces the fastest decrease in the loss is kept.
1. Cyclical learning rate changes, which aim to improve convergence speed while reducing the risk of getting stuck in a local minimum.
1. Transfer learning from the pre-trained language model.
1. Integration of a large unlabeled dataset, which is used to fine-tune the language model.
1. A rather simple but effective network architecture for the classifier. This architecture of 3 fully connected layers is intended to provide solid performance across a wide range of text classification tasks.
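
The learning-rate detection described in the first item can be sketched on a toy problem. This is a minimal illustration in plain NumPy, not fastai's `lr_find` implementation: the function names (`lr_range_test`, `suggest_lr`), the sweep bounds, the divergence threshold, and the linear-regression data are all assumptions made for the example.

```python
import numpy as np

def lr_range_test(loss_and_grad, w0, lr_min=1e-5, lr_max=10.0, num_steps=100):
    """Sweep the learning rate exponentially, taking one SGD step per rate."""
    lrs = np.geomspace(lr_min, lr_max, num_steps)
    w = np.array(w0, dtype=float)
    losses = []
    for lr in lrs:
        loss, grad = loss_and_grad(w)
        # Stop as soon as the loss blows up (hypothetical threshold: 4x the best loss)
        if not np.isfinite(loss) or (losses and loss > 4 * min(losses)):
            break
        losses.append(loss)
        w = w - lr * grad
    return lrs[:len(losses)], np.array(losses)

def suggest_lr(lrs, losses):
    """Return the rate at which the loss falls fastest per unit of log10(lr)."""
    slopes = np.gradient(losses, np.log10(lrs))
    return lrs[np.argmin(slopes)]

# Toy data: noiseless linear regression with a fixed seed (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.arange(1.0, 6.0)

def mse_loss_and_grad(w):
    residual = X @ w - y
    return np.mean(residual ** 2), 2 * X.T @ residual / len(y)

lrs, losses = lr_range_test(mse_loss_and_grad, w0=np.zeros(5))
best_lr = suggest_lr(lrs, losses)
print(f"steepest loss decrease at lr = {best_lr:.4f}")
```

In practice the loss curve is usually inspected visually as well, since the steepest point on a noisy curve can be misleading.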

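Using ULMFiT with fastai is rather simple.

Note that we provide the data to the language model as a `TextLMDataBunch` and to the classifier as a `TextClasDataBunch`.  
`bs` is the batch size.  
We use the function `learn.lr_find()` to find the learning rate, and `learn.recorder.plot()` to plot the loss against the learning rates that were tried.  

The function `learn.fit_one_cycle(PARAMETERS)` trains the model; in the code below its arguments are the number of epochs and the maximum learning rate.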
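
To give a feel for what a one-cycle learning-rate schedule looks like, here is a small standalone sketch. It is not fastai's internal code: the helper names (`cosine_interp`, `one_cycle_lr`), the cosine shape, the 30% warm-up fraction, and the two divisors are assumptions chosen for illustration.

```python
import math

def cosine_interp(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2

def one_cycle_lr(step, total_steps, lr_max,
                 warmup_frac=0.3, start_div=25.0, final_div=1e4):
    """Learning rate at `step`: warm up to lr_max, then anneal far below it."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warm-up phase: rise from lr_max / start_div towards lr_max
        return cosine_interp(lr_max / start_div, lr_max, step / warmup_steps)
    # Annealing phase: fall from lr_max down to lr_max / final_div
    decay_steps = max(1, total_steps - warmup_steps - 1)
    return cosine_interp(lr_max, lr_max / final_div,
                         (step - warmup_steps) / decay_steps)

# One cycle of 100 steps peaking at the rate used in this notebook (1e-2)
schedule = [one_cycle_lr(s, 100, lr_max=1e-2) for s in range(100)]
print(schedule[0], max(schedule), schedule[-1])
```

The schedule starts low, peaks at `lr_max` after the warm-up phase, and ends several orders of magnitude below the peak.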


In [1]:
from fastai import *
from fastai.text import *

In [2]:
def run_on_df(df_trn, df_val):
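    # Keep only the columns we need; copy so the assignments below
    # do not trigger pandas' SettingWithCopyWarning
    df_trn = df_trn[['text', 'label']].copy()
    df_val = df_val[['text', 'label']].copy()

    # Split each text on whitespace into a list of tokens
    df_trn['text'] = df_trn['text'].apply(lambda x: x.split())
    df_val['text'] = df_val['text'].apply(lambda x: x.split())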

    # Language model data
    data_lm = TextLMDataBunch.from_df(
        train_df=df_trn, 
        valid_df=df_val, 
        path="", 
        label_cols='label', 
        text_cols='text')

    # Classifier model data
    data_clas = TextClasDataBunch.from_df(
        path="", 
        train_df=df_trn, 
        valid_df=df_val,
        label_cols='label', 
        text_cols='text', 
        vocab=data_lm.train_ds.vocab, 
        bs=32)

    learn = language_model_learner(
        data_lm, 
        drop_mult=0.7, 
        arch=AWD_LSTM)
    
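    # Fine-tune the language model for one epoch with learning rate 1e-2
    learn.fit_one_cycle(1, 1e-2)
    # Sweep the learning rate to see how the loss responds
    learn.lr_find()
    # Plot the loss against the learning rate
    learn.recorder.plot()
    # Save the fine-tuned encoder so the classifier can reuse it
    learn.save_encoder('ft_enc')
    # Build the classifier; the architecture must match the language model
    learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.7)
    # Load the fine-tuned encoder weights into the classifier
    learn.load_encoder('ft_enc')

    # Inspect learning rates for the classifier, then train it for one epoch
    learn.lr_find()
    learn.recorder.plot()
    learn.fit_one_cycle(1, 1e-2)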
    return learn

In [6]:
import pandas as pd
df_train = pd.read_csv('train.csv').set_index('index')
df_val = pd.read_csv('val.csv').set_index('index')
df_train.head()

| index | text | label |
|---|---|---|
| 1073 | some articles on the topic: rtw 12/23 0859 gul... | 3 |
| 499 | is kratz claiming that he can reliably visuall... | 2 |
| 530 | npr's morning edition aired a report this morn... | 0 |
| 699 | i'm telling you , sam , three l's . call up mo... | 0 |
| 1410 | why do you title this "news you will miss" ? t... | 3 |


In [7]:
run_on_df(df_train, df_val)

epoch,train_loss,valid_loss,accuracy,time


KeyboardInterrupt: 