# Training a language model on a standalone dataset with fastai
- This notebook ingests the fastai curated IMDB_SAMPLE dataset (1,000 IMDB movie reviews in a single CSV file)
- It then trains a language model by taking the pre-trained AWD_LSTM model as a starting point and fine-tuning it on those reviews


In [1]:
#hide
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

Mounted at /content/gdrive


In [2]:
#hide
from fastbook import *
from fastai.text.all import *
import pickle 

In [3]:
modifier = 'ga_apr10'

# Ingest the dataset
- define the source of the dataset
- create a dataframe for the training dataset

In [4]:
path = untar_data(URLs.IMDB_SAMPLE)
path.ls()

(#1) [Path('/root/.fastai/data/imdb_sample/texts.csv')]

In [5]:
%pwd

'/content'

In [6]:
path

Path('/root/.fastai/data/imdb_sample')

In [None]:
%%time
'''
# create dataloaders object
path = URLs.path('/content')
path.ls()
'''

In [7]:
! pwd

/content


In [8]:
# read the sample CSV into a dataframe; pandas' default encoding reads texts.csv without a decode error
df_train = pd.read_csv(path/'texts.csv')

In [9]:
df_train.head()

Unnamed: 0,label,text,is_valid
0,negative,"Un-bleeping-believable! Meg Ryan doesn't even look her usual pert lovable self in this, which normally makes me forgive her shallow ticky acting schtick. Hard to believe she was the producer on this dog. Plus Kevin Kline: what kind of suicide trip has his career been on? Whoosh... Banzai!!! Finally this was directed by the guy who did Big Chill? Must be a replay of Jonestown - hollywood style. Wooofff!",False
1,positive,"This is a extremely well-made film. The acting, script and camera-work are all first-rate. The music is good, too, though it is mostly early in the film, when things are still relatively cheery. There are no really superstars in the cast, though several faces will be familiar. The entire cast does an excellent job with the script.<br /><br />But it is hard to watch, because there is no good end to a situation like the one presented. It is now fashionable to blame the British for setting Hindus and Muslims against each other, and then cruelly separating them into two countries. There is som...",False
2,negative,"Every once in a long while a movie will come along that will be so awful that I feel compelled to warn people. If I labor all my days and I can save but one soul from watching this movie, how great will be my joy.<br /><br />Where to begin my discussion of pain. For starters, there was a musical montage every five minutes. There was no character development. Every character was a stereotype. We had swearing guy, fat guy who eats donuts, goofy foreign guy, etc. The script felt as if it were being written as the movie was being shot. The production value was so incredibly low that it felt li...",False
3,positive,"Name just says it all. I watched this movie with my dad when it came out and having served in Korea he had great admiration for the man. The disappointing thing about this film is that it only concentrate on a short period of the man's life - interestingly enough the man's entire life would have made such an epic bio-pic that it is staggering to imagine the cost for production.<br /><br />Some posters elude to the flawed characteristics about the man, which are cheap shots. The theme of the movie ""Duty, Honor, Country"" are not just mere words blathered from the lips of a high-brassed offic...",False
4,negative,"This movie succeeds at being one of the most unique movies you've seen. However this comes from the fact that you can't make heads or tails of this mess. It almost seems as a series of challenges set up to determine whether or not you are willing to walk out of the movie and give up the money you just paid. If you don't want to feel slighted you'll sit through this horrible film and develop a real sense of pity for the actors involved, they've all seen better days, but then you realize they actually got paid quite a bit of money to do this and you'll lose pity for them just like you've alr...",False


# Create language model

In [10]:
df_train.shape

(1000, 3)

In [11]:
%%time
# create TextDataLoaders object
dls = TextDataLoaders.from_df(df_train, path=path, 
                              text_col='text',
                              is_lm=True)
dls.show_batch(max_n=3)

Unnamed: 0,text,text_
0,"xxbos xxmaj this movie had good intentions and a good story to work with . xxmaj the director and screenwriter of this movie failed miserably and created a dull , boring xxunk that made me feel like i was back in xxmaj mr . xxmaj xxunk 's 8th grade xxmaj social xxmaj studies class -- way back in xxunk . \n\n xxmaj what a waste , will somebody please take this story","xxmaj this movie had good intentions and a good story to work with . xxmaj the director and screenwriter of this movie failed miserably and created a dull , boring xxunk that made me feel like i was back in xxmaj mr . xxmaj xxunk 's 8th grade xxmaj social xxmaj studies class -- way back in xxunk . \n\n xxmaj what a waste , will somebody please take this story and"
1,"showed up in the xxmaj french town where xxmaj jimmy , now fully xxunk from his wounds , was xxunk at things got very xxunk for both him and xxmaj rose who had already accepted xxmaj jimmy 's xxunk of marriage to her ! \n\n xxmaj with xxup wwi over and xxmaj jimmy marrying xxmaj rose left xxmaj fred , who 's still in love with her , a bitter and xxunk","up in the xxmaj french town where xxmaj jimmy , now fully xxunk from his wounds , was xxunk at things got very xxunk for both him and xxmaj rose who had already accepted xxmaj jimmy 's xxunk of marriage to her ! \n\n xxmaj with xxup wwi over and xxmaj jimmy marrying xxmaj rose left xxmaj fred , who 's still in love with her , a bitter and xxunk young"
2,! and was especially shattered not to know what happens to xxmaj jason ! ! i think they should make another one … . it i also think its silly that u have to xxunk ten lines to post a comment .. it makes your comment drag on .. and no one will read it ! ! i really want to know what would have happened between jason and xxunk … maybe,and was especially shattered not to know what happens to xxmaj jason ! ! i think they should make another one … . it i also think its silly that u have to xxunk ten lines to post a comment .. it makes your comment drag on .. and no one will read it ! ! i really want to know what would have happened between jason and xxunk … maybe they


CPU times: user 4.1 s, sys: 1.81 s, total: 5.91 s
Wall time: 15.5 s
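The two columns in the batch above, `text` and `text_`, are the model input and its target: the target is the same token stream shifted one position to the left, so at every position the model is asked to predict the next token. The following is a minimal sketch of that offset in plain Python, using the first few tokens of row 0 as a toy stream. It illustrates the idea only and is not how fastai builds its batches internally.

```python
# Toy token stream copied from the start of row 0 in the batch above.
tokens = "xxbos xxmaj this movie had good intentions and a good story".split()

# Input: every token except the last. Target: the same stream shifted left by one.
inputs = tokens[:-1]
targets = tokens[1:]

# Each input token is paired with the token that follows it in the stream.
for x, y in zip(inputs, targets):
    print(f"{x:>12} -> {y}")
```

The special tokens are added by fastai's default tokenizer rules: `xxbos` marks the beginning of a text, `xxmaj` means the next word was capitalized, `xxup` means the next word was in all caps, and `xxunk` replaces words that are not in the vocabulary.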


In [12]:
%%time
# define and train model
learn = language_model_learner(dls,AWD_LSTM,
                               metrics=accuracy).to_fp16()
learn.fine_tune(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,4.483501,4.059144,0.272595,00:12


epoch,train_loss,valid_loss,accuracy,time
0,4.106837,3.93948,0.279619,00:12


CPU times: user 27.3 s, sys: 821 ms, total: 28.1 s
Wall time: 30.9 s


# Exercise and save language model
- try out the language model with a few examples
- save the language model and the encoder

In [13]:
# get prediction
learn.predict("what comes next", n_words=20)

'what comes next will been great ? Its reasons . The matter . Acting , not shot . \n\n'
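Running the prediction cell again will usually give a different continuation, because each next word is sampled from the model's predicted distribution rather than always taking the most likely word. The sketch below shows temperature-scaled sampling in plain Python over made-up scores. The function `sample_next` and the example `logits` are hypothetical and only imitate the idea; they are not fastai's code.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature) and return it with the probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs

# Hypothetical scores for a four-word vocabulary.
logits = [2.0, 1.0, 0.5, 0.1]
_, probs_default = sample_next(logits, temperature=1.0)
_, probs_sharp = sample_next(logits, temperature=0.5)
print(probs_default)
print(probs_sharp)
```

A lower temperature concentrates probability on the top-scoring word, giving safer but more repetitive text. `learn.predict` accepts a `temperature` argument for the same purpose.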

In [14]:
!pwd


/content


In [17]:
! ls

MyDrive


In [18]:
%cd /content/gdrive/MyDrive/ga_nlp_test/

/content/gdrive/MyDrive


In [19]:
learn.export('/content/gdrive/MyDrive/ga_nlp_test/'+modifier)

In [21]:
keep_path = learn.path

In [22]:
# workaround to make path writeable
learn.path = Path('/content/gdrive/MyDrive/ga_nlp_test')

In [23]:
learn.path

Path('/content/gdrive/MyDrive/ga_nlp_test')

In [24]:
learn.model_dir

'models'

In [25]:
learn.save('lm_standalone'+modifier)

Path('/content/gdrive/MyDrive/ga_nlp_test/models/lm_standalonega_apr10.pth')

In [26]:
# save the encoder separately so that a text classifier can load it later with load_encoder
learn.save_encoder('ft_standalone'+modifier)

In [None]:
learn.path = keep_path
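The cells above save the original `learn.path`, point it at a writable directory, and restore it by hand at the end. If a cell in between fails, the restore step is easy to forget. One possible way to make this safer is a small context manager that always restores the attribute. This is a sketch only: `temporary_attr` is a hypothetical helper, and `fake_learn` is a stand-in object so the example runs without a trained learner.

```python
from contextlib import contextmanager
from pathlib import Path
from types import SimpleNamespace

@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block and restore the original afterwards."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the body raises, so the original value is always restored.
        setattr(obj, name, original)

# Stand-in for the Learner (assumption: only its `path` attribute matters here).
fake_learn = SimpleNamespace(path=Path('/root/.fastai/data/imdb_sample'))

with temporary_attr(fake_learn, 'path', Path('/content/gdrive/MyDrive/ga_nlp_test')):
    inside = fake_learn.path      # writable location while saving
after = fake_learn.path           # original location again

print(inside)
print(after)
```

With the real learner, the calls to `learn.save` and `learn.save_encoder` would go inside the `with` block.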