
## Transformers and Transfer Learning

One of the most important ideas to implement if you want to get deep learning working in the real world is transfer learning, which is the process of taking a model that has already been trained on another dataset and fine-tuning it to fit your new dataset. For example, if you're training a language model to generate compelling short stories in the style of Hemingway, you could fine-tune a model trained on a wide variety of books instead of training on just the text samples of Hemingway, of which there may not be many.

A nice analogy in object-oriented programming is the concept of inheritance in classes.

By training on the larger dataset, the model essentially inherits a large amount of extra knowledge, which it can use to perform better on the task you care about. From a practical standpoint, transfer learning helps you get better performing models faster since fine-tuning, if done correctly, is often computationally cheaper than training from scratch.
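
The inheritance analogy can be made concrete with a few lines of plain Python. This is only an illustration of the analogy, not of how fine-tuning is actually implemented; the class names and the "knowledge" strings are made up for the example.

```python
class PretrainedModel:
    """Stands in for a model trained on a large, general corpus."""

    def __init__(self):
        # "Knowledge" picked up during pretraining (purely illustrative).
        self.knowledge = {"grammar", "vocabulary", "narrative structure"}

    def skills(self):
        return set(self.knowledge)


class HemingwayModel(PretrainedModel):
    """Fine-tuned model: keeps what the parent knows and adds a little more."""

    def __init__(self):
        super().__init__()
        # Fine-tuning adds task-specific knowledge on top of what was inherited.
        self.knowledge.add("Hemingway's style")


base = PretrainedModel()
tuned = HemingwayModel()

# Everything the base model knows is still available to the fine-tuned one.
print(sorted(tuned.skills() - base.skills()))
```

The child class starts with everything the parent has and only needs to add the small, task-specific part, which is the intuition behind fine-tuning being cheaper than training from scratch.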

> Note: these benefits assume that the original dataset you're transferring *from* is much larger than the dataset you're using for fine-tuning. If your fine-tuning dataset is larger, perhaps you should be applying transfer learning the other way around! In practice, though, it's very hard to find natural language text datasets comparable in size to the ones used for pretraining.

The other big advancement we'll discuss is a new kind of model architecture called the transformer. Training transformers can be complicated and does not always work well without some fine-tuning. So, instead of training one from scratch, we'll show you the pretraining technique on another architecture and then use a popular pretrained transformer to perform inference.

## fastai

We're going to fine-tune a language model and then transform it into a text classifier that categorizes text based on sentiment. We'll start with the simplest working implementation, and progressively train our network using the [ULMFiT](https://arxiv.org/abs/1801.06146) technique.

The dataset we're going to use here is the IMDB movie review dataset. It's not very fun, but it's simple and small, which is what we want when starting off.

`fastai` is more than your standard deep learning library. It includes tools that help you solve the problem at hand end-to-end as fast as possible.

## Setup

In [None]:
!pip install fastai==2.2.5

In [2]:
from fastai.text.all import *

One of those tools is a built-in set of common datasets that can be easily downloaded.

In [3]:
path = untar_data(URLs.IMDB)

This particular instance of the IMDB dataset is organized just like ImageNet is (i.e. one directory per class). So in this case, the positive reviews are saved under `pos` and the negative reviews are saved under `neg`.
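
To see what "one directory per class" means in practice, here is a minimal sketch that builds a tiny stand-in folder tree and recovers each label from its parent folder name. The `train` folder name, the file names, and the `read_labelled_texts` helper are assumptions made for this illustration; this is not how `fastai` loads the data internally.

```python
import tempfile
from pathlib import Path


def read_labelled_texts(split_dir):
    """Return (text, label) pairs, taking each label from the parent folder name."""
    pairs = []
    for class_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        for txt in sorted(class_dir.glob("*.txt")):
            pairs.append((txt.read_text(encoding="utf-8"), class_dir.name))
    return pairs


# Build a tiny stand-in for the real dataset (file names are made up).
root = Path(tempfile.mkdtemp())
for label, text in [("pos", "Loved it."), ("neg", "Dull and far too long.")]:
    (root / "train" / label).mkdir(parents=True)
    (root / "train" / label / "0.txt").write_text(text, encoding="utf-8")

samples = read_labelled_texts(root / "train")
print(samples)
```

Because the label lives in the folder name, no separate annotation file is needed.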

We can set up our dataset and prepare for training by using the `TextDataLoaders.from_folder` method built into `fastai`. The only thing we need to specify is the name of the validation folder, which is "test" (and not the default "valid").

In [4]:
dls = TextDataLoaders.from_folder(path, valid="test")

Another useful method is `show_batch`, which lets us take a quick glimpse at our data to make sure everything looks OK.

In [5]:
dls.show_batch()

Unnamed: 0,text,category
0,"xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero",pos
1,"xxbos xxmaj i 've rented and watched this movie for the 1st time on xxup dvd without reading any reviews about it . xxmaj so , after 15 minutes of watching xxmaj i 've noticed that something is wrong with this movie ; it 's xxup terrible ! i mean , in the trailers it looked scary and serious ! \n\n i think that xxmaj eli xxmaj roth ( mr . xxmaj director ) thought that if all the characters in this film were stupid , the movie would be funny … ( so stupid , it 's funny … ? xxup wrong ! ) xxmaj he should watch and learn from better horror - comedies such xxunk xxmaj night "" , "" the xxmaj lost xxmaj boys "" and "" the xxmaj return xxmaj of the xxmaj living xxmaj dead "" ! xxmaj those are funny ! \n\n """,neg
2,"xxbos i felt duty bound to watch the 1983 xxmaj timothy xxmaj dalton / xxmaj zelah xxmaj clarke adaptation of "" jane xxmaj eyre , "" because xxmaj i 'd just written an article about the 2006 xxup bbc "" jane xxmaj eyre "" for xxunk . \n\n xxmaj so , i approached watching this the way xxmaj i 'd approach doing homework . \n\n i was irritated at first . xxmaj the lighting in this version is bad . xxmaj everyone / everything is washed out in a bright white xxunk light that , in some scenes , casts shadows on the wall behind the characters . \n\n xxmaj and the sound is poorly recorded . i felt like i was listening to a high school play . \n\n xxmaj and the pancake make - up is way too heavy . \n\n xxmaj and the sets do n't fully",pos
3,"xxbos xxmaj to be a xxmaj buster xxmaj keaton fan is to have your heart broken on a regular basis . xxmaj most of us first encounter xxmaj keaton in one of the brilliant feature films from his great period of independent production : ' the xxmaj general ' , ' the xxmaj navigator ' , ' sherlock xxmaj jnr ' . xxmaj we recognise him as the greatest figure in the entire history of film comedy , and we want to see more of his movies . xxmaj here the heartbreak begins . xxmaj after ' steamboat xxmaj bill xxmaj jnr ' , xxmaj keaton 's brother - in - law xxmaj joseph xxmaj xxunk pressured him into signing a contract that put xxmaj keaton under the control of xxup mgm . xxmaj keaton became just one more actor for hire , performing someone else 's scripts . xxmaj",neg
4,"xxbos i have n't liked many xxup tv shows post 1990 , but xxup that 70s xxup show is great . xxmaj never seeing it during it 's first run , thinking a gimmicky period piece , i was wrong ! i started watching in reruns and the more i watched , the more i liked ! xxmaj now , it is the only show xxunk xxunk that i watch regularly . \n\n xxmaj although xxup that 70s xxup show mimics some of the styles , attitudes , music , and tastes of the 70s , it does not mire itself in that decade by going overboard with the references and look of the 70s . xxmaj it contains so much funny , witty , biting dialogue that is delivered with confidence and certainty by its main cast that it overcomes any 70s clichés by just being humorous . xxmaj",pos
5,"xxbos xxmaj office work , especially in this era of computers , multi - functional copy machines , e - mail , voice mail , snail mail and ` temps , ' is territory ripe with satirical possibilities , a vein previously tapped in such films as ` clockwatchers ' and ` office xxmaj space , ' and very successfully . xxmaj this latest addition to the temp / humor pool , however , ` haiku xxmaj tunnel , ' directed by xxmaj josh xxmaj kornbluth and xxmaj jacob xxmaj kornbluth , fails to live up to it 's predecessors , and leaves the laughs somewhere outside the door , waiting for a chance to sneak in . xxmaj unfortunately for the audience , that chance never comes ; so what you get is a nice try , but as the man once said , no cigar . \n\n\t xxmaj",neg
6,"xxbos xxmaj this film reminds me of 42nd xxmaj street starring xxmaj bebe xxmaj daniels and xxmaj ruby xxmaj keeler . xxmaj when i watch this film a lot of it reminded me of 42nd xxmaj street , especially the character xxmaj eloise who 's a temperamental star and she ends up falling and breaks her ankle , like xxmaj bebe xxmaj daniels did in 42nd xxmaj street and another performer gets the part and become a star . xxmaj this film , like most race films , keeps people watching because of the great entertainment . xxmaj race films always showed xxmaj black xxmaj entertainment as it truly was that was popular in that time era . xxmaj the xxmaj dancing xxmaj styles , xxmaj the xxmaj music , xxmaj dressing xxmaj styles , xxmaj you 'll xxmaj love xxmaj it . xxmaj this movie could of been big",pos
7,"xxbos xxmaj an xxmaj american xxmaj in xxmaj paris is an integrated musical , meaning that the songs and dances blend perfectly with the story . xxmaj the film was inspired by the 1928 orchestral composition by xxmaj george xxmaj gershwin . \n\n xxmaj the story of the film is interspersed with show - stopping dance numbers choreographed by xxmaj gene xxmaj kelly and set to popular xxmaj gershwin tunes . xxmaj songs and music include "" i xxmaj got xxmaj rhythm , "" "" 's xxmaj wonderful , "" and "" our xxmaj love is xxmaj here to xxmaj stay "" . xxmaj it set a new standard for the subgenre known as the "" songbook "" musical with dozens of xxmaj gershwin tunes buried in the underscore . xxmaj the climax is "" the xxmaj american in xxmaj paris "" ballet , an 18 minute dance featuring xxmaj",pos
8,"xxbos xxmaj streisand fans only familiar with her work from the xxup funny xxup girl film onwards need to see this show to see what a brilliant performer xxmaj streisand xxup was - xxup before she achieved her goal of becoming a xxmaj movie xxmaj star . xxmaj there had never been a female singer quite like her ever before , and there never would be again ( sorry , xxmaj celine - only in your dreams ! ) , but never again would xxmaj streisand sing with the vibrancy , energy , and , above all , the xxup enthusiasm and xxup vulnerability with which she performs here - by the time she gets to that xxmaj central xxmaj park concert only 2 or 3 years later , she 'd been filming xxup funny xxup girl in xxmaj hollywood and her performing style has become less spontaneous and more",pos


We can see that the library automatically processed all the texts to split them into *tokens*, adding some special tokens like:

- `xxbos` to indicate the beginning of a text
- `xxmaj` to indicate the next word was capitalized
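
The two markers listed above can be imitated with a very rough toy tokenizer. This is only a sketch for building intuition: `toy_tokenize` is a hypothetical helper, it is far simpler than the tokenizer `fastai` actually uses, and it ignores the other markers visible in the batch (such as `xxup` and `xxunk`).

```python
import re


def toy_tokenize(text):
    """Very rough imitation of the xxbos / xxmaj markers."""
    tokens = ["xxbos"]  # beginning-of-text marker
    # Split into runs of word characters and single punctuation marks.
    for word in re.findall(r"\w+|[^\w\s]", text):
        if word[0].isupper() and word[1:].islower():
            # Capitalized word: emit the marker, then the lowercased word.
            tokens += ["xxmaj", word.lower()]
        else:
            tokens.append(word.lower())
    return tokens


print(toy_tokenize("This movie was great. I loved Paris!"))
```

Lowercasing the word and recording the capitalization as a separate token keeps the vocabulary smaller, since "This" and "this" share a single entry.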

## fastai Learner for text classification

`fastai` uses an object called a `Learner` for doing pretty much everything. We can construct one for text classification in one line of code:

In [None]:
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)

Instead of the transformer model that we've been raving about (and will continue to discuss) throughout the vast majority of the book, we're going to use the [AWD LSTM](https://arxiv.org/abs/1708.02182) architecture for now, since it's easier and faster to train.

There are a few other details: `drop_mult` is a parameter that controls the magnitude of all dropouts in the model, and we use `accuracy` to track how well we are doing.
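
As a rough picture of what these two settings do, the sketch below scales a set of dropout probabilities by a single multiplier and computes accuracy as the fraction of correct predictions. The layer names and base probabilities are placeholder values chosen for illustration, not the library's actual defaults, and both functions are stand-ins rather than `fastai` code.

```python
# Hypothetical per-layer dropout probabilities (placeholder values).
base_dropouts = {"input": 0.4, "hidden": 0.3, "output": 0.4}


def scale_dropouts(dropouts, drop_mult):
    """Multiply every dropout probability by the same factor."""
    return {name: p * drop_mult for name, p in dropouts.items()}


def accuracy(predictions, targets):
    """Fraction of predictions that match their targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


scaled = scale_dropouts(base_dropouts, 0.5)
score = accuracy(["pos", "neg", "pos", "pos"], ["pos", "neg", "neg", "pos"])
print(scaled)
print(score)
```

A multiplier below 1 weakens regularization everywhere at once, which is convenient when you want to tune a single knob instead of several.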

With the `Learner` defined, we can now fine-tune our pretrained model, using a method with an unsurprising name:

In [None]:
learn.fine_tune(4, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.458047,0.412425,0.81352,03:26


epoch,train_loss,valid_loss,accuracy,time
0,0.307921,0.29766,0.87752,07:01
1,0.247349,0.20226,0.92068,07:00
2,0.192098,0.191875,0.92668,07:01
3,0.146141,0.192024,0.92952,07:01
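
The two tables above correspond to two training phases: by default, `fine_tune` first trains for one epoch with the pretrained body frozen, then unfreezes all layers and trains for the requested number of epochs. The sketch below mimics only that control flow with a stand-in object; `FakeLearner` and `two_phase_fine_tune` are hypothetical names, and the learning-rate handling is left out.

```python
class FakeLearner:
    """Stand-in that records calls instead of training (not a fastai class)."""

    def __init__(self):
        self.log = []

    def freeze(self):
        self.log.append("freeze body")

    def unfreeze(self):
        self.log.append("unfreeze all layers")

    def fit(self, epochs):
        self.log.append(f"train {epochs} epoch(s)")


def two_phase_fine_tune(learner, epochs, freeze_epochs=1):
    # Phase 1: only the newly added head is updated.
    learner.freeze()
    learner.fit(freeze_epochs)
    # Phase 2: every layer is updated.
    learner.unfreeze()
    learner.fit(epochs)


fake = FakeLearner()
two_phase_fine_tune(fake, 4)
print(fake.log)
```

Training the new head first keeps its randomly initialized weights from disturbing the pretrained layers early on.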


93% accuracy looks good! But let's see how well it's actually doing...

In [None]:
learn.show_results()

Unnamed: 0,text,category,category_
0,"xxbos xxmaj there 's a sign on xxmaj the xxmaj lost xxmaj highway that says : \n\n * major xxup spoilers xxup ahead * \n\n ( but you already knew that , did n't you ? ) \n\n xxmaj since there 's a great deal of people that apparently did not get the point of this movie , xxmaj i 'd like to contribute my interpretation of why the plot makes perfect sense . xxmaj as others have pointed out , one single viewing of this movie is not sufficient . xxmaj if you have the xxup dvd of xxup md , you can "" cheat "" by looking at xxmaj david xxmaj lynch 's "" top 10 xxmaj hints to xxmaj unlocking xxup md "" ( but only upon second or third viewing , please . ) ;) \n\n xxmaj first of all , xxmaj mulholland xxmaj drive is",pos,pos
1,"xxbos i really wanted to be able to give this film a 10 . xxmaj i 've long thought it was my favorite of the four modern live - action xxmaj batman films to date ( and maybe it still will be -- i have yet to watch the xxmaj schumacher films again ) . xxmaj i 'm also starting to become concerned about whether xxmaj i 'm somehow subconsciously being contrarian . xxmaj you see , i always liked the xxmaj schumacher films . xxmaj as far as i can remember , they were either 9s or 10s to me . xxmaj but the conventional wisdom is that the two xxmaj tim xxmaj burton directed films are far superior . i had serious problems with the first xxmaj burton xxmaj batman this time around -- i ended up giving it a 7 - -and apologize as i might ,",pos,pos
2,"xxbos "" buffalo xxmaj bill , xxmaj hero of the xxmaj far xxmaj west "" director xxmaj mario xxmaj costa 's unsavory xxmaj spaghetti western "" the xxmaj beast "" with xxmaj klaus xxmaj kinski could only have been produced in xxmaj europe . xxmaj hollywood would never dared to have made a western about a sexual predator on the prowl as the protagonist of a movie . xxmaj never mind that xxmaj kinski is ideally suited to the role of ' crazy ' xxmaj johnny . xxmaj he plays an individual entirely without sympathy who is ironically dressed from head to toe in a white suit , pants , and hat . xxmaj this low - budget oater has nothing appetizing about it . xxmaj the typically breathtaking xxmaj spanish scenery around xxmaj almeria is nowhere in evidence . xxmaj instead , xxmaj costa and his director of photography",pos,neg
3,"xxbos xxmaj this is , per se , an above average film but why in the name of xxmaj bog was it made ? xxmaj it 's impossible to treat it as a thing unto itself because it is an almost shot - for - shot remake of an xxmaj alfred xxmaj hitchcock classic of 1960 . xxmaj you ca n't watch it without the 1960 film nudging into your consciousness . \n\n xxmaj what does the word "" credit "" mean ? xxmaj how can we credit xxmaj van xxmaj xxunk and his associates with anything except deciding to use different actors , slightly different sets , and color ? \n\n xxmaj anne xxmaj heche is attractive but lacks xxmaj janet xxmaj leigh 's stolid determination to become a respectable middle - class woman . xxmaj and xxmaj heche is younger than xxmaj leigh , who brought to her",neg,neg
4,"xxbos xxmaj this is one of those films where it is easy to see how some people would n't like it . xxmaj my wife has never seen it , and when i just rewatched it last night , i waited until after she went to bed . xxmaj she might have been amused by a couple small snippets , but i know she would have had enough within ten minutes . \n\n xxmaj head has nothing like a conventional story . xxmaj the film is firmly mired in the psychedelic era . xxmaj it could be seen as filmic surrealism in a nutshell , or as something of a postmodern acid trip through film genres . xxmaj if you 're not a big fan of those things -- psychedelia , surrealism , postmodernism and the "" acid trip aesthetic "" ( assuming there 's a difference between them )",pos,pos
5,"xxbos xxmaj clayton xxmaj moore made his last official appearance on screen as the xxmaj masked xxmaj man in director xxmaj lesley xxmaj selander 's epic adventure "" the xxmaj lone xxmaj ranger and the xxmaj lost xxmaj city of xxmaj gold , "" co - starring xxmaj jay xxmaj silverheels as his faithful xxmaj indian scout xxmaj tonto . xxmaj selander was an old hand at helming westerns during his 40 years in films and television with over a 100 westerns to his directorial credit . xxmaj this fast - paced horse opera embraced a revisionist perspective in its depiction of xxmaj native xxmaj americans that had been gradually gaining acceptance since 1950 in xxmaj hollywood oaters after director xxmaj delmar xxmaj daves blazed the trail with the xxmaj james xxmaj stewart western "" broken xxmaj arrow . "" xxmaj racial intolerance figures as the primary theme in the",pos,pos
6,"xxbos "" stripperella "" is an animated series about a girl named xxmaj erotica xxmaj jones ( voiced by xxmaj pamela xxmaj anderson ) who lives a double life as a stripper at a gentleman 's club known as "" the xxmaj tender xxmaj loins "" and as a sexy crime - fighter known as xxmaj stripperella , a.k.a . xxmaj agent 69 who works for a government organization . xxmaj as xxmaj stripperella , xxmaj erotica fights crime and the forces of evil such as a plastic surgeon who gives women breast implants that either explode or make them fat and xxmaj cheapo , a criminal who steals from 99 cent stores and makes his two henchmen share a gun . xxmaj the creator of the character and the series is xxmaj stan xxmaj lee of xxmaj marvel fame ( and creator of spider - man ) . \n\n",pos,pos
7,"xxbos xxmaj in a world in which debatable and misunderstood subjects can be listed endlessly , this powerful 1995 film takes on one at the top of that list ; moreover , it does it objectively and realistically , and with a sensibility and sensitivity that makes it a truly great film by anyone 's measuring stick . xxmaj and to add some irony to it all , even the subject matter of this film has been widely misunderstood , as it is wrongly perceived that this is a film about the pros and cons of the death penalty ; it is not . xxmaj at the heart of ` dead xxmaj man xxmaj walking , ' directed by xxmaj tim xxmaj robbins , is a subject that in reality is possibly the most misunderstood of all , and with good reason , because it just may be the hardest",pos,pos
8,"xxbos a space ship cruising through the galaxy encounters a mysterious cargo ship apparently adrift in space . xxmaj the crew investigates , hoping to lay claim to its cargo and acquire the ship . xxmaj however , once aboard the ominous vessel , their own ship mysteriously xxunk , leaving them to fend for themselves and battle none other then xxmaj count xxmaj dracula or xxmaj orloff as this creature calls himself . \n\n xxmaj not a bad start . i mean it follows any number of typical sci - fi / horror plots . xxmaj the genres have been around enough that even the most original story will inevitably invoke comparison to some other film . xxmaj but , when you start with a fairly typical horror convention , the legend of xxmaj dracula and vampires in general , and combine it with a fairly typical sci -",neg,neg


We can also run prediction on individual sentences one at a time:

In [None]:
learn.predict("That movie was wicked cool!")

('pos', tensor(1), tensor([0.2645, 0.7355]))

Our model predicts that the review is positive, as expected.
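
The three values in the output are the predicted label, the index of that label, and the probability assigned to each class. A small sketch of how they relate, using plain lists in place of tensors and assuming the class order is `neg`, `pos` (which is consistent with index 1 mapping to `'pos'` above):

```python
vocab = ["neg", "pos"]            # assumed class order
probabilities = [0.2645, 0.7355]  # values copied from the output above

# The predicted class is the one with the highest probability.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
label = vocab[best_index]

print(label, best_index, probabilities[best_index])
```

The probability is also a rough confidence signal: here the model leans positive, but not overwhelmingly so.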

## ULMFiT for Transfer Learning

The pretrained model we used in the previous section is called a language model. It was trained on a set of Wikipedia articles to guess the next word after reading all the words before it. We got great results by directly fine-tuning this language model into a movie review classifier, but with one extra step, we can do even better.
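
To get a feel for "guessing the next word", here is a deliberately tiny sketch that predicts the next word from counts of word pairs. It is a heavy simplification: it only looks at the single previous word, whereas the language model described above reads all the preceding words, and the corpus is made up for the example.

```python
from collections import Counter, defaultdict

corpus = "the movie was great . the movie was long . the plot was great ."

# Count which word follows each word in the corpus.
following = defaultdict(Counter)
words = corpus.split()
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1


def guess_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None


print(guess_next("movie"))
print(guess_next("was"))
```

Notice that no labels were needed: the text itself provides the training signal, which is what makes this kind of pretraining possible on very large corpora.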

The English used on Wikipedia is slightly different from the English used in IMDb reviews. So instead of jumping directly to the classifier, we could fine-tune our pretrained language model to the IMDb dataset and then use *that* as the base for our classifier instead of the Wikipedia language model.

Beyond that, another very important reason this is useful is that we often have more unlabelled data than *labelled* data. Labelling is expensive and generally requires human time and effort, so it's not uncommon to have a large database of text records where only a small subset is used for, say, document tagging. With this fine-tuning approach, we can still use the unlabelled data to fine-tune the *language model* even before we train the classifier.
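
The split between the two kinds of data can be sketched in a few lines. The records below are hypothetical, with `None` marking texts that nobody has annotated; the point is only to show which texts each training stage can draw on.

```python
# Hypothetical records: label is None when the text has not been annotated.
records = [
    {"text": "Great acting and a tight script.", "label": "pos"},
    {"text": "I walked out halfway through.", "label": "neg"},
    {"text": "Saw it on a rainy Tuesday.", "label": None},
    {"text": "The sequel opens next month.", "label": None},
]

# Language-model fine-tuning only needs raw text, so every record qualifies.
lm_texts = [r["text"] for r in records]

# Classifier training needs labels, so only the annotated subset qualifies.
clf_pairs = [(r["text"], r["label"]) for r in records if r["label"] is not None]

print(len(lm_texts), len(clf_pairs))
```

In a real project the unlabelled portion is typically far larger than in this toy example, which is exactly why being able to use it matters.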

At the risk of dragging on a flawed analogy, this is almost like getting access to years of previous SAT passages. None of them will show up on the test *exactly*, but practicing on them will help you get a sense of what the SAT is like.

This approach is called ULMFiT, introduced by Jeremy Howard and Sebastian Ruder in 2018. The process is summarized in the figure below.

![ULMFit](https://github.com/rahiakela/applied-nlp-in-enterprise/blob/main/2-transformers-and-transfer-learning/images/ulmfit.png?raw=1)

Since we already have the pretrained Wikipedia language model, we can start with step 2 of the pipeline shown in the figure above: fine-tuning the language model on the IMDB dataset.