# AI-generated Movie Reviews

In this blog post, we will create a language model that will generate its own movie reviews.

This post is a continuation of my previous one, so read that first if you want a better understanding of the methodology used here.

The dataset we'll be using is the [IMDb Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/), which contains 25,000 highly polarized movie reviews for training, and 25,000 for testing.

In [1]:
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()

In [2]:
from fastbook import *

Let's download the dataset.

In [3]:
from fastai.text.all import *
path = untar_data(URLs.IMDB)

In [4]:
Path.BASE_PATH = path
path.ls()

(#7) [Path('imdb.vocab'),Path('train'),Path('README'),Path('tmp_clas'),Path('test'),Path('tmp_lm'),Path('unsup')]

We'll grab the text files using `get_text_files`, which gets all the text files in a path. We can optionally pass `folders` to restrict the search to a particular list of subfolders.

In [5]:
files = get_text_files(path, folders=['train', 'test', 'unsup'])

Here's a review we can look at.

In [6]:
txt = files[0].open().read()
txt

"Dressed to Kill (1980) is a mystery horror film from Brian De Palma and it really works.The atmosphere is right there.The atmosphere that makes you scared.And isn't that what a horror film is supposed to do.All the actors are in the right places.Michael Caine is perfect as Dr. Robert Elliott, the shrink with a little secret.Angie Dickinson as Kate Miller, the sexually frustrated mature woman is terrific.Keith Gordon as her son Peter is brilliant.Nancy Allen as Liz Blake the call girl is fantastic.Dennis Franz does his typical detective role.His Detective Marino is one of the most colorful in this movie.There are plenty of creepy scenes in this movie.The elevator scene is one of them.There have been made comparisons between this and Alfred Hitchcock's Psycho (1960).There are some similarities between these two movies.Both of these movies may cause some sleepless nights."

---
## Training a Text Classifier

### Language Model using DataBlock
Fastai handles tokenization and numericalization automatically when `TextBlock` is passed to `DataBlock`.  
Let's create the `DataLoaders` for our language model using `TextBlock`.

In [7]:
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb, splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=72)

The `from_folder` tells `TextBlock` how to access the texts for the initial preprocessing.
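
To make the two preprocessing steps concrete, here is a toy sketch of tokenization and numericalization in plain Python. It is not fastai's implementation: the helper names (`toy_tokenize`, `build_vocab`, `numericalize`) are hypothetical, and the only rules mimicked are the `xxbos` (start of text) and `xxmaj` (next word was capitalized) special tokens that show up in the batches later in this post.

```python
import re

def toy_tokenize(text):
    """Split text into word/punctuation tokens, fastai-style (heavily simplified)."""
    tokens = ["xxbos"]  # marks the beginning of a text
    for tok in re.findall(r"\w+|[^\w\s]", text):
        if tok[0].isupper():
            # replace the capital letter with a marker token + lowercase word
            tokens += ["xxmaj", tok.lower()]
        else:
            tokens.append(tok)
    return tokens

def build_vocab(tokens):
    """Assign one index per distinct token; index 0 is reserved for unknowns."""
    vocab = ["xxunk"]
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)
    return vocab

def numericalize(tokens, vocab):
    """Map each token to its integer index (0 if it is not in the vocab)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 0) for tok in tokens]

tokens = toy_tokenize("The Minion is about a minion.")
vocab = build_vocab(tokens)
ids = numericalize(tokens, vocab)
print(tokens)
print(ids)
```

The real tokenizer applies many more rules and builds its vocabulary from the whole corpus, but the output has the same shape: a list of integers per review.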

We can look at a couple of examples from the data.

In [8]:
dls_lm.show_batch(max_n=2)

| | text | text_ |
|---|---|---|
| 0 | xxbos xxmaj being that i am not a fan of xxmaj snoop xxmaj dogg , as an actor , that made me even more anxious to check out this flick . i remember he was interviewed on " jay xxmaj leno , " and said that he turned down a role in the big - budget xxmaj adam xxmaj sandler comedy " the xxmaj longest xxmaj yard " to be in this | xxmaj being that i am not a fan of xxmaj snoop xxmaj dogg , as an actor , that made me even more anxious to check out this flick . i remember he was interviewed on " jay xxmaj leno , " and said that he turned down a role in the big - budget xxmaj adam xxmaj sandler comedy " the xxmaj longest xxmaj yard " to be in this film |
| 1 | viewer , the first number in the series does provide an unexpected element of suspense in addition to capable costuming from xxmaj ha xxmaj nguyen , fine stunt performing , and a polished turn from xxmaj carr . xxmaj an unrated version is available that seemingly promises to provide additional footage of the ardent romantic actions shared by the mismatched lovers . xxbos xxmaj the xxmaj minion is about … well , | , the first number in the series does provide an unexpected element of suspense in addition to capable costuming from xxmaj ha xxmaj nguyen , fine stunt performing , and a polished turn from xxmaj carr . xxmaj an unrated version is available that seemingly promises to provide additional footage of the ardent romantic actions shared by the mismatched lovers . xxbos xxmaj the xxmaj minion is about … well , a |
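
In the batch shown above, the `text_` column (the target) is the `text` column (the input) shifted by one token, because the model is trained to predict the next token at every position. A minimal sketch of that pairing follows; `lm_pairs` is a hypothetical helper, and the tiny `seq_len` is chosen only for illustration (the `DataLoaders` above use `seq_len=72`).

```python
def lm_pairs(tokens, seq_len):
    """Cut a token stream into (input, target) pairs offset by one token."""
    pairs = []
    for start in range(0, len(tokens) - seq_len, seq_len):
        x = tokens[start:start + seq_len]          # what the model sees
        y = tokens[start + 1:start + seq_len + 1]  # what it should predict
        pairs.append((x, y))
    return pairs

stream = "xxbos xxmaj the xxmaj minion is about well , a".split()
pairs = lm_pairs(stream, seq_len=3)
for x, y in pairs:
    print(x, "->", y)
```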


Now that our data is ready, we can fine-tune the pretrained language model.


---
## Fine-tuning the Language Model

To convert the integer word indices into activations that we can use for our neural network, we will use embeddings. We'll feed those embeddings into a *recurrent neural network* (RNN), using an architecture called *AWD-LSTM*.  
The embeddings in the pretrained model are merged with random embeddings added for words that weren't in the pretraining vocabulary. This is handled automatically inside `language_model_learner`.
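
The merging step described above can be sketched in plain Python. This is only an illustration of the idea as stated in the text, not the code inside `language_model_learner`; the function name `merge_embeddings`, the toy vocabularies, and the random initialization scale are all assumptions, and the library's actual initialization for new words may differ.

```python
import random

def merge_embeddings(pretrained_vocab, pretrained_vectors, new_vocab, dim, seed=0):
    """Build an embedding table for new_vocab, reusing pretrained rows where possible."""
    rng = random.Random(seed)
    lookup = dict(zip(pretrained_vocab, pretrained_vectors))
    merged = []
    for tok in new_vocab:
        if tok in lookup:
            # word was seen during pretraining: copy its learned vector
            merged.append(list(lookup[tok]))
        else:
            # word is new to this corpus: start from a small random vector
            merged.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return merged

pretrained_vocab = ["the", "movie", "is"]
pretrained_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
imdb_vocab = ["the", "movie", "xxmaj", "is"]  # "xxmaj" stands in for an unseen token

table = merge_embeddings(pretrained_vocab, pretrained_vectors, imdb_vocab, dim=2)
print(table)
```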

In [9]:
learn = language_model_learner(
    dls_lm, AWD_LSTM, drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
).to_fp16()

In [10]:
learn.fit_one_cycle(3, 2e-2)

| epoch | train_loss | valid_loss | accuracy | perplexity | time |
|---|---|---|---|---|---|
| 0 | 4.128321 | 4.070849 | 0.2848 | 58.606724 | 29:54 |
| 1 | 3.995339 | 3.938066 | 0.296213 | 51.319229 | 29:57 |
| 2 | 3.860701 | 3.867283 | 0.303124 | 47.812309 | 30:00 |


In [11]:
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)

| epoch | train_loss | valid_loss | accuracy | perplexity | time |
|---|---|---|---|---|---|
| 0 | 3.675387 | 3.74669 | 0.317715 | 42.380569 | 32:10 |
| 1 | 3.645742 | 3.704438 | 0.322705 | 40.627209 | 32:08 |
| 2 | 3.605402 | 3.664308 | 0.327991 | 39.029121 | 31:54 |
| 3 | 3.535574 | 3.633687 | 0.331826 | 37.852131 | 31:51 |
| 4 | 3.451682 | 3.618303 | 0.334019 | 37.274242 | 31:41 |
| 5 | 3.417034 | 3.603825 | 0.336183 | 36.738476 | 31:49 |
| 6 | 3.359589 | 3.594853 | 0.337721 | 36.410355 | 31:44 |
| 7 | 3.26618 | 3.59285 | 0.338945 | 36.337505 | 31:36 |
| 8 | 3.213485 | 3.597207 | 0.339176 | 36.496162 | 31:34 |
| 9 | 3.178523 | 3.602469 | 0.339008 | 36.688713 | 31:36 |
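
The perplexity column is the exponential of the validation loss, so the two always move together. The short check below recomputes it from the `valid_loss` values reported for the unfrozen run and finds the epoch with the lowest validation loss; the variable names are my own, and small differences in the last decimals are expected from rounding.

```python
import math

# valid_loss per epoch, copied from the 10-epoch run above
valid_losses = [3.74669, 3.704438, 3.664308, 3.633687, 3.618303,
                3.603825, 3.594853, 3.59285, 3.597207, 3.602469]

# perplexity = exp(cross-entropy loss)
perplexities = [math.exp(loss) for loss in valid_losses]

# epoch whose validation loss is lowest
best_epoch = min(range(len(valid_losses)), key=lambda i: valid_losses[i])

for epoch, (loss, ppl) in enumerate(zip(valid_losses, perplexities)):
    print(f"epoch {epoch}: valid_loss={loss:.4f} perplexity={ppl:.2f}")
print("lowest validation loss at epoch", best_epoch)
```

Validation loss bottoms out at epoch 7 and creeps up afterwards while training loss keeps falling, which suggests the model is starting to overfit in the last two epochs.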


---
## Text Generation
Let's use our model to generate random reviews. Since it is trained to guess what the next word of the sentence is, we can use the model to write new reviews.

In [27]:
TEXT = 'I like this movie because'
N_WORDS = 70
N_SENTENCES = 5
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75)
         for _ in range(N_SENTENCES)]
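
The `temperature` argument controls how adventurous the sampling is: values below 1 sharpen the predicted distribution toward the most likely words, values above 1 flatten it. Below is a generic sketch of temperature sampling over a made-up four-word vocabulary; it is not the code that `learn.predict` runs internally, and `apply_temperature`, `sample_next`, and the logits are illustrative assumptions.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Softmax over logits divided by the temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(vocab, logits, temperature, rng):
    """Draw one word at random according to the temperature-adjusted probabilities."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["great", "funny", "bad", "boring"]
logits = [2.0, 1.5, 0.5, 0.1]  # made-up scores for the next word

rng = random.Random(0)
words = [sample_next(vocab, logits, 0.75, rng) for _ in range(5)]
print(words)
print(apply_temperature(logits, 0.75))
print(apply_temperature(logits, 1.0))
```

Because each word is drawn at random, repeated calls with the same prompt produce different reviews, which is why the cell above generates several samples.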

In [28]:
print('\n\n'.join(preds))

i like this movie because it has an amazing cast and the story is what made it so funny . Tom Skerrit is wonderful in this movie and Gena Rowlands , who i honestly wished she would have been better recognized for her work in Love , Caution , Never Been Kissed . It is also one of the great movies i have seen in

i like this movie because it 's just a wonderful movie . But the acting is really a bit more than it should be . It 's like someone made a movie for the day . So do n't be afraid to watch a movie that is so good that you 'll be laughing so hard you 'll start laughing and trying to keep me laughing . It 's very funny

i like this movie because it is so great . i could not help but laugh at the same things throughout the movie . It is so funny . i am so happy to have the movie been made again . i could n't wait to see what i would see . It is the best movie i have ever seen . This movie was bad , and i thought it

i like this movie because it shows a side of British Realism and ho