# IMDB movie review text generation

Once you have fine-tuned your model, you can test it interactively with this notebook.

In [9]:
from transformers import pipeline

#path_to_model = "/scratch/project_462000699/data/users/sabdulla/gpt-imdb-model/checkpoint-5000/"
path_to_model = "/scratch/project_462000699/data/users/mvsjober/gpt-imdb-model/checkpoint-65000/"
generator = pipeline("text-generation", model=path_to_model)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
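
The warning above means that the pipeline runs on the CPU even though a GPU is available. As a minimal sketch, assuming PyTorch is the backend, a device index could be chosen like this and passed to `pipeline()` through its `device` argument. The helper name `pick_device` is hypothetical and not part of the original notebook.

```python
def pick_device():
    """Return 0 (first GPU) if PyTorch reports one, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1

device = pick_device()
print("device index:", device)

# Hypothetical usage with the pipeline created above:
# generator = pipeline("text-generation", model=path_to_model, device=device)
```

With `device=-1` the behaviour is the same as before; with `device=0` the model is placed on the first GPU.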


In [10]:
def print_output(output):
    for item in output:
        text = item['generated_text']
        text = text.replace("<br />", "\n")
        print('-', text)
        print()

In [None]:
output = generator("This movie was")
print_output(output)

## Experiment with the generation strategy

You can experiment with text generation if you wish. The available generation strategies are described in the Transformers documentation: https://huggingface.co/docs/transformers/generation_strategies

Note that we use the easy-to-use `TextGenerationPipeline` here, called through the `generator()` object, whereas the linked page discusses the `model.generate()` method. The same parameters can be used with both; the pipeline just takes care of some of the pre- and post-processing.

In particular, these parameters of the `generator()` call might be interesting:

- `max_new_tokens`: the maximum number of tokens to generate
- `num_beams`: activate beam search by setting this to a value greater than 1
- `do_sample`: activate multinomial sampling if set to `True`
- `num_return_sequences`: the number of candidate sequences to return (values above 1 are available only for beam search and sampling)

Here is a nice blog post that explains the different generation strategies in more detail: https://huggingface.co/blog/how-to-generate
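
To make the difference between the strategies concrete, here is a small self-contained sketch of greedy decoding, multinomial sampling and beam search. It does not use the fine-tuned model: `TOY_MODEL` is a made-up table of next-token probabilities, and the function names are hypothetical. The real `generate()` implementation is considerably more involved, so treat this only as an illustration of the idea behind `do_sample` and `num_beams`.

```python
import math
import random

# Hypothetical toy "language model": next-token probabilities given the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"film": 0.5, "plot": 0.5},
    "a": {"masterpiece": 0.9, "mess": 0.1},
}

def greedy(steps=2):
    """Always take the single most probable next token."""
    seq = ["<s>"]
    for _ in range(steps):
        probs = TOY_MODEL[seq[-1]]
        seq.append(max(probs, key=probs.get))
    return seq[1:]

def sample(rng, steps=2):
    """Multinomial sampling: draw each token in proportion to its probability."""
    seq = ["<s>"]
    for _ in range(steps):
        probs = TOY_MODEL[seq[-1]]
        tokens = list(probs)
        weights = [probs[t] for t in tokens]
        seq.append(rng.choices(tokens, weights=weights, k=1)[0])
    return seq[1:]

def beam_search(num_beams, steps=2):
    """Keep the num_beams best partial sequences by total log-probability."""
    beams = [(0.0, ["<s>"])]
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for token, p in TOY_MODEL[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    # Convert log-probabilities back to probabilities for readability.
    return [(math.exp(score), seq[1:]) for score, seq in beams]

greedy_result = greedy()
beam_result = beam_search(num_beams=2)
sampled_result = sample(random.Random(0))

print("greedy:", greedy_result)
print("beam  :", beam_result)
print("sample:", sampled_result)
```

In this toy table, greedy decoding commits to "the" because it is the most likely first token, while beam search with two beams also keeps "a" alive and ends up with a sequence of higher overall probability. Sampling can return any of the sequences, which is why repeated calls with `do_sample=True` give different reviews.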

In [4]:
output = generator("This movie was awful because", num_return_sequences=1, max_new_tokens=100, do_sample=True)
print_output(output)

- This movie was awful because the script had so much potential as it did. The story is so good and it should go a long way for the cast. And because of that, the best acting and cinematography were done with a strong chemistry. The acting were excellent. The lead character is still fresh-faced and looks the whole world. The action is absolutely amazing.





It's not performing well yet, but I guess that with more training we could achieve something really nice.

In [7]:
output = generator("interstellar is greatest movie of all time because", num_return_sequences=1, max_new_tokens=100, do_sample=True)
print_output(output)

- interstellar is greatest movie of all time because it's good not to waste your time with baddies. Don't get me wrong I love this film, the only redeeming characters is that they are real. All the dialogue is a little boring in my opinion. The characters are only an embarrassment in my opinion, but they are great. Not a "nice" person, not such a movie of a bad guy. It's good for your friends and kids.



## Compare with the original model without fine-tuning

We can also load the original `distilgpt2` model and see how it would have worked without fine-tuning.

In [5]:
generator_orig = pipeline("text-generation", model='distilgpt2')

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [6]:
output = generator_orig("This movie was awful because", num_return_sequences=1, max_new_tokens=100, do_sample=True)
print_output(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


- This movie was awful because it did not have a great plot for this sequel.
At first it was pretty disappointing that I had to play a character in this second movie as I did not think that it really made sense. I thought that in my last film I loved it and that after the movie I was forced to give two more minutes of dialogue to another character's character because there was really no plot to make the movie great.
So what do you think about the screenplay for the sequel of the original? Do you



In [8]:
output = generator_orig("interstellar is greatest movie of all time because", num_return_sequences=1, max_new_tokens=100, do_sample=True)
print_output(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


- interstellar is greatest movie of all time because of all its twists and turns. There are no exceptions, and in this case, an entire cast of actors have never really played it as a play it is, which has been a huge success in the past few years with a wide variety of character-types, storylines and characters. There are always moments, but no one really deserves to be told the story of that day.

On the other hand, I wouldn't be surprised if I saw a great adaptation for "The Shining" for
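
To compare the fine-tuned and original models side by side on the same prompt, a small helper along these lines could be used. This is only a sketch: `compare` is a hypothetical name, and the two stand-in functions merely mimic the list-of-dicts output format of the text-generation pipeline so that the example runs without loading a model.

```python
def compare(prompt, generators, **generation_kwargs):
    """Run one prompt through several generators and collect the texts by label."""
    results = {}
    for label, generate in generators.items():
        output = generate(prompt, **generation_kwargs)
        # Same clean-up as print_output(): turn HTML line breaks into newlines.
        results[label] = [
            item["generated_text"].replace("<br />", "\n") for item in output
        ]
    return results

# Stand-in generators (assumptions for illustration, not real models).
def fake_finetuned(prompt, **kwargs):
    return [{"generated_text": prompt + " the plot made no sense.<br />Avoid."}]

def fake_original(prompt, **kwargs):
    return [{"generated_text": prompt + " of reasons."}]

results = compare(
    "This movie was awful because",
    {"fine-tuned": fake_finetuned, "original": fake_original},
    max_new_tokens=100,
    do_sample=True,
)
for label, texts in results.items():
    for text in texts:
        print(f"[{label}] {text}\n")
```

In the notebook, the stand-ins would be replaced by the real pipelines, for example `{"fine-tuned": generator, "original": generator_orig}`.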

