
# Text generation example

This is a brief example of how to run text generation with a causal language model and `pipeline`.

Install the [transformers](https://huggingface.co/docs/transformers/index) Python package, which we use to load the model and tokenizer and to run generation.

In [1]:
!pip install --quiet transformers

Import `AutoTokenizer` and `AutoModelForCausalLM`, which load tokenizers and generative models from the [Hugging Face Hub](https://huggingface.co/models), and the `pipeline` function, which wraps a tokenizer and a model into a single callable for convenience.

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

Load a generative model and its tokenizer. You can substitute any other generative model name here (e.g. [other TurkuNLP GPT-3 models](https://huggingface.co/models?sort=downloads&search=turkunlp%2Fgpt3)), but larger models may not fit in the memory available on Colab.

In [3]:
MODEL_NAME = 'TurkuNLP/gpt3-finnish-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)



Instantiate a text generation pipeline using the tokenizer and model.

In [4]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

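We can now call the pipeline with a text prompt (here the Finnish greeting *Terve, miten menee?*, "Hi, how's it going?"). The pipeline takes care of tokenizing and encoding the input, generating new tokens, and decoding them back into text: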

In [5]:
output = pipe('Terve, miten menee?', max_new_tokens=25)

print(output)

[{'generated_text': 'Terve, miten menee?”\n”Hyvin, kiitos.”\n”Kiva kuulla.”\n”Kuule, minulla on sinulle asiaa.”\n'}]
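
Conceptually, the call above runs four stages: split the prompt into tokens, map the tokens to integer ids, repeatedly predict the next id, and map the ids back to text. The sketch below mimics those stages with a made-up nine-entry vocabulary and a hypothetical lookup table (`NEXT_ID`) standing in for the neural network. It is a toy illustration under those assumptions, not how `transformers` implements generation: real tokenizers use learned subword units, and a real model conditions on the whole context rather than only the last token.

```python
# Toy illustration of the stages a text-generation pipeline performs:
# tokenize -> encode (tokens to ids) -> generate (append predicted ids) -> decode (ids to text).
# VOCAB and NEXT_ID are hypothetical stand-ins, not transformers internals.

VOCAB = ["<unk>", "terve", ",", "miten", "menee", "?", "hyvin", "kiitos", "."]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Hypothetical "model": maps the id of the last token to the id of the next token.
NEXT_ID = {
    TOKEN_TO_ID["?"]: TOKEN_TO_ID["hyvin"],
    TOKEN_TO_ID["hyvin"]: TOKEN_TO_ID[","],
    TOKEN_TO_ID[","]: TOKEN_TO_ID["kiitos"],
    TOKEN_TO_ID["kiitos"]: TOKEN_TO_ID["."],
}


def tokenize(text: str) -> list[str]:
    # Separate punctuation and lowercase; real tokenizers use learned subwords.
    for mark in ",?.":
        text = text.replace(mark, f" {mark} ")
    return text.lower().split()


def encode(tokens: list[str]) -> list[int]:
    # Unknown tokens map to the <unk> id.
    return [TOKEN_TO_ID.get(tok, TOKEN_TO_ID["<unk>"]) for tok in tokens]


def generate(ids: list[int], max_new_tokens: int) -> list[int]:
    ids = list(ids)
    for _ in range(max_new_tokens):
        next_id = NEXT_ID.get(ids[-1])
        if next_id is None:  # no known continuation: stop early
            break
        ids.append(next_id)
    return ids


def decode(ids: list[int]) -> str:
    return " ".join(VOCAB[i] for i in ids)


prompt_ids = encode(tokenize("Terve, miten menee?"))
print(prompt_ids)
print(decode(generate(prompt_ids, max_new_tokens=25)))
```

Here `max_new_tokens` plays the same role as in the pipeline call: an upper bound on how many tokens are appended, with generation allowed to stop earlier.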


To print only the generated text:

In [6]:
print(output[0]['generated_text'])

Terve, miten menee?”
”Hyvin, kiitos.”
”Kiva kuulla.”
”Kuule, minulla on sinulle asiaa.”
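
Note that `generated_text` includes the prompt itself. If you only want the continuation, one option is to strip the prompt from each result, as in the small helper below (the name `continuation_only` is our own, not part of `transformers`). The text-generation pipeline also accepts `return_full_text=False`, which returns only the newly generated text.

```python
# Pipeline output copied from the cell above: a list with one dict per generated sequence.
output = [{'generated_text': 'Terve, miten menee?”\n”Hyvin, kiitos.”\n”Kiva kuulla.”\n”Kuule, minulla on sinulle asiaa.”\n'}]


def continuation_only(output, prompt):
    """Return the generated text of each sequence with the leading prompt removed."""
    # str.removeprefix (Python 3.9+) leaves the text unchanged if it does not start with prompt.
    return [item["generated_text"].removeprefix(prompt) for item in output]


for text in continuation_only(output, "Terve, miten menee?"):
    print(text)
```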



We can also pass the pipeline any argument that the model's `generate` method supports. For details on text generation with `transformers`, see e.g. [this tutorial](https://huggingface.co/blog/how-to-generate).

For example, enabling sampling and setting a high `temperature` produces more chaotic output:

In [7]:
output = pipe(
    'Terve, miten menee?',
    do_sample=True,
    temperature=10.0,
    max_new_tokens=25
)

print(output[0]['generated_text'])

Terve, miten menee? Entä mitäs jos nyt kerrankin puhumattomuuden kierre katkea ja menittäisiin reilusti yli?
Meillä alkoi nyt uusi vuosi; tänään minä aloitin taas
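
The `temperature` argument rescales the model's next-token scores (logits) before sampling: each logit is divided by the temperature, so values above 1 flatten the probability distribution and values below 1 sharpen it. With `temperature=10.0` the distribution moves much closer to uniform, so unlikely tokens are picked far more often (depending on the model's generation config, `generate` may also apply other filters such as top-k). A minimal sketch of the idea, using hypothetical logits for a four-token vocabulary; it illustrates the technique and is not the `transformers` implementation:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [4.0, 2.0, 0.0, -2.0]  # hypothetical scores for a 4-token vocabulary

for temperature in (0.5, 1.0, 10.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

rng = random.Random(0)  # seeded so the draw is reproducible
print("sampled token index:", sample_next_token(logits, 10.0, rng))
```

At `temperature=0.5` almost all probability mass sits on the first token, while at `10.0` the four tokens end up with roughly similar probabilities.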


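## Text classification

The same `pipeline` interface also handles classification. Here we load a model fine-tuned for language detection and label a few short sentences in English, Finnish, and German.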

In [8]:
classifier = pipeline("text-classification", model="ivanlau/language-detection-fine-tuned-on-xlm-roberta-base")

In [9]:
texts = ["This movie have just released.",
         "Yesterday, I had the travel to other city.",
         "Kuule, minulla on sinulle asiaa.",
         'Terve, miten menee?',
         'Wie heiBen sie?']

for text in texts:
    result = classifier(text)
    print(f"Text: {text}")
    print("Label:", result[0]['label'])

Text: This movie have just released.
Label: English
Text: Yesterday, I had the travel to other city.
Label: English
Text: Kuule, minulla on sinulle asiaa.
Label: Estonian
Text: Terve, miten menee?
Label: Estonian
Text: Wie heiBen sie?
Label: German
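
Each `result` above is a list with one dict containing a `label` and a `score`. For a single-label classifier like this one, the pipeline typically turns the model's raw scores (logits) into probabilities with a softmax and reports the most probable class, looked up in the model's `config.id2label` mapping. A sketch of that last step, with hypothetical logits and a hypothetical three-label mapping:

```python
import math

# Hypothetical values for illustration only: a real model stores its mapping in
# model.config.id2label and produces one logit per label.
id2label = {0: "English", 1: "Estonian", 2: "German"}
logits = [0.3, 2.1, -1.0]


def top_label(logits, id2label):
    """Softmax over the logits, then return the most probable label and its probability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Same shape as one entry of the pipeline's result list.
    return {"label": id2label[best], "score": probs[best]}


print(top_label(logits, id2label))
```

Printing `result[0]['score']` in the loop above would show how confident the model is in each label.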


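## Summarization

Finally, a `summarization` pipeline is applied to three short English news excerpts. Judging by its name, `philschmid/bart-base-samsum` is a BART-base model fine-tuned on the SAMSum dialogue-summarization dataset.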

In [10]:
summarizer = pipeline("summarization", model="philschmid/bart-base-samsum", max_length=50)
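
`max_length=50` limits each generated summary to at most 50 tokens, so generation can stop in the middle of a sentence, as it does in the first summary below. One simple post-processing option, sketched here as a heuristic of our own (the `trim_to_last_sentence` helper is hypothetical, not part of `transformers`), is to cut the summary back to its last sentence-ending punctuation mark:

```python
import re

# Hypothetical helper (not part of transformers): drop a trailing sentence fragment.
# A sentence end is '.', '!' or '?' followed by whitespace or the end of the string.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_to_last_sentence(summary: str) -> str:
    """Cut `summary` after its last sentence-ending punctuation mark."""
    ends = [m.end() for m in SENTENCE_END.finditer(summary)]
    if not ends:
        # No complete sentence at all: keep the fragment rather than return "".
        return summary
    return summary[: ends[-1]]


# First summary printed below, which was cut off by the token limit.
summary = ("A sleet-strewn Senate square is pretty quiet, when two American tourists "
           "arrive pushing a stroller. Soon a bus unloads a group of Japanese tourists "
           "at perhaps the most recognisable Finnish tourist spot. April is the most difficult")
print(trim_to_last_sentence(summary))
```

Because any `.` followed by a space counts as a boundary, abbreviations such as "e.g." can trigger an early cut; a proper sentence splitter would be more robust.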

In [11]:
texts2 = ['A sleet-strewn Senate square is pretty quiet, when two American tourists arrive pushing a stroller. Soon a bus unloads a group of Japanese tourists at perhaps the most recognisable Finnish tourist spot. April is the most difficult month for tourism businesses in Finland, and in Helsinki the travel season is yet to start. Expectations, however, are not high this year. Visitor numbers have collapsed by about a fifth since 2019. Travellers from Russia and Asia are thin on the ground now. The number of visitors from Russia, China and Japan is down by more than a million from 2019.',
          'GPS disruptions have prevented Finnair planes travelling from Helsinki to Tartu, Estonia, from landing for the past two nights. Instead of landing, the planes returned to Helsinki Airport. According to Finnair, landing at Tartu Airport requires the use of GPS signals. Finnair spokesperson Päivyt Tallqvist said that both flights departing around midnight on Thursday and Friday had to return to Helsinki Airport. GPS interference is a relatively common phenomenon, and it doesn\'t usually warrant flights turning back, according to Tallqvist. Airports generally use multiple systems for approach and don\'t necessarily rely on GPS signals.',
          'Police on Saturday said they had arrested Finns Party MP Timo Vornanen over a nightclub shooting incident that occurred in the early hours of Friday in central Helsinki. While the police did not name Vornanen in their statement, the details they provided align with the previously known details of the case. According to the police, the suspected shooting incident began around 4am on Friday at the Ihku nightclub in downtown Helsinki. Vornanen and another group of people got into an altercation which culminated in Vornanen, once outside the club, producing a small-caliber firearm and firing a shot into the ground.']

for text2 in texts2:
    result = summarizer(text2)
    print(f"\n\nText: {text2}")
    print("\nSummary:", result[0]['summary_text'])



Text: A sleet-strewn Senate square is pretty quiet, when two American tourists arrive pushing a stroller. Soon a bus unloads a group of Japanese tourists at perhaps the most recognisable Finnish tourist spot. April is the most difficult month for tourism businesses in Finland, and in Helsinki the travel season is yet to start. Expectations, however, are not high this year. Visitor numbers have collapsed by about a fifth since 2019. Travellers from Russia and Asia are thin on the ground now. The number of visitors from Russia, China and Japan is down by more than a million from 2019.

Summary: A sleet-strewn Senate square is pretty quiet, when two American tourists arrive pushing a stroller. Soon a bus unloads a group of Japanese tourists at perhaps the most recognisable Finnish tourist spot. April is the most difficult


Text: GPS disruptions have prevented Finnair planes travelling from Helsinki to Tartu, Estonia, from landing for the past two nights. Instead of landing, the planes 