

# Introduction
<hr style="border:2px solid black"> </hr>


**What?** Text generation with Hugging Face `transformers` pipelines



# Text generation
<hr style="border:2px solid black"> </hr>

    
- Now let’s see how to use a pipeline to generate some text.
- The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text.
- This is similar to the predictive text feature that is found on many phones.
- Text generation involves randomness: the next token is sampled rather than always being the most likely one. It’s normal if you don’t get the same results as the ones shown below.
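The auto-complete loop and the randomness described above can be sketched with a tiny toy model. This is a hypothetical illustration, not how GPT-2 is implemented: `NEXT_WORD` and `complete` are made-up names. The toy predicts the next word from the previous word only, while GPT-2 conditions on the whole prefix and scores 50,257 tokens. The sketch contrasts greedy decoding, which is deterministic, with sampling, which is random but repeatable for a fixed seed.

```python
import random

# Hypothetical toy "language model": next-word probabilities given only the previous word.
# GPT-2 instead conditions on the whole prefix and scores every token in its vocabulary.
NEXT_WORD = {
    "to": {"code": 0.5, "build": 0.3, "create": 0.2},
    "code": {"apps": 0.6, "APIs": 0.4},
    "build": {"apps": 0.7, "APIs": 0.3},
    "create": {"apps": 0.5, "APIs": 0.5},
    "apps": {"<eos>": 1.0},
    "APIs": {"<eos>": 1.0},
}


def complete(prompt, rng=None, max_new_words=5):
    """Auto-complete `prompt` one word at a time.

    rng=None: greedy decoding, always take the most likely word (deterministic).
    rng=random.Random(seed): sample each word from the distribution (random,
    but repeatable for a fixed seed).
    """
    words = prompt.split()
    for _ in range(max_new_words):
        dist = NEXT_WORD.get(words[-1])
        if dist is None:  # the toy model has no prediction for this word
            break
        if rng is None:
            next_word = max(dist, key=dist.get)
        else:
            next_word = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_word == "<eos>":  # end-of-sequence: stop generating
            break
        words.append(next_word)
    return " ".join(words)


prompt = "In this course, we will teach you how to"
print(complete(prompt))                    # greedy: identical on every run
print(complete(prompt, random.Random(0)))  # sampled: may change with another seed
print(complete(prompt, random.Random(0)) == complete(prompt, random.Random(0)))  # True
```

With the real pipeline, calling `transformers.set_seed(42)` before `generator(...)` should make the sampled text repeatable on the same setup. Passing `do_sample=False` switches to greedy decoding.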



# Imports
<hr style="border:2px solid black"> </hr>

In [1]:
from transformers import pipeline

# Inference using a pre-trained model
<hr style="border:2px solid black"> </hr>

In [2]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.



In [4]:
generator.__init__

<bound method TextGenerationPipeline.__init__ of <transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7f84a8e05f70>>

In [5]:
generator.__dict__

{'task': 'text-generation',
 'model': GPT2LMHeadModel(
   (transformer): GPT2Model(
     (wte): Embedding(50257, 768)
     (wpe): Embedding(1024, 768)
     (drop): Dropout(p=0.1, inplace=False)
     (h): ModuleList(
       (0): GPT2Block(
         (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
         (attn): GPT2Attention(
           (c_attn): Conv1D()
           (c_proj): Conv1D()
           (attn_dropout): Dropout(p=0.1, inplace=False)
           (resid_dropout): Dropout(p=0.1, inplace=False)
         )
         (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
         (mlp): GPT2MLP(
           (c_fc): Conv1D()
           (c_proj): Conv1D()
           (act): NewGELUActivation()
           (dropout): Dropout(p=0.1, inplace=False)
         )
       )
       (1): GPT2Block(
         (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
         (attn): GPT2Attention(
           (c_attn): Conv1D()
           (c_proj): Conv1D()
           (attn_dropo
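The printed module is truncated, but its shapes are enough to recover GPT-2 small’s parameter count with a bit of arithmetic. The vocabulary size (50257), context length (1024), and hidden size (768) come from the output above. Two values are assumptions about the `gpt2` checkpoint because they are not visible here: 12 `GPT2Block`s, and an MLP width of 4 × 768. The sketch also assumes that `lm_head` shares its weights with `wte`. Dropout and `NewGELUActivation` have no parameters.

```python
# Values read from the printed module: vocabulary, context length, hidden size.
VOCAB, CONTEXT, D = 50257, 1024, 768
# Assumptions about the "gpt2" checkpoint (not visible in the truncated output).
N_LAYER = 12
D_FF = 4 * D  # MLP inner width: 3072


def linear(n_in, n_out):
    """Weights + bias of a Conv1D / Linear layer."""
    return n_in * n_out + n_out


def layer_norm(d):
    """Learnable scale and shift vectors."""
    return 2 * d


embeddings = VOCAB * D + CONTEXT * D  # wte (token) + wpe (position) embeddings
block = (
    layer_norm(D)          # ln_1
    + linear(D, 3 * D)     # attn.c_attn: fused query/key/value projection
    + linear(D, D)         # attn.c_proj
    + layer_norm(D)        # ln_2
    + linear(D, D_FF)      # mlp.c_fc
    + linear(D_FF, D)      # mlp.c_proj
)
# Final ln_f is added once; lm_head is assumed tied to wte, so it adds nothing.
total = embeddings + N_LAYER * block + layer_norm(D)

print(f"{block:,} per block, {total:,} in total")  # 7,087,872 per block, 124,439,808 in total
```

You can check the figure against the loaded model with `sum(p.numel() for p in generator.model.parameters())`. It should agree, because `parameters()` yields the shared embedding/output weight only once.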

In [6]:
generator("In this course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course, we will teach you how to code and create RESTful applications through a comprehensive API that spans all languages and formats.\n\nThe subject of the course isn't simple REST, it is even more so because it covers the fundamentals of"}]

In [7]:
generator("In this course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create and build apps that let you run in any programming language. You will learn how to run iOS applications using the open source language Go. We will discuss how to build apps from the ground up and'}]

In [8]:
generator("In this course about REST-API, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course about REST-API, we will teach you how to implement RESTful API for your web application. In this course it means not to know all of the basics like creating and deploying a website, building a landing page using data from Twitter'}]
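Each result above is a list of `{'generated_text': ...}` dicts, and every text starts with the prompt. If you only want the new part, one minimal approach is to strip the prompt yourself. The helper name `continuations` is hypothetical and uses `str.removeprefix`, which requires Python 3.9+. The sample output is a shortened copy of the one above. Alternatively, the text-generation pipeline accepts `return_full_text=False`, which returns only the continuation.

```python
def continuations(outputs, prompt):
    """Strip the echoed prompt from text-generation pipeline results.

    `outputs` is a list of {'generated_text': ...} dicts; texts that do not
    start with `prompt` are returned unchanged.
    """
    return [out["generated_text"].removeprefix(prompt) for out in outputs]


prompt = "In this course about REST-API, we will teach you how to"
# Shortened copy of the output printed above.
outputs = [{"generated_text": prompt + " implement RESTful API for your web application."}]
print(continuations(outputs, prompt))  # [' implement RESTful API for your web application.']
```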

# Tweaking the model
<hr style="border:2px solid black"> </hr>


- The previous examples used the default model for the task, but you can also pick a specific model from the Hub and use it in a pipeline for a given task, such as text generation.
- Let’s try the `distilgpt2` model.
- Additionally, use the `num_return_sequences` and `max_length` arguments to generate two sequences of at most 30 tokens each. Note that `max_length` counts tokens (not words) and includes the prompt’s tokens.
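To see what `max_length=30` and `num_return_sequences=2` mean in practice, here is a toy stand-in that only does the length arithmetic. Everything in it is hypothetical: `toy_generate`, the small `VOCAB`, and the assumption that the prompt encodes to 10 GPT-2 tokens (one per word plus the comma). Because `max_length` caps prompt + continuation, that assumption leaves about 20 new tokens per sequence. A real model may stop earlier at its end-of-sequence token.

```python
import random

# Assumption: the prompt encodes to these 10 GPT-2 tokens (leading spaces belong to the token).
PROMPT_TOKENS = ["In", " this", " course", ",", " we", " will", " teach", " you", " how", " to"]
# Hypothetical toy vocabulary standing in for the model's 50,257 tokens.
VOCAB = [" build", " apps", " with", " Python", " and", " REST", " APIs", "."]


def toy_generate(prompt_tokens, max_length, num_return_sequences, seed=0):
    """Hypothetical stand-in for generator(prompt, max_length=..., num_return_sequences=...).

    max_length caps prompt + continuation, so max_length - len(prompt_tokens) new
    tokens are appended (never fewer than zero). Each sequence is sampled
    independently, so the returned continuations can differ from one another.
    """
    rng = random.Random(seed)
    budget = max(max_length - len(prompt_tokens), 0)
    return [
        prompt_tokens + [rng.choice(VOCAB) for _ in range(budget)]
        for _ in range(num_return_sequences)
    ]


outs = toy_generate(PROMPT_TOKENS, max_length=30, num_return_sequences=2)
print(len(outs), [len(seq) - len(PROMPT_TOKENS) for seq in outs])  # 2 [20, 20]
```

If you want the continuation length not to depend on the prompt length, `generate` (and the pipeline, which forwards it) also accepts `max_new_tokens`, which counts only the newly generated tokens.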
    


In [10]:
generator = pipeline("text-generation", model="distilgpt2")


In [11]:
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to set up a web application using simple commands:\n\n\n// In this tutorial we will show you'},
 {'generated_text': 'In this course, we will teach you how to identify the differences between different levels of the dopamine system, and how to identify the differences between the two'}]

In [12]:
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to implement a new system of user controls for your browser. The first step is to understand how to use'},
 {'generated_text': 'In this course, we will teach you how to implement both a functional and a functional framework and define a function that uses both a and a Haskell form'}]

# References
<hr style="border:2px solid black"> </hr>


- https://huggingface.co/course/chapter1/3?fw=pt
    


# Requirements
<hr style="border:2px solid black"> </hr>

In [13]:
%load_ext watermark
%watermark -v -iv -m

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Python implementation: CPython
Python version       : 3.9.7
IPython version      : 7.29.0

Compiler    : Clang 10.0.0 
OS          : Darwin
Release     : 21.4.0
Machine     : x86_64
Processor   : i386
CPU cores   : 12
Architecture: 64bit

autopep8: 1.6.0
json    : 2.0.9
numpy   : 1.22.2

