# Text Generation

Text generation can be thought of as time series prediction. **Why?** In both cases we predict the future (the next words) from past data (the previous words). Familiar time series prediction tasks include:
- predicting whether a stock price will go up or down.
- predicting sales demand to prepare inventory.
- forecasting COVID-19 counts (e.g., hospital admissions per day).



## Autoregressive Time Series Models

**Time Series Analysis** studies values recorded over time; its central task is forecasting future values from past ones.

* **Autoregressive** models predict the next value in a time series as a function of the series' own past values.
* **ARIMA** (AutoRegressive Integrated Moving Average) is a classic **linear** model built around an autoregressive component.
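A minimal sketch of the autoregressive idea, assuming NumPy: it fits a plain AR(p) model, x[t] = c + a1·x[t-1] + ... + ap·x[t-p], by ordinary least squares and makes a one-step forecast. This covers only the autoregressive part; full ARIMA also adds differencing and moving-average terms (libraries such as statsmodels implement it). `fit_ar` and `predict_next` are illustrative names, not a library API.

```python
import numpy as np

def fit_ar(series, p):
    """Estimate AR(p) coefficients by ordinary least squares.

    Model: x[t] = c + a1*x[t-1] + ... + ap*x[t-p] + noise
    Returns (c, array([a1, ..., ap])).
    """
    x = np.asarray(series, dtype=float)
    # Each row: an intercept 1 followed by the p previous values, most recent first.
    X = np.array([np.r_[1.0, x[t - p:t][::-1]] for t in range(p, len(x))])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1:]

def predict_next(series, c, a):
    """One-step-ahead forecast from the last len(a) observations."""
    recent = np.asarray(series[-len(a):], dtype=float)[::-1]  # most recent first
    return c + float(np.dot(a, recent))

# Synthetic AR(1) series: x[t] = 0.5 + 0.8 * x[t-1] + small Gaussian noise
rng = np.random.default_rng(0)
series = [0.0]
for _ in range(500):
    series.append(0.5 + 0.8 * series[-1] + rng.normal(scale=0.1))

c, a = fit_ar(series, p=1)
print("intercept:", c, "coefficients:", a)       # should land near 0.5 and [0.8]
print("next value:", predict_next(series, c, a))
```

The forecast is a weighted sum of past values plus an intercept, which is what makes the model "linear autoregressive".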

## Autoregressive Language Models
* **Language** is simply a **time series** of categorical objects (words or tokens). Both are **sequential**, so the same kind of approach can be applied to both.

* An **Autoregressive Language Model** models the probability distribution of the next word given all of the previous words.
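Formally, by the chain rule of probability, the probability of a word sequence $w_1, \dots, w_T$ factorizes into exactly these next-word distributions:

$$
P(w_1, w_2, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
$$

Generation repeats a single step: pick $w_t$ from $P(w_t \mid w_1, \dots, w_{t-1})$, append it to the sequence, and condition on the extended sequence for the next word.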

## History of Language Models
* **Markov assumption:** x(t + 1) depends only on x(t), x(t) depends only on x(t - 1), and so on.
* One of the earliest and simplest approaches to language modeling is based on the Markov assumption: the probability of a word or token depends only on a fixed number of previous words or tokens. These are **n-gram** models, which condition on the previous n - 1 tokens.

* Markov models are convenient and easy to implement, but they have a very strong assumption that limits their ability to capture the long-range and global dependencies of natural language. They are not good at generating long, coherent, and meaningful strings of text, as they tend to repeat themselves or produce nonsensical sentences.

* Markov models also suffer from data sparsity: count-based estimates assign zero probability to n-grams that never appeared in the training data, and covering all plausible combinations of words or tokens requires a very large amount of data.
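A minimal sketch of a first-order (bigram) Markov language model, assuming lowercasing and whitespace tokenization, with `<s>` and `</s>` as made-up line start/end markers. It estimates P(next word | previous word) by counting, shows the zero-probability problem for an unseen pair, and applies add-one (Laplace) smoothing, one standard remedy. The function names are illustrative, not from any library.

```python
import random
from collections import Counter, defaultdict

def train_bigram(lines):
    """Count word -> next-word transitions for each line of text."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = ["<s>"] + line.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def bigram_prob(counts, vocab, prev, nxt, alpha=0.0):
    """P(nxt | prev). alpha=0 is the raw count-based estimate;
    alpha=1 is add-one (Laplace) smoothing over the vocabulary."""
    row = counts.get(prev, Counter())
    denom = sum(row.values()) + alpha * len(vocab)
    return (row[nxt] + alpha) / denom if denom > 0 else 0.0

def generate_line(counts, rng, max_words=20):
    """Sample one word at a time, conditioning only on the previous word."""
    word, out = "<s>", []
    for _ in range(max_words):
        successors = counts[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

poem = [
    "Two roads diverged in a yellow wood,",
    "And sorry I could not travel both",
    "And be one traveler, long I stood",
]
model = train_bigram(poem)
vocab = {w for row in model.values() for w in row}  # every token seen as a "next word"

print(bigram_prob(model, vocab, "and", "sorry"))             # seen pair
print(bigram_prob(model, vocab, "and", "stood"))             # unseen pair: zero without smoothing
print(bigram_prob(model, vocab, "and", "stood", alpha=1.0))  # small but non-zero with add-one
print(generate_line(model, random.Random(0)))
```

With only the previous word as context, the sampled lines are locally plausible but quickly lose coherence, which is the limitation described above.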

## Autoregressive Language Models - Use Cases

- **Natural language generation:** they can generate natural, human-like text for writing stories, poems, essays, songs, jokes, and more. For example, **GPT-3** is a powerful autoregressive language model that can generate text on a wide range of topics given a prompt.

- **Text summarization:** they can produce concise and informative summaries of long texts, such as news articles, research papers, and books. For example, **T5** is an encoder-decoder model that generates its output autoregressively and casts summarization and other natural language processing tasks into a text-to-text framework.

- **Machine translation:** they can translate text from one language to another while preserving the meaning and style of the original. For example, the original **Transformer** is an attention-based encoder-decoder network whose decoder generates the translation one token at a time.

- **Text completion:** they can predict the next word in a sequence from the previous words, which powers autocomplete, spell check, code completion, and more. For example, **GPT-2** (used in the demo below) continues a prompt one token at a time. **BERT**, by contrast, is a bidirectional *masked* language model: it fills in masked tokens using context on both sides, so it is not autoregressive and is not designed for left-to-right text completion.

- **Text analysis:** language models can also analyze text and extract useful information, as in sentiment analysis, topic modeling, keyword extraction, named entity recognition, and relation extraction. For example, **RoBERTa**, an optimized version of BERT trained on more data with more compute, is widely used for these tasks; like BERT, it is a masked language model rather than an autoregressive one.

In [1]:
# Run this code first
!pip install transformers -qq


In [6]:
from transformers import pipeline

In [17]:
generate = pipeline(
    task = "text-generation",
    model = "gpt2"
)

prompt = "Neural Networks with attention have been used with great success"

result = generate(
    prompt,
    num_return_sequences = 3,
    max_length = 30
)
print(result)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Neural Networks with attention have been used with great success in Parkinson's disease (Dupond, 1978 et al., 1980). The aim of the"}, {'generated_text': 'Neural Networks with attention have been used with great success in the context of neuroscience and neuroscience. Neural networks are very often used when we need to make'}, {'generated_text': "Neural Networks with attention have been used with great success in understanding people's neural systems and how they relate to each other and to objects. The neural"}]
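The three continuations differ because generation picks each next token by *sampling* from the model's predicted distribution rather than always taking the single most likely token. A toy sketch of that step, assuming NumPy and made-up scores (logits) for five candidate words; `sample_next_token` is a hypothetical helper, not the pipeline's internal code, although Hugging Face's `generate` exposes comparable `temperature` and `top_k` settings.

```python
import numpy as np

def sample_next_token(logits, rng, temperature=1.0, top_k=None):
    """Sample a token index from raw scores (logits).

    temperature (> 0): values < 1 sharpen the distribution, > 1 flatten it.
    top_k: if set, only tokens scoring at least the k-th highest logit
    (ties included) remain candidates.
    """
    logits = np.asarray(logits, dtype=float) / temperature
    if top_k is not None:
        kth = np.sort(logits)[-top_k]                       # k-th largest score
        logits = np.where(logits >= kth, logits, -np.inf)   # exp(-inf) -> probability 0
    # Softmax, subtracting the max for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

vocab = ["in", "for", "with", "the", "neural"]
logits = [2.0, 1.0, 0.5, 0.2, -1.0]  # made-up scores for illustration
rng = np.random.default_rng(0)

print([vocab[sample_next_token(logits, rng)] for _ in range(5)])           # varied picks
print([vocab[sample_next_token(logits, rng, top_k=2)] for _ in range(5)])  # only "in" or "for"
```

Lower temperatures or smaller `top_k` make outputs more repetitive but safer; higher values make them more varied but more likely to drift off topic.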


## Text Generation with Python

In [18]:
# Download data
! wget https://github.com/lazyprogrammer/machine_learning_examples/raw/master/hmm_class/robert_frost.txt

--2023-11-17 16:15:13--  https://github.com/lazyprogrammer/machine_learning_examples/raw/master/hmm_class/robert_frost.txt
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/hmm_class/robert_frost.txt [following]
--2023-11-17 16:15:13--  https://raw.githubusercontent.com/lazyprogrammer/machine_learning_examples/master/hmm_class/robert_frost.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56286 (55K) [text/plain]
Saving to: ‘robert_frost.txt’


2023-11-17 16:15:13 (4.65 MB/s) - ‘robert_frost.txt’ saved [56286/56286]



In [21]:
import torch
import pandas as pd
import textwrap
import matplotlib.pyplot as plt

from transformers import pipeline

from pprint import pprint

In [22]:
# List what's in my current directory
!ls

robert_frost.txt  sample_data


In [23]:
# Let's look at the raw text
!cat robert_frost.txt

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth; 

Then took the other, as just as fair,
And having perhaps the better claim
Because it was grassy and wanted wear,
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day! 
Yet knowing how way leads on to way
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference.

Whose woods these are I think I know.
His house is in the village, though; 
He will not see me stopping here
To watch his woods fill up with snow.

My little horse must think it queer
To stop without a farmhouse near
Between the woods and frozen lake
The darkest evenin

In [26]:
# Strip trailing whitespace (including newlines) and drop empty lines
with open("robert_frost.txt") as f:
    lines = [line.rstrip() for line in f]
lines = [line for line in lines if len(line) > 0]

print(lines)

['Two roads diverged in a yellow wood,', 'And sorry I could not travel both', 'And be one traveler, long I stood', 'And looked down one as far as I could', 'To where it bent in the undergrowth;', 'Then took the other, as just as fair,', 'And having perhaps the better claim', 'Because it was grassy and wanted wear,', 'Though as for that the passing there', 'Had worn them really about the same,', 'And both that morning equally lay', 'In leaves no step had trodden black.', 'Oh, I kept the first for another day!', 'Yet knowing how way leads on to way', 'I doubted if I should ever come back.', 'I shall be telling this with a sigh', 'Somewhere ages and ages hence:', 'Two roads diverged in a wood, and I,', 'I took the one less traveled by,', 'And that has made all the difference.', 'Whose woods these are I think I know.', 'His house is in the village, though;', 'He will not see me stopping here', 'To watch his woods fill up with snow.', 'My little horse must think it queer', 'To stop without 

In [28]:
# Load the pretrained model
generate = pipeline(
    task = "text-generation",
    model = "gpt2"
)

In [32]:
print(lines[0])

Two roads diverged in a yellow wood,


In [43]:
result = generate(lines[0], num_return_sequences = 1, max_length = 20)
print(result)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Two roads diverged in a yellow wood, and the entire block stood still as the river rushed across'}]


In [44]:
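# Pretty-print the last displayed output.
# Caution: `_` is IPython's most recent Out[] value, not `result` from the cell
# above (print() returns None), so this shows an earlier generation.
# Use pprint(result) to pretty-print the cell above.
pprint(_)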

[{'generated_text': 'Two roads diverged in a yellow wood, and a few trees '
                    'burst free from the clearing.\n'
                    '\n'
                    '"He\'s dead!" she shouted into the microphone, pointing '
                    'one of her knives out at them. She slammed the mic shut. '
                    'The cops arrived'}]


In [45]:
pprint(generate(lines[0], num_return_sequences = 3, max_length = 20))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Two roads diverged in a yellow wood, the green sign '
                    'telling all how much a car or truck'},
 {'generated_text': 'Two roads diverged in a yellow wood, their leaves and '
                    'branches twirling and turning as the winds'},
 {'generated_text': 'Two roads diverged in a yellow wood, and in the direction '
                    'of the bridge. On one side'}]


In [46]:
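# Helper function to format long generated text as wrapped paragraphs
def wrap(x):
    return textwrap.fill(
        text=x,
        replace_whitespace=True,   # turn newlines/tabs into spaces before wrapping
        fix_sentence_endings=True  # put two spaces after sentence-ending punctuation
    )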
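The `fix_sentence_endings=True` option is why the wrapped outputs below show two spaces after a period (e.g. "from them.  Several"). A quick standalone check using only the standard library's `textwrap`, with a made-up input sentence:

```python
import textwrap

def wrap(x):
    # Same settings as the helper above; default width is 70 characters.
    return textwrap.fill(text=x, replace_whitespace=True, fix_sentence_endings=True)

# A lowercase letter followed by "." and a space is treated as a sentence ending,
# so the single space after "them." becomes two spaces.
print(wrap("There were no deaths reported from them. Several hours passed."))

# Longer text is broken into lines of at most 70 characters.
print(wrap("word " * 30))
```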

In [49]:
result = generate(lines[0], max_length = 30)
print(wrap(result[0]["generated_text"]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Two roads diverged in a yellow wood, but there were no deaths reported
from them.  Several hours before the first fire began at the base


In [50]:
previous = "Two roads diverged in a yellow wood, but there were no deaths reported from them."

result = generate(previous + "\n" + lines[1], max_length = 60)
print(wrap(result[0]["generated_text"]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Two roads diverged in a yellow wood, but there were no deaths reported
from them.  And sorry I could not travel both cars in one direction,
but you know, the guy on the opposite side and one side of the line is
really busy and the guy on the other side is busy and


In [52]:
previous = """Two roads diverged in a yellow wood, but there were no deaths reported
from them.  And sorry I could not travel both cars in one direction,
but you know, the guy on the opposite side and one side of the line is
really busy and the guy on the other side is busy"""

result = generate(previous + "\n" + lines[2], max_length = 90)
print(wrap(result[0]["generated_text"]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Two roads diverged in a yellow wood, but there were no deaths reported
from them.  And sorry I could not travel both cars in one direction,
but you know, the guy on the opposite side and one side of the line is
really busy and the guy on the other side is busy And be one traveler,
long I stood there as a part of the road,and there was nothing to do
inbetween from that


In [54]:
previous = """Two roads diverged in a yellow wood, but there were no deaths reported
from them.  And sorry I could not travel both cars in one direction,
but you know, the guy on the opposite side and one side of the line is
really busy and the guy on the other side is busy And be one traveler,
long I stood there as a part of the road,and there was nothing to do
inbetween from that"""

result = generate(previous + "\n" + lines[3], max_length = 120)
print(wrap(result[0]["generated_text"]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Two roads diverged in a yellow wood, but there were no deaths reported
from them.  And sorry I could not travel both cars in one direction,
but you know, the guy on the opposite side and one side of the line is
really busy and the guy on the other side is busy And be one traveler,
long I stood there as a part of the road,and there was nothing to do
inbetween from that And looked down one as far as I could See from the
road to any time that was close to the road And that's when you
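The cells above repeat one step by hand: take the text generated so far, append the next line of the poem, and ask the model to continue with a larger `max_length` (30, 60, 90, 120). A minimal sketch of that loop as a function, assuming `generate` behaves like the `pipeline` object above (called as `generate(text, max_length=...)` and returning a list of dicts with a `"generated_text"` key). `continue_poem` and the stand-in `echo_generate` are hypothetical; unlike the manual cells, this feeds back the full generated text rather than a hand-trimmed version.

```python
def continue_poem(generate, lines, n_lines, step_tokens=30):
    """Interleave original poem lines with model continuations.

    Step i sends (text so far + "\n" + lines[i]) to the model and keeps
    the returned continuation as the new "text so far".
    """
    text = ""
    for i, line in enumerate(lines[:n_lines]):
        prompt = line if i == 0 else text + "\n" + line
        # max_length counts prompt + new tokens, so the budget grows each step.
        result = generate(prompt, max_length=step_tokens * (i + 1))
        text = result[0]["generated_text"]
    return text

# Offline demo with a stand-in generator that only tags each call.
def echo_generate(prompt, max_length):
    return [{"generated_text": f"{prompt} [continued to {max_length} tokens]"}]

print(continue_poem(echo_generate, ["Two roads diverged", "And sorry I could not"], n_lines=2))
```

With the notebook's own objects, `print(wrap(continue_poem(generate, lines, n_lines=4)))` would run the same four steps; the text will differ from run to run because generation samples.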


In [57]:
# A space is needed before "in"; without it the prompt reads "successin", as in the output below
prompt = "Neural Networks with Attention have been used with great success" + " in Natural Language Processing"

result = generate(prompt, max_length = 300)
print(wrap(result[0]["generated_text"]))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Neural Networks with Attention have been used with great successin
Natural Language Processing for many long time.  However, the work now
used in this paper may be most applicable to applications such as
language comprehension.  This paper introduces the concepts of neural
networks through a theoretical proposal.  The theoretical framework
for working with network networks is known from many different fields
including deep learning, networks, inference, reinforcement learning,
reinforcement learning and other approaches such as neural nets.  The
paper discusses what the paper is attempting to show with natural
systems, using natural networks as an example to try and draw in
natural systems.  It then tries to show this to the computer users.
For more details, please see this paper,  [Reference:
http://molexpress.net/2018/07/23/network-learning-has-been-used-with-
great-success/]  Advertisements
