## Text Generation

In [2]:
from transformers import pipeline
import numpy as np
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

from sklearn.metrics import roc_auc_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

In [3]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device:", device)
if device == 'cuda':
    print("current_device: ", torch.cuda.current_device())

Device: cuda
current_device:  0


In [4]:
gen = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [5]:
prompt = "Neural networks with attention have been used with great success"

In [6]:
gen(prompt)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Neural networks with attention have been used with great success in other, related areas. (See below for a brief history of neural network studies.) The primary goal of my study was to investigate (5) how the presence of certain connections may affect the'}]

In [8]:
gen(prompt, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Neural networks with attention have been used with great success in humans. But what is unique about neural networks is that it can also be used to explore behavior on neurons in other areas of the brain.\n\nIn a recent paper, the researchers showed'},
 {'generated_text': 'Neural networks with attention have been used with great success to identify and assess mood in people who do not have a deep level of cognition. The aim of the present study was to assess whether and by whom attentional responses to attention in children and adolescents'},
 {'generated_text': 'Neural networks with attention have been used with great success to determine the neural networks of attentional and visual attentional control, particularly during cognitive tasks in humans such as working memory and attentional control.\n\nIntroduction\n\nAs the most common form'}]
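
The pipeline returns a list of dicts, one per candidate, and each `generated_text` starts with the prompt itself. A minimal sketch of pulling out just the continuations is below. The helper name `continuations` is hypothetical, and the sample `outputs` holds the first two results copied from the cell output above, so the snippet runs without loading a model. The text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without the manual slicing.

```python
import textwrap

prompt = "Neural networks with attention have been used with great success"

# First two candidates copied from the cell output above,
# in the pipeline's list-of-dicts format.
outputs = [
    {"generated_text": prompt
        + " in humans. But what is unique about neural networks is that it"
        + " can also be used to explore behavior on neurons in other areas"
        + " of the brain.\n\nIn a recent paper, the researchers showed"},
    {"generated_text": prompt
        + " to identify and assess mood in people who do not have a deep"
        + " level of cognition. The aim of the present study was to assess"
        + " whether and by whom attentional responses to attention in"
        + " children and adolescents"},
]


def continuations(prompt, outputs):
    """Return only the newly generated text for each candidate (hypothetical helper)."""
    result = []
    for item in outputs:
        text = item["generated_text"]
        # The prompt is echoed at the start of the text; drop it if present.
        if text.startswith(prompt):
            text = text[len(prompt):]
        result.append(text.strip())
    return result


for i, text in enumerate(continuations(prompt, outputs), start=1):
    print(f"--- candidate {i} ---")
    # shorten() collapses whitespace and truncates to a one-line preview.
    print(textwrap.shorten(text, width=70, placeholder=" ..."))
```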

In [9]:
gen(prompt, max_length=30)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Neural networks with attention have been used with great success. But with the advent of more sophisticated devices like wearable, we can see the use case of'}]

In [12]:
from transformers import set_seed
import textwrap
from pprint import pprint


In [13]:
filePath = "data/robert_frost.txt"

In [16]:
# Read the poem, strip trailing whitespace, and drop blank lines.
with open(filePath) as f:
    lines = [line.rstrip() for line in f]
lines = [line for line in lines if line]

In [18]:
set_seed(42)

In [19]:
lines[0]

'Two roads diverged in a yellow wood,'

In [20]:
gen(lines[0])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Two roads diverged in a yellow wood, but only one passed. A short time later—after a short time being outrunning the attackers, and the two sides retreating to the road back to Porto dei Fiori—a number of cars'}]

In [21]:
pprint(_)

[{'generated_text': 'Two roads diverged in a yellow wood, but only one passed. '
                    'A short time later—after a short time being outrunning '
                    'the attackers, and the two sides retreating to the road '
                    'back to Porto dei Fiori—a number of cars'}]


In [22]:
pprint(gen(lines[0], max_length=20))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Two roads diverged in a yellow wood, only to be swamped '
                    'by the waves. The winds'}]


In [23]:
def wrap(x):
    return textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True)

In [24]:
out = gen(lines[0], max_length=30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [25]:
print(wrap(out[0]['generated_text']))

Two roads diverged in a yellow wood, with one coming in from the West
Side into the East Harlem neighborhood of Harlem, two coming up at
speeds
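
Generated text can contain newline characters (the `num_return_sequences=3` output above has several `\n\n`). With `replace_whitespace=False`, `textwrap.fill` leaves those newlines in place but still counts them as ordinary characters, so lines can break at unexpected widths. The `textwrap` documentation suggests splitting the text into paragraphs and wrapping each one separately. A minimal sketch of that approach follows. `wrap_paragraphs` is a hypothetical helper, not part of the notebook, and the sample string is taken from an earlier cell output.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap each newline-separated piece on its own, keeping the line breaks.

    Hypothetical alternative to the notebook's `wrap` helper.
    """
    return "\n".join(
        # fill("") returns "", so blank lines between paragraphs survive.
        textwrap.fill(part, width=width, fix_sentence_endings=True)
        for part in text.split("\n")
    )


# Tail of the first candidate from the num_return_sequences=3 cell.
sample = (
    "But what is unique about neural networks is that it can also be used "
    "to explore behavior on neurons in other areas of the brain."
    "\n\nIn a recent paper, the researchers showed"
)

wrapped = wrap_paragraphs(sample)
print(wrapped)
```

Each paragraph is wrapped to the full width, and the blank line between them is preserved.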


In [27]:
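# Feed the previous output (typed back in by hand, with a period added) plus
# another line of the poem as the new prompt.
# Note: there is no space between 'West' and 'Side', so the prompt reads
# 'WestSide', as the output below shows.
prev = ('Two roads diverged in a yellow wood, with one coming in from the West'
        'Side into the East Harlem neighborhood of Harlem, two coming up at'
        ' speeds.')
out = gen(prev + '\n' + lines[2], max_length=60)
print(wrap(out[0]['generated_text']))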

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Two roads diverged in a yellow wood, with one coming in from the
WestSide into the East Harlem neighborhood of Harlem, two coming up at
speeds.
And be one traveler, long I stood in the intersection to check
the distance.
So what I've come to say to our visitors and
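
The cell above re-types the previous output by hand before appending the next poem line. That chaining can be sketched as a small loop. Everything here is an assumption for illustration: `continue_poem` is a hypothetical helper, and `fake_generate` is a stand-in with the same call shape as `gen` so the snippet runs without a model. Because `max_length` counts the prompt tokens as well as the new ones, it has to grow as the prompt grows (30, then 60 above). `max_new_tokens`, a standard generation argument in `transformers`, bounds only the newly generated tokens and may be more convenient here.

```python
def continue_poem(generate, poem_lines, **gen_kwargs):
    """Alternate between real poem lines and model continuations.

    `generate` is any callable with the pipeline's call shape: it takes a
    prompt string and returns [{'generated_text': ...}].
    """
    # Start from the first poem line and let the model continue it.
    text = generate(poem_lines[0], **gen_kwargs)[0]["generated_text"]
    for line in poem_lines[1:]:
        # Append the next real line, then generate again from the full text.
        text = generate(text + "\n" + line, **gen_kwargs)[0]["generated_text"]
    return text


def fake_generate(prompt, **kwargs):
    # Hypothetical stand-in for `gen`: appends a fixed marker instead of sampling.
    return [{"generated_text": prompt + " [generated]"}]


poem = [
    "Two roads diverged in a yellow wood,",
    "And be one traveler, long I stood",
]
result = continue_poem(fake_generate, poem, max_new_tokens=20)
print(result)
```

With the real pipeline, the call would be `continue_poem(gen, [lines[0], lines[2]], max_new_tokens=20)`, and the output would vary from run to run unless `set_seed` is called first.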
