# Transformers study
by anna lin

## "translating" a poem

In [1]:
import sys
!conda install --prefix {sys.prefix} -y -c pytorch pytorch

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [2]:
import sys
!{sys.executable} -m pip install transformers



In [9]:
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('distilgpt2')
model = AutoModelForCausalLM.from_pretrained('distilgpt2')
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

For the source material, I used [Frank O'Hara's poem](https://poets.org/poem/having-coke-you), split it into a list of 27 lines, and randomly chose six of them as prompts.

In [23]:
import torch
import random

# prompt = "读万卷书行万里路"
prompt = "'Having a Coke with You\nis even more fun than going to San Sebastian, Irún, Hendaye, Biarritz, Bayonne\nor being sick to my stomach on the Travesera de Gracia in Barcelona\npartly because in your orange shirt you look like a better happier St. Sebastian\npartly because of my love for you, partly because of your love for yoghurt\npartly because of the fluorescent orange tulips around the birches\npartly because of the secrecy our smiles take on before people and statuary\nit is hard to believe when I’m with you that there can be anything as still\nas solemn as unpleasantly definitive as statuary when right in front of it\nin the warm New York 4 o’clock light we are drifting back and forth\nbetween each other like a tree breathing through its spectacles\n\nand the portrait show seems to have no faces in it at all, just paint\nyou suddenly wonder why in the world anyone ever did them\nI look\nat you and I would rather look at you than all the portraits in the world\nexcept possibly for the Polish Rider occasionally and anyway it’s in the Frick\nwhich thank heavens you haven’t gone to yet so we can go together for the first time\nand the fact that you move so beautifully more or less takes care of Futurism\njust as at home I never think of the Nude Descending a Staircase or\nat a rehearsal a single drawing of Leonardo or Michelangelo that used to wow me\nand what good does all the research of the Impressionists do them\nwhen they never got the right person to stand near the tree when the sun sank\nor for that matter Marino Marini when he didn’t pick the rider as carefully\nas the horse\nit seems they were all cheated of some marvelous experience\nwhich is not going to go wasted on me which is why I’m telling you about it'"
lines = prompt.split('\n')
prompted = random.sample(lines, 6)
print(prompted)

['I look', 'in the warm New York 4 o’clock light we are drifting back and forth', 'or for that matter Marino Marini when he didn’t pick the rider as carefully', 'and the fact that you move so beautifully more or less takes care of Futurism', 'between each other like a tree breathing through its spectacles', "'Having a Coke with You"]
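One thing to watch for: the stanza break (`\n\n`) in the poem becomes an empty string after `split('\n')`, so `random.sample` can hand the generator a blank prompt. Below is a minimal sketch of one way around that, using a shortened stand-in for the poem text. The names `poem` and `pick_prompts` and the `seed` parameter are hypothetical and not part of the notebook.

```python
import random

# Shortened stand-in for the poem; the blank line mimics the stanza break.
poem = (
    "Having a Coke with You\n"
    "is even more fun than going to San Sebastian\n"
    "\n"
    "I look\n"
    "at you and I would rather look at you\n"
    "as the horse\n"
    "it seems they were all cheated"
)

def pick_prompts(text, k, seed=None):
    """Split text into non-empty lines and sample k of them without replacement."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return rng.sample(lines, k)

prompts = pick_prompts(poem, 6, seed=0)
print(prompts)
```

Passing a seed is optional; it only helps if you want to rerun the cell and get the same six lines.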


Then I generated a new poem in two different ways.
First, I passed each line to the generator and let it complete the line.
(I also found out from this [thread](https://stackoverflow.com/questions/69609401/suppress-huggingface-logging-warning-setting-pad-token-id-to-eos-token-id) how to get rid of the 'Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.' warnings: pass `pad_token_id=tokenizer.eos_token_id` to the generator.)

In [20]:
for p in prompted:
    print((generator(p, pad_token_id=tokenizer.eos_token_id)[0]['generated_text']).lower(), end="\n")

is even more fun than going to san sebastian, irún, hendaye, biarritz, bayonne and la vibar.

the last installment of the series is also set for premiere on december 22 at 7 p.m
between each other like a tree breathing through its spectacles, it is often that its heart is in the centre of the tree, or in either the other side of it or in one area.
till the body does not breathe again.

in the warm new york 4 o’clock light we are drifting back and forth about the horizon. when we are about the bright blue side we are on the horizon so it is hard to discern a spot with great detail. the sun is more
as the horse was only able to pass through the air after the game. i found myself laughing hysterically for not being able to see the horse, but the whole situation seemed to play into it and the horse was only able to pass through my arms
or being sick to my stomach on the travesera de gracia in barcelona and my sister in mexico city in los angeles while on the road. in the last couple of wee

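The second way, I called the tokenizer and model directly instead of the pipeline. For each prompt, I encoded the text, ran a forward pass, picked one of the 12 highest-scoring next tokens at random, appended it to the prompt, and repeated this 10 times.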

In [32]:
for n in prompted:
    # for chinese prompt
    # p = n + " means"
    for i in range(10):
        # encode the current text into token IDs
        prompt_encoded = tokenizer([n], return_tensors="pt")
        # run a forward pass on the network
        result = model(**prompt_encoded)
        # get the logits (unnormalized scores) for the next token
        next_token_logits = result.logits[0, -1]
        # sort by value, keep the IDs of the top 12 (you can change this number! try 1, or 1000)
        nexts = torch.argsort(next_token_logits)[-12:]
        # pick one of those IDs at random and append its decoded text to the prompt
        n += tokenizer.decode(random.choice(nexts))
    print(n.lower(), end=" ")

i look back. he has never felt this. i'm in the warm new york 4 o’clock light we are drifting back and forth from new yorkers who can not remember what they came or for that matter marino marini when he didn’t pick the rider as carefully he could with other drivers in terms
i would and the fact that you move so beautifully more or less takes care of futurism
i'm still going through what i thought, between each other like a tree breathing through its spectacles (e-mog). in fact a simple 'having a coke with you? it will help me stay away in that area 
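The selection step in the loop above can be looked at on its own. Here is a minimal pure-Python sketch of it, with made-up scores standing in for the model's logits. The helper names `top_k_indices` and `pick_next` are hypothetical. Like the notebook, it chooses uniformly among the top `k` candidates and ignores how far apart their scores are.

```python
import random

def top_k_indices(scores, k):
    """Return the indices of the k highest scores, lowest first (like argsort()[-k:])."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[-k:]

def pick_next(scores, k, rng=random):
    """Choose uniformly at random among the k highest-scoring indices."""
    return rng.choice(top_k_indices(scores, k))

# Toy scores for a 6-token vocabulary (made-up numbers, not model output).
scores = [0.1, 2.5, -1.0, 3.2, 0.7, 1.9]

print(top_k_indices(scores, 3))  # [5, 1, 3]
print(pick_next(scores, 1))      # 3, since k=1 always picks the single best token
```

With `k=1` the loop becomes deterministic; with a very large `k`, even unlikely tokens get the same chance as the best one, which is why the output gets stranger as the number goes up.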

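For fun, I tried a Chinese proverb and added " stands for" to see what the model would generate to complete that sentence. Because `prompted` is a single string here, the loop goes through it one character at a time, so each character gets its own completion.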

In [37]:
prompted = "读万卷书行万里路"
for n in prompted:
    # prompted is a string, so n is a single character of the proverb
    n += " stands for"
    for i in range(6):
        # encode the current text into token IDs
        prompt_encoded = tokenizer([n], return_tensors="pt")
        # run a forward pass on the network
        result = model(**prompt_encoded)
        # get the logits (unnormalized scores) for the next token
        next_token_logits = result.logits[0, -1]
        # sort by value, keep the IDs of the top 12 (you can change this number! try 1, or 1000)
        nexts = torch.argsort(next_token_logits)[-12:]
        # pick one of those IDs at random and append its decoded text to the prompt
        n += tokenizer.decode(random.choice(nexts))
    print(n.lower(), end="\n")

读 stands for, the right thing you can
万 stands for 'free palestine.' it�
卷 stands for, in particular a small number
书 stands for
\
.

i
行 stands for, and a large group is
万 stands for free trade, but that would
里 stands for his brother. his name has
路 stands for freedom from religion & labor party
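The eight separate outputs above come from how Python iterates over a string. A small sketch of the difference between looping over the string itself and looping over a one-item list (the names `per_character` and `whole` are only for illustration):

```python
proverb = "读万卷书行万里路"

# Iterating over a string yields one character at a time.
per_character = [ch + " stands for" for ch in proverb]
print(len(per_character))  # 8 prompts, one per character

# Wrapping the string in a list yields the whole proverb as a single prompt.
whole = [p + " stands for" for p in [proverb]]
print(whole)  # ['读万卷书行万里路 stands for']
```

Setting `prompted = [proverb]` in the cell above would therefore prompt the model with the full proverb once instead of eight times.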
