# Homework 05 

> Use predictive models to generate text: either a Markov chain or a neural network, or both. How does your choice of source text affect the output? Try combining predictive text with other methods we’ve used for analyzing and generating text: use neural network-generated text to fill Tracery templates, or train a Markov model on the output of parsing parts of speech from a text, or some other combination. What works and what doesn’t? How does neural network-generated text “feel” different from Markov-generated text? How does the length of the n-gram and the unit of the n-gram affect the quality of the output?

So, what do I have to work with? 
- Markov Chains 
- Neural Networks 
- Books from Project Gutenberg 
- Personal Writing 
- Tracery
- NLP via spaCy
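
The notebook below ends up working only with the neural network, but the assignment also asks how the length and unit of the n-gram shape a Markov chain's output. For comparison, here is a minimal stdlib-only sketch. `build_chain` and `markov_generate` are my own illustrative names, not a library API. `n` sets the n-gram length, and passing a list of words or a list of characters sets the unit. The sample sentence is loosely based on Peter Pan's opening line.

```python
import random
from collections import defaultdict

def build_chain(tokens, n):
    """Map each n-gram (as a tuple of tokens) to every token that follows it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - n):
        chain[tuple(tokens[i:i + n])].append(tokens[i + n])
    return chain

def markov_generate(chain, n, length, seed=None):
    """Random walk: start at a random n-gram, then repeatedly pick a follower."""
    rng = random.Random(seed)
    out = list(rng.choice(list(chain.keys())))
    for _ in range(length):
        followers = chain.get(tuple(out[-n:]))
        if not followers:  # dead end: this n-gram only occurs at the very end
            break
        out.append(rng.choice(followers))
    return out

text = "all children except one grow up and they soon know that they will grow up"

# Word unit, n = 2
words = text.split()
word_chain = build_chain(words, 2)
print(" ".join(markov_generate(word_chain, 2, 10, seed=1)))

# Character unit, n = 4
char_chain = build_chain(list(text), 4)
print("".join(markov_generate(char_chain, 4, 40, seed=1)))
```

In general, a larger `n` makes the chain copy longer stretches of the source verbatim. A smaller `n`, especially with characters as the unit, produces more invented words and looser grammar.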

---
# Loading transformers
Just doing the basics, directly copied from Allison's examples. 

In [1]:
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

In [2]:
tokenizer = AutoTokenizer.from_pretrained('distilgpt2')
model = AutoModelForCausalLM.from_pretrained('distilgpt2')

In [3]:
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)

In [7]:
print(generator("Once, I started to walk")[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once, I started to walk with a fellow woman. It was weird, scary, uncomfortable, dangerous. It was very awkward and it was terrifying to me. I had just been on the bus.

You are like a child.
My


---
# Fine Tuning with Peter Pan 
I think I want to fine-tune this model to fit something a little closer to home. I have a deep love for Peter Pan, so I downloaded the text from Project Gutenberg to work with. Peter Pan (as troublesome as some of the story is) holds themes of everlasting childhood, love for wonder, and imagination. 

File sources are: 
```
"sources/peter-pan.txt"
```

At first I created a truncated text to work with (seen in the cell below), but I realized that most of the more interesting, descriptive language occurs later in the book. So I trained on the full text instead, which definitely took longer, but still only about 4 minutes (the trainer itself reports a runtime of roughly 198 seconds).

In [12]:
# Create a truncated Peter Pan text to work with (the first 20,000 characters)
with open("sources/peter-pan.txt", encoding="utf-8") as src, \
        open("sources/truncated-peter-pan.txt", "w", encoding="utf-8") as fh:
    fh.write(src.read()[:20000])
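
One thing worth checking before training on the full file: Project Gutenberg plain-text downloads usually wrap the book in a license header and footer. These are marked by lines like `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ... ***`, and if they are present, that boilerplate gets fine-tuned into the model too. Here is a minimal sketch, assuming the file still contains those markers. Their wording has varied between releases, so the regex is a guess at the common forms. `strip_gutenberg` is a hypothetical helper.

```python
import re

# Marker lines as they commonly appear in Project Gutenberg .txt files (assumed format).
START = re.compile(r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                   re.MULTILINE | re.IGNORECASE)
END = re.compile(r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                 re.MULTILINE | re.IGNORECASE)

def strip_gutenberg(text):
    """Return only the text between the START and END marker lines, if present."""
    start = START.search(text)
    end = END.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

sample = (
    "The Project Gutenberg eBook of Peter Pan\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK PETER PAN ***\n"
    "All children, except one, grow up.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK PETER PAN ***\n"
    "License text\n"
)
print(strip_gutenberg(sample))  # All children, except one, grow up.
```

The cleaned string could be written to a new file and passed to `datasets.load_dataset('text', data_files=...)` in place of `sources/peter-pan.txt`.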

In [8]:
import sys
!{sys.executable} -m pip install datasets


In [14]:
import datasets

In [24]:
training_data = datasets.load_dataset('text', data_files="sources/peter-pan.txt")

Downloading and preparing dataset text/default to /Users/leiachang/.cache/huggingface/datasets/text/default-33e0c349e4d6c2f1/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2...



Dataset text downloaded and prepared to /Users/leiachang/.cache/huggingface/datasets/text/default-33e0c349e4d6c2f1/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2. Subsequent calls will reuse this data.



In [25]:
tokenizer.pad_token = tokenizer.eos_token
tokenized_training_data = training_data.map(
    lambda x: tokenizer(x['text']),
    remove_columns=["text"]
)

Map:   0%|          | 0/6261 [00:00<?, ? examples/s]

In [26]:
block_size = 64
# magic from https://github.com/huggingface/notebooks/blob/master/examples/language_modeling.ipynb
def group_texts(examples):
    concatenated_examples = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result
lm_training_data = tokenized_training_data.map(
    group_texts,
    batched=True,
    batch_size=200
)

Map:   0%|          | 0/6261 [00:00<?, ? examples/s]
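
The `group_texts` step turns the 6,261 separate lines into fixed-length training examples. Below is a stdlib-only restatement on toy token IDs; the IDs and the tiny block size are made up for illustration. It uses `itertools.chain` in place of `sum(..., [])`, which re-copies the growing list on every addition. Because the notebook maps with `batched=True, batch_size=200`, the leftover tokens are dropped once per batch of 200 lines, not once for the whole book.

```python
from itertools import chain

BLOCK_SIZE = 4  # tiny stand-in for block_size = 64 in the notebook

def group_texts(examples, block_size=BLOCK_SIZE):
    # Concatenate every tokenized line in the batch into one long list per column.
    concatenated = {k: list(chain.from_iterable(examples[k])) for k in examples}
    total_length = len(concatenated["input_ids"])
    # Drop the leftover tokens that do not fill a whole block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i:i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # Causal LM: labels are a copy of the inputs; the model shifts them by one internally.
    result["labels"] = [block[:] for block in result["input_ids"]]
    return result

batch = {
    "input_ids": [[10, 11, 12], [20, 21], [30, 31, 32, 33, 34]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1, 1]],
}
print(group_texts(batch)["input_ids"])
# [[10, 11, 12, 20], [21, 30, 31, 32]]  (tokens 33 and 34 are dropped)
```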

In [19]:
from transformers import Trainer, TrainingArguments

In [29]:
trainer = Trainer(model=model,
                  train_dataset=lm_training_data['train'],
                  args=TrainingArguments(
                      output_dir='distilgpt2-finetune-peter-pan',
                      num_train_epochs=1,
                      do_train=True,
                      do_eval=False
                  ),
                  tokenizer=tokenizer)

In [30]:
trainer.train()


TrainOutput(global_step=129, training_loss=3.843318968780281, metrics={'train_runtime': 198.4606, 'train_samples_per_second': 5.2, 'train_steps_per_second': 0.65, 'total_flos': 16853640413184.0, 'train_loss': 3.843318968780281, 'epoch': 1.0})
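
As a sanity check, a little arithmetic relates these numbers back to `block_size`. This assumes the `Trainer` used its default batch size of 8 (`per_device_train_batch_size` in `TrainingArguments`) on a single device. The reported `train_samples_per_second` is rounded, so the block count is approximate.

```python
import math

train_runtime = 198.4606   # seconds, from TrainOutput above
samples_per_second = 5.2   # from TrainOutput above
global_step = 129          # from TrainOutput above
block_size = 64            # tokens per training example, as set in the notebook
batch_size = 8             # ASSUMPTION: default per_device_train_batch_size, one device

approx_blocks = round(train_runtime * samples_per_second)  # training examples seen
steps = math.ceil(approx_blocks / batch_size)              # optimizer steps for one epoch
approx_tokens = approx_blocks * block_size                 # tokens of Peter Pan seen
print(approx_blocks, steps, approx_tokens)  # 1032 129 66048
```

So one epoch saw roughly 1,032 blocks of 64 tokens, around 66,000 tokens of Peter Pan in total, and that matches the 129 reported steps.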

In [31]:
trainer.save_model()

In [32]:
generator("I have no thoughts ", max_length=100)[0]['generated_text']

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'I have no thoughts _____—he gave us. It is the last time he talked,” and now we are a little surprised. Wendy is a big-headed figure at your door.“Are she still crying?” he whispered thoughtfully.“I am going to be quite sure.”“I don’t have any hope.”She sat upand down on the floor when she spoke.“Yes,” said Peter to'

---
# Great! Now what? 
I now have a language model tweaked to generate text based on Peter Pan by J. M. Barrie. Here I use it to create some new works, with prompts drawn from other sources: I take quotes from the stories I've gathered during my thesis test runs and use them as prompts, to create a juxtaposition between what was originally written and how it is "reimagined" in our fairy-tale land.

In [46]:
original_01 = "I usually call my mother with bad news, or my friend Joe. When I want to tell them the worries I have, I know theyll listen with care and call me on my bullshit whenever I need them to. And they'll take a moment to mourn with me"
prompt_01 = "I usually call my mother with bad news, "

In [50]:
for x in range(5): 
    print(generator(prompt_01, max_length=100)[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I usually call my mother with bad news,    ;and she was not my mother. Of course, but you know her, and you have to tell her that at the last moment. Which is very good. It would be almost impossible for the boys to change minds or the feeling that their faces had gone out of their minds to make them proud, and it was so that they thoughtfor a moment that they should have lost their minds. But they did not. In a moment when




I usually call my mother with bad news,    _______________________________________________________.”“Yes,” she answered, “we will be very pleased to welcome you.”“It’s time to get back again,” he said, “what do you have to do?”“Then, we will go to school in the evening by night.“O’HO.”“So, where,




I usually call my mother with bad news,      , just in case she would like it to go home.” She will not be allowed to tell you what happened to Tootles, because there is the most fatal lesson to take—and to explain it in her simple language!”“Wendy’s mother said, ‘Hinde your bed!“What are you fussing about?’Then she looked at it




I usually call my mother with bad news,                                                                                           
I usually call my mother with bad news, ick in the night, and I have no other person to call. But Peter is the only man in the world who knows the secret.He is always trying to persuade me. To all, who knows, but Peter is the only friend by nature.He is always trying to persuade me with bad news. Peter is the man of course. But he is the only person in the world who knows all the secrets.”“When I


In [35]:
a = _

In [37]:
print(a)

I usually call my mother with bad news,   they can be told it is a game. When the mother is told it is a game, it is an empty box. For some time she is not allowed to talk, but sometimes she can. She is not allowed to be, and her time is spent, thinking with the children her hand about it. Ofthis she is not always safe enough to call one another, and there is no one to whom she may give it a chance to


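The original (note that `a` was captured with IPython's `_` last-output variable in cell 35, before the five-sample loop in cell 50 ran, so it is a separate sample from those five):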
```
I usually call my mother with bad news, or my friend Joe. When I want to tell them the worries I have, I know theyll listen with care and call me on my bullshit whenever I need them to. And they'll take a moment to mourn with me
```

And the generated: 

```
I usually call my mother with bad news,   they can be told it is a game. When the mother is told it is a game, it is an empty box. For some time she is not allowed to talk, but sometimes she can. She is not allowed to be, and her time is spent, thinking with the children her hand about it. Ofthis she is not always safe enough to call one another, and there is no one to whom she may give it a chance to
```
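
Instead of pasting the two versions by hand, the shared prompt can be split off automatically so the human and model continuations sit side by side. The sketch below uses `os.path.commonprefix`, which works character by character on any list of strings. `split_shared_prompt` is a hypothetical helper. The cut is moved back to the last space so a word is never split in half.

```python
import os

def split_shared_prompt(original, generated):
    """Split two texts into (shared prompt, rest of original, rest of generated)."""
    shared = os.path.commonprefix([original, generated])
    cut = shared.rfind(" ") + 1  # back up to a word boundary (0 if there is no space)
    return shared[:cut], original[cut:], generated[cut:]

original = "I usually call my mother with bad news, or my friend Joe."
generated = "I usually call my mother with bad news,   they can be told it is a game."
prompt, original_rest, generated_rest = split_shared_prompt(original, generated)
print("PROMPT:   ", prompt.strip())
print("ORIGINAL: ", original_rest.strip())
print("GENERATED:", generated_rest.strip())
```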

In [39]:
b = generator("I have a small box I got from India ", max_length=100)[0]['generated_text']
print(b)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I have a small box I got from India _______________  ________ ________  ________ ________ ________________   ________       ________   ________________  ________                                                         


That's... strange. I wonder if the source text just has a lot of breaks and separator lines in it.
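
If the runs of underscores and spaces keep coming back, one option is to filter them after generation. In this sketch, `clean_generation` (a hypothetical helper, not part of `transformers`) collapses runs of filler characters. It returns `None` when too few real words remain, so a calling loop can simply ask the generator again. The filler pattern and the five-word threshold are arbitrary assumptions.

```python
import re

# Runs of 3+ underscores/tildes/asterisks, or 3+ spaces (assumed "filler" pattern).
FILLER = re.compile(r"[_~*]{3,}| {3,}")

def clean_generation(text, prompt, min_words=5):
    """Collapse filler in the continuation; return None if too few words remain."""
    if text.startswith(prompt):
        head, continuation = prompt, text[len(prompt):]
    else:
        head, continuation = "", text
    cleaned = FILLER.sub(" ", continuation)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if len(cleaned.split()) < min_words:
        return None
    return head + cleaned

prompt = "I have a small box I got from India "
bad = prompt + "_______________  ________ ________" + " " * 40
ok = prompt + "to a long distance,    it was the only way to look."
print(clean_generation(bad, prompt))  # None
print(clean_generation(ok, prompt))
```

In use, this could wrap the existing call. For example, you could retry `generator(prompt, max_length=100)[0]['generated_text']` a few times until the result is not `None`.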

In [51]:
for x in range(3): 
    print("--------------------")
    print(generator("I have a small box I got from India ", temperature=3.0, max_length=100)[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


--------------------




I have a small box I got from India !!!!!!!Dice not-so large-wise in it!It says a thing all right: we all know one another and all love to say the greatest love that you give andtell with your hearts like yourest work. Which may, from it shall surely know not too more!Tightly it goes, let our best friends call it.Oh! they do. Howl really the heart must grow! TIGHTDATE FOR LIFE!!!!
--------------------




I have a small box I got from India   so wellI“lovelyly-sounding call I never saw ~~~~ ~~~~~~~~__;Y a moment I am in. When no need come on, just because we did ********‏, it can turn some lovely colour against the tree leavesand now at some other lightening in that place I was walking without sightless—before they could hear theiroftoil; I shall only do this once every night!The house
--------------------
I have a small box I got from India  this was my only thing till then whichI made it the same or of little black colour into, like the very large haters gave to TubbyDance in Tumpme!They thought in such things wheninvisible shadow, theyhad got them very nice at the moment till quite often that of the white in blue who didof that time byadoe to them from outside the window was found just to leave me for good at some
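
Roughly, this is what `temperature` does. Before sampling the next token, the model's scores (logits) are divided by the temperature and turned into probabilities with a softmax. Values below 1 sharpen the distribution toward the most likely token. Values above 1, like the 3.0 used here, flatten it, so unlikely tokens (stray `!!!!`, `~~~~`, half-words) get picked far more often. This only matters when the pipeline samples rather than decoding greedily, and the varied outputs above suggest it does. The sketch uses made-up logits for four candidate tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then normalise with a stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate next tokens.
logits = [4.0, 2.0, 1.0, 0.5]
for t in (0.1, 1.0, 3.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 nearly all the probability sits on the top token. At 3.0 the top token keeps less than half, which is where the strange punctuation and fused words come from.
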


Original: 
```
I have a small box I got from India in which I keep little notes and cards and tokens from people I love. I've moved a lot, but this box stays with me wherever I go. It's the one thing I'd grab if there were a fire.
```

Generated: 
```
I have a small box I got from India 〙 to a long distance, it was the only way to look. But my nose was blown open by the sun.”“I have a small box, I have a small box, I have a small box, and I have a small box, I have a small box, and I have a small box. I have a small box, and I have a small box.”“Where is my bag?
```

---
# And now something more critical
Alright, that was informative. The prompts are curious, but I'm not sure the prompts and the model really work well together. 
There's something here in the model/corpus/original text about fantasy, escapism, and running away. What can we do with that? 

Taking my last homework attempt as an example (where I created a poem that forgot itself as I ran it), I now try to create a poem that runs further and further afield, or further and further from consistency. The texts generated by the neural net are already nonsensical. While some sense can be made of them, they certainly do not read as if a person wrote them. If anything, they read like stream-of-consciousness writing with little to no preconceived idea of consistency and no attempt to communicate something. So let's lean into that, rather than away from it.

In [54]:
prompt = "I ran away. I ran, and "
generated = []
temperatures = []
temp = 0.1
addTemp = 0.5

In [55]:
for x in range(5):
    generated.append(generator(prompt, temperature=temp, max_length=50)[0]['generated_text'])
    temperatures.append(temp)
    temp += addTemp

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [57]:
for text in generated: 
    print(text)
    print('\n\n')

I ran away. I ran, and icky, and I went.“I’ll tell you,” he said, “I’ll tell you,” and he was so glad that he was not so



I ran away. I ran, and ________.”“Oh, dear,” he said, “we have to go,� visas.”“No,” she said, “not to tell



I ran away. I ran, and    They had the light to tell you—thoughthe stars in the sky dimened—and they could be up with you if they were up,but you could not talk; your little mother had



I ran away. I ran, and ____________”I went at it wondering if he were dying. There is only one one who knows whom it is all his age; indeed one who was a girl, it became one. Wendy liked



I ran away. I ran, and ~~ he was dead because John felt him, I found a home near Peterhouse on the river to put on one bedand let them fly where ~~ I came last winter to see him and see my





That's kind of lovely. It gets a bit dark, with the dying and the dead, and I love that. It's hard to read, though, and I want to add some breaks into it (and curious spacing). Let's split each text into words and insert line breaks and runs of spaces at random positions using Python's `random` module.

In [58]:
import random 

In [106]:
to_be_inserted = ["\n", "          ", "\n\n"]
generated_02 = generated.copy()
generated_02_spaced = []

In [107]:
for text in generated_02: 
    words = text.split()
    maxIndex = len(words)
    chosenIndexes = random.sample(range(0, maxIndex), 5)
    print(chosenIndexes)
    for i in chosenIndexes: 
        words.insert(i, random.choice(to_be_inserted))
    generated_02_spaced.append(" ".join(words))

[2, 10, 8, 3, 24]
[14, 13, 1, 6, 11]
[13, 17, 35, 8, 18]
[38, 15, 10, 19, 35]
[7, 29, 41, 9, 31]
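
A note on the insertion loop above. The sampled indexes are unsorted and inserted one at a time, so each insertion shifts every later word by one position. Also, `random.sample(range(0, maxIndex), 5)` raises a `ValueError` for any text shorter than five words. Neither matters much for this poem. If you want breaks at exactly the sampled positions, and a layout you can reproduce, a variant might look like this (`insert_breaks` is a hypothetical helper):

```python
import random

def insert_breaks(text, breaks, k=5, rng=random):
    """Insert k randomly chosen break strings at k distinct word positions."""
    words = text.split()
    k = min(k, len(words))  # random.sample raises ValueError if k > len(words)
    positions = sorted(rng.sample(range(len(words)), k), reverse=True)
    for i in positions:  # right to left, so earlier positions are not shifted
        words.insert(i, rng.choice(breaks))
    return " ".join(words)

to_be_inserted = ["\n", "          ", "\n\n"]
rng = random.Random(0)  # a fixed seed makes the layout reproducible
print(insert_breaks("I ran away. I ran, and he was so glad", to_be_inserted, rng=rng))
```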


In [108]:
for text in generated_02_spaced: 
    print(text)
    print("\n\n\n   -------------------------------------------- \n")

I ran            

 away. I ran, and icky, 
 and I 

 went.“I’ll tell you,” he said, “I’ll tell you,” and he was 

 so glad that he was not so



   -------------------------------------------- 

I 

 ran away. I ran,            and ________.”“Oh, dear,” he 
 said, “we have to            go,� 

 visas.”“No,” she said, “not to tell



   -------------------------------------------- 

I ran away. I ran, and They had 
 the light to tell you—thoughthe            stars in the 
 

 sky dimened—and they could be up with you if they were up,but you could not talk; your            little mother had



   -------------------------------------------- 

I ran away. I ran, and ____________”I went at it            wondering if he were dying. 

 There is 
 only one one who knows whom it is all his age; indeed one who was 
 a girl, it became one. Wendy 
 liked



   -------------------------------------------- 

I ran away. I ran, and ~~ 

 he 

 was dead because John felt him, I found a home near Pet

---
# Final Output 


```
I ran            

 away. I ran, and icky, 
 and I 

 went.“I’ll tell you,” he said, “I’ll tell you,” and he was 

 so glad that he was not so



   -------------------------------------------- 

I 

 ran away. I ran,            and ________.”“Oh, dear,” he 
 said, “we have to            go,� 

 visas.”“No,” she said, “not to tell



   -------------------------------------------- 

I ran away. I ran, and They had 
 the light to tell you—thoughthe            stars in the 
 

 sky dimened—and they could be up with you if they were up,but you could not talk; your            little mother had



   -------------------------------------------- 

I ran away. I ran, and ____________”I went at it            wondering if he were dying. 

 There is 
 only one one who knows whom it is all his age; indeed one who was 
 a girl, it became one. Wendy 
 liked



   -------------------------------------------- 

I ran away. I ran, and ~~ 

 he 

 was dead because John felt him, I found a home near Peterhouse on the river to put on one bedand                       let them fly where ~~ I came last winter to see 

 him and see my



   -------------------------------------------- 
```

I am in love with this iteration's output, if not the whole piece. It reminds me of a book called [Death in Her Hands by Ottessa Moshfegh](https://www.newyorker.com/books/page-turner/ottessa-moshfeghs-death-in-her-hands-is-a-new-kind-of-murder-mystery), which a lovely friend lent me last summer and which left me in a TIZZY. In that book, a woman is (perhaps) going more and more insane, or just dealing with more and more (let me not spoil it). Here, it feels like the series of sets is slowly coming to understand and process that someone has died, and that the author had to run away because of it.

Applying the word "author" to this feels almost like folly. I didn't write most of it; I left it in the hands of a language model. I want to lay my hands on this more deeply, but I'm not sure how yet. Perhaps I'd add more of my own words to the mix, or run it over and over and over, restructuring the poems on my own. I'm deeply intrigued by the process David Jhave Johnston took in [Rerites](http://glia.ca/rerites/), where the poet edited the generated output deeply and ritualistically every day. I may wish to perform something similar here.