# What
Generate new text from some starter text.

# Why
One application would be automatic generation of paragraphs or 
more from starter text.  
I suppose there are more applications.

# Background

Examples of generating text from some starter seed text abound.  
For example, you can try it yourself at Hugging Face
     https://huggingface.co/gpt2
     
Often the generated text seems reasonable.

I was curious what would get generated if the seed were for example
* famous book beginnings
* movie quotes
* jokes, 
and other oddball starter strings.

Note that I am not transfering the model to any of these  
domains and retraining, so it is probably unreasonable to expect much sensible.

In any case, what follows is the result of this experiment.

Kudos to the Hugging face folks for making it very easy to get   
the required software installed and also for giving some great examples.

We will need the transformers package

In [2]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
[K     |████████████████████████████████| 5.8 MB 4.9 MB/s 
[?25hCollecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
[K     |████████████████████████████████| 182 kB 75.0 MB/s 
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
[K     |████████████████████████████████| 7.6 MB 70.0 MB/s 
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.11.1 tokenizers-0.13.2 transformers-4.25.1


In [3]:
import re
from transformers import pipeline
generator = pipeline('text-generation', model = 'gpt2')
from transformers import GPT2TokenizerFast
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")


Downloading:   0%|          | 0.00/665 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

## Useful functions

In [6]:
def split_start_end(gen_strings, start):
    """ split the string into the original start and generated part """
    res = {"start": start}
    res["gen"] = []
    for gstring in gen_strings:
        temp = gstring["generated_text"]
        gen = re.sub(start, '', temp)
        res["gen"].append(gen)
    return res

def pp_results(start_ends):
    """ format the start and ends for nicer printing. """
    res = "\n\n"
    for se in start_ends:
        res += f"""SEED: {se["start"]}\n"""
        for i, gen in enumerate(se["gen"]):
            gen = re.sub("\\n", "", gen)
            res += f"""gen {i}: {gen}\n"""
        res += "\n---"
    return res

def gen_endings(generator, starts, max_new_tokens=20, num_sequences=4):
    """ Generate endings from some starter seed text """
    start_ends = []
    for start in starts:
        gen_strings = generator(start, max_new_tokens = max_new_tokens, num_return_sequences=num_sequences,
                            pad_token_id=tokenizer.eos_token_id)
        start_ends.append(split_start_end(gen_strings, start))
    pp = pp_results(start_ends)
    return pp
 

## Book beginnings

In [7]:
starts = ["Call me Ishmael", 
                "It was the best of times, it was the worst of times,",
                "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days",
                "In the beginning God created the heavens and the earth."
                 ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Call me Ishmael
gen 0: !I'll be ready to go once you arrive, but before I do, I tell
gen 1:  on Twitter
gen 2:  or I've become pregnant, but he's a friend. I tell him to be careful. You
gen 3: , who did not hear my prayers. If, instead, he did hear them, he would tell

---SEED: It was the best of times, it was the worst of times,
gen 0:  but I could make it."For the former Bulls teammate, it went so well that he
gen 1:  it was the best of nights," he said.But the real thrill of taking a nap
gen 2:  but I didn't know if I would be able to make sense of it."
gen 3:  and to be like that you needed to get up a bit early, because you didn't want your

---SEED: He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days
gen 0:  without finding anything.""Oh, I am surprised by his story," said Elizabeth. "
gen 1:  already, but he could hardly stand the thought of being in the wrong place at the wrong time.
gen 2:  without eating in six weeks and h

Of course, you get something different with every run,   
but I am somewhat impressed with the generated text for the Bible.
Also, at least one of the Hemmingway seed seems ok.

## Movie lines

In [8]:
starts = ["Frankly my dear, I don't give a damn",
          "May the Force be with you",
          "No Mr. Bond, I expect you to ",
          "Of all the gin joints in all the towns in all the world, ",
          "What we have here is a failure to "
              ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Frankly my dear, I don't give a damn
gen 0: , even though I'm actually an intelligent, hardworking, selfless human."He said
gen 1:  what you do. The only thing I give is more support and more patience for what is right and
gen 2:  about politics or your little little party."Forgive me for feeling uneasy at the thought of
gen 3:  if you have been at one of these movies for too long."

---SEED: May the Force be with you
gen 0: ! [Chaplin continues to grow older while singing, "We love all of you!" as
gen 1:  now,and we shall never let you live to see her again.It's
gen 2: ! Please pray for us.As if the moment had suddenly given in, a voice that
gen 3: , son of Horus!"Crazy Credits, for some of the things you will get.

---SEED: No Mr. Bond, I expect you to 
gen 0: " "If you must be so polite, have you been able to see?" "Not at all
gen 1: "No Mr. Bond, I expect you to.Mr. Bond: Why
gen 2: ""No Mr. Bond, take some time! Get on!" he said. "
gen 3: _________. I think that a good rea

## Yoda lines
Yes, they are movie lines as well

In [9]:
starts = ["Do or do not, there is no try",
            "Named must be your fear before banish it you can",
            "The greatest teacher, failure is",
            ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Do or do not, there is no try
gen 0: , you will break free and be a free person. So I say to you and everyone else,
gen 1: in' or doin' tryin' tryin'" and so on. Just as our sense
gen 2: , and no grace. Let the Father's commandments be no law, but the glory of Christ the
gen 3:  and get them to agree to what will come to light.The same goes for this point

---SEED: Named must be your fear before banish it you can
gen 0:  do much other than just sit around and listen to nothing.I like to find the right
gen 1:  call every player on the committee or tell them to shut up and play. In essence anyone with real
gen 2:  be sure to make sure to use the same name!Ladies and gentlemen and friends,
gen 3:  choose to use anything that works.Use any spell that needs changing to fix this.

---SEED: The greatest teacher, failure is
gen 0:  the biggest cause of self-pity." ("Harmony, the Child of Satan at Work
gen 1:  not the first one. It's an obligation. And sometimes we get it worse than you

## Yogi Berra quotes

In [10]:
starts = ["When you come to a fork in the road",
             "You can observe a lot by ",
             "It ain't over till it's",
             "No one goes there nowadays, it's",
             "If the world was perfect, ",
             "In theory there is no difference between theory and practice. In practice"]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: When you come to a fork in the road
gen 0:  where there are two lines, two trees, you have three options. Either you can go back and
gen 1:  just follow the bike. Sometimes you have to run for an hour. If you have to stay up
gen 2: , don't use a backhoe to cut it apart.How Often Do We Use the
gen 3: , do you really want that fork to be moving at all? It's a difficult topic, but

---SEED: You can observe a lot by 
gen 0: icky little ways. When talking about a given kind of a problem in the general population it always starts
gen 1:  a lens: The darkness of some light is what changes the colors of the image in the
gen 2:  treading through the images for more. Let's start with a low power background.
gen 3:  explanation. Here is a diagram showing one typical event:This one includes all of

---SEED: It ain't over till it's
gen 0:  over. You get me going.Ruth was in fact a bit disappointed that she couldn
gen 1:  over, I guess, in the end he's going to get paid."
gen 2:  over.' I fe

## Joke beginnings


In [11]:
starts = ["A man walks into a bar",
           "A grasshopper walks into a bar",
              ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: A man walks into a bar
gen 0:  as he awaits his appointment as the new manager at Loughborough. Source: Getty Images 2/
gen 1:  and sees his wife playing with the glass at the edge of the bar and pulls out a gun.
gen 2:  with a handgun and a pair of AK-47 assault rifles and a pistol as police break up a
gen 3:  in the city of Paris, France. The head of the French Secretariat and the French embassy in

---SEED: A grasshopper walks into a bar
gen 0: ber shop in Sydney, NSW on Monday.A tree-lined street of an old Victorian
gen 1:  and the waiter is seated next to them. He says, "I haven't heard any good news
gen 2:  where he's standing, not on the edge of the bar (his stance was obviously wrong) but
gen 3:  with a sign that reads "Hey, you go to the beach or something!" and a bar with

---


## Politics and History

In [12]:
starts = ["Four score and seven years ago",
          "When in the Course of human events, it becomes necessary for one people to"
          "Let us therefore brace ourselves to our duty and so bear ourselves that if the British Commonwealth and Empire lasts for a thousand years men will still say,"," "
          "I would say to the House, as I said to those who have joined this government: 'I have nothing to offer but "
         ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Four score and seven years ago
gen 0: , the NFL banned defensive lineman Chris Culliver. Today, the league is considering making every player ineligible
gen 1: , this was the top option.In the meantime? We'll see what they've done
gen 2:  and again. The players were ready to go forward, but when the start was announced in mid-
gen 3:  you would have called yourself a "snowflake", a "monster hunter". Now you are a

---SEED: When in the Course of human events, it becomes necessary for one people toLet us therefore brace ourselves to our duty and so bear ourselves that if the British Commonwealth and Empire lasts for a thousand years men will still say,
gen 0:  Let us go and kill all those who are here, for if the British Commonwealth is lasted ten times
gen 1:  'When we conquered Jerusalem in the War of Five Kings that followed the conquest of Rome' 'No
gen 2:  'There are no British colonies, no British states, no British government, it is too strong."
gen 3:  'the British Common

## Summary
I did not know what to expect, so I could not really be disappointed.

The gun references from "A man walks into a bar" are slightly troubling.

All in all, I would say the results are amusing, especially  
for the jokes, Yoda and  Yogi Berra ... and  maybe even the Politics and History.