# What
Generate new text from some starter text.

# Why
One application would be the automatic generation of one or more paragraphs
from starter text.
I suppose there are other applications as well.

# Background

Examples of generating text from some seed text abound.
For example, you can try it yourself at Hugging Face: https://huggingface.co/gpt2

Often the generated text seems reasonable.

I was curious what would get generated if the seed were, for example, one of these:
* famous book beginnings
* movie quotes
* jokes
* other oddball starter strings

Note that I am not transferring the model to any of these
domains and retraining it, so it is probably unreasonable to expect much that is sensible.

In any case, what follows is the result of this experiment.

Kudos to the Hugging Face folks for making it very easy to get
the required software installed and for providing some great examples.

We will need the transformers package.

In [1]:
!pip install transformers



In [2]:
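import re
from transformers import pipeline, GPT2TokenizerFast

# Text-generation pipeline backed by the pretrained GPT-2 model
generator = pipeline('text-generation', model='gpt2')
# GPT-2 tokenizer; used below only for its end-of-sequence token id
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")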


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


## Useful functions

In [3]:
def split_start_end(gen_strings, start):
    """ split the string into the original start and generated part """
    res = {"start": start, "gen": []}
    for gstring in gen_strings:
        temp = gstring["generated_text"]
        # Strip the seed as a literal prefix; re.sub(start, '', temp) would treat
        # characters such as '?', '*' or '(' in the seed as regex syntax.
        gen = temp[len(start):] if temp.startswith(start) else temp
        res["gen"].append(gen)
    return res

def pp_results(start_ends):
    """ format the start and ends for nicer printing. """
    res = "\n\n"
    for se in start_ends:
        res += f"""SEED: {se["start"]}\n"""
        for i, gen in enumerate(se["gen"]):
            gen = gen.replace("\n", "")  # drop newlines so each generation prints on one line
            res += f"""gen {i}: {gen}\n"""
        res += "\n---"
    return res

def gen_endings(generator, starts, max_new_tokens=20, num_sequences=4):
    """ Generate endings from some starter seed text """
    start_ends = []
    for start in starts:
        # Sample num_sequences continuations of at most max_new_tokens new tokens each.
        # GPT-2 has no padding token, so its end-of-sequence id is passed as pad_token_id.
        gen_strings = generator(start, max_new_tokens=max_new_tokens,
                                num_return_sequences=num_sequences,
                                pad_token_id=tokenizer.eos_token_id)
        start_ends.append(split_start_end(gen_strings, start))
    pp = pp_results(start_ends)
    return pp
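The seed has to be removed from each `generated_text` before printing. Passing the seed to `re.sub` as the pattern, as in `re.sub(start, '', temp)`, treats it as a regular expression, so a seed containing characters such as `?`, `*`, `+` or `(` may be stripped only partly, or may raise `re.error`. The sketch below compares that with a literal prefix check. `strip_seed_regex` and `strip_seed_prefix` are hypothetical helper names, and the sketch assumes the pipeline returns the seed verbatim at the start of `generated_text`, which is what the outputs here suggest.

```python
import re

def strip_seed_regex(text, seed):
    """Remove the seed by using it as a regex pattern (fragile)."""
    return re.sub(seed, "", text)

def strip_seed_prefix(text, seed):
    """Remove the seed only when it is a literal prefix of text."""
    return text[len(seed):] if text.startswith(seed) else text

seed = "Is this a joke?"
text = "Is this a joke? No, it is a test."

# '?' is a regex quantifier, so the pattern matches "Is this a joke"
# and leaves the question mark behind.
print(repr(strip_seed_regex(text, seed)))
# The literal prefix check removes exactly the seed.
print(repr(strip_seed_prefix(text, seed)))

# An unbalanced parenthesis in a seed is not even a valid pattern.
try:
    strip_seed_regex("(see below) more text", "(see")
except re.error as err:
    print("re.error:", err)
```

If a regex is preferred, `re.sub('^' + re.escape(start), '', temp)` should behave like the prefix check. The text-generation pipeline also accepts `return_full_text=False`, which returns only the newly generated text; it is worth checking the documentation for the installed transformers version.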
 

## Book beginnings

In [4]:
starts = ["Call me Ishmael",
          "It was the best of times, it was the worst of times,",
          "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days",
          "In the beginning God created the heavens and the earth.",
         ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Call me Ishmael
gen 0: !"A.G.: "Thank you for that!"B.B.: "
gen 1: " before his speech. He doesn't speak English. He doesn't know his words. He is
gen 2: . Have you ever seen him as he is? When? And what day of the week or day
gen 3: !" I say."Don't you feel a little uneasy?" she replies when she hears

---SEED: It was the best of times, it was the worst of times,
gen 0:  and I was in a better state of mind when I was there. I was a good student and
gen 1:  you know?" I say."You're going to talk about this because he is a
gen 2:  he's got to get out and make sure his teammates see him for what he is," Williams said
gen 3:  but still. I was there anyway."Oh fuck yeah, but it was good to

---SEED: He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days
gen 0:  without a boat and was the first, for as long as he was here any one would take notice
gen 1:  without one catch of it in four days (see "How the St. Louis Fishmen Catch an
gen 2:  

Of course, you get something different with every run,   
but I am somewhat impressed with the generated text for the Bible.
Also, at least one of the Hemingway continuations seems OK.
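The run-to-run variation presumably comes from sampling: rather than always picking the most likely next token, the pipeline draws one at random from the model's predicted distribution (as far as I know, GPT-2's configuration on the Hub turns sampling on for text generation, and transformers' default `top_k` of 50 limits each draw to the 50 most likely tokens). The toy sketch below illustrates top-k sampling in plain Python. The next-token probabilities are made up, and `top_k_sample` and `sample_many` are hypothetical helpers, not part of transformers. It also shows that fixing the random seed makes the draws repeatable.

```python
import random

# Hypothetical next-token probabilities after some seed (illustrative numbers only)
next_token_probs = {" back": 0.30, " later": 0.25, " maybe": 0.20,
                    " crazy": 0.15, " Ishmael": 0.05, "!": 0.05}

def top_k_sample(probs, k, rng):
    """Sample one token from the k most probable entries of probs."""
    # Keep the k highest-probability tokens and discard the rest.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [tok for tok, _ in top]
    weights = [p for _, p in top]
    # random.choices treats weights as relative, so no renormalization is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]

def sample_many(seed, n=5, k=3):
    """Draw n tokens using a random generator initialized from seed."""
    rng = random.Random(seed)
    return [top_k_sample(next_token_probs, k, rng) for _ in range(n)]

print(sample_many(seed=1))
print(sample_many(seed=1))  # same seed, same draws
print(sample_many(seed=2))  # different seed, usually different draws
```

In the notebook itself, calling `transformers.set_seed(42)` (any integer works) before `gen_endings` should have the same effect on the pipeline's output.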

## Movie lines

In [5]:
starts = ["Frankly my dear, I don't give a damn",
          "May the Force be with you",
          "No Mr. Bond, I expect you to ",
          "Of all the gin joints in all the towns in all the world, ",
          "What we have here is a failure to ",
         ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Frankly my dear, I don't give a damn
gen 0:  about the safety of your son, he shouldn't have. He could have died, for he had
gen 1:  how you're going to know the right thing about something.I know it was funny,
gen 2:  what the government thinks about you. That's all I'm really being told."Marl
gen 3: , I feel pretty guilty about all this, for I'm not a real person.My

---SEED: May the Force be with you
gen 0: , Jedi," said the Sith boy as he placed his head against the Dark Jedi's chest, "
gen 1:  forever…
gen 2: . If your mind does not obey their commands it will not live together."But, at
gen 3: ," the Captain pleaded with his daughter."But," the Admiral explained, pointing to the

---SEED: No Mr. Bond, I expect you to 
gen 0: _______ know me and what I am. It is for that one thing not so much because of your
gen 1: ************** make your choice. **************The next time you're in a party you
gen 2: 」「I'll take care of business with those men」「Oh, that
gen 3: iced it. I

## Yoda lines
Yes, these are movie lines as well.

In [6]:
starts = ["Do or do not, there is no try",
          "Named must be your fear before banish it you can",
          "The greatest teacher, failure is",
         ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Do or do not, there is no try
gen 0:  to be a professional," she said.If you or someone you love or trust is bullied
gen 1: . But there is no way I can escape. It is better I am myself than let my feelings
gen 2: out for my character... I have no interest in my character or my job at all, I just
gen 3: . I promise if you do try I will never let you down. I've never had ANY chance

---SEED: Named must be your fear before banish it you can
gen 0:  still run and claim the kill. Note that I personally believe the list is a better one than "
gen 1:  pick it off right nowThe above information should give you an idea about the level of your
gen 2: 't be so sure that it still works without some very complicated changes to what you control to avoid it
gen 3:  easily destroy it. You can have the first minion with that turn, gain one life for each you

---SEED: The greatest teacher, failure is
gen 0:  not so terrible."(Read on.)This essay was updated with comments from Andrew
gen 1:  no g
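Some generations above are glued together, as in "she said.If you". `pp_results` deletes newline characters instead of replacing them, so the text on either side of a line break runs together. A minimal sketch of an alternative follows; `flatten_gen` is a hypothetical helper, and `raw` is an illustrative string that assumes the model emitted a blank line after "she said." (the raw generation is not shown in the notebook).

```python
import re

def flatten_gen(gen):
    """Put a generation on one line without gluing words together."""
    # Each run of whitespace (newlines included) becomes a single space.
    # Leading whitespace is kept on purpose: a generation that starts without
    # a space, such as "ber shop", continues the seed's last word.
    return re.sub(r"\s+", " ", gen)

raw = ' to be a professional," she said.\n\nIf you or someone you love'
print(flatten_gen(raw))        # words stay separated: "...said. If you..."
print(raw.replace("\n", ""))   # newline deletion glues them: "...said.If you..."
```

Using it in place of the newline-removal line in `pp_results` would keep each generation on one line while preserving word boundaries.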

## Yogi Berra quotes

In [7]:
starts = ["When you come to a fork in the road",
          "You can observe a lot by ",
          "It ain't over till it's",
          "No one goes there nowadays, it's",
          "If the world was perfect, ",
          "In theory there is no difference between theory and practice. In practice",
         ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: When you come to a fork in the road
gen 0:  with the money that you need as a freelancer (like being part of a successful web app)
gen 1: , don't get carried away. You'll burn up in mud and it'll ruin your life.
gen 2: , I'm not planning on driving the car to the parking lot. And I'll just wait here
gen 3: , there is another path that starts from the beginning.We are all about to find out

---SEED: You can observe a lot by 
gen 0:   the color of each letter; there are 3 possible combinations. This means I had
gen 1:  taking photographs of the road. At the bottom of the slide you will find a picture of a
gen 2:  watching a lot.'' It is only the last word in the story.The next time you
gen 3:  seeing your own eyes, your own body, my own body, and seeing the world, see

---SEED: It ain't over till it's
gen 0:  over, because he's in his head.''He wasn't alone.For several
gen 1:  over!""But they're talking now!""What's the matter, you
gen 2:  over." (James P. Hall, No. 1 in The Wor

## Joke beginnings


In [8]:
starts = ["A man walks into a bar",
           "A grasshopper walks into a bar",
              ]
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: A man walks into a bar
gen 0:  that features an attractive woman. They are having a party and her boyfriend breaks up and he decides to
gen 1: ber shop on March 15, 2017 in Los Angeles. Eric L. Sanchez/Getty Images
gen 2:  with a friend who had had a fight with some people around him. They're all out on the
gen 3:  and shouts at a woman to "keep going on this shit." She does not reply, until he

---SEED: A grasshopper walks into a bar
gen 0:  in the small town of Bautista about 30 miles southwest of San Diego. "There were the
gen 1:  and begins to drink. He tries to drink but the barkeep is unable to hold it, and
gen 2:  and walks up to them. "Hey, guys!" he says. He points to a sign for
gen 3: bershop and buys a bottle of white wine.In this photo, a man can be

---


## Politics and History

In [10]:
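starts = ["Four score and seven years ago",
          # Every seed needs its own comma: adjacent string literals are silently
          # concatenated. The output below was produced without the comma after the
          # second seed, so it shows the second and third seeds merged into one, and
          # the last seed with an extra leading space from a stray " " literal.
          "When in the Course of human events, it becomes necessary for one people to",
          "Let us therefore brace ourselves to our duty and so bear ourselves that if the British Commonwealth and Empire lasts for a thousand years men will still say,",
          "I would say to the House, as I said to those who have joined this government: 'I have nothing to offer but ",
         ]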
res = gen_endings(generator, starts, max_new_tokens=20, num_sequences=4)
print(res)



SEED: Four score and seven years ago
gen 0: .What's next? Well, it's not clear, since the deal will take place
gen 1: , he saw his career take off while others waited for him to find his footing. "Now I
gen 2: , I'd already made my first career, so I was in great shape and I knew it was
gen 3: , the idea of using the GAS is pretty old and far from being a reality. We only

---SEED: When in the Course of human events, it becomes necessary for one people toLet us therefore brace ourselves to our duty and so bear ourselves that if the British Commonwealth and Empire lasts for a thousand years men will still say,
gen 0:  the same thing as now is happening and yet there will be no trouble in coming.I
gen 1:  'Oh it came over the British Commonwealth and now we are here, not for this long, but
gen 2:  "There have been no wars in the world".Let us strive in this matter.For it does
gen 3: "the people are now living on earth;"[22]and so

---SEED:  I would say to the House, as I said to those 
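The odd second seed in the output above ("...for one people toLet us therefore...") and the extra space before "I would say" come from Python's implicit concatenation of adjacent string literals: when two strings in a list have no comma between them, they become a single element, with no error raised. A short, self-contained illustration using shortened versions of the seeds:

```python
# Adjacent string literals are joined at compile time, so a missing comma
# silently merges two list elements instead of raising an error.
buggy = ["Four score and seven years ago",
         "for one people to"
         "Let us therefore brace ourselves",
         " "
         "I would say to the House"]

fixed = ["Four score and seven years ago",
         "for one people to",
         "Let us therefore brace ourselves",
         "I would say to the House"]

print(len(buggy), len(fixed))
print(buggy[1])
print(repr(buggy[2]))
```

The buggy list has three elements instead of four, which matches the three seeds printed above.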

## Summary
I did not know what to expect, so I could not really be disappointed.

The gun references from "A man walks into a bar" are slightly troubling.

All in all, I would say the results are amusing, especially
for the jokes, Yoda, and Yogi Berra... and maybe even Politics and History.