In [18]:
!wget -nc https://raw.githubusercontent.com/ryanmcdermott/trump-speeches/master/speeches.txt

'wget' is not recognized as an internal or external command,
operable program or batch file.


In [19]:
!pip install transformers



In [20]:
import os
import requests

# Define the URL and directory
url = 'https://raw.githubusercontent.com/ryanmcdermott/trump-speeches/master/speeches.txt'
directory = 'datastore'
filename = 'speeches.txt'
file_path = os.path.join(directory, filename)

# Ensure the directory exists
os.makedirs(directory, exist_ok=True)

# Download the content from the URL, failing loudly on HTTP errors
response = requests.get(url, timeout=30)
response.raise_for_status()

# Write the content to the file in the 'datastore' directory
with open(file_path, 'wb') as file:
    file.write(response.content)

print(f"Content written to {file_path}")

Content written to datastore\speeches.txt
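The `wget -nc` call in the first cell skips the download when the file already exists, whereas the `requests` cell fetches it on every run. A minimal sketch of the same no-clobber behaviour is shown here. The helper name `download_if_missing` is hypothetical, and it uses the standard library's `urllib.request` instead of `requests` so that it has no third-party dependency.

```python
import os
import urllib.request


def download_if_missing(url, file_path):
    """Download url to file_path unless the file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if os.path.exists(file_path):
        return False

    # Create the parent directory when the path has one
    directory = os.path.dirname(file_path)
    if directory:
        os.makedirs(directory, exist_ok=True)

    # Fetch the raw bytes and write them unchanged
    with urllib.request.urlopen(url, timeout=30) as response:
        data = response.read()
    with open(file_path, "wb") as file:
        file.write(data)
    return True
```

With the variables defined in the cell above, this could be called as `download_if_missing(url, file_path)`.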


In [21]:
from transformers import pipeline, set_seed

import textwrap
import numpy as np
import matplotlib.pyplot as plt

from pprint import pprint

In [22]:
# Read the input file (UTF-8 is set explicitly; the platform default may differ on Windows)
with open('datastore/speeches.txt', 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Filter out empty lines and lines containing only whitespace
cleaned_lines = [line for line in lines if line.strip()]

# Write the cleaned lines to the output file
with open('datastore/speeches_cleaned.txt', 'w', encoding='utf-8') as file:
    file.writelines(cleaned_lines)

In [23]:
# Reload the cleaned lines, stripping trailing whitespace; the file is closed on exit
with open('datastore/speeches_cleaned.txt', 'r', encoding='utf-8') as file:
    lines = [line.rstrip() for line in file]
lines = [line for line in lines if len(line) > 0]

In [26]:
gen = pipeline("text-generation", model="gpt2")

In [None]:
set_seed(1234)

In [45]:
lines[1]

"...Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it.  It's just not fair.  And I have to tell you I'm here, and very strongly here, because I have great respect for Steve King and have great respect likewise for Citizens United, David and everybody, and tremendous resect for the Tea Party.  Also, also the people of Iowa.  They have something in common.  Hard-working people.  They want to work, they want to make the country great.  I love the people of Iowa.  So that's the way it is.  Very simple."

In [46]:
gen(lines[1], max_length=150, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "...Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it.  It's just not fair.  And I have to tell you I'm here, and very strongly here, because I have great respect for Steve King and have great respect likewise for Citizens United, David and everybody, and tremendous resect for the Tea Party.  Also, also the people of Iowa.  They have something in common.  Hard-working people.  They want to work, they want to make the country great.  I love the people of Iowa.  So that's the way it is.  Very simple. All right on. So"},
 {'generated_text': "...Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it.  It's just not fair.  And I have to tell you I'm here, and very strongly here, because I have great respect for Steve King and have great respect likewise for Citizens United, David and everybody, and tremendous resect for the Tea Party.  Also, also the people of Iow
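
Note that `max_length` counts the prompt tokens as well as the generated ones, so a long prompt such as `lines[1]` leaves room for only a few new tokens, which is consistent with the very short continuation in the first result above. A minimal sketch of an alternative follows, assuming the `max_new_tokens` generation parameter, which sets the budget for generated tokens only. The helper name `continuations` is hypothetical; it accepts any callable with the same interface as the `gen` pipeline.

```python
def continuations(generator, prompt, new_tokens=60, n=3):
    """Return only the newly generated text for each of n samples."""
    outputs = generator(
        prompt,
        max_new_tokens=new_tokens,  # budget for generated tokens only
        num_return_sequences=n,
        do_sample=True,  # sampling, so the n outputs can differ
    )
    texts = [o["generated_text"] for o in outputs]
    # By default the result contains prompt + continuation; drop the prompt
    return [t[len(prompt):] if t.startswith(prompt) else t for t in texts]
```

In this notebook it could be called as `continuations(gen, lines[1])`.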

In [47]:
pprint(_)

[{'generated_text': "...Thank you so much.  That's so nice.  Isn't he a great "
                    "guy.  He doesn't get a fair press; he doesn't get it.  "
                    "It's just not fair.  And I have to tell you I'm here, and "
                    'very strongly here, because I have great respect for '
                    'Steve King and have great respect likewise for Citizens '
                    'United, David and everybody, and tremendous resect for '
                    'the Tea Party.  Also, also the people of Iowa.  They have '
                    'something in common.  Hard-working people.  They want to '
                    'work, they want to make the country great.  I love the '
                    "people of Iowa.  So that's the way it is.  Very simple. "
                    'All right on. So'},
 {'generated_text': "...Thank you so much.  That's so nice.  Isn't he a great "
                    "guy.  He doesn't get a fair press; he doesn't get it.  "
           

In [49]:
def wrap(x):
    return textwrap.fill(x, replace_whitespace=False, fix_sentence_endings=True)
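
Because `replace_whitespace=False` keeps embedded newlines, a prompt that contains `"\n"` can wrap unevenly: `textwrap.fill` counts the newline as an ordinary character, so a line may break early. A minimal sketch of one alternative is to wrap each newline-separated paragraph on its own. The name `wrap_paragraphs` is hypothetical and the width of 70 matches the `textwrap` default.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap each newline-separated paragraph independently."""
    paragraphs = text.split("\n")
    wrapped = [
        textwrap.fill(p, width=width, fix_sentence_endings=True)
        for p in paragraphs
    ]
    # Empty paragraphs come back as empty strings, so blank lines survive
    return "\n".join(wrapped)
```

It can be used wherever `wrap` is called, for example `print(wrap_paragraphs(out[0]['generated_text']))`.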

In [50]:
out = gen(lines[1], max_length=150, num_return_sequences=3)
print("\n".join(wrap(x['generated_text']) for x in out))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


...Thank you so much.  That's so nice.  Isn't he a great guy.  He
doesn't get a fair press; he doesn't get it.  It's just not fair.  And
I have to tell you I'm here, and very strongly here, because I have
great respect for Steve King and have great respect likewise for
Citizens United, David and everybody, and tremendous resect for the
Tea Party.  Also, also the people of Iowa.  They have something in
common.  Hard-working people.  They want to work, they want to make
the country great.  I love the people of Iowa.  So that's the way it
is.  Very simple.  And we have all
...Thank you so much.  That's so nice.  Isn't he a great guy.  He
doesn't get a fair press; he doesn't get it.  It's just not fair.  And
I have to tell you I'm here, and very strongly here, because I have
great respect for Steve King and have great respect likewise for
Citizens United, David and everybody, and tremendous resect for the
Tea Party.  Also, also the people of Iowa.  They have something in
common.  Hard-work

In [56]:
prev = "Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it."
out = gen(prev + "\n" + lines[2], max_length=250)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't
get a fair press; he doesn't get it.
With that said, our country is
really headed in the wrong direction with a president who is doing an
absolutely terrible job.  The world is collapsing around us, and many
of the problems we've caused.  Our president is either grossly
incompetent, a word that more and more people are using, and I think I
was the first to use it, or he has a completely different agenda than
you want to know about, which could be possible.  In any event,
Washington is broken, and our country is in serious trouble and total
disarray.  Very simple.  Politicians are all talk, no action.  They
are all talk and no action.  And it's constant; it never ends.
A great
way that I think we've done more to help end our deficit and end our
deficit in the final years of this new Republican Administration.  Not
just by having people raise taxes and lower spending, but by ending
the so-called "gold standard" of our

In [57]:
prev = "Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't get a fair press; he doesn't get it."
out = gen(prev + "\n" + lines[4], max_length=250)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Thank you so much.  That's so nice.  Isn't he a great guy.  He doesn't
get a fair press; he doesn't get it.
You look at Obamacare.  A total
catastrophe and by the way it really kicks in in '16 and it is going
to be a disaster.  People are closing up shops.  Doctors are quitting
the business.  I have a friend of mine who's a doctor, a very good
doctor, a very successful guy.  He said, I have more accountants than
I have patients.  And he needs because it is so complicated and so
terrible and he's never had that before and he's going to close up his
business.  And he was very successful guy.  But it's happening more
and more.  And more and more.  All these things are getting solved and
I'd like to be part of the solution.  And I want to get it done.  But
so many years ago, you wouldn't have known what this bill was trying
to do.  And you wouldn't have believed what sort of legislation it was
fighting.  It was trying to be something very complex and I'm glad I
don't see a great deal of th

In [None]:
# How can GPT-2 generate text from a given prompt? And how could a business analyst use it to draft text for a report?

In [59]:
prompt = "Neural networks with attention have been used with great success in natural language processing tasks such as machine translation, text summarization, and question answering. They have also been applied to image captioning and speech recognition. In this notebook, we will explore how to use the GPT-2 model for text generation."
out = gen(prompt, max_length=250)
print(wrap(out[0]['generated_text']))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Neural networks with attention have been used with great success in
natural language processing tasks such as machine translation, text
summarization, and question answering.  They have also been applied to
image captioning and speech recognition.  In this notebook, we will
explore how to use the GPT-2 model for text generation.  First, let
K1, H1, B, and L be an ensemble of words with high level (and low
level) complexity.  We will use K1 and H1 as the inputs to the model
and a vector corresponding to a given image.  We will also use K1 with
low-level similarity function, F1, and L for both input and output.
Finally, we will use K1, H1, B, and L to represent images in the
context of images with the following constraints:

A model will only
output a series of N sentences at the point on the vector, with the
same length length in both directions.  All text on a line at the C1
endpoints will have the same height and width in both directions.

The
text will be a string at least M, with th
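
For the report question raised above, one thing to keep in mind is that GPT-2 continues text and is not tuned to follow instructions, so the prompt works best when it looks like the beginning of the report itself. The sketch below shows one possible way to assemble such a prompt. The function `build_report_prompt`, its layout, and the sample title and facts are all hypothetical placeholders, not real data. Any generated text would still need review, since the sample above shows how quickly the model drifts into invented details.

```python
def build_report_prompt(title, facts, opening):
    """Format a title, known facts, and an opening phrase as the start of a report."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return f"{title}\n\nKey facts:\n{fact_lines}\n\nSummary:\n{opening}"


# Placeholder content for illustration only
report_prompt = build_report_prompt(
    title="Quarterly Sales Report",
    facts=["Revenue grew 8% quarter over quarter", "Churn fell to 3%"],
    opening="This quarter, the sales team",
)
print(report_prompt)
```

The resulting string could then be passed to the pipeline in the same way as the prompts above, for example `gen(report_prompt, max_length=250)`.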