# Trump Speech Generator (tweepy, gpt-2)

* This notebook explores two tools: (1) Tweepy, for downloading data from the Twitter API; and (2) GPT-2, for generating new text.

**Step 1 of 5:** We will begin by installing the Python library "[tweepy](https://www.tweepy.org/)" to interact with the Twitter API.

In [1]:
!pip install tweepy

Collecting tweepy
  Downloading https://files.pythonhosted.org/packages/36/1b/2bd38043d22ade352fc3d3902cf30ce0e2f4bf285be3b304a2782a767aec/tweepy-3.8.0-py2.py3-none-any.whl
Collecting requests-oauthlib>=0.7.0 (from tweepy)
  Downloading https://files.pythonhosted.org/packages/c2/e2/9fd03d55ffb70fe51f587f20bcf407a6927eb121de86928b34d162f0b1ac/requests_oauthlib-1.2.0-py2.py3-none-any.whl
Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->tweepy)
  Downloading https://files.pythonhosted.org/packages/05/57/ce2e7a8fa7c0afb54a0581b14a65b56e62b5759dbc98e80627142b8a3704/oauthlib-3.1.0-py2.py3-none-any.whl (147kB)
     |████████████████████████████████| 153kB 9.8MB/s 
Installing collected packages: oauthlib, requests-oauthlib, tweepy
Successfully installed oauthlib-3.1.0 requests-oauthlib-1.2.0 tweepy-3.8.0


**Step 2 of 5:** We will also use the "user secrets" functionality on Kaggle to manage our private API keys. You can manage your secrets from the "Add-ons" menu at the top of the [notebook editor](http://www.kaggle.com/kernels). To interact with the Twitter API you will need an ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, and CONSUMER_SECRET.

In [2]:
from kaggle_secrets import UserSecretsClient

# Create one client and reuse it for all four Twitter credentials.
secrets = UserSecretsClient()
ACCESS_TOKEN = secrets.get_secret('ACCESS_TOKEN')
ACCESS_SECRET = secrets.get_secret('ACCESS_SECRET')
CONSUMER_KEY = secrets.get_secret('CONSUMER_KEY')
CONSUMER_SECRET = secrets.get_secret('CONSUMER_SECRET')
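
The `kaggle_secrets` module is generally only available inside Kaggle, so the cell above will not run elsewhere. As a rough sketch, assuming the same four names are exported as environment variables, the credentials could be loaded as follows. The helper `load_credentials` is hypothetical and not part of Kaggle or tweepy.

```python
import os

# The four credential names used in this notebook.
CREDENTIAL_NAMES = ("ACCESS_TOKEN", "ACCESS_SECRET", "CONSUMER_KEY", "CONSUMER_SECRET")

def load_credentials(environ=None):
    """Hypothetical helper: read the Twitter credentials from a mapping.

    Defaults to os.environ. Raises KeyError listing every missing name so a
    misconfigured environment fails early rather than at the first API call.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_NAMES}
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without touching real keys.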

**Step 3 of 5:** We also need to clone the [GPT-2 repository](https://github.com/graykode/gpt-2-Pytorch), download the pretrained model weights, and install the repository's requirements.
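
The weights file fetched in this step is large (the `curl` output further down reports 522M), so re-running the cell repeats a sizeable download. A minimal sketch of a guard, assuming the file name used in the cell and an arbitrary 500 MiB size threshold chosen for illustration; `weights_present` is a hypothetical helper:

```python
import os

def weights_present(path="gpt2-pytorch_model.bin", min_bytes=500 * 1024 * 1024):
    """Hypothetical check: True if the weights file exists and looks complete.

    The size threshold is an assumption for illustration, not a value
    published by the model's authors.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

if weights_present():
    print("Weights already downloaded; the curl step can be skipped.")
else:
    print("Weights missing or incomplete; run the curl step.")
```

A size check only catches missing or partial files; it does not verify that the content is intact.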

In [3]:
import os
!git clone https://github.com/graykode/gpt-2-Pytorch.git
os.chdir('./gpt-2-Pytorch')
!curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
!pip install -r requirements.txt

Cloning into 'gpt-2-Pytorch'...
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 130 (delta 0), reused 0 (delta 0), pack-reused 129
Receiving objects: 100% (130/130), 2.39 MiB | 0 bytes/s, done.
Resolving deltas: 100% (48/48), done.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  522M  100  522M    0     0  69.9M      0  0:00:07  0:00:07 --:--:-- 69.2M
Collecting regex==2017.4.5 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/36/62/c0c0d762ffd4ffaf39f372eb8561b8d491a11ace5a7884610424a8b40f95/regex-2017.04.05.tar.gz (601kB)
     |████████████████████████████████| 604kB 3.5MB/s 
Building wheels for collected packages: regex
  Building wheel for regex (setup.py) ... done
  Created wheel for regex: filena

**Step 4 of 5:** Next I will create a dataframe of tweets from @realDonaldTrump.
* Twitter's [Developer Agreement and Developer Policy](https://developer.twitter.com/en/developer-terms/agreement-and-policy.html) are strict and include terms such as "If you provide Twitter Content to third parties, including downloadable datasets of Twitter Content or an API that returns Twitter Content, you will only distribute or allow download of Tweet IDs, Direct Message IDs, and/or User IDs." In an attempt to comply with this policy, I expose only recent tweets from @realDonaldTrump (which are already part of the official Presidential Records).

In [4]:
import tweepy
import pandas as pd

def connect_to_twitter_OAuth():
    """adapted from https://towardsdatascience.com/my-first-twitter-app-1115a327349e"""
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    api = tweepy.API(auth)
    return api

def extract_tweets(tweet_object):
    """Build a dataframe of timestamps and text from an iterable of tweets.

    adapted from https://towardsdatascience.com/my-first-twitter-app-1115a327349e
    """
    tweet_list = []
    for tweet in tweet_object:
        tweet_list.append({'created_at': tweet.created_at, 'text': tweet.text})
    df = pd.DataFrame(tweet_list, columns=['created_at', 'text'])
    return df

In [5]:
api = connect_to_twitter_OAuth()
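# Pass the handle by keyword; newer tweepy releases do not accept it positionally.
trump_tweets = api.user_timeline(screen_name='realdonaldtrump')
df = extract_tweets(trump_tweets)
# pandas >= 1.0 expects None to disable truncation; the -1 sentinel is deprecated.
pd.set_option('display.max_colwidth', None)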
df.tail(10)

index,created_at,text
9,2019-10-21 01:57:10,Congratulations Barbara! https://t.co/0cYrjL1YOj
10,2019-10-21 01:56:05,"RT @SecretaryRoss: Since the election of President Trump, the U.S. has added 6.4 million jobs, including 500,000 in the #manufacturing sect…"
11,2019-10-21 01:55:36,RT @SecretaryRoss: This action by the Commerce Department sends another clear message to the Cuban regime – that they must immediately ceas…
12,2019-10-21 01:54:14,RT @realDonaldTrump: It is ONLY about this! https://t.co/ZB5xPDKs4b
13,2019-10-21 01:52:11,RT @IngrahamAngle: Republicans in the Congress should be asked: why have you allowed the State Department to become an arm of the Democrat…
14,2019-10-21 01:48:35,"“To the Democrats, Impeachment is partisan politics dressed up as principle.” @SteveHiltonx @FoxNews"
15,2019-10-20 22:41:10,"So interesting that, when I announced Trump National Doral in Miami would be used for the hosting of the G-7, and t… https://t.co/tDDMw6CPOd"
16,2019-10-20 21:45:17,"....fiction to Congress and the American People? I demand his deposition. He is a fraud, just like the Russia Hoax… https://t.co/Hum7mHfNiV"
17,2019-10-20 21:45:17,....because their so-called story didn’t come even close to matching up with the exact transcript of the phone call… https://t.co/SUGMgGZeJ3
18,2019-10-20 21:45:17,"This Scam going on right now by the Democrats against the Republican Party, and me, was all about a perfect phone c… https://t.co/nXPfrl25ag"
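
Several rows above end in "…" followed by a link, which is how the Twitter API's default mode shortens longer tweets. Tweepy accepts `tweet_mode='extended'` on timeline requests, in which case each status carries a `full_text` attribute; for retweets the complete text sits on the nested `retweeted_status`. The sketch below shows one way to read it. `full_text_of` is a hypothetical helper, and the stand-in objects use placeholder strings rather than real tweets so the example runs offline.

```python
from types import SimpleNamespace

def full_text_of(tweet):
    """Hypothetical helper: return the untruncated text of a tweet when available."""
    # For retweets, the complete text lives on the nested original status.
    source = getattr(tweet, "retweeted_status", None) or tweet
    # `full_text` is only present when the timeline was requested with
    # tweet_mode='extended'; otherwise fall back to the possibly truncated `text`.
    return getattr(source, "full_text", None) or source.text

# Stand-in objects with placeholder strings (not real tweets).
truncated = SimpleNamespace(text="placeholder te…")
extended = SimpleNamespace(full_text="placeholder text in full")
print(full_text_of(truncated))
print(full_text_of(extended))

# With real credentials, the request would look like:
# tweets = api.user_timeline(screen_name='realdonaldtrump', tweet_mode='extended')
```

Using this helper in place of `tweet.text` inside `extract_tweets` would keep the rest of the notebook unchanged.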


**Step 5 of 5:** Finally, I will take a few of the tweets we retrieved and feed them into a [GPT-2](https://github.com/graykode/gpt-2-Pytorch) model to generate a full-blown speech.
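
The prompt in the next cell was assembled by hand from two of the tweets. As a rough sketch of how that cleanup might be automated, the hypothetical helpers below drop retweets, links, @mentions, curly quotes, and the "…" truncation marker before joining a few tweets into one prompt. They cannot restore words that the API cut off, so truncated tweets stay truncated.

```python
import re

def clean_tweet(text):
    """Hypothetical helper: strip links, mentions, and quoting debris from one tweet."""
    text = re.sub(r"https?://\S+", "", text)   # drop links
    text = re.sub(r"@\w+", "", text)           # drop @mentions
    for char in ("…", "“", "”"):               # drop truncation marker and curly quotes
        text = text.replace(char, "")
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text.lstrip(".").strip()            # drop leading "...." continuation dots

def build_prompt(texts, max_tweets=2):
    """Hypothetical helper: join the first few non-retweet tweets into a prompt."""
    cleaned = []
    for text in texts:
        if text.startswith("RT @"):
            continue  # skip retweets, which are other people's words
        candidate = clean_tweet(text)
        if candidate:
            cleaned.append(candidate)
        if len(cleaned) == max_tweets:
            break
    return " ".join(cleaned)
```

Calling `build_prompt(df['text'])` on the dataframe from Step 4 would give a string suitable for the `--text` argument.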

In [6]:
!python main.py --text "To the Democrats, Impeachment is partisan politics dressed up as principle.  This Scam going on right now by the Democrats against the Republican Party, and me, was all about a perfect phone call."

Namespace(batch_size=-1, length=-1, nsamples=1, quiet=False, temperature=0.7, text='To the Democrats, Impeachment is partisan politics dressed up as principle.  This Scam going on right now by the Democrats against the Republican Party, and me, was all about a perfect phone call.', top_k=40, unconditional=False)
To the Democrats, Impeachment is partisan politics dressed up as principle.  This Scam going on right now by the Democrats against the Republican Party, and me, was all about a perfect phone call.
100%|█████████████████████████████████████████| 512/512 [00:29<00:00, 17.42it/s]
  I will tell you, it was a perfect call.  It went on for about 10 minutes, and the Democrats have called me on a number of different issues, including a number of issues that I really don't like.  They call me on that and that, and that, all around being very, very respectful. Then I'm told the Democrats have called me on the issue of abortion, and I'm told that the Democrats have called me on the iss
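
The `Namespace` line in the output above reports `temperature=0.7` and `top_k=40`. The sketch below illustrates how these two settings commonly shape the choice of each next token in GPT-2 style generation: keep only the `top_k` highest-scoring candidates, divide their scores by the temperature, and sample from the resulting softmax. It is a plain-Python illustration on toy scores, not the code used by the repository's `main.py`, and `sample_top_k` is a hypothetical name.

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=40, rng=random):
    """Hypothetical helper: pick one index from `logits` with top-k sampling."""
    # Keep the indices of the top_k largest scores.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # A temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights, shifted by the maximum for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

# Toy scores for a five-token vocabulary (illustrative values only).
toy_logits = [0.1, 2.0, 0.5, 1.5, -1.0]
print(sample_top_k(toy_logits, temperature=0.7, top_k=2))  # always index 1 or 3
```

With `top_k=1` the choice is deterministic (the highest-scoring token), which is one way to see why larger values give more varied text.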

In [7]:
!rm -r /kaggle/working/*