### Articulate the main goal of your project

Create a neural network that takes input (via a tweet) and returns a tweet response, seeded from that input, in the style of Donald Trump.

### Outline your proposed methods and models

I intend to use a character-level recurrent neural network (RNN) for tweet generation.
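As a rough illustration of what "character-level" means here, the sketch below shows one way tweet text could be turned into (input window, next character) training pairs. The helper names, the `seq_len` value, and the sample string are illustrative assumptions, not part of the project code, and no particular RNN library is implied.

```python
# Sketch only: turning text into (input, target) pairs for a character-level model.
# `build_vocab`, `make_training_pairs` and `seq_len` are hypothetical names for illustration.

def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_training_pairs(text, char_to_id, seq_len=5):
    """Slide a window over the text: each window of `seq_len` characters is an
    input, and the single character that follows it is the prediction target."""
    ids = [char_to_id[ch] for ch in text]
    pairs = []
    for start in range(len(ids) - seq_len):
        window = ids[start:start + seq_len]
        target = ids[start + seq_len]
        pairs.append((window, target))
    return pairs


corpus = "make america great again"  # stand-in text, not taken from the archive
char_to_id, id_to_char = build_vocab(corpus)
pairs = make_training_pairs(corpus, char_to_id, seq_len=5)

first_window, first_target = pairs[0]
print("".join(id_to_char[i] for i in first_window), "->", id_to_char[first_target])
```

In a real model the integer ids would typically be one-hot encoded or embedded before being fed to the network, and `seq_len` would be much longer than 5.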

I am also looking into Markov chains and other RNN implementations.
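For comparison with the RNN, a character-level Markov chain can be sketched in a few lines. This is a minimal baseline under assumed settings: the `order` of 3, the function names, and the sample lines are all made up for illustration and are not from the tweet archive.

```python
import random
from collections import defaultdict


def build_chain(lines, order=3):
    """For every `order`-character context, record the characters that followed it."""
    chain = defaultdict(list)
    for line in lines:
        padded = line + "\n"  # newline marks the end of a tweet
        for i in range(len(padded) - order):
            chain[padded[i:i + order]].append(padded[i + order])
    return chain


def generate(chain, seed, order=3, max_len=140, rng=None):
    """Extend `seed` one character at a time by sampling from the chain."""
    rng = rng or random.Random()
    out = seed
    while len(out) < max_len:
        followers = chain.get(out[-order:])
        if not followers:      # context never seen in training text
            break
        nxt = rng.choice(followers)
        if nxt == "\n":        # sampled an end-of-tweet marker
            break
        out += nxt
    return out


# Made-up sample lines standing in for the cleaned tweets
sample_lines = ["We will win.", "We will build it.", "They will lose."]
chain = build_chain(sample_lines)
tweet = generate(chain, seed="We ", rng=random.Random(0))
print(tweet)
```

The seed plays the role of the incoming tweet text: generation starts from its last `order` characters, so the seed needs to be at least that long and to end in a context the chain has seen.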

### Define the risks & assumptions of your data

Assumptions:

Trump tweets only via 'Twitter for Android'; tweets from all other sources are written by staff.

The downloaded Twitter archive is complete (its author has tested it extensively, and the site is the premier archive of these tweets on the internet).

Risks:

Trump may have dictated some tweets to staff who might have posted using other devices; these will be missed with the current setup.

Trump used the old retweet style for several months, which embeds another user's tweet text within his own. This will skew the 'voice' of the generated tweets, but since Trump only retweets things he agrees with, I think these are still valid to use as input.
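If the share of old-style retweets ever needs to be measured, or those tweets set aside, something like the following could flag them. It is only a sketch: it assumes an old-style retweet starts with a double quote followed by `@username:`, which matches the examples visible in the archive output, and the function name is hypothetical.

```python
import re

# Assumed shape of an old-style retweet: "@user: quoted text ...
OLD_RT = re.compile(r'^"@(\w+):\s*(.*)$', re.DOTALL)


def split_old_retweet(text):
    """Return (username, quoted_text) for an old-style retweet, else None."""
    match = OLD_RT.match(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


quoted = split_old_retweet('"@alefx33: Nelson Mandela and @realDonaldTrump two world\'s leaders ')
plain = split_old_retweet('Have a happy, successful and healthy New Year!')
print(quoted)
print(plain)
```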

### Revise initial goals & success criteria, as needed

Twitter infrastructure:

Currently have 

### Perform & summarize the EDA of your data

In [2]:
import pandas as pd
import numpy as np
import re

In [40]:
list_of_dfs = []

In [55]:
# https://github.com/bpb27/trump-tweet-archive/tree/master/data/realdonaldtrump
# data gleaned from above on 4/5/17

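# NOTE: list_of_dfs is created in a separate cell; re-running this cell without
# resetting it appends every year again and duplicates rows.
for year in range(2009, 2018):
    df = pd.read_json('data/realdonaldtrump/%s.json' % year)
    list_of_dfs.append(df)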

In [56]:
df = pd.concat(list_of_dfs, axis=0)

In [57]:
df

| | created_at | favorite_count | id_str | in_reply_to_user_id_str | is_retweet | retweet_count | source | text |
|---|---|---|---|---|---|---|---|---|
| 0 | 2009-12-23 17:38:18 | | 6971079756 | | False | | Twitter Web Client | From Donald Trump: Wishing everyone a wonderfu... |
| 1 | 2009-12-03 19:39:09 | | 6312794445 | | False | | Twitter Web Client | Trump International Tower in Chicago ranked 6t... |
| 2 | 2009-11-26 19:55:38 | | 6090839867 | | False | | Twitter Web Client | Wishing you and yours a very Happy and Bountif... |
| 3 | 2009-11-16 21:06:10 | | 5775731054 | | False | | Twitter Web Client | Donald Trump Partners with TV1 on New Reality ... |
| 4 | 2009-11-02 14:57:56 | | 5364614040 | | False | | Twitter Web Client | --Work has begun, ahead of schedule, to build ... |
| 5 | 2009-10-27 15:31:48 | | 5203117820 | | False | | Twitter Web Client | --From Donald Trump: "Ivanka and Jared’s weddi... |
| 6 | 2009-10-22 13:57:04 | | 5069623974 | | False | | Twitter Web Client | Hear Donald Trump discuss big gov spending, ba... |
| 7 | 2009-10-14 14:13:17 | | 4862580190 | | False | | Twitter Web Client | Watch video of Ivanka Trump sharing business a... |
| 8 | 2009-10-05 14:37:38 | | 4629116949 | | False | | Twitter Web Client | - Read what Donald Trump has to say about daug... |
| 9 | 2009-09-29 15:28:23 | | 4472353826 | | False | | Twitter Web Client | "A lot of people have imagination, but can't e... |


In [58]:
# it has been theorized that tweets from 'Twitter for Android' are written by Trump himself and the rest by staff
df.source.value_counts()

Twitter for Android         29090
Twitter Web Client          24270
Twitter for iPhone           5246
TweetDeck                     966
TwitLonger Beta               810
Instagram                     266
Facebook                      210
Twitter for BlackBerry        194
Twitter Ads                   134
Mobile Web (M5)               112
Twitlonger                     46
Twitter for iPad               44
Vine - Make a Scene            20
Twitter QandA                  20
Periscope                      14
Neatly For BlackBerry 10       10
Media Studio                    4
Twitter for Websites            2
Twitter Mirror for iPad         2
Name: source, dtype: int64

In [59]:
df_android = df[df.source == 'Twitter for Android'].copy()

In [60]:
df_android

| | created_at | favorite_count | id_str | in_reply_to_user_id_str | is_retweet | retweet_count | source | text |
|---|---|---|---|---|---|---|---|---|
| 0 | 2013-12-31 22:21:51 | | 418145002255302656 | | False | | Twitter for Android | Have a happy, successful and healthy New Year! |
| 1 | 2013-12-31 22:19:28 | | 418144399252795392 | | False | | Twitter for Android | When it comes to money, finance and even life,... |
| 5 | 2013-12-31 11:48:59 | | 417985734168285184 | | False | | Twitter for Android | Spend your last day of 2013 contemplating the ... |
| 6 | 2013-12-31 00:44:02 | | 417818392826232832 | | False | | Twitter for Android | The con artists changed the name from GLOBAL W... |
| 7 | 2013-12-31 00:34:40 | | 417816035107299328 | | False | | Twitter for Android | What the hell is going on with GLOBAL WARMING.... |
| 17 | 2013-12-30 03:07:44 | | 417492171244449792 | | False | | Twitter for Android | "@alefx33: Nelson Mandela and @realDonaldTrump... |
| 18 | 2013-12-29 23:08:10 | | 417431882461347840 | | False | | Twitter for Android | Temperature at record lows in many parts of th... |
| 19 | 2013-12-28 22:51:14 | | 417065230381105152 | | False | | Twitter for Android | In the upcoming New Year we will focus like ne... |
| 20 | 2013-12-28 22:47:02 | | 417064175983403008 | | False | | Twitter for Android | We're coming up on the NEW YEAR-It is really i... |
| 21 | 2013-12-28 12:37:03 | | 416910668798111744 | | False | | Twitter for Android | The global warming scientists don't want to b... |


In [61]:
android_tweet_text = df_android.text
android_tweet_text

0         Have a happy, successful and healthy New Year!
1      When it comes to money, finance and even life,...
5      Spend your last day of 2013 contemplating the ...
6      The con artists changed the name from GLOBAL W...
7      What the hell is going on with GLOBAL WARMING....
17     "@alefx33: Nelson Mandela and @realDonaldTrump...
18     Temperature at record lows in many parts of th...
19     In the upcoming New Year we will focus like ne...
20     We're coming up on the NEW YEAR-It is really i...
21     The global warming  scientists don't want to b...
22     We should be focused on clean and beautiful ai...
23     The rescue icebreaker, trying to free the ship...
24     "@HumorInstitute: I could sleep tonight if you...
25     Will be working with contractors at Trump Nati...
26     "@ProudlySA: Thank you for your response - we ...
27     @ProudlySA  As a major fan of Nelson Mandela a...
30     It is really too bad that the scientists study...
35     O.K., Christmas is over,

In [62]:
import string  # needed for string.printable in the following cells

# scratch check: keep only printable ASCII characters
s = 'testing"" 1 3 : : @realdonald'

''.join(filter(lambda x: x in string.printable, s))

'testing"" 1 3 : : @realdonald'

In [63]:
# removing non-printable / non-ASCII characters that np.savetxt could not write
android_tweet_text = android_tweet_text.apply(lambda t: ''.join(ch for ch in t if ch in string.printable))

In [64]:
# removing URLs (in testing, they were not being output intelligently by the neural net)
android_tweet_text = android_tweet_text.apply(lambda x: re.sub(r'http\S+', '', x))

In [65]:
android_tweet_text.values[:30]

array(['Have a happy, successful and healthy New Year!',
       'When it comes to money, finance and even life, PROTECT THE DOWNSIDE AND THE UPSIDE WILL TAKE CARE OF ITSELF!',
       'Spend your last day of 2013 contemplating the moves you will make in 2014 to make it your best year ever!',
       'The con artists changed the name from GLOBAL WARMING to CLIMATE CHANGE when GLOBAL WARMING was no longer working and credibility was lost!',
       'What the hell is going on with GLOBAL WARMING. The planet is freezing, the ice is building and the G.W. scientists are stuck-a total con job',
       '"@alefx33: Nelson Mandela and @realDonaldTrump two world\'s leaders ',
       'Temperature at record lows in many parts of the country. 50 degrees below zero with wind chill in large area. Global warming folks iced in!',
       'In the upcoming New Year we will focus like never before - if we do that we will have complete and total VICTORY in all we do!',
       "We're coming up on the NEW YEAR-It

In [73]:
# remove Twitter usernames (@user); many generated tweets were just @'s because of real-life Trump's propensity to use the old-school RT method
android_tweet_text = android_tweet_text.apply(lambda x: re.sub(r'(?<=^|(?<=[^a-zA-Z0-9-\.]))@([A-Za-z0-9_]+)', '', x))

In [74]:
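# second pass: catch remaining usernames that follow a single extra character, e.g. '.@user'
# (the unescaped '.' in the pattern matches any character, which is removed along with the username)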
android_tweet_text = android_tweet_text.apply(lambda x: re.sub(r'(?<=^|(?<=[^a-zA-Z0-9-\.])).@([A-Za-z0-9_]+)', '', x))

In [75]:
# removing artifacts left over from old-style RTs (colons and double quotes)

exclude = (':', '"')

android_tweet_text = android_tweet_text.apply(lambda x: ''.join(ch for ch in x if ch not in exclude))
android_tweet_text.head(100)

0         Have a happy, successful and healthy New Year!
1      When it comes to money, finance and even life,...
5      Spend your last day of 2013 contemplating the ...
6      The con artists changed the name from GLOBAL W...
7      What the hell is going on with GLOBAL WARMING....
17              Nelson Mandela and  two world's leaders 
18     Temperature at record lows in many parts of th...
19     In the upcoming New Year we will focus like ne...
20     We're coming up on the NEW YEAR-It is really i...
21     The global warming  scientists don't want to b...
22     We should be focused on clean and beautiful ai...
23     The rescue icebreaker, trying to free the ship...
24      I could sleep tonight if you told me we will ...
25     Will be working with contractors at Trump Nati...
26      Thank you for your response - we live in a be...
27       As a major fan of Nelson Mandela and the peo...
30     It is really too bad that the scientists study...
35     O.K., Christmas is over,
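The cleaning cells above could also be collected into one function, which makes it easier to apply the same steps to an incoming seed tweet later. The sketch below reuses the notebook's regular expressions in the same order; the function and constant names are hypothetical, and the URL in the example is made up.

```python
import re
import string

PRINTABLE = set(string.printable)
URL = re.compile(r'http\S+')
# Same username patterns as in the cells above
MENTION = re.compile(r'(?<=^|(?<=[^a-zA-Z0-9-\.]))@([A-Za-z0-9_]+)')
MENTION_WITH_PREFIX = re.compile(r'(?<=^|(?<=[^a-zA-Z0-9-\.])).@([A-Za-z0-9_]+)')
RT_ARTIFACTS = (':', '"')


def clean_tweet(text):
    """Apply the notebook's cleaning steps to a single tweet, in the same order."""
    text = ''.join(ch for ch in text if ch in PRINTABLE)   # drop non-printable / non-ASCII
    text = URL.sub('', text)                               # drop URLs
    text = MENTION.sub('', text)                           # drop @usernames
    text = MENTION_WITH_PREFIX.sub('', text)               # second pass, e.g. '.@user'
    return ''.join(ch for ch in text if ch not in RT_ARTIFACTS)


# Example tweet text from the archive, with a made-up URL appended
raw = '"@alefx33: Nelson Mandela and @realDonaldTrump two world\'s leaders http://t.co/abc'
cleaned = clean_tweet(raw)
print(repr(cleaned))
```

With a pandas Series this would be used as `android_tweet_text.apply(clean_tweet)`.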

In [76]:
# save with line breaks by tweet
np.savetxt('trump_android_tweets.txt', android_tweet_text.values, fmt="%s")