In [2]:
import tweepy #https://github.com/tweepy/tweepy
import csv
import re
import sys
import os
%matplotlib inline

Set up API and such

In [3]:
#Twitter API credentials
consumer_key = "CONSUMER_KEY"
consumer_secret = "CONSUMER_SECRET"
access_key = "ACCESS_KEY"
access_secret = "MAKE_AND_MODEL_OF_FIRST_CAR"



OAUTH_KEYS = {'consumer_key':consumer_key, 'consumer_secret':consumer_secret,
 'access_token_key':access_key, 'access_token_secret':access_secret}
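auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
# The access token was defined above but never attached to the handler; attach it here.
auth.set_access_token(OAUTH_KEYS['access_token_key'], OAUTH_KEYS['access_token_secret'])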

# In order to manage the rate limiting, use these options below. 
# You will find later that rate limiting is, well, a limiting factor.

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

The goal here is to build a corpus of text from which the Markov model can estimate word-to-word transition probabilities. The steps look like this:

1. Collect the data
2. 'Clean' the data
3. Combine the data
4. Apply Markov model
5. Generate some example tweets!
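
To make step 4 a bit more concrete before diving in, here is a minimal from-scratch sketch of a word-level Markov chain. It is not what markovify does internally; the function names (`build_chain`, `generate`), the `state_size` parameter, and the toy corpus are all made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=1):
    """Map each tuple of `state_size` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, max_words=20, seed=None):
    """Walk the chain from a random starting state until a dead end or max_words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < max_words and state in chain:
        # Picking uniformly from the follower list reproduces the observed
        # frequencies, because repeated followers appear in the list repeatedly.
        output.append(rng.choice(chain[state]))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus, invented for illustration only.
toy_corpus = "patch your systems . log your systems . patch your logs ."
toy_chain = build_chain(toy_corpus)
print(generate(toy_chain, max_words=12, seed=0))
```

Each next word depends only on the current state, which is why a chain like this produces locally plausible but globally rambling text.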

First I want to collect tweets from all the 'friends' of the account:

In [5]:

THECORPUSOFMYMIND = []
thought_leaders = []
user = 'infosec_truths'
print("importing thought leadership from friends of {}".format(user))
for friend in api.friends_ids(user):
    sn = api.get_user(friend).screen_name
    thought_leaders.append(sn)

    

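for thought_leader in thought_leaders:
    print("importing thought leadership from {}".format(thought_leader))
    # The API returns at most 200 tweets per request, however large `count` is.
    tweets = api.user_timeline(screen_name=thought_leader, count=200)
    for tweet in tweets:
        # `tweet.retweeted` only says whether *we* retweeted it;
        # actual retweets carry a `retweeted_status` attribute instead.
        if not hasattr(tweet, 'retweeted_status'):
            THECORPUSOFMYMIND.append(tweet.text)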

importing thought leadership from friends of infosec_truths
importing thought leadership from derekcoulson
importing thought leadership from ByrneGh
importing thought leadership from BartInglot
importing thought leadership from 3dRailForensics
importing thought leadership from TunnelsUp
importing thought leadership from spresec
importing thought leadership from mttcrns
importing thought leadership from BarryV
importing thought leadership from ISecPlayasClub
importing thought leadership from MikeOppenheim
importing thought leadership from jepayneMSFT
importing thought leadership from gentilkiwi
importing thought leadership from harmj0y
importing thought leadership from markrussinovich
importing thought leadership from vysecurity
importing thought leadership from PyroTek3
importing thought leadership from jaredhaight
importing thought leadership from robknake
importing thought leadership from RobertMLee
importing thought leadership from cnoanalysis
importing thought leadership from JohnH

Next, I remove the non-thought-leading material: retweets, @-mentions, and anything with useful links in it. I want just the pure, text-only, stream-of-consciousness thought leadership that makes infosec Twitter great.

In [6]:
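# Drop tweets that contain an @-mention. Removing items from a list while
# iterating over it skips elements, hence the repeated passes. Some tweets
# with "@" can still survive; at_regex below strips the mentions from those.
# (A list comprehension would remove them all in a single pass.)
for i in range(3):
    for tweet in THECORPUSOFMYMIND:
        if "@" in tweet:
            THECORPUSOFMYMIND.remove(tweet)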
    
http_regex = r"http.+?(?=[ \"\']|$|\n)"
at_regex = r"@.+?(?= |$)"
RT_regex = r"^RT.+?(?=[^ ])"


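# Strip links, @-mentions and the leading "RT" from every tweet, in place.
# enumerate() gives the position directly instead of searching with .index().
for idx, tweet in enumerate(THECORPUSOFMYMIND):
    tweet = re.sub(http_regex, '', tweet)
    tweet = re.sub(at_regex, '', tweet)
    tweet = re.sub(RT_regex, '', tweet)
    THECORPUSOFMYMIND[idx] = tweet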

In [7]:
def clean_tweets(tweetlist):
    """Strip links, @-mentions and the leading 'RT' from each tweet, in place."""
    for idx, tweet in enumerate(tweetlist):
        tweet = re.sub(http_regex, '', tweet)
        tweet = re.sub(at_regex, '', tweet)
        tweet = re.sub(RT_regex, '', tweet)
        tweetlist[idx] = tweet
    return tweetlist
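
To see what the three regexes actually do, here is a small sketch that applies them to a few made-up tweets. The patterns are copied from the cell above; the helper name `clean_tweet` and the sample tweets (including `@example_user` and the example.com link) are hypothetical.

```python
import re

# Same patterns as in the cleaning cell above.
http_regex = r"http.+?(?=[ \"\']|$|\n)"
at_regex = r"@.+?(?= |$)"
RT_regex = r"^RT.+?(?=[^ ])"


def clean_tweet(tweet):
    """Return a copy of one tweet with links, @-mentions and a leading 'RT' removed."""
    for pattern in (http_regex, at_regex, RT_regex):
        tweet = re.sub(pattern, '', tweet)
    return tweet


# Hypothetical sample tweets, invented for illustration.
samples = [
    "RT @example_user: patch your systems https://example.com/advisory",
    "@example_user True, logs are a great thing.",
    "Outcomes are what matter.",
]
cleaned = [clean_tweet(t) for t in samples]
for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Note that the substitutions leave stray leading and trailing spaces behind, which is where the odd spacing in the corpus comes from. The patterns are also fairly greedy about what counts as a match: `RT_regex`, for example, would also eat the first three characters of a tweet that merely starts with the letters "RT".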

Take a sneak peek just to make sure we're getting what we want:

In [34]:
THECORPUSOFMYMIND[:10]

[' So many small security software companies use it...',
 '"Our product will fix the problem that is thought to have caused X beach"  is a misleading sales pitch, and also re… ',
 'You should not be doing anything sexy until you can tell me that AV is installed on each endpoint and have mechanis… ',
 'somewhere in China a forensic analyst just found out that he wasted days and nights analyzing old systems ',
 '  True, having the data to search through is a great thing.',
 '#braggingrights ',
 'Once a transient city, over the last few years people have fallen in love with it -  buying homes and investing in… ',
 ' Later they would find a way to get our military technology without putting lives at risk...',
 'I want to know the collective time wasted by a single milkshake order at Potbelly during the lunch rush.',
 'This is going to exacerbate my issues with session hoarding. ']

Then I join the corpus into a single document I can use to build the Markov model.

In [27]:
FINALCORPUS = " ".join(THECORPUSOFMYMIND)

In [28]:
# Strip line breaks (note: the words on either side of a newline end up joined with no space).
FINALCORPUS = FINALCORPUS.replace('\r', '').replace('\n', '')

In [29]:
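# Drop non-ASCII characters. Note that str() on a bytes object returns its repr,
# so the result carries a leading b' and backslash-escaped quotes (see the output
# below); FINALCORPUS.encode('ascii', errors='ignore').decode('ascii') avoids that.
ASCIICORPUS = str(FINALCORPUS.encode('ascii', errors='ignore'))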

In [36]:
ASCIICORPUS[:1000]

'b\' So many small security software companies use it... "Our product will fix the problem that is thought to have caused X beach"  is a misleading sales pitch, and also re  You should not be doing anything sexy until you can tell me that AV is installed on each endpoint and have mechanis  somewhere in China a forensic analyst just found out that he wasted days and nights analyzing old systems    True, having the data to search through is a great thing. #braggingrights  Once a transient city, over the last few years people have fallen in love with it -  buying homes and investing in   Later they would find a way to get our military technology without putting lives at risk... I want to know the collective time wasted by a single milkshake order at Potbelly during the lunch rush. This is going to exacerbate my issues with session hoarding.  And couldn\\\'t manage to steal a better metal than bronze  Couple is watching videos on phone without headphones. Flight attendant hands them free e
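
The leading `b\'` and the escaped quotes in that output come from calling `str()` on a bytes object. Here is a small sketch of the difference between that and decoding the bytes back to text; the sample string is made up for illustration.

```python
# Made-up sample containing both quote styles and a non-ASCII ellipsis.
sample = '"Our product" couldn\'t fix it\u2026 '

as_bytes = sample.encode('ascii', errors='ignore')  # silently drops the ellipsis

via_str = str(as_bytes)                 # repr of the bytes object, b'...' wrapper included
via_decode = as_bytes.decode('ascii')   # plain text, no wrapper and no extra escaping

print(via_str)
print(via_decode)
```

With the `str()` route, the wrapper and the backslashes become part of the corpus itself, so they can show up in generated sentences.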

In [31]:
# Write the corpus to disk (mode "w" overwrites any existing corpus.txt)
with open("corpus.txt", "w") as text_file:
    text_file.write(ASCIICORPUS)

For the Markov model, I use 'markovify', which I found via this @ChrisAlbon page: https://chrisalbon.com/python/other/generate_tweets_using_markov_chain/

Sidenote: if you don't use Chris Albon's pages to learn Python/data science, you should.

Now the fun part! Let's thought lead!

In [37]:
import markovify

# Get raw text as string.
with open("corpus.txt") as f:
    text = f.read()

# Build the model.
text_model = markovify.Text(text)

# Print five randomly-generated sentences

for i in range(5):
    print(text_model.make_short_sentence(280))
    print()

Trying to pull together a list of awesome people to follow on twitter, there are so many or bec This time with the FBI that they have standup desks!

Look at the sample yet, but here\'s a more complete list:0: Spiders Outcomes are what matter.

Nice detail about the challenges and their host &amp; network artifacts and behaviors!

If you missed my SANS webcast on key Sysinternals tools for data analysis are on LinkedIn, so I read body language very very well...

The Enterprise ATT&amp;CK site has been missing a page on the best security on the tips Tips for searching for rogue processes:It started with the countless sacrifices of others you better have a narrow aperture, it can lead you to believe something is incredibly rare.



## Next Steps:

As you can see, the Markov approach _kind of_ works... but it produces a lot of nonsense. That's partly because a Markov model picks each next word based only on the few words immediately before it (two, by markovify's default), not on the whole sentence or thought. To build a model more aware of the structure of language, other folks have used neural networks, perhaps the most hilarious/infamous being @deepdrumpf: https://twitter.com/deepdrumpf

News here: 

https://www.forbes.com/sites/janetwburns/2016/10/19/deepdrumpf-is-an-uncanny-twitterbot-thats-fundraising-for-girls-in-stem/#1f078fa649da

https://www.theguardian.com/technology/2016/mar/04/donald-trump-deep-drumpf-twitter-bot

This seems like the logical next step for infosec_truths. There are a bunch of tutorials online as well:

https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/

https://dzone.com/articles/generating-tweets-using-a-recurrent-neural-net-tor
