# Exploring Elon Musk Tweets

If the Internet is to be believed, Elon Musk is the smartest person in the universe. Musk is a South African-born, Silicon Valley-based technology CEO and innovator who juggles a portfolio of half a dozen innovative companies of (or involving) his own making: Tesla, SpaceX, SolarCity, OpenAI, and others. His greatest ambition at the moment? No less than to put man on Mars.

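```python
import pandas as pd

# Read with latin1 rather than the default UTF-8 encoding
tweets = pd.read_csv("../input/data_elonmusk.csv", encoding='latin1')
# Parse the Time column into datetimes and drop the redundant row ID column
tweets = tweets.assign(Time=pd.to_datetime(tweets.Time)).drop('row ID', axis='columns')
tweets.head(3)
```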

## Tweet patterns

Musk is a very heavy Twitter user. To start with, let's look at how his tweets are distributed in time.

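```python
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style("dark")

# Count tweets per calendar day (days with no tweets count as 0), then count how
# many days had each number of tweets. pd.TimeGrouper has been removed from
# pandas; resample('1D') does the same daily binning.
ax = (tweets
     .set_index('Time')
     .resample('1D')['Tweet']
     .count()
     .value_counts()
     .sort_values(ascending=False)
).plot.bar(figsize=(14, 7), fontsize=16, color='lightcoral')
ax.set_title('@elonmusk number of tweets per day', fontsize=20)
ax.set_xlabel('tweets on a given day', fontsize=16)
ax.set_ylabel('number of days', fontsize=16)
```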

Unlike a certain president, Musk actually spaces his tweets out quite a bit: he tweets on only about one day in three.
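To put a number on how spread out the tweets are, here is a minimal sketch that bins tweets by calendar day and reports the share of days with at least one tweet. It assumes a DataFrame shaped like `tweets` (a datetime `Time` column and a `Tweet` column); `share_of_active_days` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd

def share_of_active_days(df: pd.DataFrame) -> float:
    """Fraction of calendar days, first tweet to last, with at least one tweet."""
    # resample() creates a bin for every day in the range, so silent days
    # appear with a count of 0 instead of being skipped
    daily = df.set_index('Time').resample('1D')['Tweet'].count()
    return float((daily > 0).mean())

# Made-up example: tweets on Jan 1 and Jan 4 only, so 2 of the 4 days are active
demo = pd.DataFrame({
    'Time': pd.to_datetime(['2017-01-01 09:00', '2017-01-01 17:00',
                            '2017-01-04 12:00']),
    'Tweet': ['first', 'second', 'third'],
})
print(share_of_active_days(demo))  # 0.5
```

On the full data, `share_of_active_days(tweets)` gives the figure directly rather than reading it off the bar chart.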

```python
# Tweet volume by hour of day (0-23), in whatever time zone the Time column uses
(tweets.Time
     .dt
     .hour
     .value_counts()
     .sort_index()
).plot.bar(figsize=(14, 7), fontsize=16, color='lightcoral')
plt.gca().set_title('@elonmusk tweets per hour of day', fontsize=20)
```

Surprisingly (or perhaps not), Musk's tweets never quite go silent in the dead of night. His 4 AM tweet volume is nowhere near his midday volume, but it never drops to zero!
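The hourly bars can also be boiled down to a single figure, such as the share of tweets posted between midnight and 5 AM. Below is a minimal sketch with a hypothetical `share_in_hours` helper and made-up timestamps. Note that the hours are in whatever time zone the `Time` column was recorded in, which this kernel doesn't establish.

```python
import pandas as pd

def share_in_hours(times: pd.Series, start: int, end: int) -> float:
    """Fraction of timestamps whose hour of day falls in [start, end)."""
    hours = times.dt.hour
    # between() is inclusive on both ends by default, hence end - 1
    return float(hours.between(start, end - 1).mean())

# Made-up timestamps: two before 5 AM, two around midday
demo_times = pd.Series(pd.to_datetime(['2017-01-01 02:00', '2017-01-01 04:30',
                                       '2017-01-01 12:00', '2017-01-02 13:00']))
print(share_in_hours(demo_times, 0, 5))  # 0.5
```

The helper assumes `start < end`; a window that wraps past midnight (say 10 PM to 2 AM) would need two calls added together. On the real data: `share_in_hours(tweets.Time, 0, 5)`.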

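```python
# Daily tweet counts over the whole period; after reset_index the rows are
# numbered 0, 1, 2, ... by day
d = (tweets
     .set_index('Time')
     .resample('1D')['Tweet']
     .count()
     .reset_index()
    )
fig = plt.figure(figsize=(14, 7))
ax = plt.gca()
# Scatter daily counts against day number and overlay a linear fit.
# Recent seaborn versions require x and y as keyword arguments.
sns.regplot(x=d.index.values, y=d.Tweet.values, ax=ax, color='lightcoral')
ax.set_title('@elonmusk tweets per day over time', fontsize=20)
ax.set_xlabel('days since the first tweet in the dataset', fontsize=16)
```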

His tweet volume does seem to be going up over time.
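`sns.regplot` draws an ordinary least-squares line by default, so "going up" can be quantified as that line's slope. Here is a minimal sketch using `np.polyfit` on the same daily counts; `daily_trend_slope` is a hypothetical name and the demo data are made up.

```python
import numpy as np
import pandas as pd

def daily_trend_slope(df: pd.DataFrame) -> float:
    """Least-squares slope of tweets per day against day number."""
    daily = df.set_index('Time').resample('1D')['Tweet'].count()
    day_number = np.arange(len(daily))
    # polyfit with deg=1 returns [slope, intercept] for y = slope * x + intercept
    slope, _intercept = np.polyfit(day_number, daily.to_numpy(dtype=float), deg=1)
    return float(slope)

# Made-up example: 1, 2, 3, then 4 tweets on four consecutive days
times = [f'2017-01-0{day} 12:00' for day in range(1, 5) for _ in range(day)]
demo = pd.DataFrame({'Time': pd.to_datetime(times), 'Tweet': 'x'})
print(round(daily_trend_slope(demo), 6))  # 1.0
```

A positive slope on `tweets` means the fitted line gains that many tweets per day with each passing day; on its own it says nothing about whether the trend is statistically meaningful.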

## Tweet characteristics

Now let's look at some structural tweet characteristics.

```python
# Share of tweets that are retweets (True) vs. original tweets (False)
tweets['Retweet from'].notnull().value_counts() / len(tweets)
```

```python
# The 20 accounts Musk retweets most often
tweets['Retweet from'].value_counts().head(20).plot.bar(
    figsize=(14, 7), fontsize=16, color='lightcoral'
)
plt.gca().set_title('@elonmusk top retweet sources', fontsize=20)
plt.gca().set_xticklabels(plt.gca().get_xticklabels(), rotation=45, ha='right', fontsize=16)
pass
```

Only about one in six of Musk's tweets is a retweet. When he does retweet, he strongly favors his companies, NASA, and Wired magazine.
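The bar chart shows raw counts. To see each source's share of retweets instead, `value_counts(normalize=True)` works directly on the `Retweet from` column, because missing values (original tweets) are dropped before normalizing. A minimal sketch; `retweet_source_shares` is a hypothetical helper and the demo column is made up.

```python
import pandas as pd

def retweet_source_shares(sources: pd.Series) -> pd.Series:
    """Each source's share of all retweets, largest first."""
    # value_counts() drops missing values (original tweets) by default,
    # so normalize=True divides by the number of retweets, not all tweets
    return sources.value_counts(normalize=True)

# Made-up column: three retweets among six tweets
demo_sources = pd.Series(['SpaceX', None, 'Tesla', 'SpaceX', None, None])
print(retweet_source_shares(demo_sources))
```

On the real data: `retweet_source_shares(tweets['Retweet from']).head(20)`.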

```python
# Share of tweets containing an https:// link
tweets.Tweet.str.contains('https://').value_counts() / len(tweets)
```
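This check matches only `https://` links. Whether the dataset also contains plain `http://` links isn't verified here, but a slightly looser pattern covers both. A minimal sketch with a hypothetical `url_share` helper and made-up texts:

```python
import pandas as pd

def url_share(texts: pd.Series) -> float:
    """Fraction of texts containing an http:// or https:// link."""
    # the regex https?:// makes the "s" optional, so both schemes match
    return float(texts.str.contains(r'https?://', regex=True).mean())

demo_texts = pd.Series(['launch video https://example.com',
                        'old-style link http://example.org',
                        'no link here',
                        'nor here'])
print(url_share(demo_texts))  # 0.5
```

On the real data: `url_share(tweets.Tweet)`.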

About a quarter of Musk's tweets include a URL...

```python
# Share of tweets that mention at least one @handle
tweets.Tweet.str.contains('@').value_counts() / len(tweets)
```

...while about a third mention or reply to other users. How's that for engaging with a billionaire! Who gets this privilege?

```python
# Pull every @handle out of each tweet. \w+ stops at trailing punctuation such
# as ":" or ",", so "@SpaceX:" and "@SpaceX," both count as SpaceX.
# explode() turns the per-tweet lists into one row per handle; tweets with no
# handles become NaN and are dropped.
handles = tweets.Tweet.str.findall(r'@(\w+)').explode().dropna()

handles.value_counts().head(20).plot.bar(
    figsize=(14, 7), fontsize=16, color='lightcoral'
)
plt.gca().set_title('@elonmusk top user tags', fontsize=20)
plt.gca().set_xticklabels(plt.gca().get_xticklabels(), rotation=45, ha='right', fontsize=16)
pass
```

There are some interesting names here. John Carmack is a genius in his own right (now at Oculus, famous in the '90s as the principal programmer on DOOM). Frederic Lambert is the editor of a specialty electric vehicle publication, while Phil Plait (@BadAstronomer) is Internet famous. Neat!

## Things on the mind

Just for fun, let's see what we can dig up with a super-simple word tokenization of these tweets. What does Elon Musk think about various topics du jour? Let's hear it straight from the source!

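```python
from nltk import word_tokenize
# word_tokenize needs NLTK's Punkt tokenizer data: run nltk.download('punkt')
# once (recent NLTK releases may look for 'punkt_tab' instead)
tokens = tweets.Tweet.map(word_tokenize)

def what_does_elon_think_about(x):
    """Return the tweets containing x as a token, in lowercase or Title Case."""
    x_l = x.lower()
    x_t = x.title()
    return tweets.loc[tokens.map(lambda sent: x_l in sent or x_t in sent).values]
```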
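`what_does_elon_think_about` matches a term only in lowercase or Title Case, so an all-caps "TRUMP", or a camel-cased name like "SpaceX" (whose `.title()` is "Spacex"), would be missed. Lowercasing each token before comparing avoids that. A minimal sketch with a hypothetical `mentions_term` helper, using hand-made token lists in place of `word_tokenize` output:

```python
import pandas as pd

def mentions_term(token_lists: pd.Series, term: str) -> pd.Series:
    """Boolean mask: does each token list contain `term`, ignoring case?"""
    target = term.lower()
    # compare against a lowercased copy of each tweet's tokens
    return token_lists.map(lambda toks: target in {tok.lower() for tok in toks})

# Hand-made token lists standing in for word_tokenize output
demo_tokens = pd.Series([['TRUMP', 'said'], ['trump'], ['Trump'], ['Mars']])
print(mentions_term(demo_tokens, 'Trump').tolist())  # [True, True, True, False]
```

With the tokens above: `tweets.loc[mentions_term(tokens, 'SpaceX').values]`.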

```python
what_does_elon_think_about('Trump').Tweet.values.tolist()
```

```python
what_does_elon_think_about('oil').Tweet.values.tolist()
```

```python
what_does_elon_think_about('life').Tweet.values.tolist()
```

That's all, folks!

Hopefully this kernel gives you a good sense of what this dataset is like. To explore further, I highly recommend trying to throw some actual NLP techniques at this dataset. What do you get when you do TF-IDF, for example?
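TF-IDF weights a term by how often it appears in a tweet, discounted by how many tweets contain it, so words common to everything (like "the") score near zero. Below is a minimal from-scratch sketch using the plain tf × log(N/df) weighting; scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes rows by default, so its numbers will differ. The toy documents are made up.

```python
import math
from collections import Counter

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term of each tokenized document by tf * log(N / df)."""
    n_docs = len(docs)
    # document frequency: the number of documents each term appears in
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            # term frequency (share of the document's tokens) times inverse
            # document frequency (rarer across documents -> larger weight)
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = [['the', 'mars', 'rocket'],
        ['the', 'rocket', 'car'],
        ['the', 'car', 'solar']]
scores = tfidf(docs)
print(max(scores[0], key=scores[0].get))  # mars
print(scores[0]['the'])  # 0.0
```

On this dataset, something like `tfidf([[t.lower() for t in toks] for toks in tokens])` would give per-tweet scores to rank.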

And if you *really* want to have fun, try building a classifier to distinguish Musk tweets from Trump tweets. If I don't do it first!
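Naive Bayes over word counts is a common first baseline for this kind of authorship task. Below is a minimal from-scratch sketch of multinomial naive Bayes with add-one smoothing; the labels 'A' and 'B' and the training documents are invented (not real tweets), and actual Trump tweets would have to come from a separate dataset. For real work, scikit-learn's `CountVectorizer` plus `MultinomialNB` is the usual library route.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Train multinomial naive Bayes on (tokens, label) pairs; return a classifier."""
    class_counts = Counter(label for _, label in labeled_docs)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in labeled_docs:
        word_counts[label].update(tokens)
    vocab = {word for counts in word_counts.values() for word in counts}
    n_docs = len(labeled_docs)

    def classify(tokens):
        best_label, best_score = None, -math.inf
        for label, n_label in class_counts.items():
            total = sum(word_counts[label].values())
            # log prior plus log likelihood of each known word, with add-one
            # (Laplace) smoothing so an unseen word/label pair isn't zero
            score = math.log(n_label / n_docs)
            for word in tokens:
                if word in vocab:  # words never seen in training are skipped
                    score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return classify

# Invented toy documents for two hypothetical authors 'A' and 'B'
training = [(['rocket', 'launch', 'mars'], 'A'),
            (['rocket', 'orbit'], 'A'),
            (['tax', 'vote', 'rally'], 'B'),
            (['vote', 'campaign'], 'B')]
classify = train_naive_bayes(training)
print(classify(['mars', 'rocket']))  # A
print(classify(['vote']))            # B
```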