# <u>Exploring the effect that Elon Musk's tweets have on the stock price of Tesla</u>

Authors: Trinity Lee and Sidney Taylor

## Why does this matter?
As social media has become easier to access over the years, sites like Twitter have become increasingly popular with executives, who can voice their opinions in a more casual manner and interact with the fanbase they have built through their companies. The most prolific and active example of this at the moment is Elon Musk. He has amassed a Twitter following of over 130 million users, at least 10% of which interact with each of his tweets. This places him directly in the public eye and gives him an even bigger influence on society. As the CEO and co-founder of Tesla, everything he says and does has the potential to create a ripple effect within the EV giant. In particular, his activity on Twitter has been speculated to hurt the stock prices of his companies, as investors lose confidence in him.

Examining whether his tweets specifically caused these dips is worthwhile, as a clear link would lend weight to the claim that Musk intentionally tweets specific things to benefit his own financial position. This raises ethical and legal questions about the behavior and activity of CEOs on social media platforms. Some may argue that CEOs should be able to say whatever they want, while others will want to protect their investments by limiting CEOs' comments on partisan and other divisive issues.

In this essay, we will use public tweet data to attempt to come to a reasonable conclusion on the effect that Elon Musk's tweets actually have on the stock price of Tesla.

We hope to answer the question:
<br/>To what extent is the stock price of Tesla affected by the content of Elon Musk's tweets?

## Hypothesis
Based on a range of news stories and other studies into company optics, we can predict whether there is a direct correlation between decreases in Tesla's stock price and the content of Elon Musk's tweets around the same time. In the past, Musk has been penalized by regulators for tweets that cost his investors substantial losses. This suggests that there is at least a small correlation that both he and regulatory bodies such as the SEC are aware of; however, the exact correlation is harder to pinpoint. This is because any direct statement suggesting that Musk tweeted knowing he would affect the stock price could be treated as an allegation of market manipulation, which is a much more serious accusation and requires a much more thorough investigation. For the purpose of this study, we are only exploring whether the change took place due to the tweets, not whether it was intentional.

Given the previous and ongoing cases surrounding Elon Musk's tweets, and given that Tesla's real-time stock price is highly volatile for numerous reasons, it is reasonable to predict that the content of Elon Musk's tweets has a noticeable impact on the stock price of Tesla.

## Methodology
This exploration will be broken down into several stages. First we will explore the `snscrape` library and how it can be used to scrape tweet data. There are several reasons why direct API calls are risky to use with Twitter in particular, which need to be addressed for context. We will then use this library to scrape all of Elon Musk's tweets from 2018 onwards. The choice of 2018 as the starting year will be justified later in this essay, but the overall argument stems from the fact that out of Musk's 23000 tweets, 20000 of them are from 2018 and beyond. He was also not as vocal about Tesla on social media before that period, so earlier tweets would not provide useful data for our exploration.

After using the `snscrape` library, we will write some Python functions to filter this data in different ways. This will leave us with dataframes and processed csvs that we can use for the next stage.

We will then look at using the `stockplot` and `yfinance` libraries to scrape the stock prices of the relevant indices. In particular, indices relating to tech companies, top 100 companies, and EV companies will be useful.

We will then determine a threshold for what constitutes a significant drop in stock price, and generate a list of dates of interest.

After looking at the content of the tweets made on these days, we can then conduct small event studies on specific cases that we know had consequences for Elon Musk, to verify the validity of our method and parameters.
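
The threshold step described above can be sketched roughly as follows. This is only an illustration: the prices are invented, the -5% cut-off is an assumed value rather than the threshold chosen later, and `dates_of_interest` is a hypothetical helper name.

```python
from datetime import date

# Invented closing prices, purely for illustration (not real TSLA data).
closes = [
    (date(2021, 1, 4), 100.0),
    (date(2021, 1, 5), 101.0),
    (date(2021, 1, 6), 94.0),
    (date(2021, 1, 7), 95.0),
    (date(2021, 1, 8), 89.0),
]


def dates_of_interest(prices, threshold_pct=-5.0):
    """Return (date, pct_change) pairs whose day-over-day change is at or below threshold_pct."""
    flagged = []
    # Pair each day with the previous day to compute the percentage change.
    for (_, prev_close), (day, close) in zip(prices, prices[1:]):
        pct_change = (close - prev_close) / prev_close * 100
        if pct_change <= threshold_pct:
            flagged.append((day, round(pct_change, 2)))
    return flagged


drops = dates_of_interest(closes)
print(drops)
```

Each flagged date can then be used as the centre of a small window in which to look for tweets.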

## Making sense of snscrape

Run the following cells to make sure that you are using the latest versions of the functions and that all the required libraries are imported.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import pandas as pd
from twitscrape import *
import snscrape.modules.twitter as sntwitter

To adequately test the different parts of snscrape, you also need to have the library installed and updated to its latest version. This is because updates to the formatting of the Twitter site can happen at any time, and the library is constantly being updated to reflect these format changes.

You can install snscrape by running the following command in a conda environment:
<br/> `pip install snscrape`

Make sure it is updated by running the following command:
<br/> `pip install --upgrade snscrape`

## Why snscrape?
Over recent years, the Twitter API has become considerably more restrictive. Most features have been paywalled, and the ones that remain available are heavily rate limited so that the servers don't get overloaded with network traffic. While this is good in theory for stability, it is not ideal for people who want to scrape tweets from the site and comment on potential correlations. Using snscrape is a good way to harvest data without calling the API directly. The only issue is that there is no official documentation on how to use it from within Python, as it was intended to be used from a shell. The following sections detail the discovery process we went through to learn about the different modules that snscrape has, and show how to scrape tweets and other metadata from a profile using helper functions.

Note: This library was created by the user "JustAnotherArchivist" on GitHub, and the repository is linked [here](https://github.com/JustAnotherArchivist/snscrape).

## Looking at different modules

Some modules include:
* TwitterProfileScraper
* TwitterTweetScraper
* TwitterUserScraper
* TwitterTrendsScraper

For the purposes of our exploration, we want to look at TwitterUserScraper, as it doesn't include other people mentioning the user that we are searching for.

You can look at everything it returns for the most recent tweet in a profile, and from that we can grab all of the useful data we need and manipulate it however we want.

Below is a cell that grabs the data of the latest tweet from NAME's Twitter profile. Try it if you want to see all the info we can use.

In [None]:
NAME = "jack"  # paste the Twitter handle here without the '@'
user_scraper = sntwitter.TwitterUserScraper(NAME).get_items()
# The generator yields the most recent tweet first, so stop after one item.
for tweet in user_scraper:
    break
tweet

As you can see, various metadata is captured by the generator and stored in different fields. For example, `date` gives you the date and time that the tweet was posted, while `retweetCount` and `likeCount` give you the number of retweets and likes that the tweet has received. We can read these values individually by writing the loop variable, in this case `tweet`, followed by a dot and the field that we want to look at.

User-level fields sit on the nested `user` object: `tweet.user.displayname` returns the display name of the Twitter user, while `tweet.user.followersCount` gives the follower count recorded for that user.
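
To try the dot-access pattern without hitting the network, here is a small offline sketch. `FakeTweet`, `FakeUser` and `fake_scraper` are stand-ins invented for this example; they mimic only the handful of fields discussed above and are not snscrape's real classes. In this sketch the user-level fields live on a nested `user` object, which is our understanding of how snscrape arranges them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Stand-in classes with only the fields discussed above (assumed layout).
@dataclass
class FakeUser:
    displayname: str
    followersCount: int


@dataclass
class FakeTweet:
    date: datetime
    rawContent: str
    likeCount: int
    retweetCount: int
    user: FakeUser


def fake_scraper():
    """Yield invented tweets, most recent first."""
    user = FakeUser(displayname="Example User", followersCount=1200)
    yield FakeTweet(datetime(2023, 1, 2, 9, 30, tzinfo=timezone.utc), "second tweet", 15, 3, user)
    yield FakeTweet(datetime(2023, 1, 1, 18, 0, tzinfo=timezone.utc), "first tweet", 7, 1, user)


# next() pulls only the first item, which has the same effect as a for loop with an immediate break.
tweet = next(fake_scraper())
print(tweet.date, tweet.likeCount, tweet.retweetCount)
print(tweet.user.displayname, tweet.user.followersCount)
```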

## Pulling specific data from a tweet

In `twitscrape.py`, there is a function called `get_tweet` which returns the content of a tweet using the `rawContent` field.

Feel free to test grabbing the content of a tweet from any account using the code cell below.

If the tweet contains only an image, the output will be a shortened hyperlink to the tweet.

In [None]:
NAME = "jack"
tweet = get_tweet(NAME)
print(tweet)

You can read multiple fields from the scraper generator at once and print their values in a list to see the content of a tweet along with its key metadata.

The `get_tweet_data` function does this and returns the date, tweet, likes and retweets for the latest tweet on a specified profile.

In [None]:
NAME = "jack"
tweet_data = get_tweet_data(NAME)
print(tweet_data)

The date looks unusual in this output because it is stored as a datetime object, and printing a list shows the object's constructor-style representation rather than its readable form. If we were to print the extracted date separately, it would look normal. Datetime objects store each piece of information about the date and time as separate integers, which are converted for output based on a specified format. You can also extract individual parts of the object, such as the date only, when you want to compare it to a different date and time. This is an approach we can take to compare the day of a tweet to the day of a stock price. Use [this](https://www.listendata.com/2019/07/how-to-use-datetime-in-python.html#id-886a0d) link to learn more about datetime objects.

For this part of the exploration, it is best to convert the datetime object into a string, which can then be broken down into separate date and time parts. Later in the exploration we will work with datetime objects directly, but since this data will be exported to a csv and implicitly converted to a string anyway, working with strings is easier in this case. The `datetime` class has a method, `strftime`, which can do this, and below you can see it in use.
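
As a minimal sketch of the comparison mentioned above, the snippet below shows the two printed forms of a datetime and how `.date()` lets a tweet's timestamp be matched against a trading day. The timestamp and trading day are example values chosen for illustration.

```python
from datetime import date, datetime, timezone

# Example values for illustration only.
tweet_time = datetime(2020, 2, 14, 17, 56, 7, tzinfo=timezone.utc)
trading_day = date(2020, 2, 14)

print(tweet_time)    # readable form, as shown when the object is printed on its own
print([tweet_time])  # constructor-style form, as shown when it sits inside a list

# .date() drops the time and timezone, leaving a value comparable to a trading day.
same_day = tweet_time.date() == trading_day
print(same_day)
```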

(For simplicity, a `get_tweet_date` function was defined that gets the date of the latest tweet from a profile.)

In [None]:
NAME = "jack"
dates = get_tweet_date(NAME)
print(dates)
date = dates.strftime("%d-%m-%Y")
time = dates.strftime("%H:%M:%S")
date_type=type(date)
time_type=type(time)
print(f"The date of this tweet is {date} and it is a {date_type}.")
print(f"The time of this tweet is {time} and it is a {time_type}.")

## Scraping tweets within a certain timeframe.
Building on the previous functions, these elements can be combined to scrape a profile for a certain number of tweets. This can later be customized so that the number of tweets scraped is configurable, or so that only tweets from a certain month are kept. The function `get_thou_tweets` takes the Twitter handle as an input and stores the last 1000 tweets in a csv called `"NAME-1000.csv"` in the `raw-data` folder, where NAME is the display name of the Twitter profile. The data is stored as a pandas dataframe, with the columns representing the date, time, content, and view count of each tweet respectively. The dataframe is then returned for further analysis and plotting.
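
The idea of capping a scrape at a fixed number of tweets can be sketched offline as below. This is not the code in `twitscrape.py`: `fake_tweet_stream` and `take_rows` are hypothetical names, the rows are invented, and an in-memory buffer stands in for the file in `raw-data`.

```python
import csv
import io
from itertools import islice


def fake_tweet_stream():
    """Endless stand-in for a scraper generator; yields (date, time, content, views) rows."""
    n = 0
    while True:
        n += 1
        yield ("01-01-2023", "12:00:00", f"tweet number {n}", n * 10)


def take_rows(stream, limit):
    """Collect at most `limit` rows, stopping the generator early."""
    return list(islice(stream, limit))


rows = take_rows(fake_tweet_stream(), 1000)

# Write to an in-memory buffer here; a real run would open a file on disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["date", "time", "content", "views"])
writer.writerows(rows)

print(len(rows))
print(buffer.getvalue().splitlines()[1])
```

Using `islice` means the generator is never asked for more items than needed, which matters when every item costs a network request.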

In [None]:
NAME = "jack"

last_thou_tweets = get_thou_tweets(NAME)

Running the above cell meant iterating through the scraper 1000 times. While this specific function doesn't take too long, it can become very time consuming if tens of thousands of tweets are iterated through. This time is difficult to cut down because of how the scraper collects the data, so keeping the file size down is imperative.

One way we did this was to look at all tweets in a specific month, and perhaps use this data to determine tweet activity per month.
<br/>The function `get_tweets_in_month` scans the same 10000 tweet period, but this time only tweets within a specified month are saved to the csv. This data is viewable from the `raw-data` folder.

Note: This function differs from the previous ones in a few ways. The `statusesCount` (total tweets) value needs to be grabbed from within the `user` object, so that the loop can be exited if the loop index surpasses it. The position of the loop conditions changes, and the additional object call takes place at the start of the for loop.
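
A rough offline sketch of this kind of month filter is shown below. The tweets are invented, `tweets_in_month` is a hypothetical name, and `total_tweets` plays the role that `statusesCount` plays in the note above; the real function may be structured differently.

```python
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Invented (timestamp, text) pairs, newest first, standing in for scraped tweets.
sample = [
    (datetime(2023, 3, 2, 8, 0), "march tweet"),
    (datetime(2023, 2, 20, 14, 5), "late february tweet"),
    (datetime(2023, 2, 1, 9, 30), "early february tweet"),
    (datetime(2023, 1, 15, 11, 45), "january tweet"),
]


def tweets_in_month(tweets, month_abbr, total_tweets, limit=10000):
    """Keep tweets from the given month, stopping at `limit` or the account's total."""
    kept = []
    for index, (stamp, text) in enumerate(tweets):
        # Stop once the index passes either the scan limit or the account's tweet count.
        if index >= min(limit, total_tweets):
            break
        if MONTHS[stamp.month - 1] == month_abbr:
            kept.append((stamp, text))
    return kept


feb = tweets_in_month(sample, "Feb", total_tweets=len(sample))
print([text for _, text in feb])
```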

In [None]:
NAME = "jack"
MONTH = "Feb"
tweets = get_tweets_in_month(NAME,MONTH)

This function has to scrape through the entire tweet list of a user, which in some cases can be very large. This is unfortunately unavoidable. To prevent this time-heavy task from being repeated multiple times, we decided that the best approach would be to do the large initial data scrape with snscrape and save the output to a csv, and then programmatically filter it down by reading it into a variable and iterating through the list. This second stage is almost instant, no matter the size of the data, and means we can filter the raw data as many times as needed without a noticeable time penalty.

## Generating raw data

Now that we understand the different aspects of the scraping library and have determined different ways to filter the tweets of interest, we can generate the raw data.

Because the majority of Elon Musk's tweets were posted after 2017, and Tesla began to grow rapidly around 2018, it makes sense to only scrape the tweets from 2018 onwards. However, this is still a significant number of tweets (20000) and will take a long time to run. Since Elon Musk has 23000 tweets in total, for completeness we will scrape all of them.

The function `get_all_tweets` scrapes through a user's entire profile for all tweets, and stores the data in a csv. The output is in the `raw-data` folder.

Note: This __will__ take an extremely long time to run.

In [12]:
NAME = "elonmusk"
tweets = get_all_tweets(NAME)

## Manipulating the scraped tweets

For this section, run the cell below to import the tweet filtering functions.

In [14]:
from csv_process import *

With all this data stored in a csv, it can then be filtered down further based on specific dates, and we can choose to omit other data if needed.

The easiest way to start this is by writing all the data in the csv to a variable that can be manipulated.

The function `read_to_variable` takes the name of the scraped data we are looking for and returns a list of all the tweet data.

Note: The files are stored with the name format: `*name*-all-tweets.csv`. If you are looking for the file that has all of Elon Musk's tweets, the calling function would be:
`read_to_variable("elonmusk")`.

In [15]:
NAME = "elonmusk"
tweets = read_to_variable(NAME)
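
For readers who want to see the general shape of this step, here is a minimal sketch of reading a csv into a list of rows. It is not the code in `csv_process.py`: `read_rows` is a hypothetical name, the column layout is assumed, the second data row is invented, and an in-memory string stands in for the file in `raw-data`.

```python
import csv
import io

# In-memory stand-in for a scraped csv file; the column names are assumed.
raw_csv = io.StringIO(
    "date,content,views\n"
    "2020-02-14 17:56:07+00:00,Only the heart senses beauty,\n"
    "2020-02-13 08:00:00+00:00,@example_user thanks,\n"
)


def read_rows(handle, skip_header=True):
    """Return every csv row as a list of strings, optionally dropping the header."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)
    return [row for row in reader]


rows = read_rows(raw_csv)
print(rows[0])
```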

With all the data in rows, it can then be iterated through to only include tweets that meet specific criteria. That way there is a much smaller dataset to work with. For example, if you only want tweets tweeted on a certain date to be in the list, this can be done.

The function `show_tweets_on` takes a list of tweets and a target date as the input, and returns the tweets that were tweeted on that date.

This function also removes replies made by the account, as they branch off into separate conversations and are outside the scope of this exploration.

Note: The date should be in the format `mm-dd-yyyy`

In [20]:
DATE = "02-14-2020"
specific_tweets_on_date = show_tweets_on(tweets,DATE)
print(specific_tweets_on_date)

[['2020-02-14 17:56:07+00:00', 'Only the heart senses beauty', '']]
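
A filter of this kind could look roughly like the sketch below. `tweets_on` is a hypothetical name, two of the three rows are invented, and detecting replies by a leading `"@"` is an assumption made for this example.

```python
from datetime import datetime

rows = [
    ["2020-02-14 17:56:07+00:00", "Only the heart senses beauty", ""],
    ["2020-02-14 09:12:00+00:00", "@example_user agreed", ""],       # invented reply
    ["2020-02-13 21:40:00+00:00", "an invented earlier tweet", ""],  # invented tweet
]


def tweets_on(rows, target):
    """Return non-reply rows whose timestamp falls on `target` (mm-dd-yyyy)."""
    wanted = datetime.strptime(target, "%m-%d-%Y").date()
    matches = []
    for row in rows:
        stamp = datetime.fromisoformat(row[0])
        # Keep the row only if the day matches and the text is not a reply.
        if stamp.date() == wanted and not row[1].startswith("@"):
            matches.append(row)
    return matches


on_date = tweets_on(rows, "02-14-2020")
print(on_date)
```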


In some cases, the tweets we want to examine might not have been posted on the exact date that the stock price dropped. Because of this, it can be sensible to look at all tweets posted within a specific time period. The function `get_tweets_around` takes a list of tweets, a target date, and the number of days before and after that date to search.<br/>Note: The date should be in the format `mm-dd-yyyy`<br/>This function omits replies by checking whether the first character of the tweet is an `"@"` symbol.

In [21]:
DATE = "2021-12-12"
RANGE = 3
specific_tweets_around_date = get_tweets_around(tweets,DATE,RANGE)
print(specific_tweets_around_date)

[['2021-12-14 10:34:23+00:00', 'Tesla will make some merch buyable with Doge &amp; see how it goes', ''], ['2021-12-13 22:21:51+00:00', 'Will also be important for Mars', ''], ['2021-12-13 22:21:34+00:00', 'SpaceX is starting a program to take CO2 out of atmosphere &amp; turn it into rocket fuel. Please join if interested.', ''], ['2021-12-12 12:47:36+00:00', 'https://t.co/5LE1PjFwgS', ''], ['2021-12-12 12:29:34+00:00', 'Sine qua non non https://t.co/iTBlSwiX53', ''], ['2021-12-12 12:27:32+00:00', 'Sorry https://t.co/ppBPBAWxZ6', ''], ['2021-12-12 03:44:29+00:00', 'Just did a @HardcoreHistory episode with Dan Carlin. Hope you like it.', ''], ['2021-12-11 20:12:56+00:00', '“No better friend, no worse enemy” https://t.co/e2TeRBiFbg', ''], ['2021-12-10 17:52:37+00:00', 'Mars &amp; Cars', ''], ['2021-12-10 17:29:40+00:00', 'Wow, only three weeks to 2022! \nWhat will 2032 will be like? \nSeems so futuristic!\nWill we be on Mars?', ''], ['2021-12-10 17:19:13+00:00', 'Hahaha … ?1', ''], ['202
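
The windowing logic can be sketched as follows. `tweets_around` is a hypothetical name, the last two rows are invented, the window is assumed to include both end dates, and the target date is parsed as `yyyy-mm-dd` to match the value passed in the example cell.

```python
from datetime import datetime, timedelta

rows = [
    ["2021-12-14 10:34:23+00:00", "Tesla will make some merch buyable with Doge &amp; see how it goes", ""],
    ["2021-12-12 03:44:29+00:00", "Just did a @HardcoreHistory episode with Dan Carlin. Hope you like it.", ""],
    ["2021-12-10 17:52:37+00:00", "Mars &amp; Cars", ""],
    ["2021-12-05 12:00:00+00:00", "an invented tweet outside the window", ""],
    ["2021-12-11 08:00:00+00:00", "@example_user an invented reply", ""],
]


def tweets_around(rows, target, day_range):
    """Return non-reply rows dated within `day_range` days either side of `target`."""
    centre = datetime.strptime(target, "%Y-%m-%d").date()
    earliest = centre - timedelta(days=day_range)
    latest = centre + timedelta(days=day_range)
    return [
        row for row in rows
        if earliest <= datetime.fromisoformat(row[0]).date() <= latest
        and not row[1].startswith("@")
    ]


window = tweets_around(rows, "2021-12-12", 3)
print(len(window))
```

Note that only a leading `"@"` marks a reply here, so a tweet that mentions another account mid-sentence is kept.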

## Writing the data to a csv

Once we have all the data we need, we can write it into a pandas dataframe and store it as a csv. The dataframe is returned so that we can use the tweet content and dates in our event studies later.

The function `write_to_csv` takes processed data and the intended filename as inputs, and writes the data to a csv in the `processed-data` folder with that specific name. It also returns the dataframe.

In [22]:
FILENAME = f"{NAME}-tweets-around-{DATE}.csv"

tweets_df = write_to_csv(specific_tweets_around_date,FILENAME)
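
A minimal sketch of what a helper like this might do is given below, assuming pandas is installed. `rows_to_csv` is a hypothetical name, the column names are assumed, and a temporary directory stands in for the `processed-data` folder so the example leaves no files behind.

```python
import tempfile
from pathlib import Path

import pandas as pd


def rows_to_csv(rows, filename, folder):
    """Build a DataFrame from tweet rows, save it under `folder`, and return it."""
    # Column names are assumed for this example.
    frame = pd.DataFrame(rows, columns=["datetime", "content", "views"])
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    frame.to_csv(folder / filename, index=False)
    return frame


rows = [["2021-12-10 17:52:37+00:00", "Mars &amp; Cars", ""]]

# A temporary directory stands in for the processed-data folder.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "processed-data"
    df = rows_to_csv(rows, "example-tweets.csv", out_dir)
    saved = (out_dir / "example-tweets.csv").read_text()

print(df.shape)
```

Returning the dataframe as well as writing the file means later steps can keep working in memory without re-reading the csv.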