Presentation: Google Slides
Dashboard/Site: Dashboard
Our main communication took place in Slack. When a question, task, or troubleshooting issue needed a more detailed explanation, we set up a video conference over Skype. Each member worked on their own GitHub branch and in individual files, which avoided excessive overlap during merging. When files needed to be combined, the main contributor was in charge of merging them and the person with the squared role confirmed the merge.
Team members:
Meme stocks have gained popularity over the last two years and have generated both profits and losses for investors. The rise of social media fandom and its accompanying chatter has been blamed for sharp price swings in stocks such as GME, TSLA, AMC, and many others.
In this project we test whether we can predict if the price of a meme stock will increase or decrease based on the social media hype around it, that is, the online conversations that mention it. Specifically, we analyze Twitter data to measure how often Tesla stock is mentioned over a 7-day period and merge this data with the corresponding stock price on an hourly basis. As a baseline, we check whether we can produce the same results using a stock index, such as the S&P 500, in lieu of the Twitter data.
- Historical Stock Data: For the stock data we used the yfinance Python library. With it we were able to extract historical stock data, but we ran into one limitation: hourly data was limited to the previous 7 days.
  - Data Acquired (Hourly Stock Data): percent change in stock price and trading volume for the S&P 500 and TSLA.
- Twitter - TSLA Stock Mentions: For the Twitter TSLA data we used the twarc Python library, which collects data through the Twitter API. The Twitter API also limited us to hourly data for the previous 7 days.
  - Data Acquired (Hourly Tweet Count Data): Tweets that mentioned “#TSLA”, the common way to mention a stock ticker.
  - This analysis includes tweet data from the 7 days preceding the start of the project.
  - Data was extracted from the Twitter API, loaded into a DataFrame in a Jupyter notebook, and cleaned to include the date, hour, and tweet count for each time frame.
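The cleaning and hourly merge described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook code itself. The function names (`hourly_tweet_counts`, `percent_changes`, `merge_by_hour`), the timestamp layout, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout (ISO 8601, UTC); adjust to match the real export.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def hourly_tweet_counts(timestamps):
    """Count tweets per (date, hour) bucket."""
    counts = Counter()
    for raw in timestamps:
        created = datetime.strptime(raw, TIME_FORMAT)
        counts[(created.date().isoformat(), created.hour)] += 1
    return dict(counts)


def percent_changes(closes):
    """Hour-over-hour percent change; the first hour has no prior value."""
    if not closes:
        return []
    changes = [None]
    for previous, current in zip(closes, closes[1:]):
        changes.append((current - previous) * 100.0 / previous)
    return changes


def merge_by_hour(tweet_counts, stock_rows):
    """Attach a tweet count to each stock row, matching on (date, hour).

    Hours with no matching tweets get a count of 0.
    """
    merged = []
    for row in stock_rows:
        key = (row["date"], row["hour"])
        merged.append({**row, "tweet_count": tweet_counts.get(key, 0)})
    return merged


# Hypothetical sample data, for illustration only.
tweets = [
    "2021-06-01T14:05:10.000Z",
    "2021-06-01T14:48:31.000Z",
    "2021-06-01T15:02:00.000Z",
]
closes = [600.0, 606.0, 603.0]
hours = [13, 14, 15]

stock_rows = [
    {"date": "2021-06-01", "hour": hour, "pct_change": change}
    for hour, change in zip(hours, percent_changes(closes))
]
merged = merge_by_hour(hourly_tweet_counts(tweets), stock_rows)
for row in merged:
    print(row)
```

Keeping stock hours that have no tweets (with a count of 0) is a design choice for this sketch; dropping those hours instead would shrink an already small 7-day sample.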
- Will a logistic regression model show that the number of tweets mentioning #TSLA affects the price of Tesla stock?
- What has a bigger impact on the price of a meme stock such as TSLA: tweet mention counts or the S&P 500?
- This analysis was performed to investigate whether TSLA stock increases or decreases based on certain factors: tweet counts mentioning TSLA, SPY percent day change, and a combination of both factors.
- The following image shows the results of the logistic regression models for all three items mentioned above. All three models produced the same precision, recall, and F1 scores, so we were unable to draw any solid conclusions.
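For reference, the three scores reported for each model can be computed from a model's predictions as in the sketch below. The function name `classification_scores` and the sample labels are hypothetical, and this is not the code used in the project. Identical scores are what you would see if the three models produced identical predictions, for example by predicting the same class for every hour. Whether that happened here is not something the results above confirm.

```python
def classification_scores(actual, predicted, positive=1):
    """Return (precision, recall, f1) for the given positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)

    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = price went up that hour, 0 = price went down.
actual = [1, 0, 1, 1, 0, 1]
always_up = [1] * len(actual)  # a model that ignores its inputs

precision, recall, f1 = classification_scores(actual, always_up)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Comparing the raw predictions of the three models, or their confusion matrices, would show directly whether the identical scores come from identical outputs.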
Linear regression models were fit on the following parameters: Tesla tweet counts, Tesla percent day changes, and Tesla volume.
- Tesla percent day changes vs #TSLA Tweet counts
- Results: This model yielded an R-squared value of 0.0058, which indicates almost no correlation between tweet count and TSLA's percent day change. According to our outlier data, larger tweet counts coincided with somewhat more drastic changes, but they could not reliably indicate whether the Tesla stock would increase or decrease.
- Tesla percent day changes vs SPY percent day changes
- Results: The relationship between SPY percent day changes and TSLA's percent day changes yielded a higher correlation than the tweet count model above. However, at 0.1937, the model is still not strong enough to assess behavior or make future predictions with confidence.
- Tesla Volume vs Tesla percent day changes
- Results: This linear regression test showed that higher volume was linked to more drastic changes in Tesla stock (either increasing or decreasing). At lower volume levels, however, there was no clear correlation.
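To make the R-squared figures above easier to interpret, the sketch below fits a one-variable least-squares line and computes R-squared by hand. The function name `linear_fit` and the sample numbers are hypothetical and are not the project's data. R-squared is the share of the variance in the target that the fitted line explains, so a value of 0.0058 means the tweet count model explains well under 1% of the variance in TSLA's percent day change.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_total == 0:
        raise ValueError("x and y must each vary to fit a line")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)
    )
    r_squared = 1.0 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical values: x could be hourly tweet counts, y the percent change.
tweet_counts = [1.0, 2.0, 3.0, 4.0]
pct_changes = [1.0, 3.0, 2.0, 4.0]

slope, intercept, r_squared = linear_fit(tweet_counts, pct_changes)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```

Note that R-squared measures how well the line fits, not the direction of the move, which is why a model can pick up larger swings at high volume without saying whether the price rises or falls.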
What has the strongest correlation to TSLA (meme stock) price, SP500 or tweet post count?