### **Data Gathering, Cleaning and Preprocessing for Tesla and Apple Stocks**

#### This notebook gathers and preprocesses data from Reddit, GDELT news articles, and Yahoo Finance for Tesla (TSLA) and Apple (AAPL) from 2018 to 2024, preparing it for sentiment analysis and predictive modeling.

#### Installing required libraries

In [3]:
!pip install praw yfinance



#### Importing Required Libraries

In [4]:
import pandas as pd
import requests
from datetime import datetime, timedelta
import yfinance as yf
import praw
import time
import json

#### Ensure that you provide the correct file paths for posts.csv, stock_index.csv, and wallstreetbets_2022.csv datasets.

In [15]:
ids_fp = r"D:\ire\Sem_3\research_computing\stock_index.csv"
ids_df = pd.read_csv(ids_fp)

posts_fp= r"D:\ire\Sem_3\research_computing\posts.csv"
reddit_df =  pd.read_csv(posts_fp)

In [16]:
def get_filtered_reddit_data(ticker, reddit_df, ids_df):
    try:
        # Filter ids_df for the specified stock ticker
        ticker_ids_df = ids_df[ids_df['stock_symbol'].str.lower() == ticker.lower()]
        
        # Merge with reddit_df to get only matching records
        ticker_reddit_df = reddit_df.merge(ticker_ids_df, on='id', how='inner')
        
        # Select relevant columns
        # Select relevant columns; .copy() gives an independent DataFrame so the
        # column assignment below does not raise SettingWithCopyWarning
        filtered_stock_reddit_df = ticker_reddit_df[['title', 'created_utc_x', 'selftext', 'subreddit']].copy()
        
        # Convert 'created_utc_x' (Unix seconds) to datetime in a new 'created_utc' column
        filtered_stock_reddit_df['created_utc'] = pd.to_datetime(
            filtered_stock_reddit_df['created_utc_x'], unit='s', errors='coerce'
        )
        
        # Define date range for filtering
        start_date = pd.to_datetime('2018-01-01')
        end_date = pd.to_datetime('2023-01-01')
        
        # Filter by date range
        filtered_reddit_data = filtered_stock_reddit_df[
            (filtered_stock_reddit_df['created_utc'] >= start_date) &
            (filtered_stock_reddit_df['created_utc'] <= end_date)
        ]
        
        # Drop the raw timestamp column now that 'created_utc' holds the parsed value
        return filtered_reddit_data.drop(columns='created_utc_x')

    except Exception as e:
        print(f"Error occurred: {e}")
        return pd.DataFrame()  # Return an empty DataFrame in case of error

In [17]:
tsla_reddit_18_22 = get_filtered_reddit_data("tsla", reddit_df, ids_df)
aapl_reddit_18_22 = get_filtered_reddit_data("aapl", reddit_df, ids_df)


In [18]:
tsla_reddit_18_22

Unnamed: 0,title,selftext,subreddit,created_utc
0,Bi-weekly TSLA Investor Thread,This will post every other Monday (EST) at 6AM...,teslamotors,2018-01-01 11:14:30
1,Long $TSLA,,wallstreetbets,2018-01-01 18:44:21
2,Get ready to short $TSLA - they finally added ...,,wallstreetbets,2018-01-01 20:41:06
3,TSLA,[removed],stocks,2018-01-01 21:04:17
4,Quick 1/1/18 drive-by looks like Marina Del Re...,[deleted],teslamotors,2018-01-01 23:52:28
...,...,...,...,...
73267,$TSLA Awaiting Short Signal based off 9 signal...,,StockTradingIdeas,2022-12-31 21:13:49
73268,2022 Performance and Lessons,Doing this in post form so I can include scree...,u_SpiritBearBC,2022-12-31 21:45:57
73269,2022 Berry Bad Bear year in review,"Tough year for many, myself included. My tradi...",options,2022-12-31 22:53:13
73270,"LEADERBOARD: Sat, Dec 31, 2022: 06:16 PM EST",#TOP TRADERS \n ##Overall\nRanking | Name | ...,InsiderMemeTrading,2022-12-31 23:16:31


In [19]:
aapl_reddit_18_22

Unnamed: 0,title,selftext,subreddit,created_utc
0,Blowing versus sucking,AAPL just entered a contract to purchase 51 of...,wallstreetbets,2018-01-01 01:04:25
1,The 2018 /r/Robinhood Stock Picking Game,# tl;dr\n\n - Stock picking game will last all...,RobinHood,2018-01-01 16:37:41
2,Hesitant to invest in $AAPL,Looking at AAPLs fundamentals and the pile of ...,investing,2018-01-01 20:21:33
3,Stock mix help required (ETF),I’ve decided I’m most likely interested in jus...,stocks,2018-01-01 21:54:14
4,ETF advice please!,I’ve decided I’m most likely interested in jus...,personalfinance,2018-01-01 22:12:01
...,...,...,...,...
39301,$AAPL Awaiting Buy Signal based off 9 signals ...,,StockTradingIdeas,2022-12-31 21:03:43
39302,2022 Performance and Lessons,Doing this in post form so I can include scree...,u_SpiritBearBC,2022-12-31 21:45:57
39303,2022 Berry Bad Bear year in review,"Tough year for many, myself included. My tradi...",options,2022-12-31 22:53:13
39304,Today's most mentioned tickers,## The following are the top 10 most mentioned...,u_WSBTickercountBOT,2022-12-31 23:50:18
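
The `created_utc_x` column used in `get_filtered_reddit_data` is not present in either input file under that name. pandas adds the `_x` and `_y` suffixes during `merge` when both frames share a non-key column, which implies that `posts.csv` and `stock_index.csv` both carry a `created_utc` column. The sketch below reproduces this with two toy frames; the values are made up for illustration and are not taken from the real datasets.

```python
import pandas as pd

# Toy stand-ins for posts.csv and stock_index.csv (hypothetical rows)
posts = pd.DataFrame({
    "id": ["a1", "b2"],
    "title": ["Long $TSLA", "ETF advice please!"],
    "created_utc": [1514832261, 1514844721],
})
index = pd.DataFrame({
    "id": ["a1"],
    "stock_symbol": ["TSLA"],
    "created_utc": [1514832261],
})

# Both frames contain 'created_utc', so merge disambiguates with _x (left) and _y (right)
merged = posts.merge(index, on="id", how="inner")
print(list(merged.columns))

# Explicit suffixes make the origin of each column easier to read
merged_named = posts.merge(index, on="id", how="inner", suffixes=("_post", "_index"))
print(list(merged_named.columns))
```

Passing `suffixes=` is optional; it only changes the column names, not which rows are kept.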


In [35]:
wallstreetbets_fp = r"D:\ire\Sem_3\research_computing\wallstreetbets_2022.csv"
wallstreetbets_df = pd.read_csv(wallstreetbets_fp)



In [36]:
def get_filtered_reddit_23_24(df, keywords):
    # Convert timestamp column to datetime if it isn't already
    df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
    
    # Define the start date for filtering
    start_date = "2023-01-01"
    
    # Filter rows based on keywords in the body column and timestamp after the start date
    reddit_data = df[
        (df['body'].str.contains('|'.join(keywords), case=False, na=False)) &
        (df['timestamp'] >= start_date)]
    return reddit_data

In [37]:
# Define keywords for filtering
tsla_keywords = ["tesla stocks", "tsla", "tesla stock market", "tesla", "tesla finance", "tesla stock price", "tsla closing price"]
aapl_keywords = ["apple stocks", "aapl", "apple stock market", "apple", "apple finance", "apple stock price", "aapl closing price"]
tsla_reddit_23_24 = get_filtered_reddit_23_24(wallstreetbets_df, tsla_keywords)
aapl_reddit_23_24 = get_filtered_reddit_23_24(wallstreetbets_df, aapl_keywords)
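
The filter above joins the keywords with `|` and hands the result to `str.contains`, which treats it as a regular expression and matches substrings. Two consequences follow: "apple" also matches words such as "pineapple", and the longer phrases add nothing because each already contains "tesla", "tsla", "apple", or "aapl". A minimal sketch of a stricter pattern, using `re.escape` and word boundaries, is shown below. The helper name `build_keyword_pattern` is hypothetical and not part of the notebook.

```python
import re

def build_keyword_pattern(keywords):
    """Return a compiled, case-insensitive regex matching any keyword as a whole word."""
    # Escape each keyword so regex metacharacters such as "." or "+" match literally
    escaped = [re.escape(k) for k in keywords]
    # \b anchors the match at word boundaries, so "apple" no longer matches "pineapple"
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", flags=re.IGNORECASE)

aapl_pattern = build_keyword_pattern(["aapl", "apple"])

print(bool(aapl_pattern.search("Woah that Apple commercial was something special")))  # True
print(bool(aapl_pattern.search("pineapple on pizza")))  # False
```

To use it with the existing function, the pattern string can be passed on as `df['body'].str.contains(aapl_pattern.pattern, case=False, na=False)`. A stricter filter will return fewer rows than the outputs shown in this notebook.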

In [38]:
tsla_reddit_23_24

Unnamed: 0,title,score,id,url,comms_num,created,body,timestamp
290353,Comment,1,j2gkuq7,,0.0,1.672542e+09,"Bag holder at 33 and 38 (two puts assigned), b...",2023-01-01 03:02:31
290384,Comment,1,j2gkbft,,0.0,1.672542e+09,#Ban Bet Lost\n\n/u/_theshortbig made a bet th...,2023-01-01 02:57:44
290461,Comment,1,j2gjf6e,,0.0,1.672541e+09,You're right. I'm wrong. Investors definitely ...,2023-01-01 02:49:58
290471,Comment,1,j2gjb0w,,0.0,1.672541e+09,good points. I'm not saying that they should h...,2023-01-01 02:48:58
290509,Comment,4,j2giwpr,,0.0,1.672541e+09,Tesla 200 end of march?,2023-01-01 02:45:32
...,...,...,...,...,...,...,...,...
985277,Those who think removing the EV tax credit wil...,7133,1grqiw1,https://www.reddit.com/r/wallstreetbets/commen...,861.0,1.731653e+09,"1. Trump removes $7,500 EV tax credits and imp...",2024-11-15 06:49:43
985285,Coping with Loss in the Time of 🥭 (TSLA and LUNR),278,1grk58c,https://www.reddit.com/r/wallstreetbets/commen...,177.0,1.731632e+09,https://preview.redd.it/tk91sh5tpy0e1.jpg?widt...,2024-11-15 00:49:00
985297,"South Korea's ""Value Up"" program, and why Kore...",123,1grc49j,https://www.reddit.com/r/wallstreetbets/commen...,96.0,1.731610e+09,"Korean stocks have long been undervalued, refe...",2024-11-14 18:51:15
985777,Comment,2,lzsjz62,,0.0,1.733011e+09,Open the fucking markets my Tesla calls are re...,2024-11-30 23:58:49


In [41]:
aapl_reddit_23_24

Unnamed: 0,title,score,id,url,comms_num,created,body,timestamp
290425,Comment,1,j2gjs2h,,0.0,1.672542e+09,I made 100k on Amazon and lost it all. So I’m ...,2023-01-01 02:53:05
290439,Comment,1,j2gjn90,,0.0,1.672542e+09,Apple UIs are specifically designed for regard...,2023-01-01 02:51:55
290969,Comment,1,j2ge5co,,0.0,1.672539e+09,Consumers have a short memory. And Musk is fic...,2023-01-01 02:06:35
291084,Comment,2,j2gd4xp,,0.0,1.672538e+09,Woah that Apple commercial was something special,2023-01-01 01:58:27
291199,Comment,0,j2gc6ko,,0.0,1.672538e+09,People are always so impressed when i rip an a...,2023-01-01 01:50:49
...,...,...,...,...,...,...,...,...
985002,2 month rollercoaster: 2k>55k > -25k > 125k,84,1gwj01d,https://www.reddit.com/gallery/1gwj01d,21.0,1.732206e+09,After not trading for a 2 years to save up for...,2024-11-21 16:15:48
985062,$ACHR The Bull Run Hasn't Started Yet,2339,1gvrgv6,https://www.reddit.com/r/wallstreetbets/commen...,498.0,1.732115e+09,"**TLDR:** Current fair value is +$10imo, Arch...",2024-11-20 15:02:27
985173,Trump taps big tech critic Carr to lead US com...,794,1gu78hv,https://www.yahoo.com/news/trump-taps-big-tech...,141.0,1.731944e+09,"How do you think MSFT, META, GOOGL, and AAPL w...",2024-11-18 15:26:03
985321,SoundHound AI Stock Tumbles as Margins Drop,21,1gqnmrh,https://www.reddit.com/r/wallstreetbets/commen...,15.0,1.731532e+09,Was looking more into earnings and what could ...,2024-11-13 21:08:42


In [39]:
# Normalize columns in tsla_reddit_23_24
tsla_reddit_23_24 = tsla_reddit_23_24.rename(columns={'body': 'selftext', 'timestamp': 'created_utc'})

# Keep only the necessary columns
tsla_reddit_18_22 = tsla_reddit_18_22[['title', 'selftext', 'created_utc']]
tsla_reddit_23_24 = tsla_reddit_23_24[['title', 'selftext', 'created_utc']]

# Concatenate the two DataFrames
tsla_reddit_data_18_to_24 = pd.concat([tsla_reddit_18_22, tsla_reddit_23_24], ignore_index=True)

# Convert 'created_utc' to datetime format
tsla_reddit_data_18_to_24['created_utc'] = pd.to_datetime(tsla_reddit_data_18_to_24['created_utc'])

# Fill missing selftext with empty strings
tsla_reddit_data_18_to_24['selftext'] = tsla_reddit_data_18_to_24['selftext'].fillna("")

In [40]:
tsla_reddit_data_18_to_24

Unnamed: 0,title,selftext,created_utc
0,Bi-weekly TSLA Investor Thread,This will post every other Monday (EST) at 6AM...,2018-01-01 11:14:30
1,Long $TSLA,,2018-01-01 18:44:21
2,Get ready to short $TSLA - they finally added ...,,2018-01-01 20:41:06
3,TSLA,[removed],2018-01-01 21:04:17
4,Quick 1/1/18 drive-by looks like Marina Del Re...,[deleted],2018-01-01 23:52:28
...,...,...,...
83736,Those who think removing the EV tax credit wil...,"1. Trump removes $7,500 EV tax credits and imp...",2024-11-15 06:49:43
83737,Coping with Loss in the Time of 🥭 (TSLA and LUNR),https://preview.redd.it/tk91sh5tpy0e1.jpg?widt...,2024-11-15 00:49:00
83738,"South Korea's ""Value Up"" program, and why Kore...","Korean stocks have long been undervalued, refe...",2024-11-14 18:51:15
83739,Comment,Open the fucking markets my Tesla calls are re...,2024-11-30 23:58:49


In [42]:
# Normalize columns in aapl_reddit_23_24
aapl_reddit_23_24 = aapl_reddit_23_24.rename(columns={'body': 'selftext', 'timestamp': 'created_utc'})

# Keep only the necessary columns
aapl_reddit_18_22 = aapl_reddit_18_22[['title', 'selftext', 'created_utc']]
aapl_reddit_23_24 = aapl_reddit_23_24[['title', 'selftext', 'created_utc']]

# Concatenate the two DataFrames
aapl_reddit_data_18_to_24 = pd.concat([aapl_reddit_18_22, aapl_reddit_23_24], ignore_index=True)

# Convert 'created_utc' to datetime format
aapl_reddit_data_18_to_24['created_utc'] = pd.to_datetime(aapl_reddit_data_18_to_24['created_utc'])

# Fill missing selftext with empty strings
aapl_reddit_data_18_to_24['selftext'] = aapl_reddit_data_18_to_24['selftext'].fillna("")

In [43]:
aapl_reddit_data_18_to_24

Unnamed: 0,title,selftext,created_utc
0,Blowing versus sucking,AAPL just entered a contract to purchase 51 of...,2018-01-01 01:04:25
1,The 2018 /r/Robinhood Stock Picking Game,# tl;dr\n\n - Stock picking game will last all...,2018-01-01 16:37:41
2,Hesitant to invest in $AAPL,Looking at AAPLs fundamentals and the pile of ...,2018-01-01 20:21:33
3,Stock mix help required (ETF),I’ve decided I’m most likely interested in jus...,2018-01-01 21:54:14
4,ETF advice please!,I’ve decided I’m most likely interested in jus...,2018-01-01 22:12:01
...,...,...,...
44615,2 month rollercoaster: 2k>55k > -25k > 125k,After not trading for a 2 years to save up for...,2024-11-21 16:15:48
44616,$ACHR The Bull Run Hasn't Started Yet,"**TLDR:** Current fair value is +$10imo, Arch...",2024-11-20 15:02:27
44617,Trump taps big tech critic Carr to lead US com...,"How do you think MSFT, META, GOOGL, and AAPL w...",2024-11-18 15:26:03
44618,SoundHound AI Stock Tumbles as Margins Drop,Was looking more into earnings and what could ...,2024-11-13 21:08:42


In [31]:
# Base GDELT API URL
gdelt_api_url = "https://api.gdeltproject.org/api/v2/doc/doc"

# Function to fetch data from GDELT for a specific date range
def fetch_tsla_gdelt_data(start_date, end_date):
    
    params = {
        "query": "($Tsla OR Tesla OR Tesla stocks OR Tesla market OR Tesla shares OR Tesla trading OR Tesla finance OR Tesla investment OR Tesla Nasdaq OR Tesla performance)",
        "mode": "artlist",
        "maxrecords": "250",  # Max records per query
        "format": "json",
        "startdatetime": start_date.strftime('%Y%m%d%H%M%S'),
        "enddatetime": end_date.strftime('%Y%m%d%H%M%S'),
    }
    response = requests.get(gdelt_api_url, params=params)

    # Check if the response was successful
    if response.status_code == 200:
        try:
            data = response.json()
            articles = data.get('articles', [])
            return articles
        except ValueError as e:
            print(f"JSON decode error for {start_date} to {end_date}: {e}")
            return []
    else:
        print(f"Request failed for {start_date} to {end_date} with status code {response.status_code}")
        return []

# Date range to iterate over
start_date = datetime(2018, 1, 1)
end_date = datetime(2024, 12, 1)

# Initialize an empty list to store all the articles
all_articles = []

# Fetch data in batches of 30 days
batch_size = 30
current_start_date = start_date

while current_start_date < end_date:
    current_end_date = current_start_date + timedelta(days=batch_size)
    if current_end_date > end_date:
        current_end_date = end_date

    # Fetch articles for the current date range
    articles = fetch_tsla_gdelt_data(current_start_date, current_end_date)
    all_articles.extend(articles)

    # Move to the next date range
    current_start_date = current_end_date

# Convert the list of all articles into a DataFrame
tsla_news_data = pd.DataFrame(all_articles)
print(f"Fetching data from {start_date} to {end_date} is done.")

# Filter and keep only relevant columns if data is available
if not tsla_news_data.empty:
    tsla_news_data = tsla_news_data[['url', 'seendate', 'title', 'domain']]
else:
    print("No data fetched.")

Fetching data from 2018-01-01 00:00:00 to 2024-12-01 00:00:00 is done.
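
Each request above is capped at `maxrecords` = 250, so a 30-day window contributes at most 250 articles. The range from 2018-01-01 to 2024-12-01 splits into 85 such windows, for a ceiling of 85 × 250 = 21,250 rows. The TSLA result displayed further down has 21,249 rows, which suggests that almost every window hit the cap. One way to collect more articles is to shrink the window. The sketch below only generates the windows; the helper name `iter_windows` is an assumption, and the fetch call itself is unchanged.

```python
from datetime import datetime, timedelta

def iter_windows(start, end, days):
    """Yield (window_start, window_end) pairs covering [start, end) in steps of `days`."""
    step = timedelta(days=days)
    current = start
    while current < end:
        # Clamp the last window so it never runs past `end`
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end

windows_30 = list(iter_windows(datetime(2018, 1, 1), datetime(2024, 12, 1), days=30))
windows_7 = list(iter_windows(datetime(2018, 1, 1), datetime(2024, 12, 1), days=7))
print(len(windows_30), len(windows_7))  # 85 361
```

Consecutive windows share their boundary timestamp, so an article seen exactly at a boundary could be returned twice. Calling `drop_duplicates(subset='url')` on the final DataFrame would remove such repeats.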


In [32]:
# Function to fetch data from GDELT for a specific date range
def fetch_apple_gdelt_data(start_date, end_date):
    params = {
        "query": "($AAPL OR Apple OR Apple stocks OR Apple market OR Apple shares OR Apple trading OR Apple finance OR Apple investment OR Apple Nasdaq OR Apple performance)",
        "mode": "artlist",
        "maxrecords": "250",  # Max records per query
        "format": "json",
        "startdatetime": start_date.strftime('%Y%m%d%H%M%S'),
        "enddatetime": end_date.strftime('%Y%m%d%H%M%S'),
    }
    response = requests.get(gdelt_api_url, params=params)

    # Check if the response was successful
    if response.status_code == 200:
        try:
            data = response.json()
            articles = data.get('articles', [])
            return articles
        except ValueError as e:
            print(f"JSON decode error for {start_date} to {end_date}: {e}")
            return []
    else:
        print(f"Request failed for {start_date} to {end_date} with status code {response.status_code}")
        return []

# Date range to iterate over
start_date = datetime(2018, 1, 1)
end_date = datetime(2024, 12, 1)

# Initialize an empty list to store all the articles
all_articles = []

# Fetch data in batches of 30 days
batch_size = 30
current_start_date = start_date

while current_start_date < end_date:
    current_end_date = current_start_date + timedelta(days=batch_size)
    if current_end_date > end_date:
        current_end_date = end_date

    # Fetch articles for the current date range
    articles = fetch_apple_gdelt_data(current_start_date, current_end_date)
    all_articles.extend(articles)

    # Move to the next date range
    current_start_date = current_end_date

# Convert the list of all articles into a DataFrame
apple_news_data = pd.DataFrame(all_articles)
print(f"Fetching data from {start_date} to {end_date} is done.")

# Filter and keep only relevant columns if data is available
if not apple_news_data.empty:
    apple_news_data = apple_news_data[['url', 'seendate', 'title', 'domain']]
else:
    print("No data fetched.")

JSON decode error for 2019-12-22 00:00:00 to 2020-01-21 00:00:00: Invalid \X escape sequence '\\': line 1 column 79382 (char 79381)
Fetching data from 2018-01-01 00:00:00 to 2024-12-01 00:00:00 is done.
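
The JSON decode error above means the whole 2019-12-22 to 2020-01-21 window was dropped for Apple, because `fetch_apple_gdelt_data` returns an empty list when parsing fails. The message points to a backslash that does not start a valid JSON escape. The sketch below is one heuristic for recovering such a response: it doubles stray backslashes and retries the parse. The helper names are hypothetical, and it has not been run against the actual failing response.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string
_VALID_ESCAPES = set('"\\/bfnrtu')

def escape_stray_backslashes(text):
    """Double any backslash that does not start a valid JSON escape sequence."""
    def fix(match):
        if match.group(1) in _VALID_ESCAPES:
            return match.group(0)       # valid escape: leave untouched
        return "\\\\" + match.group(1)  # stray backslash: emit an escaped backslash
    return re.sub(r"\\(.)", fix, text, flags=re.DOTALL)

def loads_lenient(text):
    """Parse JSON, retrying once with stray backslashes repaired."""
    try:
        return json.loads(text)
    except ValueError:
        return json.loads(escape_stray_backslashes(text))

raw = '{"articles": [{"title": "AT\\&T and Apple"}]}'  # contains the invalid escape \&
print(loads_lenient(raw)["articles"][0]["title"])
```

Inside the fetch function this would replace `response.json()` with `loads_lenient(response.text)`. The repair keeps the backslash as a literal character in the title, so affected titles may need cleaning later.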


In [33]:
tsla_news_data

Unnamed: 0,url,seendate,title,domain
0,https://kldaily.com/tesla-inc-tsla-eps-estimat...,20180129T074500Z,"Tesla , Inc . ( TSLA ) EPS Estimated At $ - 3 ...",kldaily.com
1,https://friscofastball.com/analysts-see-3-75-e...,20180129T011500Z,"Analysts See $ - 3 . 75 EPS for Tesla , Inc . ...",friscofastball.com
2,http://www.benchmarkmonitor.com/cnbc-a-key-eng...,20180130T111500Z,CNBC ; A key Engineer at Tesla Motors Inc ( ...,benchmarkmonitor.com
3,https://santimes.com/cousins-properties-cuz-an...,20180115T161500Z,Cousins Properties ( CUZ ) Analysts See $0 . 1...,santimes.com
4,https://ledgergazette.com/2018/01/19/insider-s...,20180119T063000Z,Tesla Inc ( NASDAQ : TSLA ) VP John Douglas Fi...,ledgergazette.com
...,...,...,...,...
21245,http://www.itbear.com.cn/html/2024-11/609574.html,20241125T140000Z,特斯拉中国罕见优惠 ： Model Y最高减10000元 ！- 智能汽车 - ITBear科技资讯,itbear.com.cn
21246,https://www.ideastream.org/2024-11-22/tesla-wo...,20241125T041500Z,Tesla won the plug war . Enter the age of the ...,ideastream.org
21247,https://www.cnet.com/paid-content/news/complet...,20241125T163000Z,Complete Your Home Energy Ecosystem With Tesla...,cnet.com
21248,https://finance.sina.com.cn/jjxw/2024-11-30/do...,20241130T093000Z,极越汽车被指抄袭特斯拉 ？ CEO夏一平 ： 只是自动驾驶理念相同,finance.sina.com.cn


In [34]:
apple_news_data

Unnamed: 0,url,seendate,title,domain
0,http://weeklyregister.com/british-columbia-inv...,20180121T120000Z,British Columbia Investment Management Corp Cu...,weeklyregister.com
1,https://bzweekly.com/2018/01/13/can-apple-inc-...,20180114T013000Z,Can Apple Inc . ( AAPL ) s Tomorrow be Differe...,bzweekly.com
2,https://kldaily.com/could-apple-inc-aapl-chang...,20180117T211500Z,Could Apple Inc . ( AAPL ) Change Direction Af...,kldaily.com
3,https://friscofastball.com/eaton-vance-managem...,20180120T131500Z,Eaton Vance Management Has Trimmed Position in...,friscofastball.com
4,http://finnewsdaily.com/as-apple-inc-aapl-stoc...,20180120T133000Z,"As Apple INC ( AAPL ) Stock Price Rose , Share...",finnewsdaily.com
...,...,...,...,...
20995,https://telegrafi.com/en/apple-zgjeron-kampanj...,20241125T060000Z,Apple expands There more in an iPhone campai...,telegrafi.com
20996,http://kr.xinhuanet.com/20241127/f8f050da74194...,20241127T071500Z,"팀 쿡 애플 CEO 中 , 세계에서 가장 중요한 공급사슬 - Xinhua",kr.xinhuanet.com
20997,https://www.annapurnapost.com/story/469101/,20241126T164500Z,मुस्ताङको कृषि उत्पादनमा पर्यटकको क्रेज,annapurnapost.com
20998,https://www.macworld.com/article/2535956/the-i...,20241127T201500Z,The next iPhone SE will be a study in sacrifice,macworld.com
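
Unlike the Reddit frames, the news frames keep `seendate` as a compact string such as `20180129T074500Z`. Joining news with the Reddit and stock data by date will need a real timestamp. The sketch below parses that format with the standard library, assuming the trailing `Z` denotes UTC; `parse_seendate` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_seendate(value):
    """Convert a GDELT `seendate` string such as '20180129T074500Z' to an aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)

print(parse_seendate("20180129T074500Z"))  # 2018-01-29 07:45:00+00:00
```

For a whole column, the pandas equivalent would be `pd.to_datetime(tsla_news_data['seendate'], format='%Y%m%dT%H%M%SZ', utc=True)`.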


In [57]:
# Define the tickers and the date range
tickers = ['TSLA', 'AAPL']
start_date = '2018-01-01'
end_date = '2024-12-01'

# Creating a dict to store the data for each company
stock_data = {}

# Downloading historical data from Yahoo Finance for each ticker
for ticker in tickers:
    stock_data[ticker] = yf.download(ticker, start=start_date, end=end_date, interval='1d')

tesla_data = stock_data['TSLA']
apple_data = stock_data['AAPL']

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


In [58]:
tesla_data.reset_index(inplace=True)
apple_data.reset_index(inplace=True)

In [59]:
tesla_data

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2018-01-02,20.799999,21.474001,20.733334,21.368668,21.368668,65283000
1,2018-01-03,21.400000,21.683332,21.036667,21.150000,21.150000,67822500
2,2018-01-04,20.858000,21.236668,20.378668,20.974667,20.974667,149194500
3,2018-01-05,21.108000,21.149332,20.799999,21.105333,21.105333,68868000
4,2018-01-08,21.066668,22.468000,21.033333,22.427334,22.427334,147891000
...,...,...,...,...,...,...,...
1735,2024-11-22,341.089996,361.529999,337.700012,352.559998,352.559998,89140700
1736,2024-11-25,360.140015,361.929993,338.200012,338.589996,338.589996,95890900
1737,2024-11-26,341.000000,346.959991,335.660004,338.230011,338.230011,62295900
1738,2024-11-27,341.799988,342.549988,326.589996,332.890015,332.890015,57896400


In [60]:
apple_data

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2018-01-02,42.540001,43.075001,42.314999,43.064999,40.524345,102223600
1,2018-01-03,43.132500,43.637501,42.990002,43.057499,40.517288,118071600
2,2018-01-04,43.134998,43.367500,43.020000,43.257500,40.705486,89738400
3,2018-01-05,43.360001,43.842499,43.262501,43.750000,41.168930,94640000
4,2018-01-08,43.587502,43.902500,43.482498,43.587502,41.016026,82271200
...,...,...,...,...,...,...,...
1735,2024-11-22,228.059998,230.720001,228.059998,229.869995,229.869995,38168300
1736,2024-11-25,231.460007,233.250000,229.740005,232.869995,232.869995,90152800
1737,2024-11-26,233.330002,235.570007,233.330002,235.059998,235.059998,45986200
1738,2024-11-27,234.470001,235.690002,233.809998,234.929993,234.929993,33498400


#### Saving all the datasets

In [44]:
tsla_reddit_data_18_to_24.to_csv("tesla_reddit_data_18_to_24.csv", index=False)
aapl_reddit_data_18_to_24.to_csv("aapl_reddit_data_18_to_24.csv", index=False)

In [46]:
tsla_news_data.to_csv("tesla_news_data_18_to_24.csv", index=False)
apple_news_data.to_csv("aapl_news_data_18_to_24.csv", index=False)

In [63]:
tesla_data.to_csv("tesla_stocks_18_to_24.csv", index=False)
apple_data.to_csv("aapl_stocks_18_to_24.csv", index=False)