# Twitter Sentiment Investing Strategy
This notebook demonstrates how to implement an investing strategy based on Twitter sentiment data.

## Step 1: Load and Process Sentiment Data
We start by loading the sentiment data from a CSV file, computing each row's engagement ratio (Twitter comments divided by Twitter likes), and keeping only rows with more than 20 likes and more than 10 comments.

In [1]:
pip install yfinance pandas matplotlib



In [2]:
import pandas as pd

# Load sentiment data (update the path to wherever the CSV lives on your machine)
sentiment_data_path = "/mnt/data/sentiment_data.csv"
sentiment_df = pd.read_csv(sentiment_data_path)

# Process sentiment data
sentiment_df['date'] = pd.to_datetime(sentiment_df['date'])
sentiment_df = sentiment_df.set_index(['date', 'symbol'])
sentiment_df['engagement_ratio'] = sentiment_df['twitterComments'] / sentiment_df['twitterLikes']
sentiment_df = sentiment_df[(sentiment_df['twitterLikes'] > 20) & (sentiment_df['twitterComments'] > 10)]

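
To check the processing logic without the CSV, the sketch below applies the same thresholds to a few made-up rows. The sample values and the helper name `process_sentiment` are assumptions for illustration; only the column names come from the cell above. Unlike the cell above, it filters before dividing, so rows with zero likes never reach the ratio calculation.

```python
import pandas as pd

# Hypothetical sample rows; the real CSV is assumed to have these columns.
sample = pd.DataFrame({
    "date": ["2021-11-18", "2021-11-18", "2021-11-19", "2021-11-19"],
    "symbol": ["AAA", "BBB", "AAA", "CCC"],
    "twitterLikes": [100, 15, 40, 0],
    "twitterComments": [25, 30, 20, 12],
})

def process_sentiment(df, min_likes=20, min_comments=10):
    """Filter on activity thresholds, then compute comments per like."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    # Filtering first also drops zero-like rows, avoiding division by zero.
    out = out[(out["twitterLikes"] > min_likes) & (out["twitterComments"] > min_comments)]
    out = out.set_index(["date", "symbol"])
    out["engagement_ratio"] = out["twitterComments"] / out["twitterLikes"]
    return out

processed = process_sentiment(sample)
print(processed)
```

Only the two `AAA` rows pass both thresholds here; `BBB` has too few likes and `CCC` has none.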

## Step 2: Aggregate and Rank Data
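We average each symbol's engagement ratio by calendar month, rank the symbols within each month, and keep the top five. Each month's selection is then re-dated to the first day of the following month, which is when those symbols are held.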

In [None]:
# Aggregate and rank data (the 'ME' month-end alias needs pandas >= 2.2; use 'M' on older versions)
aggregated_df = (sentiment_df.reset_index('symbol')
                 .groupby([pd.Grouper(freq='ME'), 'symbol'])
                 [['engagement_ratio']].mean())
aggregated_df['rank'] = (aggregated_df.groupby(level=0)['engagement_ratio']
                         .transform(lambda x: x.rank(ascending=False)))

# Filter top symbols
filtered_df = aggregated_df[aggregated_df['rank'] < 6].copy()
filtered_df = filtered_df.reset_index(level=1)
filtered_df.index = filtered_df.index + pd.DateOffset(1)
filtered_df = filtered_df.reset_index().set_index(['date', 'symbol'])

# Generate fixed dates dictionary
dates = filtered_df.index.get_level_values('date').unique().tolist()
fixed_dates = {d.strftime('%Y-%m-%d'): filtered_df.xs(d, level=0).index.tolist() for d in dates}
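
The aggregate, rank, and shift steps can be tried on a small synthetic table. This is a sketch under stated assumptions, not the notebook's exact code: the data, the helper name `top_n_by_month`, and the use of monthly periods instead of `pd.Grouper` are all illustrative choices.

```python
import pandas as pd

# Hypothetical daily engagement ratios for three symbols over two months.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-01", "2021-11-15", "2021-11-01",
                            "2021-11-15", "2021-11-10", "2021-12-05",
                            "2021-12-05", "2021-12-20"]),
    "symbol": ["AAA", "AAA", "BBB", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "engagement_ratio": [0.2, 0.4, 0.5, 0.7, 0.1, 0.9, 0.3, 0.6],
})

def top_n_by_month(df, n=2):
    """Return {first day of the next month: [top-n symbols by mean ratio]}."""
    month = df["date"].dt.to_period("M").rename("month")
    monthly = df.groupby([month, "symbol"])["engagement_ratio"].mean()
    ranks = monthly.groupby(level="month").rank(ascending=False)
    selected = monthly[ranks <= n]
    picks = {}
    for m, symbol in selected.index:
        # Symbols selected in month m are held from the first day of month m + 1.
        start = (m + 1).start_time.strftime("%Y-%m-%d")
        picks.setdefault(start, []).append(symbol)
    return picks

picks = top_n_by_month(daily)
print(picks)
```

The returned dictionary has the same shape as `fixed_dates`: date strings mapped to lists of symbols.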

## Step 3: Fetch Stock Prices
We fetch the stock prices for the symbols in the sentiment data using the `yfinance` library.

In [None]:
import yfinance as yf

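# Fetch daily stock prices; auto_adjust=False keeps the 'Adj Close' column used in Step 4
# (newer yfinance releases adjust prices by default and omit that column)
stocks_list = sentiment_df.index.get_level_values('symbol').unique().tolist()
prices_df = yf.download(tickers=stocks_list, start='2021-01-01', end='2023-03-01', auto_adjust=False)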

# Debug: Verify prices_df structure
print("prices_df head:")
print(prices_df.head())

## Step 4: Extract Adjusted Close Prices
We extract the adjusted close prices from the fetched stock prices data.

In [None]:
# Ensure 'Adj Close' exists and is structured properly
if 'Adj Close' not in prices_df.columns.levels[0]:
    raise ValueError("'Adj Close' not found in prices_df")

# Debug: Verify prices_df columns
print("prices_df columns:")
print(prices_df.columns)

# Extract 'Adj Close' prices
prices_adj_close = prices_df['Adj Close']

# Debug: Verify prices_adj_close
print("prices_adj_close head:")
print(prices_adj_close.head())
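
The extraction relies on the price frame having two-level columns, with the field first and the ticker second. The sketch below reproduces that layout with synthetic numbers so the selection can be tested offline. The frame contents and the helper `extract_field` are hypothetical and do not call `yfinance`.

```python
import numpy as np
import pandas as pd

# Synthetic frame mimicking a two-level (field, ticker) column layout.
dates = pd.date_range("2021-01-04", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["Adj Close", "Volume"], ["AAA", "BBB"]])
prices = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                      index=dates, columns=columns)

def extract_field(df, field="Adj Close"):
    """Return the per-ticker sub-frame for one price field, or raise if absent."""
    if not isinstance(df.columns, pd.MultiIndex):
        raise ValueError("expected two-level columns (field, ticker)")
    if field not in df.columns.get_level_values(0):
        raise ValueError(f"{field!r} not found in columns")
    return df[field]

adj_close = extract_field(prices)
print(adj_close)
```

Checking for a `MultiIndex` first gives a clearer error when only one ticker was downloaded and the columns are single-level.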

## Step 5: Handle NaN Values and Calculate Log Returns
We inspect the adjusted close prices for NaN values, compute daily log returns, and fill any NaNs in the result with zero.

In [None]:
import numpy as np

# Check for any NaN values and data types
print("prices_adj_close NaNs:")
print(prices_adj_close.isna().sum())

print("prices_adj_close data types:")
print(prices_adj_close.dtypes)

# Debug: Print the shape and index of prices_adj_close
print("prices_adj_close shape:", prices_adj_close.shape)
print("prices_adj_close index:", prices_adj_close.index)

# Ensure index is sorted (reassign rather than sorting a slice of prices_df in place)
prices_adj_close = prices_adj_close.sort_index()

# Debug: Verify data after shifting
shifted_prices = prices_adj_close.shift(1)
print("shifted_prices head:")
print(shifted_prices.head())

# Calculate log returns without dropping NaNs
raw_log_returns = np.log(prices_adj_close / shifted_prices)
print("raw_log_returns head:")
print(raw_log_returns.head())

# Handle NaNs by filling them with zero
log_returns = raw_log_returns.fillna(0)
print("log_returns head after filling NaNs:")
print(log_returns.head())

# Print the index of log_returns to check available dates
print("Available dates in log_returns:")
print(log_returns.index)
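
As a small worked example of the calculation, the daily log return is r_t = ln(P_t / P_{t-1}), and log returns add up over time, so their sum over a window equals ln(P_last / P_first). The prices below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted close prices for two symbols.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
)

# r_t = ln(P_t / P_{t-1}); the first row has no previous price, so it is NaN.
log_returns = np.log(prices / prices.shift(1))

# Summing skips the leading NaN and telescopes to ln(P_last / P_first).
total = log_returns.sum()
print(log_returns)
print(total)
```

Filling the leading NaN with zero, as the cell above does, leaves these sums unchanged.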

## Step 6: Create Portfolio DataFrame
We create a DataFrame of portfolio returns: for each holding month, we average the daily log returns of that month's selected symbols, giving each symbol equal weight.

In [None]:
# Initialize empty DataFrame for portfolio returns
portfolio_df = pd.DataFrame()

for start_date in fixed_dates.keys():
    end_date = (pd.to_datetime(start_date) + pd.offsets.MonthEnd()).strftime('%Y-%m-%d')
    cols = fixed_dates[start_date]
    temp_df = log_returns.loc[start_date:end_date, cols].mean(axis=1).to_frame('portfolio_return')
    portfolio_df = pd.concat([portfolio_df, temp_df])

# Display the final portfolio DataFrame
print("Final Portfolio DataFrame:")
print(portfolio_df)
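
The loop can be exercised on synthetic inputs, as in the sketch below. The returns, the selections, and the symbol `ZZZ` are made up. The guard that skips symbols without price data is an added assumption and is not part of the cell above, which would raise a `KeyError` in that case.

```python
import pandas as pd

# Hypothetical daily log returns for three symbols.
idx = pd.to_datetime(["2021-12-01", "2021-12-15", "2021-12-31",
                      "2022-01-03", "2022-01-31"])
log_returns = pd.DataFrame(
    {"AAA": [0.01, 0.02, -0.01, 0.03, 0.00],
     "BBB": [0.03, 0.00, 0.01, -0.01, 0.02],
     "CCC": [0.00, 0.01, 0.02, 0.01, 0.04]},
    index=idx,
)
# Hypothetical selections keyed by the first day of each holding month.
fixed_dates = {"2021-12-01": ["AAA", "BBB"], "2022-01-01": ["BBB", "CCC", "ZZZ"]}

frames = []
for start_date, symbols in fixed_dates.items():
    start = pd.Timestamp(start_date)
    end = start + pd.offsets.MonthEnd()
    # Keep only symbols that have price data (ZZZ does not in this example).
    cols = [s for s in symbols if s in log_returns.columns]
    window = log_returns.loc[start:end, cols]
    frames.append(window.mean(axis=1).to_frame("portfolio_return"))

# Concatenate once at the end instead of growing a DataFrame inside the loop.
portfolio = pd.concat(frames)
print(portfolio)
```

Collecting the monthly frames in a list and concatenating once avoids repeatedly copying the growing DataFrame.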

## Conclusion
This notebook implements a Twitter engagement-based investing strategy: each month it selects the five symbols with the highest average engagement ratio, then computes portfolio returns as the average daily log return of those symbols over the following month.