# Correlation Between News Sentiment and Stock Movements
This notebook analyzes the correlation between news headline sentiment and stock price movements, using helper functions from the `src.news_stock_correlation` module.

In [4]:
# Import required libraries and custom modules
import pandas as pd
import sys
import os
# Add the parent directory to the import path so that `src` can be imported from this notebook
sys.path.append(os.path.abspath(".."))
from src.news_stock_correlation import (
    normalize_dates, compute_sentiment, aggregate_daily_sentiment,
    compute_daily_returns, merge_sentiment_returns, compute_correlation
)

## 1. Load News and Stock Data
Place the news CSV and the stock price CSV in the `data/` directory, then adjust the file paths in the cell below to match.

In [5]:
# Example file paths (update as needed)
news_path = '../data/raw_analyst_ratings.csv'
stock_path = '../data/yfinance_data/AAPL_historical_data.csv'

# Load data
news_df = pd.read_csv(news_path)
stock_df = pd.read_csv(stock_path)

# Display first few rows
news_df.head(), stock_df.head()

(   Unnamed: 0                                           headline  \
 0           0            Stocks That Hit 52-Week Highs On Friday   
 1           1         Stocks That Hit 52-Week Highs On Wednesday   
 2           2                      71 Biggest Movers From Friday   
 3           3       46 Stocks Moving In Friday's Mid-Day Session   
 4           4  B of A Securities Maintains Neutral on Agilent...   
 
                                                  url          publisher  \
 0  https://www.benzinga.com/news/20/06/16190091/s...  Benzinga Insights   
 1  https://www.benzinga.com/news/20/06/16170189/s...  Benzinga Insights   
 2  https://www.benzinga.com/news/20/05/16103463/7...         Lisa Levin   
 3  https://www.benzinga.com/news/20/05/16095921/4...         Lisa Levin   
 4  https://www.benzinga.com/news/20/05/16095304/b...         Vick Meyer   
 
                         date stock  
 0  2020-06-05 10:30:54-04:00     A  
 1  2020-06-03 10:45:20-04:00     A  
 2  2020-05-

## 2. Normalize Dates
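The news timestamps carry a time of day and a UTC offset (for example `2020-06-05 10:30:54-04:00`), so the date columns in both datasets are reduced to a common format before merging.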

In [6]:
# Normalize date columns
news_df = normalize_dates(news_df, 'date')
stock_df = normalize_dates(stock_df, 'Date')
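
The internals of `normalize_dates` are not shown in this notebook. As a rough sketch of what such a step might do, the hypothetical `normalize_dates_sketch` below parses a column with `pd.to_datetime`, converts it to UTC, and keeps only the calendar date. The function name and the choice of UTC are assumptions, not the module's actual behavior.

```python
import pandas as pd


def normalize_dates_sketch(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Hypothetical stand-in for normalize_dates: reduce timestamps to calendar dates."""
    out = df.copy()
    # utc=True converts offset-aware strings to UTC and treats naive ones as UTC.
    parsed = pd.to_datetime(out[date_col], utc=True, errors="coerce")
    # Drop the timezone, then zero out the time of day.
    out[date_col] = parsed.dt.tz_localize(None).dt.normalize()
    # Rows whose date could not be parsed become NaT and are removed.
    return out.dropna(subset=[date_col])


# Two timestamps in the format shown in the news data, plus daily stock dates.
news = pd.DataFrame({"date": ["2020-06-05 10:30:54-04:00", "2020-06-03 10:45:20-04:00"]})
prices = pd.DataFrame({"Date": ["2020-06-05", "2020-06-03"]})

news_norm = normalize_dates_sketch(news, "date")
prices_norm = normalize_dates_sketch(prices, "Date")
print(news_norm["date"].tolist())
```

One design caveat: converting to UTC can move a late-evening headline onto the next calendar day. Converting to the exchange's local timezone before dropping the time is a reasonable alternative.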

## 3. Sentiment Analysis on News Headlines
Assign a sentiment polarity score to each news headline.

In [None]:
# Compute sentiment scores
news_df = compute_sentiment(news_df, text_col='headline')
news_df[["date", "headline", "sentiment"]].head()
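
To illustrate the shape of this step, here is a minimal sketch that adds a `sentiment` column by applying a scoring function to every headline. `score_headlines`, `toy_polarity`, and the word lists are hypothetical and exist only so the example runs without extra dependencies. With TextBlob installed, `lambda text: TextBlob(text).sentiment.polarity` could be passed as `scorer` instead; whether `compute_sentiment` works this way is an assumption.

```python
import pandas as pd

# Toy lexicon, made up for illustration only.
POSITIVE = {"highs", "upgrade", "beats", "gains"}
NEGATIVE = {"lows", "downgrade", "misses", "losses"}


def toy_polarity(text: str) -> float:
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def score_headlines(df: pd.DataFrame, text_col: str = "headline", scorer=toy_polarity) -> pd.DataFrame:
    """Return a copy of df with a 'sentiment' column computed by scorer."""
    out = df.copy()
    # Missing headlines are scored as empty strings rather than raising.
    out["sentiment"] = out[text_col].fillna("").astype(str).map(scorer)
    return out


headlines = pd.DataFrame({"headline": [
    "Stocks That Hit 52-Week Highs On Friday",
    "71 Biggest Movers From Friday",
    "Stocks That Hit 52-Week Lows On Monday",  # made-up headline for contrast
]})
scored = score_headlines(headlines)
print(scored)
```

Keeping the scorer as a parameter makes it easy to compare sentiment models without touching the rest of the pipeline.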

## 4. Aggregate Daily Sentiment
Calculate the average sentiment score for each day.

In [None]:
# Aggregate daily sentiment
sentiment_daily = aggregate_daily_sentiment(news_df, date_col='date')
sentiment_daily.head()
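
The aggregation can be sketched with a plain pandas `groupby`. The example below assumes a `sentiment` column and an output column named `avg_sentiment` (the name used later in the plotting cell); the `n_headlines` count and the sample values are illustrative additions.

```python
import pandas as pd

# Made-up scores: two headlines on one day, one on another.
scored = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-05", "2020-06-05", "2020-06-03"]),
    "sentiment": [0.5, -0.1, 0.2],
})

daily = (
    scored.groupby("date", as_index=False)
    .agg(
        avg_sentiment=("sentiment", "mean"),   # mean score per day
        n_headlines=("sentiment", "count"),    # how many headlines back that mean
    )
    .sort_values("date")
    .reset_index(drop=True)
)
print(daily)
```

Keeping the headline count alongside the mean helps spot days where the average rests on a single headline.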

## 5. Calculate Daily Stock Returns
Compute daily percentage changes in closing prices.

In [None]:
# Compute daily returns
returns_daily = compute_daily_returns(stock_df, date_col='Date', price_col='Close')
returns_daily.head()
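
A daily return is the relative change in closing price from one trading day to the next, `Close[t] / Close[t-1] - 1`. A minimal sketch with `Series.pct_change`, assuming the output column is named `daily_return` (the name used later in the plotting cell) and using made-up prices:

```python
import pandas as pd

# Made-up closing prices, deliberately out of order.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-03", "2020-06-01", "2020-06-02"]),
    "Close": [99.0, 100.0, 110.0],
})

# pct_change compares each row with the previous one, so sort by date first.
prices = prices.sort_values("Date").reset_index(drop=True)
prices["daily_return"] = prices["Close"].pct_change()

returns = prices[["Date", "daily_return"]]
print(returns)
```

The first row has no previous close, so its return is `NaN`; it should be dropped or ignored before computing a correlation.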

## 6. Merge Sentiment and Returns by Date
Combine the two datasets for correlation analysis.

In [None]:
# Merge on date (assumes both frames expose a 'date' column at this point; upstream the stock data uses 'Date')
merged = merge_sentiment_returns(sentiment_daily, returns_daily, date_col='date')
merged.head()
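
One plausible way to perform this merge is an inner join on the date column, which keeps only days that have both a sentiment score and a return. The sketch below assumes the column names `date`, `Date`, `avg_sentiment`, and `daily_return`, and uses made-up values; it is not necessarily how `merge_sentiment_returns` is implemented.

```python
import pandas as pd

sentiment_daily = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-03", "2020-06-05", "2020-06-06"]),  # 06-06 is a Saturday
    "avg_sentiment": [0.2, 0.1, -0.3],
})
returns_daily = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-03", "2020-06-04", "2020-06-05"]),
    "daily_return": [0.01, -0.02, 0.015],
})

merged = sentiment_daily.merge(
    returns_daily.rename(columns={"Date": "date"}),  # align the key name
    on="date",
    how="inner",              # drop days missing from either side
    validate="one_to_one",    # fail loudly if a date appears twice
)
print(merged)
```

With an inner join, weekend headlines and trading days without news both drop out of the result.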

## 7. Correlation Analysis
Calculate the Pearson correlation coefficient between average daily sentiment and daily stock returns, along with its p-value.

In [None]:
# Compute correlation
corr, pval = compute_correlation(merged)
print(f"Pearson correlation: {corr:.3f} (p-value: {pval:.3g})")
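
For reference, the same statistic can be computed directly with `scipy.stats.pearsonr`, which returns the coefficient and a two-sided p-value. The sketch below uses made-up data and assumes the column names `avg_sentiment` and `daily_return`; rows containing `NaN` are dropped first, because a single missing value would otherwise make the result `nan`.

```python
import pandas as pd
from scipy.stats import pearsonr

# Made-up merged data; the last row is missing a sentiment score.
merged = pd.DataFrame({
    "avg_sentiment": [0.1, 0.3, -0.2, 0.4, None],
    "daily_return": [0.01, 0.02, -0.01, 0.03, 0.05],
})

# Keep only rows where both variables are present.
clean = merged.dropna(subset=["avg_sentiment", "daily_return"])

r, p = pearsonr(clean["avg_sentiment"], clean["daily_return"])
print(f"Pearson correlation: {r:.3f} (p-value: {p:.3g}, n={len(clean)})")
```

Reporting the sample size next to the coefficient makes it easier to judge how much weight the p-value deserves.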

## 8. Visualize the Relationship
(Optional) Draw a scatter plot of average daily sentiment against daily stock returns.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
plt.scatter(merged['avg_sentiment'], merged['daily_return'], alpha=0.6)
plt.xlabel('Average Daily Sentiment')
plt.ylabel('Daily Stock Return')
plt.title('Sentiment vs. Stock Return')
plt.grid(True)
plt.show()

---
### References
- [TextBlob Documentation](https://textblob.readthedocs.io/en/dev/)
- [Pearson Correlation (scipy)](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html)