### Correlation Between News & Stock Movement

This section focuses on:
- Aligning news data with stock price dates  
- Computing daily stock returns  
- Aggregating daily sentiment  
- Calculating Pearson correlation between sentiment and stock returns  

We assume the news dataframe already contains a sentiment score for each headline, stored in a column named `Sentiment` (the name used by the aggregation code below).


In [None]:
import pandas as pd

# Load sentiment-processed news dataset
news_df = pd.read_csv("../data/raw_analyst_ratings.csv")

# Ensure proper datetime format
news_df["date"] = pd.to_datetime(news_df["date"]).dt.date

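# Keep relevant columns and rename them to match the stock-return frame
# (assumes the sentiment scores are stored in a column named "Sentiment")
news_df = news_df[["date", "headline", "stock", "Sentiment"]].rename(
    columns={"date": "Date", "stock": "Ticker"}
)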

news_df.head()


### Aggregate Daily Sentiment
Multiple headlines may appear per day, so we compute the **average daily sentiment per ticker**.


In [None]:
daily_sentiment = (
    news_df.groupby(["Date", "Ticker"])["Sentiment"]
    .mean()
    .reset_index()
    .rename(columns={"Sentiment": "Daily_Sentiment"})
)

daily_sentiment.head()


### Prepare Stock Returns
To calculate daily stock returns for each ticker, we:
- Compute the daily percent change of the closing price
- Convert the return dates to plain calendar dates so they match the news dates
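
For reference, the daily percent change that `pct_change()` computes on the closing price can be written as follows, where $P_t$ denotes the closing price on trading day $t$ (notation introduced here only for illustration):

$$
r_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1
$$

The first day of each series has no previous close, so its return is undefined and shows up as `NaN`.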

### Compute Daily Returns for All Tickers

In [None]:
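# `tickers` and `data` are assumed to be defined earlier in the notebook;
# `data[t]` is assumed to be a price DataFrame with `Date` and `Close` columns.
returns_list = []

for t in tickers:
    df = data[t].copy()
    # Reduce timestamps to calendar dates so they match the news dates
    df["Date"] = pd.to_datetime(df["Date"]).dt.date
    # Percent change of the close; the first row has no prior close, so it is NaN
    df["Daily_Return"] = df["Close"].pct_change()
    returns_list.append(df[["Date", "Daily_Return"]].assign(Ticker=t))

daily_returns = pd.concat(returns_list, ignore_index=True)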
daily_returns.head()


### Merge Sentiment & Stock Returns


In [None]:
merged_df = pd.merge(
    daily_sentiment,
    daily_returns,
    on=["Date", "Ticker"],
    how="inner"
)

merged_df.head()
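
An inner merge silently drops any day that appears in only one of the two frames, for example headlines published on a weekend when no price bar exists. The sketch below uses made-up toy data and a hypothetical ticker `AAA` to show one way of counting such rows with an outer merge and `indicator=True`. It is an illustration only, not part of the analysis pipeline.

```python
import datetime as dt

import pandas as pd

# Made-up toy data: one headline falls on a Saturday, when no price bar exists
toy_sentiment = pd.DataFrame({
    "Date": [dt.date(2024, 1, 5), dt.date(2024, 1, 6)],  # Friday, Saturday
    "Ticker": ["AAA", "AAA"],  # hypothetical ticker
    "Daily_Sentiment": [0.2, -0.1],
})
toy_returns = pd.DataFrame({
    "Date": [dt.date(2024, 1, 5), dt.date(2024, 1, 8)],  # Friday, Monday
    "Ticker": ["AAA", "AAA"],
    "Daily_Return": [0.01, -0.02],
})

# An outer merge with indicator=True labels each row as
# "both", "left_only" (sentiment only) or "right_only" (returns only)
check = pd.merge(
    toy_sentiment,
    toy_returns,
    on=["Date", "Ticker"],
    how="outer",
    indicator=True,
)
counts = check["_merge"].value_counts()
print(counts)
```

Only the rows labeled `both` would survive the inner merge. The same check could be run on `daily_sentiment` and `daily_returns` to see how much data the alignment discards.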


### Compute Correlations Per Ticker

In [None]:
correlation_results = {}

for t in tickers:
    sub = merged_df[merged_df["Ticker"] == t]
    # Series.corr defaults to Pearson; rows with NaN in either column are ignored
    corr = sub["Daily_Sentiment"].corr(sub["Daily_Return"])
    correlation_results[t] = corr

correlation_results
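
To make the quantity in `correlation_results` concrete, here is a minimal pure-Python sketch of the Pearson coefficient, which is the default method of `Series.corr`. The helper name `pearson` and the toy numbers are made up for illustration. The sketch returns `NaN` when the coefficient is undefined (fewer than two points, or one series is constant).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if undefined."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        return math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not taken from the dataset
toy_sentiment = [0.1, 0.4, -0.2, 0.3]
toy_returns = [0.01, 0.02, -0.01, 0.015]
r = pearson(toy_sentiment, toy_returns)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. With only a handful of overlapping days per ticker, the estimate can be very noisy.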


### Scatter Plot

In [None]:
import matplotlib.pyplot as plt

for t in tickers:
    sub = merged_df[merged_df["Ticker"] == t]
    
    plt.figure(figsize=(7,4))
    plt.scatter(sub["Daily_Sentiment"], sub["Daily_Return"])
    plt.title(f"{t}: Sentiment vs. Return")
    plt.xlabel("Daily Sentiment")
    plt.ylabel("Daily Return")
    plt.grid(True)
    plt.show()
