# Week 1 Interim Analysis

This notebook tracks descriptive statistics, baseline sentiment scores, and early technical indicators computed on the sample CSV files.

## Notebook Goals

1. Load news and price data from `data/raw/`.
2. Produce quick descriptive stats (headline length, publisher mix, publishing cadence).
3. Run baseline sentiment scoring and daily aggregation.
4. Join sentiment with moving averages, RSI, and daily returns to preview the final workflow.

In [None]:
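# Imports: standard library and plotting first, then project modules.
import sys
from pathlib import Path
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Assumption: the notebook runs from a subdirectory of the repo (the '..' in the
# data path suggests this), so the repo root is added to make `src` importable.
sys.path.insert(0, str(Path('..').resolve()))

from src.data_io import load_news_data, load_stock_data
from src.eda import headline_length_stats, publisher_activity, daily_article_counts
from src.sentiment import compute_headline_sentiment, aggregate_daily_sentiment
from src.technical import add_moving_average, add_rsi, compute_daily_returns
from src.correlation import align_sentiment_with_returns, correlation_between_sentiment_and_returns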

In [None]:
data_dir = Path('..') / 'data' / 'raw'
news_path = data_dir / 'sample_news.csv'
price_path = data_dir / 'sample_prices.csv'
news_df = load_news_data(news_path)
price_df = load_stock_data(price_path)
news_df.head()

In [None]:
headline_stats = headline_length_stats(news_df)
headline_stats

In [None]:
top_publishers = publisher_activity(news_df, top_n=5)
top_publishers

In [None]:
daily_counts = daily_article_counts(news_df)
sns.barplot(data=daily_counts, x='date', y='article_count')
plt.title('Articles per day (sample)')
plt.xticks(rotation=45)
plt.tight_layout()

In [None]:
scored_news = compute_headline_sentiment(news_df)
scored_news[['headline', 'polarity', 'subjectivity']]

In [None]:
sentiment_daily = aggregate_daily_sentiment(scored_news)
sentiment_daily
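
For reference, the daily aggregation step can be sketched in plain pandas. This is not the code in `src.sentiment`; it is a minimal illustration that assumes the scored frame has a `date` column and a `polarity` column. The function name `sketch_daily_sentiment` and the output columns `mean_polarity` and `article_count` are hypothetical.

```python
import pandas as pd


def sketch_daily_sentiment(scored: pd.DataFrame) -> pd.DataFrame:
    """Average headline polarity per calendar day (illustrative only)."""
    out = scored.copy()
    # Drop the time of day so that all headlines from one day share a key.
    out["date"] = pd.to_datetime(out["date"]).dt.normalize()
    return out.groupby("date", as_index=False).agg(
        mean_polarity=("polarity", "mean"),   # hypothetical column name
        article_count=("polarity", "size"),   # hypothetical column name
    )


# Tiny made-up frame, used only to show the shape of the result.
example = pd.DataFrame(
    {
        "date": ["2024-01-02 09:30", "2024-01-02 15:00", "2024-01-03 10:00"],
        "polarity": [0.2, 0.4, -0.1],
    }
)
daily_example = sketch_daily_sentiment(example)
```

Keeping the article count next to the mean makes it easy to spot days where the average rests on a single headline.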

In [None]:
tech_prices = add_moving_average(price_df, window=3)
tech_prices = add_rsi(tech_prices, window=3)
tech_prices = compute_daily_returns(tech_prices)
tech_prices[['date', 'close', 'ma_3', 'rsi_3', 'daily_return']]
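
The three indicators above can be approximated with pandas alone. The sketch below is not the implementation in `src.technical`; it assumes a `close` column, mirrors the `ma_3` / `rsi_3` / `daily_return` naming seen in the cell above, and uses a simple-moving-average RSI (sometimes called Cutler's RSI) rather than Wilder's exponential smoothing. The function name `sketch_indicators` is hypothetical.

```python
import pandas as pd


def sketch_indicators(prices: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add a moving average, an SMA-based RSI, and daily returns (illustrative only)."""
    out = prices.copy()
    close = out["close"]

    # Simple moving average of the closing price.
    out[f"ma_{window}"] = close.rolling(window).mean()

    # RSI from average gains and losses over the window (simple averages,
    # not Wilder smoothing, so values can differ from other libraries).
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = avg_gain / avg_loss
    out[f"rsi_{window}"] = 100 - 100 / (1 + rs)

    # Day-over-day percentage change of the close.
    out["daily_return"] = close.pct_change()
    return out


# Made-up closing prices, used only to exercise the function.
example_prices = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.0, 13.0]})
example_indicators = sketch_indicators(example_prices, window=3)
```

The first `window` rows of the RSI column are `NaN` because the first price difference is undefined, so short sample files yield only a few usable rows.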

In [None]:
aligned_df = align_sentiment_with_returns(sentiment_daily, tech_prices)
corr_value = correlation_between_sentiment_and_returns(aligned_df)
corr_value
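
As a rough illustration of what the alignment and correlation step involves, the sketch below inner-joins daily sentiment with daily returns on `date` and computes a Pearson correlation. It is not the code in `src.correlation`; the column name `mean_polarity` and the function name `sketch_sentiment_return_corr` are assumptions made for the example.

```python
import pandas as pd


def sketch_sentiment_return_corr(sentiment: pd.DataFrame, prices: pd.DataFrame) -> float:
    """Pearson correlation of daily sentiment and same-day returns (illustrative only)."""
    merged = sentiment.merge(
        prices[["date", "daily_return"]],
        on="date",
        how="inner",  # keep only days present in both frames
    )
    # The first return is NaN by construction, so drop incomplete rows.
    merged = merged.dropna(subset=["mean_polarity", "daily_return"])
    return merged["mean_polarity"].corr(merged["daily_return"])


# Made-up values, used only to exercise the function.
example_sentiment = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-06"],
        "mean_polarity": [0.1, 0.2, 0.3, 0.4, 0.9],
    }
)
example_returns = pd.DataFrame(
    {
        "date": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
        "daily_return": [0.01, 0.02, 0.03, 0.04],
    }
)
example_corr = sketch_sentiment_return_corr(example_sentiment, example_returns)
```

With only a handful of overlapping days, as in the sample files, the coefficient is very noisy and should be read as a pipeline check rather than a finding.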

## Notes and Next Steps

- Pull the full FNSPID dataset and repeat the workflow across more tickers.
- Add extra indicators (MACD, Bollinger Bands) once TA-Lib wheels are available.
- Compare TextBlob with VADER or FinBERT for better sentiment coverage.
- Document blockers (late data delivery, TA-Lib install) in the interim PDF.
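
While the TA-Lib install remains a blocker, MACD and Bollinger Bands can be approximated in plain pandas. The sketch below uses the conventional defaults (12/26/9 exponential averages for MACD, a 20-period average with 2 standard deviations for the bands) and assumes a `close` column. The function and column names are hypothetical, and the values may differ slightly from TA-Lib because of how the averages are seeded.

```python
import pandas as pd


def add_macd(prices: pd.DataFrame, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Add MACD line, signal line, and histogram (illustrative only)."""
    out = prices.copy()
    ema_fast = out["close"].ewm(span=fast, adjust=False).mean()
    ema_slow = out["close"].ewm(span=slow, adjust=False).mean()
    out["macd"] = ema_fast - ema_slow
    out["macd_signal"] = out["macd"].ewm(span=signal, adjust=False).mean()
    out["macd_hist"] = out["macd"] - out["macd_signal"]
    return out


def add_bollinger_bands(prices: pd.DataFrame, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Add middle, upper, and lower Bollinger Bands (illustrative only)."""
    out = prices.copy()
    mid = out["close"].rolling(window).mean()
    # Population standard deviation (ddof=0); this is a choice made for the sketch.
    std = out["close"].rolling(window).std(ddof=0)
    out["bb_mid"] = mid
    out["bb_upper"] = mid + num_std * std
    out["bb_lower"] = mid - num_std * std
    return out


# Made-up flat price series, used only to exercise the functions.
flat_prices = pd.DataFrame({"close": [5.0] * 30})
flat_result = add_bollinger_bands(add_macd(flat_prices))
```

Both helpers return a copy, so they can be chained after the existing moving average and RSI steps without mutating `price_df`.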