<h1><center><font size=10>Introduction to LLMs and GenAI</font></center></h1>
<h1><center>Capstone Project 1: AI-Powered Stock News Sentiment & Summarization System</center></h1>

## Context

Stock prices are influenced by company performance, innovations, collaborations, and market sentiment. Rapidly emerging news and media reports can significantly sway investor perceptions, making it challenging for analysts to keep up with the volume of information. Investment firms need AI tools to quickly assess market sentiment and integrate insights into trading strategies.

### Objective

To develop an AI system that:

- Analyzes historical financial news to determine market sentiment toward a NASDAQ-listed company.

- Generates weekly sentiment summaries of the news.

- Correlates sentiment trends with stock price movements (Open, High, Low, Close, Volume).

- Provides analysts with actionable insights for trading and investment decisions.

### Data Dictionary

- Date: The date the news was released

- News: The content of news articles that could potentially affect the company's stock price

- Open: The stock price (in $) at the beginning of the day

- High: The highest stock price (in $) reached during the day

- Low: The lowest stock price (in $) reached during the day

- Close: The adjusted stock price (in $) at the end of the day

- Volume: The number of shares traded during the day

- Label: The sentiment polarity of the news content
	 -  1: Positive
	 -  0: Neutral
	 -  -1: Negative
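
The data dictionary above can be turned into a quick sanity check before any analysis. The sketch below is one possible way to do that, assuming the column names listed above; `validate_schema`, `EXPECTED_COLUMNS`, `VALID_LABELS`, and the toy frame are hypothetical names used only for illustration.

```python
import pandas as pd

# Column names taken from the data dictionary; the helper itself is illustrative
EXPECTED_COLUMNS = ["Date", "News", "Open", "High", "Low", "Close", "Volume", "Label"]
VALID_LABELS = {-1, 0, 1}

def validate_schema(frame: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need every column
    bad_labels = set(frame["Label"].dropna().unique()) - VALID_LABELS
    if bad_labels:
        problems.append(f"unexpected Label values: {sorted(int(v) for v in bad_labels)}")
    if (frame["High"] < frame["Low"]).any():
        problems.append("High is below Low on at least one row")
    return problems

# Toy data (made up): the second row has an invalid label and High < Low
toy = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03"],
    "News": ["Company announces new product", "Regulator opens inquiry"],
    "Open": [100.0, 101.0],
    "High": [102.0, 99.0],
    "Low": [99.5, 100.0],
    "Close": [101.5, 100.5],
    "Volume": [1_000_000, 1_200_000],
    "Label": [1, 2],
})
problems = validate_schema(toy)
print(problems)
```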


## Setup & Install Libraries

In [None]:
# Install necessary libraries
!pip install pandas numpy matplotlib seaborn plotly
!pip install sentence-transformers
!pip install transformers
!pip install torch
!pip install scikit-learn
!pip install tqdm

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from tqdm import tqdm
import torch


## Load & Inspect Data

In [None]:
# Load your dataset (CSV with Date, News, Open, High, Low, Close, Volume, Label)
df = pd.read_csv("/content/drive/MyDrive/Intro to LLM and Gen AI/stock_news.csv")

# Inspect the first few rows
df.head()


In [None]:
# Check data info
df.info()

# Check missing values (printed explicitly, since only a cell's last expression is displayed)
print(df.isnull().sum())

# Parse date column to datetime
df['Date'] = pd.to_datetime(df['Date'])


## Text Preprocessing

In [None]:
import re

def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+', '', text)          # remove URLs
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # remove punctuation
    text = re.sub(r'\s+', ' ', text).strip()    # remove extra spaces
    return text

# Apply preprocessing
df['clean_news'] = df['News'].apply(clean_text)

# Preview
df[['News', 'clean_news']].head()
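
To see what the cleaning steps do to a typical headline, the sketch below applies the same `clean_text` function to a made-up sentence (the sample string and URL are invented for illustration). Note that stripping punctuation also removes decimal points and percent signs, so figures such as `3.5%` become `35`.

```python
import re

def clean_text(text):
    # Same steps as the preprocessing cell above
    text = str(text).lower()
    text = re.sub(r'http\S+', '', text)          # remove URLs
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)   # remove punctuation
    text = re.sub(r'\s+', ' ', text).strip()     # collapse whitespace
    return text

sample = "Apple shares rose 3.5% after Q4 results! https://example.com/news"
cleaned = clean_text(sample)
print(cleaned)  # apple shares rose 35 after q4 results
```

If numeric detail matters for a downstream step, it may be safer to feed that step the original `News` text instead.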


## Generate Embeddings for News

In [None]:
# Load pre-trained Sentence Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
news_embeddings = model.encode(df['clean_news'].tolist(), batch_size=32, show_progress_bar=True)

# Convert embeddings to numpy array
news_embeddings = np.array(news_embeddings)
print("Embeddings shape:", news_embeddings.shape)
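
One thing embeddings make possible is finding news items that are semantically close to each other. The sketch below shows a minimal cosine-similarity lookup in NumPy; it runs on a tiny hand-made array instead of `news_embeddings`, and `top_k_similar` is a hypothetical helper, not part of any library.

```python
import numpy as np

def top_k_similar(embeddings: np.ndarray, query_idx: int, k: int = 3) -> list:
    """Return the indices of the k rows most similar to row query_idx (cosine similarity)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # avoid division by zero
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf                        # exclude the query itself
    return [int(i) for i in np.argsort(-sims)[:k]]

# Toy 2-D "embeddings" (made up); real ones from all-MiniLM-L6-v2 have 384 dimensions
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
nearest = top_k_similar(toy_embeddings, query_idx=0, k=2)
print(nearest)  # [1, 2]
```

With the real data, the same function could be called as `top_k_similar(news_embeddings, query_idx=0)` to list the articles closest to the first one.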


## Sentiment Analysis Using Labels / LLM

In [None]:
# Check distribution
sns.countplot(x='Label', data=df)
plt.title("Sentiment Distribution")
plt.show()
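
Beyond inspecting the label distribution, the embeddings could be used to train a simple sentiment classifier. The following is a minimal sketch under stated assumptions: it uses synthetic features as a stand-in for `news_embeddings` and synthetic labels in place of `df['Label']`, so the accuracy it prints says nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data: three noisy clusters, one per sentiment class
labels = np.repeat([-1, 0, 1], 60)
centers = {-1: -2.0, 0: 0.0, 1: 2.0}
features = np.vstack([
    rng.normal(loc=centers[int(label)], scale=1.0, size=8) for label in labels
])

# Stratify so each class keeps the same share in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=42, stratify=labels
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

To try this on the notebook's data, `features` and `labels` would be replaced by `news_embeddings` and `df['Label'].values`.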


## Weekly Sentiment Aggregation

In [None]:
# Create a week column (start of each Monday-Sunday week)
df['Week'] = df['Date'].dt.to_period('W').dt.start_time

# Aggregate weekly sentiment
weekly_sentiment = df.groupby('Week')['Label'].mean().reset_index()
print(weekly_sentiment.head())

# Plot weekly sentiment
plt.figure(figsize=(12,5))
sns.lineplot(x='Week', y='Label', data=weekly_sentiment)
plt.title("Weekly Average Sentiment")
plt.xlabel("Week")
plt.ylabel("Average Sentiment")
plt.show()
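
It can help to confirm how pandas assigns dates to weeks. With the default `'W'` frequency, periods run Monday through Sunday, so the week start is always a Monday. The short sketch below checks this on three made-up dates.

```python
import pandas as pd

# Wednesday, Sunday, and the following Monday
dates = pd.Series(pd.to_datetime(["2024-01-03", "2024-01-07", "2024-01-08"]))

# Map each date to the start of its Monday-Sunday week
week_start = dates.dt.to_period("W").dt.start_time
week_labels = week_start.dt.strftime("%Y-%m-%d").tolist()
print(week_labels)  # ['2024-01-01', '2024-01-01', '2024-01-08']
```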


## Weekly Stock Summary

In [None]:
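# Aggregate weekly stock data
# Sort by date first so that 'first'/'last' pick the week's opening and closing rows
weekly_stock = df.sort_values('Date').groupby('Week').agg({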
    'Open': 'first',
    'High': 'max',
    'Low': 'min',
    'Close': 'last',
    'Volume': 'sum'
}).reset_index()

# Merge sentiment and stock
weekly_data = weekly_stock.merge(weekly_sentiment, on='Week')
weekly_data.head()
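
The `'first'` and `'last'` aggregations follow row order, not date order, so the weekly open and close are only meaningful when rows are sorted chronologically. The sketch below illustrates the difference on a deliberately unsorted toy frame (all values are made up).

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-02", "2024-01-03"]),  # unsorted on purpose
    "Open": [103.0, 100.0, 101.0],
    "Close": [104.0, 101.5, 102.0],
})
toy["Week"] = toy["Date"].dt.to_period("W").dt.start_time  # all three fall in one week

agg = {"Open": "first", "Close": "last"}
unsorted_week = toy.groupby("Week").agg(agg).iloc[0]
sorted_week = toy.sort_values("Date").groupby("Week").agg(agg).iloc[0]

print("unsorted:", unsorted_week["Open"], unsorted_week["Close"])  # 103.0 102.0
print("sorted:  ", sorted_week["Open"], sorted_week["Close"])      # 100.0 104.0
```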


## Correlation Analysis

In [None]:
# Weekly % change from the week's open to its close
weekly_data['Pct_Change'] = (weekly_data['Close'] - weekly_data['Open']) / weekly_data['Open'] * 100

# Plot correlation
sns.scatterplot(x='Label', y='Pct_Change', data=weekly_data)
plt.title("Weekly Sentiment vs Stock % Change")
plt.show()

# Correlation coefficient
corr = weekly_data['Label'].corr(weekly_data['Pct_Change'])
print(f"Correlation between sentiment and stock % change: {corr:.2f}")
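
The correlation above compares sentiment and price change within the same week. To ask whether sentiment tends to lead price, one option is to correlate this week's sentiment with next week's change. The sketch below does this with `shift(-1)` on toy numbers, constructed so that next week's change is exactly twice this week's sentiment; the column name `Next_Pct_Change` is an assumption introduced here.

```python
import pandas as pd

# Toy weekly data (made up): Pct_Change in week t+1 equals 2 * Label in week t
toy_weekly = pd.DataFrame({
    "Label": [0.5, -0.2, 0.1, 0.4, -0.3],
    "Pct_Change": [0.0, 1.0, -0.4, 0.2, 0.8],
})

# Align each week's sentiment with the following week's price change
toy_weekly["Next_Pct_Change"] = toy_weekly["Pct_Change"].shift(-1)

same_week_corr = toy_weekly["Label"].corr(toy_weekly["Pct_Change"])
next_week_corr = toy_weekly["Label"].corr(toy_weekly["Next_Pct_Change"])  # NaN pair is dropped

print(f"Same-week correlation: {same_week_corr:.2f}")
print(f"Next-week correlation: {next_week_corr:.2f}")
```

On real data neither number should be read as evidence of causation, and with only a handful of weeks the estimate will be noisy.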


## Weekly News Summarization (LLM)

In [None]:
from transformers import pipeline

# Load summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

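# Example: summarize all news in a week
def summarize_weekly_news(week):
    # Use the raw text: it keeps the casing and punctuation the summarizer was trained on
    texts = df[df['Week'] == week]['News'].astype(str).tolist()
    if len(texts) == 0:
        return ""
    combined_text = " ".join(texts)[:2000]  # rough character cap to keep the input short
    # truncation=True guards against inputs that still exceed the model's token limit
    summary = summarizer(combined_text, max_length=100, min_length=40, do_sample=False, truncation=True)
    return summary[0]['summary_text']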

# Add summary column
weekly_data['Weekly_Summary'] = weekly_data['Week'].apply(summarize_weekly_news)
weekly_data[['Week','Weekly_Summary']].head()
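
Capping the combined text at 2000 characters discards everything after the cut, which can drop most of a busy week's news. One alternative is to split the text into chunks and summarize each chunk. The sketch below covers only the splitting step; `chunk_text` is a hypothetical helper and the character budget is an assumption, not a property of the model.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list:
    """Split text into chunks of at most max_chars characters, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current, current_len = [], [], 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        added = len(word) + (1 if current else 0)
        if current and current_len + added > max_chars:
            chunks.append(" ".join(current))
            current, current_len = [word], len(word)
        else:
            current.append(word)
            current_len += added
    if current:
        chunks.append(" ".join(current))
    return chunks

demo_chunks = chunk_text("alpha beta gamma delta epsilon", max_chars=12)
print(demo_chunks)  # ['alpha beta', 'gamma delta', 'epsilon']
```

Each chunk could then be passed to `summarizer` separately and the partial summaries joined, at the cost of more model calls per week.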


## Generate Automated Actionable Insights

In [None]:
# Define a function to generate insights
def generate_insights(row):
    insights = []

    # Positive sentiment alongside a price rise (same week, so no ordering is implied)
    if row['Label'] > 0 and row['Pct_Change'] > 0:
        insights.append(f"Positive news sentiment ({row['Label']:.2f}) coincided with a price increase of {row['Pct_Change']:.2f}%")

    # Negative sentiment alongside a price drop
    if row['Label'] < 0 and row['Pct_Change'] < 0:
        insights.append(f"Negative news sentiment ({row['Label']:.2f}) coincided with a price decrease of {row['Pct_Change']:.2f}%")

    # Exactly neutral average sentiment (rare for a mean of several labels)
    if row['Label'] == 0:
        insights.append(f"Neutral sentiment observed this week; price changed {row['Pct_Change']:.2f}%")

    # Large price change with near-neutral sentiment
    if abs(row['Pct_Change']) > 5 and abs(row['Label']) < 0.3:
        insights.append(f"Stock moved {row['Pct_Change']:.2f}% despite near-neutral sentiment; investigate external factors")

    return insights

# Apply to weekly data
weekly_data['Insights'] = weekly_data.apply(generate_insights, axis=1)
weekly_data[['Week', 'Label', 'Pct_Change', 'Insights']].head(10)
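
Because the weekly `Label` is an average, it is rarely exactly 0, so rules that test for neutrality may work better with a band around zero. The sketch below shows one way to bucket the score; `sentiment_band` is a hypothetical helper, and the 0.3 default simply reuses the threshold from the last rule above.

```python
def sentiment_band(score: float, neutral_band: float = 0.3) -> str:
    """Map an average sentiment score in [-1, 1] to a coarse band."""
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"

for score in (0.6, 0.1, -0.45):
    print(score, sentiment_band(score))
```

Applied to the notebook's data, this could look like `weekly_data['Label'].apply(sentiment_band)`.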


## Interactive Visualizations with Plotly

In [None]:
#Weekly Sentiment vs Stock Price
import plotly.express as px

fig = px.line(
    weekly_data, x='Week', y=['Label', 'Close'],
    title="Weekly Sentiment vs Stock Close Price",
    labels={'value':'Score / Price', 'Week':'Week'},
)
fig.update_layout(yaxis_title="Sentiment / Close Price")
fig.show()


In [None]:
#Scatter Plot: Sentiment vs % Change
fig = px.scatter(
    weekly_data, x='Label', y='Pct_Change',
    text='Week', size='Volume',
    color='Label', color_continuous_scale='RdYlGn',
    title="Weekly Sentiment vs Stock % Change",
    labels={'Label':'Avg Sentiment', 'Pct_Change':'Stock % Change'}
)
fig.show()


In [None]:
#Weekly News Summary Dashboard
for idx, row in weekly_data.iterrows():
    print(f"Week: {row['Week'].date()}")
    print(f"Avg Sentiment: {row['Label']:.2f} | % Change: {row['Pct_Change']:.2f}%")
    print(f"Weekly Summary: {row['Weekly_Summary']}")
    if row['Insights']:
        print("Insights:")
        for insight in row['Insights']:
            print(f"- {insight}")
    print("-"*80)


## LLM-Powered Analyst Recommendations

In [None]:
# Install required libraries (if not already installed)
!pip install transformers torch sentence-transformers tqdm

# Import libraries
import torch
from transformers import pipeline
from tqdm import tqdm

# Make sure GPU is used
device = 0 if torch.cuda.is_available() else -1
print("Using device:", "GPU" if device == 0 else "CPU")


### Load the GPT-Neo Model

In [None]:
# Load EleutherAI GPT-Neo 2.7B text-generation pipeline
llm = pipeline(
    "text-generation",
    model="EleutherAI/gpt-neo-2.7B",
    device=device,
    tokenizer="EleutherAI/gpt-neo-2.7B"
)


In [None]:
#Define LLM Insight Generation Function
def llm_generate_insights(row):
    """
    Generates actionable analyst insights using GPT-Neo 2.7B
    Input: row from weekly_data containing 'Weekly_Summary', 'Label', 'Pct_Change'
    Output: List of actionable insights
    """

    prompt = f"""
    You are a financial analyst.
    Based on the following information, generate 2-3 concise, actionable insights for investors.

    Weekly Summary: {row['Weekly_Summary']}
    Average Sentiment Score: {row['Label']:.2f}  (1=Positive, 0=Neutral, -1=Negative)
    Stock % Change: {row['Pct_Change']:.2f}%

    Provide insights in bullet points.
    """

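    # Generate text using the model
    # max_new_tokens budgets only the continuation (max_length would also count the prompt);
    # return_full_text=False drops the prompt from the output
    output = llm(prompt, max_new_tokens=150, do_sample=True, temperature=0.7, return_full_text=False)

    # Extract generated text
    generated_text = output[0]['generated_text']

    # Optional: clean the text by splitting into bullet points
    insights = [line.strip("-• \n") for line in generated_text.split("\n") if len(line.strip())>0]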

    return insights
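
Base language models such as GPT-Neo often produce extra text around the requested bullets, so a stricter parser can help keep only the bullet lines. The sketch below is one way to do that; `parse_bullets` is a hypothetical helper and the sample output is invented, not actual model output.

```python
def parse_bullets(generated_text: str) -> list:
    """Keep only lines that start with a bullet marker and return them without the marker."""
    markers = ("-", "*", "•")
    bullets = []
    for line in generated_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(markers):
            bullets.append(stripped.lstrip("-*• ").strip())
    return [b for b in bullets if b]  # drop lines that were only a marker

sample_output = """Here are some thoughts:
- Sentiment improved week over week
• Volume was above average

Unrelated trailing sentence."""
parsed = parse_bullets(sample_output)
print(parsed)  # ['Sentiment improved week over week', 'Volume was above average']
```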


In [None]:
from tqdm import tqdm

# Make sure the column exists first
weekly_data['LLM_Insights'] = None

# Apply LLM generation for first 5 weeks (demo)
demo_weeks = weekly_data.head(5)
for idx, row in tqdm(demo_weeks.iterrows(), total=len(demo_weeks)):
    weekly_data.at[idx, 'LLM_Insights'] = llm_generate_insights(row)

# Preview results
weekly_data[['Week', 'Weekly_Summary', 'Label', 'Pct_Change', 'LLM_Insights']].head()


In [None]:
#Display LLM-Powered Insights Nicely
for idx, row in weekly_data.head(5).iterrows():
    print(f"Week: {row['Week'].date()}")
    print(f"Average Sentiment: {row['Label']:.2f} | % Change: {row['Pct_Change']:.2f}%")
    print(f"Weekly Summary: {row['Weekly_Summary']}")
    if row['LLM_Insights']:
        print("LLM Insights:")
        for insight in row['LLM_Insights']:
            print(f"- {insight}")
    print("-"*80)


## Interactive Filters with Plotly & Widgets

In [None]:
!pip install ipywidgets
from ipywidgets import widgets
from IPython.display import display

# Date range slider
start_date = weekly_data['Week'].min()
end_date = weekly_data['Week'].max()

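date_range = widgets.SelectionRangeSlider(
    # (label, value) pairs: readable date labels, Timestamp values for filtering
    options=[(w.strftime('%Y-%m-%d'), w) for w in weekly_data['Week']],
    index=(0, len(weekly_data)-1),
    description='Select Weeks',
    orientation='horizontal',
    layout={'width':'800px'}
)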

display(date_range)

# Function to update chart
def update_chart(change):
    selected_weeks = change['new']
    filtered_data = weekly_data[(weekly_data['Week'] >= selected_weeks[0]) &
                                (weekly_data['Week'] <= selected_weeks[1])]

    fig = px.line(filtered_data, x='Week', y=['Label','Close'],
                  title="Weekly Sentiment vs Stock Close Price")
    fig.show()

date_range.observe(update_chart, names='value')


## Export Reports (CSV/PDF)

In [None]:
# Save weekly data with insights
weekly_data.to_csv("weekly_stock_insights.csv", index=False)
print("CSV report saved!")
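
List-valued columns such as `Insights` are written to CSV as their string representation, so they come back as plain strings when the file is reloaded. If the report needs to be read back programmatically, one option is to serialize those columns as JSON first. The sketch below shows the round trip on a toy frame, using an in-memory buffer in place of a file.

```python
import io
import json
import pandas as pd

# Toy frame with a list-valued column (values are made up)
toy = pd.DataFrame({
    "Week": ["2024-01-01", "2024-01-08"],
    "Insights": [["Positive sentiment", "High volume"], []],
})

# Serialize the lists as JSON strings before writing
export = toy.copy()
export["Insights"] = export["Insights"].apply(json.dumps)

buffer = io.StringIO()  # stands in for a file on disk
export.to_csv(buffer, index=False)
buffer.seek(0)

# Reload and decode the JSON strings back into lists
restored = pd.read_csv(buffer)
restored["Insights"] = restored["Insights"].apply(json.loads)
restored_insights = restored["Insights"].tolist()
print(restored_insights)  # [['Positive sentiment', 'High volume'], []]
```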


In [None]:
!pip install fpdf
from fpdf import FPDF

pdf = FPDF()
pdf.set_auto_page_break(auto=True, margin=15)
pdf.add_page()
pdf.set_font("Arial", size=12)

def to_latin1(text):
    # The built-in Arial font only supports Latin-1; replace anything else with '?'
    return str(text).encode('latin-1', 'replace').decode('latin-1')

for idx, row in weekly_data.iterrows():
    pdf.multi_cell(0, 8, f"Week: {row['Week'].date()}")
    pdf.multi_cell(0, 8, f"Avg Sentiment: {row['Label']:.2f} | % Change: {row['Pct_Change']:.2f}%")
    pdf.multi_cell(0, 8, to_latin1(f"Weekly Summary: {row['Weekly_Summary']}"))

    # Handle LLM_Insights as a list (None for weeks that were not processed)
    if 'LLM_Insights' in row and row['LLM_Insights']:
        pdf.multi_cell(0, 8, "LLM Insights:")
        # Join list into a single string with newlines
        insights_text = "\n".join(row['LLM_Insights'])
        pdf.multi_cell(0, 8, to_latin1(insights_text))

    pdf.multi_cell(0, 8, "-"*80)

pdf.output("Weekly_Stock_Insights.pdf")
print("PDF report saved!")
