# Sentiment Analysis

This notebook uses OpenAI's GPT-3.5 to analyze the sentiment of financial news articles.

**Process:**
1. Load news data collected in the previous notebook
2. Analyze sentiment of each article using OpenAI API
3. Assign sentiment labels (positive/negative/neutral) and numerical scores (-1 to +1)
4. Save sentiment-enriched data for correlation analysis

## Setup and Load Data

In [None]:
import pandas as pd
import openai  # v1+ client interface (openai.chat.completions.create)
from dotenv import load_dotenv
import os
import time
import json

# Load environment variables
load_dotenv()
openai.api_key = os.getenv('OPENAI_API_KEY')

# Load the news data
company_news = pd.read_csv('../data/raw/company_news.csv')
macro_news = pd.read_csv('../data/raw/macro_news.csv')

print("Company News:")
print(f"  {len(company_news)} articles")
print(f"  Distribution by stock: {company_news['ticker'].value_counts().to_dict()}")

print("\nMacro News:")
print(f"  {len(macro_news)} articles")

print("\n✓ Data loaded successfully")
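`os.getenv` returns `None` when `OPENAI_API_KEY` is missing, and nothing in this cell notices. The first API call then fails inside `analyze_sentiment`, whose `except` branch would quietly label every article neutral. A minimal fail-fast check is sketched below; `require_env` is a hypothetical helper, not part of `python-dotenv` or `openai`.

```python
import os


def require_env(name):
    """Return environment variable `name`, raising a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail here instead of letting every API call fail later
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

With it, the key assignment would become `openai.api_key = require_env('OPENAI_API_KEY')`.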

## Define Sentiment Analysis Function

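`analyze_sentiment` sends an article's title and description to OpenAI's Chat Completions API and asks the model to judge, from a financial perspective, how the news is likely to affect the stock price. Setting `temperature=0` keeps the output close to deterministic, and JSON mode (`response_format={"type": "json_object"}`) makes the reply a JSON object rather than free text.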

**Returns:**
- `sentiment`: Classification (positive/negative/neutral)
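- `score`: Numerical value from -1 (very negative) to +1 (very positive)

If the API call or JSON parsing fails, the function prints the error and falls back to `('neutral', 0.0)`, so failed articles look the same as genuinely neutral ones in the saved data.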
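The prompt asks the model for one of three labels and a score in [-1, 1], but `analyze_sentiment` passes the returned label through unchecked and never range-checks the score. Below is a minimal sketch of a stricter parser, assuming the reply is the JSON object the prompt requests; `parse_sentiment_response` is a hypothetical helper name.

```python
import json
import math

VALID_LABELS = {"positive", "negative", "neutral"}


def parse_sentiment_response(content):
    """Parse the model's JSON reply into a (label, score) pair.

    Unknown labels become 'neutral'; a missing, non-numeric, or NaN score
    becomes 0.0; any other score is clamped to [-1, 1].
    """
    result = json.loads(content)
    label = str(result.get("sentiment", "neutral")).strip().lower()
    if label not in VALID_LABELS:
        label = "neutral"
    try:
        score = float(result.get("score", 0.0))
    except (TypeError, ValueError):
        score = 0.0
    if math.isnan(score):
        score = 0.0
    # Clamp to the documented range
    score = max(-1.0, min(1.0, score))
    return label, score
```

Inside `analyze_sentiment`, `sentiment, score = parse_sentiment_response(response.choices[0].message.content)` could stand in for the current parsing lines; malformed JSON would still reach the existing `except` branch.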

In [None]:
def analyze_sentiment(title, description):
    """
    Use OpenAI to analyze sentiment of a news article
    
    Parameters:
    - title: Article title
    - description: Article description/summary
    
    Returns:
    - sentiment: 'positive', 'negative', or 'neutral'
    - score: Float between -1 and 1
    """
    
    # Combine title and description (pandas reads a missing description as NaN)
    if pd.isna(description):
        description = ""
    text = f"Title: {title}\nDescription: {description}"
    
    try:
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": """You are a financial sentiment analyzer. Analyze news articles and determine their impact on stock prices.
                
Respond with ONLY a JSON object in this exact format:
{"sentiment": "positive" or "negative" or "neutral", "score": number between -1 and 1}

Where:
- positive: good news for stock price (product launch, earnings beat, etc.)
- negative: bad news for stock price (layoffs, earnings miss, scandals, etc.)
- neutral: informational, no clear impact
- score: -1 (very negative) to +1 (very positive)"""},
                {"role": "user", "content": f"Analyze this news article:\n\n{text}"}
            ],
            temperature=0,
            response_format={"type": "json_object"}
        )
        
        # Parse JSON response
        result = json.loads(response.choices[0].message.content)
        sentiment = result.get('sentiment', 'neutral')
        score = float(result.get('score', 0.0))
        
        return sentiment, score
        
    except Exception as e:
        print(f"Error analyzing sentiment: {e}")
        return 'neutral', 0.0

# Test on one article
print("Testing sentiment analysis function:\n")
test_article = company_news.iloc[0]
print(f"Title: {test_article['title']}")
print(f"Description: {test_article['description']}\n")

sentiment, score = analyze_sentiment(test_article['title'], test_article['description'])
print(f"Sentiment: {sentiment}")
print(f"Score: {score}")
print("\n✓ Sentiment analysis function ready!")
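The fixed `time.sleep(0.5)` in the processing loops spaces requests out but does not recover from a rate-limit error once one occurs; `analyze_sentiment` just records that article as neutral. A common alternative is to retry the API call with exponential backoff. The sketch below is a generic wrapper; `with_retries` and its parameters are hypothetical, and which exceptions are worth retrying (for example `openai.RateLimitError` or `openai.APIConnectionError` in the v1 client) is an assumption to adjust.

```python
import time


def with_retries(fn, *args, max_attempts=4, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying failures whose type is in `retry_on`.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception after `max_attempts` tries.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Used inside the `try` block as `response = with_retries(openai.chat.completions.create, ..., retry_on=(openai.RateLimitError, openai.APIConnectionError))` with the same keyword arguments as the original call, it keeps the existing neutral fallback only for errors that persist after the last attempt.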

## Analyze Company-Specific News

Runs every company news article (MSFT, TSLA, VISA) through `analyze_sentiment` and adds `sentiment` and `sentiment_score` columns to the DataFrame.

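**Note:** Expect this to take roughly 2-3 minutes and cost approximately $0.10-0.20 in OpenAI API usage. Both grow with the number of articles; the 0.5 s pause after each call alone adds half a second per article.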

In [None]:
print("Analyzing sentiment for all company news articles...")
print(f"Processing {len(company_news)} articles (estimated cost: $0.10-0.20)\n")

sentiments = []
scores = []

for idx, row in company_news.iterrows():
    if idx % 10 == 0:
        print(f"Processing article {idx+1}/{len(company_news)}...")
    
    sentiment, score = analyze_sentiment(row['title'], row['description'])
    sentiments.append(sentiment)
    scores.append(score)
    
    # Delay to avoid rate limits
    time.sleep(0.5)

# Add sentiment columns to dataframe
company_news['sentiment'] = sentiments
company_news['sentiment_score'] = scores

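# Save (create the output directory if it does not exist yet)
os.makedirs('../data/processed', exist_ok=True)
company_news.to_csv('../data/processed/company_news_with_sentiment.csv', index=False)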

print("\n✓ Company news sentiment analysis complete!")
print("\nSentiment distribution:")
print(company_news['sentiment'].value_counts())
print(f"\nAverage sentiment score: {company_news['sentiment_score'].mean():.3f}")
print("\nSentiment by stock:")
print(company_news.groupby('ticker')['sentiment_score'].mean())

company_news.head(10)
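Because every article triggers a billed API call, an interrupted run or a re-run of the cell pays for the same articles again. One way to avoid that is a small on-disk cache keyed by the article text, sketched below; `article_key`, `load_cache`, `cached_analyze`, and the JSON cache file are hypothetical, not part of the original pipeline. One caveat: since `analyze_sentiment` returns `('neutral', 0.0)` on failure, failed calls would be cached too unless that fallback is changed.

```python
import hashlib
import json
import os


def article_key(title, description):
    """Stable cache key: SHA-256 of the title and description text."""
    return hashlib.sha256(f"{title}\n{description}".encode("utf-8")).hexdigest()


def load_cache(path):
    """Load the cache dict from `path`, or return an empty dict if the file does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cached_analyze(title, description, analyze_fn, cache, path):
    """Return a cached (label, score) if present; otherwise call analyze_fn and persist the result.

    The whole cache is rewritten after each new entry so an interrupted run
    keeps everything scored so far.
    """
    key = article_key(title, description)
    if key in cache:
        label, score = cache[key]
        return label, score
    label, score = analyze_fn(title, description)
    cache[key] = [label, score]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return label, score
```

In the loop, `cache = load_cache(path)` would run once before iterating, and each call would become `sentiment, score = cached_analyze(row['title'], row['description'], analyze_sentiment, cache, path)`, with `path` pointing at a cache file of your choosing (for example a hypothetical `../data/processed/sentiment_cache.json`).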

## Analyze Macro-Economic News

Processes broader market news (interest rates, tariffs, economic indicators), which can move all of the tracked stocks at once, using the same `analyze_sentiment` function.

In [None]:
print("Analyzing sentiment for macro-economic news...")
print(f"Processing {len(macro_news)} articles\n")

sentiments = []
scores = []

for idx, row in macro_news.iterrows():
    if idx % 10 == 0:
        print(f"Processing article {idx+1}/{len(macro_news)}...")
    
    sentiment, score = analyze_sentiment(row['title'], row['description'])
    sentiments.append(sentiment)
    scores.append(score)
    
    time.sleep(0.5)

# Add sentiment columns
macro_news['sentiment'] = sentiments
macro_news['sentiment_score'] = scores

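# Save (create the output directory in case this cell is run on its own)
os.makedirs('../data/processed', exist_ok=True)
macro_news.to_csv('../data/processed/macro_news_with_sentiment.csv', index=False)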

print("\n✓ Macro news sentiment analysis complete!")
print("\nSentiment distribution:")
print(macro_news['sentiment'].value_counts())
print(f"\nAverage sentiment score: {macro_news['sentiment_score'].mean():.3f}")

# Show examples
print("\nMost positive macro news:")
print(macro_news.nlargest(3, 'sentiment_score')[['date', 'title', 'sentiment_score']])

print("\nMost negative macro news:")
print(macro_news.nsmallest(3, 'sentiment_score')[['date', 'title', 'sentiment_score']])
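The company and macro cells repeat the same loop. A hypothetical helper, `score_articles`, could hold it once. It uses `enumerate` for the progress counter, so the count stays correct even if a DataFrame's index is not 0..N-1.

```python
import time

import pandas as pd


def score_articles(df: pd.DataFrame, analyze_fn, delay=0.5, log_every=10) -> pd.DataFrame:
    """Return a copy of `df` with 'sentiment' and 'sentiment_score' columns added.

    `analyze_fn(title, description)` must return a (label, score) pair;
    `delay` seconds are slept after each call, matching the notebook's loops.
    """
    labels, scores = [], []
    # enumerate gives a positional counter regardless of the DataFrame's index
    for i, (title, description) in enumerate(zip(df["title"], df["description"])):
        if i % log_every == 0:
            print(f"Processing article {i + 1}/{len(df)}...")
        label, score = analyze_fn(title, description)
        labels.append(label)
        scores.append(score)
        time.sleep(delay)
    out = df.copy()
    out["sentiment"] = labels
    out["sentiment_score"] = scores
    return out
```

`company_news = score_articles(company_news, analyze_sentiment)` and `macro_news = score_articles(macro_news, analyze_sentiment)` would then replace the two loops. Returning a copy rather than mutating the input means an exception mid-run leaves the loaded DataFrame unchanged.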

## Summary

Sentiment analysis complete! All news articles now have:
- Sentiment classification (positive/negative/neutral)
- Numerical sentiment score (-1 to +1)

Data saved to `../data/processed/company_news_with_sentiment.csv` and `../data/processed/macro_news_with_sentiment.csv` for correlation analysis in the next notebook.
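Before moving on to correlation analysis, it can be worth reloading the saved files and confirming the new columns hold what the summary claims. A minimal sketch follows; `check_sentiment_file` is a hypothetical helper, and its checks (allowed labels, scores within [-1, 1]) mirror the contract in `analyze_sentiment`'s docstring.

```python
import pandas as pd

VALID_LABELS = {"positive", "negative", "neutral"}


def check_sentiment_file(path):
    """Load a saved sentiment CSV and return a list of problems found (empty if none)."""
    df = pd.read_csv(path)
    problems = []
    for col in ("sentiment", "sentiment_score"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems
    bad_labels = set(df["sentiment"].dropna()) - VALID_LABELS
    if bad_labels:
        problems.append(f"unexpected labels: {sorted(bad_labels)}")
    out_of_range = ((df["sentiment_score"] < -1) | (df["sentiment_score"] > 1)).sum()
    if out_of_range:
        problems.append(f"{out_of_range} scores outside [-1, 1]")
    return problems
```

For example, `print(check_sentiment_file('../data/processed/company_news_with_sentiment.csv') or 'OK')`, and the same for the macro file.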