# Sentiticks: Sentiment Analysis on Stock Tickers

## Overview

This project analyzes financial news articles about specific stock tickers. It uses large language models to write a concise summary of each article and then turns those summaries into sentiment-based recommendations. Potential use cases include:

- Financial News Summarization: Automatically summarize financial news articles to provide concise overviews, saving users time by highlighting the most relevant information.

- Sentiment Analysis: Evaluate the sentiment of news articles related to specific stock tickers to assess market sentiment and its potential impact on stock prices.

- Stock Recommendations: Generate sentiment-based recommendations or alerts based on the analysis of news articles, helping users make informed investment decisions.

- Trend Identification: Identify emerging trends in financial news by analyzing patterns and sentiments over time, aiding in strategic planning and forecasting.

- Custom Alerts: Set up personalized alerts for news related to particular stocks or sectors, based on sentiment or key information extracted from articles.

- Portfolio Insights: Provide insights into how news sentiment is affecting the performance of user portfolios by correlating news summaries and sentiment scores with stock movements.

- Competitive Analysis: Analyze news related to competitors to understand their market positioning and public perception, helping users make strategic business decisions.

## Components

1. **Article Retrieval**: Fetches news articles related to specific stock tickers from the News API.

2. **Text Summarization**:
    - Uses a language model to summarize the content of each article.

3. **Sentiment Analysis**:
    - Analyzes the sentiment of each summarized article.
    - Provides recommendations on whether to buy, hold, or sell based on the sentiment.

4. **Caching**:
    - Saves the retrieved articles and their summaries as JSON files for future reference.


## Example Code

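```python
# Fetch articles
articles = get_news_articles(q='NVDA', from_date='2024-07-07', to_date='2024-07-14', page_size=10)

# Add the ticker, URL hash, and full text to each article
articles = process_articles(articles, 'NVDA')

# Summarize articles and attach each summary to its article
summaries = summarize_articles(articles)
for article, summary in zip(articles, summaries):
    article['summary'] = summary

# Analyze sentiment (uses the 'summary' field by default)
recommendations = analyze_article_sentiment(articles)
```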



In [2]:
# imports 

import hashlib
import json
import os
import pandas as pd
import requests
import yfinance as yf
from bs4 import BeautifulSoup
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional, List, Dict, Any, Union
from dotenv import load_dotenv
pd.set_option('display.max_columns', None)

### Configuring API Keys

To manage API keys securely, we load environment variables from a `.env` file using the `python-dotenv` package:

- `NEWS_API_KEY`: used to access NewsAPI.

- `OPENAI_API_KEY`: used to call OpenAI’s API, including its GPT language models.

- `LANGCHAIN_API_KEY`: used to authenticate LangChain tracing (LangSmith), which is configured later in this notebook.


In [3]:
# loading api keys
load_dotenv()

NEWS_API_KEY = os.getenv('NEWS_API_KEY')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
LANGCHAIN_API_KEY = os.getenv('LANGCHAIN_API_KEY')
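
`os.getenv` returns `None` for a variable that is not set, so a missing key only surfaces later as an authentication error. A minimal sketch of an early check is shown below; the helper name `find_missing_keys` and the `REQUIRED_KEYS` tuple are illustrative and not part of the original notebook.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Names taken from the keys described above
REQUIRED_KEYS = ("NEWS_API_KEY", "OPENAI_API_KEY", "LANGCHAIN_API_KEY")

def find_missing_keys(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping instead of the real environment
missing = find_missing_keys(REQUIRED_KEYS, env={"NEWS_API_KEY": "abc", "OPENAI_API_KEY": ""})
print(missing)  # ['OPENAI_API_KEY', 'LANGCHAIN_API_KEY']
```

Calling `find_missing_keys(REQUIRED_KEYS)` right after `load_dotenv()` would report which keys still need to be added to the `.env` file.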

## Retrieving News Articles

### News API

To fetch news articles related to a specific stock ticker, we'll use NewsAPI by calling the `get_news_articles` function. 

[NewsAPI](https://newsapi.org) is a service that provides a simple way to access current and historical news articles from a variety of sources. It offers a unified API for retrieving news articles, headlines, and sources from across the web.

### Key Features

1. **Comprehensive News Coverage**
   - Access articles from major news sources and blogs worldwide.
   - Includes news from various categories such as business, technology, health, and more.

2. **Search and Filter Capabilities**
   - Search for news articles using keywords, phrases, or specific topics.
   - Filter news by date, language, and source, and sort results by relevance, popularity, or publication date.

3. **Top Headlines**
   - Retrieve the latest headlines from around the world or from specific countries or sources.

4. **Historical Data**
   - Access past news articles within a certain timeframe.

5. **API Endpoints**
   - **Everything**: Search for articles based on keywords, dates, and other parameters.
   - **Top Headlines**: Get the latest headlines from various sources.
   - **Sources**: List available news sources and their details.

### How to Use NewsAPI

1. **Get an API Key**
   - Sign up on [NewsAPI](https://newsapi.org) to obtain a free API key.

2. **Make API Requests**
   - Use the API key to authenticate your requests.
   - Construct requests to endpoints based on your needs.



In [51]:
def get_news_articles(
    q: str,
    searchIn: str = 'title',
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    language: str = 'en',
    sort_by: str = 'relevancy',
    page_size: int = 100,
    page: int = 1,
    api_key: str = NEWS_API_KEY
) -> List[Dict[str, Any]]:
    """
    Fetch news articles from the News API based on the provided parameters.
    :param q: The query string to search for.
    :param searchIn: The field to search in ('title', 'description', or 'content').
    :param from_date: The start date for articles in 'YYYY-MM-DD' format.
    :param to_date: The end date for articles in 'YYYY-MM-DD' format.
    :param language: The language of the articles.
    :param sort_by: How to sort the results ('relevancy', 'popularity', or 'publishedAt').
    :param page_size: Number of articles to return per page.
    :param page: The page number of the results to return.
    :param api_key: The API key for authentication.
    :return: A list of articles where each article is represented as a dictionary.
    """
    base_url = 'https://newsapi.org/v2/everything'
    
    # Construct the request parameters
    params = {
        'apiKey': api_key,
        'q': q,
        'searchIn': searchIn,
        'from': from_date,
        'to': to_date,
        'language': language,
        'sortBy': sort_by,
        'pageSize': page_size,
        'page': page
    }
    
    # Make the request (the timeout keeps a stalled connection from hanging the notebook)
    response = requests.get(base_url, params=params, timeout=30)

    # Raise an error for unsuccessful requests, otherwise return the articles
    response.raise_for_status()
    return response.json()['articles']


Let's retrieve a few articles on $NVDA and examine an example.

In [53]:
ticker = 'NVDA'
today = datetime.today()
from_date = '2024-07-07' 
to_date = '2024-07-14'
articles = get_news_articles(q=ticker, from_date=from_date, to_date=to_date, page_size=10)
articles[0]

{'source': {'id': None, 'name': 'Yahoo Entertainment'},
 'author': 'Zacks Equity Research',
 'title': 'Nvidia (NVDA) Just Flashed Golden Cross Signal: Do You Buy?',
 'description': 'Should investors be excited or worried when a stock crosses above the 20-day simple moving average?',
 'url': 'https://finance.yahoo.com/news/nvidia-nvda-just-flashed-golden-133503737.html',
 'urlToImage': 'https://media.zenfs.com/en/zacks.com/2f0d6efdd8740952a371c97ddea6e926',
 'publishedAt': '2024-07-09T13:35:03Z',
 'content': 'Nvidia (NVDA) reached a significant support level, and could be a good pick for investors from a technical perspective. Recently, NVDA broke through the 20-day moving average, which suggests a short-… [+1373 chars]'}

### Processing Articles

1. **Add Ticker**: Each article in the list is updated to include the stock ticker in its metadata.

2. **Generate URL Hash**: A unique hash is created for each article’s URL using MD5.

3. **Fetch Full Text**: The full text of each article is retrieved with the `get_article_text` function. If the download fails, `full_text` is set to an empty string.

This gives each article a unique identifier and, where the download succeeds, full text content for further analysis.


In [5]:
def get_article_text(url: str) -> str:
    """
    Fetch and extract the text content from the article at the given URL.

    :param url: The URL of the article to fetch.
    :return: The extracted text content from the article.
    :raises requests.HTTPError: If the server responds with an error status.
    """
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Avoid parsing an error page as article text
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract and concatenate all paragraph texts
    text = ' '.join([p.text for p in soup.find_all('p')])
    return text


def process_articles(articles: List[Dict[str, Optional[str]]], ticker: str, download_full_text: bool = True) -> List[Dict[str, Optional[str]]]:
    """
    Process a list of articles by adding metadata and optionally downloading full text.

    :param articles: A list of dictionaries, each representing an article with at least a 'url' key.
    :param ticker: The stock ticker to add to each article's metadata.
    :param download_full_text: Whether to attempt to download and add the full text of each article.
    :return: The list of processed articles with updated metadata and full text where applicable.
    """

    for article in articles:
        article['ticker'] = ticker  # Add ticker to the article metadata
        article['url_hash'] = hashlib.md5(article['url'].encode()).hexdigest()  # Generate a unique hash for the URL
        if download_full_text:
            try:
                # Attempt to fetch and add the full text of the article
                article['full_text'] = get_article_text(article['url'])
            except Exception as e:
                # Handle any exceptions that occur and set full_text to an empty string
                article['full_text'] = ''
                print(f"Failed to fetch article text for {article['url']}: {e}")
    
    return articles



Here, we'll process the retrieved articles and review a processed example, including its full text and url_hash.

In [54]:
# process articles
articles = process_articles(articles, ticker)
articles[0]

{'source': {'id': None, 'name': 'Yahoo Entertainment'},
 'author': 'Zacks Equity Research',
 'title': 'Nvidia (NVDA) Just Flashed Golden Cross Signal: Do You Buy?',
 'description': 'Should investors be excited or worried when a stock crosses above the 20-day simple moving average?',
 'url': 'https://finance.yahoo.com/news/nvidia-nvda-just-flashed-golden-133503737.html',
 'urlToImage': 'https://media.zenfs.com/en/zacks.com/2f0d6efdd8740952a371c97ddea6e926',
 'publishedAt': '2024-07-09T13:35:03Z',
 'content': 'Nvidia (NVDA) reached a significant support level, and could be a good pick for investors from a technical perspective. Recently, NVDA broke through the 20-day moving average, which suggests a short-… [+1373 chars]',
 'ticker': 'NVDA',
 'url_hash': '27508037f9eaaa429b86228c99b27964',
 'full_text': "Nvidia (NVDA) reached a significant support level, and could be a good pick for investors from a technical perspective. Recently, NVDA broke through the 20-day moving average, which sugg

### Caching Articles

We’ll cache news articles as JSON files in a specified directory to avoid redundant downloads in the future. Each file is named after the article's `url_hash`, so every URL maps to its own file.

In [6]:
articles_cache = 'articles_cache'

def cache_articles(articles: List[Dict[str, Any]]) -> None:
    """
    Cache each article as a JSON file in the specified directory.

    :param articles: A list of dictionaries representing news articles.
    """
    # Ensure the cache directory exists
    os.makedirs(articles_cache, exist_ok=True)
    
    for article in articles:
        filename = article['url_hash'] + '.json'  # Create a filename based on the article's URL hash
        filepath = os.path.join(articles_cache, filename)  # Create the full file path
        
        # Write the article to a JSON file
        with open(filepath, 'w', encoding='utf-8') as f:
            json.dump(article, f, ensure_ascii=False, indent=4)

# Cache the articles
cache_articles(articles)
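
The notebook writes to the cache but does not show the read side. A minimal sketch of a lookup is given below, assuming the same MD5 naming scheme and directory as `cache_articles`; the helper name `load_cached_article` is hypothetical.

```python
import hashlib
import json
import os
from typing import Any, Dict, Optional

def load_cached_article(url: str, cache_dir: str = 'articles_cache') -> Optional[Dict[str, Any]]:
    """Return the cached article for this URL, or None if it has not been cached."""
    # Same hashing scheme that process_articles uses for 'url_hash'
    url_hash = hashlib.md5(url.encode()).hexdigest()
    filepath = os.path.join(cache_dir, url_hash + '.json')
    if not os.path.exists(filepath):
        return None
    with open(filepath, 'r', encoding='utf-8') as f:
        return json.load(f)
```

A caller could check `load_cached_article(article['url'])` before downloading full text and skip the download when the result is not `None`.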

## LangChain

LangChain is a framework designed to facilitate the development of applications that leverage large language models (LLMs) for various natural language processing tasks. It provides tools and abstractions to integrate LLMs into applications, manage data flows, and build complex workflows. LangChain supports tasks such as text generation, summarization, and question-answering by connecting LLMs with external data sources and processing pipelines. Its modular design allows developers to customize and extend functionalities, making it easier to create intelligent applications that can interact with and process natural language efficiently.

### LangChain tracing
LangChain tracing enhances debugging, performance monitoring, and understanding of data flow, and supports auditing and compliance by providing visibility into data transformations and system behavior. We’ll configure the necessary settings below.

In [8]:
## Setting Environment Variables To configure LangChain Tracing

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = LANGCHAIN_API_KEY

### LangChain with OpenAI

The `ChatOpenAI` class from the `langchain_openai` module enables you to initialize and work with OpenAI's chat models, such as GPT-4 and GPT-4o mini, within your LangChain application. It offers the following features:

- **Model Integration**: Seamlessly integrates OpenAI models into LangChain pipelines.
- **API Key Authentication**: Ensures secure communication with OpenAI's API using your API key.
- **Model Customization**: Allows specification of the model version and parameters to suit your needs.


In [9]:
# Initialize OpenAI model with the specified model name and api key
from langchain_openai import ChatOpenAI

model_name = 'gpt-4o-mini-2024-07-18'
llm = ChatOpenAI(model_name=model_name, openai_api_key=OPENAI_API_KEY)


### LangChain Components

- `PromptTemplate` is a class in LangChain used to create and manage structured prompts for language models. It allows you to define and format prompts dynamically based on input data, with placeholders that are filled in with actual values when generating prompts. This helps in creating consistent and well-structured prompts tailored to specific tasks.

- `ChatPromptTemplate` is a class in LangChain designed to create and manage prompts for chat-based interactions with language models. Unlike `PromptTemplate`, which produces a single string prompt, `ChatPromptTemplate` builds a list of messages, each tagged with a role such as `system` or `user`.

- `StrOutputParser` is a class that takes the output message from a language model and returns its text content as a plain Python string, so downstream code can work with `str` values directly.

- In LangChain, the `|` operator is used to compose and chain together different components of a workflow. It allows passing the output of one component as the input to the next, creating a pipeline of operations. This operator simplifies managing complex workflows by chaining processing steps.
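
To make the idea of `|` composition concrete, here is a toy illustration in plain Python. It is an analogy only, not LangChain's implementation: the `Step` class and the three stand-in steps are hypothetical, and no model is called.

```python
from typing import Any, Callable

class Step:
    """Toy stand-in for a chainable component; not LangChain's actual implementation."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)

# Hypothetical stand-ins for a prompt template, a model, and an output parser
fill_prompt = Step(lambda inputs: f"Summarize news about {inputs['ticker']}")
fake_model = Step(lambda prompt: {"content": prompt.upper()})
extract_text = Step(lambda message: message["content"])

chain = fill_prompt | fake_model | extract_text
result = chain.invoke({"ticker": "NVDA"})
print(result)  # SUMMARIZE NEWS ABOUT NVDA
```

The chains defined later in this notebook follow the same shape: a prompt, then the LLM, then a parser.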

### Summary generation

We’ll use `PromptTemplate` to send the articles to our LLM for summarization.

By calling `.invoke()` on a chain built from the `PromptTemplate`, the LLM, and the output parser, we can submit queries to the LLM and get the results. We’ll define a `PromptTemplate` called `summarization_prompt`, and then use the `summarize_articles()` function to generate and return summaries for each article in a list.

In [10]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

# Define the prompt template for summarizing articles
summarization_prompt = PromptTemplate.from_template(
    "Summarize the following text about the stock ticker {ticker} in less than 25 words:\n\n{text}"
)

# Create a summary chain using the prompt template, LLM, and parser
summary_chain = summarization_prompt | llm | parser

def summarize_articles(articles: List[Dict[str, Any]], max_chars: int = 200) -> List[str]:
    """
    Summarize the full text of each article, ensuring summaries are less than 25 words.

    :param articles: A list of dictionaries representing news articles.
    :param max_chars: Maximum number of characters to include from the full text for summarization.
    :return: A list of summaries, one for each article.
    """
    summaries = []
    for article in articles:
        if len(article["full_text"]) > 100:
            # Generate a summary for longer articles
            summary = summary_chain.invoke({'ticker': article['ticker'], 'text': article["full_text"][:max_chars]})
        else:
            # Use the full text for shorter articles
            summary = article["full_text"]
        summaries.append(summary)
        
    return summaries

# Summarize the articles
summaries = summarize_articles(articles)


We’ll add the generated summaries to each article's data. Then, we'll convert our list of articles into a pandas DataFrame for efficient filtering, storage, and analysis of both the articles and their summaries.

In [11]:
# Adding generated summaries to articles data
for article, summary in zip(articles, summaries):
    article['summary'] = summary  # Add the summary to each article dictionary

# displaying the articles data in a tabular format
df = pd.DataFrame(articles)
df.head(3)

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content,ticker,url_hash,full_text,summary
0,"{'id': None, 'name': 'Yahoo Entertainment'}",Zacks Equity Research,Nvidia (NVDA) Just Flashed Golden Cross Signal...,Should investors be excited or worried when a ...,https://finance.yahoo.com/news/nvidia-nvda-jus...,https://media.zenfs.com/en/zacks.com/2f0d6efdd...,2024-07-09T13:35:03Z,Nvidia (NVDA) reached a significant support le...,NVDA,27508037f9eaaa429b86228c99b27964,Nvidia (NVDA) reached a significant support le...,Nvidia (NVDA) hit a key support level and brok...
1,"{'id': None, 'name': 'Yahoo Entertainment'}",Soumya Eswaran,Here’s Why L1 Capital International Fund was N...,"L1 Capital, an investment management firm, rel...",https://finance.yahoo.com/news/why-l1-capital-...,https://media.zenfs.com/en/insidermonkey.com/c...,2024-07-10T12:16:53Z,"L1 Capital, an investment management firm, rel...",NVDA,1dd78781ade45ea9b1ec2f187b8d889c,"L1 Capital, an investment management firm, rel...",L1 Capital's second quarter 2024 investor lett...
2,"{'id': None, 'name': 'Yahoo Entertainment'}",Fahad Saleem,Analyst: AI Revolution is Being Led by ‘Godfat...,We recently published a list of 10 Best AI Sto...,https://finance.yahoo.com/news/analyst-ai-revo...,https://media.zenfs.com/en/insidermonkey.com/c...,2024-07-09T11:51:44Z,We recently published a list of 10 Best AI Sto...,NVDA,769bd6185f1bd816865696c2f64cac56,We recently published a list of 10 Best AI Sto...,NVIDIA (NVDA) ranks 2nd on the list of 10 Best...


### Sentiment Analysis

We'll use `ChatPromptTemplate` to send the generated summaries to our LLM for sentiment analysis.

In `ChatPromptTemplate`, messages are categorized into roles to define the context and content of the conversation with the language model. The roles typically include `system` and `user`.

- System Message: Sets the context or role for the language model. It provides instructions or information about how the model should behave during the interaction.
- User Message: Provides the specific query or task for the model to handle, directing its response to meet the user's needs.


Here, we’ll create a `ChatPromptTemplate` named `sentiment_prompt` and use the `analyze_article_sentiment()` function to generate recommendations based on each article summary.

In [12]:
from langchain_core.prompts import ChatPromptTemplate

# Define a prompt template for sentiment analysis
sentiment_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a financial adviser"),  # System message to set context
    ("user", "Analyze the sentiment of the following article about {ticker} and return only one word as your recommendation (BUY, HOLD, or SELL):\n\n{text}")
])

In [14]:
# Build the sentiment analysis chain from the prompt template, LLM, and parser
sentiment_chain = sentiment_prompt | llm | parser

def analyze_article_sentiment(
    articles: List[Dict[str, Union[str, None]]],
    used_field: str = 'summary',
    verbose: bool = False
) -> List[Optional[str]]:
    """
    Analyze the sentiment of articles and provide a recommendation based on the sentiment.

    :param articles: A list of dictionaries representing news articles.
    :param used_field: The field in each article to use for sentiment analysis (default is 'summary').
    :param verbose: If True, prints additional information including the text being analyzed and the resulting recommendation.

    :return: A list of sentiment recommendations, one for each article.
    """
    recommendations = []
    for article in articles:
        # Treat a missing or None field (e.g. a null 'content') as empty text
        text = article.get(used_field) or ''
        if len(text) > 20:
            # Invoke the sentiment chain to get a recommendation
            recommendation = sentiment_chain.invoke({'ticker': article['ticker'], 'text': text}).strip().upper()
        else:
            # If the text is too short, no recommendation is provided
            recommendation = None

        recommendations.append(recommendation)
        if verbose:
            print(f'{used_field}: {text}')  # Print the text being analyzed
            print(f'recommendation: {recommendation}')  # Print the recommendation

    return recommendations

# Analyze sentiment for the list of articles
recommendations = analyze_article_sentiment(articles, used_field='summary', verbose=True)


summary: Nvidia (NVDA) hit a key support level and broke the 20-day moving average, indicating a potential buying opportunity for investors.
recommendation: BUY
summary: L1 Capital's second quarter 2024 investor letter discusses its "L1 Capital International Fund," highlighting insights on investments, including NVDA.
recommendation: BUY
summary: NVIDIA (NVDA) ranks 2nd on the list of 10 Best AI Stocks for H2 2024, warranting further analysis.
recommendation: BUY
summary: Action News Now is investigating claims of potential Shigella infections linked to swimming in Chico's Sycamore Pool following a fire that started Wednesday.
recommendation: SELL
summary: Action News Now investigates claims of potential Shigella infections linked to swimming in Chico's Sycamore Pool, alongside reports of a fire.
recommendation: HOLD
summary: Action News Now is investigating claims of potential Shigella infections linked to swimming in Chico's Sycamore Pool amid a fire incident.
recommendation: SELL
su
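
The prompt asks the model for a single word, but nothing in the code checks that the reply really is `BUY`, `HOLD`, or `SELL`. The sketch below shows one possible way to normalize a raw reply; `normalize_recommendation` is a hypothetical helper, and returning `None` for ambiguous replies is an assumption rather than the notebook's behavior.

```python
import re
from typing import Optional

VALID_RECOMMENDATIONS = ("BUY", "HOLD", "SELL")

def normalize_recommendation(raw: Optional[str]) -> Optional[str]:
    """Map a raw model reply to BUY, HOLD, or SELL, or None if it is ambiguous."""
    if not raw:
        return None
    # Collect the distinct valid labels that appear as whole words in the reply
    found = {word for word in re.findall(r"[A-Z]+", raw.upper()) if word in VALID_RECOMMENDATIONS}
    return found.pop() if len(found) == 1 else None

print(normalize_recommendation("Buy."))         # BUY
print(normalize_recommendation("**HOLD**"))     # HOLD
print(normalize_recommendation("BUY or SELL"))  # None
```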

Let's look at the news articles, generated summaries, and generated recommendations.

In [15]:
df['recommendations'] = recommendations
df

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content,ticker,url_hash,full_text,summary,recommendations
0,"{'id': None, 'name': 'Yahoo Entertainment'}",Zacks Equity Research,Nvidia (NVDA) Just Flashed Golden Cross Signal...,Should investors be excited or worried when a ...,https://finance.yahoo.com/news/nvidia-nvda-jus...,https://media.zenfs.com/en/zacks.com/2f0d6efdd...,2024-07-09T13:35:03Z,Nvidia (NVDA) reached a significant support le...,NVDA,27508037f9eaaa429b86228c99b27964,Nvidia (NVDA) reached a significant support le...,Nvidia (NVDA) hit a key support level and brok...,BUY
1,"{'id': None, 'name': 'Yahoo Entertainment'}",Soumya Eswaran,Here’s Why L1 Capital International Fund was N...,"L1 Capital, an investment management firm, rel...",https://finance.yahoo.com/news/why-l1-capital-...,https://media.zenfs.com/en/insidermonkey.com/c...,2024-07-10T12:16:53Z,"L1 Capital, an investment management firm, rel...",NVDA,1dd78781ade45ea9b1ec2f187b8d889c,"L1 Capital, an investment management firm, rel...",L1 Capital's second quarter 2024 investor lett...,BUY
2,"{'id': None, 'name': 'Yahoo Entertainment'}",Fahad Saleem,Analyst: AI Revolution is Being Led by ‘Godfat...,We recently published a list of 10 Best AI Sto...,https://finance.yahoo.com/news/analyst-ai-revo...,https://media.zenfs.com/en/insidermonkey.com/c...,2024-07-09T11:51:44Z,We recently published a list of 10 Best AI Sto...,NVDA,769bd6185f1bd816865696c2f64cac56,We recently published a list of 10 Best AI Sto...,NVIDIA (NVDA) ranks 2nd on the list of 10 Best...,BUY
3,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,Benchmark Just Raised Its Price Target on Nvid...,Nvidia (NASDAQ:NVDA) stock is up 2% today afte...,https://biztoc.com/x/a7b59e1f609d3569,https://biztoc.com/cdn/a7b59e1f609d3569_s.webp,2024-07-12T17:32:33Z,Nvidia (NASDAQ:NVDA) stock is up 2% today afte...,NVDA,d06ffa7b2fb10fd144b79cb7af2723dd,Action News Now is digging into some of your c...,Action News Now is investigating claims of pot...,SELL
4,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,Benchmark Just Raised Its Price Target on Nvid...,Nvidia (NASDAQ:NVDA) stock is up 2% today afte...,https://biztoc.com/x/ac36231b16803f20,https://biztoc.com/cdn/ac36231b16803f20_s.webp,2024-07-14T10:46:28Z,Nvidia (NASDAQ:NVDA) stock is up 2% today afte...,NVDA,68bd7b80b233f1591ead737d72b70f57,Action News Now is digging into some of your c...,Action News Now investigates claims of potenti...,HOLD
5,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,KeyBanc Just Raised Its Price Target on Nvidia...,Love continues to pour in for Nvidia (NASDAQ:N...,https://biztoc.com/x/0fc674db4b579648,https://biztoc.com/cdn/0fc674db4b579648_s.webp,2024-07-09T20:40:43Z,Love continues to pour in for Nvidia (NASDAQ:N...,NVDA,1fe2d15a6d187c16d8b71164949430a1,Action News Now is digging into some of your c...,Action News Now is investigating claims of pot...,SELL
6,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,NVDA Stock Warning: EU Watchdogs Are Sizing Up...,"InvestorPlace - Stock Market News, Stock Advic...",https://biztoc.com/x/bddbff96bcd53897,https://biztoc.com/cdn/799/og.png,2024-07-08T15:41:52Z,"InvestorPlace - Stock Market News, Stock Advic...",NVDA,b19f7a53db9b3483284a04934269f758,,,
7,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,Rep. Josh Gottheimer Is Buying Up Nvidia (NVDA...,It seems more members of Congress are getting ...,https://biztoc.com/x/f2acd15c3bfe2cd7,https://biztoc.com/cdn/799/og.png,2024-07-10T20:01:25Z,It seems more members of Congress are getting ...,NVDA,ad5c020dc0a9219ac4ccfdf2dd0d6f42,,,
8,"{'id': None, 'name': 'Biztoc.com'}",investorplace.com,UBS Just Raised Its Price Target on Nvidia (NV...,Nvidia (NASDAQ:NVDA) stock is in the green to ...,https://biztoc.com/x/41442d50b9ca9e77,https://biztoc.com/cdn/41442d50b9ca9e77_s.webp,2024-07-08T17:11:59Z,Nvidia (NASDAQ:NVDA) stock is in the green to ...,NVDA,8292f69a39c3982bfdc8098cd5133f0f,,,
9,"{'id': None, 'name': 'ETF Daily News'}",MarketBeat News,GraniteShares 2x Long NVDA Daily ETF (NASDAQ:N...,GraniteShares 2x Long NVDA Daily ETF (NASDAQ:N...,https://www.etfdailynews.com/2024/07/09/granit...,https://www.americanbankingnews.com/wp-content...,2024-07-09T15:06:45Z,GraniteShares 2x Long NVDA Daily ETF (NASDAQ:N...,NVDA,a13a7189b031be71fcb63f99dc5375e8,"\nPosted by MarketBeat News on Jul 9th, 2024\n...",GraniteShares 2x Long NVDA Daily ETF (NVDL) sa...,BUY


### Final recommendation 
To determine the most common sentiment recommendation for a given ticker, we can count the occurrences of each recommendation (buy, sell, hold) and use the most frequent one as the model's financial advice.

In [16]:
# Count the occurrences of each sentiment recommendation
recommendation_counts = Counter(recommendations)

# Print the counts of each recommendation
print(recommendation_counts)

Counter({'BUY': 4, None: 3, 'SELL': 2, 'HOLD': 1})
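
The counts above include `None` for articles that were too short to analyze. A small sketch of picking the most frequent actual recommendation follows; the helper name `most_common_recommendation` is illustrative, and skipping `None` entries is an assumption about the desired behavior.

```python
from collections import Counter
from typing import List, Optional

def most_common_recommendation(recommendations: List[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-None recommendation, or None if there is none."""
    counts = Counter(rec for rec in recommendations if rec is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Same distribution as the counts printed above
sample = ['BUY'] * 4 + [None] * 3 + ['SELL'] * 2 + ['HOLD']
print(most_common_recommendation(sample))  # BUY
```

When two recommendations are tied, `most_common` returns the one that was encountered first, so a tie-breaking rule may be worth adding.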


## Ticker Scores
We'll define a function to convert the recommendations into a numerical score, employing the following logic:

1. **`BUY`**: Assign a value of +1.
2. **`SELL`**: Assign a value of -1.
3. **`HOLD`**: Assign a value of 0.

The function will sum these numerical values to produce a total score based on the list of recommendations. Missing recommendations (`None`) and any unrecognized values count as 0.


In [18]:
def ticker_score(recommendations):
    """
    Convert a list of sentiment recommendations to a single numerical score.

    :param recommendations: List of sentiment recommendations (e.g., ['BUY', 'SELL', 'HOLD', None])
    :return: Sum of the values assigned to each recommendation (BUY=+1, SELL=-1, anything else=0)
    """
    sentiment_map = {
        'BUY': +1,
        'SELL': -1,
        'HOLD': 0,
    }
    
    return sum([sentiment_map.get(rec, 0) for rec in recommendations])


score = ticker_score(recommendations)
print(f'{ticker} score: {score} out of {len(recommendations)}')  


NVDA score: 2 out of 10


### Note:
In this project, we call the LLM twice for each article: first for summarization and then for financial advice recommendations (sentiment analysis). Although we could streamline this by sending the article directly for sentiment analysis to reduce API calls and costs, we chose a two-step process for the following reasons:

- To demonstrate the distinction between `PromptTemplate` and `ChatPromptTemplate` classes.
- To provide article summaries, which are useful for reviewing individual articles that led to specific recommendations.

## Model Performance

To verify the model's performance, we can use backtesting in two ways:

1. **Backtesting for a Single Stock Over Time**: Apply the model’s recommendations to historical data for a single stock across different time periods. This shows how well the recommendations align with the stock’s actual performance, for example whether following the model’s advice would have led to profitable trades.

2. **Backtesting for Multiple Stocks Within the Same Time Period**: Generate recommendations for multiple stocks within the same time period to assess the model’s performance across different stocks under the same market conditions.

Both methods allow us to test the model’s accuracy and reliability, but focusing on one stock offers a detailed look at performance for that specific stock, while evaluating multiple stocks gives a more comprehensive view of the model’s general effectiveness.

### Backtesting for Multiple Stocks 
The free version of the news API restricts article searches to the past 30 days. Therefore, we'll evaluate our model by analyzing articles for multiple stocks within a given week and examining the correlation between the recommendations and the stocks' performance in the following week.


`articles_from_date` and `articles_to_date` define the start and end dates for retrieving articles. In this instance, articles are fetched from June 30, 2024, to July 7, 2024.

In [42]:
# Set the tickers and time period for article analysis.
tickers = ['AAPL', 'NVDA', 'TSLA', 'META', 'JPM']

articles_from_date = '2024-06-30' 
articles_to_date = '2024-07-07'
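
The "following week" can be derived from `articles_to_date` rather than typed by hand. The sketch below assumes the performance window starts the day after the last article date and spans seven days; both the helper name `following_week` and that window definition are assumptions, not taken from the notebook.

```python
from datetime import datetime, timedelta
from typing import Tuple

def following_week(articles_to_date: str, days: int = 7) -> Tuple[str, str]:
    """Return (start, end) dates for the window that begins the day after articles_to_date."""
    end_of_articles = datetime.strptime(articles_to_date, '%Y-%m-%d')
    start = end_of_articles + timedelta(days=1)
    end = start + timedelta(days=days)
    return start.strftime('%Y-%m-%d'), end.strftime('%Y-%m-%d')

print(following_week('2024-07-07'))  # ('2024-07-08', '2024-07-15')
```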

#### Sentiment analysis and Scoring
This code snippet processes news articles for a list of stock tickers, performs sentiment analysis, and calculates sentiment scores:

- **Fetching and Processing Articles**:
   - `get_news_articles(...)`: Fetches news articles related to the ticker within the given date range, limiting the results to 10 articles.
   - `process_articles(...)`: Processes the fetched articles, assigning the ticker to each article and preparing the articles for analysis (in this case, it doesn’t download full text).
   - `cache_articles(...)`: Caches the processed articles as JSON files for future use.

- **Sentiment Analysis**:
   - `analyze_article_sentiment(...)`: Analyzes the sentiment of each article using its `content` field.
   - `ticker_recommendations[ticker] = Counter(recommendations)`: Stores the sentiment recommendations for each ticker, using `Counter` to count the frequency of each recommendation.
   - Note that in this section, we will use the content field of the articles directly from the News API for sentiment analysis. As a result, downloading the full text and generating summaries is not required.
   
- **Scoring**:
   - `ticker_scores[ticker] = ticker_score(recommendations)`: Calculates and stores the numerical sentiment score for each ticker using the `ticker_score` function.


The results are stored in dictionaries for later use, and the recommendations are printed for each ticker.

In [43]:
ticker_recommendations = {}
ticker_scores = {}
for ticker in tickers:
    print(f'processing articles for {ticker} from {articles_from_date} to {articles_to_date}...')
    articles = get_news_articles(q=ticker, from_date=articles_from_date, to_date=articles_to_date, page_size=10)
    articles = process_articles(articles, ticker, download_full_text=False)
    cache_articles(articles)
    recommendations = analyze_article_sentiment(articles, used_field='content')
    ticker_recommendations[ticker] = Counter(recommendations)
    ticker_scores[ticker] = ticker_score(recommendations)    
    print(f'recommendations: {ticker_recommendations[ticker]}')



processing articles for AAPL from 2024-06-30 to 2024-07-07...
recommendations: Counter({'HOLD': 6, 'BUY': 2})
processing articles for NVDA from 2024-06-30 to 2024-07-07...
recommendations: Counter({'BUY': 4, 'HOLD': 4, 'SELL': 2})
processing articles for TSLA from 2024-06-30 to 2024-07-07...
recommendations: Counter({'HOLD': 5, 'BUY': 5})
processing articles for META from 2024-06-30 to 2024-07-07...
recommendations: Counter({'HOLD': 6, 'BUY': 3, 'SELL': 1})
processing articles for JPM from 2024-06-30 to 2024-07-07...
recommendations: Counter({'HOLD': 4, 'BUY': 1})
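
To relate the tallies above to a single number per ticker, here is a minimal sketch that nets BUY counts against SELL counts. The helper `net_buy_score` is hypothetical and is not the notebook's `ticker_score`. It assumes BUY counts as +1, SELL as -1, and HOLD as 0, a rule that happens to reproduce the scores shown for this run in the Model accuracy table.

```python
from collections import Counter

# Recommendation tallies copied from the run output above.
ticker_recommendations = {
    'AAPL': Counter({'HOLD': 6, 'BUY': 2}),
    'NVDA': Counter({'BUY': 4, 'HOLD': 4, 'SELL': 2}),
    'TSLA': Counter({'HOLD': 5, 'BUY': 5}),
    'META': Counter({'HOLD': 6, 'BUY': 3, 'SELL': 1}),
    'JPM': Counter({'HOLD': 4, 'BUY': 1}),
}


def net_buy_score(counts: Counter) -> int:
    """Hypothetical scoring rule (an assumption): BUY = +1, SELL = -1, HOLD = 0."""
    # A Counter returns 0 for missing keys, so tickers without SELL calls are handled.
    return counts['BUY'] - counts['SELL']


net_scores = {ticker: net_buy_score(counts) for ticker, counts in ticker_recommendations.items()}
print(net_scores)
```

Because HOLD contributes nothing under this assumption, a ticker with many HOLD calls and a few BUY calls scores the same as one with only those BUY calls.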


### Retrieve historical stock price data
To assess the performance of each stock ticker in the week following the articles' publication, we'll use the `calculate_performance` function. This function retrieves daily prices with the yfinance library and computes the percentage change between the first and last trading day in the specified period.

In [44]:
def calculate_performance(ticker: str, from_date: str, to_date: str) -> dict:
    """
    Fetch adjusted close prices for a given ticker and calculate performance between two dates.

    :param ticker: Stock ticker symbol.
    :param from_date: Start date for fetching data (format: 'YYYY-MM-DD').
    :param to_date: End date for fetching data (format: 'YYYY-MM-DD').
    :return: Dictionary with the start price, end price, and percentage change.
    """
    
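    # Fetch daily prices. yfinance treats `end` as exclusive, so the last row is the
    # final trading day before to_date. auto_adjust=False keeps the 'Adj Close' column
    # on yfinance releases where adjusted prices are otherwise the default.
    data = yf.download(ticker, start=from_date, end=to_date, auto_adjust=False)['Adj Close']

    # Some yfinance releases return a one-column DataFrame for a single ticker;
    # reduce it to a Series so the prices below are scalars.
    if isinstance(data, pd.DataFrame):
        data = data.squeeze(axis='columns')

    # A percentage change needs at least two trading days of data
    if len(data) < 2:
        return {'start_price': None, 'end_price': None, 'pct_change': None}

    start_price = data.iloc[0]
    end_price = data.iloc[-1]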
    
    # Calculate percentage change
    pct_change = ((end_price - start_price) / start_price) * 100
    
    # Create performance dictionary
    ticker_performance = {
        'start_price': start_price,
        'end_price': end_price,
        'pct_change': pct_change
    }
    
    return ticker_performance
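
The percentage-change step can be checked without a network call. The sketch below applies the same formula to a start and end price. `pct_change_between` is a hypothetical helper used only for illustration, and the sample prices are the AAPL values reported in the results table later in this section.

```python
def pct_change_between(start_price: float, end_price: float) -> float:
    """Percentage change from start_price to end_price."""
    if start_price == 0:
        raise ValueError("start_price must be non-zero")
    return (end_price - start_price) / start_price * 100


# AAPL prices taken from the results table in this section.
aapl_change = pct_change_between(227.820007, 230.539993)
print(round(aapl_change, 4))
```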

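`performance_from_date` and `performance_to_date` specify the start and end dates for evaluating stock performance. In this case, the performance window runs from July 7, 2024, to July 14, 2024. Both dates fall on a Sunday and yfinance excludes the end date, so the prices actually compared are those of Monday, July 8, and Friday, July 12.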

In [45]:
# Define the date range
performance_from_date = '2024-07-07'
performance_to_date = '2024-07-14'

# Calculate performance
ticker_performance = {ticker: calculate_performance(ticker, performance_from_date, performance_to_date) for ticker in tickers}
df_tickers = pd.DataFrame(ticker_performance).T
df_tickers


| | start_price | end_price | pct_change |
|---|---|---|---|
| AAPL | 227.820007 | 230.539993 | 1.193919 |
| NVDA | 128.199997 | 129.240005 | 0.811239 |
| TSLA | 252.940002 | 248.229996 | -1.862104 |
| META | 529.320007 | 498.869995 | -5.752666 |
| JPM | 205.169998 | 204.940002 | -0.1121 |
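
The performance window used above is the week that starts where the article window ends. As a minimal sketch, the helpers below derive that window from the article end date and list the weekdays it covers. Both `following_week` and `weekdays_in_window` are hypothetical names introduced here, the end date is treated as exclusive to mirror yfinance's `end` parameter, and market holidays are not taken into account.

```python
from datetime import date, timedelta


def following_week(articles_to_date: str) -> tuple[str, str]:
    """Return (from_date, to_date) for the 7 days starting at articles_to_date."""
    start = date.fromisoformat(articles_to_date)
    end = start + timedelta(days=7)
    return start.isoformat(), end.isoformat()


def weekdays_in_window(from_date: str, to_date: str) -> list[str]:
    """List Monday-Friday dates in [from_date, to_date); holidays are not excluded."""
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    days = (start + timedelta(days=offset) for offset in range((end - start).days))
    return [day.isoformat() for day in days if day.weekday() < 5]


performance_window = following_week('2024-07-07')
candidate_days = weekdays_in_window(*performance_window)
print(performance_window)
print(candidate_days)
```

For the dates in this section, the candidate days run from Monday, July 8, to Friday, July 12, which is why the start price is that Monday's value rather than a price from July 7.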


### Model accuracy
Finally, we join the stock performance data (`df_tickers`), computed from historical prices, with the sentiment scores (`df_scores`) derived from our analysis of financial articles. The combined table places each ticker's price change next to its score, which lets us compare the model's signal with actual performance across the different tickers for this period.

In [49]:
df_scores = pd.Series(ticker_scores, name='Score')
merged_df = df_tickers.join(df_scores)
merged_df

| | start_price | end_price | pct_change | Score |
|---|---|---|---|---|
| AAPL | 227.820007 | 230.539993 | 1.193919 | 2 |
| NVDA | 128.199997 | 129.240005 | 0.811239 | 2 |
| TSLA | 252.940002 | 248.229996 | -1.862104 | 5 |
| META | 529.320007 | 498.869995 | -5.752666 | 2 |
| JPM | 205.169998 | 204.940002 | -0.1121 | 1 |
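
One way to turn the merged table into a rough accuracy check is to compute the correlation between score and price change, along with how often the sign of the score matches the direction of the move. The sketch below does this in plain Python with the values copied from the table above. Reading a positive score as an "up" call is an assumption made here for illustration, not a rule defined by the notebook.

```python
from math import sqrt

# Values copied from the merged table above: ticker -> (pct_change, Score).
results = {
    'AAPL': (1.193919, 2),
    'NVDA': (0.811239, 2),
    'TSLA': (-1.862104, 5),
    'META': (-5.752666, 2),
    'JPM': (-0.1121, 1),
}


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


pct_changes = [pct for pct, _ in results.values()]
scores = [score for _, score in results.values()]

# Assumption: a positive score is read as an "up" call, anything else as "not up".
hits = sum((score > 0) == (pct > 0) for pct, score in results.values())
directional_agreement = hits / len(results)

print(f'correlation: {pearson(scores, pct_changes):.3f}')
print(f'directional agreement: {directional_agreement:.0%}')
```

For this run, the sign of the score matches the price direction for 2 of the 5 tickers, and the correlation is weakly negative. With five tickers and a single week, the sample is far too small to support conclusions about the model either way.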


### Final remarks:

It’s important to note that sentiment analysis results may not always align with short-term stock performance for several reasons:

- Short-term stock price changes may not immediately reflect sentiment.
- Interpretation of articles can be subjective, even among human analysts.
- Stock performance is influenced by factors beyond the scope of the analyzed articles.
- The articles reviewed may not fully capture current market conditions.

Note: This project is for educational purposes only and should NOT be used as a financial tool or for making trading decisions.
