## Main idea

Combine 
- fundamental,
- technical (including weekly RSI), and
- financial statements analysis

to find stocks that are attractive long-term buy-and-hold candidates. The ideal holding period is 1 to 5 years.

The goal is to hold these positions for the medium to long term without worrying much about allocation, and to sell them (in whole or in part) once they look less attractive.

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from groq import Groq
import re
from datetime import datetime, timedelta

In [None]:
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv('groq_api.env')

In [None]:
# Initialize Groq client
groq_api_key = os.getenv('GROQ_API_KEY')
groq_client = Groq(api_key=groq_api_key)

## Data collection

### TODO
Extend the function below to cover all S&P 500 constituents (or the top 100 by market cap), plus any stocks we currently own.
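
One way to approach this TODO is to separate building the ticker universe from ranking it. The sketch below is a minimal illustration, assuming the index constituents are already available as a list (for example, loaded from a CSV file). `normalize_ticker` and `build_universe` are hypothetical helper names, not part of yfinance.

```python
def normalize_ticker(symbol):
    """Convert a ticker to the form Yahoo Finance expects (e.g. 'BRK.B' -> 'BRK-B')."""
    return symbol.strip().upper().replace('.', '-')


def build_universe(index_tickers, owned_tickers=()):
    """Merge index constituents with owned tickers, keeping order and dropping duplicates."""
    universe = []
    seen = set()
    for symbol in list(index_tickers) + list(owned_tickers):
        ticker = normalize_ticker(symbol)
        if ticker and ticker not in seen:
            seen.add(ticker)
            universe.append(ticker)
    return universe


# Hypothetical inputs: a few index constituents plus two current holdings
candidates = build_universe(['AAPL', 'MSFT', 'BRK.B'], owned_tickers=['msft', 'NEM'])
print(candidates)
```

The resulting list could then be passed to the market-cap ranking step in place of the hard-coded ticker list.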

In [None]:
def get_top_sp500_stocks(n=20):
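    # Hand-picked candidate tickers for demonstration; not all are S&P 500 constituents (e.g. BABA, BIDU)
    # Yahoo Finance uses a hyphen for share classes, so Berkshire Hathaway class B is 'BRK-B', not 'BRK.B'
    tickers = ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'GOOG', 'META', 'TSLA', 'BRK-B', 'JPM', 'V', 'DASH', 'NFLX', 'DIS', 'SBUX', 'BABA', 'NVDA', 'BIDU', 'XOM', 'PEG', 'CEG', 'BWXT', 'NEM', 'GFI', 'HMY', 'CVX', 'AVGO', 'HD', 'PG', 'WMT', 'JNJ', 'ABBV']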
    
    # Get market cap for each stock
    market_caps = {}
    for ticker in tickers:
        stock = yf.Ticker(ticker)
        # 'marketCap' can be missing or None; fall back to 0 so the sort below does not fail
        market_caps[ticker] = stock.info.get('marketCap') or 0
    
    # Sort by market cap and get top n
    top_stocks = sorted(market_caps.items(), key=lambda x: x[1], reverse=True)[:n]
    return [stock[0] for stock in top_stocks]

In [None]:
top_stocks = get_top_sp500_stocks(20)
pd.DataFrame(top_stocks, columns=['Ticker']).to_csv('top_20_stocks.csv', index=False)

## Retrieve financial data

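Retrieve historical price data for the past five years, along with the annual income statements, for each stock. The date range applies only to the price history; `stock.financials` returns whatever annual statements Yahoo Finance provides, typically about the last four fiscal years.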

In [None]:
def get_financial_data(ticker, start_date, end_date):
    stock = yf.Ticker(ticker)
    
    # Get historical price data
    price_data = stock.history(start=start_date, end=end_date)
    
    # Get income statement
    income_statement = stock.financials
    
    return {
        "price_data": price_data,
        "income_statement": income_statement
    }

In [None]:
start_date = datetime.now() - timedelta(days=5*365)
end_date = datetime.now()

all_data = {}
for ticker in top_stocks:
    all_data[ticker] = get_financial_data(ticker, start_date, end_date)

In [None]:
# all_data

In [None]:
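def format_income_statement_for_llm(income_statement_column):
    # Render each line item as "Name: value", one per line, for inclusion in the prompt
    lines = []
    for index, value in income_statement_column.items():
        # np.number covers NumPy scalars (e.g. np.int64) that are not instances of int or float
        formatted_value = f"{value:,.2f}" if isinstance(value, (int, float, np.number)) else str(value)
        lines.append(f"{index}: {formatted_value}")
    return "\n".join(lines)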

In [None]:
# Example usage
for ticker, data in all_data.items():
    current_year = data['income_statement'].columns[0]
    formatted_statement = format_income_statement_for_llm(data['income_statement'][current_year])
    print(f"Formatted Income Statement for {ticker}:\n{formatted_statement}\n")

## Prompt for income statement evaluation

### TODO

Decide which of the financial metrics available in yfinance the prompt should use, and update the prompt accordingly.
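
Several of the criteria in the prompts below are simple ratios, so one option is to compute them in pandas and pass the results to the LLM rather than asking it to do the arithmetic. The following is a sketch only: the row labels (`Total Revenue`, `Gross Profit`, `Operating Income`, `Net Income`) are assumed to match the index of `stock.financials`, and `compute_margin_metrics` is a hypothetical helper.

```python
import pandas as pd


def compute_margin_metrics(current, previous):
    """Compute basic growth and margin ratios from two income statement columns.

    Both arguments are pandas Series indexed by line-item name.
    Missing line items yield NaN instead of raising an error.
    """
    def item(series, label):
        # Series.get returns the default when the label is absent
        return series.get(label, float('nan'))

    revenue = item(current, 'Total Revenue')
    prev_revenue = item(previous, 'Total Revenue')
    return {
        'revenue_growth': revenue / prev_revenue - 1,
        'gross_margin': item(current, 'Gross Profit') / revenue,
        'operating_margin': item(current, 'Operating Income') / revenue,
        'net_margin': item(current, 'Net Income') / revenue,
    }


# Made-up figures for illustration only
current = pd.Series({'Total Revenue': 200.0, 'Gross Profit': 80.0,
                     'Operating Income': 50.0, 'Net Income': 40.0})
previous = pd.Series({'Total Revenue': 160.0})
metrics = compute_margin_metrics(current, previous)
print(metrics)
```

Precomputed ratios like these could be appended to the formatted statement text, which leaves the LLM to judge the numbers rather than derive them.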

In [None]:
def create_prompt_for_income_statement_v0(current_year_income_statement, previous_year_income_statement):
    prompt = f"""
Evaluate the following income statements for the current year and the previous year. Provide a score between 0 and 10 for each criterion, where 0 is very poor and 10 is excellent. Consider criteria such as revenue growth, profitability, operating efficiency, and earnings quality. Additionally, provide an overall score based on the average of the criteria scores.

Income Statement for the Current Year:
{current_year_income_statement}

Income Statement for the Previous Year:
{previous_year_income_statement}

Criteria for Evaluation:
1. Revenue Growth: Analyze the growth in revenue compared to the previous year.
2. Gross Profit Margin: Calculate as Gross Profit / Total Revenue.
3. Operating Margin: Calculate as Operating Income / Total Revenue.
4. Net Profit Margin: Calculate as Net Income / Total Revenue.
5. EPS Growth: Compare EPS to the previous year.
6. Operating Efficiency: Consider Operating Expense relative to Total Revenue.
7. Interest Coverage Ratio: Calculate as EBIT / Interest Expense.

Provide the score for each criterion and an overall score. Include explanations for each score.
    """
    return prompt

def create_prompt_for_income_statement(current_year_income_statement, previous_year_income_statement):
    prompt = f"""
Evaluate the following income statements for the current year and the previous year. Provide a score between 0 and 10 for each criterion, where 0 is very poor and 10 is excellent. Consider criteria such as revenue growth, profitability, operating efficiency, and earnings quality. Additionally, provide an overall score based on the average of the criteria scores.

Income Statement for the Current Year:
{current_year_income_statement}

Income Statement for the Previous Year:
{previous_year_income_statement}

Criteria for Evaluation:
1. Revenue Growth: Analyze the growth in revenue compared to the previous year.
2. Gross Profit Margin: Calculate as Gross Profit / Total Revenue.
3. Operating Margin: Calculate as Operating Income / Total Revenue.
4. Net Profit Margin: Calculate as Net Income / Total Revenue.
5. EPS Growth: Analyze the growth in Earnings Per Share (EPS) compared to the previous year.
6. Operating Efficiency: Consider Operating Expense relative to Total Revenue.
7. Interest Coverage Ratio: Calculate as Earnings Before Interest and Taxes (EBIT) / Interest Expense.

Example Format:
1. Revenue Growth: 8.5
2. Gross Profit Margin: 7.0
3. Operating Margin: 6.5
4. Net Profit Margin: 7.0
5. EPS Growth: 8.0
6. Operating Efficiency: 7.5
7. Interest Coverage Ratio: 9.0
Overall Score: 7.5

Provide the score for each criterion and an overall score. Include explanations for each score.
    """
    return prompt

In [None]:
def evaluate_income_statements_llm(current_year_income_statement, previous_year_income_statement):
    prompt = create_prompt_for_income_statement(current_year_income_statement, previous_year_income_statement)
    response = groq_client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama-3.1-8b-instant",
        temperature=0.2,
        max_tokens=1000
    )
    analysis = response.choices[0].message.content.strip()
    # Accept integer or decimal scores and flexible whitespace (e.g. "Overall Score: 8")
    score = re.search(r"Overall Score:\s*(\d+(?:\.\d+)?)", analysis)
    return float(score.group(1)) if score else None

In [None]:
def process_llm_output_v0(llm_output):
    # Extract individual criterion scores
    criterion_scores = re.findall(r"(\d+\.\s*[\w\s]+):\s*(\d+(?:\.\d+)?)", llm_output)
    
    # Extract overall score
    overall_score = re.search(r"Overall Score: (\d+(?:\.\d+)?)", llm_output)
    
    return {
        'criterion_scores': dict(criterion_scores),
        'overall_score': float(overall_score.group(1)) if overall_score else None,
        'full_analysis': llm_output
    }

def process_llm_output(llm_output):
    # Extract individual criterion scores
    criterion_scores = re.findall(r"(\d+\.\s*[\w\s]+):\s*(\d+(?:\.\d+)?)", llm_output)
    
    # Create a dictionary to store the scores
    criterion_scores_dict = {score[0].strip(): float(score[1]) for score in criterion_scores}
    
    # Extract overall score
    overall_score = re.search(r"Overall Score:\s*(\d+(?:\.\d+)?)", llm_output)
    
    return {
        'criterion_scores': criterion_scores_dict,
        'overall_score': float(overall_score.group(1)) if overall_score else None,
        'full_analysis': llm_output
    }
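
LLM responses do not always follow the example format exactly. A model may, for instance, wrap labels in Markdown bold (`**Overall Score:** 7.5`), which the patterns above would not match. The sketch below shows one possible way to tolerate that by stripping asterisks before parsing. `parse_scores` is a hypothetical alternative and has not been checked against real model output.

```python
import re


def parse_scores(llm_output):
    """Extract per-criterion scores and the overall score from an LLM response."""
    # Drop Markdown emphasis markers so '**Overall Score:** 7.5' parses like plain text
    text = llm_output.replace('*', '')

    # Match lines such as '1. Revenue Growth: 8.5' (one criterion per line)
    criterion_pattern = re.compile(
        r"^\s*\d+\.\s*([A-Za-z ]+?)\s*:\s*(\d+(?:\.\d+)?)", re.MULTILINE
    )
    criterion_scores = {name: float(value) for name, value in criterion_pattern.findall(text)}

    overall = re.search(r"Overall Score\s*:\s*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    return {
        'criterion_scores': criterion_scores,
        'overall_score': float(overall.group(1)) if overall else None,
    }


sample = """**1. Revenue Growth:** 8.5
2. Gross Profit Margin: 7
**Overall Score:** 7.5"""
result = parse_scores(sample)
print(result)
```

Note that a line whose explanation begins with a number right after the colon would also be read as a score, so the prompt should keep scores and explanations on separate lines.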

## LLM stock evaluation

### TODO 

Add technical analysis and any other components, then combine them with the overall scores into a composite score (e.g. using combined or weighted ranking).
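
As a starting point for this TODO, the sketch below computes a weekly RSI from daily closes and blends it with the LLM score through a weighted rank. It is an illustration under several assumptions: the RSI uses a simple rolling mean over 14 weeks rather than Wilder's smoothing, a lower RSI is treated as more attractive, and the 0.7/0.3 weights and the helper names `weekly_rsi` and `composite_rank` are placeholders.

```python
import numpy as np
import pandas as pd


def weekly_rsi(daily_close, period=14):
    """RSI on weekly closes, using simple rolling means of gains and losses."""
    weekly_close = daily_close.resample('W').last().dropna()
    delta = weekly_close.diff()
    avg_gain = delta.clip(lower=0).rolling(period).mean()
    avg_loss = (-delta).clip(lower=0).rolling(period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def composite_rank(llm_scores, rsi_values, llm_weight=0.7, rsi_weight=0.3):
    """Blend percentile ranks: higher LLM score is better, lower RSI is better (assumption)."""
    llm_rank = llm_scores.rank(pct=True)
    rsi_rank = rsi_values.rank(pct=True, ascending=False)
    composite = llm_weight * llm_rank + rsi_weight * rsi_rank
    return composite.sort_values(ascending=False)


# Synthetic oscillating prices for illustration only
dates = pd.date_range('2023-01-02', periods=300, freq='B')
prices = pd.Series(100 + 10 * np.sin(np.arange(300) / 10), index=dates)
rsi = weekly_rsi(prices)

# Made-up scores and RSI readings for three placeholder tickers
llm_scores = pd.Series({'AAA': 8.0, 'BBB': 6.0, 'CCC': 7.0})
rsi_values = pd.Series({'AAA': 75.0, 'BBB': 40.0, 'CCC': 55.0})
ranking = composite_rank(llm_scores, rsi_values)
print(ranking)
```

Ranking on percentiles rather than raw values keeps the two components on a comparable scale, so the weights express relative importance directly.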

In [None]:
def evaluate_stock(ticker, start_date, end_date):
    data = get_financial_data(ticker, start_date, end_date)
    income_statement = data['income_statement']
    
    scores = []
    for i in range(len(income_statement.columns) - 1):
        current_year = format_income_statement_for_llm(income_statement.iloc[:, i])
        previous_year = format_income_statement_for_llm(income_statement.iloc[:, i+1])
        score = evaluate_income_statements_llm(current_year, previous_year)
        scores.append((income_statement.columns[i].year, score))
    
    return pd.DataFrame(scores, columns=['Year', 'Score'])

Calculate scores for all of our stocks

In [None]:
all_scores = []
for ticker in top_stocks:
    print(f"Evaluating {ticker}...")
    scores = evaluate_stock(ticker, start_date, end_date)
    scores['Ticker'] = ticker
    all_scores.append(scores)

all_scores = pd.concat(all_scores)

In [None]:
all_scores.tail(40)

## Visualizing scores

In [None]:
# Pivot the table
pivoted_scores = all_scores.pivot(index='Year', columns='Ticker', values='Score')

# Sort the index (Year) in descending order to have the most recent year first
pivoted_scores = pivoted_scores.sort_index(ascending=False)

# Keep only the 3 most recent years
pivoted_scores = pivoted_scores.head(3)

# Reset the index to make 'Year' a regular column
pivoted_scores = pivoted_scores.reset_index()

# Save the pivoted DataFrame to a CSV file
pivoted_scores.to_csv('pivoted_stock_scores.csv', index=False)

print(pivoted_scores)

## Backtesting