## 📊 **Financial Report Generation with Economic Indicators**

### **Overview**
This project focuses on creating a concise financial report for companies or stocks using the latest economic and market data. By leveraging open-source tools and APIs, we aim to simplify the process without relying on training or fine-tuning large language models (LLMs) or machine learning models.

---

### **Objectives**
- Build a financial report using real-time economic indicators from the **Financial Modeling Prep API**.
- Streamline data processing and retrieval to produce accurate and actionable insights.
- Avoid the computational overhead of training custom AI models by utilizing pre-trained open-source models.

---

### **Methodology**
1. **Data Retrieval**:  
   Fetch the latest company metrics and market economic indicators using the Financial Modeling Prep API.

2. **Data Preprocessing**:  
   Process the retrieved data using Python and save it in a structured CSV format.

3. **Vector Database**:  
   Load the processed data into a vector database using an embedding model from Hugging Face.

4. **RAG QA Chain**:  
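   Build a Retrieval-Augmented Generation (RAG) architecture with **LangChain** and an open-source LLM served through the Hugging Face Hub. The notebook keeps commented-out configurations for **Falcon 7B Instruct** and Mixtral 8x22B Instruct; the recorded run uses **Llama 3.2 1B Instruct**.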

5. **Evaluation**:  
   Query the RAG system and evaluate the quality and relevance of the responses.


### Installing Dependencies and Packages

#### Dependencies


- Install Anaconda from [Anaconda](https://www.anaconda.com/download/success)
- Create a conda virtual environment with Python: `conda create -n finance-venv python`
- Activate the conda virtual environment `conda activate finance-venv`
- Install Rust from [Rust](https://rustup.rs/) 
- Install transformers from conda with `conda install -c huggingface transformers`
- Install sentence-transformers from conda with `conda install -c conda-forge sentence-transformers`


#### Python Packages
- langchain
- langchain-community
- langchain-core
- pandas
- python-dotenv
- torch
- torchvision
- torchaudio
- chromadb
- sentence-transformers

In [None]:
%pip install langchain langchain-community langchain-core pandas python-dotenv chromadb

In [None]:
%pip install --upgrade --force-reinstall torch torchvision torchaudio

### Importing Packages

In [6]:
import json
import os
import ssl
import warnings
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

import pandas as pd
from dotenv import load_dotenv
from IPython.display import Markdown, display
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import HuggingFaceHub
from langchain_community.vectorstores import Chroma

# Silences all warnings, including LangChain deprecation notices; remove this line to see them
warnings.filterwarnings('ignore')


### Settings for Financial Modeling Prep

- Create an account on [Financial Modeling Prep](https://site.financialmodelingprep.com/)
- Create a file **.env** in the project folder
- Set the API key in this file as `FINANCIAL_MODELING_PREP_API_KEY=YOUR_KEY`

In [7]:
load_dotenv()

API_KEY = os.getenv("FINANCIAL_MODELING_PREP_API_KEY")
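`os.getenv` returns `None` when the key is missing, and that only surfaces later as an authentication error from the API. A minimal sketch of a fail-fast check, assuming the variable names used in this notebook; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, failing fast if it is unset or empty.

    Hypothetical helper: python-dotenv only loads .env into os.environ; it does not validate keys.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file or the environment")
    return value


# Usage, after load_dotenv():
# API_KEY = require_env("FINANCIAL_MODELING_PREP_API_KEY")
```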

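### Data Retrieval
This section fetches the latest quote data for a specific stock ticker (price, daily and yearly ranges, market capitalization, volume, EPS, P/E ratio, next earnings announcement, shares outstanding) from the **Financial Modeling Prep API**. The first cell retrieves the list of available stocks; `get_economic_data` then queries the search endpoint for NSE-listed tickers and the quote endpoint for every other exchange, and the JSON result is loaded into a pandas DataFrame for further analysis.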
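`get_economic_data` below picks the search endpoint for NSE tickers and the quote endpoint for every other exchange. That choice can be isolated in a small function. The sketch below is a hypothetical `build_fmp_url` helper that uses `urllib.parse` so the ticker and API key are URL-encoded; it reuses the two endpoint paths from this notebook and makes no network request.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://financialmodelingprep.com/api/v3"


def build_fmp_url(ticker: str, exchange: str, api_key: str) -> str:
    """Build the request URL: search endpoint for NSE, quote endpoint otherwise (hypothetical helper)."""
    if exchange == "NSE":
        params = urlencode({"query": ticker, "exchange": "NSE", "apikey": api_key})
        return f"{BASE_URL}/search?{params}"
    # Percent-encode the path segment so symbols such as "^GSPC" remain a valid URL
    return f"{BASE_URL}/quote/{quote(ticker, safe='')}?{urlencode({'apikey': api_key})}"


print(build_fmp_url("NVDA", "US", "YOUR_KEY"))
# https://financialmodelingprep.com/api/v3/quote/NVDA?apikey=YOUR_KEY
```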


In [None]:
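url = f"https://financialmodelingprep.com/api/v3/stock/list?apikey={API_KEY}"
stock_list = None  # stays None if the request fails

try:
    # Create an SSL context that verifies certificates with the system defaults
    ssl_context = ssl.create_default_context()

    # Fetch, decode, and parse the JSON list of available stocks
    with urlopen(url, context=ssl_context) as response:
        stock_list = json.loads(response.read().decode("utf-8"))

except HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except URLError as e:
    print(f"URL Error: {e.reason}")
except json.JSONDecodeError as e:
    print(f"JSON Decode Error: {e.msg}")
except Exception as e:
    print(f"Unexpected error: {str(e)}")

# Preview the first entries instead of echoing the whole list
stock_list[:5] if stock_list else stock_list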

In [8]:
TICKER = "NVDA"
EXCHANGE = "US"


def get_economic_data(ticker, exchange):
    """Fetch the latest quote data for `ticker` from Financial Modeling Prep.

    NSE tickers go through the search endpoint; every other exchange uses the
    quote endpoint. Returns the parsed JSON list, or an empty list on failure.
    """
    if exchange == "NSE":
        url = f"https://financialmodelingprep.com/api/v3/search?query={ticker}&exchange=NSE&apikey={API_KEY}"
    else:
        url = f"https://financialmodelingprep.com/api/v3/quote/{ticker}?apikey={API_KEY}"

    try:
        # Create an SSL context that verifies certificates with the system defaults
        ssl_context = ssl.create_default_context()

        # Fetch, decode, and parse the JSON response
        with urlopen(url, context=ssl_context) as response:
            return json.loads(response.read().decode("utf-8"))

    except HTTPError as e:
        print(f"HTTP Error: {e.code} - {e.reason}")
    except URLError as e:
        print(f"URL Error: {e.reason}")
    except json.JSONDecodeError as e:
        print(f"JSON Decode Error: {e.msg}")
    except Exception as e:
        print(f"Unexpected error: {str(e)}")

    # Reached only after an error; an empty list yields an empty DataFrame
    return []


economic_data_json = get_economic_data(TICKER, EXCHANGE)
economic_data_df = pd.DataFrame(economic_data_json)
economic_data_df

| | symbol | name | price | changesPercentage | change | dayLow | dayHigh | yearHigh | yearLow | marketCap | ... | exchange | volume | avgVolume | open | previousClose | eps | pe | earningsAnnouncement | sharesOutstanding | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NVDA | NVIDIA Corporation | 140.1 | 2.2553 | 3.09 | 134.02 | 140.27 | 152.89 | 47.32 | 3431049000000 | ... | NASDAQ | 107893729 | 224002616 | 134.83 | 137.01 | 2.54 | 55.16 | 2025-02-26T21:00:00.000+0000 | 24490000000 | 1735582862 |
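Several quote fields are derived from others, which gives a cheap sanity check on the retrieved data. For the NVDA row above, `price × sharesOutstanding` = 140.1 × 24,490,000,000 = 3,431,049,000,000 (the reported `marketCap`), `price / eps` = 140.1 / 2.54 ≈ 55.16 (the reported `pe`), and `price - previousClose` = 140.1 - 137.01 = 3.09 (the reported `change`). A minimal sketch of these checks on one record of the raw JSON; `check_quote_consistency` and its tolerances are illustrative assumptions, not part of the FMP API.

```python
import math


def check_quote_consistency(q: dict) -> dict:
    """Cross-check derived fields of a single FMP quote record (tolerances are illustrative)."""
    change = q["price"] - q["previousClose"]
    return {
        "marketCap": math.isclose(q["price"] * q["sharesOutstanding"], q["marketCap"], rel_tol=1e-3),
        "pe": math.isclose(q["price"] / q["eps"], q["pe"], abs_tol=0.01),
        "change": math.isclose(change, q["change"], abs_tol=0.01),
        "changesPercentage": math.isclose(100 * change / q["previousClose"], q["changesPercentage"], abs_tol=0.01),
    }


# Values copied from the NVDA row above; in the notebook you could pass economic_data_json[0]
nvda = {
    "price": 140.1, "previousClose": 137.01, "change": 3.09, "changesPercentage": 2.2553,
    "eps": 2.54, "pe": 55.16, "sharesOutstanding": 24490000000, "marketCap": 3431049000000,
}
print(check_quote_consistency(nvda))
```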


### Preprocessing Data

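Convert the `earningsAnnouncement` (ISO 8601 string) and `timestamp` (Unix epoch in seconds) columns to timezone-aware datetimes. The `unit='s'` argument matters: by default `pd.to_datetime` interprets integers as nanoseconds, which turns `1735582862` into `1970-01-01 00:00:01.735582862`.

In [9]:
def preprocess_economic_data(df):
    # FMP returns `timestamp` as Unix seconds; without unit='s' pandas assumes nanoseconds
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s', utc=True)
    # `earningsAnnouncement` is an ISO 8601 string with a +0000 offset
    df['earningsAnnouncement'] = pd.to_datetime(df['earningsAnnouncement'])
    return df

preprocessed_economic_data_df = preprocess_economic_data(economic_data_df)
preprocessed_economic_data_df

| | symbol | name | price | changesPercentage | change | dayLow | dayHigh | yearHigh | yearLow | marketCap | ... | exchange | volume | avgVolume | open | previousClose | eps | pe | earningsAnnouncement | sharesOutstanding | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NVDA | NVIDIA Corporation | 140.1 | 2.2553 | 3.09 | 134.02 | 140.27 | 152.89 | 47.32 | 3431049000000 | ... | NASDAQ | 107893729 | 224002616 | 134.83 | 137.01 | 2.54 | 55.16 | 2025-02-26 21:00:00+00:00 | 24490000000 | 2024-12-30 18:21:02+00:00 |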
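To see why the unit matters, the two interpretations of the raw `timestamp` can be reproduced with the standard library. This is an illustrative sketch, not how pandas works internally: `fromtimestamp` treats the integer as seconds, while dividing by 1,000 to get microseconds mimics pandas' default nanosecond interpretation.

```python
from datetime import datetime, timedelta, timezone

raw_timestamp = 1735582862                      # `timestamp` returned by the quote endpoint
raw_earnings = "2025-02-26T21:00:00.000+0000"   # `earningsAnnouncement` string

# Correct: seconds since the Unix epoch
as_seconds = datetime.fromtimestamp(raw_timestamp, tz=timezone.utc)

# Wrong: treating the integer as nanoseconds lands under two seconds after the epoch
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
as_nanoseconds = epoch + timedelta(microseconds=raw_timestamp / 1000)

# ISO 8601 string, including its +0000 offset
earnings = datetime.strptime(raw_earnings, "%Y-%m-%dT%H:%M:%S.%f%z")

print(as_seconds)       # 2024-12-30 18:21:02+00:00
print(as_nanoseconds)   # 1970-01-01 00:00:01.735583+00:00
print(earnings)         # 2025-02-26 21:00:00+00:00
```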


### Storing Preprocessed Data

Storing the preprocessed data as a CSV file

In [None]:
os.makedirs("data/processed", exist_ok=True)  # to_csv does not create missing folders
preprocessed_economic_data_df.to_csv("data/processed/eco_ind.csv")
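With the default `index=True`, `to_csv` writes the DataFrame index as a first column with an empty header. `CSVLoader` renders each row as `column: value` lines, so that unnamed column becomes the `: 0` line visible in the retrieved context further down. The sketch below mimics that format with the standard `csv` module (`row_to_text` is a hypothetical helper, not LangChain's implementation) and shows that dropping the index removes the line; in pandas that corresponds to `to_csv(..., index=False)`.

```python
import csv
import io


def row_to_text(csv_text: str) -> list[str]:
    """Render each CSV row as 'column: value' lines, mimicking the page_content seen from CSVLoader."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return ["\n".join(f"{key.strip()}: {value.strip()}" for key, value in row.items()) for row in reader]


with_index = ",symbol,name\n0,NVDA,NVIDIA Corporation\n"   # shape of the default to_csv output
without_index = "symbol,name\nNVDA,NVIDIA Corporation\n"   # shape of to_csv(index=False)

print(repr(row_to_text(with_index)[0]))     # ': 0\nsymbol: NVDA\nname: NVIDIA Corporation'
print(repr(row_to_text(without_index)[0]))  # 'symbol: NVDA\nname: NVIDIA Corporation'
```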

### Embeddings

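Load the CSV as LangChain documents (one per row), split them into small chunks, and initialize the Hugging Face embedding model.

In [21]:
# LangChain's CSVLoader creates one Document per row, formatted as "column: value" lines
csv_loader = CSVLoader('data/processed/eco_ind.csv')
documents = csv_loader.load()

# Split each row into chunks of at most 50 characters with a 5-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)
splitted_documents = text_splitter.split_documents(documents)

# Sentence-transformers model; all-mpnet-base-v2 is the class default, made explicit here
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")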
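To get a feel for what `chunk_size=50, chunk_overlap=5` does to a single row, here is a simplified fixed-window splitter. It is not LangChain's `RecursiveCharacterTextSplitter`, which first tries to break on separators such as newlines and spaces, but the size and overlap limits are analogous: a row with about two dozen fields is cut into many short chunks, so any single chunk carries only one or two fields. The sample `row` is a shortened, hypothetical rendering of the NVDA record.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` chars, each sharing `chunk_overlap` chars with the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


row = "symbol: NVDA\nname: NVIDIA Corporation\nprice: 140.1\npe: 55.16\nmarketCap: 3431049000000"
for chunk in split_fixed(row, chunk_size=50, chunk_overlap=5):
    print(repr(chunk))
```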


### Vector Database

Initialize a Chroma vector database and store the embeddings of the document chunks in it.

In [22]:
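persist_directory = 'docs/chroma_rag/'

# Embed the chunks and store them in a persistent Chroma collection.
# Re-running this cell against the same directory adds the chunks again,
# so the collection can end up holding duplicates.
vectordb = Chroma.from_documents(
    documents=splitted_documents,
    collection_name="economic_data",
    embedding=embeddings,
    persist_directory=persist_directory
)

# Only needed for chromadb < 0.4; newer versions persist to disk automatically
vectordb.persist()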
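The retrieved context further down contains the same chunk twice. One likely cause is that `Chroma.from_documents` adds documents under freshly generated IDs, so re-running the cell against the same `persist_directory` stores duplicates that can then fill both `k=2` slots. The toy in-memory store below illustrates that effect. It uses bag-of-words vectors and cosine similarity in place of a neural embedding model and Chroma's index, and its `dedupe` option is a hypothetical addition, not a Chroma parameter.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased token counts, standing in for a sentence-transformers model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, texts: list[str]) -> None:
        # No ID check: adding the same text twice stores it twice
        self.items.extend((text, embed(text)) for text in texts)

    def search(self, query: str, k: int = 2, dedupe: bool = False) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        results: list[str] = []
        for text, _ in ranked:
            if dedupe and text in results:
                continue
            results.append(text)
            if len(results) == k:
                break
        return results


chunks = ["symbol: NVDA name: NVIDIA Corporation", "price: 140.1 pe: 55.16", "marketCap: 3431049000000"]
store = ToyVectorStore()
store.add(chunks)
store.add(chunks)  # simulates re-running the cell
print(store.search("Generate a financial report for NVIDIA"))
print(store.search("Generate a financial report for NVIDIA", dedupe=True))
```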

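### Settings for the Hugging Face Hub API

- Create an access token in your Hugging Face account settings
- Add it to the same **.env** file as `HUGGINGFACEHUB_API_KEY=YOUR_TOKEN`
- Request access to the gated `meta-llama/Llama-3.2-1B-Instruct` model on its Hugging Face page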

In [23]:
load_dotenv()

HUGGINGFACEHUB_API_KEY = os.getenv("HUGGINGFACEHUB_API_KEY")

### RAG Pipeline

Building the Retrieval-Augmented Generation pipeline

In [34]:
# Initializing the LLM (alternative models that were tried are left commented out)
# llm = HuggingFaceHub(
#     repo_id="tiiuae/falcon-7b-instruct",
#     model_kwargs={"temperature": 0.1},
#     huggingfacehub_api_token=HUGGINGFACEHUB_API_KEY
# )

# llm = HuggingFaceHub(
#     repo_id="mistralai/Mixtral-8x22B-Instruct-v0.1",
#     model_kwargs={"temperature": 0.1, "max_new_tokens": 512},
#     huggingfacehub_api_token=HUGGINGFACEHUB_API_KEY
# )

# Text-generation endpoints cap the output length with `max_new_tokens`
llm = HuggingFaceHub(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",
    model_kwargs={"temperature": 0.1, "max_new_tokens": 512},
    huggingfacehub_api_token=HUGGINGFACEHUB_API_KEY
)


# Initializing the retriever for the RAG pipeline (top-2 most similar chunks)
retriever = vectordb.as_retriever(search_kwargs={"k": 2})

# Template prompt for RAG pipeline
template = """You are a Financial Market Expert. Using the provided market information: {context}, generate a financial report and answer this query: {question}."""

# Initialize prompt template
PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

user_prompt = "Generate a financial report for NVIDIA"

# Debug: inspect which chunks the retriever returns for this query
retrieved_context = retriever.invoke(user_prompt)
print("Retrieved Context:", retrieved_context)

# "stuff" chain: all retrieved chunks are inserted into {context} in a single prompt
retrieval_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    chain_type_kwargs={"prompt": PROMPT},
    retriever=retriever
)

# Query the model
llm_response = retrieval_chain.invoke({"query": user_prompt})
print("LLM Response:", llm_response)


Retrieved Context: [Document(metadata={'row': 0, 'source': 'data/processed/eco_ind.csv'}, page_content=': 0\nsymbol: NVDA\nname: NVIDIA Corporation'), Document(metadata={'row': 0, 'source': 'data/processed/eco_ind.csv'}, page_content=': 0\nsymbol: NVDA\nname: NVIDIA Corporation')]
LLM Response: {'query': 'Generate a financial report for NVIDIA', 'result': 'You are a Financial Market Expert. Using the provided market information: : 0\nsymbol: NVDA\nname: NVIDIA Corporation\n\n: 0\nsymbol: NVDA\nname: NVIDIA Corporation, generate a financial report and answer this query: Generate a financial report for NVIDIA. The report should include the current market price, the current stock price, the dividend yield, the price-to-earnings ratio, and the dividend payout ratio.\n\n: 1\nsymbol: NVDA\nname: NVIDIA Corporation, here is the financial report:\n\n**Current Market Price:** $430.00\n**Current Stock Price:** $430.00\n**Dividend Yield:** 2.1%\n**Price-to-Earnings Ratio:** 34.5\n**Dividend Payou
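The `"stuff"` chain type concatenates the retrieved chunks and substitutes them into `{context}`; the echoed prompt above shows the two chunks joined by a blank line. A minimal sketch of that prompt assembly with plain string formatting, reusing this notebook's template. The `stuff_prompt` helper and its `dedupe` flag are hypothetical, not LangChain API. Because the model sees only this string, figures absent from the chunks (here the price and P/E ratio) cannot come from the retrieved data.

```python
TEMPLATE = (
    "You are a Financial Market Expert. Using the provided market information: {context}, "
    "generate a financial report and answer this query: {question}."
)


def stuff_prompt(chunks: list[str], question: str, template: str = TEMPLATE, dedupe: bool = True) -> str:
    """Join retrieved chunks with blank lines and fill the prompt template."""
    if dedupe:
        chunks = list(dict.fromkeys(chunks))  # drop exact duplicates, keep retrieval order
    return template.format(context="\n\n".join(chunks), question=question)


retrieved = [": 0\nsymbol: NVDA\nname: NVIDIA Corporation"] * 2  # the duplicated chunk shown above
print(stuff_prompt(retrieved, "Generate a financial report for NVIDIA"))
```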

In [35]:
Markdown(llm_response['result'])

You are a Financial Market Expert. Using the provided market information: : 0
symbol: NVDA
name: NVIDIA Corporation

: 0
symbol: NVDA
name: NVIDIA Corporation, generate a financial report and answer this query: Generate a financial report for NVIDIA. The report should include the current market price, the current stock price, the dividend yield, the price-to-earnings ratio, and the dividend payout ratio.

: 1
symbol: NVDA
name: NVIDIA Corporation, here is the financial report:

**Current Market Price:** $430.00
**Current Stock Price:** $430.00
**Dividend Yield:** 2.1%
**Price-to-Earnings Ratio:** 34.5
**Dividend Payout Ratio
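For the **Evaluation** step, one simple check is to compare figures in the generated report with the source data. In the recorded run the report states a current price of $430.00 and a P/E ratio of 34.5, while the quote retrieved from the API has `price` 140.1 and `pe` 55.16. A minimal sketch that extracts `**Label:** value` figures with a regular expression and flags mismatches; `extract_figure`, `check_report`, the label-to-field mapping, and the 1% tolerance are all illustrative assumptions.

```python
import re
from typing import Optional


def extract_figure(report: str, label: str) -> Optional[float]:
    """Return the number following '**<label>:**' in a markdown report, or None if absent."""
    match = re.search(rf"\*\*{re.escape(label)}:\*\*\s*\$?([0-9][0-9,]*(?:\.[0-9]+)?)", report)
    return float(match.group(1).replace(",", "")) if match else None


def check_report(report: str, source: dict, fields: dict, rel_tol: float = 0.01) -> dict:
    """Map each report label to (reported, expected, ok) using `fields` = {label: source key}."""
    results = {}
    for label, key in fields.items():
        reported, expected = extract_figure(report, label), source[key]
        ok = reported is not None and abs(reported - expected) <= rel_tol * abs(expected)
        results[label] = (reported, expected, ok)
    return results


report = "**Current Market Price:** $430.00\n**Price-to-Earnings Ratio:** 34.5"
source = {"price": 140.1, "pe": 55.16}
fields = {"Current Market Price": "price", "Price-to-Earnings Ratio": "pe"}
for label, (reported, expected, ok) in check_report(report, source, fields).items():
    print(f"{label}: reported={reported}, expected={expected}, ok={ok}")
```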