## 📊 **Financial Report Generation with Economic Indicators**

### **Overview**
This project focuses on creating a concise financial report for companies or stocks using the latest economic and market data. By leveraging open-source tools and APIs, we aim to simplify the process without relying on training or fine-tuning large language models (LLMs) or machine learning models.

---

### **Objectives**
- Build a financial report using real-time economic indicators from the **Financial Modeling Prep API**.
- Streamline data processing and retrieval to produce accurate and actionable insights.
- Avoid the computational overhead of training custom AI models by utilizing pre-trained open-source models.

---

### **Methodology**
1. **Data Retrieval**:  
   Fetch the latest company metrics and market economic indicators using the Financial Modeling Prep API.

2. **Data Preprocessing**:  
   Process the retrieved data using Python and save it in a structured CSV format.

3. **Vector Database**:  
   Load the processed data into a vector database using an embedding model from Hugging Face.

4. **RAG QA Chain**:  
   Build a Retrieval-Augmented Generation (RAG) architecture with **LangChain** and the **Falcon 7B LLM**.

5. **Evaluation**:  
   Query the RAG system and evaluate the quality and relevance of the responses.


### Installing Dependencies and Packages

#### Dependencies


- Install Anaconda from [Anaconda](https://www.anaconda.com/download/success)
- Create a conda virtual environment `conda create finance-venv`
- Activate the conda virtual environment `conda activate finance-venv`
- Install Rust from [Rust](https://rustup.rs/) 
- Install transformers from conda with `conda install -c huggingface transformers`
- Install sentence-transformers from conda with `conda install -c conda-forge sentence-transformers`


#### Python Packages
- langchain
- langchain-community
- langchain-core
- pandas
- python-dotenv
- torch
- torchvision
- torchaudio
- chromadb
- sentence-transformers

In [None]:
%pip install langchain langchain-community langchain-core pandas python-dotenv chromadb

In [None]:
%pip install --upgrade --force-reinstall torch torchvision torchaudio

### Importing Packages

In [21]:
from urllib.request import urlopen
import json
import pandas as pd
from urllib.error import URLError, HTTPError
import ssl
from dotenv import load_dotenv
import os
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from IPython.display import display, Markdown
import warnings
warnings.filterwarnings('ignore')

### Settings for Financial Modeling Prep

- Create an account on [financial modeling prep](https://site.financialmodelingprep.com/)
- Click on API URL button [API URL](https://site.financialmodelingprep.com/playground/)
- Copy the API Key from URL
- Create a file **.env** in the project folder
- Set the API key in this file as `FINANCIAL_MODELING_PREP_API_KEY=YOUR_KEY` in the **.env**

In [22]:
load_dotenv()

API_KEY = os.getenv("FINANCIAL_MODELING_PREP_API_KEY")

### Data Retreival
This process demonstrates how to fetch economic indicators for a specific stock ticker using the **Financial Modeling Prep API**. It is designed to handle multiple exchanges and process the data into a structured format for further analysis.


In [24]:
TICKER = "NVDA"
EXCHANGE = "US"

def get_economic_data(ticker, exchange):
  
  if exchange == "NSE":
    url = f"https://financialmodelingprep.com/api/v3/search?query={ticker}&exchange=NSE&apikey={API_KEY}"
  else:
    url = f"https://financialmodelingprep.com/api/v3/quote/{ticker}?apikey={API_KEY}"
  
  try:
      # Create SSL context
      ssl_context = ssl.create_default_context()

      # Fetch and decode data
      with urlopen(url, context=ssl_context) as response:
          data = response.read().decode("utf-8")
          return json.loads(data)
  
  except HTTPError as e:
      print(f"HTTP Error: {e.code} - {e.reason}")
  except URLError as e:
      print(f"URL Error: {e.reason}")
  except json.JSONDecodeError as e:
      print(f"JSON Decode Error: {e.msg}")
  except Exception as e:
      print(f"Unexpected error: {str(e)}")


economic_data_json = get_economic_data(TICKER, EXCHANGE)
economic_data_df = pd.DataFrame(economic_data_json)
economic_data_df

Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,NVDA,NVIDIA Corporation,138.41,1.0218,1.4,134.02,140.27,152.89,47.32,3389660900000,...,NASDAQ,142263342,224002616,134.83,137.01,2.53,54.71,2025-02-26T21:00:00.000+0000,24490000000,1735591129


### Preprocessing Data

Converting columns of dataframe that contains time to date time format for better analytics

In [26]:
def preprocess_economic_data(df):
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['earningsAnnouncement'] = pd.to_datetime(df['earningsAnnouncement'])
    return df

preprocessed_economic_data_df = preprocess_economic_data(economic_data_df)
preprocessed_economic_data_df

Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,NVDA,NVIDIA Corporation,138.41,1.0218,1.4,134.02,140.27,152.89,47.32,3389660900000,...,NASDAQ,142263342,224002616,134.83,137.01,2.53,54.71,2025-02-26 21:00:00+00:00,24490000000,1970-01-01 00:00:01.735591129


### Storing Preprocessed Data

Storing the preprocessed data as a CSV file

In [27]:
preprocessed_economic_data_df.to_csv("data/processed/eco_ind.csv")

### Preparation of Data for Embeddings

This code block performs the following steps to process and prepare a CSV file for use in document-based machine learning tasks, such as retrieval-augmented generation (RAG): 

- The `CSVLoader` from Huggingface is used to load the contents of the CSV file located at `data/processed/eco_ind.csv`. The data is loaded as a list of documents, where each document represents a row or section of the CSV file.
- A `RecursiveCharacterTextSplitter` is defined with a `chunk_size` of 50 characters and an overlap of 5 characters. This ensures that large documents are split into smaller, manageable chunks while maintaining some overlap for context preservation.
- The loaded documents are processed through the `text_splitter`, resulting in a list of smaller, split documents. These chunks are suitable for embedding and querying tasks.
- A `HuggingFaceEmbeddings` instance is initialized. This embedding model will be used to convert the document chunks into dense vector representations for similarity search or retrieval tasks.

This pipeline prepares the CSV data for downstream applications like vector storage and query-based retrieval.

**Note:** Retrieval-Augmented Generation (RAG) is an AI technique that combines information retrieval with generative models. It retrieves relevant documents from a database or knowledge base and uses them as context for a language model to generate accurate and context-aware responses.

In [28]:
# Using Document loader from Huggingface to generate documents of CSV file
csv_loader = CSVLoader('data/processed/eco_ind.csv')
documents = csv_loader.load()

# Initializing text splitter
# text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)

# Splitting documents with text splitter
# splitted_documents = text_splitter.split_documents(documents)

# Initializing Embeddings
embeddings = HuggingFaceEmbeddings()


### Vectore Database

Initializes and persists a vector database using Chroma for efficient document storage and retrieval in a Retrieval-Augmented Generation (RAG) pipeline:

- `persist_directory` specifies the location (`'docs/chroma_rag/'`) where the vector database will be stored for future use.
- `Chroma.from_documents` creates a vector database by:
    - **Documents**: Adding the pre-processed documents.
    - **Collection Name**: Assigning a logical group identifier (`"economic_data"`) for the stored data.
    - **Embeddings**: Converting the documents into dense vector representations using the previously initialized embedding model.
    - **Persist Directory**: Storing the vector database in the specified directory.
- `vectordb.persist()` ensures the vector database is saved on disk, making it reusable for subsequent tasks without needing to reload or reprocess the data.

This setup allows efficient document retrieval for RAG pipelines or other similarity-based search tasks

**Note:** Embeddings are dense vector representations of text or data that capture semantic meaning in a numerical format. They enable efficient similarity search, clustering, and retrieval by mapping similar items closer in vector space.


In [29]:
persist_directory = 'docs/chroma_rag/'

vectordb = Chroma.from_documents(
    documents=documents, 
    collection_name="economic_data",
    embedding=embeddings,
    persist_directory=persist_directory
)

vectordb.persist()

### Settings for Huggingfacehub API

- Create an account on [Hugging Face](https://huggingface.co/)
- Create a new token from [Hugging Face Tokens](https://huggingface.co/settings/token))
- Create a file **.env** in the project folder
- Set the API key in this file as `HUGGINGFACEHUB_API_KEY=YOUR_KEY`

In [30]:
load_dotenv()

HUGGINGFACEHUB_API_KEY = os.getenv("HUGGINGFACEHUB_API_KEY")

### RAG Pipeline

Implements a Retrieval-Augmented Generation (RAG) pipeline for generating detailed financial reports using a Hugging Face large language model (LLM) and a Chroma-based retriever. Here's an overview of each step:

1. Initialize the LLM:
   - The `HuggingFaceHub` is used to load the `tiiuae/falcon-7b-instruct` model.
   - A low `temperature` of 0.1 is set for more deterministic outputs, and an API key is provided for access.

2. Initialize the Retriever:
   - A retriever is created from the Chroma vector database, enabling the retrieval of the top 2 most relevant documents (`search_kwargs={"k": 2}`) based on the user's query.

3. Define the Prompt Template:
   - A prompt template is created to instruct the LLM to act as a financial market expert, using the retrieved context (`{context}`) and the user query (`{question}`) to generate a comprehensive financial report.

4. Retrieve Relevant Context:
   - The retriever fetches the most relevant documents from the vector database based on the user's prompt. This step ensures the model has the necessary context for answering the query.

5. Initialize the Retrieval Chain**:
   - The `RetrievalQA` chain combines the retriever and the LLM into a single pipeline.
   - The retrieved context is passed through the prompt template to guide the LLM in generating accurate and detailed responses.

6. Query the Model:
   - The user's prompt is passed to the RAG pipeline, which retrieves relevant context from vector database and generates a detailed financial report using the LLM.
   - The response is printed to display the generated financial report.

This pipeline efficiently combines document retrieval and generative AI to provide context-aware and accurate outputs.

In [31]:
# Initializing the LLM model
llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
    huggingfacehub_api_token = HUGGINGFACEHUB_API_KEY
)


# Initializing the retreiver for RAG Pipline
retriever = vectordb.as_retriever(search_kwargs={"k":2})

# Template prompt for RAG pipeline
template = """You are a Financial Market Expert. Using the provided market information: {context}, generate an extensive financial report and answer this query: {question}."""

# Initialize prompt template
PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

user_prompt = "Provide an extensive analysis for NVIDIA Corporation and generate a small financial report for it"

# Debug retrieved context
retrieved_context = retriever.get_relevant_documents(user_prompt)
print("Retrieved Context:", retrieved_context)

# Initialize retriever chain
retrieval_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    chain_type_kwargs={"prompt": PROMPT},
    retriever=retriever
)

# Query the model
llm_response = retrieval_chain({"query": user_prompt})
print("LLM Response:", llm_response)

Retrieved Context: [Document(metadata={'row': 0, 'source': 'data/processed/eco_ind.csv'}, page_content=': 0\nsymbol: NVDA\nname: NVIDIA Corporation\nprice: 138.41\nchangesPercentage: 1.0218\nchange: 1.4\ndayLow: 134.02\ndayHigh: 140.27\nyearHigh: 152.89\nyearLow: 47.32\nmarketCap: 3389660900000\npriceAvg50: 139.9534\npriceAvg200: 117.38416\nexchange: NASDAQ\nvolume: 142263342\navgVolume: 224002616\nopen: 134.83\npreviousClose: 137.01\neps: 2.53\npe: 54.71\nearningsAnnouncement: 2025-02-26 21:00:00+00:00\nsharesOutstanding: 24490000000\ntimestamp: 1970-01-01 00:00:01.735591129'), Document(metadata={'row': 0, 'source': 'data/processed/eco_ind.csv'}, page_content=': 0\nsymbol: NVDA\nname: NVIDIA Corporation\nprice: 137.01\nchangesPercentage: -2.0868\nchange: -2.92\ndayLow: 134.71\ndayHigh: 139.02\nyearHigh: 152.89\nyearLow: 47.32\nmarketCap: 3355374900000\npriceAvg50: 139.9276\npriceAvg200: 117.15355\nexchange: NASDAQ\nvolume: 169431279\navgVolume: 224910079\nopen: 138.555\npreviousClose:

### Markdown Result

Description of the result in markdown format.

In [32]:
Markdown(llm_response['result'])

You are a Financial Market Expert. Using the provided market information: : 0
symbol: NVDA
name: NVIDIA Corporation
price: 138.41
changesPercentage: 1.0218
change: 1.4
dayLow: 134.02
dayHigh: 140.27
yearHigh: 152.89
yearLow: 47.32
marketCap: 3389660900000
priceAvg50: 139.9534
priceAvg200: 117.38416
exchange: NASDAQ
volume: 142263342
avgVolume: 224002616
open: 134.83
previousClose: 137.01
eps: 2.53
pe: 54.71
earningsAnnouncement: 2025-02-26 21:00:00+00:00
sharesOutstanding: 24490000000
timestamp: 1970-01-01 00:00:01.735591129

: 0
symbol: NVDA
name: NVIDIA Corporation
price: 137.01
changesPercentage: -2.0868
change: -2.92
dayLow: 134.71
dayHigh: 139.02
yearHigh: 152.89
yearLow: 47.32
marketCap: 3355374900000
priceAvg50: 139.9276
priceAvg200: 117.15355
exchange: NASDAQ
volume: 169431279
avgVolume: 224910079
open: 138.555
previousClose: 139.93
eps: 2.54
pe: 53.94
earningsAnnouncement: 2025-02-26 21:00:00+00:00
sharesOutstanding: 24490000000
timestamp: 1970-01-01 00:00:01.735333202, generate an extensive financial report and answer this query: Provide an extensive analysis for NVIDIA Corporation and generate a small financial report for it.
NVIDIA Corporation (NVDA) is a leading provider of visual computing technologies. The company designs and develops graphics processing units (GPUs) and other related hardware and software. It serves customers in the gaming, professional visualization, and automotive markets. NVIDIA's GPUs are used in gaming PCs, laptops, and mobile devices to enhance graphics performance. The company's products are also used in professional visualization, such as in medical and scientific simulations. Additionally, NVIDIA's GPUs are used