
# Financial Advisor Chatbot using LangChain, FAISS, Groq & Yahoo Finance

This assignment builds a chatbot that offers financial insights from Yahoo Finance data, using Retrieval-Augmented Generation (RAG) with LangChain, FAISS, and a Groq-hosted LLM.



## Objective

Build a chatbot that:
- Retrieves financial documents (stock performance)
- Uses RAG with a vector database and LLM
- Handles follow-up queries using memory



## Step 1: Environment Setup

Install necessary libraries.


In [1]:
!pip install langchain langchain-community langchain-groq faiss-cpu yfinance python-dotenv sentence-transformers



- Create a `.env` file with your Groq API key.
- Import required libraries.
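
One way to fail early when the key is missing is sketched below. It uses only the standard library and assumes the key is exposed as the `GROQ_API_KEY` environment variable (for example after `load_dotenv()` has read the `.env` file). The helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to .env "
            "and load it before running this check."
        )
    return value


# Example (commented out so the sketch does not fail when no key is configured):
# groq_api_key = require_env("GROQ_API_KEY")
```

Checking once at the top of the notebook keeps a missing key from surfacing later as an authentication error inside the chain.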


In [None]:
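# Import the libraries used for configuration and data download
import os
from dotenv import load_dotenv
import yfinance as yf

# Read GROQ_API_KEY from the .env file into the environment
load_dotenv()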


## Step 2: Download Stock Data

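Use yfinance to fetch one year of daily price history for each company and save it in text format. The first cell reads the ticker list from `nasdaq_100_tickers.csv`; the download loop follows in the next code cell.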


In [4]:
# Load the ticker symbols to download
import pandas as pd

tickers_df = pd.read_csv("nasdaq_100_tickers.csv")
tickers = tickers_df['Ticker'].tolist()

print(f"Total Tickers: {len(tickers)}")
print("Sample:", tickers[:10])


Total Tickers: 93
Sample: ['AAPL', 'MSFT', 'AMZN', 'GOOGL', 'GOOG', 'META', 'NVDA', 'TSLA', 'PEP', 'COST']



## Step 3: Load Documents

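Download the price history into `financial_data/` (one text file per ticker), then load the text files into LangChain using TextLoader. The TextLoader call appears in the cell under Step 4.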


In [5]:
# Download one year of daily prices per ticker and save each as a text file
import os
from time import sleep

import yfinance as yf

os.makedirs("financial_data", exist_ok=True)

for ticker in tickers:
    try:
        data = yf.download(ticker, period="1y", interval="1d", progress=False)
        if not data.empty:
            text_data = data.to_string()
            with open(f"financial_data/{ticker}.txt", "w") as f:
                f.write(f"Ticker: {ticker}\n\n{text_data}")
            print(f"Saved: {ticker}")
        else:
            print(f"No data for {ticker}")
    except Exception as e:
        print(f"Error with {ticker}: {e}")

    sleep(1)  # pause between requests to avoid rate limiting


Saved: AAPL
Saved: MSFT
Saved: AMZN
Saved: GOOGL
Saved: GOOG
Saved: META
Saved: NVDA
Saved: TSLA
Saved: PEP
Saved: COST
Saved: AVGO
Saved: ADBE
Saved: CSCO
Saved: NFLX
Saved: INTC
Saved: AMD
Saved: TXN
Saved: QCOM
Saved: AMAT
Saved: ADI
Saved: MU
Saved: LRCX
Saved: KLAC
Saved: NXPI
Saved: INTU
Saved: ISRG
Saved: ZM
Saved: CRWD
Saved: PANW
Saved: SNPS
Saved: CDNS
Saved: MCHP
Saved: MRVL
Saved: ADSK
Saved: FTNT
Saved: ANSS
Saved: PAYX
Saved: CTSH
Saved: DXCM
Saved: BIIB
Saved: VRSK
Saved: IDXX
Saved: WDC
Saved: ON
Saved: VRTX
Saved: REGN
Saved: GILD
Saved: ILMN
Saved: EXC
Saved: BKR
Saved: CEG
Saved: CHTR
Saved: CMCSA
Saved: TMUS
ERROR:yfinance:
1 Failed download:
ERROR:yfinance:['ATVI']: YFPricesMissingError('possibly delisted; no price data found  (period=1y) (Yahoo error = "No data found, symbol may be delisted")')
No data for ATVI
Saved: ROST
Saved: SIRI
Saved: NTES
Saved: JD
Saved: PDD
Saved: MELI
Saved: CTAS
Saved: ORLY
Saved: FAST
Saved: BKNG
Saved: PCAR
Saved: MAR
Saved: ULTA
Saved: ODFL
Saved: KHC
Saved: EBAY
Saved: WBA
Saved: XEL
Saved: LULU
Saved: DLTR
Saved: KR
ERROR:yfinance:
1 Failed download:
ERROR:yfinance:['SPLK']: YFPricesMissingError('possibly delisted; no price data found  (period=1y) (Yahoo error = "No data found, symbol may be delisted")')
No data for SPLK
Saved: DOCU
ERROR:yfinance:
1 Failed download:
ERROR:yfinance:['SGEN']: YFPricesMissingError('possibly delisted; no price data found  (period=1y) (Yahoo error = "No data found, symbol may be delisted")')
No data for SGEN
Saved: ALGN
Saved: VRSN
Saved: CSGP
Saved: TEAM
Saved: AEP
Saved: AFL
Saved: CPRT
Saved: CDW
Saved: MTCH
Saved: FANG
Saved: WBD
Saved: ZS
Saved: TTD
Saved: VERI
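
The log above shows three symbols (ATVI, SPLK, SGEN) returning no data. A minimal sketch of collecting such outcomes into a summary, rather than scanning the log, is given below. The `fetch` argument is a hypothetical stand-in for a function that wraps `yf.download` and returns the text to save (or an empty string when nothing comes back); it is injected so the sketch runs without network access, and `fake_fetch` exists only for the demonstration.

```python
from typing import Callable, Dict, Iterable, List


def download_all(
    tickers: Iterable[str], fetch: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Call fetch for each ticker and record which ones saved, were empty, or failed."""
    report: Dict[str, List[str]] = {"saved": [], "empty": [], "error": []}
    for ticker in tickers:
        try:
            text = fetch(ticker)
        except Exception:
            report["error"].append(ticker)
            continue
        # An empty string is treated as "no data", mirroring the data.empty check
        report["saved" if text else "empty"].append(ticker)
    return report


def fake_fetch(ticker: str) -> str:
    """Hypothetical fetcher: pretends ATVI has no data, everything else succeeds."""
    return "" if ticker == "ATVI" else f"Ticker: {ticker}\n\n(price rows)"


report = download_all(["AAPL", "ATVI", "MSFT"], fake_fetch)
print(report)
```

The sketch covers only the bookkeeping; writing the files and pausing between requests would stay as in the loop above.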



## Step 4: Chunk and Embed Data

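Split the documents into chunks, generate embeddings with a HuggingFace model, and store them in FAISS. The first cell loads the saved text files; chunking and embedding follow in the cells under Step 5.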


In [6]:
# Load every text file in 'financial_data/' as a LangChain document
from langchain_community.document_loaders import TextLoader
import glob

all_documents = []

# Get all text files in the folder
file_paths = glob.glob("financial_data/*.txt")

for file_path in file_paths:
    loader = TextLoader(file_path)
    docs = loader.load()
    all_documents.extend(docs)

print("Sample document content:\n")
print(all_documents[0].page_content[:1000])


Sample document content:

Ticker: ISRG

Price            Close        High         Low        Open   Volume
Ticker            ISRG        ISRG        ISRG        ISRG     ISRG
Date                                                               
2024-07-18  416.140015  427.329987  413.820007  425.000000  3856000
2024-07-19  455.010010  456.809998  439.000000  449.429993  4201300
2024-07-22  461.119995  468.779999  459.179993  462.329987  2574400
2024-07-23  455.059998  462.609985  454.359985  459.100006  1753100
2024-07-24  454.019989  460.589996  452.250000  455.250000  2296500
2024-07-25  436.739990  457.369995  436.200012  453.220001  1819200
2024-07-26  441.299988  446.160004  432.869995  438.600006  1196500
2024-07-29  443.660004  448.269989  439.410004  446.600006  1120400
2024-07-30  432.690002  448.579987  431.559998  443.660004  1547700
2024-07-31  444.609985  447.309998  432.410004  438.500000  1763200
2024-08-01  450.940002  452.720001  443.299988  443.299988  2017900
2024-08-
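
Raw price rows give the LLM little to work with beyond the few chunks that happen to be retrieved. One possible complement, sketched below under the assumption that the saved files keep the column order shown in the sample (Close, High, Low, Open, Volume), is to parse the rows and build a short per-ticker summary that could be indexed alongside the raw text. `parse_rows` and `summarize` are hypothetical helper names, and `SAMPLE` is an excerpt of the output above.

```python
import re
from typing import Dict, List

# Matches rows such as "2024-07-18  416.140015  427.329987  413.820007  425.000000  3856000"
ROW = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_rows(text: str) -> List[Dict[str, object]]:
    """Extract dated price rows; header, 'Ticker:' and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        date, close, high, low, open_, volume = match.groups()
        rows.append({
            "date": date,
            "close": float(close),
            "high": float(high),
            "low": float(low),
            "open": float(open_),
            "volume": int(float(volume)),
        })
    return rows


def summarize(ticker: str, rows: List[Dict[str, object]]) -> str:
    """Build a one-line summary of the closing-price range for a ticker."""
    if not rows:
        return f"{ticker}: no price rows found"
    best = max(rows, key=lambda r: r["close"])
    worst = min(rows, key=lambda r: r["close"])
    return (
        f"{ticker}: {len(rows)} trading days from {rows[0]['date']} to {rows[-1]['date']}; "
        f"highest close {best['close']:.2f} on {best['date']}; "
        f"lowest close {worst['close']:.2f} on {worst['date']}"
    )


SAMPLE = """Ticker: ISRG

Price            Close        High         Low        Open   Volume
Date
2024-07-18  416.140015  427.329987  413.820007  425.000000  3856000
2024-07-19  455.010010  456.809998  439.000000  449.429993  4201300
2024-07-22  461.119995  468.779999  459.179993  462.329987  2574400
"""

print(summarize("ISRG", parse_rows(SAMPLE)))
```

Computing the extremes in code avoids asking the model to find a maximum across rows it may never see.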


## Step 5: Create Retriever and QA Chain

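Set up the retriever and integrate it with the Groq LLM using LangChain's RetrievalQA. The cells below also chunk the documents, embed them, and build the FAISS index that the retriever reads from.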


In [7]:
# Split the loaded documents into overlapping chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=750,
    chunk_overlap=100,
)

chunked_documents = text_splitter.split_documents(all_documents)
print(f"Total chunks created: {len(chunked_documents)}")


Total chunks created: 2504
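
Each file states its ticker only in its first few lines, so most 750-character chunks contain nothing but dates and numbers. This may contribute to the retriever returning a chunk from `financial_data/AMD.txt` for the Apple query further down. A hedged sketch of one workaround is to prefix every chunk with the ticker taken from its `source` metadata before embedding. The function relies only on objects with `page_content` and `metadata` attributes, as LangChain documents have; the `Chunk` dataclass is a stand-in so the sketch runs without LangChain installed, and `tag_chunks_with_ticker` is a hypothetical name.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    """Stand-in for a LangChain document (assumption: same two attributes)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def tag_chunks_with_ticker(chunks: List[Chunk]) -> List[Chunk]:
    """Prefix each chunk with 'Ticker: XYZ' derived from its source file name."""
    for chunk in chunks:
        source = chunk.metadata.get("source", "")
        ticker = os.path.splitext(os.path.basename(source))[0]
        if not ticker:
            continue  # no source path, nothing to derive
        chunk.metadata["ticker"] = ticker
        header = f"Ticker: {ticker}"
        if not chunk.page_content.startswith(header):
            chunk.page_content = f"{header}\n{chunk.page_content}"
    return chunks


demo = [Chunk("2024-07-18  155.770004  163.410004", {"source": "financial_data/AMD.txt"})]
print(tag_chunks_with_ticker(demo)[0].page_content)
```

In the notebook this would be applied to `chunked_documents` before `FAISS.from_documents` is called; storing the ticker in metadata also leaves room for filtering by ticker at query time.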


In [8]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


In [9]:
from langchain_community.vectorstores import FAISS

faiss_vectorstore = FAISS.from_documents(chunked_documents, embedding_model)

faiss_vectorstore.save_local("vector_store/financial_faiss")

In [11]:
faiss_vectorstore = FAISS.load_local("vector_store/financial_faiss", embedding_model, allow_dangerous_deserialization=True)

In [12]:
retriever = faiss_vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})
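
To make `k=4` concrete: the retriever embeds the query and returns the four stored chunks whose vectors lie closest to it. The toy sketch below reproduces that idea with plain Python lists and Euclidean distance. It illustrates top-k nearest-neighbour search in general, not FAISS's actual implementation, and the two-dimensional vectors and chunk names are made up.

```python
import heapq
import math
from typing import Dict, List, Sequence


def top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 4) -> List[str]:
    """Return the names of the k vectors closest to the query (smallest distance first)."""
    scored = ((math.dist(query, vec), name) for name, vec in vectors.items())
    return [name for _, name in heapq.nsmallest(k, scored)]


# Made-up embeddings for three chunks
toy_index = {
    "aapl_chunk": [0.9, 0.1],
    "amd_chunk": [0.8, 0.3],
    "pep_chunk": [0.1, 0.9],
}
print(top_k([1.0, 0.0], toy_index, k=2))
```

A larger `k` hands the LLM more context at the cost of a longer prompt, which is why the value is worth experimenting with.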

In [16]:
from langchain_groq import ChatGroq
from google.colab import userdata  # Colab secrets store that holds GROQ_API_KEY

groq_llm = ChatGroq(
    temperature=0,  # keep answers as repeatable as possible
    model_name="llama3-8b-8192",
    api_key=userdata.get("GROQ_API_KEY")
)

In [17]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=groq_llm,
    retriever=retriever,
    return_source_documents=True
)


## Step 6: Build Chat Interface

Create a chat loop that takes user input and returns answers from the QA system. The cell below runs a single test query through the chain.


In [19]:
# Run one test query through the QA chain
query = "What was the performance of Apple in 2024?"
response = qa_chain.invoke({"query": query})

print("Answer:\n", response["result"])
print("\nSources:\n", response["source_documents"])

Answer:
 Based on the provided data, here is the performance of Apple (AAPL) in 2024:

* The highest closing price was $229.367314 on July 18, 2024.
* The lowest closing price was $216.477615 on July 25, 2024.
* The highest trading price was $229.208055 on July 18, 2024.
* The lowest trading price was $213.620965 on July 25, 2024.
* The average daily trading volume was approximately 51.4 million shares.

Overall, Apple's stock price fluctuated throughout the period, with a slight decline from the highest to the lowest closing price.

Sources:
 [Document(id='3c0f3017-2541-4369-82bc-cd21aecb53ec', metadata={'source': 'financial_data/AMD.txt'}, page_content='Price            Close        High         Low        Open     Volume\nTicker             AMD         AMD         AMD         AMD        AMD\nDate                                                                 \n2024-07-18  155.770004  163.410004  153.199997  163.410004   69420300\n2024-07-19  151.580002  155.809998  150.619995  154.
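
The cell above runs one fixed query. A minimal sketch of the chat loop this step describes follows. `answer_fn` is a hypothetical callable standing in for something like `lambda q: qa_chain.invoke({"query": q})["result"]`, and the `read` and `write` parameters default to `input` and `print` so the loop can also be driven by a script.

```python
from typing import Callable


def chat_loop(
    answer_fn: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> int:
    """Read questions until the user types 'exit' or 'quit'; return the number answered."""
    turns = 0
    while True:
        question = read("You: ").strip()
        if question.lower() in {"exit", "quit"}:
            write("Goodbye.")
            return turns
        if not question:
            continue  # ignore empty input
        write(f"Bot: {answer_fn(question)}")
        turns += 1


# Scripted demonstration with an echo function in place of the QA chain
scripted = iter(["hello", "", "quit"])
chat_loop(lambda q: q.upper(), read=lambda prompt: next(scripted))
```

In the notebook, calling `chat_loop` with the QA chain wrapped as `answer_fn` would give an interactive prompt in the cell output.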


## Optional Enhancements

- Add memory for follow-ups
- Visualize stock trends with matplotlib
- Experiment with top-k retrieval
- Display document sources
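
As one hedged illustration of the first item, follow-up memory can be approximated by keeping the last few question/answer pairs and prepending them to each new query. The `ChatMemory` class below is a hypothetical, standard-library-only sketch and not LangChain's memory API.

```python
from collections import deque
from typing import Deque, Tuple


class ChatMemory:
    """Keep the most recent question/answer pairs and fold them into new queries."""

    def __init__(self, max_turns: int = 3) -> None:
        # deque with maxlen drops the oldest turn automatically
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def build_query(self, question: str) -> str:
        """Return the question alone, or prefixed with the stored history."""
        if not self.turns:
            return question
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        return f"Conversation so far:\n{history}\n\nFollow-up question: {question}"


memory = ChatMemory(max_turns=2)
memory.add("How did AAPL close on 2024-07-18?", "(answer from the chain)")
print(memory.build_query("And the day after?"))
```

The combined string would then be passed as the `query` to the chain, so that a follow-up such as "And the day after?" carries the ticker from the previous turn.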


In [None]:
# TODO: Implement optional enhancements


## Project Complete

The chatbot can now answer financial queries based on one year of historical daily price data downloaded from Yahoo Finance.
