# **Build a Short Financial Report Using Economic Indicators from an API**
Using the Financial Modeling Prep API to fetch market economic indicators.

**Problem Statement:** Build a financial report for a company or stock from the latest stock market or economic data, without training or fine-tuning any LLM or ML model.

**Project Methodology**
- This project uses the Financial Modeling Prep API to fetch the latest financial data on company metrics and market economic indicators.
- The fetched data is pre-processed in Python and saved to a CSV file.
- The CSV file is loaded, embedded with a Hugging Face embedding model, and inserted into a vector DB.
- A RAG QA chain is built with LangChain, using the open-source Falcon 7B LLM.
- The response is checked.


In [1]:
import json
import os
import ssl
from urllib.request import urlopen

import certifi
import pandas as pd


def get_jsonparsed_data(ticker, api_key, exchange):
    """Fetch and parse the JSON payload for a ticker from Financial Modeling Prep."""
    if exchange == "NSE":
        url = f"https://financialmodelingprep.com/api/v3/search?query={ticker}&exchange=NSE&apikey={api_key}"
    else:
        url = f"https://financialmodelingprep.com/api/v3/quote/{ticker}?apikey={api_key}"
    # urlopen's cafile argument is deprecated; pass an SSL context instead
    context = ssl.create_default_context(cafile=certifi.where())
    response = urlopen(url, context=context)
    data = response.read().decode("utf-8")
    return json.loads(data)

# Read the key from the environment instead of hard-coding it
# (FMP_API_KEY is an assumed variable name)
api_key = os.environ["FMP_API_KEY"]
ticker = "MSFT"
exchange = "US"
eco_ind = pd.DataFrame(get_jsonparsed_data(ticker, api_key, exchange))
eco_ind



Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,MSFT,Microsoft Corporation,453.55,-0.2529,-1.15,450.645,456.335,468.35,309.45,3370924200500,...,NASDAQ,16177539,19001798,454.325,454.7,11.55,39.27,2024-07-23T10:59:00.000+0000,7432310000,1720814408
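
The request URL in the cell above is assembled with an f-string, which works for plain tickers such as `MSFT`. A minimal sketch of the same two endpoints built with `urllib.parse`, so that tickers or keys containing reserved characters are percent-encoded. The helper name `build_fmp_url` and the placeholder key `DEMO_KEY` are hypothetical and not part of the original notebook:

```python
from urllib.parse import quote, urlencode

BASE = "https://financialmodelingprep.com/api/v3"

def build_fmp_url(ticker, api_key, exchange):
    """Return the search URL for NSE tickers, otherwise the quote URL (hypothetical helper)."""
    if exchange == "NSE":
        params = urlencode({"query": ticker, "exchange": "NSE", "apikey": api_key})
        return f"{BASE}/search?{params}"
    # Percent-encode the ticker because it is part of the URL path
    return f"{BASE}/quote/{quote(ticker, safe='')}?{urlencode({'apikey': api_key})}"

print(build_fmp_url("MSFT", "DEMO_KEY", "US"))
```

The function only builds the string, so it can be checked without any network access before the URL is passed to `urlopen`.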


### Installing the Langchain Libraries

In [2]:
!pip install langchain langchain-community langchain-core transformers


In [3]:
def preprocess_economic_data(df):
    # 'timestamp' is a Unix epoch in seconds, so the unit must be given explicitly
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
    df['earningsAnnouncement'] = pd.to_datetime(df['earningsAnnouncement'])
    return df

preprocessed_economic_df = preprocess_economic_data(eco_ind)
preprocessed_economic_df

Unnamed: 0,symbol,name,price,changesPercentage,change,dayLow,dayHigh,yearHigh,yearLow,marketCap,...,exchange,volume,avgVolume,open,previousClose,eps,pe,earningsAnnouncement,sharesOutstanding,timestamp
0,MSFT,Microsoft Corporation,453.55,-0.2529,-1.15,450.645,456.335,468.35,309.45,3370924200500,...,NASDAQ,16177539,19001798,454.325,454.7,11.55,39.27,2024-07-23 10:59:00+00:00,7432310000,2024-07-12 20:00:08
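
The quote's `timestamp` field is a Unix epoch in seconds (the API returned 1720814408). When pandas parses a bare integer without a unit, it reads it as nanoseconds, which lands in January 1970. A small standard-library check of the intended value, independent of pandas; the function name is illustrative:

```python
from datetime import datetime, timezone

def epoch_seconds_to_utc(seconds):
    """Convert a Unix epoch in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

quote_time = epoch_seconds_to_utc(1720814408)
print(quote_time.isoformat())
```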


### Storing the Pre-Processed Data into CSV

In [4]:
preprocessed_economic_df.to_csv("eco_ind.csv")

### Installing the Hugging Face Embedding Library

In [5]:
%pip install --upgrade --quiet  langchain sentence_transformers

Note: you may need to restart the kernel to use updated packages.


In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings
hg_embeddings = HuggingFaceEmbeddings()


In [7]:
from langchain_community.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load each CSV row as one document
loader_eco = CSVLoader('eco_ind.csv')
documents_eco = loader_eco.load()

# Get your splitter ready (50-character chunks break each row into small fragments)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=5)

# Split your docs into texts
texts_eco = text_splitter.split_documents(documents_eco)
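
To get a feel for what `chunk_size=50` and `chunk_overlap=5` mean, here is a simplified fixed-width splitter. It is not LangChain's recursive algorithm, which prefers to break on separators such as newlines, so the real chunk boundaries will differ. The sample text reuses a few quote fields from the output above, and the function name is hypothetical:

```python
def fixed_width_chunks(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

row_text = "symbol: MSFT\nname: Microsoft Corporation\nprice: 453.55\nchangesPercentage: -0.2529"
chunks = fixed_width_chunks(row_text, chunk_size=50, chunk_overlap=5)
for chunk in chunks:
    print(repr(chunk))
```

Even this four-field excerpt needs two chunks, so a full quote row with more than twenty fields ends up spread over many small fragments, and a retriever with a small `k` sees only a few of them.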

### Building the Vector DB for RAG

In [8]:
from langchain_community.vectorstores import Chroma

persist_directory = 'docs/chroma_rag/'

In [10]:
pip install chromadb


In [11]:
economic_langchain_chroma = Chroma.from_documents(
    documents=texts_eco,
    collection_name="economic_data",
    embedding=hg_embeddings,
    persist_directory=persist_directory
)

In [12]:
question = "Microsoft(MSFT)"
docs_eco = economic_langchain_chroma.similarity_search(question,k=3)
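
Conceptually, `similarity_search` embeds the question and returns the `k` stored chunks whose vectors lie closest to it. The toy sketch below ranks made-up 3-dimensional vectors by cosine similarity. It only illustrates the idea: Chroma uses an approximate nearest-neighbour index with its own configurable distance metric, and real embeddings have hundreds of dimensions. All names and numbers here are invented for the example:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the k document ids most similar to the query, best first."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Made-up vectors standing in for embedded chunks
doc_vecs = {
    "symbol_name": [0.9, 0.1, 0.0],
    "price": [0.2, 0.8, 0.1],
    "earnings": [0.1, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, doc_vecs, k=2))
```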

### Building RAG Chain using Vector DB and LLM

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from IPython.display import display, Markdown
import os
import warnings
warnings.filterwarnings('ignore')

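# Set HUGGINGFACEHUB_API_TOKEN in the environment before starting the notebook;
# do not hard-code the token in a cell.
assert os.getenv("HUGGINGFACEHUB_API_TOKEN"), "HUGGINGFACEHUB_API_TOKEN is not set"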

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
)

retriever_eco = economic_langchain_chroma.as_retriever(search_kwargs={"k":2})
qs="Microsoft(MSFT) Financial Report"
template = """You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.
              Understand this Market Information {context} and Answer the Query for this Company {question}. i just need the data into Tabular Form as well."""

PROMPT = PromptTemplate(input_variables=["context","question"], template=template)
qa_with_sources = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",chain_type_kwargs = {"prompt": PROMPT}, retriever=retriever_eco, return_source_documents=True)
llm_response = qa_with_sources({"query": qs})
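
With `chain_type="stuff"`, the retrieved chunks are concatenated and substituted for `{context}`, and the query replaces `{question}`. A rough standard-library sketch of that substitution, assuming the chunks are joined with a blank line. The template is shortened for illustration, the chunk texts are the two source documents that appear in this notebook's response, and `render_prompt` is a hypothetical helper rather than a LangChain function:

```python
template = (
    "Understand this Market Information {context} "
    "and Answer the Query for this Company {question}."
)

# Chunk texts copied from the source documents in the notebook's response
retrieved_chunks = [
    ": 0\nsymbol: MSFT\nname: Microsoft Corporation",
    "earningsAnnouncement: 2024-07-23 10:59:00+00:00",
]

def render_prompt(template, chunks, question):
    """Join the chunks with blank lines and fill the template placeholders."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)

prompt_text = render_prompt(template, retrieved_chunks, "Microsoft(MSFT) Financial Report")
print(prompt_text)
```

Printing the rendered prompt this way makes it easy to see how little of the quote row actually reaches the model when only two 50-character chunks are retrieved.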

In [18]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from IPython.display import display, Markdown
import os
import warnings
warnings.filterwarnings('ignore')

# Verify the API token is set without echoing the secret itself
print("Hugging Face API Token:", "set" if os.getenv("HUGGINGFACEHUB_API_TOKEN") else "missing")

llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
)

retriever_eco = economic_langchain_chroma.as_retriever(search_kwargs={"k": 2})
qs = "Microsoft(MSFT) Financial Report"
template = """You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.
              Understand this Market Information {context} and Answer the Query for this Company {question}. I just need the data in Tabular Form as well."""

PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)
qa_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    chain_type_kwargs={"prompt": PROMPT},
    retriever=retriever_eco,
    return_source_documents=True
)

try:
    llm_response = qa_with_sources({"query": qs})
    display(Markdown(f"### Response:\n{llm_response}"))
except Exception as e:
    print("Error:", e)


Hugging Face API Token: set


### Response:
{'query': 'Microsoft(MSFT) Financial Report', 'result': 'You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.\n              Understand this Market Information : 0\nsymbol: MSFT\nname: Microsoft Corporation\n\nearningsAnnouncement: 2024-07-23 10:59:00+00:00 and Answer the Query for this Company Microsoft(MSFT) Financial Report. I just need the data in Tabular Form as well.\n<p>The following is the financial report for Microsoft Corporation (MSFT) for the year 2024. The report includes the following sections:</p>\n\n<ul>\n<li>Income Statement</li>\n<li>Balance Sheet</li>\n<li>Cash Flow Statement</li>\n<li>Income Statement</li>\n<li>Balance Sheet</li>\n<li>Cash Flow Statement</li>\n</ul>\n\n<', 'source_documents': [Document(metadata={'row': 0, 'source': 'eco_ind.csv'}, page_content=': 0\nsymbol: MSFT\nname: Microsoft Corporation'), Document(metadata={'row': 0, 'source': 'eco_ind.csv'}, page_content='earningsAnnouncement: 2024-07-23 10:59:00+00:00')]}
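
The `result` field above starts with the rendered prompt text, followed by the model's own output. A minimal sketch for separating the two, assuming the echoed prompt appears verbatim at the start of the result. The helper name and the shortened strings are illustrative, not taken from the notebook:

```python
def strip_prompt_echo(result, rendered_prompt):
    """Return only the text that follows the echoed prompt, if the echo is present."""
    if result.startswith(rendered_prompt):
        return result[len(rendered_prompt):].lstrip()
    return result

# Shortened stand-ins for the rendered prompt and the raw model output
rendered_prompt = "Answer the Query for this Company Microsoft(MSFT) Financial Report."
raw_result = rendered_prompt + "\n<p>The following is the financial report for Microsoft Corporation (MSFT) for the year 2024.</p>"
print(strip_prompt_echo(raw_result, rendered_prompt))
```

If the result does not begin with the prompt, the function returns it unchanged, so it is safe to apply to every response.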

In [16]:
# Check that the token is set without printing it
print(bool(os.getenv("HUGGINGFACEHUB_API_TOKEN")))


True


In [19]:
Markdown(llm_response['result'])

You are a Financial Market Expert and Get the Market Economic Data and Market News about Company and Build the Financial Report for me.
              Understand this Market Information : 0
symbol: MSFT
name: Microsoft Corporation

earningsAnnouncement: 2024-07-23 10:59:00+00:00 and Answer the Query for this Company Microsoft(MSFT) Financial Report. I just need the data in Tabular Form as well.
<p>The following is the financial report for Microsoft Corporation (MSFT) for the year 2024. The report includes the following sections:</p>

<ul>
<li>Income Statement</li>
<li>Balance Sheet</li>
<li>Cash Flow Statement</li>
<li>Income Statement</li>
<li>Balance Sheet</li>
<li>Cash Flow Statement</li>
</ul>

<

# **Using News API to Build a Financial News Summarizer for Current Company Sentiment**

### Fetching the Latest Data with NewsAPI, Using an API Key from Their Website

**Problem Statement:** Build a GenAI-based system that analyzes market news about a whole stock exchange or a single company and reports the market sentiment, along with an analysis based on that news.

**Project Methodology**
- This project uses NewsAPI to fetch the latest financial news about the company and the market.
- The fetched articles are pre-processed in Python, keeping the author and title of each.
- The headlines are combined into a single analyst-style prompt.
- The prompt is sent through LangChain to the open-source Falcon 7B LLM.
- The response is checked.



In [23]:
# NewsApiClient is provided by the newsapi-python package; the separate newsapi package is not needed
%pip install newsapi-python



In [1]:
import os

import pandas as pd
from newsapi import NewsApiClient
from datetime import datetime, timedelta

def fetch_news(query, from_date, to_date, language='en', sort_by='relevancy', page_size=30, api_key='YOUR_API_KEY'):
    # Initialize the NewsAPI client
    newsapi = NewsApiClient(api_key=api_key)
    query = query.replace(' ','&')
    # Fetch all articles matching the query
    all_articles = newsapi.get_everything(
        q=query,
        from_param=from_date,
        to=to_date,
        language=language,
        sort_by=sort_by,
        page_size=page_size
    )

    # Extract articles
    articles = all_articles.get('articles', [])

    # Convert to DataFrame
    if articles:
        df = pd.DataFrame(articles)
        return df
    else:
        return pd.DataFrame()  # Return an empty DataFrame if no articles are found

# Get the current time
current_time = datetime.now()
# Get the time 10 days ago
time_10_days_ago = current_time - timedelta(days=10)
# Read the NewsAPI key from the environment instead of hard-coding it
# (NEWS_API_KEY is an assumed variable name)
api_key = os.environ['NEWS_API_KEY']
q = "Microsoft News June 2024"
df = fetch_news(q, time_10_days_ago, current_time, api_key=api_key)

df_news = df.drop("source", axis=1)

def preprocess_news_data(df):
    # Convert publishedAt to datetime
    df['publishedAt'] = pd.to_datetime(df['publishedAt'])
    df = df[~df['author'].isna()]
    df = df[['author', 'title']]
    return df

preprocessed_news_df = preprocess_news_data(df_news)
preprocessed_news_df.head()

Unnamed: 0,author,title
0,Mat Smith,The Morning After: Samsung’s Galaxy Z Flip 6 a...
1,Richard Speed,Windows 11 is closing the gap on Windows 10
2,Paris Marx,Generative AI is a climate disaster
3,Jeff Butts,Microsoft has fixed the nasty update bug causi...
4,Sead Fadilpašić,Another reason to upgrade — experts warn Inter...


### Building the Prompt from the News Titles

In [2]:
def build_prompt(news_df):
    prompt = "You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:\n\n"

    for index, row in news_df.iterrows():
        title = row['title']
        prompt += f"   **News:** {title}\n\n"

    prompt += "Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company."

    return prompt

# Build the prompt
prompt = build_prompt(preprocessed_news_df)
print(prompt)

You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:

   **News:** The Morning After: Samsung’s Galaxy Z Flip 6 and Fold 6 leak early

   **News:** Windows 11 is closing the gap on Windows 10

   **News:** Generative AI is a climate disaster

   **News:** Microsoft has fixed the nasty update bug causing Windows 11 boot loops

   **News:** Another reason to upgrade — experts warn Internet Explorer is being used to lure in Microsoft users for data theft

   **News:** Microsoft Windows Deadline—You Have 21 Days To Update Your PC

   **News:** July 2024 Patch Tuesday forecast: The end of an AV giant in the US

   **News:** Adafruit Weekly Editorial Round-Up: Top Blog Posts of the Month, IoT Monthly, CircuitPython 9.1.0 Beta 4, IoT Filament Sensor & More!

   **News:** New nEw NEWS From Adafruit Round-Up: April, May & June, 2024

   **News:** Nvidia promises up to 700% return on investment
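
Several of the fetched headlines are not about the company at all, and every one of them is passed to the model. One optional refinement is to keep only the titles that mention the company before building the prompt. A minimal sketch, assuming a simple case-insensitive keyword match is good enough; `filter_titles` and the keyword list are hypothetical:

```python
def filter_titles(titles, keywords):
    """Keep titles that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [t for t in titles if any(k in t.lower() for k in lowered)]

# A few titles taken from the prompt printed above
titles = [
    "Windows 11 is closing the gap on Windows 10",
    "Generative AI is a climate disaster",
    "Microsoft has fixed the nasty update bug causing Windows 11 boot loops",
    "New nEw NEWS From Adafruit Round-Up: April, May & June, 2024",
]
print(filter_titles(titles, ["Microsoft", "Windows"]))
```

In the notebook, the same filter could be applied to `preprocessed_news_df['title']` before calling `build_prompt`.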

### LLM from Hugging Face Open Source

In [4]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub
from IPython.display import display, Markdown
import os
import warnings
warnings.filterwarnings('ignore')

In [6]:
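# Set HUGGINGFACEHUB_API_TOKEN in the environment before starting the notebook;
# do not hard-code the token in a cell.
assert os.getenv("HUGGINGFACEHUB_API_TOKEN"), "HUGGINGFACEHUB_API_TOKEN is not set"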

In [7]:
llm = HuggingFaceHub(
    repo_id="tiiuae/falcon-7b-instruct",
    model_kwargs={"temperature": 0.1},
)

In [8]:
Markdown(llm.invoke(prompt))

You are a financial analyst tasked with providing insights into recent news articles related to the financial industry. Here are some recent news articles:

   **News:** The Morning After: Samsung’s Galaxy Z Flip 6 and Fold 6 leak early

   **News:** Windows 11 is closing the gap on Windows 10

   **News:** Generative AI is a climate disaster

   **News:** Microsoft has fixed the nasty update bug causing Windows 11 boot loops

   **News:** Another reason to upgrade — experts warn Internet Explorer is being used to lure in Microsoft users for data theft

   **News:** Microsoft Windows Deadline—You Have 21 Days To Update Your PC

   **News:** July 2024 Patch Tuesday forecast: The end of an AV giant in the US

   **News:** Adafruit Weekly Editorial Round-Up: Top Blog Posts of the Month, IoT Monthly, CircuitPython 9.1.0 Beta 4, IoT Filament Sensor & More!

   **News:** New nEw NEWS From Adafruit Round-Up: April, May & June, 2024

   **News:** Nvidia promises up to 700% return on investment on GPU doing AI inference work as world's most valuable company continues journey towards $4 trillion market cap

   **News:** Microsoft Conducts Fresh Round of Layoffs: Multiple Departments & Locations Affected

   **News:** Media Briefing: How the digital publishing industry has fared so far in 2024

   **News:** Report: AI PCs Won’t Help the PC Industry This Year

   **News:** As Valuations Around AI Skyrocket, How Should CIOs Look At Solutions?

   **News:** Modernizing .NETpad: .NET 9, Arm64, and More (Premium)

   **News:** Forbes Daily: Weak Jobs Report Stokes Stock Market Rally

   **News:** Why Alibaba Stock Is Gaining On Friday

   **News:** 3 Ways Apple Intelligence And Embedded AI Will Transform Daily Life

   **News:** Why Is Owens Corning (OC) Stock Soaring Today

   **News:** Cybersecurity Snapshot: Malicious Versions of Cobalt Strike Taken Down, While Microsoft Notifies More Orgs About Midnight Blizzard Email Breach

   **News:** Should Leaders Have Strict Off-Hours?

   **News:** Why Is Pool (POOL) Stock Soaring Today

   **News:** Why Is Lennar (LEN) Stock Rocketing Higher Today

   **News:** Windows 11 is finally about to dethrone Windows 10 as the most popular OS - for gamers, anyway

   **News:** S&P 500 Tops 5,600 for First Time as Tech Rallies: Markets Wrap

   **News:** Forbes Daily: As Tech Stocks Rally, Apple Achieves A New Record

   **News:** TCS Profit Meets Estimates as IT Project Demand Picks Up

   **News:** Intel And AMD Are Going For A Bigger Role In The AI Era, But At A Gradual Pace

   **News:** Price hike shows Xbox is determined to make Game Pass work | Opinion

   **News:** Energy-guzzling AI has knocked Google and Microsoft off their net-zero paths

Please analyze these articles and provide insights into any potential impacts on the financial industry Sentiment on the provided company.
The recent news articles related to the financial industry provide insights into the current state of the industry. The news articles suggest that the industry is still facing challenges related to the pandemic, with the Windows 11 update causing boot loop issues. However, the industry is also seeing growth, with companies like Samsung releasing new devices and Microsoft announcing a new update. The news articles also highlight the importance of AI and machine learning in the industry, with companies like Nvidia and Microsoft investing in these technologies. Overall, the

# **Financial Data Investment Advisor**

**Problem Statement:** Build a financial advisor based on a dataset of financial advice spanning stocks, mutual funds, and gold or silver bonds, using Python, LangChain, and an open-source LLM.

**Project Methodology**
- This project uses open-source data from Kaggle on financial advice.
- The data is loaded with Python and pre-processed into prompt-response pairs.
- The prompt-response pairs are embedded with a Hugging Face embedding model and inserted into a vector DB.
- A RAG QA chain is built with LangChain, using the open-source Falcon 7B LLM.
- The response is checked.



## **Loading the Financial Data from Kaggle or Any Open Source Platform**

Data Source - https://www.kaggle.com/datasets/nitindatta/finance-data

In [10]:
data = pd.read_csv("Finance_data.csv")
data_fin = data.to_dict(orient='records')

In [11]:
for entry in data_fin:
  prompt = f"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?"
  print(prompt)

I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?
I'm a 23-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next More than 5 years. What are my options?
I'm a 30-year-old Male looking to invest in Equity for Wealth Creation over the next 3-5 years. What are my options?
I'm a 22-year-old Male looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?
I'm a 24-year-old Female looking to invest in Equity for Wealth Creation over the next Less than 1 year. What are my options?
I'm a 24-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?
I'm a 27-year-old Female looking to invest in Equity for Wealth Creation over the next 3-5 years. What are my options?
I'm a 21-year-old Male looking to invest in Mutual Fund for Wealth Creation over the next 3-5 years. What are my options?
I'm a 35-yea

### Pre-Processing the Data into Prompt-Response Format

In [12]:
# Convert the data to prompt-response format
prompt_response_data = []
for entry in data_fin:
    prompt = f"I'm a {entry['age']}-year-old {entry['gender']} looking to invest in {entry['Avenue']} for {entry['Purpose']} over the next {entry['Duration']}. What are my options?"
    response = (
        f"Based on your preferences, here are your investment options:\n"
        f"- Mutual Funds: {entry['Mutual_Funds']}\n"
        f"- Equity Market: {entry['Equity_Market']}\n"
        f"- Debentures: {entry['Debentures']}\n"
        f"- Government Bonds: {entry['Government_Bonds']}\n"
        f"- Fixed Deposits: {entry['Fixed_Deposits']}\n"
        f"- PPF: {entry['PPF']}\n"
        f"- Gold: {entry['Gold']}\n"
        f"Factors considered: {entry['Factor']}\n"
        f"Objective: {entry['Objective']}\n"
        f"Expected returns: {entry['Expect']}\n"
        f"Investment monitoring: {entry['Invest_Monitor']}\n"
        f"Reasons for choices:\n"
        f"- Equity: {entry['Reason_Equity']}\n"
        f"- Mutual Funds: {entry['Reason_Mutual']}\n"
        f"- Bonds: {entry['Reason_Bonds']}\n"
        f"- Fixed Deposits: {entry['Reason_FD']}\n"
        f"Source of information: {entry['Source']}\n"
    )
    prompt_response_data.append({"prompt": prompt, "response": response})

prompt_response_data[:5]

[{'prompt': "I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next 1-3 years. What are my options?",
  'response': 'Based on your preferences, here are your investment options:\n- Mutual Funds: 1\n- Equity Market: 2\n- Debentures: 5\n- Government Bonds: 3\n- Fixed Deposits: 7\n- PPF: 6\n- Gold: 4\nFactors considered: Returns\nObjective: Capital Appreciation\nExpected returns: 20%-30%\nInvestment monitoring: Monthly\nReasons for choices:\n- Equity: Capital Appreciation\n- Mutual Funds: Better Returns\n- Bonds: Safe Investment\n- Fixed Deposits: Fixed Returns\nSource of information: Newspapers and Magazines\n'},
 {'prompt': "I'm a 23-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next More than 5 years. What are my options?",
  'response': 'Based on your preferences, here are your investment options:\n- Mutual Funds: 4\n- Equity Market: 3\n- Debentures: 2\n- Government Bonds: 1\n- Fixed Deposits: 5\n- PPF: 6\n- Gold: 7\
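
The numbers next to each avenue in the responses run from 1 to 7 and look like preference ranks. Assuming that 1 means most preferred (an assumption about the dataset that its Kaggle description would need to confirm), a small sketch that orders the avenues for one record; `rank_avenues` is a hypothetical helper:

```python
AVENUE_COLUMNS = ["Mutual_Funds", "Equity_Market", "Debentures",
                  "Government_Bonds", "Fixed_Deposits", "PPF", "Gold"]

def rank_avenues(entry):
    """Sort avenue names by their numeric value, smallest first."""
    return sorted(AVENUE_COLUMNS, key=lambda col: entry[col])

# Values copied from the first record shown above
first_entry = {"Mutual_Funds": 1, "Equity_Market": 2, "Debentures": 5,
               "Government_Bonds": 3, "Fixed_Deposits": 7, "PPF": 6, "Gold": 4}
print(rank_avenues(first_entry))
```

If the assumption holds, writing the ordered list into the response text instead of the raw numbers may give the LLM a clearer signal than lines such as "Fixed Deposits: 7".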

### Storing Data into Vector DB

In [13]:
from langchain.docstore.document import Document
documents = []
for entry in prompt_response_data:
    combined_text = f"Prompt: {entry['prompt']}\nResponse: {entry['response']}"
    documents.append(Document(page_content=combined_text))

In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

In [16]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
texts = text_splitter.split_documents(documents)

In [18]:
from langchain_community.embeddings import HuggingFaceEmbeddings
hg_embeddings = HuggingFaceEmbeddings()

In [19]:
from langchain.vectorstores import Chroma
persist_directory = 'docs/chroma/'
vectordb_fin = Chroma.from_documents(
    documents=texts,
    embedding=hg_embeddings,
    persist_directory=persist_directory
)

### Building RAG System using VectorDB and LLM

In [20]:
from langchain.chains import RetrievalQA
retriever_fin = vectordb_fin.as_retriever(search_kwargs={"k":5})
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever_fin, return_source_documents=False)
query = "I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?"
result = qa({"query": query})
result

{'query': "I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?",
 'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nPrompt: I'm a 34-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 32-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 28-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 24-year-old Female looking to invest in Mutual Fund for Wealth Creation over the next\n\nPrompt: I'm a 29-year-old Male looking to invest in Mutual Fund for Wealth Creation over the next\n\nQuestion: I'm a 34-year-old female looking to invest in mutual funds for wealth creation over the next 1-3 years. What are my options?\nHelpful Answer:\n\nAs a 34-year-old