# Generative AI in Finance: Introduction to LLMs and RAG

In this notebook, we demonstrate how Generative AI techniques, specifically Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG), can be used in finance. We'll build a simple financial Q&A system using LangChain, OpenAI, and Chroma, with an optional Pinecone variant, to illustrate how finance teams can query their own documents efficiently.



## 2. Setup 

In [4]:
# Install the libraries
!pip install openai pinecone chromadb langchain langchain-community langchain-openai python-dotenv pandas



## 3. Imports & API Key Setup 

In [None]:
import os
import pandas as pd
from dotenv import load_dotenv
import openai
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings  # replaces the deprecated langchain.llms.OpenAI
from langchain.chains import RetrievalQA

# Load .env file for keys (recommended)
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT")  
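If a key is missing from `.env`, `os.getenv` silently returns `None` and the problem only surfaces later as an authentication error. A minimal sketch of an early check follows; `missing_env_vars` is a hypothetical helper, not part of any library used here.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names in `names` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Only OPENAI_API_KEY is needed for the Chroma path;
# PINECONE_API_KEY matters only for the optional Pinecone section.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.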


## 4. Load and Chunk Financial Data 

In [15]:
# Load your financial document (replace with your path)
data = pd.read_csv('dataset/financial_policy.csv')
documents = data['content'].astype(str).tolist()

# Split into manageable chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.create_documents(documents)
chunks = [doc.page_content for doc in docs]
print(f"Loaded {len(chunks)} chunks.")


Loaded 25 chunks.
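To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch in plain Python. It is an illustration only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, so its actual output will differ. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Split `text` into windows of at most `chunk_size` characters.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring windows share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_with_overlap(sample, chunk_size=12, chunk_overlap=4)
print(pieces)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.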


## 5. Embed and Store with Chroma


In [7]:
embeddings = OpenAIEmbeddings(openai_api_key=openai.api_key)
vectorstore = Chroma.from_texts(chunks, embedding=embeddings)


Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
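Conceptually, a vector store keeps one embedding per chunk and returns the chunks whose embeddings are closest to the query embedding. The sketch below shows that idea with brute-force cosine similarity over made-up three-dimensional vectors. It is not how Chroma is implemented: Chroma uses an approximate index, and its default distance metric may differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=3):
    """`stored` is a list of (text, vector) pairs; return the k best (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy vectors invented for illustration; real embeddings have far more
# dimensions (1536 for text-embedding-ada-002).
toy_store = [
    ("crypto policy", [0.9, 0.1, 0.0]),
    ("foreign securities", [0.1, 0.9, 0.0]),
    ("travel expenses", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
```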


## 6. RAG: Build Q&A Chain (Chroma + LangChain)

In [8]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
llm = OpenAI(openai_api_key=openai.api_key, temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
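By default, `RetrievalQA.from_chain_type` uses the "stuff" strategy: it retrieves the top passages, places them all in a single prompt, and calls the model once. The following sketch mirrors that control flow with stand-in functions so it runs offline. The prompt wording and the names `answer_with_rag`, `fake_retrieve`, and `fake_generate` are assumptions for illustration, not LangChain's internals.

```python
def answer_with_rag(question, retrieve, generate, k=3):
    """Retrieve k passages, put them into one prompt, and call the model once."""
    passages = retrieve(question, k)
    context = "\n\n".join(passages)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)

# Stand-ins: in the notebook these roles are played by `retriever` and `llm`.
def fake_retrieve(question, k):
    corpus = ["Passage A", "Passage B", "Passage C", "Passage D"]
    return corpus[:k]

def fake_generate(prompt):
    return f"(model saw {prompt.count('Passage')} passages)"

print(answer_with_rag("What is the policy?", fake_retrieve, fake_generate, k=3))
```

Because every retrieved passage goes into one prompt, `k` and `chunk_size` together determine how much of the model's context window each question consumes.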




## 7. Example Q&A

In [9]:
questions = [
    "What is our current policy on cryptocurrency investments?",
    "Summarize the latest changes in compliance regulations."
]

for question in questions:
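    # .run() is deprecated; .invoke() returns a dict with the answer under "result"
    answer = qa_chain.invoke({"query": question})["result"]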
    print(f"\nQ: {question}\nA: {answer}\n")


Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given



Q: What is our current policy on cryptocurrency investments?
A:  Our current policy prohibits direct investment in cryptocurrencies except through approved ETFs and regulated vehicles. All investments must be reviewed quarterly by the risk team.


Q: Summarize the latest changes in compliance regulations.
A:  The latest changes in compliance regulations include the revision of KYC regulations to include digital onboarding and biometric verification, a requirement for financial advisors to maintain 40 hours of continuing education annually, and quarterly audits of financial processes by an external firm.



# _[Optional]_

## i. Embed and Store with Pinecone

In [4]:
import pinecone
print(pinecone.__version__)


7.3.0


## ii. Create Pinecone Index (Serverless, OpenAI Embeddings)

In [None]:
import os
from pinecone import (
    Pinecone,
    ServerlessSpec,
    CloudProvider,
    AwsRegion,
    Metric,
    DeletionProtection,
    VectorType
)

# --- Set up the Pinecone client ---
pc = Pinecone(api_key=PINECONE_API_KEY)
index_name = "financial-llm-demo"

# Create index if not exists (idempotent, will not overwrite)
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1536,  # For OpenAI ada-002
        metric=Metric.COSINE,
        spec=ServerlessSpec(
            cloud=CloudProvider.AWS,
            region=AwsRegion.US_EAST_1   
        ),
        deletion_protection=DeletionProtection.DISABLED,
        vector_type=VectorType.DENSE,
        tags={
            "model": "text-embedding-ada-002",
            "app": "finance-rag-demo"
        }
    )
index = pc.Index(index_name)
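The index `dimension` must equal the length of the vectors the embedding model produces, otherwise upserts will be rejected. A small sketch of a pre-flight check is shown here. The lookup table lists default output sizes for a few OpenAI embedding models, and `check_dimension` is a hypothetical helper.

```python
# Default output dimensions of some OpenAI embedding models
EMBEDDING_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_dimension(model, index_dimension):
    """Return True if `index_dimension` matches the model's default output size."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise KeyError(f"Unknown model: {model}")
    return expected == index_dimension

print(check_dimension("text-embedding-ada-002", 1536))
```

Switching to a model with a different output size means creating a new index, since the dimension is fixed when the index is created.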



In [13]:
print("Available Pinecone indexes:", pc.list_indexes().names())


Available Pinecone indexes: ['financial-llm-demo']


## iii. Embed and Upsert Chunks

In [None]:
import openai

def get_embeddings(texts):
    response = openai.embeddings.create(
        input=texts,
        model="text-embedding-ada-002"
    )
    return [record.embedding for record in response.data]


batch_size = 50
vectors = []
for i in range(0, len(chunks), batch_size):
    batch = chunks[i:i+batch_size]
    embeds = get_embeddings(batch)
    for j, emb in enumerate(embeds):
        vectors.append({
            "id": f"doc-{i+j}",
            "values": emb,
            "metadata": {"chunk_text": batch[j]}
        })
index.upsert(vectors)
print(f"Uploaded {len(vectors)} embeddings to Pinecone index '{index_name}'.")


Uploaded 25 embeddings to Pinecone index 'financial-llm-demo'.
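The cell above embeds in batches but sends every vector in one `upsert` call, which is fine for 25 chunks. For larger corpora it may be preferable to upsert batch by batch so that no single request grows too large. The sketch below shows one way to structure that, with a stand-in embedder so it runs offline; `iter_record_batches` and `fake_embed` are hypothetical names.

```python
def iter_record_batches(chunks, embed_fn, batch_size=50):
    """Yield lists of Pinecone-style records, one list per batch of chunks."""
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        embeddings = embed_fn(batch)
        yield [
            {
                "id": f"doc-{start + offset}",  # same ID scheme as the cell above
                "values": emb,
                "metadata": {"chunk_text": text},
            }
            for offset, (text, emb) in enumerate(zip(batch, embeddings))
        ]

# Stand-in embedder; in the notebook this role is played by get_embeddings.
def fake_embed(texts):
    return [[float(len(t))] for t in texts]

demo_chunks = [f"chunk {n}" for n in range(5)]
for records in iter_record_batches(demo_chunks, fake_embed, batch_size=2):
    # With a real index, this line would be: index.upsert(vectors=records)
    print([r["id"] for r in records])
```

Because the IDs are derived from chunk positions, re-running the upload overwrites existing records instead of creating duplicates.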


## iv. RAG Retrieval: Query & Get Contexts

In [18]:
query = "What is our current policy on cryptocurrency investments?"
embed_query = get_embeddings([query])[0]
results = index.query(
    vector=embed_query,
    top_k=3,
    include_metadata=True
)
contexts = [match['metadata']['chunk_text'] for match in results['matches']]
print(contexts)


['Our company prohibits direct investment in cryptocurrencies except through approved ETFs and regulated vehicles. All investments must be reviewed quarterly by the risk team.', 'Investment in foreign securities is limited to 20% of the total portfolio, with additional scrutiny on geopolitical risks and currency fluctuations.', 'In May 2024, the board approved a policy allowing up to 5% of portfolio assets in alternative investments, including real estate and private equity funds.']
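Note that only the first of the three retrieved passages concerns cryptocurrency: `top_k=3` always returns three matches, however loosely related. Each match also carries a similarity `score`, so one option is to drop weak matches before building the prompt. The sketch below runs on a mocked result; the scores and the 0.8 threshold are invented for illustration and would need tuning on real data.

```python
def contexts_above(results, min_score=0.8):
    """Keep the chunk text of matches whose similarity score meets the threshold."""
    return [
        match["metadata"]["chunk_text"]
        for match in results["matches"]
        if match["score"] >= min_score
    ]

# Mocked query result with made-up scores, shaped like the `results` object above
mock_results = {
    "matches": [
        {"id": "doc-0", "score": 0.91, "metadata": {"chunk_text": "crypto policy text"}},
        {"id": "doc-7", "score": 0.78, "metadata": {"chunk_text": "foreign securities text"}},
        {"id": "doc-3", "score": 0.76, "metadata": {"chunk_text": "alternative investments text"}},
    ]
}
print(contexts_above(mock_results, min_score=0.8))
```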


## v. RAG Prompt for OpenAI Chat

In [19]:
prompt = (
    "Answer the following question based on the context below:\n"
    + "\n---\n".join(contexts)
    + f"\n\nQuestion: {query}\nAnswer:"
)
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=400
)
print(response.choices[0].message.content)

Our current policy prohibits direct investment in cryptocurrencies but allows investment through approved ETFs and regulated vehicles. All cryptocurrency investments must be reviewed quarterly by the risk team.
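The prompt above asks the model to answer "based on the context" but does not say what to do when the context lacks the answer. One possible variant, sketched below, numbers the passages and asks for an explicit "I don't know" in that case. The wording and the name `build_rag_prompt` are assumptions, not a tested recipe.

```python
def build_rag_prompt(contexts, query):
    """Assemble a grounded prompt from retrieved passages and a user question."""
    numbered = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the numbered context passages below. "
        "If the passages do not contain the answer, reply \"I don't know.\"\n\n"
        f"{numbered}\n\nQuestion: {query}\nAnswer:"
    )

demo = build_rag_prompt(
    ["First passage.", "Second passage."],
    "What does the policy say?",
)
print(demo)
```

The returned string can be passed as the `content` of the user message in the chat completion call shown above.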


## Conclusion

This notebook demonstrates how Generative AI can support financial knowledge management. By combining an LLM with Retrieval-Augmented Generation (RAG) and a vector database, either open source (Chroma) or managed in the cloud (Pinecone), you can search and synthesize insights from your own finance documents, policies, and compliance records.

Key takeaways:

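- Fewer “hallucinations”: RAG grounds answers in passages retrieved from your own documents, which reduces fabricated answers but does not eliminate them. Answer quality still depends on what the retriever returns.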

- Flexible workflows: Start with Chroma for local prototyping, then move to Pinecone when you need a managed, cloud-hosted index.

- Extensible platform: Swap in new LLMs or expand to new document types with minimal changes.
