### Data Ingestion

In [1]:
# Import libraries, packages and modules
from langchain_astradb import AstraDBVectorStore
from langchain_openai import OpenAIEmbeddings
from dotenv import load_dotenv
import os
import pandas as pd

In [2]:
# Load environment variables from a .env file
load_dotenv()

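# Retrieve the value of the 'OPENAI_API_KEY' environment variable
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

# Fail early with a clear message if the key is missing (assigning None to os.environ raises a TypeError)
if OPENAI_API_KEY is None:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")

# Set the 'OPENAI_API_KEY' environment variable in the current environment
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY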

In [3]:
# Create an instance of the OpenAIEmbeddings class
embedding = OpenAIEmbeddings()

In [4]:
# Set the API endpoint for the Astra DB instance
ASTRA_DB_API_ENDPOINT = "https://96501696-5be8-4e87-b2b4-721b1a4f90c9-us-east-2.apps.astra.datastax.com"

# Read the application token from the environment instead of hardcoding a secret in the notebook
ASTRA_DB_APPLICATION_TOKEN = os.getenv("ASTRA_DB_APPLICATION_TOKEN")

# Specify the keyspace to be used in Astra DB
ASTRA_DB_KEYSPACE = "default_keyspace"

# Specify the name of the collection within the keyspace
collection_name = "financebot"
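
Credentials such as the Astra DB application token are safer read from the environment than written into a notebook cell. The following is a minimal sketch of a small helper for that, using only the standard library; the helper name `require_env` is hypothetical and not part of any package used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage; the variable name mirrors the constant used in this notebook:
# ASTRA_DB_APPLICATION_TOKEN = require_env("ASTRA_DB_APPLICATION_TOKEN")
```

Failing at load time keeps a missing secret from surfacing later as a less obvious authentication error.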

In [5]:
# Create an instance of AstraDBVectorStore with the specified parameters
vector_store = AstraDBVectorStore(
    embedding=embedding,                  # Use the embedding instance created earlier
    collection_name=collection_name,      # Specify the collection name
    api_endpoint=ASTRA_DB_API_ENDPOINT,   # Provide the API endpoint for Astra DB
    token=ASTRA_DB_APPLICATION_TOKEN,     # Use the application token for authentication
    namespace=ASTRA_DB_KEYSPACE           # Specify the keyspace to be used
)

In [6]:
# Import the PyPDFLoader class from the langchain_community.document_loaders module
from langchain_community.document_loaders import PyPDFLoader

# Initialize the PDF loader with the path to the PDF file
loader = PyPDFLoader("c:\\iNeuron\\LLMTradingBot\\data\\finance_data.pdf")

# Load the pages of the PDF document
pages = loader.load()

# Get the number of pages loaded
num_pages = len(pages)

# Print the number of pages
print(num_pages)

108


In [7]:
# Keep list indices 10 through 19, i.e. the 11th through 20th pages (indexing is zero-based)
pages = pages[10:20]

# Access the content of the first page in the sliced list
first_page_content = pages[0].page_content

# Print the content of the first page
print(first_page_content)

Table of Contents 
9 understand root causes. Our full-reticle CV test chips use a sh ortened process flow to provide a faster 
learning cycle for speci fic process modules. 
 Our Scribe CV test chips are inserted directly on customers’ pr oduct wafers to collect data about critical 
layers. 
 Our DirectProbe™ CV test chips are designed to enable ultra-fas t yield learning for new product designs 
by allowing our customers to measure components of actual produ ct layout and identify yield issues. 
• pdFasTest ® Electrical Tester – Our proprietary electrical test hardware is optimized to quickl y test our CV test 
chips, enabling fast defect and p arametric characterization of manufacturing processes. As part of the system 
offering, we provide test progr ams for each CV test chip that a re tuned to the customer’s process. This automated 
system provides parallel functional testing, thus minimizing th e time required to perform millions of electrical 
measurements to test our CV test c
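
Because Python slices are zero-based and exclude the end index, `pages[10:20]` selects the 11th through 20th pages. A small sketch of a helper that converts a human-friendly, 1-based inclusive page range into the equivalent slice is shown below; `page_slice` and `demo_pages` are illustrative names, and the list of labels stands in for the list returned by `loader.load()`.

```python
def page_slice(first_page: int, last_page: int) -> slice:
    """Convert an inclusive, 1-based page range into a zero-based Python slice."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return slice(first_page - 1, last_page)


# Stand-in for the 108 loaded pages
demo_pages = [f"page {n}" for n in range(1, 109)]

# Pages 11 through 20 correspond to demo_pages[10:20]
selected = demo_pages[page_slice(11, 20)]
print(selected[0], selected[-1], len(selected))  # page 11 page 20 10
```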

In [8]:
# Iterate over the sliced list of pages with their index
for i, doc in enumerate(pages):
    # Print the index and the document object
    print(i, doc)

0 page_content='Table of Contents \n9 understand root causes. Our full-reticle CV test chips use a sh ortened process flow to provide a faster \nlearning cycle for speci fic process modules. \n\uf0a7 Our Scribe CV test chips are inserted directly on customers’ pr oduct wafers to collect data about critical \nlayers. \n\uf0a7 Our DirectProbe™ CV test chips are designed to enable ultra-fas t yield learning for new product designs \nby allowing our customers to measure components of actual produ ct layout and identify yield issues. \n• pdFasTest ® Electrical Tester – Our proprietary electrical test hardware is optimized to quickl y test our CV test \nchips, enabling fast defect and p arametric characterization of manufacturing processes. As part of the system \noffering, we provide test progr ams for each CV test chip that a re tuned to the customer’s process. This automated \nsystem provides parallel functional testing, thus minimizing th e time required to perform millions of electrical

In [9]:
# Concatenate the non-empty page contents into a single string
raw_text = ''.join(doc.page_content for doc in pages if doc.page_content)

# Print the concatenated text
print(raw_text)

Table of Contents 
9 understand root causes. Our full-reticle CV test chips use a sh ortened process flow to provide a faster 
learning cycle for speci fic process modules. 
 Our Scribe CV test chips are inserted directly on customers’ pr oduct wafers to collect data about critical 
layers. 
 Our DirectProbe™ CV test chips are designed to enable ultra-fas t yield learning for new product designs 
by allowing our customers to measure components of actual produ ct layout and identify yield issues. 
• pdFasTest ® Electrical Tester – Our proprietary electrical test hardware is optimized to quickl y test our CV test 
chips, enabling fast defect and p arametric characterization of manufacturing processes. As part of the system 
offering, we provide test progr ams for each CV test chip that a re tuned to the customer’s process. This automated 
system provides parallel functional testing, thus minimizing th e time required to perform millions of electrical 
measurements to test our CV test c
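
Concatenating pages with no separator can fuse the last word of one page with the first word of the next. One possible alternative, sketched here under the assumption that a newline between pages is acceptable, joins the page texts with an explicit separator. `join_pages` is a hypothetical helper, and its input stands in for `[doc.page_content for doc in pages]`.

```python
def join_pages(page_texts, separator="\n"):
    """Join non-empty page texts with a separator, trimming whitespace at page edges."""
    cleaned = (text.strip() for text in page_texts if text and text.strip())
    return separator.join(cleaned)


demo = ["end of page one ", "", " start of page two"]
print(join_pages(demo))
```

Changing the separator shifts chunk boundaries, so the chunk count reported by the splitter may differ from the one shown in this notebook.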

In [10]:
# Import the RecursiveCharacterTextSplitter class from the langchain.text_splitter module
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create an instance of RecursiveCharacterTextSplitter with specified chunk size and overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,       # Set the maximum size of each chunk to 500 characters
    chunk_overlap=100     # Set the overlap between chunks to 100 characters
)

# Split the concatenated text into chunks
texts = text_splitter.split_text(raw_text)

# Print the number of chunks created
print(len(texts))

# Print a separator line for readability
print("===" * 20)

# Print the content of the first chunk
print(texts[0])

91
============================================================
Table of Contents 
9 understand root causes. Our full-reticle CV test chips use a sh ortened process flow to provide a faster 
learning cycle for speci fic process modules. 
 Our Scribe CV test chips are inserted directly on customers’ pr oduct wafers to collect data about critical 
layers. 
 Our DirectProbe™ CV test chips are designed to enable ultra-fas t yield learning for new product designs
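
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker in plain Python. It is only an illustration of the two parameters: `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraph and line breaks, so its chunks will generally differ from these. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the same idea as the 100-character overlap configured above: context near a boundary appears in both neighbouring chunks.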


In [11]:
# Import the Document class from the langchain_core.documents module
from langchain_core.documents import Document

# Initialize an empty list to store the Document objects
docs = []

# Wrap each text chunk in a Document object
docs = [Document(page_content=text) for text in texts]

# Print the list of Document objects
print(docs)

[Document(page_content='Table of Contents \n9 understand root causes. Our full-reticle CV test chips use a sh ortened process flow to provide a faster \nlearning cycle for speci fic process modules. \n\uf0a7 Our Scribe CV test chips are inserted directly on customers’ pr oduct wafers to collect data about critical \nlayers. \n\uf0a7 Our DirectProbe™ CV test chips are designed to enable ultra-fas t yield learning for new product designs'), Document(page_content='by allowing our customers to measure components of actual produ ct layout and identify yield issues. \n• pdFasTest ® Electrical Tester – Our proprietary electrical test hardware is optimized to quickl y test our CV test \nchips, enabling fast defect and p arametric characterization of manufacturing processes. As part of the system \noffering, we provide test progr ams for each CV test chip that a re tuned to the customer’s process. This automated'), Document(page_content='system provides parallel functional testing, thus minimiz

In [12]:
# Add the documents to the vector_store
vector_store.add_documents(docs)

['ad325809118d484db3bbd48fedd68a6a',
 '5225a986aeef44c5bb1d7cfa7d55922d',
 'db8024521cd64632b5a8c35937d2f5aa',
 '3dc80c5e57a046a1845863686c88a214',
 'ef9b5db98ced41709c9cca453f17758d',
 'cf07d5cc139a4ee28d25c4cf860f682a',
 '57546293c4254b849e63db0aee0125d0',
 '894cb6f3bae0404697f09bf30a71f229',
 '495b3284175d444ab2b020440cefab68',
 '373acffc90634e33a0d8a32c2f25d5e7',
 'e63ecac19ad74621a7539e6a01dfd6b4',
 '8befe35accaa41aa9ddc1c5428a45082',
 '9557a8a768204dd097249965aa5ce092',
 '7a3edda57eaf4b079a6eb7adf1caf6ae',
 'a68dd09352674a2ba53a5278a8817ce7',
 'f1dfdeb2894140ef93aeee682a0c6fae',
 '94b3515e31ed43feabc0393376825b8d',
 '5f3e2d8650f84a509c17b79f4cdc952d',
 '4a8a90cb5ce64567a798bee28ae196b5',
 'bc2d0f517ae6440991277300b2ca34c2',
 '98ae5419109c427486de73587e0f7886',
 'bb18af2cb12f43a791ab21fe831bf596',
 '889002bd616445b3b2d76f465a37e868',
 '4b2460bcd51440c4b2bc70f90166104e',
 '3519b77bb8e240c78c44dd69efc68caf',
 '990e8b4fbe63493aacd95240e8c9f9fe',
 'cd9472ef0cb647b8aa540bef7505ae73',
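
The IDs above are generated fresh on each call, so running this cell twice stores every chunk twice. The retrieval output further down returns two identical chunks, and repeated ingestion is one possible cause. A minimal sketch of deriving stable IDs from the chunk text follows; `content_id` is a hypothetical helper. Passing such IDs through an `ids=` argument to `add_documents` is an assumption to check against the installed vector store version.

```python
import hashlib


def content_id(text: str) -> str:
    """Derive a stable 32-character hex ID from the chunk text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:32]


demo_texts = ["first chunk", "second chunk", "first chunk"]
ids = [content_id(text) for text in demo_texts]

# Identical content maps to the same ID; different content maps to a different one
print(ids[0] == ids[2], ids[0] == ids[1])  # True False
```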
 

### Retriever

In [13]:
# Expose the vector store as a retriever that returns the 3 most similar chunks per query
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

In [14]:
# Fetch the top matches for a sample query (invoke replaces the deprecated get_relevant_documents)
retriever.invoke("What is Market For Registrant's Common Equity")

[Document(page_content='market acceptance and recoup our costs, if at all. \n• Our success depends upon our ability to effectively plan and ma nage our resources and restructure our business \nthrough rapidly fluctuating economic and market conditions, whi ch actions may have an adverse effect on our \nfinancial and operating results. \n• Our business may be negatively im pacted by social, political, g eopolitical, economic instability, unrest, war,'),
 Document(page_content='market acceptance and recoup our costs, if at all. \n• Our success depends upon our ability to effectively plan and ma nage our resources and restructure our business \nthrough rapidly fluctuating economic and market conditions, whi ch actions may have an adverse effect on our \nfinancial and operating results. \n• Our business may be negatively im pacted by social, political, g eopolitical, economic instability, unrest, war,'),
 Document(page_content='“Risk Factors”. \nSee our “Notes to Consolidated F inancial St

In [16]:
PRODUCT_BOT_TEMPLATE = """
    You are a finance bot, an expert in finance-related advice.
    Ensure your answers are relevant to the query context and refrain from straying off-topic.
    Your responses should be concise and informative.

    CONTEXT:
    {context}

    QUESTION: {question}

    YOUR ANSWER:
    
    """

In [17]:
from langchain_core.prompts import ChatPromptTemplate

# Build a chat prompt whose {context} and {question} placeholders are filled in at run time
prompt = ChatPromptTemplate.from_template(PRODUCT_BOT_TEMPLATE)
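
The `{context}` and `{question}` placeholders are filled in when the chain runs. As a rough stand-in for that substitution step, the sketch below uses plain `str.format` on a shortened copy of the template. The real `ChatPromptTemplate` also wraps the result in a chat message, which this sketch does not attempt, and the sample values are made up.

```python
# Shortened copy of the notebook's template, kept only for illustration
DEMO_TEMPLATE = """
CONTEXT:
{context}

QUESTION: {question}

YOUR ANSWER:
"""

# Made-up sample values for the two placeholders
filled = DEMO_TEMPLATE.format(
    context="Example passage retrieved from the vector store.",
    question="Example question?",
)
print(filled)
```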

In [18]:
from langchain_openai import ChatOpenAI

# Create the chat model with the library's default settings
llm = ChatOpenAI()

In [19]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [20]:
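# Compose the chain: retrieve context, fill the prompt, call the model, parse the reply into a string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)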
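
As composed here, the retriever hands the prompt a list of Document objects, so what lands in `{context}` is the string form of that list rather than clean text. A common remedy is a small formatting function placed between the retriever and the prompt. The sketch below shows the idea with a stand-in class, so `FakeDocument` and `format_docs` are illustrative names. In the chain itself this would presumably be wired as `retriever | format_docs`, which is an assumption to verify against the installed LangChain version.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document, carrying only the field used here."""
    page_content: str


def format_docs(docs) -> str:
    """Join the text of the retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [FakeDocument("First passage."), FakeDocument("Second passage.")]
print(format_docs(retrieved))
```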

In [21]:
chain.invoke("what is Market For Registrant’s Common Equity?")

"The Market for Registrant's Common Equity refers to the demand and supply of the company's common stock in the financial markets. It reflects how investors perceive the company's value and potential for growth, impacting its stock price and market capitalization."

RunnablePassthrough is a runnable that passes its input through unchanged, or with additional keys.

It behaves almost like the identity function, except that it can be
configured to add keys to the output when the input is a dict. In the
chain above, it forwards the user's question unchanged into the prompt's
question placeholder.

The examples below demonstrate how this Runnable works using a few simple
chains. The chains rely on simple lambdas to make the examples easy to execute
and experiment with.
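
As a rough plain-Python picture of that behavior, the sketch below imitates the two modes with ordinary functions. It is a stand-in, not the library's implementation. `passthrough` and `passthrough_assign` are hypothetical names, the second loosely mirroring what `RunnablePassthrough.assign` does for dict inputs.

```python
def passthrough(value):
    """Identity: return the input unchanged."""
    return value


def passthrough_assign(mapping: dict, **extra) -> dict:
    """Return a copy of mapping plus extra keys, each computed from the original mapping."""
    result = dict(mapping)
    for key, compute in extra.items():
        result[key] = compute(mapping)
    return result


print(passthrough("what is X?"))
# what is X?
print(passthrough_assign({"num": 1}, doubled=lambda d: d["num"] * 2))
# {'num': 1, 'doubled': 2}
```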