In [1]:
from langchain_community.document_loaders import PyPDFLoader

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader("company-info.pdf")
data = loader.load()

In [2]:
data

[Document(metadata={'producer': 'ONLYOFFICE/8.2.2.22', 'creator': 'ONLYOFFICE/8.2.2.22', 'creationdate': '2025-03-01T05:52:19+00:00', 'moddate': '2025-03-01T05:52:19+00:00', 'source': 'company-info.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='1. **Company Information:**\nGreenPower Solutions is a leading player in the renewable energy industry, committed\nto offering clean and cost-effective solar solutions for homes and businesses. Founded\nin 2015, we have been instrumental in advocating the widespread adoption of solar\nenergy, improving the quality of life, and contributing to a sustainable future.\n**Our Mission:**\n"To deliver affordable, reliable, and efficient solar energy systems that contribute to a\ncleaner and more sustainable environment."\n**Our Values:**\n- Customer Focus\n- Innovation\n- Quality\n- Integrity\n- Sustainability\n2. **Product Knowledge:**\n- **Solar Panels:**\nOur solar panels feature high-efficiency monocrystalline cells, ensuring 

In [3]:
len(data)

4

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split each page into chunks of at most 1000 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(data)

print("Total number of documents: ", len(docs))

Total number of documents:  8
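To make the splitting step concrete, here is a simplified sketch of recursive character splitting in plain Python. `recursive_split` is a hypothetical helper, not LangChain's implementation. It tries the separators `"\n\n"`, `"\n"` and `" "` in that order and falls back to hard cuts. It ignores chunk overlap and merges pieces more crudely than `RecursiveCharacterTextSplitter` does.

```python
def recursive_split(text, chunk_size=1000, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks no longer than chunk_size, trying coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard-cut into fixed-size slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too big on its own: retry it with finer separators
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "a" * 1500 + "\n\n" + "b" * 300
print([len(c) for c in recursive_split(sample, chunk_size=1000)])  # [1000, 500, 300]
```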


In [5]:
docs[0]

Document(metadata={'producer': 'ONLYOFFICE/8.2.2.22', 'creator': 'ONLYOFFICE/8.2.2.22', 'creationdate': '2025-03-01T05:52:19+00:00', 'moddate': '2025-03-01T05:52:19+00:00', 'source': 'company-info.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='1. **Company Information:**\nGreenPower Solutions is a leading player in the renewable energy industry, committed\nto offering clean and cost-effective solar solutions for homes and businesses. Founded\nin 2015, we have been instrumental in advocating the widespread adoption of solar\nenergy, improving the quality of life, and contributing to a sustainable future.\n**Our Mission:**\n"To deliver affordable, reliable, and efficient solar energy systems that contribute to a\ncleaner and more sustainable environment."\n**Our Values:**\n- Customer Focus\n- Innovation\n- Quality\n- Integrity\n- Sustainability\n2. **Product Knowledge:**\n- **Solar Panels:**\nOur solar panels feature high-efficiency monocrystalline cells, ensuring m

# Get an API key

Generate a Google AI API key at https://ai.google.dev/gemini-api/docs/api-key, then paste it into a .env file as GOOGLE_API_KEY=<your key>.

Embedding models: https://python.langchain.com/v0.1/docs/integrations/text_embedding/
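`langchain_google_genai` reads the key from the `GOOGLE_API_KEY` environment variable, which `load_dotenv()` fills in from `.env`. Below is a small sketch of a fail-fast check; `require_google_api_key` is a hypothetical helper, not part of any library.

```python
import os


def require_google_api_key(env=None):
    """Return GOOGLE_API_KEY from the environment, or fail with a helpful message."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set: add GOOGLE_API_KEY=<your key> to .env "
            "and call load_dotenv() before creating the Google embeddings or chat model."
        )
    return key
```

You would call `require_google_api_key()` right after `load_dotenv()`. A missing key then surfaces immediately with a clear message instead of as an authentication error deep inside the first embedding call.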

In [6]:
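from dotenv import load_dotenv
from langchain_chroma import Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Read GOOGLE_API_KEY from the .env file into the environment
load_dotenv()

# Embed a test string to check that the key and model work
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector = embeddings.embed_query("hello, world!")
vector[:5]  # first 5 dimensions of the embedding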

[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734263241291,
 0.01813093200325966]
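`embed_query` maps text to a vector of floats. Retrieval then ranks chunks by how close their vectors are to the query vector. The sketch below, using only the standard library, shows one common closeness measure: cosine similarity. This is an illustration only. The metric a vector store uses is configurable and may differ; Chroma, for instance, defaults to L2 distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```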

In [7]:
# Embed every chunk and store it in Chroma, reusing the embeddings object defined above
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)

In [8]:
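# Ask for the 10 most similar chunks; the index only holds 8,
# so Chroma lowers n_results to 8 (see the warning below)
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10})

retrieved_docs = retriever.invoke("What is new in Development of Multiple Combined Regression Methods for Rainfall Measurement paper?")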


Number of requested results 10 is greater than number of elements in index 8, updating n_results = 8
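The warning appears because `k=10` is larger than the 8 chunks in the index, so at most 8 results can be returned. Here is a hypothetical in-memory sketch of top-k similarity retrieval with that cap. It assumes cosine similarity and a plain dict as the index; the toy vectors are made up for illustration and do not come from a real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k=10):
    """Return up to k (doc_id, score) pairs, most similar first.

    If k exceeds the number of stored vectors, the result count is capped,
    much like the n_results adjustment reported in the warning above.
    """
    n_results = min(k, len(index))
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_results]


# Toy index of three 2-D "embeddings" (illustrative values only)
index = {"panels": [1.0, 0.0], "batteries": [0.0, 1.0], "inverters": [0.7, 0.7]}
results = top_k([1.0, 0.1], index, k=10)
print(len(results), [doc_id for doc_id, _ in results])  # 3 ['panels', 'inverters', 'batteries']
```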


In [9]:
len(retrieved_docs)

8

In [10]:
retrieved_docs

[Document(metadata={'page': 0, 'source': 'my_paper.pdf'}, page_content='See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/357213035\nDevelopment of Multiple Combined Regression Methods for Rainfall\nMeasu rement Development of Multiple Combined Regression Methods for\nRainfall Measu rement\nArticle  · Dec ember 2021\nCITATIONS\n0READS\n711\n6 author s, including:\nNusr at Jahan Pr ottasha\nDaff odil Int ernational Univ ersity\n26 PUBLICA TIONS \xa0\xa0\xa0299 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nMd K owsher\nStevens Instit ute of T echnolog y\n73 PUBLICA TIONS \xa0\xa0\xa0561 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nRokeya Khat un Shorna\nJahangirnag ar Univ ersity\n6 PUBLICA TIONS \xa0\xa0\xa05 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nNiaz Mur shed\nJahangirnag ar Univ ersity\n3 PUBLICA TIONS \xa0\xa0\xa00 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Niaz Mur shed  on 21 Dec ember

In [10]:
print(retrieved_docs[5].page_content)

come with a robust build for a long lifespan. Power ratings range from 320 to 400 watts,
and our sales representatives guide customers to choose the right panel size based on
their energy needs, roof size, and budget.
- **Solar Inverters:**
We provide two types of inverters: string inverters and microinverters, each with
unique advantages. String inverters are suitable for uniform sunlight, while
microinverters excel in complex installations. Helping customers choose the right
inverter is part of our commitment to tailor solutions to their needs.
- **Solar Batteries:**
Our lithium-ion solar batteries are known for high efficiency, a long lifespan, and
deep discharge capabilities. Matching the battery capacity to the customer's energy
usage patterns is crucial, emphasizing the benefits of energy independence, greater
savings, and a reliable power supply.
- **Solar Monitoring System:**
Our digital solar monitoring system enables customers to track their system's


In [11]:
from langchain_google_genai import ChatGoogleGenerativeAI

# Low temperature for focused answers; responses capped at 500 tokens
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500)

In [12]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
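The `{context}` placeholder in the system prompt is where the retrieved chunks go. Below is a rough plain-Python sketch of this "stuff" step. It assumes the chunks are joined with blank lines. `build_messages` is a hypothetical helper, not a LangChain function, and the sample chunks are made up.

```python
SYSTEM_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)


def build_messages(question, chunks, separator="\n\n"):
    """Fill {context} with the joined chunks and pair it with the user question."""
    context = separator.join(chunks)
    return [
        ("system", SYSTEM_PROMPT.format(context=context)),
        ("human", question),
    ]


messages = build_messages(
    "What inverters do you offer?",
    ["We provide string inverters and microinverters.", "Our batteries are lithium-ion."],
)
print(messages[0][1])
```

Every retrieved chunk goes into a single prompt. This is why `k` and `chunk_size` together bound how much text the model has to read.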

In [13]:
# Stuff retrieved chunks into the prompt's {context}, then put the retriever in front
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
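Conceptually, `rag_chain` retrieves chunks for the `input` question and hands them to the question-answering chain. It returns a dict that includes `context` (the retrieved documents) and `answer`. The framework-free sketch below illustrates that flow. `make_rag_chain` and the stub retriever and LLM are hypothetical stand-ins, not LangChain code.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]
Answerer = Callable[[str, List[str]], str]


def make_rag_chain(retrieve: Retriever, answer: Answerer) -> Callable[[Dict[str, str]], Dict[str, object]]:
    """Compose retrieval and answering into one callable that takes {"input": question}."""
    def invoke(inputs: Dict[str, str]) -> Dict[str, object]:
        question = inputs["input"]
        context = retrieve(question)       # step 1: fetch relevant chunks
        reply = answer(question, context)  # step 2: build the prompt from the chunks and call the LLM
        return {"input": question, "context": context, "answer": reply}
    return invoke


# Stand-ins for the real retriever and LLM
def fake_retrieve(question: str) -> List[str]:
    return ["GreenPower Solutions sells solar panels, inverters and batteries."]


def fake_answer(question: str, context: List[str]) -> str:
    if any("rainfall" in chunk.lower() for chunk in context):
        return "Answer based on context."
    return "I don't know; the context does not cover this."


chain = make_rag_chain(fake_retrieve, fake_answer)
response = chain({"input": "What is new in the rainfall regression paper?"})
print(response["answer"])  # I don't know; the context does not cover this.
```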

In [14]:
response = rag_chain.invoke({"input": "What is new in Development of Multiple Combined Regression Methods for Rainfall Measurement paper?"})
print(response["answer"])

Number of requested results 10 is greater than number of elements in index 8, updating n_results = 8


This question cannot be answered from the given context. The provided text discusses GreenPower Solutions, a company focused on providing solar energy solutions, and does not contain information about rainfall measurement research.
