![alt text](<../images/just enough.png>)
# Just Enough Python for AI/Data Science
## Module 6: Your First ML Model and GenAI Solution.
>Testing if 'Just Enough Python for AI' is enough or not.

### Day 14 - Building a GenAI RAG-Based Question Answering System (Chat with Your PDF)
----

##### Overview:

Welcome to Day 14. 

Machine Learning (ML) may seem advanced, but you've already learned enough Python basics (Pandas, NumPy, and data visualization) to build your first ML models!

In fact, you have learned enough to build your first GenAI RAG chatbot as well. 

In this notebook you will:

- Load a PDF and split its text into overlapping chunks.
- Embed the chunks with OpenAI embeddings and store them in a **FAISS** vector store.
- Build a **RetrievalQA** chain with `ChatOpenAI` and ask questions about your PDF.


#### 1. Install Dependencies


In [None]:
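# Uncomment to install everything the imports below need
# !pip install -U langchain langchain-community langchain-openai langchain-text-splitters openai faiss-cpu pypdf python-dotenv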




#### 2. Import Libraries

In [None]:
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
from IPython.display import Markdown, display

#### 3. Set Your OpenAI API Key

In [None]:
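load_dotenv()  # load variables from a local .env file
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:  # fail early; the OpenAI classes below read this variable
    raise ValueError("OPENAI_API_KEY is not set. Add it to your .env file.")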

#### 4. Load and Split PDF Text

In [22]:
loader = PyPDFLoader("../data/The_alchemist_by_Paulo_Coelho.pdf")
docs = loader.load()

# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = text_splitter.split_documents(docs)
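
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal pure-Python sketch of overlapping chunks. It is a simplified fixed-window splitter, not the algorithm `RecursiveCharacterTextSplitter` uses (which also tries to break on paragraph, line, and word boundaries). The function name `sliding_window_chunks` and the sample string are made up for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it ignores paragraph and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last 3 characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk.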

#### 5. Embed Documents and Create a FAISS Vector Store

In [23]:
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)
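
Conceptually, a vector store keeps one vector per chunk and answers a query by returning the chunks whose vectors are closest to the query vector. The sketch below shows that idea with plain Python and Euclidean distance, which is, to my understanding, the default metric of LangChain's FAISS wrapper. The 3-dimensional vectors and chunk labels are hypothetical toy values; real OpenAI embeddings have far more dimensions, and FAISS uses optimized index structures rather than a Python loop.

```python
import math

# Hypothetical toy "embeddings" (real ones come from the embeddings API).
toy_index = {
    "chunk about a treasure": [0.9, 0.1, 0.0],
    "chunk about the desert": [0.1, 0.8, 0.2],
    "chunk about alchemy": [0.2, 0.1, 0.9],
}


def nearest_chunks(query_vector, index, k=2):
    """Return the k chunk texts whose vectors are closest to query_vector."""
    scored = [
        (math.dist(query_vector, vector), text)
        for text, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [text for _, text in scored[:k]]


query_vector = [0.8, 0.2, 0.1]  # pretend embedding of a question about the treasure
top = nearest_chunks(query_vector, toy_index, k=2)
print(top)
```

The retriever created in the next step does the same job on the real index: it embeds the question and returns the nearest chunks.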

#### 6. Create the Retriever

In [24]:
retriever = vectorstore.as_retriever()

#### 7. Create a QA Chain Using ChatOpenAI

In [25]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Optional: custom prompt template for more control over the answer
prompt_template = PromptTemplate.from_template(
    "Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}"
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt_template},
    input_key="question"
)
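
The `"stuff"` chain type simply places all retrieved chunks into a single prompt. A rough sketch of that step in plain Python is shown below; it reuses the template text from the cell above, while `build_stuffed_prompt`, the `"\n\n"` separator, and the two sample chunks are assumptions made for illustration, not LangChain internals.

```python
# Same template text as the PromptTemplate in the cell above.
TEMPLATE = (
    "Use the following context to answer the question:\n\n"
    "{context}\n\nQuestion: {question}"
)


def build_stuffed_prompt(chunks, question, separator="\n\n"):
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks, standing in for what the retriever returns.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt_text = build_stuffed_prompt(retrieved, "What is the use of this book?")
print(prompt_text)
```

Because everything goes into one prompt, this approach works well for a handful of small chunks but can exceed the model's context window if too many or too large chunks are retrieved.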

#### 8. Ask a Question About Your PDF

In [29]:
question = "What is the use of this book?"
response = qa_chain.invoke({"question": question}) 


#### 9. Show the Answer

In [30]:
display(Markdown(response["result"]))

The book serves as a means to convey important concepts and lessons about personal legends and the journey of self-discovery. It highlights the idea that many books share similar themes regarding people's struggles to choose their own paths in life. The old man suggests that while the book may be irritating and repetitive, it is still significant because it helps readers understand the deeper truths encapsulated in a few lines, such as those found in the Emerald Tablet. Ultimately, the book is a tool for learning and reflection, guiding individuals toward understanding their own personal legends and the nature of their journeys.
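
Since the chain was built with `return_source_documents=True`, the response also carries the retrieved chunks under `response["source_documents"]`. The helper below is a small sketch for listing them; `summarize_sources` is a hypothetical name, and it assumes each document has `page_content` and a `metadata` dict with a `"page"` key, as `PyPDFLoader` normally provides. The stand-in documents exist only so the example runs without an API call.

```python
from types import SimpleNamespace


def summarize_sources(source_documents, max_chars=80):
    """Return one line per retrieved chunk: page label plus a short preview."""
    lines = []
    for doc in source_documents:
        page = doc.metadata.get("page", "unknown")  # "page" key is an assumption
        preview = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"page {page}: {preview}")
    return lines


# Stand-in documents for demonstration only.
fake_docs = [
    SimpleNamespace(page_content="Some   text\nfrom the PDF.", metadata={"page": 3}),
    SimpleNamespace(page_content="No page info here.", metadata={}),
]
for line in summarize_sources(fake_docs):
    print(line)

# In the notebook, pass the real sources instead:
# summarize_sources(response["source_documents"])
```

Checking the sources is a quick way to see whether an answer is grounded in the PDF or whether the retriever pulled unrelated passages.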

# HAPPY LEARNING