In [1]:
!pip install sentence-transformers langchain chromadb pypdf faiss-cpu langchain_community scikit-learn matplotlib seaborn numpy


Collecting chromadb
  Downloading chromadb-1.3.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting pypdf
  Downloading pypdf-6.4.1-py3-none-any.whl.metadata (7.1 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.13.1-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Collecting langchain_community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)

In [2]:
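from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
# Recent LangChain releases deprecate these two community classes in favor of the
# langchain-huggingface and langchain-chroma packages; they still run in this notebook.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS, Chroma
import os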


In [3]:
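# Colab-only: opens a browser file picker and saves the chosen file to the working directory
from google.colab import files
uploaded = files.upload()

pdf_path = list(uploaded.keys())[0]  # name of the first uploaded file
print(f"Uploaded file: {pdf_path}")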


Saving Artificial Intelligence.pdf to Artificial Intelligence.pdf
Uploaded file: Artificial Intelligence.pdf


In [4]:
loader = PyPDFLoader(pdf_path)
documents = loader.load()  # one Document per PDF page, with source/page metadata

print("Total pages loaded:", len(documents))


Total pages loaded: 3
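`PyPDFLoader` yields one `Document` per page, so the count above should match the PDF's page count. pypdf only reads an embedded text layer and does not run OCR, so a scanned page can come back empty or nearly so. A minimal sanity check, sketched on plain strings that stand in for `doc.page_content`; the helper name and the `min_chars` cutoff are hypothetical choices, not pypdf or LangChain settings:

```python
def low_text_pages(page_texts, min_chars=50):
    """Return 0-based indices of pages whose extracted text is suspiciously short.

    min_chars is an illustrative cutoff, not a library setting.
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


# Hypothetical stand-ins for [doc.page_content for doc in documents]
page_texts = [
    "1 What is Artificial Intelligence? Artificial Intelligence (AI) is the "
    "simulation of human intelligence in machines.",
    "   \n ",  # e.g. a scanned page with no text layer
    "7 - Future of AI. AI will become more explainable and improve reasoning.",
]
print(low_text_pages(page_texts))  # [1]
```

In the notebook itself this would be called as `low_text_pages([d.page_content for d in documents])`; an empty result means every page produced a reasonable amount of text.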


In [5]:
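# Split pages into chunks of at most ~500 characters; neighbouring chunks share up to 100
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,     # maximum characters per chunk (measured with len())
    chunk_overlap=100   # characters carried over between consecutive chunks
)

chunks = splitter.split_documents(documents)
print("Total chunks created:", len(chunks))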


Total chunks created: 7
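By default `RecursiveCharacterTextSplitter` measures `chunk_size` and `chunk_overlap` in characters and prefers to break on paragraph, line, and word boundaries, which is how 3 pages became 7 chunks. The fixed-window sketch below is a deliberate simplification (it ignores those boundaries and is not LangChain's algorithm); it only shows how the two parameters interact: each window starts `chunk_size - chunk_overlap` characters after the previous one.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration: unlike RecursiveCharacterTextSplitter, this
    ignores paragraph, line and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))  # 1,200 synthetic characters
parts = sliding_window_chunks(text)
print([len(p) for p in parts])            # [500, 500, 400]
print(parts[0][-100:] == parts[1][:100])  # True: the 100-character overlap
```

The overlap means a sentence cut at a window edge still appears whole in at least one chunk, at the cost of storing and embedding some text twice.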


In [6]:
print(chunks[0].page_content[:400])


1  What is Artificial Intelligence? 
Artificial Intelligence (AI) is the simulation of human intelligence in machines. 
It enables computers to perform tasks that typically require human reasoning, learning, problem-
solving, and decision-making. 
The goal of AI is not just automation, but to create systems that can adapt and improve over time. 
 
 2 - Key Components of AI 
AI is built using multi


In [7]:
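# all-MiniLM-L6-v2 maps text to 384-dimensional vectors; the weights are downloaded
# from the Hugging Face Hub on first use (a token is optional for public models)
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")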


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



### **FAISS**

In [8]:
faiss_db = FAISS.from_documents(chunks, embedding_model)
print("FAISS Vector DB created!")


FAISS Vector DB created!
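As we understand LangChain's defaults, `FAISS.from_documents` embeds every chunk and stores the vectors in a flat (exact) index ranked by Euclidean (L2) distance, so each query is compared with every stored vector. The pure-Python sketch below performs that brute-force ranking on toy 3-dimensional vectors; the vectors and the `brute_force_search` helper are hypothetical (real `all-MiniLM-L6-v2` embeddings have 384 dimensions), and a flat FAISS index does the equivalent with optimized native code.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query_vec, doc_vecs, k=3):
    """Return (index, distance) pairs for the k stored vectors closest to query_vec."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Hypothetical toy "embeddings", one per chunk
doc_vecs = [
    [0.9, 0.1, 0.0],  # chunk 0
    [0.0, 1.0, 0.0],  # chunk 1
    [0.7, 0.7, 0.1],  # chunk 2
    [0.0, 0.0, 1.0],  # chunk 3
]
query_vec = [1.0, 0.0, 0.0]
print([i for i, _ in brute_force_search(query_vec, doc_vecs, k=2)])  # [0, 2]
```

For unit-length vectors, ranking by L2 distance gives the same order as ranking by cosine similarity, because the squared distance equals 2 minus twice the cosine.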


In [9]:
query = "What is the RAG"
results = faiss_db.similarity_search(query, k=3)

for r in results:
    print(r.page_content[:200])
    print("---")


6 - What is RAG (Retrieval-Augmented Generation)? 
RAG improves LLM accuracy by retrieving relevant information from an external knowledge base. 
Process: 
1. User asks a question 
2. System retrieves
---
• Gemini 
• Claude 
• Mistral 
LLMs require massive compute and specialized data training. 
 
5 -  Why LLMs Hallucinate 
LLMs do not know facts — they predict the next token based on patterns. 
This c
---
7 - Future of AI 
AI will: 
• Become more explainable 
• Improve reasoning 
• Work with multimodal input (text, images, audio, video) 
• Operate with autonomy (AI Agents) 
It will shift from assistant
---
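The first hit above describes RAG as improving LLM accuracy "by retrieving relevant information from an external knowledge base". The retrieval half is what `similarity_search` does; a minimal sketch of the next step, assuming the hits have already been reduced to plain strings (for example `[r.page_content for r in results]`), could assemble them into a grounded prompt. The `build_rag_prompt` helper and its wording are hypothetical, not a LangChain API.

```python
def build_rag_prompt(question, contexts, max_chars_per_context=500):
    """Assemble a grounded prompt from retrieved chunk texts (hypothetical helper)."""
    numbered = []
    for i, text in enumerate(contexts, start=1):
        snippet = text.strip()[:max_chars_per_context]  # keep the prompt bounded
        numbered.append(f"[{i}] {snippet}")
    context_block = "\n\n".join(numbered)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Stand-ins for retrieved chunk texts, shortened from the output above
contexts = [
    "6 - What is RAG (Retrieval-Augmented Generation)? RAG improves LLM accuracy "
    "by retrieving relevant information from an external knowledge base.",
    "5 - Why LLMs Hallucinate. LLMs do not know facts; they predict the next "
    "token based on patterns.",
]
print(build_rag_prompt("What is RAG?", contexts))
```

Numbering the contexts lets the model (or a reader) cite which chunk supported an answer; the resulting string would then be sent to whichever LLM the pipeline uses.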


### **Chroma Vector Store**

In [19]:
chroma_dir = "./chroma_db"
os.makedirs(chroma_dir, exist_ok=True)  # create the persistence folder if it is missing

chroma_db = Chroma.from_documents(
    documents=chunks,
    embedding=embedding_model,
    persist_directory=chroma_dir
)

print("Chroma DB created and persisted!")


Chroma DB created and persisted!
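When no `ids` are passed, LangChain's Chroma wrapper assigns random UUIDs to each chunk, so re-running the cell above against the same `persist_directory` can, as we understand it, store every chunk a second time. One way around this, sketched here as an assumption rather than a documented recipe, is to derive a deterministic ID per chunk and pass the list through the `ids` argument of `Chroma.from_documents`, relying on the wrapper writing by ID so that repeat runs overwrite instead of duplicating. The `stable_chunk_ids` helper is hypothetical.

```python
import hashlib


def stable_chunk_ids(texts, source=""):
    """Return one deterministic ID per chunk (hypothetical helper).

    The ID hashes the source name, the chunk's position and its text, so the
    same PDF split the same way yields the same IDs on every run, while two
    identical chunks in one document still get distinct IDs.
    """
    ids = []
    for position, text in enumerate(texts):
        key = f"{source}\x00{position}\x00{text}".encode("utf-8")
        ids.append(hashlib.sha256(key).hexdigest()[:32])
    return ids


texts = ["chunk A", "chunk B", "chunk A"]  # note the repeated text
run1 = stable_chunk_ids(texts, source="Artificial Intelligence.pdf")
run2 = stable_chunk_ids(texts, source="Artificial Intelligence.pdf")
print(run1 == run2, len(set(run1)))  # True 3

# Assumed usage with this notebook's objects:
# chroma_db = Chroma.from_documents(
#     documents=chunks,
#     embedding=embedding_model,
#     persist_directory=chroma_dir,
#     ids=stable_chunk_ids([c.page_content for c in chunks], source=pdf_path),
# )
```

Including the chunk position keeps IDs unique within one batch even when two chunks have identical text; the trade-off is that changing `chunk_size` or `chunk_overlap` produces a new set of IDs, so old chunks would need to be deleted separately.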


In [20]:
query = "Who can apply for this course?"
results = chroma_db.similarity_search(query, k=3)

for r in results:
    print(r.page_content[:200])
    print("---")


Industry Example Use-Cases 
Retail Recommendation systems, demand forecasting 
Autonomous Vehicles Self-driving cars 
The adoption of AI brings efficiency, accuracy, and new customer experiences. 
 
4
---
• Gemini 
• Claude 
• Mistral 
LLMs require massive compute and specialized data training. 
 
5 -  Why LLMs Hallucinate 
LLMs do not know facts — they predict the next token based on patterns. 
This c
---
7 - Future of AI 
AI will: 
• Become more explainable 
• Improve reasoning 
• Work with multimodal input (text, images, audio, video) 
• Operate with autonomy (AI Agents) 
It will shift from assistant
---
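This query asks about a course, which the AI document never mentions, yet the store still returns three chunks: nearest-neighbour search always returns the k closest vectors, whether or not they are relevant. A common guard is a minimum relevance score; LangChain vector stores expose `similarity_search_with_relevance_scores(query, k=..., score_threshold=...)` for this. The sketch below shows only the filtering logic on (text, score) pairs; the scores and the 0.5 cutoff are made up for illustration and were not measured in this notebook.

```python
def filter_by_relevance(hits, threshold):
    """Keep only (text, score) pairs whose relevance score meets the threshold.

    Assumes higher score = more relevant (e.g. a 0-1 relevance score).
    """
    return [(text, score) for text, score in hits if score >= threshold]


# Hypothetical scores for illustration only
hits = [
    ("Industry Example Use-Cases ...", 0.21),
    ("• Gemini • Claude • Mistral ...", 0.18),
    ("7 - Future of AI ...", 0.15),
]
relevant = filter_by_relevance(hits, threshold=0.5)
if not relevant:
    print("No sufficiently relevant chunks; answer 'not found in the document'.")
```

A sensible threshold depends on the embedding model and distance metric, so it is usually tuned by inspecting scores for queries that should and should not match.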
