In [1]:
import os
import sys
from dotenv import load_dotenv
load_dotenv(override=True)
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

from helper_functions import (EmbeddingProvider,
                              retrieve_context_per_question,
                              replace_t_with_space,
                              get_langchain_embedding_provider,
                              show_context)

from evaluation.evaluation_rag import evaluate_rag

from langchain_community.vectorstores import FAISS


In [2]:
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API key exists and begins with {openai_api_key[:8]}")
else:
    print("OpenAI API key is not set")

OpenAI API key exists and begins with sk-proj-


## Read documents

In [3]:
os.makedirs('data', exist_ok=True)
!wget -O data/Understanding_Climate_Change.pdf https://raw.githubusercontent.com/NirDiamant/RAG_TECHNIQUES/main/data/Understanding_Climate_Change.pdf

--2026-02-11 21:52:22--  https://raw.githubusercontent.com/NirDiamant/RAG_TECHNIQUES/main/data/Understanding_Climate_Change.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 206372 (202K) [application/octet-stream]
Saving to: ‘data/Understanding_Climate_Change.pdf’


2026-02-11 21:52:23 (2.31 MB/s) - ‘data/Understanding_Climate_Change.pdf’ saved [206372/206372]
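`wget` is not available on every platform, and re-running the cell downloads the file again. A portable alternative that uses only the standard library might look like the sketch below. `fetch_file` is a hypothetical helper, not part of the original notebook.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_file(url, dest):
    """Download url to dest unless dest already exists; return the destination Path."""
    dest = Path(dest)
    if dest.exists():
        return dest  # skip the download on re-runs
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (requires network access):
# fetch_file(
#     "https://raw.githubusercontent.com/NirDiamant/RAG_TECHNIQUES/main/data/Understanding_Climate_Change.pdf",
#     "data/Understanding_Climate_Change.pdf",
# )
```

Because the helper creates the parent directory itself, the separate `os.makedirs('data', exist_ok=True)` call would become optional if this approach were used.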

In [4]:
def encode_pdf(path, chunk_size=1000, chunk_overlap=200):
    """Load a PDF, split it into overlapping chunks, and index them in a FAISS vector store."""
    # Load the PDF (PyPDFLoader yields one document per page)
    loader = PyPDFLoader(path)
    documents = loader.load()

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap, length_function=len
    )
    texts = text_splitter.split_documents(documents)

    # Light preprocessing of the chunk texts
    cleaned_texts = replace_t_with_space(texts)

    # Create the embedding model
    embeddings = get_langchain_embedding_provider(EmbeddingProvider.OPENAI)

    # Build the vector store from the cleaned chunks
    vectorstore = FAISS.from_documents(cleaned_texts, embedding=embeddings)

    return vectorstore
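
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks are generally not cut at exact character offsets. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With the defaults, each window advances by 800 characters, so the last 200 characters of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is split across two chunks with neither half retrievable on its own.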
    

In [5]:
%pip install faiss-cpu



In [6]:
path = "data/Understanding_Climate_Change.pdf"

chunks_vector_store = encode_pdf(path)

## Create retriever for RAG

In [7]:
chunks_query_retriever = chunks_vector_store.as_retriever(search_kwargs={"k": 2})
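
`search_kwargs={"k": 2}` asks the retriever for the two chunks whose embeddings are closest to the query embedding. The sketch below illustrates that ranking step with plain Python and toy 2-D vectors, assuming Euclidean (L2) distance as the metric. The names `top_k`, `toy_chunks`, and `toy_query` are hypothetical, and a real index works on high-dimensional embeddings with optimized search.

```python
import math


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors closest to query_vec (L2 distance)."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]


toy_chunks = [(0.0, 1.0), (1.0, 0.0), (0.9, 0.1), (-1.0, 0.0)]
toy_query = (1.0, 0.05)
print(top_k(toy_query, toy_chunks, k=2))  # [1, 2]
```

Raising `k` returns more context per question at the cost of longer prompts and a higher chance of including weakly related chunks.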

### Test retriever

In [8]:
test_query = "What is the main cause of climate change?"
context = retrieve_context_per_question(test_query, chunks_query_retriever)
show_context(context)

Context 1:
Chapter 2: Causes of Climate Change 
Greenhouse Gases 
The primary cause of recent climate change is the increase in greenhouse gases in the 
atmosphere. Greenhouse gases, such as carbon dioxide (CO2), methane (CH4), and nitrous 
oxide (N2O), trap heat from the sun, creating a "greenhouse effect." This effect is essential 
for life on Earth, as it keeps the planet warm enough to support life. However, human 
activities have intensified this natural process, leading to a warmer climate. 
Fossil Fuels 
Burning fossil fuels for energy releases large amounts of CO2. This includes coal, oil, and 
natural gas used for electricity, heating, and transportation. The industrial revolution marked 
the beginning of a significant increase in fossil fuel consumption, which continues to rise 
today. 
Coal


Context 2:
Most of these climate changes are attributed to very small variations in Earth's orbit that 
change the amount of solar energy our planet receives. During the Holocene epoch,

In [11]:
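# Local definition; it shadows the helper of the same name imported from helper_functions
def retrieve_context_per_question(question, chunks_query_retriever):
    """Return the text of each chunk the retriever finds for the question."""
    docs = chunks_query_retriever.invoke(question)
    context = [doc.page_content for doc in docs]
    return context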
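
`show_context` is imported from `helper_functions` and its source is not shown here. Judging only by the printed output above, it appears to number each retrieved chunk. A hypothetical stand-in, `format_context`, that would produce similar output could look like this:

```python
def format_context(context):
    """Render each chunk under a 'Context N:' header (hypothetical stand-in for show_context)."""
    blocks = [f"Context {i}:\n{chunk}" for i, chunk in enumerate(context, start=1)]
    return "\n\n\n".join(blocks)


print(format_context(["first chunk text", "second chunk text"]))
```

Returning a string instead of printing directly makes the formatting easy to check in a test or to write to a log.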

In [12]:
test_query = "Why fossil fuels can cause air pollution?"
context = retrieve_context_per_question(test_query, chunks_query_retriever)
show_context(context)

Context 1:
Coal is the most carbon-intensive fossil fuel, and its use for electricity generation is a major 
source of CO2 emissions. Despite a decline in some regions, coal remains a significant 
energy source globally. It is mined extensively in countries like China, India, and the United 
States, contributing significantly to their energy supplies and CO2 footprints. 
Oil 
Oil is used primarily for transportation fuels, such as gasoline and diesel. The combustion of 
oil products releases significant amounts of CO2 and other pollutants, contributing to climate 
change and air quality issues. The global oil industry is vast, involving extraction, refining, 
and distribution, with significant geopolitical and economic implications. 
Natural Gas 
Natural gas is the least carbon-intensive fossil fuel and is often seen as a "bridge fuel" to a 
lower-carbon future. However, its extraction and use still contribute to greenhouse gas


Context 2:
Natural Gas 
Natural gas is the least carbon-