In [10]:
!pip install langchain openai faiss-cpu python-dotenv langchain-community tiktoken


In [18]:
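import os

from dotenv import load_dotenv

# Read variables from a local .env file into the process environment
load_dotenv()
# OpenAIEmbeddings and ChatOpenAI read OPENAI_API_KEY from the environment by default
openai_api_key = os.getenv("OPENAI_API_KEY")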
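
`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later as an authentication error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper written for this notebook, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would stop the notebook at this cell instead of at the first API call.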

## 1. Document Preparation

In [19]:
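# The loaders live in langchain-community (installed above);
# the older langchain.document_loaders import path is deprecated
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every .txt file under ./data (searched recursively)
loader = DirectoryLoader('./data', glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split the documents into chunks of up to 1000 characters, with 200 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)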
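
To see how `chunk_size` and `chunk_overlap` interact, here is a simplified sketch using a fixed-size sliding window. This is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on paragraph, line, and word boundaries); `sliding_window_chunks` is a hypothetical function that only illustrates the two parameters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.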

## 2. Indexing Documents

In [20]:
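# Import from langchain-community; the langchain.embeddings and
# langchain.vectorstores paths are deprecated
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS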

# Initialize the embeddings
embeddings = OpenAIEmbeddings()

# Create the vector store
vectorstore = FAISS.from_documents(texts, embeddings)

# Save the vector store
vectorstore.save_local("faiss_index")

## 3. Setting Up the Retrieval Mechanism

In [21]:
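from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OpenAIEmbeddings

# Load the saved index with the same embedding model that built it.
# load_local unpickles the stored document store, so only enable
# allow_dangerous_deserialization for an index you created yourself.
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)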

# Set up the retriever
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
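
Conceptually, a similarity retriever with `k=3` embeds the query and returns the three stored chunks whose vectors lie closest to it. The sketch below does this by brute force with Euclidean distance on toy 2-D vectors. It assumes an L2 metric, which I believe is the default for the LangChain FAISS wrapper; `top_k` and the vectors are made up for illustration.

```python
import math


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]


# Toy "embeddings" for four chunks
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.0], doc_vecs, k=3))  # [0, 1, 3]
```

FAISS does the same job with optimized index structures, so the search stays fast when there are many chunks.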

## 4. Creating the Generative Model

In [22]:
# Import from langchain-community; the langchain.chat_models path is deprecated
from langchain_community.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Initialize the language model
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Create the RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

## 5. Using the RAG System

In [23]:
# Example query
query = "How do we know the climate is changing?"
# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})

print(result['result'])
print("\nSources:")
for doc in result['source_documents']:
    print(doc.metadata['source'])

We know the climate is changing because we observe various effects that indicate significant changes in the environment. These include:

1. **Temperature Records**: Historical temperature data show a clear warming trend during the 20th century.
2. **Melting Ice**: Ice sheets and glaciers are shrinking, and Arctic sea ice is disappearing.
3. **Rising Sea Levels**: Sea levels are rising due to the melting ice and thermal expansion of seawater.
4. **Changes in Seasonal Patterns**: Snow melts sooner in the spring, and plants are flowering earlier.
5. **Animal Behavior**: Animals are moving to higher elevations and latitudes to find cooler conditions.
6. **Extreme Weather Events**: There has been an increase in the frequency and intensity of droughts, floods, and wildfires.

These observations align with predictions made by climate models, confirming that climate change is occurring.

Sources:
data\1 How do we know climate change is happening.txt
data\3 Do we really only have 150 years of c

## 6. Fine-Tuning and Optimization

### Change to GPT-4, increase temperature and change chain type

In [36]:
# Switch to GPT-4 with a slightly higher temperature and the map_reduce chain type
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-4", temperature=0.2),
    chain_type="map_reduce",
    retriever=retriever,
    return_source_documents=True
)
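
The two chain types differ in how many model calls they make: `stuff` places all retrieved chunks in a single prompt, while `map_reduce` queries the model once per chunk and then once more to combine the partial answers. The sketch below mimics that flow with a stand-in model. The prompt wording, `fake_llm`, and both helper functions are hypothetical and much simpler than LangChain's internals.

```python
def stuff_answer(llm, question, docs):
    """One call: every chunk goes into the same prompt."""
    context = "\n\n".join(docs)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def map_reduce_answer(llm, question, docs):
    """One call per chunk (map), then one call to merge the partial answers (reduce)."""
    partials = [llm(f"Context:\n{doc}\n\nQuestion: {question}") for doc in docs]
    summary = "\n\n".join(partials)
    return llm(f"Partial answers:\n{summary}\n\nQuestion: {question}")


calls = []


def fake_llm(prompt):
    """Stand-in for a chat model: records the prompt and returns a dummy answer."""
    calls.append(prompt)
    return f"answer#{len(calls)}"


docs = ["chunk A", "chunk B", "chunk C"]
stuff_answer(fake_llm, "Q?", docs)
stuff_calls = len(calls)
map_reduce_answer(fake_llm, "Q?", docs)
map_reduce_calls = len(calls) - stuff_calls
print(stuff_calls, map_reduce_calls)  # 1 4
```

With `k=3`, `map_reduce` therefore costs roughly four model calls per query instead of one, in exchange for keeping each individual prompt short.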

In [37]:
query = "How do we know the climate is changing?"
# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})

print(result['result'])
print("\nSources:")
for doc in result['source_documents']:
    print(doc.metadata['source'])

We know that climate change is happening because we see the effects everywhere. Ice sheets and glaciers are shrinking while sea levels are rising. Arctic sea ice is disappearing. In the spring, snow melts sooner and plants flower earlier. Animals are moving to higher elevations and latitudes to find cooler conditions. Droughts, floods and wildfires have all gotten more extreme. The ocean, which has absorbed 90 percent of the heat trapped by greenhouse gases, is warming up. Historical records stretching back to the 1880s show a clear warming trend during the 20th century.

Sources:
data\1 How do we know climate change is happening.txt
data\3 Do we really only have 150 years of climate data How is that enough to tell us about centuries of change.txt
data\4 How do we know climate change is caused by humans.txt
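
The retriever returns chunks rather than whole files, so two of the `k` chunks can come from the same file, and the loop above would then print that path twice. A small sketch for listing each source once, in retrieval order, follows; `unique_sources` is a hypothetical helper and the file names are made up.

```python
def unique_sources(source_paths):
    """Drop repeated paths while keeping the order in which they were retrieved."""
    return list(dict.fromkeys(source_paths))


paths = ["data/a.txt", "data/b.txt", "data/a.txt"]
print(unique_sources(paths))  # ['data/a.txt', 'data/b.txt']
```

In the notebook this could be applied to `[doc.metadata['source'] for doc in result['source_documents']]`.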


### Using Custom Prompts

In [34]:
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
Answer the question clearly with 5 reasons in bullet points.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Answer:"""

PROMPT = PromptTemplate(
    template=template, input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
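
With the `stuff` chain type, the retrieved chunks are joined into one string and substituted for `{context}`, and the user's query fills `{question}`. The sketch below reproduces that substitution with plain `str.format`. The `"\n\n"` separator matches what I understand to be the chain's default, and `fill_prompt`, `demo_template`, and the sample chunks are illustrative only.

```python
demo_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


def fill_prompt(template, docs, question):
    """Join the chunk texts and substitute them, with the question, into the template."""
    context = "\n\n".join(docs)
    return template.format(context=context, question=question)


filled = fill_prompt(
    demo_template,
    ["Glaciers are shrinking.", "Sea levels are rising."],
    "How do we know the climate is changing?",
)
print(filled)
```

Printing the filled prompt like this is a quick way to check what the model actually sees before changing the template wording.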

In [35]:
query = "How do we know the climate is changing?"
# invoke() replaces the deprecated direct call qa_chain({...})
result = qa_chain.invoke({"query": query})

print(result['result'])
print("\nSources:")
for doc in result['source_documents']:
    print(doc.metadata['source'])

- **Ocean Heat Absorption**: The ocean has absorbed 90% of the heat trapped by greenhouse gases, indicating significant warming that is not fully reflected in surface temperatures. Measurements show that every layer of the ocean is warming.

- **Observable Effects**: There are clear, observable effects of climate change, such as shrinking ice sheets and glaciers, rising sea levels, and the disappearance of Arctic sea ice. These changes are consistent with predictions made by climate models.

- **Shifts in Natural Patterns**: Changes in seasonal patterns, such as earlier snowmelt and earlier flowering of plants, as well as shifts in animal migration to cooler areas, demonstrate the impact of climate change on ecosystems.

- **Increased Weather Extremes**: The frequency and intensity of extreme weather events, including droughts, floods, and wildfires, have increased, providing further evidence of a changing climate.

- **Historical Temperature Records**: Instrumental temperature data co