# Basic RAG (Retrieval-Augmented Generation)

In [2]:
# The `Mistral` client class imported below ships with the 1.x SDK.
# ! pip install faiss-cpu "mistralai>=1.0.0"

### Load API key

In [1]:
from helper import load_mistral_api_key
api_key, dlai_endpoint = load_mistral_api_key(ret_key=True)

### Get data

- You can go to https://www.deeplearning.ai/the-batch/
- Search for any article and copy its URL.

### Parse the article with BeautifulSoup 

In [2]:
import requests
from bs4 import BeautifulSoup
import re

response = requests.get(
    "https://www.deeplearning.ai/the-batch/a-roadmap-explores-how-ai-can-detect-and-mitigate-greenhouse-gases/"
)
html_doc = response.text
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("div", re.compile("^prose--styled"))
text = tag.text
print(text)

How can AI help to fight climate change? A new report evaluates progress so far and explores options for the future.What’s new: The Innovation for Cool Earth Forum, a conference of climate researchers hosted by Japan, published a roadmap for the use of data science, computer vision, and AI-driven simulation to reduce greenhouse gas emissions. The roadmap evaluates existing approaches and suggests ways to scale them up.How it works: The roadmap identifies 6 “high-potential opportunities”: activities in which AI systems can make a significant difference based on the size of the opportunity, real-world results, and validated research. The authors emphasize the need for data, technical and scientific talent, computing power, funding, and leadership to take advantage of these opportunities.Monitoring emissions. AI systems analyze data from satellites, drones, and ground sensors to measure greenhouse gas emissions. The European Union uses them to measure methane emissions, environmental orga

In [11]:
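from pathlib import Path

# `data` folder that sits beside the current working directory; create it if missing.
data_dir = Path.cwd().parent / "data"
data_dir.mkdir(parents=True, exist_ok=True)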

### Optionally, save the text into a text file
- You can upload the text file into a chat interface in the next lesson.
- To download this file to your own machine, click on the "Jupyter" logo to view the file directory.  

In [12]:
file_name = data_dir/"AI_greenhouse_gas.txt"
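# The article contains non-ASCII characters (curly quotes, non-breaking spaces), so set the encoding explicitly.
with open(file_name, "w", encoding="utf-8") as file: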
    file.write(text)

### Chunking

In [4]:
chunk_size = 512
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

In [5]:
len(chunks)

8
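A fixed-size split can cut a word or sentence in two at a chunk boundary (in this article, one chunk ends with "primary b" and the next begins with "arriers"). One common mitigation is to let consecutive chunks overlap. The sketch below is an illustration only: `chunk_with_overlap` and its `overlap` parameter are hypothetical names, not part of the notebook or of any library.

```python
def chunk_with_overlap(text, chunk_size=512, overlap=64):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


# Tiny demonstration on a short string so the result is easy to inspect.
sample = "abcdefghij"
demo_chunks = chunk_with_overlap(sample, chunk_size=4, overlap=2)
print(demo_chunks)
```

With `overlap=0` the helper reproduces the plain split used in the cell above; a larger overlap trades extra embedding calls for a better chance that a sentence survives intact in at least one chunk.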

### Get embeddings of the chunks

In [15]:
from mistralai import Mistral


def get_text_embedding(txt):
    client = Mistral(api_key=api_key)
    embeddings_batch_response = client.embeddings.create(model="mistral-embed", inputs=txt)
    return embeddings_batch_response.data[0].embedding

In [16]:
import numpy as np
text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

In [17]:
text_embeddings

array([[-0.03302002,  0.04760742,  0.04486084, ..., -0.03302002,
         0.02255249, -0.01442719],
       [-0.03573608,  0.05541992,  0.03289795, ..., -0.03170776,
         0.01611328, -0.01722717],
       [-0.04858398,  0.04800415,  0.05718994, ...,  0.0045166 ,
         0.01806641, -0.01244354],
       ...,
       [-0.02627563,  0.04052734,  0.03582764, ..., -0.00997925,
        -0.00976562, -0.00928497],
       [-0.02973938,  0.05450439,  0.06280518, ..., -0.0088501 ,
        -0.00785065, -0.00411606],
       [-0.02468872,  0.05126953,  0.04885864, ..., -0.00657654,
         0.02590942, -0.01369476]], shape=(8, 1024))

In [18]:
len(text_embeddings[0])

1024

### Store in a vector database
- In this classroom, you'll use [Faiss](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/), a library for efficient similarity search.

In [19]:
import faiss

d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

### Embed the user query

In [20]:
question = "What are the ways that AI can reduce emissions in Agriculture?"
question_embeddings = np.array([get_text_embedding(question)])

In [21]:
question_embeddings

array([[-0.00064421,  0.04141235,  0.04302979, ..., -0.02481079,
         0.01052856,  0.00907135]], shape=(1, 1024))

### Search for chunks that are similar to the query

In [22]:
D, I = index.search(question_embeddings, k=2)
print(I)

[[4 5]]
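`index.search` returns two arrays: `D` holds the distances and `I` holds the row indices of the nearest stored vectors, so `[[4 5]]` means chunks 4 and 5 are closest to the query. For intuition, a flat L2 index compares the query against every stored vector and ranks by squared Euclidean distance. The NumPy sketch below mimics that behavior on toy data; `brute_force_l2_search` is a hypothetical helper written for illustration, not a Faiss function.

```python
import numpy as np


def brute_force_l2_search(vectors, queries, k):
    """Return (squared distances, indices) of the k nearest vectors per query."""
    vectors = np.asarray(vectors, dtype="float32")
    queries = np.asarray(queries, dtype="float32")
    # Pairwise differences with shape (n_queries, n_vectors, dim).
    diffs = queries[:, None, :] - vectors[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    # Sort each row by distance and keep the k smallest.
    ids = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    dists = np.take_along_axis(sq_dists, ids, axis=1)
    return dists, ids


# Toy data: three 2-D "embeddings" and one query.
toy_vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
toy_query = np.array([[0.9, 0.1]])
toy_D, toy_I = brute_force_l2_search(toy_vectors, toy_query, k=2)
print(toy_I)
```

Here the query lies closest to the second vector, then the first, so the returned indices list row 1 before row 0. This exhaustive comparison is fine for a handful of chunks; approximate indexes only start to matter for much larger collections.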


In [23]:
retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
print(retrieved_chunk)

['data to help factories use more recycled materials, cut waste, minimize energy use, and reduce downtime. Similarly, they can optimize supply chains to reduce emissions contributed by logistics.\xa0Agriculture.\xa0Farmers use AI-equipped sensors to simulate different crop rotations and weather events to forecast crop yield or loss. Armed with this data, food producers can cut waste and reduce carbon footprints. The authors cite lack of food-related datasets and investment in adapting farming practices as primary b', 'arriers to taking full advantage of AI in the food industry.Transportation.\xa0AI systems can reduce greenhouse-gas emissions by improving traffic flow, ameliorating congestion, and optimizing public transportation. Moreover, reinforcement learning can reduce the impact of electric vehicles on the power grid by optimizing their charging. More data, uniform standards, and AI talent are needed to realize this potential.Materials.\xa0Materials scientists use AI models to stu

In [24]:
prompt = f"""
Context information is below.
---------------------
{retrieved_chunk}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

In [25]:
from mistralai import Mistral


def mistral(user_message, model="mistral-small-latest", is_json=False):
    client = Mistral(api_key=api_key)
    messages = [{"role": "user", "content": user_message}]

    if is_json:
        chat_response = client.chat.complete(
            model=model, messages=messages, response_format={"type": "json_object"}
        )
    else:
        chat_response = client.chat.complete(model=model, messages=messages)

    return chat_response.choices[0].message.content

In [26]:
response = mistral(prompt)
print(response)

Based on the provided context, AI can reduce emissions in agriculture through the following ways:

1. **AI-Equipped Sensors**: Farmers use AI-equipped sensors to simulate different crop rotations and weather events, which helps forecast crop yield or loss. This data allows food producers to cut waste and reduce carbon footprints.

2. **Optimizing Farming Practices**: AI helps farmers adapt their practices by providing insights from simulations and forecasts, leading to more efficient resource use and reduced emissions.

The context also highlights barriers to fully leveraging AI in agriculture, such as the lack of food-related datasets and investment in adapting farming practices.
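The prompt used here interpolates the Python list `retrieved_chunk` directly into the f-string, so the model sees the list's `repr`: brackets, quotes, and escape sequences such as `\xa0`. A possible alternative, sketched under the assumption that plain text reads more cleanly for the model, is to join the chunks first. `build_prompt` is a hypothetical helper, not part of the notebook.

```python
def build_prompt(retrieved_chunks, question):
    """Assemble the RAG prompt from plain-text chunks instead of a list repr."""
    # Join chunks with blank lines so each one reads as its own passage.
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question.strip()}\n"
        "Answer:"
    )


# Placeholder chunks for demonstration; in the notebook you would pass retrieved_chunk.
demo_prompt = build_prompt(
    ["Agriculture.\xa0Farmers use AI.", "Transportation."],
    "How can AI help?",
)
print(demo_prompt)
```

In the notebook this could be called as `build_prompt(retrieved_chunk, question)` and the result passed to `mistral(...)`.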


## RAG + Function calling

In [27]:
def qa_with_context(text, question, chunk_size=512):
    # split document into chunks
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    # load into a vector database
    text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])
    d = text_embeddings.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embeddings)
    # create embeddings for a question
    question_embeddings = np.array([get_text_embedding(question)])
    # retrieve similar chunks from the vector database
    D, I = index.search(question_embeddings, k=2)
    retrieved_chunk = [chunks[i] for i in I.tolist()[0]]
    # generate a response grounded in the retrieved text chunks

    prompt = f"""
    Context information is below.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {question}
    Answer:
    """
    response = mistral(prompt)
    return response

In [28]:
I.tolist()

[[4, 5]]

In [29]:
I.tolist()[0]

[4, 5]

In [30]:
import functools
names_to_functions = {"qa_with_context": functools.partial(qa_with_context, text=text)}

In [31]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": "Answer user question by retrieving relevant context",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "user question",
                    }
                },
                "required": ["question"],
            },
        },
    },
]

In [33]:
question = """
What are the ways AI can mitigate climate change in transportation?
"""

client = Mistral(api_key=api_key)

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": question}],
    tools=tools,
    tool_choice="any",
)

response

ChatCompletionResponse(id='98e6546c4865425ba28ce5725b787cc5', object='chat.completion', model='mistral-large-latest', usage=UsageInfo(prompt_tokens=85, completion_tokens=23, total_tokens=108, prompt_audio_seconds=Unset(), prompt_tokens_details={'cached_tokens': 0}), created=1772372938, choices=[ChatCompletionChoice(index=0, message=AssistantMessage(role='assistant', content='', tool_calls=[ToolCall(function=FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}'), id='7eUJpto3l', type=None, index=0)], prefix=False), finish_reason='tool_calls')])

In [34]:
tool_function = response.choices[0].message.tool_calls[0].function
tool_function

FunctionCall(name='qa_with_context', arguments='{"question": "What are the ways AI can mitigate climate change in transportation?"}')

In [35]:
tool_function.name

'qa_with_context'

In [36]:
import json

args = json.loads(tool_function.arguments)
args

{'question': 'What are the ways AI can mitigate climate change in transportation?'}
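The tool name and the parsed arguments are all that is needed to run the tool: look up the callable registered under that name and pass the arguments as keywords. The sketch below bundles the JSON parsing and the lookup into one step. `dispatch_tool_call` and `fake_qa` are hypothetical names; the stub stands in for `qa_with_context` so the example runs without any API calls.

```python
import functools
import json


def dispatch_tool_call(name, arguments_json, registry):
    """Look up a tool by name and call it with JSON-encoded keyword arguments."""
    if name not in registry:
        raise KeyError(f"Model requested an unknown tool: {name!r}")
    kwargs = json.loads(arguments_json)  # e.g. {"question": "..."}
    return registry[name](**kwargs)


def fake_qa(question, text):
    """Stub replacement for qa_with_context (no embedding or chat calls)."""
    return f"answered {question!r} using {len(text)} characters of context"


# Mirror the notebook's registry: bind the document once, leave `question` open.
demo_registry = {"qa_with_context": functools.partial(fake_qa, text="some article")}
demo_result = dispatch_tool_call(
    "qa_with_context", '{"question": "Why?"}', demo_registry
)
print(demo_result)
```

With the notebook's objects, the equivalent call would be `dispatch_tool_call(tool_function.name, tool_function.arguments, names_to_functions)`. Checking the name before calling guards against a model that requests a tool you never registered.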

In [37]:
function_result = names_to_functions[tool_function.name](**args)
function_result

"Based on the provided context, AI can help mitigate climate change in transportation by:\n\n1. **Optimizing Logistics and Operations**: AI can improve efficiency in transportation by optimizing routes, reducing fuel consumption, and minimizing emissions through better logistics management.\n2. **Enhancing Climate Modeling**: AI-driven simulations can help model and predict the impact of transportation emissions on climate change, aiding in the development of mitigation strategies.\n3. **Supporting Geoengineering Research**: While not directly related to transportation, AI can assist in climate geoengineering efforts (like stratospheric aerosol injection) that may indirectly benefit climate mitigation by cooling the planet.\n\nThe context also mentions broader applications of AI in reducing greenhouse gas emissions, which could include transportation as a key sector. However, specific details on AI's role in transportation are not fully elaborated in the given text."

## More about RAG
To learn about more advanced chunking and retrieval methods, you can check out:
- [Advanced Retrieval for AI with Chroma](https://learn.deeplearning.ai/courses/advanced-retrieval-for-ai/lesson/1/introduction)
  - Sentence window retrieval
  - Auto-merge retrieval
- [Building and Evaluating Advanced RAG Applications](https://learn.deeplearning.ai/courses/building-evaluating-advanced-rag)
  - Query Expansion
  - Cross-encoder reranking
  - Training and utilizing Embedding Adapters
