# *MIT* RAG | Oscar Danilo Guzmán Villanueva
**'MIT'** is a **RAG (Retrieval-Augmented Generation)** system designed to enhance the capabilities of **LLMs (Large Language Models)** such as **'Llama 3.1'** or **'Phi 3.5'**. This project honors **MIT**, especially its journal *MIT Technology Review*, by creating a specialized assistant that answers questions about the articles published on its website and uses new technologies to make that news easier to explore.

[**Python Notebook**](https://github.com/odguzmanv/gabo-ragIAyMinirobots/master/MIT_rag.ipynb) | [**Repository**](https://github.com/odguzmanv/IAyMinirobots)

- [1. Tools and Technologies](#1-tools-and-technologies)
- [2. How to run Ollama in Google Colab?](#2-how-to-run-ollama-in-google-colab)
    - [2.1 Ollama Installation](#21-ollama-installation)
    - [2.2 Run 'ollama serve'](#22-run-ollama-serve)
    - [2.3 Run 'ollama pull \<model\_name\>'](#23-run-ollama-pull-model_name)
- [3. Exploring LLMs](#3-exploring-llms)
- [4. Data Extraction and Preparation](#4-data-extraction-and-preparation)
    - [4.1 Web Scraping and Chunking](#41-web-scraping-and-chunking)
    - [4.2 Embedding Model: Nomic](#42-embedding-model-nomic)
- [5. Storing in the Vector Database](#5-storing-in-the-vector-database)
    - [5.1 Making Chroma Persistent](#51-making-chroma-persistent)
    - [5.2 Adding Documents to Chroma](#52-adding-documents-to-chroma)
- [6. Use a Vectorstore as a Retriever](#6-use-a-vectorstore-as-a-retriever)
- [7. RAG (Retrieval-Augmented Generation)](#7-rag-retrieval-augmented-generation)
- [8. References](#8-references)

## Author

- **Oscar Danilo Guzmán Villanueva** [GitHub](https://github.com/odguzmanv) | [X](https://x.com/odguzmanv)


## 1. Tools and Technologies

- [**Ollama**](https://ollama.com/): Running models ([Llama 3.1](https://ollama.com/library/llama3.1) or [Phi 3.5](https://ollama.com/library/phi3.5)) and embeddings ([Nomic](https://ollama.com/library/nomic-embed-text))
- [**LangChain**](https://python.langchain.com/docs/introduction/): Framework and web scraping tool
- [**Chroma**](https://docs.trychroma.com/): Vector database

> A special thanks to ['Paulina Cocina'](https://www.paulinacocina.net/sobre-paulina), from which the texts used in this project were extracted and where a comprehensive [Recipe Digital Library](https://www.paulinacocina.net/) is available.

## 2. How to run Ollama in Google Colab?

### 2.1 Ollama Installation
For this, we simply go to the [Ollama downloads page](https://ollama.com/download/linux) and select **Linux**. The command is as follows:

In [1]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


### 2.2 Run 'ollama serve'
If you run `ollama serve` directly, the cell never returns, so no subsequent cell can execute. To avoid this, run the server in the background and redirect its output to a log file:

In [2]:
!nohup ollama serve > ollama_serve.log 2>&1 &

After running this command, give the server a moment to start before running the next command. For example:

In [3]:
import time
time.sleep(3)
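
A fixed `time.sleep(3)` can be too short on a slow machine or longer than needed on a fast one. The following is a minimal sketch of a polling alternative, assuming the server answers plain HTTP requests at the address printed by the installer (`127.0.0.1:11434`). The helper name `wait_for_server` and its parameters are hypothetical and not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # any successful HTTP response means the server is up
        except urllib.error.HTTPError:
            return True  # the server responded, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # nothing is listening yet; retry shortly
    return False
```

In the notebook, calling `wait_for_server()` right after the `nohup` cell would replace the fixed sleep and return `False` if the server never comes up.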

### 2.3 Run 'ollama pull <model_name>'
For this project we will use [Phi-3.5-mini](https://ollama.com/library/phi3.5), the lightweight but highly capable **Microsoft** model. The project also works with [Llama 3.1](https://ollama.com/library/llama3.1); you would only have to pull that model instead.

In [4]:
!ollama pull phi3.5

pulling manifest
pulling b5374915da53...   1% ▕▏  16 MB/2.2 GB

## 3. Exploring LLMs
Now that we have our LLM, it's time to test it with what will be our control question.

In [33]:
test_message = "What is Shawn Shan building according to mit technology review?"

> 'MIT' is designed to function in Spanish, as that is the principal language of the course "IA y minirobots".

The answer can be found in the [artificial intelligence section of MIT Technology Review](https://www.technologyreview.com/topic/artificial-intelligence/), so we expect the question to be answerable once the assistant has the necessary information.

Before we can invoke the LLM, we need to install LangChain. [1]

In [6]:
!pip install -qU langchain_community

Now we create the model.

In [7]:
from langchain_community.llms import Ollama

llm_phi = Ollama(model="phi3.5")

Now we invoke Phi 3.5.

In [37]:
llm_phi.invoke(test_message)

'I\'m sorry, but I don\'t have access to external databases or specific news articles like the "Mit Technology Review." If you provide more context or details about what Shane might be referencing with regard to construction projects related to Mit (which could refer to a city named Mettet in Belgium), I may still help answer your question.\n\nWithout additional information, one way of finding this out would involve:\n1. Searching for recent articles on the topic by using search engines or directly navigating through websites like "MIT Technology Review."\n2. If you are referring to a specific project mentioned in an article that I can reference from my training data until September 2021, please share more details here and I would happily provide information based on what\'s within those parameters!'

> At this stage, the model is not expected to answer the question correctly, and it might even hallucinate when trying to give an answer. To solve this problem, we will start building our **RAG** in the next section.

## 4. Data Extraction and Preparation
To collect the information that our **RAG** will use, we will perform **Web Scraping** of the [artificial intelligence section](https://www.technologyreview.com/topic/artificial-intelligence/) of the **MIT Technology Review** website.

### 4.1 Web Scraping and Chunking
The first step is to install **Beautiful Soup** so that LangChain's **WebBaseLoader** works correctly.

In [9]:
!pip install -qU beautifulsoup4

The next step will be to save the list of sources we will extract from the website into a variable.

In [10]:
base_urls = ["https://www.technologyreview.com/topic/artificial-intelligence/",]

Now we will create a function to collect all the links that lead to the texts. If we look at the HTML structure, we will notice that each article link is an `<a>` element carrying the attributes `data-event-category="topic-feed"` and `data-event-label="topic-story"`. We simply extract the `href` attribute from each of those elements.

Because the page loads its content dynamically with JavaScript instead of serving it as static HTML, we must use Selenium to execute that code in a browser before we can process the result with Python.

In [11]:
!apt-get update
!apt install -y chromium-chromedriver
!pip install selenium

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Ign:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy Release [5,713 B]
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy Release.gpg [793 B]
Hit:7 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:12 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:13 https://r2u.stat.illinois.edu/ubuntu jammy/main amd64 Packages [2,582 kB]
Get:14 https://r2u.stat.illinois.edu/ubunt

Next, we create a list of URLs from the main page we will scrape.

In [13]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Configure Selenium to use Chrome in headless mode (no graphical interface)
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in the background
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# Start the browser
driver = webdriver.Chrome(options=options)

# Function that extracts the links from the main page using Selenium
def extract_links_with_selenium(url):
    driver.get(url)
    time.sleep(3)  # Wait for the page to load completely

    # Extract all links that have the "data-event-category" and "data-event-label" attributes
    article_links = []
    links = driver.find_elements(By.CSS_SELECTOR, 'a[data-event-category="topic-feed"][data-event-label="topic-story"]')

    for link in links:
        href = link.get_attribute('href')
        if href:
            article_links.append(href)

    return article_links

# Main URL of the AI section of MIT Technology Review
main_url = "https://www.technologyreview.com/topic/artificial-intelligence/"

# Extract the article links
article_urls = extract_links_with_selenium(main_url)

# Close the browser
driver.quit()

if article_urls:
    print(f"Se encontraron {len(article_urls)} enlaces:")
    for url in article_urls:
        print(url)
else:
    print("No se encontraron enlaces.")

Se encontraron 20 enlaces:
https://www.technologyreview.com/2024/09/20/1104233/ai-models-let-robots-carry-out-tasks-in-unfamiliar-environments/
https://www.technologyreview.com/2024/09/20/1104233/ai-models-let-robots-carry-out-tasks-in-unfamiliar-environments/
https://www.technologyreview.com/2024/09/18/1104178/ai-generated-content-doesnt-seem-to-have-swayed-recent-european-elections/
https://www.technologyreview.com/2024/09/18/1104178/ai-generated-content-doesnt-seem-to-have-swayed-recent-european-elections/
https://www.technologyreview.com/2024/09/17/1104004/why-openais-new-model-is-such-a-big-deal/
https://www.technologyreview.com/2024/09/17/1104004/why-openais-new-model-is-such-a-big-deal/
https://www.technologyreview.com/2024/09/16/1103959/why-we-need-an-ai-safety-hotline/
https://www.technologyreview.com/2024/09/16/1103959/why-we-need-an-ai-safety-hotline/
https://www.technologyreview.com/2024/09/12/1103930/chatbots-can-persuade-people-to-stop-believing-in-conspiracy-theories/
ht
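
In the output above, each visible URL is listed twice. A minimal sketch of removing such duplicates while keeping the original order is shown here; the helper name `unique_in_order` is hypothetical, and the sample list simply reuses two URLs from the output.

```python
def unique_in_order(urls):
    """Return the URLs with duplicates removed, keeping first-seen order."""
    return list(dict.fromkeys(urls))


# Illustrative input: two URLs taken from the output above, one listed twice.
sample_urls = [
    "https://www.technologyreview.com/2024/09/17/1104004/why-openais-new-model-is-such-a-big-deal/",
    "https://www.technologyreview.com/2024/09/17/1104004/why-openais-new-model-is-such-a-big-deal/",
    "https://www.technologyreview.com/2024/09/16/1103959/why-we-need-an-ai-safety-hotline/",
]
print(len(unique_in_order(sample_urls)))
```

Applying the same helper to `article_urls` would mean each article is visited only once in the download step.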

The output above shows how many article links were found.

Now that we have the URLs of the texts to feed our **RAG**, we just need to perform web scraping directly from the content of the stories. For that, we will build a function that follows a logic very similar to the previous function, which will initially give us the **raw text**, along with the **reference information** about what we are obtaining (the information found in `<header>`).

In [14]:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Function that loads and processes an article with Selenium
def mit_technology_review_loader_with_selenium(url):
    try:
        # Load the page with Selenium
        driver.get(url)

        # Wait until the class 'gutenbergContent__content' is present, for at most 10 seconds
        content_element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "gutenbergContent__content"))
        )

        # Extract the title from the <header>
        title_element = driver.find_element(By.TAG_NAME, 'header')
        title = title_element.text

        # Remove the <header> so the information is not duplicated
        driver.execute_script("""
        var element = document.getElementsByTagName("header")[0];
        element.parentNode.removeChild(element);
        """)

        # Extract the article content
        raw_text = content_element.text

        # Split the text into fragments for easier processing (sentence-based)
        texts = raw_text.split(". ")

        # Return the fragments together with the title
        return [f"Fragmento {i+1}/{len(texts)} de '{title}': '{text}'" for i, text in enumerate(texts)]

    except Exception as e:
        print(f"Ocurrió un error al procesar la URL {url}: {e}")
        return None

Now we download all the articles from the list of URLs.

In [15]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Configure Selenium for use in Colab
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in headless mode (without opening the browser)
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=options)

# Loop over the URLs
for url in article_urls:
    driver.get(url)  # Navigate to the URL
    time.sleep(3)  # Wait for the page to load (adjust to the actual load time)

    # Find all paragraphs (<p>) on the page
    try:
        paragraphs = driver.find_elements(By.TAG_NAME, 'p')  # Find every <p>
        article_text = "\n".join([p.text for p in paragraphs])  # Join the text of all paragraphs

        if article_text.strip():  # Check whether the article has any content
            print(f"Texto extraído del artículo en {url}:")
            print(article_text)

            # Save the article text to a file
            with open(f'article_{article_urls.index(url)}.txt', 'w', encoding='utf-8') as file:
                file.write(article_text)
        else:
            print(f"No se encontró contenido en los párrafos de {url}")

    except Exception as e:
        print(f"Error al extraer el artículo de {url}: {e}")

driver.quit()

Texto extraído del artículo en https://www.technologyreview.com/2024/09/20/1104233/ai-models-let-robots-carry-out-tasks-in-unfamiliar-environments/:
We use cookies to give you a more personalized browsing experience and analyze site traffic.See our cookie policy

Texto extraído del artículo en https://www.technologyreview.com/2024/09/20/1104233/ai-models-let-robots-carry-out-tasks-in-unfamiliar-environments/:
“Robot utility models” sidestep the need to tweak the data used to train robots every time they try to do something in unfamiliar settings.
It’s tricky to get robots to do things in environments they’ve never seen before. Typically, researchers need to train them on new data for every new place they encounter, which can become very time-consuming and expensive.
Now researchers have developed a series of AI models that teach robots to complete basic tasks in new surroundings without further training or fine-tuning. The five AI models, called robot utilit

There are indeed many ways to perform chunking, several of which are discussed in **"5 Levels of Text Splitting"** [2]. The idea I find most interesting, and the one I believe fits this project best, is **Semantic Splitting**. Following that idea, `mit_technology_review_loader_with_selenium` splits each text at its periods, producing sentence-level fragments. The `chunk_text` function used further below takes the simpler route of cutting the combined text into fixed-size pieces of 250 characters.

> Tests were performed with the **semantic similarity** splitter [3] offered by **LangChain**, but the results were worse. In this case, there is no need for something extremely sophisticated when the simplest and practically obvious solution is the best.
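
To make the sentence-level idea concrete, here is a minimal sketch of a sentence splitter built on a regular expression. It is an illustration only, not the function used in this notebook; the name `split_into_sentences` is hypothetical, and the rule (split after `.`, `!` or `?` followed by whitespace) will mis-split abbreviations such as "U.S. policy".

```python
import re


def split_into_sentences(text):
    """Split text after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


sample = "It's tricky to get robots to do things. Researchers need new data! Is that expensive?"
for sentence in split_into_sentences(sample):
    print(sentence)
```

Because the split requires whitespace after the punctuation mark, a figure such as `74.4%` stays inside its sentence.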

Let's merge all the articles into one `.txt` file.

In [16]:
import os

# Assuming the article files are saved using the format 'article_{index}.txt'
# Path where the article files are stored
path_to_articles = '.'  # Adjust if necessary

# Create a list to hold the article texts
all_articles = []

# Loop over all files that match the pattern 'article_*.txt'
for filename in os.listdir(path_to_articles):
    if filename.startswith("article_") and filename.endswith(".txt"):
        # Read the content of each file
        with open(os.path.join(path_to_articles, filename), 'r', encoding='utf-8') as file:
            article_content = file.read()
            all_articles.append(article_content)

# Join the content of all articles into a single variable
combined_articles = "\n\n".join(all_articles)

# Save all the combined articles to a new file
with open('all_articles_combined.txt', 'w', encoding='utf-8') as combined_file:
    combined_file.write(combined_articles)

# You can also print it or inspect it in the Colab file browser on the left if you prefer
print("combined articles")

combined articles


Now we are ready to load the merged txt file and chunk it.

In [22]:
# Read the file 'all_articles_combined.txt'
with open('/content/all_articles_combined.txt', 'r', encoding='utf-8') as file:
    all_text = file.read()

In [23]:
# Function that splits the text into fragments (chunking)
def chunk_text(text, chunk_size=250):
    """Split the text into smaller fragments for easier processing."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Create the text fragments
chunks = chunk_text(all_text)
print(f"Total de fragmentos: {len(chunks)}")

Total de fragmentos: 47
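
Fixed-size chunks can cut a sentence or even a word in half at the boundary. One common mitigation, sketched here as an assumption rather than as something this notebook does, is to let consecutive chunks overlap by a few characters. The function name `chunk_text_with_overlap` and the `overlap` parameter are hypothetical.

```python
def chunk_text_with_overlap(text, chunk_size=250, overlap=50):
    """Split text into windows of `chunk_size` characters that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


demo = "abcdefghij"
print(chunk_text_with_overlap(demo, chunk_size=4, overlap=2))
```

With `overlap=0` this behaves like `chunk_text`; larger overlaps produce more fragments, so the total stored in the vector database grows accordingly.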


### 4.2 Embedding Model: Nomic
I ran several tests with different **embedding models**, including **Llama 3.1** and **Phi 3.5**, but it wasn't until I used `nomic-embed-text` that I saw significantly better results. So, this is the embedding model we'll use.

In [17]:
!pip install -qU langchain-ollama

Now let's use Ollama to pull [Nomic's embedding model](https://ollama.com/library/nomic-embed-text).

In [18]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90...  10% ▕▏  27 MB/274 MB

We're going to create our model so we can later use it in **Chroma**, our vector database.

In [19]:
from langchain_ollama import OllamaEmbeddings

nomic_ollama_embeddings = OllamaEmbeddings(model="nomic-embed-text")

## 5. Storing in the Vector Database
**Chroma** is our chosen vector database. With the help of our embedding model provided by **Nomic**, we will store all the fragments generated from the texts, so that later we can query them and make them part of our context for each query to the **LLMs**.

### 5.1 Making Chroma Persistent
Here we have to think **one step ahead in time**: Chroma is persistent, which means the collection **exists in a directory** on disk. If we did nothing about it, every run of this **Python Notebook** would add the same strings to the vector database again. So it is good practice to **reset Chroma** first; if the collection does not exist yet, it is created and **simply remains empty**. [4]

In [20]:
!pip install -qU chromadb langchain-chroma

We will create a function that will be specifically in charge of resetting the collection.

In [21]:
from langchain_chroma import Chroma

# Function that resets the Chroma collection using the Ollama embeddings
def reset_collection(collection_name, persist_directory):
    # Delete the collection if it exists
    db = Chroma(
        collection_name=collection_name,
        embedding_function=nomic_ollama_embeddings,  # Use the Ollama embeddings
        persist_directory=persist_directory
    )
    db.delete_collection()

    # Re-create the collection to make sure it is initialized
    db = Chroma(
        collection_name=collection_name,
        embedding_function=nomic_ollama_embeddings,  # Use the Ollama embeddings
        persist_directory=persist_directory
    )
    return db

# Reset the collection and create a new one
chroma_db = reset_collection("mit_rag", "chroma")

### 5.2 Adding Documents to Chroma
We may think that it is enough to just pass it all the text and it will store it completely, but that approach is inefficient and contradictory to the idea of RAG; that is why a whole section was dedicated to Chunking before.

Let's add the fragments and then verify that all of them were saved correctly in Chroma.

In [24]:
# Add the text fragments to Chroma
try:
    chroma_db.add_texts(
        texts=chunks
    )
    print(f"Total de fragmentos añadidos: {len(chunks)}")
except Exception as e:
    print(f"Se produjo un error al añadir a Chroma: {e}")

Total de fragmentos añadidos: 47
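
Resetting the collection is one way to keep repeated strings out of the database. Another option, sketched here under assumptions, is to give every fragment a deterministic ID derived from its content. The helper `chunk_id` and the sample fragments are hypothetical, and the commented call assumes that `add_texts` accepts an `ids` argument and overwrites entries that share an ID; check the Chroma reference [4] before relying on that.

```python
import hashlib


def chunk_id(text):
    """Derive a stable ID from the chunk's content (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


example_chunks = ["first fragment", "second fragment", "first fragment"]

# Map ID -> text so that identical fragments collapse into a single entry.
unique_chunks = {chunk_id(text): text for text in example_chunks}
ids = list(unique_chunks.keys())
texts = list(unique_chunks.values())
print(len(texts))

# Assumption: `add_texts` accepts an `ids` argument, so re-running with the
# same IDs would overwrite entries instead of appending new copies.
# chroma_db.add_texts(texts=texts, ids=ids)
```

The design choice is that the ID depends only on the text, so the same fragment always maps to the same entry no matter how many times the notebook runs.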


In [25]:
vector_store = Chroma(collection_name="mit_rag", embedding_function=nomic_ollama_embeddings, persist_directory="chroma")

len(vector_store.get()["ids"])

47

> Here we are accessing the persistent data, not the in-memory data.

## 6. Use a Vectorstore as a Retriever
A retriever is an **interface** that specializes in retrieving information from an **unstructured query**. Let's test the work we did: we will use the same `test_message` as before (the control question from section [3. Exploring LLMs](#3-exploring-llms)) and see whether the retriever can return the **specific fragment** of the text that contains the answer.

In [26]:
# Create the retriever from the Chroma vector store
retriever = chroma_db.as_retriever(search_kwargs={"k": 1})

In [27]:
# Run the search (`invoke` replaces the deprecated `get_relevant_documents`)
docs = retriever.invoke(test_message)

# Process and display the results
for doc in docs:
    try:
        title, article = doc.page_content.split("': '")
        print(f"\n{title}:\n{article}")
    except ValueError:
        print(f"Contenido del documento: {doc.page_content}")


Contenido del documento: al tweaking. Although they achieved a completion rate of 74.4%, the researchers were able to increase this to a 90% success rate when they took images from the iPhone and the robot’s head-mounted camera,  gave them to OpenAI’s recent GPT-4o LLM model


By default, `Chroma.as_retriever()` searches for the most similar documents, and `search_kwargs={"k": 1}` limits the output to **1** document. [4]

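> The retriever returned a fragment from one of the scraped articles, so the pipeline is **working end to end**. Note, however, that this excerpt discusses robot task completion rates and does not mention Shawn Shan, so it does not yet provide the **appropriate context** for our query.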
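
To illustrate what "most similar" and `k` mean, here is a conceptual sketch of top-k retrieval using cosine similarity on toy vectors. It is not Chroma's implementation, and the metric Chroma actually applies depends on how the collection is configured. The function names and the 2-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional vectors standing in for real embeddings.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([0.9, 0.1], toy_docs, k=1))
```

Raising `k` returns more candidates, which gives the LLM more context at the cost of a longer prompt.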

## 7. RAG (Retrieval-Augmented Generation)
To better integrate our context to the query, we will make use of a **template** that will help us set up the behavior of the **RAG** and give it indications on how to answer.

In [29]:
from langchain_core.prompts import PromptTemplate

template = """
Eres 'Mit', un asistente especializado en la revista del MIT llamada Technology Review. Fuiste creado en conmemoración de la tecnología y el avance de la IA.
Responde de manera concisa, precisa y relevante a la pregunta que se te ha hecho, sin desviarte del tema y limitando tu respuesta a 1 o máximo 2 párrafos dependiendo del contexto.
Cada consulta que recibas puede estar acompañada de un contexto que corresponde a noticias de tecnología y sobre todo inteligencia artificial, desarrolladores de IA, empresas y otros textos de tecnología.

Contexto: {context}

Pregunta: {input}

Respuesta:
"""

custom_rag_prompt = PromptTemplate.from_template(template)

**LangChain** tells us how to use `create_stuff_documents_chain()` to integrate **Phi 3.5** and our **custom prompt**. Then we just need to use `create_retrieval_chain()` to automatically pass to the **LLM** our input along with the context and fill it in the template. [5]

In [34]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

question_answer_chain = create_stuff_documents_chain(llm_phi, custom_rag_prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
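
Conceptually, the two chains retrieve documents, join their text into a single context string, and fill the placeholders of the template. The sketch below mimics only that last step in plain Python. It is a simplification for illustration, not LangChain's implementation; `stuff_documents`, its `separator` parameter, and the shortened template are hypothetical.

```python
def stuff_documents(template, question, documents, separator="\n\n"):
    """Join the retrieved texts into one context string and fill the template."""
    context = separator.join(documents)
    return template.format(context=context, input=question)


# Shortened stand-in for the prompt template defined above.
toy_template = "Contexto: {context}\n\nPregunta: {input}\n\nRespuesta:"
prompt = stuff_documents(
    toy_template,
    "Hablame de quien eres",
    ["fragment one", "fragment two"],
)
print(prompt)
```

The resulting string is what the LLM would receive, which is why the placeholder names in the template (`{context}` and `{input}`) have to match the keys the chain supplies.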

Now let's test with our first control question, which allows us to check whether the **LLM** is aware of its **new identity.**

In [35]:
response = rag_chain.invoke({"input": "Hablame de quien eres"})

print(f"\nANSWER: {response['answer']}\nCONTEXT: {response['context'][0].page_content}")


ANSWER: Mi nombre es Mit y soy un asistente impulsado por IA diseñado para responder preguntas relacionadas con tecnología, especialmente aquellas derivadas del MIT Technology Review. Mis respuestas están enfocadas en el avance de la inteligencia artificial (IA) e información técnica relevante extraída de noticias y artículos actuales sobre estos temas.
CONTEXT: nguage models, which are trained on information scraped from the internet.
To make it faster to gather the data essential for teaching a robot a new skill, the researchers developed a new version of a tool it had used in previous research: an iPhone 


Finally, let's conclude with the question that **started all this**...

In [36]:
response = rag_chain.invoke({"input": test_message})

print(f"\nANSWER: {response['answer']}\nCONTEXT: {response['context'][0].page_content}")


ANSWER: Shane (o más correctamente Mahi Shafiullah) está trabajando en enseñar habilidades robóticas, específic extrinsicamente para abrir puertas en cualquier lugar. Esto probablemente implica investigaciones sobre sistemas de IA orientados al aprendizaje que permitan a los robots adaptarse y ejecutar tareas novedosas sin instrucciones detalladas por parte del humano, basándose en el contexto proporcionado donde se discute la generalización del comportamiento robotizado.
CONTEXT: do know how to do—everywhere?’” says Mahi Shafiullah, a PhD student at New York University who worked on the project. “We looked at ‘How do you teach a robot to, say, open any door, anywhere?’”
Teaching robots new skills generally requires a lot of d


## 8. References
[1] **Ollama. (n.d.). ollama/docs/tutorials/langchainpy.md at main · ollama/ollama. GitHub.** https://github.com/ollama/ollama/blob/main/docs/tutorials/langchainpy.md

[2] **FullStackRetrieval-Com. (n.d.). RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb at main · FullStackRetrieval-com/RetrievalTutorials. GitHub.** https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb

[3] **How to split text based on semantic similarity | 🦜️🔗 LangChain. (n.d.).** https://python.langchain.com/docs/how_to/semantic-chunker/

[4] **Chroma — 🦜🔗 LangChain documentation. (n.d.).** https://python.langchain.com/v0.2/api_reference/chroma/vectorstores/langchain_chroma.vectorstores.Chroma.html

[5] **Build a Retrieval Augmented Generation (RAG) App | 🦜️🔗 LangChain. (n.d.).** https://python.langchain.com/docs/tutorials/rag/
