This tutorial shows how to build a basic retriever with LlamaIndex, using the Groq API for the LLM and a HuggingFace model for embeddings. By following these steps, you'll end up with a simple RAG question-answering pipeline over a couple of web pages.

### Install the necessary requirements

In [1]:
!pip install -q llama-index llama-index-readers-web
!pip install -q llama-index-embeddings-huggingface
!pip install -q llama-index-llms-groq


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip

In [2]:
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq
from llama_index.core.node_parser import SentenceSplitter
import os
from llama_index.core import Settings


### 1. Read websites

In [3]:
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.llamaindex.ai/en/stable/", "https://docs.llamaindex.ai/en/stable/getting_started/concepts/"]
)

In [4]:
llm = Groq(model="llama3-8b-8192", api_key=os.environ['GROQ_API_KEY'])
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
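
The cell above reads the key with `os.environ['GROQ_API_KEY']`, which raises a bare `KeyError` when the variable is not set. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper written for this tutorial, not part of LlamaIndex or the Groq client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before starting the notebook."
        )
    return value


# Usage (assumes the key was exported in the shell that launched Jupyter):
# llm = Groq(model="llama3-8b-8192", api_key=require_env("GROQ_API_KEY"))
```

Failing early with a named variable tends to be easier to debug than a stack trace from inside the client library.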



Test the Llama 3 response before adding retrieval. Without any context, the model reads "rag" as the everyday word:

In [5]:
print(llm.complete("What is rag?"))

A simple but interesting question!

"Rag" can have different meanings depending on the context. Here are a few possible interpretations:

1. **Clothing**: A rag is a piece of cloth, often a scrap or a remnant of fabric, that is used for cleaning or wiping surfaces. It can also refer to a piece of clothing, such as a rag doll or a rag rug.
2. **Music**: In jazz and blues music, a "rag" refers to a type of musical composition characterized by complex rhythms and syncopated melodies. Ragtime music originated in the early 20th century and is often associated with pianists like Scott Joplin and Eubie Blake.
3. **Slang**: In informal contexts, "rag" can be used as a slang term to refer to a person, often in a derogatory or playful manner. For example, "What a rag you are!" (meaning "What a silly person you are!")
4. **Other meanings**: In various contexts, "rag" can also refer to a newspaper or magazine (e.g., "The Rag" is a student-run newspaper), a type of fabric or textile (e.g., "rag woo

### 2. Configure the LLM, embedding model, and chunking settings

In [6]:
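Settings.llm = llm
Settings.embed_model = embed_model
# Note: the next three lines configure the default splitter, but the explicit
# Settings.transformations list set afterwards appears to take precedence at
# indexing time, so chunks are likely sized by SentenceSplitter(chunk_size=1024).
Settings.text_splitter = SentenceSplitter(chunk_size=1024)
Settings.chunk_size = 512
Settings.chunk_overlap = 20
Settings.transformations = [SentenceSplitter(chunk_size=1024)]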

# maximum input size to the LLM
Settings.context_window = 3500
# number of tokens reserved for text generation.
Settings.num_output = 512
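
To make the chunking and context numbers above concrete, here is a rough sketch that treats whitespace-separated words as stand-ins for tokens. `chunk_words` and `prompt_budget` are hypothetical helpers for illustration only; `SentenceSplitter` counts tokenizer tokens and respects sentence boundaries, so its real output will differ.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words that share `chunk_overlap` words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def prompt_budget(context_window: int, num_output: int) -> int:
    """Tokens left for the prompt and retrieved context after reserving the output."""
    return context_window - num_output


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, chunk_overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
print(prompt_budget(3500, 512))
# 2988
```

With the values used in this notebook, roughly 2988 tokens remain for the question plus retrieved chunks, which is one reason chunk size and `similarity_top_k` are worth tuning together.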

### 3. Index documents

In [7]:
vector_index = VectorStoreIndex.from_documents(documents, show_progress=True)

Parsing nodes: 100%|██████████| 2/2 [00:00<00:00,  5.81it/s]
Generating embeddings: 100%|██████████| 74/74 [00:25<00:00,  2.86it/s]


### 4. Test query

In [8]:
query_engine = vector_index.as_query_engine(similarity_top_k=5)
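
`similarity_top_k=5` asks the retriever for the five chunks whose embeddings score highest against the query embedding. The sketch below illustrates the idea in plain Python, assuming cosine similarity as the scoring function; `cosine_similarity`, `top_k`, and the toy two-dimensional vectors are made up for this example and are not LlamaIndex APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:k]]


# Toy embeddings; real ones from bge-small-en-v1.5 have hundreds of dimensions.
toy_vectors = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.7, 0.7],
    "chunk-c": [0.0, 1.0],
}
print(top_k([1.0, 0.1], toy_vectors, k=2))
# ['chunk-a', 'chunk-b']
```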

In [9]:
print(query_engine.query("What is rag?"))

Retrieval Augmented Generation (RAG) is a high-level concept that refers to the process of using a retrieval-based approach to generate text.
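
The answer changes from the earlier bare `llm.complete` call because the query engine places the retrieved chunks in the prompt ahead of the question. A rough sketch of that assembly step is shown below; `build_rag_prompt` and its template wording are hypothetical and are not the template LlamaIndex uses internally.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt string."""
    context = "\n---\n".join(chunks)  # separator between chunks is an arbitrary choice
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


example_prompt = build_rag_prompt(
    "What is rag?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(example_prompt)
```

Printing the assembled prompt is a quick way to check whether a poor answer comes from retrieval (wrong chunks) or from generation (right chunks, wrong answer).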
