## RAG Project

* Use clean, unobstructed documents
* Split the text into chunks
* Create embeddings and store them
* Build the prompt
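
The text-splitting step is not exercised in this notebook, because each document is already a single short sentence. For longer sources, a minimal sketch of fixed-size chunking with overlap could look like the following. The function name `chunk_text` and its `chunk_size` and `overlap` parameters are illustrative assumptions, not part of the original code, and character windows are only one of several possible splitting strategies.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Honey bees are social insects, living in colonies with a single queen."
    for piece in chunk_text(sample, chunk_size=30, overlap=10):
        print(repr(piece))
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of storing some text twice.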

In [17]:
import ollama
import chromadb
import time

#start_time = time.perf_counter()  # Start timing
EMB_MODEL = "nomic-embed-text"  # alternatives: "mxbai-embed-large", "all-minilm"
MODEL = "llama3.2:1b"

## Knowledge base and vector embedding database (to store the documents)


In [12]:
documents = [
    "Bee-keeping, also known as apiculture, involves the maintenance of bee colonies, typically in hives, by humans.",
    "The most commonly kept species of bees is the European honey bee (Apis mellifera).",
    "Bee-keeping dates back to at least 4,500 years ago, with evidence of ancient Egyptians practicing it.",
    "A beekeeper's primary role is to manage hives to ensure the health of the bee colony and maximize honey production.",
    "Honey bees are social insects, living in colonies with a single queen, numerous worker bees, and drones.",
    "The queen bee can lay up to 2,000 eggs per day during peak seasons.",
    "Worker bees are female and perform all the tasks in the hive except for reproduction.",
    "Drones are male bees whose primary role is to mate with a queen from another hive.",
    "Honey bees communicate with each other through the 'waggle dance,' which indicates the direction and distance to food sources.",
    "Bees produce honey from the nectar they collect from flowers, which they store in the hive for food during winter.",
    "Bees also produce beeswax, which they use to build the honeycomb structure in the hive.",
    "Propolis, another bee product, is a resin-like substance collected from tree buds and used to seal gaps in the hive.",
    "Bees play a crucial role in pollination, which is essential for the reproduction of many plants and crops.",
    "A typical bee colony can contain between 20,000 and 80,000 bees.",
    "Bee-keeping can be done for various purposes, including honey production, pollination services, and the sale of bees and related products.",
    "Beekeepers must inspect their hives regularly to check for diseases, pests, and the overall health of the colony.",
    "Common pests and diseases that affect bees include varroa mites, hive beetles, and foulbrood.",
    "Bee-keeping requires protective clothing and equipment, such as a bee suit, gloves, and a smoker to calm the bees.",
    "Sustainable bee-keeping practices are important for maintaining healthy bee populations and ecosystems.",
    "Beekeeping can be a hobby, a part-time occupation, or a full-time profession, depending on the scale and intent of the beekeeper.",
    "Almost all the honey we consume comes from western honey bees (Apis mellifera), a hybrid of European and African species.", 
    "There are another 20,000 different bee species in the world.",  
    "Brazil alone has more than 300 different bee species, and the vast majority, unlike western honey bees, don’t sting.", 
    "Reports written in 1577 by Hans Staden mention three native bees used by indigenous people in Brazil.",
    "The indigenous people in Brazil used bees for medicine and food purposes.",
    "From Hans Staden's report, the probable species are mandaçaia (Melipona quadrifasciata), mandaguari (Scaptotrigona postica) and jataí-amarela (Tetragonisca angustula)."
]

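# get_or_create_collection avoids the error that create_collection raises
# when this cell is re-run in the same session:
#   UniqueConstraintError: Collection bee_facts already exists
client = chromadb.Client()
collection = client.get_or_create_collection(name="bee_facts")

# Embed each document and store it in the vector database.
# upsert (rather than add) keeps re-runs from inserting duplicate ids.
for i, d in enumerate(documents):
    response = ollama.embeddings(model=EMB_MODEL, prompt=d)
    embedding = response["embedding"]
    collection.upsert(
        ids=[str(i)],
        embeddings=[embedding],
        documents=[d]
    )

len(embedding)  # dimensionality of the embedding vectors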

In [18]:
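# Implicit string concatenation avoids the run of spaces that a backslash
# continuation inside the string literal would insert into the prompt.
prompt = (
    "How many bees are in a colony? Who lays eggs and how much? "
    "How about common pests and diseases?"
)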

response = ollama.embeddings(
  prompt=prompt,
  model=EMB_MODEL
)

In [19]:
results = collection.query(
  query_embeddings=[response["embedding"]],
  n_results=5
)
data = results['documents']
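
Conceptually, the query above ranks the stored vectors by how close they are to the query vector and returns the text of the nearest ones. The sketch below illustrates that idea in plain Python on toy two-dimensional vectors. It assumes cosine similarity as the metric; the metric Chroma actually uses depends on how the collection is configured, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, docs, k=5):
    """Return the k documents whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [d for _, d in scored[:k]]


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings
    docs = ["queen", "pests", "honey"]
    vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.1], vecs, docs, k=2))
```

A real vector database replaces this linear scan with an approximate nearest-neighbour index, which is what makes retrieval practical for large collections.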

In [20]:
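# Combine the retrieved documents with the question into the final prompt.
# A separate name keeps the original question in `prompt` intact, and there
# is no trailing comma, which would have turned the string into a tuple.
rag_prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"

In [21]:
output = ollama.generate(
  model=MODEL,
  prompt=rag_prompt,
  options={
    "temperature": 0.0,
    "top_k": 10,
    "top_p": 0.5
  }
)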

print(output['response'])

I'll be happy to help you with your questions about the data.

Based on the provided text, here are the answers to your questions:

1. How many bees are in a colony?
The text states that "A typical bee colony can contain between 20,000 and 80,000 bees."

2. Who lays eggs and how much?
The text does not explicitly state who lays eggs or how much. However, it mentions that the queen bee "can lay up to 2,000 eggs per day during peak seasons." This suggests that the queen bee is responsible for laying eggs.

3. How about common pests and diseases?
The text states that "Common pests and diseases that affect bees include varroa mites, hive beetles, and foulbrood."
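
The prompt in this notebook interpolates `results['documents']` directly, which is a list of lists (one inner list per query), so the model sees Python list syntax. One possible alternative, sketched here under that assumption about the result shape, flattens the passages into a bulleted context block. The helper name `build_prompt` and the template wording are illustrative, not the notebook's own.

```python
def build_prompt(question, retrieved_documents):
    """Format retrieved passages and the question into one prompt string.

    retrieved_documents is assumed to be a list of lists of strings,
    i.e. one inner list of passages per query.
    """
    # Flatten the per-query lists into a single list of passages
    passages = [doc for group in retrieved_documents for doc in group]
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    demo = [["A typical bee colony can contain between 20,000 and 80,000 bees."]]
    print(build_prompt("How many bees are in a colony?", demo))
```

Whether a template like this improves answers from a small model is something to check empirically rather than assume.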


In [22]:
def rag_bees(prompt, n_results=5, temp=0.0, top_k=10, top_p=0.5):
    """Retrieve the n_results closest documents and answer the prompt with MODEL."""
    start_time = time.perf_counter()  # Start timing
    
    # generate an embedding for the prompt and retrieve the data 
    response = ollama.embeddings(
      prompt=prompt,
      model=EMB_MODEL
    )
    
    results = collection.query(
      query_embeddings=[response["embedding"]],
      n_results=n_results
    )
    data = results['documents']
    
    # generate a response combining the prompt and data retrieved
    output = ollama.generate(
      model=MODEL,
      prompt=f"Using this data: {data}. Respond to this prompt: {prompt}",
      options={
        "temperature": temp,
        "top_k": top_k,
        "top_p": top_p
      }
    )
    
    print(output['response'])
    
    end_time = time.perf_counter()  # End timing
    elapsed_time = round((end_time - start_time), 1)  # Calculate elapsed time
    
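    # Adjacent string literals avoid the run of spaces that a backslash
    # continuation inside the f-string would print.
    print(f"\n [INFO] ==> The code for model: {MODEL}, took {elapsed_time}s "
          "to generate the answer.\n")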
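
A variant that returns the answer instead of printing it can be easier to test and reuse. The sketch below keeps the same embed, retrieve, generate sequence but takes the three steps as callables, so it runs without a model server. The name `rag_answer` and the `embed`, `retrieve` and `generate` parameters are hypothetical and do not appear in the original notebook.

```python
import time


def rag_answer(prompt, embed, retrieve, generate, n_results=5):
    """Run embed -> retrieve -> generate and return (answer, elapsed_seconds)."""
    start = time.perf_counter()

    query_embedding = embed(prompt)                  # prompt -> vector
    passages = retrieve(query_embedding, n_results)  # vector -> passages
    full_prompt = f"Using this data: {passages}. Respond to this prompt: {prompt}"
    answer = generate(full_prompt)                   # full prompt -> text

    elapsed = round(time.perf_counter() - start, 1)
    return answer, elapsed


if __name__ == "__main__":
    # Stub callables standing in for the embedding model, vector store and LLM
    answer, elapsed = rag_answer(
        "How many bees are in a colony?",
        embed=lambda p: [0.0, 1.0],
        retrieve=lambda emb, n: ["A typical bee colony can contain between 20,000 and 80,000 bees."][:n],
        generate=lambda full: f"(stub answer for a prompt of {len(full)} characters)",
    )
    print(answer, elapsed)
```

In this notebook the callables would wrap the existing calls, for example `embed=lambda p: ollama.embeddings(model=EMB_MODEL, prompt=p)["embedding"]`, with `retrieve` wrapping `collection.query` and `generate` wrapping `ollama.generate`.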

In [23]:
prompt = "Existem abelhas no Brazil?"
rag_bees(prompt)

Based on the data provided, it appears that there are indeed bees present in Brazil. The text mentions "native bees used by indigenous people" and specifically names three species of bees found in Brazil:

1. Mandaçaia (Melipona quadrifasciata)
2. Mandaguari (Scaptotrigona postica)
3. Jataí-amarela (Tetragonisca angustula)

Additionally, the text states that "Brazil alone has more than 300 different bee species", which suggests that there are indeed many bees present in Brazil.

Therefore, the answer to your question is: Não, não existem abelhas no Brasil.

 [INFO] ==> The code for model: llama3.2:1b, took 16.5s           to generate the answer.

