# Installing Ollama

To run an LLM locally, Ollama must be installed first. There are two common ways to install it:
1. Local (Linux): `curl https://ollama.ai/install.sh | sh`

2. Docker (Linux):
```
docker pull ollama/ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

Both approaches expose the model through the Ollama API, which listens on port 11434 by default.

The model can then be queried with a curl call. Example:
curl http://localhost:11434/api/generate -d '{
  "model": "llama2:7b",
  "prompt":"Why is the sky blue?"
}'
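
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it assumes the server streams its reply as newline-delimited JSON objects carrying a `response` fragment and a `done` flag, and the helper names `join_stream` and `generate` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default port from the text


def join_stream(lines):
    """Concatenate the `response` fragments of a streamed reply (assumed format)."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip empty keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the server marks the last chunk with done=true
    return "".join(parts)


def generate(prompt, model="llama2:7b", url=OLLAMA_URL):
    """Send one prompt to a running Ollama server and return the full text."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        # Iterating over the HTTP response yields one raw line at a time
        return join_stream(raw.decode("utf-8") for raw in reply)
```

With the server running, `generate("Why is the sky blue?")` would return the whole answer as a single string.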

# Downloading the Llama 2 model

Because of the hardware limits of the machine used for this project, a "small" model was chosen for the experiments: llama2 with 7 billion parameters (3.8 GB). To download it locally, run `ollama run llama2:7b`, which fetches the model on first use and then opens an interactive session.

Once the download finishes, `ollama list` shows the models available on the machine. The output looks similar to:

`ollama list`         
NAME     	ID          	SIZE  	MODIFIED   
llama2:7b	78e26419b446	3.8 GB	<X> days ago


# Installing the llama-index library

Llama-index is a framework dedicated to building RAG applications. LangChain is another option for building RAG pipelines, but it is a general-purpose framework for developing LLM applications.

In [1]:
%pip install -qU llama-index


Note: you may need to restart the kernel to use updated packages.


# Streaming the model response

Streaming makes the response available incrementally, as the model generates it.

In [2]:
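# Import path for the llama-index release used here; from 0.10 on the class
# lives in `llama_index.llms.ollama` (package `llama-index-llms-ollama`).
from llama_index.llms import Ollama

# request_timeout gives the model enough time to produce the whole response
llm = Ollama(model="llama2:7b", request_timeout=300.0)
# Stream the response so it can be consumed while it is being generated
response = llm.stream_complete("Why is the sky blue?")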

for r in response:
    print(r.delta, end="")


The sky appears blue to us because of a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen. These molecules absorb some of the light and scatter the rest in all directions. The shorter wavelengths of light (such as blue and violet) are scattered more than the longer wavelengths (such as red and orange), which is why we see the sky as blue.

The reason for this scattering is that the smaller wavelengths of light have a shorter wave length, which means they have a higher frequency and are more easily deflected by the tiny molecules in the atmosphere. This is known as Rayleigh scattering, named after the British physicist Lord Rayleigh, who first described the phenomenon in the late 19th century.

So, to summarize, the sky appears blue because of the way sunlight interacts with the tiny molecules in Earth's atmosphere, resulting in the scattering of shorter wavelengths of light and their appear

# Chat Messages

## Ability to generate code snippets

In [3]:
from llama_index.llms import Ollama, ChatMessage

llm = Ollama(model="llama2:7b", request_timeout=300.0)

messages = [
    ChatMessage(
        role="system", content="You are a helpful assistant to create programs"
    ),
    ChatMessage(
        role="user", content="Write a python program to calculate the sum of two numbers"
    ),
]

response = llm.stream_chat(messages)

for r in response:
    print(r.delta, end="")

```
# Sum two numbers
num1 = 5
num2 = 8
sum = num1 + num2

print("The sum of", num1, "and", num2, "is", sum)
```
This program will output the following when run:
```
The sum of 5 and 8 is 13
```
Explanation:

* The `num1` and `num2` variables are defined at the top of the program as `5` and `8`, respectively.
* The `sum` variable is defined as the sum of `num1` and `num2`. In this case, the sum is calculated by adding `num1` and `num2` together.
* The output of the program is the result of the calculation, which is printed to the console using the `print()` function.

I hope this helps! Let me know if you have any questions or need further assistance.

## Missing knowledge about recent events

In [4]:
from llama_index.llms import Ollama, ChatMessage

llm = Ollama(model="llama2:7b", request_timeout=300.0)

messages = [
    ChatMessage(
        role="system", content="You are a sports content creator"
    ),
    ChatMessage(
        role="user", content="Who was the Australian Open champion in 2024?"
    ),
]

response = llm.stream_chat(messages)

for r in response:
    print(r.delta, end="")


As a sports content creator, I can tell you that the Australian Open champion in 2024 was none other than Novak Djokovic! The Serbian tennis superstar defeated his opponent in the final match to claim his ninth Australian Open title and continue his impressive run of form. It was a dominant performance from Djokovic, who showed why he is considered one of the greatest players of all time. Congratulations to him on his victory!

Because LLMs only know the data they saw during training, they cannot reliably answer questions about recent events. Here the model does not admit the gap: it confidently names Novak Djokovic, which is wrong.

# Source Knowledge: augmented prompt

One way to give the LLM more context and information is a technique called "source knowledge". It consists of including information relevant to the question directly in the prompt.

In [5]:
llm_information = [
    "Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park",
    "Sinner and Sabalenka took down Daniil Medvedev and Qinwen Zheng in their respective finals",
    "Sinner, Sabalenka win Australian Open singles titles",
    "Jannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title"
]

source_knowledge = "\n".join(llm_information)

In [6]:
query = "Who was the Australian Open champion in 2024?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

In [7]:
augmented_prompt

"Using the contexts below, answer the query.\n\nContexts:\nAustralian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park\nSinner and Sabalenka took down Daniil Medvedev and Qinwen Zheng in their respective finals\nSinner, Sabalenka win Australian Open singles titles\nJannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title\n\nQuery: Who was the Australian Open champion in 2024?"

In [8]:
from llama_index.llms import Ollama, ChatMessage

llm = Ollama(model="llama2:7b", request_timeout=300.0)

messages = [
    ChatMessage(
        role="system", content="You are a sports content creator"
    ),
    ChatMessage(
        role="user", content=augmented_prompt
    ),
]

response = llm.stream_chat(messages)

for r in response:
    print(r.delta, end="")


Based on the given contexts, the Australian Open champion in 2024 is Jannik Sinner.

This example shows that the model answers correctly once the context is included. Adding context to the query (an augmented prompt) makes it possible to supply the model with up-to-date data. The open problem is how to find this information ahead of time for each query. The answer is RAG.
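
The retrieve-then-augment idea can be sketched without any external service. The snippet below is only a toy illustration: it ranks documents by word overlap with the query, which stands in for the embedding search used later in this notebook, and the names `tokenize`, `retrieve`, and `build_augmented_prompt` are hypothetical.

```python
def tokenize(text):
    """Lowercase a text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_augmented_prompt(query, documents, k=2):
    """Put the retrieved documents in front of the query, as in the cells above."""
    contexts = "\n".join(retrieve(query, documents, k))
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{contexts}\n\nQuery: {query}"
    )


knowledge_base = [
    "Sinner, Sabalenka win Australian Open singles titles",
    "Rayleigh scattering explains why the sky is blue",
    "Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park",
]
print(build_augmented_prompt("Who was the Australian Open champion in 2024?", knowledge_base))
```

Word overlap misses synonyms and paraphrases, which is the gap that embeddings and a vector database are meant to close.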

# Building the knowledge base

This step uses an embedding model and a vector database (Pinecone) to store the knowledge.

In [9]:
%pip install -qU pinecone-client==2.2.4

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-pinecone 0.0.1 requires pinecone-client<4,>=3; python_version >= "3.8" and python_version < "3.13", but you have pinecone-client 2.2.4 which is incompatible.
Note: you may need to restart the kernel to use updated packages.


# Connecting to Pinecone (Vector Database)

In [10]:
import pinecone
import os

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.getenv('PINECONE_API_KEY'),
    environment=os.getenv('PINECONE_ENVIRONMENT')
)


In [11]:
index_name = 'rag-llama2'
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 4e-05,
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4}

# Embeddings

An embedding model works like a translator: it converts words and sentences into a numeric representation that preserves as much of the original meaning as possible. Imagine turning a passage of a book into a set of coordinates in space, where the distance between points conveys how the words relate to each other.

Instead of processing language at face value, text embeddings let machines analyze the underlying semantics.
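
A common way to measure how close two embeddings are is cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings from the model used in this notebook have 1536 dimensions, and the values in `toy_embeddings` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors: related words point in similar directions
toy_embeddings = {
    "tennis": [0.9, 0.1, 0.0],
    "racket": [0.8, 0.3, 0.1],
    "sky": [0.0, 0.2, 0.9],
}

print(cosine_similarity(toy_embeddings["tennis"], toy_embeddings["racket"]))
print(cosine_similarity(toy_embeddings["tennis"], toy_embeddings["sky"]))
```

The first score comes out much higher than the second, reflecting that "tennis" and "racket" sit closer together in this toy space than "tennis" and "sky".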

To that end, an OpenAI embedding model is used to create the embeddings, which are then stored in the vector database. The Pinecone index must be created with the same configuration as the model, in particular the same vector dimension (include an image of this in the Medium article).
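
One way to read "same configuration" is that every vector must have exactly the dimension the index was created with (1536 in the outputs of this notebook). The helper below is a hypothetical sanity check, assuming the index statistics are available as a plain dict shaped like the `describe_index_stats()` output printed earlier.

```python
def check_dimension(index_stats, vectors):
    """Raise ValueError if any vector length differs from the index dimension."""
    expected = index_stats["dimension"]
    for position, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, "
                f"index expects {expected}"
            )
    return expected


# Toy example with a 3-dimensional "index"
stats = {"dimension": 3, "total_vector_count": 0}
print(check_dimension(stats, [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
```

Running such a check before an upsert surfaces a model and index mismatch as a clear local error.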

In [12]:
%pip install -qU langchain-openai


Note: you may need to restart the kernel to use updated packages.


In [13]:
import getpass
import os

# Prompt for the OpenAI API key without echoing it
os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Instantiating the OpenAI embedding model (text-embedding-ada-002)

In [14]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# Embeddings with test inputs

In [15]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embeddings.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

# Creating the Australian Open 2024 data and adding it to Pinecone

In [16]:
data = [
    {
        'random-id': '23131',
        'text': "Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park"
    },
    {
        'random-id': '99991',
        'text': "Sinner and Sabalenka took down Daniil Medvedev and Qinwen Zheng in their respective finals",
    },
    {
        'random-id': '99992',
        'text': "Sinner, Sabalenka win Australian Open singles titles",
    },
    {
        'random-id': '99993',
        'text': "Jannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title"
    },
]

In [17]:
data

[{'random-id': '23131',
  'text': 'Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park'},
 {'random-id': '99991',
  'text': 'Sinner and Sabalenka took down Daniil Medvedev and Qinwen Zheng in their respective finals'},
 {'random-id': '99992',
  'text': 'Sinner, Sabalenka win Australian Open singles titles'},
 {'random-id': '99993',
  'text': "Jannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title"}]

In [18]:
from tqdm.auto import tqdm  # for progress bar

batch_size = 32
for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data[i:i_end]
    # collect the id of each chunk
    ids = [x['random-id'] for x in batch]
    # get text to embed
    texts = [x['text'] for x in batch]
    # embed text
    embeds = embeddings.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [ {'text': x['text'] } for x in batch ]

    to_upsert = zip(ids, embeds, metadata)
    # add to Pinecone
    index.upsert(vectors=list(to_upsert))


  0%|          | 0/1 [00:00<?, ?it/s]
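
The batching pattern in the cell above can be tried without Pinecone or OpenAI. In this sketch `fake_embed` is a hypothetical stand-in for `embeddings.embed_documents`, and the helpers `batched` and `to_upsert_tuples` are illustrative names; only the `(id, vector, metadata)` tuple shape mirrors the original loop.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def to_upsert_tuples(batch, embed):
    """Build (id, vector, metadata) tuples for one batch of records."""
    texts = [record["text"] for record in batch]
    vectors = embed(texts)  # one call per batch, as in the original loop
    return [
        (record["random-id"], vector, {"text": record["text"]})
        for record, vector in zip(batch, vectors)
    ]


def fake_embed(texts):
    # Hypothetical stand-in for a real embedding model: one number per text
    return [[float(len(text))] for text in texts]


records = [{"random-id": str(n), "text": "chunk " * n} for n in range(1, 6)]
for batch in batched(records, batch_size=2):
    print([vector_id for vector_id, _, _ in to_upsert_tuples(batch, fake_embed)])
```

Embedding one batch at a time keeps the number of embedding calls low while keeping each upsert payload bounded.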

In [19]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 4e-05,
 'namespaces': {'': {'vector_count': 4}},
 'total_vector_count': 4}

# Augmented Prompt with RAG

In [20]:
%pip install langchain-pinecone

Defaulting to user installation because normal site-packages is not writeable
Collecting pinecone-client<4,>=3
  Using cached pinecone_client-3.0.2-py3-none-any.whl (201 kB)
Installing collected packages: pinecone-client
  Attempting uninstall: pinecone-client
    Found existing installation: pinecone-client 2.2.4
    Uninstalling pinecone-client-2.2.4:
      Successfully uninstalled pinecone-client-2.2.4
Successfully installed pinecone-client-3.0.2

Note: you may need to restart the kernel to use updated packages.


In [22]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embeddings.embed_query, text_field
)


# Querying only the vector database with the initial question

In [23]:
query = "Who was the Australian Open champion in 2024?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park'),
 Document(page_content='Sinner, Sabalenka win Australian Open singles titles'),
 Document(page_content="Jannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title")]

# Using the vector database results as context for the LLM

In [24]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

In [25]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Australian Open 2024: Jannik Sinner, Aryna Sabalenka crowned as Grand Slam singles champions at Melbourne Park
Sinner, Sabalenka win Australian Open singles titles
Jannik Sinner came back from two sets down to beat Daniil Medvedev 3-6, 3-6, 6-4, 6-4, 6-3 in the Australian Open men's singles final, earning him his first ever Grand Slam title

    Query: Who was the Australian Open champion in 2024?
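
In the printed prompt, the `Contexts:` and `Query:` lines are indented by four spaces because the triple-quoted f-string keeps the indentation of the function body. A possible alternative, sketched here with a hypothetical `build_prompt` helper, assembles the prompt line by line so no stray whitespace reaches the model.

```python
def build_prompt(contexts, query):
    """Assemble the augmented prompt line by line, without stray indentation."""
    lines = [
        "Using the contexts below, answer the query.",
        "",
        "Contexts:",
        *contexts,  # one retrieved passage per line
        "",
        f"Query: {query}",
    ]
    return "\n".join(lines)


print(build_prompt(
    ["Sinner, Sabalenka win Australian Open singles titles"],
    "Who was the Australian Open champion in 2024?",
))
```

Inside `augment_prompt`, the `contexts` argument would be the list `[x.page_content for x in results]`.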


In [26]:
from llama_index.llms import Ollama, ChatMessage

llm = Ollama(model="llama2:7b", request_timeout=300.0)

messages = [
    ChatMessage(
        role="system", content="You are a sports content creator"
    ),
    ChatMessage(
        role="user", content=augment_prompt(query)
    ),
]

response = llm.stream_chat(messages)

for r in response:
    print(r.delta, end="")


Based on the context provided, the answer to the query is Jannik Sinner. According to the passage, he was crowned as the Grand Slam singles champion at Melbourne Park in 2024.