# Meta - Intermediate Course

In this intermediate course, we'll create a series of simple chatbots with LangChain, using Meta's Llama models. The main topics for this part of the course are:
1. Utilizing Ollama.
2. Integrating Ollama with LangChain.
3. Integrating LangChain with LangSmith.
4. Utilizing Groq with LangChain.
5. Experimenting with different prompting techniques.

## Utilizing Ollama

To run an LLM locally for testing purposes, we have basically two options:
1. Use Ollama, which is a simpler and more straightforward way to run tests and develop locally.
2. Use the Hugging Face libraries, which are more complex but more customizable.

For this specific training, we'll be focusing on the Ollama method, since it's enough for the simple chatbots we'll be creating.

You can start by downloading it [here](https://ollama.com/download/windows).

Since we are working locally, we'll start with the llama3.2:1b and llama3.2:3b models, but if you have enough resources, you can use larger models. You can find a list of available models [here](https://ollama.com/library).

Once you have installed Ollama locally, download the desired model.

In [1]:
!ollama pull llama3.2:3b

[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠴ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B
pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B
verifying sha256 digest
writing manifest
success


Now that we have a model downloaded locally, we can test it via the CLI. Since we are running inside a Jupyter notebook, we'll ask only one question. If you run the command without a prompt outside the notebook, you'll get an interactive CLI where you can experiment with simple prompts.

In [2]:
!ollama run llama3.2:3b "What is the capital of Brazil?"

[?2026h[?25l[1G⠙ [K[?25h[?2026l[?2026h[?25l[1G⠹ [K[?25h[?2026l[?2026h[?25l[1G⠸ [K[?25h[?2026l[?2026h[?25l[1G⠼ [K[?25h[?2026l[?2026h[?25l[1G⠴ [K[?25h[?2026l[?2026h[?25l[1G⠦ [K[?25h[?2026l[?2026h[?25l[1G⠧ [K[?25h[?2026l[?2026h[?25l[1G⠇ [K[?25h[?2026l[?2026h[?25l[1G⠏ [K[?25h[?2026l[?2026h[?25l[1G⠏ [K[?25h[?2026l[?2026h[?25l[1G⠙ [K[?25h[?2026l[?25l[?2026h[?25l[1G[K[?25h[?2026l[2K[1G[?25hThe[?25l[?25h capital[?25l[?25h of[?25l[?25h Brazil[?25l[?25h is[?25l[?25h Bras[?25l[?25hília[?25l[?25h.[?25l[?25h

[?25l[?25h

### Integrating with LangChain
Now we'll integrate Ollama with LangChain. First, let's install the required packages.

In [3]:
!pip install langchain langchain-ollama



Now we'll try to mimic what we did before with a specific question.

In [4]:
# Import the necessary libraries
from langchain_ollama import ChatOllama
LLM_MODEL = "llama3.2:3b"
question = "What is the capital of Brazil?"

In [5]:
# Create a chat model instance
chat_model = ChatOllama(model=LLM_MODEL)
# Generate a response to the question
response = chat_model.invoke(question)
# Print the response
print(response)

content='The capital of Brazil is Brasília.' additional_kwargs={} response_metadata={'model': 'llama3.2:3b', 'created_at': '2025-06-25T13:59:48.198368Z', 'done': True, 'done_reason': 'stop', 'total_duration': 446423500, 'load_duration': 29493416, 'prompt_eval_count': 32, 'prompt_eval_duration': 170880958, 'eval_count': 9, 'eval_duration': 245490834, 'model_name': 'llama3.2:3b'} id='run--42f59922-9bf5-4216-bd39-38e9385118d9-0' usage_metadata={'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41}


We can see that our answer is in the _content_ attribute of the returned `AIMessage`, but we also have access to two metadata dictionaries:
* response_metadata: A dictionary with information that Ollama reports about the run, such as the model name, the stop reason, and timing and token counters.
* usage_metadata: A dictionary with the number of tokens used, for the input, the output, and the total.

In [6]:
from pprint import pprint
# Print the response in a pretty format
print("Response Metadata:")
pprint(response.response_metadata)
print("Usage metadata:")
pprint(response.usage_metadata)

Response Metadata:
{'created_at': '2025-06-25T13:59:48.198368Z',
 'done': True,
 'done_reason': 'stop',
 'eval_count': 9,
 'eval_duration': 245490834,
 'load_duration': 29493416,
 'model': 'llama3.2:3b',
 'model_name': 'llama3.2:3b',
 'prompt_eval_count': 32,
 'prompt_eval_duration': 170880958,
 'total_duration': 446423500}
Usage metadata:
{'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41}
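The duration fields in `response_metadata` are reported by Ollama in nanoseconds, so they can be combined with the token counters to estimate throughput. The sketch below uses values copied from the output above; `tokens_per_second` is a hypothetical helper introduced only for illustration.

```python
# Values copied from the response_metadata printed above.
response_metadata = {
    "eval_count": 9,                # tokens generated
    "eval_duration": 245_490_834,   # nanoseconds spent generating them
    "prompt_eval_count": 32,        # tokens in the prompt
    "prompt_eval_duration": 170_880_958,
    "total_duration": 446_423_500,
}

NANOSECONDS_PER_SECOND = 1_000_000_000


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Convert a token count and a nanosecond duration into tokens per second."""
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    return token_count * NANOSECONDS_PER_SECOND / duration_ns


generation_speed = tokens_per_second(
    response_metadata["eval_count"], response_metadata["eval_duration"]
)
print(f"Generation speed: {generation_speed:.1f} tokens/s")
```

For this run the result is roughly 37 generated tokens per second. With a live response, the same helper could be fed `response.response_metadata` directly.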


Now let's try changing some parameters and see how the output changes. We'll start with the _temperature_ parameter. Try changing it and check the results.

In [7]:
# Creating a function to ease the process of testing
def call_with_temperature(prompt: str, temperature: float = 0.7):
    """Call the chat model with a specific temperature."""    
    # Create a chat model instance with a specific temperature
    chat_model_temp = ChatOllama(model=LLM_MODEL, temperature=temperature)
    # Generate a response to the question with the specified temperature
    response_temp = chat_model_temp.invoke(prompt)
    # Print the response with temperature
    print(f"Response with temperature {temperature}: {response_temp.content}")
    print(f"Response metadata with temperature {temperature}:")
    pprint(response_temp.response_metadata)
    print(f"Usage metadata with temperature {temperature}:")
    pprint(response_temp.usage_metadata)

In [8]:
call_with_temperature(question, temperature=0.0)
call_with_temperature(question, temperature=1.0)

Response with temperature 0.0: The capital of Brazil is Brasília.
Response metadata with temperature 0.0:
{'created_at': '2025-06-25T13:59:48.555507Z',
 'done': True,
 'done_reason': 'stop',
 'eval_count': 9,
 'eval_duration': 245215584,
 'load_duration': 14517625,
 'model': 'llama3.2:3b',
 'model_name': 'llama3.2:3b',
 'prompt_eval_count': 32,
 'prompt_eval_duration': 27709041,
 'total_duration': 288321917}
Usage metadata with temperature 0.0:
{'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41}
Response with temperature 1.0: The capital of Brazil is Brasília.
Response metadata with temperature 1.0:
{'created_at': '2025-06-25T13:59:48.914091Z',
 'done': True,
 'done_reason': 'stop',
 'eval_count': 9,
 'eval_duration': 248679084,
 'load_duration': 30367334,
 'model': 'llama3.2:3b',
 'model_name': 'llama3.2:3b',
 'prompt_eval_count': 32,
 'prompt_eval_duration': 27274750,
 'total_duration': 307165959}
Usage metadata with temperature 1.0:
{'input_tokens': 32, 'output_tokens': 9, 

It didn't really change much, so let's try a different prompt, one where the LLM actually has to generate content.

In [9]:
question_2 = "Tell me what you know about the Llama 3 LLM model."
call_with_temperature(question_2, temperature=0.0)
call_with_temperature(question_2, temperature=1.0)

Response with temperature 0.0: I don't have specific information on a model called "Llama 3." However, I can tell you that there is a model called Llama, which is a large language model developed by Meta.

The original Llama model was announced in 2022 and was based on a transformer architecture. It was trained on a large corpus of text data and was designed to generate human-like responses to a wide range of questions and prompts.

There have been several updates and improvements to the Llama model since its initial release, including the introduction of new variants such as Llama 2 and Llama 3. However, I couldn't find any information on a specific "Llama 3" model.

If you're looking for more information on the Llama model or its variants, I'd be happy to try and help you find it!
Response metadata with temperature 0.0:
{'created_at': '2025-06-25T13:59:53.954049Z',
 'done': True,
 'done_reason': 'stop',
 'eval_count': 170,
 'eval_duration': 4850614959,
 'load_duration': 32171208,
 'm

In this run, the higher temperature produced a more "complete" answer. However, we must always test which value works best for the kind of question we want to answer. We'll see a little more about this later in the course. For now, let's stick with a temperature of 0.7 as the default.
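To build some intuition for what the parameter does, temperature is commonly described as a divisor applied to the model's raw token scores (logits) before they are turned into probabilities. The sketch below is a generic illustration of that idea with made-up scores; it is not Ollama's actual sampling code, and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.0, 0.3, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower values concentrate probability on the top-scoring token, which makes answers more repeatable; higher values spread it out, which makes less likely tokens more probable.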

Next, let's try a more complex prompt, one where we can pass a system message as well as a user message, and see how this affects the output. Let's start by turning the function we already created into a new, improved one.

In [10]:
def call_with_extra_prompts_information(prompt: list[dict], temperature: float = 0.7):
    """Call the chat model with a specific temperature and additional prompt information.
    The prompt is now a list of dictionaries, each containing a 'role' and a 'content'.
    """
    # Create a chat model instance with a specific temperature
    chat_model_temp = ChatOllama(model=LLM_MODEL, temperature=temperature)
    # Generate a response to the list of messages with the specified temperature
    response_temp = chat_model_temp.invoke(prompt)
    # Return the full response so the caller can decide what to print
    return response_temp

In [11]:
full_prompt = [
    {"role": "system", "content": "You are a helpful assistant that likes to talk like a pirate."},
    {"role": "user", "content": question_2}
]
response = call_with_extra_prompts_information(full_prompt, temperature=0.7)
print(f"Response with extra prompts information: {response.content}")

Response with extra prompts information: Yer lookin' fer information on the LLaMA 3, eh? Alright then, let's set sail fer a journey o' knowledge!

Llama 3 be a large language model (LLM) developed by Meta AI. It's the third iteration o' their LLaMA model series, which aims to improve upon its predecessors in terms o' performance, accuracy, and capabilities.

Here be some o' the key features o' LLaMA 3:

1. **Increased size**: LLaMA 3 has a significantly larger parameter count than its predecessors, with around 7 billion parameters (compared to 350 million in LLaMA 2). This allows it to capture more complex patterns and relationships in language data.
2. **Improved performance**: LLaMA 3 has shown impressive results in various natural language processing tasks, including text generation, question-answering, and conversational AI. It's been trained on a massive dataset o' over 45 terabytes o' text data.
3. **Enhanced multimodal capabilities**: While still primarily a text-based model, LL

In [12]:
full_prompt = [
    {"role": "system", "content": "You are a helpful assistant that likes to talk like J.R.R. Tolkien."},
    {"role": "user", "content": question_2}
]
response = call_with_extra_prompts_information(full_prompt, temperature=1.0)
print(f"Response with extra prompts information: {response.content}")

Response with extra prompts information: The realm of artificial intelligence has taken a mighty stride forward with the emergence of the LLaMA 3, a most wondrous and formidable language model. Like the wisest scholars of old, I shall recount for thee the tales of this marvel.

Developed by Meta AI, the LLaMA 3 is an advanced Large Language Model, imbued with an immense capacity for understanding and generating human-like text. Its name, LLaMA, stands for "Large Language Model Application," a testament to its purpose: to serve as a versatile tool for various linguistic endeavors.

This behemoth of code has been trained on an unprecedented scale, fed upon the riches of the internet and the treasures of the written word. The data that has nourished this growth is a veritable feast for the algorithms, comprising vast arrays of texts, articles, books, and conversations from across the expanse of human knowledge.

The LLaMA 3 boasts impressive capabilities: its response times are swift as t

In [13]:
response = call_with_extra_prompts_information(full_prompt, temperature=0.3)
print(f"Response with extra prompts information: {response.content}")

Response with extra prompts information: Fair traveler, I shall regale thee with tales of the LLaMA 3, a most wondrous and mighty language model, forged in the depths of computational sorcery.

The LLaMA 3, or Large Language Model Application 3, is an iteration of the esteemed Llama model, first conjured by Meta AI's researchers. This latest incarnation boasts significant improvements over its predecessors, with enhanced capabilities for understanding, generating, and conversing with mortals like thyself.

In terms of its architecture, LLaMA 3 employs a transformer-based design, wherein vast amounts of text data are woven into a tapestry of neural connections. These connections allow the model to grasp the subtleties of language, deciphering context, nuance, and even the whispers of sarcasm.

The LLaMA 3's prowess lies in its ability to generate coherent, engaging prose on a wide range of topics, from the mundane to the sublime. Its capacity for creative expression is rivalled only by 
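The cells above rebuild the same two-message list each time, changing only the system message. A small helper can make these experiments easier to repeat. This is a minimal sketch; `build_messages` is a hypothetical name, not a LangChain function.

```python
def build_messages(system_message: str, user_message: str) -> list[dict]:
    """Return a two-message prompt in the role/content format used above."""
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]


question_2 = "Tell me what you know about the Llama 3 LLM model."
pirate_prompt = build_messages(
    "You are a helpful assistant that likes to talk like a pirate.", question_2
)
print(pirate_prompt)
```

The resulting list can be passed as before, for example `call_with_extra_prompts_information(pirate_prompt, temperature=0.7)`.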

In [14]:
# We can also ask for the output to be in a specific format, language, or style.
full_prompt = [
    {"role": "system", "content": "You are a helpful assistant that likes to talk about technologies and is very precise. Provide your response in brazilian portuguese, and with the output in JSON format with the main topics as keys."},
    {"role": "user", "content": question_2}
]
response = call_with_extra_prompts_information(full_prompt, temperature=0.1)
print(f"Response with extra prompts information: {response.content}")

Response with extra prompts information: ```json
{
  "Model": {
    "Nome": "Llama 3",
    "Desenvolvedor": "Meta AI",
    "Data de Lançamento": "2022",
    "Características": [
      "Arquitetura baseada em transformer",
      "Suporte a múltiplas linguagens, incluindo inglês, espanhol e francês",
      "Capacidade de processar grandes volumes de texto"
    ],
    "Aplicações": [
      "Resposta a perguntas",
      "Geração de texto",
      "Tradução automática"
    ]
  },
  "Desempenho": {
    "Pontuação em termos de precisão": "Maior que 90%",
    "Pontuação em termos de fluidez": "Maior que 85%"
  }
}
```

O Llama 3 é um modelo de linguagem baseado em transformer desenvolvido pela Meta AI. Ele foi lançado em 2022 e apresenta várias características únicas, incluindo uma arquitetura baseada em transformer e suporte a múltiplas linguagens.

Algumas das principais características do Llama 3 incluem:

*   Arquitetura baseada em transformer: O modelo utiliza uma estrutura de camadas para
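Note that the model wrapped the JSON in a fenced block and then kept writing prose, so the reply as a whole is not valid JSON. One possible way to recover the structured part is sketched below, assuming the reply contains at most one fenced block that matters; `extract_json` is a hypothetical helper and the sample reply is a shortened stand-in for the real output.

```python
import json
import re

# Matches a fenced block (three backticks, optional "json" tag) and captures its body.
FENCED_BLOCK = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text: str) -> dict:
    """Parse the first fenced JSON block in text, or the whole text if unfenced."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


fence = "`" * 3  # three backticks, built this way to keep the example readable
# Shortened, hand-written stand-in for the model reply shown above.
sample_reply = (
    f"{fence}json\n"
    '{"Model": {"Nome": "Llama 3", "Desenvolvedor": "Meta AI"}}\n'
    f"{fence}\n\n"
    "O Llama 3 é um modelo de linguagem..."
)
parsed = extract_json(sample_reply)
print(parsed["Model"]["Desenvolvedor"])
```

With a live response this would be called as `extract_json(response.content)`. Since `json.loads` raises `json.JSONDecodeError` when the model produces malformed JSON, real code should wrap the call in a `try`/`except`.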