# Getting started with Llama3 using LangChain
First we install the required packages, then we use the Llama3 model through Ollama to generate some responses.

In [None]:
pip install langchain langchain-text-splitters langchain-postgres langchain-ollama

Collecting langchain-ollama
  Downloading langchain_ollama-0.2.3-py3-none-any.whl.metadata (1.9 kB)
Collecting ollama<1,>=0.4.4 (from langchain-ollama)
  Downloading ollama-0.4.7-py3-none-any.whl.metadata (4.7 kB)
Downloading langchain_ollama-0.2.3-py3-none-any.whl (19 kB)
Downloading ollama-0.4.7-py3-none-any.whl (13 kB)
Installing collected packages: ollama, langchain-ollama
Successfully installed langchain-ollama-0.2.3 ollama-0.4.7
Note: you may need to restart the kernel to use updated packages.


With the packages installed, we can load the Llama3 model and send it a first prompt.

In [15]:
from langchain_ollama.chat_models import ChatOllama

In [16]:
model = ChatOllama(model='llama3')
model.invoke("The sky is:")

AIMessage(content="...blue! (or at least, that's what I'm assuming, depending on the time of day and location!) Is there something specific you'd like to know about the sky?", additional_kwargs={}, response_metadata={'model': 'llama3', 'created_at': '2025-02-28T21:00:14.5416772Z', 'done': True, 'done_reason': 'stop', 'total_duration': 8877243200, 'load_duration': 2743888000, 'prompt_eval_count': 14, 'prompt_eval_duration': 915000000, 'eval_count': 38, 'eval_duration': 5214000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-6fb050ad-62b8-4990-9558-f1017bbdc110-0', usage_metadata={'input_tokens': 14, 'output_tokens': 38, 'total_tokens': 52})

The **invoke** method is used to interact with the model. For a chat model it accepts a string, a list of messages, or a prompt value, and returns an **AIMessage** whose **content** attribute holds the generated text. Sampling parameters such as **temperature** are set when the model is created, for example `ChatOllama(model='llama3', temperature=0.1)`. Values such as 0.9 lead to more creative responses, while lower values such as 0.1 produce more predictable output.

Generative AI models also have a parameter that limits the size (and cost) of the output: many providers call it **max_tokens**, while **ChatOllama** calls it **num_predict**. A low value can stop generation prematurely, so the output may appear truncated.
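To make the effect of temperature concrete, here is a minimal, self-contained sketch of temperature-scaled softmax over made-up token scores. It does not call Ollama or LangChain; the `logits` values and the function name `softmax_with_temperature` are assumptions chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

predictable = softmax_with_temperature(logits, 0.1)
creative = softmax_with_temperature(logits, 0.9)

print([round(p, 3) for p in predictable])
print([round(p, 3) for p in creative])
```

At the lower temperature almost all of the probability lands on the top-scoring token, which is why the output becomes more predictable.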

## Using roles with LangChain
Chat models organize a conversation into messages, each tagged with a role. LangChain's chat model interface makes it easier to configure and manage these conversations in your AI chatbot application.

In [17]:
from langchain_core.messages import HumanMessage

prompt = [HumanMessage("What is the capital of France?")]

model.invoke(prompt)

AIMessage(content='The capital of France is Paris.', additional_kwargs={}, response_metadata={'model': 'llama3', 'created_at': '2025-02-28T21:25:40.5015663Z', 'done': True, 'done_reason': 'stop', 'total_duration': 5328784200, 'load_duration': 2998972700, 'prompt_eval_count': 17, 'prompt_eval_duration': 1299000000, 'eval_count': 8, 'eval_duration': 1029000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-bf64e52c-3ba4-450d-a512-69a18fd6aacf-0', usage_metadata={'input_tokens': 17, 'output_tokens': 8, 'total_tokens': 25})

Chat models use different message types, each associated with a role:
- **HumanMessage**: Message sent from the perspective of the human (*user* role)
- **AIMessage**: Message sent from the perspective of the AI that the human is interacting with (*assistant* role)
- **SystemMessage**: Message setting the instructions the AI should follow (*system* role)
- **ChatMessage**: A message allowing for arbitrary setting of role.
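As a rough mental model, each message type pairs a role with some content, and a conversation is an ordered list of such messages. The sketch below uses a hypothetical `Message` dataclass and `to_role_dicts` helper; neither is part of LangChain, they only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def to_role_dicts(messages):
    """Convert messages to the role/content dictionaries that chat APIs commonly accept."""
    return [{"role": m.role, "content": m.content} for m in messages]


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("user", "What is the capital of France?"),
    Message("assistant", "The capital of France is Paris."),
]

for entry in to_role_dicts(conversation):
    print(entry)
```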


In [18]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama.chat_models import ChatOllama

model = ChatOllama(model="llama3")
system_msg = SystemMessage(
    '''You are a helpful assistant that responds to questions with three exclamation marks.'''
)

human_msg = HumanMessage('What is the capital of France?')

model.invoke([system_msg,human_msg])

AIMessage(content='Paris!!!', additional_kwargs={}, response_metadata={'model': 'llama3', 'created_at': '2025-02-28T22:38:15.8031557Z', 'done': True, 'done_reason': 'stop', 'total_duration': 4579676600, 'load_duration': 2735132700, 'prompt_eval_count': 37, 'prompt_eval_duration': 1010000000, 'eval_count': 3, 'eval_duration': 257000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-6625fcbe-4e70-491c-b8d6-7d0795ab8f0d-0', usage_metadata={'input_tokens': 37, 'output_tokens': 3, 'total_tokens': 40})

## Reusable prompts
Prompts help the model generate better responses because they give it the context it needs to produce relevant answers to queries.

Here is a detailed prompt: 

```
Answer the question based on the context below. If the question cannot be
answered using the information provided, answer with "I don't know".

Context: The most recent advancements in NLP are being driven by Large Language 
Models (LLMs). These models outperform their smaller counterparts and have
become invaluable for developers who are creating applications with NLP 
capabilities. Developers can tap into these models through Hugging Face's
`transformers` library, or by utilizing OpenAI and Cohere's offerings through
the `openai` and `cohere` libraries, respectively.

Question: Which model providers offer LLMs?

Answer:
```

The challenge here is to figure out what the text should contain and how it should vary based on the user's input. In the provided example, context and question are hardcoded, but what if we want to pass these in dynamically?

LangChain provides **prompt template interfaces** that make it easy to construct prompts with dynamic inputs:


In [19]:
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template("""Answer the question based on the context below. If the question cannot be answered
                                        using the information provided, answer with "I don't know".
                                        Context: {context}
                                        
                                        Question: {question}
                                        
                                        Answer: """)

template.invoke({
    "context": """The most recent advancements in NLP are being driven by Large 
        Language Models (LLMs). These models outperform their smaller 
        counterparts and have become invaluable for developers who are creating 
        applications with NLP capabilities. Developers can tap into these 
        models through Hugging Face's `transformers` library, or by utilizing 
        OpenAI and Cohere's offerings through the `openai` and `cohere` 
        libraries, respectively.""",
        "question": "Which model providers offer LLMs?"
})

StringPromptValue(text='Answer the question based on the context below. If the question cannot be answered\n                                        using the information provided, answer with "I don\'t know".\n                                        Context: The most recent advancements in NLP are being driven by Large \n        Language Models (LLMs). These models outperform their smaller \n        counterparts and have become invaluable for developers who are creating \n        applications with NLP capabilities. Developers can tap into these \n        models through Hugging Face\'s `transformers` library, or by utilizing \n        OpenAI and Cohere\'s offerings through the `openai` and `cohere` \n        libraries, respectively.\n                                        \n                                        Question: Which model providers offer LLMs?\n                                        \n                                        Answer: ')


This turns the static prompt into a dynamic one. The **template** object holds the structure of the final prompt together with the names of the dynamic inputs to insert. The **invoke** method replaces the placeholders with the actual values; the `{name}` placeholders follow Python's f-string style. Let's see a full example.
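The placeholder substitution itself can be sketched in plain Python. This is only an analogy built on `str.format`, not LangChain's implementation; the `fill` helper and the sample values are illustrative assumptions.

```python
TEMPLATE = (
    "Answer the question based on the context below.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer: "
)


def fill(template, **values):
    """Replace every {name} placeholder with the matching keyword argument."""
    return template.format(**values)


filled = fill(
    TEMPLATE,
    context="LLMs are available from several providers.",
    question="Which model providers offer LLMs?",
)
print(filled)

# A missing input fails loudly instead of silently producing a broken prompt
try:
    fill(TEMPLATE, context="only the context")
except KeyError as missing:
    print(f"Missing template variable: {missing}")
```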


In [27]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(""" Answer the question based on the context below. If the question cannot be answered using 
                          the information provided, answer with "I don't know".

                                        Context: {context}
                                        
                                        Question: {question}
                          
                          Answer: """)

model = ChatOllama(model="llama3")

prompt = template.invoke({
    "context": """The most recent advancements in NLP are being driven by Large
        Language Models (LLMs). These models outperform their smaller 
        counterparts and have become invaluable for developers who are creating 
        applications with NLP capabilities. Developers can tap into these 
        models through Hugging Face's `transformers` library, or by utilizing 
        OpenAI and Cohere's offerings through the `openai` and `cohere` 
        libraries, respectively.""",
    "question": "Which model providers offer LLMs?"
})

completion = model.invoke(prompt)

In [None]:

print(completion.content)

According to the context, OpenAI and Cohere offer Large Language Models (LLMs). Therefore, the answer is:

OpenAI and Cohere.


> **Note**: For AI chat applications you can use **ChatPromptTemplate** to provide dynamic inputs based on the role of the chat message.

Here is a basic example:

In [30]:
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ("system", """Answer the question based on the context below. If the question cannot be
answered using the information provided, answer with "I don't know"."""),
    ("human", "Context: {context}"),
    ("human", "Question: {question}"),
])

template.invoke({
    "context": "The most recent advancements in NLP are being driven by Large Language Models (LLMs). These models outperform their smaller counterparts and have become invaluable for developers who are creating applications with NLP capabilities. Developers can tap into these models through Hugging Face's `transformers` library, or by utilizing OpenAI and Cohere's offerings through the `openai` and `cohere` libraries, respectively.",
    "question": "Which model providers offer LLMs?"
})

ChatPromptValue(messages=[SystemMessage(content='Answer the question based on the context below. If the question cannot be\nanswered using the information provided, answer with "I don\'t know".', additional_kwargs={}, response_metadata={}), HumanMessage(content="Context: The most recent advancements in NLP are being driven by Large Language Models (LLMs). These models outperform their smaller counterparts and have become invaluable for developers who are creating applications with NLP capabilities. Developers can tap into these models through Hugging Face's `transformers` library, or by utilizing OpenAI and Cohere's offerings through the `openai` and `cohere` libraries, respectively.", additional_kwargs={}, response_metadata={}), HumanMessage(content='Question: Which model providers offer LLMs?', additional_kwargs={}, response_metadata={})])

Looking closer, the **ChatPromptTemplate** object produces one **SystemMessage** and two **HumanMessage** objects that carry the dynamic inputs. You can invoke the template in the same way and pass the result to a large language model to get a prediction.



In [2]:
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ('system','''Answer the question based on the context below. If the question cannot be answered using the information
     provided, answer with "I don't know".'''),
     ('human','Context: {context}'),
     ('human','Question: {question}'),
])

model = ChatOllama(model='llama3',num_predict=100)

prompt = template.invoke({
    "context": """The most recent advancements in NLP are being driven by 
        Large Language Models (LLMs). These models outperform their smaller 
        counterparts and have become invaluable for developers who are creating 
        applications with NLP capabilities. Developers can tap into these 
        models through Hugging Face's `transformers` library, or by utilizing 
        OpenAI and Cohere's offerings through the `openai` and `cohere` 
        libraries, respectively.""",
        "question":"Which model providers offer LLMs?"
})

model.invoke(prompt)

AIMessage(content='According to the context, Hugging Face (through their `transformers` library), OpenAI (through their `openai` library), and Cohere (through their `cohere` library) are the model providers that offer Large Language Models (LLMs).', additional_kwargs={}, response_metadata={'model': 'llama3', 'created_at': '2025-03-02T20:16:15.800797Z', 'done': True, 'done_reason': 'stop', 'total_duration': 11097851100, 'load_duration': 31124100, 'prompt_eval_count': 152, 'prompt_eval_duration': 369000000, 'eval_count': 54, 'eval_duration': 10695000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-1e3ec8ad-313f-47d2-855d-4f53e194cde7-0', usage_metadata={'input_tokens': 152, 'output_tokens': 54, 'total_tokens': 206})

# Structured output using an LLM
Plain text is useful, but in some cases you need **structured output**, for example JSON, XML, CSV, or even source code in a programming language such as Python or Java.

## JSON output
This is the most common structured output to generate with LLMs. The output can be used by your frontend or saved in a database. When generating JSON, first define the schema that you want the LLM to respect when producing the output. Then include it in your prompt, along with the text you want to use as a source. Here is an example:

In [5]:
from langchain_ollama.chat_models import ChatOllama
from pydantic import BaseModel  # langchain_core.pydantic_v1 is deprecated in langchain-core 0.3+

class AnswerWithJustification(BaseModel):
    '''An answer to the user's question along with justification for the answer.'''
    answer : str
    '''The answer to the user's question'''
    justification : str
    '''Justification for the answer'''

llm = ChatOllama(model="llama3", temperature=0, num_predict=256)
structured_llm = llm.with_structured_output(AnswerWithJustification)

structured_llm.invoke('''What weighs more, a pound of bricks or a pound of feathers''')

ResponseError: registry.ollama.ai/library/llama3:latest does not support tools (status code: 400)

The error shows that llama3 does not support tool calling, which **with_structured_output** relies on here, so we will use **llama3.1** instead.

Pull the model with:

```cmd
ollama pull llama3.1
```


In [6]:
from langchain_ollama.chat_models import ChatOllama
from pydantic import BaseModel  # langchain_core.pydantic_v1 is deprecated in langchain-core 0.3+

class AnswerWithJustification(BaseModel):
    '''An answer to the user's question along with justification for the answer.'''
    answer : str
    '''The answer to the user's question'''
    justification : str
    '''Justification for the answer'''

llm = ChatOllama(model="llama3.1", temperature=0, num_predict=256)
structured_llm = llm.with_structured_output(AnswerWithJustification)

structured_llm.invoke('''What weighs more, a pound of bricks or a pound of feathers''')

AnswerWithJustification(answer='a pound of bricks', justification='because the weight of an object is determined by its mass, and since both objects weigh one pound, they are equal in weight')

Two things stand out here. First, the model got the answer wrong: its own justification says both weigh the same, yet it answered "a pound of bricks". Second, the schema was defined as a Pydantic **BaseModel** class. The **with_structured_output** method converts the schema to a **JSONSchema** object, which is sent to the LLM. Depending on the LLM, LangChain picks the best way to do this, usually **function calling** or **prompting**. Additionally, the schema is used to validate the LLM output before it is returned to the user, which ensures the output respects the schema defined previously.
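The validation step can be pictured as follows. This sketch uses only the standard library and a hypothetical `parse_answer` helper with a `JustifiedAnswer` dataclass; LangChain and Pydantic do considerably more, and the raw JSON strings are made up for the example.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class JustifiedAnswer:
    answer: str
    justification: str


def parse_answer(raw_json):
    """Parse model output and check it against the schema before returning it."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(JustifiedAnswer)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"field {name!r} must be a string")
    return JustifiedAnswer(**data)


good = '{"answer": "They weigh the same", "justification": "Both weigh one pound."}'
print(parse_answer(good))

bad = '{"answer": "They weigh the same"}'  # justification is missing
try:
    parse_answer(bad)
except ValueError as err:
    print(f"Rejected: {err}")
```

Note that this kind of check only guarantees the shape of the output, not that the answer is correct.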

## Other formats using Output Parsers
As discussed earlier, other formats that can be used include XML and CSV. This is where **output parsers** come in handy. These are classes that help you structure large language model responses. They have two main functions:

1. **Providing format instructions:** Output parsers can be used to inject additional instructions in the prompt that will help guide the LLM to output text in the format it knows how to parse.
2. **Validating and parsing the output:** The main function is to take the textual output of the LLM or chat model and render it into a more structured format, such as a list, XML, or another format. This includes removing extraneous information, correcting incomplete output, and validating the parsed values.

In Python, you can use an output parser like this:

In [8]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser
parser = CommaSeparatedListOutputParser()
items = parser.invoke('apple, banana, cherry')
print(items)

['apple', 'banana', 'cherry']
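The two roles of an output parser described above can be sketched with a small stand-alone class. `SimpleCommaListParser` is a hypothetical illustration, not LangChain's `CommaSeparatedListOutputParser`; its method names are chosen to resemble LangChain's parsers, and the instruction text is an assumption.

```python
class SimpleCommaListParser:
    """Hypothetical parser illustrating the two roles of an output parser."""

    def get_format_instructions(self):
        # Role 1: text appended to the prompt so the model emits a parseable format
        return "Your response should be a list of comma separated values, e.g. `foo, bar, baz`"

    def parse(self, text):
        # Role 2: turn raw model text into a Python list, dropping stray whitespace
        return [item.strip() for item in text.split(",") if item.strip()]


toy_parser = SimpleCommaListParser()

prompt_text = "List three fruits.\n" + toy_parser.get_format_instructions()
print(prompt_text)

# Tolerates uneven spacing and a trailing comma in the model output
print(toy_parser.parse("apple, banana , cherry,"))
```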


### Useful methods, also known as the runnable interface
- **invoke**: Transforms a single input into an output.
- **batch**: Efficiently transforms multiple inputs into multiple outputs.
- **stream**: Streams output from a single input as it's produced, useful for long-running tasks.

In [20]:
from langchain_ollama.chat_models import ChatOllama

model = ChatOllama(model = "llama3.1")

completion = model.invoke('Hi there!')
print(f'Completion: {completion.content}')

completions = model.batch(['Hi there!','Bye!'])
print(f'Completions: {completions}')

for index, token in enumerate(model.stream('Bye!'),start=1):
    print(f'Index {index}: {token.content}')


Completion: It's nice to meet you. Is there something I can help you with or would you like to chat?
Completions: [AIMessage(content="It's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-02T22:47:09.1697228Z', 'done': True, 'done_reason': 'stop', 'total_duration': 3715163700, 'load_duration': 35398900, 'prompt_eval_count': 13, 'prompt_eval_duration': 147000000, 'eval_count': 23, 'eval_duration': 3531000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-0ed2b26e-d26e-4464-9379-d8f93a63c0fd-0', usage_metadata={'input_tokens': 13, 'output_tokens': 23, 'total_tokens': 36}), AIMessage(content='It was nice chatting with you. If you ever want to talk again, feel free to come back anytime. Have a great day! Bye!', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-02T22:47:14.3026138Z', 'done': Tr

- **invoke()** takes a single input and returns a single output.
- **batch()** takes a list of inputs and returns a list of outputs.
- **stream()** takes a single input and returns an iterator of parts of the output as they become available.
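The difference between the three call styles can be sketched with a toy class that needs no model at all. `EchoModel` is a hypothetical stand-in that simply upper-cases its input; LangChain's own `batch` may also run inputs concurrently, which this sketch does not attempt.

```python
class EchoModel:
    """Hypothetical stand-in for a chat model; it upper-cases its input."""

    def invoke(self, text):
        # One input in, one complete output out
        return text.upper()

    def batch(self, texts):
        # Many inputs in, a list of outputs in the same order
        return [self.invoke(text) for text in texts]

    def stream(self, text):
        # One input in, the output yielded piece by piece
        for word in self.invoke(text).split(" "):
            yield word


toy_model = EchoModel()

print(toy_model.invoke("hi there"))
print(toy_model.batch(["hi there", "bye"]))
for chunk in toy_model.stream("hi there"):
    print(chunk)
```

Joining the streamed chunks gives the same text that `invoke` returns in one piece, which is the relationship the real methods aim for as well.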

Components can be combined in two ways:
- **Imperative**: Call your components directly, for instance through a component's **invoke()** method.
- **Declarative**: Use the **LangChain Expression Language (LCEL)**.

## Example using Imperative Composition

In [3]:
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import chain

# Building blocks

template = ChatPromptTemplate.from_messages([
    ('system','You are a helpful assistant.'),
    ('human','{question}'),
])

model = ChatOllama(model = "llama3.1", num_predict = 50)

# combine them into a function
# the @chain decorator adds the Runnable interface to any function you write
# Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.
# https://python.langchain.com/v0.1/docs/modules/chains/

@chain
def chatbot(values):
    prompt = template.invoke(values)
    return model.invoke(prompt)

# use it

chatbot.invoke({"question":"Which model providers offer LLMs?"})

AIMessage(content='Several model providers offer Large Language Models (LLMs). Here are some of the most notable ones:\n\n1. **Hugging Face Transformers**: Hugging Face provides a wide range of pre-trained transformer models, including BERT, RoBERTa, and', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-04T23:49:38.8767468Z', 'done': True, 'done_reason': 'length', 'total_duration': 9361596500, 'load_duration': 23307300, 'prompt_eval_count': 29, 'prompt_eval_duration': 251000000, 'eval_count': 50, 'eval_duration': 9085000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-6a06bdfa-4835-4e87-b4af-5d9afc21e27f-0', usage_metadata={'input_tokens': 29, 'output_tokens': 50, 'total_tokens': 79})

The previous example is a complete chatbot built from a prompt and a chat model. It uses familiar Python syntax and supports any custom logic you might want to add to the function. To support the **stream** method instead, turn the function into a generator that uses **yield** to emit each chunk as the model produces it. Here is an example:

In [5]:
@chain
def chatbot(values):
    prompt = template.invoke(values)
    for token in model.stream(prompt):
        yield token

for part in chatbot.stream({
   "question": "Which model providers offer LLMs?" 
}):
    print(part.content)

Several
 models
 provide
 Large
 Language
 Models
 (
LL
Ms
)
 that
 can
 be
 fine
-t
uned
 for
 various
 N
LP
 tasks
.
 Here
 are
 some
 popular
 ones
:


1
.
 **
H
ugging
 Face
 Transformers
**:
 H
ugging
 Face
 offers
 pre
-trained
 transformer
 models
,
 including
 popular
 ones
 like
 B



> Note: The model generated an incomplete response because of the **num_predict** parameter (known as **max_tokens** in other providers), which was set to 50 when the model was created. This parameter limits the size of the output, so a low value can stop generation prematurely and make the output appear truncated.
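The outputs in this notebook report `'done_reason': 'length'` when the token limit is hit and `'done_reason': 'stop'` when the model finishes on its own. A minimal sketch of a truncation check, assuming the `response_metadata` dictionary keeps the shape shown in those outputs (the helper name `was_truncated` is hypothetical):

```python
def was_truncated(response_metadata):
    """Return True when generation stopped because the token limit was reached."""
    return response_metadata.get("done_reason") == "length"


# Metadata fields copied from the outputs shown in this notebook
complete = {"model": "llama3", "done": True, "done_reason": "stop"}
cut_off = {"model": "llama3.1", "done": True, "done_reason": "length"}

print(was_truncated(complete))
print(was_truncated(cut_off))
```

With a real response you would pass the message's metadata, for example `was_truncated(completion.response_metadata)`, and raise **num_predict** if the check returns `True`.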

Additionally, for asynchronous execution, you can rewrite the function like this:



In [6]:
# notice method ainvoke, used with asynchronous code
@chain
async def chatbot(values):
    prompt = await template.ainvoke(values)
    return await model.ainvoke(prompt)

await chatbot.ainvoke({"question":"Which model providers offer LLMs?"})

AIMessage(content='Several model providers offer Large Language Models (LLMs). Here are some of the most notable ones:\n\n1. **Hugging Face Transformers**: Hugging Face is one of the leading providers of pre-trained models, including many popular LLMs like B', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-05T00:03:47.0968816Z', 'done': True, 'done_reason': 'length', 'total_duration': 12932700900, 'load_duration': 2743405400, 'prompt_eval_count': 29, 'prompt_eval_duration': 1820000000, 'eval_count': 50, 'eval_duration': 7831000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-f1077f35-3a40-4054-9f65-b1e67876c2ae-0', usage_metadata={'input_tokens': 29, 'output_tokens': 50, 'total_tokens': 79})
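One reason to prefer the asynchronous methods is that several calls can wait on the model at the same time. The sketch below illustrates this with the standard library only; `fake_ainvoke` is a hypothetical coroutine that sleeps instead of contacting a model.

```python
import asyncio


async def fake_ainvoke(question):
    """Hypothetical async model call; sleeps instead of contacting a model."""
    await asyncio.sleep(0.05)
    return f"answer to: {question}"


async def ask_all(questions):
    # gather() schedules all calls at once and preserves the input order
    return await asyncio.gather(*(fake_ainvoke(q) for q in questions))


answers = asyncio.run(ask_all(["Q1", "Q2", "Q3"]))
print(answers)
```

Inside a notebook an event loop is already running, so use `answers = await ask_all(["Q1", "Q2", "Q3"])` there instead of `asyncio.run`.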

## Declarative Composition
*LCEL* is a declarative language for composing LangChain components. LangChain compiles LCEL compositions to an *optimized execution plan*, with automatic parallelization, streaming, tracing, and async support. Here is an example of an LCEL composition:


In [8]:
from langchain_ollama.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

# the building blocks

template = ChatPromptTemplate.from_messages([
    ('system','You are a helpful assistant, give concise answers'),
    ('human','{question}')
])

model = ChatOllama(model = 'llama3.1', num_predict=50)

# combine them with the | operator

chatbot = template | model

# use it

chatbot.invoke({"question":"which model providers offer LLMs?"})

AIMessage(content='Several model providers offer Large Language Models (LLMs):\n\n1. Hugging Face - Offers pre-trained models such as BERT and RoBERTa.\n2. NVIDIA - Provides the Megatron-Turing NLG model and other transformer-based architectures.\n3', additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-05T00:34:22.703456Z', 'done': True, 'done_reason': 'length', 'total_duration': 8972744800, 'load_duration': 22761300, 'prompt_eval_count': 32, 'prompt_eval_duration': 1033000000, 'eval_count': 50, 'eval_duration': 7915000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-7588f13e-73f6-4d71-91fb-094ac66d06c9-0', usage_metadata={'input_tokens': 32, 'output_tokens': 50, 'total_tokens': 82})

This chains the template with the model: the input is passed to the template first, then the model generates a response based on the formatted prompt, and the output is returned to the user.
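How the `|` operator can build such a pipeline is easy to sketch in plain Python through the `__or__` method. The `Step` class below is a hypothetical miniature, not LangChain's implementation, and the two lambdas stand in for a prompt template and a model.

```python
class Step:
    """Hypothetical minimal runnable: wraps a function and supports the | operator."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its result to b
        return Step(lambda value: other.invoke(self.invoke(value)))


format_prompt = Step(lambda values: f"Question: {values['question']}")
fake_model = Step(lambda prompt: f"(model saw) {prompt}")

toy_chain = format_prompt | fake_model
print(toy_chain.invoke({"question": "Which model providers offer LLMs?"}))
```

Because the composed object exposes the same `invoke` method as its parts, it can itself be piped into further steps.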

Here is the same chain using the **stream** method instead:

In [9]:
chatbot = template | model
for part in chatbot.stream({
    "question": "Which model providers offer LLMs?"
}):
    print(part)

content='Several' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' models' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' provide' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' Large' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' Language' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' Models' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content=' (' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content='LL' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content='Ms' additional_kwargs={} response_metadata={} id='run-c5f36d28-af0b-406c-9136-45b499ec8c13'
content='):\n\n' additional_kwargs={} response_metadata={} 

For asynchronous execution, call **ainvoke** on the same chain:

In [10]:
chatbot = template | model
await chatbot.ainvoke({
    "question": "Which model providers offer LLMs?"
})

AIMessage(content="Several major models provide Large Language Models (LLMs), including:\n\n1. Hugging Face Transformers\n2. Meta AI's LLaMA\n3. Google's PaLM\n4. OpenAI's GPT-3\n5. Microsoft's", additional_kwargs={}, response_metadata={'model': 'llama3.1', 'created_at': '2025-03-05T00:35:44.4353493Z', 'done': True, 'done_reason': 'length', 'total_duration': 8074851800, 'load_duration': 31823400, 'prompt_eval_count': 32, 'prompt_eval_duration': 343000000, 'eval_count': 50, 'eval_duration': 7699000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-042c9c2e-6c23-44ce-9666-0f8891dfeada-0', usage_metadata={'input_tokens': 32, 'output_tokens': 50, 'total_tokens': 82})

Let's go back to the [README.md](README.md/#rags) file to see the next steps. 