One of the main challenges when working with large language models (LLMs) is ensuring that the model output follows a specific structure. Since LLMs generate text based on probabilities, their responses may not always match the format or schema we expect, even when explicitly prompted.

We’ll use Ollama, which lets us run Llama models locally on our machine, together with LangChain, which ties the different steps of our solution together.

In [2]:
import numpy as np
import pandas as pd
import os

In [3]:
!pip install langchain

Collecting langchain
  Downloading langchain-0.3.3-py3-none-any.whl.metadata (7.1 kB)
Collecting SQLAlchemy<3,>=1.4 (from langchain)
  Downloading SQLAlchemy-2.0.35-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting aiohttp<4.0.0,>=3.8.3 (from langchain)
  Downloading aiohttp-3.10.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.6 kB)
Collecting async-timeout<5.0.0,>=4.0.0 (from langchain)
  Downloading async_timeout-4.0.3-py3-none-any.whl.metadata (4.2 kB)
Collecting langchain-core<0.4.0,>=0.3.10 (from langchain)
  Downloading langchain_core-0.3.10-py3-none-any.whl.metadata (6.3 kB)
Collecting langchain-text-splitters<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_text_splitters-0.3.0-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.132-py3-none-any.whl.metadata (13 kB)
Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain)
  Downloading tenacity-

In [4]:
!pip install langchain_community

Collecting langchain_community
  Downloading langchain_community-0.3.2-py3-none-any.whl.metadata (2.8 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.5.2-py3-none-any.whl.metadata (3.5 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.22.0-py3-none-any.whl.metadata (7.2 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain_community)
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloa

In [5]:
!pip install langchain-huggingface

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.1.0-py3-none-any.whl.metadata (1.3 kB)
Collecting sentence-transformers>=2.6.0 (from langchain-huggingface)
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Downloading langchain_huggingface-0.1.0-py3-none-any.whl (20 kB)
Downloading sentence_transformers-3.1.1-py3-none-any.whl (245 kB)
Installing collected packages: sentence-transformers, langchain-huggingface
Successfully installed langchain-huggingface-0.1.0 sentence-transformers-3.1.1


In [6]:
!pip install openai

Collecting openai
  Downloading openai-1.51.2-py3-none-any.whl.metadata (24 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Downloading openai-1.51.2-py3-none-any.whl (383 kB)
Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
Installing collected packages: jiter, openai
Successfully installed jiter-0.6.1 openai-1.51.2


In [7]:
!pip install -qU langchain-ollama

In [8]:
!pip install ollama



In [9]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [10]:
from langchain import LLMChain, PromptTemplate
from langchain_huggingface import HuggingFacePipeline

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

In [1]:
from langchain_ollama import ChatOllama

In [2]:
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI

In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser, OutputFixingParser, RetryOutputParser

In [4]:
from langchain_core.output_parsers.string import StrOutputParser
from typing import Optional

In [5]:
from pydantic import BaseModel, Field

In [6]:
from langchain_core.runnables import RunnableLambda, RunnableParallel
import subprocess

Not all large language models support structured outputs. For instance, llama3-chatqa:8b does not currently work with the with_structured_output method from LangChain. The llama3.1 model does support structured outputs, which makes it a good choice for this tutorial. You can explore more models in Ollama’s library: https://ollama.ai/library

Downloading the model will take a few minutes...

In [8]:
# Pull the llama3.1 model
!ollama pull llama3.1

In [7]:
# Start the Ollama server in the background with the `ollama serve` command.
# Popen returns immediately, so the server keeps running while later cells execute.
process = subprocess.Popen("ollama serve", shell=True)

# Initializing the model

Now we can use the freshly downloaded model. Our system has the following components:

1.   llm: the large language model, here llama3.1.
2.   prompt template: this template contains
      * Basic instructions for the llm
      * The output schema that we want the model to follow
      * The question that we want the model to answer
3.   the pydantic model: where you define the schema that you want the answer to follow
4.   the parser: two parsers will be used
      * The pydantic parser, which parses the llm output according to the pydantic model
      * The retry parser, which checks whether the output follows the schema. If it does not, it prompts the llm again, this time including the error, and asks the model to retry using the schema.
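The parse-then-retry idea behind the second parser can be sketched with the standard library alone. This is a simplified illustration, not LangChain's implementation: `parse_answer`, `parse_with_retry` and `fake_llm` are hypothetical names, and the fake model simply fails once and then returns valid JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class Answer:
    question: str
    answer: str
    rating: int


def parse_answer(text: str) -> Answer:
    """Parse JSON text into an Answer, raising ValueError if the schema is not met."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"question", "answer", "rating"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["rating"], int):
        raise ValueError("rating must be an integer")
    return Answer(data["question"], data["answer"], data["rating"])


def parse_with_retry(llm, prompt: str, max_retries: int = 1) -> Answer:
    """Call the model and, on a parse error, re-prompt it with the error message."""
    completion = llm(prompt)
    for attempt in range(max_retries + 1):
        try:
            return parse_answer(completion)
        except ValueError as error:
            if attempt == max_retries:
                raise
            retry_prompt = (
                f"{prompt}\nPrevious response: {completion}\n"
                f"Error: {error}\nPlease answer again following the schema."
            )
            completion = llm(retry_prompt)


# Hypothetical stand-in for a model: fails once, then returns valid JSON.
def fake_llm(prompt: str) -> str:
    if "Error:" in prompt:
        return '{"question": "Who is Coldplay?", "answer": "A British rock band.", "rating": 8}'
    return "Coldplay is a British rock band."


result = parse_with_retry(fake_llm, "Who is Coldplay?")
print(result)
```

The first completion is plain prose, so parsing fails; the second call sees the error in its prompt and returns JSON that matches the schema.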

In [9]:
llm = ChatOllama(
    model='llama3.1',
    temperature=0
)

In the template we ask the model to give its answer in a specific format: the output should contain the question, the answer, and a rating of the LLM's confidence in the answer. The template also includes the format instructions and the query.

In [10]:
template = """Based on the user question, provide an answer which consists of
the question, your answer to the question and a rating on how confident you are.
{format_instructions}
Question: {query}
Response:"""

In [11]:
class Answer(BaseModel):
    question: str = Field(description="the question asked by the user")
    answer: str = Field(description="answer to the user question")
    rating: int = Field(description="How confident the answer is, from 1 to 10")

In [12]:
parser = PydanticOutputParser(pydantic_object=Answer)
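To see what the schema itself enforces, the model can be validated directly with Pydantic, without LangChain or an LLM. This is a minimal sketch assuming Pydantic v2 (`model_validate_json`); the two JSON strings are made-up examples, not model output.

```python
from pydantic import BaseModel, Field, ValidationError


class Answer(BaseModel):
    question: str = Field(description="the question asked by the user")
    answer: str = Field(description="answer to the user question")
    rating: int = Field(description="How confident the answer is, from 1 to 10")


# Made-up inputs: one matches the schema, the other lacks the rating field.
good = '{"question": "Who is Coldplay?", "answer": "A British rock band.", "rating": 8}'
bad = '{"question": "Who is Coldplay?", "answer": "A British rock band."}'

parsed = Answer.model_validate_json(good)
print(parsed.rating)

try:
    Answer.model_validate_json(bad)
    validation_failed = False
except ValidationError as error:
    validation_failed = True
    print(error.error_count(), "validation error")
```

Note that the "from 1 to 10" range lives only in the description, so it guides the model but is not checked. If the range should be enforced, `Field(ge=1, le=10)` is one way to do it.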

In [13]:
retry_parser = RetryOutputParser.from_llm(parser=parser, llm=llm)

In [14]:
# Wrap the prompt above in a PromptTemplate so it can be used in a chain.
# The prompt template substitutes values for the variables between curly brackets {}
# in the template; the format instructions come from the pydantic parser.
prompt = PromptTemplate(
    template=template,
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
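The substitution step can be pictured with plain string formatting. This is only a sketch of the idea, not how PromptTemplate is implemented: the shortened template and the `format_instructions` text are placeholders (in the notebook the instructions come from `parser.get_format_instructions()`), and `render_prompt` is a hypothetical helper.

```python
from functools import partial

# Shortened placeholder template with the same two variables as the notebook's.
template = "Answer the question.\n{format_instructions}\nQuestion: {query}\nResponse:"

# Placeholder text standing in for the parser's format instructions.
format_instructions = 'Return a JSON object with the keys "question", "answer" and "rating".'

# Fix the partial variable once, leaving only `query` to be supplied per call.
render_prompt = partial(template.format, format_instructions=format_instructions)

prompt_text = render_prompt(query="Who is Coldplay?")
print(prompt_text)
```

Fixing the format instructions up front mirrors the role of `partial_variables`: callers of the chain only ever pass the query.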

We can now get structured output in two ways:

1. structured_llm, which is built using with_structured_output. According to the documentation (https://python.langchain.com/docs/how_to/structured_output/), this is the easiest and most reliable way to get structured outputs. with_structured_output() is implemented for models that provide native APIs for structuring outputs, like tool/function calling or JSON mode, and makes use of these capabilities under the hood.

2. The retry parser, which is used after this first example.

In [15]:
structured_llm = llm.with_structured_output(Answer)

In [16]:
chain = prompt | structured_llm

In [21]:
answer = chain.invoke("Who is ColdPlay?")

In [22]:
answer

Answer(question='Who is ColdPlay?', answer='ColdPlay is a British rock band formed in 1996.', rating=8)

In [23]:
print(f"question: {answer.question}")
print(f"answer: {answer.answer}")
print(f"confidence: {answer.rating}")

question: Who is ColdPlay?
answer: ColdPlay is a British rock band formed in 1996.
confidence: 8


As mentioned earlier, structured output can still fail, so we can fall back on the retry parser, which re-prompts the llm when parsing fails.

In [24]:
chain = prompt | llm | StrOutputParser()

In [25]:
main_chain = (RunnableParallel(completion=chain, prompt_value=prompt) |
              RunnableLambda(lambda x: retry_parser.parse_with_prompt(**x)))
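The data flow of this chain can be sketched in plain Python: the same input goes to two branches, the results are collected in a dict keyed by branch name, and that dict is unpacked into keyword arguments. All functions here are hypothetical stand-ins, and unlike this sequential sketch, LangChain may run the branches concurrently.

```python
def run_parallel(steps: dict, value):
    """Feed the same input to every step and collect the results by name."""
    return {name: step(value) for name, step in steps.items()}


# Hypothetical stand-ins for the two branches of the chain.
def build_prompt(query: str) -> str:
    return f"Question: {query}\nResponse:"


def complete(query: str) -> str:
    return '{"answer": "placeholder completion"}'


def parse_with_prompt(completion: str, prompt_value: str) -> dict:
    # A real retry parser would validate `completion` and re-prompt on failure.
    return {"completion": completion, "prompt_value": prompt_value}


outputs = run_parallel(
    {"completion": complete, "prompt_value": build_prompt},
    "Who is Coldplay?",
)
result = parse_with_prompt(**outputs)
print(result["prompt_value"])
```

The keys `completion` and `prompt_value` matter because they must match the parameter names that the dict is unpacked into.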

In [27]:
answer = main_chain.invoke("Who is ColdPlay?")

In [28]:
answer

Answer(question='Who is ColdPlay?', answer='Coldplay is a British rock band formed in 1996. The band consists of Chris Martin, Jonny Buckland, Guy Berryman, and Will Champion.', rating=9)

In [29]:
print(f"question: {answer.question}")
print(f"answer: {answer.answer}")
print(f"confidence: {answer.rating}")

question: Who is ColdPlay?
answer: Coldplay is a British rock band formed in 1996. The band consists of Chris Martin, Jonny Buckland, Guy Berryman, and Will Champion.
confidence: 9
