## Example of querying the Ollama Inference server with Mistral-7B, LangChain and a custom Prompt

### Set the Inference server URL (replace with your own address)

In [1]:
# Replace values according to your Ollama deployment
MODEL = "mistral:latest"
# BASE_URL = "https://ollama-route-<your-namespace>.apps.shift.nerc.mghpcc.org"
BASE_URL = "http://ollama-service:11434"
# BASE_URL = "http://ollama-service.<your-namespace>.svc.cluster.local:11434"
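Before running the rest of the notebook, it can help to confirm that the address configured above is reachable and that the model is available. The following is a minimal sketch, not part of the original notebook: it assumes the server exposes Ollama's standard `/api/tags` endpoint (which lists the locally available models) and uses only the Python standard library. The helper names `tags_url` and `list_models` are hypothetical.

```python
import json
import urllib.request


def tags_url(base_url: str) -> str:
    """Build the URL of Ollama's model-listing endpoint from a base URL."""
    return base_url.rstrip("/") + "/api/tags"


def list_models(base_url: str, timeout: float = 5.0) -> list:
    """Return the model names reported by the server (performs a network call)."""
    with urllib.request.urlopen(tags_url(base_url), timeout=timeout) as response:
        payload = json.load(response)
    # Assumed response shape: {"models": [{"name": "mistral:latest", ...}, ...]}
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_models(BASE_URL)` from a notebook cell should return a list that contains the value of `MODEL`; a connection error at this point usually means the URL needs adjusting for your deployment.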

## Preparation

In [2]:
!pip install --upgrade pip
!pip install --upgrade langchain_openai qdrant_client langchain_qdrant langchain_community
!pip install --upgrade langchain-ollama
!pip install pypdf


In [3]:
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import Ollama
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

#### Create the LLM instance

In [4]:
# LLM definition
llm = Ollama(
    base_url=BASE_URL,
    model=MODEL,
    top_p=0.92,
    temperature=0.01,
    num_predict=512,
    repeat_penalty=1.03,
    callbacks=[StreamingStdOutCallbackHandler()]
)
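The keyword arguments above are sampling settings: `temperature` close to zero makes the output nearly deterministic, `top_p` restricts sampling to the most probable tokens, `num_predict` caps the number of generated tokens, and `repeat_penalty` discourages repetition. As a rough illustration, and assuming the wrapper forwards these values under the `options` key of Ollama's `/api/generate` endpoint, the request body might look like the sketch below. The helper `build_generate_payload` is hypothetical and only assembles the JSON; it does not contact the server.

```python
import json


def build_generate_payload(model: str, prompt: str) -> dict:
    """Assemble a JSON body for Ollama's /api/generate endpoint (no network call)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON response instead of a token stream
        "options": {
            "top_p": 0.92,
            "temperature": 0.01,
            "num_predict": 512,
            "repeat_penalty": 1.03,
        },
    }


payload = build_generate_payload("mistral:latest", "Describe Paris in 100 words or less.")
print(json.dumps(payload, indent=2))
```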


#### Create the Prompt

In [5]:
template="""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
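At call time the template's two placeholders, `{history}` and `{input}`, are replaced with the stored conversation and the new user message. The sketch below imitates that substitution with plain `str.format` on a shortened stand-in template; it is an approximation for illustration, not the library's implementation, and the names `demo_template` and `render_prompt` are hypothetical.

```python
# Shortened stand-in for the notebook's template; only the placeholders matter here.
demo_template = """Current conversation:
{history}
Human: {input}
AI:"""


def render_prompt(history: str, user_input: str) -> str:
    """Fill the two placeholders the way an f-string-style template would."""
    return demo_template.format(history=history, input=user_input)


# On the first turn the history is empty, so only the new question appears.
first_turn = render_prompt("", "Describe Paris in 100 words or less.")
print(first_turn)
```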

#### And finally some memory for the conversation

In [6]:
memory=ConversationBufferMemory()
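Conceptually, a buffer memory keeps every exchange and renders it as text that is injected into the `{history}` placeholder of the prompt. The toy class below is a plain-Python stand-in to show the idea; `SimpleBufferMemory` and its methods are hypothetical and are not the LangChain API.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer: stores turns and renders them as text."""

    def __init__(self) -> None:
        self.turns = []  # list of (human_message, ai_message) pairs

    def save_turn(self, human: str, ai: str) -> None:
        """Record one completed exchange."""
        self.turns.append((human, ai))

    def history(self) -> str:
        """Render all stored turns in the 'Human: ... / AI: ...' layout."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def clear(self) -> None:
        """Forget everything, so the next prompt starts with an empty history."""
        self.turns.clear()


demo_memory = SimpleBufferMemory()
demo_memory.save_turn("Describe Paris in 100 words or less.", "Paris, the City of Light, ...")
print(demo_memory.history())
```

Because the whole buffer is resent with every prompt, a follow-up question that lacks context can still be answered, and clearing the buffer is what resets the conversation.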


## First example, fully verbose mode

#### Create the Chain using the different components

In [7]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=True,
                                 memory=memory
                                )


#### Let's talk...

In [8]:
first_input = "Describe Paris in 100 words or less."
conversation.predict(input=first_input);



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:

Human: Describe Paris in 100 words or less.
AI:
[/INST]
 Paris, the City of Light, is a vibrant metropolis renowned for its rich history, art, and culture. Known as the global center of fashion, gastronomy, and philosophy, it's home to iconic landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Its charming streets, filled with c

In [9]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation.predict(input=second_input)



> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Describe Paris in 100 words or less.
AI:  Paris, the City of Light, is a vibrant metropolis renowned for its rich history, art, and culture. Known as the global center of fashion, gastronomy, and philosophy, it's home to iconic landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Its charming streets, filled with cafes, boutique

' Yes, the River Seine flows through Paris, adding to its picturesque beauty. The Seine is not only a significant waterway but also a cultural and historical landmark, with many famous sites located along its banks, such as the Louvre Museum'

## Second example, no verbosity

In [10]:
# Verbose mode is set to False this time, so only the model's answer is shown (no prompt or history)
conversation2 = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=False,
                                 memory=memory
                                )

In [11]:
# Let's clear the previous conversation first
memory.clear()

In [12]:
first_input = "Can you describe London in 100 words?"
conversation2.predict(input=first_input)

 London, the capital city of England and the United Kingdom, is a vibrant, historic, and cosmopolitan metropolis. Known for its iconic landmarks like the Tower of London, Buckingham Palace, and the London Eye, it's a city steeped in rich history. The River Thames, flowing through its heart, offers stunning views of the city's architecture, from modern skyscrapers to ancient buildings. London's cultural scene is diverse and dynamic, with world-class museums, galleries, and theaters. It's a global hub for finance, fashion, arts, and entertainment. Its multicultural population contributes to its rich culinary scene, offering a wide variety of international cuisines. London, a city that never sleeps, is a must-visit destination for travelers seeking a blend of tradition and modernity.

" London, the capital city of England and the United Kingdom, is a vibrant, historic, and cosmopolitan metropolis. Known for its iconic landmarks like the Tower of London, Buckingham Palace, and the London Eye, it's a city steeped in rich history. The River Thames, flowing through its heart, offers stunning views of the city's architecture, from modern skyscrapers to ancient buildings. London's cultural scene is diverse and dynamic, with world-class museums, galleries, and theaters. It's a global hub for finance, fashion, arts, and entertainment. Its multicultural population contributes to its rich culinary scene, offering a wide variety of international cuisines. London, a city that never sleeps, is a must-visit destination for travelers seeking a blend of tradition and modernity."

In [None]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a river?"
conversation2.predict(input=second_input)