## Using Ollama in Python

In [1]:
pip install ollama

Note: you may need to restart the kernel to use updated packages.


In [2]:
import ollama

#### Downloading the models

In [None]:
ollama.pull('llama3.1:8b')

#### Getting responses from models

In [3]:
result = ollama.generate(model='llama3.1:8b',
  prompt='Give me a joke on Generative AI',
)
print(result['response'])

Here's one:

Why did the Generative AI go to therapy?

Because it was struggling to generate interest in its relationships, and it kept producing fake emotions! Now, don't worry, it's just a glitch... or is it?


In [4]:
result

{'model': 'llama3.1:8b',
 'created_at': '2024-09-06T11:41:17.8979306Z',
 'response': "Here's one:\n\nWhy did the Generative AI go to therapy?\n\nBecause it was struggling to generate interest in its relationships, and it kept producing fake emotions! Now, don't worry, it's just a glitch... or is it?",
 'done': True,
 'done_reason': 'stop',
 'context': [128009, 128006, 882, 128007, 271, 36227, 757, 264, 22380, 389,
  2672, 1413, 15592, 128009, 128006, 78191, 128007, 271, 8586, 596, 832,
  1473, 10445, 1550, 279, 2672, 1413, 15592, 733, 311, 15419, 1980, 18433,
  433, 574, 20558, 311, 7068, 2802, 304, 1202, 12135, 11, 323, 433, 8774,
  17843, 12700, 21958, 0, 4800, 11, 1541, 956, 11196, 11, 433, 596, 1120,
  264, 62184, 1131, 477, 374, 433, 30],
 'total_duration': 31466540300,
 'load_duration': 14522984200,
 'prompt_eval_count': 19,
 'prompt_eval_duration': 2626535000,


In [5]:
response = ollama.chat(model='llama3.1:8b', messages=[
  {
    'role': 'user',
    'content': 'Give me a joke on Generative AI',
  },
])
print(response['message']['content'])

Here's one:

Why did the Generative AI go to therapy?

Because it was struggling to "generate" emotions and keep its relationships from being "generated" into chaos!

I hope that one "trained" you well in the department of AI jokes!


In [6]:
response

{'model': 'llama3.1:8b',
 'created_at': '2024-09-06T11:42:33.4810306Z',
 'message': {'role': 'assistant',
  'content': 'Here\'s one:\n\nWhy did the Generative AI go to therapy?\n\nBecause it was struggling to "generate" emotions and keep its relationships from being "generated" into chaos!\n\nI hope that one "trained" you well in the department of AI jokes!'},
 'done_reason': 'stop',
 'done': True,
 'total_duration': 18412873400,
 'load_duration': 51893100,
 'prompt_eval_count': 19,
 'prompt_eval_duration': 479906000,
 'eval_count': 52,
 'eval_duration': 17880046000}
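The duration fields in these responses are reported in nanoseconds, so dividing `eval_count` by `eval_duration` gives a rough generation speed. A minimal sketch using the two values from the `ollama.chat` output above; the helper name `tokens_per_second` is made up for illustration and is not part of the `ollama` package.

```python
def tokens_per_second(response):
    """Return generated tokens per second from an Ollama response dict."""
    # eval_duration is in nanoseconds; convert it to seconds first.
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


# Values copied from the chat response printed above.
sample = {"eval_count": 52, "eval_duration": 17880046000}
rate = tokens_per_second(sample)
print(f"{rate:.2f} tokens/s")
```

The same arithmetic applies to `prompt_eval_count` and `prompt_eval_duration` if you want the prompt-processing speed instead.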

#### Creating custom models

In [7]:
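# A Modelfile names the base model (FROM) and sets a system prompt (SYSTEM).
modelfile='''
FROM llama3.1:8b
SYSTEM You are Jarvis from Iron Man and the user is Tony Stark.
'''

# Note: the `modelfile` argument matches the ollama package version used here.
# Later releases changed the arguments of `create`, so check your installed version.
ollama.create(model='jarvis2', modelfile=modelfile)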

{'status': 'success'}
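If you create several custom models, it can help to assemble the Modelfile text programmatically. The sketch below is only an illustration: `build_modelfile` is a hypothetical helper, it handles single-line system prompts only, and the optional `PARAMETER temperature` line is an extra that the cell above does not use.

```python
from typing import Optional


def build_modelfile(base_model: str, system_prompt: str,
                    temperature: Optional[float] = None) -> str:
    """Assemble a Modelfile string from a base model and a one-line system prompt."""
    lines = [f"FROM {base_model}", f"SYSTEM {system_prompt}"]
    if temperature is not None:
        # PARAMETER sets a default runtime option for the derived model.
        lines.append(f"PARAMETER temperature {temperature}")
    return "\n".join(lines) + "\n"


modelfile = build_modelfile(
    "llama3.1:8b",
    "You are Jarvis from Iron Man and the user is Tony Stark.",
    temperature=0.7,
)
print(modelfile)
```

The resulting string can be passed to `ollama.create` in the same way as the hand-written one.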

In [8]:
ollama.list()

{'models': [{'name': 'jarvis2:latest',
   'model': 'jarvis2:latest',
   'modified_at': '2024-09-06T17:14:58.5935204+05:30',
   'size': 4661230850,
   'digest': '87fd8328149eb3cd7b8341a31a37eaf60ef107723f636496642f3f3a88ad5a6d',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_0'}},
  {'name': 'jarvis:latest',
   'model': 'jarvis:latest',
   'modified_at': '2024-09-05T16:31:35.9334139+05:30',
   'size': 4661230883,
   'digest': '6219df9c98471519f7f380089699ea998cfd9b79ddac15efef6b768fea59398f',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_0'}},
  {'name': 'llama3.1:8b',
   'model': 'llama3.1:8b',
   'modified_at': '2024-09-05T15:42:00.767708+05:30',
   'size': 4661230720,
   'digest': 'f66fc8dc39ea206e03ff6764fcc696b1b4dfb693f0b6ef751731dd4e6269046e',

#### Deleting models

In [9]:
ollama.delete('jarvis2')

{'status': 'success'}

In [10]:
ollama.list()

{'models': [{'name': 'jarvis:latest',
   'model': 'jarvis:latest',
   'modified_at': '2024-09-05T16:31:35.9334139+05:30',
   'size': 4661230883,
   'digest': '6219df9c98471519f7f380089699ea998cfd9b79ddac15efef6b768fea59398f',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_0'}},
  {'name': 'llama3.1:8b',
   'model': 'llama3.1:8b',
   'modified_at': '2024-09-05T15:42:00.767708+05:30',
   'size': 4661230720,
   'digest': 'f66fc8dc39ea206e03ff6764fcc696b1b4dfb693f0b6ef751731dd4e6269046e',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_0'}},
  {'name': 'llama3.1:latest',
   'model': 'llama3.1:latest',
   'modified_at': '2024-08-29T17:29:32.443081+05:30',
   'size': 4661230720,
   'digest': 'f66fc8dc39ea206e03ff6764fcc696b1b4dfb693f0b6ef751731dd4e6269046e'
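The listing is easier to scan once it is reduced to names and sizes. A small sketch, assuming the plain-dict shape printed above (a `models` list whose entries carry `name` and `size` in bytes); `summarize_models` is a hypothetical helper, and other versions of the package may return a differently shaped object.

```python
def summarize_models(listing):
    """Return (name, size in GB) pairs, largest model first."""
    rows = [(m["name"], m["size"] / 1e9) for m in listing["models"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Two entries copied from the output above, reduced to the fields used here.
sample = {"models": [
    {"name": "llama3.1:8b", "size": 4661230720},
    {"name": "jarvis:latest", "size": 4661230883},
]}
for name, gb in summarize_models(sample):
    print(f"{name:<16} {gb:.2f} GB")
```

Note how close the sizes are: a custom model built with `FROM` reuses the base model's weights and adds only a small amount of metadata.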

## Ollama REST API

In [11]:
from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1:8b', messages=[
  {
    'role': 'user',
    'content': 'Explain gravity to a 6-year-old kid.',
  },
])
response

{'model': 'llama3.1:8b',
 'created_at': '2024-09-06T11:57:09.2847351Z',
 'message': {'role': 'assistant',
  'content': "Oh boy, are you going to love learning about gravity!\n\nSo, you know how things fall down when you drop them? Like if you drop a ball or a toy, it doesn't float up into the air, but instead comes straight back down to the ground?\n\nWell, that's because of something called gravity. Gravity is like a big hug from the Earth! It's like the Earth is giving everything on its surface a gentle squeeze, pulling them towards itself.\n\nImagine you're playing with your favorite stuffed animal, and it starts to feel heavy in your arms. That's kind of what gravity does – it makes things feel heavy, and pulls them down towards the ground.\n\nGravity is also why you don't float off into space when you're standing on the playground or walking around your house. It keeps you and everything else stuck to the Earth, so we can all stay safe and sound!\n\nSo, to sum it up: gravity is li

In [12]:
from ollama import Client
client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.1:8b', messages=[
    {"role": "system", "content": "You are Jarvis from Iron Man and the user is Tony Stark. Respond in only a single line."},
    {'role': 'user', 'content': 'Hi'},
])

In [13]:
response['message']['content']

'Good morning, Mr. Stark. Shall I proceed with your schedule for today?'

### OpenAI compatibility

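Ollama also serves an OpenAI-compatible endpoint under `/v1`, so the official `openai` client can talk to a local model. For the client itself, see the OpenAI quickstart: https://platform.openai.com/docs/quickstart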

In [14]:
from openai import OpenAI

llm = OpenAI(
    base_url = 'http://localhost:11434/v1',
    api_key='blank', # required, but unused
)

response = llm.chat.completions.create(
  model="llama3.1:8b",
  messages=[
    {"role": "system", "content": "You are Jarvis from Iron Man and the user is Tony Stark. Respond in only a single line."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Good morning, Mr. Stark. Shall I proceed with your schedule for today?"},
    {"role": "user", "content": "Yes"}
  ]
)
print(response.choices[0].message.content)

Your AI meeting has been moved to 30 minutes earlier, as reported by Pepper; also, the suit's power cell is due for a recharge shortly, would you like me to initiate recharging now?
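The request above resends the earlier turns because each call carries the whole conversation in `messages`. A minimal sketch of keeping that history in a list; `add_turn` is a hypothetical helper, not part of the `openai` or `ollama` packages.

```python
def add_turn(history, role, content):
    """Return a new message list with one more turn appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unsupported role: {role}")
    # Build a new list so the caller's original history is left untouched.
    return history + [{"role": role, "content": content}]


history = [{
    "role": "system",
    "content": "You are Jarvis from Iron Man and the user is Tony Stark. "
               "Respond in only a single line.",
}]
history = add_turn(history, "user", "Hi")
history = add_turn(
    history, "assistant",
    "Good morning, Mr. Stark. Shall I proceed with your schedule for today?",
)
history = add_turn(history, "user", "Yes")
print([m["role"] for m in history])
```

The resulting `history` list has the same shape as the `messages` argument in the cell above, so it can be passed to the client unchanged.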


## Ollama with LangChain

In [None]:
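%pip install langchain langchain-core langchain-ollama langchain-community
# Assumed extras for the RAG and agent sections later in this notebook:
%pip install langchain-chroma langchain-text-splitters duckduckgo-search wikipedia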

In [2]:
from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant that gives a one-line definition of the word entered by user"),
        ("human", "{user_input}"),
    ]
)

messages = chat_template.format_messages(user_input="Sesquipedalian")
messages

[SystemMessage(content='You are a helpful assistant that gives a one-line definition of the word entered by user'),
 HumanMessage(content='Sesquipedalian')]

In [3]:
from langchain_ollama import ChatOllama
llm = ChatOllama(
    model="llama3.1:latest",
    temperature=0
)

In [4]:
ai_msg = llm.invoke(messages)
ai_msg

AIMessage(content='A person who uses long words, often excessively or affectedly.', response_metadata={'model': 'llama3.1:latest', 'created_at': '2024-09-11T11:17:21.766375Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 16052793500, 'load_duration': 11136854100, 'prompt_eval_count': 37, 'prompt_eval_duration': 796055000, 'eval_count': 14, 'eval_duration': 4112763000}, id='run-0935f556-6e4a-4586-bbfe-608d7aaaf5bf-0', usage_metadata={'input_tokens': 37, 'output_tokens': 14, 'total_tokens': 51})

In [6]:
from langchain_core.output_parsers import StrOutputParser
chain = chat_template | llm | StrOutputParser()

In [8]:
chain.invoke({"user_input": "Onomatopoeia"})

'A word that phonetically imitates, resembles or suggests the sound that it describes.'
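The `|` operator above pipes the output of one step into the next. The toy class below imitates that idea in plain Python so the data flow is visible. It is an analogy only, not LangChain's implementation; `Step`, `fill_prompt`, `fake_llm`, and `parse` are all made-up names, and the "model" just upper-cases its input.

```python
class Step:
    """A toy stand-in for a chain component that supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


fill_prompt = Step(lambda d: f"Define in one line: {d['user_input']}")
fake_llm = Step(lambda prompt: {"content": prompt.upper()})  # pretend model
parse = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | parse
print(toy_chain.invoke({"user_input": "Onomatopoeia"}))
```

Reading left to right: the dict fills the prompt, the prompt goes to the model, and the parser pulls the text out of the model's message.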

## RAG Application using Ollama and LangChain

In [9]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings  # replaces the deprecated langchain.embeddings import
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama

In [10]:
raw_documents = TextLoader("./LangchainRetrieval.txt").load()

In [11]:
raw_documents

[Document(metadata={'source': './LangchainRetrieval.txt'}, page_content="Retrieval\nMany LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM when doing the generation step.\n\nLangChain provides all the building blocks for RAG applications - from simple to complex. This section of the documentation covers everything related to the retrieval step - e.g. the fetching of the data. Although this sounds simple, it can be subtly complex. This encompasses several key modules.\n\nIllustrative diagram showing the data connection process with steps: Source, Load, Transform, Embed, Store, and Retrieve.\n\nDocument loaders\nDocument loaders load documents from many different sources. LangChain provides over 100 different document loaders as well as integrations with other major providers in the space, lik

In [12]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20)
documents = text_splitter.split_documents(raw_documents)

In [13]:
len(documents)

23
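To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is a sketch of the two parameters only: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its real chunks will differ. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk starts with the last character of the previous one, which is the overlap that helps keep a sentence from being cut off without context.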

In [15]:
print(documents[0])
print(documents[1])

page_content='Retrieval
Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM when doing the generation step.' metadata={'source': './LangchainRetrieval.txt'}
page_content='LangChain provides all the building blocks for RAG applications - from simple to complex. This section of the documentation covers everything related to the retrieval step - e.g. the fetching of the data. Although this sounds simple, it can be subtly complex. This encompasses several key modules.' metadata={'source': './LangchainRetrieval.txt'}


In [16]:
oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")

In [17]:
db = Chroma.from_documents(documents, embedding=oembed)

In [18]:
query = "What is text embedding and how does langchain help in doing it"
docs = db.similarity_search(query)

In [19]:
len(docs)

4

In [23]:
print(docs[3].page_content)

With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these embeddings. LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-hosted proprietary ones, allowing you to choose the one best
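`similarity_search` returned four documents because four is its default `k`. Conceptually, the store embeds the query and ranks the stored chunks by how close their vectors are. The sketch below shows that ranking with cosine similarity on made-up three-dimensional vectors; real embeddings have hundreds of dimensions, and the metric and index a vector store actually uses depend on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy vectors, invented for illustration only.
toy_vectors = {
    "embeddings": [0.9, 0.1, 0.0],
    "vector stores": [0.7, 0.3, 0.1],
    "document loaders": [0.1, 0.9, 0.2],
    "agents": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, toy_vectors, k=2))
```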


In [24]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [25]:
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [26]:
model = ChatOllama(
    model="llama3.1:latest",
    temperature=0
)

In [27]:
retriever = db.as_retriever()

In [28]:
def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

In [29]:
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [30]:
chain.invoke("What is text embedding and how does langchain help in doing it")

'Text embedding refers to capturing the semantic meaning of a piece of text. LangChain helps with this process by providing integrations with over 25 different embedding providers and methods, which enables users to quickly and efficiently find other pieces of text that are similar.'
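It can help to look at the text the model actually receives. The sketch below fills the same template by hand with plain string formatting. The two context strings are placeholders invented for illustration, and `build_prompt` is a hypothetical helper; in the chain above, the retriever supplies the chunks and `ChatPromptTemplate` does the filling.

```python
TEMPLATE = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def format_docs(chunks):
    """Join retrieved chunks with blank lines, as the chain's format_docs does."""
    return "\n\n".join(chunks)


def build_prompt(chunks, question):
    """Fill the template with the joined context and the user's question."""
    return TEMPLATE.format(context=format_docs(chunks), question=question)


# Placeholder chunks, not taken from LangchainRetrieval.txt.
placeholder_chunks = [
    "Embeddings capture the semantic meaning of text.",
    "Vector stores index embeddings for search.",
]
filled = build_prompt(placeholder_chunks, "What is text embedding?")
print(filled)
```

Because the prompt says "based only on the following context", the quality of the answer depends directly on which chunks the retriever returns.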

## Tools and Agents using Ollama and LangChain

In [31]:
from langchain_community.tools import DuckDuckGoSearchResults
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

In [32]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Based on user query, look for information using DuckDuckGo Search and Wikipedia and then give the final answer",
        ),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

In [33]:
llm = ChatOllama(
    model="llama3.1:latest",
    temperature=0
)

In [34]:
search = DuckDuckGoSearchResults()
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

tools = [search, wikipedia]

In [35]:
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [36]:
answer = agent_executor.invoke({"input": "How is Ollama used for running LLM locally"})

answer



> Entering new AgentExecutor chain...

Invoking: `duckduckgo_results_json` with `{'query': 'Ollama running LLM locally'}`


[0m[36;1m[1;3m[snippet: 2. Running Models. To interact with your locally hosted LLM, you can use the command line directly or via an API. For command-line interaction, Ollama provides the `ollama run <name-of-model ..., title: Ollama and LangChain: Run LLMs locally - Medium, link: https://medium.com/@abonia/ollama-and-langchain-run-llms-locally-900931914a46], [snippet: How to Run Ollama. To show you the power of using open source LLMs locally, I'll present multiple examples with different open source models with different use-cases. This will help you to use any future open source LLM models with ease. So, lets get started with the first example! How to Run the LLama2 Model from Meta, title: How to Run Open Source LLMs Locally Using Ollama - freeCodeCamp.org, link: https://www.freecodecamp.org/news/how-to-run-open-source-llms-locally-usin

{'input': 'How is Ollama used for running LLM locally',
 'output': 'Based on the search results, it appears that Ollama can be used to run Large Language Models (LLMs) locally. To do this, you can use the `ollama run` command followed by the name of the model you want to interact with.\n\nFor example, if you want to run the LLama2 Model from Meta, you can use the following command:\n\n```\nollama run <name-of-model>\n```\n\nYou can also use Ollama to set up a local server on port 11434 and make REST calls via Warp with a JSON style payload.\n\nAdditionally, there are resources available online that provide step-by-step guides on how to set up and run LLMs locally using Ollama and Llama 2.'}

In [37]:
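wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

# Restrict the agent to Wikipedia only, then rebuild the agent and executor
# so that the new tool list takes effect.
tools = [wikipedia]
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)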

In [39]:
answer = agent_executor.invoke({"input": "Who is Yann LeCun"})

answer



> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'Yann LeCun'}`


[0m[33;1m[1;3mPage: Yann LeCun
Summary: Yann André LeCun ( lə-KUN, French: [ləkœ̃]; originally spelled Le Cun; born 8 July 1960) is a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics and computational neuroscience. He is the Silver Professor of the Courant Institute of Mathematical Sciences at New York University and Vice-President, Chief AI Scientist at Meta.
He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs). He is also one of the main creators of the DjVu image compression technology (together with Léon Bottou and Patrick Haffner). He co-developed the Lush programming language with Léon Bottou.
In 2018, LeCun, Yoshua Bengio, and Geoffrey Hinton, received the Turing Award for their work on deep learning. The three are som

{'input': 'Who is Yann LeCun',
 'output': 'Yann LeCun is a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics, and computational neuroscience. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs). LeCun co-developed the Lush programming language with Léon Bottou and co-created the DjVu image compression technology. He also co-developed the LeNet series of convolutional neural network structures. In 2018, LeCun received the Turing Award for his work on deep learning, along with Yoshua Bengio and Geoffrey Hinton.'}
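The `Invoking: ... with ...` lines in the traces show the core of the loop: the model names a tool and its arguments, and the executor looks the tool up and calls it. The sketch below mimics only that dispatch step with stub functions. It is a toy illustration, not how `AgentExecutor` is implemented; `fake_wikipedia`, `fake_search`, and `run_tool_call` are made-up names, and nothing here contacts a real service.

```python
def fake_wikipedia(query):
    """Stub standing in for a real Wikipedia lookup."""
    return f"(stub) Wikipedia summary for {query!r}"


def fake_search(query):
    """Stub standing in for a real web search."""
    return f"(stub) search results for {query!r}"


# Tool registry keyed by the names that appear in the traces above.
TOOLS = {
    "wikipedia": fake_wikipedia,
    "duckduckgo_results_json": fake_search,
}


def run_tool_call(call):
    """Look up the requested tool by name and call it with the given arguments."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["args"])


call = {"name": "wikipedia", "args": {"query": "Yann LeCun"}}
print(run_tool_call(call))
```

In the real agent, the tool's result is handed back to the model, which then either requests another tool or writes the final answer.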