# Pull llama3 using ollama
```sh
ollama pull llama3
```

## Prompts
- https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
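
As a rough illustration of the prompt format described at the link above, the sketch below assembles a single-turn Llama 3 prompt from a system message and a user message. The helper name `build_llama3_prompt` is hypothetical and not part of any library; the special tokens are the ones used in the prompt template later in this notebook, with the blank line after each header taken from the model card.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 prompt string (hypothetical helper)."""

    def header(role: str) -> str:
        # Each turn starts with a role header followed by a blank line.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

    return (
        "<|begin_of_text|>"
        + header("system") + system + "<|eot_id|>"
        + header("user") + user + "<|eot_id|>"
        # The prompt ends with an open assistant header so the model
        # continues from there with its answer.
        + header("assistant")
    )


text = build_llama3_prompt("You are a helpful assistant.", "What is Ollama?")
print(text)
```

Depending on how the model is served, the runtime may already apply this template to chat messages, in which case hand-written special tokens are redundant. It is worth checking this against the template of the model you pulled.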

In [5]:
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser

In [3]:
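# define the LLM: the llama3 model pulled with Ollama above
local_llm = 'llama3'
# temperature=0 keeps the answers as deterministic as possible
llm = ChatOllama(model=local_llm, temperature=0)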

In [4]:
# define the prompt
prompt_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an AI assistant for question-answering tasks.
Given the context below, your task is to answer the question using the instructions below.

% INSTRUCTIONS:
Follow the instructions to answer the question:
- only use the context to answer the question
- if you don't know the answer, just say [I don't know]
- use three sentences maximum and keep the answer concise
<|eot_id|><|start_header_id|>user<|end_header_id|>
% CONTEXT:
{context}

% QUESTION: 
{question} 

% ANSWER:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

prompt = PromptTemplate.from_template(prompt_template)

print(type(prompt))
print(prompt)

<class 'langchain_core.prompts.prompt.PromptTemplate'>
input_variables=['context', 'question'] template="<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an AI assistant for question-answering tasks.\nGiven the context below, your task is to answer the question using the instructions below.\n\n% INSTRUCTIONS:\nFollow the instructions to answer the question:\n- only use the context to answer the question\n- if you don't know the answer, just say [I don't know]\n- use three sentences maximum and keep the answer concise\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n% CONTEXT:\n{context}\n\n% QUESTION: \n{question} \n\n% ANSWER:\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
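
The `input_variables=['context', 'question']` in the output above correspond to the `{context}` and `{question}` placeholders in the template string. The sketch below shows one way such names can be recovered from an f-string-style template using only the Python standard library. It is an illustration of the idea, not LangChain's actual implementation, and `find_input_variables` and `demo_template` are hypothetical names.

```python
from string import Formatter


def find_input_variables(template: str) -> list[str]:
    """Return the sorted placeholder names found in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


# A shortened version of the user part of the prompt, for illustration only.
demo_template = "% CONTEXT:\n{context}\n\n% QUESTION:\n{question}\n\n% ANSWER:"

variables = find_input_variables(demo_template)
filled = demo_template.format(
    context="Ollama runs LLMs locally.",
    question="What is Ollama?",
)

print(variables)
print(filled)
```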


In [8]:
# define the QA chain: prompt -> LLM -> string output parser
qa_chain = prompt | llm | StrOutputParser()

print(qa_chain)
print(type(qa_chain))

first=PromptTemplate(input_variables=['context', 'question'], template="<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an AI assistant for question-answering tasks.\nGiven the context below, your task is to answer the question using the instructions below.\n\n% INSTRUCTIONS:\nFollow the instructions to answer the question:\n- only use the context to answer the question\n- if you don't know the answer, just say [I don't know]\n- use three sentences maximum and keep the answer concise\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n% CONTEXT:\n{context}\n\n% QUESTION: \n{question} \n\n% ANSWER:\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>") middle=[ChatOllama(model='llama3', temperature=0.0)] last=StrOutputParser()
<class 'langchain_core.runnables.base.RunnableSequence'>
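
The printed `RunnableSequence` has a `first`, `middle`, and `last` step, which reflects how the `|` operator composes the three components so that each one receives the previous one's output. The toy sketch below mimics that behaviour with a small class that overloads `__or__`. It assumes nothing about LangChain internals; `Step`, `fake_llm`, and the other names are hypothetical stand-ins.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a one-argument function."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stage 1: fill a template from a dict of inputs.
fill_prompt = Step(lambda inputs: "Q: {question}\nC: {context}".format(**inputs))
# Stage 2: a fake "model" that just upper-cases the prompt and wraps it in a message.
fake_llm = Step(lambda text: {"content": text.upper()})
# Stage 3: pull the plain string out of the message, like an output parser.
parse_output = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | parse_output
result = toy_chain.invoke({"question": "hi", "context": "there"})
print(result)
```

Calling `toy_chain.invoke(...)` with a dict mirrors the way `qa_chain.invoke({"question": ..., "context": ...})` is called in the cells that follow.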


In [10]:
context = """
Ollama is a user-friendly interface for running large language models (LLMs) locally, specifically on MacOS and Linux, with Windows support on the horizon. It is a valuable tool for researchers, developers, and anyone who wants to experiment with language models. Ollama supports a wide range of models, including Lama 2, Lama 2 uncensored, and the newly released Mistal 7B, among others.

Features
- Ease of Use: Ollama is easy to install and use, even for users with no prior experience with language models. It provides a simple API for creating, running, and managing models.
- Versatility and Model Installation: Ollama supports a wide range of models, making it versatile for various applications. It also provides a straightforward installation process, making it appealing to individuals and small teams.
- GPU Acceleration: Ollama leverages GPU acceleration, which can speed up model inference by up to 2x compared to CPU-only setups. This feature is particularly beneficial for tasks that require heavy computation. This feature is included out of the box and it requires zero intervention.
- Integration Capabilities: Ollama is compatible with several platforms like Langchain, llama-index, and more.
- Privacy and Cost: Running LLMs locally with Ollama ensures data privacy as your data is not sent to a third party. It also eliminates inference fees, which is important for token-intensive applications.
"""

In [11]:
question = "What is ollama"
output = qa_chain.invoke({"question": question, "context": context})
print(output)

Ollama is a user-friendly interface for running large language models (LLMs) locally on MacOS and Linux, with Windows support coming soon. It provides an easy-to-use API for creating, running, and managing models, making it accessible to researchers, developers, and anyone interested in experimenting with language models.


In [12]:
question = "Can I run ollama on my GPU"
output = qa_chain.invoke({"question": question, "context": context})
print(output)

Yes, you can run Ollama on your GPU. According to the context, Ollama leverages GPU acceleration, which can speed up model inference by up to 2x compared to CPU-only setups. This feature is included out of the box and requires zero intervention.


In [13]:
question = "What is llama3"
output = qa_chain.invoke({"question": question, "context": context})
print(output)

I don't know. The context does not mention "Llama3". It only mentions Lama 2 and Mistal 7B as specific language models supported by Ollama.
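
The prompt asks the model to reply `[I don't know]` when the context does not contain the answer, but the output above shows the phrase without brackets and followed by an explanation. If downstream code needs to detect such replies, a tolerant check is one option. The helper below is a hypothetical sketch based only on the single output shown here, not a robust classifier.

```python
def is_unknown(answer: str) -> bool:
    """Return True if the answer starts with "I don't know", with or without brackets."""
    # Normalise whitespace, an optional leading bracket, curly apostrophes, and case.
    normalized = answer.strip().lstrip("[").replace("\u2019", "'").lower()
    return normalized.startswith("i don't know")


samples = [
    "I don't know. The context does not mention \"Llama3\".",
    "[I don't know]",
    "Yes, you can run Ollama on your GPU.",
]
flags = [is_unknown(sample) for sample in samples]
print(flags)
```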
