## A Quick Introduction to Ollama

Ollama allows one to build LLM-based applications with open-source models like Llama, Gemma, or Mistral running on the local machine.

There is also a seamless integration of Ollama with LangChain, so all we've learned so far about developing applications with LangChain is applicable when working with locally run models via Ollama.

To integrate LangChain with Ollama, the following three steps are needed:
1) Install Ollama from: [https://ollama.com/download](https://ollama.com/download)
2) Download the models you want to use, as explained below.
3) Use the downloaded model to create an instance of the `ChatOllama` class.

Once you have Ollama installed, you can check that it's working correctly by opening a command prompt (or terminal) and typing: <br>
`ollama` <br>
to get a list of available commands. For example, to see all models currently available on your machine, type (in command prompt / terminal):<br> `ollama list`.

A list of models available via Ollama is given in the [models library](https://ollama.com/library). Once you've found the model you want to use, you can download it as follows:<br>
`ollama pull <model_name>:<model_size>` <br>
For example, to download the gemma3 model with 1B parameters: <br>
`ollama pull gemma3:1b`

If you no longer need a model, you can remove it as follows: <br>
1) list the available models to identify the exact name of the one you want to remove: `ollama list` <br>
2) remove the model by typing: `ollama rm <model_name>` <br>
3) check that the model is removed: `ollama list`

#### A few notes on choosing a model to run locally

When choosing a model to run locally, pay attention to its size both in terms of the space required for its (local) storage and the RAM required for running it.

Here are some general recommendations regarding the RAM required for efficiently running a local model:
* 16GB RAM is recommended for 7B-8B models,
* 32GB-64GB for 13B-30B models,
* 128GB+ for 65B+ models.

As for the space required for storing a model locally, consider that LLM size in GB is determined by multiplying the number of parameters (in billions) by the bytes per parameter, dictated by precision. This means that for storing 1B parameters with 16-bit precision, roughly 2GB is needed, and 0.5-1GB for 4/8-bit quantized models[*]. In addition, it is recommended to have 20-30% more memory than the calculated size (for context storage) to avoid "out of memory" errors.

[*] Quantization refers to compressing the representation of model parameters from 16-bit to 8-bit or 4-bit; this significantly reduces memory usage, typically with only a small loss in quality.

The following formulas can be used for estimating an LLM's memory size in GB:
* 16-bit precision: Number of Parameters (in Billions) x 2
* 8-bit: Number of Parameters (in Billions) x 1
* 4-bit: Number of Parameters (in Billions) x 0.5
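The formulas above can be turned into a small helper for quick estimates. This is only a sketch: the function names and the default 25% headroom (a midpoint of the 20-30% range mentioned earlier) are illustrative choices, not part of Ollama or LangChain.

```python
# Bytes needed per parameter at each precision, as given by the formulas above.
BYTES_PER_PARAM = {16: 2.0, 8: 1.0, 4: 0.5}


def estimate_model_size_gb(params_billions: float, bits: int = 16) -> float:
    """Rough model size in GB: parameters (in billions) x bytes per parameter."""
    if bits not in BYTES_PER_PARAM:
        raise ValueError(f"Unsupported precision: {bits}-bit")
    return params_billions * BYTES_PER_PARAM[bits]


def estimate_with_headroom_gb(
    params_billions: float, bits: int = 16, headroom: float = 0.25
) -> float:
    """Model size plus extra memory for context (headroom is an assumed 25%)."""
    return estimate_model_size_gb(params_billions, bits) * (1 + headroom)


# Compare the three precisions for a 4B-parameter model.
for bits in (16, 8, 4):
    size = estimate_model_size_gb(4, bits)
    padded = estimate_with_headroom_gb(4, bits)
    print(f"4B parameters at {bits}-bit: ~{size:.1f} GB (~{padded:.1f} GB with headroom)")
```

These numbers are ballpark figures only; the actual size of a downloaded model is shown in the output of `ollama list`.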

Once Ollama has started, it provides access to LLMs not only via the command prompt (terminal) and the visual interface; it also exposes an HTTP API on localhost (http://127.0.0.1:11434), so it can be accessed like any RESTful service (e.g., using the requests library). This is also how LangChain accesses Ollama-hosted models.
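To illustrate the HTTP access described above, here is a minimal sketch that calls the local server directly. It assumes Ollama's `/api/generate` endpoint with the `model`, `prompt`, and `stream` request fields and a `response` field in the reply, and that the `gemma3:1b` model has already been pulled. The helper names are hypothetical, and the standard-library `urllib` is used instead of `requests` only to avoid an extra dependency.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default local address mentioned above


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL):
    """Prepare (but do not send) a POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL, timeout: int = 120) -> str:
    """Send the request and return the generated text from the JSON reply."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    return body["response"]


if __name__ == "__main__":
    try:
        print(generate("gemma3:1b", "Say hello in five words."))
    except OSError as err:  # raised when the server is not running or unreachable
        print(f"Could not reach Ollama at {OLLAMA_URL}: {err}")
```

In the rest of this section we won't call the API by hand; `ChatOllama` handles this communication for us.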

In [None]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

In [None]:
creative_llm = ChatOllama(
    model="gemma3:4b",
    temperature=0.9
)

In [None]:
branding_system_msg = """
You are a highly creative assistant with plenty of original ideas. You especially excel at inventing catchy brand and product promotional messages.
"""

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", branding_system_msg),
        ("human", "What would be a good name for a company that makes and sells {product}? Suggest three distinct names and for each one provide a short explanation"),
    ]
)

In [None]:
creative_question_chain = prompt_template | creative_llm

response = creative_question_chain.invoke({"product": "healthy energy drink"})

In [None]:
print(response.content)

We'll follow the LLM's suggestion to be more precise in our request:

In [None]:
refined_response = creative_question_chain.invoke({"product": "healthy energy drink based on superfoods, which provides vibrant energy"})

In [None]:
print(refined_response.content)