#### Steps to get hands-on with Ollama

    - Install the application on Linux by following the instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md, or simply run:

        curl -fsSL https://ollama.com/install.sh | sh

    - Ollama maintains a library of ready-to-use models, and we can create keys for our account on that library.
    
    - We can create custom models from a local GGUF file using a ModelFile.

        - ModelFile
        FROM ./vicuna-33b-Q4_0.gguf

        - ollama create vicuna -f ModelFile

        - ollama run vicuna
    
    - We can customize a model's generation parameters and system prompt.
        
        - ollama pull llama2

        - ModelFile
        FROM llama2
        # set the temperature to 1 [higher is more creative, lower is more coherent]
        PARAMETER temperature 1
        # set the system message
        SYSTEM """
        You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
        """

        - ollama create mario -f ./ModelFile
        
        - ollama run mario
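
The Mario ModelFile above can also be generated from Python, which helps when trying several temperatures or system messages. This is a minimal sketch that assumes only the `FROM`, `PARAMETER`, and `SYSTEM` instructions used above; the helper name `render_modelfile` is hypothetical and not part of Ollama.

```python
def render_modelfile(base_model, system_message, temperature=1.0):
    """Return ModelFile text built from the FROM / PARAMETER / SYSTEM instructions."""
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system_message}\n"""\n'
    )


# Hypothetical helper applied to the Mario example from the steps above.
modelfile_text = render_modelfile(
    "llama2",
    "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.",
    temperature=1,
)
print(modelfile_text)
```

Saving the printed text as `ModelFile` and running `ollama create mario -f ./ModelFile` would then follow the same steps as above.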

#### Steps to serve the model

    - Start the Ollama server
        ./ollama serve

    - Load and run the model (in a second terminal, while the server is running)
        ./ollama run llama2

    - Generate with the model
  
    curl http://localhost:11434/api/generate -d '{
      "model": "llama2",
      "prompt": "Why is the sky blue?"
    }'
    
    - Chat with the model
  
    curl http://localhost:11434/api/chat -d '{
      "model": "mistral",
      "messages": [
        { "role": "user", "content": "why is the sky blue?" }
      ]
    }'
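
The same generate call can be made from Python with only the standard library. This is a rough sketch, assuming the server listens on `localhost:11434` as in the curl commands; the helper names `build_generate_request` and `generate` are hypothetical. It sets `"stream": false` so that the server returns a single JSON object instead of a stream of partial responses.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama2", host="http://localhost:11434"):
    """Build (but do not send) a POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("Why is the sky blue?")
print(request.full_url)
# With the server running: print(generate("Why is the sky blue?"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.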

#### Importing an existing model into Ollama

Reference: https://github.com/ollama/ollama/blob/main/docs/import.md

- Clone the model

    git lfs install

    git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1 model

- Convert the model (the paths below are relative to a checkout of the Ollama repository)

    python llm/llama.cpp/convert.py ./model --outtype f16 --outfile converted.bin

- Quantize the model

    llm/llama.cpp/quantize converted.bin quantized.bin q4_0

- Write a ModelFile

    FROM quantized.bin
    TEMPLATE "[INST] {{ .Prompt }} [/INST]"

- Create the model

    ollama create example -f ModelFile

- Run the model

    ollama run example "generate example"

- Push the model to your Ollama account

    ollama cp example <your username>/example

    ollama push <your username>/example 
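
The clone, convert, quantize, and create steps above can be collected into one script. This is only a sketch: it reuses the exact commands listed above, assumes the ModelFile has already been written, and defaults to a dry run that prints the commands without executing them. The helper names `import_commands` and `run_all` are hypothetical.

```python
import shlex
import subprocess


def import_commands(repo, name="example"):
    """Return the commands from the steps above as argument lists, in order."""
    return [
        ["git", "lfs", "install"],
        ["git", "clone", f"https://huggingface.co/{repo}", "model"],
        ["python", "llm/llama.cpp/convert.py", "./model",
         "--outtype", "f16", "--outfile", "converted.bin"],
        ["llm/llama.cpp/quantize", "converted.bin", "quantized.bin", "q4_0"],
        ["ollama", "create", name, "-f", "ModelFile"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


commands = import_commands("mistralai/Mistral-7B-Instruct-v0.1")
run_all(commands)  # dry run: only prints the commands
```

Passing `dry_run=False` would execute the steps one after another and stop at the first command that fails.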
    


In [2]:
quantisation = ["q2_K", "q3_K", "q3_K_S", "q3_K_M", "q3_K_L",
                "q4_0 (recommended)", "q4_1", "q4_K", "q4_K_S", 
                "q4_K_M", "q5_0", "q5_1", "q5_K", "q5_K_S", "q5_K_M",
                "q6_K", "q8_0", "f16"]

models_details = [
  {
    "Model": "Llama 2",
    "Parameters": "7B",
    "Size": "3.8GB",
    "Download": "ollama run llama2"
  },
  {
    "Model": "Mistral",
    "Parameters": "7B",
    "Size": "4.1GB",
    "Download": "ollama run mistral"
  },
  {
    "Model": "Dolphin Phi",
    "Parameters": "2.7B",
    "Size": "1.6GB",
    "Download": "ollama run dolphin-phi"
  },
  {
    "Model": "Phi-2",
    "Parameters": "2.7B",
    "Size": "1.7GB",
    "Download": "ollama run phi"
  },
  {
    "Model": "Neural Chat",
    "Parameters": "7B",
    "Size": "4.1GB",
    "Download": "ollama run neural-chat"
  },
  {
    "Model": "Starling",
    "Parameters": "7B",
    "Size": "4.1GB",
    "Download": "ollama run starling-lm"
  },
  {
    "Model": "Code Llama",
    "Parameters": "7B",
    "Size": "3.8GB",
    "Download": "ollama run codellama"
  },
  {
    "Model": "Llama 2 Uncensored",
    "Parameters": "7B",
    "Size": "3.8GB",
    "Download": "ollama run llama2-uncensored"
  },
  {
    "Model": "Llama 2 13B",
    "Parameters": "13B",
    "Size": "7.3GB",
    "Download": "ollama run llama2:13b"
  },
  {
    "Model": "Llama 2 70B",
    "Parameters": "70B",
    "Size": "39GB",
    "Download": "ollama run llama2:70b"
  },
  {
    "Model": "Orca Mini",
    "Parameters": "3B",
    "Size": "1.9GB",
    "Download": "ollama run orca-mini"
  },
  {
    "Model": "Vicuna",
    "Parameters": "7B",
    "Size": "3.8GB",
    "Download": "ollama run vicuna"
  },
  {
    "Model": "LLaVA",
    "Parameters": "7B",
    "Size": "4.5GB",
    "Download": "ollama run llava"
  },
  {
    "Model": "Gemma",
    "Parameters": "2B",
    "Size": "1.4GB",
    "Download": "ollama run gemma:2b"
  },
  {
    "Model": "Gemma",
    "Parameters": "7B",
    "Size": "4.8GB",
    "Download": "ollama run gemma:7b"
  }
]
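
A list shaped like `models_details` can be filtered to find models under a given download size. This is a small sketch on three entries copied from the list above; the helper names `size_in_gb` and `models_within` are hypothetical, and download size is only a rough proxy for the memory a model needs at run time.

```python
def size_in_gb(size):
    """Convert a size string such as '3.8GB' to a float number of gigabytes."""
    return float(size.upper().replace("GB", ""))


def models_within(models, max_gb):
    """Return the models whose download size is at most max_gb, smallest first."""
    fitting = [m for m in models if size_in_gb(m["Size"]) <= max_gb]
    return sorted(fitting, key=lambda m: size_in_gb(m["Size"]))


# Three entries copied from models_details to keep the sketch self-contained.
sample = [
    {"Model": "Llama 2", "Parameters": "7B", "Size": "3.8GB",
     "Download": "ollama run llama2"},
    {"Model": "Llama 2 70B", "Parameters": "70B", "Size": "39GB",
     "Download": "ollama run llama2:70b"},
    {"Model": "Gemma", "Parameters": "2B", "Size": "1.4GB",
     "Download": "ollama run gemma:2b"},
]

for m in models_within(sample, max_gb=4):
    print(f'{m["Model"]:<12} {m["Size"]:>6}  {m["Download"]}')
```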

In [1]:
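# Requires the Python client library. A ModuleNotFoundError here
# ("No module named 'ollama'") means it still needs to be installed:
#   pip install ollama
import ollama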

In [3]:
from ollama import Client

# Connect to the Ollama server on host `aicontroller` (11434 is Ollama's default port)
client = Client(host='http://aicontroller:11434')

In [10]:
response = client.chat(
    model='gemma:2b',
    messages=[
        {
            'role': 'user',
            'content': 'Why is the terminal black in linux?',
        },
    ],
)

In [11]:
from rich import print

print(response['message']['content'])
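
For long answers it can be more convenient to print the reply as it is generated. The sketch below relies on the `stream=True` option of `chat` in the Python client; the `stream_chat` helper and the `FakeClient` stand-in are hypothetical and exist only so the example runs without a server. The stand-in simply echoes the prompt back word by word.

```python
def stream_chat(client, model, prompt):
    """Yield the reply piece by piece using the client's stream=True mode."""
    messages = [{"role": "user", "content": prompt}]
    for chunk in client.chat(model=model, messages=messages, stream=True):
        yield chunk["message"]["content"]


class FakeClient:
    """Stand-in for ollama.Client: echoes the prompt back one word at a time."""

    def chat(self, model, messages, stream=False):
        for word in messages[-1]["content"].split():
            yield {"message": {"content": word + " "}}


pieces = list(stream_chat(FakeClient(), "gemma:2b", "Why is the terminal black in linux?"))
print("".join(pieces))
```

With the real `client` from the cell above, the loop would look like `for piece in stream_chat(client, 'gemma:2b', '...'): print(piece, end='', flush=True)`.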

In [12]:
response = client.generate(model='gemma:2b',
                           prompt='What is the color of sun?')

In [15]:
print(response)

In [14]:
print(response['response'])

In [16]:
client.list()  

{'models': [{'name': 'gemma:2b',
   'model': 'gemma:2b',
   'modified_at': '2024-03-01T13:28:59.9668293+05:30',
   'size': 1678456656,
   'digest': 'b50d6c999e592ae4f79acae23b4feaefbdfceaa7cd366df2610e3072c052a160',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'gemma',
    'families': ['gemma'],
    'parameter_size': '3B',
    'quantization_level': 'Q4_0'}}]}
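
The raw listing is easier to scan once the byte count is converted to gigabytes. This is a minimal sketch, assuming the dictionary shape shown in the output above (other client versions may return a different structure); the `summarize_models` helper is hypothetical.

```python
def summarize_models(listing):
    """Return one 'name  size  parameters quantization' line per listed model."""
    lines = []
    for entry in listing["models"]:
        size_gb = entry["size"] / 1e9  # bytes to decimal gigabytes
        details = entry["details"]
        lines.append(
            f'{entry["name"]}  {size_gb:.2f} GB  '
            f'{details["parameter_size"]} {details["quantization_level"]}'
        )
    return lines


# Trimmed copy of the client.list() output shown above.
listing = {
    "models": [
        {
            "name": "gemma:2b",
            "size": 1678456656,
            "details": {"parameter_size": "3B", "quantization_level": "Q4_0"},
        }
    ]
}
print("\n".join(summarize_models(listing)))
```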