### Convert HF Model into GGUF File Format

In [None]:
from huggingface_hub import snapshot_download

model_id = "unsloth/Llama-3.2-1B-bnb-4bit"  # Replace with the ID of the model you want to download
snapshot_download(repo_id=model_id, local_dir="quantized")

In [None]:
# Clone the llama.cpp repo and install its Python requirements (uncomment to run)
# !git clone https://github.com/ggerganov/llama.cpp
# !pip install -r llama.cpp/requirements.txt

### Run Conversion Script (Model to GGUF)

In [None]:
!python ./llama.cpp/convert_hf_to_gguf.py ./quantized --outfile output_file.gguf --outtype auto

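# llama.cpp options: --outtype selects the output precision (for example f32, f16, bf16, q8_0, or auto).
# Run the script with --help to list every option.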
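
The conversion step above can also be driven from Python instead of a `!` shell escape. The following is a minimal sketch, not part of llama.cpp: `build_convert_command` and `convert` are hypothetical helper names, the script path mirrors the one used in the cell above, and the set of accepted `--outtype` values is an assumption that should be checked against the script's `--help` output.

```python
import subprocess
import sys
from pathlib import Path

# Assumed set of output types; confirm with `convert_hf_to_gguf.py --help`.
ASSUMED_OUTTYPES = {"f32", "f16", "bf16", "q8_0", "auto"}


def build_convert_command(model_dir, outfile, outtype="auto",
                          script="./llama.cpp/convert_hf_to_gguf.py"):
    """Return the argument list for the conversion script (hypothetical helper)."""
    if outtype not in ASSUMED_OUTTYPES:
        raise ValueError(f"unexpected outtype: {outtype!r}")
    return [
        sys.executable, script, str(model_dir),
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]


def convert(model_dir, outfile, outtype="auto"):
    """Run the conversion and return the output path; raises if the script fails."""
    subprocess.run(build_convert_command(model_dir, outfile, outtype), check=True)
    return Path(outfile)
```

Calling `convert("./quantized", "output_file.gguf")` would run the same command as the cell above, with `check=True` turning a failed conversion into a Python exception.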


### Or Download a GGUF File

In [None]:
# Make sure you have git-lfs installed (https://git-lfs.com)
# !git lfs install
# !git clone https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-GGUF
# !git clone https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-GGUF

# Download on Linux/macOS (the URL is quoted so the shell does not interpret the "?")
#!wget "https://huggingface.co/unsloth/Llama-3.3-70B-Instruct-GGUF/resolve/main/Llama-3.3-70B-Instruct-Q2_K.gguf?download=true" -O Llama-3.3-70B-Instruct-Q2_K.gguf

# Download on Windows
# The following downloads the 4-bit quantized (Q4_K_M) Llama 3.2 11B Vision Instruct model in GGUF format from the leafspark repo
!curl -L -O "https://huggingface.co/leafspark/Llama-3.2-11B-Vision-Instruct-GGUF/resolve/main/Llama-3.2-11B-Vision-Instruct.Q4_K_M.gguf"
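
The `wget` and `curl` commands above both fetch a file from a URL of the form `https://huggingface.co/<repo>/resolve/main/<filename>`. A platform-independent sketch using only the Python standard library is shown below; `gguf_url` and `download_gguf` are hypothetical helper names, and the URL layout is inferred from the links in this notebook rather than from a documented contract.

```python
import shutil
import urllib.request
from urllib.parse import quote


def gguf_url(repo_id, filename, revision="main"):
    """Build a direct-download URL following the pattern used in the cell above."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_gguf(repo_id, filename, dest=None):
    """Stream the file to disk so a multi-gigabyte model is never held in memory."""
    dest = dest or filename
    with urllib.request.urlopen(gguf_url(repo_id, filename)) as resp, \
            open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, `download_gguf("leafspark/Llama-3.2-11B-Vision-Instruct-GGUF", "Llama-3.2-11B-Vision-Instruct.Q4_K_M.gguf")` should save the same file as the `curl` command.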

### Create Modelfile

- Check the existing Modelfile for a specific model
    > List the models installed in Ollama
    <small>
    ```
    ollama ls
    ```
    </small>

    > Show the Modelfile of an installed model
    <small>
    ```
    ollama show --modelfile modelname
    ```
    </small>
- Example of Ollama Modelfile structure

    > Specify the base model
    <small>
    ```
    FROM llama2
    ```
    </small>
    
    > Configure model parameters
    <small>
    ```
    PARAMETER temperature 0.7
    PARAMETER top_k 40
    PARAMETER top_p 0.9
    ```
    </small>
    
    > Define the template for input prompts
    <small>
    ```
    TEMPLATE """
    USER: {{.Prompt}}
    ASSISTANT: Let me help you with that.
    """
    ```
    </small>
    
    > Set the system message that defines the AI's behavior
    <small>
    ```
    SYSTEM """
    You are a helpful and knowledgeable assistant who specializes in explaining technical concepts clearly and concisely. Please provide accurate and practical information while maintaining a professional tone.
    """
    ```
    </small>
- Example of a Modelfile (paste the following into a file named `Modelfile`, with no file extension)
    > Llama-3.2
    <small>
    ```
    # Modelfile
    # Base model: the local Llama 3.2 11B Vision Instruct GGUF file (Q4_K_M quantization)
    FROM "Llama-3.2-11B-Vision-Instruct.Q4_K_M.gguf"
    TEMPLATE """<|start_header_id|>system<|end_header_id|>

    Cutting Knowledge Date: December 2023

    {{ if .System }}{{ .System }}
    {{- end }}
    {{- if .Tools }}When you receive a tool call response, use the output to format an answer to the original user question.

    You are a helpful assistant with tool calling capabilities.
    {{- end }}<|eot_id|>
    {{- range $i, $_ := .Messages }}
    {{- $last := eq (len (slice $.Messages $i)) 1 }}
    {{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
    {{- if and $.Tools $last }}

    Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

    Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

    {{ range $.Tools }}
    {{- . }}
    {{ end }}
    {{ .Content }}<|eot_id|>
    {{- else }}

    {{ .Content }}<|eot_id|>
    {{- end }}{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

    {{ end }}
    {{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
    {{- if .ToolCalls }}
    {{ range .ToolCalls }}
    {"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
    {{- else }}

    {{ .Content }}
    {{- end }}{{ if not $last }}<|eot_id|>{{ end }}
    {{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

    {{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

    {{ end }}
    {{- end }}
    {{- end }}"""
    PARAMETER stop <|start_header_id|>
    PARAMETER stop <|end_header_id|>
    PARAMETER stop <|eot_id|>
    ```
    </small>
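
The Modelfile structure shown above (`FROM`, `PARAMETER`, `SYSTEM`) is plain text, so it can be generated from Python when several variants are needed. This is a minimal sketch under stated assumptions: `render_modelfile` is a hypothetical helper, it covers only those three instructions, and the long `TEMPLATE` block is deliberately left out.

```python
from pathlib import Path


def render_modelfile(base, parameters=None, system=None):
    """Return Modelfile text with FROM, PARAMETER, and optional SYSTEM lines."""
    lines = [f"FROM {base}"]
    for name, value in (parameters or {}).items():
        # A list value becomes one PARAMETER line per item (e.g. several stop tokens).
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            lines.append(f"PARAMETER {name} {item}")
    if system:
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_modelfile(
        "./Llama-3.2-11B-Vision-Instruct.Q4_K_M.gguf",
        parameters={"temperature": 0.7, "stop": ["<|eot_id|>"]},
        system="You are a helpful assistant.",
    )
    # Written to a separate file name so an existing Modelfile is not overwritten.
    Path("Modelfile.generated").write_text(text, encoding="utf-8")
```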

### Use Ollama Directly

In [None]:
!ollama create llama3.2-q4 -f Modelfile

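'''
Quantization levels for `ollama create --quantize`
(relevant when the source model is unquantized, e.g. F16 or F32)

Supported Quantizations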
q4_0
q4_1
q5_0
q5_1
q8_0

K-means Quantizations
q3_K_S
q3_K_M
q3_K_L
q4_K_S
q4_K_M
q5_K_S
q5_K_M
q6_K
'''
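
The quantization names listed above can be checked before invoking the CLI. The sketch below assumes that `ollama create` accepts a `--quantize` flag taking one of those names, as described in Ollama's model import documentation; `build_create_command` is a hypothetical helper, and the level set is copied from the list in this cell.

```python
import subprocess

# Levels copied from the list above.
QUANTIZATIONS = {
    "q4_0", "q4_1", "q5_0", "q5_1", "q8_0",
    "q3_K_S", "q3_K_M", "q3_K_L", "q4_K_S", "q4_K_M",
    "q5_K_S", "q5_K_M", "q6_K",
}


def build_create_command(name, modelfile="Modelfile", quantize=None):
    """Return the `ollama create` argument list, validating the quantization level."""
    cmd = ["ollama", "create", name, "-f", modelfile]
    if quantize is not None:
        if quantize not in QUANTIZATIONS:
            raise ValueError(f"unknown quantization level: {quantize!r}")
        # Assumption: --quantize is the flag name; verify with `ollama create --help`.
        cmd += ["--quantize", quantize]
    return cmd


def create_model(name, modelfile="Modelfile", quantize=None):
    """Run `ollama create`; raises CalledProcessError if the command fails."""
    subprocess.run(build_create_command(name, modelfile, quantize), check=True)
```

With `quantize=None`, `create_model("llama3.2-q4")` runs the same command as the cell above. A level should be passed only when the `FROM` model is not already quantized.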