# Use llamafile with External Weights

This notebook shows how to use the `llamafile` software without bundled weights. In this scenario the model weights from [TheBloke (Tom Jobbins)](https://huggingface.co/TheBloke) are saved on disk in [GGUF](https://www.secondstate.io/articles/convert-pytorch-to-gguf/#:~:text=The%20GGUF%20format%20is%20specifically,easier%20to%20use%20than%20PyTorch.) format and passed to llamafile at run time.

The example in this notebook uses [TheBloke/Llama-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF).

In [1]:
from pathlib import Path
import os
# Directory where the model weights are stored (read from the MODEL_DIRECTORY environment variable)
path_base_model = Path(os.environ['MODEL_DIRECTORY'])
model = 'Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf'
path_model = str(path_base_model / model)

# Project base directory (parent of the current working directory)
path_project = Path.cwd().parent
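
Before launching llamafile, it can help to confirm that the weights file on disk looks like a GGUF file. The sketch below is not part of llamafile: `read_gguf_version` is a hypothetical helper, and it assumes that a GGUF file starts with the four ASCII bytes `GGUF` followed by a little-endian unsigned 32-bit version number.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF version number, or None if the GGUF magic is missing.

    Hypothetical helper. Assumes the file begins with b"GGUF" followed by
    a little-endian unsigned 32-bit version field.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    # A file that is too short or has a different magic is not treated as GGUF.
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version(path_model)` raises `FileNotFoundError` if the path is wrong, which surfaces a misconfigured `MODEL_DIRECTORY` before any model loading starts.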

# Call a GGUF Model using llamafile-main

The model will be executed on CPU only.

In [2]:
# Build the shell command: run llamafile with the external weights and
# redirect stdout (the prompt and generated text) to a file
cmd = (
    f"{path_project / 'llamafile-assets/llamafile-main-0.2.1'} "
    f"--model {path_model} "
    "--prompt 'Give me three interesting bullet point facts about the moon.' "
    "> output-llama2-results.txt"
)

# Run the command in a shell; os.system returns the exit status
response = os.system(cmd)

protip: pass the --n-gpu-layers N flag to link NVIDIA cuBLAS support
Log start
main: build = 1500 (a30b324)
main: built with cosmocc (GCC) 11.2.0 for x86_64-linux-cosmo
main: seed  = 1701607441
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /nvme4tb/Projects/llm_models/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q4_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,   

- `prompt eval time` is the time spent processing the tokens **in the prompt**
- `eval time` is the time spent producing the tokens **generated by the model**
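
The timing summary that llama.cpp-based tools print at the end of a run typically pairs an elapsed time with a token count. Assuming that format, and assuming the time is reported in milliseconds, throughput follows by simple division. The function name and the numbers below are hypothetical and are not taken from this run.

```python
def tokens_per_second(elapsed_ms, n_tokens):
    """Convert an elapsed time in milliseconds and a token count to tokens/s."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    return n_tokens / (elapsed_ms / 1000.0)


# Hypothetical values for illustration only (not from the log above).
prompt_rate = tokens_per_second(elapsed_ms=500.0, n_tokens=20)   # 40.0
gen_rate = tokens_per_second(elapsed_ms=10_000.0, n_tokens=150)  # 15.0
```

Prompt processing and generation are worth reporting separately, since the prompt is evaluated in a batch while generated tokens are produced one at a time.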

In [3]:
# View the results
!cat output-llama2-results.txt

Give me three interesting bullet point facts about the moon.

1. The Moon has no atmosphere, which means that there is no wind or weather on the Moon. This makes it an ideal place for spacecraft to land and explore without worrying about turbulence or changes in temperature.
2. The Moon is moving away from Earth at a rate of about 3.8 centimeters (1.5 inches) per year. This means that if you were to travel back in time to the moment when the Moon formed, it would be much closer to Earth than it is today.
3. The far side of the Moon, sometimes called the "dark side," is actually not dark all the time. In fact, there are periods of daylight and darkness just like on Earth, but the phases are opposite to those on Earth because the Moon takes about 27.3 days to complete one rotation on its axis. This means that the far side of the Moon experiences a "day" every 14 days, followed by a "night" that lasts for 14 days.
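
One caveat about the shell command built in the second cell: the model path and prompt are pasted into the command string directly, so a path containing spaces or a prompt containing an apostrophe would break it. A minimal sketch of a safer construction, using `shlex.quote` from the standard library, is shown here. `build_shell_command` is a hypothetical helper; the `--model` and `--prompt` flags are the ones already used in this notebook.

```python
import shlex


def build_shell_command(binary, model_path, prompt, output_path):
    """Return a shell command string with every argument safely quoted.

    Hypothetical helper: mirrors the notebook's command, including the
    stdout redirect to output_path.
    """
    parts = [
        shlex.quote(str(binary)),
        "--model", shlex.quote(str(model_path)),
        "--prompt", shlex.quote(prompt),
        ">", shlex.quote(str(output_path)),
    ]
    return " ".join(parts)
```

The returned string can be passed to `os.system` in the same way as `cmd`.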