# Llama CPP Python: Run LLMs on a Local Machine

## Table of Contents

1. **Generate Text**
2. **Streaming Response**
3. **Pulling Model from HuggingFace**
4. **Chat Completion**
5. **Generate Embeddings**

## Installation


* **pip install llama-cpp-python**

In [1]:
llm_model = "/home/user/rag_llm/models/qwen2_500m/qwen2-0_5b-instruct-q5_k_m.gguf"
huggingface_repo = "Qwen/Qwen2-0.5B-Instruct-GGUF"
huggingface_model = "qwen2-0_5b-instruct-q5_k_m.gguf"

In [2]:
import llama_cpp

llama_cpp.__version__

'0.2.77'

## 1. Generate Text

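* **TheBloke/Llama-2-7B-Chat-GGUF** is one example of a GGUF model repository. The cells below load the local Qwen2-0.5B-Instruct file whose path is stored in `llm_model`.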

In [3]:
from llama_cpp import Llama

llm = Llama(model_path=llm_model, verbose=False)

llm

<llama_cpp.llama.Llama at 0x7f640d912b60>

In [4]:
resp = llm("Q: Write a short paragraph introducing Elon Musk. A: ",
           max_tokens=256,
           stop=["Q:", "\n"])

In [5]:
resp

{'id': 'cmpl-bbe7d601-840d-4dc1-9ef5-557d4bdba716',
 'object': 'text_completion',
 'created': 1721190602,
 'model': '/home/user/rag_llm/models/qwen2_500m/qwen2-0_5b-instruct-q5_k_m.gguf',
 'choices': [{'text': " In 2012, when he was just 34 years old, Elon Musk started his business career by creating SpaceX. Initially, it was only a private startup company but in the next few years, it became one of the most successful companies in history. The company's first rocket successfully landed on Mars, and it has since built numerous satellites for companies around the world, including Amazon and Tesla.",
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 13, 'completion_tokens': 82, 'total_tokens': 95}}

In [6]:
resp["choices"][0]["text"]

" In 2012, when he was just 34 years old, Elon Musk started his business career by creating SpaceX. Initially, it was only a private startup company but in the next few years, it became one of the most successful companies in history. The company's first rocket successfully landed on Mars, and it has since built numerous satellites for companies around the world, including Amazon and Tesla."

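The response is a plain dictionary, so the generated text and the stop condition can be pulled out with small helpers. The sketch below is illustrative only: `completion_text` and `was_truncated` are hypothetical helper names, `sample` is a trimmed stand-in for the real response, and the `"length"` value for `finish_reason` is assumed to follow the OpenAI-style convention for hitting `max_tokens`.

```python
from typing import Any, Dict


def completion_text(resp: Dict[str, Any]) -> str:
    """Return the first choice's text with surrounding whitespace removed."""
    return resp["choices"][0]["text"].strip()


def was_truncated(resp: Dict[str, Any]) -> bool:
    """True when generation ended because max_tokens was reached.

    Assumption: finish_reason is "length" in that case and "stop" when a
    stop sequence or end-of-sequence token ended the generation.
    """
    return resp["choices"][0]["finish_reason"] == "length"


# A trimmed-down stand-in with the same shape as the response printed above.
sample = {
    "choices": [
        {"text": " Hello there.", "index": 0, "logprobs": None, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 3, "total_tokens": 16},
}

print(completion_text(sample))
print(was_truncated(sample))
```

With a real model, the same helpers would be called on the `resp` returned by `llm(...)`.
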
## 2. Streaming Response

In [7]:
resp = llm("Q: Write a short paragraph introducing Elon Musk. A: ",
           max_tokens=256,
           stop=["Q:", "\n"],
           stream=True)

resp

<generator object Llama._create_completion at 0x7f640d90c3c0>

In [8]:
for r in resp:
    print(r["choices"][0]["text"], end="")

 In the world of business, Elon Musk is an entrepreneur and inventor who has revolutionized the industry with his innovative ideas and technological advancements. Born on January 7, 1953, in San Francisco, California, Musk is widely recognized for his contributions to electric cars, space exploration, and cryptocurrencies. Through his companies such as Tesla, SpaceX, Neuralink, and The Boring Company, he has been pushing the boundaries of technology and transforming our world with his relentless pursuit of innovation and progress.
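
A streamed response is a generator, so it can be consumed only once. If the full text is needed after printing, the pieces have to be collected while iterating. The following is a minimal sketch under that assumption: `collect_stream` is a hypothetical helper, and `fake_stream` merely imitates the chunk shape used in the loop above so the example runs without a model.

```python
from typing import Any, Dict, Iterable, Iterator


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Print each streamed piece as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        piece = chunk["choices"][0]["text"]
        print(piece, end="", flush=True)  # show tokens as they arrive
        parts.append(piece)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


def fake_stream() -> Iterator[Dict[str, Any]]:
    """Stand-in generator yielding chunks shaped like the streamed ones."""
    for piece in [" Elon", " Musk", " is", " an", " entrepreneur."]:
        yield {"choices": [{"text": piece, "index": 0, "finish_reason": None}]}


full_text = collect_stream(fake_stream())
```

With a real model, `collect_stream(llm(prompt, stream=True))` would replace the call on `fake_stream()`.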

## 3. Pulling Model from HuggingFace

* **pip install huggingface-hub**

By default, `from_pretrained` downloads the model to the **huggingface cache directory**; you can then manage the installed model files with the **huggingface-cli tool**.

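Examples of GGUF repositories on HuggingFace (the cell below uses the Qwen2 repository stored in `huggingface_repo`):

* **TheBloke/Mistral-7B-v0.1-GGUF**
* **TheBloke/Mistral-7B-Instruct-v0.2-GGUF**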

In [9]:
from llama_cpp import Llama

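# Download the GGUF file named by huggingface_repo / huggingface_model
# (Qwen2-0.5B-Instruct here) and load it.
qwen = Llama.from_pretrained(repo_id=huggingface_repo,
                             filename=huggingface_model,
                             verbose=False)

resp = qwen("Q: Write a short paragraph introducing Elon Musk. A: ",
            max_tokens=256,
            stop=["Q:", "\n"])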

In [10]:
resp

{'id': 'cmpl-2459007e-09a6-4dcb-801e-0a60f7ce184b',
 'object': 'text_completion',
 'created': 1721190606,
 'model': '/home/user/.cache/huggingface/hub/models--Qwen--Qwen2-0.5B-Instruct-GGUF/snapshots/879a58402bc74bb2f48793a0d3ebab3a8a38463a/./qwen2-0_5b-instruct-q5_k_m.gguf',
 'choices': [{'text': ' Elon Musk is an American entrepreneur and inventor who founded SpaceX, which develops rockets and spacecraft for space exploration; Tesla, Inc., which produces electric cars and solar panels; Neuralink, which develops brain-computer interfaces that enable the transfer of electrical signals between humans and machines; The Boring Company, an architectural firm that designs tunnels for underground transportation systems; and PayPal, a multinational financial technology company. Musk is also known for his philanthropic work and has donated millions to various causes, including space exploration, renewable energy, and education.',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}]

## 4. Chat Completion

In [11]:
from llama_cpp import Llama

# Despite the variable name, this loads the Qwen2 model referenced by llm_model.
llama2_chat = Llama(model_path=llm_model, verbose=False)

resp = llama2_chat.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes individuals."},
          {
              "role": "user",
              "content": "Write a short paragraph introducing Elon Musk."
          }
      ]
)

resp

{'id': 'chatcmpl-cecc69e9-e432-459c-be31-0521e0068ed7',
 'object': 'chat.completion',
 'created': 1721190610,
 'model': '/home/user/rag_llm/models/qwen2_500m/qwen2-0_5b-instruct-q5_k_m.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': "Elon Musk is a visionary entrepreneur, inventor, and CEO of Tesla, SpaceX, Neuralink, and The Boring Company. He co-founded PayPal with Jack Ma in 1994 and has since become one of the world's most successful entrepreneurs, known for his innovative ideas that have transformed industries such as transportation, energy, and entertainment. His passion for technology and innovation is evident in his work at Tesla, where he has led the company to become a leader in electric vehicles and space exploration."},
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 30, 'completion_tokens': 101, 'total_tokens': 131}}

In [12]:
resp["choices"][0]["message"]

{'role': 'assistant',
 'content': "Elon Musk is a visionary entrepreneur, inventor, and CEO of Tesla, SpaceX, Neuralink, and The Boring Company. He co-founded PayPal with Jack Ma in 1994 and has since become one of the world's most successful entrepreneurs, known for his innovative ideas that have transformed industries such as transportation, energy, and entertainment. His passion for technology and innovation is evident in his work at Tesla, where he has led the company to become a leader in electric vehicles and space exploration."}

In [13]:
new_resp = llama2_chat.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes individuals."},
          {
              "role": "user",
              "content": "Write a short paragraph introducing Elon Musk."
          },
          resp["choices"][0]["message"],
          {
              "role": "user",
              "content": "Can you please rephrase your previous response?"
          }
      ]
)

new_resp

{'id': 'chatcmpl-77f6f32f-23dd-4c93-9061-569a96d7c0c6',
 'object': 'chat.completion',
 'created': 1721190612,
 'model': '/home/user/rag_llm/models/qwen2_500m/qwen2-0_5b-instruct-q5_k_m.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': 'Sure! Elon Musk is an entrepreneur who co-founded PayPal with Jack Ma in 1994. He is also known for his work at Tesla, where he has led the company to become a leader in electric vehicles and space exploration. His passion for technology and innovation is evident in his work at Tesla, which includes developing electric cars, solar panels, and other innovative technologies.'},
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 150, 'completion_tokens': 76, 'total_tokens': 226}}

In [14]:
new_resp["choices"][0]["message"]

{'role': 'assistant',
 'content': 'Sure! Elon Musk is an entrepreneur who co-founded PayPal with Jack Ma in 1994. He is also known for his work at Tesla, where he has led the company to become a leader in electric vehicles and space exploration. His passion for technology and innovation is evident in his work at Tesla, which includes developing electric cars, solar panels, and other innovative technologies.'}
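
The follow-up request above works because the assistant's previous message is appended to the message list before the next user turn. That pattern can be wrapped in a small helper. This is a sketch only: `extend_history` is a hypothetical name, and `fake_resp` stands in for a real `create_chat_completion` result so the example runs without a model.

```python
from typing import Any, Dict, List

Message = Dict[str, str]


def extend_history(history: List[Message],
                   resp: Dict[str, Any],
                   next_user_message: str) -> List[Message]:
    """Return a new message list with the assistant reply and the next user turn."""
    assistant_message = resp["choices"][0]["message"]
    return history + [
        assistant_message,
        {"role": "user", "content": next_user_message},
    ]


history = [
    {"role": "system", "content": "You are an assistant who perfectly describes individuals."},
    {"role": "user", "content": "Write a short paragraph introducing Elon Musk."},
]

# Stand-in for a chat completion response; only the fields used here are included.
fake_resp = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "(model reply)"},
         "finish_reason": "stop"}
    ]
}

messages = extend_history(history, fake_resp,
                          "Can you please rephrase your previous response?")
print([m["role"] for m in messages])
```

The resulting `messages` list could then be passed as the `messages` argument of the next `create_chat_completion` call. Note that every earlier turn is sent again, which is why `prompt_tokens` grows from 30 to 150 between the two responses above.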

## 5. Generate Embeddings

In [15]:
from llama_cpp import Llama

# embedding=True is required before create_embedding() can be called.
llama2_chat = Llama(model_path=llm_model, embedding=True, verbose=False)

embeddings = llama2_chat.create_embedding("Hello, world!")

embeddings

{'object': 'list',
 'data': [{'object': 'embedding',
   'embedding': [[1.1826443672180176,
     3.9176673889160156,
     -1.0761313438415527,
     -4.430361270904541,
     -2.239896297454834,
     6.736000061035156,
     1.6564898490905762,
     7.462552547454834,
     -5.367593765258789,
     -9.032063484191895,
     5.49091911315918,
     2.003977060317993,
     -6.431082725524902,
     1.6868746280670166,
     6.303713798522949,
     -1.8035049438476562,
     -11.924675941467285,
     -2.7077205181121826,
     -6.281509876251221,
     -1.788514256477356,
     -11.468496322631836,
     -9.651228904724121,
     -0.8674558401107788,
     -20.6488037109375,
     -3.1382431983947754,
     2.7511227130889893,
     -0.8014938235282898,
     11.905829429626465,
     5.377310276031494,
     0.046454690396785736,
     -0.07935859262943268,
     -3.314749240875244,
     1.216481328010559,
     3.964395046234131,
     2.5879321098327637,
     5.297309875488281,
     2.8436660766601562,
     -3.

In [16]:
len(embeddings["data"][0]["embedding"])

4

In [17]:
embeddings = llama2_chat.create_embedding(["Hello, world!", "Goodbye, world!"])

len(embeddings["data"]), len(embeddings["data"][0]["embedding"]), len(embeddings["data"][1]["embedding"])

(2, 4, 5)
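
The lengths above (4 and 5) differ between the two inputs, and the printed value is a nested list, which suggests that this model returns one vector per token rather than a single vector per input. If one fixed-size vector per text is needed, a common option is to average the token vectors (mean pooling). The sketch below uses that technique as an assumption about what is wanted, not as something the library does here; `mean_pool` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into one vector of the same dimension."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    # zip(*...) walks the vectors column by column, one dimension at a time.
    return [sum(column) / n for column in zip(*token_vectors)]


# Two made-up 3-dimensional token vectors standing in for
# embeddings["data"][0]["embedding"].
token_vectors = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
]

sentence_vector = mean_pool(token_vectors)
print(sentence_vector)  # [2.0, 3.0, 4.0]
```

With real output, `mean_pool(embeddings["data"][i]["embedding"])` would give one vector per input text, all of the same length.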

## Summary

In this video, I explained how you can run **open-source LLMs** on a **local machine** using the Python library **llama-cpp-python**, which is a wrapper around the **llama.cpp** library.