# Llama CPP Python: Run LLMs on a Local Machine

## Table of Contents

1. **Generate Text**
2. **Streaming Response**
3. **Pulling Model from HuggingFace**
4. **Chat Completion**
5. **Generate Embeddings**

## Installation


* **pip install llama-cpp-python**

In [1]:
import llama_cpp

llama_cpp.__version__

'0.2.60'

## 1. Generate Text

* **TheBloke/Llama-2-7B-Chat-GGUF**

In [10]:
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.Q4_K_S.gguf", verbose=False)

llm

<llama_cpp.llama.Llama at 0x7f1cea183700>

In [14]:
resp = llm("Q: Write a short paragraph introducing Elon Musk. A: ",
           max_tokens=256,
           stop=["Q:", "\n"])

Llama.generate: prefix-match hit

llama_print_timings:        load time =    1475.44 ms
llama_print_timings:      sample time =      66.06 ms /   163 runs   (    0.41 ms per token,  2467.38 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   31345.05 ms /   163 runs   (  192.30 ms per token,     5.20 tokens per second)
llama_print_timings:       total time =   31706.84 ms /   164 tokens
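The per-token figures in the timing log follow directly from the totals. As a quick check, this small sketch recomputes them from the eval line above (the variable names are only illustrative):

```python
# Figures copied from the "eval time" line of the timing log above.
eval_time_ms = 31345.05   # total time spent generating tokens, in milliseconds
eval_runs = 163           # number of tokens generated

# Average latency per generated token.
ms_per_token = eval_time_ms / eval_runs

# Throughput: tokens generated per second of eval time.
tokens_per_second = eval_runs / (eval_time_ms / 1000)

print(f"{ms_per_token:.2f} ms per token")
print(f"{tokens_per_second:.2f} tokens per second")
```

Both values round to the ones reported in the log (192.30 ms per token, 5.20 tokens per second).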


In [15]:
resp

{'id': 'cmpl-36d2bf79-1e4a-44c4-bb80-6190f7a38113',
 'object': 'text_completion',
 'created': 1712551672,
 'model': './llama-2-7b-chat.Q4_K_S.gguf',
 'choices': [{'text': ' Elon Musk is a South African-born entrepreneur, inventor, and business magnate known for his innovative ideas and ambitious ventures in various industries. He co-founded PayPal, which was later sold to eBay for $1.5 billion, and went on to found SpaceX and Tesla, revolutionizing the aerospace and electric vehicle sectors respectively. Musk has been instrumental in shaping the future of transportation, energy, and space exploration through his visionary leadership and technological advancements. With a unique ability to identify emerging trends and capitalize on them, he continues to inspire and influence the global business landscape as a pioneer of cutting-edge technologies and sustainable solutions.',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 17, 'completion_toke

In [16]:
resp["choices"][0]["text"]

' Elon Musk is a South African-born entrepreneur, inventor, and business magnate known for his innovative ideas and ambitious ventures in various industries. He co-founded PayPal, which was later sold to eBay for $1.5 billion, and went on to found SpaceX and Tesla, revolutionizing the aerospace and electric vehicle sectors respectively. Musk has been instrumental in shaping the future of transportation, energy, and space exploration through his visionary leadership and technological advancements. With a unique ability to identify emerging trends and capitalize on them, he continues to inspire and influence the global business landscape as a pioneer of cutting-edge technologies and sustainable solutions.'
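Because the response is a plain dictionary, the lookup above is easy to wrap in small helpers. The sketch below is one possible way to do it; `completion_text` and `hit_token_limit` are hypothetical names, and the sample dictionary is a shortened stand-in with the same shape as the output shown earlier, so the code runs without loading a model.

```python
def completion_text(response: dict) -> str:
    """Return the text of the first choice with surrounding whitespace removed."""
    return response["choices"][0]["text"].strip()


def hit_token_limit(response: dict) -> bool:
    """True when generation stopped because max_tokens was reached ('length')
    rather than because a stop sequence or end-of-text was hit ('stop')."""
    return response["choices"][0]["finish_reason"] == "length"


# Illustrative stand-in for a real response (values are made up).
sample = {
    "object": "text_completion",
    "choices": [
        {
            "text": " Elon Musk is a South African-born entrepreneur.",
            "index": 0,
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
}

print(completion_text(sample))
print(hit_token_limit(sample))
```

With a real model, `completion_text(resp)` would replace `resp["choices"][0]["text"]` and also drop the leading space.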

## 2. Streaming Response

In [26]:
resp = llm("Q: Write a short paragraph introducing Elon Musk. A: ",
           max_tokens=256,
           stop=["Q:", "\n"],
           stream=True)

resp

<generator object Llama._create_completion at 0x7f82b43a3e60>

In [27]:
for r in resp:
    print(r["choices"][0]["text"], end="")

 Sure, here is a short paragraph introducing Elon Musk:  Elon Musk is a South African-born entrepreneur and business magnate known for his innovative ideas and cutting-edge technology ventures. He is best known as the CEO of SpaceX and Tesla, Inc., where he has revolutionized the electric car industry and made space travel more accessible than ever before. With a net worth of over $200 billion, Musk has become one of the richest people in the world, and his vision for the future includes establishing a human settlement on Mars and developing sustainable energy solutions to combat climate change.
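The loop above prints each chunk as it arrives but does not keep the text. If the full response is also needed afterwards, the chunks can be collected while printing. This is a minimal sketch: `print_and_collect` is a hypothetical helper, and `fake_chunks` stands in for the generator returned by `llm(..., stream=True)` so the example runs without a model.

```python
from typing import Iterable


def print_and_collect(chunks: Iterable[dict]) -> str:
    """Print each streamed piece immediately and return the joined text."""
    pieces = []
    for chunk in chunks:
        piece = chunk["choices"][0]["text"]
        print(piece, end="", flush=True)  # flush so text appears as it streams
        pieces.append(piece)
    print()  # final newline after the stream ends
    return "".join(pieces)


# Illustrative chunks with the same shape as the streamed responses.
fake_chunks = [
    {"choices": [{"text": " Elon", "index": 0, "finish_reason": None}]},
    {"choices": [{"text": " Musk", "index": 0, "finish_reason": None}]},
    {"choices": [{"text": " is an entrepreneur.", "index": 0, "finish_reason": "stop"}]},
]

full_text = print_and_collect(fake_chunks)
```

Note that a generator can only be consumed once, so the text has to be collected during the first pass over `resp`.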

## 3. Pulling Model from HuggingFace

* **pip install huggingface-hub**

By default, from_pretrained downloads the model to the **Hugging Face cache directory**. You can then manage the downloaded model files with the **huggingface-cli tool**.

* **TheBloke/Mistral-7B-v0.1-GGUF**
* **TheBloke/Mistral-7B-Instruct-v0.2-GGUF**

In [28]:
from llama_cpp import Llama

mistral = Llama.from_pretrained(
    repo_id="TheBloke/Mistral-7B-v0.1-GGUF",
    filename="*Q4_K_S.gguf",
    verbose=False,
)

resp = mistral("Q: Write a short paragraph introducing Elon Musk. A: ",
               max_tokens=256,
               stop=["Q:", "\n"])

mistral-7b-v0.1.Q4_K_S.gguf:   0%|          | 0.00/4.14G [00:00<?, ?B/s]

In [31]:
resp

{'id': 'cmpl-bf878ce0-865c-4bb5-a985-083dae3da9a6',
 'object': 'text_completion',
 'created': 1712553512,
 'model': '/home/sunny/.cache/huggingface/hub/models--TheBloke--Mistral-7B-v0.1-GGUF/snapshots/d4ae605152c8de0d6570cf624c083fa57dd0d551/./mistral-7b-v0.1.Q4_K_S.gguf',
 'choices': [{'text': '38 years old, Elon Musk is the founder and CEO of two leading high-tech companies in Silicon Valley; Tesla Motors and SpaceX. He has developed an electric sports car called TESLA ROADSTER which can reach speed upto 250 km per hour and accelerate from 0 to 100 km per hour within four seconds. In order to reach this speed it takes only 19 seconds for Tesla Roadster to go from 0 to 160 km per hour. Its top speed is about 370 km per hour.',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 16, 'completion_tokens': 125, 'total_tokens': 141}}
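The `filename="*Q4_K_S.gguf"` argument used above is a glob pattern, not an exact file name. To illustrate how such a pattern picks out one quantization, the sketch below applies Python's `fnmatch` to a list of file names. The list is hypothetical apart from `mistral-7b-v0.1.Q4_K_S.gguf`, which appears in the download output above, and the library's own matching logic may differ in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical listing of GGUF files in a model repository.
repo_files = [
    "mistral-7b-v0.1.Q2_K.gguf",
    "mistral-7b-v0.1.Q4_K_M.gguf",
    "mistral-7b-v0.1.Q4_K_S.gguf",
    "mistral-7b-v0.1.Q5_K_S.gguf",
]

pattern = "*Q4_K_S.gguf"

# Keep only the files whose names match the glob pattern (case-sensitive).
matches = [name for name in repo_files if fnmatchcase(name, pattern)]

print(matches)
```

A pattern that matches several files (for example `*Q4_K_*.gguf` in this listing) would be ambiguous, so it is safer to keep the pattern specific to one quantization.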

## 4. Chat Completion

In [32]:
from llama_cpp import Llama

llama2_chat = Llama(model_path="./llama-2-7b-chat.Q4_K_S.gguf", verbose=False)

resp = llama2_chat.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who perfectly describes individuals."},
        {"role": "user", "content": "Write a short paragraph introducing Elon Musk."},
    ]
)

resp

{'id': 'chatcmpl-3ebfd942-727b-4522-b274-7da696cdbb8f',
 'object': 'chat.completion',
 'created': 1712553589,
 'model': './llama-2-7b-chat.Q4_K_S.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': "  Ah, the visionary entrepreneur and innovator, Elon Musk! This man is truly a force to be reckoned with. With his piercing blue eyes and unruly mop of hair, he exudes an unmistakable air of ambition and drive. His sharp jawline and chiseled features give him a distinctly angular, futuristic look, as if he's ready to take on the world at any moment. But it's not just his physical appearance that sets him apart - it's his unwavering commitment to pushing the boundaries of technology and innovation. From revolutionizing the electric car industry with Tesla to making humanity a multiplanetary species with SpaceX, Elon Musk is a true pioneer in every sense of the word. With his unrelenting drive and unparalleled vision, he's changing the world one groundbreaking 

In [38]:
resp["choices"][0]["message"]

{'role': 'assistant',
 'content': "  Ah, the visionary entrepreneur and innovator, Elon Musk! This man is truly a force to be reckoned with. With his piercing blue eyes and unruly mop of hair, he exudes an unmistakable air of ambition and drive. His sharp jawline and chiseled features give him a distinctly angular, futuristic look, as if he's ready to take on the world at any moment. But it's not just his physical appearance that sets him apart - it's his unwavering commitment to pushing the boundaries of technology and innovation. From revolutionizing the electric car industry with Tesla to making humanity a multiplanetary species with SpaceX, Elon Musk is a true pioneer in every sense of the word. With his unrelenting drive and unparalleled vision, he's changing the world one groundbreaking idea at a time."}

In [39]:
new_resp = llama2_chat.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who perfectly describes individuals."},
        {"role": "user", "content": "Write a short paragraph introducing Elon Musk."},
        resp["choices"][0]["message"],
        {"role": "user", "content": "Can you please rephrase your previous response?"},
    ]
)

new_resp

{'id': 'chatcmpl-1d13f224-51c3-4b94-b094-d9ea9dbefbf6',
 'object': 'chat.completion',
 'created': 1712553725,
 'model': './llama-2-7b-chat.Q4_K_S.gguf',
 'choices': [{'index': 0,
   'message': {'role': 'assistant',
    'content': "  Of course! Here's a rephrased version of my previous response:\nElon Musk is a true pioneer in every sense of the word. With his piercing blue eyes and unruly mop of hair, he exudes an unmistakable air of ambition and drive. His sharp jawline and chiseled features give him a distinctly angular, futuristic look, as if he's ready to take on the world at any moment. But it's not just his physical appearance that sets him apart - it's his unwavering commitment to pushing the boundaries of technology and innovation. From revolutionizing the electric car industry with Tesla to making humanity a multiplanetary species with SpaceX, Elon Musk is changing the world one groundbreaking idea at a time. With his unrelenting drive and unparalleled vision, he's redefining 

In [41]:
new_resp["choices"][0]["message"]

{'role': 'assistant',
 'content': "  Of course! Here's a rephrased version of my previous response:\nElon Musk is a true pioneer in every sense of the word. With his piercing blue eyes and unruly mop of hair, he exudes an unmistakable air of ambition and drive. His sharp jawline and chiseled features give him a distinctly angular, futuristic look, as if he's ready to take on the world at any moment. But it's not just his physical appearance that sets him apart - it's his unwavering commitment to pushing the boundaries of technology and innovation. From revolutionizing the electric car industry with Tesla to making humanity a multiplanetary species with SpaceX, Elon Musk is changing the world one groundbreaking idea at a time. With his unrelenting drive and unparalleled vision, he's redefining what's possible and inspiring a new generation of thinkers and doers to join him on his mission to shape the future of humanity."}
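The follow-up call above works because the model keeps no state between calls: the whole conversation, including the assistant's earlier reply, is sent again each time. Rather than rebuilding the message list by hand, the history can be kept in one list. The sketch below shows one way to do that; `ask` and `fake_chat` are hypothetical names, and `fake_chat` only imitates the response shape so the example runs without a model.

```python
from typing import Callable


def ask(chat_fn: Callable[..., dict], history: list, user_content: str) -> dict:
    """Append a user turn, call the chat function, and record the reply."""
    history.append({"role": "user", "content": user_content})
    response = chat_fn(messages=history)
    reply = response["choices"][0]["message"]
    history.append(reply)  # keep the assistant turn for the next call
    return reply


def fake_chat(messages: list) -> dict:
    """Stand-in for a chat completion call: echoes the last user message."""
    content = f"reply to: {messages[-1]['content']}"
    return {"choices": [{"index": 0,
                         "message": {"role": "assistant", "content": content}}]}


history = [
    {"role": "system",
     "content": "You are an assistant who perfectly describes individuals."}
]

ask(fake_chat, history, "Write a short paragraph introducing Elon Musk.")
ask(fake_chat, history, "Can you please rephrase your previous response?")

print([message["role"] for message in history])
```

With the model loaded earlier, `llama2_chat.create_chat_completion` would be passed in place of `fake_chat`. The history grows with every turn, so long conversations will eventually approach the model's context window.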

## 5. Generate Embeddings

In [4]:
from llama_cpp import Llama

llama2_chat = Llama(model_path="./llama-2-7b-chat.Q4_K_S.gguf", embedding=True, verbose=False)

embeddings = llama2_chat.create_embedding("Hello, world!")

embeddings

{'object': 'list',
 'data': [{'object': 'embedding',
   'embedding': [0.0031888174513418246,
    -0.005338107164289213,
    0.006393079045987913,
    0.0028734011445135433,
    -0.0261840098253423,
    -0.002558798244006564,
    -0.004504494117532277,
    0.02568239549391057,
    0.004633687720556026,
    0.005226306354140698,
    0.018378643351279995,
    -0.008275122978843091,
    0.009642914311027886,
    0.004555599202739593,
    0.006815992412490704,
    -0.037152190183151614,
    0.006974731048464214,
    0.015619586233769092,
    -0.016444021639667357,
    -0.00929401890513487,
    0.007328351262221089,
    -0.024261604821921294,
    -8.593770388037735e-06,
    -0.014986538535901333,
    0.01714968562025499,
    0.00020696788598714944,
    -0.009076134800973697,
    -0.01066185304817868,
    -0.005484406475604208,
    -0.0017243357794410365,
    0.005974833574089348,
    -0.0005566141345380172,
    -0.01825016242314457,
    0.032526904975077486,
    0.012484579262095804,
    -0.

In [6]:
len(embeddings["data"][0]["embedding"])

4096

In [9]:
embeddings = llama2_chat.create_embedding(["Hello, world!", "Goodbye, world!"])

len(embeddings["data"]), len(embeddings["data"][0]["embedding"]), len(embeddings["data"][1]["embedding"])

(2, 4096, 4096)
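A common next step with embeddings is to compare two texts by the cosine similarity of their vectors. The following is a minimal pure-Python sketch. `cosine_similarity` is a hypothetical helper, and the short toy vectors stand in for the 4096-dimensional embeddings above so the example runs without a model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(same_direction, orthogonal)
```

With the response above, the two inputs would be `embeddings["data"][0]["embedding"]` and `embeddings["data"][1]["embedding"]`.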

## Summary

In this video, I explained how you can run **open-source LLMs** on a **local machine** using the Python library **llama-cpp-python**, which is a wrapper around the **llama.cpp** library.