# How to Run Llama 2 Locally with Python (Quickstart)

This Jupyter notebook accompanies a blog post on https://swharden.com:

https://swharden.com/blog/2023-07-29-ai-chat-locally-with-python/

In [2]:
from llama_cpp import Llama

from IPython.display import display, HTML
import json
import time
import pathlib

Load two different models so we can compare their responses to the same prompt.

Note that `n_ctx` is the maximum number of context tokens. The prompt and the response share this context window, so increasing `n_ctx` raises the maximum length of a response.
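As a rough illustration of that relationship, here is a minimal sketch of the token arithmetic, assuming the prompt and the response draw from the same context window. `response_budget` is a hypothetical helper written for this example and is not part of `llama_cpp`; in practice the prompt token count would come from the model's tokenizer.

```python
def response_budget(n_ctx: int, prompt_tokens: int) -> int:
    """Estimate how many tokens remain for the response (hypothetical helper).

    Assumes the context window of n_ctx tokens is shared by the prompt
    and the generated text.
    """
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens cannot be negative")
    # A prompt longer than the context window leaves no room for a response.
    return max(n_ctx - prompt_tokens, 0)


# With n_ctx=2048 and an illustrative 20-token prompt:
print(response_budget(2048, 20))
```

Under this assumption, a longer prompt leaves less room for the answer, and raising `n_ctx` raises the ceiling for both.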

In [3]:
MODEL_Q8_0 = Llama(
    model_path="../models/llama-2-7b-chat.ggmlv3.q8_0.bin",
    n_ctx=2048)

MODEL_Q2_K = Llama(
    model_path="../models/llama-2-7b-chat.ggmlv3.q2_K.bin",
    n_ctx=2048)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
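Both `Llama(...)` calls expect a model file to exist at the given relative path. The sketch below shows one possible way to check the paths before loading; `missing_model_files` is a hypothetical helper for this example, not something provided by `llama_cpp`.

```python
import pathlib


def missing_model_files(paths):
    """Return the paths that do not point at an existing file (hypothetical helper)."""
    return [pathlib.Path(p) for p in paths if not pathlib.Path(p).is_file()]


# The two model paths used in this notebook.
model_paths = [
    "../models/llama-2-7b-chat.ggmlv3.q8_0.bin",
    "../models/llama-2-7b-chat.ggmlv3.q2_K.bin",
]

for path in missing_model_files(model_paths):
    print(f"Model file not found: {path}")
```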


In [4]:
def query(model, question):
    """Ask the model a question, then display the answer, timing, and raw output."""
    model_name = pathlib.Path(model.model_path).name
    time_start = time.perf_counter()
    # Simple question/answer prompt format
    prompt = f"Q: {question} A:"
    # max_tokens=0 sets no fixed limit, so the response length is bounded by n_ctx
    output = model(prompt=prompt, max_tokens=0)
    response = output["choices"][0]["text"]
    time_elapsed = time.perf_counter() - time_start
    display(HTML(f'<code>{model_name} response time: {time_elapsed:.2f} sec</code>'))
    display(HTML(f'<strong>Question:</strong> {question}'))
    display(HTML(f'<strong>Answer:</strong> {response}'))
    # Print the full completion dictionary for inspection
    print(json.dumps(output, indent=2))
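The `query` function places the model's text directly inside an HTML string, so any `<` or `&` in the response is interpreted as markup and newlines collapse when rendered. A minimal sketch of one way to handle this, using the standard library's `html.escape`, is shown below. `answer_to_html` is a hypothetical helper for illustration and is not part of the original notebook.

```python
import html


def answer_to_html(text: str) -> str:
    """Prepare model text for display inside HTML (hypothetical helper).

    Escapes characters that HTML treats as markup and turns newlines
    into <br> tags so numbered lists keep their line breaks.
    """
    escaped = html.escape(text.strip())
    return escaped.replace("\n", "<br>")


sample = " Reasons:\n\n1. Version control\n2. Use of <b> tags"
print(answer_to_html(sample))
```

The result could be passed to `HTML(...)` in place of the raw `response` string.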

In [5]:
query(MODEL_Q2_K, "Why are Jupyter notebooks difficult to maintain?")

{
  "id": "cmpl-1206f5b9-fe09-47bc-9b13-9afa63230e98",
  "object": "text_completion",
  "created": 1690691967,
  "model": "../models/llama-2-7b-chat.ggmlv3.q2_K.bin",
  "choices": [
    {
      "text": " Jupyter notebooks can be challenging to maintain for several reasons:\n\n1. Temporary file system: Jupyter notebooks reside on the user's local disk, which can make them unreliable and difficult to maintain over time.\n2. Version control issues: As notebooks are updated frequently, it becomes challenging to keep track of changes and manage different versions of a notebook. This can lead to conflicts when working with others on the same project.\n3. Lack of organization: Since Jupyter notebooks are created manually, it's easy to lose track of which files are included in the notebook and how they relate to each other. This makes it challenging to maintain a well-organized structure for large projects with many components.\n4. Limited automation: There are limited tools available for auto

In [6]:
query(MODEL_Q8_0, "Why are Jupyter notebooks difficult to maintain?")

{
  "id": "cmpl-33c4dbf2-4d57-4739-83bd-bbcf2b852e60",
  "object": "text_completion",
  "created": 1690692046,
  "model": "../models/llama-2-7b-chat.ggmlv3.q8_0.bin",
  "choices": [
    {
      "text": " Jupyter notebooks can be challenging to maintain for several reasons. Here are some of the common issues and their solutions: 1. Version Control: Jupyter notebooks are created using Python code, which is saved as a file on disk. However, tracking changes to the code and collaborating with others can be difficult without proper version control. Solution: Use a version control system like Git or Mercurial to manage your notebooks. 2. Saving Notebooks: Jupyter notebooks are often created in an ad-hoc manner, which can make it hard to save them properly. Solution: Create a naming convention for your notebooks and use a tool like iPython Notebook to create backup copies of your notebooks. 3. Organizing Notebooks: As your collection of Jupyter notebooks grows, it can be challenging to organi