# Exercise: Setting Up and Running LLaMA with LangChain

This notebook guides you through setting up and running a local language model (here a Qwen2.5-Coder model in GGUF format, served by the llama.cpp runtime) using `llama-cpp-python`, `huggingface_hub`, and `langchain`.
It follows these steps:

1. **Install Dependencies**: Install the required packages at pinned versions.
2. **Load Models**: Download a GGUF model file from Hugging Face.
3. **Set Up the LLM Pipeline**: Configure the language model and define a simple chat interaction.

Follow the instructions in the code cells and ensure all dependencies are installed correctly before proceeding.


In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Install required dependencies
!pip3 install llama-cpp-python==0.3.4 --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
!pip3 install huggingface_hub==0.28.0
!pip3 install langchain==0.3.17 langchain-core==0.3.33 langchain-community==0.3.14
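
Before moving on, it can help to confirm that the pinned versions were actually installed. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_version` and the `EXPECTED` dictionary are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip commands in the install cell.
EXPECTED = {
    "llama-cpp-python": "0.3.4",
    "huggingface_hub": "0.28.0",
    "langchain": "0.3.17",
    "langchain-core": "0.3.33",
    "langchain-community": "0.3.14",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print one line per package; "CHECK" flags a missing or mismatched version.
for name, wanted in EXPECTED.items():
    found = installed_version(name)
    status = "OK" if found == wanted else "CHECK"
    print(f"{status:5} {name}: expected {wanted}, found {found}")
```

Any line marked `CHECK` suggests re-running the install cell (and restarting the runtime) before continuing.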

In [None]:
# Import necessary libraries for downloading models and setting up the chat system
from huggingface_hub import hf_hub_download
from langchain_community.chat_models import ChatLlamaCpp
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

In [None]:
# Download the model file (GGUF format); other models can be found by searching the Hugging Face Hub
model_path = hf_hub_download(
    repo_id="Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF",
    filename="qwen2.5-coder-0.5b-instruct-q4_k_m.gguf",
    force_download=False,
)

In [None]:
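# Create the LLM
# Optional: set n_gpu_layers (e.g. -1 for all layers) to offload work to the GPU
llm = ChatLlamaCpp(
    model_path=model_path,
    stop=["<|im_end|>\n"],  # stop at the model's end-of-turn marker
    n_ctx=2048,  # context window size, in tokens
    max_tokens=2048,  # upper bound on generated tokens
    streaming=True,  # emit tokens as they are generated
    n_batch=8,  # number of tokens processed in parallel
)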
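
The `stop` sequence is tied to the model's chat format. Qwen2.5 instruct models use ChatML-style turns delimited by `<|im_start|>` and `<|im_end|>`, so generation is cut off once the model emits its end-of-turn marker. The sketch below is plain Python for illustration only: `to_chatml` is a hypothetical helper, and in practice the chat template is normally applied by `llama-cpp-python` rather than by hand.

```python
def to_chatml(messages):
    """Render (role, content) pairs as a ChatML prompt string (illustrative only)."""
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt_text = to_chatml(
    [
        ("system", "You are an AI assistant that answers questions briefly."),
        ("user", "Tell me a joke about ice cream"),
    ]
)
print(prompt_text)
```

Each completed turn ends with `<|im_end|>` followed by a newline, which is the string passed as `stop`.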

In [None]:
# Create the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage("You are an AI assistant that answers questions briefly."),
        HumanMessagePromptTemplate.from_template("Tell me a joke about {topic}"),
    ]
)

In [None]:
# Create the chain
chain = prompt | llm | StrOutputParser()
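
The `|` operator builds a pipeline in which each step's output becomes the next step's input. The following sketch mimics that idea in plain Python with a hypothetical `Step` class. It is not LangChain's actual implementation, and the lambdas are stand-ins for the prompt, model, and parser.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: run self first, then feed the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the three stages of the chain (assumptions, not real components).
fake_prompt = Step(lambda inputs: f"Tell me a joke about {inputs['topic']}")
fake_llm = Step(lambda text: {"content": f"[model reply to: {text}]"})
fake_parser = Step(lambda message: message["content"])

fake_chain = fake_prompt | fake_llm | fake_parser
print(fake_chain.invoke({"topic": "ice cream"}))
```

The dictionary passed to `invoke` fills the template, the model stand-in returns a message-like object, and the parser stand-in extracts its text, mirroring the roles of the prompt, `llm`, and `StrOutputParser()`.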

In [None]:
# Call the chain
chain.invoke({"topic": "ice cream"})

In [None]:
# Now try the stream mode
for chunk in chain.stream({"topic": "ice cream"}):
    print(chunk, flush=True, end="")
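
Streaming yields the reply in small pieces rather than as one string. As a rough illustration, the sketch below uses a hypothetical generator, `fake_stream`, in place of `chain.stream` to show that joining the chunks reconstructs the full text, which is handy when the complete reply is needed after printing it incrementally.

```python
def fake_stream(text, chunk_size=4):
    """Hypothetical stand-in for chain.stream: yield the text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


reply = "Why did the ice cream go to school? To get a sundae education."

chunks = []
for chunk in fake_stream(reply):
    print(chunk, end="", flush=True)  # show each piece as it arrives
    chunks.append(chunk)  # keep it so the full reply can be rebuilt
print()

full_reply = "".join(chunks)
```

With the real chain, the same pattern applies: append each streamed string to a list inside the loop and join the list afterwards.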