<a href="https://colab.research.google.com/github/jchen8000/DemystifyingLLMs/blob/main/6_Deployment/Chatbot_HuggingFace_Hosted_LLM_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6. Deployment of LLMs

# 6.11 Chatbot, Example of LLM-Powered Application

### **Simple Chatbot Using Hugging Face Hosted LLM (Without LangChain)**

This notebook demonstrates how to build a minimal chatbot using the **Hugging Face Inference Client** directly, without relying on LangChain or other orchestration frameworks.  
It shows how to authenticate securely, send chat-formatted messages to a hosted large language model, maintain conversation history, and produce responses using a fully remote, cloud-hosted LLM.  
The example uses the instruction-tuned **Qwen2.5-7B-Instruct** model served through Hugging Face's hosted inference API.
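
As a rough illustration of what "chat-formatted messages" and "conversation history" mean here, the sketch below turns a list of earlier `(user, assistant)` turns into the role-tagged message list that chat-completion endpoints expect. The helper name `build_messages` and the `max_turns` parameter are hypothetical and are not part of this notebook's code; trimming old turns is one possible way to keep requests short, not something the notebook itself does.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful AI assistant."


def build_messages(message, history, system_prompt=DEFAULT_SYSTEM_PROMPT, max_turns=None):
    """Return a chat-formatted message list (hypothetical helper).

    history is a sequence of (user_text, assistant_text) pairs.
    max_turns, if given, keeps only the most recent turns (an assumption
    made for illustration; the notebook sends the full history).
    """
    if max_turns is None:
        recent = list(history)
    else:
        # Slice from an explicit start index so that max_turns=0 yields no history.
        start = max(len(history) - max_turns, 0)
        recent = list(history[start:])

    # The system prompt always comes first.
    messages = [{"role": "system", "content": system_prompt}]

    # Each earlier turn becomes a user message followed by an assistant message.
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})

    # The new user message goes last.
    messages.append({"role": "user", "content": message})
    return messages


example = build_messages("What is the next largest city?", [("Hello", "Hello! How can I assist you today?")])
for entry in example:
    print(entry["role"], "->", entry["content"])
```

Because the hosted endpoint is stateless, the whole list is sent again with every request, so the model only "remembers" what is replayed to it.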

A **Hugging Face access token** is required to run this notebook.  
If you do not already have one, you can obtain it here:  
🔗 https://huggingface.co/docs/hub/security-tokens  
The token should be stored securely as `HF_TOKEN` in **Colab Secrets** or another protected environment variable.
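
One way to support both storage options is sketched below: try Colab Secrets first, then fall back to an environment variable. The helper name `load_hf_token` is hypothetical, and the fallback order is an assumption made for illustration.

```python
import os


def load_hf_token(env=None):
    """Return the Hugging Face token from Colab Secrets or the environment.

    Hypothetical helper: env defaults to os.environ and is a parameter only
    so the lookup can be exercised without touching real variables.
    """
    env = os.environ if env is None else env

    token = None
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get("HF_TOKEN")
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook: fall through to the environment variable.
        token = None

    token = token or env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN not found in Colab Secrets or environment variables")
    return token
```

Outside Colab, the token would then be supplied by exporting `HF_TOKEN` in the shell before starting the notebook server, which keeps it out of the notebook file itself.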

---

### ⚠️ **Compatibility Note**

This notebook is confirmed working as of **January 2026**, including the currently available `Qwen/Qwen2.5-7B-Instruct` model. However, Hugging Face client libraries, model endpoints, and hosted model availability evolve rapidly, and future updates or model deprecations may require adjustments to this notebook. This example serves as a learning-oriented reference illustrating current best practices, not a permanently stable API contract.
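
When the notebook stops working, a first step is to check whether the installed packages still match the versions pinned in the install cell (`huggingface_hub==0.36.0`, `transformers==4.57.3`). The sketch below does this with the standard library; the helper name `check_pins` is hypothetical, and a version mismatch is only one of several possible causes of breakage.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by this notebook's install cell.
PINNED = {"huggingface_hub": "0.36.0", "transformers": "4.57.3"}


def check_pins(pinned, get_version=version):
    """Map each package to (installed_version_or_None, matches_pin).

    Hypothetical helper; get_version is a parameter so the check can be
    exercised without the packages being installed.
    """
    report = {}
    for package, expected in pinned.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (installed, installed == expected)
    return report


for package, (installed, ok) in check_pins(PINNED).items():
    status = "matches pin" if ok else "differs from pin"
    print(f"{package}: installed={installed}, pinned={PINNED[package]} ({status})")
```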

---

In [None]:
%pip install -q \
  huggingface_hub==0.36.0 \
  transformers==4.57.3

In [None]:
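import os
from google.colab import userdata
from huggingface_hub import InferenceClient

# Read the token from Colab Secrets. huggingface_hub looks for the token in the
# HF_TOKEN environment variable, so export it under that name and also pass it
# to the client explicitly.
hf_token = userdata.get("HF_TOKEN")
os.environ["HF_TOKEN"] = hf_token
client = InferenceClient(token=hf_token)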

In [None]:
MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"

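def ask_model(message, history, temperature=0.7, max_tokens=512):
    """Send the conversation so far plus the new user message; return the reply text."""
    # Every request starts with the system prompt.
    messages = [{"role": "system", "content": "You are a helpful AI assistant."}]

    # Replay earlier turns so the model sees the whole conversation.
    for user_msg, bot_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})

    messages.append({"role": "user", "content": message})

    try:
        # The model ID is passed per request rather than to the InferenceClient
        # constructor, which lets the client's automatic provider selection
        # resolve a provider for this model.
        response = client.chat.completions.create(
            model=MODEL_ID,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
        )
        return response.choices[0].message.content
    except Exception as e:
        # Any failure (network, authentication, rate limit, unavailable model)
        # is returned as text so the chat loop keeps running.
        return f"System Busy: {e}. Please try again in a moment."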

In [None]:
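def chatbot():
    print("Chatbot initialized. You can start chatting now (type 'quit' to stop)!\n")
    history = []  # list of (user_message, assistant_reply) pairs

    while True:
        user_input = input("You: ")
        # Ignore surrounding whitespace and case when checking for the stop words.
        if user_input.strip().lower() in ["quit", "exit"]:
            break

        answer = ask_model(user_input, history)
        print(f"Chatbot: {answer}\n")
        # Store the turn so the next request includes it as context.
        history.append((user_input, answer))

chatbot()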

Chatbot initialized. You can start chatting now (type 'quit' to stop)!

You: Hello
Chatbot: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask. I'm here to help.

You: How are you?
Chatbot: I'm just a computer program, so I don't have feelings or emotions like humans do. I'm here to provide information and help you with your questions to the best of my ability. How can I assist you today?

You: What are the top 5 largest cities in Canada?
Chatbot: The top 5 largest cities in Canada by population (as of 2021) are:

1. Toronto (Toronto-Durham Region CMA) - 6,417,516
2. Montreal (CMA) - 4,340,395
3. Vancouver (CMA) - 2,642,811
4. Calgary (CMA) - 1,388,988
5. Ottawa-Gatineau (CMA) - 1,429,629 (split between Ontario and Quebec)

These population numbers are for the entire metropolitan areas, not just the city proper. The cities themselves have smaller populations.

You: What is the next largest city?
Chatbot: The next largest city in C