# 📘 Phi-3 Instruct: Chat Completion Demo  
A clean, beginner-friendly, line-by-line explanation.

---

## 🧩 Overview  
In this notebook, we will:  
- Load the **Phi-3 Mini Instruct** model  
- Build a reusable chat function  
- Understand each step with clear explanations  
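
Before loading anything, it may help to see what "chat messages" look like as plain data. The sketch below builds a message list in role/content form and renders it with a hypothetical `render_prompt` helper. The marker strings (`<|user|>`, `<|end|>`, `<|assistant|>`) are assumptions that only approximate a Phi-3-style prompt; in real use the formatting should come from the tokenizer's own chat template.

```python
# Illustrative only: render_prompt is a hypothetical helper, and the marker
# strings below are assumed, not read from the real tokenizer.
from typing import Dict, List

Message = Dict[str, str]


def render_prompt(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Join role/content messages into a single prompt string."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next
        parts.append("<|assistant|>\n")
    return "".join(parts)


conversation = [{"role": "user", "content": "What is Generative AI?"}]
prompt = render_prompt(conversation)
print(prompt)
```

The point of the sketch is the shape of the data: a list of dictionaries, each with a `role` and a `content` key, turned into one string that ends with an open assistant turn.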

---

## 🛠️ Install & Import Libraries
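
The notebook imports two packages, `torch` and `transformers`, which can be installed with `pip install torch transformers`. As a minimal sketch, the hypothetical helper `missing_packages` below uses only the standard library to report which required packages are not yet importable. The helper name and the package list are illustrative and not part of the original notebook.

```python
# Hypothetical helper: checks whether packages can be found
# without actually importing them.
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that are not on the import path."""
    return [name for name in names if find_spec(name) is None]


required = ["torch", "transformers"]
missing = missing_packages(required)
if missing:
    print("Install these first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```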

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_phi3_instruct(model_id: str = "microsoft/Phi-3-mini-4k-instruct"):
    """
    Load the Phi-3 instruct model and its tokenizer.
    Returns (tokenizer, model, device), with the model placed on the GPU
    when one is available and on the CPU otherwise.
    """
    # Detect GPU or use CPU as fallback
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Load the model and move it to device
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",
        trust_remote_code=False,
    ).to(device)

    return tokenizer, model, device

def chat_with_model(tokenizer, model, device, chat_history, max_new_tokens: int = 128):
    """
    Given a list of chat messages, run one completion and return the model's reply text.
    """
    # Convert structured messages into a chat-formatted text prompt
    prompt_text = tokenizer.apply_chat_template(
        chat_history,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Tokenize prompt and move to GPU/CPU
    inputs = tokenizer(prompt_text, return_tensors="pt").to(device)

    # Generate the model's continuation (greedy decoding, since do_sample=False)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )

    # Extract only the newly generated tokens
    generated_ids = output_ids[0, inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(generated_ids, skip_special_tokens=True)

    return reply.strip()

if __name__ == "__main__":
    # Step 1: Load Phi-3 model and tokenizer
    tokenizer, model, device = load_phi3_instruct()

    # Step 2: Chat-style messages
    conversation = [
        {"role": "user", "content": "What is Generative AI?"}
    ]

    # Step 3: Generate model response
    answer = chat_with_model(tokenizer, model, device, conversation, max_new_tokens=100)

    # Step 4: Print output
    print(answer)
```