## What are Hugging Face and the Hugging Face Hub?

Hugging Face is an open-source-first AI company that maintains Transformers, Diffusers, and other widely used libraries.
The Hub (https://huggingface.co/) is a **GitHub for models & datasets**:
- Free hosting for models, datasets, Spaces (demo apps).
- Built-in versioning, README rendering, and model cards.
- A REST API and a Python SDK (`huggingface_hub`) for uploads and downloads, inference, and gated-access control.

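Every Hub repo is a Git repository, so you can clone and push it with plain Git; from Python you would more typically download files with `hf_hub_download()` and publish a model with `model.push_to_hub()`.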

In [None]:
!pip install -q huggingface_hub
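
As a rough illustration of the download side, the sketch below builds the public URL of a single file in a Hub repo and wraps `hf_hub_download()`. The helper names `hub_file_url` and `fetch_config` are hypothetical and exist only for this example; the `resolve/<revision>/<filename>` URL layout and the `repo_id`, `filename`, and `revision` arguments are the ones the Hub and `huggingface_hub` use.

```python
def hub_file_url(repo_id, filename, revision="main"):
    """Build the direct-download URL of one file in a Hub model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def fetch_config(repo_id, revision="main"):
    """Download config.json into the local cache and return its local path.

    Needs network access (and a token for gated repos), so it is only
    defined here, not called.
    """
    from huggingface_hub import hf_hub_download  # imported lazily on purpose

    return hf_hub_download(repo_id=repo_id, filename="config.json", revision=revision)


print(hub_file_url("mistralai/Mistral-7B-Instruct-v0.3", "config.json"))
```

Pinning `revision` to a commit hash or tag instead of `"main"` is what makes the Hub's built-in versioning useful for reproducible runs.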


## What are providers in the HF InferenceClient, and what is Novita?

The `InferenceClient` can route a request to different providers (back-ends):
- `hf-inference`: Hugging Face's own serverless Inference API. Which provider is used by default depends on your `huggingface_hub` version.
- `novita`: Novita AI, a third-party back-end with its own latency and pricing, reached through the same OpenAI-style `/chat/completions` interface.

Switching providers takes a single argument:

```python
client = InferenceClient(provider="novita", api_key=HF_TOKEN)
```
The rest of the client code stays the same, so one code path can talk to multiple vendors.

In [None]:
import os, getpass
from huggingface_hub import InferenceClient

# Enter your personal Hugging Face token
os.environ["HF_TOKEN"] = getpass.getpass("👉  Enter your HF token: ")

# Use the "novita" provider for the Mistral chat model
client = InferenceClient(provider="novita", api_key=os.environ["HF_TOKEN"])
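
If you want to compare back-ends, one possible arrangement is a small factory that reads the token once and takes the provider name as its only argument. This is a sketch: `resolve_token` and `make_client` are hypothetical helper names, and which providers actually serve a given model is something to check on the model's Hub page.

```python
import os


def resolve_token(env_var="HF_TOKEN"):
    """Return the token stored in an environment variable, or fail loudly."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before creating a client")
    return token


def make_client(provider):
    """Create an InferenceClient bound to one provider (no request is sent yet)."""
    from huggingface_hub import InferenceClient  # imported lazily on purpose

    return InferenceClient(provider=provider, api_key=resolve_token())


# Usage (assumes HF_TOKEN is set and the provider serves the model you call):
# client = make_client("novita")
```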

## Model spotlight - mistralai/Mistral-7B-Instruct-v0.3
| Aspect             | Pros                                                                             | Cons                                                                        |
| ------------------ | -------------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| **Quality**        | Very strong chat & code for a 7 B model; outperforms Llama 2 7B/13B in many evals. | Still below GPT-4 / Claude-Opus tier on reasoning & long-context.           |
| **Licence**        | Apache-2.0: commercial use allowed, no policy hoops.                             | Weight access is sometimes gated; you need an HF token to download the weights. |
| **Speed / Memory** | Weights need \~14.5 GB in fp16 and roughly half that in 8-bit → cheap to host; expect on the order of \~50-70 tok/s on a T4, depending on quantization and batch size. | Context window 32 K tokens; cannot process very long docs the way 128 K-context models such as GPT-4o can. |
| **LoRA**           | Adapter is small (typically \~120 MB or less, depending on rank); a short fine-tune fits on a Colab T4. | PEFT on Mistral needs to target the correct linear layers (e.g. `q_proj`, `v_proj`). |
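
The memory footprint of the weights can be estimated with simple arithmetic: number of parameters times bytes per parameter. The sketch below assumes roughly 7.25 billion parameters (an approximate figure) and ignores activations and the KV cache, which add to the total at inference time.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 7.25e9  # assumption: approximate parameter count of a "7B" Mistral model

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

At 16 bits per parameter this comes to about 14.5 GB, which is why a 16 GB T4 is tight in fp16 and comfortable with 8-bit or 4-bit quantization.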


## Prompt Engineering - Zero-shot

Ask the model to perform a task without examples: “Translate the sentence to Italian: 'I love coffee'”.

Zero-shot works because instruction-tuned LMs have seen countless instructions during pre-training and fine-tuning. It's the fastest way to test a capability.

In [None]:
prompt = "A customer is requesting a refund because a product was delivered late. Write a professional reply."
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {"role": "user", "content": prompt}
    ],
)
print(completion.choices[0].message["content"])
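
Since the same call is repeated for every prompting technique, it can help to wrap it once. The sketch below is one way to do that; `ask` and `make_echo_client` are hypothetical names, and the echo client is only a stand-in so the helper can be tried without network access. `max_tokens` and `temperature` are regular chat-completion parameters; the values chosen here are arbitrary.

```python
from types import SimpleNamespace

DEFAULT_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"


def ask(client, prompt, model=DEFAULT_MODEL, max_tokens=300, temperature=0.2):
    """Send a single-turn user prompt and return the reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return completion.choices[0].message["content"]


def make_echo_client():
    """Offline stand-in that mimics client.chat.completions.create()."""
    def create(model, messages, **kwargs):
        text = f"[{model}] {messages[-1]['content']}"
        return SimpleNamespace(choices=[SimpleNamespace(message={"content": text})])

    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


print(ask(make_echo_client(), "Write a professional reply."))
```

With a real `InferenceClient`, the call would simply be `ask(client, prompt)`.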

## Prompt Engineering - Few-shot

Provide 1-3 examples so the model can infer the pattern, tone, and structure.

Example:  
Q1: Formal -> Casual  
A1: Please contact me. -> Hit me up!  

Given such examples, the model's next transformation follows their pattern far more closely than a zero-shot request would.

In [None]:
prompt = '''Example 1:
Customer: I received my order late. I'd like a refund.
Reply: We're sorry for the delay. Your refund will be processed within 3 business days.

Example 2:
Customer: My package arrived after the estimated delivery date. I want a refund.
Reply: We apologize for the inconvenience. The refund has been approved and will be issued shortly.

Now write a reply to:
A customer is requesting a refund because a product was delivered late.
'''

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])
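
An alternative to packing the examples into one string is to pass them as earlier chat turns, so each example reply appears as an assistant message. The sketch below assumes the model's chat template accepts alternating user/assistant turns; `build_few_shot_messages` is a hypothetical helper name.

```python
def build_few_shot_messages(examples, query):
    """Turn (customer_text, reply_text) pairs plus a new query into chat messages."""
    messages = []
    for customer_text, reply_text in examples:
        messages.append({"role": "user", "content": customer_text})
        messages.append({"role": "assistant", "content": reply_text})
    # The real request always goes last, as a user turn.
    messages.append({"role": "user", "content": query})
    return messages


examples = [
    ("I received my order late. I'd like a refund.",
     "We're sorry for the delay. Your refund will be processed within 3 business days."),
    ("My package arrived after the estimated delivery date. I want a refund.",
     "We apologize for the inconvenience. The refund has been approved and will be issued shortly."),
]
messages = build_few_shot_messages(examples, "My product was delivered late and I want a refund.")
print([m["role"] for m in messages])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create(...)` in place of the single user message.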

## Chain-of-Thought Prompting

Add “Think step by step” (or a more explicit reasoning instruction) to the prompt.  
The model then writes out its intermediate reasoning, which often improves accuracy on multi-step tasks and lets you inspect and correct the chain.

In [None]:
prompt = "A customer is requesting a refund because a product was delivered late. Think step by step and write a professional reply."

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])
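
In a customer-facing setting you usually want to keep the reasoning for inspection but send only the final reply. One possible approach, sketched below, is to ask the model to put a fixed marker before the final answer and split on it. The marker text `FINAL REPLY:` and the helper names are assumptions made for this example; the model is not guaranteed to emit the marker, so the function falls back to returning the whole text.

```python
COT_SUFFIX = (
    "Think step by step. When you are done, write the line 'FINAL REPLY:' "
    "followed by the reply to send to the customer."
)


def split_reasoning(text, marker="FINAL REPLY:"):
    """Split model output into (reasoning, final_reply) at the first marker."""
    head, sep, tail = text.partition(marker)
    if not sep:
        # Marker missing: treat everything as the reply, with no reasoning.
        return "", text.strip()
    return head.strip(), tail.strip()


sample = "1. Acknowledge the delay.\n2. Offer the refund.\nFINAL REPLY:\nDear customer, we are sorry."
reasoning, reply = split_reasoning(sample)
print(reply)
```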

## Role Prompting

Pre-frame the assistant's persona, for example:
"You are an empathetic customer-support agent…".  
This sets the vocabulary, tone, and even policy constraints without touching the model weights.

In [None]:
prompt = "You are a customer support agent. A customer is asking for a refund due to a late delivery. Write a helpful and empathetic response."

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])
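
Many chat APIs let you put the persona in a separate `system` message instead of the user prompt. Whether a `system` role is honoured depends on the model's chat template and on the provider, so the sketch below treats it as an assumption and keeps a fallback that prepends the persona to the user text. `build_role_messages` is a hypothetical helper name.

```python
def build_role_messages(persona, user_text, use_system_role=True):
    """Build chat messages that frame the assistant with a persona."""
    if use_system_role:
        return [
            {"role": "system", "content": persona},
            {"role": "user", "content": user_text},
        ]
    # Fallback for templates that reject a system role: fold persona into the user turn.
    return [{"role": "user", "content": f"{persona}\n\n{user_text}"}]


messages = build_role_messages(
    "You are an empathetic customer-support agent.",
    "A customer is asking for a refund due to a late delivery. Write a reply.",
)
print(messages[0]["role"], "->", messages[1]["role"])
```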

## Prompt Patterns - Delimiters

Wrap user context in clear markers (""" or ###), e.g.

```txt
"""CONTEXT
long text …
"""
Summarize the context in one bullet.
```

This keeps the model from confusing instructions with payload and makes multi-part prompts more reliable.
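
A minimal sketch of the delimiter pattern as code: the helper wraps untrusted context in triple quotes and neutralizes any triple quotes inside the payload so it cannot close the block early. The function name and the replacement strategy are assumptions made for this example, not an established convention.

```python
DELIM = '"""'


def wrap_in_delimiters(context, instruction, label="CONTEXT"):
    """Place the payload between delimiters and the instruction after them."""
    # Replace delimiter look-alikes inside the payload so the block stays intact.
    safe_context = context.replace(DELIM, "'''")
    return f"{DELIM}{label}\n{safe_context}\n{DELIM}\n{instruction}"


prompt = wrap_in_delimiters(
    "I received the product late and would like a refund.",
    "Summarize the context in one bullet.",
)
print(prompt)
```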

In [None]:
prompt = '''Write a professional reply with the following structure:
1. Greeting
2. Apology for the delay
3. Refund details
4. Closing remark

Customer: I received the product late and would like a refund.'''

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])

## LoRA-style Fine-tuning (simulation vs real)

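Simulation: bake policy or domain hints into the prompt (`[POLICY] …`); no weights change.  
Real LoRA: train small low-rank adapter matrices (≪1 % of the parameters) attached to the attention projections (Q and V, optionally K and O). A small fine-tune can run in minutes on a single GPU, and the adapter file is typically around 100 MB or less, depending on rank and target modules.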
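
To make the "≪1 %" figure concrete, the sketch below counts the trainable parameters LoRA adds when it targets `q_proj` and `v_proj`. Each adapted linear layer of shape `d_in × d_out` gains two matrices of rank `r`, that is `r * (d_in + d_out)` parameters. The dimensions used (hidden size 4096, 32 layers, key/value projection width 1024) and the total of about 7.25 billion parameters are assumptions based on the published Mistral 7B configuration.

```python
HIDDEN = 4096      # assumption: model hidden size
KV_DIM = 1024      # assumption: output width of v_proj (grouped-query attention)
N_LAYERS = 32      # assumption: number of transformer layers
TOTAL_PARAMS = 7.25e9  # assumption: approximate size of the base model

TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


def lora_param_count(rank, targets=TARGETS, n_layers=N_LAYERS):
    """Trainable parameters added by LoRA: r * (d_in + d_out) per adapted layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * n_layers


n = lora_param_count(rank=8)
print(f"trainable params: {n:,}")
print(f"share of base model: {n / TOTAL_PARAMS:.4%}")
print(f"adapter size in fp16: {n * 2 / 1e6:.1f} MB")
```

With rank 8 this gives about 3.4 million trainable parameters, well under 0.1 % of the base model; higher ranks or more target modules scale the count linearly.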

In [None]:
# Standard prompt
prompt = "A customer is requesting a refund because a product was delivered late. Write a professional reply."

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print("Standard Prompt:\n", completion.choices[0].message["content"])

# 'Customized' prompt (LoRA simulated through instructions in the prompt)
prompt_custom_1 = '''[INTERNAL POLICY: Respond with empathy, offer refund without requiring customer to return the item, and apologize sincerely. Use concise and positive tone.]

A customer is requesting a refund because a product was delivered late. Write a professional reply.'''

completion_custom_1 = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt_custom_1}],
)
print("Custom prompt 1:\n", completion_custom_1.choices[0].message["content"])

# Second 'customized' prompt (LoRA simulated with a different policy in the prompt)
prompt_custom_2 = '''[INTERNAL POLICY: Do not offer refunds, instead offer 15% discount code.]

A customer is requesting a refund because a product was delivered late. Write a professional reply.'''

completion_custom_2 = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt_custom_2}],
)
print("Custom prompt 2:\n", completion_custom_2.choices[0].message["content"])

## Prompt for Structured Output (JSON)

Explicitly state the output format and give a skeleton:

```
Return ONLY valid JSON:
{"greeting":"", "apology":"", "resolution":"", "action_type":"", "needs_follow_up":false}
```

The model tends to comply when the request is precise and the examples are consistent, but compliance is not guaranteed, so validate the output before using it.

In [None]:
prompt = '''
You are a customer support assistant. A customer is requesting a refund because a product arrived late.

Return the response in JSON format with the following fields:
- greeting
- apology
- refund_policy
- closing

Example format:
{
  "greeting": "...",
  "apology": "...",
  "refund_policy": "...",
  "closing": "..."
}
'''

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": prompt}],
)
print(completion.choices[0].message["content"])
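
Models often wrap the JSON in extra text or a code fence, so it is worth parsing defensively. The sketch below extracts the outermost `{...}` span, parses it, and checks that the fields requested in the prompt are present. `parse_json_reply` is a hypothetical helper, and the brace-matching heuristic is a simplification that assumes the reply contains a single JSON object.

```python
import json

REQUIRED_KEYS = ("greeting", "apology", "refund_policy", "closing")


def parse_json_reply(text, required_keys=REQUIRED_KEYS):
    """Extract and validate the JSON object embedded in a model reply."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in the reply")
    data = json.loads(text[start:end + 1])  # raises ValueError on invalid JSON
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


sample = 'Sure, here it is:\n{"greeting": "Hi", "apology": "Sorry", "refund_policy": "Full refund", "closing": "Thanks"}'
print(parse_json_reply(sample)["refund_policy"])
```

If parsing fails, a common follow-up is to retry the request with the error message appended to the prompt.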