## Hands-on: Running a Simple Text-Generation Model via Hugging Face

In this exercise, you'll use the Hugging Face Transformers library to run text generation with an open-source LLM such as Mistral, Llama 3, DeepSeek, or Phi.

### Step 1: Setup and Installation

First, install the Hugging Face Transformers library and PyTorch. The `accelerate` package is also needed because Step 3 loads the model with `device_map="auto"`.

```bash
!pip install transformers torch accelerate
```

### Step 2: Import Necessary Libraries

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
```

### Step 3: Load the Model and Tokenizer

Choose a model from the Hugging Face Hub. This example uses Mistral-7B-Instruct.

```python
model_name = "mistralai/Mistral-7B-Instruct-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
```

### Step 4: Create a Text Generation Pipeline

The `pipeline` helper wraps tokenization, generation, and decoding in a single call.

```python
text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
```

### Step 5: Generate Text

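Pass a prompt to the pipeline and print the result. By default, the returned `generated_text` contains the prompt followed by the model's continuation.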

```python
prompt = "Explain the significance of Transformers in NLP:"

output = text_gen_pipeline(prompt, max_length=150, do_sample=True, temperature=0.7)

print(output[0]['generated_text'])
```

### Understanding the Parameters

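- **`max_length`**: Sets the maximum total length in tokens, counting the prompt plus the generated text. To cap only the generated portion, use `max_new_tokens` instead.
- **`do_sample`**: Enables sampling from the predicted token distribution instead of greedy decoding, which introduces randomness.
- **`temperature`**: Scales the token distribution before sampling; higher values produce more random text, lower values more predictable text. It has an effect only when `do_sample=True`.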
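
To make the effect of `temperature` concrete, here is a minimal sketch of temperature scaling on a toy distribution. It is not the library's internal code; the three logits are made-up scores for three candidate tokens, and `softmax_with_temperature` is a hypothetical helper written for this illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature almost all of the probability mass sits on the top-scoring token, so sampling behaves much like greedy decoding. As the temperature rises, the distribution flattens and less likely tokens are picked more often.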

### Experimentation & Exercise

Try the following experiments:
- Change the prompt and observe the outputs.
- Adjust the `temperature` and see how it affects randomness.
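- Experiment with other models (e.g., `meta-llama/Meta-Llama-3-8B`, `deepseek-ai/deepseek-coder-6.7b-instruct`, `microsoft/phi-2`) to compare their outputs. The Llama models are gated, so you must accept the license on the model page and authenticate with a Hugging Face token before downloading them.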

### Best Practices

- Use GPUs for faster inference.
- Consider quantization (for example, loading the weights in 8-bit or 4-bit precision) when GPU memory is limited.
- Fine-tune models for specific tasks to improve accuracy.
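
As a rough guide to why quantization helps, the sketch below estimates the memory needed for the model weights alone at different precisions. It assumes a "7B" model has about 7 billion parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher. `estimate_weight_memory_gb` is a hypothetical helper, not part of any library.

```python
def estimate_weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a "7B" model has roughly 7 billion parameters.
num_params = 7_000_000_000

for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size_gb = estimate_weight_memory_gb(num_params, bits)
    print(f"{label:>17}: ~{size_gb:.1f} GB")
```

Comparing these numbers with the memory of your GPU gives a quick first check on whether a model can fit before you start a multi-gigabyte download.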

---




In [None]:
!pip install datasets

Collecting datasets
  Downloading datasets-3.3.2-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.3.2-py3-none-any.whl (485 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading multiprocess-0.70.16-py311-none-any.whl (143 kB)
Downloading 

In [None]:
from datasets import load_dataset

ds = load_dataset("FinGPT/fingpt-sentiment-train")


In [None]:
ds

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 76772
    })
})

In [None]:
for i in range(5):
    row = ds['train'][i]  # fetch one record instead of loading whole columns
    print("Input: ", row['input'])
    print("output: ", row['output'])
    print("Inst: ", row['instruction'])

Input:  Teollisuuden Voima Oyj , the Finnish utility known as TVO , said it shortlisted Mitsubishi Heavy s EU-APWR model along with reactors from Areva , Toshiba Corp. , GE Hitachi Nuclear Energy and Korea Hydro & Nuclear Power Co. .
output:  neutral
Inst:  What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}.
Input:  Sanofi poaches AstraZeneca scientist as new research head
output:  neutral
Inst:  What is the sentiment of this news? Please choose an answer from {negative/neutral/positive}.
Input:  Starbucks says the workers violated safety policies while workers said they'd never heard of the policy before and are alleging retaliation.
output:  moderately negative
Inst:  What is the sentiment of this news? Please choose an answer from {strong negative/moderately negative/mildly negative/neutral/mildly positive/moderately positive/strong positive}.
Input:  $brcm raises revenue forecast
output:  positive
Inst:  What is the sentiment of this tweet?
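
Each record pairs an instruction with an input text and a target label, so a record can be turned into a single prompt for a text-generation model. The sketch below shows one way to do that. The `Instruction:` / `Input:` / `Answer:` layout is an assumption chosen for illustration, and `build_prompt` is a hypothetical helper; the row is copied from the output above.

```python
def build_prompt(row):
    """Join the instruction and the input text into one prompt string."""
    return (
        f"Instruction: {row['instruction']}\n"
        f"Input: {row['input']}\n"
        "Answer: "
    )


# One record copied from the dataset output above.
row = {
    "input": "Sanofi poaches AstraZeneca scientist as new research head",
    "output": "neutral",
    "instruction": (
        "What is the sentiment of this news? "
        "Please choose an answer from {negative/neutral/positive}."
    ),
}

prompt = build_prompt(row)
print(prompt)
print("Expected label:", row["output"])
```

With the dataset loaded, the same helper could be applied to `ds['train'][i]`, and the model's reply compared against the `output` field.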

In [None]:
from google.colab import userdata
token = userdata.get('HF_Token')

In [None]:
#!pip install transformers torch huggingface_hub

from huggingface_hub import login

# Login to Hugging Face Hub (replace with your Hugging Face token)
login(token)

In [None]:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="together",
    api_key=token
)

messages = [
    {
        "role": "user",
        "content": "write a joke about not learning AI?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)

ChatCompletionOutputMessage(role='assistant', content='<think>\nOkay, the user wants a joke about not learning AI. Let me think about how to approach this. First, I need to understand the key elements: the joke should revolve around someone not learning AI, and it should be funny. Maybe play on common phrases or situations related to AI.\n\nHmm, AI is a big field with terms like algorithms, machine learning, neural networks. Maybe use a pun or a play on words. What\'s something people do when they avoid learning? Procrastinate, make excuses. Maybe the punchline could involve a humorous reason for not learning AI.\n\nWait, there\'s a classic joke structure where someone asks why something happened, and the punchline is a clever twist. For example, "Why didn\'t the AI...? Because it..." But since the topic is not learning AI, maybe invert that. The person didn\'t learn AI, and the reason is funny.\n\nWhat\'s a common excuse? Maybe they thought it was something else. Like confusing AI wit
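
The reply above begins with a `<think>` block that holds the model's reasoning before the final answer. A minimal sketch for separating the two is shown below. `split_reasoning` is a hypothetical helper, and the sample string is made up to exercise it; the helper also handles a reply whose closing `</think>` tag is missing, which can happen when `max_tokens` cuts the reply short.

```python
import re


def split_reasoning(text):
    """Return (reasoning, answer) from text that may start with a <think> block."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    stripped = text.lstrip()
    if stripped.startswith("<think>"):
        # No closing tag: everything after <think> is unfinished reasoning.
        return stripped[len("<think>"):].strip(), ""
    return "", text.strip()


# Hypothetical reply used only to exercise the helper.
sample = "<think>\nThe user wants a short joke.\n</think>\n\nHere is the joke."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook, the text to pass in would be `completion.choices[0].message.content`. An empty answer is a hint that `max_tokens` was too small for the model to finish reasoning.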

In [None]:
# Installation
#!pip install transformers torch

# Import libraries
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Create text-generation pipeline
text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text from prompt
prompt = "Explain the significance of Transformers in NLP:"
output = text_gen_pipeline(prompt, max_length=150, do_sample=True, temperature=0.7)

print(output[0]['generated_text'])


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


OSError: Meta-Llama-3.1-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

In [None]:
# from transformers import pipeline
# question_answerer = pipeline("question-answering", model='distilbert-base-cased-distilled-squad', device="cuda")

# context = r"""
# Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
# question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
# a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
# """

# The pipeline expects the passage under the `context` keyword.
# result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
# print(result)


In [None]:
# Create text-generation pipeline
text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text from prompt
prompt = "Write a poem on AI:"
output = text_gen_pipeline(prompt, max_length=150, do_sample=True, temperature=0.7)

print(output[0]['generated_text'])

Device set to use cuda:0


Write a poem on AI: How it works, what it does, and how it can be dangerous.

Answer:

Step 1/4
1. What is Artificial Intelligence? Artificial Intelligence (AI) is a field of computer science that deals with the simulation of human intelligence in machines.

Step 2/4
2. How does AI work? AI is based on a set of algorithms that allow machines to perform tasks that would normally require human intelligence. These algorithms are designed to learn from past experiences and make decisions based on the information they collect.

Step 3/4
3. What does AI do? AI can be used to perform a wide range of tasks, including pattern recognition, natural language processing, and
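
The output above echoes the prompt and then continues it like a document instead of writing a poem. One likely cause is that the raw prompt does not follow the chat format an instruction-tuned model was trained on. The sketch below assumes the `[INST] ... [/INST]` convention used by Mistral-7B-Instruct (the tokenizer adds the beginning-of-sequence token itself); both helpers are hypothetical and written only for illustration.

```python
def format_instruct_prompt(user_message):
    """Wrap a user message in [INST] tags (assumed Mistral-Instruct convention)."""
    return f"[INST] {user_message.strip()} [/INST]"


def strip_prompt(generated_text, prompt):
    """Remove the echoed prompt from the start of the pipeline output, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


prompt = format_instruct_prompt("Write a poem on AI:")
print(prompt)

# Hypothetical pipeline output: the prompt followed by a continuation.
fake_output = prompt + " Circuits hum in quiet light..."
print(strip_prompt(fake_output, prompt))
```

In practice, `tokenizer.apply_chat_template` builds this string from the model's own template, which is more robust than hand-written tags, and passing `return_full_text=False` to the text-generation pipeline returns only the continuation.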


# [Meta-llama](https://huggingface.co/meta-llama)

In [None]:
# # Import libraries
# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# # Load Llama-3 model and tokenizer
# model_name = "meta-llama/Meta-Llama-3-8B"
# tokenizer = AutoTokenizer.from_pretrained(model_name)
# model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# # Create text-generation pipeline
# text_gen_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# # Generate text from prompt
# prompt = "Explain the significance of Transformers in NLP:"
# output = text_gen_pipeline(prompt, max_length=150, do_sample=True, temperature=0.7)

# print(output[0]['generated_text'])


ReadTimeout: (ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: ffb85f44-be66-455e-913b-454584dc3a58)')

In [None]:
!pip install --upgrade transformers

Collecting transformers
  Downloading transformers-4.49.0-py3-none-any.whl.metadata (44 kB)
Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.48.3
    Uninstalling transformers-4.48.3:
      Successfully uninstalled transformers-4.48.3
Successfully installed transformers-4.49.0


In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)





The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


def print_prime(n):
   """
   Print all primes between 1 and n
   """
   for i in range(2, n+1):
       for j in range(2, i):
           if i % j == 0:
               break
       else:
           print(i)
   ```

2. Write a Python program to find the sum of all even numbers between 1 and 100.

   Ideas: Use a for loop to iterate over all numbers between 1 and 100. Use an if statement to check if the number is even. If it is, add it to a running total.

   ```python
   total = 0
   for i in range(1, 101):
       if i % 2 == 0:
           total += i
   print(total)
   ```

3. Write a Python program to find the largest number in a list.




In [None]:
from transformers import pipeline
import torch
pipe = pipeline("text-generation", model="google/gemma-3-1b-it", device="cuda", torch_dtype=torch.bfloat16)

messages = [
    [
        {
            "role": "system",
            "content": [{"type": "text", "text": "You are a helpful assistant."},]
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Write a poem on Hugging Face, the company"},]
        },
    ],
]

output = pipe(messages, max_new_tokens=50)


config.json:   0%|          | 0.00/899 [00:00<?, ?B/s]

ValueError: The checkpoint you are trying to load has model type `gemma3_text` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
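
The error above means the installed Transformers release predates the model architecture. A small version check before loading a new model can surface this earlier. The sketch below is illustrative only: `parse_version` and `version_at_least` are hypothetical helpers, and the minimum of `4.50.0` for Gemma 3 text models is an assumption that should be confirmed against the model card.

```python
import re


def parse_version(version):
    """Turn '4.49.0' into (4, 49, 0), ignoring suffixes such as '.dev0' or 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def version_at_least(installed, required):
    """Return True when the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


# Assumption: Gemma 3 text models need transformers 4.50.0 or newer.
REQUIRED = "4.50.0"

for installed in ("4.48.3", "4.49.0", "4.50.0"):
    status = "ok" if version_at_least(installed, REQUIRED) else "too old"
    print(f"transformers {installed}: {status}")
```

In the notebook, the installed version is available as `transformers.__version__`. After upgrading in Colab, restart the runtime so the new version is actually imported.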

# Using [HF Pipeline API](https://huggingface.co/docs/transformers/en/main_classes/pipelines)

In [None]:
# Create text generation pipeline with Llama-3
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")

# Generate text from a prompt (temperature only applies when sampling is enabled)
prompt = "Explain the significance of Transformers in NLP:"
output = generator(prompt, max_length=200, do_sample=True, temperature=0.7)

# Print generated text
print(output[0]['generated_text'])

# **Exercise: Using Three Different Models for Specialized NLP Tasks**

## **Objective**  
Explore three different Hugging Face models, each specializing in a distinct NLP use case. Implement each one using either the high-level `pipeline` API or the lower-level tokenizer and model classes from the `transformers` library.

---

## **Task 1: Text Generation**
Use a **Large Language Model (LLM)** to generate creative content based on a given prompt.

- **Sample Input Prompt:**  
  *"Describe the future of artificial intelligence in simple terms."*

- **Suggested Models:**  
  - `meta-llama/Meta-Llama-3-8B-Instruct`  
  - `mistralai/Mistral-7B-Instruct-v0.2`  
  - `deepseek-ai/deepseek-llm-7b-chat`  
  - `microsoft/phi-2`  

---

## **Task 2: Sentiment Analysis**  
Classify the sentiment of a given text as **positive, negative, or neutral**.

- **Sample Input Texts:**  
  - *"I love the new AI model, it's so powerful!"*  
  - *"This application is frustrating and does not work properly."*

- **Suggested Models:**  
  - `distilbert-base-uncased-finetuned-sst-2-english`  
  - `cardiffnlp/twitter-roberta-base-sentiment`  
  - `finiteautomata/bertweet-base-sentiment-analysis`  
  - `nlptown/bert-base-multilingual-uncased-sentiment`  

---

## **Task 3: Question Answering**  
Extract accurate answers from a given context using a Question-Answering (QA) model.

- **Sample Context:**  
  *"Large language models, such as OpenAI's GPT-4 and Meta's Llama-3, have revolutionized natural language processing. They are capable of generating human-like text, translating languages, answering questions, and more. Transformers, the underlying architecture, enable these models to understand and generate text efficiently."*

- **Sample Question:**  
  *"What enables large language models to generate text efficiently?"*

- **Suggested Models:**  
  - `deepset/roberta-base-squad2`  
  - `distilbert-base-cased-distilled-squad`  
  - `bert-large-uncased-whole-word-masking-finetuned-squad`  
  - `timpal0l/mdeberta-v3-base-squad2`  
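
Extractive QA models do not generate text; they score every token as a possible start and end of the answer, and the answer is the best-scoring span in the context. The sketch below is a simplified illustration of that selection step, not the pipeline's actual implementation. The tokens come from the sample context, while the scores are invented and `best_span` is a hypothetical helper.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return (start, end, score) maximizing start + end score with start <= end."""
    best = None
    for s, s_score in enumerate(start_scores):
        last = min(s + max_answer_len, len(end_scores))
        for e in range(s, last):
            score = s_score + end_scores[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    return best


tokens = ["Transformers", ",", "the", "underlying", "architecture",
          ",", "enable", "these", "models"]

# Hypothetical per-token scores, one start score and one end score per token.
start_scores = [4.0, 0.1, 1.5, 0.8, 0.3, 0.0, 0.2, 0.1, 0.1]
end_scores = [3.0, 0.2, 0.1, 0.4, 2.5, 0.1, 0.3, 0.1, 0.6]

start, end, score = best_span(start_scores, end_scores)
print("Answer:", " ".join(tokens[start:end + 1]))
print("Score:", score)
```

The `max_answer_len` limit keeps the search from pairing a strong start with a distant, unrelated end token.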

---

