GPT: Generative Pre-trained Transformer

- Task: Text Generation
- Model Class: `AutoModelForCausalLM`
- Model name: GPT-2

## AutoModelForCausalLM

`AutoModelForCausalLM` loads models for causal language modeling. These models are typically used for text generation: the model predicts the next token given the previous tokens, appends it, and repeats.
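The next-token loop can be sketched without a real model. In the minimal sketch below, `NEXT_TOKEN_SCORES` is a made-up lookup table standing in for a trained network, and `generate` is a hypothetical helper, not a `transformers` function. A real causal LM conditions on all previous tokens; this toy looks only at the last one.

```python
# Toy stand-in for a trained model: scores for the next token given the last one.
# The table and its values are invented for illustration.
NEXT_TOKEN_SCORES = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
    "down": {"<eos>": 1.0},
}


def generate(prompt_tokens, max_new_tokens=10):
    """Greedy autoregressive generation: append the best-scoring token each step."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES.get(tokens[-1])
        if scores is None:
            break  # the toy table has no continuation for this token
        next_token = max(scores, key=scores.get)
        if next_token == "<eos>":
            break  # end-of-sequence token stops generation
        tokens.append(next_token)
    return tokens


print(generate(["the"]))
```

Each iteration feeds the sequence generated so far back in as input, which is what "causal" (left-to-right) generation refers to.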

It can load decoder-only models such as:

- GPT-2
- GPT-Neo
- GPT-J
- LLaMA
- Mistral
- Falcon

| Purpose                              | Correct AutoModel                    |
| ------------------------------------ | ------------------------------------ |
| Text Generation                      | `AutoModelForCausalLM`               |
| Masked Language Modeling (like BERT) | `AutoModelForMaskedLM`               |
| Sequence Classification (like BERT)  | `AutoModelForSequenceClassification` |


| Task                            | PyTorch Model Class             | TensorFlow Model Class            | Auto Model Class (Any Architecture)   | Tokenizer                         |
| ------------------------------- | ------------------------------- | --------------------------------- | ------------------------------------- | --------------------------------- |
| Feature extraction (embeddings) | `BertModel`                     | `TFBertModel`                     | `AutoModel`                           | `BertTokenizer` / `AutoTokenizer` |
| Sequence classification         | `BertForSequenceClassification` | `TFBertForSequenceClassification` | `AutoModelForSequenceClassification`  | `BertTokenizer` / `AutoTokenizer` |
| Masked Language Modeling        | `BertForMaskedLM`               | `TFBertForMaskedLM`               | `AutoModelForMaskedLM`                | `BertTokenizer` / `AutoTokenizer` |
| Question Answering              | `BertForQuestionAnswering`      | `TFBertForQuestionAnswering`      | `AutoModelForQuestionAnswering`       | `BertTokenizer` / `AutoTokenizer` |
| Token classification (NER)      | `BertForTokenClassification`    | `TFBertForTokenClassification`    | `AutoModelForTokenClassification`     | `BertTokenizer` / `AutoTokenizer` |


## Summary

**Tokenizer:**

- Prefer `AutoTokenizer`, which picks the matching tokenizer for any checkpoint. Use `BertTokenizer` directly only when the code is BERT-specific.

**Model (PyTorch):**

- Use BertFor... classes.

**Model (TensorFlow):**

- Use TFBertFor... classes.

**Model (Auto, works with BERT and other models):**

- Use AutoModelFor... classes for flexibility.
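The flexibility of the Auto classes comes from looking up the concrete class from the checkpoint's model type. The sketch below is a simplified, hypothetical illustration of that idea: `TOY_AUTO_REGISTRY` and `resolve_class_name` are invented names, and the real library's lookup is more involved. The class names in the table are real `transformers` classes.

```python
# Hypothetical, simplified registry: (model_type, task) -> concrete class name.
TOY_AUTO_REGISTRY = {
    ("bert", "masked-lm"): "BertForMaskedLM",
    ("bert", "sequence-classification"): "BertForSequenceClassification",
    ("gpt2", "causal-lm"): "GPT2LMHeadModel",
    ("llama", "causal-lm"): "LlamaForCausalLM",
}


def resolve_class_name(model_type, task):
    """Return the concrete class name for a model type and task, or raise."""
    try:
        return TOY_AUTO_REGISTRY[(model_type, task)]
    except KeyError:
        raise ValueError(f"No {task!r} class registered for {model_type!r}") from None


print(resolve_class_name("gpt2", "causal-lm"))
print(resolve_class_name("bert", "masked-lm"))
```

The calling code names only the task; swapping the checkpoint changes which class is resolved without touching the rest of the script.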

## microsoft/Phi-3-mini-4k-instruct

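- A causal language model developed by Microsoft.

- Tokenizer details for Phi-3-mini-4k-instruct:

    - Tokenizer type: SentencePiece-based BPE (Byte Pair Encoding) tokenizer with byte fallback.

    - Underlying model: Phi-3-mini reuses the LLaMA 2 tokenizer vocabulary and adds a small set of special tokens such as `<|assistant|>`. Characters that are missing from the vocabulary are split into their raw bytes (byte fallback) instead of being mapped to an unknown token.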
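Byte fallback can be illustrated with a toy encoder. This is a minimal sketch under simplifying assumptions: `TOY_VOCAB` is a made-up character-level vocabulary and both functions are hypothetical helpers. A real SentencePiece tokenizer works on learned subword pieces, but it represents fallback bytes with the same `<0xNN>` style of token.

```python
# Hypothetical tiny vocabulary; a real tokenizer has tens of thousands of subword pieces.
TOY_VOCAB = {"h", "e", "l", "o", " "}


def encode_with_byte_fallback(text):
    """Split text into known pieces; unknown characters become <0xNN> byte tokens."""
    pieces = []
    for ch in text:
        if ch in TOY_VOCAB:
            pieces.append(ch)
        else:
            # Fall back to one token per UTF-8 byte instead of an <unk> token.
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_with_byte_fallback(pieces):
    """Reassemble text, turning <0xNN> tokens back into raw bytes."""
    buffer = bytearray()
    for piece in pieces:
        if len(piece) == 6 and piece.startswith("<0x") and piece.endswith(">"):
            buffer.append(int(piece[3:5], 16))
        else:
            buffer.extend(piece.encode("utf-8"))
    return buffer.decode("utf-8")


pieces = encode_with_byte_fallback("hello é")
print(pieces)
print(decode_with_byte_fallback(pieces))
```

Because every character can be expressed as UTF-8 bytes, the encoding is lossless even for text the vocabulary has never seen.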

In [None]:
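from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",        # place the weights on the GPU
    torch_dtype="auto",       # use the dtype recorded in the checkpoint's config
    trust_remote_code=False,  # do not execute custom modeling code from the repo
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")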


In [None]:
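from transformers import pipeline

# Create a text-generation pipeline around the loaded model and tokenizer
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the completion, not the prompt
    max_new_tokens=500,      # generate at most 500 new tokens
    do_sample=False,         # greedy decoding: always pick the most likely token
)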

Device set to use cuda
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
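The warning about `temperature` is consistent with `do_sample=False`: under greedy decoding the most likely token is chosen directly, so a temperature value (likely inherited from the model's generation config) has nothing to act on. The sketch below shows the difference on made-up logits; `softmax` and `pick_next_token` are hypothetical helpers, not `transformers` functions.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Return the index of the next token, greedily or by sampling."""
    if not do_sample:
        # Greedy decoding: argmax of the logits, so temperature is irrelevant.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [1.0, 3.0, 2.0]  # invented scores for a three-token vocabulary
print(pick_next_token(logits))                      # greedy
print(pick_next_token(logits, do_sample=True,
                      temperature=0.7,
                      rng=random.Random(0)))        # sampled
```

Greedy decoding returns the same token on every run, which is why the pipeline above produces a reproducible answer.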


In [None]:
# The prompt (user input/query) in chat-message format
messages = [
    {"role": "user", "content": "Create a funny joke about AI."}
]

# Generate a reply and print only the generated text
output = generator(messages)
print(output[0]["generated_text"])


 Why don't AI ever get lost? Because they always follow the algorithm!
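When the pipeline receives a list of messages, it renders them into a single prompt string using the tokenizer's chat template before generating. The sketch below approximates the Phi-3 prompt layout described on the model card. `render_phi3_style_prompt` is a hypothetical helper and the exact whitespace is an assumption; the authoritative rendering is whatever `tokenizer.apply_chat_template` returns.

```python
def render_phi3_style_prompt(messages, add_generation_prompt=True):
    """Approximate a Phi-3 style chat prompt from role/content messages."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|role|>\ncontent<|end|>\n
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model to start writing the assistant's turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Create a funny joke about AI."}]
print(render_phi3_style_prompt(messages))
```

The trailing `<|assistant|>` marker is what tells the model that the next tokens belong to its reply rather than to the user's message.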


In [None]:
prompt = "Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|>"

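# Tokenize the input prompt and move the token IDs to the GPU
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Generate up to 20 new tokens
generation_output = model.generate(
    input_ids=input_ids,
    max_new_tokens=20,
)

# Print the raw token IDs (prompt followed by the generated tokens)
print(generation_output[0])

# Decode the token IDs back into text and print it
print(tokenizer.decode(generation_output[0]))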

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


tensor([14350,   385,  4876, 27746,  5281,   304, 19235,   363,   278, 25305,
          293, 16423,   292,   286,   728,   481, 29889, 12027,  7420,   920,
          372,  9559, 29889, 32001,  3323,   622, 29901, 17778, 29888,  2152,
         6225, 11763,   363,   278, 19906,   292,   341,   728,   481,    13,
           13,    13, 29928,   799], device='cuda:0')
Write an email apologizing to Sarah for the tragic gardening mishap. Explain how it happened.<|assistant|> Subject: Heartfelt Apologies for the Gardening Mishap


Dear
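The attention-mask warning above arises because the pad token and the end-of-sequence token share an ID, so the library cannot tell padding from a genuine end-of-sequence token. The sketch below reproduces that ambiguity with plain lists. The IDs and both helper functions are invented for illustration.

```python
def pad_batch(sequences, pad_id):
    """Right-pad sequences to equal length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


def infer_mask_from_pad(input_ids, pad_id):
    """Guess the mask by treating every pad_id as padding (unsafe if pad == eos)."""
    return [[0 if token == pad_id else 1 for token in row] for row in input_ids]


EOS_ID = 2       # hypothetical end-of-sequence ID
PAD_ID = EOS_ID  # pad token shares the EOS ID, as in the warning

# The first sequence contains a genuine EOS token in the middle.
batch = [[5, 6, EOS_ID, 7], [8, 9]]
ids, true_mask = pad_batch(batch, PAD_ID)
guessed_mask = infer_mask_from_pad(ids, PAD_ID)

print(true_mask)     # mask built while padding
print(guessed_mask)  # mask guessed afterwards from the IDs alone
```

In the cell above, the tokenizer call already returns an `attention_mask` alongside `input_ids`; passing both to `model.generate` is the usual way to avoid the warning.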


In [None]:
tokenizer(prompt).input_ids

[14350,
 385,
 4876,
 27746,
 5281,
 304,
 19235,
 363,
 278,
 25305,
 293,
 16423,
 292,
 286,
 728,
 481,
 29889,
 12027,
 7420,
 920,
 372,
 9559,
 29889,
 32001]
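Comparing these IDs with the tensor printed earlier shows that `generate` returns the prompt tokens followed by the new ones, and that the final prompt ID, 32001, lines up with the `<|assistant|>` marker at the end of the prompt. The sketch below uses the IDs copied from the two outputs above to separate the prompt from the continuation; the variable names are illustrative.

```python
# Token IDs of the prompt, copied from the tokenizer output above.
prompt_ids = [
    14350, 385, 4876, 27746, 5281, 304, 19235, 363, 278, 25305,
    293, 16423, 292, 286, 728, 481, 29889, 12027, 7420, 920,
    372, 9559, 29889, 32001,
]

# Full sequence returned by model.generate, copied from the tensor output above.
output_ids = [
    14350, 385, 4876, 27746, 5281, 304, 19235, 363, 278, 25305,
    293, 16423, 292, 286, 728, 481, 29889, 12027, 7420, 920,
    372, 9559, 29889, 32001, 3323, 622, 29901, 17778, 29888, 2152,
    6225, 11763, 363, 278, 19906, 292, 341, 728, 481, 13,
    13, 13, 29928, 799,
]

# The output starts with the prompt; everything after it was generated.
assert output_ids[:len(prompt_ids)] == prompt_ids
new_token_ids = output_ids[len(prompt_ids):]

print(len(prompt_ids), len(new_token_ids))
print(new_token_ids)
```

With the real tokenizer, `tokenizer.decode(new_token_ids)` would give only the continuation, and the count of new tokens matches `max_new_tokens=20`.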