In [1]:
!pip -q install transformers sentencepiece accelerate



In [2]:
model_name = "bert-base-uncased"


In [3]:
from transformers import pipeline




In [4]:
GEN_PROMPT = "The future of Artificial Intelligence is"
models = ["bert-base-uncased", "roberta-base", "facebook/bart-base"]

for m in models:
    print("\n==============================")
    print("GENERATION |", m)
    # BART is a sequence-to-sequence model, so it goes through text2text-generation.
    # BERT and RoBERTa are encoder-only; text-generation is expected to fail
    # or behave poorly for them.
    task = "text2text-generation" if "bart" in m else "text-generation"
    try:
        gen = pipeline(task, model=m)
        out = gen(GEN_PROMPT, max_new_tokens=30, do_sample=True)
        print(out)
    except Exception as e:
        print("FAILED:", e)



GENERATION | bert-base-uncased


If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cpu


[{'generated_text': 'The future of Artificial Intelligence is..............................'}]

GENERATION | roberta-base


If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Device set to use cpu


[{'generated_text': 'The future of Artificial Intelligence is'}]

GENERATION | facebook/bart-base


Device set to use cpu


[{'generated_text': 'The future of Artificial Intelligence is'}]
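
One way to compare the three results above is to strip the prompt from each `generated_text` and look at what is left. The sketch below does this for the outputs recorded in this run. The helper names `new_text` and `describe` are illustrative and are not part of `transformers`.

```python
PROMPT = "The future of Artificial Intelligence is"

# Outputs copied from the run above, in the list-of-dicts shape the pipeline returns.
recorded = {
    "bert-base-uncased": [{"generated_text": "The future of Artificial Intelligence is.............................."}],
    "roberta-base": [{"generated_text": "The future of Artificial Intelligence is"}],
    "facebook/bart-base": [{"generated_text": "The future of Artificial Intelligence is"}],
}


def new_text(prompt, outputs):
    """Return whatever the model produced beyond the prompt (may be empty)."""
    text = outputs[0]["generated_text"]
    # text-generation echoes the prompt first; text2text-generation may not,
    # in which case the whole output counts as new text.
    return text[len(prompt):] if text.startswith(prompt) else text


def describe(prompt, outputs):
    """Classify the added text as empty, punctuation only, or real text."""
    added = new_text(prompt, outputs).strip()
    if not added:
        return "no new text"
    if not any(ch.isalnum() for ch in added):
        return "punctuation only"
    return "new text"


for name, outputs in recorded.items():
    print(name, "->", describe(PROMPT, outputs))
```

Under this check, none of the three base checkpoints added any words to the prompt in this run.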


In [5]:
FILL_SENTENCE = "The goal of Generative AI is to [MASK] new content."
models = ["bert-base-uncased", "roberta-base", "facebook/bart-base"]

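def mask_token_for(model_name):
    # Caution: "roberta-base" also contains the substring "bert", so this
    # returns "[MASK]" for RoBERTa even though its mask token is "<mask>".
    return "[MASK]" if "bert" in model_name else "<mask>"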

for m in models:
    print("\n==============================")
    print("FILL-MASK |", m)
    try:
        fm = pipeline("fill-mask", model=m)
        sent = FILL_SENTENCE.replace("[MASK]", mask_token_for(m))
        out = fm(sent)
        print("Input:", sent)
        print("Top 5 predictions:")
        for i in range(5):
            print(i+1, out[i]["token_str"], "-", round(out[i]["score"], 4))
    except Exception as e:
        print("FAILED:", e)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



FILL-MASK | bert-base-uncased


Device set to use cpu


Input: The capital of France is [MASK].
Top 3 predictions:
1 paris - 0.4168
2 lille - 0.0714
3 lyon - 0.0634

FILL-MASK | roberta-base


Device set to use cpu


FAILED: No mask_token (<mask>) found on the input

FILL-MASK | facebook/bart-base


Device set to use cpu


Input: The capital of France is <mask>.
Top 3 predictions:
1  Paris - 0.3036
2  Lyon - 0.1536
3  Stras - 0.0479
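
The RoBERTa failure above traces back to the `"bert" in model_name` check, which also matches `roberta-base`. A minimal sketch of one possible fix, assuming only the three checkpoints used in this notebook, is an explicit lookup. `MASK_TOKENS`, `mask_token_for_fixed`, and `build_masked_input` are hypothetical names introduced here for illustration.

```python
# Explicit mask token per checkpoint, instead of a substring test on the name.
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
    "facebook/bart-base": "<mask>",
}


def mask_token_for_fixed(model_name):
    """Return the mask token for one of the checkpoints listed above."""
    return MASK_TOKENS[model_name]


def build_masked_input(template, model_name, placeholder="[MASK]"):
    """Swap the placeholder in the template for the model's own mask token."""
    return template.replace(placeholder, mask_token_for_fixed(model_name))


for name in MASK_TOKENS:
    print(name, "->", build_masked_input("The capital of France is [MASK].", name))
```

Inside the notebook loop, reading `fm.tokenizer.mask_token` from the loaded pipeline should give the same token without hard-coding it per model.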


In [6]:
QUESTION = "What are the risks?"
CONTEXT = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
models = ["bert-base-uncased", "roberta-base", "facebook/bart-base"]

for m in models:
    print("\n==============================")
    print("QUESTION ANSWERING |", m)
    try:
        qa = pipeline("question-answering", model=m)
        out = qa(question=QUESTION, context=CONTEXT)
        print(out)
    except Exception as e:
        print("FAILED:", e)


Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



QUESTION ANSWERING | bert-base-uncased


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'score': 0.012416435405611992, 'start': 46, 'end': 81, 'answer': 'hallucinations, bias, and deepfakes'}

QUESTION ANSWERING | roberta-base


Device set to use cpu


{'score': 0.007839757716283202, 'start': 72, 'end': 81, 'answer': 'deepfakes'}

QUESTION ANSWERING | facebook/bart-base


Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cpu


{'score': 0.07239404320716858, 'start': 38, 'end': 45, 'answer': 'such as'}
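
The `start` and `end` fields in each result are character offsets into `CONTEXT`, so every answer can be recovered by slicing the context string. The sketch below checks this for the three recorded results and flags low scores. The cutoff `MIN_SCORE` is an arbitrary illustrative threshold, not a value defined by the library, and the scores are rounded copies of the output above.

```python
CONTEXT = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

# Results copied from the run above (scores rounded to four decimals).
results = {
    "bert-base-uncased": {"score": 0.0124, "start": 46, "end": 81,
                          "answer": "hallucinations, bias, and deepfakes"},
    "roberta-base": {"score": 0.0078, "start": 72, "end": 81,
                     "answer": "deepfakes"},
    "facebook/bart-base": {"score": 0.0724, "start": 38, "end": 45,
                           "answer": "such as"},
}

MIN_SCORE = 0.5  # hypothetical cutoff for treating an answer as reliable


def span_of(result, context):
    """Slice the answer span out of the context using the reported offsets."""
    return context[result["start"]:result["end"]]


for name, r in results.items():
    verdict = "ok" if r["score"] >= MIN_SCORE else "low confidence"
    print(f"{name}: {span_of(r, CONTEXT)!r} ({verdict})")
```

All three scores fall far below any reasonable cutoff, which is consistent with the "newly initialized" `qa_outputs` warnings printed for each model.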


| Task | Model | Classification (Success/Failure) | Observation (What actually happened?) | Why did this happen? (Architectural Reason) |
| :--- | :--- | :--- | :--- | :--- |
| **Generation** | BERT | Failure | Returned the prompt followed only by a run of periods; no fluent continuation was produced. | BERT is an encoder-only model trained with masked language modeling, not the autoregressive next-token prediction that text generation requires. |
|  | RoBERTa | Failure | Returned the prompt unchanged, with no continuation at all. | RoBERTa is also encoder-only and trained with masked language modeling, so it has no autoregressive decoding objective. |
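|  | BART | Failure | Ran without error but returned the prompt unchanged, with no continuation. | BART is an encoder–decoder model and can generate text, but `facebook/bart-base` is pretrained as a denoising autoencoder that reconstructs its input, so without fine-tuning it tends to copy the prompt rather than continue it. |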
| **Fill-Mask** | BERT | Success | Correctly predicted words such as “Paris”, “Lille”, and “Lyon” for the masked position. | BERT is trained using Masked Language Modeling (MLM), which directly optimizes it to predict masked tokens. |
|  | RoBERTa | Failure | The pipeline raised an error stating that no `<mask>` token was found in the input. | RoBERTa's mask token is `<mask>`, but `mask_token_for` checks `"bert" in model_name`, which also matches `roberta-base`, so the input kept `[MASK]`. The failure comes from the notebook's mask-token handling, not from the model's capability. |
|  | BART | Success | Predicted “Paris” and “Lyon” for the masked word, with a lower top score than BERT (0.3036 vs 0.4168); the third prediction, “Stras”, is a subword fragment. | BART's denoising pretraining includes reconstructing masked spans, so it can fill masks, but it is primarily optimized for sequence-to-sequence generation rather than pure MLM. |
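| **QA** | BERT | Failure | Returned the span “hallucinations, bias, and deepfakes”, which happens to match the context, but with a score of only about 0.012. | The base BERT checkpoint is not fine-tuned for Question Answering; its QA head (`qa_outputs`) is randomly initialized, so the selected span is effectively arbitrary and a plausible answer is a coincidence. |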
|  | RoBERTa | Failure | Returned a short answer span (“deepfakes”) with low confidence. | RoBERTa-base is not fine-tuned on QA datasets such as SQuAD, leading to weak answer extraction. |
|  | BART | Failure | Returned an incomplete phrase (“such as”) instead of a clear answer. | BART-base is not fine-tuned for extractive QA and is primarily designed for generative sequence-to-sequence tasks. |
