In [1]:
pip install transformers torch tf-keras



In [2]:
from transformers import pipeline

# Model identifiers
models = {
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
    "BART": "facebook/bart-base"
}
prompt_gen = "The future of Artificial Intelligence is"
prompt_mask = "The goal of Generative AI is to [MASK] new content."
qa_context = "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
qa_question = "What are the risks?"



In [3]:
print("--- EXPERIMENT 1: TEXT GENERATION ---")
for name, model_path in models.items():
    print(f"\nTesting {name}...")
    try:
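        # Note: pipeline('text-generation') expects a causal (decoder-style) language model.
        # Encoder-decoder checkpoints such as BART are normally loaded with a seq2seq class
        # (e.g. BartForConditionalGeneration); here BART is loaded as BartForCausalLM instead.
        # max_length=20 is overridden by the pipeline's default max_new_tokens (see the warning in the output).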
        generator = pipeline("text-generation", model=model_path, device=-1)
        result = generator(prompt_gen, max_length=20, num_return_sequences=1)
        print(f"Result: {result[0]['generated_text']}")
    except Exception as e:
        print(f"Result: Failed. (Reason: {str(e).split('.')[0]})")

--- EXPERIMENT 1: TEXT GENERATION ---

Testing BERT...





If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: The future of Artificial Intelligence is................................................................................................................................................................................................................................................................

Testing RoBERTa...



If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: The future of Artificial Intelligence is

Testing BART...



Some weights of BartForCausalLM were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['lm_head.weight', 'model.decoder.embed_tokens.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Result: The future of Artificial Intelligence is polymer dx Employee Creat Liv fitting kin Aqua Livjongjongjong LTD Majestyjong tradesBoo landmarks Liv axis Local axis Quentin axis Tight hometown 251 251 251 kin kin TightBoo hometown alcohol comedyton hometown judges COUNTY fatally fatally plausible confirputing kin OLBoo axisBrown faulty COUNTYidge COUNTYidgeyton Foss FossBooBoo kin kin axis comedytonyton hometown 251Boo fatallyytonBooBooyton hometown clearer 251 251putingputingputingyton Foss hometown 251 COUNTY comed 251 salv salv hides OL OLBoo Fossytonputingputing Fossputing fatallyancing Foss Foss Fossputing kinputingBooBoo Fossputingputing fatallyputing Foss Fossyton FossReportsputinglucentGuputingyton GarethlucentHonestBoo FossBooputing plausible fatallyputing plausible OLidgethatthatthat Fallen Fossputingmg faultyidgeelsiusyton hometownidgethat fatallyputingmgputinglucentputingBooancinglucentthatthat OLputingputingmgthatmgGuthatmgmg landmarks conclusivethatthat plausible OL ad
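
Text generation is autoregressive: the model scores candidates for the next token, one is chosen and appended, and the extended sequence is fed back in. The following is a minimal sketch of greedy decoding, not what the pipeline actually runs. `toy_scores` is a hypothetical stand-in for a model whose next-token prediction ignores the context, and `greedy_decode` and its `max_new_tokens` argument are illustrative names. It shows one way a degenerate continuation such as a run of periods can arise when the scoring function was never trained for next-token prediction.

```python
from typing import Callable, Dict, List


def greedy_decode(
    prompt_tokens: List[str],
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    max_new_tokens: int,
    eos_token: str = "<eos>",
) -> List[str]:
    """Append the highest-scoring next token until EOS or the length budget is hit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == eos_token:
            break
        tokens.append(best)
    return tokens


def toy_scores(tokens: List[str]) -> Dict[str, float]:
    # Hypothetical "model": it prefers "." no matter what came before.
    return {".": 0.9, "bright": 0.05, "<eos>": 0.05}


decoded = greedy_decode(["The", "future", "is"], toy_scores, max_new_tokens=5)
print(" ".join(decoded))
```

Because the loop stops only at an end-of-sequence token or when the budget runs out, the budget determines how long a degenerate output becomes. That is consistent with the long outputs above, where the warning reports that the default `max_new_tokens` of 256 took precedence over `max_length=20`.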

In [4]:
print("\n--- EXPERIMENT 2: FILL-MASK ---")
for name, model_path in models.items():
    print(f"\nTesting {name}...")
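    # RoBERTa uses <mask> while BERT uses [MASK]. BART also expects <mask>, but this
    # check only matches "roberta", so BART receives the [MASK] prompt unchanged.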
    current_prompt = prompt_mask.replace("[MASK]", "<mask>") if "roberta" in model_path else prompt_mask

    try:
        mask_filler = pipeline("fill-mask", model=model_path)
        results = mask_filler(current_prompt)
        # Show top 2 predictions
        predictions = [res['token_str'] for res in results[:2]]
        print(f"Top predictions: {predictions}")
    except Exception as e:
        print(f"Result: Failed. {str(e)}")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).



--- EXPERIMENT 2: FILL-MASK ---

Testing BERT...


Device set to use cpu


Top predictions: ['create', 'generate']

Testing RoBERTa...


Device set to use cpu


Top predictions: [' generate', ' create']

Testing BART...


Device set to use cpu


Result: Failed. No mask_token (<mask>) found on the input
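
The BART failure above comes from the prompt rather than the model: the error message shows that BART's tokenizer expects `<mask>`, while the prompt still contained `[MASK]`. One possible way to avoid this is to look up the mask token per checkpoint instead of special-casing RoBERTa. The table and the helper name `adapt_mask_prompt` below are illustrative assumptions, not part of the original notebook.

```python
# Mask tokens for the three checkpoints used in this notebook.
# BERT and RoBERTa values come from the notebook's own comment;
# BART's value comes from the error message in the output above.
MASK_TOKENS = {
    "bert-base-uncased": "[MASK]",
    "roberta-base": "<mask>",
    "facebook/bart-base": "<mask>",
}


def adapt_mask_prompt(prompt: str, model_path: str, placeholder: str = "[MASK]") -> str:
    """Swap the generic placeholder for the mask token the checkpoint expects."""
    return prompt.replace(placeholder, MASK_TOKENS[model_path])


prompt_mask = "The goal of Generative AI is to [MASK] new content."
for path in MASK_TOKENS:
    print(path, "->", adapt_mask_prompt(prompt_mask, path))
```

With a loaded pipeline, `mask_filler.tokenizer.mask_token` supplies the same value without a hand-written table, which is likely the more robust choice when adding further models.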


In [5]:
print("\n--- EXPERIMENT 3: QUESTION ANSWERING ---")
for name, model_path in models.items():
    print(f"\nTesting {name}...")
    try:
        qa_bot = pipeline("question-answering", model=model_path)
        result = qa_bot(question=qa_question, context=qa_context)
        print(f"Answer: {result['answer']}")
    except Exception as e:
        print(f"Result: Failed or poor output. {str(e)}")

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



--- EXPERIMENT 3: QUESTION ANSWERING ---

Testing BERT...


Device set to use cpu
Some weights of RobertaForQuestionAnswering were not initialized from the model checkpoint at roberta-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: hallucinations, bias,

Testing RoBERTa...


Device set to use cpu
Some weights of BartForQuestionAnswering were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Answer: bias

Testing BART...


Device set to use cpu


Answer: poses significant risks such as hallucinations, bias, and deepfakes
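
All three answers above are spans copied from the context, which is how an extractive QA head works: it assigns each token a start score and an end score, and the answer is the highest-scoring valid span. The sketch below is a simplified illustration of that selection step, not the pipeline's exact implementation. The function name `best_span` and the logit values are made up for the example. Since the warnings report that `qa_outputs` was newly initialized for every model, the real scores in this run were effectively random, so the chosen spans are arbitrary even where they look plausible.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 15,
) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start_logit + end_logit."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # The end must not precede the start, and the span length is capped.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


tokens = ["Generative", "AI", "poses", "risks", "such", "as", "hallucinations", ",", "bias"]
# Hypothetical logits that favor starting at "hallucinations" and ending at "bias".
start_logits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 1.0]
end_logits = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 4.0]

start, end = best_span(start_logits, end_logits)
print(" ".join(tokens[start : end + 1]))
```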


| Task | Model | Result | Observation | Why? (Architectural Reason) |
| :--- | :--- | :--- | :--- | :--- |
| **Generation** | BERT | **Failure** | Appended only a long run of periods to the prompt. | **Encoder-only**: It is pre-trained with bidirectional attention and no causal mask, so it is designed to see the whole sentence at once, not predict the next word. |
| | RoBERTa | **Failure** | Returned the prompt with no visible continuation. | **Encoder-only**: Its bidirectional training objective is not optimized for autoregressive generation. |
| | BART | **Failure** | Produced a long string of unrelated word fragments ("polymer dx Employee Creat Liv ..."). | **Encoder-Decoder loaded as decoder-only**: The `text-generation` pipeline loaded `BartForCausalLM` and reported `lm_head.weight` and `model.decoder.embed_tokens.weight` as newly initialized, so the output layer was untrained. BART's decoder is designed for sequence-to-sequence (Seq2Seq) generation conditioned on its encoder. |
| **Fill-Mask** | BERT | **Success** | Predicted 'create' and 'generate'. | **MLM Training**: BERT's core pre-training objective is Masked Language Modeling. |
| | RoBERTa | **Success** | Predicted ' generate' and ' create'. | **Optimized MLM**: Uses dynamic masking and more training data with the same bidirectional objective. |
| | BART | **Failure (prompt error)** | Raised `No mask_token (<mask>) found on the input` because the prompt still contained `[MASK]`. | **Text Infilling**: Part of BART's denoising objective is reconstructing missing spans, so it supports fill-mask, but its mask token is `<mask>`, not `[MASK]`. |
| **QA** | BERT | **Failure** | Returned "hallucinations, bias,", a partial span with a trailing comma. | **No Fine-tuning**: The `qa_outputs` head is newly initialized, so base BERT has not learned span extraction and the chosen span is arbitrary. |
| | RoBERTa | **Failure** | Returned only "bias". | **No Fine-tuning**: Needs training on a dataset such as SQuAD to map context to answer spans. |
| | BART | **Failure** | Returned most of the context sentence ("poses significant risks such as hallucinations, bias, and deepfakes"). | **No Fine-tuning**: `BartForQuestionAnswering` is also extractive, and its `qa_outputs` head is newly initialized, so the span it selects is arbitrary. |
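
The "causal mask" referred to in the table can be pictured as a lower-triangular matrix over token positions: position `i` may attend only to positions `0..i`. An encoder-only model uses the full matrix, so every position sees the whole sentence. The helper below is a small illustration in plain Python. `attention_mask` is a hypothetical name, and real implementations build these masks as tensors.

```python
from typing import List


def attention_mask(seq_len: int, causal: bool) -> List[List[int]]:
    """Build a seq_len x seq_len mask: 1 means row i may attend to column j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


print("Bidirectional (encoder-style):")
for row in attention_mask(4, causal=False):
    print(row)

print("Causal (decoder-style):")
for row in attention_mask(4, causal=True):
    print(row)
```

A model trained only with the bidirectional pattern has always had access to the tokens to the right of each position, which is why it tends to be a poor fit for left-to-right generation.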