In [1]:
!pip install transformers torch
from transformers import pipeline




## Experiment 1: Text Generation

Task: Try to generate text using the prompt: "The future of Artificial Intelligence is"

In [2]:
generator_bert = pipeline(
    "text-generation",
    model="bert-base-uncased"
)

generator_bert("The future of Artificial Intelligence is")

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`


Loading weights:   0%|          | 0/202 [00:00<?, ?it/s]

BertLMHeadModel LOAD REPORT from: bert-base-uncased
Key                         | Status     |  | 
----------------------------+------------+--+-
bert.pooler.dense.bias      | UNEXPECTED |  | 
cls.seq_relationship.weight | UNEXPECTED |  | 
bert.pooler.dense.weight    | UNEXPECTED |  | 
cls.seq_relationship.bias   | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is. thele [ [ [ [ [ " " ( " ". it it it it it it it it it actually so and and and and are as..................... : you many so austin two to ". it it it it or and and and and ( ". it it it it it it it it or so jefferson in the and. it it it it it it it it it it org, " he bar lot, ( ". the an an a mrs.. it it it it was belle. him and and and " ( " ( ) ( " ( " ( ( - ) ( ) a a scheme on many way way way way re legal i an an an / the the ring. are were from ".. the the those being being which as our their to the women it to he to is\'\'\'" ( ) ( " (\') of with me is her of common just hers (. it or some some some some some some some mostly rock many so jefferson lu,..........................................................................................................................................'}]

In [3]:
generator_roberta = pipeline(
    "text-generation",
    model="roberta-base"
)

generator_roberta("The future of Artificial Intelligence is")

If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`


Loading weights:   0%|          | 0/202 [00:00<?, ?it/s]

RobertaForCausalLM LOAD REPORT from: roberta-base
Key                             | Status     |  | 
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'The future of Artificial Intelligence is'}]

In [4]:
generator_bart = pipeline(
    "text-generation",
    model="facebook/bart-base"
)

generator_bart(
    "The future of Artificial Intelligence is",
    max_length=30
)

Loading weights:   0%|          | 0/159 [00:00<?, ?it/s]

This checkpoint seem corrupted. The tied weights mapping for this model specifies to tie model.decoder.embed_tokens.weight to lm_head.weight, but both are absent from the checkpoint, and we could not find another related tied weight for those keys
BartForCausalLM LOAD REPORT from: facebook/bart-base
Key                                                           | Status     | 
--------------------------------------------------------------+------------+-
encoder.layers.{0, 1, 2, 3, 4, 5}.self_attn.k_proj.bias       | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.final_layer_norm.bias       | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.fc2.weight                  | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.self_attn_layer_norm.bias   | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.self_attn.q_proj.bias       | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.fc1.bias                    | UNEXPECTED | 
encoder.layers.{0, 1, 2, 3, 4, 5}.self_attn_layer_norm.weight | UNEXPECTED 

[{'generated_text': 'The future of Artificial Intelligence isayingDayayinggian beansiesta Tomato Tomato Tomato+) schematicaying REG beans beans beans greeting greeting Mininggianfacing sentiments beans beans organization leveraging leveragingElryceryce leveraging chick)[olesc Tomato intense census greeting organization leveragingDiamondeah twistolesc chairmanDiamond)[ beans Surve organization)[)[ beans beans visionary Armenia rec Yen REGGate greetinglo exercisesDiamondDiamond+) misrepresent chairman chairman chairman exercisesbett)[F recPersIGHTApple+) Kare+) chairman beans 398)[ interfaces)[)[)[IGHT rec+) chairman twist beansbett greeting greeting rec)[)[+) organizationIGHT rec chairman)[)[Ébett)[)[ organization)[IGHTIGHTIGHT Kare+) Kare misrepresent+) organization greeting organization+)É)[)[Flash Elastic)[IGHTPersIGHT 13olesc)[ Nicola)[)[ DruIGHT)[ organization interfaces organization)[ beansPersDiamond)[)[ BarthIGHTPers+) leveragingIGHTPers)[)[ occupiesPers ArmeniaIGHT wattsbettIGH

### Observation – Text Generation

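- BERT generated incoherent, repetitive text when forced into the `text-generation` pipeline.
- RoBERTa returned the prompt unchanged, with no new tokens added.
- BART also produced incoherent output in this run. The load report shows that the pipeline loaded it as `BartForCausalLM`, flagged the encoder weights as UNEXPECTED, and warned that the tied `lm_head.weight` was absent from the checkpoint, so this cell does not exercise BART as a full encoder-decoder model.
- None of the three checkpoints produced a usable continuation. Encoder-only models such as BERT and RoBERTa are trained to predict masked tokens from context on both sides rather than to predict the next token, which explains their failure here. The BART result reflects how the checkpoint was loaded rather than a fair test of encoder-decoder generation.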
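
One way to see why encoder-only models struggle with left-to-right generation is to compare the attention pattern each kind of model is trained with. The sketch below is plain Python with no library calls; the function name `attention_mask` and the four example tokens are illustrative assumptions, not part of the notebook or of `transformers`.

```python
def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len matrix of 0/1 flags.

    Entry [i][j] is 1 when position i is allowed to attend to position j.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


tokens = ["The", "future", "of", "AI"]  # illustrative tokens only

# Encoder-style attention: every position sees the whole sequence.
bidirectional_mask = attention_mask(len(tokens), causal=False)

# Decoder-style attention: each position sees only itself and earlier positions.
causal_mask = attention_mask(len(tokens), causal=True)

for name, mask in [
    ("bidirectional (encoder)", bidirectional_mask),
    ("causal (decoder)", causal_mask),
]:
    print(name)
    for token, row in zip(tokens, mask):
        print(f"  {token:>6}: {row}")
```

A model pre-trained with the bidirectional pattern has always been able to look at tokens to the right of the position it is predicting, so it has had no practice at predicting a next token from the left context alone, which is what text generation requires.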

## Experiment 2: Masked Language Modeling (Missing Word)

Task: Predict the missing word in: "The goal of Generative AI is to [MASK] new content."

In [5]:
fill_mask_bert = pipeline(
    "fill-mask",
    model="bert-base-uncased"
)

fill_mask_bert("The goal of Generative AI is to [MASK] new content.")

Loading weights:   0%|          | 0/202 [00:00<?, ?it/s]

BertForMaskedLM LOAD REPORT from: bert-base-uncased
Key                         | Status     |  | 
----------------------------+------------+--+-
bert.pooler.dense.bias      | UNEXPECTED |  | 
cls.seq_relationship.weight | UNEXPECTED |  | 
bert.pooler.dense.weight    | UNEXPECTED |  | 
cls.seq_relationship.bias   | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


[{'score': 0.539692759513855,
  'token': 3443,
  'token_str': 'create',
  'sequence': 'the goal of generative ai is to create new content.'},
 {'score': 0.15575766563415527,
  'token': 9699,
  'token_str': 'generate',
  'sequence': 'the goal of generative ai is to generate new content.'},
 {'score': 0.05405496060848236,
  'token': 3965,
  'token_str': 'produce',
  'sequence': 'the goal of generative ai is to produce new content.'},
 {'score': 0.044515229761600494,
  'token': 4503,
  'token_str': 'develop',
  'sequence': 'the goal of generative ai is to develop new content.'},
 {'score': 0.017577484250068665,
  'token': 5587,
  'token_str': 'add',
  'sequence': 'the goal of generative ai is to add new content.'}]

In [6]:
fill_mask_roberta = pipeline(
    "fill-mask",
    model="roberta-base"
)

fill_mask_roberta("The goal of Generative AI is to <mask> new content.")

Loading weights:   0%|          | 0/202 [00:00<?, ?it/s]

RobertaForMaskedLM LOAD REPORT from: roberta-base
Key                             | Status     |  | 
--------------------------------+------------+--+-
roberta.embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.


[{'score': 0.37113118171691895,
  'token': 5368,
  'token_str': ' generate',
  'sequence': 'The goal of Generative AI is to generate new content.'},
 {'score': 0.3677138090133667,
  'token': 1045,
  'token_str': ' create',
  'sequence': 'The goal of Generative AI is to create new content.'},
 {'score': 0.08351466804742813,
  'token': 8286,
  'token_str': ' discover',
  'sequence': 'The goal of Generative AI is to discover new content.'},
 {'score': 0.02133519947528839,
  'token': 465,
  'token_str': ' find',
  'sequence': 'The goal of Generative AI is to find new content.'},
 {'score': 0.01652175933122635,
  'token': 694,
  'token_str': ' provide',
  'sequence': 'The goal of Generative AI is to provide new content.'}]

In [7]:
fill_mask_bart = pipeline(
    "fill-mask",
    model="facebook/bart-base"
)

fill_mask_bart("The goal of Generative AI is to <mask> new content.")

Loading weights:   0%|          | 0/259 [00:00<?, ?it/s]

[{'score': 0.07461544126272202,
  'token': 1045,
  'token_str': ' create',
  'sequence': 'The goal of Generative AI is to create new content.'},
 {'score': 0.06571853160858154,
  'token': 244,
  'token_str': ' help',
  'sequence': 'The goal of Generative AI is to help new content.'},
 {'score': 0.060880184173583984,
  'token': 694,
  'token_str': ' provide',
  'sequence': 'The goal of Generative AI is to provide new content.'},
 {'score': 0.035935722291469574,
  'token': 3155,
  'token_str': ' enable',
  'sequence': 'The goal of Generative AI is to enable new content.'},
 {'score': 0.03319481760263443,
  'token': 1477,
  'token_str': ' improve',
  'sequence': 'The goal of Generative AI is to improve new content.'}]

### Observation – Fill Mask

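- BERT ranked "create" first (score 0.54) and "generate" second (0.16); all five candidates fit the sentence.
- RoBERTa ranked "generate" (0.37) and "create" (0.37) almost equally, and both are contextually correct.
- BART also ranked "create" first, but with a much lower score (0.07), and its second candidate, "help", does not fit the sentence grammatically.
- Masked language modeling works best for models whose pre-training objective is predicting individual masked tokens, as is the case for BERT and RoBERTa.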
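
The difference in confidence between the three models can be made concrete by summing the top-5 scores from the outputs above. This is a minimal sketch: the scores are copied from the cell outputs and rounded to four decimals, and the helper name `summarize` is an assumption made for illustration.

```python
# Top-5 (token, score) pairs copied from the fill-mask outputs above,
# rounded to four decimal places.
top5 = {
    "bert-base-uncased": [
        ("create", 0.5397), ("generate", 0.1558), ("produce", 0.0541),
        ("develop", 0.0445), ("add", 0.0176),
    ],
    "roberta-base": [
        ("generate", 0.3711), ("create", 0.3677), ("discover", 0.0835),
        ("find", 0.0213), ("provide", 0.0165),
    ],
    "facebook/bart-base": [
        ("create", 0.0746), ("help", 0.0657), ("provide", 0.0609),
        ("enable", 0.0359), ("improve", 0.0332),
    ],
}


def summarize(predictions):
    """Return the best token, its score, and the total score of all listed candidates."""
    best_token, best_score = max(predictions, key=lambda pair: pair[1])
    covered = sum(score for _, score in predictions)
    return best_token, best_score, covered


for model_name, predictions in top5.items():
    token, score, covered = summarize(predictions)
    print(f"{model_name:<20} top={token!r:<12} score={score:.2f} top-5 mass={covered:.2f}")
```

With these numbers, the five listed candidates account for roughly 0.81 of the probability for BERT and 0.86 for RoBERTa, but only about 0.27 for BART, whose predictions are spread over many more tokens.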

## Experiment 3: Question Answering

Task: Answer the question "What are the risks?" based on the context: "Generative AI poses significant risks such as hallucinations, bias, and deepfakes."

In [8]:
qa_bert = pipeline(
    "question-answering",
    model="bert-base-uncased"
)

qa_bert(
    question="What are the risks?",
    context="Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
)

Loading weights:   0%|          | 0/197 [00:00<?, ?it/s]

BertForQuestionAnswering LOAD REPORT from: bert-base-uncased
Key                                        | Status     | 
-------------------------------------------+------------+-
cls.predictions.transform.dense.bias       | UNEXPECTED | 
bert.pooler.dense.bias                     | UNEXPECTED | 
cls.predictions.bias                       | UNEXPECTED | 
bert.pooler.dense.weight                   | UNEXPECTED | 
cls.seq_relationship.bias                  | UNEXPECTED | 
cls.predictions.transform.dense.weight     | UNEXPECTED | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED | 
cls.seq_relationship.weight                | UNEXPECTED | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | 
qa_outputs.weight                          | MISSING    | 
qa_outputs.bias                            | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized beca

{'score': 0.012824046425521374,
 'start': 46,
 'end': 66,
 'answer': 'hallucinations, bias'}

In [9]:
qa_roberta = pipeline(
    "question-answering",
    model="roberta-base"
)

qa_roberta(
    question="What are the risks?",
    context="Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
)

Loading weights:   0%|          | 0/197 [00:00<?, ?it/s]

RobertaForQuestionAnswering LOAD REPORT from: roberta-base
Key                             | Status     | 
--------------------------------+------------+-
lm_head.dense.weight            | UNEXPECTED | 
roberta.embeddings.position_ids | UNEXPECTED | 
lm_head.layer_norm.bias         | UNEXPECTED | 
lm_head.bias                    | UNEXPECTED | 
lm_head.dense.bias              | UNEXPECTED | 
lm_head.layer_norm.weight       | UNEXPECTED | 
qa_outputs.weight               | MISSING    | 
qa_outputs.bias                 | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


{'score': 0.00978428591042757,
 'start': 0,
 'end': 31,
 'answer': 'Generative AI poses significant'}

In [10]:
qa_bart = pipeline(
    "question-answering",
    model="facebook/bart-base"
)

qa_bart(
    question="What are the risks?",
    context="Generative AI poses significant risks such as hallucinations, bias, and deepfakes."
)

Loading weights:   0%|          | 0/259 [00:00<?, ?it/s]

BartForQuestionAnswering LOAD REPORT from: facebook/bart-base
Key               | Status  | 
------------------+---------+-
qa_outputs.weight | MISSING | 
qa_outputs.bias   | MISSING | 

Notes:
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


{'score': 0.012210825458168983,
 'start': 0,
 'end': 31,
 'answer': 'Generative AI poses significant'}

### Observation – Question Answering

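- All three models returned an answer, but the answers were inconsistent: BERT extracted "hallucinations, bias" (missing "deepfakes"), while RoBERTa and BART both returned "Generative AI poses significant", which does not answer the question.
- Every confidence score was about 0.01, so none of the predictions is reliable.
- The load reports explain why: `qa_outputs.weight` and `qa_outputs.bias` are MISSING from each base checkpoint and were newly initialized, so the span-prediction head was untrained.
- The pre-trained encoders supply useful representations, but without QA fine-tuning the output head cannot map them to the correct answer span.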
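
The `start` and `end` values in each output are character offsets into the context, so the predictions can be checked without any model. The sketch below uses the offsets reported above; the reference answer `expected` is an assumption chosen for illustration, since the notebook does not define a gold answer.

```python
context = (
    "Generative AI poses significant risks such as "
    "hallucinations, bias, and deepfakes."
)

# (start, end) character offsets copied from the three QA outputs above.
predictions = {
    "bert-base-uncased": (46, 66),
    "roberta-base": (0, 31),
    "facebook/bart-base": (0, 31),
}

# Assumed reference answer for this illustration (not defined in the notebook).
expected = "hallucinations, bias, and deepfakes"
expected_start = context.find(expected)
expected_end = expected_start + len(expected)


def span_text(start, end):
    """Return the slice of the context that a predicted span refers to."""
    return context[start:end]


for model_name, (start, end) in predictions.items():
    answer = span_text(start, end)
    # Number of characters shared between the predicted and reference spans.
    overlap = max(0, min(end, expected_end) - max(start, expected_start))
    print(f"{model_name:<20} {answer!r:<36} overlap={overlap}/{len(expected)} chars")
```

Against this reference, BERT's span covers 20 of the 35 characters, while the RoBERTa and BART spans do not overlap it at all, which is consistent with an untrained span-prediction head.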


### Overall Conclusion

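These experiments highlight the importance of matching both model architecture and training objective to the task. Encoder-only models like BERT and RoBERTa perform well on understanding tasks such as masked word prediction but fail at text generation. Encoder-decoder models like BART are designed for generation tasks, although the text-generation run above did not demonstrate this, because the pipeline loaded BART as a decoder-only causal language model. For question answering, none of the base checkpoints is reliable without task-specific fine-tuning.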
