# Pick your pipeline 👩🏽‍🔧

This series of short exercises introduces datasets tied to different NLP problems. For each one, your goal is to identify the right pipeline to tackle the problem, then analyse and critique your results.



# 🚀 Sentiment Analysis on Financial Tweets with Transformers 💸🐦

Welcome to this hands-on exercise where we explore the **Twitter Financial News** dataset! 🤓 Let's dive into the world of finance, NLP, and transformers! ⚡️



## 📥 1. Load the Dataset with 🤗 Hugging Face Datasets

First things first, let's load the dataset from the Hugging Face Hub.
Use the [documentation](https://huggingface.co/datasets/zeroshot/twitter-financial-news-sentiment) to load the data directly into a pandas DataFrame.



In [36]:
import pandas as pd

# pandas reads hf:// paths through fsspec; this requires the huggingface_hub package
splits = {'train': 'sent_train.csv', 'validation': 'sent_valid.csv'}
df = pd.read_csv("hf://datasets/zeroshot/twitter-financial-news-sentiment/" + splits["train"])

In [37]:
# visualize the first few rows of data
df.head()

| | text | label |
|---|---|---|
| 0 | $BYND - JPMorgan reels in expectations on Beyo... | 0 |
| 1 | $CCL $RCL - Nomura points to bookings weakness... | 0 |
| 2 | $CX - Cemex cut at Credit Suisse, J.P. Morgan ... | 0 |
| 3 | $ESS: BTIG Research cuts to Neutral https://t.... | 0 |
| 4 | $FNKO - Funko slides after Piper Jaffray PT cu... | 0 |


## 🔍 2. What Are the Labels? 🏷️

Once the dataset is loaded, let’s explore what labels are used to annotate the tweets:


In [38]:
# print out the label distribution
df["label"].value_counts()

label
2    6178
1    1923
0    1442
Name: count, dtype: int64

These labels reflect how the tweet might influence a financial decision 📈📉

```python
sentiments = {
    "LABEL_0": "Bearish", 
    "LABEL_1": "Bullish", 
    "LABEL_2": "Neutral"
}
```

If you are not familiar with this terminology, do a quick search before coming back to the exercise. Your goal is to come back with a clear idea of how these labels could relate to your models' predictions.


Let's create a new column called `label_text` where the more positive label (bullish) is replaced with `POSITIVE`, the more negative label (bearish) with `NEGATIVE`, and the other one with `NEUTRAL`.

In [None]:
sentiments = {
    0: "NEGATIVE",  # Bearish
    1: "POSITIVE",  # Bullish
    2: "NEUTRAL"
}

# associate each numbered label with a text label
df["label_text"] = df["label"].map(sentiments)
df.head()

| | text | label | label_text |
|---|---|---|---|
| 0 | $BYND - JPMorgan reels in expectations on Beyo... | 0 | NEGATIVE |
| 1 | $CCL $RCL - Nomura points to bookings weakness... | 0 | NEGATIVE |
| 2 | $CX - Cemex cut at Credit Suisse, J.P. Morgan ... | 0 | NEGATIVE |
| 3 | $ESS: BTIG Research cuts to Neutral https://t.... | 0 | NEGATIVE |
| 4 | $FNKO - Funko slides after Piper Jaffray PT cu... | 0 | NEGATIVE |


## 🤖 3. Use Transformers to Make Predictions! 🔮

Let’s bring in the magic of the `transformers` library! Choose the appropriate pipeline to create predictions for this dataset.




In [40]:
from transformers import pipeline

# use the right pipeline to make predictions on the dataset
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


## 📊 4. Compare Predictions to Dataset Labels — Can We Do Better? 🤔

Let’s evaluate how well our pipeline’s predictions align with the actual labels:



In [42]:
predictions = classifier(df["text"].to_list())

In [43]:
# display some predictions
print(predictions)

[{'label': 'NEGATIVE', 'score': 0.9941707253456116}, {'label': 'NEGATIVE', 'score': 0.9943950176239014}, {'label': 'NEGATIVE', 'score': 0.9995361566543579}, {'label': 'NEGATIVE', 'score': 0.9997311234474182}, {'label': 'NEGATIVE', 'score': 0.9956093430519104}, {'label': 'NEGATIVE', 'score': 0.8739936947822571}, {'label': 'NEGATIVE', 'score': 0.9997133612632751}, {'label': 'NEGATIVE', 'score': 0.9991508722305298}, {'label': 'NEGATIVE', 'score': 0.9993239641189575}, {'label': 'NEGATIVE', 'score': 0.9984560012817383}, {'label': 'NEGATIVE', 'score': 0.9974965453147888}, {'label': 'NEGATIVE', 'score': 0.9977843165397644}, {'label': 'NEGATIVE', 'score': 0.9973112344741821}, {'label': 'NEGATIVE', 'score': 0.9993927478790283}, {'label': 'NEGATIVE', 'score': 0.9992534518241882}, {'label': 'NEGATIVE', 'score': 0.9990935325622559}, {'label': 'NEGATIVE', 'score': 0.9976300001144409}, {'label': 'NEGATIVE', 'score': 0.9971024394035339}, {'label': 'NEGATIVE', 'score': 0.9994408488273621}, {'label': '

In [44]:
df_predictions = pd.DataFrame(predictions)
df_predictions.head()

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.994171 |
| 1 | NEGATIVE | 0.994395 |
| 2 | NEGATIVE | 0.999536 |
| 3 | NEGATIVE | 0.999731 |
| 4 | NEGATIVE | 0.995609 |


In [59]:
from sklearn.metrics import classification_report

# produce the classification report for the dataset, what comments can you make?
labels = df["label_text"]
predicted_labels = df_predictions["label"]

print(classification_report(labels, predicted_labels))

              precision    recall  f1-score   support

    NEGATIVE       0.19      0.97      0.32      1442
     NEUTRAL       0.00      0.00      0.00      6178
    POSITIVE       0.30      0.34      0.32      1923

    accuracy                           0.22      9543
   macro avg       0.16      0.44      0.21      9543
weighted avg       0.09      0.22      0.11      9543




Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.



In [60]:
from sklearn.metrics import confusion_matrix
import plotly.express as px

# produce the confusion matrix for your predictions, what comments can you make?
# pass the label order explicitly so rows/columns match the axis labels below
label_order = ['NEGATIVE', 'NEUTRAL', 'POSITIVE']
cm = confusion_matrix(labels, predicted_labels, labels=label_order)

fig = px.imshow(
    cm,
    labels=dict(x="Predicted labels", y="Labels"),
    x=label_order,
    y=label_order
)

fig.update_traces(text=cm, texttemplate="%{text}")

fig.show()
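The classification report and the heat map come from the same counts. With true labels on the rows and predictions on the columns (scikit-learn's convention), a class's recall is its diagonal cell divided by its row sum, and its precision is its diagonal cell divided by its column sum. A column of zeros (the default model never predicts `NEUTRAL`) makes precision 0/0, which is what the `zero_division` warning above is about. The minimal NumPy sketch below recomputes both from a matrix; the example matrix is made up for illustration and is not this notebook's result.

```python
import numpy as np

def per_class_precision_recall(cm):
    """Per-class precision and recall from a confusion matrix.

    Rows are true labels and columns are predicted labels, the layout
    returned by sklearn.metrics.confusion_matrix.
    """
    cm = np.asarray(cm, dtype=float)
    true_positives = np.diag(cm)
    predicted_totals = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual_totals = cm.sum(axis=1)     # row sums: support of each class
    # Mirror zero_division=0: report 0.0 when a class is never predicted
    precision = np.divide(true_positives, predicted_totals,
                          out=np.zeros_like(true_positives), where=predicted_totals > 0)
    recall = np.divide(true_positives, actual_totals,
                       out=np.zeros_like(true_positives), where=actual_totals > 0)
    return precision, recall

# Illustrative 3-class matrix (NEGATIVE, NEUTRAL, POSITIVE); the NEUTRAL column is empty
example_cm = np.array([
    [8, 0, 2],
    [5, 0, 5],
    [1, 0, 9],
])
precision, recall = per_class_precision_recall(example_cm)
print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
```

Passing the real `cm` computed above should reproduce the precision and recall columns of the classification report, up to rounding.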

<Note type="tip" title="Financial text modeling">

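💡 Note: The default model, `distilbert-base-uncased-finetuned-sst-2-english`, was fine-tuned on SST-2, a binary sentiment dataset built from movie reviews. It is not tuned for financial language and can only output `POSITIVE` or `NEGATIVE`, so it never predicts `NEUTRAL`, the majority class here. To get better results, consider a finance-specific sentiment model such as `ProsusAI/finbert`, which predicts positive, negative and neutral.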

</Note>

Use this new model:





In [47]:
# produce a new set of predictions using the specialized model
classifier_finbert = pipeline("sentiment-analysis", model="ProsusAI/finbert")
predictions_finbert = classifier_finbert(df["text"].to_list())

Device set to use mps:0


In [50]:
# convert predictions into a dataframe
df_predictions_finbert = pd.DataFrame(predictions_finbert)

In [61]:
from sklearn.metrics import classification_report

# produce the classification report for the dataset, what comments can you make?
labels = df["label_text"]
predicted_labels = df_predictions_finbert["label"].str.upper()

print(classification_report(labels, predicted_labels))

              precision    recall  f1-score   support

    NEGATIVE       0.50      0.78      0.61      1442
     NEUTRAL       0.85      0.73      0.79      6178
    POSITIVE       0.58      0.60      0.59      1923

    accuracy                           0.71      9543
   macro avg       0.64      0.70      0.66      9543
weighted avg       0.74      0.71      0.72      9543



In [62]:
# produce the confusion matrix for your predictions, what comments can you make?
# pass the label order explicitly so rows/columns match the axis labels below
label_order = ['NEGATIVE', 'NEUTRAL', 'POSITIVE']
cm = confusion_matrix(labels, predicted_labels, labels=label_order)

fig = px.imshow(
    cm,
    labels=dict(x="Predicted labels", y="Labels"),
    x=label_order,
    y=label_order
)

fig.update_traces(text=cm, texttemplate="%{text}")

fig.show()

Now you're analyzing financial tweets like a pro! 💪📉📈


### 🧠 How Could We Improve Performance?

Give a few ideas that could lead to increased performance:



- ✅ Use **domain-specific models** like FinBERT
- ✅ **Fine-tune** a pre-trained transformer on this dataset
- ✅ **Preprocess** tweets: remove hashtags, cashtags, links, etc.
- ✅ Try **data augmentation** or class balancing: `NEUTRAL` tweets make up about 65% of the training split (6178 of 9543)
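The preprocessing idea can be sketched with Python's `re` module. This is a minimal version under our own assumptions: it drops links and cashtags (`$BYND`) entirely, keeps the word behind a hashtag because it often carries meaning, and trims separators such as ` - ` or `:` left dangling at the edges. Cashtags may well carry signal for a financial model, so treat this as something to measure rather than a guaranteed win.

```python
import re

# Patterns are illustrative assumptions, not a standard tweet-cleaning recipe
URL_RE = re.compile(r"https?://\S+")
CASHTAG_RE = re.compile(r"\$[A-Za-z][A-Za-z0-9._]*")  # e.g. $BYND, $BRK.B
HASHTAG_RE = re.compile(r"#(\w+)")
SPACES_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Drop links and cashtags, keep hashtag words, tidy whitespace."""
    text = URL_RE.sub("", text)
    text = CASHTAG_RE.sub("", text)
    text = HASHTAG_RE.sub(r"\1", text)  # "#earnings" -> "earnings"
    text = SPACES_RE.sub(" ", text)
    # Trim separators left dangling at the edges, e.g. "$CCL $RCL - Nomura ..."
    return text.strip(" -:;,")

print(clean_tweet("$CCL $RCL - Nomura points to bookings weakness https://t.co/abc123"))
```

To apply it to the whole dataset, `df["clean_text"] = df["text"].apply(clean_tweet)`; rerunning FinBERT on `clean_text` and comparing the two classification reports shows whether it helped.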


## 🎉 Wrapping Up

By now, you've:
- Loaded a real-world finance dataset 📊
- Explored its sentiment labels 🎭
- Used transformers to make predictions 🤖
- Evaluated model performance and brainstormed improvements 🧠

Now go forth and build powerful financial sentiment models! 💼✨

# 🔍 Topic Modeling on Amazon Product Reviews using Zero-Shot Classification 🛍️🤖

Welcome to another exciting NLP challenge! 🎉 This time, you’ll be working with real **Amazon product reviews** to uncover hidden topics. 💡 No training required — just pure inference magic! 🧙‍♂️✨



## 📦 Dataset Overview

This dataset contains **~1.6k real Amazon reviews** across various products. Your mission, should you choose to accept it:

> 🧠 **"Can you build a strong model that identifies topics in these reviews — without any labeled data?"**

Let’s do this! 💪



## 🛠️ 1. Load the Dataset with Pandas

Start by loading the dataset from this URL: https://full-stack-assets.s3.eu-west-3.amazonaws.com/datasets/M08/transformers/amazon_reviews.csv


In [63]:
import pandas as pd

# Load your dataset
df = pd.read_csv("https://full-stack-assets.s3.eu-west-3.amazonaws.com/datasets/M08/transformers/amazon_reviews.csv")

# Take a peek
df.head()

| | id | asins | brand | categories | colors | dateAdded | dateUpdated | dimension | ean | keys | ... | reviews.rating | reviews.sourceURLs | reviews.text | reviews.title | reviews.userCity | reviews.userProvince | reviews.username | sizes | upc | weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AVpe7AsMilAPnD_xQ78G | B00QJDU3KY | Amazon | Amazon Devices,mazon.co.uk | | 2016-03-08T20:21:53Z | 2017-07-18T23:52:58Z | 169 mm x 117 mm x 9.1 mm | | kindlepaperwhite/b00qjdu3ky | ... | 5.0 | https://www.amazon.com/Kindle-Paperwhite-High-... | I initially had trouble deciding between the p... | Paperwhite voyage, no regrets! | | | Cristina M | | | 205 grams |
| 1 | AVpe7AsMilAPnD_xQ78G | B00QJDU3KY | Amazon | Amazon Devices,mazon.co.uk | | 2016-03-08T20:21:53Z | 2017-07-18T23:52:58Z | 169 mm x 117 mm x 9.1 mm | | kindlepaperwhite/b00qjdu3ky | ... | 5.0 | https://www.amazon.com/Kindle-Paperwhite-High-... | Allow me to preface this with a little history... | One Simply Could Not Ask For More | | | Ricky | | | 205 grams |
| 2 | AVpe7AsMilAPnD_xQ78G | B00QJDU3KY | Amazon | Amazon Devices,mazon.co.uk | | 2016-03-08T20:21:53Z | 2017-07-18T23:52:58Z | 169 mm x 117 mm x 9.1 mm | | kindlepaperwhite/b00qjdu3ky | ... | 4.0 | https://www.amazon.com/Kindle-Paperwhite-High-... | I am enjoying it so far. Great for reading. Ha... | Great for those that just want an e-reader | | | Tedd Gardiner | | | 205 grams |
| 3 | AVpe7AsMilAPnD_xQ78G | B00QJDU3KY | Amazon | Amazon Devices,mazon.co.uk | | 2016-03-08T20:21:53Z | 2017-07-18T23:52:58Z | 169 mm x 117 mm x 9.1 mm | | kindlepaperwhite/b00qjdu3ky | ... | 5.0 | https://www.amazon.com/Kindle-Paperwhite-High-... | I bought one of the first Paperwhites and have... | Love / Hate relationship | | | Dougal | | | 205 grams |
| 4 | AVpe7AsMilAPnD_xQ78G | B00QJDU3KY | Amazon | Amazon Devices,mazon.co.uk | | 2016-03-08T20:21:53Z | 2017-07-18T23:52:58Z | 169 mm x 117 mm x 9.1 mm | | kindlepaperwhite/b00qjdu3ky | ... | 5.0 | https://www.amazon.com/Kindle-Paperwhite-High-... | I have to say upfront - I don't like coroporat... | I LOVE IT | | | Miljan David Tanic | | | 205 grams |


## 🎯 2. Which Pipeline? 🤔

Now comes the fun part! Which pipeline should you use to assign topics to reviews *without training a model*?

The data is unlabeled, which means you'd have to identify the labels yourself. Based on a sample of reviews, can you suggest a collection of labels for classifying the reviews?

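We need a **zero-shot-classification** pipeline: it takes the candidate labels at inference time and scores how well each one fits the text, so no labeled training data is required.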
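Under the hood, the pipeline's default model (`facebook/bart-large-mnli`) is a natural language inference model: each candidate label is turned into a hypothesis with the template `"This example is {}."`, and the model scores whether the review entails it. The NumPy sketch below shows, as we understand the pipeline's two modes, how per-label NLI logits become the scores you will see: by default, a softmax of the entailment logits across labels (scores sum to 1); with `multi_label=True`, an independent softmax over contradiction vs. entailment for each label. The logits are invented for illustration; a real model would produce them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def zero_shot_scores(contradiction_logits, entailment_logits, multi_label=False):
    """Turn per-label NLI logits into zero-shot label scores.

    Each logit pair comes from pairing the text (premise) with
    "This example is {label}." (hypothesis).
    """
    contradiction = np.asarray(contradiction_logits, dtype=float)
    entailment = np.asarray(entailment_logits, dtype=float)
    if multi_label or entailment.size == 1:
        # Each label judged on its own: P(entailment) vs. P(contradiction)
        pairs = np.stack([contradiction, entailment], axis=-1)
        return softmax(pairs, axis=-1)[..., 1]
    # Labels compete: softmax of the entailment logits across all labels
    return softmax(entailment)

candidate_labels = ["ease of use", "value for money", "ergonomy", "quality", "features"]
# Hypothetical logits for one review
contradiction = [-1.0, 2.0, 1.5, -2.0, 0.0]
entailment = [1.0, -1.5, -1.0, 2.5, 0.5]

single = zero_shot_scores(contradiction, entailment)
multi = zero_shot_scores(contradiction, entailment, multi_label=True)
for label, s, m in sorted(zip(candidate_labels, single, multi), key=lambda t: -t[1]):
    print(f"{label:16s} single={s:.3f} multi={m:.3f}")
```

In the real pipeline, `classifier(text, potential_topics, multi_label=True)` switches to the second mode, which can suit reviews that discuss several aspects at once.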

In [80]:
from pprint import pp

# print out a few reviews, try and list a few topics you could link the reviews to.
for review in df["reviews.text"].sample(10):
    pp(review)
    print()

('If you read my Fire TV review you know that I am tough on Amazon when it '
 'comes to their own items. It needs to deliver quality for the price point to '
 'earn stars from me. Please take the time to read my entire review and feel '
 'free to ask questions. I will do my best to respond to them as I can and '
 'update the review to reflect those answers and other things I discover along '
 'the way.First my background. I own many Amazon Kindles (bw, Fire gen2, Fire '
 'gen3) as well as Apple Ipad (gen 4), Samsung Note 3 and have an LG G2 '
 'smartphone (had a Samsung S4 before that), notebooks, chromebooks, etc. I '
 'have also used many other products including the Fire HDX line. I have a '
 'solid computer background as well but honestly I am more of a casual user '
 'when it comes to tablets like this one.Amazon has changed many things over '
 'the life of the Fire product line. Adding and removing features (like '
 'cameras--the first generation had one but the second generation

We could come up with categories that help the seller understand which aspect of the product users are commenting on, such as "ease of use", "value for money", "ergonomy", "quality" and "features".

## 🎯 3. Use Transformers 🤖

Run the model you chose on a random sample of 20 reviews. How well do you think it performed?

In [None]:
from transformers import pipeline

# Initialize the zero-shot classifier
classifier = pipeline("zero-shot-classification")

# Define potential topics (you can refine these!)
potential_topics = ["ease of use", "value for money", "ergonomy", "quality", "features"]

# Classify a random sample of 20 reviews against the candidate topics
preds = [classifier(x, potential_topics) for x in df["reviews.text"].sample(20)]

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


In [None]:
# display predictions
for pred in preds[:3]:
    pp(pred)
    print()

{'sequence': 'Disappointing Kindle.Charging is sometimes interrupted by loss '
             'of connection between cover and Kindle. Complained and they sent '
             'me another cover. It works better, but still looses connectivity '
             'from time to time. Have to stop and disconnect cover and '
             'reconnect it. That usually stops the problem for a while.Also, '
             'It is not clear from the sales materials that the Premium '
             'Leather cover is suede. it shows scratches from fingernails.On '
             'the plus side, the Oasis screen is good and the quality of the '
             'text is excellent. Lighting is even and reading is easy on this '
             'device.',
 'labels': ['quality',
            'ease of use',
            'features',
            'ergonomy',
            'value for money'],
 'scores': [0.3852250874042511,
            0.3175143599510193,
            0.19775818288326263,
            0.07138220220804214,
           
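Raw dictionaries make it hard to judge 20 predictions at once. One option (our own suggestion, not part of the exercise) is to flatten them into a DataFrame with the top label, its score and its margin over the runner-up. A small margin means the model barely preferred one topic, which is a good cue to read that review yourself. The sketch assumes each prediction has at least two candidate labels, as in the pipeline output above.

```python
import pandas as pd

def summarize_zero_shot(preds, text_width=60):
    """One row per review: top label, its score, and the gap to the runner-up."""
    rows = []
    for pred in preds:
        # The pipeline returns labels sorted by decreasing score
        top_label, second_label = pred["labels"][0], pred["labels"][1]
        top_score, second_score = pred["scores"][0], pred["scores"][1]
        rows.append({
            "review": pred["sequence"][:text_width],
            "top_label": top_label,
            "top_score": round(top_score, 3),
            "runner_up": second_label,
            "margin": round(top_score - second_score, 3),
        })
    return pd.DataFrame(rows)
```

In the notebook, `summarize_zero_shot(preds)` gives the table, and `summarize_zero_shot(preds)["top_label"].value_counts()` shows whether one topic dominates the sample.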

## 🛠️ 4. Bonus: Improve Your Topic Model 💥

How do you think we could improve this zero-shot classification?


- 🧪 Try **different or more refined labels** (e.g. “smartphones”, “cookware”)
- 🧱 Preprocess further: remove stopwords, lemmatize
- 📊 Visualize the topic distribution with bar charts
- 🧵 Cluster similar reviews using embeddings (`sentence-transformers`)
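Clustering can be sketched in two steps: encode reviews with `sentence-transformers`, then group the vectors with scikit-learn's `KMeans`. The model name `all-MiniLM-L6-v2` and the cluster count are our own assumptions. The embedding vectors are normalized to unit length, which brings Euclidean k-means closer to grouping by cosine similarity. The `sentence_transformers` import is kept inside the function so the clustering step can be tried on any embedding matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_reviews(texts, model_name="all-MiniLM-L6-v2"):
    """Encode texts with sentence-transformers (model choice is an assumption)."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    # Unit-length vectors so Euclidean distances track cosine similarity
    return model.encode(list(texts), normalize_embeddings=True)

def cluster_embeddings(embeddings, n_clusters=5, seed=0):
    """Assign each embedding to one of n_clusters with k-means."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(np.asarray(embeddings))
```

For example, `clusters = cluster_embeddings(embed_reviews(df["reviews.text"].dropna()), n_clusters=5)` assigns each review a cluster id. Reading a few reviews per cluster is a practical way to come up with better candidate labels for the zero-shot classifier.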

## 🏁 Wrap-Up

You just:
- Loaded and explored a real-world dataset 🔍
- Used **zero-shot learning** to perform topic classification 🔮
- Started building a simple but powerful topic classifier that needs no labeled data 💪

This is the magic of modern NLP — you’re now ready to go deeper into **topic modeling** and **semantic understanding**! 🧠✨

# 📚 Summarizing Scientific Papers with Transformers 🧠✨

Welcome, NLP explorer! In this challenge, you're diving into the world of **long-form document summarization** using research papers from **ArXiv**. 🧪🔬 Your mission: build a model that creates powerful, concise summaries of lengthy academic papers! 📄➡️✂️




## 🧾 Dataset Overview

The **ArXiv Summarization Dataset** is a curated collection of scientific papers for training and evaluating summarization models. Each paper comes with:

- `article`: 📝 Full body of the research paper  
- `abstract`: 🧠 Human-written summary  
- `id`: 📌 Unique paper identifier

You’ll be using these pairs to train or evaluate a **text summarization model** that learns to generate high-quality abstracts from full papers!



## 📥 1. Load the Dataset with 🤗 Datasets

First, load the dataset with the `datasets` library (install it with `pip install datasets` if needed):


In [83]:
from datasets import load_dataset

dataset = load_dataset("ccdv/arxiv-summarization")

# Preview the data
dataset["train"][0]

Generating train split: 100%|██████████| 203037/203037 [00:05<00:00, 38416.37 examples/s]
Generating validation split: 100%|██████████| 6436/6436 [00:00<00:00, 35996.13 examples/s]
Generating test split: 100%|██████████| 6440/6440 [00:00<00:00, 34351.95 examples/s]


{'article': 'additive models @xcite provide an important family of models for semiparametric regression or classification . some reasons for the success of additive models are their increased flexibility when compared to linear or generalized linear models and their increased interpretability when compared to fully nonparametric models . \n it is well - known that good estimators in additive models are in general less prone to the curse of high dimensionality than good estimators in fully nonparametric models . \n many examples of such estimators belong to the large class of regularized kernel based methods over a reproducing kernel hilbert space @xmath0 , see e.g. @xcite . in the last years \n many interesting results on learning rates of regularized kernel based models for additive models have been published when the focus is on sparsity and when the classical least squares loss function is used , see e.g. @xcite , @xcite , @xcite , @xcite , @xcite , @xcite and the references therein


🎯 Each entry has:
- A **very long** article (avg. ~6000 tokens!)
- Its corresponding **abstract** (avg. ~300 tokens)
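To sanity-check these lengths on a subset yourself, a whitespace word count is a cheap proxy. Words are not model tokens: subword tokenizers map each word to one or more tokens, so these counts are roughly a lower bound on what a model actually has to read.

```python
from statistics import mean, median

def word_length_stats(texts):
    """Mean, median and max whitespace-word counts over a collection of texts."""
    counts = [len(text.split()) for text in texts]
    return {"mean": mean(counts), "median": median(counts), "max": max(counts)}
```

For example, with `subset = dataset["train"].select(range(1000))`, compare `word_length_stats(subset["article"])` with `word_length_stats(subset["abstract"])`.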



## 🧼 2. Explore & Preprocess the Text 🧹

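Each document is already lowercased and tokenized: tokens are separated by spaces (note `well - known` and the space before `,` and `.`), `\n` characters separate paragraphs, and citations and math expressions are replaced by `@xcite` and `@xmathN` placeholders. No heavy cleaning is required, but you can preview a document like this: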


In [84]:
# print some text data and the associated target summary
print(dataset['train'][0]['article'][:1000])  # First 1000 characters
print("\nAbstract:\n", dataset['train'][0]['abstract'])


additive models @xcite provide an important family of models for semiparametric regression or classification . some reasons for the success of additive models are their increased flexibility when compared to linear or generalized linear models and their increased interpretability when compared to fully nonparametric models . 
 it is well - known that good estimators in additive models are in general less prone to the curse of high dimensionality than good estimators in fully nonparametric models . 
 many examples of such estimators belong to the large class of regularized kernel based methods over a reproducing kernel hilbert space @xmath0 , see e.g. @xcite . in the last years 
 many interesting results on learning rates of regularized kernel based models for additive models have been published when the focus is on sparsity and when the classical least squares loss function is used , see e.g. @xcite , @xcite , @xcite , @xcite , @xcite , @xcite and the references therein . of course , t
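If you want text closer to normal prose (for example, to make generated summaries easier to read), one option is to drop the `@xcite`/`@xmathN` placeholders and re-attach the punctuation that tokenization split off. The regular expressions below are our own guess at the conventions visible in the preview, not an official cleaning procedure for this dataset. Only spaces and tabs are touched, so the `\n` paragraph breaks survive.

```python
import re

# Patterns inferred from the preview above; adjust if you see other conventions
CITATION_RE = re.compile(r"(?:[ \t]*,?[ \t]*@xcite)+")   # "@xcite , @xcite" runs
MATH_RE = re.compile(r"@xmath\d+")
SPACED_HYPHEN_RE = re.compile(r"(\w) - (\w)")            # "well - known"
SPACE_BEFORE_PUNCT_RE = re.compile(r"[ \t]+([.,;:!?)\]])")
SPACES_RE = re.compile(r"[ \t]+")

def detokenize_arxiv(text: str) -> str:
    text = CITATION_RE.sub("", text)
    text = MATH_RE.sub("", text)
    text = SPACED_HYPHEN_RE.sub(r"\1-\2", text)   # "well - known" -> "well-known"
    text = SPACE_BEFORE_PUNCT_RE.sub(r"\1", text)  # "models ." -> "models."
    # Collapse runs of spaces line by line and drop empty lines
    lines = [SPACES_RE.sub(" ", line).strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)

print(detokenize_arxiv(dataset["train"][0]["article"][:1000]) if "dataset" in globals() else
      detokenize_arxiv("it is well - known that models @xcite , @xcite are flexible ."))
```

Dropping math placeholders loses information, so this is better suited to display than to training data.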

Ready to summarize like a scientist? Let's go! 🚀

---

## 🧠 3. Summarize with Transformers! 🪄

You’ll now use the **`summarization` pipeline** to automatically summarize research papers.


In [85]:
from transformers import pipeline

# Load a summarization pipeline (long-doc models work better!)
summarizer = pipeline("summarization")

# Try on a small paper segment
long_text = dataset["test"][0]["article"][:1000]  # distilbart-cnn accepts at most 1,024 input tokens, so keep only the first 1,000 characters
summary = summarizer(long_text, max_length=150, min_length=40, do_sample=False)

print("original text:", long_text)
print("summary:", summary[0]["summary_text"])


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


original text: for about 20 years the problem of properties of short - term changes of solar activity has been considered extensively . 
 many investigators studied the short - term periodicities of the various indices of solar activity . 
 several periodicities were detected , but the periodicities about 155 days and from the interval of @xmath3 $ ] days ( @xmath4 $ ] years ) are mentioned most often . 
 first of them was discovered by @xcite in the occurence rate of gamma - ray flares detected by the gamma - ray spectrometer aboard the _ solar maximum mission ( smm ) . 
 this periodicity was confirmed for other solar flares data and for the same time period @xcite . 
 it was also found in proton flares during solar cycles 19 and 20 @xcite , but it was not found in the solar flares data during solar cycles 22 @xcite . 
 _    several autors confirmed above results for the daily sunspot area data . @xcite studied the sunspot data from 18741984 . 
 she found the 155-day periodicity in da

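💡 **Note:** Standard summarization models have short input limits: `bart-large-cnn` (fine-tuned on CNN/DailyMail news articles) accepts at most 1,024 tokens, and `t5-base` was pre-trained on 512-token inputs, far less than a full paper. For long-form papers, try models like: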

- `allenai/led-base-16384` 🔥
- `google/pegasus-arxiv` 💥
- `t5-large` (with truncation)

---

## 📊 4. Evaluate Your Summaries 📈

### [Evaluating using ROUGE](https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization?ref=blog.mozilla.ai#evaluating-using-rouge)

[ROUGE](https://aclanthology.org/W04-1013/), which stands for Recall-Oriented Understudy for Gisting Evaluation, primarily measures the word overlap between a generated output and a reference text. It is a widely used metric for evaluating automatic summarization. Among its variants, `ROUGE-1` and `ROUGE-2` count overlapping unigrams and bigrams, while `ROUGE-L` is based on the longest common subsequence between the system-generated and reference summaries: the matched words must appear in the same order but need not be contiguous, which rewards summaries that preserve the reference's overall structure.
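To make these definitions concrete, here is a from-scratch sketch of ROUGE-N and ROUGE-L on whitespace tokens. ROUGE-N counts overlapping n-grams, clipped by how often each occurs in both texts; ROUGE-L uses the length of the longest common subsequence. It is meant for understanding only. The `rouge` package used below applies its own preprocessing, so its numbers will not match exactly.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prf(overlap, candidate_total, reference_total):
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"p": precision, "r": recall, "f": f1}

def rouge_n(candidate, reference, n=1):
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped counts: min of the two occurrences
    return prf(overlap, sum(cand.values()), sum(ref.values()))

def lcs_length(a, b):
    # Classic dynamic programme over the two token lists, one row at a time
    prev = [0] * (len(b) + 1)
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if token_a == token_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))

reference_text = "the cat sat on the mat"
candidate_text = "the cat lay on the mat"
print(rouge_n(candidate_text, reference_text, 1))
print(rouge_n(candidate_text, reference_text, 2))
print(rouge_l(candidate_text, reference_text))
```

In the candidate above, "the cat ... on the mat" is a common subsequence of five words even though it is not contiguous, which is exactly what ROUGE-L rewards.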

In [86]:
!pip install rouge

Collecting rouge
  Downloading rouge-1.0.1-py3-none-any.whl.metadata (4.1 kB)
Downloading rouge-1.0.1-py3-none-any.whl (13 kB)
Installing collected packages: rouge
Successfully installed rouge-1.0.1


In [87]:
from pprint import pp
from rouge import Rouge

rouge_m = Rouge()

# Compare generated summary vs reference
reference = dataset["test"][0]["abstract"]
prediction = summary[0]["summary_text"]

results = rouge_m.get_scores([prediction], [reference])
pp(results)

[{'rouge-1': {'r': 0.14285714285714285,
              'p': 0.41935483870967744,
              'f': 0.213114750307713},
  'rouge-2': {'r': 0.0379746835443038,
              'p': 0.17647058823529413,
              'f': 0.06249999708550361},
  'rouge-l': {'r': 0.13186813186813187,
              'p': 0.3870967741935484,
              'f': 0.19672130768476223}}]


### [Evaluating using BERTScore](https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization?ref=blog.mozilla.ai#evaluating-using-bertscore)

ROUGE relies on the exact presence of words in both the predicted and reference texts, so it cannot recognize paraphrases or synonyms. [BERTScore](https://arxiv.org/abs/1904.09675) addresses this with contextual embeddings from a pre-trained transformer (for English, the `bert_score` package defaults to `roberta-large`, as the log below shows): each token in the candidate is matched to the most similar token in the reference by cosine similarity, and vice versa. By comparing embeddings rather than surface strings, `BERTScore` captures semantic similarities that n-gram based metrics miss.
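The core of BERTScore is greedy matching over token embeddings: each candidate token is matched to its most similar reference token (precision), each reference token to its most similar candidate token (recall), and F1 combines the two. The NumPy sketch below implements only that step on embeddings you supply. It leaves out the IDF weighting and baseline rescaling that the `bert_score` package can apply, and any random vectors you test it with stand in for real contextual embeddings.

```python
import numpy as np

def greedy_bertscore(candidate_emb, reference_emb):
    """Precision, recall and F1 from token embeddings (one row per token)."""
    candidate_emb = np.asarray(candidate_emb, dtype=float)
    reference_emb = np.asarray(reference_emb, dtype=float)
    # L2-normalize so that dot products are cosine similarities
    cand = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    ref = reference_emb / np.linalg.norm(reference_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # shape: (candidate tokens, reference tokens)
    precision = sim.max(axis=1).mean()  # best match for each candidate token
    recall = sim.max(axis=0).mean()     # best match for each reference token
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)
```

With real contextual embeddings, raw scores tend to fall in a narrow, high band, so an F1 around 0.82 says less than it seems. `BERTScorer(lang="en", rescale_with_baseline=True)` rescales scores against a baseline to make them easier to compare.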

In [88]:
!pip install bert_score

Collecting bert_score
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
Installing collected packages: bert_score
Successfully installed bert_score-0.3.13


In [89]:
# BERTScore leverages the pre-trained contextual embeddings from BERT and matches words in candidate and reference sentences by cosine similarity.
from bert_score import BERTScorer

# Instantiate the BERTScorer object for English language
scorer = BERTScorer(lang="en")

# Score the generated summary against the reference abstract
# P1, R1, F1_1 represent Precision, Recall, and F1 Score respectively
P1, R1, F1_1 = scorer.score([prediction], [reference])

print("Summary 1 F1 Score:", F1_1.tolist()[0])

Some weights of RobertaModel were not initialized from the model checkpoint at roberta-large and are newly initialized: ['pooler.dense.bias', 'pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Summary 1 F1 Score: 0.8207549452781677


## 🚀 Bonus Missions: Go Deeper!

Want to take it up a notch? Try these:

- 🔄 **Fine-tune** a summarization model on this dataset
- 🧱 **Chunk** longer articles into sections for better summaries
- 📊 Analyze which topics/models perform better
- 📝 Compare **abstractive** vs **extractive** approaches
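Chunking can be sketched without any model: split the article on its `\n` paragraph breaks, pack paragraphs into chunks under a word budget, summarize each chunk, and optionally summarize the joined chunk summaries once more. The 600-word default below is our own assumption, chosen to leave some margin under distilbart's 1,024-token input limit; tune it for the model you use. Paragraphs longer than the budget are split into word windows.

```python
def chunk_article(article: str, max_words: int = 600):
    """Greedily pack paragraphs into chunks of at most max_words words."""
    chunks, current, current_len = [], [], 0
    for paragraph in article.split("\n"):
        words = paragraph.split()
        # Split any paragraph that is longer than the budget on its own
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Start a new chunk when this piece would overflow the current one
            if current and current_len + len(piece) > max_words:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            current.extend(piece)
            current_len += len(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk of `chunk_article(dataset["test"][0]["article"])` can then go through `summarizer(...)` exactly like the 1,000-character excerpt earlier, and the partial summaries can be joined and summarized again.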

---

## 🏁 Wrapping Up

You’ve just:
- Explored a real research-paper summarization dataset 📄
- Used transformers to create automatic summaries 🧠
- Evaluated your output with ROUGE and BERTScore 📊

Congratulations, you're now one step closer to automating scientific summarization! 💡📚✨ Think of what you could do with such models: for example, you could fine-tune one on this scientific-paper summarization dataset and use it to summarize your machine learning courses!