# Prodigy Demo

https://demo.prodi.gy/?=null&view_id=ner_manual

# Run Prodigy Named Entity Annotation Session

Copy and paste the following command into a terminal:

```bash

prodigy ner.manual news-headlines-ner blank:en ./data/news-headlines.jsonl --label PERSON,ORG,PRODUCT,LOCATION

```

Annotate a couple of records and save.

# Let's Examine Input File and Saved Annotation
## Annotation Tasks (Input)

In [5]:
from srsly import read_jsonl
file_name = './data/news-headlines.jsonl'
input_dataset = list(read_jsonl(file_name)) # wrapping result into list because read_jsonl returns a generator
print(f"Loaded {len(input_dataset)} annotation tasks")

Loaded 200 annotation tasks


In [11]:
import json
task = input_dataset[0]
print(json.dumps(task, indent = 2))


{
  "text": "Uber\u2019s Lesson: Silicon Valley\u2019s Start-Up Machine Needs Fixing",
  "meta": {
    "source": "The New York Times"
  }
}
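The input file is plain JSONL: one JSON object per line, each with a `text` field and optional `meta`. As a rough illustration of what `read_jsonl` does, here is a minimal standard-library sketch. The helper names and the temporary file name are hypothetical, not part of `srsly` or Prodigy.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Lazily yield one dict per non-empty line (a generator, like srsly's reader)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


example_tasks = [
    {
        "text": "Uber\u2019s Lesson: Silicon Valley\u2019s Start-Up Machine Needs Fixing",
        "meta": {"source": "The New York Times"},
    },
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.jsonl"  # hypothetical file name
    write_jsonl(path, example_tasks)
    loaded = list(read_jsonl(path))  # wrap in list() to materialize the generator

print(f"Loaded {len(loaded)} annotation tasks")
```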


## Connect to Prodigy database

In [12]:
from prodigy.components.db import connect
db = connect()
print(f"Database location: {db.db.database}")


Database location: /home/vscode/.prodigy/prodigy.db


In [3]:
db.datasets

['news-headlines-ner']

In [14]:
dataset_name = 'news-headlines-ner'
dataset = db.get_dataset_examples(dataset_name)
print(f"Loaded {len(dataset)} annotated tasks")

Loaded 2 annotated tasks


In [20]:
task = dataset[1]
print(task.keys())


dict_keys(['text', 'meta', '_input_hash', '_task_hash', '_is_binary', 'tokens', '_view_id', 'spans', 'answer', '_timestamp', '_annotator_id', '_session_id'])


In [21]:
for key in task.keys() :
    if key not in ['tokens'] :
        print(f"{key}: {task[key]}")

text: Pearl Automation, Founded by Apple Veterans, Shuts Down
meta: {'source': 'The New York Times'}
_input_hash: 1487477437
_task_hash: -1298236362
_is_binary: False
_view_id: ner_manual
spans: [{'start': 0, 'end': 17, 'token_start': 0, 'token_end': 2, 'label': 'ORG'}, {'start': 29, 'end': 44, 'token_start': 5, 'token_end': 7, 'label': 'ORG'}]
answer: accept
_timestamp: 1762554543
_annotator_id: 2025-11-07_22-27-32
_session_id: 2025-11-07_22-27-32


In [23]:
print(f"Text: {task['text']}")

for span in task['spans'] :
    print(f"{span['label']}: {task['text'][span['start'] : span['end']]}")

Text: Pearl Automation, Founded by Apple Veterans, Shuts Down
ORG: Pearl Automation,
ORG: Apple Veterans,
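Note that both saved spans include the trailing comma. If that is not intended, one option is to trim punctuation and whitespace from the character offsets before using them. The helper below is a hypothetical sketch, not a Prodigy function, and it only adjusts `start` and `end`; the token offsets would need a matching update.

```python
import string


def trim_span(text, start, end, strip_chars=string.punctuation + string.whitespace):
    """Shrink [start, end) so the span neither starts nor ends with strip_chars."""
    while start < end and text[start] in strip_chars:
        start += 1
    while end > start and text[end - 1] in strip_chars:
        end -= 1
    return start, end


text = "Pearl Automation, Founded by Apple Veterans, Shuts Down"
spans = [
    {"start": 0, "end": 17, "label": "ORG"},
    {"start": 29, "end": 44, "label": "ORG"},
]

for span in spans:
    start, end = trim_span(text, span["start"], span["end"])
    print(f"{span['label']}: {text[start:end]!r}")
```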


## Visualize Annotations

### Initialize spaCy

In [32]:
import spacy
from spacy import displacy
model = spacy.blank("en")


### Create spaCy `Doc` and visualize it

In [33]:
doc = model(task['text'])
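# A list of tuples (LABEL, TOKEN_START, TOKEN_END).
# Prodigy's token_end is inclusive, while spaCy expects an exclusive end index, hence the + 1.
entities = [(span['label'], span['token_start'], span['token_end'] + 1) for span in task['spans']]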
doc.ents = entities

displacy.render(doc, style="ent", jupyter = True)
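One detail worth checking when passing token offsets to spaCy: in the saved annotation, `token_end` is the index of the last token in the span (inclusive), whereas Python slices and spaCy spans use an exclusive end. The sketch below shows the conversion on a plain token list. The tokenization is hand-written to match the offsets in the saved spans and is an assumption, not spaCy output.

```python
# Hand-written tokens (assumption) matching the token offsets saved above
tokens = ["Pearl", "Automation", ",", "Founded", "by",
          "Apple", "Veterans", ",", "Shuts", "Down"]

spans = [
    {"token_start": 0, "token_end": 2, "label": "ORG"},
    {"token_start": 5, "token_end": 7, "label": "ORG"},
]


def to_exclusive(span):
    """Convert an inclusive token_end into an exclusive (label, start, end) tuple."""
    return (span["label"], span["token_start"], span["token_end"] + 1)


for label, start, end in map(to_exclusive, spans):
    print(label, tokens[start:end])
```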

# Annotate Summarization Dataset

In [90]:
from pandas import read_csv
import json
file_name = "./data/hf-summarization-dataset.csv"
df = read_csv(file_name)
df.list_choices = df.list_choices.apply(lambda x : json.loads(x))
print(f"Loaded {len(df)} records from {file_name}")

Loaded 3579 records from ./data/hf-summarization-dataset.csv


In [50]:
df.head()

Unnamed: 0,id,input,correct_choice,list_choices,lbl,distractor_model,dataset
0,32168497,Vehicles and pedestrians will now embark and d...,Passengers using a chain ferry have been warne...,[ A new service on the Isle of Wight's chain f...,1,bart-base,xsum
1,29610109,If you leave your mobile phone somewhere do yo...,"Do you ever feel lonely, stressed or jealous w...","[ You may be worried about your health, but wh...",1,bart-base,xsum
2,38018439,"Speaking on TV, Maria Zakharova said Jews had ...",A spokeswoman on Russian TV has said Jewish pe...,[ The Russian foreign minister has said she ha...,1,bart-base,xsum
3,32790804,"A report by the organisation suggests men, wom...",Egyptian security forces are using sexual viol...,[ Egyptian police are systematically abusing d...,1,bart-base,xsum
4,36437856,Police in Australia and Europe were aware of a...,One word and a freckle indirectly led to Huckl...,[One word and a freckle indirectly led to Huck...,0,bart-base,xsum


## Construct Prodigy Dataset

In [51]:
tasks = []
for _, row in df.iterrows():
    task = {
        'id' : row['id'],
        'text' : row['input'],
        'options': [],
        'correct_answer' : str(row.lbl),
    }
    for index, choice in enumerate(row.list_choices) :
        task['options'].append({'id' : str(index), 'text' : choice})
    tasks.append(task)
print(f"Generated {len(tasks)} tasks.")

Generated 3579 tasks.
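The same construction can be expressed as a small function over a plain dict, which makes the task shape easier to test without pandas. This is a sketch only: the row values are shortened placeholders, and casting `id` to a string is an assumption made here.

```python
def make_choice_task(row):
    """Build one multiple-choice task from a row with id, input, list_choices and lbl."""
    return {
        "id": str(row["id"]),  # assumption: ids are stored as strings
        "text": row["input"],
        "options": [
            {"id": str(index), "text": choice}
            for index, choice in enumerate(row["list_choices"])
        ],
        # lbl is the index of the correct choice; stored as a string to match option ids
        "correct_answer": str(row["lbl"]),
    }


# Placeholder row (hypothetical values, not taken from the dataset)
row = {
    "id": 1,
    "input": "Source article text",
    "list_choices": ["Distractor summary", "Reference summary"],
    "lbl": 1,
}

choice_task = make_choice_task(row)
print(choice_task["options"])
```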


In [55]:
print(json.dumps(tasks[0], indent = 2))

{
  "id": "32168497",
  "text": "Vehicles and pedestrians will now embark and disembark the Cowes ferry separately following Maritime and Coastguard Agency (MCA) guidance.\nIsle of Wight Council said its new procedures were in response to a resident's complaint.\nCouncillor Shirley Smart said it would \"initially result in a slower service\".\nOriginally passengers and vehicles boarded or disembarked the so called \"floating bridge\" at the same time.\nMs Smart, who is the executive member for economy and tourism, said the council already had measures in place to control how passengers and vehicles left or embarked the chain ferry \"in a safe manner\".\nHowever, it was \"responding\" to the MCA's recommendations \"following this complaint\".\nShe added: \"This may initially result in a slower service while the measures are introduced and our customers get used to the changes.\"\nThe service has been in operation since 1859.",
  "options": [
    {
      "id": "0",
      "text": " A new 

In [56]:
import srsly
file_name = "./data/summarization-dataset-choices.jsonl"
srsly.write_jsonl(file_name, tasks)
print(f"Saved {len(tasks)} tasks in {file_name}")

Saved 3579 tasks in ./data/summarization-dataset-choices.jsonl


## Start Prodigy

```bash

cd annotation-projects/summarization-choices
./start.sh

```

The annotation session will start at http://localhost:8082?session=natalia

## Examine annotated data

### Connect to Prodigy DB

In [61]:
from prodigy.components.db import connect
db = connect()
print(f"Database location: {db.db.database}")

Database location: /home/vscode/.prodigy/prodigy.db


In [62]:
db.datasets

['text-summarization']

In [63]:
dataset_name = 'text-summarization'
dataset = db.get_dataset_examples(dataset_name)
print(f"Loaded {len(dataset)} annotated tasks")

Loaded 3 annotated tasks


In [64]:
dataset[0]

{'id': '32168497',
 'text': 'Vehicles and pedestrians will now embark and disembark the Cowes ferry separately following Maritime and Coastguard Agency (MCA) guidance.\nIsle of Wight Council said its new procedures were in response to a resident\'s complaint.\nCouncillor Shirley Smart said it would "initially result in a slower service".\nOriginally passengers and vehicles boarded or disembarked the so called "floating bridge" at the same time.\nMs Smart, who is the executive member for economy and tourism, said the council already had measures in place to control how passengers and vehicles left or embarked the chain ferry "in a safe manner".\nHowever, it was "responding" to the MCA\'s recommendations "following this complaint".\nShe added: "This may initially result in a slower service while the measures are introduced and our customers get used to the changes."\nThe service has been in operation since 1859.',
 'options': [{'id': '0',
   'text': " A new service on the Isle of Wight's

### Inspect Task 1

```python
{
  # Copied from the input task
  'id': '32168497',
  'text': 'Vehicles and pedestrians will now embark and disembark the Cowes ferry separately following Maritime and Coastguard Agency (MCA) guidance.\nIsle of Wight Council said its new procedures were in response to a resident\'s complaint.\nCouncillor Shirley Smart said it would "initially result in a slower service".\nOriginally passengers and vehicles boarded or disembarked the so called "floating bridge" at the same time.\nMs Smart, who is the executive member for economy and tourism, said the council already had measures in place to control how passengers and vehicles left or embarked the chain ferry "in a safe manner".\nHowever, it was "responding" to the MCA\'s recommendations "following this complaint".\nShe added: "This may initially result in a slower service while the measures are introduced and our customers get used to the changes."\nThe service has been in operation since 1859.',
  'options': [
    {'id': '0', 'text': " A new service on the Isle of Wight's chain ferry has been launched following a complaint from a resident."},
    {'id': '1', 'text': 'Passengers using a chain ferry have been warned crossing times will be longer because of new safety measures.'}],
  'correct_answer': '1',

  # Prodigy uses murmur hashes to uniquely identify inputs and tasks for quicker processing
  '_input_hash': -920299312,
  '_task_hash': -1864632314,

  # This is passed from the recipe
  '_view_id': 'blocks', # Combined "choices" recipe with custom HTML line
  'config': {'choice_style': 'single'}, # Lets us know that it was a single choice annotation (as opposed to multiple choice)

  # This is added by Prodigy
  'accept': ['0'], # "id" field of selected option from "options" above
  'answer': 'accept', # Accept button was pressed to process the annotation (as opposed to 'reject')
  '_timestamp': 1762560620, 
  '_annotator_id': 'text-summarization-natalia', # dataset_name + session id from the URL
  '_session_id': 'text-summarization-natalia' # dataset_name + session id from the URL. Currently, _session_id and _annotator_id are interchangeable
}

```

In [66]:
dataset[2]


{'id': '38018439',
 'text': 'Speaking on TV, Maria Zakharova said Jews had told her they donated both to Mr Trump and Hillary Clinton.\nShe joked that American Jews were the best guide to US politics.\nThe diplomat\'s remarks caused shock. Anti-US propagandists in the last century peddled an idea that rich New York Jews controlled US politics.\nMs Zakharova was speaking on a chat show on Russian state TV at the weekend but her comments drew more attention after being picked up by media outlets on Thursday.\nShe said she had visited New York with an official Russian delegation at the time of the last UN General Assembly, in September.\n"I have a lot of friends and acquaintances there, of course I was interested to find out: how are the elections going, what are the American people\'s expectations?" she said.\n"If you want to know what will happen in America, who do you need to talk to? You have to talk to the Jews, of course. It goes without saying."\nAt this, the TV studio audience app

### Inspect Task 2 (Flagged)

```python
{
    'id': '29610109',
    'text': 'If you leave your mobile phone somewhere do you worry you will not be able to check it?\nIf any of this sounds familiar, there is a chance you could be spending too much time on social networks.\nAn exclusive online Newsbeat poll suggests that a quarter of 15 to 18-year-olds in the UK feel happier online than they do in real life.\nDr Radha from The Surgery on Radio 1 has dealt with patients who have displayed "a lot of social anxiety" because they are using social networks too much.\n"Being online can provoke a sense of \'I\'m not good enough, everyone else is having an amazing life\'," she explained.\n"It doesn\'t give us a sense of reality and actually what you will find is most people are probably doing the same thing as you are."\nThe survey, carried out last month, also suggests a third of 15 to 18-year-olds have met someone in person they originally met through social media.\nDr Radha has said it is important people carefully consider what information they share with the online community.\n"What this survey showed is a lot of people go online alone," she said.\n"In terms of our personal details and how we respond to messages from other people, we need to make sure we are looking after all of that safely."\nDr Radha was concerned that some people feel safer dealing with people online, rather than in person.\n"The more time we spend online, the less we are able to develop our social skills," she explained.\n"When you are online you\'re not getting eye contact with people or perceiving how body language is changing, so as a result what people are saying can be misinterpreted.\n"Physical contact, like a hug and a kiss, is really important. You don\'t get that kind of emotional confidence from being online."\nIf your online activity is leaving you feeling anxious, Dr Radha has advised that you should "slowly try to wean yourself off it".\nShe said: "If you are worrying, \'what\'s going on? 
What am I missing?\' It\'s a sign that being online too much is quite bad for you.\n"Give yourself some rules by saying, \'I\'m only going to check things three times a day for this amount of time\'."\nBBC Radio 1\'s The Surgery with Aled and Dr Radha is on Wednesday\'s at 9pm.\nFollow @BBCNewsbeat on Twitter and Radio1Newsbeat on YouTube',
    'options': [
        {'id': '0', 'text': ' You may be worried about your health, but what if you are online?'},
        {'id': '1', 'text': 'Do you ever feel lonely, stressed or jealous when you are online?'}
    ],
    'correct_answer': '1',
    '_input_hash': 1247871379,
    '_task_hash': 884697833,
    '_view_id': 'blocks',
    'accept': ['1'],
    'config': {'choice_style': 'single'},
    'flagged': True, # ATTN: the task was flagged. Normally, we would also add a "User Comment" field to the annotation interface so that annotators can leave a comment
    'answer': 'accept',
    '_timestamp': 1762560626,
    '_annotator_id': 'text-summarization-natalia',
    '_session_id': 'text-summarization-natalia'
}

```

### Inspect Task 3 (Rejected)

```python

{
    'id': '38018439',
    'text': 'Speaking on TV, Maria Zakharova said Jews had told her they donated both to Mr Trump and Hillary Clinton.\nShe joked that American Jews were the best guide to US politics.\nThe diplomat\'s remarks caused shock. Anti-US propagandists in the last century peddled an idea that rich New York Jews controlled US politics.\nMs Zakharova was speaking on a chat show on Russian state TV at the weekend but her comments drew more attention after being picked up by media outlets on Thursday.\nShe said she had visited New York with an official Russian delegation at the time of the last UN General Assembly, in September.\n"I have a lot of friends and acquaintances there, of course I was interested to find out: how are the elections going, what are the American people\'s expectations?" she said.\n"If you want to know what will happen in America, who do you need to talk to? You have to talk to the Jews, of course. It goes without saying."\nAt this, the TV studio audience applauded loudly.\n"I went here and there among them, to chat," she continued.\nImitating a Jewish accent, Mrs Zakharova said Jewish people had told her: "\'Marochka, understand this - we\'ll donate to Clinton, of course. But we\'ll give the Republicans twice that amount.\' Enough said! That settled it for me - the picture was clear.\n"If you want to know the future, don\'t read the mainstream newspapers - our people in Brighton [Beach] will tell you everything."\nShe was referring to a district of Brooklyn with a large diaspora of Jewish emigres from the former Soviet Union.\nRussian opposition activist Roman Dobrokhotov wrote on Twitter (in Russian) that the spokeswoman had "explained Trump\'s victory as a Jewish conspiracy".\nMichael McFaul, the former US ambassador to Moscow, commented on Facebook, "Wow. 
And this is the woman who criticizes me for not being diplomatic."\nDuring the election campaign, Mrs Clinton accused Mr Trump of posting a "blatantly anti-Semitic" tweet after he used an image resembling the Star of David and stacks of money.\nMr Trump, whose son-in-law Jared Kushner is Jewish, dismissed the accusation as "ridiculous".\nAn exit poll by US non-profit J Street suggests an overwhelming majority of US Jews voted for Hillary Clinton in the presidential election.',
    'options': [
      {'id': '0', 'text': ' The Russian foreign minister has said she has been "settled" by criticism from Jewish people for saying that the US election was a "Jewish conspiracy".'},
      {'id': '1', 'text': 'A spokeswoman on Russian TV has said Jewish people in New York told her they had mainly backed Trump in the US election.'}
    ],
    'correct_answer': '1',
    '_input_hash': -386851509,
    '_task_hash': 594860135,
    '_view_id': 'blocks',
    'accept': [], # Nothing was selected
    'config': {'choice_style': 'single'},
    'answer': 'reject', # ATTN: the task was rejected
    '_timestamp': 1762560628,
    '_annotator_id': 'text-summarization-natalia',
    '_session_id': 'text-summarization-natalia'
}
```
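The three tasks above differ only in a few fields: `answer`, `accept`, `flagged`, and how `accept` compares with `correct_answer`. A minimal sketch of tallying those fields over a list of saved examples follows. The `summarize` helper and the trimmed example dicts are hypothetical; only the field names come from the records shown above.

```python
from collections import Counter


def summarize(examples):
    """Count answers, flags, and whether the selected option matches correct_answer."""
    counts = Counter()
    for eg in examples:
        counts[eg["answer"]] += 1
        if eg.get("flagged"):
            counts["flagged"] += 1
        # Only accepted tasks with a selected option can be scored
        if eg["answer"] == "accept" and eg.get("accept"):
            matches = eg["accept"][0] == eg["correct_answer"]
            counts["correct" if matches else "incorrect"] += 1
    return counts


# Trimmed-down stand-ins for the three annotated tasks
examples = [
    {"answer": "accept", "accept": ["0"], "correct_answer": "1"},
    {"answer": "accept", "accept": ["1"], "correct_answer": "1", "flagged": True},
    {"answer": "reject", "accept": [], "correct_answer": "1"},
]

summary = summarize(examples)
print(dict(summary))
```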

# Annotate Summarization (Binary)
## Create Dataset

In [69]:
import random
tasks = []
for _, row in df.iterrows():
    for index, choice in enumerate(row.list_choices) :
        task = {
            'id' : row['id'],
            'text' : row['input'],
            'option': {'id' : str(index), 'value' : choice},
            'is_correct_answer' : int(row.lbl) == index, # lbl holds the index of the correct choice
            'html' : ' ', # Important
        }
        tasks.append(task)
print(f"Generated {len(tasks)} tasks.")

random.seed(42)
# random.shuffle(tasks) # Commenting it out for demo purposes, but the dataset should be shuffled in real life


Generated 7158 tasks.


In [70]:
file_name = './data/summarization-dataset-binary.jsonl'

srsly.write_jsonl(file_name, tasks)
print(f"Saved {len(tasks)} annotation tasks to {file_name}")

Saved 7158 annotation tasks to ./data/summarization-dataset-binary.jsonl
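Each row with two candidate summaries becomes two binary tasks, which is why 3579 rows yield 7158 tasks. A minimal sketch of that expansion on a plain dict follows, assuming `lbl` is the index of the correct choice (consistent with `correct_answer` in the choice tasks). The function name and row values are hypothetical.

```python
def to_binary_tasks(row):
    """Expand one row into one accept/reject task per candidate summary."""
    return [
        {
            "id": row["id"],
            "text": row["input"],
            "option": {"id": str(index), "value": choice},
            # Compare the candidate's position with the label index
            "is_correct_answer": index == int(row["lbl"]),
            "html": " ",
        }
        for index, choice in enumerate(row["list_choices"])
    ]


# Placeholder row (hypothetical values, not taken from the dataset)
row = {
    "id": "1",
    "input": "Source article text",
    "list_choices": ["Distractor summary", "Reference summary"],
    "lbl": 1,
}

binary_tasks = to_binary_tasks(row)
print([t["is_correct_answer"] for t in binary_tasks])
```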


### Start Prodigy Session

```bash
cd /workspaces/prodigy-demo/annotation-projects/summarization-binary
./start.sh

# This will start a session at http://localhost:8082?session=natalia
```

### Inspect Annotated Tasks

In [72]:
db.datasets

['text-summarization', 'text-summarization-binary']

In [83]:
dataset_name = 'text-summarization-binary'
dataset = db.get_dataset_examples(dataset_name)
print(f"Loaded {len(dataset)} annotated tasks from {dataset_name}")

Loaded 8 annotated tasks from text-summarization-binary


#### Inspect accepted task

```python

{
    'id': '29610109',
    'text': 'If you leave your mobile phone somewhere do you worry you will not be able to check it?\nIf any of this sounds familiar, there is a chance you could be spending too much time on social networks.\nAn exclusive online Newsbeat poll suggests that a quarter of 15 to 18-year-olds in the UK feel happier online than they do in real life.\nDr Radha from The Surgery on Radio 1 has dealt with patients who have displayed "a lot of social anxiety" because they are using social networks too much.\n"Being online can provoke a sense of \'I\'m not good enough, everyone else is having an amazing life\'," she explained.\n"It doesn\'t give us a sense of reality and actually what you will find is most people are probably doing the same thing as you are."\nThe survey, carried out last month, also suggests a third of 15 to 18-year-olds have met someone in person they originally met through social media.\nDr Radha has said it is important people carefully consider what information they share with the online community.\n"What this survey showed is a lot of people go online alone," she said.\n"In terms of our personal details and how we respond to messages from other people, we need to make sure we are looking after all of that safely."\nDr Radha was concerned that some people feel safer dealing with people online, rather than in person.\n"The more time we spend online, the less we are able to develop our social skills," she explained.\n"When you are online you\'re not getting eye contact with people or perceiving how body language is changing, so as a result what people are saying can be misinterpreted.\n"Physical contact, like a hug and a kiss, is really important. You don\'t get that kind of emotional confidence from being online."\nIf your online activity is leaving you feeling anxious, Dr Radha has advised that you should "slowly try to wean yourself off it".\nShe said: "If you are worrying, \'what\'s going on? 
What am I missing?\' It\'s a sign that being online too much is quite bad for you.\n"Give yourself some rules by saying, \'I\'m only going to check things three times a day for this amount of time\'."\nBBC Radio 1\'s The Surgery with Aled and Dr Radha is on Wednesday\'s at 9pm.\nFollow @BBCNewsbeat on Twitter and Radio1Newsbeat on YouTube',
    'option': {'id': '0', 'value': ' You may be worried about your health, but what if you are online?'},
    'is_correct_answer': False,
    'html': ' ',
    '_input_hash': -1574724678,
    '_task_hash': -2145974390,
    '_view_id': 'html',
    'answer': 'accept',  ### ACCEPTED
    '_timestamp': 1762562740,
    '_annotator_id': 'text-summarization-binary-natalia',
    '_session_id': 'text-summarization-binary-natalia'
}

```

#### Inspect Rejected Task

```python
{
    'id': '38018439',
    'text': 'Speaking on TV, Maria Zakharova said Jews had told her they donated both to Mr Trump and Hillary Clinton.\nShe joked that American Jews were the best guide to US politics.\nThe diplomat\'s remarks caused shock. Anti-US propagandists in the last century peddled an idea that rich New York Jews controlled US politics.\nMs Zakharova was speaking on a chat show on Russian state TV at the weekend but her comments drew more attention after being picked up by media outlets on Thursday.\nShe said she had visited New York with an official Russian delegation at the time of the last UN General Assembly, in September.\n"I have a lot of friends and acquaintances there, of course I was interested to find out: how are the elections going, what are the American people\'s expectations?" she said.\n"If you want to know what will happen in America, who do you need to talk to? You have to talk to the Jews, of course. It goes without saying."\nAt this, the TV studio audience applauded loudly.\n"I went here and there among them, to chat," she continued.\nImitating a Jewish accent, Mrs Zakharova said Jewish people had told her: "\'Marochka, understand this - we\'ll donate to Clinton, of course. But we\'ll give the Republicans twice that amount.\' Enough said! That settled it for me - the picture was clear.\n"If you want to know the future, don\'t read the mainstream newspapers - our people in Brighton [Beach] will tell you everything."\nShe was referring to a district of Brooklyn with a large diaspora of Jewish emigres from the former Soviet Union.\nRussian opposition activist Roman Dobrokhotov wrote on Twitter (in Russian) that the spokeswoman had "explained Trump\'s victory as a Jewish conspiracy".\nMichael McFaul, the former US ambassador to Moscow, commented on Facebook, "Wow. 
And this is the woman who criticizes me for not being diplomatic."\nDuring the election campaign, Mrs Clinton accused Mr Trump of posting a "blatantly anti-Semitic" tweet after he used an image resembling the Star of David and stacks of money.\nMr Trump, whose son-in-law Jared Kushner is Jewish, dismissed the accusation as "ridiculous".\nAn exit poll by US non-profit J Street suggests an overwhelming majority of US Jews voted for Hillary Clinton in the presidential election.',
    'option': {'id': '0',
    'value': ' The Russian foreign minister has said she has been "settled" by criticism from Jewish people for saying that the US election was a "Jewish conspiracy".'},
    'is_correct_answer': False,
    'html': ' ',
    '_input_hash': 420195201,
    '_task_hash': 940450759,
    '_view_id': 'html',
    'answer': 'reject', # Choice was rejected
    '_timestamp': 1762562745,
    '_annotator_id': 'text-summarization-binary-natalia',
    '_session_id': 'text-summarization-binary-natalia'
}
```
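In the binary setup, a decision is right when the annotator accepts a correct summary or rejects an incorrect one. In the two records above, `is_correct_answer` is `False` in both, so the accepted one is a wrong decision and the rejected one is a right decision. A small sketch of that check follows; the helper is hypothetical, and skipping `ignore` answers is an assumption about how they should be treated.

```python
def decision_is_right(eg):
    """True/False for accept/reject answers, None for anything else (e.g. ignore)."""
    if eg["answer"] not in ("accept", "reject"):
        return None
    return (eg["answer"] == "accept") == eg["is_correct_answer"]


# Trimmed-down stand-ins for the two records above
accepted = {"answer": "accept", "is_correct_answer": False}
rejected = {"answer": "reject", "is_correct_answer": False}

for eg in (accepted, rejected):
    print(eg["answer"], decision_is_right(eg))
```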

# Explore Inter-annotator Agreement

(Or, more precisely, disagreement.) This pattern can also be used to review model results against a gold-standard dataset.


## Create Ground Truth Dataset

In [91]:
from copy import deepcopy
from prodigy.util import set_hashes

# Unannotated tasks (same code as above)
tasks = []
for _, row in df.iterrows():
    task = {
        'id' : row['id'],
        'text' : row['input'],
        'options': [],
        'correct_answer' : str(row.lbl),
    }
    for index, choice in enumerate(row.list_choices) :
        task['options'].append({'id' : str(index), 'text' : choice})
    tasks.append(task)
print(f"Generated {len(tasks)} tasks.")


correct_tasks = []

for task in deepcopy(tasks):
    task['_view_id'] = 'choice'
    task['answer'] = 'accept'
    task['accept'] = [task['correct_answer']]
    task['_session_id'] = 'summarization-dataset-gold'
    task['_annotator_id'] = 'summarization-dataset-gold'
    task = set_hashes(task)
    correct_tasks.append(task)


Generated 3579 tasks.


## Simulate annotations or model results

In [92]:
import random
random.seed(42)

random_tasks = []

for task in deepcopy(tasks):
    task['_view_id'] = 'choice'
    task['answer'] = 'accept'
    selected = random.choice(task['options'])
    task['accept'] = [selected['id']]
    task['_session_id'] = 'summarization-dataset-random'
    task['_annotator_id'] = 'summarization-dataset-random'
    task = set_hashes(task)
    random_tasks.append(task)

In [93]:
incorrect = 0
for task in random_tasks :
    if task['accept'][0] != task['correct_answer'] :
        incorrect += 1
incorrect

1789

## Add "Annotated" tasks to the database as if they were manually annotated

In [94]:
dataset_name = 'summarization-dataset-annotated'
if dataset_name in db.datasets :
    db.drop_dataset(dataset_name)
db.add_examples(correct_tasks, [dataset_name])
db.add_examples(random_tasks, [dataset_name])
db.count_dataset(dataset_name)

7158
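With both sets of answers stored under one dataset, agreement can also be estimated outside the review UI. The sketch below pairs answers by task `id` and computes raw agreement and Cohen's kappa for two annotators. It is an illustration under stated assumptions: the helper names are hypothetical, pairing by `id` rather than `_input_hash` is a simplification, and the toy data is made up.

```python
from collections import Counter, defaultdict


def pair_by_input(examples, key="id"):
    """Map each input key to {annotator_id: selected option id}."""
    by_input = defaultdict(dict)
    for eg in examples:
        if eg["answer"] == "accept" and eg["accept"]:
            by_input[eg[key]][eg["_annotator_id"]] = eg["accept"][0]
    return by_input


def agreement(examples, annotator_a, annotator_b):
    """Return (observed agreement, Cohen's kappa) over inputs both annotators answered."""
    pairs = [
        (answers[annotator_a], answers[annotator_b])
        for answers in pair_by_input(examples).values()
        if annotator_a in answers and annotator_b in answers
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("No overlapping annotations")
    observed = sum(a == b for a, b in pairs) / n
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    # Chance agreement from each annotator's label distribution
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa


def make(task_id, annotator, choice):
    """Build a minimal annotated example (toy data)."""
    return {"id": task_id, "answer": "accept", "accept": [choice], "_annotator_id": annotator}


toy = [make(i, "gold", c) for i, c in enumerate(["1", "1", "0", "1"])]
toy += [make(i, "random", c) for i, c in enumerate(["1", "0", "0", "0"])]

observed, kappa = agreement(toy, "gold", "random")
print(f"observed={observed:.3f} kappa={kappa:.3f}")
```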

## Start review session in Prodigy

# NER for text partitioning

# NER for IE

# Multi-step annotations

# Download Datasets

## Summarization Dataset

```python

from pandas import read_parquet
import json

df = read_parquet("https://huggingface.co/datasets/r-three/fib/resolve/refs%2Fconvert%2Fparquet/default/test/0000.parquet")
print(f"Loaded {len(df)} records")


df.list_choices = df.list_choices.apply(lambda x : json.dumps(list(x))) # Save lists as JSON for serialization

file_name = "./data/hf-summarization-dataset.csv"
df.to_csv(file_name, index = False)
print(f"Saved {len(df)} records in {file_name}")

```
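The `json.dumps` step matters because a CSV cell can only hold a string, so a list column has to be serialized on write and parsed again with `json.loads` on read. Here is a minimal round-trip sketch using only the standard library, with made-up rows and an in-memory buffer standing in for the file:

```python
import csv
import io
import json

# Made-up rows; the choices deliberately contain a comma and quotes
rows = [{"id": "1", "list_choices": ["first, with comma", 'second "quoted"']}]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["id", "list_choices"])
writer.writeheader()
for row in rows:
    # Serialize the list to a JSON string so it fits in a single CSV cell
    writer.writerow({"id": row["id"], "list_choices": json.dumps(row["list_choices"])})

buffer.seek(0)
restored = [
    {"id": r["id"], "list_choices": json.loads(r["list_choices"])}
    for r in csv.DictReader(buffer)
]

print(restored[0]["list_choices"])
```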


## News Headlines Dataset

The news headlines dataset is available at:

https://raw.githubusercontent.com/explosion/prodigy-recipes/master/example-datasets/news_headlines.jsonl

# Read Datasets from local copy

In [81]:
from pandas import read_csv
file_name = "./data/hf-summarization-dataset.csv"
df = read_csv(file_name)
print(f"Loaded {len(df)} records from {file_name}")

Loaded 3579 records from ./data/hf-summarization-dataset.csv


In [82]:
df.head()

Unnamed: 0,id,input,correct_choice,list_choices,lbl,distractor_model,dataset
0,32168497,Vehicles and pedestrians will now embark and d...,Passengers using a chain ferry have been warne...,"["" A new service on the Isle of Wight's chain ...",1,bart-base,xsum
1,29610109,If you leave your mobile phone somewhere do yo...,"Do you ever feel lonely, stressed or jealous w...","["" You may be worried about your health, but w...",1,bart-base,xsum
2,38018439,"Speaking on TV, Maria Zakharova said Jews had ...",A spokeswoman on Russian TV has said Jewish pe...,"["" The Russian foreign minister has said she h...",1,bart-base,xsum
3,32790804,"A report by the organisation suggests men, wom...",Egyptian security forces are using sexual viol...,"["" Egyptian police are systematically abusing ...",1,bart-base,xsum
4,36437856,Police in Australia and Europe were aware of a...,One word and a freckle indirectly led to Huckl...,"[""One word and a freckle indirectly led to Huc...",0,bart-base,xsum
