add dataset card title #2381

Merged: 4 commits, May 20, 2021
2 changes: 1 addition & 1 deletion datasets/circa/README.md
@@ -20,7 +20,7 @@ task_ids:
- text-classification-other-question-answer-pair-classification
---

-# Dataset Card Creation Guide
+# Dataset Card for CIRCA

## Table of Contents
- [Dataset Description](#dataset-description)
12 changes: 9 additions & 3 deletions datasets/multi_nli/README.md
@@ -23,7 +23,7 @@ task_ids:
- semantic-similarity-scoring
---

-# Dataset Card for "multi_nli"
+# Dataset Card for Multi-Genre Natural Language Inference (MultiNLI)

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -127,17 +127,23 @@ They constructed MultiNLI so as to make it possible to explicitly evaluate model

### Source Data

#### Initial Data Collection and Normalization

They created each sentence pair by selecting a premise sentence from a preexisting text source and asked a human annotator to compose a novel sentence to pair with it as a hypothesis.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+[More Information Needed]

#### Who are the annotators?

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+[More Information Needed]

### Personal and Sensitive Information

24 changes: 23 additions & 1 deletion datasets/multi_nli_mismatch/README.md
@@ -1,7 +1,29 @@
---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
+- found
+languages:
+- en
+licenses:
+- cc-by-3.0
+- cc-by-sa-3.0-at
+- mit
+- other-Open Portion of the American National Corpus
+multilinguality:
+- monolingual
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- text-scoring
+task_ids:
+- semantic-similarity-scoring
---

-# Dataset Card for "multi_nli_mismatch"
+# Dataset Card for Multi-Genre Natural Language Inference (Mismatched only)

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/para_pat/README.md
@@ -34,7 +34,7 @@ task_ids:
- language-modeling
---

-# Dataset Card Creation Guide
+# Dataset Card for ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/paws-x/README.md
@@ -30,7 +30,7 @@ task_ids:
- text-scoring-other-paraphrase-identification
---

-# Dataset Card Creation Guide
+# Dataset Card for PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/paws/README.md
@@ -32,7 +32,7 @@ task_ids:
- text-scoring-other-paraphrase-identification
---

-# Dataset Card Creation Guide
+# Dataset Card for PAWS: Paraphrase Adversaries from Word Scrambling

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/re_dial/README.md
@@ -21,7 +21,7 @@ task_ids:
- text-classification-other-dialogue-sentiment-classification
---

-# Dataset Card Creation Guide
+# Dataset Card for ReDial (Recommendation Dialogues)

## Table of Contents
- [Dataset Description](#dataset-description)
7 changes: 5 additions & 2 deletions datasets/s2orc/README.md
@@ -24,7 +24,7 @@ task_ids:
- other-other-citation-recommendation
---

-# Dataset Card Creation Guide
+# Dataset Card for S2ORC: The Semantic Scholar Open Research Corpus

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -242,4 +242,7 @@ Semantic Scholar Open Research Corpus is licensed under ODC-BY.
archivePrefix={arXiv},
primaryClass={cs.CL}
}
-```
+```
+### Contributions
+
+Thanks to [@bhavitvyamalik](https://github.com/bhavitvyamalik) for adding this dataset.
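
Most of the changes in this PR replace a leftover template heading (`# Dataset Card Creation Guide`) or a quoted dataset id (`# Dataset Card for "multi_nli"`) with a descriptive title. A minimal sketch of how remaining cards with such titles might be found is below. It is not part of this PR: the helper names, the `datasets/*/README.md` layout, and the rule that a quoted id counts as needing a fix are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumption: the only template title we look for is the one replaced in this PR.
PLACEHOLDER_TITLES = {"Dataset Card Creation Guide"}
# Assumption: a title that is just a quoted dataset id, e.g. Dataset Card for "multi_nli".
QUOTED_ID_TITLE = re.compile(r'^Dataset Card for "[^"]+"$')


def first_h1(markdown_text):
    """Return the first level-1 heading after any YAML front matter, or None."""
    lines = markdown_text.splitlines()
    start = 0
    if lines and lines[0].strip() == "---":
        # Skip the front matter block delimited by '---' lines.
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                start = i + 1
                break
    for line in lines[start:]:
        if line.startswith("# "):
            return line[2:].strip()
    return None


def needs_title_fix(markdown_text):
    """True if the card has no title, a template title, or a quoted-id title."""
    title = first_h1(markdown_text)
    if title is None:
        return True
    return title in PLACEHOLDER_TITLES or bool(QUOTED_ID_TITLE.match(title))


def find_cards_to_fix(root):
    """List README.md paths one level under `root` whose title needs attention."""
    return sorted(
        str(path)
        for path in Path(root).glob("*/README.md")
        if needs_title_fix(path.read_text(encoding="utf-8"))
    )
```

Calling `find_cards_to_fix("datasets")` from a repository checkout would list candidate cards; each result still needs a human to choose the proper title.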