Commit

add dataset card title (#2381)
* add dataset card title

* YAML tags for multi_nli_mismatch

* extra info added in s2orc and multi_nli

* minor change
bhavitvyamalik committed May 20, 2021
1 parent f3dc890 commit e8abb4a
Showing 8 changed files with 42 additions and 11 deletions.
2 changes: 1 addition & 1 deletion datasets/circa/README.md
@@ -20,7 +20,7 @@ task_ids:
- text-classification-other-question-answer-pair-classification
---

-# Dataset Card Creation Guide
+# Dataset Card for CIRCA

## Table of Contents
- [Dataset Description](#dataset-description)
12 changes: 9 additions & 3 deletions datasets/multi_nli/README.md
@@ -23,7 +23,7 @@ task_ids:
- semantic-similarity-scoring
---

# Dataset Card for "multi_nli"
# Dataset Card for Multi-Genre Natural Language Inference (MultiNLI)

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -127,17 +127,23 @@ They constructed MultiNLI so as to make it possible to explicitly evaluate model

### Source Data

#### Initial Data Collection and Normalization

They created each sentence pair by selecting a premise sentence from a preexisting text source and asked a human annotator to compose a novel sentence to pair with it as a hypothesis.

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+[More Information Needed]

#### Who are the annotators?

-[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
+[More Information Needed]

### Personal and Sensitive Information

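The text added to this card describes how each example pairs a found premise with an annotator-written hypothesis. For context, a minimal sketch of inspecting one such pair, assuming the `datasets` library is installed; the split and field names follow the `multi_nli` loader:

```python
from datasets import load_dataset

# The matched validation split is small (~10k rows); "train" has ~393k.
mnli = load_dataset("multi_nli", split="validation_matched")

example = mnli[0]
print(example["genre"])       # genre of the source the premise was drawn from
print(example["premise"])     # sentence selected from a preexisting text source
print(example["hypothesis"])  # novel sentence composed by a human annotator
print(example["label"])       # 0 = entailment, 1 = neutral, 2 = contradiction
```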
24 changes: 23 additions & 1 deletion datasets/multi_nli_mismatch/README.md
@@ -1,7 +1,29 @@
---
+annotations_creators:
+- crowdsourced
+language_creators:
+- crowdsourced
+- found
+languages:
+- en
+licenses:
+- cc-by-3.0
+- cc-by-sa-3.0-at
+- mit
+- other-Open Portion of the American National Corpus
+multilinguality:
+- monolingual
+size_categories:
+- 100K<n<1M
+source_datasets:
+- original
+task_categories:
+- text-scoring
+task_ids:
+- semantic-similarity-scoring
---

# Dataset Card for "multi_nli_mismatch"
# Dataset Card for Multi-Genre Natural Language Inference (Mismatched only)

## Table of Contents
- [Dataset Description](#dataset-description)
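The YAML block added above sits between `---` markers at the top of the README and carries the card's machine-readable tags. A minimal sketch of reading those tags back out, assuming only that front-matter layout (the helper name and path are illustrative):

```python
import yaml  # requires pyyaml


def read_card_tags(readme_path: str) -> dict:
    """Parse the YAML front matter at the top of a dataset card."""
    with open(readme_path, encoding="utf-8") as f:
        text = f.read()
    # The tags sit between the first two '---' lines of the file.
    _, front_matter, _ = text.split("---", 2)
    return yaml.safe_load(front_matter)


tags = read_card_tags("datasets/multi_nli_mismatch/README.md")
print(tags["languages"])        # ['en']
print(tags["size_categories"])  # ['100K<n<1M']
```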
2 changes: 1 addition & 1 deletion datasets/para_pat/README.md
@@ -34,7 +34,7 @@ task_ids:
- language-modeling
---

-# Dataset Card Creation Guide
+# Dataset Card for ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/paws-x/README.md
@@ -30,7 +30,7 @@ task_ids:
- text-scoring-other-paraphrase-identification
---

-# Dataset Card Creation Guide
+# Dataset Card for PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/paws/README.md
@@ -32,7 +32,7 @@ task_ids:
- text-scoring-other-paraphrase-identification
---

-# Dataset Card Creation Guide
+# Dataset Card for PAWS: Paraphrase Adversaries from Word Scrambling

## Table of Contents
- [Dataset Description](#dataset-description)
2 changes: 1 addition & 1 deletion datasets/re_dial/README.md
@@ -21,7 +21,7 @@ task_ids:
- text-classification-other-dialogue-sentiment-classification
---

-# Dataset Card Creation Guide
+# Dataset Card for ReDial (Recommendation Dialogues)

## Table of Contents
- [Dataset Description](#dataset-description)
7 changes: 5 additions & 2 deletions datasets/s2orc/README.md
@@ -24,7 +24,7 @@ task_ids:
- other-other-citation-recommendation
---

-# Dataset Card Creation Guide
+# Dataset Card for S2ORC: The Semantic Scholar Open Research Corpus

## Table of Contents
- [Dataset Description](#dataset-description)
@@ -242,4 +242,7 @@ Semantic Scholar Open Research Corpus is licensed under ODC-BY.
archivePrefix={arXiv},
primaryClass={cs.CL}
}
-```
+```
+### Contributions
+
+Thanks to [@bhavitvyamalik](https://github.com/bhavitvyamalik) for adding this dataset.

1 comment on commit e8abb4a

@github-actions

PyArrow==1.0.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.022793 / 0.011353 (0.011441) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016088 / 0.011008 (0.005080) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.046370 / 0.038508 (0.007862) |
| read_batch_unformated after write_array2d | 0.035086 / 0.023109 (0.011977) |
| read_batch_unformated after write_flattened_sequence | 0.342783 / 0.275898 (0.066885) |
| read_batch_unformated after write_nested_sequence | 0.394128 / 0.323480 (0.070648) |
| read_col_formatted_as_numpy after write_array2d | 0.011050 / 0.007986 (0.003064) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005356 / 0.004328 (0.001027) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010677 / 0.004250 (0.006427) |
| read_col_unformated after write_array2d | 0.046402 / 0.037052 (0.009349) |
| read_col_unformated after write_flattened_sequence | 0.348490 / 0.258489 (0.090001) |
| read_col_unformated after write_nested_sequence | 0.396384 / 0.293841 (0.102543) |
| read_formatted_as_numpy after write_array2d | 0.166216 / 0.128546 (0.037670) |
| read_formatted_as_numpy after write_flattened_sequence | 0.127299 / 0.075646 (0.051653) |
| read_formatted_as_numpy after write_nested_sequence | 0.411002 / 0.419271 (-0.008270) |
| read_unformated after write_array2d | 0.400628 / 0.043533 (0.357096) |
| read_unformated after write_flattened_sequence | 0.385507 / 0.255139 (0.130368) |
| read_unformated after write_nested_sequence | 0.404449 / 0.283200 (0.121250) |
| write_array2d | 1.591353 / 0.141683 (1.449671) |
| write_flattened_sequence | 1.775193 / 1.452155 (0.323038) |
| write_nested_sequence | 1.884147 / 1.492716 (0.391431) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
| --- | --- |
| get_batch_of_1024_random_rows | 0.008439 / 0.018006 (-0.009567) |
| get_batch_of_1024_rows | 0.471459 / 0.000490 (0.470969) |
| get_first_row | 0.002282 / 0.000200 (0.002082) |
| get_last_row | 0.000071 / 0.000054 (0.000016) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.042347 / 0.037411 (0.004936) |
| shard | 0.025441 / 0.014526 (0.010915) |
| shuffle | 0.027082 / 0.176557 (-0.149475) |
| sort | 0.043413 / 0.737135 (-0.693723) |
| train_test_split | 0.027741 / 0.296338 (-0.268598) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.490720 / 0.215209 (0.275511) |
| read 50000 | 4.813860 / 2.077655 (2.736206) |
| read_batch 50000 10 | 2.483999 / 1.504120 (0.979879) |
| read_batch 50000 100 | 1.909035 / 1.541195 (0.367840) |
| read_batch 50000 1000 | 1.869181 / 1.468490 (0.400691) |
| read_formatted numpy 5000 | 7.102323 / 4.584777 (2.517546) |
| read_formatted pandas 5000 | 6.747079 / 3.745712 (3.001367) |
| read_formatted tensorflow 5000 | 8.766189 / 5.269862 (3.496327) |
| read_formatted torch 5000 | 7.764401 / 4.565676 (3.198724) |
| read_formatted_batch numpy 5000 10 | 0.734412 / 0.424275 (0.310137) |
| read_formatted_batch numpy 5000 1000 | 0.010719 / 0.007607 (0.003112) |
| shuffled read 5000 | 0.683428 / 0.226044 (0.457383) |
| shuffled read 50000 | 6.585449 / 2.268929 (4.316521) |
| shuffled read_batch 50000 10 | 3.212886 / 55.444624 (-52.231738) |
| shuffled read_batch 50000 100 | 2.283878 / 6.876477 (-4.592598) |
| shuffled read_batch 50000 1000 | 2.362790 / 2.142072 (0.220717) |
| shuffled read_formatted numpy 5000 | 7.395375 / 4.805227 (2.590147) |
| shuffled read_formatted_batch numpy 5000 10 | 4.862597 / 6.500664 (-1.638067) |
| shuffled read_formatted_batch numpy 5000 1000 | 9.021729 / 0.075469 (8.946260) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 10.973504 / 1.841788 (9.131716) |
| map fast-tokenizer batched | 13.040236 / 8.074308 (4.965928) |
| map identity | 40.602557 / 10.191392 (30.411165) |
| map identity batched | 1.009016 / 0.680424 (0.328592) |
| map no-op batched | 0.685848 / 0.534201 (0.151647) |
| map no-op batched numpy | 0.792323 / 0.579283 (0.213040) |
| map no-op batched pandas | 0.640699 / 0.434364 (0.206335) |
| map no-op batched pytorch | 0.714434 / 0.540337 (0.174097) |
| map no-op batched tensorflow | 1.529983 / 1.386936 (0.143047) |
PyArrow==latest

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.021895 / 0.011353 (0.010542) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016083 / 0.011008 (0.005074) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.048996 / 0.038508 (0.010488) |
| read_batch_unformated after write_array2d | 0.034880 / 0.023109 (0.011771) |
| read_batch_unformated after write_flattened_sequence | 0.317014 / 0.275898 (0.041116) |
| read_batch_unformated after write_nested_sequence | 0.345712 / 0.323480 (0.022232) |
| read_col_formatted_as_numpy after write_array2d | 0.011084 / 0.007986 (0.003098) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005568 / 0.004328 (0.001239) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010881 / 0.004250 (0.006630) |
| read_col_unformated after write_array2d | 0.051393 / 0.037052 (0.014341) |
| read_col_unformated after write_flattened_sequence | 0.312082 / 0.258489 (0.053593) |
| read_col_unformated after write_nested_sequence | 0.348911 / 0.293841 (0.055070) |
| read_formatted_as_numpy after write_array2d | 0.160419 / 0.128546 (0.031873) |
| read_formatted_as_numpy after write_flattened_sequence | 0.134119 / 0.075646 (0.058472) |
| read_formatted_as_numpy after write_nested_sequence | 0.414192 / 0.419271 (-0.005080) |
| read_unformated after write_array2d | 0.412109 / 0.043533 (0.368577) |
| read_unformated after write_flattened_sequence | 0.355149 / 0.255139 (0.100010) |
| read_unformated after write_nested_sequence | 0.338320 / 0.283200 (0.055121) |
| write_array2d | 1.573061 / 0.141683 (1.431378) |
| write_flattened_sequence | 1.716055 / 1.452155 (0.263900) |
| write_nested_sequence | 1.771535 / 1.492716 (0.278819) |

Benchmark: benchmark_getitem_100B.json

| metric | new / old (diff) |
| --- | --- |
| get_batch_of_1024_random_rows | 0.009840 / 0.018006 (-0.008166) |
| get_batch_of_1024_rows | 0.505619 / 0.000490 (0.505129) |
| get_first_row | 0.002918 / 0.000200 (0.002718) |
| get_last_row | 0.000084 / 0.000054 (0.000029) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.043922 / 0.037411 (0.006510) |
| shard | 0.028487 / 0.014526 (0.013961) |
| shuffle | 0.031224 / 0.176557 (-0.145332) |
| sort | 0.054971 / 0.737135 (-0.682165) |
| train_test_split | 0.033013 / 0.296338 (-0.263326) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.450873 / 0.215209 (0.235664) |
| read 50000 | 4.474106 / 2.077655 (2.396451) |
| read_batch 50000 10 | 2.020914 / 1.504120 (0.516794) |
| read_batch 50000 100 | 1.708487 / 1.541195 (0.167292) |
| read_batch 50000 1000 | 1.700633 / 1.468490 (0.232143) |
| read_formatted numpy 5000 | 6.835161 / 4.584777 (2.250384) |
| read_formatted pandas 5000 | 6.140094 / 3.745712 (2.394382) |
| read_formatted tensorflow 5000 | 8.569432 / 5.269862 (3.299570) |
| read_formatted torch 5000 | 7.635942 / 4.565676 (3.070266) |
| read_formatted_batch numpy 5000 10 | 0.713023 / 0.424275 (0.288748) |
| read_formatted_batch numpy 5000 1000 | 0.010164 / 0.007607 (0.002557) |
| shuffled read 5000 | 0.637439 / 0.226044 (0.411395) |
| shuffled read 50000 | 6.203184 / 2.268929 (3.934256) |
| shuffled read_batch 50000 10 | 2.676527 / 55.444624 (-52.768097) |
| shuffled read_batch 50000 100 | 2.061355 / 6.876477 (-4.815122) |
| shuffled read_batch 50000 1000 | 2.359748 / 2.142072 (0.217676) |
| shuffled read_formatted numpy 5000 | 7.174909 / 4.805227 (2.369682) |
| shuffled read_formatted_batch numpy 5000 10 | 4.630621 / 6.500664 (-1.870043) |
| shuffled read_formatted_batch numpy 5000 1000 | 5.770101 / 0.075469 (5.694632) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 10.634990 / 1.841788 (8.793202) |
| map fast-tokenizer batched | 13.028324 / 8.074308 (4.954016) |
| map identity | 41.498924 / 10.191392 (31.307532) |
| map identity batched | 0.861674 / 0.680424 (0.181250) |
| map no-op batched | 0.595123 / 0.534201 (0.060922) |
| map no-op batched numpy | 0.780564 / 0.579283 (0.201281) |
| map no-op batched pandas | 0.621599 / 0.434364 (0.187235) |
| map no-op batched pytorch | 0.702952 / 0.540337 (0.162615) |
| map no-op batched tensorflow | 1.579572 / 1.386936 (0.192636) |
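Each cell above reads `new / old (diff)`: the timing for this commit, the reference timing, and their difference in seconds. A minimal sketch of how one such report row could be rendered, assuming each benchmark JSON simply maps metric names to timings in seconds (the file layout and reference-file name are assumptions, not the actual benchmark tooling):

```python
import json


def report(new_path: str, old_path: str) -> None:
    """Print one 'metric: new / old (diff)' line per shared metric."""
    with open(new_path) as f:
        new = json.load(f)
    with open(old_path) as f:
        old = json.load(f)
    for metric in sorted(set(new) & set(old)):
        diff = new[metric] - old[metric]
        print(f"{metric}: {new[metric]:.6f} / {old[metric]:.6f} ({diff:.6f})")


# Hypothetical file names for the current and reference runs.
report("benchmark_array_xd.json", "benchmark_array_xd_reference.json")
```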
