Tracking Progress in Natural Language Processing

Table of contents

  • English
  • Korean
  • Hindi
  • Vietnamese

This document aims to track the progress in Natural Language Processing (NLP) and give an overview of the state-of-the-art (SOTA) across the most common NLP tasks and their corresponding datasets.

It aims to cover both traditional and core NLP tasks, such as dependency parsing and part-of-speech tagging, as well as more recent ones, such as reading comprehension and natural language inference. The main objective is to provide the reader with a quick overview of benchmark datasets and the state-of-the-art for their task of interest, which serves as a stepping stone for further research. To this end, if there is a place where results for a task are already published and regularly maintained, such as a public leaderboard, the reader will be pointed there.

If you want to find this document again in the future, just go to nlpprogress.com or nlpsota.com in your browser.

Wish list

These are tasks and datasets that are still missing.

  • Bilingual dictionary induction
  • Discourse parsing
  • Keyphrase extraction
  • Knowledge base population (KBP)
  • More dialogue tasks
  • Semi-supervised learning

Contributing

If you would like to add a new result, you can do so with a pull request (PR). In order to minimize noise and to make maintenance somewhat manageable, results reported in published papers will be preferred (indicate the venue of publication in your PR); an exception may be made for influential preprints. The result should include the name of the method, the citation, the score, and a link to the paper, and it should be inserted so that the table remains sorted (with the best result on top).

If your pull request contains a new result, please make sure that "new result" appears somewhere in the title of the PR. This way, we can track which tasks are the most active and receive the most attention.

In order to make reproduction easier, we recommend adding a link to an implementation for each method if one is available. You can add a Code column (see below) to the table if it does not exist yet. In the Code column, indicate an official implementation with Official. If an unofficial implementation is available, use Link (see below). If no implementation is available, you can leave the cell empty.

| Model | Score | Paper / Source | Code     |
| ----- | ----- | -------------- | -------- |
|       |       |                | Official |
|       |       |                | Link     |
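
For illustration, a filled-in row might look like the following; the model name, score, and URLs are placeholders, not real results:

| Model | Score | Paper / Source | Code |
| ----- | ----- | -------------- | ---- |
| ExampleNet (Doe et al., 2018) | 90.1 | [An Example Paper](https://example.com/paper) | [Official](https://example.com/code) |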

To add a new dataset or task, follow the steps below. Any new dataset should have been used for evaluation in at least one published paper besides the one that introduced it.

  1. Fork the repository.
  2. If your task is completely new, create a new file and link to it in the table of contents above. If not, add your task or dataset to the respective section of the corresponding file (in alphabetical order).
  3. Briefly describe the dataset/task and include relevant references.
  4. Describe the evaluation setting and evaluation metric.
  5. Show what an annotated example of the dataset/task looks like.
  6. Add a download link if available.
  7. Copy the table below and fill in at least two results (including the state-of-the-art) for your dataset/task (change Score to the metric of your dataset).
  8. Submit your change as a pull request.
| Model | Score | Paper / Source | Code |
| ----- | ----- | -------------- | ---- |

Important note: We are currently transitioning from storing results in tables (as above) to using YAML files for their greater flexibility. This will allow us to highlight additional attributes and have interesting visualizations of results down the line.

If the results for your task are already stored in a YAML file, you can simply extend the YAML file using the same fields as the existing entries; a rough sketch of such an entry is shown below.
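
As a minimal sketch, an entry might look roughly like this; the field names here are only illustrative, so copy the exact fields used by the existing entries in the YAML file you are editing:

```yaml
# Hypothetical entry; mirror the fields of the existing entries in the file.
- model: ExampleNet
  score: 90.1
  paper: An Example Paper
  paper_url: https://example.com/paper
  code_url: https://example.com/code
```

To check that the resulting table looks as expected, you can build the site locally using Jekyll by following these steps: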

  1. Check whether you have Ruby 2.1.0 or higher installed with ruby --version; otherwise, install it. On OS X, for instance, this can be done with brew install ruby. Make sure you also have ruby-dev and zlib1g-dev installed.
  2. Install Bundler with gem install bundler. If you run into issues installing Bundler on OS X, have a look here for troubleshooting tips. Also try refreshing the terminal.
  3. Clone the repo locally: git clone https://github.com/sebastianruder/NLP-progress
  4. Navigate to the repo with cd NLP-progress
  5. Install Jekyll: bundle install
  6. Run the Jekyll site locally: bundle exec jekyll serve
  7. You can now preview the local Jekyll site in your browser at http://localhost:4000.
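
Taken together, and assuming Ruby is already set up, the commands above boil down to:

```bash
# One-off setup (assumes Ruby 2.1.0+, ruby-dev, and zlib1g-dev are installed)
gem install bundler
git clone https://github.com/sebastianruder/NLP-progress
cd NLP-progress
bundle install

# Serve the site locally, then open http://localhost:4000 in your browser
bundle exec jekyll serve
```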

Things to do

  • Add a column for code (see above) to each table and a link to the source code for each method.
  • Add pointers on how to retrieve data.
  • Provide more details regarding the evaluation setup of each task.
  • Add an example to every task/dataset.
  • Add statistics to every dataset.
  • Provide a description and details for every task/dataset.
  • Add a table of contents to every file (particularly the large ones).
  • We could potentially use readthedocs to provide a clearer structure.
  • All current datasets in this list are for the English language (except for UD). In a separate section, we could add datasets for other languages.