CLI Interface to Various Tasks

Below is a (mostly complete) list of tasks that ExplainaBoard currently supports, along with examples of how to analyze each one. Text classification in particular is a good example to start with.

General notes:

  • Click the link on each task name for more details; when no link exists, you can open the example data to see what the file format looks like.
  • You can either analyze an existing dataset included in DataLab or use your own custom dataset. The directions below describe how to do both in most cases, but using DataLab has some advantages, such as allowing for easy calculation of training-set features and compatibility with ExplainaBoard online leaderboards. You can check the list of datasets supported in DataLab and add your dataset if it doesn't exist.
  • All of the examples below will output a JSON report to standard output, which you can redirect to a file such as report.json for later use (see the example after this list). Also, check out our visualization tools.
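
For instance, taking the first text classification command below, you can save the report and then skim it with any JSON tool; python -m json.tool is used here just as one convenient option:

    # redirect the JSON report to a file for later use
    explainaboard --task text-classification --dataset sst2 --system-outputs ./data/system_outputs/sst2/sst2-lstm-output.txt > report.json
    # pretty-print the beginning of the report
    python -m json.tool report.json | head -n 40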

We welcome contributions of more tasks, or detailed documentation for tasks where the documentation does not yet exist! Please open an issue or file a PR.


Text Classification

Text classification consists of classifying text into different categories, such as sentiment values or topics. The below example performs an analysis on the Stanford Sentiment Treebank, a set of sentiment tags over English reviews.

CLI Examples

The below example loads the sst2 dataset from DataLab:

explainaboard --task text-classification --dataset sst2 --system-outputs ./data/system_outputs/sst2/sst2-lstm-output.txt

The below example loads a dataset from an existing file:

explainaboard --task text-classification --custom-dataset-paths ./data/system_outputs/sst2/sst2-dataset.tsv --system-outputs ./data/system_outputs/sst2/sst2-lstm-output.txt
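
For reference, here is a rough sketch of the two file formats (contents invented for illustration; open the example files above to confirm). The custom dataset TSV pairs each text with its gold label:

    I love this movie	positive
    the film was a complete waste of time	negative

and the system output file has one predicted label per line, in the same order as the dataset:

    positive
    negative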

Text Pair Classification

Classification of pairs of text, such as natural language inference or paraphrase detection. The example below concerns natural language inference, predicting whether a premise entails, contradicts, or is neutral with respect to a hypothesis, on the Stanford Natural Language Inference dataset.

CLI Example

The below example loads the snli dataset from DataLab:

explainaboard --task text-pair-classification --dataset snli --system-outputs ./data/system_outputs/snli/snli-roberta-output.txt

The below example loads a dataset from an existing file:

explainaboard --task text-pair-classification --custom-dataset-paths ./data/system_outputs/snli/snli-dataset.tsv --system-outputs ./data/system_outputs/snli/snli-roberta-output.txt
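
The custom dataset TSV is analogous to the text classification format but with two text columns before the gold label, along the lines of (contents invented for illustration):

    A man is playing a guitar.	A person is playing an instrument.	entailment

with the system output file again holding one predicted label per line.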

Conditional Text Generation

Conditional text generation concerns generation of one text based on other texts, including tasks like summarization and machine translation. The below example evaluates a summarization system on the CNN/Daily Mail dataset.

CLI Example

The below example loads a miniature version of the CNN/Daily Mail dataset (100 lines only) from an existing file:

explainaboard --task summarization --custom-dataset-paths ./data/system_outputs/cnndm/cnndm_mini-dataset.tsv --system-outputs ./data/system_outputs/cnndm/cnndm_mini-bart-output.txt --metrics rouge2 bart_score_en_ref

Note that this uses two different metrics separated by a space.
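
As a rough sketch of the file format (contents invented for illustration; we believe the TSV columns are source document then reference summary, but check cnndm_mini-dataset.tsv to confirm), the custom dataset looks like:

    Police responded to reports of ...	Police were called to the scene on Tuesday.

and the system output file has one generated summary per line, in the same order as the dataset.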

You could also load the cnn_dailymail dataset from DataLab. Because the test set is large, we don't include it directly in the ExplainaBoard repository, but you can get an example system output by downloading it with wget:

wget -P ./data/system_outputs/cnndm/ https://storage.googleapis.com/inspired-public-data/explainaboard/task_data/summarization/cnndm-bart-output.txt

Then run the below command and it should work:

explainaboard --task summarization --dataset cnn_dailymail --system-outputs ./data/system_outputs/cnndm/cnndm-bart-output.txt --metrics rouge2

Language Modeling

Language modeling is the task of predicting the probability of words in a text. You can analyze your language model outputs by inputting a file that has one log probability for each space-separated word. Here is an example:

CLI Example

The below example analyzes the wikitext corpus:

explainaboard --task language-modeling --custom-dataset-paths ./data/system_outputs/wikitext/wikitext-dataset.txt --system-outputs ./data/system_outputs/wikitext-sys1-output.txt
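
To illustrate the expected format (numbers invented), if a line of the dataset file is:

    the cat sat on the mat

then the corresponding line of the system output file holds one log probability per word:

    -2.31 -4.02 -5.77 -1.12 -0.85 -3.40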

Named Entity Recognition

Named entity recognition recognizes entities such as people, organizations, or locations in text. The below examples demonstrate how you can perform such analysis on the CoNLL 2003 English named entity recognition dataset.

CLI Example

The below example loads the conll2003 NER dataset from DataLab:

explainaboard --task named-entity-recognition --dataset conll2003 --sub-dataset ner --system-outputs ./data/system_outputs/conll2003/conll2003-elmo-output.conll

Alternatively, you can reference a dataset file directly:

explainaboard --task named-entity-recognition --custom-dataset-paths ./data/system_outputs/conll2003/conll2003-dataset.conll --system-outputs ./data/system_outputs/conll2003/conll2003-elmo-output.conll
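
The .conll files follow the usual CoNLL column layout, with one token per line followed by its BIO tag and a blank line between sentences; a minimal sketch with invented contents (check conll2003-dataset.conll for the exact columns):

    John	B-PER
    lives	O
    in	O
    London	B-LOC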

Word Segmentation

Word segmentation aims to split text written without spaces between words (such as Chinese) into words.

CLI Example

The below example loads the msr dataset from DataLab:

explainaboard --task word-segmentation --dataset msr --system-outputs ./data/system_outputs/cws/test-msr-predictions.tsv

Note that the file test-msr-predictions.tsv can be downloaded here.

Alternatively, you can reference a dataset file directly:

explainaboard --task word-segmentation --custom-dataset-paths ./data/system_outputs/cws/test.tsv --system-outputs ./data/system_outputs/cws/prediction.tsv

Chunking

Chunking divides text into syntactically related, non-overlapping groups of words.

CLI Example

The below example loads the conll00_chunk dataset from DataLab:

explainaboard --task chunking --dataset conll00_chunk --system-outputs ./data/system_outputs/chunking/test-conll00-predictions.tsv

Alternatively, you can reference a dataset file directly:

explainaboard --task chunking --custom-dataset-paths ./data/system_outputs/chunking/dataset-test-conll00.tsv --system-outputs ./data/system_outputs/chunking/test-conll00-predictions.tsv

Extractive QA

Extractive QA attempts to answer queries by extracting segments from an evidence passage. The below example performs this analysis on the SQuAD dataset.

CLI Example

Below is an example of referencing the dataset directly:

explainaboard --task qa-extractive --custom-dataset-paths ./data/system_outputs/squad/squad_mini-dataset.json --system-outputs ./data/system_outputs/squad/squad_mini-example-output.json > report.json
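
Both JSON files are lists of examples. Roughly, each dataset entry carries a context, a question, and the gold answer, while each output entry carries the predicted answer; the field names below are purely illustrative, so consult squad_mini-dataset.json and squad_mini-example-output.json for the actual schema:

    [
      {"context": "...", "question": "...", "answers": {"text": "...", "answer_start": 17}}
    ]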

The below example loads the squad dataset from DataLab. There is an open issue that prevents the specification of a dataset split, so this will not work at the moment, but we are working on it:

explainaboard --task qa-extractive --dataset squad --system-outputs MY_FILE > report.json

Table-Text Hybrid QA

This task aims to answer a question based on a hybrid of tabular and textual context; see, e.g., Zhu et al. (2021).

CLI Example

The below example loads the tat_qa dataset from DataLab:

explainaboard --task qa-tat --output-file-type json --dataset tat_qa --system-outputs predictions_list.json > report.json

You can download the file predictions_list.json with:

wget -P ./ https://explainaboard.s3.amazonaws.com/system_outputs/qa_table_text_hybrid/predictions_list.json

Open-domain QA

Open-domain QA aims to answer a question posed in natural language based on large-scale unstructured documents.

The following examples show how an open-domain QA system can be evaluated with detailed analyses using the ExplainaBoard CLI.

CLI Example

Using built-in datasets from DataLab:

explainaboard --task qa-open-domain --dataset natural_questions_comp_gen --system-outputs ./data/system_outputs/qa_open_domain/test.dpr.nq.txt > report.json

Multiple-choice QA

Answer a question from multiple options. The following example demonstrates this on the metaphor QA dataset.

CLI Example

The below example loads the fig_qa dataset from DataLab:

explainaboard --task qa-multiple-choice --dataset fig_qa --system-outputs ./data/system_outputs/fig_qa/fig_qa-gptneo-output.json > report.json

And this is what it looks like with a custom dataset:

explainaboard --task qa-multiple-choice --custom-dataset-paths ./data/system_outputs/fig_qa/fig_qa-dataset.json --system-outputs ./data/system_outputs/fig_qa/fig_qa-gptneo-output.json > report.json
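
Roughly, each entry of the dataset JSON pairs a context with a list of options and the gold answer, and each entry of the output JSON holds the predicted option; the field names below are purely illustrative, so check fig_qa-dataset.json and fig_qa-gptneo-output.json for the real schema:

    [
      {"context": "...", "options": ["...", "..."], "answers": {"option_index": 0}}
    ]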

KG Link Tail Prediction

Predicting the tail entity of missing links in knowledge graphs.

CLI Example

The below example loads the fb15k_237 dataset from DataLab:

    wget https://datalab-hub.s3.amazonaws.com/predictions/test_distmult.json
    explainaboard --task kg-link-tail-prediction --dataset fb15k_237 --sub-dataset origin --system-outputs test_distmult.json > log.res

And this is what it looks like with a custom dataset:

    explainaboard --task kg-link-tail-prediction --custom-dataset-paths ./data/system_outputs/fb15k-237/data_mini.json --system-outputs ./data/system_outputs/fb15k-237/test-kg-prediction-no-user-defined-new.json > report.json

Aspect-based Sentiment Classification

Predict the sentiment of a text based on a specific aspect.

CLI Example

This is an example with a custom dataset:

explainaboard --task aspect-based-sentiment-classification --custom-dataset-paths ./data/system_outputs/absa/absa-dataset.txt --system-outputs ./data/system_outputs/absa/absa-example-output.tsv > report.json

Multiple-choice Cloze

Fill in a blank based on multiple provided options.

CLI Example

This is an example using the dataset from DataLab:

explainaboard --task cloze-multiple-choice --dataset gaokao2018_np1 --sub-dataset cloze-multiple-choice --metrics CorrectScore --system-outputs ./integration_tests/artifacts/gaokao/rst_2018_quanguojuan1_cloze_choice.json > report.json

Generative Cloze

Fill in a blank based on a hint.

CLI Example

This is an example using the dataset from DataLab:

explainaboard --task cloze-generative --dataset gaokao2018_np1 --sub-dataset cloze-hint --metrics CorrectScore --system-outputs ./integration_tests/artifacts/gaokao/rst_2018_quanguojuan1_cloze_hint.json > report.json

Grammatical Error Correction

Correct errors in a text.

CLI Example

This is an example using the dataset from DataLab:

explainaboard --task grammatical-error-correction --dataset gaokao2018_np1 --sub-dataset writing-grammar --metrics SeqCorrectScore --system-outputs ./integration_tests/artifacts/gaokao/rst_2018_quanguojuan1_gec.json > report.json

Tabular Classification

Classification over tabular data takes in a set of features and predicts a class for the outputs. The example below is over the sst2 dataset used in text classification, but after the text has been vectorized into bag-of-words features. By default the only feature that ExplainaBoard analyzes is the label, so you may want to specify other features to perform bucketing over, using the metadata entry in the dataset JSON file as is done in sst2-tabclass-dataset.json below.
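
As a purely illustrative sketch of the idea (the key names here are invented, so treat sst2-tabclass-dataset.json in the repository as the authoritative example), the dataset JSON declares the extra features in a metadata block alongside the examples themselves:

    {
      "metadata": {"custom_features": ["num_words", "contains_negation"]},
      "examples": [...]
    }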

CLI Examples

The below example loads a dataset from an existing file:

explainaboard --task tabular-classification --custom-dataset-paths ./data/system_outputs/sst2_tabclass/sst2-tabclass-dataset.json --system-outputs ./data/system_outputs/sst2/sst2-lstm-output.txt

Tabular Regression

Regression over tabular data is essentially the same as tabular classification above, but the predicted outputs are continuous numbers instead of classes.

CLI Examples

The below example loads a dataset from an existing file:

explainaboard --task tabular-regression --custom-dataset-paths ./data/system_outputs/sst2_tabreg/sst2-tabclass-dataset.json --system-outputs ./data/system_outputs/sst2_tabreg/sst2-tabreg-lstm-output.txt

Argument Pair Extraction

This task aims to detect argument pairs from each passage pair of review and rebuttal.

CLI Examples

The below example loads the ape dataset from DataLab:

explainaboard --task argument-pair-extraction --dataset ape --system-outputs ./data/system_outputs/ape/ape_predictions.txt

Argument Pair Identification

Given an argument, the task aims to identify one matched argument from a list of arguments.

CLI Examples

The example below loads the iapi dataset from DataLab:

explainaboard --task argument-pair-identification --dataset iapi --system-outputs data/system_outputs/iapi/predictions.txt > report.json

Meta Evaluation NLG

Evaluating the reliability of automated metrics for general text generation tasks, such as text summarization.

CLI Examples

The below example loads the meval_summeval dataset from DataLab:

explainaboard --task meta-evaluation-nlg --dataset meval_summeval --sub-dataset coherence --system-outputs ./data/system_outputs/summeval/sumeval_bart.json > report.json