GoToScorer

Code for the paper:

Takumi Gotou, Ryo Nagata, Masato Mita and Kazuaki Hanawa. “Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation.” In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020).

@inproceedings{gotou-etal-2020-taking,
    title = "Taking the Correction Difficulty into Account in Grammatical Error Correction Evaluation",
    author = "Gotou, Takumi  and
      Nagata, Ryo  and
      Mita, Masato  and
      Hanawa, Kazuaki",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2020.coling-main.188",
    doi = "10.18653/v1/2020.coling-main.188",
    pages = "2085--2095",
}

GoToScorer (GTS) evaluates the performance of grammatical error correction (GEC) systems while taking the difficulty of each correction into account.

It has been confirmed to work with Python 3.8.0.

Usage

python gotoscorer.py -ref <ref_file> -hyp <hyp_file>

-ref <ref_file> specifies the reference M2 file and -hyp <hyp_file> specifies the hypothesis M2 file. Both files can be generated with ERRANT. See demo/ref.m2 and demo/hyp.m2 for examples.
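For readers unfamiliar with the M2 format, the following is a minimal sketch of how such a file can be read. It is not the parser used in gotoscorer.py; `parse_m2` and `SAMPLE_M2` are hypothetical names, and the sample text is hand-written for illustration rather than copied from demo/ref.m2. It assumes the usual M2 layout: an `S` line with the tokenized source sentence, followed by `A` lines whose fields are separated by `|||` and end with an annotator id. In a hypothesis file built from several system outputs, that id presumably identifies the system.

```python
from collections import defaultdict

# Hand-written sample in M2 format (illustrative; not copied from demo/ref.m2).
SAMPLE_M2 = """\
S We discussing about its .
A 1 1|||M:VERB|||have been|||REQUIRED|||-NONE-|||0
A 2 3|||U:PREP||||||REQUIRED|||-NONE-|||0
A 3 4|||R:PRON|||it|||REQUIRED|||-NONE-|||0

S I like it .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""


def parse_m2(text):
    """Return one (source_tokens, {annotator_id: [edits]}) pair per sentence."""
    sentences = []
    # Sentence blocks are separated by blank lines.
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if not lines or not lines[0].startswith("S "):
            continue
        tokens = lines[0][2:].split()
        edits = defaultdict(list)
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            span, etype, correction, _req, _none, annotator = line[2:].split("|||")
            if etype == "noop":
                # "noop" marks a sentence that needs no correction.
                continue
            start, end = (int(x) for x in span.split())
            edits[int(annotator)].append((start, end, etype, correction))
        sentences.append((tokens, dict(edits)))
    return sentences


sentences = parse_m2(SAMPLE_M2)
for tokens, edits in sentences:
    print(" ".join(tokens), edits)
```

Each edit is a `(start, end, error_type, correction)` tuple over token offsets of the source sentence; an empty correction string denotes a deletion.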

Quick Start

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2

Output:

----- Weighted Scores -----
Sys_name	Prec. 	Recall	F	F0.5	Accuracy
0       :	1.0000	0.4444	0.6154	0.8000	0.5833
1       :	0.2500	0.2222	0.2353	0.2439	0.2500
2       :	0.0000	0.0000	0.0000	0.0000	0.1667

Other options

-v

Verbose mode: the output additionally includes the weighted TP, FP, FN and TN.

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -v

----- Weighted Scores -----
Sys_name	  TP      	  FP      	  FN      	  TN      	Prec.	Recall	F	F0.5	Accuracy
0       :	  1.3333	  0.0000	  1.6667	  1.0000	1.0000	0.4444	0.6154	0.8000	0.5833
1       :	  0.6667	  2.0000	  2.3333	  0.3333	0.2500	0.2222	0.2353	0.2439	0.2500
2       :	  0.0000	  2.6667	  3.0000	  0.6667	0.0000	0.0000	0.0000	0.0000	0.1667
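The precision, recall and F columns can be reproduced from the weighted counts in this table. The sketch below assumes the standard definitions of precision, recall and F-beta applied to the weighted TP, FP and FN; `weighted_scores` is a hypothetical helper, not a function from gotoscorer.py. Accuracy is left out: for systems 1 and 2 the demo value does not equal (TP+TN)/(TP+FP+FN+TN), so it appears to use a different denominator.

```python
def weighted_scores(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F1, F-beta) from weighted counts."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0

    def f_beta(b):
        denom = b * b * precision + recall
        return (1 + b * b) * precision * recall / denom if denom > 0 else 0.0

    return precision, recall, f_beta(1.0), f_beta(beta)


# Weighted TP, FP, FN copied from the verbose demo output above.
rows = {
    "0": (1.3333, 0.0, 1.6667),
    "1": (0.6667, 2.0, 2.3333),
    "2": (0.0, 2.6667, 3.0),
}
for name, (tp, fp, fn) in rows.items():
    print(name, ["%.4f" % s for s in weighted_scores(tp, fp, fn)])
```

Rounded to four decimals, the results agree with the Prec., Recall, F and F0.5 columns for all three demo systems.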

-name <sys_1,sys_2,...,sys_N>

Assign system names that replace the numeric IDs in the output. Separate the names with commas.

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -name CNN,LSTM,Transformer

----- Weighted Scores -----
Sys_name   	Prec.	Recall	F	F0.5	Accuracy
CNN        :	1.0000	0.4444	0.6154	0.8000	0.5833
LSTM       :	0.2500	0.2222	0.2353	0.2439	0.2500
Transformer:	0.0000	0.0000	0.0000	0.0000	0.1667

-cat {1,2,3}

Compute the mean and standard deviation of the difficulty for each error type, listed in descending order of mean difficulty. {1,2,3} sets the granularity of the error types and behaves the same way as in ERRANT.

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -cat 3

----- Category Difficulty -----
Category  	Ave.	Std.	Freq.
U:NOUN    	1.00	0.00	1
M:VERB    	0.67	0.00	1
U:PREP    	0.67	0.00	1
R:VERB    	0.67	0.00	1
R:PRON    	0.00	0.00	1
M:DET     	0.00	0.00	1
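The aggregation behind such a table can be sketched as follows. This is an illustration, not the code in gotoscorer.py: `category_difficulty` is a hypothetical helper, the input pairs are the category and weight values that the -cv option displays for the demo data, and the use of the population standard deviation is an assumption (the demo has one error per category, so the deviation is 0.00 either way).

```python
from collections import defaultdict
from statistics import mean, pstdev


def category_difficulty(chunks):
    """Group (category, weight) pairs and return rows of
    (category, mean, std, frequency), hardest category first."""
    by_cat = defaultdict(list)
    for cat, weight in chunks:
        by_cat[cat].append(weight)
    rows = [(cat, mean(ws), pstdev(ws), len(ws)) for cat, ws in by_cat.items()]
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Category and weight of each error chunk in the demo data.
DEMO_CHUNKS = [
    ("M:VERB", 0.67), ("U:PREP", 0.67), ("R:PRON", 0.0),
    ("R:VERB", 0.67), ("M:DET", 0.0), ("U:NOUN", 1.0),
]

for cat, ave, std, freq in category_difficulty(DEMO_CHUNKS):
    print(f"{cat:<10}\t{ave:.2f}\t{std:.2f}\t{freq}")
```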

-heat <output_file>

Generate a heat map of error correction difficulty. You can see demo/heat_map.html for an example.
```
$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -heat demo/heat_map.html
```
-gen_w_file <output_file>

Generate a weight file. Computing the correction difficulty normally requires the outputs of multiple systems, but with a pre-made weight file a single system can be evaluated on its own. See demo/weight.txt for an example.
```
$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -gen_w_file demo/weight.txt 
```

-w_file <weight_file>

Evaluate a system using a weight-file.

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp_1sys.m2 -w_file demo/weight.txt

----- Weighted Scores -----
Sys_name	Prec.	Recall	F	F0.5	Accuracy
0       :	1.0000	0.4444	0.6154	0.8000	0.5833

-cv <output_file>

Visualize each chunk with its weight and error type, as in the example below. If you pass None as the file path, the output is printed to the terminal.

$ python gotoscorer.py -ref demo/ref.m2 -hyp demo/hyp.m2 -cv None

----- Chunk Visualizer -----
orig:   |    |We |         |discussing|   |about |   | its  |   | . |    |
gold:   |    |We |have been|discussing|   |      |   |  it  |   | . |    |
weight: |0.33|0.0|  0.67   |   0.33   |0.0| 0.67 |0.0| 0.0  |0.0|0.0|0.33|
cat:    |    |   | M:VERB  |          |   |U:PREP|   |R:PRON|   |   |    |

orig:   |   | I |   |have been|   |to |     |park|   |tomorrow|   | . |   |
gold:   |   | I |   |   go    |   |to | the |park|   |        |   | . |   |
weight: |0.0|0.0|0.0|  0.67   |0.0|0.0| 0.0 |0.0 |0.0|  1.0   |0.0|0.0|0.0|
cat:    |   |   |   | R:VERB  |   |   |M:DET|    |   | U:NOUN |   |   |   |
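The weights shown here are consistent with a difficulty defined as one minus the correction success rate over the evaluated systems: with three demo systems, the possible values are 0.0, 0.33, 0.67 and 1.0. The sketch below illustrates that reading. It is an assumption based on the demo values, not the code in gotoscorer.py, and `difficulty_weight` is a hypothetical name.

```python
def difficulty_weight(handled_correctly):
    """Return 1 - success rate for one chunk.

    handled_correctly holds one boolean per system: True if that system
    treated the chunk correctly (fixed an error or left correct text alone).
    """
    if not handled_correctly:
        raise ValueError("at least one system result is required")
    success_rate = sum(handled_correctly) / len(handled_correctly)
    return 1.0 - success_rate


print(round(difficulty_weight([True, False, False]), 2))   # 0.67: one of three systems succeeded
print(round(difficulty_weight([True, True, False]), 2))    # 0.33
print(round(difficulty_weight([False, False, False]), 2))  # 1.0: no system succeeded
print(round(difficulty_weight([True, True, True]), 2))     # 0.0: every system succeeded
```

Under this reading, a chunk that every system handles correctly contributes nothing to the weighted scores, while a chunk that no system handles correctly carries the full weight of 1.0.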

How to make M2 file

GTS requires a reference M2 file and a hypothesis M2 file. You can create both with ERRANT.

Example for generating M2 files with demo data

Generating demo/hyp.m2

$ errant_parallel -orig demo/orig.txt -cor demo/sys1.txt demo/sys2.txt demo/sys3.txt -out demo/hyp.m2

Generating demo/ref.m2
```
$ errant_parallel -orig demo/orig.txt -cor demo/gold.txt -out demo/ref.m2
```
In practice, you will rarely need to generate the reference this way, because an existing gold-standard M2 file is normally used as the reference.

Visualizer of error correction difficulty

GTS provides a visualizer of error correction difficulty. Errors are colored according to the success rate, from pale (easier) to deep (harder). Red marks errors that should be corrected (TP, FN), and blue marks places where a system changed something that should not have been corrected (FP). Hovering over a colored word shows the details of the correction: the error type, the correct correction, and the weight.
