Prepare .m2 files #24

Closed
wants to merge 8 commits into from
@osyvokon osyvokon commented Dec 11, 2022

M2 specifics:

  • Annotations are done on a sentence level.
  • Texts are tokenized with Stanza.
  • The error type annotations are copied from the corpus.
  • There's a special document heading sentence added to the beginning of each document.
    • It looks like this: # 0123, where 0123 is the document ID.
    • This makes it possible to use document-level context.
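
A minimal sketch of how the heading sentences could be used to regroup sentences by document. It assumes the heading shows up as an ordinary `S` line (`S # 0123`) and that sentence blocks are separated by blank lines, as is usual for M2; `group_by_document`, the regex, and the sample data (including the `Spelling` label) are illustrative and not taken from this PR.

```python
import re

# Assumed shape of the document heading sentence: "# 0123"
HEADING_RE = re.compile(r"^#\s*(\d+)$")


def group_by_document(m2_text):
    """Map document ID -> list of (sentence, annotation_lines)."""
    docs = {}
    current = None
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith("S "):
            continue  # skip anything that is not a sentence block
        sentence = lines[0][2:].strip()
        edits = [line for line in lines[1:] if line.startswith("A ")]
        match = HEADING_RE.match(sentence)
        if match:
            # A heading sentence starts a new document.
            current = match.group(1)
            docs[current] = []
        elif current is not None:
            docs[current].append((sentence, edits))
    return docs


# Made-up sample in the assumed layout.
sample = """S # 0123

S Helo world .
A 0 1|||Spelling|||Hello|||REQUIRED|||-NONE-|||0

S # 0124

S Fine .
"""

docs = group_by_document(sample)
for doc_id, sentences in docs.items():
    print(doc_id, len(sentences))
```

Sentences that appear before the first heading are dropped in this sketch; a real loader might want to raise an error instead.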

@osyvokon
Contributor Author

Some baselines for reference:

gec-only

=========== Span-Based Correction =============================
                  TP      FP      FN      Prec    Rec     F0.5
LanguageTool      107     18166   2044    0.0059  0.0497  0.0071
Human 1           1420    1965    1195    0.4195  0.543   0.4395
Human 2           1420    1195    1965    0.543   0.4195  0.5128
------------ Span-Based Detection -----------------------------
                  TP      FP      FN      Prec    Rec     F0.5
Human 1           1675    1710    940     0.4948  0.6405  0.5184
LanguageTool      873     17393   1813    0.0478  0.325   0.0576
===============================================================

gec-fluency

=========== Span-Based Correction =============================
                  TP      FP      FN      Prec    Rec     F0.5
Human 1           1626    3227    1718    0.3351  0.4862  0.3573
LanguageTool      121     18540   2735    0.0065  0.0424  0.0078
------------ Span-Based Detection -----------------------------
                  TP      FP      FN      Prec    Rec     F0.5
Human 1           2000    2853    1344    0.4121  0.5981  0.4394
================================================================

Human 1 -- annotator 2 is the hypothesis, annotator 1 is the reference
Human 2 -- annotator 1 is the hypothesis, annotator 2 is the reference
LanguageTool -- version 5.9
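
For anyone checking the numbers, the Prec, Rec and F0.5 columns follow from the TP/FP/FN counts via the standard F-beta formula. A small sketch; the function name `prf` and the handling of zero denominators are my own assumptions, not necessarily what the scorer does:

```python
def prf(tp, fp, fn, beta=0.5):
    """Precision, recall and F-beta from raw counts."""
    # Assumed convention: an empty denominator yields 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    b2 = beta * beta
    if precision + recall == 0:
        return precision, recall, 0.0
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# "Human 1" row of the gec-only correction table above.
p, r, f = prf(tp=1420, fp=1965, fn=1195)
print(f"{p:.4f} {r:.4f} {f:.4f}")
```

With beta = 0.5, precision is weighted more heavily than recall, which is why LanguageTool's large FP counts pull its score so low.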

@osyvokon osyvokon marked this pull request as ready for review December 13, 2022 20:33
@osyvokon
Contributor Author

Merged in #26

@osyvokon osyvokon closed this Jan 19, 2023