Training data, Data Format #2

saramoeini20 · 2023-05-14T14:52:06Z

Hello,
I have two questions:

I saw that the dev-text folder contains more than one .txt file. I understand the source/target files (they hold the incorrect/correct sentence pairs, right?), but what about the other files, which contain the same sentences with different word order or grammar?
From what I have found, the training phase needs the source/target files (incorrect/correct sentences), but what about the M2 format? Is it only for enhancing the model, or does it serve another purpose?

mrqorib · 2023-05-16T08:32:45Z

The dev-text folder contains the corrections of source.txt produced by the base GEC systems.
Actually, only the M2 files are used for training and testing. The files in *-text are used to generate the M2 files if they do not already exist in the *-m2 folder.

I hope this answers your questions.
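
For readers unfamiliar with the M2 format mentioned above, here is a minimal Python sketch of how such a file can be read. It assumes the common layout used in GEC work: an `S` line with the tokenized source sentence, followed by `A` lines of the form `start end|||type|||correction|||REQUIRED|||-NONE-|||annotator`. The sample sentence, the ERRANT-style error types, and the helper names `parse_m2` and `apply_edits` are illustrative assumptions, not code or data from this repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Edit:
    start: int        # token offset where the edit begins
    end: int          # token offset where the edit ends (exclusive)
    error_type: str   # e.g. "R:DET"; "noop" means no correction
    correction: str   # replacement text; empty for a deletion
    annotator: int    # annotator (or system) id


# Illustrative sample only; not taken from the repository's data.
SAMPLE = """S This are a example sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 3|||R:DET|||an|||REQUIRED|||-NONE-|||0
"""


def parse_m2(text: str) -> List[Tuple[List[str], List[Edit]]]:
    """Parse M2 text into (source_tokens, edits) pairs, one per sentence."""
    entries = []
    for block in text.strip().split("\n\n"):
        tokens: List[str] = []
        edits: List[Edit] = []
        for line in block.splitlines():
            if line.startswith("S "):
                tokens = line[2:].split()
            elif line.startswith("A "):
                fields = line[2:].split("|||")
                start, end = map(int, fields[0].split())
                edits.append(
                    Edit(start, end, fields[1], fields[2], int(fields[-1]))
                )
        entries.append((tokens, edits))
    return entries


def apply_edits(tokens: List[str], edits: List[Edit], annotator: int = 0) -> str:
    """Rebuild the corrected sentence for one annotator."""
    out = list(tokens)
    # Apply edits from right to left so earlier offsets stay valid.
    for e in sorted(edits, key=lambda e: (e.start, e.end), reverse=True):
        if e.annotator != annotator or e.error_type == "noop" or e.start < 0:
            continue
        out[e.start:e.end] = e.correction.split()
    return " ".join(out)


if __name__ == "__main__":
    for source_tokens, sentence_edits in parse_m2(SAMPLE):
        print(" ".join(source_tokens))
        print(apply_edits(source_tokens, sentence_edits))
```

Because each `A` line stores the edit span and its replacement, a source/target text pair can be recovered from the M2 entry, which is consistent with the M2 files alone being sufficient for training and testing.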

saramoeini20 · 2023-05-16T09:01:34Z

So is only the M2 format needed for training GEC in general, or is that just the case for your system?

saramoeini20 · 2023-05-16T09:02:59Z

Also, did you use only the base models' outputs, or did you use the models themselves too?

mrqorib · 2023-05-16T09:24:39Z

Yes, only the M2 format is needed.

ESC only requires the outputs; you don't need the models' weights.

saramoeini20 · 2023-05-16T09:38:27Z

Thank you.

mrqorib closed this as completed May 16, 2023
