
Training data, Data Format #2

Closed
saramoeini20 opened this issue May 14, 2023 · 5 comments

Comments

@saramoeini20

Hello,
I have two questions:

  1. I saw the dev-text folder, and it has more than one .txt file. I know about the source/target files (they contain incorrect/correct pairs, right?), but what about the other files, which contain the same sentences with different word order/grammar?

  2. From what I have found, the training phase needs the source/target files (incorrect/correct sentences), but what about the M2 format? Is it just for enhancing the model, or is it used for something else?

@mrqorib
Collaborator

mrqorib commented May 16, 2023

Hi @saramoeini20,

  1. The dev-text folder contains the corrections that the base GEC systems produced for source.txt.
  2. Actually, only the M2 files are used for training/testing. The files in *-text are used to generate the M2 files if they don't already exist in the corresponding *-m2 folder.

I hope this answers your questions.
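
For readers unfamiliar with M2: it is the annotation format of the CoNLL-2013/2014 GEC shared tasks and the output format of ERRANT. Each block starts with an `S` line holding the tokenized source sentence, followed by one `A` line per edit: `start end|||error_type|||correction|||REQUIRED|||-NONE-|||annotator_id`. The thread doesn't say how the repository converts the *-text files into M2. A common route is ERRANT's `errant_parallel -orig source.txt -cor target.txt -out out.m2`, but treat that as an assumption. Below is a minimal sketch (not the repository's own loader) that reads M2 text and rebuilds the corrected sentence from the edits. `parse_m2`, `apply_edits`, `Edit`, `M2Sentence` and the sample sentences are all hypothetical, written for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    start: int        # first token offset covered by the edit (inclusive)
    end: int          # token offset where the edit stops (exclusive)
    err_type: str     # ERRANT-style error type, e.g. "R:VERB:SVA"
    correction: str   # replacement tokens; "" means the span is deleted
    annotator: int    # annotator id (last field of the A line)


@dataclass
class M2Sentence:
    tokens: list[str]
    edits: list[Edit] = field(default_factory=list)


def parse_m2(text: str) -> list[M2Sentence]:
    """Hypothetical helper: parse M2-formatted text into sentences with edits."""
    sentences: list[M2Sentence] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("S "):
            # An S line opens a new sentence block.
            sentences.append(M2Sentence(tokens=line[2:].split()))
        elif line.startswith("A ") and sentences:
            fields = line[2:].split("|||")
            start, end = (int(x) for x in fields[0].split())
            if fields[1] == "noop":
                continue  # explicit "no edit" marker, nothing to apply
            correction = "" if fields[2] == "-NONE-" else fields[2]
            sentences[-1].edits.append(
                Edit(start, end, fields[1], correction, int(fields[-1]))
            )
    return sentences


def apply_edits(sentence: M2Sentence, annotator: int = 0) -> str:
    """Hypothetical helper: rebuild the corrected text from one annotator's edits."""
    tokens = list(sentence.tokens)
    chosen = [e for e in sentence.edits if e.annotator == annotator]
    # Apply edits right-to-left so earlier token offsets stay valid.
    for e in sorted(chosen, key=lambda e: (e.start, e.end), reverse=True):
        tokens[e.start:e.end] = e.correction.split()
    return " ".join(tokens)


if __name__ == "__main__":
    # Made-up sample: a replacement, a deletion and a "noop" sentence.
    sample = (
        "S This are a sentence .\n"
        "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S I like the music .\n"
        "A 2 3|||U:DET||||||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S It is fine .\n"
        "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
    )
    for sent in parse_m2(sample):
        print(apply_edits(sent))
    # prints:
    # This is a sentence .
    # I like music .
    # It is fine .
```

This shows why the M2 files are enough on their own. Each block already pairs the source sentence with its token-level edits, so the corrected text and the edit types can both be recovered from it.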

@saramoeini20
Author

So for training, is only the M2 format needed for GEC in general, or is it just that way in your case?

@saramoeini20
Author

Also, did you use only the base models' outputs, or did you use the models themselves too?

@mrqorib
Collaborator

mrqorib commented May 16, 2023

Yes, only the M2 format is needed.

ESC only requires the outputs; you don't need the models' weights.

@saramoeini20
Author

Thank you.

@mrqorib mrqorib closed this as completed May 16, 2023