
Commercial use issue #9

Closed
SimonHFL opened this issue Jun 28, 2021 · 2 comments
Labels
question Further information is requested

Comments


SimonHFL commented Jun 28, 2021

Hey @PrithivirajDamodaran

The readme states that Gramformer versions above 1.0 are allowed for commercial use. However, this is not currently the case: the grammar_error_correcter_v1 model was trained using the non-commercial WI&Locness data, even though the documentation states otherwise.

The grammar_error_correcter_v1 model is actually identical to the previous grammar_error_correcter model, which was trained using the non-commercial WI&Locness data. They have identical weights, which you can verify with this script.

Since the models are identical, both were trained using the non-commercial WI&Locness data, so the grammar_error_correcter_v1 model, and with it Gramformer v1.1 and v1.2, should not be allowed for commercial use.
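The kind of weight comparison described above could be sketched as follows. This is a toy illustration only: the helper name and the plain Python lists standing in for real tensors (e.g. a torch `state_dict()`) are made up here, and this is not the actual verification script referenced in the issue.

```python
# Toy sketch of checking whether two checkpoints carry identical weights.
# Real state dicts map parameter names to tensors; plain lists stand in here.

def state_dicts_identical(a, b, tol=0.0):
    """Return True if both mappings have the same parameter names and
    every corresponding weight matches within `tol`."""
    if a.keys() != b.keys():
        return False
    for key in a:
        wa, wb = a[key], b[key]
        if len(wa) != len(wb):
            return False
        if any(abs(x - y) > tol for x, y in zip(wa, wb)):
            return False
    return True

# Hypothetical "checkpoints": the first two share all weights,
# the third differs in one parameter.
ckpt_old = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.0]}
ckpt_v1_same = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.0]}
ckpt_v1_new = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.5]}

print(state_dicts_identical(ckpt_old, ckpt_v1_same))  # True
print(state_dicts_identical(ckpt_old, ckpt_v1_new))   # False
```

If the check returns True for two supposedly different releases, as reported above, the uploaded checkpoints are the same model.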

Could you please update the readme to clarify this, or upload a new model that has not been trained using WI&Locness?

Thanks

PrithivirajDamodaran (Owner) commented Jun 29, 2021

Hey @SimonHFL

  • As per our email exchange, the pre-release model (prithivida/grammar_error_correcter) was trained on filtered WikiEdits data, with a slice of WI&Locness added on top. At the time, WI&Locness was available as a HuggingFace dataset with no license, in fact marked as "unknown" (the screenshots below are proof of that). I already mentioned this in the email thread, and you replied that it was probably an unintentional miss on the part of the people who uploaded the dataset to HuggingFace. So, to reiterate: there was no intention to undermine anyone's academic work or violate a valid license policy; I merely used the dataset based on the license info shown ("Unknown") at that point in time.

[Screenshots dated 2021-06-19 of the HuggingFace dataset page showing the WI&Locness license as "unknown"]

  • (I can see that you have since had them update the license info.)
  • After you pointed out the possibly missing license info on the HuggingFace page, I acknowledged it in the email (and mentioned I was already in the process of gathering more WikiEdits data to train subsequent models), and did the following: A.) explicitly called out on GitHub that the pre-release model is not intended for commercial use, B.) did the same in the HuggingFace readme, and C.) trained a brand-new model excluding WI&Locness. That is the _V1 model.
  • The _V1 model (prithivida/grammar_error_correcter_v1) was trained on WikiEdit pairs and other synthetic pairs (refer to the readme for details).
  • Your script reports that the pre-release and V1 models are identical because of what was likely an inadvertent oversight on my side in picking the right checkpoint while uploading to the v1 tag.
  • I have now refreshed the v1 tag with the right checkpoint files and double-checked. See below.

[Screenshot dated 2021-06-29 showing the refreshed v1 checkpoint files]

  • Also, to avoid any future unintentional non-compliance by consumers of package versions <= 1.0 (and hence of the pre-release model prithivida/grammar_error_correcter), I can remove it from HuggingFace.

Thanks

@PrithivirajDamodaran PrithivirajDamodaran added the question Further information is requested label Jun 29, 2021
SimonHFL (Author) commented

Thanks for fixing this! Now it seems there should not be any issue with commercial use.
