
Commercial use issue #9

Closed
SimonHFL opened this issue Jun 28, 2021 · 2 comments
Labels
question Further information is requested

Comments


SimonHFL commented Jun 28, 2021

Hey @PrithivirajDamodaran

The readme states that Gramformer versions above 1.0 are allowed for commercial use. However, this is not currently the case: the grammar_error_correcter_v1 model was trained using the non-commercial WI&Locness data, even though the documentation states otherwise.

The grammar_error_correcter_v1 model is actually identical to the previous grammar_error_correcter model, which was trained using the non-commercial WI&Locness data. They have identical weights, which you can verify with this script.

Since the models are identical, both were trained using the non-commercial WI&Locness data, so the grammar_error_correcter_v1 model, and with it Gramformer v1.1 and v1.2, should not be allowed for commercial use.
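The kind of weight comparison described above could be sketched as follows. This is a toy illustration only: the helper name and the plain Python lists standing in for real tensors (e.g. a torch `state_dict()`) are made up here, and this is not the actual verification script referenced in the issue.

```python
# Toy sketch of checking whether two checkpoints carry identical weights.
# Real state dicts map parameter names to tensors; plain lists stand in here.

def state_dicts_identical(a, b, tol=0.0):
    """Return True if both mappings have the same parameter names and
    every corresponding weight matches within `tol`."""
    if a.keys() != b.keys():
        return False
    for key in a:
        wa, wb = a[key], b[key]
        if len(wa) != len(wb):
            return False
        if any(abs(x - y) > tol for x, y in zip(wa, wb)):
            return False
    return True

# Hypothetical "checkpoints": the first two share all weights,
# the third differs in one parameter.
ckpt_old = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.0]}
ckpt_v1_same = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.0]}
ckpt_v1_new = {"encoder.weight": [0.1, -0.2], "decoder.bias": [0.5]}

print(state_dicts_identical(ckpt_old, ckpt_v1_same))  # True
print(state_dicts_identical(ckpt_old, ckpt_v1_new))   # False
```

If the check returns True for two supposedly different releases, as reported above, the uploaded checkpoints are the same model.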

Could you please update the readme to clarify this, or upload a new model that has not been trained using WI&Locness?

Thanks

PrithivirajDamodaran (Owner) commented Jun 29, 2021

Hey @SimonHFL

  • As per our email exchange, the pre-release model (prithivida/grammar_error_correcter) was trained on filtered WikiEdits data, with a slice of WI&Locness added on top. At the time, WI&Locness was available as a HuggingFace dataset with no license, in fact marked as "unknown" (the screenshots below are proof of that). I already mentioned this in the email thread, and you replied that it was probably an unintentional miss on the part of the people who uploaded the dataset to HuggingFace. So, to reiterate: there was no intention to undermine anyone's academic work or violate a valid license policy; I merely used the dataset based on the license info shown ("Unknown") at that point in time.

[Screenshots dated 2021-06-19 of the HuggingFace dataset page showing the WI&Locness license as "unknown"]

  • (I can see that you have since had them update the license info.)
  • After you pointed out the possibly missing license info on the HuggingFace page, I acknowledged it in the email (and mentioned I was already in the process of gathering more WikiEdits data to train subsequent models), and did the following: A.) explicitly called out on GitHub that the pre-release model is not intended for commercial use, B.) did the same in the HuggingFace readme, and C.) trained a brand-new model excluding WI&Locness. That is the _V1 model.
  • The _V1 model (prithivida/grammar_error_correcter_v1) was trained on WikiEdit pairs and other synthetic pairs (refer to the readme for details).
  • Your script reports that the pre-release and V1 models are identical because of what was likely an inadvertent oversight on my side in picking the right checkpoint while uploading to the v1 tag.
  • I have now refreshed the v1 tag with the right checkpoint files and double-checked. See below.

[Screenshot dated 2021-06-29 showing the refreshed v1 checkpoint files]

  • Also, to avoid any future unintentional non-compliance by consumers of package versions <= 1.0 (and hence of the pre-release model prithivida/grammar_error_correcter), I can remove it from HuggingFace.

Thanks

@PrithivirajDamodaran PrithivirajDamodaran added the question Further information is requested label Jun 29, 2021
SimonHFL (Author) commented

Thanks for fixing this! Now it seems there should not be any issue with commercial use.
