What is the source of the dataset? #3

Vedant2311 · 2020-12-29T17:44:51Z

Hello. My name is Vedant, and I am working on a project related to Indic MT at IIT Delhi. The dataset statistics in the README are very impressive. We would like to use your training data, but before that it would be very helpful if you could provide more information about the dataset: is there a published research paper associated with it, how did you obtain such a large dataset, and was any human monitoring involved in curating it?

Since this information is not in the README, it would be very helpful if you could fill this gap :))

himanshudce · 2020-12-30T17:31:26Z

Hi, most of the dataset is actually from http://opus.nlpl.eu/ and some other open-source sources. You can check how the dataset is created from OPUS; we then preprocessed it as mentioned in the paper. Thanks and regards

