This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Get datasets/packages approved by CELA #45

Closed
nikhilrj opened this issue May 7, 2019 · 3 comments
Labels
engineering Engineering tasks for repo

Comments

@nikhilrj

nikhilrj commented May 7, 2019

  • BERT PyTorch Repo
  • Yahoo Answers
  • IMDb Large Movie Review Dataset
@nikhilrj nikhilrj added the engineering Engineering tasks for repo label May 7, 2019
@nikhilrj nikhilrj added this to the Engineering Tasks milestone May 7, 2019
@hlums
Collaborator

hlums commented May 13, 2019

@nikhilrj
For the NER notebook, I want to use the CoNLL-2003 dataset.
The annotations are available at https://www.clips.uantwerpen.be/conll2003/ner/ and the underlying text data (the Reuters corpus) at https://trec.nist.gov/data/reuters/reuters.html

For text classification on Chinese data, I would like to look into a couple of datasets:
https://github.com/facebookresearch/XNLI
http://thuctc.thunlp.org/
Sorry, the second web page is in Chinese. It says the dataset is free for use by universities, research institutes, companies, and individuals. For commercial use, one needs to email thunlp@gmail.com to discuss the license.

@heatherbshapiro
Contributor

We need to fill out this table for approvals:

  • Dataset Name
  • Dataset URL
  • Usage Type: Internal, Public Demo, Referencing but not distributing, Using a script to download data for the user, or Release/Distributing
  • What Microsoft product/service/project is this dataset being used in?
  • If data is being redistributed, where are you publishing the data? (GitHub, Jupyter notebook, in a product, etc.)
  • When do you need approval by? (Date)
  • Do you plan on keeping the data? (Y/N) For how long?
  • What terms/license is the data under? (URL)
  • Do you have to log in, click through, or pay to download the data? (Y/N)
  • Data Type (image, map, audio, music, X-ray, etc.)
  • Is there personal data (personally identifiable information such as names, emails, telephone numbers, etc.)?
  • Sample Data

@miguelgfierro
Member

hey @hlums, can this issue be closed?

@hlums hlums closed this as completed Aug 2, 2019

4 participants