This repository has been archived by the owner on Apr 11, 2023. It is now read-only.

Request for a smaller dataset for researchers with lesser resources #81

Closed
vj68 opened this issue Oct 23, 2019 · 8 comments

Comments

@vj68

vj68 commented Oct 23, 2019

Thank you for making this amazing problem statement public, along with a very comprehensive dataset!

Could a relatively small subset of the dataset be made available for independent developers/researchers who might try running this on their personal machines?

This will open up the problem for a larger audience and may bring in some innovative solutions!

@vj68 vj68 changed the title A smaller dataset for researchers with lesser resources Request for a smaller dataset for researchers with lesser resources Oct 23, 2019
@bzz
Contributor

bzz commented Oct 27, 2019

Hi @rajurajvijay619, did you try using just a single language for the experiments?

E.g. for Java, I find the total of 500k samples from 184 MB of .gz files to be very comfortably manageable on a laptop.

As one can see from the published analysis example (screenshot: per-language dataset sizes), languages like Go, JS or Ruby would give even smaller dataset sizes and fit on almost any local machine.

Hope this helps and good luck with experiments!
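For anyone taking the single-language route, here is a minimal sketch of loading one language's data. Note the `<language>/**/*.jsonl.gz` directory layout and the `load_samples` helper are assumptions for illustration, not something this repo guarantees:

```python
import gzip
import json
from pathlib import Path

def load_samples(data_dir, language):
    """Load every sample for one language from .jsonl.gz shards.

    Assumes shards live under data_dir/<language>/ (possibly nested),
    one JSON object per line -- adjust the glob to your actual layout.
    """
    samples = []
    for shard in sorted(Path(data_dir).glob(f"{language}/**/*.jsonl.gz")):
        with gzip.open(shard, "rt", encoding="utf-8") as f:
            for line in f:
                samples.append(json.loads(line))
    return samples
```

Since each shard is read line by line, memory stays proportional to the samples you keep, which is what makes a single language laptop-friendly.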

@sara-02

sara-02 commented Oct 28, 2019

@bzz just one question: when running for a single language (local machine), does the setup still require GPUs?

@hamelsmu
Contributor

@sara-02 you can download the data without GPUs; however, running the default models in this repo will be painfully slow without them. You can try training on a smaller sample of the data as @bzz proposes, and you can also set this parameter to limit the size of the data.

Also, Google Colab notebooks are great for free GPUs. Thanks for getting involved with this project ❤️
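If you would rather limit the data by hand after downloading, a reproducible random subsample is a simple alternative. This `subsample` helper is a hypothetical sketch, not the parameter the repo exposes:

```python
import random

def subsample(samples, n, seed=0):
    """Return a reproducible random subset of n samples.

    Hypothetical helper: fixing the seed makes experiments on the
    reduced dataset repeatable across runs and machines.
    """
    rng = random.Random(seed)
    if n >= len(samples):
        return list(samples)
    return rng.sample(samples, n)
```

Sampling after download keeps the subset representative of the full distribution, whereas truncating to the first N lines of a shard can bias toward whichever repositories happen to sort first.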

@hamelsmu
Contributor

@rajurajvijay619 can you describe your constraints a bit more? Is it the disk space needed to download the dataset? Could you download the entire dataset and just sample from that?

Thanks for your feedback

@sara-02

sara-02 commented Oct 29, 2019

> @sara-02 you can download data without GPUs, however running the default models in this repo will be painfully slow without gpus. However, you can try training on a smaller sample of the data as @bzz proposes, you can also set this parameter to limit the size of the data.
>
> Also, google colab notebooks are great for free GPUs. Thanks for getting involved with this project ❤️

Thanks. I will look into Colab as well as running it locally with only one language. I was hesitant to start because the first step in the setup states: "Additionally, you must install Nvidia-Docker to satisfy GPU-compute related dependencies." So I thought the code might not run as-is on a local system with GPUs.

@hamelsmu
Contributor

@sara-02 you are correct regarding Docker. I think in the end the Docker setup could make your life easier, as installing all the dependencies by hand can become very cumbersome and brittle.

Let me know where you are struggling with Docker and I will be more than happy to help! I wrote this tutorial on Docker in case a gentle introduction is useful.

Looking forward to seeing what you do with this dataset! Please do not be shy about asking questions!

@hamelsmu
Contributor

If you are using Colab, I do not believe you will be able to use Docker; in that case you will have to pip-install, in the Colab notebook, all the dependencies defined in the Dockerfile.

@hamelsmu
Contributor

I'll go ahead and close this issue; please let me know if there are any more questions.
