Add custom dataset and dataloader tutorial for C++ #841

Closed
wants to merge 5 commits into from

dhpollack

I was struggling with creating a custom dataset and dataloader for the
C++ frontend partially because I couldn't find a good minimal example
and ran into a few gotchas when trying to model something after the
MNIST dataset. This tutorial might help others avoid the pitfalls that
I ran into.

Signed-off-by: David Pollack <david@da3.net>

@netlify

netlify bot commented Jan 31, 2020

Deploy Preview for pytorch-tutorials-preview ready!

Name Link
🔨 Latest commit f4cef8c
🔍 Latest deploy log https://app.netlify.com/sites/pytorch-tutorials-preview/deploys/624c5dd3cfa6290008490c33
😎 Deploy Preview https://deploy-preview-841--pytorch-tutorials-preview.netlify.app
@boulabiar

Base automatically changed from master to main February 16, 2021 19:33
Base automatically changed from main to master February 16, 2021 19:37
@jspisak
Contributor

jspisak commented Mar 9, 2021

@glaringlee - can you review this tutorial and suggest any updates?

@glaringlee glaringlee self-requested a review March 9, 2021 18:37
Contributor

@glaringlee glaringlee left a comment

@dhpollack Thanks for contributing this tutorial. This generally looks good to me. I left a few minor comments.

This tutorial has two major parts: one is how to create a customized dataset/dataloader workflow, the other is how to build it with libtorch. Can you add two reference links to this tutorial as well? We have a code example for creating a dataset/dataloader and documentation for building libtorch with custom code, but no tutorial that puts all of these together, so I think this tutorial is a good place to collect these references.
code example: https://github.com/pytorch/examples/blob/master/cpp/custom-dataset/custom-dataset.cpp
libtorch build document: https://pytorch.org/cppdocs/installing.html

@glaringlee
Contributor

There are some CI test failures, but they do not seem to be related to this PR. Let's see whether the CI tests recover after this PR is updated.

dhpollack and others added 2 commits March 10, 2021 19:00
@dhpollack dhpollack force-pushed the dhp/cpp_dataset_tutorial branch from 06a6126 to 5255c0a Compare March 10, 2021 18:01
@dhpollack
Author

@glaringlee @jspisak I need to test this with the newest version of pytorch as I created this over a year ago and haven't tried it out with the newest versions of libtorch. I am on vacation this week, but I can try it this weekend / Monday when I get back.

@dhpollack
Author

@glaringlee I tested this with libtorch 1.8 and gcc 10.2 and it works on my system. Good to go unless you have some other comments.

@glaringlee
Contributor

@dhpollack This looks good to me. There is a CI failure; we will merge this once the CI failure is cleared.

@glaringlee
Contributor

@brianjo
ci/circleci: pytorch_tutorial_windows_pr_build_worker_0 failed due to a missing Python module. Is this an issue?

Contributor

@glaringlee glaringlee left a comment

@jspisak @brianjo This looks good to me. There is a CI failure in ci/circleci: pytorch_tutorial_windows_pr_build_worker_0, but in my opinion it should not be related to this PR.

@dhpollack
Author

> @brianjo
> ci/circleci: pytorch_tutorial_windows_pr_build_worker_0 failed due to a missing Python module. Is this an issue?

Yea, librosa is an audio module. Probably related to one of the audio tutorials. Has nothing to do with this PR.

@brianjo
Contributor

brianjo commented Mar 16, 2021

Rebasing this onto some fixes I added to master.

@brianjo brianjo self-assigned this Mar 16, 2021
@jspisak
Contributor

jspisak commented Mar 21, 2021

@guyang3532 - looks like some Windows failures are occurring - can you take a look?

@brianjo
Contributor

brianjo commented Mar 21, 2021

> @guyang3532 - looks like some Windows failures are occurring - can you take a look?

I think this just needs a rebase. I did that. Should fix the issues.

@brianjo
Contributor

brianjo commented Mar 21, 2021 via email

@brianjo brianjo requested a review from VitalyFedyunin June 22, 2021 16:47
@dhpollack
Author

stale

@dhpollack dhpollack closed this Apr 18, 2023
@IncredibleMoney

How do you build a custom dataset and dataloader for an object detection task? Each image has a different number of anchors, so the batch cannot easily be stacked when iterating over the dataloader.

8 participants