Add an example with torchdata and torchserve #1940
Conversation
```python
testset = datasets.MNIST('./MNIST_dataset', download=True, train=False, transform=image_transform)

# Creating the dataloader.
inference_dataset = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, shuffle=True)
```
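For readers who want to try the loader step in isolation, here is a minimal, self-contained sketch of the same `DataLoader` pattern, substituting a toy in-memory `TensorDataset` for the downloaded MNIST test set (the tensor sizes and batch size here are illustrative, not from the PR):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the MNIST test set: 32 fake 1x28x28 grayscale images with labels.
images = torch.zeros(32, 1, 28, 28)
labels = torch.zeros(32, dtype=torch.long)
testset = TensorDataset(images, labels)

# Same DataLoader pattern as in the example, with an illustrative batch size of 8.
inference_dataset = DataLoader(testset, batch_size=8, shuffle=True)

batches = list(inference_dataset)
# 32 samples / batch size 8 -> 4 batches, each image batch shaped (8, 1, 28, 28)
```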
Overall I think the example looks good; it integrates a torchvision dataset, but it isn't quite clearly a torchdata integration.
Specifically, I was hoping we could create some toy torchdata dataset directly, without leveraging torchvision. I believe this change would be minor to your code, but if it isn't, I'm happy to merge this as-is if you have the bandwidth to work on the more vanilla torchdata integration later.
I agree. @NivekT, do you have an existing pipeline for vision benchmarking that they could take as a reference?
For DataPipe reference:
- Here is the torchvision implementation of loading MNIST - it might be too complicated. One option is to import and directly use that here (similar to how `datasets.MNIST` is used).
- A standalone, common example is something like this:
```python
dp = FileLister(str(root), masks=[f"archive_{args.archive_size}*.tar"])
dp = dp.shuffle(buffer_size=10000)
dp = FileOpener(dp, mode="b")
dp = TarArchiveLoader(dp, mode="r:")
dp = dp.shuffle(buffer_size=archive_size)
dp = dp.sharding_filter()
dp = dp.map(pil_loader).map(pil_transformation)
# dp = dp.map(tensor_loader).map(tensor_transformation)  # Alternate - convert image to tensor, then transform
```
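As a small runnable illustration of the same DataPipe style, without any tar archives on disk, the sketch below builds a toy pipeline from the datapipes that ship inside torch itself (`torch.utils.data.datapipes`); the values and buffer sizes are made up for the example:

```python
from torch.utils.data.datapipes.iter import IterableWrapper

# Stand-in for the FileLister/FileOpener stages: just the integers 0..15.
dp = IterableWrapper(range(16))
dp = dp.shuffle(buffer_size=16)   # buffered shuffle, as in the snippet above
dp = dp.sharding_filter()         # no-op here; shards the stream across DataLoader workers
dp = dp.map(lambda x: x * x)      # stand-in for pil_loader / pil_transformation

values = sorted(dp)               # iteration order varies because of the shuffle
```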
Separately, I think we should use `DataLoader2` instead of the old version in the example. @ejguan WDYT?
Thank you @msaroufim, @ejguan, @agunapal, and @NivekT for your comments and guidance. I have incorporated the required changes. Looking forward to your feedback. Thank you once again.
Thanks for the quick turnaround @PratsBhatt. Looks good; I am approving it. Minor feedback: please link the example from https://github.com/pytorch/serve/blob/master/examples/README.md, since it's a few levels deep and might be missed by others.
@PratsBhatt Thanks for taking this up. Overall it looks good, but in this example we would want to explicitly make use of TorchData features (e.g., DataPipes). You could take a look at this example in TorchData and see if you can modify your current example with it: https://github.com/pytorch/data/blob/main/examples/vision/imagefolder.py
Amazing, thank you for the quick turnaround.
Codecov Report

```
@@           Coverage Diff           @@
##           master    #1940   +/-   ##
=======================================
  Coverage   44.95%   44.95%
=======================================
  Files          63       63
  Lines        2609     2609
  Branches       56       56
=======================================
  Hits         1173     1173
  Misses       1436     1436
```
> Looks good.
> Minor feedback: Please link the example here since it's a few levels deep and might be missed by others.
> https://github.com/pytorch/serve/blob/master/examples/README.md
Thank you @agunapal and @msaroufim, I have implemented the code changes. Looking forward to merging the PR.
* Add an example with torchdata
* Update comment.
* Incorporate code review comments.
* Remove unused imports.
* Apply code review comments.
Description
The pull request provides a simple example of using torchdata with torchserve.
It uses MNIST as the dataset and task to be solved.
The current example builds on top of the already provided example of MNIST.
Type of change
The current pull request adds an example using torchdata and torchserve for the MNIST model.

It adds an `inference.py` script that takes care of loading the MNIST dataset and making REST calls to TorchServe. It also adds a new `mnist_handler.py` script, which adds a preprocessing step to convert the payload of the REST request to a tensor, as well as to output a class number once the inference request is finished.

The output of `inference.py` looks like the following.

Checklist:
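The handler's two responsibilities described above (payload-to-tensor preprocessing and class-number postprocessing) can be sketched as standalone functions. The names and the assumption that the payload is raw 28x28 grayscale bytes are hypothetical; a real TorchServe handler would wire such steps into its handler class rather than use free functions:

```python
import torch

def preprocess(payload: bytes) -> torch.Tensor:
    """Convert a raw REST payload (28*28 grayscale bytes, assumed) into a model input."""
    t = torch.frombuffer(bytearray(payload), dtype=torch.uint8)
    return t.float().div(255.0).reshape(1, 1, 28, 28)

def postprocess(logits: torch.Tensor) -> int:
    """Turn the model's (1, 10) logits into a single MNIST class number."""
    return int(logits.argmax(dim=1).item())

x = preprocess(bytes(28 * 28))        # all-zero dummy image
fake_logits = torch.zeros(1, 10)
fake_logits[0, 3] = 1.0               # pretend the model favored class 3
predicted = postprocess(fake_logits)
```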