update s3 test cases #464
@ejguan Could you help enable the datapipes tests?
@ydaiming Sorry, what do you mean?
@ejguan The datapipes tests weren't run automatically because no changes were made to the datapipes source code. I just added a blank line to trigger the datapipes tests.
Weird. We don't have such an advanced mechanism in GHA. Anyway, as long as the tests start working, it's fine.
It seems GHA is stuck in the queue again. We'll have to wait for a while...
How long are we expecting to wait?
Hmm. I can't estimate the timeline; it depends on GitHub. Could you verify this works in your local environment?
I've tested on my remote Ubuntu EC2 instance and my local macOS system. Both build and pass the test without the
Interesting. I applied your patch in my environment on an AWS cluster, but I still got the same error. Line 239 in ec83d11
Could you provide a longer backtrace? I'm not able to reproduce that error on either instance. EDIT: Tested on another Ubuntu EC2 instance, and still no errors.
@ejguan The datapipes tests are passing in this PR, and I'm a bit stuck on reproducing the errors you see. Is there any other way you think we can continue debugging and fixing this?
Here is the error I encountered using your branch:
Let me create a script and run gdb for you.
So the error still occurs when trying to
Thank you so much for adding this fix so quickly. LGTM.
And, based on our offline discussion, it seems I don't have permission to run the AWS configuration on our AWS cluster, so the error in my local environment is not related to the implementation.
@ejguan Updated. Please review. Thanks.
```bash
pip uninstall torchdata -y
pip uninstall torchdata -y
git clone https://github.com/pytorch/data.git
cd data
python setup.py clean
python setup.py install
BUILD_S3=1 python setup.py install
```
nit: duplicate uninstall
Could you please also add this link https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#installation to https://github.com/pytorch/data/blob/main/README.md (in the from-source section) to let users know they can install with the AWS SDK enabled?
Updated.
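For reference, here is a minimal sketch of the from-source steps in the snippet above with the duplicate `pip uninstall` dropped. It assumes a POSIX shell with `git`, `pip`, and a working C++ toolchain available; the function name `install_torchdata_with_s3` is hypothetical, while `BUILD_S3=1` is taken directly from the snippet under review.

```bash
#!/usr/bin/env bash
# Sketch only: reinstall torchdata from source with the S3 datapipes enabled.
# Assumes git, pip, and a C++ toolchain are installed.
# install_torchdata_with_s3 is a hypothetical helper name.
set -euo pipefail

# The body runs in a subshell ( ... ) so that `cd data` does not change
# the caller's working directory.
install_torchdata_with_s3() (
  # Remove any previously installed torchdata; one uninstall is enough.
  pip uninstall torchdata -y

  git clone https://github.com/pytorch/data.git
  cd data

  # Clear stale build artifacts, then build with the S3 extension enabled.
  python setup.py clean
  BUILD_S3=1 python setup.py install
)
```

Calling `install_torchdata_with_s3` from a directory that does not already contain a `data` checkout performs the clone and build; with `set -e`, the first failing step (for example, an existing `data` directory making `git clone` fail) stops the run.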
README.md (outdated)
````
@@ -101,6 +101,8 @@ assert batch['text'][0][0:8] == ['Wall', 'St.', 'Bears', 'Claw', 'Back', 'Into',
python setup.py install
```

If you'd like to include the S3 IO datapipes and aws-sdk-cpp, you may also follow [the instructions here](https://github.com/pytorch/data/blob/1d1cdbefe38041e067cda835ced9b5a5c59b3e5b/torchdata/datapipes/iter/load/README.md#build-from-source)
````
Could you please use https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/load/README.md
to always link it to the main branch?
Updated.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to creating your pull request.
- Note that there is a section on requirements related to adding a new DataPipe.

Fixes pytorch#460

### Changes
- update the s3 test cases due to an update in the public dataset

Pull Request resolved: pytorch#464
Reviewed By: NivekT
Differential Revision: D36678285
Pulled By: ejguan
fbshipit-source-id: 5e579d20290aec0f4f0a17031fe1bb256a640231