Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update s3 test cases #464

Closed
wants to merge 5 commits into from
Closed

update s3 test cases #464

wants to merge 5 commits into from

Conversation

ydaiming
Copy link
Contributor

@ydaiming ydaiming commented May 25, 2022

Please read through our contribution guide prior to
creating your pull request.

  • Note that there is a section on requirements related to adding a new DataPipe.

Fixes #460

Changes

  • update the s3 test cases due to an update in the public dataset

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 25, 2022
@ydaiming
Copy link
Contributor Author

@ejguan Could you help enable the datapipes test?

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

@ydaiming Sorry what do you mean?

@ydaiming
Copy link
Contributor Author

@ejguan the datapipes tests weren't run automatically because there's no changes made in datapipes source code. Just added a blank line to enable the datapipes test.

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

@ejguan the datapipes tests weren't run automatically because there's no changes made in datapipes source code. Just added a blank line to enable the datapipes test.

Weird. We don't have such advanced mechanism using GHA. Anyways, as long as the tests start working, it's fine.

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

It seems GHA stuck in queued again. We have to wait for a while...

@ydaiming
Copy link
Contributor Author

How long a while are we expecting? ...

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

How long a while are we expecting? ...

Emmm. Can't estimate the timeline, this depends on Github. Could you verify this is going to work on your local env?

@ydaiming
Copy link
Contributor Author

I've tested on my remote ubuntu EC2 instance and local MacOS system. Both build and pass the test without the ValueError.

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

I've tested on my remote ubuntu EC2 instance and local MacOS system. Both build and pass the test without the ValueError.

Interesting. I applied your patch in my env on AWS cluster but I still got the same Error.
The Error comes from iterating the s3_lister_dp not from comparing result with input[1].

s3_lister_dp = S3FileLister(IterableWrapper(input[0]), region="us-west-2")

@ydaiming
Copy link
Contributor Author

ydaiming commented May 25, 2022

Could you provide a longer backtrace? I'm not able to see that error on both instances.

EDIT: tested on another ubuntu EC2 instance, and still no errors.

@ydaiming
Copy link
Contributor Author

@ejguan the datapipes tests are passing in this PR, I'm bit stuck on reproducing the errors you see. Is there any other way that you think we can continue the debug and fix?

@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

@ejguan the datapipes tests are passing in this PR, I'm bit stuck on reproducing the errors you see. Is there any other way that you think we can continue the debug and fix?

Here is the Error I encountered using your branch:

test_s3_io_iterdatapipe (__main__.TestDataPipeRemoteIO) ... ERROR

======================================================================
ERROR: test_s3_io_iterdatapipe (__main__.TestDataPipeRemoteIO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/shared/erjia/data/test/test_remote_io.py", line 237, in test_s3_io_iterdatapipe
    self.assertEqual(sum(1 for _ in s3_lister_dp), input[1], f"{input[0]} failed")
  File "/shared/erjia/data/test/test_remote_io.py", line 237, in <genexpr>
    self.assertEqual(sum(1 for _ in s3_lister_dp), input[1], f"{input[0]} failed")
  File "/shared/erjia/pytorch/torch/utils/data/datapipes/_typing.py", line 501, in wrap_generator
    response = gen.send(None)
  File "/shared/erjia/data/torchdata/datapipes/iter/load/s3io.py", line 64, in __iter__
    urls = self.handler.list_files(prefix)
ValueError: Access Denied
This exception is thrown by __iter__ of S3FileListerIterDataPipe(length=-1, source_datapipe=IterableWrapperIterDataPipe)

----------------------------------------------------------------------
Ran 1 test in 0.502s

FAILED (errors=1)
srun: error: a100-st-p4d24xlarge-1: task 0: Exited with exit code 1

Let me create a script and run gdb for you.

@ydaiming
Copy link
Contributor Author

@ejguan the datapipes tests are passing in this PR, I'm bit stuck on reproducing the errors you see. Is there any other way that you think we can continue the debug and fix?

Here is the Error I encountered using your branch:

test_s3_io_iterdatapipe (__main__.TestDataPipeRemoteIO) ... ERROR

======================================================================
ERROR: test_s3_io_iterdatapipe (__main__.TestDataPipeRemoteIO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/shared/erjia/data/test/test_remote_io.py", line 237, in test_s3_io_iterdatapipe
    self.assertEqual(sum(1 for _ in s3_lister_dp), input[1], f"{input[0]} failed")
  File "/shared/erjia/data/test/test_remote_io.py", line 237, in <genexpr>
    self.assertEqual(sum(1 for _ in s3_lister_dp), input[1], f"{input[0]} failed")
  File "/shared/erjia/pytorch/torch/utils/data/datapipes/_typing.py", line 501, in wrap_generator
    response = gen.send(None)
  File "/shared/erjia/data/torchdata/datapipes/iter/load/s3io.py", line 64, in __iter__
    urls = self.handler.list_files(prefix)
ValueError: Access Denied
This exception is thrown by __iter__ of S3FileListerIterDataPipe(length=-1, source_datapipe=IterableWrapperIterDataPipe)

----------------------------------------------------------------------
Ran 1 test in 0.502s

FAILED (errors=1)
srun: error: a100-st-p4d24xlarge-1: task 0: Exited with exit code 1

Let me create a script and run gdb for you.

So the error still occurs when trying to list_files(prefix). Could you check what $HOME and $AWS_CONFIG_FILE is and whether there's a $HOME/.aws/config file. If so, what's the content of it?

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for adding this fix so quickly. LGTM.

torchdata/datapipes/iter/load/s3io.py Outdated Show resolved Hide resolved
@ejguan
Copy link
Contributor

ejguan commented May 25, 2022

And, based on our offline discussion, it seems I don't have permission to run configuration for aws on our AWS cluster. So, the Error on my local environment is not related to the implementation.

@ydaiming
Copy link
Contributor Author

@ejguan updated. Please review. Thanks.

Comment on lines 15 to 19
pip uninstall torchdata -y
pip uninstall torchdata -y
git clone https://github.com/pytorch/data.git
cd data
python setup.py clean
python setup.py install
BUILD_S3=1 python setup.py install
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: duplicate uninstall

Could you please also add this link https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#installation to https://github.com/pytorch/data/blob/main/README.md (in from source section) to let users know they can install with AWSSDK enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

README.md Outdated
@@ -101,6 +101,8 @@ assert batch['text'][0][0:8] == ['Wall', 'St.', 'Bears', 'Claw', 'Back', 'Into',
python setup.py install
```

In you'd like to include the S3 IO datapipes and aws-sdk-cpp, you may also follow [the instructions here](https://github.com/pytorch/data/blob/1d1cdbefe38041e067cda835ced9b5a5c59b3e5b/torchdata/datapipes/iter/load/README.md#build-from-source)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/load/README.md to always link it to the main branch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan linked an issue May 25, 2022 that may be closed by this pull request
@ejguan ejguan mentioned this pull request May 26, 2022
ejguan pushed a commit to ejguan/data that referenced this pull request May 26, 2022
Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes pytorch#460

### Changes

- update the s3 test cases due to an update in the public dataset

Pull Request resolved: pytorch#464

Reviewed By: NivekT

Differential Revision: D36678285

Pulled By: ejguan

fbshipit-source-id: 5e579d20290aec0f4f0a17031fe1bb256a640231
ejguan pushed a commit to ejguan/data that referenced this pull request May 26, 2022
Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes pytorch#460

### Changes

- update the s3 test cases due to an update in the public dataset

Pull Request resolved: pytorch#464

Reviewed By: NivekT

Differential Revision: D36678285

Pulled By: ejguan

fbshipit-source-id: 5e579d20290aec0f4f0a17031fe1bb256a640231
ejguan pushed a commit that referenced this pull request May 26, 2022
Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes #460

### Changes

- update the s3 test cases due to an update in the public dataset

Pull Request resolved: #464

Reviewed By: NivekT

Differential Revision: D36678285

Pulled By: ejguan

fbshipit-source-id: 5e579d20290aec0f4f0a17031fe1bb256a640231
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
4 participants