Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add s3&azure blob&google cloud support for tb_plugin #182

Merged
merged 30 commits into from
May 11, 2021
Merged

Conversation

guotuofeng
Copy link
Contributor

Try to add the s3 bucket support

@guotuofeng guotuofeng self-assigned this Apr 22, 2021
@guotuofeng guotuofeng added the plugin PyTorch Profiler TensorBoard Plugin related label Apr 22, 2021
@guotuofeng guotuofeng changed the title add s3 support for tb_plugin add s3&azure blob support for tb_plugin Apr 23, 2021
@chauhang chauhang requested a review from gdankel April 29, 2021 00:32
tb_plugin/setup.py Outdated Show resolved Hide resolved
@chauhang
Copy link

@guotuofeng Thanks for the PR. Please add following as well:

  1. Update the Readme with the details for the ENV variables and optional installs (for boto3 and azureblob)
  2. Add support for GCS
  3. Review tf.io.gfile and see if there is a possibility of reusing the code vs copying over the logic

@chauhang chauhang self-requested a review April 29, 2021 02:27
@guotuofeng
Copy link
Contributor Author

  1. Update the Readme with the details for the ENV variables and optional installs (for boto3 and azureblob)
    The Readme under tb_plugin will be updated.
  2. Add support for GCS
    The support for Google Cloud will be in a separated PR after current PR is merged. Add Google Cloud support for tb_plugin #197 is used to track it.
  3. Review tf.io.gfile and see if there is a possibility of reusing the code vs copying over the logic
    we cannot reuse tf.io.gfile since it need install tensorflow. In case of you mean gfile in tensorboard, we cannot reuse it either since we add path functionality, download file, built in walk, to it. On the other side, the tensorboard gfile will have the possiblity to change since it is not public to end user. If we depend it, any breaking change in it will break our plugin as well. This is the reason that I don't choose directly use the gfile instead of copy some logic and add our supports.

@Reubend
Copy link

Reubend commented May 1, 2021

Hi @guotuofeng would you mind adding me as a reviweer?

@guotuofeng
Copy link
Contributor Author

Hi @guotuofeng would you mind adding me as a reviweer?

sorry, I could not find you when trying to add "Reubend" as reviewer. do you have permission for kineto?

@Reubend Reubend self-requested a review May 2, 2021 21:57
@Reubend
Copy link

Reubend commented May 2, 2021

sorry, I could not find you when trying to add "Reubend" as reviewer. do you have permission for kineto?

My bad, seems like some automated system removed me from the org and I had to add myself back in to get it working. No need for any action on your part; I'll get back to you with a full review soon.

Copy link

@Reubend Reubend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, can you clarify which of these files/functions/areas are taken from existing code? io/file.py is huge, so I'm wondering if all that needs to be re-reviewed, or if it's just taken from elsewhere.

Second, can you clarify exactly what parts of this are necessary that gfile from TensorBoard doesn't provide already? I'd prefer not to introduce redundant code too much. You mentioned things like adding path functionality, but is it possible to just add that method to the class and provide implementations for S3+Azure?

You mentioned that if the file changes, it could break our code here. That's a good point, but we're also not going to get upstream fixes unless we merge manually. And it makes sense to me to use some external module, even if it means taking on a dependency, instead of including code for local+S3+Azure filesystems directly in the profiler.

tb_plugin/torch_tb_profiler/consts.py Show resolved Hide resolved
tb_plugin/test/test_tensorboard_end2end.py Show resolved Hide resolved
tb_plugin/torch_tb_profiler/io/cache.py Show resolved Hide resolved
tb_plugin/torch_tb_profiler/io/file.py Outdated Show resolved Hide resolved
@guotuofeng
Copy link
Contributor Author

guotuofeng commented May 6, 2021

First of all, can you clarify which of these files/functions/areas are taken from existing code? io/file.py is huge, so I'm wondering if all that needs to be re-reviewed, or if it's just taken from elsewhere.

  1. The File class is copied from gfile
  2. S3FileSystem is copied from gfile except that we 1), add download_file to support cache, and 2), add aws key support in Init, gfile should only support anonymous access.
  3. LocalFileSystem is copied from gfile except walk is added to improve the performance of gflile.walk.

Second, can you clarify exactly what parts of this are necessary that gfile from TensorBoard doesn't provide already? I'd prefer not to introduce redundant code too much. You mentioned things like adding path functionality, but is it possible to just add that method to the class and provide implementations for S3+Azure?

  1. the class implement PathBase is added to support the path that isn't in tensorbord gfile.py.
  2. The BaseFileSystem is added to support the abstraction which doens't exist in gfile.
  3. The AzureBlobSystem is added to support Azure Blob. Google Cloud would be added in next PR.
  4. get_filesystem is changed to support Azure Blobs URLs

You mentioned that if the file changes, it could break our code here. That's a good point, but we're also not going to get upstream fixes unless we merge manually. And it makes sense to me to use some external module, even if it means taking on a dependency, instead of including code for local+S3+Azure filesystems directly in the profiler.

@guotuofeng guotuofeng changed the title add s3&azure blob support for tb_plugin add s3&azure blob&google cloud support for tb_plugin May 7, 2021
2. Fix the logging issue with default warning level in multiprocessing.spawn mode when invoking __setstate__.
3. Refactoring test code
@chauhang
Copy link

@Reubend Do you have any other inputs for optimizing from Tensorboard point of view?

@guotuofeng I have tested with S3, GCS, Minio and Azure. Things are working fine for loading the logs on Ubuntu 18.04 and MacOS 11.3.1

Please add the unit tests for loading from cloud storage. You can check for similar tests in core Tensorboard side

@guotuofeng
Copy link
Contributor Author

@chauhang could I use gs://pe-tests-public/tb_samples/ for our testing? I'd like to make sure it will not be deleted.

@guotuofeng guotuofeng merged commit edf8708 into plugin/0.2 May 11, 2021
@Reubend
Copy link

Reubend commented May 12, 2021

I still think it would be better to simply import the original gfile, and then add your own methods (or override when necessary the original ones). Maybe others can comment here as well with their thoughts - but making your own version means that you need to manually pull in new fixes from the original gfile if there are optimizations/patches that go into it.

@guotuofeng
Copy link
Contributor Author

I still think it would be better to simply import the original gfile, and then add your own methods (or override when necessary the original ones). Maybe others can comment here as well with their thoughts - but making your own version means that you need to manually pull in new fixes from the original gfile if there are optimizations/patches that go into it.

The consideration that I don't import the gfile is that there are lots of logic needed to be added/changed in original gfile.py. The patch/hacking way seems awkward. I forked this file and improve its performance, add S3 credential support, add path support, change the implementation of get_filesystem to support Azure Blob.

It is very hard to reuse the gfile without any changes.

@Reubend
Copy link

Reubend commented May 12, 2021

In that case, I think you need to put a big comment at the top explaining that this code was forked from gfile, and explaining each of the changes you made to the original. Maybe you can also say how to port a filesystem from the original gfile to your own implementation? That comment would help people understand the origins of all this code, and how to work with it when they need to integrate something new.

Also, since you're making your own version, I think you should make separate files for each filesystem class (S3FileSystem.py, AzureBlobSystem.py, and GoogleBlobSystem.py). That way, it's much cleaner organizationally.

@guotuofeng guotuofeng deleted the myguo/s3 branch May 12, 2021 09:53
@guotuofeng
Copy link
Contributor Author

In that case, I think you need to put a big comment at the top explaining that this code was forked from gfile, and explaining each of the changes you made to the original. Maybe you can also say how to port a filesystem from the original gfile to your own implementation? That comment would help people understand the origins of all this code, and how to work with it when they need to integrate something new.

Also, since you're making your own version, I think you should make separate files for each filesystem class (S3FileSystem.py, AzureBlobSystem.py, and GoogleBlobSystem.py). That way, it's much cleaner organizationally.

#220 is created to fix the all the pending comments.

@guotuofeng
Copy link
Contributor Author

@Reubend, would you please take a the new PR?

@Reubend
Copy link

Reubend commented May 12, 2021

@Reubend, would you please take a the new PR?

Sure, sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed plugin PyTorch Profiler TensorBoard Plugin related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants