add s3&azure blob&google cloud support for tb_plugin #182

guotuofeng · 2021-04-22T02:37:03Z

Try to add the s3 bucket support

tb_plugin/setup.py

chauhang · 2021-04-29T02:25:55Z

@guotuofeng Thanks for the PR. Please add the following as well:

- Update the Readme with the details for the ENV variables and optional installs (for boto3 and azureblob)
- Add support for GCS
- Review tf.io.gfile and see if there is a possibility of reusing the code vs copying over the logic

guotuofeng · 2021-04-29T06:34:08Z

> Update the Readme with the details for the ENV variables and optional installs (for boto3 and azureblob)

The Readme under tb_plugin will be updated.

> Add support for GCS

Support for Google Cloud will be in a separate PR after the current PR is merged. "Add Google Cloud support for tb_plugin" #197 is used to track it.

> Review tf.io.gfile and see if there is a possibility of reusing the code vs copying over the logic

We cannot reuse tf.io.gfile since it requires installing TensorFlow. If you mean the gfile in TensorBoard, we cannot reuse it either, since we add path functionality, file download, and a built-in walk to it. In addition, the TensorBoard gfile may change at any time since it is not a public API for end users. If we depend on it, any breaking change in it will break our plugin as well. This is why I chose to copy some of the logic and add our own support instead of using gfile directly.
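For the optional installs mentioned above, one common pattern is to import the cloud SDK only when its backend is actually used. The following is a minimal sketch of that pattern, not the code in this PR; the helper name `require_optional`, the package placeholder `PACKAGE`, and the extras name passed in are all hypothetical.

```python
import importlib

# Hypothetical placeholder for the distribution name; not taken from this PR.
PACKAGE = "<package>"


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or explain how to install it.

    `extra` is an assumed extras name used only to build the hint message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable hint instead of a bare import failure.
        raise ImportError(
            f"{module_name} is needed for this storage backend; "
            f"try: pip install '{PACKAGE}[{extra}]'"
        ) from exc
```

A backend would call `require_optional("boto3", "aws")` inside its constructor, so users who only read local logs never need the SDK installed.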

Reubend · 2021-05-01T21:52:48Z

Hi @guotuofeng would you mind adding me as a reviewer?

guotuofeng · 2021-05-02T02:33:42Z

> Hi @guotuofeng would you mind adding me as a reviewer?

Sorry, I could not find you when trying to add "Reubend" as a reviewer. Do you have permission for kineto?

Reubend · 2021-05-02T22:11:14Z

> Sorry, I could not find you when trying to add "Reubend" as a reviewer. Do you have permission for kineto?

My bad, seems like some automated system removed me from the org and I had to add myself back in to get it working. No need for any action on your part; I'll get back to you with a full review soon.

Reubend

First of all, can you clarify which of these files/functions/areas are taken from existing code? io/file.py is huge, so I'm wondering if all that needs to be re-reviewed, or if it's just taken from elsewhere.

Second, can you clarify exactly what parts of this are necessary that gfile from TensorBoard doesn't provide already? I'd prefer not to introduce redundant code too much. You mentioned things like adding path functionality, but is it possible to just add that method to the class and provide implementations for S3+Azure?

You mentioned that if the file changes, it could break our code here. That's a good point, but we're also not going to get upstream fixes unless we merge manually. And it makes sense to me to use some external module, even if it means taking on a dependency, instead of including code for local+S3+Azure filesystems directly in the profiler.

tb_plugin/torch_tb_profiler/consts.py

tb_plugin/test/test_tensorboard_end2end.py

tb_plugin/torch_tb_profiler/io/cache.py

tb_plugin/torch_tb_profiler/io/file.py

guotuofeng · 2021-05-06T00:48:45Z

> First of all, can you clarify which of these files/functions/areas are taken from existing code? io/file.py is huge, so I'm wondering if all that needs to be re-reviewed, or if it's just taken from elsewhere.

- The File class is copied from gfile.
- S3FileSystem is copied from gfile, except that we (1) add download_file to support the cache, and (2) add AWS key support in Init; as far as I can tell, gfile only supports anonymous access.
- LocalFileSystem is copied from gfile, except that walk is added to improve on the performance of gfile.walk.

> Second, can you clarify exactly what parts of this are necessary that gfile from TensorBoard doesn't provide already? I'd prefer not to introduce redundant code too much. You mentioned things like adding path functionality, but is it possible to just add that method to the class and provide implementations for S3+Azure?

- The class implementing PathBase is added to support the path functionality that isn't in TensorBoard's gfile.py.
- BaseFileSystem is added to provide an abstraction that doesn't exist in gfile.
- AzureBlobSystem is added to support Azure Blob. Google Cloud will be added in the next PR.
- get_filesystem is changed to support Azure Blob URLs.
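The abstraction and URL dispatch described in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in io/file.py: the single `exists` method, the stub bodies, the registry, and the Azure host check (`*.blob.core.windows.net`) are all assumed for the example.

```python
import os
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class BaseFileSystem(ABC):
    """Minimal abstract interface; the real class has more methods."""

    @abstractmethod
    def exists(self, path: str) -> bool:
        ...


class LocalFileSystem(BaseFileSystem):
    def exists(self, path: str) -> bool:
        return os.path.exists(path)


class S3FileSystem(BaseFileSystem):
    def exists(self, path: str) -> bool:
        # Stub: a real backend would query the S3 API here.
        raise NotImplementedError("S3 access is not implemented in this sketch")


class AzureBlobSystem(BaseFileSystem):
    def exists(self, path: str) -> bool:
        # Stub: a real backend would query the Azure Blob API here.
        raise NotImplementedError("Azure access is not implemented in this sketch")


_LOCAL = LocalFileSystem()
_AZURE = AzureBlobSystem()
_BY_SCHEME = {"": _LOCAL, "s3": S3FileSystem()}


def get_filesystem(path: str) -> BaseFileSystem:
    """Pick a filesystem from the URL scheme or, for Azure, the host name."""
    parsed = urlparse(path)
    # Assumption: Azure Blob URLs are https URLs on *.blob.core.windows.net.
    if parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net"):
        return _AZURE
    # Unregistered schemes (including Windows drive letters) fall back to local.
    return _BY_SCHEME.get(parsed.scheme, _LOCAL)
```

Dispatching on the host rather than a dedicated scheme is why get_filesystem itself has to change for Azure: a scheme-only lookup cannot tell an Azure Blob URL from any other https URL.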

> You mentioned that if the file changes, it could break our code here. That's a good point, but we're also not going to get upstream fixes unless we merge manually. And it makes sense to me to use some external module, even if it means taking on a dependency, instead of including code for local+S3+Azure filesystems directly in the profiler.

chauhang · 2021-05-11T06:11:43Z

@Reubend Do you have any other input on optimizing this from the TensorBoard point of view?

@guotuofeng I have tested with S3, GCS, Minio and Azure. Loading the logs works fine on Ubuntu 18.04 and macOS 11.3.1.

Please add unit tests for loading from cloud storage. You can check for similar tests on the core TensorBoard side.

guotuofeng · 2021-05-11T06:44:14Z

@chauhang could I use gs://pe-tests-public/tb_samples/ for our testing? I'd like to make sure it will not be deleted.

Reubend · 2021-05-12T02:05:55Z

I still think it would be better to simply import the original gfile, and then add your own methods (or override when necessary the original ones). Maybe others can comment here as well with their thoughts - but making your own version means that you need to manually pull in new fixes from the original gfile if there are optimizations/patches that go into it.
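As a rough sketch of what the import-and-extend approach could look like: the base class below is a stand-in written for this example, not TensorBoard's actual gfile class, and the added method names are assumptions.

```python
import os


class UpstreamLocalFileSystem:
    """Stand-in for a filesystem class imported from an upstream module."""

    def exists(self, path):
        return os.path.exists(path)

    def listdir(self, path):
        return os.listdir(path)


class ProfilerLocalFileSystem(UpstreamLocalFileSystem):
    """Inherits upstream behavior and adds only what is missing."""

    def walk(self, top):
        # Delegate traversal to os.walk rather than re-implementing it.
        return os.walk(top)

    def relpath(self, path, start):
        # Example of added path functionality.
        return os.path.relpath(path, start)
```

With this shape, fixes to `exists` or `listdir` arrive with each upstream upgrade, at the cost of depending on a class that upstream may change.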

guotuofeng · 2021-05-12T02:53:23Z

> I still think it would be better to simply import the original gfile, and then add your own methods (or override when necessary the original ones). Maybe others can comment here as well with their thoughts - but making your own version means that you need to manually pull in new fixes from the original gfile if there are optimizations/patches that go into it.

The reason I don't import gfile is that a lot of logic would need to be added or changed in the original gfile.py, and patching it from outside seems awkward. I forked this file to improve its performance, add S3 credential support, add path support, and change the implementation of get_filesystem to support Azure Blob.

It is very hard to reuse gfile without any changes.

Reubend · 2021-05-12T03:46:53Z

In that case, I think you need to put a big comment at the top explaining that this code was forked from gfile, and explaining each of the changes you made to the original. Maybe you can also say how to port a filesystem from the original gfile to your own implementation? That comment would help people understand the origins of all this code, and how to work with it when they need to integrate something new.

Also, since you're making your own version, I think you should make separate files for each filesystem class (S3FileSystem.py, AzureBlobSystem.py, and GoogleBlobSystem.py). That way, it's much cleaner organizationally.

guotuofeng · 2021-05-12T11:16:46Z

> In that case, I think you need to put a big comment at the top explaining that this code was forked from gfile, and explaining each of the changes you made to the original. Maybe you can also say how to port a filesystem from the original gfile to your own implementation? That comment would help people understand the origins of all this code, and how to work with it when they need to integrate something new.

> Also, since you're making your own version, I think you should make separate files for each filesystem class (S3FileSystem.py, AzureBlobSystem.py, and GoogleBlobSystem.py). That way, it's much cleaner organizationally.

#220 has been created to address all the pending comments.

guotuofeng · 2021-05-12T11:20:08Z

@Reubend, would you please take a look at the new PR?

Reubend · 2021-05-12T23:46:01Z

> @Reubend, would you please take a look at the new PR?

Sure, sounds good.

support s3 buckets

118f64a

guotuofeng mentioned this pull request Apr 22, 2021

[Task] Add support for saving chrome trace files to s3 urls and Azure blob #177

Closed

guotuofeng self-assigned this Apr 22, 2021

facebook-github-bot added the cla signed label Apr 22, 2021

guotuofeng added the plugin PyTorch Profiler TensorBoard Plugin related label Apr 22, 2021

guotuofeng added 13 commits April 22, 2021 11:14

use abstract file system in plugin

e66269a

Merge branch 'plugin/0.2' into myguo/s3

1683424

refactor code

aed2a01

clean python2 compatible import

9b0d4d5

clean setup future import

a06b966

change minimal python version to 3.6

a5df7e8

add aws extra

0cfda36

fix bug related to relpath call

ecdca45

failure test case when there is error to prevent endless loop

8cce99c

make sure test kill the tensorboard process

ce0a3a8

fix abspath with logdir for s3

126fd9a

enable s3 support

f06f8f6

add azure blob support

aa36f45

guotuofeng changed the title ~~add s3 support for tb_plugin~~ add s3&azure blob support for tb_plugin Apr 23, 2021

guotuofeng added 6 commits April 24, 2021 12:45

refactor cache code

32b5e1b

refactor file.py code to remove redundancy code

2be27ff

refactor code

afa7dbc

compress trace file before sending to trace view

c474dcf

remove temporary files when plugin exit

497f950

use regex to extract the worker name

ed16555

chauhang requested a review from gdankel April 29, 2021 00:32

ananthsub reviewed Apr 29, 2021

tb_plugin/setup.py (outdated)

chauhang self-requested a review April 29, 2021 02:27

update readme for s3

5c1f972

guotuofeng mentioned this pull request Apr 30, 2021

tensorboard "--logdir" path can't support "~" as path prefix #195

Closed

Reubend self-requested a review May 2, 2021 21:57

Reubend reviewed May 5, 2021

tb_plugin/torch_tb_profiler/consts.py

tb_plugin/test/test_tensorboard_end2end.py

tb_plugin/torch_tb_profiler/io/cache.py

tb_plugin/torch_tb_profiler/io/file.py (outdated)

guotuofeng added 4 commits May 6, 2021 09:00

add google cloud support

c32b78b

add s3 support for minio

6abefd8

Fix cache pickle error when used in spawn mode related to weakref&threading lock

5415e6c

add comments for cache.__getstate__

736fd01

guotuofeng changed the title ~~add s3&azure blob support for tb_plugin~~ add s3&azure blob&google cloud support for tb_plugin May 7, 2021

guotuofeng added 3 commits May 7, 2021 22:00

add test cases for spawn mode

abed3ee

1. Add logging level environment variable for troubleshooting. 2. Fix the logging issue with default warning level in multiprocessing.spawn mode when invoking __setstate__. 3. Refactoring test code

bfc2018

Merge branch 'plugin/0.2' of github.com:pytorch/kineto into myguo/s3

de34d32

add test for gs cloud

c12a961

try to install google cloud

b1e3db9

guotuofeng merged commit edf8708 into plugin/0.2 May 11, 2021

guotuofeng deleted the myguo/s3 branch May 12, 2021 09:53
