add s3&azure blob&google cloud support for tb_plugin #182
@guotuofeng Thanks for the PR. Please add the following as well:
|
|
Hi @guotuofeng, would you mind adding me as a reviewer? |
Sorry, I could not find you when trying to add "Reubend" as a reviewer. Do you have permission for kineto? |
My bad, seems like some automated system removed me from the org and I had to add myself back in to get it working. No need for any action on your part; I'll get back to you with a full review soon. |
First of all, can you clarify which of these files/functions/areas are taken from existing code? io/file.py is huge, so I'm wondering if all of it needs to be re-reviewed, or if it's just taken from elsewhere.
Second, can you clarify exactly which parts of this are necessary that gfile from TensorBoard doesn't already provide? I'd prefer not to introduce too much redundant code. You mentioned things like adding path functionality, but is it possible to just add that method to the class and provide implementations for S3+Azure?
You mentioned that if the file changes, it could break our code here. That's a good point, but we're also not going to get upstream fixes unless we merge manually. And it makes sense to me to use some external module, even if it means taking on a dependency, instead of including code for local+S3+Azure filesystems directly in the profiler.
|
2. Fix the logging issue with the default warning level in multiprocessing.spawn mode when invoking __setstate__. 3. Refactor test code
@Reubend Do you have any other input on optimizing this from the TensorBoard point of view? @guotuofeng I have tested with S3, GCS, MinIO and Azure. Loading the logs works fine on Ubuntu 18.04 and macOS 11.3.1. Please add unit tests for loading from cloud storage. You can check for similar tests on the core TensorBoard side. |
@chauhang could I use gs://pe-tests-public/tb_samples/ for our testing? I'd like to make sure it will not be deleted. |
I still think it would be better to simply |
The reason I don't import gfile is that a lot of logic would need to be added or changed in the original gfile.py, and patching it from outside seems awkward. I forked this file and improved its performance, added S3 credential support, added path support, and changed the implementation of get_filesystem to support Azure Blob. It is very hard to reuse gfile without any changes. |
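For readers following the get_filesystem discussion, here is a minimal sketch of prefix-based filesystem dispatch. It is not the code from this PR or from TensorBoard's gfile: the class names, the registry, and the Azure host check are all hypothetical and only illustrate why a plain scheme lookup is not enough once Azure Blob paths (which use https URLs) have to be recognized.

```python
import os

# Hypothetical classes for illustration; not the PR's actual implementation.
class LocalFileSystem:
    def join(self, path, *parts):
        return os.path.join(path, *parts)

class RemoteFileSystemStub:
    """Stand-in for an S3 / GCS / Azure Blob backend (no network access)."""
    def __init__(self, name):
        self.name = name

    def join(self, path, *parts):
        # Remote object stores use "/" regardless of the host OS.
        return "/".join([path.rstrip("/")] + [p.strip("/") for p in parts])

_REGISTRY = {}  # maps a scheme such as "s3" to a filesystem instance

def register_filesystem(prefix, fs):
    _REGISTRY[prefix] = fs

def get_filesystem(path):
    # Assumption: Azure Blob paths look like https://<account>.blob.core.windows.net/...
    # so they are detected by host name rather than by scheme.
    if path.startswith("https://") and ".blob.core.windows.net" in path:
        return _REGISTRY["azure"]
    prefix = path.split("://", 1)[0] if "://" in path else ""
    fs = _REGISTRY.get(prefix)
    if fs is None:
        raise ValueError("No filesystem registered for prefix %r" % prefix)
    return fs

register_filesystem("", LocalFileSystem())
register_filesystem("s3", RemoteFileSystemStub("s3"))
register_filesystem("gs", RemoteFileSystemStub("gs"))
register_filesystem("azure", RemoteFileSystemStub("azure"))
```

With a registry like this, adding a new backend would mean implementing the same small set of methods and registering it under its prefix.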
In that case, I think you need to put a big comment at the top explaining that this code was forked from gfile, and explaining each of the changes you made to the original. Maybe you can also say how to port a filesystem from the original gfile to your own implementation? That comment would help people understand the origins of all this code, and how to work with it when they need to integrate something new. Also, since you're making your own version, I think you should make separate files for each filesystem class ( |
#220 was created to address all the pending comments. |
@Reubend, would you please take a look at the new PR? |
Sure, sounds good. |
Try to add the s3 bucket support