Create download.py #132
Codecov Report
@@            Coverage Diff            @@
##             main     #132     +/-   ##
=========================================
+ Coverage   17.96%   18.38%   +0.42%
=========================================
  Files          43       44       +1
  Lines        4075     4096      +21
=========================================
+ Hits          732      753      +21
  Misses       3343     3343
kale/utils/download.py
Outdated
@@ -0,0 +1,71 @@
# =============================================================================
# Author: Xianyuan Liu, xianyuan.liu@sheffield.ac.uk
Consider using a more permanent, personal email (or both emails, as I do at line 4), since your Sheffield email will become invalid soon after you leave.
"""Data downloading and compressed data extraction functions, Based on
https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py
https://github.com/pytorch/pytorch/blob/master/torch/hub.py
Good. I assume you have followed our 3R green ML principles here by reusing the PyTorch APIs at lines 16-17.
However, it should be clear why the original PyTorch APIs are not good or simple enough. Why should the user use this API rather than the PyTorch ones directly? What are the added benefits of using this API?
- `download_and_extract_archive` and `download_url_to_file` are not in the same PyTorch file, and it would be better to integrate them into one file. This file covers functions to download files of different types from different sources. Using the same format and parameters, such as `output_directory` and `output_file_name`, can also simplify our coding.
- PyTorch provides the basic download function, but it may raise errors when the `output_directory` does not exist. This is the only problem I have met.
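The missing-directory case in the second point can be handled before any download call. A minimal sketch, assuming a hypothetical helper named `prepare_output_path` (not part of this PR or of PyTorch) that takes the `output_directory` and `output_file_name` parameters discussed here:

```python
import os


def prepare_output_path(output_directory, output_file_name):
    """Create output_directory if needed and return the full target path.

    Hypothetical helper for illustration; not part of the PR or of PyTorch.
    """
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(output_directory, exist_ok=True)
    return os.path.join(output_directory, output_file_name)
```

The returned path could then be used as the destination passed to the download function, so the caller never has to create the directory by hand.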
kale/utils/download.py
Outdated
logging.info("Datasets downloaded and extracted in {}".format(file))


def download_file_by_url(url, output_directory, output_file_name):
Reduce: there is quite some repetition here. I suggest merging `download_compressed_file_by_url` with this one by either
- adding a flag to indicate whether the file is compressed, or, even better,
- auto-detecting whether the file is in a supported compressed format (matching against ".tar.xz", ".tar", ".tar.gz", ".tgz", ".gz", ".zip") to switch to either `download_and_extract_archive` or `download_url_to_file`.
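The auto-detection option could look roughly like the following. This is a sketch under assumptions: `is_supported_archive` and `choose_downloader` are hypothetical names, the suffix list is the one quoted in the comment, the case-insensitive match is my own choice, and the returned strings only name which PyTorch function a merged downloader would call, so nothing is downloaded here.

```python
# Suffixes listed in the review comment.
SUPPORTED_ARCHIVE_SUFFIXES = (".tar.xz", ".tar", ".tar.gz", ".tgz", ".gz", ".zip")


def is_supported_archive(file_name):
    """Return True if file_name ends with a supported archive suffix."""
    # str.endswith accepts a tuple; lower() makes the match case-insensitive.
    return file_name.lower().endswith(SUPPORTED_ARCHIVE_SUFFIXES)


def choose_downloader(url):
    """Name the function a merged downloader would dispatch to (illustrative)."""
    # Drop any query string or fragment before looking at the suffix.
    file_name = url.split("?")[0].split("#")[0].rsplit("/", 1)[-1]
    if is_supported_archive(file_name):
        return "download_and_extract_archive"
    return "download_url_to_file"
```

With this shape, a single `download_file_by_url` would not need a separate flag: the caller passes the URL and the function decides whether extraction is required.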
Our pre-commit hooks have been tested for a long time and are quite stable, so I strongly suggest that you use the automation rather than manual "Run pre-commit auto format" commits. This improves efficiency, is the professional practice, and gives a cleaner commit history.
You should consider writing tests at the same time. This will save overall coding time because you are working on the same functionality, and it will save review time because the tests help to validate the code as well.
See the comment at the top. Clarify the benefits over using the PyTorch APIs directly.
Also, this PR did not successfully complete the auto project assignment action (https://github.com/pykale/pykale/runs/2471407104?check_suite_focus=true), which reported "Resource not accessible by integration". This is not critical, but it is odd. In this case, we need to assign the project manually (please do). I am not sure whether this is because the PR is from a fork; there were no problems before this PR. You do not need to solve the problem (apart from adding the project manually). It is just my observation, and we can see how to solve it later. I will add a card. But as said earlier, let us try to work on direct branches rather than forks.
On my system, I cannot make a commit before passing the pre-commit checks. I believe this is the case for all other members except you, so your commit history is noisier and unnecessarily long.
You can consider VS Code for committing to GitHub if the PyCharm problems cannot be fixed; Shuo used both. I am experienced in VS Code.
Ready.
This action seems successful from the link. You may have helped me correct it. Thanks!
Sorry about that. I created a new branch via Desktop, and the default one was a fork. I will check and use the direct one.
I will solve the problem. I commit and push via Desktop/PyCharm with automatic pre-commit. In the first commit,
Great. This will greatly simplify our related coding!
Fixes #128 option 2.
Description
Create two functions for downloading files and compressed files via GitHub URL. It does not yet cover @RaivoKoot's one, which downloads files from Google Drive.
Status
Ready
Types of changes