Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create download.py #132

Merged
merged 12 commits into from Apr 30, 2021
Merged

Create download.py #132

merged 12 commits into from Apr 30, 2021

Conversation

xianyuanliu
Copy link
Member

Fixes #128 option 2.

Description

Create two data downloading functions for files and compressed files via GitHub URL. It hasn't cover @RaivoKoot 's one, which downloads files from Google drive.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • In-line docstrings updated.

@xianyuanliu xianyuanliu added the enhancement Improvement of existing code label Apr 29, 2021
@codecov-commenter
Copy link

codecov-commenter commented Apr 29, 2021

Codecov Report

Merging #132 (d6e98e4) into main (00bda7b) will increase coverage by 0.42%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #132      +/-   ##
==========================================
+ Coverage   17.96%   18.38%   +0.42%     
==========================================
  Files          43       44       +1     
  Lines        4075     4096      +21     
==========================================
+ Hits          732      753      +21     
  Misses       3343     3343              
Impacted Files Coverage Δ
kale/embed/video_feature_extractor.py 0.00% <ø> (ø)
kale/embed/video_i3d.py 0.00% <ø> (ø)
kale/embed/video_res3d.py 0.00% <ø> (ø)
kale/embed/video_se_i3d.py 0.00% <ø> (ø)
kale/embed/video_se_res3d.py 0.00% <ø> (ø)
kale/embed/video_selayer.py 0.00% <ø> (ø)
kale/loaddata/action_multi_domain.py 0.00% <ø> (ø)
kale/loaddata/video_access.py 0.00% <ø> (ø)
kale/pipeline/action_domain_adapter.py 0.00% <ø> (ø)
kale/predict/class_domain_nets.py 0.00% <ø> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00bda7b...d6e98e4. Read the comment docs.

@@ -0,0 +1,71 @@
# =============================================================================
# Author: Xianyuan Liu, xianyuan.liu@sheffield.ac.uk
Copy link
Member

@haipinglu haipinglu Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may consider to start using your more permanent, personal email (or both emails, as in my case at line 4) as your Shef email will become invalid soon after you leave.


"""Data downloading and compressed data extraction functions, Based on
https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py
https://github.com/pytorch/pytorch/blob/master/torch/hub.py
Copy link
Member

@haipinglu haipinglu Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I assume that you have followed our 3R green ML principles here to reuse pytorch APIs at line 16-17

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it should be clear why the original APIs in pytorch is not good/simple enough. Why the user should use this API rather than the pytorch ones directly. What are the added benefits of using this API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. download_and_extract_archive and download_url_to_file are not in the same PyTorch file and It would be better to integrate them into one file. This file cover functions to download files with different types from different sources. Using the same format and parameters like output_directory and output_file_name can also simplify our coding.

  2. PyTorch provides the basic download function but it may raise issues when the output_directory does not exist. I only meet this problem.

logging.info("Datasets downloaded and extracted in {}".format(file))


def download_file_by_url(url, output_directory, output_file_name):
Copy link
Member

@haipinglu haipinglu Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce: Quite some repetitions here. I suggest to merge download_compressed_file_by_url with this one by either

  • add a flag to indicate whether it is compressed, or even better
  • auto detect whether the file is in (supported) compressed format (matching against ".tar.xz", ".tar", ".tar.gz", ".tgz", ".gz", ".zip") to switch to either download_and_extract_archive or download_url_to_file

Copy link
Member

@haipinglu haipinglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our pre-commit hooks have been tested for long and are quite stable so I strongly suggest you to use the automation rather than manual Run pre-commit auto format in your commits to improve efficiency, which is the professional practice and also has a cleaner commit history.

You should consider to have tests written at the same time. This will save the overall coding time because you are working on the same functionality and also save review time because the tests help to validate the code as well.

See the comment at the top. Clarify the benefits against using pytorch APIs directly.

@haipinglu
Copy link
Member

haipinglu commented Apr 30, 2021

Also, this PR did not successfully complete the auto project assignment action (https://github.com/pykale/pykale/runs/2471407104?check_suite_focus=true), saying "Resource not accessible by integration".

This is not critical but weird. In this case, we need to manually assign the project (please do). Not sure whether this is because the PR is from a fork. There were no problems before this PR.

You do not need to solve the problem (except adding project manually). It is just my observation and we may see how to solve it later. I will add a card.

But as said earlier, let us try to work on direct branches rather than forks.

@haipinglu
Copy link
Member

On pre-commit, it is also weird on your system because, if your pre-commit did not pass, you should not be able to make a commit so it is not clear why a Run pre-commit auto format is there. This somehow implies that your pre-commit hooks are not functioning automatically before each commit on your local system. I think this should be fixed, with help from Shuo maybe.

On my system, I cannot make a commit before pass the pre-commit checks, I believe this is the case for all other members, except you, so your commit history is noisier and unnecessarily long.

@haipinglu
Copy link
Member

On pre-commit, it is also weird on your system because, if your pre-commit did not pass, you should not be able to make a commit so it is not clear why a Run pre-commit auto format is there. This somehow implies that your pre-commit hooks are not functioning automatically before each commit on your local system. I think this should be fixed, with help from Shuo maybe.

On my system, I cannot make a commit before pass the pre-commit checks, I believe this is the case for all other members, except you, so your commit history is noisier and unnecessarily long.

You can consider VS code for committing to GitHub if PyCharm problems cannot be fixed, Shuo used both. I am experienced in VS code.

@xianyuanliu xianyuanliu mentioned this pull request Apr 30, 2021
2 tasks
@xianyuanliu
Copy link
Member Author

xianyuanliu commented Apr 30, 2021

Ready.

Also, this PR did not successfully complete the auto project assignment action (https://github.com/pykale/pykale/runs/2471407104?check_suite_focus=true), saying "Resource not accessible by integration".

This action seems successful from the link. You may have helped me correct it. Thanks!

But as said earlier, let us try to work on direct branches rather than forks.

Sorry about that. I create a new branch via Desktop and the default one is forked. I will check and use the direct one.

You can consider VS code for committing to GitHub if PyCharm problems cannot be fixed, Shuo used both. I am experienced in VS code.

I will solve the problem. I commit and push via Desktop/PyCharm with automatic pre-commit. In the first commit, isort always raise issues and change the imports to the correct one. But when I commit again, the code passes the check but revert the isort first change. It is fine when I use Terminal to run pre-commit run --all and commit.

Copy link
Member

@haipinglu haipinglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. This will greatly simplify our related coding!

@haipinglu haipinglu merged commit d2d667f into pykale:main Apr 30, 2021
@xianyuanliu xianyuanliu deleted the Add_downloader branch May 2, 2021 11:09
@github-actions github-actions bot mentioned this pull request Jun 21, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvement of existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants