Create download.py #132

xianyuanliu · 2021-04-29T23:13:26Z

Fixes #128 option 2.

Description

Create two data downloading functions for files and compressed files via GitHub URL. It does not yet cover @RaivoKoot's one, which downloads files from Google Drive.

Status

Ready

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
In-line docstrings updated.

codecov-commenter · 2021-04-29T23:17:46Z

Codecov Report

Merging #132 (d6e98e4) into main (00bda7b) will increase coverage by 0.42%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #132      +/-   ##
==========================================
+ Coverage   17.96%   18.38%   +0.42%     
==========================================
  Files          43       44       +1     
  Lines        4075     4096      +21     
==========================================
+ Hits          732      753      +21     
  Misses       3343     3343

Impacted Files	Coverage Δ
kale/embed/video_feature_extractor.py	`0.00% <ø> (ø)`
kale/embed/video_i3d.py	`0.00% <ø> (ø)`
kale/embed/video_res3d.py	`0.00% <ø> (ø)`
kale/embed/video_se_i3d.py	`0.00% <ø> (ø)`
kale/embed/video_se_res3d.py	`0.00% <ø> (ø)`
kale/embed/video_selayer.py	`0.00% <ø> (ø)`
kale/loaddata/action_multi_domain.py	`0.00% <ø> (ø)`
kale/loaddata/video_access.py	`0.00% <ø> (ø)`
kale/pipeline/action_domain_adapter.py	`0.00% <ø> (ø)`
kale/predict/class_domain_nets.py	`0.00% <ø> (ø)`
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 00bda7b...d6e98e4.

haipinglu · 2021-04-30T07:53:33Z

kale/utils/download.py

@@ -0,0 +1,71 @@
+# =============================================================================
+# Author: Xianyuan Liu, xianyuan.liu@sheffield.ac.uk


You may want to start using your more permanent, personal email (or both emails, as I do at line 4), as your Shef email will become invalid soon after you leave.

haipinglu · 2021-04-30T07:57:10Z

kale/utils/download.py

+
+"""Data downloading and compressed data extraction functions, Based on
+https://github.com/pytorch/vision/blob/master/torchvision/datasets/utils.py
+https://github.com/pytorch/pytorch/blob/master/torch/hub.py


Good. I assume that you have followed our 3R green ML principles here to reuse PyTorch APIs at lines 16-17.

However, it should be clear why the original APIs in PyTorch are not good/simple enough: why should the user use this API rather than the PyTorch ones directly, and what are the added benefits of using this API?

download_and_extract_archive and download_url_to_file are not in the same PyTorch file, and it would be better to integrate them into one file. This file covers functions to download files of different types from different sources. Using the same format and parameters, like output_directory and output_file_name, can also simplify our coding.

PyTorch provides the basic download function, but it may raise issues when the output_directory does not exist. This is the only problem I have met.
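
To illustrate the missing-directory point, here is a minimal sketch of a wrapper that creates output_directory before downloading. The signature matches download_file_by_url from this PR, but the body is an assumption and not the merged code. It uses the standard-library urllib.request.urlretrieve as a stand-in for torch.hub.download_url_to_file so that it runs without PyTorch installed, and skipping files that already exist is also an assumption.

```python
import os
import urllib.request


def download_file_by_url(url, output_directory, output_file_name):
    """Download url to output_directory/output_file_name and return the path.

    Sketch only: the body is an assumption, not the code merged in this PR.
    """
    # Create the target directory first, so a missing directory is not an error.
    os.makedirs(output_directory, exist_ok=True)
    destination = os.path.join(output_directory, output_file_name)

    # Assumption: do not download again if the file is already present.
    if os.path.exists(destination):
        return destination

    # Stand-in for torch.hub.download_url_to_file(url, destination).
    urllib.request.urlretrieve(url, destination)
    return destination
```

With this shape, callers pass the same three parameters regardless of where the file comes from, which is the simplification the reply above describes.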

haipinglu · 2021-04-30T08:06:18Z

kale/utils/download.py

+    logging.info("Datasets downloaded and extracted in {}".format(file))
+
+
+def download_file_by_url(url, output_directory, output_file_name):


Reduce: There is quite some repetition here. I suggest merging download_compressed_file_by_url with this one by either

adding a flag to indicate whether the file is compressed, or, even better,

auto-detecting whether the file is in a (supported) compressed format (matching against ".tar.xz", ".tar", ".tar.gz", ".tgz", ".gz", ".zip") to switch to either download_and_extract_archive or download_url_to_file.
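
The auto-detection option could be sketched as follows. The suffix list is the one given in the comment above; the names is_supported_archive and download_by_url, and the two callables fetch_file and fetch_and_extract, are hypothetical and only illustrate the dispatch. In practice the callables might wrap download_url_to_file and download_and_extract_archive.

```python
from urllib.parse import urlparse

# Suffixes listed in the review comment.
ARCHIVE_SUFFIXES = (".tar.xz", ".tar.gz", ".tar", ".tgz", ".gz", ".zip")


def is_supported_archive(url_or_name):
    """Return True if the URL path or file name ends with a supported archive suffix."""
    # urlparse(...).path drops any "?query" or "#fragment" part of a URL.
    path = urlparse(url_or_name).path
    return path.lower().endswith(ARCHIVE_SUFFIXES)


def download_by_url(url, output_directory, output_file_name, fetch_file, fetch_and_extract):
    """Call fetch_and_extract for archives and fetch_file otherwise.

    Both callables are hypothetical and take (url, output_directory, output_file_name).
    """
    if is_supported_archive(url):
        return fetch_and_extract(url, output_directory, output_file_name)
    return fetch_file(url, output_directory, output_file_name)
```

Matching on the suffix keeps a single entry point without an extra flag, at the cost of misclassifying archives whose URL does not end in a recognizable extension.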

haipinglu

Our pre-commit hooks have been tested for a long time and are quite stable, so I strongly suggest using the automation rather than manual "Run pre-commit auto format" commits. This improves efficiency, is the professional practice, and gives a cleaner commit history.

You should consider writing tests at the same time. This will save overall coding time because you are working on the same functionality, and also save review time because the tests help to validate the code.

See the comment at the top. Clarify the benefits over using PyTorch APIs directly.

haipinglu · 2021-04-30T08:30:48Z

Also, this PR did not successfully complete the auto project assignment action (https://github.com/pykale/pykale/runs/2471407104?check_suite_focus=true), saying "Resource not accessible by integration".

This is not critical but weird. In this case, we need to manually assign the project (please do). Not sure whether this is because the PR is from a fork. There were no problems before this PR.

You do not need to solve the problem (except adding the project manually). It is just my observation and we may see how to solve it later. I will add a card.

But as said earlier, let us try to work on direct branches rather than forks.

haipinglu · 2021-04-30T08:40:18Z

On pre-commit, your system also behaves oddly: if pre-commit did not pass, you should not be able to make a commit, so it is not clear why a "Run pre-commit auto format" commit is there. This implies that your pre-commit hooks are not running automatically before each commit on your local system. I think this should be fixed, maybe with help from Shuo.

On my system, I cannot make a commit before passing the pre-commit checks. I believe this is the case for all other members except you, so your commit history is noisier and unnecessarily long.

haipinglu · 2021-04-30T08:42:43Z

You can consider VS Code for committing to GitHub if the PyCharm problems cannot be fixed; Shuo has used both. I am experienced in VS Code.

xianyuanliu · 2021-04-30T13:45:10Z

Ready.

> Also, this PR did not successfully complete the auto project assignment action (https://github.com/pykale/pykale/runs/2471407104?check_suite_focus=true), saying "Resource not accessible by integration".

This action seems successful from the link. You may have helped me correct it. Thanks!

> But as said earlier, let us try to work on direct branches rather than forks.

Sorry about that. I created a new branch via Desktop and the default one was on the fork. I will check and use the direct one.

> You can consider VS code for committing to GitHub if PyCharm problems cannot be fixed, Shuo used both. I am experienced in VS code.

I will solve the problem. I commit and push via Desktop/PyCharm with automatic pre-commit. In the first commit, isort always raises issues and changes the imports to the correct order. But when I commit again, the code passes the check but the first isort change is reverted. It is fine when I use Terminal to run pre-commit run --all and commit.

haipinglu

Great. This will greatly simplify our related coding!

xianyuanliu added 2 commits April 30, 2021 00:05

Create download.py

6d1a7db

Update kale.utils.rst

a0355f8

xianyuanliu requested a review from haipinglu April 29, 2021 23:13

xianyuanliu added the enhancement (Improvement of existing code) label Apr 29, 2021

xianyuanliu added 3 commits April 30, 2021 00:21

Update download.py

0c09b0f

Run pre-commit auto format

b8299b2

Update doc

3767a94

haipinglu reviewed Apr 30, 2021

Merge branch 'upstream-main' into Add_downloader

906e22c

xianyuanliu mentioned this pull request Apr 30, 2021

Add video_access test #127

Merged

2 tasks

xianyuanliu and others added 5 commits April 30, 2021 12:11

Add another email

5f90e55

Add another email to all files

3fd09fb

Update download.py

2ed7501

Create test_download.py

2d961ba

Run pre-commit auto format

cd1ea7a

Merge branch 'upstream-main' into Add_downloader

d6e98e4

xianyuanliu requested a review from haipinglu April 30, 2021 21:10

haipinglu approved these changes Apr 30, 2021

haipinglu merged commit d2d667f into pykale:main Apr 30, 2021

xianyuanliu deleted the Add_downloader branch May 2, 2021 11:09

github-actions bot mentioned this pull request Jun 21, 2021

Release 0.1.0rc2 #161

Merged

1 task
