Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multiple parallel downloads in the Helioviewer client #3862

Closed
wants to merge 4 commits into from

Conversation

Raahul-Singh
Copy link
Contributor

As per the discussion during the SunPy community meeting.

@Raahul-Singh Raahul-Singh requested a review from a team as a code owner March 5, 2020 09:24
@pep8speaks
Copy link

pep8speaks commented Mar 5, 2020

Hello @Raahul-Singh! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 197:17: E124 closing bracket does not match visual indentation
Line 204:13: E124 closing bracket does not match visual indentation
Line 440:37: E128 continuation line under-indented for visual indent
Line 442:1: W293 blank line contains whitespace

Comment last updated at 2020-03-13 16:58:41 UTC

@Raahul-Singh
Copy link
Contributor Author

Should I add the changelog as trivial or feature ?

@nabobalis nabobalis added this to the 2.0 milestone Mar 5, 2020
@nabobalis nabobalis added the net Affects the net submodule label Mar 5, 2020
@nabobalis
Copy link
Contributor

I would go with trivial

@Cadair
Copy link
Member

Cadair commented Mar 6, 2020

You are still calling downloader.download on the downloader you pass in, meaning you can't stack the downloads.

@Raahul-Singh
Copy link
Contributor Author

You are still calling downloader.download on the downloader you pass in, meaning you can't stack the downloads.

Ah, sorry about this. I made an error in judgement regarding the enqueue_file

@Raahul-Singh
Copy link
Contributor Author

Raahul-Singh commented Mar 6, 2020

This gives a dramatic improvement in download time when downloading large datasets from the same source.
So after some profiling, I have the following results:

The time it took to download 100 files before this modification: 122.59405422210693 sec

The time it takes now: 25.995344638824463 sec

That's a 4.7 times increase in speed.

This increase in speed matters more and would increase in magnitude when downloading a large number of files.

For example, to download ~56k images from the server, HelioViewer took approximately 52 hours.
Now it would take just about 4 hours, assuming constant download speed per file.

@Raahul-Singh
Copy link
Contributor Author

This gives a dramatic improvement in download time when downloading large datasets from the same source.
So after some profiling, I have the following results:

The time it took to download 100 files before this modification: 122.59405422210693 sec

The time it takes now: 25.995344638824463 sec

That's a 4.7 times increase in speed.

This increase in speed matters more and would increase in magnitude when downloading a large number of files.

For example, to download ~56k images from the server, HelioViewer took approximately 52 hours.
Now it would take just about 4 hours, assuming constant download speed per file.

@Cadair This makes me wonder if, provided this is okay, should not all the clients allow multiple files being so downloaded by stacking them together?

@Raahul-Singh
Copy link
Contributor Author

Here is the script that can be used to eveluate the changes:

from sunpy.net import helioviewer
import time

client = helioviewer.HelioviewerClient()
obsdate = ['1996-05-02 23:28:04', '1996-05-05 18:39:04',
           '1996-05-05 20:15:04', '1996-05-06 04:15:04',
           '1996-05-06 05:51:04', '1996-05-06 07:27:04',
           '1996-05-06 09:03:04', '1996-05-06 10:39:04',
           '1996-05-06 17:03:04', '1996-05-06 18:39:04']  

start = time.time()
for i in range(len(obsdate)):
    file = client.download_jp2(obsdate[i], observatory="SOHO", instrument="MDI", source_id=6)
end = time.time()
time_taken = end - start
print(f"Time taken to download 10 files before the modification: {time_taken: 0.3f} seconds")

start = time.time()
file = client.download_jp2(obsdate, observatory="SOHO", instrument="MDI", source_id=6)
end = time.time()
time_taken = end - start
print(f"Time taken to download 10 files before the modification: {time_taken: 0.3f} seconds")

@Raahul-Singh
Copy link
Contributor Author

Raahul-Singh commented Mar 13, 2020

@Cadair Here is the script that can be used to evaluate the changes:

from sunpy.net import helioviewer
import time

client = helioviewer.HelioviewerClient()
obsdate = ['1996-05-02 23:28:04', '1996-05-05 18:39:04',
           '1996-05-05 20:15:04', '1996-05-06 04:15:04',
           '1996-05-06 05:51:04', '1996-05-06 07:27:04',
           '1996-05-06 09:03:04', '1996-05-06 10:39:04',
           '1996-05-06 17:03:04', '1996-05-06 18:39:04']  

start = time.time()
for i in range(len(obsdate)):
    file = client.download_jp2(obsdate[i], observatory="SOHO", instrument="MDI", source_id=6)
end = time.time()
time_taken = end - start
print(f"Time taken to download 10 files before the modification: {time_taken: 0.3f} seconds")

start = time.time()
file = client.download_jp2(obsdate, observatory="SOHO", instrument="MDI", source_id=6)
end = time.time()
time_taken = end - start
print(f"Time taken to download 10 files before the modification: {time_taken: 0.3f} seconds")

@nabobalis
Copy link
Contributor

This in fact just allows us to download a list of dates instead of one date at a time?

@nabobalis nabobalis changed the title Adds support for passing custom parfive downloader to Helioviewer Allow multiple parallel downloads in the Helioviewer client Apr 9, 2020
@Raahul-Singh
Copy link
Contributor Author

This in fact just allows us to download a list of dates instead of one date at a time?

Yes. I was wondering if adding a default batch size would be required, along with raising a warning if a certain size is exceeded. We don't want to overwhelm the servers.

@nabobalis
Copy link
Contributor

Then we need:

  1. This change to be applied to all the functions of the Helioviewer Client, otherwise the API is inconsistent.
  2. The documentation updated.
  3. Tests for all of these changes
  4. A way to prevent someone from DDOSing the servers.

@Raahul-Singh
Copy link
Contributor Author

Then we need:

1. This change to be applied to all the functions of the Helioviewer Client, otherwise the API is inconsistent.

2. The documentation updated.

3. Tests for all of these changes

4. A way to prevent someone from DDOSing the servers.

So,

This change to be applied to all the functions of the Helioviewer Client, otherwise the API is inconsistent.

This would be keeping a list for the dates with get_closest_image and get_jp2_header functions, right?

A way to prevent someone from DDOSing the servers.

Setting a max_size instance variable and if the queued size exceeds this max size, we raise an exception?

@nabobalis
Copy link
Contributor

Then we need:

1. This change to be applied to all the functions of the Helioviewer Client, otherwise the API is inconsistent.

2. The documentation updated.

3. Tests for all of these changes

4. A way to prevent someone from DDOSing the servers.

So,

This change to be applied to all the functions of the Helioviewer Client, otherwise the API is inconsistent.

This would be keeping a list for the dates with get_closest_image and get_jp2_header functions, right?

Sure.

A way to prevent someone from DDOSing the servers.

Setting a max_size instance variable and if the queued size exceeds this max size, we raise an exception?

Sure.

@nabobalis nabobalis modified the milestones: 2.0, 2.1 Apr 15, 2020
@dstansby dstansby added the [WIP] label Jul 5, 2020
@nabobalis nabobalis marked this pull request as draft July 17, 2020 09:02
@nabobalis nabobalis removed the [WIP] label Sep 19, 2020
@nabobalis
Copy link
Contributor

I will close this, this kind of change will need some thought to prevent a user from overwhelming the service.

@nabobalis nabobalis closed this Sep 19, 2020
@Raahul-Singh Raahul-Singh deleted the HelioviewerDownloader branch September 19, 2020 12:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Affects the net submodule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants