New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VSO query result block level caching #1785

Merged
merged 36 commits into from Aug 2, 2017

Conversation

Projects
9 participants
@Punyaslok
Member

Punyaslok commented May 18, 2016

This checks each existing entry in the database and checks if the user already has the files for each Query Result Block.

@Cadair

Each VSO query result is a list of query response blocks. So there are cases in which 2 queries have a large number of same query response blocks. For example, if one query is a subset of another. So this feature will download files for a response block if :

  • A query response block is new, i.e. its entry does not exist in the database beforehand.
  • An entry for a query response block exists, but it's files haven't been downloaded yet. This is the case where the user has simply added the entries to the database and not downloaded its files.

Example :

Initially we download files for a Query A which has 5 response blocks, say 1, 2, 3, 4, 5.

Now we want to download files for a Query B which has 5 response blocks, say 4, 5, 6, 7, 8. So response blocks 4 and 5 are common with Query A, and their files have already been downloaded.

Old behaviour : Files for 4, 5, 6, 7 and 8 are downloaded.

New behaviour : Files for 6, 7 and 8 are downloaded.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair May 23, 2016

Member

@Punyaslok this is awesome! ping @DanRyanIrish @dpshelio @derdon

How does this currently interact with the .download method? Does download call _download_and_collect_entries?

If so, this, in the first instance, be documented in the docstring for download.

Member

Cadair commented May 23, 2016

@Punyaslok this is awesome! ping @DanRyanIrish @dpshelio @derdon

How does this currently interact with the .download method? Does download call _download_and_collect_entries?

If so, this, in the first instance, be documented in the docstring for download.

@Punyaslok Punyaslok changed the title from VSO query block level caching to VSO query result block level caching May 23, 2016

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair
Member

Cadair commented May 27, 2016

Show outdated Hide outdated sunpy/database/tables.py
@DanRyanIrish

This comment has been minimized.

Show comment
Hide comment
@DanRyanIrish

DanRyanIrish Jun 8, 2016

Member

This looks like really great functionality!

Member

DanRyanIrish commented Jun 8, 2016

This looks like really great functionality!

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jul 7, 2016

Member

This doesn't remove the old query cache right? How will the old method and the new one interact?

Member

Cadair commented Jul 7, 2016

This doesn't remove the old query cache right? How will the old method and the new one interact?

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Aug 3, 2016

Member

@Cadair
The old caching method doesn't interact with the new one. Both should be in place for better efficiency.

The old cache is checked first. See here

The new caching only comes into play if there is no hit in the old caching system.

Consider a query A which is already made before.

  1. Another query which is same as A will be a hit in the old caching system and no need to proceed further.
  2. If a query B is made which is a subset of A, there will be no match in the old cache and the new cache code will be used.

Removing the old caching system will lead to considerable slowdown in Case 1.

Member

Punyaslok commented Aug 3, 2016

@Cadair
The old caching method doesn't interact with the new one. Both should be in place for better efficiency.

The old cache is checked first. See here

The new caching only comes into play if there is no hit in the old caching system.

Consider a query A which is already made before.

  1. Another query which is same as A will be a hit in the old caching system and no need to proceed further.
  2. If a query B is made which is a subset of A, there will be no match in the old cache and the new cache code will be used.

Removing the old caching system will lead to considerable slowdown in Case 1.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Aug 5, 2016

Member

Looks good! 👍

@sunpy/sunpy-developers review please.

Member

Cadair commented Aug 5, 2016

Looks good! 👍

@sunpy/sunpy-developers review please.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Aug 5, 2016

Member

Oh, @Punyaslok can you add an entry to the changelog please.

Member

Cadair commented Aug 5, 2016

Oh, @Punyaslok can you add an entry to the changelog please.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Aug 31, 2016

Member

@Punyaslok can you rebase this please?

Member

Cadair commented Aug 31, 2016

@Punyaslok can you rebase this please?

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Aug 31, 2016

Member

looking at the code in download() it looks like the _download_and_collect_entries() method (which calls VSOClient.get is called before the query level cache is checked, meaning it's going to do downloads and queries it doesn't need to.

Member

Cadair commented Aug 31, 2016

looking at the code in download() it looks like the _download_and_collect_entries() method (which calls VSOClient.get is called before the query level cache is checked, meaning it's going to do downloads and queries it doesn't need to.

@derdon

This comment has been minimized.

Show comment
Hide comment
@derdon

derdon Oct 18, 2016

Member

I think the test function is too bloated. Check if you can split it up into smaller tests where each should test only one property (ideally). Also, there is some redundancy with downloading to a path. Refactor that to a separate function.

Member

derdon commented Oct 18, 2016

I think the test function is too bloated. Check if you can split it up into smaller tests where each should test only one property (ideally). Also, there is some redundancy with downloading to a path. Refactor that to a separate function.

@Cadair

This looks great now, it just needs some documentation updates, both in the docstrings and in the guide/acquiring_data/database.rst file.

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Jan 31, 2017

Member

@Cadair

The old caching system should be removed as it becomes a special case of the new one. Also it clashes with the new caching system.

So the behavior now should be :

  • Caching is ON (default)

    • If file is downloaded, then DB entry already exists, do nothing.
    • If DB entry exists, but file not downloaded, then download file and UPDATE this DB entry.
    • If DB entry doesn't exist, then download file and add new DB entry.
  • Caching is OFF

    • If file is downloaded, then DB entry already exists, redownload and update timestamp ???
    • If DB entry exists, but file not downloaded, then download file and UPDATE this DB entry.
    • If DB entry doesn't exist, then download file and add new DB entry.

Suggest edits :)

Member

Punyaslok commented Jan 31, 2017

@Cadair

The old caching system should be removed as it becomes a special case of the new one. Also it clashes with the new caching system.

So the behavior now should be :

  • Caching is ON (default)

    • If file is downloaded, then DB entry already exists, do nothing.
    • If DB entry exists, but file not downloaded, then download file and UPDATE this DB entry.
    • If DB entry doesn't exist, then download file and add new DB entry.
  • Caching is OFF

    • If file is downloaded, then DB entry already exists, redownload and update timestamp ???
    • If DB entry exists, but file not downloaded, then download file and UPDATE this DB entry.
    • If DB entry doesn't exist, then download file and add new DB entry.

Suggest edits :)

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jan 31, 2017

Member

@Punyaslok I think I agree with your suggestion. We will talk about it at the dev meeting tomorrow.

Member

Cadair commented Jan 31, 2017

@Punyaslok I think I agree with your suggestion. We will talk about it at the dev meeting tomorrow.

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Feb 1, 2017

Member

Dev meeting consensus: The keyword should be changed as it is confusing. Caching should always be occurring. New keyword idea: /force, /globber.

Member

ehsteve commented Feb 1, 2017

Dev meeting consensus: The keyword should be changed as it is confusing. Caching should always be occurring. New keyword idea: /force, /globber.

@stale stale bot added the Stale label Jul 1, 2017

@stale

This comment has been minimized.

Show comment
Hide comment
@stale

stale bot Jul 1, 2017

This pull request has been automatically marked as stale because it has not had any activity for the past five months. It will be closed if no further activity occurs. If the ideas in this pull request are still worth implementing, please make sure you open an issue to keep track of that idea!

stale bot commented Jul 1, 2017

This pull request has been automatically marked as stale because it has not had any activity for the past five months. It will be closed if no further activity occurs. If the ideas in this pull request are still worth implementing, please make sure you open an issue to keep track of that idea!

Punyaslok added some commits May 18, 2016

Show outdated Hide outdated docs/guide/acquiring_data/database.rst
Show outdated Hide outdated docs/guide/acquiring_data/database.rst
Show outdated Hide outdated sunpy/database/database.py

@Cadair Cadair requested review from dpshelio and nabobalis Jul 17, 2017

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Jul 17, 2017

Member

@Cadair @dpshelio @nabobalis

Someone tell me what to do with the database.download() method. There seems to be no difference between download() and fetch() now.

We can put the decision here for future reference.

Member

Punyaslok commented Jul 17, 2017

@Cadair @dpshelio @nabobalis

Someone tell me what to do with the database.download() method. There seems to be no difference between download() and fetch() now.

We can put the decision here for future reference.

@Cadair Cadair added this to Nice to Haves in SunPy 0.8 Jul 17, 2017

@nabobalis

This comment has been minimized.

Show comment
Hide comment
@nabobalis

nabobalis Jul 17, 2017

Contributor

Delete download and keep/use fetch? I think thats what FIDO uses. Probably.

Contributor

nabobalis commented Jul 17, 2017

Delete download and keep/use fetch? I think thats what FIDO uses. Probably.

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Jul 17, 2017

Member

@nabobalis I agree with you. I was asking if anyone wanted backward compatibility because download is an intuitive keyword and it would be used more often by people, that is if anyone even uses it 😛

Member

Punyaslok commented Jul 17, 2017

@nabobalis I agree with you. I was asking if anyone wanted backward compatibility because download is an intuitive keyword and it would be used more often by people, that is if anyone even uses it 😛

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Jul 17, 2017

Member

So it was decided to deprecate download()
fetch() should be the one to query/download data.
👨‍⚖️

Member

Punyaslok commented Jul 17, 2017

So it was decided to deprecate download()
fetch() should be the one to query/download data.
👨‍⚖️

@nabobalis

This comment has been minimized.

Show comment
Hide comment
@nabobalis

nabobalis Jul 17, 2017

Contributor

I don't like backwards compatibility as a rule. It holds back so many things in the technology world.

Contributor

nabobalis commented Jul 17, 2017

I don't like backwards compatibility as a rule. It holds back so many things in the technology world.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jul 17, 2017

Member

We can add it to the list of things to remove in '1.0'

Member

Cadair commented Jul 17, 2017

We can add it to the list of things to remove in '1.0'

path=str(tmpdir.join('{file}.fits')),
file_pattern=str(tmpdir.join('*.fits')))
assert len(database) == num_of_fits_headers and len(database) > 0

This comment has been minimized.

@nabobalis

nabobalis Jul 18, 2017

Contributor

Do we need both checks here? Or is that if num_entries_from_vso_query fails we won't know unless we check len>0?

@nabobalis

nabobalis Jul 18, 2017

Contributor

Do we need both checks here? Or is that if num_entries_from_vso_query fails we won't know unless we check len>0?

This comment has been minimized.

@Punyaslok

Punyaslok Jul 18, 2017

Member

yes, we need > 0 to check if num_entries is working correctly or not

@Punyaslok

Punyaslok Jul 18, 2017

Member

yes, we need > 0 to check if num_entries is working correctly or not

@nabobalis

Minor comments here.

Punyaslok added some commits Jul 19, 2017

@Cadair

Cadair approved these changes Jul 27, 2017

@Punyaslok

This comment has been minimized.

Show comment
Hide comment
@Punyaslok

Punyaslok Jul 30, 2017

Member

@dpshelio Changes made

Member

Punyaslok commented Jul 30, 2017

@dpshelio Changes made

@Cadair Cadair requested a review from dpshelio Aug 1, 2017

Show outdated Hide outdated sunpy/database/database.py
Show outdated Hide outdated sunpy/database/database.py

@Cadair Cadair merged commit ac14af4 into sunpy:master Aug 2, 2017

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.01%) to 79.05%
Details

@Cadair Cadair moved this from Nice to Haves to Finished in SunPy 0.8 Aug 2, 2017

@Punyaslok Punyaslok removed the [Review] label Aug 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment