New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Branch]: Unified Downloader #1300

Merged
merged 319 commits into from Jan 11, 2017

Conversation

@Cadair
Member

Cadair commented Feb 23, 2015

Unified Downloader

AKA: UniDown, FIDO, dataretreiever

The unified downloader is a new front end for all the existing data downloading functionality in SunPy, including the code that is currently in Lightcurve.

The main component of unidown is the Fido instance of UnifiedDownloaderFactory which is a factory class like Map, and the download clients register with this factory, when they register they provide a method which tells the factory which attributes or values of attribtes they can support, like Map does for values in the metadata.

Unified Downloader TODO

  1. Test behavior when two clients can download the same data. (EVE is one).
  2. Test different data levels download with EVE and Lyra.
  3. Fix JSOC Wavelength attr #1675 (@Cadair)
  4. Documentation
    1. Update the current documentation to be accurate!
    2. Write some explanation of , vs & in the documentation, and change most instances of , to & for clarity.
    3. Document all the attrs describing what they search for.
    4. Make sure the documentation build is warning free
    5. Re-write the guide section on obtaining data to use Fido.
  5. Testing
    1. Make the tests pass.
    2. Write more tests to increase coverage.
    3. Write JSOC Fido tests.
    4. Write VSO Fido tests.
    5. Clients that need testing through Fido:
      • EVEClient
      • GOESClient
      • LYRAClient
      • NoRHClient
      • RHESSIClient
      • NOAAIndicesClient
      • NOAAPredictClient

Items marked in bold are the highest priority.

The documentation build for the unidown branch is here: http://docs.sunpy.org/en/unidown

The relevant section is here: http://docs.sunpy.org/en/unidown/code_ref/net.html#dataretriever but it is not representative of the code currently, it needs updating!


Extras:

  • Guide documentation, including how this affects Lightcurve. (Can be delayed till Factory Class).
  • Integration with JSOC. (@Cadair)
  • Try and unify the common attributes of UnifiedResponse
  • Test coverage
  • Standardise the formatting and behaviour of the path argument to Fido.fetch.
  • Remove the downloading code from lightcurve sources.

This is a reopening of #1088

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

I wanted to kick off a discussion about the API for this, currently this is still in it's dev state, the final public API should probably have some more thought put into it than this.

The API proposed by the old SEP (before it was made LightCurve only) was this:

>>> from sunpy.net import SolarDownloader, attrs
>>> results = SolarDownloader.query(attrs.Time("2012/1/1", "2012/1/2"),
                               attrs.Instrument('lyra'))
>>> print results
Start time  End time    Source  Instrument  URL                                                                                                         
----------  --------    ------  ----------  ---                                                                                                         
2012/01/01  2012/01/02  Proba2  lyra        http://proba2.oma.be/lyra/data/bsd/2012/01/01/lyra_20120101-000000_lev2_std.fits                            
2012/01/02  2012/01/03  Proba2  lyra        http://proba2.oma.be/lyra/data/bsd/2012/01/02/lyra_20120102-000000_lev2_std.fits 
>>> dl = SolarDownloader.get(results)
>>> dl.wait(progress=True)

I am pretty happy with most of this, but I am not convinced by the namespace choices. Personally, I find it quite annoying to type attrs. before each query attribute, I wonder if we should put the main downloader factory and the attributes into the same namespace something like:

from sunpy.net import solardownloader as sdl
>>> results = sdl.SolarDownloader.query(sdl.Time("2012/1/1", "2012/1/2"), sdl.Instrument('lyra'))

Opinions?

Member

Cadair commented Mar 6, 2015

I wanted to kick off a discussion about the API for this, currently this is still in it's dev state, the final public API should probably have some more thought put into it than this.

The API proposed by the old SEP (before it was made LightCurve only) was this:

>>> from sunpy.net import SolarDownloader, attrs
>>> results = SolarDownloader.query(attrs.Time("2012/1/1", "2012/1/2"),
                               attrs.Instrument('lyra'))
>>> print results
Start time  End time    Source  Instrument  URL                                                                                                         
----------  --------    ------  ----------  ---                                                                                                         
2012/01/01  2012/01/02  Proba2  lyra        http://proba2.oma.be/lyra/data/bsd/2012/01/01/lyra_20120101-000000_lev2_std.fits                            
2012/01/02  2012/01/03  Proba2  lyra        http://proba2.oma.be/lyra/data/bsd/2012/01/02/lyra_20120102-000000_lev2_std.fits 
>>> dl = SolarDownloader.get(results)
>>> dl.wait(progress=True)

I am pretty happy with most of this, but I am not convinced by the namespace choices. Personally, I find it quite annoying to type attrs. before each query attribute, I wonder if we should put the main downloader factory and the attributes into the same namespace something like:

from sunpy.net import solardownloader as sdl
>>> results = sdl.SolarDownloader.query(sdl.Time("2012/1/1", "2012/1/2"), sdl.Instrument('lyra'))

Opinions?

@landscape-bot

This comment has been minimized.

Show comment
Hide comment
@landscape-bot

landscape-bot Mar 6, 2015

Code Health
Repository health decreased by 0.51% when pulling d72e5f6 on unidown into 96390ce on master.

landscape-bot commented Mar 6, 2015

Code Health
Repository health decreased by 0.51% when pulling d72e5f6 on unidown into 96390ce on master.

@landscape-bot

This comment has been minimized.

Show comment
Hide comment
@landscape-bot

landscape-bot Mar 6, 2015

Code Health
Repository health decreased by 0.51% when pulling d72e5f6 on unidown into 96390ce on master.

landscape-bot commented Mar 6, 2015

Code Health
Repository health decreased by 0.51% when pulling d72e5f6 on unidown into 96390ce on master.

@wafels

This comment has been minimized.

Show comment
Hide comment
@wafels

wafels Mar 6, 2015

Member

I don't think it should be called "SolarDownloader" as we may want to use
it to download non-solar data in the future. Why not just
"DataDownloader"? Also, is typing attrs any worse than typing sdl?

On Fri, Mar 6, 2015 at 6:43 AM, landscape-bot notifications@github.com
wrote:

[image: Code Health] https://landscape.io/diff/106527
Repository health decreased by 0.51% when pulling d72e5f6
d72e5f6
on unidown
into 96390ce
96390ce
on master
.

Reply to this email directly or view it on GitHub
#1300 (comment).

Member

wafels commented Mar 6, 2015

I don't think it should be called "SolarDownloader" as we may want to use
it to download non-solar data in the future. Why not just
"DataDownloader"? Also, is typing attrs any worse than typing sdl?

On Fri, Mar 6, 2015 at 6:43 AM, landscape-bot notifications@github.com
wrote:

[image: Code Health] https://landscape.io/diff/106527
Repository health decreased by 0.51% when pulling d72e5f6
d72e5f6
on unidown
into 96390ce
96390ce
on master
.

Reply to this email directly or view it on GitHub
#1300 (comment).

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

@wafels what non-solar data are we likely to download in a library called sunpy :p

Especially bearing in mind that this is an API for querying catalogue type data such as the VSO, rather than an arbitrary bit of download code.

my problem with .attrs is more that it is a bit of an obscure namespace, (it's not very self describing) so I think it could be confusing to users. Having both the main client and the search Attributes in the same namespace seems more logical to me.

Member

Cadair commented Mar 6, 2015

@wafels what non-solar data are we likely to download in a library called sunpy :p

Especially bearing in mind that this is an API for querying catalogue type data such as the VSO, rather than an arbitrary bit of download code.

my problem with .attrs is more that it is a bit of an obscure namespace, (it's not very self describing) so I think it could be confusing to users. Having both the main client and the search Attributes in the same namespace seems more logical to me.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

Having said that when we start dealing with things like JSOC specific attrs, we are going to want them to have their own namespace so is:

attrs.jsoc.Series()

better than

sdl.jsoc.Series()

I am still in the latter camp.

Member

Cadair commented Mar 6, 2015

Having said that when we start dealing with things like JSOC specific attrs, we are going to want them to have their own namespace so is:

attrs.jsoc.Series()

better than

sdl.jsoc.Series()

I am still in the latter camp.

@landscape-bot

This comment has been minimized.

Show comment
Hide comment
@landscape-bot

landscape-bot Mar 6, 2015

Code Health
Repository health decreased by 0.47% when pulling f9505aa on unidown into 96390ce on master.

landscape-bot commented Mar 6, 2015

Code Health
Repository health decreased by 0.47% when pulling f9505aa on unidown into 96390ce on master.

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Mar 6, 2015

Member

I vote for even simpler. Just call it downloader. I agree with @Cadair with not liking attr. How about calling them what they are like searchTerm or token?

>>> from sunpy.net import Downloader, tokens
>>> results = Downloader.query(tokens.Time("2012/1/1", "2012/1/2"),
                               tokens.Instrument('lyra'))
>>> dl = Downloader.get(results)

Also I would vote that the default behavior is always to wait.

Member

ehsteve commented Mar 6, 2015

I vote for even simpler. Just call it downloader. I agree with @Cadair with not liking attr. How about calling them what they are like searchTerm or token?

>>> from sunpy.net import Downloader, tokens
>>> results = Downloader.query(tokens.Time("2012/1/1", "2012/1/2"),
                               tokens.Instrument('lyra'))
>>> dl = Downloader.get(results)

Also I would vote that the default behavior is always to wait.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

I don't like just Downloader because there is actually a Downloader class that just deals with the downloading of files. The UnifiedDownloader factory actually deals with all the querying as well, so I think it should be given a more representative name.

token is maybe a little better, but it's still not very descriptive by itself, I also don't particularly like the from sunpy.net import xyz, zxy I prefer import sunpy.net.solardownloader as sdl because then it is one namespace everywhere in the users script.

Member

Cadair commented Mar 6, 2015

I don't like just Downloader because there is actually a Downloader class that just deals with the downloading of files. The UnifiedDownloader factory actually deals with all the querying as well, so I think it should be given a more representative name.

token is maybe a little better, but it's still not very descriptive by itself, I also don't particularly like the from sunpy.net import xyz, zxy I prefer import sunpy.net.solardownloader as sdl because then it is one namespace everywhere in the users script.

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Mar 6, 2015

Member

Okay @Cadair good point. Well how about we call it Search then? Then Search.token would look pretty great.

Member

ehsteve commented Mar 6, 2015

Okay @Cadair good point. Well how about we call it Search then? Then Search.token would look pretty great.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

it should definitely be under sunpy.net sunpy.data is for accessing data directly rather than querying remote services.

Search is an interesting idea, maybe DataSearch to make it a little clearer what it is searching for. However, DataSearch.query() seems a little weird?

Member

Cadair commented Mar 6, 2015

it should definitely be under sunpy.net sunpy.data is for accessing data directly rather than querying remote services.

Search is an interesting idea, maybe DataSearch to make it a little clearer what it is searching for. However, DataSearch.query() seems a little weird?

@landscape-bot

This comment has been minimized.

Show comment
Hide comment
@landscape-bot

landscape-bot Mar 6, 2015

Code Health
Repository health decreased by 0.41% when pulling 7bede3b on unidown into 96390ce on master.

landscape-bot commented Mar 6, 2015

Code Health
Repository health decreased by 0.41% when pulling 7bede3b on unidown into 96390ce on master.

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Mar 6, 2015

Member

Search.query seems pretty good to me. It is a search and you are inputing a query as you would for Google. I guess we could also replace it with an action verb like Search.do.

Member

ehsteve commented Mar 6, 2015

Search.query seems pretty good to me. It is a search and you are inputing a query as you would for Google. I guess we could also replace it with an action verb like Search.do.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Mar 6, 2015

Member

@ehsteve I could go with DataSearch. I do wonder if this is a better API:

from sunpy.net import solardownloader as sdl
>>> results = sdl.DataSearch.query(sdl.Time("2012/1/1", "2012/1/2"), sdl.Instrument('lyra'))
>>> results.get(path='./', progress=True)

i.e. having the get method on the results object rather than the client. I would need to look into how easy this would be to achieve.

Member

Cadair commented Mar 6, 2015

@ehsteve I could go with DataSearch. I do wonder if this is a better API:

from sunpy.net import solardownloader as sdl
>>> results = sdl.DataSearch.query(sdl.Time("2012/1/1", "2012/1/2"), sdl.Instrument('lyra'))
>>> results.get(path='./', progress=True)

i.e. having the get method on the results object rather than the client. I would need to look into how easy this would be to achieve.

@aringlis

This comment has been minimized.

Show comment
Hide comment
@aringlis

aringlis Mar 6, 2015

Member

I'm not a fan of just Downloader either - whatever we call it, it should be obvious that it relates to downloading/searching for solar data.

Member

aringlis commented Mar 6, 2015

I'm not a fan of just Downloader either - whatever we call it, it should be obvious that it relates to downloading/searching for solar data.

Show outdated Hide outdated doc/source/code_ref/client.rst
Show outdated Hide outdated doc/source/code_ref/net.rst
Show outdated Hide outdated doc/source/code_ref/client.rst
@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Dec 5, 2016

Member

Looking through the Docs, in the API section it refers to GenericClient...which is a bit too generic of a name with no reference to what it refers to. Should it live under net/?

Member

ehsteve commented Dec 5, 2016

Looking through the Docs, in the API section it refers to GenericClient...which is a bit too generic of a name with no reference to what it refers to. Should it live under net/?

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Dec 5, 2016

Member

Also when I ran the following code from the docs
downresp = Fido.fetch(results)
It downloaded the files so that following line
files = downresp.wait()
was not needed. Very confusing.

Member

ehsteve commented Dec 5, 2016

Also when I ran the following code from the docs
downresp = Fido.fetch(results)
It downloaded the files so that following line
files = downresp.wait()
was not needed. Very confusing.

Cadair added some commits Jan 4, 2017

Re-work sunpy.net code reference documentation.
This also moves the factory class out of sunpy.net.dataretriever as I realised
that it belongs in the top level of net.
@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jan 4, 2017

Member

@ehsteve I have re-written the code_ref section of the docs for net. It's far from perfect, but at least it's not outright wrong any more.

Member

Cadair commented Jan 4, 2017

@ehsteve I have re-written the code_ref section of the docs for net. It's far from perfect, but at least it's not outright wrong any more.

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair
Member

Cadair commented Jan 5, 2017

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Jan 5, 2017

Member

Thanks, @Cadair. Looks like the Acquiring Data with SunPy needs to be completely rewritten as well, right?

Member

ehsteve commented Jan 5, 2017

Thanks, @Cadair. Looks like the Acquiring Data with SunPy needs to be completely rewritten as well, right?

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jan 5, 2017

Member

@ehsteve yes, that will need to be re-written, but seeing how it's not incorrect (that code will work) we could do that in master if you want.

Member

Cadair commented Jan 5, 2017

@ehsteve yes, that will need to be re-written, but seeing how it's not incorrect (that code will work) we could do that in master if you want.

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Jan 5, 2017

Member

@Cadair Ok, so would you consider yourself ready for a final review?

Member

ehsteve commented Jan 5, 2017

@Cadair Ok, so would you consider yourself ready for a final review?

@ehsteve

This comment has been minimized.

Show comment
Hide comment
@ehsteve

ehsteve Jan 5, 2017

Member

@Cadair is there any way to add new tags to the docs to make sure people notice?

Member

ehsteve commented Jan 5, 2017

@Cadair is there any way to add new tags to the docs to make sure people notice?

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jan 5, 2017

Member

yes, final review please @sunpy/sunpy-developers.

@ehsteve I don't think so, we should take some inspiration from astropy, because they do some excellent things with their changelog / what's new.

Member

Cadair commented Jan 5, 2017

yes, final review please @sunpy/sunpy-developers.

@ehsteve I don't think so, we should take some inspiration from astropy, because they do some excellent things with their changelog / what's new.

@ehsteve

ehsteve approved these changes Jan 6, 2017

@dpshelio

This is finally here!!

@dpshelio dpshelio merged commit b1f201d into master Jan 11, 2017

5 checks passed

continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.06%) to 78.386%
Details

@dpshelio dpshelio removed the [Review] label Jan 11, 2017

@Cadair

This comment has been minimized.

Show comment
Hide comment
@Cadair

Cadair Jan 11, 2017

Member

REJOICE!

Member

Cadair commented Jan 11, 2017

REJOICE!

@Cadair Cadair deleted the unidown branch Jan 11, 2017

@Cadair Cadair moved this from Issues to Finished in SunPy 0.8 Jul 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment