HttpAccess cannot download public file, but RsyncAccess can #16

Closed
andycasey opened this issue May 27, 2020 · 12 comments · Fixed by #17

@andycasey
Contributor

andycasey commented May 27, 2020

Description
I can access public data products using sdss_access.RsyncAccess, but the same products cannot be downloaded using sdss_access.HttpAccess.

Expected behaviour
I don't think any authorization should be required (or even checked) for accessing public data products.

Steps to recreate

from sdss_access import RsyncAccess, HttpAccess

kwds = dict(
    public=True, 
    release="DR16", 
    verbose=True
)
apstar_kwds = dict(    
    apred="r12",
    apstar="stars",
    telescope="apo25m",
    field="000+14",
    prefix="ap",
    obj="2M16544175-2148453",
)

# Works:
rsync = RsyncAccess(**kwds)
rsync.remote()
rsync.add("apStar", **apstar_kwds)
rsync.set_stream()

rsync.commit()

# Fails:
http = HttpAccess(**kwds)
http.remote() 
http.get("apStar", **apstar_kwds)

The stack trace reads:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-1abe27f23dc9> in <module>
     21 # Fails:
     22 http = HttpAccess(**kwds)
---> 23 http.remote()
     24 http.get("apStar", **apstar_kwds)

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/http.py in remote(self, remote_base, username, password)
     49             self.remote_base = remote_base
     50         self._remote = True
---> 51         self.set_auth(username=username, password=password)
     52         if self.auth.ready():
     53             passman = HTTPPasswordMgrWithDefaultRealm()

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/http.py in set_auth(self, username, password)
     31         self.auth.set_password(password)
     32         if not self.auth.ready():
---> 33             self.auth.load()
     34 
     35     def remote(self, remote_base=None, username=None, password=None):

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/auth.py in load(self)
     59 
     60     def load(self):
---> 61         if self.netloc and self.netrc:
     62             authenticators = self.netrc.authenticators(self.netloc)
     63             if authenticators and len(authenticators) == 3:

AttributeError: 'Auth' object has no attribute 'netrc'

Additional context
The documentation for sdss_access.sync.auth.Auth.set_netrc says to add a line to my ~/.netrc file. I did that (using the correct password) and then a new exception occurs due to permissions. Admittedly the documentation for set_netrc does tell me to set the permissions correctly, but I would expect no authorization is required -- at all -- for public data.

SDSS_ACCESS> Error NetrcParseError('~/.netrc access too permissive: access permissions must restrict access to only the owner')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/anaconda3/envs/astra/lib/python3.7/io.py in <module>
     21 # Fails:
     22 http = HttpAccess(**kwds)
---> 23 http.remote()
     24 http.get("apStar", **apstar_kwds)

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/http.py in remote(self, remote_base, username, password)
     49             self.remote_base = remote_base
     50         self._remote = True
---> 51         self.set_auth(username=username, password=password)
     52         if self.auth.ready():
     53             passman = HTTPPasswordMgrWithDefaultRealm()

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/http.py in set_auth(self, username, password)
     31         self.auth.set_password(password)
     32         if not self.auth.ready():
---> 33             self.auth.load()
     34 
     35     def remote(self, remote_base=None, username=None, password=None):

~/anaconda3/envs/astra/lib/python3.7/site-packages/sdss_access/sync/auth.py in load(self)
     59 
     60     def load(self):
---> 61         if self.netloc and self.netrc:
     62             authenticators = self.netrc.authenticators(self.netloc)
     63             if authenticators and len(authenticators) == 3:

AttributeError: 'Auth' object has no attribute 'netrc'

I am using the bleeding edge version of sdss_access.
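
For readers who hit the "access too permissive" message above: a minimal sketch of checking and tightening file permissions with the standard library. It runs against a throwaway file, not a real `~/.netrc`; the helper names `is_owner_only` and `restrict_to_owner` are made up for illustration, and the password field is a placeholder. It assumes a POSIX filesystem.

```python
import os
import stat
import tempfile


def is_owner_only(path):
    """Return True if no group/other permission bits are set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_to_owner(path):
    """Set path to owner read/write only (0o600) and return the new mode."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a temporary file rather than touching a real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "netrc-demo")
    with open(demo, "w") as fh:
        fh.write("machine data.sdss.org login sdss password <password>\n")
    os.chmod(demo, 0o644)  # readable by group/other: the "too permissive" state
    before = is_owner_only(demo)
    new_mode = restrict_to_owner(demo)
    after = is_owner_only(demo)
```

The shell equivalent is `chmod 600 ~/.netrc`. None of this addresses the underlying complaint, which is that public data should not need a netrc file at all.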

@joelbrownstein
Contributor

joelbrownstein commented May 27, 2020

Hi @andycasey,

This looks like a new bug, since the simple example at sdss_access_http_example_dr14 is no longer working in the latest tag:

$ module load sdss_access/1.0.0
$ ipython
In [1]: from sdss_access import HttpAccess
   ...: http_access = HttpAccess(verbose=True)
   ...:
   ...: #works with or without a ~/.netrc
   ...: http_access.remote()
   ...:
cannot find 'data.sdss.org' in ~/.netrc

@havok2063
Collaborator

Thanks @andycasey and @joelbrownstein. Currently I cannot reproduce either of your errors, so I'll have to dig a little deeper.

DR16 apStar example

from sdss_access import RsyncAccess, HttpAccess
kwds = dict(
    public=True,
    release="DR16",
    verbose=True)

apstar_kwds = dict(
   apred="r12",
   apstar="stars",
   telescope="apo25m",
   field="000+14",
   prefix="ap",
   obj="2M16544175-2148453")

http = HttpAccess(**kwds)
http.remote()
authentication for netloc='data.sdss.org' set for username='sdss'

http.get('apStar', **apstar_kwds)
Downloading: https://data.sdss.org/sas/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits Bytes: 961920
CREATE /Users/Brian/sas/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits

DR14 example from https://github.com/sdss/sdss_access/blob/master/bin/sdss_access_http_example_dr14

http_access = HttpAccess(verbose=True)
http_access.remote()
authentication for netloc='data.sdss.org' set for username='sdss'

http_access.get('spec-lite', run2d='v5_10_0', plateid=3606, mjd=55182, fiberid=537)
CREATE /Users/Brian/sas/ebosswork/eboss/spectro/redux/v5_10_0/spectra/lite/3606
Downloading: https://data.sdss.org/sas/ebosswork/eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits Bytes: 218880
CREATE /Users/Brian/sas/ebosswork/eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits

http_access = HttpAccess(verbose=True, release='DR14')
http_access.remote()
authentication for netloc='data.sdss.org' set for username='sdss'

http_access.get('spec-lite', run2d='v5_10_0', plateid=3606, mjd=55182, fiberid=537)
CREATE /Users/Brian/sas/dr14/eboss/spectro/redux/v5_10_0/spectra/lite/3606
Downloading: https://data.sdss.org/sas/dr14/eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits Bytes: 218880
CREATE /Users/Brian/sas/dr14/eboss/spectro/redux/v5_10_0/spectra/lite/3606/spec-3606-55182-0537.fits

If I run it immediately after RsyncAccess it works as well: it finds the existing file, or re-downloads it if I delete the local copy.

rsync = RsyncAccess(**kwds)
rsync.remote()
rsync.add("apStar", **apstar_kwds)
rsync.set_stream()
rsync.commit()
rsync -R rsync://data.sdss.org/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits
SDSS_ACCESS> Reducing the number of streams from 5 to 1, the number of download tasks.
SDSS_ACCESS> CREATE /var/folders/j6/g80fb5pn2_j41jckdc4hyd5m0000gn/T/sdss_access/20200527_001
SDSS_ACCESS> streamlets added to /var/folders/j6/g80fb5pn2_j41jckdc4hyd5m0000gn/T/sdss_access/20200527_001
SDSS_ACCESS> [background]$ 'rsync -avRK --files-from=/var/folders/j6/g80fb5pn2_j41jckdc4hyd5m0000gn/T/sdss_access/20200527_001/sdss_access_00.txt rsync://data.sdss.org/dr16 /Users/Brian/sas/dr16'
SDSS_ACCESS> rsync stream 0 logging to /var/folders/j6/g80fb5pn2_j41jckdc4hyd5m0000gn/T/sdss_access/20200527_001/sdss_access_00.log
SDSS_ACCESS> syncing... please wait for 1 rsync streams to complete [running for 0 seconds]
SDSS_ACCESS> Done!

http = HttpAccess(**kwds)
http.remote()
authentication for netloc='data.sdss.org' set for username='sdss'

http.get('apStar', **apstar_kwds)
FOUND /Users/Brian/sas/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits (already downloaded)

@havok2063
Collaborator

So if I remove my .netrc file and try again then it fails. This fails with the old sdss_access so it's a bug but not a new bug. It's probably been lingering around for a while, likely because we haven't regularly tested HttpAccess and always promoted RsyncAccess over HttpAccess.

import sdss_access
sdss_access.__version__
'0.2.11'

http_access = HttpAccess(verbose=True, release='DR14')
http_access.remote()
SDSS_ACCESS> Error FileNotFoundError(2, 'No such file or directory')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-8e61041e7506> in <module>
----> 1 http_access.remote()

~/anaconda3/envs/test/lib/python3.7/site-packages/sdss_access/sync/http.py in remote(self, remote_base, username, password)
     49             self.remote_base = remote_base
     50         self._remote = True
---> 51         self.set_auth(username=username, password=password)
     52         if self.auth.ready():
     53             passman = HTTPPasswordMgrWithDefaultRealm()

~/anaconda3/envs/test/lib/python3.7/site-packages/sdss_access/sync/http.py in set_auth(self, username, password)
     31         self.auth.set_password(password)
     32         if not self.auth.ready():
---> 33             self.auth.load()
     34
     35     def remote(self, remote_base=None, username=None, password=None):

~/anaconda3/envs/test/lib/python3.7/site-packages/sdss_access/sync/auth.py in load(self)
     59
     60     def load(self):
---> 61         if self.netloc and self.netrc:
     62             authenticators = self.netrc.authenticators(self.netloc)
     63             if authenticators and len(authenticators) == 3:

AttributeError: 'Auth' object has no attribute 'netrc'
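
The pairing of a printed `FileNotFoundError` with a later `AttributeError` is consistent with a setter that logs the failure and returns without ever assigning the attribute. The following is a toy reconstruction of that pattern, not the sdss_access source: `ToyAuth` and its `init_attr` flag are hypothetical, and only the `load` condition is taken from the traceback above.

```python
import netrc
import os
import tempfile


class ToyAuth:
    """Toy stand-in for an auth helper; NOT the sdss_access implementation."""

    def __init__(self, netloc, netrc_path, init_attr=False):
        self.netloc = netloc
        if init_attr:
            # Defensive default: the attribute exists even if loading fails.
            self.netrc = None
        self.set_netrc(netrc_path)

    def set_netrc(self, path):
        try:
            self.netrc = netrc.netrc(path)
        except (OSError, netrc.NetrcParseError) as exc:
            # The error is reported but swallowed, so self.netrc may stay unset.
            print("Error %r" % exc)

    def load(self):
        # Same condition as line 61 in the traceback.
        if self.netloc and self.netrc:
            return self.netrc.authenticators(self.netloc)
        return None


# A path that does not exist, standing in for a missing ~/.netrc.
missing = os.path.join(tempfile.gettempdir(), "no-such-dir-for-demo", "netrc")

broken = ToyAuth("data.sdss.org", missing)
try:
    broken.load()
    raised = False
except AttributeError:
    raised = True

fixed = ToyAuth("data.sdss.org", missing, init_attr=True)
result = fixed.load()  # no credentials found, but no crash either
```

Under these assumptions, initialising the attribute to `None` before attempting to read the file is enough to turn the crash into a clean "no credentials" result.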

havok2063 added a commit that referenced this issue May 28, 2020
Fixes http public access bug #16
@havok2063
Collaborator

@andycasey This has now been fixed with PR #17 and merged into master. If you're using the bleeding edge you can try a git pull. I'll make a new tag soon. Without a netrc file this now works:

from sdss_access import RsyncAccess, HttpAccess

kwds = dict(
    public=True,
    release="DR16",
    verbose=True)

apstar_kwds = dict(
    apred="r12",
    apstar="stars",
    telescope="apo25m",
    field="000+14",
    prefix="ap",
    obj="2M16544175-2148453")

http = HttpAccess(**kwds)
http.remote()
http.get('apStar', **apstar_kwds)
Downloading: https://data.sdss.org/sas/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits Bytes: 961920
CREATE /Users/Brian/Work/sdss/sas/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits
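
As a cross-check that does not go through sdss_access at all, the public URL from the "Downloading:" line can be rebuilt and fetched with the standard library. This is only a sketch: `TEMPLATE` is inferred from that single log line rather than from the official path definitions, and `public_apstar_url` and `fetch` are hypothetical helpers. It assumes the server serves DR16 files over HTTPS without credentials, as the log above suggests.

```python
from urllib.request import urlopen

# Inferred from the "Downloading:" line above; not the official path template.
TEMPLATE = (
    "https://data.sdss.org/sas/{release}/apogee/spectro/redux/"
    "{apred}/{apstar}/{telescope}/{field}/"
    "{prefix}Star-{apred}-{obj}.fits"
)


def public_apstar_url(release, **kwds):
    """Build the public URL for an apStar file from its path keywords."""
    return TEMPLATE.format(release=release.lower(), **kwds)


def fetch(url, destination, timeout=60):
    """Download url to destination with no authentication handler installed."""
    with urlopen(url, timeout=timeout) as response, open(destination, "wb") as out:
        out.write(response.read())
    return destination


url = public_apstar_url(
    "DR16",
    apred="r12",
    apstar="stars",
    telescope="apo25m",
    field="000+14",
    prefix="ap",
    obj="2M16544175-2148453",
)
```

Calling `fetch(url, "apStar-r12-2M16544175-2148453.fits")` would then attempt the download; it is left out of the snippet so that the example runs without network access.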

@joelbrownstein
Contributor

Thanks Brian!

I've checked out 1.0.1 at Utah (although this means we now have to retag tree?), and confirmed your test works

In [1]: from sdss_access import RsyncAccess, HttpAccess
   ...:
   ...: kwds = dict(
   ...:     public=True,
   ...:     release="DR16",
   ...:     verbose=True)
   ...:
   ...: apstar_kwds = dict(
   ...:     apred="r12",
   ...:     apstar="stars",
   ...:     telescope="apo25m",
   ...:     field="000+14",
   ...:     prefix="ap",
   ...:     obj="2M16544175-2148453")
   ...:
   ...: http = HttpAccess(**kwds)
   ...: http.remote()
   ...: http.get('apStar', **apstar_kwds)
   ...:
FOUND /uufs/chpc.utah.edu/common/home/sdss/dr16/apogee/spectro/redux/r12/stars/apo25m/000+14/apStar-r12-2M16544175-2148453.fits (already downloaded)

@havok2063
Collaborator

@joelbrownstein this was only a fix to sdss_access, unrelated to tree paths. We don't need to retag tree.

@joelbrownstein
Contributor

tree explicitly depends on sdss_access/version via

module load sdss_access/1.0.0
prereq sdss_access/1.0.0

at lines 259-260 of https://github.com/sdss/tree/blob/master/bin/setup_tree.py

@havok2063
Collaborator

havok2063 commented May 28, 2020

Hmm. That requirement is meant to only be a minimum requirement so it can't load modules below version 1.0. Any version above 1.0.0 will work. I don't want to have to retag tree every time I update sdss_access. So maybe there's a better way to sort out minimum requirements with modules. If we can't find one then I'd opt to remove this requirement and figure out a different way. If we want to have it use 1.0.1 right now, we could probably manually edit the module files.
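
To illustrate what a "minimum requirement" check looks like when done in Python rather than in a modulefile, here is a small sketch. `parse_version` and `meets_minimum` are hypothetical helpers, and the code assumes purely numeric dotted versions such as `1.0.1`; a suffix like `dev` would need a real version parser.

```python
def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="1.0.0"):
    """Return True if the installed version is at or above the minimum."""
    return parse_version(installed) >= parse_version(minimum)


new_tag_ok = meets_minimum("1.0.1")   # the tag discussed in this thread
old_tag_ok = meets_minimum("0.2.11")  # the older version tested earlier
exact_ok = meets_minimum("1.0.0")
```

Comparing tuples of integers avoids the string-comparison trap where `"0.2.11" < "0.2.9"` is true. In practice the installed value could come from `sdss_access.__version__`, which is shown earlier in this thread.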

@havok2063
Collaborator

@joelbrownstein
Contributor

I agree it would be annoying to retag tree every time we tag sdss_access.

I appreciate the advanced module stuff you found, but let's not implement something that would require modules itself to be managed.

Let's simplify this by removing the explicit version from tree's setup_tree.py, i.e.,

module load sdss_access
prereq sdss_access

and allow the "default" version to be manually adjusted as needed.

@havok2063
Collaborator

That sounds good to me.

@andycasey
Contributor Author

Thanks @havok2063 and @joelbrownstein
