
Access token specified to resync-sync is not exposed to resync.Client instances #45

Closed
anarchivist opened this issue Mar 18, 2021 · 5 comments


@anarchivist

As @zimeon knows, I've been putting resync through its paces to harvest from the POD data lake, specifically trying to use resync-sync as a ready-made solution. While #37 added support for Bearer tokens (as needed by POD), the --access-token parameter does not appear to propagate down to the resync.Client instance, specifically to its update_resource() method.

For what it's worth, would it be possible to pass the same header configuration from the CONFIG global set in resync.url_or_file_open to resync.Client (or instantiate it there), so that those requests also honor the delay/backoff settings and set the User-Agent header? Thanks for considering.
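
To illustrate the kind of shared header configuration being asked for, here is a minimal sketch. It is not resync's actual code: the helper name `build_headers`, the example token, and the example User-Agent string are all assumptions made for illustration. The idea is only that one function builds the headers once, so every HTTP call can reuse them.

```python
import urllib.request


def build_headers(access_token=None, user_agent=None):
    """Return headers to share across outgoing requests (hypothetical helper)."""
    headers = {}
    if access_token:
        # Bearer token, as required by POD according to this issue.
        headers["Authorization"] = "Bearer " + access_token
    if user_agent:
        headers["User-Agent"] = user_agent
    return headers


# Placeholder values, not real credentials or a real client name.
headers = build_headers(access_token="example-token",
                        user_agent="example-harvester/0.1")

# Build (but do not send) a request to show the headers are attached.
req = urllib.request.Request("https://example.org/resource", headers=headers)

# With the requests library, the same dict could be passed through the
# standard `headers` keyword argument, for example:
#   requests.get(uri, headers=headers, timeout=timeout, stream=True)
```

Keeping header construction in one place means a later change, such as a new token scheme, would only need to be made once.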

Discovered while looking into pod4lib/aggregator#324.

@zimeon
Member

zimeon commented Mar 19, 2021

Thanks for finding that, @anarchivist! I'd forgotten that some of this code might have been recent enough to use requests.

@zimeon
Member

zimeon commented Mar 19, 2021

simeon@RottenApple resync> grep -r requests\\. resync/*.py
resync/client.py:                    r = requests.get(resource.uri, timeout=self.timeout, stream=True)
resync/client.py:                except requests.Timeout as e:
resync/client.py:                except (requests.RequestException, IOError) as e:
resync/explorer.py:            response = requests.head(uri)
resync/explorer.py:        """Mock up requests.head(..) response on local file."""
resync/explorer.py:    """Object to mock up requests.head(...) response."""
resync/url_or_file_open.py:    # FIXME - This token will be added blindy to all requests. This is insecure
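
The FIXME in the last line of that output points at a real risk: a token added to every request can leak to other hosts. One possible mitigation, sketched here under assumptions and not taken from resync, is to attach the token only when the target URI shares a scheme and host with a trusted origin. The function name `headers_for` and its parameters are hypothetical.

```python
from urllib.parse import urlsplit


def headers_for(uri, access_token, trusted_origin):
    """Return auth headers only for HTTPS URIs on the trusted origin.

    Hypothetical helper: any other URI gets no Authorization header.
    """
    target = urlsplit(uri)
    trusted = urlsplit(trusted_origin)
    same_origin = (
        target.scheme == trusted.scheme
        and target.netloc.lower() == trusted.netloc.lower()
    )
    if access_token and same_origin and target.scheme == "https":
        return {"Authorization": "Bearer " + access_token}
    return {}


# The token is attached for the trusted host and withheld elsewhere.
pod = headers_for("https://pod.stanford.edu/file/4670/example.xml.gz",
                  "example-token", "https://pod.stanford.edu/")
other = headers_for("https://example.org/file.xml",
                    "example-token", "https://pod.stanford.edu/")
```

Comparing scheme and host rather than a raw string prefix avoids matching look-alike hosts that merely begin with the trusted name.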

@zimeon
Member

zimeon commented Mar 19, 2021

It now seems to make a valid request, with the token, to download the actual dump from POD (I haven't waited for completion, however):

simeon@RottenApple resync> ./resync-sync -v --capability-list https://pod.stanford.edu/.well-known/resourcesync/normalized-capabilitylist/marcxml --access-token $ACCESS_TOKEN -b  https://pod.stanford.edu/ tmp
Reading capability list https://pod.stanford.edu/.well-known/resourcesync/normalized-capabilitylist/marcxml
Reading resource list https://pod.stanford.edu/organizations/normalized_resourcelist/marcxml
Read sitemap/sitemapindex from https://pod.stanford.edu/organizations/normalized_resourcelist/marcxml
Parsed as sitemapindex, 14 sitemaps
Now reading 14 sitemaps
Reading sitemap from https://pod.stanford.edu/organizations/brown/streams/2020-11-17b/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/chicago/streams/chicago-2021-02/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/columbia/streams/03fc38d1-ba2b-4db0-92ad-c8eb60fa425f/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/cornell/streams/ff13e021-2fbb-4119-87ce-d9f9ca2857ae/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/dartmouth/streams/2021-02-09/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/duke/streams/7546b907-2e75-4f8a-b13d-daf67d18b6cf/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/harvard/streams/8e51c9a1-5df9-453f-8128-3befec88ac79/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/jhu/streams/83e87b39-9317-4fae-96c1-d277a2c2e4e0/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/mit/streams/4fcb7338-53c5-4d2f-a2f3-9d685e8fbde8/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/penn/streams/2021-03-18/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/princeton/streams/9c5ef1ad-243a-46bb-9d9d-c4fbde2a8a94/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/stanford/streams/20210221/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/yale/streams/b22453d9-8917-4c57-8a55-d3c554e1b16f/normalized_resourcelist/marcxml (0 bytes)
Reading sitemap from https://pod.stanford.edu/organizations/yultest/streams/2fb8c910-6cbb-4b94-89b2-6f1dac97c6bb/normalized_resourcelist/marcxml (0 bytes)
Read source resource list, 24 resources listed
Scanning disk from tmp
Ignoring file 'tmp' (error: [Errno 2] No such file or directory: 'tmp')
Status:     NOT IN SYNC (same=0, to create=24, to update=0, to delete=0)
Will GET 24 resources
created: https://pod.stanford.edu/file/4670/duke-2020-12-13-full-marcxml.xml.gz -> tmp/file/4670/duke-2020-12-13-full-marcxml.xml.gz
^C

@anarchivist
Author

Fantastic, thanks @zimeon. I can confirm that, based on what's in develop, fetching the normalized dumps seems to be working.

@zimeon
Member

zimeon commented Mar 23, 2021

Released in 2.0.1

@zimeon zimeon closed this as completed Mar 23, 2021