APS spider: use LastRunStore spider #261
hepcrawl/spiders/aps_spider.py
Outdated
@@ -199,3 +213,6 @@ def _get_authors_and_collab(self, article):

    def _file_name_from_url(self, url):
        return "{}.xml".format(url[url.rfind('/') + 1:])

    def make_file_fingerprint(self, set_):
        return u'metadataPrefix={}&set={}'.format(self.format, set_)
Where is the format coming from? Do you really need that part? It seems OAI-PMH specific. Are you sure this really works?
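To make the concern concrete, here is a minimal sketch, assuming the spider never defines a format attribute (the diff does not show one either way). Both classes are hypothetical stand-ins, not hepcrawl code, and the spider=...&set=... fingerprint layout is only an illustration of a fingerprint that avoids the OAI-PMH metadataPrefix.

```python
class OaiStyleSpider(object):
    """Hypothetical stand-in that copies the fingerprint from the diff."""

    def make_file_fingerprint(self, set_):
        # self.format is never defined on this class, so this lookup fails.
        return u'metadataPrefix={}&set={}'.format(self.format, set_)


class SketchSpider(object):
    """Hypothetical stand-in whose fingerprint uses only what it defines."""

    name = "APS"

    def make_file_fingerprint(self, set_):
        # Assumed layout: spider name plus set, with no OAI-PMH-specific part.
        return u'spider={}&set={}'.format(self.name, set_)


fingerprint = SketchSpider().make_file_fingerprint("physics")

try:
    OaiStyleSpider().make_file_fingerprint("physics")
    oai_style_failed = False
except AttributeError:
    oai_style_failed = True
```

If the real spider does receive format from somewhere else, the first class would work as written and only the OAI-PMH naming remains questionable.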
710a20c to 0efd08a
hepcrawl/spiders/aps_spider.py
Outdated
self.message = u"Failed to load file at {} for set {}".format(
    file_path,
    set_,
)
Is this used anywhere?
2307c01 to f851cde
hepcrawl/spiders/aps_spider.py
Outdated
.. _See documentation here:
    http://harvest.aps.org/docs/harvest-api#endpoints

Journals are not supported as a parameter anymore for APS Spider
Note:
Selecting specific journals is not supported for technical reasons as it's incompatible with the way the last run time is stored.
3a19a63 to 4e336d3
    'per_page': self.per_page,
    'date': self.date
}
return furl(APSSpider.aps_base_url).add(params).url
Are the None checks that were done previously useless?
I tried it out and it does matter, so I'm changing it back to the old way with the checks.
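For reference, a minimal sketch of why such checks can matter. It uses the standard library's urlencode rather than furl, so it does not show furl's own handling of None. BASE_URL and build_url are hypothetical names chosen for illustration, and the parameter names are taken from the diff above.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration.
BASE_URL = "http://example.org/articles"


def build_url(base_url, **params):
    """Build a URL, leaving out every parameter whose value is None."""
    present = {key: value for key, value in params.items() if value is not None}
    query = urlencode(present)
    return "{}?{}".format(base_url, query) if query else base_url


# Without a check, urlencode serializes None as the literal text "None".
unchecked = urlencode({"per_page": 100, "date": None})

# With the check, the unset parameter is simply omitted.
checked = build_url(BASE_URL, per_page=100, date=None)
```

Here unchecked contains date=None as a literal query value, while checked carries only per_page, which is the behaviour the explicit None checks preserve.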
Signed-off-by: Victor Balbuena <vbalbp@gmail.com>
4e336d3 to 93be8b3