spiders: add elsevier spider #285
Not finished yet, I will continue tomorrow, but here are some comments already.
hepcrawl/spiders/elsevier.py
    def parse_record(self, record):
        """Parse an elsevier XML exported file into a HEP record."""
        with open(record, "r") as f:
            elsevier_record = f.read()

        parser = ElsevierParser(elsevier_record)
        return ParsedItem(record=parser.parse(), record_format="hep")
Is this one used?
What we need is a way to reharvest a given record, which means being able to pass it to Scrapy as an input somehow. I think you can leave it out of the current PR, as it has already taken a lot of time and it's not super urgent. Please create an issue for it so we don't forget.
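To make the reharvesting idea a bit more concrete: Scrapy hands any `-a name=value` command-line option to the spider's constructor as a keyword argument, so one possible shape is an optional argument that narrows the crawl to a single file. The following is only a rough sketch under that assumption. `iter_record_files`, `source_dir`, and `record_path` are hypothetical names that do not exist in hepcrawl, and the helper is kept free of Scrapy imports so it can be tried on its own.

```python
from pathlib import Path
from typing import Iterator, Optional


def iter_record_files(
    source_dir: str, record_path: Optional[str] = None
) -> Iterator[Path]:
    """Yield the XML files a crawl should parse.

    Hypothetical helper, not part of hepcrawl. `record_path` stands in for
    a value a spider could receive through a Scrapy `-a record_path=...`
    option; when it is given, only that single record is yielded.
    """
    if record_path is not None:
        path = Path(record_path)
        if not path.is_file():
            # Fail loudly instead of silently harvesting nothing.
            raise FileNotFoundError(f"No such record file: {path}")
        yield path
        return

    # Default behaviour: every XML file in the directory, in a stable order.
    yield from sorted(Path(source_dir).glob("*.xml"))
```

A spider could then loop over `iter_record_files(...)` and call something like `parse_record` on each path, so a full harvest and a single-record reharvest would share the same parsing code. Whether that fits the real spider is for the follow-up issue to decide.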
some minor comments remaining, but looks very good, great job! 👍
Ref: inspirehep/inspirehep#1212