spiders: add elsevier spider #285

Merged on Oct 6, 2020 (1 commit)

@MJedr MJedr commented Sep 18, 2020

@MJedr MJedr force-pushed the elsevierspider branch 4 times, most recently from 549ec82 to 6887d9f, September 21, 2020 07:29
@MJedr MJedr force-pushed the elsevierspider branch 6 times, most recently from 7e1bfb3 to d54acc7, September 21, 2020 09:52
hepcrawl/parsers/elsevier.py (resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
@MJedr MJedr force-pushed the elsevierspider branch 3 times, most recently from 465fa6d to 1d93ef0, September 24, 2020 15:46

@michamos michamos left a comment

Not finished yet, I will continue tomorrow, but here are some comments already.

hepcrawl/spiders/elsevier_spider.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
@MJedr MJedr force-pushed the elsevierspider branch 5 times, most recently from 4b02edd to f2b0fca, September 25, 2020 15:29
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
hepcrawl/spiders/elsevier.py (outdated, resolved)
Comment on lines 200 to 206
    def parse_record(self, record):
        """Parse an elsevier XML exported file into a HEP record."""
        with open(record, "r") as f:
            elsevier_record = f.read()

        parser = ElsevierParser(elsevier_record)
        return ParsedItem(record=parser.parse(), record_format="hep")

Is this one used?

What we would need is a way to reharvest a given record, but for that we need to be able to pass it to Scrapy as an input somehow. I think you can leave it out of the current PR, as it has already taken a lot of time and this is not super urgent. Please create an issue for it so we don't forget.
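
One possible way to feed a single record to a spider, sketched under assumptions: Scrapy forwards `-a name=value` options of `scrapy crawl` to the spider constructor as keyword arguments, and it can fetch `file://` URLs. The `record_path` argument, the `record_start_urls` helper, the `ReharvestSketch` class, and the example feed URL below are hypothetical and not part of this PR. The class is a plain-Python stand-in, so the snippet runs without Scrapy installed.

```python
from pathlib import Path


def record_start_urls(record_path=None, default_urls=()):
    """Return the URLs a spider should start from.

    If `record_path` (a hypothetical spider argument) is given, only that
    local file is harvested; otherwise the regular URLs are used.
    """
    if record_path:
        # Path.as_uri() requires an absolute path, so resolve it first.
        return [Path(record_path).resolve().as_uri()]
    return list(default_urls)


class ReharvestSketch:
    """Stand-in for a spider class; NOT the spider added in this PR."""

    # Placeholder for the regular harvesting entry point (assumption).
    default_urls = ["https://example.org/feed"]

    def __init__(self, record_path=None, **kwargs):
        # In a real Scrapy spider, `-a record_path=...` would arrive here
        # as the `record_path` keyword argument.
        self.start_urls = record_start_urls(record_path, self.default_urls)
```

With a real spider this would be invoked roughly as `scrapy crawl <spider_name> -a record_path=/path/to/record.xml`, where `<spider_name>` is whatever name the spider declares.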

@michamos michamos left a comment

some minor comments remaining, but looks very good, great job! 👍

hepcrawl/spiders/elsevier_spider.py (outdated, resolved)
hepcrawl/spiders/elsevier_spider.py (outdated, resolved)
hepcrawl/spiders/elsevier_spider.py (outdated, resolved)
@MJedr MJedr merged commit 4c733fd into inspirehep:master Oct 6, 2020

3 participants