Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-MARC import from archive.org #1952

Merged
merged 7 commits into from Mar 12, 2019

Conversation

hornc
Copy link
Collaborator

@hornc hornc commented Mar 5, 2019

Description: What does this PR achieve? [feature|hotfix|refactor]

Improves the metadata importing from Archive.org items that do no have MARC records.

The following item, https://openlibrary.org/books/OL26746219M , was imported using the https://github.com/internetarchive/openlibrary/wiki/Endpoints#importing POST /api/import/ia endoint "require_marc": "false" option, and had some issues / missing metadata. This PR fixes those.

  • Correct language format: /languages/por instead of por
  • Import all subjects
  • Import description, if available
  • Import publisher and publication date
  • Import other identifiers if avaialable (ISBN, OCLC, LCCN)
  • Change history message for archive.org imported items to: "Initial record created, from Internet Archive item record." to avoid saying MARC when there may have been no MARC involved.

Tagging @tfmorris who noticed these issues too.

Technical: What should be noted about the implementation?

Requires #1979 (for local testing), wait until that PR is merged before merging this one.

Testing: Steps for reviewer to reproduce / verify this PR fixes the problem?

Evidence: If this PR touches UI, please post evidence (screenshot) of it behaving correctly:

@hornc hornc added the WIP label Mar 5, 2019
@hornc hornc removed the WIP label Mar 12, 2019
@hornc hornc changed the title WIP: Feature/no marc import Non-MARC import from archive.org Mar 12, 2019
@@ -1,6 +1,5 @@
"""Library for interacting wih archive.org.
"""
from __future__ import print_function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this removed? Because the print functions were removed? (this was for python 3 compat.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no uses of print in the module.

if subject:
d["subjects"] = isinstance(subject, list) and subject or [subject]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we not need this checking logic?

Copy link
Member

@mekarpeles mekarpeles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, a few questions

@mekarpeles mekarpeles merged commit 8e964b5 into internetarchive:master Mar 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants