Data Cleanup: Populate blank publisher fields from ISBN prefix when known #2119

Open
LeadSongDog opened this issue May 9, 2019 · 1 comment
Labels
Affects: Data Issues that affect book/author metadata or user/account data. [managed] Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] Priority: 3 Issues that we can consider at our leisure. [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Theme: Publishers Type: Refactor/Clean-up Issues related to reorganization/clean-up of data or code (e.g. for maintainability). [managed]
Projects

Comments

@LeadSongDog

Many edition records have no publisher shown, but do have an ISBN. Previous discussion at #895 shows how to get an official spelling of the publisher from the ISBN.
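As a rough illustration of the idea, here is a minimal sketch of a prefix lookup. It assumes a table mapping ISBN-13 digit prefixes (EAN prefix + registration group + registrant) to publisher names is already available. The table contents, `PREFIX_TO_PUBLISHER`, `normalize_isbn`, and `publisher_from_isbn` are all hypothetical names and placeholder data, not anything from Open Library or from #895.

```python
import re

# Placeholder data only: real entries would come from a registrant list.
PREFIX_TO_PUBLISHER = {
    "9780000": "Example Press A",
    "9781111": "Example Press B",
}


def normalize_isbn(isbn):
    """Return the 12 leading digits of the ISBN-13 form, or None if malformed."""
    digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
    if len(digits) == 10:
        # ISBN-10 maps into the 978 range; the check digit is irrelevant here.
        return "978" + digits[:9]
    if len(digits) == 13:
        return digits[:12]
    return None


def publisher_from_isbn(isbn, table=PREFIX_TO_PUBLISHER):
    """Look up a publisher by trying the longest matching prefix first."""
    core = normalize_isbn(isbn)
    if core is None:
        return None
    # Registrant prefixes vary in length, so try every cut point.
    for end in range(len(core), 3, -1):
        name = table.get(core[:end])
        if name is not None:
            return name
    return None
```

Returning `None` for unknown prefixes keeps the lookup conservative: a record is only filled in when the prefix is actually in the table.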

@brad2014 brad2014 added the Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] label May 10, 2019
@brad2014 brad2014 added Affects: Data Issues that affect book/author metadata or user/account data. [managed] Type: Feature Request Issue describes a feature or enhancement we'd like to implement. [managed] Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Type: Refactor/Clean-up Issues related to reorganization/clean-up of data or code (e.g. for maintainability). [managed] and removed Theme: Identifiers Issues related to ISBN's or other identifiers in metadata. [managed] Type: Feature Request Issue describes a feature or enhancement we'd like to implement. [managed] labels May 10, 2019
@hornc
Collaborator

hornc commented Jun 26, 2019

To give an idea of the scope of this, in the May 2019 editions dump:

grep -cv '"publishers":' ol_dump_editions_2019-05-31.txt
1,189,309

That is 1,189,309 editions without publishers. 169,759 of those have ISBNs.
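Both counts could be reproduced in one pass with something like the sketch below. It assumes each dump line is tab-separated with the edition JSON in the last column, and that ISBNs live in `isbn_10` / `isbn_13` fields. `count_missing_publishers` is a hypothetical helper. Unlike the `grep`, it also counts an empty `publishers` list as missing, so its totals may differ slightly.

```python
import json


def count_missing_publishers(lines):
    """Return (editions without publishers, how many of those have an ISBN)."""
    missing = 0
    with_isbn = 0
    for line in lines:
        # Assumed layout: tab-separated columns, JSON record in the last one.
        record = json.loads(line.rstrip("\n").split("\t")[-1])
        if record.get("publishers"):
            continue  # publisher already present
        missing += 1
        if record.get("isbn_10") or record.get("isbn_13"):
            with_isbn += 1
    return missing, with_isbn
```

To run it against the dump, pass an open file handle, for example `count_missing_publishers(open("ol_dump_editions_2019-05-31.txt", encoding="utf-8"))`.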

@xayhewalo xayhewalo added this to Un-Triaged in Triage Oct 18, 2019
@xayhewalo xayhewalo added Priority: 3 Issues that we can consider at our leisure. [managed] State: Backlogged Theme: Publishers labels Nov 15, 2019
@xayhewalo xayhewalo moved this from Un-Triaged to Triaged in Triage Nov 15, 2019
@mekarpeles mekarpeles added the Lead: @hornc Issues overseen by Charles (Staff: Data Engineering Lead) [managed] label Dec 18, 2019
@hornc hornc removed their assignment Jan 14, 2020