Fetch all the book with subject Fiction Horror #8170

Closed
ladaniavadh opened this issue Aug 8, 2023 · 14 comments
Labels
Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] Needs: Community Discussion This issue is to be brought up in the next community call. [managed] Priority: 3 Issues that we can consider at our leisure. [managed] Type: Question This issue doesn't require code. A question needs an answer. [managed]

Comments

@ladaniavadh

ladaniavadh commented Aug 8, 2023

Question

I need data for the books with the subject 'Fiction Horror' (around 11.5K books). Which APIs are best for fetching those details?

Additional context

I need the fields listed below:
description
cover image
book title
authors
number of pages
publish date
ISBN
A unique key that is common across all editions of a book

Issue resolution criteria

I plan to use the APIs below:
https://openlibrary.org/subjects/horror.json > call this API 12 times with offset and limit to collect all the books' data

Then, for each object in the above response:
https://openlibrary.org/{key}.json > pass the value of key from the object to fetch the book's description
https://openlibrary.org/works/{cover_edition_key}.json > pass the value of cover_edition_key from the object to fetch all other details

My concern is whether the workflow above is a good way to fetch the books' details, or whether a better solution exists.
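For illustration, the paging plan described in this comment could be sketched roughly as follows. This only builds the URLs and makes no requests. The page size of 1000 is an assumption inferred from "12 times" for about 11.5K books, and the helper names are hypothetical. The sketch also assumes that `key` already starts with a slash (for example `/works/...`) and that edition records are served under `/books/`, as a later comment in this thread suggests, rather than under `/works/`.

```python
from urllib.parse import urlencode

SUBJECT_URL = "https://openlibrary.org/subjects/horror.json"


def subject_page_urls(total, page_size=1000):
    """Return one subject URL per page needed to cover `total` works.

    page_size=1000 is an assumption; the API may cap `limit` lower.
    """
    return [
        f"{SUBJECT_URL}?{urlencode({'limit': page_size, 'offset': offset})}"
        for offset in range(0, total, page_size)
    ]


def detail_urls(work):
    """Build the follow-up URLs for one object from the subject response.

    Assumes `key` looks like "/works/<id>" (leading slash included) and
    that `cover_edition_key` is a bare edition identifier.
    """
    urls = {"work": f"https://openlibrary.org{work['key']}.json"}
    if work.get("cover_edition_key"):
        urls["edition"] = (
            f"https://openlibrary.org/books/{work['cover_edition_key']}.json"
        )
    return urls
```

With `total=11500`, `subject_page_urls` yields 12 URLs, which matches the 12 calls planned above. Each work then costs two more requests, which is why this approach is slow for a set of this size.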

@ladaniavadh ladaniavadh added Needs: Community Discussion This issue is to be brought up in the next community call. [managed] Needs: Lead Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Type: Question This issue doesn't require code. A question needs an answer. [managed] labels Aug 8, 2023
@cdrini
Collaborator

cdrini commented Aug 11, 2023

Hi @ladaniavadh ! That might take a little long to get through everything. If I'm understanding you correctly, you want:

  • edition metadata: description, cover, title/subtitle, number of pages, publish date, isbn, work key
  • work metadata: authors keys, author names

And you want all fiction horror editions from open library, correct?

Your approach will work, but will take a very long time :P I'd recommend something like this:

  1. Fetch work key, author keys, author names using the search.json endpoint:
  2. With the edition keys from the first step, use the get_many API to fetch all the editions:

That should be a much faster/friendlier way to hit our APIs! Does that answer your question?
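The example URLs for the two steps did not survive in this copy of the thread, so the following is only a rough sketch of what they might look like, not the exact requests suggested here. The `subject_key:fiction_horror` query and the `fields` parameter appear later in the thread. The `/api/get_many` path, its `keys` parameter holding a JSON array, and the batch size of 50 are assumptions that should be checked against the Open Library API documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE = "https://openlibrary.org"


def search_url(subject_key, page=1, limit=100):
    """Step 1: one search.json request for work keys, authors, edition keys."""
    params = {
        "q": f"subject_key:{subject_key}",
        "fields": "key,author_key,author_name,edition_key",
        "page": page,
        "limit": limit,
    }
    return f"{BASE}/search.json?{urlencode(params)}"


def get_many_urls(edition_keys, batch_size=50):
    """Step 2: batch edition keys into get_many requests.

    Assumes get_many accepts ?keys=<JSON array of "/books/<id>" paths>.
    """
    urls = []
    for start in range(0, len(edition_keys), batch_size):
        batch = [f"/books/{key}" for key in edition_keys[start:start + batch_size]]
        urls.append(f"{BASE}/api/get_many?{urlencode({'keys': json.dumps(batch)})}")
    return urls
```

Batching is the point of step 2: many editions come back per request instead of one request per edition.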

@ladaniavadh
Author

@cdrini
Thank you for your detailed explanation.
I still have a few questions:
1 - I want every ISBN labeled with its specific key and value. When I checked the other API, https://openlibrary.org/works/{cover_edition_key}.json, the ISBNs came back under keys like isbn_10 and isbn_13. How can I get them in that form?

2 - edition_key is a large array. Is there an easy way to fetch all the details you mentioned for your 2nd API?

@mekarpeles mekarpeles added Priority: 3 Issues that we can consider at our leisure. [managed] Lead: @cdrini Issues overseen by Drini (Staff: Team Lead & Solr, Library Explorer, i18n) [managed] and removed Needs: Triage This issue needs triage. The team needs to decide who should own it, what to do, by when. [managed] Needs: Lead labels Aug 14, 2023
@ladaniavadh
Author

@cdrini @mekarpeles
Is there any update on this issue?
If there is no API support at the moment, please let me know when it will be ready, or suggest an alternative way to get the data I requested.

@cdrini
Collaborator

cdrini commented Aug 16, 2023

I'm not fully sure I understand your questions. Could you give me some more details about what you want to use this data for?

  1. If you want you can fetch all the isbns in step 1; add isbn to the url in the fields parameter
  2. I'm not understanding; can you rephrase the question?

@ladaniavadh
Author

@cdrini
1 - When I requested isbn in step 1, it returned a long list of ISBNs. How can I identify which ones are isbn_10, isbn_13, or something else?
2 - In step 2, I need to add the edition_key after /books/{edition_key}, but step 1 returns a long list of edition keys. Is there any way to get only the latest edition_key?

@cdrini
Collaborator

cdrini commented Aug 16, 2023

  1. Ah, you have to do that afterwards; they're mostly normalized, so you just need to check the length is 13 if you only care about the isbn 13s.
  2. That's tougher! You only want one edition for each work? The newest edition? Is that correct? Note the edition could be in any language; our works group editions together that have the same content, regardless of language.
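The length check from point 1 could be sketched like this. It is a minimal example, not part of any Open Library client. The function name and the `other` bucket are hypothetical. Stripping hyphens and spaces is a precaution for the values that are not normalized, and no checksum validation is attempted.

```python
def group_isbns(isbns):
    """Split a flat ISBN list into isbn_10 / isbn_13 buckets by length."""
    grouped = {"isbn_10": [], "isbn_13": [], "other": []}
    for raw in isbns:
        # Normalize the occasional hyphenated or spaced value.
        isbn = raw.replace("-", "").replace(" ", "").upper()
        if len(isbn) == 13 and isbn.isdigit():
            grouped["isbn_13"].append(isbn)
        elif (
            len(isbn) == 10
            and isbn[:9].isdigit()
            and (isbn[9].isdigit() or isbn[9] == "X")  # ISBN-10 check digit may be X
        ):
            grouped["isbn_10"].append(isbn)
        else:
            # Keep anything unrecognized, untouched, for manual review.
            grouped["other"].append(raw)
    return grouped
```

Applied to the `isbn` list from step 1, this yields the same kind of `isbn_10` / `isbn_13` keys that the edition records expose.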

@ladaniavadh
Author

ladaniavadh commented Aug 18, 2023

#1 - I want each ISBN explicitly identified by its key and value. Is there any way to get that?
#2 - I want only the latest English edition. How can I get that edition key?

@cdrini

@ladaniavadh
Author

@cdrini is there any update on this issue? It has been 2 weeks.

@cdrini
Collaborator

cdrini commented Aug 31, 2023

Howdy! I still don't understand what you're looking for with 1. For 2, there's no convenient way to fetch the latest English edition for each of those works, I'm afraid. There's no way to sort the edition results. But let me see if I can add an option for that real quick.

Can you give me more information about what you're trying to do overall with this data? That might help me provide better guidance.

@cdrini
Collaborator

cdrini commented Aug 31, 2023

This will largely give you what you want: https://testing.openlibrary.org/search?q=subject_key%3Afiction_horror+language%3Aeng+isbn%3A*+publish_year%3A*+-publisher%3A%22Independently+published%22&mode=everything&editions.sort=new

As json: https://testing.openlibrary.org/search?q=subject_key%3Afiction_horror+language%3Aeng+isbn%3A*+publish_year%3A*+-publisher%3A%22Independently+published%22&mode=everything&editions.sort=new&fields=key,title,subtitle,editions,author_key,author_name,cover_i,number_of_pages_median,editions.isbn,description

Caveats:

  • This limits results to books with an ISBN and shows the latest English edition that has an ISBN.
  • Description is missing from here; you'll have to fetch it from the data dump.
  • This excludes independently published books because their data is usually poor.

What you're asking for is actually complicated, so understanding what you want to do overall would really help a ton!

@cdrini
Collaborator

cdrini commented Aug 31, 2023

Ah, those queries look a little off, actually; they don't seem to always sort the editions correctly. I'll take a look when I get a chance.

@cdrini
Collaborator

cdrini commented Aug 31, 2023

OK, those links should now work as expected.

@ladaniavadh
Author

@cdrini
I'm actually asking you this as well. Here's what I recommend saying:

We're working primarily in the US market, so we would like to accomplish the following:

  1. Show only one edition for each title
  2. Show titles in English
    1b) Show the latest ISBN for that edition

Knowing that, here's my question:
Is there a date associated with each edition, so we can sort by newest?

If so, then we can also filter editions in English and use the ISBN associated with that.

@cdrini
Collaborator

cdrini commented Jun 3, 2024

Yes, the last links I sent you meet all those requirements 👍 Apologies for the delay on this one!

https://openlibrary.org/search?q=subject_key%3Afiction_horror+language%3Aeng+isbn%3A*+publish_year%3A*+-publisher%3A%22Independently+published%22&mode=everything&editions.sort=new

Note:

  • language:eng - this shows only works with an English edition
  • isbn:* - this shows only works with an English edition that also has an isbn
  • editions.sort=new - this uses the date associated with each edition and sorts by newest.

Here it is as JSON: https://openlibrary.org/search.json?q=subject_key%3Afiction_horror+language%3Aeng+isbn%3A*+publish_year%3A*+-publisher%3A%22Independently+published%22&mode=everything&editions.sort=new&fields=key,title,subtitle,editions,author_key,author_name,editions.key,editions.title,editions.subtitle,editions.cover_i,editions.isbn,description
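As a rough illustration of consuming that JSON, the sketch below takes an already-parsed response and keeps one edition per work. The response shape is an assumption: a top-level `docs` list of works, each with a nested `editions.docs` list whose first entry is the edition selected by `editions.sort=new`. It should be checked against a real response before use. The function name and output keys are hypothetical.

```python
def latest_editions(response):
    """Return one row per work, using the first edition in each work's list.

    Assumes the shape {"docs": [{..., "editions": {"docs": [...]}}]}.
    """
    rows = []
    for work in response.get("docs", []):
        editions = work.get("editions", {}).get("docs", [])
        if not editions:
            continue  # no matching edition was returned for this work
        edition = editions[0]
        rows.append({
            "work_key": work.get("key"),
            "title": edition.get("title") or work.get("title"),
            "authors": work.get("author_name", []),
            "edition_key": edition.get("key"),
            "isbns": edition.get("isbn", []),
        })
    return rows
```

The work key serves as the identifier shared by all editions of a book, while the edition key and ISBNs belong to the single edition that is kept.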

I believe all the requirements have been met here, so closing this; but feel free to re-open or open a new issue if you need help with anything else!

@cdrini cdrini closed this as completed Jun 3, 2024