ASIN-only imports from Amazon include inappropriate and duplicate sellers' items #2674
Comments
Sounds very similar to #709
Thanks for reporting this one @seabelis!
What I'm finding with a lot of these ASINs (and others in the data) is that they 404; these seller ASINs do not seem to be persistent IDs at all. This one is found: I was hoping to find a categorization that indicates that this is not a single book, but it does seem lumped in with books.
Why are ASIN-only items being imported? What is the expectation? It seems there is very little good data to be gotten. I may be incorrect about this, but it seems like most of the data will be third-party seller items that don't have any ISBN or a suitable corresponding ISBN in the Amazon catalog; how are duplicates checked on these? Usually the third-party items that fall into this category have limited or inaccurate information. As I have seen on GR, these items get imported over and over if the seller re-lists. ASINs also correspond to Kindle and Audible items, but those DO have corresponding ISBNs and should not create new items in the catalog unless the import checks for, or includes, the ISBN in the new record.
@hornc
I noticed that most of them were 404ing as well. How did these ASINs even make it into our import queue? The Steinbeck books item above is a) below #20,000,000 on the top sellers list, so I can't imagine anyone is linking to it and b) has a title that doesn't match any of his works. We should reject items like this if they make it into the queue, but they shouldn't even be in the import queue in the first place.
@seabelis, there is an intent to import pre-ISBN physical books with an ASIN, e.g. https://www.amazon.com/Greek-studies-Gilbert-Murray/dp/B0007JAFEA. I have not been able to determine a way to tell the difference between an ebook and a pre-ISBN book (I don't think there is a way). I think we do want to import:
but we do not want to import seller-bundled items like the examples in this issue report.
@hornc I think the issue with ASINs is that they are just Amazon catalog numbers; they don't indicate that something is a book or a unique book. What is the expectation for the imported Kindle and Audible items; will they be imported with their ISBNs? Even Goodreads has separate items for Kindle (by ASIN) and their corresponding ebook records (by ISBN). I'm not sure if this is a marketing choice or because they cannot import the ISBN for Kindle items. I don't think it's useful for Open Library to mirror the Amazon catalog, though Goodreads has a clear incentive to do so. The provided example is a third-party seller item; these usually have low-quality or incomplete data, so what is the benefit of importing them?
A few seconds on WorldCat found that edition and numerous others, including translations into Spanish and French. If an AMZ record has title, year, publisher, and author, it should be straightforward to find the matching OCLC entry and get some more reliable catalogue data to work with, vastly improving the entry. Otoh, when there is no match for these basics in WorldCat, the odds that AMZ has it correct dwindle into insignificance. Absent a match we should ignore the AMZ entry.
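As a rough illustration of the rule suggested above (only import an ASIN-only record when it carries title, year, publisher, and author and an external catalogue lookup confirms it), here is a minimal Python sketch. Everything in it is hypothetical: `AsinRecord`, `has_basic_fields`, `should_import`, and the `find_catalog_match` callback are made-up names, not part of the Open Library importer or any OCLC API, and the actual lookup is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AsinRecord:
    """Hypothetical shape of an ASIN-only record from the import queue."""
    asin: str
    title: Optional[str] = None
    authors: Tuple[str, ...] = ()
    publisher: Optional[str] = None
    publish_year: Optional[int] = None


# The "basics" named in the comment above: title, year, publisher, author.
REQUIRED_FIELDS = ("title", "authors", "publisher", "publish_year")


def has_basic_fields(rec: AsinRecord) -> bool:
    """True only if every basic field is present and non-empty."""
    return all(getattr(rec, name) for name in REQUIRED_FIELDS)


def should_import(
    rec: AsinRecord,
    find_catalog_match: Callable[[AsinRecord], Optional[str]],
) -> bool:
    """Accept a record only if it has the basics AND an external match.

    `find_catalog_match` is an assumed callback that returns some
    identifier for a matching external catalogue entry, or None.
    """
    if not has_basic_fields(rec):
        return False  # seller listings with sparse metadata stop here
    return find_catalog_match(rec) is not None
```

Passing the lookup in as a callback keeps the filter independent of whichever catalogue is consulted, so the rejection rule can be exercised with a stub before any real matching exists.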
We already have that edition: https://openlibrary.org/books/OL26546996M/Greek_studies Actually we have a 1948 printing of the 1947 reprint edition, but it's cataloged as being published in 1946. What are the odds that Amazon will have a good quality catalog record for an item that exists nowhere else?
As I understand it, Amazon assigns an ASIN to each item being sold; this does not have to be a single book nor does it have to be sets of books that were originally published as a set. I have recently noticed imports of seller-created bundles that are not appropriate for the OpenLibrary catalog. Such items should be excluded from imports if possible.
Relevant url?
List: https://openlibrary.org/people/seabelis/lists/OL143669L/Bad_ASIN_imports
This looks like it is legitimately a set, but does it need to be represented as such in the catalog? I think records for the individual volumes are sufficient. In any case, it was imported from multiple sellers. https://openlibrary.org/works/OL20110771W
Details
Proposal & Constraints
Related files
Stakeholders
@hornc