Add LCC and Dewey decimal numbers to solr in April solr reindex #3290
Comments
Sample data as stored in OL:
|
Great starting point! For (my personal) reference: I notice some of these have [Fic] or [E]. Because of this being less reliable (like there are multiple E's), I'm assuming that the LOC would be the better ID to start with (i.e. more specific), especially since those books do have it. I'm assuming it'll be both for each edition in the end, so I won't worry. |
This seems super low priority. I assume that the MANY other Solr bug fixes and feature requests will be waaayyy ahead of this. |
@tfmorris Here's my assumption of it: this first step is small, but what comes next is important when genres/sub-genres could be attached to editions - this'll clean up and help with the subject pages. |
@finnless suggested that I look at https://github.com/thisismattmiller/lcc-pdf-to-json which made it easy for me to create a lc_classifier function:

    def lcc_to_subject(lcc: str) -> str:
        """
        >>> lcc_to_subject("ZA3201")
        'Information superhighway'
        """

The output could be |
@cclauss That looks awesome! Hmmm, we should display these on pages where we have an LCC; maybe something like this? Then once these are in solr, we can make each level clickable, leading to the search page 😍 . But I don't think this needs to be blocked by that happening. They still add value + improve SEO even if they're just text! |
@cdrini I spoke with @cclauss and thought up a kind of new idea/way of thinking about it. We could do both my idea and your format, that's fine. I just want to describe mine and how it'll look with yours: |
@BrittanyBunk and I slacked on this 12 hours ago and she proposed the same two-letter thing. The letters A, D, and J threw me because that table does not provide single-letter meanings but she provided them to me. So I will propose a new PR that shows us how to get the first three classifications so we see how it looks and then we can choose whether to use just the two letters or letters plus numbers. |
@cclauss ok. I see why you're running into issues. The site you showed me is incomplete (as it's used for Dewey Decimal conversions, and Dewey Decimal is not as robust as the LoC). The official one to use is complete. This equivalent should be the complete version to use (I'd just download it to a doc just in case it gets changed) (although it might need to be double checked just in case). |
Long keys: [DAW, DJK, KBM, KBP, KBR, KBS, KBT, KBU, KD/KDK, KDZ, KJ-KKZ, KL-KWX, KU/KUQ] That parses to 230 records:
|
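The one- to three-letter keys discussed above suggest a longest-prefix lookup: take the leading letters of a call number, then report the top-level class plus the most specific subclass that has an entry. The sketch below assumes a tiny hypothetical `SAMPLE_CLASSES` table with shortened names, not the 230 parsed records. It deliberately does not treat every shorter prefix as a parent, since a three-letter key such as DAW is not a child of DA.

```python
import re
from typing import List, Tuple

# Hypothetical excerpt with shortened names; the real mapping would be the
# parsed LCC outline records mentioned in the thread.
SAMPLE_CLASSES = {
    "D": "World History",
    "DA": "Great Britain",
    "DAW": "Central Europe",
}

def class_path(lcc: str) -> List[Tuple[str, str]]:
    """Return [(top-level class), (longest matching subclass)] for a call number."""
    match = re.match(r"[A-Z]{1,3}", lcc.strip().upper())
    if match is None:
        return []
    letters = match.group(0)
    path = []
    top = letters[0]
    if top in SAMPLE_CLASSES:
        path.append((top, SAMPLE_CLASSES[top]))
    # Try the longest key first so "DAW" wins over "DA".
    for end in range(len(letters), 1, -1):
        key = letters[:end]
        if key in SAMPLE_CLASSES:
            path.append((key, SAMPLE_CLASSES[key]))
            break
    return path
```

A call number whose subclass is missing from the table, such as `DJK5` here, falls back to just the top-level class.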
Cool! So now that we have that, we could use this for the DDC too! I tried to create an excel with LCC -> DDC and vice versa, but didn't get far enough. Maybe it could be coded, but here's the start: https://drive.google.com/file/d/1Yu-srlXD_FcUUTRV9lwseXrR7qEsNQ9a/view?usp=sharing |
UI-wise, let's start with just |
Agreed. Let’s also get LC classes working smoothly & consistently before also doing DDC. The .pdf you added is great but highlights the complexity of getting it right. |
Baby steps :) To quote one of my new favourite laws (thanks @LeadSongDog !)
|
@cdrini Just so we're all on the same page, you mean what's on https://www.loc.gov/catdir/cpso/lcco/ only right? Like https://openlibrary.org/books/OL103608M/Johann_Wolfgang_Goethe_Faust-Dichtungen. would show "Language and Literature" or what's in the image you posted? |
I mean the classes section displayed here: #3290 (comment) |
Outline is the title of that specific page because the contents of the page are an outline of the LC Classification system. Let's make the heading what it is: "Library of Congress Classification" or "LC Classification." It would be nice to include the call number itself, similar to what is displayed on the LC catalog record as "browse by shelf order", with the added functionality of the class names themselves as links. See https://catalog.loc.gov/vwebv/holdingsInfo?searchId=33155&recCount=25&recPointer=0&bibId=21468183, about half-way down the record. |
@seabelis thanks for helping me out.
|
You are showing this at the edition level. I understood this was going to be used at the work-level. |
I didn't see it being there unless we're calling it a 'genre'. However, since it's a classification based on a classification number of a book, this seems to be a better place. I will double check on this right now. |
Why? This is not a genre. This is a classification. |
Exactly! That's why it goes underneath the classification tab, as each edition has a different call number. Even though most are in the same category as each other, this is just in case they aren't: |
I can't find one right now, so I wouldn't know where would be better - the works page or under Classifications.
|
Yes, it is a little awkward since the LCC is stored on the edition (and can vary!). But I think we should display it on the work near the subjects section, because I think that will make it easier to find for non-librarian users. I chose the word "classes" instead of "Classifications" or "LCC" also hoping that might be easier for novice users. When DDC are eventually added, they can also appear in the "classes" section. I think we need to wait for some ui demos to be implemented to see how these look and feel before we can make a final decision :) |
@cdrini Having it on the works page should be fine, as the LCC should be the same for all the editions - they all should be of the same topic. I didn't use 'classes' as it's a combination of classes and subclasses (so I got confused), but I see what you mean. I'll wait until those are finished then before proceeding further. |
@cdrini There are 48 open Solr bugs, some over a decade old. Do none of them meet your criteria? If you want help choosing, I'll suggest #178 which is small, self-contained, HUGELY impactful, and just over a decade old, having been first reported March 13, 2010. By simply changing the definition of a single field, the author's name, users will now be able to find this record with 71 works for René-Aubert Vertot rather than this orphan with a single work when they search for Rene Vertot. If you search for Renee Shann, you won't find ANY of the 107 works that OpenLibrary has cataloged. If a librarian like @seabelis wanted to merge the 9 different Rene Char records, they'd need to search twice and then stitch the results together by hand. In the face of all this, and the myriad other Solr issues, we're going to invent an entirely new, never before requested, issue to waste time on? That is doing our patrons a HUGE disservice. |
@tfmorris I don't like getting involved in other people's discussions, but some things are important to say. This github issue that @cdrini's working on is something that's been going on for a while and requires a lot of people's help and right now's the moment that the resources are here. Also, doing this helps with future developments. It's an infrastructure that will make books easier to find - that includes the #178 you mentioned. Drini mentioned in the community meeting that reindexing the solr project is going to fix the inability to search by non-English characters, so it seems a little misguided. I would read up on the community meeting notes, especially 3-31-20 - which shows what I mean. |
Sorry @cdrini for continuing after you said you wanted to focus on getting the indexing right, but since the labeling was discussed in the meeting, I just wanted to give another input on this: I realize that 'LCC titles' might be more appropriate than 'Library of Congress Classification'. The reason is that the call number is already called that on the OL and the LCCO page says that it's letters (and I'm assuming numbers) and 'titles' of an LC classification (first sentence) and the LoC calls the call number the LC classification (although they are inconsistent on some pages). @seabelis @cclauss |
@BrittanyBunk Thanks Brittany! Reindexing is one of the blockers for allowing us to work on #178; it unfortunately doesn't impact the issue itself; that must've been a typo in the notes. @tfmorris Yep; I'm aware of that issue. I believe updating to solr 8 (#3317) is more important (which is why I've also taken that up in my milestone for this month). Trying to fix #178 before #3317 would require investing time into installing solr 3.6 specific plugins / config, all of which would have to get redone once we do #3317. We've had this discussion before; one of my first PRs on openlibrary was a fix to #178, #599 ; So I'm fully aware of how important that issue is. We decided that although using ASCII Folding (which is what I did in #599 ) was an improvement, it wasn't that great for non-English languages, and that ICUFolding Filter (as you've done in your solr PR) was most correct ( See #599 (comment) ). This filter requires us to add plugins to solr (which I even did on a branch off #599). But I think adding plugins to solr in 3.6 would be a waste of time, since my guess would be the plugin flow has changed. I worked on and completed the first issue that was blocking #178 (re-indexable solr), and am planning on the second which is ~blocking #178 (#3317). I am also working on this current issue, because it addresses issues brought up in one of the community calls, it made a lot of people (myself included) excited, provides infrastructure which will allow for a whole host of features that will improve the user experience, because it allows us to take advantage of a librarian standard carefully curated data field, because it further tests the solr re-index flow, and because it showcases the importance of a re-indexable solr to people who might not realize how important it is. I apologize if that wasn't clearly communicated, but I wish you would be a little less quick to jump to accusations. 
We have been and are aligned on a lot of the same overall goals, and I am making progress towards them. |
I'm happy to accept #599 as a temporary patch until we get ICUFolding with solr 8; does that seem reasonable? I can include that in the May 1 solr reindex. |
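To make the folding discussion concrete, the sketch below shows the general idea of accent folding in plain Python: decompose each character, drop the combining marks, and casefold. It is only an approximation for illustration. It is not Solr's ASCII folding or ICU folding filter, and the function name `fold_accents` is hypothetical.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Strip combining marks and casefold, so 'René' and 'Rene' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

Applying the same function to both the indexed author name and the query is what lets a search for "Rene Vertot" match "René-Aubert Vertot" on the accented token. Characters that have no decomposition are left unchanged by this sketch, which is one reason a fuller folding filter was preferred in the discussion above.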
@cdrini Maybe the notes can be fixed? Also, how come you're trying to work on #718, which is already set up and closed? I would like to set up an enthusiast and beta tester's (EBT) wiki, but it's difficult with the current situation lol. Guess I'll wait until the Solr reindex, data dump, and series are finished before starting. If everything's set up, newcomers can move on to the next steps with the tools they need and not worry about anything unnecessary. It's really awesome to see everything come together, step-by-step. I would share my vision of the EBT page, but I didn't set it up yet. If anyone wants me to, let me know. |
Ahhh, I meant #178 :P Fixed. I'll fix the notes 👍 |
@cdrini Wow! Clear. I agree - it does make sense to move forward in order to address what's in the back to keep up. That's how I've done it in my life, so it works - I think you got a great plan and look forward to the changes :) Thanks for helping with the corrections. |
I created a new issue for getting the LCC class names from the LCC, since this current issue is going to be closed soon :) See #3396 |
Closed by the PRs mentioned here. Unfortunately it's still living on dev.openlibrary.org , but here's a nice little demo:
|
As discussed in the community call this past Tuesday, we would like to try implementing some sort of beta interface that lets users explore the LoC classification (or maybe Dewey Decimal) in openlibrary. See https://www.loc.gov/catdir/cpso/lcco/ . The first step of this would be to store the data in solr (which it currently isn't; e.g. http://server.openjournal.foundation:8984/solr/select/?q=key%3A%2Fworks%2FOL3773057W&version=2.2&start=0&rows=10&indent=on vs https://openlibrary.org/books/OL2543776M/Course_Design ).
Describe the problem that you'd like solved
loc:[BC1 TO BC199]; dewey_decimal:[070 TO 079]
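A range query like the one above only behaves as expected if the indexed values sort in shelf order. Raw call numbers do not: compared as plain strings, "BC1500" falls between "BC1" and "BC199". One possible approach, sketched below, is to index a padded sort key. The key format and the function name `lcc_sort_key` are assumptions for illustration; this thread does not say how the eventual PRs normalized the values.

```python
import re
from typing import Optional

def lcc_sort_key(lcc: str) -> Optional[str]:
    """Build a string key whose lexicographic order follows LCC class order.

    Letters are padded to 3 characters with '-' (which sorts before letters
    and digits), the whole number is zero-padded to 4 digits, and the decimal
    part is right-padded to 6 digits. Cutter numbers are ignored here.
    """
    match = re.match(r"^([A-Z]{1,3})\s*(\d{1,4})(?:\.(\d+))?", lcc.strip().upper())
    if match is None:
        return None
    letters, whole, frac = match.group(1), match.group(2), match.group(3) or ""
    return f"{letters:-<3}{whole:0>4}.{frac:0<6}"
```

With these keys, `lcc_sort_key("BC20")` sorts between the keys for BC1 and BC199, while the key for BC1500 sorts after both, so a string range over the keys matches the numeric range the query intends.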
Proposal & Constraints
Additional context
lc_classifications and dewey_decimal_class
Stakeholders
@cclauss @finnless @tfmorris