Add solr support for synonyms for numbers/abbreviations #6635
Comments
I think the solution for this would be to make use of Solr's synonyms feature, but some experimenting and investigation is needed. Anyone who has some time to experiment with adding synonyms to Solr, please do!
@cdrini I would like to do it. How can I get started?
The search is strict not only with terms, but also with letters. Compare Безпека життєдіяльності and Безпека життєдіяльност. With just one letter missing (і), there are no results.
So this is a Solr research task. Here are some of the places that will need modifications: the Solr schema, which defines the various types of text fields, has synonyms enabled, but only at query time: openlibrary/conf/solr/conf/managed-schema Lines 426 to 467 in 82bc2f6
This blog post has some info: https://library.brown.edu/create/digitaltechnologies/using-synonyms-in-solr/
In a nutshell, we need to add synonyms to https://github.com/internetarchive/openlibrary/blob/ccabd95be2a82c4f79d94b1f10e46ea1d3c5c730/conf/solr/conf/synonyms.txt and then test locally with a full reindex (see https://github.com/internetarchive/openlibrary/wiki/Solr#making-changes-to-solr-config ).
For numbers, the synonyms probably need to be in English only for now. I'm not sure how we should handle non-English numbers. Ideally we'd want different synonyms files for different user locales, but I'm not sure if or how this can be done in Solr.
But we can definitely add something like
Actually it looks like the synonyms file is working! You can see the So adding volume should be easy enough!
@bicolino34 Your issue would probably be handled by Solr's spell checking features: showing something like "Did you mean?" when a user's query is close to, but not exactly, a correct term. Would you mind creating a separate issue to add support for "Did you mean?" That will require a different approach on the Solr side, but would help users a ton!
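To illustrate the "Did you mean?" idea, here is a minimal sketch. It does not use Solr's spell checking at all; it swaps in Python's standard `difflib` as a stand-in, and the title list, the `did_you_mean` helper, and the 0.8 cutoff are all assumptions made up for this example.

```python
from difflib import get_close_matches

# Hypothetical list of indexed titles (an assumption for illustration only).
INDEXED_TITLES = [
    "Безпека життєдіяльності",
    "The Walking Dead Compendium Four",
]


def did_you_mean(query, titles=INDEXED_TITLES, cutoff=0.8):
    """Return the most similar indexed title, or None if none is close enough.

    Similarity is difflib's character-based ratio, not Solr's spell checker.
    """
    matches = get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# The query with the final letter missing still finds a suggestion.
suggestion = did_you_mean("Безпека життєдіяльност")
```

With this sketch, the one-letter-short query from the comment above would be offered the full title as a suggestion instead of returning nothing.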
I've been using the website for a long time now, and one of my biggest gripes is how searching works. When searching for books in OpenLibrary, you often need to type exactly the correct title. This means that if a book title spells out numbers as words (One, Two, Three, etc.), searching for the same title with digits (1, 2, 3, etc.) gives no results.
Another example: if a book uses "Vol." in the title, searching for "volume" returns no results even though they mean the same thing. This makes finding specific books a lot more difficult.
Describe the problem that you'd like solved
The search engine searches exact terms, but it should have tolerance when dealing with numbers or words of equivalent meaning.
Here's an example:
I would like searching "The Walking Dead Compendium Four" and "The Walking Dead Compendium 4" to find the book.
Proposal & Constraints
The search engine should treat words with the same meaning as equivalent.
"Vol." should be the same as writing "Volume"
"Two" should be the same as writing "2" or "II"
"&" and "and" should also be interchangeable.
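The equivalences listed above can be sketched in a few lines of Python. This is only an illustration of the desired behavior, not how Solr implements synonyms; the group list, the `CANONICAL` mapping, and the `normalize` helper are hypothetical names invented for this example.

```python
import re

# Hypothetical synonym groups based on the examples in this issue.
SYNONYM_GROUPS = [
    {"vol", "volume"},
    {"two", "2", "ii"},
    {"four", "4"},
    {"and", "&"},
]

# Map every term to one canonical representative of its group
# (the smallest string in the group, so the choice is deterministic).
CANONICAL = {term: min(group) for group in SYNONYM_GROUPS for term in group}


def normalize(title):
    """Lowercase and tokenize a title, replacing each token with its canonical synonym."""
    # Keep "&" as its own token; other punctuation such as "." is dropped.
    tokens = re.findall(r"&|\w+", title.lower())
    return [CANONICAL.get(token, token) for token in tokens]


# Both spellings of the title reduce to the same token list.
same = normalize("The Walking Dead Compendium Four") == normalize(
    "The Walking Dead Compendium 4"
)
```

If both the indexed title and the query pass through the same normalization, "Vol. Two" and "Volume II" compare as equal, which is the tolerance this proposal asks for.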
Additional context
Another example, but with "vol" and "volume"