Enable synonyms for scientific terminology #16
patrick-austin added a commit that referenced this issue on Jan 24, 2022
patrick-austin added a commit that referenced this issue on Feb 2, 2022
patrick-austin added a commit that referenced this issue on Feb 16, 2022
patrick-austin added a commit that referenced this issue on Feb 23, 2022
patrick-austin added a commit that referenced this issue on Feb 23, 2022
patrick-austin added a commit that referenced this issue on Jul 7, 2022
patrick-austin added a commit that referenced this issue on Aug 17, 2022
patrick-austin added a commit that referenced this issue on Oct 17, 2022
patrick-austin added a commit that referenced this issue on Sep 6, 2023
Add synonym injection on search #16
Lucene supports synonym files, which can be used for both alternate spellings (e.g. "ionisation" vs "ionization") and scientific terms which are equivalent (such as chemical symbols to element names).
Synonyms can be applied at index time, at search time, or both. Injecting them at search time seems to be the expected approach (https://lucene.apache.org/core/8_5_1/core/org/apache/lucene/analysis/package-summary.html?is-external=true).
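To make the search-time option concrete, here is a minimal plain-Java sketch of the idea. It does not use Lucene's API: the class `SearchTimeSynonyms`, its methods and the hard-coded synonym table are all hypothetical and only illustrate that the index stays unchanged while each query term is widened to its synonyms.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java illustration only; this is not Lucene's API. */
final class SearchTimeSynonyms {
    // Hypothetical synonym table; a real one would be loaded from a synonym file.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "ionisation", List.of("ionization"),
        "ionization", List.of("ionisation"));

    /** Expands one query term into itself plus its synonyms. The index is untouched. */
    static Set<String> expandTerm(String queryTerm) {
        Set<String> expanded = new LinkedHashSet<>();
        expanded.add(queryTerm);
        expanded.addAll(SYNONYMS.getOrDefault(queryTerm, List.of()));
        return expanded;
    }

    /** A document matches if it contains the query term or any of its synonyms. */
    static boolean matches(Set<String> documentTerms, String queryTerm) {
        for (String term : expandTerm(queryTerm)) {
            if (documentTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }
}
```

With this approach a document indexed with only "ionization" is still found by a query for "ionisation", and the synonym table can change without reindexing.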
The Solr format for synonym files is also supported by Elasticsearch, so our synonym files should remain compatible if we move to that.
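For reference, the Solr format has two kinds of line: comma-separated groups of equivalent terms (`ionisation, ionization`) and explicit mappings (`H => hydrogen`), with `#` starting a comment. The sketch below is a simplified plain-Java reader for those two forms, not the parser Lucene or Elasticsearch actually use. The class name `SolrSynonymSketch` is hypothetical, and multi-word terms and escaping are ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified reader for Solr-style synonym lines; illustration only. */
final class SolrSynonymSketch {
    /** Maps each source term to the set of terms it should be expanded or rewritten to. */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> table = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.contains("=>")) {
                // Explicit mapping: every term on the left maps to the terms on the right.
                String[] sides = line.split("=>", 2);
                List<String> targets = splitTerms(sides[1]);
                for (String source : splitTerms(sides[0])) {
                    table.computeIfAbsent(source, k -> new LinkedHashSet<>()).addAll(targets);
                }
            } else {
                // Equivalent group: every term maps to the whole group, itself included.
                List<String> group = splitTerms(line);
                for (String term : group) {
                    table.computeIfAbsent(term, k -> new LinkedHashSet<>()).addAll(group);
                }
            }
        }
        return table;
    }

    /** Splits a comma-separated list and trims each entry. */
    static List<String> splitTerms(String csv) {
        List<String> terms = new ArrayList<>();
        for (String part : csv.split(",")) {
            String term = part.trim();
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }
}
```

Calling `SolrSynonymSketch.parse(List.of("ionisation, ionization", "H => hydrogen"))` gives a table in which both spellings expand to each other, while "H" maps one way to "hydrogen".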
The other thing to consider is where in the current analyzer chain the synonym injection should go. Keeping upper-case letters might help to recognise chemical symbols. However, case sensitivity could be a problem in the other direction, since hydrogen, Hydrogen and HYDROGEN are all equally valid spellings of the element name.
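One possible ordering, sketched in plain Java rather than as a real Lucene analyzer: look up chemical symbols against the original case first, then lowercase every token. The class `CaseAwareSynonyms` and its two-entry symbol table are hypothetical, and whitespace tokenisation stands in for whatever tokenizer the analyzer really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustration of "symbol synonyms before lowercasing"; not a Lucene analyzer. */
final class CaseAwareSynonyms {
    // Hypothetical case-sensitive table from chemical symbol to element name.
    static final Map<String, String> SYMBOL_TO_NAME = Map.of(
        "H", "hydrogen",
        "He", "helium");

    /** Emits each token lowercased, plus the element name if the token is an exact-case symbol. */
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input produces no tokens
            }
            out.add(token.toLowerCase(Locale.ROOT));
            String name = SYMBOL_TO_NAME.get(token); // exact-case lookup, so "he" is not helium
            if (name != null) {
                out.add(name);
            }
        }
        return out;
    }
}
```

Here "He" gains the synonym "helium" but the word "he" does not, and hydrogen, Hydrogen and HYDROGEN all end up as the same lowercased term. The reverse direction (name to symbol) is not handled, because a lowercased symbol such as "h" would match far too much.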