Enable synonyms for scientific terminology #16

Open
Tracked by #18
patrick-austin opened this issue Jan 21, 2022 · 0 comments


Lucene supports synonym files, which can be used for both alternate spellings (e.g. "ionisation" vs "ionization") and scientific terms which are equivalent (such as chemical symbols to element names).

Synonyms can be applied at index time, at search time, or both. Injecting synonyms at search time seems to be the expected approach (https://lucene.apache.org/core/8_5_1/core/org/apache/lucene/analysis/package-summary.html?is-external=true).

The Solr format for synonym files is also supported by Elasticsearch, so our synonym files should remain compatible if we move to that.
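
For reference, a rough sketch of what the Solr synonym format looks like and how search-time expansion would behave. This is plain Python for illustration, not Lucene's API; the sample rules and the names `parse_solr_synonyms` and `expand_query` are made up for this example, and multi-word synonyms are ignored.

```python
from collections import defaultdict

# Hypothetical sample rules in the Solr synonym format:
#   "a, b"    equivalent terms (each expands to the whole group)
#   "a => b"  explicit one-way mapping (a is replaced by b)
#   "# ..."   comment
SAMPLE_RULES = """\
# alternate spellings
ionisation, ionization
# chemical symbol to element name (one-way)
fe => iron
"""


def parse_solr_synonyms(text):
    """Parse Solr-format rules into {term: set of replacement terms}."""
    mapping = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = {t.strip() for t in right.split(",") if t.strip()}
            for source in left.split(","):
                source = source.strip()
                if source:
                    mapping[source] |= targets
        else:
            group = {t.strip() for t in line.split(",") if t.strip()}
            for term in group:
                mapping[term] |= group
    return dict(mapping)


def expand_query(tokens, mapping):
    """Search-time expansion: each token becomes a sorted list of alternatives."""
    return [sorted(mapping.get(tok, {tok})) for tok in tokens]


SYNONYMS = parse_solr_synonyms(SAMPLE_RULES)
```

With these rules, a query for "ionisation" would also match documents containing "ionization", without needing to reindex when the synonym file changes.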

The other thing to consider is where in the current analyzer chain we put the synonym injection. To recognise chemical symbols, it might be useful to keep upper case letters until after the synonym step? However, this might be problematic going the other way, from element names to symbols, since hydrogen, Hydrogen and HYDROGEN are all equally valid.
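
One possible way round the case problem, sketched in plain Python rather than as a Lucene analyzer: match symbols against the original token with exact case, and match element names against the lower-cased token. The lookup tables and the `analyze` function are hypothetical, and the sketch assumes the indexed terms are all lower case.

```python
# Hypothetical lookup tables, for illustration only.
SYMBOL_TO_NAME = {"H": "hydrogen", "He": "helium", "Fe": "iron"}  # case-sensitive keys
# Assumes indexed terms are lower-cased, so symbols are emitted in lower case.
NAME_TO_SYMBOL = {name: symbol.lower() for symbol, name in SYMBOL_TO_NAME.items()}


def analyze(tokens):
    """Return, for each token, the lower-cased terms to search for."""
    result = []
    for token in tokens:
        lowered = token.lower()
        terms = [lowered]
        if token in SYMBOL_TO_NAME:
            # Stage 1: exact-case match, so "He" is helium but "he" is not.
            terms.append(SYMBOL_TO_NAME[token])
        elif lowered in NAME_TO_SYMBOL:
            # Stage 2: case-insensitive match for element names.
            terms.append(NAME_TO_SYMBOL[lowered])
        result.append(terms)
    return result
```

Here "He" expands to helium while "he" is left alone, and hydrogen, Hydrogen and HYDROGEN all expand the same way. The trade-off is that the symbol itself is still lower-cased in the output, so a lower-cased index cannot distinguish the symbol "he" from the pronoun.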
