[Feature] Optimizing Stemmer Usage for Efficient Bundling in Libraries #337
Comments
cc. @ShogunPanda
I agree with this issue, and I was actually thinking the same. It might be worth it to decouple stemmers and stopwords from Orama core for v1.0.0 stable.
Stemmers are now exported as
@ShogunPanda Sorry, but how does it solve the ticket? I meant the ability to import each stemmer separately, but if you export an object Possible solutions for the ticket are as follows:
However, the second option is worse than the first: application bundlers cannot know which stemmers from the object will be dynamically imported, so they create a separate chunk for every stemmer, including the unused ones. The application never loads those unused chunks, but they still remain in the final bundle output.
@skoropadas we’re open to accepting contributions to solve this issue. It looks like you have given it enough thought to provide a satisfying solution, and I agree with your points. Orama is a monorepo, and we would appreciate a PR that adds a new package containing the stemmers. Thanks a lot
Now that stemming is disabled by default, we could in theory release each stemmer as a separate package (which should be easy given we are in a monorepo). This way only the required stemmer is included, and bundlers won't have to mess with them too much. @micheleriva WDYT?
@ShogunPanda let’s catch up and take this offline early next week
@ShogunPanda @micheleriva Hey guys, do you have any updates on this feature? Or are you planning to reopen this ticket? Sorry, I don't have much free time: I'm the solo developer and maintainer of a fairly large open-source project for creating documentation. I'm asking so I can understand whether I should wait for you to implement it or try to implement it myself. Just after switching from
@skoropadas I plan to do that either tomorrow or Friday (I'm in the PDT time zone at the moment).
@ShogunPanda got it! Thanks for the lightning-fast response! :)
@skoropadas This is now implemented in #376. Hope to land it soon.
@ShogunPanda looking forward to trying it, thank you!
@skoropadas Beta 16 is out, go grab it!
@ShogunPanda sorry, I couldn't check it earlier, I was on vacation. It looks nice, but I have an issue with types: TypeScript says that it cannot find the types, although it does import the stemmer if I use
Hi @skoropadas, Thanks
Is your feature request related to a problem? Please describe.
Orama includes all stemmers by default.
Describe the solution you'd like
I would like to have the ability to include only the stemmers that are necessary for my application. At the moment, I am only using one stemmer for the English language, but other language stemmers are included in my build. As a result, the orama package takes up approximately 400 KB of my bundle size before minification.
Describe alternatives you've considered
I tried to create my own Map where each key is a language key and each value is a dynamic import of the corresponding stemmer, based on this constant:
https://github.com/oramasearch/orama/blob/590bdc3b2bf2f7d2fd32c35168d607f0924975cc/packages/orama/src/components/tokenizer/languages.ts#L1
But it seems the stemmers are not exported, so webpack throws an error. In any case, it would be great to support this by default and load only the stemmers that are needed.
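For illustration, the kind of Map I had in mind looks roughly like the sketch below. Everything in it is an assumption made for the example: the Stemmer type, the loaders object, and the loadStemmer helper are hypothetical names, and the toy suffix-stripping functions stand in for real stemmers. In a real build each loader would be a dynamic import() with a literal module path, which is exactly what requires the stemmers to be exported.

```typescript
// Assumed shape of a stemmer for this sketch: word in, stem out.
type Stemmer = (word: string) => string;
type StemmerLoader = () => Promise<Stemmer>;

// Hypothetical loaders. In a real application each entry would be something
// like () => import("<stemmer module>").then((m) => m.stemmer), where the
// module path and export name are placeholders, not Orama's actual layout.
// The toy functions below only strip a few suffixes so the sketch can run.
const loaders: Record<string, StemmerLoader> = {
  english: async () => (word: string) => word.replace(/(ing|ed|s)$/, ""),
  italian: async () => (word: string) => word.replace(/(are|ere|ire)$/, ""),
};

// Cache the pending promise so each stemmer is requested at most once.
const cache = new Map<string, Promise<Stemmer>>();

function loadStemmer(language: string): Promise<Stemmer> {
  const cached = cache.get(language);
  if (cached) return cached;

  const loader = loaders[language];
  if (!loader) {
    return Promise.reject(new Error(`No stemmer registered for "${language}"`));
  }

  const pending = loader();
  cache.set(language, pending);
  return pending;
}
```

With this shape, an application lists only the languages it actually needs in loaders, so the bundler only has to emit chunks for those entries.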
Additional context
Nothing to add.