[Feature] Optimizing Stemmer Usage for Efficient Bundling in Libraries #337

Closed
skoropadas opened this issue Apr 3, 2023 · 15 comments

skoropadas commented Apr 3, 2023

Is your feature request related to a problem? Please describe.
Orama includes all stemmers by default.

Describe the solution you'd like
I would like to have the ability to include only the stemmers that are necessary for my application. At the moment, I am only using one stemmer for the English language, but other language stemmers are included in my build. As a result, the orama package takes up approximately 400kb of my bundle size before minification.

Describe alternatives you've considered
I was trying to create my own Map where the key is the language key and value is a dynamic import of a stemmer based on this constant:
https://github.com/oramasearch/orama/blob/590bdc3b2bf2f7d2fd32c35168d607f0924975cc/packages/orama/src/components/tokenizer/languages.ts#L1

But it seems stemmers are not exported, so webpack throws an error. Anyway it would be cool to support this by default and load only stemmers that are needed.
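The map described above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the `Stemmer` type, the `stemmerLoaders` registry, and the `loadStemmer` helper are hypothetical names, not part of Orama. In a real application each loader would be a dynamic `import()` of a stemmer module; inline stand-ins are used here so the sketch runs on its own.

```typescript
// A stemmer maps a token to its stem.
type Stemmer = (word: string) => string;
type StemmerLoader = () => Promise<Stemmer>;

// Hypothetical registry keyed by language. In a real application each value
// would be a dynamic import of a stemmer module, so that the bundler can put
// every stemmer in its own chunk. The toy stemmers below are stand-ins only.
const stemmerLoaders = new Map<string, StemmerLoader>([
  ['english', async () => (word: string) => word.replace(/(ing|ed|s)$/, '')],
  ['italian', async () => (word: string) => word.replace(/(are|ere|ire)$/, '')],
]);

// Cache of in-flight or completed loads, so each stemmer is loaded at most once.
const loadedStemmers = new Map<string, Promise<Stemmer>>();

function loadStemmer(language: string): Promise<Stemmer> {
  let pending = loadedStemmers.get(language);
  if (!pending) {
    const loader = stemmerLoaders.get(language);
    if (!loader) {
      return Promise.reject(new Error(`No stemmer registered for "${language}"`));
    }
    pending = loader();
    loadedStemmers.set(language, pending);
  }
  return pending;
}
```

With this shape, only the languages that are actually requested through `loadStemmer` are ever loaded at runtime, although the bundler still has to emit a chunk for every entry in the registry.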

Additional context
Nothing to add.

@micheleriva
Member

cc. @ShogunPanda

@micheleriva
Member

I agree with this issue, and I was actually thinking the same. It might be worth it to decouple stemmers and stopwords from Orama core for v1.0.0 stable.

@ShogunPanda
Contributor

Stemmers are now exported as stemmers and the dynamic import is gone.
It will go out in beta 10. Closing this.

@skoropadas
Author

@ShogunPanda Sorry, but how does it solve the ticket?

I meant the ability to import each stemmer separately, but if you export an object stemmers from the library that contains all the stemmers, this means that they will all be included in the final application bundle, regardless of how many keys of the object I use.

Possible solutions for the ticket are as follows:

  • Create a separate package with the stemmers that can be imported from @orama/stemmers and passed to the create method, or create a separate folder inside the @orama/orama package from which the stemmers can be imported. However, the @orama/orama package itself should not import these stemmers within its own functions and methods.

  • You can proceed as you did by creating an object stemmers, but then dynamic imports should be used as values for each key, which the user can use to load the stemmer dynamically.

However, the second option is worse than the first because application bundlers do not know which stemmers from the object will be dynamically imported, so the bundler will create separate chunks for each stemmer, even for those that are not used. These chunks will not actually be used or dynamically loaded by the application if they are not used there, but they will be remnants in the final bundle.
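To make the first option concrete, here is a minimal sketch of the idea, assuming a core that receives its stemmer as an argument instead of importing one. The `createTokenizer` function, the `TokenizerOptions` interface, and `toyEnglishStemmer` are hypothetical and exist only for this example; they are not Orama's API.

```typescript
type Stemmer = (word: string) => string;

interface TokenizerOptions {
  // Optional: when omitted, tokens are kept unstemmed.
  stemmer?: Stemmer;
}

// Hypothetical core: it never imports a stemmer itself, so a bundler only
// includes the stemmers that the application imports explicitly.
function createTokenizer(options: TokenizerOptions = {}) {
  const stem = options.stemmer ?? ((word: string) => word);
  return (text: string): string[] =>
    text.toLowerCase().split(/\W+/).filter(Boolean).map(stem);
}

// Stand-in for a stemmer that would live in its own package or folder and be
// imported by the application, not by the core.
const toyEnglishStemmer: Stemmer = (word) => word.replace(/(ing|ed|s)$/, '');

const tokenize = createTokenizer({ stemmer: toyEnglishStemmer });
const tokens = tokenize('Searching indexed documents');
```

Because the stemmer reaches the core only through the options object, an application that imports a single stemmer pulls in a single stemmer, and the bundler has no unused chunks to emit.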

@micheleriva
Member

micheleriva commented Apr 21, 2023

@skoropadas we’re open to accepting contributions that solve this issue. It looks like you have given it enough thought to provide a satisfying solution, and I agree with your points.

Orama is a monorepo and we would appreciate a PR to add a new package containing the stemmers.

Thanks a lot

@ShogunPanda
Contributor

Now that stemming is disabled by default, we could in theory release each stemmer as a separate package (which should be easy given that we are in a monorepo). This way only the required stemmer is included, and bundlers won't have to mess with the others too much.

@micheleriva WDYT?

@micheleriva
Member

@ShogunPanda let’s catch up and take this offline early next week

@skoropadas
Author

@ShogunPanda @micheleriva Hey guys, do you have any updates on this feature? Or are you planning to reopen this ticket?

Sorry, I don't have much free time: I'm the solo developer and maintainer of a fairly large open-source project for creating documentation. I'm asking so that I know whether I should wait for you to implement this or try to implement it myself.

Just after switching from lunr to orama, the size has increased significantly, and users of my library want to have the smallest bundle size possible because it affects performance and sometimes hosting costs.

@ShogunPanda
Contributor

@skoropadas I plan to do that either tomorrow or Friday (I'm in PDT time zone at the moment).

@skoropadas
Author

@ShogunPanda got it! Thanks for the lightning-fast response! :)

@ShogunPanda
Contributor

@skoropadas This is now implemented in #376. Hope to land it soon.

@skoropadas
Author

@ShogunPanda looking forward to trying it, thank you!

@ShogunPanda
Contributor

@skoropadas Beta 16 is out, go grab it!

@skoropadas
Author

@ShogunPanda sorry, I couldn't check it earlier because I was on vacation. It looks nice. I have one issue with types: TypeScript says that it cannot find them, but the stemmer is imported correctly if I use @ts-ignore, so I will leave this issue for my users.

@micheleriva
Member

Hi @skoropadas ,
this looks like a different problem, would you mind opening a separate issue for that?

Thanks
