[Feature] Optimizing Stemmer Usage for Efficient Bundling in Libraries #337

skoropadas · 2023-04-03T16:02:25Z

Is your feature request related to a problem? Please describe.
Orama includes all stemmers by default.

Describe the solution you'd like
I would like to have the ability to include only the stemmers that are necessary for my application. At the moment, I am only using one stemmer for the English language, but other language stemmers are included in my build. As a result, the orama package takes up approximately 400kb of my bundle size before minification.

Describe alternatives you've considered
I was trying to create my own Map where the key is the language key and value is a dynamic import of a stemmer based on this constant:
https://github.com/oramasearch/orama/blob/590bdc3b2bf2f7d2fd32c35168d607f0924975cc/packages/orama/src/components/tokenizer/languages.ts#L1

But it seems stemmers are not exported, so webpack throws an error. Anyway it would be cool to support this by default and load only stemmers that are needed.
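
For illustration, the map described above could look roughly like the sketch below. It is not Orama's code: `Stemmer`, `StemmerLoader`, `stemmerLoaders`, `loadStemmer` and the toy stemmer are hypothetical names, and the loader is an inline stand-in for a real dynamic `import()`, which could not be written here because the stemmer modules were not exported.

```typescript
// A stemmer maps a token to its stem.
type Stemmer = (token: string) => string;

// A loader resolves a stemmer on demand. In a real application each loader
// would wrap a dynamic import of one stemmer module; the inline function
// below is a stand-in so the sketch runs on its own.
type StemmerLoader = () => Promise<Stemmer>;

// Hypothetical toy stemmer: strips a trailing "s". Not a real stemming algorithm.
const toyEnglishStemmer: Stemmer = (token) => token.replace(/s$/, "");

// Language key -> lazy loader. Only languages listed here can be loaded.
const stemmerLoaders = new Map<string, StemmerLoader>([
  ["english", () => Promise.resolve(toyEnglishStemmer)],
]);

// Cache so that each language is loaded at most once.
const loaded = new Map<string, Promise<Stemmer>>();

function loadStemmer(language: string): Promise<Stemmer> {
  const cached = loaded.get(language);
  if (cached) return cached;

  const loader = stemmerLoaders.get(language);
  if (!loader) {
    return Promise.reject(new Error(`No stemmer registered for "${language}"`));
  }

  const pending = loader();
  loaded.set(language, pending);
  return pending;
}
```

With real dynamic imports as loaders, a bundler would typically emit one chunk per entry in the map, so the map should list only the languages the application needs.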

Additional context
Nothing to add.

micheleriva · 2023-04-06T23:15:40Z

cc. @ShogunPanda

micheleriva · 2023-04-06T23:19:42Z

I agree with this issue, and I was actually thinking the same. It might be worth it to decouple stemmers and stopwords from Orama core for v1.0.0 stable.

ShogunPanda · 2023-04-20T15:58:05Z

Stemmers are now exported as stemmers and the dynamic import is gone.
It will go out in beta 10. Closing this.

skoropadas · 2023-04-21T18:29:28Z

@ShogunPanda Sorry, but how does it solve the ticket?

I meant the ability to import each stemmer separately, but if you export an object stemmers from the library that contains all the stemmers, this means that they will all be included in the final application bundle, regardless of how many keys of the object I use.

Possible solutions for the ticket are as follows:

1. Create a separate package with the stemmers that can be imported from @orama/stemmers and passed to the create method, or create a separate folder inside the @orama/orama package from which the stemmers can be imported. However, the @orama/orama package itself should not import these stemmers within its own functions and methods.
2. You can proceed as you did by creating an object stemmers, but then dynamic imports should be used as values for each key, which the user can use to load the stemmer dynamically.

However, the second option is worse than the first because application bundlers do not know which stemmers from the object will be dynamically imported, so the bundler will create separate chunks for each stemmer, even for those that are not used. These chunks will not actually be used or dynamically loaded by the application if they are not used there, but they will be remnants in the final bundle.
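
A minimal sketch of the first option, assuming the core accepts a stemmer as a parameter and never imports one itself. Everything here is hypothetical and only illustrates the shape of the idea: `createIndex`, `IndexOptions` and the toy stemmer are not Orama's API, and in a real setup the stemmer would be imported from a separate package or folder rather than defined inline.

```typescript
type Stemmer = (token: string) => string;

interface IndexOptions {
  // Optional: when omitted, tokens are indexed unstemmed.
  stemmer?: Stemmer;
}

// Hypothetical core: it imports no stemmers itself, so a bundler only
// includes the stemmer that the application explicitly imports and passes in.
function createIndex(options: IndexOptions = {}) {
  const stem: Stemmer = options.stemmer ?? ((token) => token);
  const postings = new Map<string, Set<number>>();

  // Lowercase, split on non-word characters, drop empties, then stem.
  const tokenize = (text: string): string[] =>
    text.toLowerCase().split(/\W+/).filter(Boolean).map(stem);

  return {
    add(id: number, text: string): void {
      tokenize(text).forEach((token) => {
        const ids = postings.get(token) ?? new Set<number>();
        ids.add(id);
        postings.set(token, ids);
      });
    },
    search(term: string): number[] {
      const hits = new Set<number>();
      tokenize(term).forEach((token) => {
        postings.get(token)?.forEach((id) => hits.add(id));
      });
      return Array.from(hits);
    },
  };
}

// Stand-in for a stemmer imported from a separate package or subpath.
const toyEnglishStemmer: Stemmer = (token) => token.replace(/s$/, "");

const index = createIndex({ stemmer: toyEnglishStemmer });
index.add(1, "Bundlers split chunks");
```

Because the stemmer reaches the core only as an argument, a static import of a single stemmer is all the bundler ever sees, which is what makes this approach friendly to tree shaking.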

micheleriva · 2023-04-21T18:51:03Z

@skoropadas we’re open to accepting contributions to solving this issue. It looks like you gave enough thought to it to provide a satisfying solution, and I agree with your points.

Orama is a monorepo and we would appreciate a PR to add a new package containing the stemmers.

Thanks a lot

ShogunPanda · 2023-04-21T19:42:45Z

Now that stemming is disabled by default, we could in theory release each stemmer as a separate package (which should be easy given we are in a monorepo). This way only the required stemmer is included and bundlers won't have to mess with them too much.

@micheleriva WDYT?

micheleriva · 2023-04-21T20:42:06Z

@ShogunPanda let’s catch up and take this offline early next week

skoropadas · 2023-05-10T17:08:41Z

@ShogunPanda @micheleriva Hey guys, do you have any updates on this feature? Or are you planning to reopen this ticket?

Sorry, I don't have much free time: I'm doing solo development and maintenance of my open-source project for creating documentation, which is quite large. I'm asking to understand whether I should wait for you to implement it or try to implement it myself.

Right after switching from lunr to orama, the bundle size increased significantly, and users of my library want the smallest bundle size possible because it affects performance and sometimes hosting costs.

ShogunPanda · 2023-05-10T17:09:51Z

@skoropadas I plan to do that either tomorrow or Friday (I'm in the PDT time zone at the moment).

skoropadas · 2023-05-10T17:12:07Z

@ShogunPanda got it! Thanks for the lightning-fast response! :)

ShogunPanda · 2023-05-11T23:56:34Z

@skoropadas This is now implemented in #376. Hope to land it soon.

skoropadas · 2023-05-12T05:28:08Z

@ShogunPanda looking forward to trying it, thank you!

ShogunPanda · 2023-05-12T13:11:16Z

@skoropadas Beta 16 is out, go grab it!

skoropadas · 2023-05-20T17:18:16Z

@ShogunPanda sorry, I couldn't check it sooner, I was on vacation. It looks nice. I have one issue with types: TypeScript says that it cannot find types, but the stemmer imports fine if I use @ts-ignore, so I will leave this issue for my users.

micheleriva · 2023-05-20T17:43:43Z

Hi @skoropadas ,
this looks like a different problem, would you mind opening a separate issue for that?

Thanks

ShogunPanda closed this as completed Apr 20, 2023
