[mod] lower memory footprint by lazy loading JSON data #3443

Draft
wants to merge 1 commit into
base: master

return42
Member

This patch implements lazy loading of the JSON data.

Motivation: most requests do not need all of the JSON data, yet all of it is loaded. For example, these four JSON files:

  • currencies.json ~550 KB
  • engine_descriptions.json ~1.3 MB
  • external_bangs.json ~1.3 MB
  • osm_keys_tags.json ~2.2 MB

are rarely used, yet they consume a lot of memory and, BTW, they also extend the time required to instantiate a worker.
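
A minimal sketch of the lazy-loading idea described above. This is not the code from this patch: `DATA_DIR`, `load_json`, `LazyData` and `load_count` are hypothetical names, and the tiny `currencies.json` written to a temporary directory only stands in for the real data files.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path

# Hypothetical data directory with a tiny stand-in file (assumption for the demo).
DATA_DIR = Path(tempfile.mkdtemp())
(DATA_DIR / "currencies.json").write_text(json.dumps({"EUR": "Euro"}), encoding="utf-8")

load_count = 0  # counts real file reads, for demonstration only


@lru_cache(maxsize=None)
def load_json(name: str):
    """Parse DATA_DIR/name on first use and cache the result for later calls."""
    global load_count
    load_count += 1
    with open(DATA_DIR / name, encoding="utf-8") as f:
        return json.load(f)


class LazyData:
    """Expose a JSON file as an attribute that is loaded on first access."""

    @property
    def currencies(self):
        return load_json("currencies.json")


data = LazyData()        # nothing has been read from disk yet
assert load_count == 0
data.currencies["EUR"]   # first access triggers the load
data.currencies["EUR"]   # second access is served from the cache
assert load_count == 1
```

Note that the cached object stays in memory for the lifetime of the process once it has been loaded.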

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Member

@Bnyro Bnyro left a comment

Works as expected from testing 👍

@dalf
Member

dalf commented May 4, 2024

There are two points here:

  • speed up the start of the app
  • lower the memory footprint

The speed-up is clear.

The memory footprint is different: a long-running instance will end up with the same memory footprint as now. All it takes is one request to each of:

  • the currency engine
  • the OSM engine
  • the DDG definitions info or the unit conversion
  • the engines tab of the preferences page (okay, the mouse has to be over an engine name)

==> it won't reduce the memory footprint of, for example, darmarit.org/searx/, paulgo.io, or searx.be (according to the stats)

IMO, the solution is sqlite: #2633

dalf added a commit to dalf/searxng that referenced this pull request May 4, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
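
To illustrate the SQLite approach from the commit message above, here is a minimal sketch. It assumes a hypothetical table `engine_descriptions(engine, language, description)` and a hypothetical helper `get_description`; the schema actually used by the referenced commit is not shown in this thread. The point is that a lookup reads a single row on demand instead of keeping the whole data set in the process memory.

```python
import sqlite3

# In-memory database for the sketch; a real instance would use a file on disk.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE engine_descriptions ("
    " engine TEXT NOT NULL, language TEXT NOT NULL, description TEXT NOT NULL,"
    " PRIMARY KEY (engine, language))"
)
con.executemany(
    "INSERT INTO engine_descriptions VALUES (?, ?, ?)",
    [
        ("example engine", "en", "An example description."),
        ("example engine", "de", "Eine Beispielbeschreibung."),
    ],
)
con.commit()


def get_description(engine: str, language: str, default_language: str = "en"):
    """Fetch one description; fall back to default_language if it is missing."""
    for lang in (language, default_language):
        row = con.execute(
            "SELECT description FROM engine_descriptions"
            " WHERE engine = ? AND language = ?",
            (engine, lang),
        ).fetchone()
        if row is not None:
            return row[0]
    return None


print(get_description("example engine", "de"))
```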
@return42
Member Author

return42 commented May 4, 2024

The memory footprint is different: a long-running instance

Yes, this is known. I assumed uWSGI worker processes do not live long, but TBH I don't really know how many requests are processed before a new process is spawned --> max-requests

@dalf
Member

dalf commented May 4, 2024

As far as I understand, this requires a change to the uWSGI configuration on all instances to make this PR relevant.

@return42
Member Author

return42 commented May 4, 2024

As far as I understand, this requires a change to the uWSGI configuration on all instances to make this PR relevant.

I would have expected the processes to be restarted regularly. I would be surprised if the processes were meant to live indefinitely and process millions of requests (memory leaks?), but I can't find any reasonable documentation either. We could customize our uWSGI config and add max-requests next to lazy-apps:

lazy-apps = true
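
The reasoning in this exchange (lazy loading only lowers the footprint if workers are recycled) can be sketched as a toy simulation. This is plain Python, not uWSGI code: `Worker` and `simulate` are hypothetical, and the `max_requests` parameter only mimics the idea behind the uWSGI `max-requests` option mentioned above.

```python
class Worker:
    """Toy stand-in for a worker process."""

    def __init__(self):
        self.loaded = set()  # names of data sets this worker holds in memory
        self.handled = 0     # requests served since the worker was spawned

    def handle(self, needs):
        self.loaded.update(needs)  # lazy load: only what the request needs
        self.handled += 1


def simulate(requests, max_requests=0):
    """Return how many data sets are resident after each request.

    max_requests=0 means the worker is never recycled.
    """
    worker = Worker()
    resident = []
    for needs in requests:
        if max_requests and worker.handled >= max_requests:
            worker = Worker()  # recycle: lazily loaded data is dropped
        worker.handle(needs)
        resident.append(len(worker.loaded))
    return resident


requests = [{"currencies"}, set(), {"osm_keys_tags"}, set(), set(), set()]
print(simulate(requests))                  # [1, 1, 2, 2, 2, 2]
print(simulate(requests, max_requests=2))  # [1, 1, 1, 1, 0, 0]
```

Without recycling, the number of resident data sets only grows; with recycling, it falls back whenever a fresh worker takes over.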

@dalf
Member

dalf commented May 4, 2024

Same here: I can't find the default of max_requests for sure.
Even the source code is not clear to me:
https://github.com/unbit/uwsgi/blob/353b7dd19c9af762f3874ed46a604766e1d7c6d5/core/uwsgi.c#L279

When Paul has tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker.

we could customize our wsgi config

The docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package) ?

return42 added a commit to return42/searxng that referenced this pull request May 5, 2024
As stated in .. and other posts, the defaults of uWSGI are not suitable for a
production environment.  To give just one example, the workers run indefinitely
and the memory leaks accumulate.

- "Configuring uWSGI for Production: The defaults are all wrong" EuroPython 2019 [1]
- "Configuring uWSGI for Production Deployment" [2]
- "When Paul has tested some PR on his instance, we could clearly see a memory
  leak over a week: the memory never dropped to the initial value. Same for my
  instance using Docker." [3]

[1] https://av.tib.eu/media/44810
[2] https://www.bloomberg.com/company/stories/configuring-uwsgi-production-deployment/
[3] searxng#3443 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Member Author

return42 commented May 5, 2024

When Paul has tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker.

Wow, I never thought the defaults of uWSGI would be unsuitable for a production environment. We can discuss that here in this PR:

Since you have now sent PR

we should focus on the SQL solution. I am changing this PR to draft.


The docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package) ?

Not related to this PR, but in general we should not weigh deployment questions more heavily than improvements to the SearXNG core.

@return42 return42 marked this pull request as draft May 5, 2024 08:13
@return42
Member Author

return42 commented May 9, 2024

Superseded by #3458

@return42 return42 closed this May 9, 2024
dalf added a commit to dalf/searxng that referenced this pull request May 9, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
@return42 return42 reopened this May 11, 2024
dalf added a commit to dalf/searxng that referenced this pull request May 18, 2024
To reduce memory usage, use a SQLite database to store the engine descriptions.
A dump of the database is stored in Git to facilitate maintenance,
especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
return42 added a commit to return42/searxng that referenced this pull request May 28, 2024