[mod] lower memory footprint by lazy loading JSON data #3443

return42 · 2024-04-29T16:46:23Z

This patch implements lazy loading of the JSON data.

Motivation: most requests need only a fraction of the JSON data, yet all of it is loaded. For example, these four JSON files:

currencies.json ~550 KB
engine_descriptions.json ~1.3 MB
external_bangs.json ~1.3 MB
osm_keys_tags.json ~2.2 MB

are rarely used but consume a lot of memory, and BTW they also extend the time required to instantiate a worker.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

Bnyro

Works as expected from testing 👍

dalf · 2024-05-04T06:20:32Z

There are two points here:

speed up the start of the app
lower the memory footprint

The speed-up is clear.

The memory footprint is different: a long-running instance will have the same memory footprint as now. All it takes is one request each to:

the currency engine
the OSM engine
ddg definitions info or unit conversion
the engines tab of the preferences page (okay, the mouse has to be over an engine name)

==> it won't reduce the memory footprint of darmarit.org/searx/, paulgo.io, or searx.be, for example (according to the stats)

IMO, the solution is sqlite: #2633
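A rough sketch of why SQLite helps, under stated assumptions: the table name, columns and sample row below are hypothetical and not taken from #2633 or #3458. Each lookup fetches a single row, so the full data set never has to sit in the Python process as one large dict.

```python
import sqlite3


def build_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a demo database with a hypothetical schema and one sample row."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE engine_descriptions ("
        " engine TEXT, language TEXT, description TEXT,"
        " PRIMARY KEY (engine, language))"
    )
    con.execute(
        "INSERT INTO engine_descriptions VALUES (?, ?, ?)",
        ("example_engine", "en", "An example search engine."),
    )
    con.commit()
    return con


def get_description(con: sqlite3.Connection, engine: str, language: str = "en"):
    """Return the description for one engine, or None if there is no such row."""
    row = con.execute(
        "SELECT description FROM engine_descriptions"
        " WHERE engine = ? AND language = ?",
        (engine, language),
    ).fetchone()
    return row[0] if row else None
```

With this approach memory use depends on what a single request reads, not on how many different features the instance has served since it started.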

To reduce memory usage, use a SQLite database to store the engine descriptions. A dump of the database is stored in Git to facilitate maintenance, especially the pull requests made automatically every month. Related to * searxng#2633 * searxng#3443


return42 · 2024-05-04T17:37:59Z

> The memory footprint is different: an long running instance

Yes, this is known. I assumed uWSGI worker processes do not live long, but TBH I don't really know how many requests are processed before a new process is spawned (see max-requests).

dalf · 2024-05-04T18:22:10Z

As far as I understand, this requires a change to the uWSGI configuration on all instances to make this PR relevant.

return42 · 2024-05-04T18:44:36Z

> As far I understand, this require a change of the uwsgi configuration in all instances to make this PR relevant.

I would have expected the processes to be restarted regularly. I would be surprised if the processes were meant to live indefinitely and process millions of requests (memory leaks?), but I can't find any reasonable documentation either. We could customize our uWSGI config and add max-requests next to lazy-apps:

searxng/dockerfiles/uwsgi.ini

Line 21 in dbed8da

lazy-apps = true

searxng/utils/templates/etc/uwsgi/apps-available/searxng.ini

Line 41 in dbed8da

lazy-apps = true
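uWSGI's max-requests option reloads a worker after it has handled the given number of requests. As a hedged sketch, the helper below checks whether an ini file sets that option. The ini text, the value 1000 and the function name `worker_recycling` are assumptions for illustration, not a recommended setting and not the content of the files referenced above.

```python
import configparser

# Hypothetical ini content modelled on the `lazy-apps = true` lines above;
# the max-requests value is an arbitrary example.
UWSGI_INI = """\
[uwsgi]
lazy-apps = true
max-requests = 1000
"""


def worker_recycling(ini_text: str):
    """Return max-requests as an int, or None if the option is not set."""
    parser = configparser.ConfigParser(
        strict=False,         # tolerate options that are repeated
        interpolation=None,   # leave '%' characters in values untouched
        allow_no_value=True,  # tolerate bare flags without '='
    )
    parser.read_string(ini_text)
    return parser.getint("uwsgi", "max-requests", fallback=None)
```

A result of `None` means the file does not ask uWSGI to recycle workers by request count.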

dalf · 2024-05-04T18:56:59Z

Same here: I can't find the default of max_requests for sure.
Even the source code is not clear to me:
https://github.com/unbit/uwsgi/blob/353b7dd19c9af762f3874ed46a604766e1d7c6d5/core/uwsgi.c#L279

When Paul tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker.

> we could customize our wsgi config

The Docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package)?

As stated in .. and other posts, the defaults of uWSGI are not suitable for a production environment. To give just one example, the workers run indefinitely and the memory leaks accumulate.

- "Configuring uWSGI for Production: The defaults are all wrong" EuroPython 2019 [1]
- "Configuring uWSGI for Production Deployment" [2]
- "When Paul has tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker." [3]

[1] https://av.tib.eu/media/44810
[2] https://www.bloomberg.com/company/stories/configuring-uwsgi-production-deployment/
[3] searxng#3443 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

return42 · 2024-05-05T08:10:29Z

> When Paul has tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker.

Wow, I never thought the defaults of uWSGI would be unsuitable for a production environment. We can discuss that in this PR:

[mod] uWSGI config: configuring uwsgi for production #3460

Since you have now sent the PR

data: currencies, engine descriptions and osm_keys_tags: use SQLite instead of JSON #3458

we should focus on the SQL solution. I am changing this PR to draft.

> The docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package) ?

Not related to this PR, but in general we should not weigh deployment questions more heavily than improvements to the SearXNG core.

return42 · 2024-05-09T15:39:29Z

Superseded by #3458


return42 force-pushed the lazy-data branch from 1c8a785 to 82fd0da Compare April 29, 2024 17:02

return42 marked this pull request as ready for review April 30, 2024 03:55

return42 mentioned this pull request Apr 30, 2024

[mod] improve unit converter plugin #3435

Merged

Bnyro approved these changes May 2, 2024


return42 mentioned this pull request May 3, 2024

Set unused engines as inactive #3213

Open

dalf mentioned this pull request May 4, 2024

data: currencies, engine descriptions and osm_keys_tags: use SQLite instead of JSON #3458

Open

return42 mentioned this pull request May 5, 2024

[mod] uWSGI config: configuring uwsgi for production #3460

Open

return42 marked this pull request as draft May 5, 2024 08:13

return42 closed this May 9, 2024

return42 reopened this May 11, 2024
