[mod] lower memory footprint by lazy loading JSON data #3443
This patch implements lazy loading of the JSON data.

Motivation: in most requests not all of the JSON data is needed, yet all of it is loaded. For example, these four JSON files:

- currencies.json ~550KB
- engine_descriptions.json ~1.3MB
- external_bangs.json ~1.3MB
- osm_keys_tags.json ~2.2MB

are most often not used, yet they consume a lot of memory and, BTW, they also extend the time required to instantiate a worker.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
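To illustrate the idea of lazy loading described above, here is a minimal sketch. It is not the code from this patch: the `LazyJSON` class, its `get()` method, and the `loaded` flag are hypothetical names chosen for the example, and a temporary file stands in for a data file such as currencies.json.

```python
import json
import tempfile
from pathlib import Path


class LazyJSON:
    """Hypothetical helper: parse a JSON file on first access, not at import time."""

    def __init__(self, path):
        self._path = Path(path)
        self._data = None
        self.loaded = False  # becomes True once the file has been parsed

    def get(self):
        # Parse the file only when the data is requested for the first time,
        # then keep the parsed object for all later calls.
        if not self.loaded:
            with self._path.open(encoding="utf-8") as f:
                self._data = json.load(f)
            self.loaded = True
        return self._data


# Demo: a throwaway file stands in for one of the large JSON data files.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "currencies.json"
    demo_path.write_text(json.dumps({"EUR": "euro"}), encoding="utf-8")

    CURRENCIES = LazyJSON(demo_path)
    loaded_before = CURRENCIES.loaded   # nothing parsed yet
    value = CURRENCIES.get()["EUR"]     # first access triggers the load
    loaded_after = CURRENCIES.loaded
```

Note that once `get()` has been called, the parsed data stays in memory for the lifetime of the process, so this approach mainly helps processes that never touch a given file.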
Works as expected from testing 👍
There are two points here:
The speed-up is clear. The memory footprint is a different matter: a long-running instance will have the same memory footprint as now. It requires one request on:
==> it won't reduce the memory footprint of darmarit.org/searx/, paulgo.io, or searx.be, for example (according to the stats). IMO, the solution is SQLite: #2633
To reduce memory usage, use a SQLite database to store the engine descriptions. A dump of the database is stored in Git to facilitate maintenance, especially the pull requests made automatically every month.

Related to
* searxng#2633
* searxng#3443
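As a rough sketch of the SQLite approach described in this commit message: descriptions live in a table and are fetched one row at a time, and a plain-text SQL dump can be kept under version control. The table layout, the function names `store_descriptions` and `get_description`, and the sample rows are assumptions made for illustration, not the schema used in the actual commit.

```python
import sqlite3

# Hypothetical schema: one row per (engine, language) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS engine_descriptions (
    engine      TEXT NOT NULL,
    language    TEXT NOT NULL,
    description TEXT NOT NULL,
    PRIMARY KEY (engine, language)
)
"""


def store_descriptions(conn, rows):
    """Create the table if needed and insert (engine, language, description) rows."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO engine_descriptions VALUES (?, ?, ?)", rows
    )
    conn.commit()


def get_description(conn, engine, language):
    """Fetch a single description; only this row is read into Python memory."""
    row = conn.execute(
        "SELECT description FROM engine_descriptions"
        " WHERE engine = ? AND language = ?",
        (engine, language),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
store_descriptions(
    conn,
    [
        ("wikipedia", "en", "Free online encyclopedia"),
        ("wikipedia", "de", "Freie Online-Enzyklopaedie"),
    ],
)

# A text dump (SQL statements) is diff-friendly and can be committed to Git.
dump_sql = "\n".join(conn.iterdump())

# Rebuilding a database from the dump restores the same rows.
restored = sqlite3.connect(":memory:")
restored.executescript(dump_sql)
```

Compared with loading the whole JSON file, each lookup here touches a single row, so the memory held by a long-running worker does not grow with the size of the description data.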
Yes, this is known .. I assumed uWSGI worker processes do not live long .. but TBH I don't really know how many requests are processed before a new process is spawned.
As far as I understand, this requires a change to the uWSGI configuration on all instances to make this PR relevant.
I would have expected the processes to be restarted regularly ... I would be surprised if the processes lived indefinitely and processed millions of requests (memory leaks?) ... but I can't find any reasonable documentation either ... we could customize our uWSGI config and add max-config next to lazy-apps Line 21 in dbed8da
Same here, I can't determine for sure the default of max_requests. When Paul tested some PRs on his instance, we could clearly see a memory leak over a week: the memory never dropped back to the initial value. Same for my instance using Docker.
The Docker image can be updated, but what about all the instances using the installation script or something else (like the Arch package)?
As stated in .. and other posts, the defaults of uWSGI are not suitable for a production environment. To give just one example, the workers run indefinitely and the memory leaks accumulate.

- "Configuring uWSGI for Production: The defaults are all wrong" EuroPython 2019 [1]
- "Configuring uWSGI for Production Deployment" [2]
- "When Paul has tested some PR on his instance, we could clearly see a memory leak over a week: the memory never dropped to the initial value. Same for my instance using Docker." [3]

[1] https://av.tib.eu/media/44810
[2] https://www.bloomberg.com/company/stories/configuring-uwsgi-production-deployment/
[3] searxng#3443 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Wow, I never thought the defaults of uWSGI would be unsuitable for a production environment / we can discuss that here in this PR: Since you have now sent a PR, we should focus on the SQL solution .. I am changing this PR to DRAFT.
Not related to this PR, but in general we should not weigh deployment questions over improvements to the SearXNG core.
Superseded by #3458