This repository has been archived by the owner on Sep 7, 2023. It is now read-only.
A single user request usually triggers several engine requests (in the general category: wikipedia, wikidata, qwant, google, etc.).
Purpose
For one user request, the purpose is to spread the individual engine requests across different searx instances. Of course, this requires that the searx administrator and the users trust the other searx instances.
How
Let's call:
the searx instance receiving the user request the main instance,
the other instances the peer instances.
As a first step, searx should improve its response time measurements:
the main instance should measure the median response time and the 95th percentile (p95) response time (95% of the time the response time is below that value):
for each engine of the main instance,
for each (peer instance, engine) couple.
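The measurements above could be kept in a simple in-memory store. A minimal sketch (the `record`/`median`/`p95` names and the nearest-rank p95 method are assumptions, not part of searx):

```python
import statistics
from collections import defaultdict

# Hypothetical store: samples[(instance, engine)] -> observed response
# times in seconds. "local" can stand for the main instance itself.
samples = defaultdict(list)

def record(instance, engine, seconds):
    """Append one observed response time for a (instance, engine) couple."""
    samples[(instance, engine)].append(seconds)

def median(instance, engine):
    """Median response time for the couple."""
    return statistics.median(samples[(instance, engine)])

def p95(instance, engine):
    """95th percentile response time, nearest-rank method."""
    times = sorted(samples[(instance, engine)])
    idx = max(0, round(0.95 * len(times)) - 1)
    return times[idx]
```

In production these lists would need to be bounded (e.g. a sliding window) so the statistics track current instance health rather than all history.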
For example (YMMV): there are slow and fast engines: wikipedia is fast, google is slow (roughly twice as slow). So for a user request using google and wikipedia, the main instance can send the request to the google engine itself and use a peer instance for wikipedia. The user response time won't change, since the google response time sets the upper limit.
So one way to spread the requests is:
for slow engines: query them in the normal way from the main instance,
for fast engines: use a peer instance to proxy the request.
The global response time should not change too much.
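The slow-stays-local / fast-gets-proxied split above can be sketched as follows, assuming the per-engine medians are already known (the function name, the `"local"` sentinel, and the round-robin peer choice are illustrative assumptions):

```python
def assign_engines(engines, medians, peers):
    """Assign each engine of a user request to an instance.

    engines: engine names involved in the request.
    medians: hypothetical dict engine -> median response time (seconds).
    peers: list of trusted peer instance URLs; may be empty.

    The slowest engine stays on the main instance, since it sets the
    overall response time anyway; the faster engines are proxied
    through peers, round-robin.
    """
    slowest = max(engines, key=lambda e: medians[e])
    plan = {slowest: "local"}
    others = [e for e in engines if e != slowest]
    for i, engine in enumerate(others):
        plan[engine] = peers[i % len(peers)] if peers else "local"
    return plan
```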
So when there is a user request on the main instance:
the main instance guesses the global response time without peers, using the maximum of the medians of the engines involved in the request (not sure if the median is the best choice),
the main instance selects a peer instance for each engine according to the guessed p95 response time of the (peer instance, engine) couple; using the p95 gives the worst case scenario (or nearly),
the main instance spreads the requests to the different peers,
the response time of each (peer instance, engine) couple updates the statistics.
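The per-request selection steps above could look like this sketch, assuming the median and p95 statistics are available as plain dicts (the function name and data shapes are assumptions):

```python
def plan_request(engines, local_median, peer_p95):
    """Decide, per engine, whether to proxy through a peer.

    engines: engine names involved in the user request.
    local_median: dict engine -> median response time on the main instance.
    peer_p95: dict engine -> {peer: p95 response time} for known couples.
    """
    # Guessed global response time without peers: the slowest engine dominates.
    estimate = max(local_median[e] for e in engines)
    plan = {}
    for engine in engines:
        # A peer qualifies only if its worst case (p95) for this engine
        # still fits under the estimated overall response time.
        fits = {p: t for p, t in peer_p95.get(engine, {}).items()
                if t <= estimate}
        plan[engine] = min(fits, key=fits.get) if fits else "local"
    return plan
```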
Problems not solved
How to select a (peer instance, engine) couple if no request has been sent to it yet? Perhaps the main instance should bootstrap by sending one or two requests to each (peer instance, engine) couple. And here the problem of flooding starts.
How should the main instance and the peer instances communicate? Using the RSS API? Using something similar to morty with a hash?
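If a morty-style scheme were adopted, it would mean authenticating each proxied engine request with an HMAC over a pre-shared key (morty itself signs proxied URLs this way). The message layout and names below are purely an assumption for illustration:

```python
import hashlib
import hmac

# Assumption: a key shared out-of-band between the main and peer instances.
SHARED_KEY = b"secret-shared-between-instances"

def sign(engine, query):
    """HMAC-SHA256 signature over the proxied engine request."""
    msg = f"{engine}:{query}".encode()
    return hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()

def verify(engine, query, signature):
    """Constant-time check on the peer instance side."""
    return hmac.compare_digest(sign(engine, query), signature)
```

This only authenticates requests between instances; it does not by itself solve rate limiting or the bootstrap/flooding problem above.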
[EDIT] clarification.