This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

P2P idea #792

Open
dalf opened this issue Dec 27, 2016 · 0 comments

dalf commented Dec 27, 2016

For one user request there are, most of the time, several engine requests that could be spread among other searx instances (in the general category: wikipedia, wikidata, qwant, google, etc.).

Purpose

For one user request, the purpose is to spread the individual engine requests across different searx instances. Of course, this requires that the searx administrator and the users trust the other searx instances.

How

Let's call:

  • the searx instance receiving the user request, the main instance;
  • the other instances, peer instances.

As a first step, searx should improve its response time measurements:

  • the main instance should measure the median response time and the 95th percentile response time (95% of the time, the response time is below that value):
    • for each engine of the main instance;
    • for each (peer instance, engine) couple.
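A minimal sketch of that bookkeeping, assuming an in-process store (the class and method names are hypothetical, not part of searx):

```python
import statistics
from collections import defaultdict


class ResponseTimeStats:
    """Track response times per (instance, engine) couple.

    `instance=None` denotes the main instance's own engines.
    """

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, instance, engine, seconds):
        self._samples[(instance, engine)].append(seconds)

    def median(self, instance, engine):
        samples = self._samples[(instance, engine)]
        return statistics.median(samples) if samples else None

    def p95(self, instance, engine):
        samples = self._samples[(instance, engine)]
        if len(samples) < 2:
            return None
        # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
        return statistics.quantiles(samples, n=20)[18]
```

In practice the samples would need aging (e.g. a sliding window), so that an instance that was slow yesterday is not penalized forever.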

For example (YMMV): there are slow and fast engines; wikipedia is fast, google is slow (roughly twice as slow). So for a user request using the google and wikipedia engines, the main instance can send the request for the google engine itself and use a peer instance for wikipedia. The user-perceived response time won't change, since the google response time sets the upper limit.

So one way to spread the requests is:

  • for slow engines : do it in the normal way on the main instance.
  • for fast engines : use a peer instance to proxy the request.

The global response time should not change too much.
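The slow/fast split above can be sketched as follows, assuming per-engine median statistics are already available (function and parameter names are hypothetical):

```python
def plan_dispatch(engines, median_times, peers):
    """Assign fast engines to peers and keep the slowest engine local.

    engines: list of engine names in the user request
    median_times: dict engine -> median response time in seconds
    peers: list of peer instance URLs (may be empty)

    The slowest engine sets the overall response time, so proxying
    the faster engines through peers should not hurt latency much.
    """
    if not peers:
        return {engine: None for engine in engines}  # None = main instance
    slowest = max(engines, key=lambda e: median_times[e])
    plan = {}
    peer_index = 0
    for engine in engines:
        if engine == slowest:
            plan[engine] = None  # keep the latency-critical engine local
        else:
            plan[engine] = peers[peer_index % len(peers)]
            peer_index += 1
    return plan
```

With the google/wikipedia example above, google (the slowest) stays on the main instance and wikipedia is proxied through a peer.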

So when there is a user request on the main instance:

  • the main instance estimates the global response time without peers, using the maximum of the median response times of the engines involved in the user request (not sure if the median is the best choice);
  • the main instance selects a peer instance for each engine according to the estimated p95 response time of the (peer instance, engine) couple. Using the p95 gives the worst-case scenario (or nearly);
  • the main instance spreads the requests to the different peers;
  • the response time of each (peer instance, engine) request updates the statistics.
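The selection steps above can be sketched like this, assuming the statistics are kept as plain dicts (all names are hypothetical):

```python
def select_peers(engines, own_median, peer_p95):
    """For each engine, pick a peer whose p95 response time stays within
    the request's expected total time; otherwise keep the engine local.

    engines: list of engine names in the user request
    own_median: dict engine -> median response time on the main instance
    peer_p95: dict (peer, engine) -> p95 response time on that peer
    """
    # Step 1: estimate the global response time without peers
    # (the slowest engine sets the overall latency)
    budget = max(own_median[engine] for engine in engines)
    selection = {}
    for engine in engines:
        # Step 2: candidate peers whose worst case fits within the budget
        candidates = [
            (p95, peer)
            for (peer, eng), p95 in peer_p95.items()
            if eng == engine and p95 <= budget
        ]
        # Pick the peer with the lowest p95, or None to stay local
        selection[engine] = min(candidates)[1] if candidates else None
    return selection
```

A peer is only considered if its p95 for that engine fits within the latency the main instance would have anyway, so proxying should not degrade the user-perceived response time.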

Problems not solved

  • How to select a (peer instance, engine) couple if no request has been sent to it yet? Perhaps the main instance should bootstrap by sending one or two requests for each (peer instance, engine) couple. And here the flooding problem starts.
  • How should the main instance and the peer instances communicate? Using the RSS API? Using something similar to morty with a hash?

[EDIT] clarification.
