Merge pull request #2800 from kvch/add-httpx
Replace requests with httpx to speed up searx
kvch committed May 3, 2021
2 parents f045c38 + 75d1f38 commit d93ac96
Showing 44 changed files with 1,167 additions and 467 deletions.
64 changes: 51 additions & 13 deletions docs/admin/settings.rst
@@ -130,21 +130,20 @@ Global Settings
request_timeout : 2.0 # default timeout in seconds, can be overridden by engine
# max_request_timeout: 10.0 # the maximum timeout in seconds
useragent_suffix : "" # information like an email address to the administrator
-pool_connections : 100 # Number of different hosts
-pool_maxsize : 10 # Number of simultaneous requests by host
+pool_connections : 100 # Maximum number of allowable connections, or None for no limits. The default is 100.
+pool_maxsize : 10 # Number of allowable keep-alive connections, or None to always allow. The default is 10.
+enable_http2: True # See https://www.python-httpx.org/http2/
# uncomment below section if you want to use a proxy
# proxies:
-#   http:
-#     - http://proxy1:8080
-#     - http://proxy2:8080
-#   https:
+#   all://:
#     - http://proxy1:8080
#     - http://proxy2:8080
# uncomment below section only if you have more than one network interface
# which can be the source of outgoing search requests
# source_ips:
#   - 1.1.1.1
#   - 1.1.1.2
+#   - fe80::/126
``request_timeout`` :
@@ -157,20 +156,46 @@ Global Settings
   Suffix to the user-agent searx uses to send requests to other engines. If an
   engine wishes to block you, contact info here may be useful to avoid that.

-.. _requests proxies: https://requests.readthedocs.io/en/latest/user/advanced/#proxies
-.. _PySocks: https://pypi.org/project/PySocks/
+``keepalive_expiry`` :
+   Number of seconds to keep a connection in the pool. 5.0 seconds by default.
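
These three pool options map onto httpx's connection-limit configuration. A
minimal sketch of the equivalent client setup, assuming a recent httpx where
``httpx.Limits`` accepts all three keyword arguments::

    import httpx

    # Illustrative values mirroring the documented defaults.
    limits = httpx.Limits(
        max_connections=100,           # pool_connections
        max_keepalive_connections=10,  # pool_maxsize
        keepalive_expiry=5.0,          # keepalive_expiry
    )
    client = httpx.Client(limits=limits)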

+.. _httpx proxies: https://www.python-httpx.org/advanced/#http-proxying

``proxies`` :
-   Define one or more proxies you wish to use, see `requests proxies`_.
+   Define one or more proxies you wish to use, see `httpx proxies`_.
   If there is more than one proxy for a protocol (http, https),
   requests to the engines are distributed in a round-robin fashion.

-   - Proxy: `see <https://2.python-requests.org/en/latest/user/advanced/#proxies>`__.
-   - SOCKS proxies are also supported: `see <https://2.python-requests.org/en/latest/user/advanced/#socks>`__
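
Note that a single httpx client takes one proxy URL per routing pattern; the
round-robin distribution over several proxies is handled by searx on top of
httpx, not by httpx itself. A minimal sketch using the 0.17-era ``proxies``
argument::

    import httpx

    # 'all://' routes every outgoing request through the given proxy.
    client = httpx.Client(proxies={'all://': 'http://proxy1:8080'})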

``source_ips`` :
   If you use multiple network interfaces, define from which IP the requests must
-   be made. This parameter is ignored when ``proxies`` is set.
+   be made. Example:

+   * ``0.0.0.0`` any local IPv4 address.
+   * ``::`` any local IPv6 address.
+   * ``192.168.0.1``
+   * ``[ 192.168.0.1, 192.168.0.2 ]`` these two specific IP addresses
+   * ``fe80::60a2:1691:e5a2:ee1f``
+   * ``fe80::60a2:1691:e5a2:ee1f/126`` all IP addresses in this network.
+   * ``[ 192.168.0.1, fe80::/126 ]``
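
Under the hood this is source-address binding on the transport; a hedged
sketch, assuming a recent httpx where the transport accepts
``local_address``::

    import httpx

    # Bind outgoing connections to one specific local interface.
    transport = httpx.HTTPTransport(local_address='192.168.0.1')
    client = httpx.Client(transport=transport)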

+``retries`` :
+   Number of retries in case of an HTTP error.
+   On each retry, searx uses a different proxy and source IP.

+``retry_on_http_error`` :
+   Retry the request on certain HTTP status codes.

+   Example:

+   * ``true`` : retry on any HTTP status code between 400 and 599.
+   * ``403`` : retry on HTTP status code 403.
+   * ``[403, 429]`` : retry on HTTP status codes 403 and 429.

+``enable_http2`` :
+   Enabled by default. Set to ``False`` to disable HTTP/2.

+``max_redirects`` :
+   Maximum number of redirects before an error is raised. 30 by default.
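
Taken together, a sketch of the corresponding ``outgoing:`` block (values are
illustrative, not recommendations)::

    outgoing:
        request_timeout : 2.0
        enable_http2 : True
        retries : 1
        retry_on_http_error : [403, 429]
        max_redirects : 30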


``locales:``
@@ -216,6 +241,13 @@ Engine settings
api_key : 'apikey'
disabled : True
language : en_US
+#enable_http: False
+#enable_http2: False
+#retries: 1
+#retry_on_http_error: True # or 403 or [404, 429]
+#max_connections: 100
+#max_keepalive_connections: 10
+#keepalive_expiry: 5.0
#proxies:
# http:
# - http://proxy1:8080
@@ -270,6 +302,12 @@ Engine settings
``display_error_messages`` : default ``True``
When an engine returns an error, the message is displayed on the user interface.

+``network`` : optional
+   Use the network configuration from another engine.
+   In addition, there are two default networks:

+   * ``ipv4`` sets ``local_addresses`` to ``0.0.0.0`` (use only IPv4 local addresses)
+   * ``ipv6`` sets ``local_addresses`` to ``::`` (use only IPv6 local addresses)
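
For example, a second engine could reuse the first engine's network (engine
names here are hypothetical)::

    - name : engine1
      ...

    - name : engine2
      ...
      network : engine1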

.. note::

A few more options are possible, but they are pretty specific to some
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -16,3 +16,4 @@ sphinx-tabs==2.1.0
sphinxcontrib-programoutput==0.17
sphinx-autobuild==2021.3.14
linuxdoc==20210324
+aiounittest==1.4.0
8 changes: 6 additions & 2 deletions requirements.txt
@@ -2,11 +2,15 @@ certifi==2020.12.05
babel==2.9.1
flask-babel==2.0.0
flask==1.1.2
+idna==2.10
jinja2==2.11.3
lxml==4.6.3
pygments==2.8.0
python-dateutil==2.8.1
pyyaml==5.4.1
-requests[socks]==2.25.1
+httpx[http2]==0.17.1
+Brotli==1.0.9
+uvloop==0.15.2; python_version >= '3.7'
+uvloop==0.14.0; python_version < '3.7'
+httpx-socks[asyncio]==0.3.1
langdetect==1.0.8
setproctitle==1.2.2
7 changes: 4 additions & 3 deletions searx/autocomplete.py
@@ -20,10 +20,11 @@
from json import loads
from urllib.parse import urlencode

-from requests import RequestException
+from httpx import HTTPError


from searx import settings
-from searx.poolrequests import get as http_get
+from searx.network import get as http_get
from searx.exceptions import SearxEngineResponseException


@@ -136,5 +137,5 @@ def search_autocomplete(backend_name, query, lang):

    try:
        return backend(query, lang)
-    except (RequestException, SearxEngineResponseException):
+    except (HTTPError, SearxEngineResponseException):
        return []
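
``httpx.HTTPError`` takes over the role of ``requests.RequestException``: in
recent httpx versions it is the common base class for transport failures and
``raise_for_status()`` errors. A standalone sketch (not code from this PR):

    import httpx

    def fetch_json(url):
        try:
            resp = httpx.get(url, timeout=2.0)
            resp.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx
            return resp.json()
        except httpx.HTTPError:
            # connect errors, timeouts and bad status codes all land here
            return []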
6 changes: 3 additions & 3 deletions searx/engines/__init__.py
@@ -27,7 +27,7 @@
from searx import logger
from searx.data import ENGINES_LANGUAGES
from searx.exceptions import SearxEngineResponseException
-from searx.poolrequests import get, get_proxy_cycles
+from searx.network import get, initialize as initialize_network, set_context_network_name
from searx.utils import load_module, match_language, get_engine_from_settings, gen_useragent


@@ -89,8 +89,6 @@ def load_engine(engine_data):
                engine.categories = []
            else:
                engine.categories = list(map(str.strip, param_value.split(',')))
-        elif param_name == 'proxies':
-            engine.proxies = get_proxy_cycles(param_value)
        else:
            setattr(engine, param_name, param_value)

@@ -289,9 +287,11 @@ def load_engines(engine_list):

def initialize_engines(engine_list):
    load_engines(engine_list)
+    initialize_network(engine_list, settings['outgoing'])

    def engine_init(engine_name, init_fn):
        try:
+            set_context_network_name(engine_name)
            init_fn(get_engine_from_settings(engine_name))
        except SearxEngineResponseException as exc:
            logger.warn('%s engine: Fail to initialize // %s', engine_name, exc)
2 changes: 1 addition & 1 deletion searx/engines/dictzone.py
@@ -52,7 +52,7 @@ def response(resp):
        to_results.append(to_result.text_content())

    results.append({
-        'url': urljoin(resp.url, '?%d' % k),
+        'url': urljoin(str(resp.url), '?%d' % k),
        'title': from_result.text_content(),
        'content': '; '.join(to_results)
    })
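
The added ``str()`` is needed because httpx exposes ``resp.url`` as an
``httpx.URL`` object rather than a plain string, so stdlib helpers such as
``urljoin`` need an explicit conversion. A quick illustration with a
hypothetical URL:

    from urllib.parse import urljoin
    import httpx

    resp = httpx.get('https://example.org/english-german/cat')
    # resp.url is an httpx.URL, not a str
    print(urljoin(str(resp.url), '?1'))
    # -> 'https://example.org/english-german/cat?1'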
2 changes: 1 addition & 1 deletion searx/engines/duckduckgo.py
@@ -6,7 +6,7 @@
from lxml.html import fromstring
from json import loads
from searx.utils import extract_text, match_language, eval_xpath, dict_subset
-from searx.poolrequests import get
+from searx.network import get

# about
about = {
2 changes: 1 addition & 1 deletion searx/engines/duckduckgo_images.py
@@ -8,7 +8,7 @@
from searx.exceptions import SearxEngineAPIException
from searx.engines.duckduckgo import get_region_code
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url # NOQA # pylint: disable=unused-import
-from searx.poolrequests import get
+from searx.network import get

# about
about = {
3 changes: 1 addition & 2 deletions searx/engines/elasticsearch.py
@@ -4,7 +4,6 @@
"""

from json import loads, dumps
-from requests.auth import HTTPBasicAuth
from searx.exceptions import SearxEngineAPIException


@@ -32,7 +31,7 @@ def request(query, params):
        return params

    if username and password:
-        params['auth'] = HTTPBasicAuth(username, password)
+        params['auth'] = (username, password)

    params['url'] = search_url
    params['method'] = 'GET'
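
httpx accepts a plain ``(username, password)`` tuple for HTTP Basic auth, so
the ``HTTPBasicAuth`` wrapper from requests is no longer needed. A hedged
sketch against a hypothetical endpoint:

    import httpx

    resp = httpx.get(
        'https://es.example.org/_search',  # hypothetical Elasticsearch URL
        auth=('elastic', 'changeme'),      # tuple means HTTP Basic auth
    )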
2 changes: 1 addition & 1 deletion searx/engines/gigablast.py
@@ -8,7 +8,7 @@
from json import loads
from urllib.parse import urlencode
# from searx import logger
-from searx.poolrequests import get
+from searx.network import get

# about
about = {
5 changes: 2 additions & 3 deletions searx/engines/google.py
@@ -10,7 +10,7 @@

# pylint: disable=invalid-name, missing-function-docstring

-from urllib.parse import urlencode, urlparse
+from urllib.parse import urlencode
from lxml import html
from searx import logger
from searx.utils import match_language, extract_text, eval_xpath, eval_xpath_list, eval_xpath_getindex
@@ -186,8 +186,7 @@ def get_lang_info(params, lang_list, custom_aliases):
    return ret_val

def detect_google_sorry(resp):
-    resp_url = urlparse(resp.url)
-    if resp_url.netloc == 'sorry.google.com' or resp_url.path.startswith('/sorry'):
+    if resp.url.host == 'sorry.google.com' or resp.url.path.startswith('/sorry'):
        raise SearxEngineCaptchaException()
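
``urlparse`` can be dropped because ``httpx.URL`` already carries the parsed
components; note the rename from ``netloc`` to ``host`` (which, unlike
netloc, excludes any port). A small illustration:

    import httpx

    url = httpx.URL('https://sorry.google.com/sorry/index')
    print(url.host)  # 'sorry.google.com'
    print(url.path)  # '/sorry/index'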


2 changes: 1 addition & 1 deletion searx/engines/pubmed.py
@@ -7,7 +7,7 @@
from lxml import etree
from datetime import datetime
from urllib.parse import urlencode
-from searx.poolrequests import get
+from searx.network import get

# about
about = {
2 changes: 1 addition & 1 deletion searx/engines/qwant.py
@@ -8,7 +8,7 @@
from urllib.parse import urlencode
from searx.utils import html_to_text, match_language
from searx.exceptions import SearxEngineAPIException, SearxEngineCaptchaException
-from searx.raise_for_httperror import raise_for_httperror
+from searx.network import raise_for_httperror

# about
about = {
7 changes: 3 additions & 4 deletions searx/engines/seznam.py
@@ -3,9 +3,9 @@
Seznam
"""

-from urllib.parse import urlencode, urlparse
+from urllib.parse import urlencode
from lxml import html
-from searx.poolrequests import get
+from searx.network import get
from searx.exceptions import SearxEngineAccessDeniedException
from searx.utils import (
extract_text,
@@ -46,8 +46,7 @@ def request(query, params):


def response(resp):
-    resp_url = urlparse(resp.url)
-    if resp_url.path.startswith('/verify'):
+    if resp.url.path.startswith('/verify'):
        raise SearxEngineAccessDeniedException()

    results = []
2 changes: 1 addition & 1 deletion searx/engines/sjp.py
@@ -6,7 +6,7 @@
from lxml.html import fromstring
from searx import logger
from searx.utils import extract_text
-from searx.raise_for_httperror import raise_for_httperror
+from searx.network import raise_for_httperror

logger = logger.getChild('sjp engine')

2 changes: 1 addition & 1 deletion searx/engines/soundcloud.py
@@ -9,7 +9,7 @@
from dateutil import parser
from urllib.parse import quote_plus, urlencode
from searx import logger
-from searx.poolrequests import get as http_get
+from searx.network import get as http_get

# about
about = {
5 changes: 3 additions & 2 deletions searx/engines/spotify.py
@@ -5,9 +5,10 @@

from json import loads
from urllib.parse import urlencode
-import requests
import base64

+from searx.network import post as http_post

# about
about = {
"website": 'https://www.spotify.com',
@@ -38,7 +39,7 @@ def request(query, params):

    params['url'] = search_url.format(query=urlencode({'q': query}), offset=offset)

-    r = requests.post(
+    r = http_post(
        'https://accounts.spotify.com/api/token',
        data={'grant_type': 'client_credentials'},
        headers={'Authorization': 'Basic ' + base64.b64encode(
5 changes: 2 additions & 3 deletions searx/engines/stackoverflow.py
@@ -3,7 +3,7 @@
Stackoverflow (IT)
"""

-from urllib.parse import urlencode, urljoin, urlparse
+from urllib.parse import urlencode, urljoin
from lxml import html
from searx.utils import extract_text
from searx.exceptions import SearxEngineCaptchaException
@@ -41,8 +41,7 @@ def request(query, params):

# get response from search-request
def response(resp):
-    resp_url = urlparse(resp.url)
-    if resp_url.path.startswith('/nocaptcha'):
+    if resp.url.path.startswith('/nocaptcha'):
        raise SearxEngineCaptchaException()

    results = []
2 changes: 1 addition & 1 deletion searx/engines/wikidata.py
@@ -12,7 +12,7 @@

from searx import logger
from searx.data import WIKIDATA_UNITS
-from searx.poolrequests import post, get
+from searx.network import post, get
from searx.utils import match_language, searx_useragent, get_string_replaces_function
from searx.external_urls import get_external_url, get_earth_coordinates_url, area_to_osm_zoom
from searx.engines.wikipedia import _fetch_supported_languages, supported_languages_url # NOQA # pylint: disable=unused-import
2 changes: 1 addition & 1 deletion searx/engines/wikipedia.py
@@ -7,7 +7,7 @@
from json import loads
from lxml.html import fromstring
from searx.utils import match_language, searx_useragent
-from searx.raise_for_httperror import raise_for_httperror
+from searx.network import raise_for_httperror

# about
about = {
2 changes: 1 addition & 1 deletion searx/engines/wolframalpha_noapi.py
@@ -7,7 +7,7 @@
from time import time
from urllib.parse import urlencode

-from searx.poolrequests import get as http_get
+from searx.network import get as http_get

# about
about = {
2 changes: 1 addition & 1 deletion searx/engines/wordnik.py
@@ -6,7 +6,7 @@
from lxml.html import fromstring
from searx import logger
from searx.utils import extract_text
-from searx.raise_for_httperror import raise_for_httperror
+from searx.network import raise_for_httperror

logger = logger.getChild('Wordnik engine')


