urllib socket.timeout, no indication as to why, timeout=0.1 #730

Closed
geudrik opened this issue Feb 18, 2018 · 9 comments

Comments


geudrik commented Feb 18, 2018

I'm seeing an infrequent exception raised with a strange timeout value, apparently related to connection pooling. The traceback below is the entire exception - there is zero reference to the code that I'm writing. The only redacted info is the host IP; nothing else has been altered.

GET http://ip.ip.ip.ip:9200/_nodes/_all/http [status:N/A request:0.101s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 149, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=request_headers, **kw)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 333, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 389, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 309, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='ip.ip.ip.ip', port=9200): Read timed out. (read timeout=0.1)

My ES instance is being created as follows:

        self.es = elasticsearch.Elasticsearch(eshost,
                                              sniff_on_start=True,
                                              sniff_on_connection_fail=True,
                                              sniffer_timeout=60)

When inserting data, I'm using:

        try:
            bulk(self.es, self.to_ingest, chunk_size=1000, request_timeout=15)
        except ConnectionTimeout as e:
            try:
                bulk(self.es, self.to_ingest, chunk_size=500, request_timeout=40)

Am I missing something painfully obvious here? It has been "one of those weekends" 😛
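To make the pattern above concrete, here is a minimal, self-contained sketch of the same setup, assuming elasticsearch-py 6.x; the host, index name, and documents are placeholders rather than the reporter's actual values:

    from elasticsearch import Elasticsearch
    from elasticsearch.exceptions import ConnectionTimeout
    from elasticsearch.helpers import bulk

    # Single client instance with sniffing enabled, as in the report above.
    es = Elasticsearch(
        ["http://localhost:9200"],
        sniff_on_start=True,
        sniff_on_connection_fail=True,
        sniffer_timeout=60,
    )

    # Placeholder documents to ingest.
    actions = [
        {"_index": "example", "_type": "_doc", "_source": {"value": i}}
        for i in range(10)
    ]

    try:
        # First attempt: large chunks, short per-request timeout.
        bulk(es, actions, chunk_size=1000, request_timeout=15)
    except ConnectionTimeout:
        # Fall back to smaller chunks and a longer per-request timeout.
        bulk(es, actions, chunk_size=500, request_timeout=40)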


geudrik commented Feb 20, 2018

This exception isn't thrown until my worker has been running for several hours, I should add. Per the doc suggestion, I'm creating a single instance within my worker and re-using it.


geudrik commented Feb 27, 2018

I've done more testing, and this issue still appears to be random, but it seems to stem from my use of sniffing.

The setup that I'm using is a single master/ingest combo, and two data nodes. The master/ingest is a container, and the two data nodes are physical boxes.

With sniffing removed, these exceptions disappear. I suppose we can consider this solved... but... 🤒
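For reference, the two client configurations being compared here might look like this (hosts are placeholders; elasticsearch-py 6.x style assumed):

    from elasticsearch import Elasticsearch

    # Configuration that intermittently raises the ReadTimeoutError above:
    es_sniffing = Elasticsearch(
        ["http://localhost:9200"],
        sniff_on_start=True,
        sniff_on_connection_fail=True,
        sniffer_timeout=60,
    )

    # Configuration with sniffing removed, where the exceptions disappear:
    es_plain = Elasticsearch(["http://localhost:9200"])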


fxdgear commented Mar 30, 2018

I'm trying to recreate this and am having trouble doing so. This is weird behavior.

I've been able to verify that sniffing works:

  1. on connection
  2. on error

I started 3 containers clustered together.
Created a 4th container with the client library and instantiated a connection using just one host.
Then I verified I could connect and see _cat/nodes. Then I would periodically kill the node that the Python client was connected to and try to connect to it again. I would receive a connection error and a notice that the node was on a 60s timeout (had failed 1 time in a row).

But when I tried the request again, I was able to successfully hit the cluster, this time using a different node, and could see that there were now 2 nodes in the cluster.

Have you been able to verify that there have been no cluster issues during your ingestion process? (Maybe your master/ingest node crashes/restarts itself.)
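A rough sketch of the manual check described above, using hypothetical container hostnames for the three-node cluster (this is not an automated reproduction):

    from elasticsearch import Elasticsearch

    # Client seeded with a single node; sniffing should discover the rest.
    es = Elasticsearch(
        ["http://es-node-1:9200"],
        sniff_on_start=True,
        sniff_on_connection_fail=True,
        sniffer_timeout=60,
    )

    print(es.cat.nodes())  # should list every node once sniffing has run

    # Stop the seed node externally (e.g. `docker stop es-node-1`), then retry:
    # the first call may raise a connection error and put the dead node on a
    # retry timeout, but a subsequent call should succeed via another node.
    print(es.cat.nodes())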

@vovavovavovavova

Another possible issue and strange behaviour with socket timeout and Python exceptions.

es.index(something, request_timeout=0.1)
# chain of exceptions if I get a timeout:
#   socket.timeout -->
#   urllib3.exceptions.ConnectTimeoutError / urllib3.exceptions.ReadTimeoutError -->
#   elasticsearch.exceptions.ConnectionTimeout

If I put this in a try/except-pass block, I still get errors with the socket.timeout --> urllib3.exceptions.* chain, and further execution after the except block stops. In the end I get the timeout error even if I put everything inside a try/except block. A double try/except block has the same result as a single one. I tried a bare except, except Exception, and except elasticsearch.exceptions.ConnectionTimeout, all with the same result.

So where am I going wrong, or can this timeout not be caught in a try? I have no trouble catching other ES exceptions.
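For comparison, the pattern that is normally expected to work catches the client-level exception around the call; a minimal sketch, with placeholder host, index, and document (elasticsearch-py 6.x style):

    from elasticsearch import Elasticsearch
    from elasticsearch.exceptions import ConnectionTimeout

    es = Elasticsearch(["http://localhost:9200"])

    try:
        # Deliberately tiny timeout to provoke a ConnectionTimeout.
        es.index(index="example", doc_type="_doc",
                 body={"field": "value"}, request_timeout=0.1)
    except ConnectionTimeout:
        # socket.timeout / urllib3.exceptions.ReadTimeoutError are wrapped by
        # the client in elasticsearch.exceptions.ConnectionTimeout, so catching
        # it here should let execution continue.
        pass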

@daveisfera

I'm having the same issue and would love to find a way to catch this exception and make sure it's handled correctly.

@magorbalassy

I have the same issue as well: in some clusters I never get the error, but the majority of our ES clusters produce these symptoms.


yoogie commented Feb 10, 2021

We have also seen this, or at least a very similar, error. In our case it seems to be related to the fact that we are ingesting large documents, and the bulk request sometimes takes longer than we had configured for sniffer_timeout. By making sniffer_timeout large we resolved our problem. We are also running with sniff_on_connection_fail disabled, if that makes any difference.
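A sketch of the workaround described above, with illustrative values only (the right timeouts depend on how long the bulk requests actually take):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(
        ["http://localhost:9200"],
        sniff_on_start=True,
        sniff_on_connection_fail=False,  # disabled, as in the setup above
        sniffer_timeout=300,             # comfortably above the longest bulk request
    )

    # Placeholder for the large documents being ingested.
    docs = [{"_index": "example", "_type": "_doc", "_source": {"blob": "large document"}}]

    bulk(es, docs, request_timeout=120)  # generous per-request timeout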


technige commented Jun 1, 2022

This issue mentions a number of possibly-related (or not) connection issues. Unfortunately, there is not enough information here to recreate the scenario locally. Given also that this issue has been inactive for over a year, I am going to close it.

Please feel free to open a new issue if connection issues are still occurring. It would be most helpful if you could provide as much detail as possible, to allow us to recreate the problem. Also, unless you are certain that your issue is the same as the one posted here, please refrain from "me too" posting, and open a new issue with your specific problem.

@technige technige closed this as completed Jun 1, 2022

geocomm-shenningsgard commented Mar 23, 2023

For what it's worth, we're seeing this as well (on an older Elastic 7.x, admittedly).

8 participants