Regression: Connection fails re-opening connections in 1.0.0? #133

Closed · gorlins opened this issue Aug 23, 2016 · 25 comments

gorlins commented Aug 23, 2016

I believe that ConnectionPool doesn't play nice with 1.0 and can yield broken connections even in best-case scenarios. Here's my reproducing code:

import happybase
from happybase.pool import ConnectionPool, TException, socket
import thriftpy

for mod in [happybase, thriftpy]:
    print mod.__name__, mod.__version__

kwargs = load_connection_kwargs()  # site-specific helper; returns host/protocol/transport kwargs
print kwargs
pool = ConnectionPool(1, **kwargs)

for i in range(10):
    print 'Loop', i,
    try:
        with pool.connection() as conn:
            print 'opened',  # Ensure yield succeeds
            conn.tables()
            conn.close()  # Or any random failure/timeout etc
    except (TException, socket.error) as e:
        print 'failed', e
    else:
        print 'succeeded'

Works for 0.9:

happybase 0.9
thriftpy 0.3.8
{u'host': 'XXXX', 'protocol': 'binary', 'transport': 'framed'}
Loop 0 opened succeeded
Loop 1 opened succeeded
Loop 2 opened succeeded
Loop 3 opened succeeded
Loop 4 opened succeeded
Loop 5 opened succeeded
Loop 6 opened succeeded
Loop 7 opened succeeded
Loop 8 opened succeeded
Loop 9 opened succeeded

But not 1.0.0:

happybase 1.0.0
thriftpy 0.3.8
{u'host': 'XXXX', 'protocol': 'binary', 'transport': 'framed'}
Loop 0 opened succeeded
Loop 1 opened failed [Errno 9] Bad file descriptor
Loop 2 opened succeeded
Loop 3 opened failed [Errno 9] Bad file descriptor
Loop 4 opened succeeded
Loop 5 opened failed [Errno 9] Bad file descriptor
Loop 6 opened succeeded
Loop 7 opened failed [Errno 9] Bad file descriptor
Loop 8 opened succeeded
Loop 9 opened failed [Errno 9] Bad file descriptor

Given that ConnectionPool.connection() calls open() before yielding in both versions, I can't immediately see what causes this. I am using CDH 5.7 with HBase 1.2.0-cdh5.7.1.

wbolster (Member) commented:

Could be different behaviour in the underlying Python lib: thriftpy vs. thrift (those versions pull in different dependencies).

Also, a stack trace would give more info perhaps?


gorlins commented Aug 24, 2016

So I can actually simplify the error: happybase just cannot cleanly close and re-open a connection:

conn = Connection(**kwargs)
conn.open()
print 'First', conn.tables()  # Succeeds
conn.close()
conn.open()
print 'Second', conn.tables()  # Fails

It fails on the second call; the stack trace is the same in both cases:

Traceback (most recent call last):
  File "junk.py", line 503, in <module>
    conn.tables()
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/site-packages/happybase/connection.py", line 244, in tables
    names = self.client.getTableNames()
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/site-packages/thriftpy/thrift.py", line 157, in _req
    self._send(_api, **kwargs)
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/site-packages/thriftpy/thrift.py", line 169, in _send
    self._oprot.trans.flush()
  File "thriftpy/transport/framed/cyframed.pyx", line 107, in thriftpy.transport.framed.cyframed.TCyFramedTransport.flush (thriftpy/transport/framed/cyframed.c:2310)
  File "thriftpy/transport/framed/cyframed.pyx", line 95, in thriftpy.transport.framed.cyframed.TCyFramedTransport.c_flush (thriftpy/transport/framed/cyframed.c:2044)
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/site-packages/thriftpy/transport/socket.py", line 129, in write
    self.sock.sendall(buff)
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
  File "/Users/sgorlin/miniconda/envs/dna-f7cfb5b8/lib/python2.7/socket.py", line 174, in _dummy
    raise error(EBADF, 'Bad file descriptor')
socket.error: [Errno 9] Bad file descriptor

gorlins changed the title from “Regression: ConnectionPool fails re-opening connections in 1.0.0?” to “Regression: Connection fails re-opening connections in 1.0.0?” on Aug 24, 2016

gorlins commented Aug 25, 2016

For context, I'll add the use case that motivates this. I get frequent timeouts of the form:

conn = Connection(**kwargs)
conn.tables()  # works
from time import sleep; sleep(75)
conn.tables()
TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)

So I use a subclass of ConnectionPool which tries to shield the consuming code from bad connections:

from contextlib import contextmanager
from happybase.pool import ConnectionPool, TException, socket

class AutoRetryPool(ConnectionPool):
    """Connection pool which does its best to ensure open connections

    May impose some overhead (ie not good for low-latency stuff) but should
    increase reliability of getting a working, open connection when needed

    """
    def __init__(self, size, retries=5, **kwargs):
        self.retries = retries
        super(AutoRetryPool, self).__init__(size, **kwargs)

    @contextmanager
    def connection(self, timeout=None):
        attempts = 0
        opened = False
        while True:
            try:
                with super(AutoRetryPool, self).connection(timeout=timeout) as conn:
                    # Force refresh the client
                    conn.close()
                    conn.open()

                    # Make an actual call to ensure the connection really works
                    _ = conn.tables()
                    opened = True
                    yield conn
                return

            except (TException, socket.error):
                # These trigger refreshed connections in parent
                if opened:
                    # aka the failure was inside the yield
                    raise
                attempts += 1
                # print 'Replacing connection, attempt', attempts
                if attempts >= self.retries:
                    raise

I can't simply let ConnectionPool refresh the connection, because it could idle in the queue long enough to go stale again before it is requested. (Aside: perhaps open() should not be called in the except block of ConnectionPool.connection(), so a replaced connection only opens lazily at its next checkout; see the sketch below.) Perhaps the right call to use here is Connection._refresh_thrift_client, but as a private method that is unsatisfactory: the public Connection.open() and Connection.close() should be the only methods other classes use to refresh the client, and it's not clear to me how that private method relates to them.
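
A rough sketch of that lazy variant (assumptions: it subclasses the real pool, reuses the private _acquire_connection/_return_connection helpers, and omits happybase's thread-local nesting bookkeeping for brevity):

import logging
from contextlib import contextmanager
from happybase.pool import ConnectionPool, TException, socket

logger = logging.getLogger(__name__)

class LazyReopenPool(ConnectionPool):
    """Pool variant that defers reopening a tainted connection."""

    @contextmanager
    def connection(self, timeout=None):
        connection = self._acquire_connection(timeout)
        try:
            connection.open()  # connections (re)open lazily, at checkout
            yield connection
        except (TException, socket.error):
            logger.info('Replacing tainted pool connection')
            connection._refresh_thrift_client()
            # No connection.open() here: the fresh client is only opened
            # by the connection.open() call at the next checkout.
            raise
        finally:
            self._return_connection(connection)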

@wbolster
Copy link
Member

wbolster commented Aug 25, 2016

Since Connection.open() only calls transport.open(), at first glance it seems the problem is in the thriftpy lib. Can you try opening a bug report there (and refer to this one)?

https://github.com/wbolster/happybase/blob/master/happybase/connection.py#L169-L178


gorlins commented Aug 25, 2016

I'm not sure I understand the underlying bug/difference between thriftpy and thrift, but more importantly, based on my understanding of the code, I think this should be addressable in Connection rather than upstream.

The expected behavior of conn.close(); conn.open(), to me, is that the connection returns to a clean working state. What is the purpose of conn._refresh_thrift_client if it is not invoked in that sequence? Maybe:

def open(self):
    if self.transport is None:
        self._refresh_thrift_client()
    # rest of method omitted

def close(self):
    if self.transport is None or not self.transport.is_open():
        self.transport = None
        return

    # logging omitted
    self.transport.close()
    self.transport = None  # Release all socket-like resources for gc
    self.client = None  # not sure about this one

Unless you need persistent transport/client/etc. objects for some reason? I know conn.client usage leaks out into Table, but ideally these would be totally encapsulated inside Connection.


gorlins commented Aug 25, 2016

Cause identified in the other ticket: relying on self.transport.is_open() is unsafe in this instance.


gorlins commented Aug 26, 2016

thriftpy has been bumped to 0.3.9; do you want to bump the requirement and release 1.0.1? I'll consider a pull request for the other issues.

wbolster (Member) commented:

Does happybase need changes, or is just bumping the thriftpy dep enough?


gorlins commented Sep 15, 2016

I think the dep bump is enough here. If I have time I'll try a pull request to make things more robust, but the acute issue is resolved by thriftpy.

aswinQuiklo commented:

I still get the TTransportException(message='TSocket read 0 bytes', type=4) exception if I keep my connection alive for ~2 minutes. I tried creating the connection with timeout=0, but that doesn't seem to help. Can you tell me what to do if I have to keep a connection alive indefinitely and run queries against HBase? I am using HBase version 1.2.1.


gorlins commented Mar 1, 2017

Here is what I currently have deployed, although I have not recently verified whether the extra work is still necessary:

from contextlib import contextmanager
from happybase.pool import ConnectionPool, TException, socket

class AutoRetryPool(ConnectionPool):
    """Connection pool which does its best to ensure open connections

    May impose some overhead (ie not good for low-latency stuff) but should
    increase reliability of getting a working, open connection when needed

    """
    def __init__(self, size, retries=5, **kwargs):
        self.retries = retries
        super(AutoRetryPool, self).__init__(size, **kwargs)

    @contextmanager
    def connection(self, timeout=None):
        attempts = 0
        opened = False
        while True:
            try:
                with super(AutoRetryPool, self).connection(timeout=timeout) as conn:
                    # Make an actual call to ensure the connection really works
                    _ = conn.tables()
                    opened = True
                    yield conn
                return

            except (TException, socket.error):
                # These trigger refreshed connections in parent
                if opened:
                    # aka the failure was inside the yield, can't fix
                    raise

                attempts += 1
                # print 'Replacing connection, attempt', attempts
                if attempts >= self.retries:
                    raise

    def _return_connection(self, connection):
        # Don't leave idling connections around - too easy to timeout before use
        connection.close()
        return super(AutoRetryPool, self)._return_connection(connection)
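
Usage is the same as for a plain ConnectionPool; the host, table name, and row key below are placeholders:

pool = AutoRetryPool(5, retries=3, host='hbase-thrift.example.com')

with pool.connection() as conn:
    table = conn.table('mytable')
    print table.row('row-1')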

aswinQuiklo commented:

Thanks! I will test this and get back to you!

ecederstrand (Contributor) commented:

I believe this may be caused by TSocket.is_open() returning True even after raising TTransportException. See Thriftpy/thriftpy#298
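
That would explain the symptom: Connection.open() guards on the transport already being open, roughly like the sketch below (paraphrased, not verbatim, from happybase/connection.py), so a dead socket that still claims to be open is never reconnected:

def open(self):
    if self.transport.is_open():
        return  # skipped reconnect: the stale socket wrongly reports open
    self.transport.open()  # not reached for a stale socket, so the next RPC hits EBADF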


pranny commented Jul 3, 2017

@gorlins Thanks for the AutoRetryPool. I tried the AutoRetryPool with _return_connection and I keep getting errors like the one below; this happens when I scan a Client table which has 2 rows. If I remove the _return_connection method, it works fine. It's a great relief, as I struggled for weeks to understand why I kept getting random, inconsistent errors originating from thriftpy.

>>> [x for x in Client.scan()]
INFO:happybase.pool:Replacing tainted pool connection
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/pranav/workspace/venv/lib/python2.7/site-packages/happybase/table.py", line 402, in scan
    self.name, scan, {})
  File "/Users/pranav/workspace/venv/lib/python2.7/site-packages/thriftpy/thrift.py", line 195, in _req
    self._send(_api, **kwargs)
  File "/Users/pranav/workspace/venv/lib/python2.7/site-packages/thriftpy/thrift.py", line 206, in _send
    self._oprot.write_message_end()
  File "thriftpy/protocol/cybin/cybin.pyx", line 463, in cybin.TCyBinaryProtocol.write_message_end (thriftpy/protocol/cybin/cybin.c:6845)
  File "thriftpy/transport/buffered/cybuffered.pyx", line 80, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.c_flush (thriftpy/transport/buffered/cybuffered.c:2147)
  File "/Users/pranav/workspace/venv/lib/python2.7/site-packages/thriftpy/transport/socket.py", line 129, in write
    self.sock.sendall(buff)

Also, sometimes I see the following error during a scan. Not sure how to handle this:

Traceback (most recent call last):
  File "/home/apps/etl_worker2/src/batchpostprocessor.py", line 39, in run
    res = ETLData.get_values_in_timerange(tag, *duration, values_only=True)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/deming/models/etldata.py", line 45, in get_values_in_timerange
    res = [x for x in cls.scan(row_start=row_start, row_stop=row_stop)]
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/happybase/table.py", line 434, in scan
    self.connection.client.scannerClose(scan_id)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 198, in _req
    return self._recv(_api)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 210, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
ProtocolError: No protocol version header


gorlins commented Jul 4, 2017

@pranny It looks like you didn't copy the error for the first issue, so I can't really comment. The latter issue looks like a client misconfiguration: check your server's transport and protocol and configure the client to match (https://happybase.readthedocs.io/en/latest/api.html#happybase.Connection), for example as below. Also, to raise the obvious question: why are you scanning/using HBase at all if your table only has 2 rows?
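
A minimal example of matching the client to the server (the host is a placeholder, and 'framed'/'binary' are illustrative values, not recommendations; they must agree with how the HBase Thrift server was started):

import happybase

conn = happybase.Connection(
    host='hbase-thrift.example.com',
    transport='framed',   # 'buffered' (default) or 'framed'
    protocol='binary',    # 'binary' (default) or 'compact'
)
print conn.tables()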

nackjicholson commented:

@gorlins @wbolster I scanned this page for info, but it's been quite a long time since the last reply. Is there any resolution to this problem? Any plan to address it? I get errno 32 (broken pipe) any time I try to use pool.connection() twice during a process. To me, that's unusably broken.

Thanks for the AutoRetryPool code above, I'll give that a shot.

ecederstrand (Contributor) commented:

According to https://github.com/Thriftpy/thriftpy, thriftpy is deprecated and users should migrate to thriftpy2. I don't know what the implications are for happybase.


ethe commented May 24, 2019

I think it has already been fixed in thriftpy2, see Thriftpy/thriftpy2@ce522cc. Thanks.

wbolster (Member) commented:

happybase 1.2.0 uses thriftpy2: https://happybase.readthedocs.io/en/latest/news.html#happybase-1-2-0

ankitgaur commented:

Hi, I am using happybase 1.2.0 and thriftpy2 0.4.4 and still facing this issue.

Steps to reproduce:

  1. Create a connection pool:

     self.pool = happybase.ConnectionPool(size=self.POOL_SIZE, host=self.HBASE_HOST)

  2. Get a connection and fetch a row from an HBase table:

     with self.pool.connection() as connection:
         table = connection.table(self.HBASE_TABLE)
         row = table.row(id)

  3. Sleep 3 minutes.

  4. Do step 2 again.

Result: I get the following exception.

thriftpy2.transport.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
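
For reference, the steps above as one standalone script (host, table name, and row key are placeholders):

import time
import happybase

pool = happybase.ConnectionPool(size=3, host='hbase-thrift.example.com')

def fetch_row(row_key):
    with pool.connection() as connection:
        table = connection.table('mytable')
        return table.row(row_key)

fetch_row(b'row-1')  # works
time.sleep(180)      # idle long enough for the server to drop the socket
fetch_row(b'row-1')  # raises TTransportException('TSocket read 0 bytes')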


ethe commented Aug 2, 2019

I think this is by design in thriftpy2: EOF is returned if the connection was closed.

ankitgaur commented:

So, in conclusion, the connection pooling in happybase does not work?


ethe commented Aug 3, 2019

I think HappyBase should catch this exception itself if it provides a built-in connection pool; see the sketch below for what that could look like from user code today.
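
A user-side sketch of that idea, assuming thriftpy2's exception type (TTransportException is a TException subclass, so the pool already replaces the tainted connection before re-raising, and the retry gets a fresh socket); run_with_retry is an illustrative name, not part of happybase:

from thriftpy2.transport import TTransportException

def run_with_retry(pool, func, retries=2):
    """Retry a pooled operation when the checked-out connection is dead."""
    for attempt in range(retries):
        try:
            with pool.connection() as conn:
                return func(conn)  # func receives an open happybase Connection
        except TTransportException:
            if attempt == retries - 1:
                raise

# e.g. run_with_retry(pool, lambda conn: conn.table('mytable').row(b'row-1'))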

nackjicholson commented:

The AutoRetryPool solution written by @gorlins above helps to repair broken connections that are released back to the pool. This is nice, but it doesn't solve all connection problems. A random blog on the internet pretty clearly demonstrates a problem with HBase and happybase:

http://www.programmersought.com/article/41421562488/;jsessionid=660BCDAD2F7713EAE1A111B2255FD2FA

HBase is hanging up on us while we hold an active connection, and there is nothing gorlins's code or happybase can really do to repair that. That is this bit:

if opened:
    # aka the failure was inside the yield, can't fix
    raise

So, if you're doing something like:

with pool.connection() as conn:
    t = conn.table("foo")
    t.put("1", {})

    # do something expensive for 60s
    sleep(60)

    # try to write some more data, BOOM BrokenPipeError!
    t.put("2", {})

That second t.put is going to blow up with a BrokenPipeError, because the pipe to HBase is gone.

The solution, unfortunately, is to open a new connection for each set of operations and do the expensive work in between without holding one:

with pool.connection() as conn:
    t = conn.table("foo")
    t.put("1", {})

# close that one.
# do something expensive and time consuming, this time without an active conn.
sleep(60)

# open up another one
with pool.connection() as conn:
    t = conn.table("foo")
    t.put("2", {})

@ankitgaur So yes, connection pooling in happybase does not work. Also, even if it did work, HBase is going to hang up on you anyway.


ethe commented Oct 19, 2019

Hi @nackjicholson, that can be a workaround. However, catching the lost-connection exception inside HappyBase would fully resolve this issue.
