
Getting intermittent ProtocolError: No protocol version header #168

Closed
pranny opened this issue Jul 4, 2017 · 6 comments

Comments

@pranny

pranny commented Jul 4, 2017

My HappyBase version is 1.1.0, HBase 1.1.2, Python 2.7.12. Sometimes while performing a scan or put on a table I get ProtocolError: No protocol version header. The error is intermittent: the operation succeeds thousands of times and then crashes with the trace below:

Traceback (most recent call last):
  File "/home/apps/etl_worker2/src/batch.py", line 231, in register_batch_phase_update
    ProcessBatch.put_data(lastbatch_id, lastbatch_data)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/models/basehbase.py", line 49, in put_data
    with cls.connection_pool.connection() as connection:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/autoretrypool.py", line 30, in connection
    _ = conn.tables()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/happybase/connection.py", line 242, in tables
    names = self.client.getTableNames()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 198, in _req
    return self._recv(_api)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 210, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
ProtocolError: No protocol version header

My put_data function uses the connection pool as follows. However, I am using AutoRetryPool (this version) to avoid the many socket errors I had been encountering earlier. The timeout param is None.

    @classmethod
    def put_data(cls, key, data):
        if data:
            with cls.connection_pool.connection() as connection:
                table = connection.table(cls.table_name)
                table.put(key, data)

The stack trace with scan looks like this:

Traceback (most recent call last):
  File "/home/apps/etl_worker2/src/batchpostprocessor.py", line 39, in run
    res = ETLData.get_values_in_timerange(tag, *duration, values_only=True)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/models/etldata.py", line 45, in get_values_in_timerange
    res = [x for x in cls.scan(row_start=row_start, row_stop=row_stop)]
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/happybase/table.py", line 434, in scan
    self.connection.client.scannerClose(scan_id)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 198, in _req
    return self._recv(_api)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 210, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
ProtocolError: No protocol version header

I've already looked at #161 and I don't have those settings in my hbase-site.xml. I can't try an older version of happybase because we use the reverse param in a lot of places, and it does not seem to be supported in 0.9. Any suggestions are highly appreciated!

@pranny
Author

pranny commented Jul 4, 2017

As a workaround, I've wrapped all my calls in a try/except block with retries. It's conceptually similar to the AutoRetryPool.

from thriftpy.protocol.cybin import ProtocolError

    @classmethod
    def put_data(cls, key, data):
        if data:
            attempt = 0
            while True:
                try:
                    with cls.connection_pool.connection() as connection:
                        table = connection.table(cls.table_name)
                        table.put(key, data)
                        break
                except ProtocolError:
                    attempt += 1
                    if attempt >= cls.MAX_ATTEMPTS:
                        raise

UPDATE:
This isn't helping much. Even with a retry count of 5 I still see the errors.

P.S. I am using a pool of size 32 which is shared amongst 20 processes. The operations are read- and write-intensive.

@wbolster
Member

wbolster commented Jul 7, 2017

a pool of size 32 which is shared amongst 20 processes

whoo wait wait wait. SHARING a pool between processes? that is not possible.

it is supposed to be thread-safe but tcp sockets cannot be shared between processes.

try using a pool of size 1 (yes, one) per single-threaded process.

@pranny
Author

pranny commented Jul 8, 2017

I think you are right. That explains why it happens only in some of the apps. The single-threaded apps work great; all the multi-process apps encounter this issue.

@wbolster
Member

wbolster commented Jul 8, 2017

if you're feeling adventurous you could open a pr that makes the pool remember the os.getpid() that created it, and verifies that it is the same in the with pool.connection(...) logic. ;)
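The suggested check could look roughly like this. This is only a sketch of the idea, not the actual happybase ConnectionPool implementation; the class name and make_connection parameter are made up for illustration:

```python
import contextlib
import os

class ForkSafePool:
    """Toy pool that remembers the pid that created it and refuses to
    hand out connections from any other process."""

    def __init__(self, size, make_connection):
        self._pid = os.getpid()          # remember the creating process
        self._connections = [make_connection() for _ in range(size)]

    @contextlib.contextmanager
    def connection(self):
        # the check wbolster suggested: fail loudly instead of corrupting
        # a socket shared across a fork
        if os.getpid() != self._pid:
            raise RuntimeError(
                "pool created in pid %d used from pid %d; "
                "create the pool after forking" % (self._pid, os.getpid()))
        conn = self._connections.pop()
        try:
            yield conn
        finally:
            self._connections.append(conn)
```

A forked child calling pool.connection() would then get an immediate, descriptive RuntimeError instead of an intermittent ProtocolError deep inside thriftpy.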

@pranny
Author

pranny commented Jul 9, 2017

This fixed everything. I made sure that all happybase connections are set up after the process is forked, and it worked like a charm. On another note, I had encountered similar issues with my Django + uWSGI app, so I set lazy-apps = true in uwsgi so that the apps are loaded after the fork. That fixed them too. What a beautiful insight into multiprocessing. Thanks a ton!
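For reference, the uWSGI side of that fix is a one-line config change; a minimal sketch (the module name is a placeholder for your own app):

```ini
[uwsgi]
module = myproject.wsgi:application  ; placeholder, adjust to your app
processes = 4
; load the application in each worker *after* forking, so connections
; created at import time belong to exactly one process
lazy-apps = true
```

Without lazy-apps, uWSGI imports the app once in the master and then forks, so any pool created at import time is shared across workers, which is exactly the failure mode above.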

The patch sounds like a great idea to help prevent such "accidents". I'll be working on it in the coming few weeks.

@pranny pranny closed this as completed Jul 9, 2017
@wbolster
Member

wbolster commented Jul 9, 2017

great to hear your issue is solved!
