
Getting intermittent ProtocolError: No protocol version header #168

Closed
pranny opened this issue Jul 4, 2017 · 6 comments

Comments

@pranny

pranny commented Jul 4, 2017

My HappyBase version is 1.1.0, HBase 1.1.2, Python 2.7.12. Sometimes while performing a scan or put on a table I get ProtocolError: No protocol version header. The error is intermittent: the operation succeeds thousands of times and then crashes with the trace below:

Traceback (most recent call last):
  File "/home/apps/etl_worker2/src/batch.py", line 231, in register_batch_phase_update
    ProcessBatch.put_data(lastbatch_id, lastbatch_data)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/models/basehbase.py", line 49, in put_data
    with cls.connection_pool.connection() as connection:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/autoretrypool.py", line 30, in connection
    _ = conn.tables()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/happybase/connection.py", line 242, in tables
    names = self.client.getTableNames()
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 198, in _req
    return self._recv(_api)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 210, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
ProtocolError: No protocol version header

My put_data function uses the connection pool as follows. However, I am using AutoRetryPool (this version) to avoid the many socket errors I had been encountering earlier. The timeout param is None.

    @classmethod
    def put_data(cls, key, data):
        if data:
            with cls.connection_pool.connection() as connection:
                table = connection.table(cls.table_name)
                table.put(key, data)

The stack trace with scan looks like this:

Traceback (most recent call last):
  File "/home/apps/etl_worker2/src/batchpostprocessor.py", line 39, in run
    res = ETLData.get_values_in_timerange(tag, *duration, values_only=True)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/dem/models/etldata.py", line 45, in get_values_in_timerange
    res = [x for x in cls.scan(row_start=row_start, row_stop=row_stop)]
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/happybase/table.py", line 434, in scan
    self.connection.client.scannerClose(scan_id)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 198, in _req
    return self._recv(_api)
  File "/home/apps/etl_worker2/venv/local/lib/python2.7/site-packages/thriftpy/thrift.py", line 210, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy/protocol/cybin/cybin.pyx", line 439, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6470)
ProtocolError: No protocol version header

I've already looked at #161 and I don't have those settings in my hbase-site.xml. I can't try an older version of happybase because we use the reverse param in a lot of places, and it does not seem to be supported in 0.9. Any suggestions are highly appreciated!

@pranny
Author

pranny commented Jul 4, 2017

As a workaround, I've wrapped all my calls in a try/except block with retries. It's conceptually similar to the AutoRetryPool.

from thriftpy.protocol.cybin import ProtocolError

    @classmethod
    def put_data(cls, key, data):
        if data:
            attempt = 0
            while True:
                try:
                    with cls.connection_pool.connection() as connection:
                        table = connection.table(cls.table_name)
                        table.put(key, data)
                        break
                except ProtocolError:
                    attempt += 1
                    if attempt >= cls.MAX_ATTEMPTS:
                        raise

UPDATE:
This isn't helping much. Even with a retry count of 5 I still see the errors.

P.S. I am using a pool of size 32 which is shared amongst 20 processes. The operations are read- and write-intensive.

@wbolster
Member

wbolster commented Jul 7, 2017

a pool of size 32 which is shared amongst 20 processes

whoo wait wait wait. SHARING a pool between processes? that is not possible.

it is supposed to be thread-safe but tcp sockets cannot be shared between processes.

try using a pool of size 1 (yes, one) per single-threaded process.

@pranny
Author

pranny commented Jul 8, 2017

I think you are right. That explains why it happens only in some of the apps. The single-threaded apps work great; all the multi-process apps encounter this issue.

@wbolster
Member

wbolster commented Jul 8, 2017

if you're feeling adventurous you could open a pr that makes the pool remember the os.getpid() that created it, and verifies that it is the same in the with pool.connection(...) logic. ;)
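The suggested check could look roughly like this. This is only a sketch of the idea, not the actual happybase ConnectionPool implementation; the class name and make_connection parameter are made up for illustration:

```python
import contextlib
import os

class ForkSafePool:
    """Toy pool that remembers the pid that created it and refuses to
    hand out connections from any other process."""

    def __init__(self, size, make_connection):
        self._pid = os.getpid()          # remember the creating process
        self._connections = [make_connection() for _ in range(size)]

    @contextlib.contextmanager
    def connection(self):
        # the check wbolster suggested: fail loudly instead of corrupting
        # a socket shared across a fork
        if os.getpid() != self._pid:
            raise RuntimeError(
                "pool created in pid %d used from pid %d; "
                "create the pool after forking" % (self._pid, os.getpid()))
        conn = self._connections.pop()
        try:
            yield conn
        finally:
            self._connections.append(conn)
```

A forked child calling pool.connection() would then get an immediate, descriptive RuntimeError instead of an intermittent ProtocolError deep inside thriftpy.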

@pranny
Author

pranny commented Jul 9, 2017

This fixed everything. I made sure that all happybase connections are set up after the process is forked, and it worked like a charm. On another note, I had encountered similar issues with my Django + uWSGI app, so I set lazy-apps = true in uwsgi so that the apps are loaded after the fork. That fixed them too. What a beautiful insight into multiprocessing. Thanks a ton!
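For reference, the uWSGI side of that fix is a one-line config change; a minimal sketch (the module name is a placeholder for your own app):

```ini
[uwsgi]
module = myproject.wsgi:application  ; placeholder, adjust to your app
processes = 4
; load the application in each worker *after* forking, so connections
; created at import time belong to exactly one process
lazy-apps = true
```

Without lazy-apps, uWSGI imports the app once in the master and then forks, so any pool created at import time is shared across workers, which is exactly the failure mode above.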

The patch sounds like a great idea to help prevent such "accidents". I'll be working on it in the coming few weeks.

@pranny pranny closed this as completed Jul 9, 2017
@wbolster
Member

wbolster commented Jul 9, 2017

great to hear your issue is solved!
