
Table Put - How Do We Assign And Use A Variable For the 'Row Key' #255

Closed

KaiquanMah opened this issue Sep 28, 2022 · 8 comments

@KaiquanMah commented Sep 28, 2022

References:

  1. https://happybase.readthedocs.io/en/latest/api.html#happybase.Table.put
  2. https://happybase.readthedocs.io/en/latest/faq.html

When using Table.put within a python3 shell:

Approach 1 - Hardcoding values => put was successful

pool = hb.ConnectionPool(size=3, host='localhost')
table = conn.table('tableName')
with pool.connection() as conn: \
table.put('2', {'cf1:colA': 'value1', \
                            'cf2:colB': 'value2'}, timestamp=123456)

Approach 2 - Turning 'key' and 'value' pairs within the dictionary into bytes => put encountered errors

pool = hb.ConnectionPool(size=3, host='localhost')
table = conn.table('tableName')
with pool.connection() as conn: \
table.put('2', {b'cf1:colA': b'value1', \
                            b'cf2:colB': b'value2'}, timestamp=123456)

Approach 3 - Use variables for all other things except for the row key => put was successful

pool = hb.ConnectionPool(size=3, host='localhost')
table = conn.table('tableName')

key1 = 'cf1:colA'
value1 = 'value1'
key2 = 'cf2:colB'
value2 = 'value2'
timestamp = 123456
values_dict = dict()
values_dict[key1] = value1
values_dict[key2] = value2

with pool.connection() as conn: \
table.put('2', values_dict, timestamp=timestamp)

Approach 4 - Use variables for all other things except for the row key => put encountered errors

pool = hb.ConnectionPool(size=3, host='localhost')
table = conn.table('tableName')

row_key = '2'  # alternatively, I also tried assigning these other values to row_key: str(2), b'2', 2
               # I still encountered an error with each of these
               # **Is there a reason why I need to hard-code the row key for the put to work?**
               # **How do we assign and use a variable for the row key?**
key1 = 'cf1:colA'
value1 = 'value1'
key2 = 'cf2:colB'
value2 = 'value2'
timestamp = 123456
values_dict = dict()
values_dict[key1] = value1
values_dict[key2] = value2

with pool.connection() as conn: \
table.put(row_key, values_dict, timestamp=timestamp)

Approach 4's error

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/table.py", line 464, in put
    batch.put(row, data)
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 137, in __exit__
    self.send()
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 62, in send
    self._table.connection.client.mutateRowsTs(
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 219, in _req
    return self._recv(_api)
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 231, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy2/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin
  File "thriftpy2/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 65, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.c_read
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 69, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.read_trans
  File "thriftpy2/transport/cybase.pyx", line 61, in thriftpy2.transport.cybase.TCyBuffer.read_trans
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/transport/socket.py", line 131, in read
    raise TTransportException(type=TTransportException.END_OF_FILE,
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')

@KaiquanMah (Author)

If there is a solution to the above questions, I intend to scale up to writing from a CSV, reading into Python, then writing into HBase using HappyBase.
Sample Tutorial: https://jarrettmeyer.com/2016/02/15/inserting-data-into-hbase-with-python
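The CSV-to-HBase idea could be sketched with the stdlib csv module; the column names `id`, `colA`, `colB`, the `cf1` column family, and the `csv_to_puts` helper are all made-up placeholders, not anything from happybase:

```python
# Hypothetical sketch: read CSV rows, turn each into a (row_key, values_dict)
# pair in the shape happybase's table.put() expects.
import csv
import io

def csv_to_puts(csv_text, key_field='id', family='cf1'):
    """Yield (row_key, values_dict) pairs ready for table.put()."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        row_key = row.pop(key_field)            # one CSV column becomes the row key
        values = {f'{family}:{col}': val for col, val in row.items()}
        yield row_key, values

# Writing them out could then look roughly like (assuming a correctly
# used connection pool, per the discussion below):
# with pool.connection() as conn:
#     table = conn.table('tableName')
#     with table.batch() as batch:            # batched puts, sent on exit
#         for row_key, values in csv_to_puts(text):
#             batch.put(row_key, values)
```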

@wbolster (Member) commented Sep 28, 2022

you actually did not show the errors at all, only some very incomplete snippets of code.

your problem has nothing to do with hardcoding values at all. you're likely not using the connection pool and connections correctly. a connection obtained from a pool is only valid in a with block, and any operations like .table(...) etc need to be covered by that with block as well. the documentation has correct example code: https://happybase.readthedocs.io/en/latest/user.html#obtaining-connections
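The pattern from the linked docs can be sketched as a small helper; the `put_row` name and its parameters are illustrative, not part of happybase:

```python
# Sketch of correct pool usage: every operation that touches a pooled
# connection -- including conn.table(...) -- must happen INSIDE the
# `with pool.connection()` block, where the connection is valid.

def put_row(pool, table_name, row_key, data, timestamp=None):
    """Write one row, holding the pooled connection for the whole call."""
    with pool.connection() as conn:       # connection is only valid in here
        table = conn.table(table_name)    # obtain the table inside the block
        table.put(row_key, data, timestamp=timestamp)

# Usage (assumes an HBase Thrift server on localhost:9090):
# import happybase
# pool = happybase.ConnectionPool(size=3, host='localhost', port=9090)
# put_row(pool, 'tableName', '2', {'cf1:colA': 'value1'}, timestamp=123456)
```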

@KaiquanMah (Author) commented Sep 30, 2022

ok, if I do not use any pool connection,

row_key = '2' 
key1 = 'cf1:colA'
value1 = 'value1'
key2 = 'cf2:colB'
value2 = 'value2'
timestamp = 123456
values_dict = dict()
values_dict[key1] = value1
values_dict[key2] = value2



import happybase as hb
conn = hb.Connection('localhost',9090)
table = conn.table('tableName')


table.put(row_key, values_dict, timestamp=timestamp)  # **Comment: Here is the error I encounter. If I run using a hard-coded row key, the put works**
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/table.py", line 464, in put
    batch.put(row, data)
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 137, in __exit__
    self.send()
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 62, in send
    self._table.connection.client.mutateRowsTs(
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 219, in _req
    return self._recv(_api)
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 251, in _recv
    raise v
Hbase_thrift.IOError: IOError(message=b"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf1 does not exist in region tableName,,1664349103498.d1d8b193f042bf27efb16bc70d35a3d5. in table 'tableName', ...\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:758)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:713)\n\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)\n\tat org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)\n\tat org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2180)\n\tat org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)\n\tat org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)\n\tat org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)\n\tat java.lang.Thread.run(Thread.java:750)\n: 1 time, \n\tat org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:247)\n\tat org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1800(AsyncProcess.java:227)\n\tat org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1765)\n\tat org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)\n\tat org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:190)\n\tat org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1422)\n\tat org.apache.hadoop.hbase.client.HTable.put(HTable.java:1025)\n\tat org.apache.hadoop.hbase.thrift.ThriftServerRunner$HBaseHandler.mutateRowsTs(ThriftServerRunner.java:1355)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.hbase.thrift.HbaseHandlerMetricsProxy.invoke(HbaseHandlerMetricsProxy.java:67)\n\tat com.sun.proxy.$Proxy10.mutateRowsTs(Unknown Source)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRowsTs.getResult(Hbase.java:4416)\n\tat org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRowsTs.getResult(Hbase.java:4400)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n")
>>> 

@wbolster (Member)

the error is right there if you read carefully:

IOError(message=b"org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf1 does not exist in region tableName,,1664349103498.d1d8b193f042bf27efb16bc70d35a3d5

@KaiquanMah (Author)

The 'NoSuchColumnFamilyException' does not make sense to me. Comparing approaches 3 and 4:
=> The put was successful in approach 3, using the same 'values_dict' containing the same column family and column qualifier.
=> The only difference between approaches 3 and 4 was how the row key was supplied: approach 3 hardcoded it inside the put, while approach 4 passed it via a 'row_key' variable. So I was not expecting a 'NoSuchColumnFamilyException'.

@wbolster (Member) commented Oct 11, 2022

your approach 4 is broken code. you're not using connections and connection pools correctly. the basic structure of your code should look like this:

pool = happybase.ConnectionPool(size=3, host=...)
with pool.connection() as connection:
    table = connection.table(...)
    table.put(...)

@KaiquanMah (Author) commented Oct 14, 2022

Hi @wbolster,

Thank you for your help. I tried your approach with and without using variables in the put - and it worked!

# Without variables in the put
import happybase as hb
pool = hb.ConnectionPool(size=3, host='localhost')
with pool.connection() as conn:
  table = conn.table('table1')
  table.put('rowKey1', {'key1': 'value1'}, timestamp=123456)

with pool.connection() as conn:
  table = conn.table('table1')
  table.row('rowKey1')
>>>{b'key1': b'value1'}

# With variables in the put

import happybase as hb
pool = hb.ConnectionPool(size=3, host='localhost')
row_key = 'rowKey2'
value_dict_key_1 = 'key1'
value_dict_value_1 = 'value1'
timestamp = 123457
values_dict = dict()
values_dict[value_dict_key_1] = value_dict_value_1            #{'key1': 'value1'}

with pool.connection() as conn:
  table = conn.table('table1')
  table.put(row_key, values_dict, timestamp=timestamp)
  table.row(row_key)
>>>{b'key1': b'value1'}

One thing I still don't quite understand: when some time passes between running my first 'with pool...' block and the second one, I encounter the error below.
What solved the issue was calling pool = hb.ConnectionPool(size=3, host='localhost') again before running the second 'with pool...' block.

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/table.py", line 464, in put
    batch.put(row, data)
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 137, in __exit__
    self.send()
  File "/home/kai/.local/lib/python3.8/site-packages/happybase/batch.py", line 62, in send
    self._table.connection.client.mutateRowsTs(
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 219, in _req
    return self._recv(_api)
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/thrift.py", line 231, in _recv
    fname, mtype, rseqid = self._iprot.read_message_begin()
  File "thriftpy2/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin
  File "thriftpy2/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 65, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.c_read
  File "thriftpy2/transport/buffered/cybuffered.pyx", line 69, in thriftpy2.transport.buffered.cybuffered.TCyBufferedTransport.read_trans
  File "thriftpy2/transport/cybase.pyx", line 61, in thriftpy2.transport.cybase.TCyBuffer.read_trans
  File "/home/kai/.local/lib/python3.8/site-packages/thriftpy2/transport/socket.py", line 131, in read
    raise TTransportException(type=TTransportException.END_OF_FILE,
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')

@wbolster (Member)

i guess the server closed the connection in the meantime. see #255, #133
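One hedged way to cope with such an idle disconnect, following the workaround described above (recreating the pool), is a small retry wrapper. The `put_with_retry` helper and its parameters are made up for illustration; in real code the transient exception to catch would be the `thriftpy2.transport.base.TTransportException` seen in the traceback:

```python
# Sketch: try the put; on a transport-style error, rebuild the pool
# (via the supplied factory) and retry the operation.

def put_with_retry(make_pool, table_name, row_key, data,
                   retries=1, transient=(Exception,)):
    """Attempt the put, recreating the pool on a transient failure."""
    pool = make_pool()
    for attempt in range(retries + 1):
        try:
            with pool.connection() as conn:
                conn.table(table_name).put(row_key, data)
            return
        except transient:
            if attempt == retries:
                raise                 # out of retries: propagate the error
            pool = make_pool()        # stale pool: recreate, as noted above

# Usage would look roughly like:
# import happybase
# from thriftpy2.transport.base import TTransportException
# put_with_retry(lambda: happybase.ConnectionPool(size=3, host='localhost'),
#                'table1', 'rowKey2', {'cf1:colA': 'value1'},
#                transient=(TTransportException,))
```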
