# Python Client Issues: Query and Scan
Reproducible cases for potential software bugs and/or documentation issues.
<br>
This notebook requires an Aerospike database running on localhost, with Python and the Aerospike Python client installed (`pip install aerospike`). Visit the [Aerospike notebooks repo](https://github.com/aerospike-examples/interactive-notebooks) for additional details and the Docker container.

## Ensure database is running
This notebook requires that the Aerospike database is running.
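
As a complement to the `pgrep` check in the next cell, reachability can also be tested from Python. The following is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and its default host and port are assumptions that match the `127.0.0.1:3000` address used in this notebook's client configuration. An open port only shows that something is listening there, not that it is a healthy Aerospike node.

```python
import socket


def is_port_open(host="127.0.0.1", port=3000, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` before `aerospike.client(config).connect()` gives a quick, explicit failure message when the server is not up.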

In [1]:
!asd >& /dev/null
!pgrep -x asd >/dev/null && echo "Aerospike database is running!" || echo "**Aerospike database is not running!**"

Aerospike database is running!


## Connect to database and populate test data
The test data has ten records with user keys "id-1" through "id-10" and two integer bins (fields), "bin1" and "bin2", in the namespace "test" and set "test_small". The same ten keys are also written to the null set of the namespace.
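
The record layout can be previewed without a database. The sketch below mirrors the first population loop of the next cell (same seed, same order of `randint` calls) but only builds a dictionary. `preview_test_data` is a hypothetical helper, and whether the values match those stored by a given run depends on using the same Python version and random number generator.

```python
import random


def preview_test_data(count=10, seed=1):
    """Build the bins the population loop would generate for the 'test_small' set."""
    rng = random.Random(seed)  # private generator, same sequence as random.seed(seed)
    records = {}
    for i in range(count):
        user_key = 'id-' + str(i + 1)
        records[user_key] = {
            'bin1': rng.randint(1, 100),   # drawn first, as in the notebook
            'bin2': rng.randint(1, 1000),  # drawn second
        }
    return records


for user_key, bins in preview_test_data().items():
    print(user_key, bins)
```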

In [2]:
# import the module
from __future__ import print_function
import aerospike

# Configure the client
config = {
  'hosts': [ ('127.0.0.1', 3000) ],
}
policy = {'key': aerospike.POLICY_KEY_SEND}   # policy to store the user_key along with the record

# Create a client and connect it to the cluster
try:
  client = aerospike.client(config).connect()
except Exception as e:
  import sys
  print("failed to connect to the cluster with", config['hosts'], "error:", e)
  sys.exit(1)

namespace = 'test'
small_set = 'test_small'

import random
random.seed(1)
try:
    for i in range(10):
      # create records in small_set
      client.put((namespace, small_set, 'id-'+str(i+1)), 
                 {'bin1': random.randint(1,100),
                  'bin2': random.randint(1,1000) },
                 policy=policy)
    for i in range(10):
      # create records in null set
      client.put((namespace, None, 'id-'+str(i+1)), 
                 {'bin1': random.randint(1,100),
                  'bin2': random.randint(1,1000) },
                 policy=policy)
except Exception as e:
  import sys
  print("error: {0}".format(e), file=sys.stderr)

print('Test data populated.')

Test data populated.


## Create secondary index
To use the query API, a secondary index must exist on the query field. We will create an integer secondary index on the "bin1" bin.

In [3]:
# Must create an index to query on a bin
from aerospike import exception as ex
sindex_name = 'test_small_bin1_number_idx'
try:
    client.index_integer_create("test", small_set, 'bin1', sindex_name)
except ex.IndexFoundError:
    pass

print('Secondary index created.')

Secondary index created.


# Issue: Query add_ops does not work with update operations in foreground mode (results and foreach). No error is returned.
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.add_ops

The API should disallow update operations with results and foreach. The documentation should warn users against such use.
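
Until the client enforces this, a thin wrapper can approximate the requested behavior on the application side. The following is a minimal sketch, not part of the Aerospike client: `GuardedQuery` is a hypothetical class that wraps any query-like object and refuses `results` and `foreach` once `add_ops` has been called. It treats every `add_ops` call as a write, which is an assumption made for simplicity.

```python
class GuardedQuery:
    """Hypothetical wrapper: blocks foreground execution after add_ops()."""

    def __init__(self, query):
        self._query = query
        self._has_ops = False

    def where(self, *args, **kwargs):
        self._query.where(*args, **kwargs)
        return self

    def add_ops(self, ops):
        self._query.add_ops(ops)
        self._has_ops = True  # assumption: any attached op is treated as a write
        return self

    def results(self, *args, **kwargs):
        self._refuse_if_ops('results')
        return self._query.results(*args, **kwargs)

    def foreach(self, callback, *args, **kwargs):
        self._refuse_if_ops('foreach')
        return self._query.foreach(callback, *args, **kwargs)

    def execute_background(self, *args, **kwargs):
        # background execution is the mode in which ops are actually applied
        return self._query.execute_background(*args, **kwargs)

    def _refuse_if_ops(self, method):
        if self._has_ops:
            raise RuntimeError(
                '{0}() called after add_ops(); use execute_background()'.format(method))
```

In this notebook the wrapper would be used as `GuardedQuery(client.query('test', small_set))`, so that the cell below fails loudly instead of returning unmodified records.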

In [4]:
import time
from aerospike_helpers.operations import operations as op_helpers
from aerospike import predicates as p

ops = [
    op_helpers.increment("bin2", 1),
]

print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# Results
query = client.query('test', small_set)
query.where( p.equals('bin1', 33) )
query.add_ops(ops)
records = query.results( )
print('query/add_ops/results returned')
for rec in sorted(records):
    print(rec[0][2], rec[2])

time.sleep(2)
print('current state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)


# foreach
def print_result(result_tuple):
    print(result_tuple[0][2], result_tuple[2])
print('query/add_ops/foreach returned')
query = client.query('test', small_set)
query.where( p.equals('bin1', 33) )
query.add_ops(ops)
query.foreach(print_result)  

time.sleep(2)
print('final state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

initial state
{'bin1': 33, 'bin2': 121}
query/add_ops/results returned
id-3 {'bin1': 33, 'bin2': 121}
current state
{'bin1': 33, 'bin2': 121}
query/add_ops/foreach returned
id-3 {'bin1': 33, 'bin2': 121}
final state
{'bin1': 33, 'bin2': 121}


# Issue: Scan add_ops throws a misleading exception in foreground execution (results and foreach). However, the update is silently executed in background mode.
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.add_ops

The API should disallow update operations with results and foreach. The documentation should warn users against such use.
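
The pattern used to expose this kind of issue (read the record, run the call, read the record again, and compare even when the call raises) can be factored into a small helper. This is a minimal sketch under stated assumptions: `run_and_compare` is a hypothetical name, `read_state` must return an independent snapshot each time, and `settle_seconds` stands in for the `time.sleep` calls used in the cells.

```python
import time


def run_and_compare(read_state, action, settle_seconds=0.0):
    """Run `action`, capture any exception, and report whether the state changed."""
    before = read_state()
    error = None
    try:
        action()
    except Exception as exc:  # keep the error so it can be reported next to the change
        error = exc
    if settle_seconds > 0:
        time.sleep(settle_seconds)  # give a background job time to finish
    after = read_state()
    return {'error': error, 'before': before, 'after': after,
            'changed': before != after}
```

With the objects from this notebook, `read_state` could be `lambda: client.get(('test', small_set, 'id-3'))[2]` and `action` could be `lambda: scan.results(policy=policy)`. A result with both a non-empty `error` and `changed` set to `True` is the symptom reported here.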

In [5]:
from aerospike import predexp
preds = [
        predexp.integer_bin('bin1'),
        predexp.integer_value(33),
        predexp.integer_equal()
    ]
policy = {
        'predexp': preds,
        'max_retries': 0
    }

print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# results: exception Max retries exceeded: silently converted into background execution!
scan = client.scan('test', small_set)
scan.add_ops(ops)
try:
    records = scan.results(policy=policy) 
except Exception as e:
  import sys
  print("scan/add_ops/results error: {0}".format(e), file=sys.stderr)
else:
    # records already holds the result; do not run the scan a second time
    print('scan/add_ops/results')
    for rec in sorted(records):
        print(rec[0][2], rec[2])
        
time.sleep(2)
print('current state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# foreach: exception Max retries exceeded: silently converted into background execution!
scan = client.scan('test', small_set)
scan.add_ops(ops)
try:
    scan.foreach(print_result, policy=policy) 
except Exception as e:
  import sys
  print("scan/add_ops/foreach: error: {0}".format(e), file=sys.stderr)
else:
    print('scan/add_ops/foreach')
    
time.sleep(2)
print('final state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

initial state
{'bin1': 33, 'bin2': 121}


scan/add_ops/results error: (-12, 'Max retries exceeded: 0', 'src/main/aerospike/as_partition_tracker.c', 318, False)


current state
{'bin1': 33, 'bin2': 122}


scan/add_ops/foreach: error: (-12, 'Max retries exceeded: 0', 'src/main/aerospike/as_partition_tracker.c', 318, False)


final state
{'bin1': 33, 'bin2': 123}


**UDF issues**

In [6]:
#client.close()
udf_file = 'udf_test.lua'
udf_path = './udf'
config = {
    'hosts': [ ('127.0.0.1', 3000)],
    'lua': { 'user_path': udf_path} # needed for only stream/aggregate processing?
}
client = aerospike.client(config).connect()
client.udf_remove(udf_file)
print('current udf modules:', client.udf_list())
client.udf_put(udf_path + '/' + udf_file)
print('current udf modules:', client.udf_list())
print(client.udf_get(udf_file))

current udf modules: []
current udf modules: [{'name': 'udf_test.lua', 'hash': bytearray(b'1c9810c0e6cc510fc9660bbf0c53581b3663ec52'), 'type': 0, 'content': bytearray(b'')}]
-- udf_test.lua 

-- read
function get_sum(rec, bin1, bin2)
    local ret = map()                     -- Initialize the return value (a map)
    --ret['sum'] = rec[bin1] + rec[bin2]
    ret['status'] = 'OK'
    ret['userdata'] = rec[bin1] + rec[bin2]
    return ret
end

-- write
function increment(rec, binName, value)
    rec[binName] = rec[binName] + value
    aerospike:update(rec)
end

-- read-write
function increment_and_get(rec, binName, value)
    local ret = map()                     -- Initialize the return value (a map)
    rec[binName] = rec[binName] + value
    ret['status'] = 'OK'
    ret['userdata'] = rec[binName]
    aerospike:update(rec)
    return ret
end


# Issue: Record UDF (update) operations work with a query in background execution, but the documentation seems to suggest that only stream UDFs work with a query.
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.apply

The documentation should reflect the valid and allowed uses of record UDFs with a query.
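
Background jobs in this notebook are awaited with an unbounded `while True` loop around `client.job_info`. A bounded variant is sketched below. `wait_for_status` is a hypothetical helper that takes a status-reading callable rather than a client, so it makes no assumptions about the client API beyond what the cells already use.

```python
import time


def wait_for_status(get_status, done_status, timeout=10.0, interval=0.25):
    """Poll get_status() until it returns done_status; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == done_status:
            return True
        if time.monotonic() >= deadline:
            return False  # give up instead of looping forever
        time.sleep(interval)
```

For the query job below, the call would look like `wait_for_status(lambda: client.job_info(job_id, aerospike.JOB_QUERY)['status'], aerospike.JOB_STATUS_COMPLETED)`.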

In [7]:
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

q = client.query('test', small_set)
q.apply('udf_test', 'increment', [ 'bin2', 10 ])
q.where( p.equals('bin1', 33) )
job_id = q.execute_background()
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_QUERY)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('query/record_udf update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

initial state
{'bin1': 33, 'bin2': 123}
query/record_udf update/background
record bins: {'bin1': 33, 'bin2': 133}


# Issue: Query apply (udf op) does not work for read operations with results and foreach. 
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.apply
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.results
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.foreach

The API should disallow the apply operation with results and foreach. The documentation should warn users against such use.

In [8]:
q = client.query('test', small_set)
q.where( p.equals('bin1', 33) )
q.apply('udf_test', 'get_sum', [ 'bin1', 'bin2' ])
try:
    records = q.results()
except Exception as e:
    import sys
    print("query/record-udf read/results: error: {0}".format(e), file=sys.stderr)
else:
    print('query/record-udf read/results')
    for rec in sorted(records):
        print(rec[0][2], rec[2])

query/record-udf read/results: error: (100, 'UDF: Execution Error 1', 'src/main/aerospike/aerospike_query.c', 1008, False)


# Issue: Scan apply (udf op) does not work for read operations in foreground mode (results and foreach). A cryptic error is returned.
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.apply
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.results
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.foreach

The API should disallow apply with results and foreach. Documentation should warn users against such use.
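
The errors printed in these cells are five-element tuples, which makes them hard to read at a glance. The sketch below gives the positions names. The field names `code`, `msg`, `file`, `line` and `in_doubt` are an assumption about what the five positions mean, and `describe_error` is a hypothetical helper. The sample tuple is copied from the output in this notebook.

```python
from collections import namedtuple

# Assumed meaning of the five positions in the printed error tuple
ClientError = namedtuple('ClientError', ['code', 'msg', 'file', 'line', 'in_doubt'])


def describe_error(args):
    """Format a five-element error tuple as a single readable line."""
    err = ClientError(*args)
    return '{0} (code {1}) at {2}:{3}, in_doubt={4}'.format(
        err.msg, err.code, err.file, err.line, err.in_doubt)


sample = (-12, 'Max retries exceeded: 0',
          'src/main/aerospike/as_partition_tracker.c', 318, False)
print(describe_error(sample))
```

Inside an `except Exception as e:` block, `describe_error(e.args)` would apply, assuming `e.args` holds the same five values that `format(e)` prints.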

In [9]:
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

scan = client.scan('test', small_set)
scan.apply('udf_test', 'get_sum', [ 'bin1', 'bin2' ])
try:
    records = scan.results(policy)
except Exception as e:
  import sys
  print("scan/record-udf read/results: error: {0}".format(e), file=sys.stderr)
else:
    print('scan/record-udf read/results')
    for rec in sorted(records):
        print(rec[0][2], rec[2])
time.sleep(2)
print('state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

initial state
{'bin1': 33, 'bin2': 133}


scan/record-udf read/results: error: (-12, 'Max retries exceeded: 0', 'src/main/aerospike/as_partition_tracker.c', 318, False)


state
{'bin1': 33, 'bin2': 133}


# Issue: Query apply (udf op) does not work for update operations in foreground mode (results and foreach).
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.apply
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.results
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.foreach

The API should disallow the apply operation with results and foreach. The documentation should warn users against such use.

In [10]:
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# try if record udfs work with query for update/results
# -> attempt to perform on a nil value field?
q = client.query('test', small_set)
q.where( p.equals('bin1', 33) )
q.apply('udf_test', 'increment', [ 'bin2', 10 ])
try:
    records = q.results()
except Exception as e:
  import sys
  print("query/record-udf update/results: error: {0}".format(e), file=sys.stderr)
else:
    print('query/record-udf update/results')
    for rec in sorted(records):
        print(rec[0][2], rec[2])
time.sleep(1)
print('state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

initial state
{'bin1': 33, 'bin2': 133}


query/record-udf update/results: error: (100, 'UDF: Execution Error 1', 'src/main/aerospike/aerospike_query.c', 1008, False)


state
{'bin1': 33, 'bin2': 133}


# Issue: Scan apply (udf op) returns a cryptic error for update operations in foreground mode (results and foreach). However, the update succeeds in the background.
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.apply
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.results
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.foreach

The API should disallow update operations with results and foreach. The documentation should warn users against such use.

In [11]:
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# results will return immediately, but the operation will work like execute_background
# foreach also should work similarly
scan = client.scan('test', small_set)
scan.apply('udf_test', 'increment', [ 'bin2', 10 ])
try:
    records = scan.results(policy)
except Exception as e:
  import sys
  print("scan/record-udf update/results: error: {0}".format(e), file=sys.stderr)
else:
    print('scan/record-udf update/results')
    for rec in sorted(records):
        print(rec[0][2], rec[2])
time.sleep(2)
print('state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

initial state
{'bin1': 33, 'bin2': 133}


scan/record-udf update/results: error: (-12, 'Max retries exceeded: 0', 'src/main/aerospike/as_partition_tracker.c', 318, False)


state
{'bin1': 33, 'bin2': 143}


# Issue: If both add-ops and apply are used in a scan to perform updates in background mode, only apply operations work. Updates in add-ops are silently ignored.
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.apply
https://www.aerospike.com/apidocs/python/scan.html#aerospike.Scan.add_ops

The API should disallow combining apply and add_ops on the same scan. The documentation should warn users against such use.
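
An application-side guard against the combination can be sketched as a proxy that records which of a set of mutually exclusive methods has been called. `ExclusiveCalls` is a hypothetical class, not part of the Aerospike client. It forwards everything to the wrapped object and raises before a second, conflicting method reaches it.

```python
class ExclusiveCalls:
    """Hypothetical proxy: rejects mixing mutually exclusive method calls."""

    def __init__(self, target, exclusive=('apply', 'add_ops')):
        self._target = target
        self._exclusive = set(exclusive)
        self._seen = set()

    def __getattr__(self, name):
        # Only reached for attributes not defined on the proxy itself
        attr = getattr(self._target, name)
        if name not in self._exclusive or not callable(attr):
            return attr

        def guarded(*args, **kwargs):
            others = self._seen - {name}
            if others:
                raise ValueError('{0}() cannot be combined with {1}()'.format(
                    name, ', '.join(sorted(others))))
            result = attr(*args, **kwargs)
            self._seen.add(name)
            return result

        return guarded
```

Wrapping a scan as `ExclusiveCalls(client.scan('test', small_set))` would turn the third case in the cell below into an immediate `ValueError` instead of a silently dropped update.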

In [12]:
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

ops = [
    op_helpers.increment("bin2", 1),
]
# Scan with add_ops only: the update is applied
scan = client.scan('test', small_set)
scan.add_ops(ops)
job_id = scan.execute_background(policy)
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_SCAN)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after scan/op update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

# Scan with apply only: the update is applied
scan = client.scan('test', small_set)
scan.apply('udf_test', 'increment', [ 'bin2', 10 ])
job_id = scan.execute_background(policy)
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_SCAN)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after scan/record_udf update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

# Scan with both apply and add_ops: the add_ops update is silently ignored
scan = client.scan('test', small_set)
scan.apply('udf_test', 'increment', [ 'bin2', 10 ])
scan.add_ops(ops)
job_id = scan.execute_background(policy)
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_SCAN)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after scan/record_udf+op combo update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)


initial state
{'bin1': 33, 'bin2': 143}
after scan/op update/background
record bins: {'bin1': 33, 'bin2': 144}
after scan/record_udf update/background
record bins: {'bin1': 33, 'bin2': 154}
after scan/record_udf+op combo update/background
record bins: {'bin1': 33, 'bin2': 164}


# Issue: If both add-ops and apply are used in a query to perform updates in background mode, only apply operations work. Updates in add-ops are silently ignored. 
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.apply
https://www.aerospike.com/apidocs/python/query.html#aerospike.Query.add_ops

The API should disallow combined use of add_ops and apply. Documentation should warn users against such use.

In [13]:
# now query
print('initial state')
key, _, bins = client.get(('test',small_set,'id-3'))
print(bins)

# Query with add_ops only: the update is applied
q = client.query('test', small_set)
q.where( p.equals('bin1', 33) )
q.add_ops(ops)
job_id = q.execute_background()
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_QUERY)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after query/op update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

# Query with apply only: the update is applied
q = client.query('test', small_set)
q.where( p.equals('bin1', 33) )
q.apply('udf_test', 'increment', [ 'bin2', 10 ])
job_id = q.execute_background()
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_QUERY)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after query/record_udf update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

# Query with both apply and add_ops: the add_ops update is silently ignored
q = client.query('test', small_set)
q.where( p.equals('bin1', 33) )
q.apply('udf_test', 'increment', [ 'bin2', 10 ])
q.add_ops(ops)
job_id = q.execute_background()
while True:
    time.sleep(0.25)
    response = client.job_info(job_id, aerospike.JOB_QUERY)
    if response['status'] == aerospike.JOB_STATUS_COMPLETED:
        break
print('after query/record_udf+op combo update/background')
key, _, bins = client.get(('test',small_set,'id-3'))
print('record bins:', bins)

initial state
{'bin1': 33, 'bin2': 164}
after query/op update/background
record bins: {'bin1': 33, 'bin2': 165}
after query/record_udf update/background
record bins: {'bin1': 33, 'bin2': 175}
after query/record_udf+op combo update/background
record bins: {'bin1': 33, 'bin2': 185}


## Clean up

In [14]:
# Close the connection to the Aerospike cluster
client.close()
print('Connection closed.')

Connection closed.


## Next steps

Visit the [Aerospike notebooks repo](https://github.com/aerospike-examples/interactive-notebooks) to run additional Aerospike notebooks. To run a different notebook, download it from the repo to your local machine, then click File->Open and select Upload.