This repository was archived by the owner on Apr 1, 2026. It is now read-only.

BigTable: Async client for high throughput mutate_rows #9

@mayurjain0312

Description

Environment: Ubuntu 16.04
Python version and virtual environment information: python --version reports Python 2.7.12

With a batch size of 300 and a total of 3 nodes in the instance, write throughput to Bigtable is poor when using the mutate_rows API call.

import json

from google.cloud import bigtable

BULK_WRITE_BATCH_SIZE = 300

with open(file_path) as sensor_data_input_file:
    list_direct_row_obj = []
    for line in sensor_data_input_file:
        # Skip blank lines in the input file.
        if not line.strip():
            continue

        sensor_json_data = json.loads(line)
        row_key = create_sensor_data_id(sensor_json_data)
        value = line

        # One DirectRow per input line, with a single cell holding the raw JSON.
        direct_row_obj = bigtable.row.DirectRow(row_key, table)
        column_id = 'column_id_data'.encode('utf-8')
        direct_row_obj.set_cell(column_family_id, column_id, value.encode('utf-8'))

        list_direct_row_obj.append(direct_row_obj)

        # Flush a full batch; mutate_rows blocks until the server has
        # processed all BULK_WRITE_BATCH_SIZE mutations.
        if len(list_direct_row_obj) == BULK_WRITE_BATCH_SIZE:
            table.mutate_rows(list_direct_row_obj)
            list_direct_row_obj[:] = []

    # Flush any remaining rows as a final, smaller batch.
    if list_direct_row_obj:
        table.mutate_rows(list_direct_row_obj)
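
Until an async client exists, one alternative worth trying is the library's MutationsBatcher, which buffers DirectRow objects and issues mutate_rows calls for you once a size threshold is reached. A minimal sketch, assuming the same table, column_family_id, file_path, and create_sensor_data_id names as in the snippet above; this moves the batching bookkeeping into the client library but is not a true async path:

import json

from google.cloud import bigtable

FLUSH_COUNT = 300  # rows buffered before the batcher issues a mutate_rows call

# table, column_family_id, file_path, and create_sensor_data_id are assumed
# to be the same objects/helpers used in the snippet above.
batcher = table.mutations_batcher(flush_count=FLUSH_COUNT)

with open(file_path) as sensor_data_input_file:
    for line in sensor_data_input_file:
        if not line.strip():
            continue

        sensor_json_data = json.loads(line)
        row = bigtable.row.DirectRow(create_sensor_data_id(sensor_json_data), table)
        row.set_cell(column_family_id, 'column_id_data'.encode('utf-8'),
                     line.encode('utf-8'))

        # The batcher buffers rows and calls mutate_rows automatically
        # once flush_count rows have accumulated.
        batcher.mutate(row)

# Flush whatever is still buffered after the file is exhausted.
batcher.flush()

Note that the batcher's flushes are still synchronous, so a genuinely asynchronous (or multi-threaded) client would still be needed to overlap network round trips, which is what this issue requests.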

Metadata

Labels

api: bigtable (Issues related to the googleapis/python-bigtable API)
type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design)
