Importing Data

If you have large amounts of data, it is more efficient to import it to Pilosa instead of several Set or Clear queries.

pilosa.imports module defines several format functions. Depending on the data, the following format is expected:

row_id_column_id: ROW_ID,COLUMN_ID
row_id_column_key: ROW_ID,COLUMN_KEY
row_key_column_id: ROW_KEY,COLUMN_ID
row_key_column_key: ROW_KEY,COLUMN_KEY

Optionally, a timestamp with GMT time zone can be added:

ROW_ID,COLUMN_ID,TIMESTAMP

Note that, each line corresponds to a single bit and the lines end with a new line (\n or \r\n). The target index and field must have been created before hand.

Here's some sample code that uses csv_row_id_column_id formatter along with a timestamp:

import pilosa
from pilosa.imports import csv_column_reader, csv_row_id_column_id
import time

try:
    # python 2.7 and 3
    from io import StringIO
except ImportError:
    # python 2.6 and 2.7
    from StringIO import StringIO

text = u"""
    1,10,2019-11-30T02:00
    5,20,2020-09-29T03:30
    3,41,2017-09-23T03:08
    10,10485760,2018-09-23T03:05
"""
time_func = lambda s: int(time.mktime(time.strptime(s, "%Y-%m-%dT%H:%M")))
reader = csv_column_reader(StringIO(text), timefunc=time_func)
client = pilosa.Client()
schema = client.schema()
index = schema.index("sample-index")
field = index.field("sample-field", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY_HOUR)
client.sync_schema(schema)
client.import_field(field, reader)

client.import_field function imports Set bits by default. If you want to import Clear bits instead, pass clear=True:

client.import_field(field, reader, clear=True)

Pilosa supports a fast way of importing bits for row ID/Column ID data by transferring bits from the client to the server by packing bits into a roaring bitmap. You can enable that by passing fast_import=True:

client.import_field(field, reader, fast_import=True)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

imports.md

imports.md

Importing Data

Files

imports.md

Latest commit

History

imports.md

File metadata and controls

Importing Data