Contributors: @yuce, @asvetlik

Importing Data

If you have large amounts of data, importing it into Pilosa is more efficient than sending many Set or Clear queries.

The pilosa.imports module defines several format functions. Depending on the data, one of the following formats is expected:

  • csv_row_id_column_id: ROW_ID,COLUMN_ID
  • csv_row_id_column_key: ROW_ID,COLUMN_KEY
  • csv_row_key_column_id: ROW_KEY,COLUMN_ID
  • csv_row_key_column_key: ROW_KEY,COLUMN_KEY

Optionally, a timestamp in the GMT time zone can be appended:

ROW_ID,COLUMN_ID,TIMESTAMP
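Because the timestamp is in GMT, converting it to Unix seconds should not go through local time. The following is a minimal sketch using only the standard library, assuming the YYYY-MM-DDTHH:MM layout used in the sample code; gmt_timestamp is a hypothetical helper name, not part of pilosa.imports.

```python
import calendar
import time


def gmt_timestamp(s, fmt="%Y-%m-%dT%H:%M"):
    """Convert a GMT timestamp string to seconds since the Unix epoch.

    Hypothetical helper. calendar.timegm interprets the parsed
    struct_time as UTC, whereas time.mktime would interpret it in the
    machine's local time zone.
    """
    return calendar.timegm(time.strptime(s, fmt))


print(gmt_timestamp("2019-11-30T02:00"))  # 1575079200
```

A function like this can be passed wherever a timestamp-converting callable is expected, so the result does not depend on the time zone of the machine running the import.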

Note that each line corresponds to a single bit and must end with a newline (\n or \r\n). The target index and field must be created beforehand.
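To make the line formats concrete, here is a sketch of a parser that turns CSV text into one tuple per bit. It is a hypothetical helper (parse_bits, row_is_key and column_is_key are illustrative names), not the pilosa.imports implementation; it assumes IDs are integers, keys are plain strings, and the optional third field is a timestamp.

```python
def parse_bits(text, row_is_key=False, column_is_key=False, timefunc=None):
    """Yield (row, column) or (row, column, timestamp) for each line.

    Hypothetical helper illustrating the CSV formats; not part of pilosa.
    """
    # splitlines() accepts both "\n" and "\r\n" line endings.
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) not in (2, 3):
            raise ValueError("expected 2 or 3 fields, got %r" % line)
        # IDs are converted to integers; keys are kept as strings.
        row = fields[0] if row_is_key else int(fields[0])
        column = fields[1] if column_is_key else int(fields[1])
        if len(fields) == 3:
            # Optional timestamp, converted only if a timefunc is given.
            ts = timefunc(fields[2]) if timefunc else fields[2]
            yield (row, column, ts)
        else:
            yield (row, column)


print(list(parse_bits("1,10\r\n5,20\n")))  # [(1, 10), (5, 20)]
print(list(parse_bits("a,b\n", row_is_key=True, column_is_key=True)))  # [('a', 'b')]
```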

Here's some sample code that uses the csv_row_id_column_id formatter along with a timestamp:

import calendar
import time

import pilosa
from pilosa.imports import csv_column_reader, csv_row_id_column_id

try:
    # python 2.7 and 3
    from io import StringIO
except ImportError:
    # python 2.6 and 2.7
    from StringIO import StringIO

# Each line is ROW_ID,COLUMN_ID,TIMESTAMP
text = u"""
    1,10,2019-11-30T02:00
    5,20,2020-09-29T03:30
    3,41,2017-09-23T03:08
    10,10485760,2018-09-23T03:05
"""

# Convert a GMT timestamp to seconds since the Unix epoch.
# calendar.timegm treats the parsed time as UTC; time.mktime would use local time.
time_func = lambda s: calendar.timegm(time.strptime(s, "%Y-%m-%dT%H:%M"))
reader = csv_column_reader(StringIO(text), timefunc=time_func)

client = pilosa.Client()
schema = client.schema()
index = schema.index("sample-index")
# The time quantum lets the field store the timestamps of the imported bits.
field = index.field("sample-field", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY_HOUR)
# Create the index and field on the server if they don't exist yet.
client.sync_schema(schema)
client.import_field(field, reader)

The client.import_field function sets bits by default. To clear bits instead, pass clear=True:

client.import_field(field, reader, clear=True)
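The difference between the two modes can be illustrated with a toy in-memory model in which a field maps row IDs to sets of column IDs. This is only a sketch of the semantics under that assumption (apply_import is a hypothetical name); it says nothing about how Pilosa actually stores or transfers data.

```python
from collections import defaultdict


def apply_import(field, bits, clear=False):
    """Apply (row, column) bits to a toy field model.

    Hypothetical helper: field maps row IDs to sets of column IDs.
    With clear=False every bit is set; with clear=True every bit is cleared.
    """
    for row, column in bits:
        if clear:
            field[row].discard(column)  # clearing an unset bit is a no-op here
        else:
            field[row].add(column)
    return field


field = defaultdict(set)
apply_import(field, [(1, 10), (5, 20), (1, 11)])
apply_import(field, [(1, 10)], clear=True)
print(dict(field))  # {1: {11}, 5: {20}}
```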

For row ID/column ID data, Pilosa supports a faster import path: the client packs the bits into a roaring bitmap and transfers that bitmap to the server. To enable it, pass fast_import=True:

client.import_field(field, reader, fast_import=True)
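The packing idea can be sketched roughly as follows. This assumes Pilosa's default shard width of 2**20 columns and a bit position of row * shard_width + column % shard_width within each shard; the containers are simplified to sorted arrays of 16-bit values with no serialization, so this illustrates the technique rather than the client's actual wire format. pack_bits and SHARD_WIDTH are hypothetical names.

```python
from collections import defaultdict

# Assumption: Pilosa's default shard width (columns per shard).
SHARD_WIDTH = 1 << 20


def pack_bits(bits):
    """Group (row, column) bits by shard and pack them roaring-style.

    Hypothetical helper. Returns {shard: {container_key: sorted low 16-bit values}}.
    Only array containers are modeled; bitmap/run containers are omitted.
    """
    shards = defaultdict(lambda: defaultdict(set))
    for row, column in bits:
        shard = column // SHARD_WIDTH
        # Position of the bit inside the shard (assumed layout).
        pos = row * SHARD_WIDTH + column % SHARD_WIDTH
        # Roaring splits a position into a high container key and a 16-bit low part.
        shards[shard][pos >> 16].add(pos & 0xFFFF)
    return {
        shard: {key: sorted(vals) for key, vals in sorted(containers.items())}
        for shard, containers in shards.items()
    }


packed = pack_bits([(1, 10), (5, 20), (3, 41), (10, 10485760)])
print(packed)  # {0: {16: [10], 48: [41], 80: [20]}, 10: {160: [0]}}
```

Grouping bits by container key means each container holds at most 65,536 positions, which is what lets a roaring bitmap send dense data far more compactly than one CSV line per bit.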