Skip to content
This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Latest commit

 

History

History
57 lines (47 loc) · 1.95 KB

imports.md

File metadata and controls

57 lines (47 loc) · 1.95 KB

Importing Data

If you have large amounts of data, it is more efficient to import it to Pilosa instead of several Set or Clear queries.

pilosa.imports module defines several format functions. Depending on the data, the following format is expected:

  • row_id_column_id: ROW_ID,COLUMN_ID
  • row_id_column_key: ROW_ID,COLUMN_KEY
  • row_key_column_id: ROW_KEY,COLUMN_ID
  • row_key_column_key: ROW_KEY,COLUMN_KEY

Optionally, a timestamp with GMT time zone can be added:

ROW_ID,COLUMN_ID,TIMESTAMP

Note that, each line corresponds to a single bit and the lines end with a new line (\n or \r\n). The target index and field must have been created before hand.

Here's some sample code that uses csv_row_id_column_id formatter along with a timestamp:

import pilosa
from pilosa.imports import csv_column_reader, csv_row_id_column_id
import time

try:
    # python 2.7 and 3
    from io import StringIO
except ImportError:
    # python 2.6 and 2.7
    from StringIO import StringIO

text = u"""
    1,10,2019-11-30T02:00
    5,20,2020-09-29T03:30
    3,41,2017-09-23T03:08
    10,10485760,2018-09-23T03:05
"""
time_func = lambda s: int(time.mktime(time.strptime(s, "%Y-%m-%dT%H:%M")))
reader = csv_column_reader(StringIO(text), timefunc=time_func)
client = pilosa.Client()
schema = client.schema()
index = schema.index("sample-index")
field = index.field("sample-field", time_quantum=pilosa.TimeQuantum.YEAR_MONTH_DAY_HOUR)
client.sync_schema(schema)
client.import_field(field, reader)

client.import_field function imports Set bits by default. If you want to import Clear bits instead, pass clear=True:

client.import_field(field, reader, clear=True)

Pilosa supports a fast way of importing bits for row ID/Column ID data by transferring bits from the client to the server by packing bits into a roaring bitmap. You can enable that by passing fast_import=True:

client.import_field(field, reader, fast_import=True)