# Basics on Python-Blosc2

Python-Blosc2 is a thin wrapper for the C-Blosc2 format and compression library.  It lets you easily and quickly create, append, insert, update and delete data and metadata in a super-chunk container (the `SChunk` class).

In [1]:
import blosc2
import numpy as np

## Create a new SChunk instance

Let's configure the parameters that are different from defaults:

In [2]:
cparams = {
    "codec": blosc2.Codec.BLOSCLZ,
    "typesize": 4,
    "nthreads": 8,
}

dparams = {
    "nthreads": 16,
}

storage = {
    "contiguous": True,
    "urlpath": "myfile.b2frame",
    "cparams": cparams,
    "dparams": dparams,
}

Let's also remove any previously serialized super-chunk (frame) that may exist at that path:

In [3]:
blosc2.remove_urlpath("myfile.b2frame")

Now we are ready to create an SChunk!

In [4]:
schunk = blosc2.SChunk(chunksize=10_000_000, **storage)
schunk

<blosc2.SChunk.SChunk at 0x7f71c83a67c0>

Great! So you have created your first super-chunk with your desired compression codec and typesize, and it is going to be persistent on disk.

## Append and read data

We are going to add some data.  First, let's create the dataset: 100 arrays of 2,500,000 `int32` items, that is, 10 MB each and 1 GB in total:

In [5]:
buffer = [i * np.arange(2_500_000, dtype="int32") for i in range(100)]

In [6]:
%%time
for i in range(100):
    nchunks = schunk.append_data(buffer[i])
    assert nchunks == (i + 1)

CPU times: user 355 ms, sys: 0 ns, total: 355 ms
Wall time: 69.6 ms


In [7]:
!ls -lh myfile.b2frame

-rw-rw-r-- 1 faltet2 faltet2 11M jun 28 19:03 myfile.b2frame


So, although we have appended 100 chunks of 10 MB each (1 GB in total), the frame on disk takes only a little over 10 MB.  This is how compression helps you use fewer resources.
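The sizes involved can be checked with a little arithmetic. This is a rough sketch in plain Python: the on-disk size is taken from the rounded `11M` figure printed by `ls -lh` and interpreted as about 11 MiB, so the resulting ratio is only an approximation.

```python
# Values taken from the cells above
ITEMS_PER_CHUNK = 2_500_000
TYPESIZE = 4  # bytes per int32 item
NCHUNKS = 100

chunk_nbytes = ITEMS_PER_CHUNK * TYPESIZE  # uncompressed size of one chunk
total_nbytes = chunk_nbytes * NCHUNKS      # uncompressed size of the dataset

# Assumption: `ls -lh` rounds, so 11M is treated as roughly 11 * 2**20 bytes
frame_nbytes_approx = 11 * 2**20
approx_ratio = total_nbytes / frame_nbytes_approx

print(f"chunk: {chunk_nbytes / 1e6:.0f} MB, total: {total_nbytes / 1e9:.0f} GB")
print(f"approximate compression ratio: {approx_ratio:.0f}x")
```

The data in this example is very regular, so a ratio of this magnitude should not be expected for arbitrary datasets.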

Now, let's read the chunks from disk:

In [8]:
dest = np.empty(2_500_000, dtype="int32")

In [9]:
%%time
for i in range(100):
    chunk = schunk.decompress_chunk(i, dest)

CPU times: user 65.9 ms, sys: 82 ms, total: 148 ms
Wall time: 39.5 ms


In [10]:
check = 99 * np.arange(2_500_000, dtype="int32")
np.testing.assert_equal(dest, check)

## Updating and inserting

First, let's update the first chunk:

In [11]:
data_up = np.arange(2_500_000, dtype='int32')
chunk = blosc2.compress2(data_up)

In [12]:
%%time
schunk.update_chunk(nchunk=0, chunk=chunk)

CPU times: user 258 µs, sys: 348 µs, total: 606 µs
Wall time: 351 µs


100

And then, insert another one at position 4:

In [13]:
%%time
schunk.insert_chunk(nchunk=4, chunk=chunk)

CPU times: user 116 µs, sys: 158 µs, total: 274 µs
Wall time: 173 µs


101

In both cases the return value is the number of chunks in the super-chunk: it is still 100 after the update, and 101 after the insertion.

## Add user meta info

The user can also add meta-information via the `vlmeta` accessor.  `vlmeta` stands for "variable length metadata", and, as the name suggests, it is meant to store general, variable length data (incidentally, this is more flexible than what you can store as regular data, whose items always share the same `typesize`).

`vlmeta` follows the dictionary interface, so adding info is as easy as:

In [14]:
schunk.vlmeta['info1'] = 'This is an example'
schunk.vlmeta['info2'] = 'of user meta handling'
schunk.vlmeta.getall()

{'info1': 'This is an example', 'info2': 'of user meta handling'}

You can also delete an entry as you would do with a dictionary:

In [15]:
del schunk.vlmeta['info1']
schunk.vlmeta.getall()

{'info2': 'of user meta handling'}

That's all for now.  There are more examples in the examples directory for you to explore.  Enjoy!