Using the LibraryTool to look at a library's internal state
IvoDD edited this page Oct 16, 2024 · 3 revisions
The LibraryTool is a tool for exploring what ArcticDB stores on disk.
It is very useful when you want/need to:
- get a better understanding of how ArcticDB works under the hood
- debug the state of ArcticDB on disk
The LibraryTool can be used both with ArcticDB and Arcticc.
For the most part it is used in the same way; notes are included wherever the interface differs.
You can use the LibraryTool with any ArcticDB library. You obtain it from the library like so:
from arcticdb import Arctic
from arcticdb.toolbox.library_tool import KeyType
ac = Arctic(...)
lib = ac[...]
lib_tool = lib._nvs.library_tool()
If you are using the old Arcticc bindings:
from arcticcxx.tools import LibraryTool
from arcticc.toolbox.storage import KeyType
lib_tool = LibraryTool(lib._nvs._library)
In [215]: lib_tool.find_keys(KeyType.VERSION_REF)
Out[215]: [r:my_symbol, r:test, r:test2]
In [216]: lib_tool.find_keys(KeyType.SYMBOL_LIST)
Out[216]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
In [220]: lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "test2")
Out[220]: [r:test2]
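The printed keys pack several fields into one string. The sketch below splits such a string into its parts so they can be compared or sorted. It is not part of the LibraryTool; `ParsedKey` and `parse_key` are hypothetical helper names, and the layout `type:symbol:version:0xhash@creation_timestamp[start,end]` is an assumption inferred from the output shown on this page. Short ref keys such as `r:test2` do not follow that layout and are rejected.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed layout of a printed key (inferred from the examples on this page):
#   type:symbol:version:0xhash@creation_timestamp_ns[start,end]
_KEY_RE = re.compile(
    r"(?P<key_type>[^:]+):(?P<symbol>.+):(?P<version>\d+):"
    r"0x(?P<content_hash>[0-9a-fA-F]+)@(?P<creation_ns>\d+)"
    r"\[(?P<start>[^,\]]*),(?P<end>[^\]]*)\]"
)


@dataclass(frozen=True)
class ParsedKey:
    """Hypothetical container for the fields of a printed key."""
    key_type: str
    symbol: str
    version: int
    content_hash: int
    creation_ns: int
    start: str  # kept as text: numeric for data keys, symbol names for symbol list keys
    end: str

    @property
    def created_at(self) -> datetime:
        # Assumes the creation timestamp is nanoseconds since the Unix epoch.
        return datetime.fromtimestamp(self.creation_ns // 1_000_000_000, tz=timezone.utc)


def parse_key(text: str) -> ParsedKey:
    """Split the printed form of a key into its fields, or raise ValueError."""
    match = _KEY_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in the assumed key layout: {text!r}")
    return ParsedKey(
        key_type=match["key_type"],
        symbol=match["symbol"],
        version=int(match["version"]),
        content_hash=int(match["content_hash"], 16),
        creation_ns=int(match["creation_ns"]),
        start=match["start"],
        end=match["end"],
    )


key = parse_key("l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2]")
print(key.key_type, key.symbol, key.version, key.created_at.date())
```

With a real library you would pass `str(k)` for each key returned by `lib_tool.find_keys(...)`, assuming the string form matches what the console prints.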
In [221]: keys = lib_tool.find_keys(KeyType.SYMBOL_LIST)
In [222]: keys
Out[222]:
[l:__add__:0:0xfab0b4706aa5f11b@1692777518476933264[test2,test2],
l:__symbols__:0:0x493c712e5bc5e669@1692777423948255193[0,0]]
In [223]: lib_tool.read_to_dataframe(keys[1])
If you are using arcticc
You will need to recreate the DataFrame from the underlying segments.
You can use this snippet to do so:
from arcticcxx_toolbox.codec import Buffer, decode_segment
from arcticc.version_store._normalization import FrameData
from arcticcxx.version_store import PythonOutputFrame
import pandas as pd

def read_to_df(lib_tool, key):
    # Read the raw segment stored under the key
    segment = lib_tool.read(key).segment
    # Column names come from the segment's stream descriptor
    field_names = [f.name for f in segment.header.stream_descriptor.fields]
    # Decode the segment and convert it into per-column data
    frame_data = FrameData.from_cpp(PythonOutputFrame(decode_segment(segment)))
    cols = {}
    for idx, field_name in enumerate(field_names):
        cols[field_name] = frame_data.data[idx]
    return pd.DataFrame(cols, columns=field_names)
You can use lib_tool.read_to_keys to read a key which contains links to other keys. This can be used to iterate over the version chain and check whether anything about it looks surprising:
>>> # We find the version ref key for the symbol
>>> vref = lib_tool.find_keys_for_symbol(KeyType.VERSION_REF, "sym")[0]
>>> vref
r:sym
>>> # Reading the keys inside the version ref shows a link to the last version (which tombstones v0)
>>> vref_keys = lib_tool.read_to_keys(vref)
>>> vref_keys
[x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001], V:sym:0:0xbd95682775eb0561@1729069015917161585[0,0]]
>>> # Reading the keys inside the last version key in the chain shows the tombstone and the link to the previous version key
>>> version_key = vref_keys[-1]
>>> version_keys = lib_tool.read_to_keys(version_key)
>>> version_keys
[x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001], V:sym:1:0xf4de0df7f4a2664c@1729068970655744750[0,0]]
>>> # Reading the previous version key shows the link to the latest index key
>>> prev_version_key = version_keys[-1]
>>> prev_version_keys = lib_tool.read_to_keys(prev_version_key)
>>> prev_version_keys
[i:sym:1:0x40a6734b5581f255@1729068970646451036[0,172800000000001], V:sym:0:0x6fdab687b265d67b@1729068960386415821[0,0]]
>>> # Reading the index key we can find the data key
>>> index_key = prev_version_keys[0]
>>> index_keys = lib_tool.read_to_keys(index_key)
>>> index_keys
[d:sym:1:0x1c34a96809b98d75@1729068970639555561[0,172800000000001]]
>>> # And we can read the data key
>>> data_key = index_keys[0]
>>> lib_tool.read_to_dataframe(data_key)
            col
index
1970-01-01    1
1970-01-02    2
1970-01-03    3
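The manual walk above can also be written as a loop. The following is a minimal sketch, not part of the LibraryTool: `walk_version_chain` is a hypothetical helper that takes the reading function and a "is this a version key" test as arguments, so it can be tried here against an in-memory stand-in built from the keys printed above. It assumes, as in the session above, that the link to the next version key is the last version key in each list.

```python
from typing import Callable, Iterator, List, Tuple, TypeVar

K = TypeVar("K")


def walk_version_chain(
    start: K,
    read_to_keys: Callable[[K], List[K]],
    is_version_key: Callable[[K], bool],
    max_steps: int = 10_000,
) -> Iterator[Tuple[K, List[K]]]:
    """Yield (key, keys it links to) for the start key and each version key after it."""
    current = start
    seen = set()
    for _ in range(max_steps):
        label = str(current)
        if label in seen:
            raise RuntimeError(f"cycle detected at {label}")
        seen.add(label)
        linked = read_to_keys(current)
        yield current, linked
        # Follow the last version key in the list, as done by hand above.
        version_links = [k for k in linked if is_version_key(k)]
        if not version_links:
            return
        current = version_links[-1]


# In-memory stand-in using the keys printed above. The contents of the oldest
# version key are not shown in the session, so they are left empty here.
chain = {
    "r:sym": [
        "x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001]",
        "V:sym:0:0xbd95682775eb0561@1729069015917161585[0,0]",
    ],
    "V:sym:0:0xbd95682775eb0561@1729069015917161585[0,0]": [
        "x:sym:0:0x599c329f212e9b1d@1729069015917091586[0,172800000000001]",
        "V:sym:1:0xf4de0df7f4a2664c@1729068970655744750[0,0]",
    ],
    "V:sym:1:0xf4de0df7f4a2664c@1729068970655744750[0,0]": [
        "i:sym:1:0x40a6734b5581f255@1729068970646451036[0,172800000000001]",
        "V:sym:0:0x6fdab687b265d67b@1729068960386415821[0,0]",
    ],
}

steps = list(
    walk_version_chain(
        "r:sym",
        read_to_keys=lambda k: chain.get(k, []),
        is_version_key=lambda k: k.startswith("V:"),
    )
)
for key, linked in steps:
    print(key, "->", linked)
```

Against a real library you would pass `lib_tool.read_to_keys` as `read_to_keys` and a test suited to the key objects as `is_version_key`, for example checking that `str(k)` starts with `V:`. That check relies on the printed form shown above and is an assumption rather than a documented interface.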
You can see other examples of lib tool usage inside the tests.