Start piker.storage subsys: cross-(ts)db middlewares #486

Open · wants to merge 85 commits into master
Conversation

@goodboy (Contributor) commented Mar 9, 2023

Launch pad for work towards the task list in #485 🏄🏼

As a start this introduces a new piker.storage subsystem to
provide for database related middleware(s), as well as a new storage
backend using polars and apache parquet files to implement
a built-in, local-filesystem managed "time series database":
nativedb (a rough sketch of the idea follows the list below).

After some extensive tinkering and brief performance measures I'm
tempted to go all in on this home grown solution for a variety of
reasons (see details in 27932e4) but re-summarizing some of them
here:

  • wayy faster load times with no "datums to load limit" required.
  • smaller space footprint and we haven't even touched compression
    settings yet!
  • wayyy more compatible with other systems which can lever the apache
    ecosystem.
  • gives us finer grained control over the filesystem usage so we can
    choose to swap out stuff like the replication system or networking
    access.
  • polars already has a multi-db compat layer with multi-engine
    support we can leverage and completely sidestep integration work
    with multiple standard tsdbs?
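
To make the nativedb idea concrete, here's a minimal sketch of the kind of per-fqme parquet IO polars gives us basically for free; the storage path and file naming below are illustrative assumptions, not the final layout:

```python
# minimal sketch of nativedb-style parquet IO via polars; the
# directory and file naming scheme here are assumptions.
from pathlib import Path
import polars as pl

datadir = Path('~/.config/piker/nativedb').expanduser()

def write_ohlcv(fqme: str, df: pl.DataFrame) -> Path:
    # one parquet file per fqme-keyed time-series
    path = datadir / f'{fqme}.ohlcv.parquet'
    df.write_parquet(path)
    return path

def read_ohlcv(fqme: str) -> pl.DataFrame:
    # loads the full history in one shot, no "datums limit" needed
    return pl.read_parquet(datadir / f'{fqme}.ohlcv.parquet')
```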

Core dev discussion

  • we've put some work into marketstore support machinery
    including:

    • docker supervisor subsys, spawning, monitoring (which is still
      super useful going forward FWIW!).
    • anyio-marketstore an async client written and maintained by
      our devs.
    • originally using the rt ingest and re-publish over websocket
      features.

    AND the question is whether we're ok abandoning some of this
    and/or reimplementing it on our own around the new apache data
    file/model ecosystems?
  • we can definitely accomplish ingest, pub-sub and replication
    on our own (without really much effort) with our existing
    subsystems and frameworks.

  • should we drop all the existing marketstore code?

    • it's quite a bit of noise and not going to work anyway given
      the new implementation changes in the .data.history layer.
    • the issues we've reported are not getting resolved and are more
      or less deal breakers, like marketstore: gRPC snappy compression crash? #443
    • likely the new arcticdb is a better long-run solution than mkts
      was anyway, given its large institutional usage..?

ToDo:

@goodboy (Contributor, Author) commented Mar 9, 2023

We can convert this to a draft if necessary if/when #483 lands

@goodboy goodboy added the tsdb (time series db stuff) and factoring (less codez dawg) labels Mar 9, 2023
Base automatically changed from service_subpkg to master March 10, 2023 15:37
goodboy added a commit that referenced this pull request May 17, 2023
Previously we were passing the `fqme: str`, which isn't as extensive,
nor were we able to pass a `MktPair` directly to backend history
manager-loading routines (which should be able to rely on always
receiving it since currently `stream_quotes()` is always called first
for setup).

This also starts a slight bit of configuration oriented tsdb info
loading (via a new `conf.toml`) such that a user can decide to host
their (marketstore) db on a remote host, and our container spawning and
client code will do the right startup automatically based on the config
(a hedged config-reading sketch follows this message).
|-> Related to this I've added some comments about doing storage
backend module loading which should get actually written out as part of
patches coming in #486 (or something related).

Don't allow overruns again in history context since it seems it was
never a problem?
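
For reference, a hedged sketch of the config-driven startup decision described above; the `conf.toml` section and key names are assumptions for illustration:

```python
# read the user's conf.toml and decide whether to spawn a local
# (docker) tsdb container or connect to a remote host; the
# 'storage' section and its keys are hypothetical names.
import tomllib  # stdlib as of py3.11
from pathlib import Path

conf = tomllib.loads(
    Path('~/.config/piker/conf.toml').expanduser().read_text()
)
tsdbconf: dict = conf.get('storage', {})
host: str = tsdbconf.get('host', 'localhost')

# only spawn a container when the db is configured as local
spawn_local: bool = host in ('localhost', '127.0.0.1')
```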
@goodboy goodboy requested review from guilledk, wygud and jaredgoldman and removed request for guilledk June 7, 2023 16:18
@guilledk (Contributor) commented Jun 7, 2023

I'm in favor of doing our own solution and I would rather stop maintaining any marketstore related code; in the end we were almost going to spend as much work maintaining marketstore as just doing our own thing right.

@goodboy (Contributor, Author) commented Jun 8, 2023

> I'm in favor of doing our own solution and I would rather stop maintaining any marketstore related code; in the end we were almost going to spend as much work maintaining marketstore as just doing our own thing right.

yup totally agree!

ok then i'll be putting up some finishing functionality touches, hopefully tests, and then dropping all that junk 🏄🏼

@goodboy (Contributor, Author) commented Jun 14, 2023

To give an idea of what the parquet subdir looks like now: much like how marketstore laid out its own internal per-table binary format files, except using less space and actually being a file type data people can use 😂
(screenshot: the nativedb parquet subdir file tree, 2023-06-14)

@goodboy goodboy mentioned this pull request Jun 21, 2023
The plan is to offer multiple tsdb and other storage backends (for
a variety of use cases) and expose them similarly to how we do for
brokers and data providers B)
To kick off our (tsdb) storage backends this adds our first, implementing
a new `Storage(Protocol)` client interface. Going forward, the top level
`.storage` pkg-module will now expose backend agnostic APIs and helpers
whilst specific backend implementations will adhere to that middleware
layer.

Deats:
- add `.storage.marketstore.Storage` as the first client implementation,
  moving all needed (import) dependencies out from
  `.service.marketstore` as well as `.ohlc_key_map` and `get_client()`.
- move root `conf.toml` loading from `.data.history` into
  `.storage.__init__.open_storage_client()` which now takes in a `name:
  str` and does all the work of loading the correct backend module, its
  config, and determining if a service-instance can be contacted and
  a client loaded; in the case where this fails we raise a new
  `StorageConnectionError`.
- add a new `.storage.get_storagemod()` just like we have for brokers.
- make `open_storage_client()` also return the backend module such that
  the history-data layer can make backend specific calls as needed (eg.
  ohlc_key_map).
- fall back to a basic non-tsdb backfill when `open_storage_client()`
  raises the new connection error (a hedged sketch of this loading
  flow follows below).
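
A hedged sketch of what that middleware loading flow might look like; the backend module contract (`mod.get_client()`) shown here is an assumption:

```python
# backend-agnostic client loading, roughly as described above
from contextlib import asynccontextmanager as acm
from importlib import import_module

class StorageConnectionError(ConnectionError):
    'Can not connect to the configured storage backend service.'

def get_storagemod(name: str):
    # just like we do for brokers: resolve the backend by module name
    return import_module(f'piker.storage.{name}')

@acm
async def open_storage_client(name: str = 'nativedb'):
    mod = get_storagemod(name)
    try:
        async with mod.get_client() as client:
            # also yield the module so the history-data layer can make
            # backend specific calls as needed (eg. ohlc_key_map)
            yield mod, client
    except OSError as err:
        raise StorageConnectionError(
            f'No (tsdb) service reachable for backend {name!r}?'
        ) from err
```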
Turns out you can mix and match `click` with `typer` so this moves what
was the `.data.cli` stuff into `.storage.cli` and uses the integration
api to make it all work B) (see the sketch after this commit's notes)

New subcmd: `piker store`
- add `piker store ls` which lists all fqme-keyed time-series from the backend.
- add `store delete` to remove any such key->time-series.
  - now uses a nursery for multi-timeframe concurrency B)

Mask out all the old `marketstore` specific subcmds for now (streaming,
ingest, storesh, etc..) in anticipation of moving them into
a subpkg-module and make sure to import the sub-cmd module in our top
level cli package.

Other `.storage` api tweaks:
- drop the reraising with custom error (for now).
- rename `Storage` -> `StorageClient` (or should it be API?).
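
For the curious, the `click`/`typer` mixing trick amounts to converting the typer app into a click command and mounting it on the existing group; a hedged sketch with command bodies elided:

```python
import click
import typer

store = typer.Typer(help='tsdb storage mgmt')

@store.command()
def ls() -> None:
    ...  # list all fqme-keyed time-series from the backend

@store.command()
def delete(fqme: str) -> None:
    ...  # remove the key -> time-series

@click.group()
def cli() -> None:
    ...

# the integration api: typer app -> click command, then mount it
# on the top level click group under the `store` subcmd name
cli.add_command(typer.main.get_command(store), 'store')
```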
Since we want to be able to support user-configurable vnc socketaddrs,
this preps for passing the piker client directly into the vnc hacker
routine so that we can (eventually) load and read the ib broker's config
settings into the client and then use those in the `asyncvnc` task
spawner.
Not sure how this lasted so long without complaint (literally since we
added 1m OHLC history it seems; guess it means most backends are pretty
tolerant XD) but we've been sending 2 cancels per order (dialog) due to
the mirrored lines on each chart: 1s and 1m. This fixes that by
reworking the `OrderMode` methods to be a bit more sane and less
conflated with the graphics (lines) layer.

Deatz:
- add new methods:
  - `.oids_from_lines()` for line -> oid extraction,
  - `.cancel_orders()` which makes the order client send cancel requests
    for an `oids: list[str]`.
- re-impl `.cancel_all_orders()` and `.cancel_orders_under_cursor()` to
  use the above methods, thus fixing the original bug B) (rough method
  sketches follow below)
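
A rough sketch of the reworked methods; `line.oid` and the client's cancel call shown here are assumptions for illustration:

```python
class OrderMode:
    def oids_from_lines(self, lines: list) -> list[str]:
        # the 1s and 1m charts mirror the same dialog's line, so
        # de-dupe by oid to avoid the double-cancel bug
        return list({line.oid for line in lines})

    def cancel_orders(self, oids: list[str]) -> None:
        for oid in oids:
            self.client.cancel(uuid=oid)  # hypothetical client api
```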
This was actually incorrect prior: we were rounding triggered limit
orders with the `.size_tick` value's digits when we should have been
using the `.price_tick` (facepalm). So fix that and compute the rounding
number of digits (as passed to the `round(<value>, ndigits=<here>)`
builtin) and store it in the `DarkBook.triggers` tuples so that at
trigger/match time the round call is done *just prior* to msg send to
`brokerd` given the last known live L1 queue price.
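
The ndigits computation itself is a one-liner off the tick's `Decimal` exponent; a quick worked example (tick value assumed):

```python
from decimal import Decimal

price_tick = Decimal('0.25')  # eg. from a MktPair.price_tick
ndigits = -price_tick.as_tuple().exponent  # -> 2

# at trigger/match time, round just prior to the brokerd msg send:
last_l1_price = 4321.127
submit_price = round(last_l1_price, ndigits)  # -> 4321.13
```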
Since crypto backends now also may expand an FQME like `xbteur.kraken`
-> `xbteur.spot.kraken` (by filling in the venue token), we need to use
this identifier when looking up per-market order dialogs or submitting
new requests. The simple fix is to look up that expanded form
from the `Feed.flumes` table which is always keyed by the `MktPair.fqme:
str` - the expanded form.
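
A hedged sketch of that lookup; the matching heuristic here is an assumption:

```python
def find_expanded_fqme(feed, fqme: str) -> str:
    # resolve a possibly short-form fqme ('xbteur.kraken') to its
    # expanded form ('xbteur.spot.kraken') via the flumes table,
    # which is always keyed by the expanded `MktPair.fqme`
    sym, *_, broker = fqme.split('.')
    for full in feed.flumes:
        if full.startswith(sym) and full.endswith(broker):
            return full
    raise KeyError(f'No flume matching {fqme}?')
```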
Since we only ever want to do incremental y-range calcs based on the
price, always skip any tick types emitted by the data daemon which
aren't defined in the fundamental set. Further, toss in a new
`debug_n_trade: bool` toggle which by default turns off all logging and
profiler calls; if you want to do profiling this has to now be adjusted
manually!
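
A minimal sketch of the skip-filter; the "fundamental" clearing tick-type names below are assumptions:

```python
_clears: set[str] = {'trade', 'utrade', 'last'}

def price_ticks(quote: dict) -> list[dict]:
    # only clearing-price ticks should drive incremental y-range calcs
    return [
        tick for tick in quote.get('ticks', [])
        if tick['type'] in _clears
    ]
```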
goodboy added a commit that referenced this pull request Aug 30, 2023
Means commenting out the `data.cli.ingest()` as it will be deleted in
the upcoming #486 anyway.
@goodboy goodboy mentioned this pull request Sep 22, 2023
guilledk pushed a commit that referenced this pull request Jan 18, 2024
A common usage error is to run `piker anal mnq.cme.ib` where the
CLI-passed fqme is not actually fully-qualified (in this case missing an
expiry token) and we get an underlying `FileNotFoundError` from the
`StorageClient.read_ohlcv()` call. On such key misses, scan the existing
`StorageClient._index` for possible matches and report them in a
`raise from` of the new error.

CHERRY into #486
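
A hedged sketch of that key-miss handling; the `_index` scan heuristic is an assumption:

```python
async def read_or_suggest(client, fqme: str):
    try:
        return await client.read_ohlcv(fqme)
    except FileNotFoundError as fnfe:
        # scan for near matches, eg. a key that shares the symbol
        # head token but carries the (missing) expiry
        possibles = [
            key for key in client._index
            if key.startswith(fqme.split('.')[0])
        ]
        raise FileNotFoundError(
            f'No series for {fqme!r}, maybe you meant one of:\n'
            f'{possibles}'
        ) from fnfe
```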
guilledk pushed a commit that referenced this pull request Jan 18, 2024
I guess since I started supporting the whole "allow a gap between
the latest tsdb sample and the latest retrieved history frame" the
overlap slicing has been completely borked XD, where we've been sticking
in duplicate history samples and this has caused all sorts of
downstream time-series processing issues..

So fix that by ensuring whenever there IS an overlap between history in
the latest frame and the tsdb that we always prefer the latest frame's
data and slice OUT the tsdb's duplicate indices..

CHERRY TO #486
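
A sketch of the fix in `numpy` terms (assuming structured arrays with a 'time' field):

```python
import numpy as np

def slice_out_dupes(
    tsdb: np.ndarray,
    frame: np.ndarray,
) -> np.ndarray:
    # always prefer the latest frame's samples: drop any tsdb rows
    # whose timestamps already appear in the frame
    dupes = np.isin(tsdb['time'], frame['time'])
    return tsdb[~dupes]
```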
guilledk pushed a commit that referenced this pull request Jan 18, 2024
Since the `diff: int` serves as a predicate anyway (when `0`, no
duplicates were detected) we might as well just return it directly since
it's likely also useful for the caller when doing deeper anal.

Also, handle the zero-diff case by just returning early with a copy of
the input frame and `diff=0`.

CHERRY INTO #486
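
The described contract in sketch form, using `polars` (the 'time' column name is an assumption):

```python
import polars as pl

def dedupe(frame: pl.DataFrame) -> tuple[pl.DataFrame, int]:
    deduped = frame.unique(subset=['time'], maintain_order=True)
    diff: int = frame.height - deduped.height
    if diff == 0:
        # zero-diff case: return early with a copy of the input
        return frame.clone(), 0
    return deduped, diff
```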
guilledk pushed a commit that referenced this pull request Jan 18, 2024
Using a bunch of fancy `numpy` vec ops (and ideally eventually extending
the same to `polars`), this is a first draft of `get_null_segs()`,
a `col: str` field-value-is-zero detector which filters to all zero-valued
input frame segments and returns the corresponding useful slice-indexes:
- gap absolute (in shm buffer terms) index-endpoints as
  `absi_zsegs` for slicing to each null-segment in the src frame.
- ALL abs indices of rows with zeroed `col` values as `absi_zeros`.
- the full set of the input frame's row-entries (view) which are
  null valued for the chosen `col` as `zero_t`.

Use this new null-segment-detector in the
`.data.history.start_backfill()` task to attempt to fill null gaps that
might be extant from some prior backfill attempt. Since
`get_null_segs()` should now deliver a sequence of slices for each gap
we don't really need the `while gap_indices:` loop any more, so
just move that to the end-of-func and warn-log (for now) if all gaps
aren't eventually filled.

TODO:
-[ ] do the null-seg detection and filling concurrently from
  most-recent-frame backfilling.
-[ ] offer the same detection in `.storage.cli` cmds for manual tsp
  anal.
-[ ] make the graphics layer actually update correctly when null-segs
  are filled (currently still broken somehow in the `Viz` caching
  layer?)

CHERRY INTO #486
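
To illustrate the core vec-op trick, a trimmed-down sketch of the zero-run detection (not the full `get_null_segs()` impl):

```python
import numpy as np

def null_seg_endpoints(
    frame: np.ndarray,
    col: str = 'close',
) -> list[tuple[int, int]]:
    # all row indices where the chosen col is zero-valued
    absi_zeros = np.flatnonzero(frame[col] == 0)
    if not absi_zeros.size:
        return []
    # split wherever consecutive indices are non-adjacent, giving one
    # (start, stop) slice-endpoint pair per null segment
    splits = np.flatnonzero(np.diff(absi_zeros) > 1) + 1
    return [
        (seg[0], seg[-1] + 1)
        for seg in np.split(absi_zeros, splits)
    ]
```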
guilledk pushed a commit that referenced this pull request Jan 18, 2024
Thinking about just moving all of that module (after a content breakup)
to a new `piker.tsp` which will mostly depend on the `.data` and
`.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and
orchestration with real-time (shm) buffers and the graphics layer into
a common spot, both for manual analysis/research work and for better
separation of low level data structure primitives from their higher
level usage.

Add a better `data.history` mod doc string in prep for this move
as well as clean out a bunch of legacy commented cruft from the
`trimeter` and `marketstore` days.

TO CHERRY #486 (if we can)
guilledk pushed a commit that referenced this pull request Jan 18, 2024
Can't ref `dt_eps` and `tsdb_entry` if they don't exist.. like for 1s
sampling from `binance` (which dne). So make sure to add a better logic
guard and only open the final backload nursery if we actually need to
fill the gap between the latest history and where tsdb history ends.

TO CHERRY #486
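
The guard in sketch form (the backload task name is hypothetical):

```python
import trio

async def maybe_backload(dt_eps, tsdb_entry, backload_task) -> None:
    # skip entirely when either endpoint is missing, eg. 1s sampling
    # from binance where no tsdb entry exists
    if not (dt_eps and tsdb_entry):
        return
    async with trio.open_nursery() as tn:
        tn.start_soon(backload_task)
```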