
.service subpkg #483

Merged
merged 38 commits into master from service_subpkg on Mar 10, 2023
Conversation

goodboy (Contributor) commented Mar 8, 2023:

Move all actor-service control APIs and the service manager (basically all the stuff that was in `piker._daemon`) into a new `piker.service` sub-package; this will serve (get it..) as the high-level actor-service orchestration sub-system: the business logic for piker's distributed architecture and runtime deployment.


Of note,

  • includes improvements to .service._ahab which repair a regression: a silent startup failure with marketstore due to the log-processing poll sleep period change..
    • adds proper async log msg proxying without blocking the ahabd supervisor task
    • correctly proxies pikerd logging through to the docker super code in general..
  • moved the _ahab.py docker supervisor into the sub-package as well since it is literally an external, containerized service supervisor 😂
  • moved both the .data.elastix and .marketstore mods here as well until we figure out how to formalize our storage layer subsystems, and since most of that code is related to db interaction/management vs. data processing.
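
For orientation, the resulting layout looks roughly like the following (a sketch inferred from the bullets above and the ToDo below; exact file names are approximate):

```
piker/
  service/
    __init__.py        # actor-service control APIs + service manager
    _ahab.py           # dockerized service supervisor
    _actor_runtime.py  # runtime startup machinery
    elastic.py         # elasticsearch mod (was .data.elastix)
    marketstore.py     # marketstore mod
```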

ToDo:

  • extend the tests for both dbs to have a client connect and do something or other..
  • add cancellation tests for the dockerized services
    => moved these to a follow-up testing task: ahabd supervision tests (#487)
  • consider breaking out the daemon endpoints to a ._daemon.py module in each sub-system package?
  • move Services api into its own module?
  • move the startup machinery into _runtime.py? (** chose .service._actor_runtime.py instead **)
  • move the ._exec.py stuff (which is more or less just Qt runtime startup) into this pkg as well? => also moved to: How to expose/organize <daemon>d entrypoints? (#488)
  • make test harness use the tmp_dir fixture for our config dir path during testing
    • required augmenting the runtime init to passthrough data to tractor._state._runtime_vars

@goodboy added labels on Mar 8, 2023: integration (external stack and/or lib augmentations), (sub-)systems (general sw design and eng), tsdb (time series db stuff)
if delete:
    for fqsn in symbols:
        syms = await storage.client.list_symbols()
        breakpoint()
Contributor commented:

left in a breakpoint unintentionally?

goodboy (Contributor, Author) replied:

ish, was WIP, fixed now (ish) via 6a0ae58

@@ -322,37 +354,94 @@ async def open_ahabd(
     ) = ep_func(client)
     cntr = Container(dcntr)

-    with trio.move_on_after(start_timeout):
-        found = await cntr.process_logs_until(start_lambda)
+    conf: ChainMap[str, Any] = ChainMap(
jaredgoldman (Contributor) commented Mar 9, 2023:

Just curious, why use a ChainMap here instead of a dict?

goodboy (Contributor, Author) commented Mar 9, 2023:

Because it's easier than writing a lot of dict.get(blah, default) stuff and makes everything extensible for if we ever need to offer overrides of container configs for specific use cases..

if you haven't read it yet, check out the docs on this type btw, this is kinda why it was added to stdlib afaiu 🏄🏼
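
For illustration, a minimal sketch of the pattern under discussion — per-container overrides layered over supervisor defaults, with lookups falling through to the first map that has the key (names here are illustrative, not the actual `_ahab` code):

```python
from collections import ChainMap
from typing import Any

# supervisor-side defaults (values match the set listed further below)
defaults: dict[str, Any] = {
    'startup_timeout': 1.0,
    'startup_query_period': 0.001,
    'log_msg_key': 'msg',
}

# an endpoint only supplies the keys it wants to override..
overrides: dict[str, Any] = {'startup_timeout': 10.0}

# ..and lookups fall through: overrides first, then defaults.
conf: ChainMap[str, Any] = ChainMap(overrides, defaults)
assert conf['startup_timeout'] == 10.0  # from overrides
assert conf['log_msg_key'] == 'msg'     # from defaults
```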

@@ -135,7 +135,7 @@ def start_marketstore(

 # create dirs when dne
 if not os.path.isdir(config._config_dir):
-    Path(config._config_dir).mkdir(parents=True, exist_ok=True)
+    Path(config._config_dir).mkdir(parents=True, exist_ok=True)
Contributor commented:
ws?

@@ -659,7 +662,7 @@ async def tsdb_history_update(
# - https://github.com/pikers/piker/issues/98
#
 profiler = Profiler(
-    disabled=False,  # not pg_profile_enabled(),
+    disabled=True,  # not pg_profile_enabled(),
Contributor commented:
is default true on the Profiler purposeful?

jaredgoldman (Contributor) left a review:

lgtm once ci is passing bruddr

goodboy (Contributor, Author) commented Mar 9, 2023:
🙏🏼 to da CI gawdz

goodboy (Contributor, Author) commented Mar 9, 2023:
nothing like hanging for 40 mins

Adds a `piker storage` subcmd with a `-d` flag to wipe a particular
fqsn's time series (both 1s and 60s). Obviously this needs to be
extended much more but provides a start point.

With the addition of new `elasticsearch` docker support in
#464, adjustments were made
to container startup sync logic (particularly the `trio` checkpoint
sleep period - which itself is a hack around a sync client api) which
caused a regression in upstream startup logic wherein container error
logs were not being bubbled up correctly causing a silent failure mode:

- `marketstore` container started with corrupt input config
- `ahabd` super code timed out on startup phase due to a larger log
  polling period, skipped processing startup logs from the container,
  and continued on as though the container was started
- history client failed on grpc connection with no clear error about why
  the connection failed.

Here we revert to the old poll period (1ms) to avoid any more silent
failures and further extend supervisor control through a configuration
override mechanism. To address the underlying design issue, this patch
adds support for container-endpoint-callbacks to override supervisor
startup configuration parameters via the 2nd value in their returned
tuple: the already delivered configuration `dict` value.

The current exposed values include:
    {
        'startup_timeout': 1.0,
        'startup_query_period': 0.001,
        'log_msg_key': 'msg',
    },

This allows for container-specific control over the startup-sync query
period (the hack mentioned above) as well as the expected log msg key
and of course the startup timeout.
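
As a sketch of how an endpoint callback might use this mechanism (the function name and docker calls here are assumptions for illustration, not the exact shipped API):

```python
import docker

def start_elasticsearch_ep(client: docker.DockerClient):
    # spawn the container via the (sync) docker-py client..
    dcntr = client.containers.run(
        'docker.elastic.co/elasticsearch/elasticsearch:7.17.4',
        detach=True,
    )
    # ..and hand back per-container overrides as the 2nd tuple value;
    # the supervisor ChainMaps these over its defaults.
    return dcntr, {
        'startup_timeout': 240.0,     # ES can take a while to come up
        'startup_query_period': 0.1,
        'log_msg_key': 'message',     # ES json logs use a 'message' key
    }
```
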
Previously we would make the `ahabd` supervisor-actor sync to docker
container startup using pseudo-blocking log message processing.

This has issues,
- we're forced to do a hacky "yield back to `trio`" in order to be
  "fake async" when reading the log stream and further,
- blocking on a message is fragile and often slow.

Instead, run the log processor in a background task and in the parent
task poll for the container to be in the client list using a similar
pseudo-async poll pattern. This allows the super to `Context.started()`
sooner (when the container is actually registered as "up") and thus
unblock its (remote) caller faster whilst still doing full log msg
proxying!

Deatz:
- adds `Container.cuid: str` a unique container id for logging.
- correctly proxy through the `loglevel: str` from `pikerd` caller task.
- shield around `Container.cancel()` in the teardown block and use
  cancel level logging in that method.
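
A minimal `trio` sketch of that pattern (assumed names, not the actual `_ahab` code): the log proxy runs as a background task while the parent pseudo-async-polls the sync docker client for the container to register as running:

```python
import logging

import trio

log = logging.getLogger('ahabd-sketch')


async def proxy_logs(cntr) -> None:
    # "fake async": the docker-py log stream is sync, so yield back
    # to trio between reads instead of blocking the event loop.
    for entry in cntr.logs(stream=True):
        log.info(entry.decode())
        await trio.sleep(0)


async def supervise(client, cntr, ctx) -> None:
    async with trio.open_nursery() as tn:
        # full log msg proxying runs in the background..
        tn.start_soon(proxy_logs, cntr)

        # ..while the parent task polls the client list until the
        # container shows up as running.
        while not any(c.id == cntr.id for c in client.containers.list()):
            await trio.sleep(0.001)

        # unblock the (remote) caller as soon as the container is up
        await ctx.started(cntr.id)
```
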
For now just moves everything that was in `piker._daemon` to a subpkg
module but a reorg is coming pronto!

Not really sure there's much we can do besides dump Grpc stuff when we
detect an "error" `str` for the moment..

Either way leave a buncha complaints (as always) and do linting
fixups..

Thanks @esme! XD

Also, do a linter pass and remove a buncha unused references.

Due to making ahabd supervisor init more async we need to be more
tolerant to mkts server startup: the grpc machinery needs to be up
otherwise a client which connects too early may just hang on requests..

Add a reconnect loop (which might end up getting factored into client
code too) so that we only block on requests once we know the client
connection is actually responsive.
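
Such a reconnect loop might look roughly like this (a sketch assuming a storage client exposing the `list_symbols()` rpc seen earlier; not the exact shipped code):

```python
import trio


async def wait_till_responsive(
    storage_client,
    timeout: float = 30.0,
    period: float = 0.1,
) -> None:
    # only let real requests through once a cheap rpc round-trips,
    # proving the grpc machinery is actually up and answering.
    with trio.fail_after(timeout):
        while True:
            try:
                await storage_client.list_symbols()
                return
            except Exception:
                # conn not ready yet; back off and retry
                await trio.sleep(period)
```
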
Provides a more correct solution (particularly for distributed testing)
to override the `piker` configuration directory by reading the path from
a specific `tractor._state._runtime_vars` entry that can be provided by
the test harness.

Also fix some typing and comments.
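
Conceptually the read side looks something like this (the entry name below is an assumption for illustration; the real key is whatever the test harness injects):

```python
import tractor


def get_config_dir(default: str) -> str:
    # `_runtime_vars` is a plain dict propagated to every actor in the
    # tree; fall back to the normal config dir when no override is set.
    rvs: dict = tractor._state._runtime_vars
    return rvs.get('_config_dir_override', default)
```
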
Needed to move the startup sequence inside the `try:` block to guarantee
we always do the (now shielded) `.cancel()` call if we get a cancel
during startup.

Also, support an optional `started_afunc` field in the config if
backends want to just provide a one-off blocking async func to sync
container startup. Add a `drop_root_perms: bool` to allow persisting
sudo perms for testing or dynamic container spawning purposes.
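
The teardown guarantee described here boils down to this shape (a sketch; only the async `Container.cancel()` comes from the commit above, the rest of the interface is assumed):

```python
import trio


async def supervise(cntr) -> None:
    try:
        # startup now lives inside the `try:` so a cancel that arrives
        # mid-startup still runs the (shielded) teardown below.
        await cntr.astart()  # assumed async start helper
        await trio.sleep_forever()
    finally:
        # shield the teardown so it can't itself be interrupted by the
        # in-flight cancellation.
        with trio.CancelScope(shield=True):
            await cntr.cancel()
```
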
goodboy (Contributor, Author) commented Mar 10, 2023:

THERE. moved all outstanding bullets to new follow up issues 🏄🏼

So whoever decides to review this once more and decides we gud, please merge.
If not i'll land later tonight 💥

goodboy (Contributor, Author) commented Mar 10, 2023:
Heh, i can tell y'all aren't testing non-disti mode.. 😂

This broke non-disti-mode actor tree spawn / runtime, seemingly because
the cli entrypoint for a `piker chart` also sends these values down
through the call stack independently? Pretty sure we don't need to send
the `enable_modules` from the chart actor anyway.
@goodboy goodboy merged commit eb51033 into master Mar 10, 2023
@goodboy goodboy deleted the service_subpkg branch March 10, 2023 15:37