async but refactored #1752

technillogue · 2024-06-19T09:02:20Z

this is like #1512, but I squashed and reordered some commits and then rebased onto main

conceptually we could split it up like:

the commits up to Mux prediction events (async runner, async predict, minimal async worker). this is uncertain because much but not all of it is fully dropped later
httpx
AsyncConnection, optimizations, tweak names
omnibus actual concurrency
remaining async features (emit_metric, time_share_metric, log traceback, batch_size metric)
however, I don't think we can actually release anything before "actual concurrency", the intermediate commits would likely be broken
we could alternatively write new code starting with the httpx refactor like you said, or otherwise carve out changes from that branch as new PRs (rather than rebasing the existing commits)

run CI for this branch the same way as for main
async runner (#1352)
support async predict functions (#1350)
AsyncConcatenateIterator
create event loop before predictor setup (#1366)
minimal async worker (async worker event pipe #1410)
Mux prediction events (#1405)
replace requests with httpx and factor out clients (#1574, Fix upload logging #1707, fix upload redirect handling #1714)
implement mp.Connection with async streams (#1640)
optimize webhook serialization and logging (#1651)
tweak names and style
omnibus actual concurrency and major refactor (#1530)
function to emit metrics (#1649)
predict_time_share metric (#1643, predict_time_share needs to be set before sending the completed webhook #1683)
Backport Secret type to async branch (#1706)
log traceback properly (#1734)

Signed-off-by: technillogue <technillogue@gmail.com>

* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server (reverts commit 828eee9)

Signed-off-by: technillogue <technillogue@gmail.com>

this is the first step towards supporting continuous batching and concurrent predictions. eventually, you will be able to configure it so your predict function is called concurrently

* bare minimum to support async predict
* add async tests

Signed-off-by: technillogue <technillogue@gmail.com>

Signed-off-by: technillogue <technillogue@gmail.com>

* conditionally create the event loop if predictor is async, and add a path for hypothetical async setup
* don't use async for predict loop if predict is not async
* add test cases for shared loop across setup and predict + asyncio.run in setup (reverts commit b533c6b)
* lints

Signed-off-by: technillogue <technillogue@gmail.com>
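A minimal sketch of the "only create a loop when predict is async" idea from the bullets above. The helper names (`is_async_predictor`, `maybe_create_loop`) and the two toy predictor classes are hypothetical and are not taken from the cog codebase; only the `inspect` and `asyncio` calls are standard library.

```python
import asyncio
import inspect
from typing import Any, Optional


def is_async_predictor(predictor: Any) -> bool:
    """True if predict is a coroutine function or an async generator function."""
    predict = getattr(predictor, "predict", None)
    return inspect.iscoroutinefunction(predict) or inspect.isasyncgenfunction(predict)


def maybe_create_loop(predictor: Any) -> Optional[asyncio.AbstractEventLoop]:
    """Hypothetical helper: create an event loop only for async predictors."""
    if is_async_predictor(predictor):
        return asyncio.new_event_loop()
    return None  # sync predictors keep using the plain, non-async predict loop


class SyncPredictor:
    def predict(self, x: int) -> int:
        return x + 1


class AsyncPredictor:
    async def predict(self, x: int) -> int:
        return x + 1
```

With this shape, a sync predictor never pays for an event loop, and the same loop object could be shared between setup and predict for an async one.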

* async Worker._wait and its consequences
* AsyncPipe so that we can process idempotent endpoint and cancellation rather than _wait blocking the event loop
* test_prediction_cancel can be flaky on some machines
* separate _process_list to be less surprising than isasyncgen
* sleep wasn't needed
* suggestions from review
* suggestions from review
* even more suggestions from review

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Nick Stenning <nick@whiteink.com>

* format

Signed-off-by: technillogue <technillogue@gmail.com>

* race utility for racing awaitables
* start mux, tag events with id, read pipe in a task, get events from mux
* use async pipe for async child loop
* _shutting_down vs _terminating
* race with shutdown event
* keep reading events during shutdown, but call terminate after the last Done
* emit heartbeats from mux.read
* don't use _wait. instead, setup reads event from the mux too
* worker semaphore and prediction ctx
* where _wait used to raise a fatal error, have _read_events set an error on Mux, and then Mux.read can raise the error in the right context. otherwise, the exception is stuck in a task and doesn't propagate correctly
* fix event loop errors for <3.9
* keep track of predictions in flight explicitly and use that to route logs
* don't wait for executor shutdown
* progress: check for cancelation in task done_handler
* let mux check if child is alive and set mux shutdown after leaving read event loop
* close pipe when exiting
* predict requires IDLE or PROCESSING
* try adding a BUSY state distinct from PROCESSING when we no longer have capacity
* move resetting events to setup() instead of _read_events()
  previously this was in _read_events because it's a coroutine that will have the correct event loop. however, _read_events actually gets created in a task, which can run *after* the first mux.read call by setup. since setup is now the first async entrypoint in worker and in tests, we can safely move it there
* state_from_predictions_in_flight instead of checking the value of semaphore
* make prediction_ctx "private"

Signed-off-by: technillogue <technillogue@gmail.com>
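For readers unfamiliar with the "race utility" and "race with shutdown event" bullets, here is one way such a helper could look. This is a sketch built on `asyncio.wait`, not the implementation from this branch; `race`, `demo`, and the two inner coroutines are illustrative names.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def race(*aws: Awaitable[T]) -> T:
    """Return the result of whichever awaitable finishes first; cancel the rest."""
    tasks = [asyncio.ensure_future(aw) for aw in aws]
    try:
        done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # .result() re-raises if the winning task failed
        return next(iter(done)).result()
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()


async def demo() -> str:
    shutdown = asyncio.Event()

    async def read_event() -> str:
        await asyncio.sleep(10)  # stands in for a slow pipe read
        return "event"

    async def wait_for_shutdown() -> str:
        await shutdown.wait()
        return "shutdown"

    # trigger shutdown shortly after the race starts
    asyncio.get_running_loop().call_later(0.01, shutdown.set)
    return await race(read_event(), wait_for_shutdown())
```

Racing the read against a shutdown event means the reader does not stay blocked on the pipe once shutdown has been requested.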

* input downloads, output uploads, and webhooks are now handled by ClientManager, which persists for the lifetime of runner, allowing us to reuse connections, which may significantly help with large uploads.
* although I was originally going to drop output_file_prefix, it's not actually hard to maintain. the behavior is changed now and objects are uploaded as soon as they're outputted rather than after the prediction is completed.
* there's an ugly hack with uploading an empty body to get the redirect instead of making api time out from trying to upload a 140GB file. that can be fixed by implementing an MPU endpoint and/or a "fetch upload url" endpoint.
* the behavior of the non-idempotent endpoint is changed; the id is now randomly generated if it's not provided in the body. this isn't strictly required for this change alone, but is hard to carve out.
* the behavior of Path is changed significantly. see https://www.notion.so/replicate/Cog-Setup-Path-Problem-2fc41d40bcaf47579ccd8b2f4c71ee24

Co-authored-by: Mattt <mattt@replicate.com>

* format
* stick a %s on line 190 clients.py (#1707)
* local upload server can be called cluster.local in addition to .internal (#1714)

Signed-off-by: technillogue <technillogue@gmail.com>

* wip
* some tweaks
* ignore some type errors
* test connection roundtrip
* add notes from python source code

Signed-off-by: technillogue <technillogue@gmail.com>

* optimize webhook serialization and logging
* optimize logging by binding structlog proxies
* fix tests

---------

Signed-off-by: technillogue <technillogue@gmail.com>

Signed-off-by: technillogue <technillogue@gmail.com>

* add concurrency to config
* more descriptive names for predict functions
* don't cancel from signal handler if a loop is running. expose worker busy state to runner
* move handle_event_stream to PredictionEventHandler
* make setup and canceling work
* keep track of multiple runner prediction tasks to make idempotent endpoint return the same result and fix tests somewhat
* drop Runner._result, comments
* move create_event_handler into PredictionEventHandler.__init__
* break out Path.validate into value_to_path and inline get_filename and File.validate
* split out URLPath into BackwardsCompatibleDataURLTempFilePath and URLThatCanBeConvertedToPath with the download part of URLFile inlined
* let's make DataURLTempFilePath also use convert and move value_to_path back to Path.validate
* prediction->request
* split up predict/inner/prediction_ctx into enter_predict/exit_predict/prediction_ctx/inner_async_predict/predict/good_predict as one way to do it. however, exposing all of those for runner predict enter/coro exit still sucks, but this is still an improvement
* bigish change: inline predict_and_handle_errors
* inline make_error_handler into setup
* move runner.setup into runner.Runner.setup
* add concurrency to config in go
* try explicitly using prediction_ctx __enter__ and __exit__
* relax setup argument requirement to str
* glom worker into runner
* add logging message
* fix prediction retry and improve logging
* split out handle_event
* use CURL_CA_BUNDLE for file upload
* dubious upload fix
* skip worker and webhook tests since those were erroring on removed imports. fix or xfail runner tests
* validate prediction response to raise errors, but return the unvalidated output to avoid converting urls to File/Path
* expose concurrency in healthcheck
* mediocre logging that works like print
* COG_DISABLE_CANCEL to ignore cancelations
* COG_CONCURRENCY_OVERRIDE
* add ready probe as an http route
* encode webhooks only after knowing they will be sent, and bail out of upload type checks early for strs
* don't validate outputs
* add AsyncConcatenateIterator
* should_exit is not actually used by http
* format
* codecov
* describe the remaining problems with this PR and add comments about cancelation and validation
* add a test
* fix test (#1669)
* fix config schema
* allow setting both max and target concurrency in cog.yaml (#1672)
* drop default_target (#1685)

---------

Signed-off-by: technillogue <technillogue@gmail.com>
Co-authored-by: Mattt <mattt@replicate.com>

* function to emit metrics
* add metrics docs

---------

Signed-off-by: technillogue <technillogue@gmail.com>

predict_time_share tracks the portion of the worker's processing time that was dedicated to each individual prediction. the "cost" of each second is split across the predictions running during that second.

Co-authored-by: Zeke Sikelianos <zeke@sikelianos.com>

predict_time_share needs to be set before sending the completed webhook (#1683)

allow disabling time share metric with COG_DISABLE_TIME_SHARE_METRIC

Signed-off-by: technillogue <technillogue@gmail.com>
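To make the splitting rule concrete, here is a small worked sketch. It is not the tracker from this branch: `time_shares` is a hypothetical function that takes finished predictions as `(start, end)` timestamps and divides each slice of wall-clock time evenly among the predictions active in that slice.

```python
from typing import Dict, Tuple


def time_shares(intervals: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Split each slice of wall-clock time evenly across the predictions active in it."""
    # every start or end time is a boundary where the set of active predictions may change
    boundaries = sorted({t for span in intervals.values() for t in span})
    shares = {pid: 0.0 for pid in intervals}
    for left, right in zip(boundaries, boundaries[1:]):
        active = [
            pid
            for pid, (start, end) in intervals.items()
            if start <= left and right <= end
        ]
        for pid in active:
            shares[pid] += (right - left) / len(active)
    return shares
```

For example, if prediction `a` runs from 0s to 4s and `b` runs from 2s to 4s, `a` is alone for the first two seconds and shares the last two, so `a` gets 3s and `b` gets 1s. The shares add up to the 4s the worker was busy.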

* log traceback correctly
* use repr(exception) instead of str(exception) if str(exception) is blank

---------

Signed-off-by: technillogue <technillogue@gmail.com>
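The `repr` fallback matters because some exceptions stringify to an empty message. A minimal sketch of the idea follows; `format_failure` is a hypothetical name, and the actual logging call used in the branch is not shown here.

```python
import traceback
from typing import Tuple


def format_failure(exc: BaseException) -> Tuple[str, str]:
    """Return (message, traceback text) for an exception."""
    # str(RuntimeError()) is "", which makes for a useless error message,
    # so fall back to repr(), which at least names the exception type
    message = str(exc) or repr(exc)
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return message, tb
```

Calling it inside an `except` block gives a non-empty message plus the full traceback text to log.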

Signed-off-by: technillogue <technillogue@gmail.com>

ignore ruff lints added in the new ruff version

Fix broken `make go-test` command

This was due to conflicts in the dependencies

=== Errors
Error: ../../../go/pkg/mod/github.com/anaskhan96/soup@v1.2.5/soup.go:20:2: missing go.sum entry for module providing package golang.org/x/net/html (imported by github.com/anaskhan96/soup); to add: go get github.com/anaskhan96/soup@v1.2.5
Error: ../../../go/pkg/mod/github.com/anaskhan96/soup@v1.2.5/soup.go:21:2: missing go.sum entry for module providing package golang.org/x/net/html/charset (imported by github.com/anaskhan96/soup); to add: go get github.com/anaskhan96/soup@v1.2.5

Running `go mod tidy` fixes the issue and this commit contains the updated go.mod and go.sum files.

Add fixes for ruff issues (#1799)

We have a problem in production where a broken model is not correctly shutting down when requested, which means that director comes back up, sees a healthy model (status READY/BUSY) and starts sending it new predictions, even though it's supposed to be shutting down. For now, try and improve the situation by poisoning the model healthcheck on shutdown. This doesn't solve the underlying problem but it should stop us losing more predictions to a known-broken pod.

Based on the implementation in #1698 for sync cog.

If the request to /predict contains the headers `traceparent` and `tracestate` defined by the W3C Trace Context[^1], then these headers are forwarded on to the webhook and upload calls. This allows observability systems to link requests passing through cog.

[^1]: https://www.w3.org/TR/trace-context/

Signed-off-by: technillogue <technillogue@gmail.com>
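A rough sketch of the forwarding step, assuming the incoming request headers are available as a plain mapping. `extract_trace_context` and `TRACE_HEADERS` are illustrative names rather than the ones used in #1698 or this branch; the sample `traceparent` value is the example from the W3C spec.

```python
from typing import Dict, Mapping

# header names defined by W3C Trace Context
TRACE_HEADERS = ("traceparent", "tracestate")


def extract_trace_context(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Pick out the trace headers (case-insensitively) so they can be forwarded."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


incoming = {
    "Content-Type": "application/json",
    "Traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
}
# these would be merged into the headers of the outgoing webhook and upload requests
forwarded = extract_trace_context(incoming)
```

Only the two trace headers are copied, so unrelated request headers do not leak into webhook or upload calls.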

* Cast TraceContext into Mapping[str, str] to fix linter
* Include prediction id in upload request

Based on #1667

This PR introduces two small changes to the file upload interface.

1. We now allow downstream services to include the destination of the asset in a `Location` header, rather than assuming that it's the same as the final upload url (either the one passed via `--upload-url` or the result of a 307 redirect response).
2. We now include the `X-Prediction-Id` header in the upload request; this allows the downstream client to potentially do configuration/routing based on the prediction ID. This ID should be considered unsafe and needs to be validated by the downstream service.

* Extract ChunkFileReader into top-level class

---------

Co-authored-by: Mattt Zmuda <mattt@replicate.com>
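The two upload changes can be sketched as a pair of small helpers. Both function names are hypothetical, the `Content-Type` value is an assumption, and the sketch does not try to resolve a relative `Location` value; it only shows the fallback order described above.

```python
from typing import Dict, Mapping, Optional


def upload_headers(prediction_id: Optional[str]) -> Dict[str, str]:
    """Headers for the upload request; the content type here is an assumption."""
    headers = {"Content-Type": "application/octet-stream"}
    if prediction_id is not None:
        # the receiving service must treat this value as untrusted
        headers["X-Prediction-Id"] = prediction_id
    return headers


def resolve_asset_url(final_upload_url: str, response_headers: Mapping[str, str]) -> str:
    """Prefer the Location header from the upload response, else the final upload url."""
    for name, value in response_headers.items():
        if name.lower() == "location" and value:
            return value
    return final_upload_url
```

If the downstream service returns no `Location` header, the behavior is the same as before: the asset is assumed to live at the url the file was uploaded to.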

mattt · 2024-07-31T12:52:11Z

Closing in favor of #1813

technillogue force-pushed the syl/squash-rebase-async branch 5 times, most recently from f7c789b to cf0665b Compare June 20, 2024 20:19

technillogue requested a review from mattt July 1, 2024 05:48

technillogue force-pushed the syl/squash-rebase-async branch from ba7808d to d3065dc Compare July 3, 2024 19:59

technillogue added 16 commits July 18, 2024 14:41

run CI for this branch the same way as for main

3277023

Signed-off-by: technillogue <technillogue@gmail.com>

async runner (#1352)

a76a012

* have runner return asyncio.Task instead of AsyncFuture
* make tests async and fix them
* delete remaining runner thread code :)
* review changes to tests and server (reverts commit 828eee9)

Signed-off-by: technillogue <technillogue@gmail.com>

AsyncConcatenateIterator

7e63a48

Signed-off-by: technillogue <technillogue@gmail.com>

implement mp.Connection with async streams (#1640)

0b0e56f

* wip
* some tweaks
* ignore some type errors
* test connection roundtrip
* add notes from python source code

Signed-off-by: technillogue <technillogue@gmail.com>

optimize webhook serialization and logging (#1651)

57239ed

* optimize webhook serialization and logging
* optimize logging by binding structlog proxies
* fix tests

---------

Signed-off-by: technillogue <technillogue@gmail.com>

tweak names and style

2d25fe0

Signed-off-by: technillogue <technillogue@gmail.com>

function to emit metrics (#1649)

610e920

* function to emit metrics
* add metrics docs

---------

Signed-off-by: technillogue <technillogue@gmail.com>

log traceback properly (#1734)

5c8bcf6

* log traceback correctly
* use repr(exception) instead of str(exception) if str(exception) is blank

---------

Signed-off-by: technillogue <technillogue@gmail.com>

add batch size metric (#1750)

ce48cd3

Signed-off-by: technillogue <technillogue@gmail.com>

technillogue force-pushed the syl/squash-rebase-async branch from d3065dc to 4502b80 Compare July 18, 2024 19:05

technillogue and others added 4 commits July 18, 2024 15:05

technillogue force-pushed the syl/squash-rebase-async branch from 4502b80 to 2d37fa4 Compare July 18, 2024 19:06

mattt mentioned this pull request Jul 18, 2024

Add support for async predictors #1813

Open

mattt closed this Jul 31, 2024
