Hi folks,
not sure if this has already been discussed elsewhere; I didn't find anything in my research except the parallel-inference doc. At Wikimedia we are trying to port models and their respective libraries (built years ago without the async concept in mind) to KServe, and so far we have encountered a lot of scalability bottlenecks (and related gotchas).

In our case we try to use predictors and transformers as much as possible, together with Python async libraries like `aiohttp`, `aiokafka`, etc. The usual architecture that we follow is a single model for each predictor instance and, where possible (or where it makes sense), `preprocess` offloaded to a transformer (so a single model for each isvc resource declared). A rough sketch of this setup is below.
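To make the pattern concrete, here is a minimal sketch of such a transformer, assuming KServe's Python `Model`/`ModelServer` API (exact method signatures vary between KServe versions, and the feature-store URL is a made-up placeholder):

```python
import aiohttp
from kserve import Model, ModelServer


class AsyncTransformer(Model):
    """Transformer doing async I/O in preprocess, forwarding to a predictor."""

    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        # The base Model forwards predict() to this host when it is set.
        self.predictor_host = predictor_host
        self.ready = True

    async def preprocess(self, payload, headers=None):
        # I/O-bound enrichment can be awaited, so the event loop keeps
        # serving other requests while we wait on the network.
        async with aiohttp.ClientSession() as session:
            async with session.get("http://feature-store.example/features") as resp:
                features = await resp.json()
        return {"instances": [features]}


if __name__ == "__main__":
    model = AsyncTransformer("my-model", predictor_host="my-model-predictor")
    ModelServer().start([model])
```

This keeps `preprocess` non-blocking only as long as everything in it is truly async; any synchronous CPU-bound call in there stalls the whole event loop, which is exactly the bottleneck described next.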
We hit some bottlenecks when CPU-bound code needed to be executed. We are aware that, due to the GIL, there are some intrinsic Python limitations in running parallel code, so we tried a few workarounds along those lines.
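One classic workaround, sketched here with a hypothetical CPU-heavy `score_text` function, is to hand the CPU-bound part to a `concurrent.futures.ProcessPoolExecutor`: each worker process has its own interpreter and GIL, so the event loop stays responsive while the heavy work runs elsewhere.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def score_text(text: str) -> float:
    # Hypothetical stand-in for CPU-heavy work (tokenization,
    # feature extraction, model scoring, ...).
    return float(sum(ord(c) for c in text) % 100)


# Workers are spawned lazily on first submit.
pool = ProcessPoolExecutor(max_workers=2)


async def handle_request(text: str) -> float:
    loop = asyncio.get_running_loop()
    # The work runs in a separate process with its own GIL, while this
    # process's event loop keeps serving other coroutines.
    return await loop.run_in_executor(pool, score_text, text)


if __name__ == "__main__":
    print(asyncio.run(handle_request("example payload")))
```

The same `run_in_executor` pattern can be used inside an async `preprocess` or `predict` to keep the server responsive.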
We would really be interested to hear about the experience of other teams and companies, since this seems to be a common problem to solve when using KServe.
Thanks in advance!
Replies: 1 comment

Any comments/suggestions? :)