enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules #2105
Conversation
This is looking good, is there a plan to deprecate the original single-module implementation?
Claim: The performance of the pipeline and single_module implementations of the inference module is similar enough in a standard environment (no parallelism, no Triton) to merit switching Ludwig over to the pipeline implementation.

Implications: Switching to the pipeline implementation would provide a myriad of benefits, including but not limited to (1) mixed-backend deployment (libtorch primarily, Python if needed) and (2) the ability to tune resource allocation and scheduling per inference stage (preprocessing vs. prediction vs. postprocessing). Item (2) is particularly important in cases where preprocessing is costly, as it is in the text domain (see the AGNEWS results).

Methodology: We first minimally train a LudwigModel, then save out 4 torchscript artifacts: (1) an end-to-end torchscript module, and (2, 3, 4) preprocessor, predictor, and postprocessor torchscript modules. The first module is loaded back in as a ludwig_model, a vanilla LudwigModel. We make predictions using the …

Performance on the TITANIC dataset

Performance on the AGNEWS dataset
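For concreteness, here is a minimal sketch of the 4-artifact export described in the methodology above, using toy stand-in modules; the class names, tensor shapes, and file paths are illustrative assumptions rather than Ludwig's actual generated modules.

```python
import torch


class Preprocessor(torch.nn.Module):
    """Toy stand-in for the feature preprocessing stage."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Simple normalization in place of real feature preprocessing.
        return (x - x.mean()) / (x.std() + 1e-6)


class Predictor(torch.nn.Module):
    """Toy stand-in for the model forward pass."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


class Postprocessor(torch.nn.Module):
    """Toy stand-in for converting logits into final outputs."""

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        return torch.softmax(logits, dim=-1)


class EndToEnd(torch.nn.Module):
    """Monolithic module, analogous to the single-module InferenceModule."""

    def __init__(self) -> None:
        super().__init__()
        self.pre = Preprocessor()
        self.pred = Predictor()
        self.post = Postprocessor()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.post(self.pred(self.pre(x)))


end_to_end = EndToEnd()
# Artifact (1): the end-to-end torchscript module.
torch.jit.script(end_to_end).save("end_to_end.pt")
# Artifacts (2, 3, 4): one torchscript file per pipeline stage.
torch.jit.script(end_to_end.pre).save("preprocessor.pt")
torch.jit.script(end_to_end.pred).save("predictor.pt")
torch.jit.script(end_to_end.post).save("postprocessor.pt")
```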
In order to create an ensemble of torchscript models, there is a benefit to having each step of the inference pipeline be its own module, so that the steps can be ensembled together to form an inference graph in Triton.
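For illustration, a Triton ensemble that chains the three stage models into one inference graph might be configured roughly like the config.pbtxt below; the model names, tensor names, dtypes, and dims are assumptions for this sketch, not taken from the PR.

```
name: "ludwig_inference"
platform: "ensemble"
input [ { name: "RAW_INPUT", data_type: TYPE_FP32, dims: [ 4 ] } ]
output [ { name: "PROBABILITIES", data_type: TYPE_FP32, dims: [ 2 ] } ]
ensemble_scheduling {
  step [
    {
      # Stage 1: preprocessing.
      model_name: "preprocessor"
      model_version: -1
      input_map { key: "INPUT__0" value: "RAW_INPUT" }
      output_map { key: "OUTPUT__0" value: "features" }
    },
    {
      # Stage 2: prediction.
      model_name: "predictor"
      model_version: -1
      input_map { key: "INPUT__0" value: "features" }
      output_map { key: "OUTPUT__0" value: "logits" }
    },
    {
      # Stage 3: postprocessing.
      model_name: "postprocessor"
      model_version: -1
      input_map { key: "INPUT__0" value: "logits" }
      output_map { key: "OUTPUT__0" value: "PROBABILITIES" }
    }
  ]
}
```

Because each stage is a separate Triton model, each can be given its own instance group and backend, which is what enables tuning resource allocation per stage.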
I have re-written the InferenceModule to be an InferencePipelineModule with the following steps: preprocessor, predictor, and postprocessor. I have compared the performance of this pipeline against the original all-in-one module and identified that it has very similar performance whilst being more composable:
- InferenceModule: 75.9 µs ± 266 ns per loop
- InferencePipelineModule: 78.3 µs ± 255 ns per loop
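A minimal sketch of how such a timing comparison could be reproduced with timeit, assuming the artifact file names from the export sketch above; the batch shape and iteration count are assumptions, not the PR's exact setup.

```python
import timeit

import torch

# Load the end-to-end module and the three stage modules saved earlier.
single = torch.jit.load("end_to_end.pt")
stages = [
    torch.jit.load(path)
    for path in ("preprocessor.pt", "predictor.pt", "postprocessor.pt")
]


def run_pipeline(x: torch.Tensor) -> torch.Tensor:
    # Chain the stage modules, mirroring the InferencePipelineModule.
    for stage in stages:
        x = stage(x)
    return x


batch = torch.randn(1, 4)
with torch.inference_mode():
    for name, fn in (("InferenceModule", single), ("InferencePipelineModule", run_pipeline)):
        per_call = timeit.timeit(lambda: fn(batch), number=10_000) / 10_000
        print(f"{name}: {per_call * 1e6:.1f} µs per loop")
```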