
enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules #2105

Merged
geoffreyangus merged 81 commits into master from enh-inference-pipeline on Jun 27, 2022

Conversation

@brightsparc (Contributor) commented Jun 7, 2022

In order to create ensembles of TorchScript models, there is a benefit to having each step of the inference pipeline be its own module, so that the steps can be composed into an inference graph in Triton.

I have rewritten the InferenceModule as an InferencePipelineModule with the following steps:

from typing import Any, Dict, List, Union

import torch
from torch import nn

# PreprocessModule, PredictModule, and PostprocessModule are defined alongside
# this class in ludwig/models/inference.py; ECD is Ludwig's model class.


class InferencePipelineModule(nn.Module):
    """Wraps preprocessing, model forward pass, and postprocessing modules into a single module.

    The purpose of the module is to be scripted into TorchScript for native serving. The nn.ModuleDict attributes of
    this module use keys generated by feature_utils.get_module_dict_key_from_name in order to prevent name collisions
    with keywords reserved by TorchScript.
    """

    def __init__(self, model: "ECD", config: Dict[str, Any], training_set_metadata: Dict[str, Any]):
        super().__init__()

        # Each stage is scripted independently so it can also be saved and served on its own.
        self.preprocess_module = torch.jit.script(PreprocessModule(config, training_set_metadata))
        self.predict_module = torch.jit.script(PredictModule(model))
        self.postprocess_module = torch.jit.script(PostprocessModule(config, training_set_metadata))

    def forward(self, inputs: Dict[str, Union[List[str], List[torch.Tensor], torch.Tensor]]):
        # Inference only: run the three stages back to back without autograd overhead.
        with torch.no_grad():
            preproc_outputs = self.preprocess_module(inputs)
            predictions = self.predict_module(preproc_outputs)
            postproc_outputs = self.postprocess_module(predictions)
            return postproc_outputs
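
A minimal usage sketch (not from the PR): since each stage is scripted independently in __init__, it can be saved as its own TorchScript artifact and later composed into an inference graph (e.g., a Triton ensemble). The model, config, and training_set_metadata variables and the file names below are illustrative.

import torch

module = InferencePipelineModule(model, config, training_set_metadata)

# Save each stage separately so the stages can be deployed and scaled independently.
torch.jit.save(module.preprocess_module, "preprocessor.pt")
torch.jit.save(module.predict_module, "predictor.pt")
torch.jit.save(module.postprocess_module, "postprocessor.pt")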

I have compared the performance of this pipeline against the original all-in-one module and found that it has very similar performance while being more composable.

  • 75.9 µs ± 266 ns per loop - InferenceModule
  • 78.3 µs ± 255 ns per loop - InferencePipelineModule
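
The "µs ± ns per loop" format suggests these numbers come from an IPython %timeit-style microbenchmark. A hedged sketch of how such a comparison could be reproduced follows; inference_module, pipeline_module, and sample_inputs are assumed to be pre-built and are not part of the PR.

import timeit

for name, mod in [
    ("InferenceModule", inference_module),
    ("InferencePipelineModule", pipeline_module),
]:
    # Average seconds per forward pass over 1000 calls.
    per_loop = timeit.timeit(lambda: mod(sample_inputs), number=1000) / 1000
    print(f"{per_loop * 1e6:.1f} µs per loop - {name}")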

@github-actions bot commented Jun 7, 2022

Unit Test Results

6 files ±0 · 6 suites ±0 · 2h 14m 40s ⏱️ +8m 37s
2,880 tests +12: 2,834 ✔️ passed ±0, 46 💤 skipped +12, 0 failed ±0
8,640 runs +36: 8,498 ✔️ passed ±0, 142 💤 skipped +36, 0 failed ±0

Results for commit b493597. ± Comparison against base commit a587181.

♻️ This comment has been updated with latest results.

@brightsparc (Contributor, Author) commented:

This is looking good. Is there a plan to deprecate _InferenceModuleV0 as part of this release?

Review thread on ludwig/models/inference.py (outdated, resolved)
@geoffreyangus geoffreyangus marked this pull request as ready for review June 27, 2022 16:55
@geoffreyangus geoffreyangus changed the title from "enh: Adding inference pipeline with seperate pre, post and predict into seperate modules." to "enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules" on Jun 27, 2022
@geoffreyangus (Collaborator) commented:

Claim: The performance of the pipeline and single_module implementations of the inference module are similar enough in a standard environment (no parallelism, no Triton) to merit a switch over to the pipeline implementation in Ludwig.

Implications: Switching to the pipeline implementation would provide a number of benefits, including but not limited to (1) mixed-backend deployment (libtorch primarily, Python if needed) and (2) the ability to tune resource allocation/scheduling per inference stage (preprocessing vs. prediction vs. postprocessing). Item (2) is particularly important in cases where preprocessing is costly, as it is in the text domain (see the AGNEWS results); a hypothetical sketch of (2) follows.
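
To make implication (2) concrete, a hypothetical sketch (not from the PR): once the stages are saved as separate TorchScript artifacts, each can be loaded onto its own device or handed to a different backend. The file paths below are illustrative.

import torch

preprocessor = torch.jit.load("preprocessor.pt", map_location="cpu")    # CPU-bound string/tensor prep
predictor = torch.jit.load("predictor.pt", map_location="cuda")         # model forward pass on GPU
postprocessor = torch.jit.load("postprocessor.pt", map_location="cpu")  # decode predictions on CPU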

Methodology: We first minimally train a LudwigModel, then save out four TorchScript artifacts: (1) an end-to-end TorchScript module and (2-4) preprocessor, predictor, and postprocessor TorchScript modules. The first module is loaded back in as a single_module model; the others are loaded as a pipeline model. The implementation of each of these modules can be found in ludwig/models/inference.py in this PR. We warm up each of the TorchScript models by feeding in 100 warmup batches, each of size random.choice([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]). Supplementary experiments demonstrated that the warmup batch size had a non-trivial effect on model throughput, hence the wide range of warmup batch sizes. The test batches are pre-loaded and formatted as needed (pd.DataFrame or Dict[...]). Finally, we start a timer and pass the batches through the model (a condensed sketch follows the definitions below). The complete implementation of the evaluation script can be found in this GitHub Gist.

  • ludwig_model: A vanilla LudwigModel. We make predictions using the predict method.
  • single_module: A torchscripted module that preprocesses, predicts, and postprocesses. We make predictions using the forward method.
  • pipeline: A Python module with torchscripted modules for each of the preprocessing, prediction, and postprocessing steps (3 total). We make predictions using a forward method that passes inputs/outputs between the modules.
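
The timing harness described above, condensed into a sketch; the complete script is in the linked Gist, and benchmark, make_batch, and test_batches are illustrative names, not part of the PR.

import random
import time

WARMUP_BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def benchmark(mod, make_batch, test_batches):
    # 100 warmup batches of randomly chosen size; warmup batch size was
    # observed to have a non-trivial effect on throughput.
    for _ in range(100):
        mod(make_batch(random.choice(WARMUP_BATCH_SIZES)))
    # Test batches are pre-loaded and pre-formatted; only the forward passes are timed.
    start = time.perf_counter()
    for batch in test_batches:
        mod(batch)
    return (time.perf_counter() - start) / len(test_batches)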

Performance on the TITANIC Dataset

Average duration (secs) by device and batch size:

| device | batch_size | pipeline  | single_module |
|--------|-----------:|----------:|--------------:|
| cpu    |          1 | 0.000463  | 0.0002786     |
| cpu    |          2 | 0.0004705 | 0.0003006     |
| cpu    |          4 | 0.00048   | 0.0003271     |
| cpu    |          8 | 0.0004812 | 0.00035       |
| cpu    |         16 | 0.0005716 | 0.0004048     |
| cpu    |         32 | 0.0006246 | 0.0005004     |
| cpu    |         64 | 0.0008573 | 0.0006887     |
| cpu    |        128 | 0.0011964 | 0.0010467     |
| cpu    |        256 | 0.0018457 | 0.0016829     |
| cpu    |        512 | 0.0031429 | 0.0029487     |
| cpu    |       1024 | 0.0055807 | 0.0054132     |
| cpu    |       2048 | 0.010504  | 0.0103147     |
| cuda   |          1 | 0.0006884 | 0.0004686     |
| cuda   |          2 | 0.0007017 | 0.0004942     |
| cuda   |          4 | 0.0007413 | 0.0005514     |
| cuda   |          8 | 0.0007642 | 0.0006064     |
| cuda   |         16 | 0.0009068 | 0.0007139     |
| cuda   |         32 | 0.0010783 | 0.0009261     |
| cuda   |         64 | 0.0015346 | 0.0013382     |
| cuda   |        128 | 0.0023475 | 0.0021445     |
| cuda   |        256 | 0.0039793 | 0.003786      |
| cuda   |        512 | 0.0072207 | 0.0070117     |
| cuda   |       1024 | 0.0137522 | 0.0134748     |
| cuda   |       2048 | 0.0266859 | 0.0264546     |

Performance on the AGNEWS Dataset

Average duration (secs) by device and batch size:

| device | batch_size | pipeline | single_module |
|--------|-----------:|---------:|--------------:|
| cpu    |          1 | 0.002553 | 0.002517      |
| cpu    |          2 | 0.003781 | 0.003653      |
| cpu    |          4 | 0.005564 | 0.005872      |
| cpu    |          8 | 0.010067 | 0.009697      |
| cpu    |         16 | 0.019498 | 0.020136      |
| cpu    |         32 | 0.044092 | 0.042171      |
| cpu    |         64 | 0.100954 | 0.098596      |
| cpu    |        128 | 0.236934 | 0.240161      |
| cpu    |        256 | 0.468968 | 0.46902       |
| cpu    |        512 | 0.926071 | 0.934363      |
| cpu    |       1024 | 1.848435 | 1.842421      |
| cpu    |       2048 | 3.689693 | 3.695318      |
| cuda   |          1 | 0.00121  | 0.001098      |
| cuda   |          2 | 0.001757 | 0.001576      |
| cuda   |          4 | 0.002903 | 0.002861      |
| cuda   |          8 | 0.004742 | 0.004946      |
| cuda   |         16 | 0.009272 | 0.00914       |
| cuda   |         32 | 0.017959 | 0.017366      |
| cuda   |         64 | 0.036728 | 0.036603      |
| cuda   |        128 | 0.075017 | 0.075609      |
| cuda   |        256 | 0.157403 | 0.15643       |
| cuda   |        512 | 0.328297 | 0.324105      |
| cuda   |       1024 | 0.671505 | 0.671076      |
| cuda   |       2048 | 1.369388 | 1.368563      |

@geoffreyangus geoffreyangus merged commit c26e81a into master Jun 27, 2022
@geoffreyangus geoffreyangus deleted the enh-inference-pipeline branch June 27, 2022 22:07