
enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules #2105

Merged
geoffreyangus merged 81 commits into master from enh-inference-pipeline on Jun 27, 2022

Conversation

@brightsparc (Contributor) commented Jun 7, 2022

In order to create ensembles of TorchScript models, there is a benefit to having each step of the inference pipeline be its own module, so that the steps can be composed into an inference graph in Triton.

I have rewritten the InferenceModule as an InferencePipelineModule with the following steps:

from typing import Any, Dict, List, Union

import torch
from torch import nn

# PreprocessModule, PredictModule, and PostprocessModule are defined alongside
# this class in ludwig/models/inference.py; ECD is Ludwig's model class.


class InferencePipelineModule(nn.Module):
    """Wraps preprocessing, model forward pass, and postprocessing modules into a single module.

    The purpose of the module is to be scripted into TorchScript for native serving. The nn.ModuleDict attributes of
    this module use keys generated by feature_utils.get_module_dict_key_from_name in order to prevent name collisions
    with keywords reserved by TorchScript.
    """

    def __init__(self, model: "ECD", config: Dict[str, Any], training_set_metadata: Dict[str, Any]):
        super().__init__()

        # Each stage is scripted independently so it can also be saved and served on its own.
        self.preprocess_module = torch.jit.script(PreprocessModule(config, training_set_metadata))
        self.predict_module = torch.jit.script(PredictModule(model))
        self.postprocess_module = torch.jit.script(PostprocessModule(config, training_set_metadata))

    def forward(self, inputs: Dict[str, Union[List[str], List[torch.Tensor], torch.Tensor]]):
        # Inference only: run the three stages back to back without autograd overhead.
        with torch.no_grad():
            preproc_outputs = self.preprocess_module(inputs)
            predictions = self.predict_module(preproc_outputs)
            postproc_outputs = self.postprocess_module(predictions)
            return postproc_outputs
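
A minimal usage sketch (not from the PR): since each stage is scripted independently in __init__, it can be saved as its own TorchScript artifact and later composed into an inference graph (e.g., a Triton ensemble). The model, config, and training_set_metadata variables and the file names below are illustrative.

import torch

module = InferencePipelineModule(model, config, training_set_metadata)

# Save each stage separately so the stages can be deployed and scaled independently.
torch.jit.save(module.preprocess_module, "preprocessor.pt")
torch.jit.save(module.predict_module, "predictor.pt")
torch.jit.save(module.postprocess_module, "postprocessor.pt")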

I have compared the performance of this pipeline against the original all-in-one module and found that it has very similar performance while being more composable.

  • 75.9 µs ± 266 ns per loop - InferenceModule
  • 78.3 µs ± 255 ns per loop - InferencePipelineModule
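
The "µs ± ns per loop" format suggests these numbers come from an IPython %timeit-style microbenchmark. A hedged sketch of how such a comparison could be reproduced follows; inference_module, pipeline_module, and sample_inputs are assumed to be pre-built and are not part of the PR.

import timeit

for name, mod in [
    ("InferenceModule", inference_module),
    ("InferencePipelineModule", pipeline_module),
]:
    # Average seconds per forward pass over 1000 calls.
    per_loop = timeit.timeit(lambda: mod(sample_inputs), number=1000) / 1000
    print(f"{per_loop * 1e6:.1f} µs per loop - {name}")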

@github-actions bot commented Jun 7, 2022

Unit Test Results

6 files ±0 · 6 suites ±0 · 2h 14m 40s ⏱️ +8m 37s
2,880 tests +12: 2,834 ✔️ passed ±0, 46 💤 skipped +12, 0 failed ±0
8,640 runs +36: 8,498 ✔️ passed ±0, 142 💤 skipped +36, 0 failed ±0

Results for commit b493597. ± Comparison against base commit a587181.

♻️ This comment has been updated with latest results.

@brightsparc (Contributor, Author) commented:

This is looking good. Is there a plan to deprecate _InferenceModuleV0 as part of this release?

Review thread on ludwig/models/inference.py (outdated, resolved)
@geoffreyangus geoffreyangus marked this pull request as ready for review June 27, 2022 16:55
@geoffreyangus geoffreyangus changed the title from "enh: Adding inference pipeline with seperate pre, post and predict into seperate modules." to "enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules" on Jun 27, 2022
@geoffreyangus (Collaborator) commented:

Claim: The performance of the pipeline and single_module implementations of the inference module are similar enough in a standard environment (no parallelism, no Triton) to merit a switch over to the pipeline implementation in Ludwig.

Implications: Switching to the pipeline implementation would provide a number of benefits, including but not limited to (1) mixed-backend deployment (libtorch primarily, Python if needed) and (2) the ability to tune resource allocation/scheduling per inference stage (preprocessing vs. prediction vs. postprocessing). Item (2) is particularly important in cases where preprocessing is costly, as it is in the text domain (see the AGNEWS results); a hypothetical sketch of (2) follows.
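
To make implication (2) concrete, a hypothetical sketch (not from the PR): once the stages are saved as separate TorchScript artifacts, each can be loaded onto its own device or handed to a different backend. The file paths below are illustrative.

import torch

preprocessor = torch.jit.load("preprocessor.pt", map_location="cpu")    # CPU-bound string/tensor prep
predictor = torch.jit.load("predictor.pt", map_location="cuda")         # model forward pass on GPU
postprocessor = torch.jit.load("postprocessor.pt", map_location="cpu")  # decode predictions on CPU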

Methodology: We first minimally train a LudwigModel, then save out four TorchScript artifacts: (1) an end-to-end TorchScript module and (2-4) preprocessor, predictor, and postprocessor TorchScript modules. The first module is loaded back in as a single_module model; the others are loaded as a pipeline model. The implementation of each of these modules can be found in ludwig/models/inference.py in this PR. We warm up each of the TorchScript models by feeding in 100 warmup batches, each of size random.choice([1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]). Supplementary experiments demonstrated that the warmup batch size had a non-trivial effect on model throughput, hence the wide range of warmup batch sizes. The test batches are pre-loaded and formatted as needed (pd.DataFrame or Dict[...]). Finally, we start a timer and pass the batches through the model (a condensed sketch follows the definitions below). The complete implementation of the evaluation script can be found in this GitHub Gist.

  • ludwig_model: A vanilla LudwigModel. We make predictions using the predict method.
  • single_module: A torchscripted module that preprocesses, predicts, and postprocesses. We make predictions using the forward method.
  • pipeline: A Python module with torchscripted modules for each of the preprocessing, prediction, and postprocessing steps (3 total). We make predictions using a forward method that passes inputs/outputs between the modules.
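
The timing harness described above, condensed into a sketch; the complete script is in the linked Gist, and benchmark, make_batch, and test_batches are illustrative names, not part of the PR.

import random
import time

WARMUP_BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def benchmark(mod, make_batch, test_batches):
    # 100 warmup batches of randomly chosen size; warmup batch size was
    # observed to have a non-trivial effect on throughput.
    for _ in range(100):
        mod(make_batch(random.choice(WARMUP_BATCH_SIZES)))
    # Test batches are pre-loaded and pre-formatted; only the forward passes are timed.
    start = time.perf_counter()
    for batch in test_batches:
        mod(batch)
    return (time.perf_counter() - start) / len(test_batches)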

Performance on the TITANIC Dataset

Average duration (secs) by device and batch size:

| device | batch_size | pipeline  | single_module |
|--------|-----------:|----------:|--------------:|
| cpu    |          1 | 0.000463  | 0.0002786     |
| cpu    |          2 | 0.0004705 | 0.0003006     |
| cpu    |          4 | 0.00048   | 0.0003271     |
| cpu    |          8 | 0.0004812 | 0.00035       |
| cpu    |         16 | 0.0005716 | 0.0004048     |
| cpu    |         32 | 0.0006246 | 0.0005004     |
| cpu    |         64 | 0.0008573 | 0.0006887     |
| cpu    |        128 | 0.0011964 | 0.0010467     |
| cpu    |        256 | 0.0018457 | 0.0016829     |
| cpu    |        512 | 0.0031429 | 0.0029487     |
| cpu    |       1024 | 0.0055807 | 0.0054132     |
| cpu    |       2048 | 0.010504  | 0.0103147     |
| cuda   |          1 | 0.0006884 | 0.0004686     |
| cuda   |          2 | 0.0007017 | 0.0004942     |
| cuda   |          4 | 0.0007413 | 0.0005514     |
| cuda   |          8 | 0.0007642 | 0.0006064     |
| cuda   |         16 | 0.0009068 | 0.0007139     |
| cuda   |         32 | 0.0010783 | 0.0009261     |
| cuda   |         64 | 0.0015346 | 0.0013382     |
| cuda   |        128 | 0.0023475 | 0.0021445     |
| cuda   |        256 | 0.0039793 | 0.003786      |
| cuda   |        512 | 0.0072207 | 0.0070117     |
| cuda   |       1024 | 0.0137522 | 0.0134748     |
| cuda   |       2048 | 0.0266859 | 0.0264546     |

Performance on the AGNEWS Dataset

Average duration (secs) by device and batch size:

| device | batch_size | pipeline | single_module |
|--------|-----------:|---------:|--------------:|
| cpu    |          1 | 0.002553 | 0.002517      |
| cpu    |          2 | 0.003781 | 0.003653      |
| cpu    |          4 | 0.005564 | 0.005872      |
| cpu    |          8 | 0.010067 | 0.009697      |
| cpu    |         16 | 0.019498 | 0.020136      |
| cpu    |         32 | 0.044092 | 0.042171      |
| cpu    |         64 | 0.100954 | 0.098596      |
| cpu    |        128 | 0.236934 | 0.240161      |
| cpu    |        256 | 0.468968 | 0.46902       |
| cpu    |        512 | 0.926071 | 0.934363      |
| cpu    |       1024 | 1.848435 | 1.842421      |
| cpu    |       2048 | 3.689693 | 3.695318      |
| cuda   |          1 | 0.00121  | 0.001098      |
| cuda   |          2 | 0.001757 | 0.001576      |
| cuda   |          4 | 0.002903 | 0.002861      |
| cuda   |          8 | 0.004742 | 0.004946      |
| cuda   |         16 | 0.009272 | 0.00914       |
| cuda   |         32 | 0.017959 | 0.017366      |
| cuda   |         64 | 0.036728 | 0.036603      |
| cuda   |        128 | 0.075017 | 0.075609      |
| cuda   |        256 | 0.157403 | 0.15643       |
| cuda   |        512 | 0.328297 | 0.324105      |
| cuda   |       1024 | 0.671505 | 0.671076      |
| cuda   |       2048 | 1.369388 | 1.368563      |

@geoffreyangus geoffreyangus merged commit c26e81a into master Jun 27, 2022
@geoffreyangus geoffreyangus deleted the enh-inference-pipeline branch June 27, 2022 22:07