ORTModule GraphTransitionManager#19007
Merged
Conversation
baijumeswani
reviewed
Jan 8, 2024
pengwa
added a commit
that referenced
this pull request
Jan 16, 2024
## Dependency

#19007

## ORTModule memory-efficient gradient management

Previously I tried to solve the coarse-grained gradient accumulation/update problem in ORTModule with #8979, but that solution was not fully validated with DDP, or when user hooks are registered on torch parameter gradient accumulation. This PR addresses the problem with a similar approach to PR #8979, i.e. triggering gradient accumulation as soon as ORT has computed the grad; but instead of using an AccumulateGrad op, this time an ONNX PythonOp operator is used, which internally calls param.backward(grad) and thus handles all related hooks correctly.

## Design

Check the details from https://microsoftapc-my.sharepoint.com/:p:/g/personal/pengwa_microsoft_com/EaaBq4EzsFhOmsDEXCG7Ba4Bb9bwd0O2sFV_JXJ4jBLYLA?e=7Sz2g8&nav=eyJzSWQiOjI3MSwiY0lkIjozMjE4NzI1NDIzfQ

## Convergence Validation

Differences are mostly on the order of 0.000x, sometimes 0.00x, which may come from the different order in which gradient application happens before or after this change (on DeepSpeed ZeRO stage 2).

## TODO

Consolidate this logic with Stage 3's similar logic.
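The point about routing accumulation through param.backward(grad) so hooks still fire can be illustrated with a toy model. This is plain Python, not the actual torch or ORT APIs; all names below are illustrative only:

```python
# Toy illustration (not torch): a parameter whose gradient-accumulation
# hooks only fire when accumulation is routed through backward(), which
# mirrors why calling param.backward(grad) handles hooks correctly while
# writing param.grad directly would bypass them.
class ToyParam:
    def __init__(self):
        self.grad = 0.0
        self._hooks = []  # callbacks to run after each accumulation

    def register_post_accumulate_hook(self, fn):
        self._hooks.append(fn)

    def backward(self, grad):
        # Accumulate, then notify hooks (e.g. DDP reducers, user callbacks).
        self.grad += grad
        for fn in self._hooks:
            fn(self)

calls = []
p = ToyParam()
p.register_post_accumulate_hook(lambda param: calls.append(param.grad))

p.grad += 1.0    # direct write: hook never fires
p.backward(2.0)  # routed through backward: hook fires with accumulated grad
print(p.grad, calls)  # 3.0 [3.0]
```

In real torch, the analogous hook points are those attached to gradient accumulation on leaf parameters, which an explicit AccumulateGrad op would not invoke.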
pengwa
commented
Feb 21, 2024
Contributor
Author
Sorry for the late response.
…pengwa/refactor_io
Contributor
construct_inputs and restore_outputs can probably be done by calling tree_flatten and tree_unflatten in https://github.com/pytorch/pytorch/blob/15bd81bfafa86fec9d675e7f071c867c852ebe8f/torch/utils/_pytree.py#L799.
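For context, the torch.utils._pytree helpers referenced here flatten an arbitrarily nested container into a list of leaves plus a spec, and rebuild the original structure from them. A minimal pure-Python sketch of that behavior (the real tree_flatten/tree_unflatten handle more container types and return a TreeSpec object):

```python
# Minimal sketch of pytree-style flatten/unflatten over lists, tuples, and
# dicts. tree_flatten returns (leaves, spec); tree_unflatten(leaves, spec)
# rebuilds the original nested structure from the flat leaf list.
def tree_flatten(obj):
    if isinstance(obj, (list, tuple)):
        leaves, specs = [], []
        for item in obj:
            sub_leaves, sub_spec = tree_flatten(item)
            leaves += sub_leaves
            specs.append(sub_spec)
        return leaves, (type(obj), specs)
    if isinstance(obj, dict):
        leaves, specs = [], {}
        for key, value in obj.items():
            sub_leaves, sub_spec = tree_flatten(value)
            leaves += sub_leaves
            specs[key] = sub_spec
        return leaves, (dict, specs)
    return [obj], None  # a leaf: no spec needed

def tree_unflatten(leaves, spec):
    it = iter(leaves)

    def build(sub_spec):
        if sub_spec is None:
            return next(it)  # consume the next leaf
        typ, children = sub_spec
        if typ is dict:
            return {k: build(s) for k, s in children.items()}
        return typ(build(s) for s in children)  # list or tuple

    return build(spec)

nested = {"x": [1, (2, 3)], "y": 4}
leaves, spec = tree_flatten(nested)
print(leaves)                        # [1, 2, 3, 4]
print(tree_unflatten(leaves, spec))  # {'x': [1, (2, 3)], 'y': 4}
```

Construct_inputs/restore_outputs need exactly this round-trip: flatten user inputs into the positional list ORT expects, then restore ORT's flat outputs into the user's original structure.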
Contributor
Author
Thanks @wschin.
rohan11235813
pushed a commit
to quadric-io/onnxruntime
that referenced
this pull request
Aug 19, 2025
rohan11235813
pushed a commit
to quadric-io/onnxruntime
that referenced
this pull request
Sep 15, 2025
Problem
Currently, the codebase contains logic for model re-export checks and graph builder re-initialization checks. Ideally, these operations should behave like a state machine, but in the current implementation the relevant states are checked or set in scattered locations. This fragmentation makes it hard to tell when a re-export or re-initialization will be triggered. For clarity and maintainability, these states should be consolidated into a cohesive component rather than dispersed within the current graph execution manager.
Furthermore, model export and the post-export processing for stage 3 support or memory-efficient gradient management introduce considerable complexity. To improve the codebase's structure, it would be beneficial to extract these functionalities into a dedicated component, separating them from the current graph execution manager.
As part of this effort, it is also essential to address inconsistencies in how input/output flatten/unflatten operations are handled. Currently, several functions perform these operations recursively, each with a slightly different implementation, so different parts of the code support different input/output data types and structures. This pull request simplifies these operations into a set of primitive functions, ensuring uniformity. This streamlines the code and makes it easier to stay consistent when fixing bugs or adding support for new data types. One thing to mention here: input/output handling is deeply bound to the graph transition described above, so it is difficult to make this change separately.
While this logic is complex, the codebase benefits from an extensive suite of unit tests covering all possible branches. Despite the intricacies, ensuring all tests pass has been a time-intensive but necessary part of this development effort.
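One way to read the "state machine" point above: gather every flag that can force a re-export or graph-builder re-initialization into one small component with a single decision point, instead of checking them in scattered places. A hypothetical sketch under that reading; the class and attribute names here are illustrative, not the actual ORTModule API:

```python
# Hypothetical sketch: consolidate re-export / re-initialization triggers
# into one object, so a single method decides when a transition happens.
from dataclasses import dataclass


@dataclass
class TransitionState:
    exported_schema: object = None   # input schema seen at last export
    device: object = None            # device seen at last export
    needs_reinitialize: bool = True  # graph builder not yet (re)built

    def need_reexport(self, schema, device) -> bool:
        # Re-export whenever anything the exported graph depends on changed.
        return self.exported_schema != schema or self.device != device

    def mark_exported(self, schema, device) -> None:
        self.exported_schema = schema
        self.device = device
        self.needs_reinitialize = True  # new model => rebuild graph builder


state = TransitionState()
assert state.need_reexport(("float32", (2, 4)), "cuda:0")       # first run
state.mark_exported(("float32", (2, 4)), "cuda:0")
assert not state.need_reexport(("float32", (2, 4)), "cuda:0")   # unchanged
assert state.need_reexport(("float32", (8, 4)), "cuda:0")       # shape changed
```

With every trigger funneled through need_reexport, the conditions under which a re-export fires can be read (and unit-tested) in one place.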
Design
- Introduce `GraphTransitionManager` and put all model export and post-export processing logic in it.
- Introduce `PostExportProcessedModelInfo`, which contains all the information we need to pass to ORT to build the gradient graph (currently we do the same for training and evaluating, but ideally we should not do it for evaluating; let's keep this behavior as it is now and make the change later).
- The `GraphTransitionManager` instance is a property of `GraphExecutionManager` (e.g. `TrainingManager` or `InferenceManager`).
- Use `self._graph_transition_manager._post_export_processed_model_info.construct_inputs` to construct the list of inputs used for ORT runs.
- Use `self._graph_transition_manager._post_export_processed_model_info.restore_outputs(user_outputs)` to restore the outputs in the original PyTorch output structure.

Motivation and Context