
feat(DTO): Add DTO codegen backend #2388

Merged
merged 34 commits into main from dto-codegen on Oct 7, 2023
Conversation

provinzkraut
Member

@provinzkraut provinzkraut commented Sep 29, 2023

Add a new _DTOCodegenBackend that aims to improve performance by generating optimised transfer functions ahead of time.
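
For context, this is roughly how the new backend could be opted into. The flag name and import path (ExperimentalFeatures.DTO_CODEGEN in litestar.config.app) are an assumption based on the files touched in this PR, not something stated in this description:

# Sketch only: the feature-flag name and its location are assumptions, not taken from this PR text.
from dataclasses import dataclass

from litestar import Litestar, post
from litestar.config.app import ExperimentalFeatures  # assumed import path
from litestar.dto import DataclassDTO


@dataclass
class User:
    name: str
    age: int


@post("/users", dto=DataclassDTO[User])
async def create_user(data: User) -> User:
    return data


# Opt in to the codegen backend via the (assumed) experimental feature flag.
app = Litestar(
    route_handlers=[create_user],
    experimental_features=[ExperimentalFeatures.DTO_CODEGEN],
)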

Some performance impressions:

method                               backend    seconds
populate_from_raw                    standard   1.817
populate_from_raw                    codegen    0.487
populate_from_raw_collection         standard   2.376
populate_from_raw_collection         codegen    0.681
populate_from_builtins               standard   1.772
populate_from_builtins               codegen    0.470
populate_from_builtins_collection    standard   2.407
populate_from_builtins_collection    codegen    0.655
encode                               standard   1.539
encode                               codegen    0.285
encode_collection                    standard   1.897
encode_collection                    codegen    0.349

(times shown are for encoding the same data set 100,000 times)
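
The exact benchmark script isn't included here; as a rough sketch, a comparison like this could be reproduced with timeit, where transfer_fn and payload are hypothetical stand-ins for the backend routine being measured and the shared data set:

import timeit

# Hypothetical stand-ins: "transfer_fn" is whichever backend routine is being measured
# (standard or codegen), "payload" is the shared data set mentioned above.
payload = {"a": 1, "b": "two", "nested": {"a": 3, "b": "four"}}

def time_backend(transfer_fn) -> float:
    # transfer the same data set 100,000 times, as in the table above
    return timeit.timeit(lambda: transfer_fn(payload), number=100_000)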

What it does

  • Unrolling loops: Instead of iterating over the fields, instructions for each field are written out explicitly
  • Reduce function calls: Code for nested functions and nested transfers is inlined into the main transfer function, reducing the overall number of function calls and assignments
  • Reduce branching: A lot of the branching done at runtime can be avoided since we have all the information beforehand (e.g. different paths for Mapping or regular object types, and specific field types), which also avoids calling isinstance and friends
  • Optimised field access: Wrapping field access in a try/except block is faster if we expect no exception to be raised most of the time, while a hasattr / <field name> in <object> check is faster for the average case where a field might not be present. Since we can guess which case we'll hit more frequently (by checking whether the field is optional/excluded/UNSET/Optional[]), we can use the most performant method for every individual field (see the sketch after this list)
  • Zero cost renaming: Instead of checking which fields need to be renamed and using temporary values, renamed fields are assigned in the same way as regular fields, since all the information is known beforehand
  • Zero cost exclusion: No code is generated for excluded fields, avoiding all checks related to this at runtime
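
To illustrate the field-access point from the list above, here is a minimal, self-contained comparison of the two strategies. This is not code from this PR; the generated code writes these patterns out inline per field rather than calling helpers like these:

# Illustration only: the codegen backend inlines these patterns directly per field.

def get_expected_field(source: dict, key: str, default=None):
    # Fast path when the key is almost always present: no membership test on the hot path,
    # and the exception cost is only paid in the rare miss case.
    try:
        return source[key]
    except KeyError:
        return default

def get_optional_field(source: dict, key: str, default=None):
    # Cheaper on average when the key is frequently absent: raising and catching an
    # exception costs more than an explicit membership check.
    return source[key] if key in source else default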

Examples

This is the core loop of the regular DTO backend; it performs these operations for each field, every time we transfer data:

should_use_serialization_name = not override_serialization_name and is_data_field
source_name = field_definition.serialization_name if should_use_serialization_name else field_definition.name

if not is_data_field:
    if field_definition.is_excluded:
        continue
elif not (
    source_name in source_instance
    if isinstance(source_instance, Mapping)
    else hasattr(source_instance, source_name)
):
    continue

transfer_type = field_definition.transfer_type
destination_name = field_definition.name if is_data_field else field_definition.serialization_name
source_value = (
    source_instance[source_name]
    if isinstance(source_instance, Mapping)
    else getattr(source_instance, source_name)
)

if field_definition.is_partial and is_data_field and source_value is UNSET:
    continue

unstructured_data[destination_name] = _transfer_type_data(
    source_value=source_value,
    transfer_type=transfer_type,
    nested_as_dict=destination_type is dict,
    is_data_field=is_data_field,
    override_serialization_name=override_serialization_name,
)

Assuming our source is a Mapping, and we don't expect an optional value, we can generate the following code for this:

try:
    source_value = source_instance["some_name"]
    if source_value is not UNSET:
        unstructured_data["desitination_name"] = _transfer_type_data(source_value)
except KeyError:
    pass

The actual code will differ from this; for example, _transfer_type_data will also be inlined with the appropriate function for the type, and the if branch might be skipped entirely, but this gives a good impression of what sort of things can be done.
Below is a full example of the generated code with some formatting and comments added.

Full example
def func(source_instance_0):
  unstructured_data_0 = {}
  
  # we have two main branches, one for a mapping and one for a regular object, 
  # since we accept both as input data. Having two big branches allows us to 
  # only perform this check once
  if isinstance(source_instance_0, Mapping):

      # we use try/except here since we expect the item to be present
      try:
          unstructured_data_0["a"] = source_instance_0["a"]
      except KeyError:
          pass

      # this is an inlined function; a transfer for another - nested - type
      try:

          unstructured_data_1 = {}

          # if we access this more than once, we assign it to a variable. we could do some 
          # fine tuning here to find the sweet spot where the assignment is cheaper than 
          # the lookup but this depends on the data type and varies case by case and between 
          # python versions, so I kept it fairly simple for now
          source_instance_0_nested_0 = source_instance_0["nested"]

          if isinstance(source_instance_0_nested_0, Mapping):

              try:
                  unstructured_data_1["a"] = source_instance_0_nested_0["a"]
              except KeyError:
                  pass

              try:
                  unstructured_data_1["b"] = source_instance_0_nested_0["b"]
              except KeyError:
                  pass

          else:
              try:
                  unstructured_data_1["a"] = source_instance_0_nested_0.a
              except AttributeError:
                  pass

              try:
                  unstructured_data_1["b"] = source_instance_0_nested_0.b
              except AttributeError:
                  pass

          # call the transfer model type with the kwargs we've built. In case the destination
          # type is a dict as well, we would skip this step and just do
          # unstructured_data_0["nested"] = unstructured_data_1
          unstructured_data_0["nested"] = destination_type_1(**unstructured_data_1)

      except KeyError:
          pass

      try:
          # using a generator comprehension here for collections. in theory we could optimise
          # this for collections with a known size and unroll this loop as well
          unstructured_data_0["nested_list"] = origin_0(
              transfer_type_data_0(item) for item in source_instance_0["nested_list"]
          )
      except KeyError:
          pass

      try:
          unstructured_data_0["b"] = source_instance_0["b"]
      except KeyError:
          pass

      try:
          unstructured_data_0["c"] = origin_1(source_instance_0["c"])
      except KeyError:
          pass

      # using this check here because "optional" is optional, so we expect to hit the failure
      # case more frequently, which is more costly with a try/except 
      if "optional" in source_instance_0:
          unstructured_data_0["optional"] = source_instance_0["optional"]

  else:
      # this is the same as above, just for an object instead of a mapping

      try:
          unstructured_data_0["a"] = source_instance_0.a
      except AttributeError:
          pass

      try:
          unstructured_data_2 = {}
          source_instance_0_nested_1 = source_instance_0.nested

          if isinstance(source_instance_0_nested_1, Mapping):

              try:
                  unstructured_data_2["a"] = source_instance_0_nested_1["a"]
              except KeyError:
                  pass

              try:
                  unstructured_data_2["b"] = source_instance_0_nested_1["b"]
              except KeyError:
                  pass
          else:

              try:
                  unstructured_data_2["a"] = source_instance_0_nested_1.a
              except AttributeError:
                  pass

              try:
                  unstructured_data_2["b"] = source_instance_0_nested_1.b
              except AttributeError:
                  pass

          unstructured_data_0["nested"] = destination_type_2(**unstructured_data_2)

      except AttributeError:
          pass

      try:
          unstructured_data_0["nested_list"] = origin_2(
              transfer_type_data_1(item) for item in source_instance_0.nested_list
          )
      except AttributeError:
          pass

      try:
          unstructured_data_0["b"] = source_instance_0.b
      except AttributeError:
          pass

      try:
          unstructured_data_0["c"] = origin_3(source_instance_0.c)
      except AttributeError:
          pass

      if hasattr(source_instance_0, "optional"):
          unstructured_data_0["optional"] = source_instance_0.optional

  tmp_return_type_0 = destination_type_0(**unstructured_data_0)
  return tmp_return_type_0
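
The source above still has to be turned into an actual function, with names like Mapping, destination_type_0 and the inlined transfer helpers provided in the namespace the code is executed in. The helper below is a hypothetical sketch of that mechanism, not the backend's actual implementation:

from collections.abc import Mapping
from typing import Any, Callable

def compile_transfer_fn(source: str, namespace: dict) -> Callable:
    # Hypothetical sketch: execute the generated module and pull out the single
    # function ("func") it defines. The real backend's wiring may differ.
    namespace = {"Mapping": Mapping, **namespace}
    exec(compile(source, "<dto codegen>", "exec"), namespace)
    return namespace["func"]

# e.g. transfer = compile_transfer_fn(generated_source, {"destination_type_0": SomeTransferModel, ...})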

Limitations

Inlining for nested collections isn't possible; instead, for each collection a separate function is generated, which is called within a comprehension of the appropriate type. In theory this could still be optimised for collections of known size (e.g. tuples, or collections that have a constraint on their length), as sketched below.
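
As a rough, hypothetical illustration of the known-size idea (not implemented in this PR), an unrolled transfer for a fixed-length tuple could replace the per-item comprehension; transfer_item stands in for the generated per-item helper:

def transfer_item(item: dict) -> dict:
    # stand-in for the per-item transfer function the backend generates today
    return {"a": item.get("a"), "b": item.get("b")}

def transfer_pair_comprehension(source: dict) -> tuple:
    # current approach: a comprehension over the collection, calling the generated helper
    return tuple(transfer_item(item) for item in source["pair"])

def transfer_pair_unrolled(source: dict) -> tuple:
    # hypothetical unrolled variant for a field typed as a fixed-length two-tuple
    pair = source["pair"]
    return (transfer_item(pair[0]), transfer_item(pair[1]))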

Drawbacks

This implementation is a bit harder to reason about than the regular backend and therefore might increase the maintenance effort.

provinzkraut and others added 29 commits September 29, 2023 15:09
Apply some micro-optimizations to _transfer_instance_data
@provinzkraut provinzkraut requested a review from a team as a code owner September 29, 2023 14:03
@cofin
Member

cofin commented Oct 7, 2023

Given that these are hidden behind a feature flag, is there any reason not to move forward with this now?

Member

@JacobCoffee JacobCoffee left a comment


dont trust me

@provinzkraut
Member Author

Given that these are hidden behind a feature flag, is there any reason not to move forward with this now?

Actually no, you're right (=

@provinzkraut provinzkraut enabled auto-merge (squash) October 7, 2023 07:53
@sonarcloud

sonarcloud bot commented Oct 7, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

99.1% Coverage
0.0% Duplication

@github-actions

github-actions bot commented Oct 7, 2023

Documentation preview will be available shortly at https://litestar-org.github.io/litestar-docs-preview/2388

@provinzkraut provinzkraut merged commit 8874b03 into main Oct 7, 2023
19 checks passed
@provinzkraut provinzkraut deleted the dto-codegen branch October 7, 2023 08:13