
Atoms iterator performance improvement #476

Merged
merged 1 commit into from Mar 20, 2017

Conversation


@vovanbo vovanbo commented Mar 17, 2017

This avoids an unnecessary dict conversion in one of the most frequently called methods of Schematics.
Simple benchmark is here: https://gist.github.com/vovanbo/dbb3044b417bcd407a4411119aacf415

@lkraider

Awesome, thanks for the benchmark! I'll review it and merge ASAP.


@lkraider lkraider left a comment


Thanks, merging this! I'll update the returned tuple to match the docs (i.e. keep the keys as a filter of all fields), even though the only use case is filtering the value for now.


-atom_tuple = Atom(**dict((k, atom_dict.get(k)) for k in keys))
+atom_tuple = Atom(name=field_name, field=field, value=value)
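As a rough, self-contained illustration of the change (this is not the linked gist, and plain strings stand in for Schematics field types), the cost difference comes from the per-atom dict allocation and `**` unpacking that the patched line skips:

```python
import timeit
from collections import namedtuple

# Stand-in for schematics.iteration.Atom
Atom = namedtuple('Atom', ('name', 'field', 'value'))

atom_dict = {'name': 'id', 'field': 'IntType', 'value': '42'}
keys = ('name', 'field', 'value')

# Old style: build an intermediate dict, then unpack it into the namedtuple.
via_dict = timeit.timeit(
    lambda: Atom(**dict((k, atom_dict.get(k)) for k in keys)),
    number=100_000)

# New style: construct the namedtuple directly from keyword arguments.
direct = timeit.timeit(
    lambda: Atom(name='id', field='IntType', value='42'),
    number=100_000)

# Both produce equal Atoms; direct construction is typically several
# times faster because it avoids the generator, dict, and unpacking.
```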

By always setting the keys, we lose the option to not return a field, but since there is no use case defined for that, it is not a loss. Compare the output:

Before patch

list(iteration.atoms(schema, data, keys=['value']))
Out[3]:
[Atom(name=None, field=None, value='42'),
 Atom(name=None, field=None, value=Undefined),
 Atom(name=None, field=None, value=Undefined)]

After patch

list(iteration.atoms(schema, data, keys=['value']))
Out[10]:
[Atom(name='id', field=<IntType() instance as 'id'>, value='42'),
 Atom(name='first_name', field=<StringType() instance as 'first_name'>, value=Undefined),
 Atom(name='last_name', field=<StringType() instance as 'last_name'>, value=Undefined)]
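A minimal, self-contained sketch of the compromise behavior described above (keys act as a filter of all fields, unselected slots are set to None, and the per-atom dict is still avoided); the plain `fields` dict and string field types are stand-ins, not the real Schematics schema API:

```python
from collections import namedtuple

# Stand-in for schematics.iteration.Atom
Atom = namedtuple('Atom', ('name', 'field', 'value'))

def atoms(fields, data, keys=('name', 'field', 'value')):
    """Yield one Atom per field, filling only the requested keys.

    Non-selected keys come back as None, preserving the pre-patch
    contract while skipping the per-atom dict construction.
    """
    has_name = 'name' in keys
    has_field = 'field' in keys
    has_value = 'value' in keys
    for field_name, field in fields.items():
        yield Atom(
            name=field_name if has_name else None,
            field=field if has_field else None,
            value=data.get(field_name) if has_value else None,
        )

fields = {'id': 'IntType', 'first_name': 'StringType'}
data = {'id': '42'}
result = list(atoms(fields, data, keys=['value']))
# Each atom carries only the 'value' slot; name and field stay None.
```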

@lkraider lkraider merged commit aebc7b0 into schematics:development Mar 20, 2017

lkraider commented Mar 20, 2017

Just FYI: I updated the development branch code to keep the before-patch functionality and added a test to document it, while keeping the speed improvement. Thanks!


vovanbo commented Mar 20, 2017

Awesome! Thanks for merging!

I had overlooked the preservation of the initial None values, which you covered in the tests. It looks like atoms is more useful than I initially thought. :)

At the moment I am looking for other parts of Schematics that could be made faster. In my current project Schematics is the bottleneck: it delivers clean models for the business logic, but it also brings slowness at the marshalling level.

It may be confusing, but in case it interests you as a developer, here are the details: I have a customised ModelType that allows lazy resolution of models at runtime, plus some limiting logic in the export loop to avoid infinite recursion. I am considering a PR for this ModelType, but right now I don't have time to design a flexible type syntax that avoids specifying the full path every time (e.g. ModelType('SomeModel') instead of ModelType('full.path.to.models.SomeModel')). I guess Schematics doesn't have this functionality out of the box because serialization of circularly wired models often depends on project requirements (and can even be impossible). In my case, marshalling of deeply nested models can take up to 90% of request-handling time, with many, many import/export loop calls.
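A hypothetical, self-contained sketch of the lazy-resolution idea described above; the registry, decorator, and `LazyModelType` names are invented for illustration and are not the Schematics API:

```python
# Registry mapping model names to classes, so a field can be declared as
# LazyModelType('SomeModel') before SomeModel is defined (or in a cycle).
_MODEL_REGISTRY = {}

def register_model(cls):
    """Class decorator that records a model under its class name."""
    _MODEL_REGISTRY[cls.__name__] = cls
    return cls

class LazyModelType:
    """Field type that defers model lookup until first use."""

    def __init__(self, model_spec):
        self._spec = model_spec      # a class, or a name to resolve later
        self._resolved = None

    @property
    def model_class(self):
        # Resolve on first access; circular references work as long as
        # both models are registered before the first import/export loop.
        if self._resolved is None:
            if isinstance(self._spec, type):
                self._resolved = self._spec
            else:
                self._resolved = _MODEL_REGISTRY[self._spec]
        return self._resolved

@register_model
class SomeModel:
    pass

field = LazyModelType('SomeModel')  # declared by name, resolved lazily
```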

P.S. Excuse me for changing this place to discuss.

@lkraider
Copy link
Contributor

@vovanbo It would be great if you could provide test cases for slow performance. Schematics is definitely heavy on looping over nested models and is something we should try to optimize.

https://github.com/schematics/schematics-benchmark is where we will be storing the benchmarks we want to keep track of.

For the dynamic ModelTypes, there is a Model.append_field(...) feature that allows extending a Model after declaration, but I would like to see your approach. Can you create an issue for this feature and propose your PR for it?
