Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speeding up pydantic #547

Closed
samuelcolvin opened this issue May 25, 2019 · 12 comments
Closed

speeding up pydantic #547

samuelcolvin opened this issue May 25, 2019 · 12 comments
Labels
feature request help wanted Pull Request welcome

Comments

@samuelcolvin
Copy link
Member

In my own implementation of the benchmark using attrs+cattrs (#543 referenced above), I was getting ~3x faster execution for attrs+cattrs relative to pydantic.

Trying to get to the bottom of why the attrs+cattrs benchmark was faster (and if anything could be easily modified to catch up), I did some line-by-line profiling, and while nothing stood out to me as an obvious bottleneck, I did notice that you pay a price for some of the validations even when they are disabled (e.g., when validating a string, anystr_length_validator gets called even if there are no length restrictions on the instance).

@samuelcolvin Is there a way to actually modify the set of validators that gets called for a given model? That seems more reflective of my understanding of how attrs+cattrs performs validation, and I would expect it to help speed things up if, for instance, you only want type validation. It looked to me like something was happening where there was iteration over each field's list of validators, but if I touched that list (even just removing items from it) it surprisingly caused everything to run much more slowly.

@sebastianmika for what it's worth, if you are trying to resolve a bottleneck, I was able to get a ~30% speed improvement in the pydantic benchmark by cythonizing just fields.py, main.py and validators.py with zero changes to the underlying source (except I had to rename uses of include to in main.py as include is reserved in cython).

Originally posted by @dmontagu in #513 (comment)

@samuelcolvin
Copy link
Member Author

@dmontagu this is really interesting, I've copied this to a new issue to separate it from discussions around attr benchmarks/

Is there any easy way for you share your result from profiling? This is something I've been intending to do for a year or so but haven't had the time to sit down and concentrate on it. If we can easily benefit from your work without having to start from scratch that would be great.

Regarding cythonizing, I'd be very keen to add this to the pypi release, my initial thoughts/concerns:

  • we need to keep pydantic working identically with and without cython, either by:
    • duplicating code in .py and .pyx, if so we should minimise the amount of duplicated code
    • or by modifying python code to be cython friendly but still valid python. What's the current recommendation on this?
  • we should include manylinux binaries to make installs fast for the majority of users. According to this windows installs are less than 1%, but maybe that's not correct and some of the system=null installs are actually windows? Either way windows binaries are too much hassle for me, at least initially

@samuelcolvin samuelcolvin mentioned this issue May 25, 2019
4 tasks
@samuelcolvin
Copy link
Member Author

@dmontagu are you sure you got main.py to compile to cython?

I've been playing around and while fields.py and validators.py worked fine, cythonizing main.py always gives me no validator found for <class 'cython_function_or_method'>

@samuelcolvin samuelcolvin mentioned this issue May 25, 2019
5 tasks
@samuelcolvin
Copy link
Member Author

see #548, feedback much appreciated.

@dmontagu
Copy link
Contributor

@samuelcolvin ah yes, I tried to cythonize main.py but I think I ran into precisely the issue you are describing, I can't remember if I was able to work around it.

One thing I remember noticing from the line profiler was that this check in particular was eating a couple % of execution time, despite being basically irrelevant after model definition. But it looks like that was removed in the cython branch.

I had to hack the codebase to get the line profiler working due in part to conflicts with built-in names (in particular, I think the presence of a file called profile.py next to the benchmark was causing some conflicts). Sadly I nuked my profiler setup but it shouldn't be hard to put it back together. I'll get it working and then share.

@samuelcolvin
Copy link
Member Author

no worries, i got line profiling working fine. (didn't run into the profile.py problem you mentioned oddly).

But it looks like that was removed in the cython branch.

No just moved, will look at that too.

You were right about extra validators being called, removing that saved a significant about of time.

@dmontagu
Copy link
Contributor

dmontagu commented May 25, 2019

@samuelcolvin Here is the output of the line-profiler running the benchmark on my computer: https://gist.github.com/dmontagu/0a8b823c2c5dec3954405a921028f7dd.

  • profile_all.log has more functions being line-profiled (as it is set up in Add line profiling, with some in-line #551)
  • profile_last_layer.log has only the "last layer" of functions being line-profiled (basically just commenting out where I added the other functions to the line profiler), so has less "noise" in the output

If you look at the code in #551 I think it should be clear how to modify to run it on different functions if you want to profile others. (But it sounds like you might now have your own line-profiling setup.)

Next I'll try to run it on the cython branch and add that output.

@dmontagu
Copy link
Contributor

dmontagu commented May 25, 2019

https://gist.github.com/dmontagu/0a8b823c2c5dec3954405a921028f7dd now contains

  • cython_all.log, and
  • cython_last_layer.log

with the results of line-profiling the cythonized code, analogous to the files described above. Note: line profiling the cythonized code required some tweaks to the setup.py to add line traces, but wasn't too bad (see the comment on #552 for more info).

@tiangolo
Copy link
Member

tiangolo commented May 26, 2019

Just to chime in about Windows:

All of FastAPI's users are Pydantic users. I don't expect them all to be power-users.

And I expect many developers would use Windows, only/mainly for development, not for production (I imagine/hope). I personally use Linux, but FastAPI is compatible with all platforms.

I would like to suggest to keep a Windows-compatible version available.

Now, I wouldn't really care much about an optimized version for Windows, just any "vanilla" version (without Cython) that developers can use when developing in Windows.

I expect anyone that wants maximum performance in production would use Linux (and would take advantage of any Cython optimized code). But I would like to suggest keeping a simple Windows-compatible version too.


That aside, I think these efforts to optimize performance to the next level are amazing 🎉

@samuelcolvin
Copy link
Member Author

thanks @tiangolo.

From the download numbers it looks like very few people use windows (although maybe the platform=null people are windows, not at all clear, maybe they're macos????). These numbers will of course include fastapi users.

The version in #548 has close to identical behaviour with and without compilation with cython - tests are run with and without cython for every major python version, no tests are skipped in either case.

Tests are now also run (without compiling to cython) on windows using azure pipeline for all builds.

Regarding building precompiled binaries for windows and macos, there's #555 - maybe no one cares. It's just there to link to from the docs.

@tiangolo
Copy link
Member

Awesome, thanks @samuelcolvin .

As I reported in the PR, I was able to install it in Windows and it works without problems (with the non-Cythonized version). ✔️

So, my use case (FastAPI) is covered. 🎉


Just for completeness, I think there's something weird with the usage report.

Here's a snapshot of the visitors of FastAPI's docs site by OS in the last month:

macOS: 1,751
Windows: 1,648
Linux: 1,008
Others: ...

Selection_055

I know website visitors is not the same as installs, but I would at least expect the statistics to be somewhat correlated. The number of Windows users shouldn't be that high.

But again, this is just for completeness.

I don't think there's anything else that needs to be done given that Pydantic keeps installing without problems on Windows.

And I expect Windows users would still deploy to Linux, even more, if they want maximum performance.

@samuelcolvin
Copy link
Member Author

Interesting. Downloads from pypi will be heavily skewed by CI (eg. travis will often download packages multiple times for every build) which is normally linux.

But windows installs still shouldn't be that low. Either way, I agree we're fine on that front.

@dmontagu
Copy link
Contributor

@samuelcolvin I ran the benchmarks for these branches using PyPy as well, in case you were interested in the performance.

https://gist.github.com/dmontagu/4b31ce876c0d41a6735b641c65fb7285
That gist has benchmark timings on my local machine for:

  • Master branch
  • My version of the cython branch (PR submitted), without cython installed
  • My version of the cython branch, with cython installed (so compilation happens)
    using python executables:
  • py37 (CPython 3.7.3)
  • pypy711 (PyPy 7.1.1)

For the PyPy runs I did 10 repetitions of the benchmark to give it some time to jit away; I'm not sure if this is enough to see the full potential or not. PyPy seems to help the other libraries more, relatively speaking, though it does offer some improvement over fully non-cythonized pydantic. But the cythonized Pydantic is still near the front.

alexdrydew pushed a commit to alexdrydew/pydantic that referenced this issue Dec 23, 2023
* implement computed fields 🎉

* dataclass properties

* catch errors in properties

* fix include and exclude

* add example from pydantic#2625, fix by_alias

* fix on older python
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request help wanted Pull Request welcome
Projects
None yet
Development

No branches or pull requests

3 participants