High CPU usage after upgrade #445

Closed
sahu-sunil opened this issue Nov 16, 2023 · 9 comments · Fixed by #464
Labels
more-info-needed More information required.

Comments

@sahu-sunil
Contributor

  • cattrs version: 23.1.2
  • Python version: 3.9
  • Operating System: Linux

We are using cattrs for config validation in the tests of our asynchronous framework. Thousands of tests run every minute. We upgraded from 1.5.0 to 23.1.2, which makes major changes to how the cattrs back end works. CPU usage hits 100% within a couple of minutes. After profiling we found that the cache (linecache) was the reason for the high CPU usage. We disabled the caching (along with detailed validation) via _cattrs_use_linecache=False, which helped somewhat; it now takes a couple of hours (3-4h) for CPU usage to reach 100%.
We see that generate_unique_filename is still taking a lot of time.
[Profiler screenshot: generate_unique_filename dominating the CPU profile]

Is there any other way to handle this in the latest cattrs version?

@Tinche
Member

Tinche commented Nov 16, 2023

Interesting. That's quite a large upgrade, though; a ton of things have changed. Are you using cattrs.Converter, cattrs.BaseConverter, or functions imported directly from cattrs (like cattrs.structure)?

You can likely resolve your issue by switching to cattrs.BaseConverter; that converter class does minimal code generation. It's somewhat slower at structuring and unstructuring, though. I'd be curious for more information!
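
A minimal sketch of that suggestion (the config class here is hypothetical, just for illustration):

import attrs
from cattrs import BaseConverter

@attrs.define
class ServerConfig:  # hypothetical example class
    host: str
    port: int

# BaseConverter structures attrs classes via reflection at call time instead
# of compiling per-class functions, so it never touches linecache.
converter = BaseConverter()
config = converter.structure({"host": "localhost", "port": 8080}, ServerConfig)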

@Tinche added the more-info-needed (More information required.) label on Nov 16, 2023
@sahu-sunil
Contributor Author

I am using cattrs.Converter and then registering hooks like this:

converter = Converter(detailed_validation=False)
converter.register_structure_hook(
    ClassName,
    make_dict_structure_fn(
        ClassName,
        converter,
        _cattrs_use_linecache=False,
        _cattrs_detailed_validation=False,
    ),
)
converter.structure(....)

I will try BaseConverter to reduce the initialisation work, but AFAIK it will still go through the functions shown in the profiler snapshot above.

@Tinche
Member

Tinche commented Nov 19, 2023

BaseConverter won't help if you're using make_dict_structure_fn directly, since that's the source of the issue.

Well, the root issue is that you're probably recreating the same class a lot in your tests.

Let me give you some context. make_dict_structure_fn will compile a specialized function for structuring. It'll also store the source code of that function using Python's linecache module; this is necessary for good stack traces and for stepping through the function in a debugger.

But if, for example, a test recreates the same class (or a converter) over and over again, you effectively create a leak: all that source code keeps getting stored, and generate_unique_filename is inefficient when dealing with a large number of classes with the same name (which is what you're seeing in the profile dump).

To me this indicates an inefficiency in your tests; any chance you could avoid generating so many converters, classes, and hooks? Even if I worked around this in cattrs, it's still kind of bad practice.

That said, use_linecache=False should fix the issue, so maybe it's a different class than you think (or a nested class). You can debug by setting a breakpoint and inspecting linecache.cache manually; it will probably contain the same class name many times.

import linecache

print(linecache.cache)
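
For illustration, a minimal sketch of the kind of leak described above (the class and loop are hypothetical; behaviour as of cattrs 23.1.x, before the fix from this thread):

import linecache

from attrs import define
from cattrs import Converter
from cattrs.gen import make_dict_structure_fn

@define
class MyConfig:  # hypothetical class, standing in for a test-generated one
    name: str

converter = Converter()
before = len(linecache.cache)

# Regenerating a hook for the same class over and over: each call reserves a
# fresh synthetic filename in linecache, so the cache keeps growing and
# generate_unique_filename has ever more same-named entries to probe past.
for _ in range(100):
    make_dict_structure_fn(MyConfig, converter)

print(len(linecache.cache) - before)  # roughly 100 new entries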

@sahu-sunil
Contributor Author

Yes, I tried to avoid recreating classes/converters in some places and it helped a bit, but there are still places where I can't get rid of them.

I was going through the make_dict_structure_fn function and wanted to clarify:
Should this be inside the if condition?

# fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
script = "\n".join(total_lines)
eval(compile(script, "", "exec"), globs)
if _cattrs_use_linecache:
    fname = generate_unique_filename(cl, "structure", reserve=_cattrs_use_linecache)
    linecache.cache[fname] = len(script), None, total_lines, fname

Do we need this when linecache is disabled?

@Tinche
Member

Tinche commented Nov 22, 2023

Hm, you might be right. We can put that into the if block. If you want to put together a quick PR and the tests pass, I'll merge it in. (Make the PR against the v23.2 branch so we can release it quickly.)

It might not help, though. If you have classes that you can't control, they'll enter that if block regardless.

You're just recompiling the exact same handlers over and over again, right? So the source code will be identical every time. I'm thinking there could be an optimization opportunity here: check whether the script matches an existing entry and reuse the same linecache entry for all identical handlers. Then its size would be constant instead of O(n).
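
A rough sketch of that idea (not the actual change in the linked PR; the helper and registry names are made up): key cached entries by the generated source so identical scripts share a single linecache entry.

import linecache
from typing import Callable, Dict, List

# Hypothetical module-level registry mapping generated source -> filename.
_script_to_fname: Dict[str, str] = {}

def cache_generated_script(script: str, lines: List[str], new_fname: Callable[[], str]) -> str:
    # Reuse the existing linecache entry when the exact same source was
    # already cached; only brand-new scripts reserve a unique filename.
    fname = _script_to_fname.get(script)
    if fname is None:
        fname = new_fname()  # e.g. a call to generate_unique_filename(...)
        _script_to_fname[script] = fname
        linecache.cache[fname] = (len(script), None, lines, fname)
    return fname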

@sahu-sunil
Contributor Author

Okay. I'm in the middle of something and won't be able to raise it today. Please see if you can; otherwise I will take a look later. Thanks

@Tinche
Member

Tinche commented Nov 27, 2023

Try this branch: https://github.com/python-attrs/cattrs/tree/tin/linecache-opt

@Tinche linked a pull request on Nov 27, 2023 that will close this issue
@sahu-sunil
Contributor Author

sahu-sunil commented Nov 30, 2023

Yes, it is working now. Thank you for the quick responses and optimisations.

@Tinche
Member

Tinche commented Nov 30, 2023

Cool, will release this soon.

@Tinche closed this as completed on Nov 30, 2023