Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-1635741: port _elementtree to multi-phase init (PEP 489) #23535

Closed
wants to merge 2 commits into from

Conversation

koubaa
Copy link
Contributor

@koubaa koubaa commented Nov 28, 2020

@koubaa
Copy link
Contributor Author

koubaa commented Nov 28, 2020

@vstinner @shihai1991 @corona10 please review

Copy link
Member

@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks like it was a lot of work. Thanks for investing your time to port _elementtree.

I'm a bit worried that the module state getter is going to affect performance negatively. The code has to use the slow _PyType_GetModuleByDef() API. In my patch for the _ssl module I added a module state pointer to instance structs. For example you could add state to typedef struct ElementObject and then reference the state with self->state. This would reduce the amount of get state calls and simplify the code.

What do you think? Is this a safe apprach?

#define ET_STATE_GLOBAL \
((elementtreestate *) PyModule_GetState(PyState_FindModule(&elementtreemodule)))
static inline elementtreestate *
get_elementtree_module_state_by_object(void *object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The qualifier object is not self-explanatory. Every Python object is an object. This includes module objects and type objects. How about:

Suggested change
get_elementtree_module_state_by_object(void *object)
get_elementtree_module_state_by_instance(PyObject *self)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about expend get_element_module_state() to receive self object or type object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran I tried using PyObject * and that gives a lot of compiler warnings (see somem below):

C:\Dev\OSS\cpython\cpython\Modules_elementtree.c(3749,74): warning C4133: 'function': incompatible types - from 'XMLParserObject *' to 'PyObject *' [C:\Dev\OSS\cpython\cpython\PCbuild_elementtree.vcxproj]
C:\Dev\OSS\cpython\cpython\Modules\clinic_elementtree.c.h(26,52): warning C4133: 'function': incompatible types - from 'ElementObject *' to 'PyObject *' [C:\Dev\OSS\cpython\cpython\PCbuild_elementtree.vcxproj]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shihai1991 do you mean two methods, one for each?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. just my personal suggestion :)

Modules/_elementtree.c Show resolved Hide resolved
Modules/_elementtree.c Show resolved Hide resolved
Comment on lines +384 to +386
class _elementtree.Element "ElementObject *" "get_elementtree_module_state_by_object(self)->element_type"
class _elementtree.TreeBuilder "TreeBuilderObject *" "get_elementtree_module_state_by_object(self)->tree_builder_type"
class _elementtree.XMLParser "XMLParserObject *" "get_elementtree_module_state_by_object(self)->xml_parser_type"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct? AFAIK the argument clinic code has only access to type, not to self.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tiran This is where I got from trial and error, see the implementation of _elementtree_Element_append in Modules/clinic/_elementtree.c.h

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@koubaa
Copy link
Contributor Author

koubaa commented Nov 28, 2020

The patch looks like it was a lot of work. Thanks for investing your time to port _elementtree.

I'm a bit worried that the module state getter is going to affect performance negatively. The code has to use the slow _PyType_GetModuleByDef() API. In my patch for the _ssl module I added a module state pointer to instance structs. For example you could add state to typedef struct ElementObject and then reference the state with self->state. This would reduce the amount of get state calls and simplify the code.

What do you think? Is this a safe apprach?

@tiran I'm not sure if its safe but I understand it in principle. I don't know exactly the lifetime of instance structs when pickling. @vstinner thoughts on that approach?

Is there a benchmark I could try to measure the performance impact? _PyType_GetModuleByDef seems to only be slow if there is significant subclassing so I could try to whip up an example that tries to overdo that.

@erlend-aasland
Copy link
Contributor

This PR introduces a ref. leak (looks similar to the ref. leak I had in PR #23428):

bash$ ./python.exe -m test -F -r -j1 -R 3:5 test_xml_etree_c
Using random seed 1742818
0:00:00 load avg: 1.54 Run tests in parallel using 1 child processes
0:00:14 load avg: 1.50 [  1/1] test_xml_etree_c failed
beginning 8 repetitions
12345678
........
test_xml_etree_c leaked [147, 147, 147, 147, 147] references, sum=735
test_xml_etree_c leaked [59, 59, 59, 59, 59] memory blocks, sum=295

== Tests result: FAILURE ==

1 test failed:
    test_xml_etree_c

Total duration: 14.8 sec
Tests result: FAILURE

@koubaa
Copy link
Contributor Author

koubaa commented Dec 11, 2020

@erlend-aasland oof sorry, I didn't realize you were working on this module too. Any tips to find the cause of the ref leak or which reference is leaking? I'm not too familiar with the diagnostics & telemetry available.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Dec 11, 2020

oof sorry, I didn't realize you were working on this module too.

No problem, you're fine! Feel free to take a look at my PR. I haven't been able to devote any free time to this the last month or so, anyway.

Any tips to find the cause of the ref leak or which reference is leaking? I'm not too familiar with the diagnostics & telemetry available.

Me neither. It kinda stopped when I encountered the ref. leak. I use python.exe -m test -F -r -j1 -R 3:5 test_<mod> to check for reference/memory leaks. I've tried playing with some of the -X options as well, but I haven't found a good way to debug ref leaks yet. I'll keep an eye on this PR to see what you find out :)

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Jan 11, 2021
@iritkatriel
Copy link
Member

https://bugs.python.org/issue1635741 is closed. What is that status of this PR?

@vstinner
Copy link
Member

The change is still relevant, but should use a new issue number.

Moreover, the SC asked to put the conversion of static types to heap types on hold. @encukou and @erlend-aasland wrote https://peps.python.org/pep-0687/ which may unblock the situation but it's still a draft.

@erlend-aasland
Copy link
Contributor

FYI, PEP-687 was just accepted.

@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label Aug 1, 2022
@erlend-aasland
Copy link
Contributor

Superseded by gh-101285

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants