Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_PyStaticUnicode_Dealloc should not exist. #96458

Closed
markshannon opened this issue Aug 31, 2022 · 12 comments
Closed

_PyStaticUnicode_Dealloc should not exist. #96458

markshannon opened this issue Aug 31, 2022 · 12 comments
Assignees
Labels
3.12 bugs and security fixes type-bug An unexpected behavior, bug, or error

Comments

@markshannon
Copy link
Member

Static data doesn't need to be freed. In fact, in cannot be freed.

It appears that the utf field is dynamically allocated on static strings. Since this field is only needed for non-ascii strings, it should be static as well.
Then _PyStaticUnicode_Dealloc can be deleted.

@markshannon markshannon added the type-bug An unexpected behavior, bug, or error label Aug 31, 2022
@markshannon
Copy link
Member Author

@gvanrossum
Copy link
Member

Seems this was introduced in gh-32032. Maybe @kumaraditya303 understands why it is needed or how it could be avoided? It's only use seems to be deepfreeze.c. See also #91240, originally reported by @jkloth.

@ericsnowcurrently
Copy link
Member

ericsnowcurrently commented Aug 31, 2022

"Dealloc" in the name is certainly misleading. "Fini" would be more appropriate (and follow convention).

As to the function being unnecessary, it looks like you are right.

PyUnicode_AsUTF8AndSize() (and derivatives like PyUnicode_AsUTF8()) populates that "utf8" field, but only for non-ascii strings.
If we were statically initializing any non-ascii strings, we'd initialize the "utf8" and "utf8_length" fields, so we still wouldn't need _PyStaticUnicode_Dealloc().

That said, would it be worth keeping a safeguard somewhere, to ensure we don't later add non-ascii strings where "utf8" isn't statically initialized?

@gvanrossum
Copy link
Member

So is your proposal to just add the compact->utf8 initialization to deepfreeze.c, for non-ascii strings?

Re: safeguard, I'm not sure why we'd need that, given that this is all generated code. Nobody in their right mind statically initializes unicode objects (the deepfreeze.py script has no mind :-).

@ericsnowcurrently
Copy link
Member

So is your proposal to just add the compact->utf8 initialization to deepfreeze.c, for non-ascii strings?

I'm agreeing with Mark that we can eliminate _PyStaticUnicode_Dealloc().

Re: safeguard, I'm not sure why we'd need that

Yeah, keeping a safeguard doesn't seem that important. I'll strike that part of my earlier comment to reduce the noise.

@gvanrossum
Copy link
Member

I'm agreeing with Mark that we can eliminate _PyStaticUnicode_Dealloc().

How? Just delete the function and its calls? There was a reason it was put in (see #91240), related to leak detection tools.

@ericsnowcurrently
Copy link
Member

tl;dr _PyStaticUnicode_Dealloc() didn't actually need to do anything with the utf8 field, and the PEP 623 PR dropped the other thing we were cleaning up. So _PyStaticUnicode_Dealloc() doesn't actually do anything any more.


There are a couple changes to consider here:

  1. when _PyStaticUnicode_Dealloc() was added (gh-32032), it cleared PyASCIIObject.wstr and, for non-ascii strings, PyCompactUnicodeObject.utf8
  2. a few months later, PyASCIIObject.wstr was removed by PEP 623, in gh-92537 (leaving _PyStaticUnicode_Dealloc() to only (maybe) clear PyCompactUnicodeObject.utf8)

When I tried building main on Windows with the _PyStaticUnicode_Dealloc() line commented out in deepfreeze.py, it showed 0 leaks. When I did the same with the original change (1), it showed the reported leak. Doing the same against the PEP 623 change (2) did not show that leak.

This implies a number of possibilities:

  1. _PyStaticUnicode_Dealloc() isn't needed any more
    a. it only needed to clear wstr (which no longer exists)
    b. something else changed
  2. there was a non-ascii string getting frozen that was removed between (1) and (2)
  3. something else happened
  4. I did something wrong

I'm confident it is (1a). We are currently freezing 6 non-ascii strings. (Look for .ascii = 0 in deepfreeze.c.) So I would have seen a leak when I commented out _PyStaticUnicode_Dealloc().

Thus we should be okay to drop _PyStaticUnicode_Dealloc().

@jkloth
Copy link
Contributor

jkloth commented Sep 1, 2022

As long as test_embed still passes on Windows and Linux with _PyStaticUnicode_Dealloc() removed, all is good.

@gvanrossum
Copy link
Member

1a sounds good. Let’s delete this function. Who will make the PR?

@kumaraditya303
Copy link
Contributor

1a will leak memory. I have tried it on Linux.

#96481 fixes this by statically initalizing utf8 representation.

@ericsnowcurrently
Copy link
Member

1a will leak memory. I have tried it on Linux.

How are you checking? I did the following (after commenting out the _PyStaticUnicode_Dealloc() line in deepfreeze.py):

$ make -j8
...
$ ./python -X showrefcount -I -c pass
[0 refs, 0 blocks]

#96481 fixes this by statically initalizing utf8 representation.

+1

@gvanrossum
Copy link
Member

Since utf8 this is a cached value that's only set when needed, you'll have to execute some code that actually needs the utf8 encoding of one of the 6 non-ascii strings in deepfreeze.c. Or run the full leak discovery test suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants