-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
Potential memory leak in normalizestring() #77412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Patch is good, but while we're at it, is there any reason why this multi-allocation design was even used? It PyMem_Mallocs a buffer, makes a C-style string in it, then uses PyUnicode_FromString to convert C-style string to Python str. Seems like the correct approach would be to just use PyUnicode_New to preallocate the final string buffer up front, then pull out the internal buffer with PyUnicode_1BYTE_DATA and populate that directly, saving a pointless allocation/deallocation, which also means the failure case means no cleanup needed at all, while barely changing the code (aside from removing the need to explicitly NUL terminate). Only reason I can see to avoid this would be if the codec names could contain arbitrary Unicode encoded as UTF-8 (and therefore strlen wouldn't tell you the final length in Unicode ordinals), but I'm pretty sure that's not the case (if it is, we're not normalizing properly, since we only lower case ASCII). If Unicode codec names need to be handled, there are other options, though the easy savings go away. |
Maybe, we can add "encoding name must be ascii" restriction in future version (3.8+). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: