Remove bytearray to bytes copies in stdlib using take_bytes #141968

@cmaloney

Feature or enhancement

Proposal:

This is a meta issue for one small, well-contained pattern: a function creates ba = bytearray() and later calls bytes(ba). More complex refactors (e.g. binascii.b2a_base64 usage) should get their own separate issues.

Using .take_bytes() (gh-139871) can remove an allocation and a copy in cases where a function builds up data in a temporary bytearray and then returns bytes(ba) at the end. This can significantly speed up some code. For instance, gh-141863 for asyncio.streams improved the asyncio_tcp pyperformance benchmark by over 10%.

  • base64 _b32encode, _b32decode
  • wave _byteswap
  • encodings.punycode
  • encodings.idna
  • urllib.parse unquote_to_bytes -- bytearray.extend dominates the time in the profile, so a more complex refactor is needed for a measurable improvement.
  • re._compiler charmap

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Labels: performance (Performance or resource usage), stdlib (Standard Library Python modules in the Lib/ directory), type-refactor (Code refactoring (with no changes in behavior))