Skip to content

Conversation

@cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Oct 26, 2025

While readinto implementations should return a count of bytes between 0 and the length of the given buffer, they are not required to. Add validation inside RawIOBase.read that the returned byte count is reasonable.

While `RawIOBase.readinto` should return a count of bytes between 0 and
the length of the given buffer, it is not required to. Add validation
inside RawIOBase.read that the returned byte count is reasonable.
@cmaloney
Copy link
Contributor Author

It looks like the IOBase C implementation has never checked range here. I think it's reasonable to back port to bugfix releases as a ValueError could be raised in other cases (not adding a new exception; just more cases) and a bad value allows access to unallocated memory. Cherry picks will likely need to be manual to cross the I/O test refactoring.

The _pyio change could also potentially use a n = n.__index__() to make sure the .readinto return is an integer at that point; not sure it's worth adding though.

cc: @vstinner

cmaloney and others added 2 commits October 26, 2025 12:00
Co-authored-by: Shamil <ashm.tech@proton.me>
@cmaloney cmaloney requested a review from ashm-dev October 26, 2025 19:03
Copy link
Member

@efimov-mikhail efimov-mikhail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 944 to 954
if (bytes_filled == -1 && PyErr_Occurred()) {
Py_DECREF(b);
return NULL;
}
if (bytes_filled < 0 || bytes_filled > n) {
Py_DECREF(b);
PyErr_Format(PyExc_ValueError,
"readinto returned '%zd' outside buffer size '%zd'",
bytes_filled, n);
return NULL;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (bytes_filled == -1 && PyErr_Occurred())
    goto error;

if (bytes_filled < 0 || bytes_filled > n) {
    PyErr_Format(PyExc_ValueError,
                 "readinto returned '%zd' outside buffer size '%zd'",
                 bytes_filled, n);
    goto error;
}

error:
    Py_DECREF(b);
    return NULL;

maybe do it this way to remove duplication and improve readability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a personal preference not to use gotos although agree this function is one where Py_DECREF(b); return res; would deduplicate and reduce lines of code.

To do that the Py_DECREF(res); would need to turn into Py_CLEAR(res) (so res is set to NULL) + a couple cases earlier could use it as well... Not planning to do that but can if there's a strong press to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goto

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just kidding. In general, I also like using goto in C to factorize code to cleanup resources before exit. Here it's just a matter of taste. I'm fine with the current code 😀

cmaloney and others added 2 commits October 27, 2025 10:05
Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner vstinner removed needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Oct 27, 2025
@vstinner
Copy link
Member

I think it's reasonable to back port to bugfix releases as a ValueError could be raised in other cases (not adding a new exception; just more cases) and a bad value allows access to unallocated memory.

I prefer to not backport the change. Existing code may just work by luck and this change making Python more strict can break it.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

…OZGxS.rst

Co-authored-by: Victor Stinner <vstinner@python.org>
Copy link
Contributor

@ashm-dev ashm-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner enabled auto-merge (squash) October 27, 2025 17:43
@vstinner vstinner merged commit 0f0a362 into python:main Oct 27, 2025
45 checks passed
@cmaloney cmaloney deleted the gh-140607 branch October 27, 2025 18:22
@vstinner vstinner added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Oct 28, 2025
@miss-islington-app
Copy link

Thanks @cmaloney for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks @cmaloney for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @cmaloney and @vstinner, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 0f0a362768aecb4c791724cce486d8317533a94d 3.13

@miss-islington-app
Copy link

Sorry, @cmaloney and @vstinner, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 0f0a362768aecb4c791724cce486d8317533a94d 3.14

@vstinner
Copy link
Member

I prefer to not backport the change. Existing code may just work by luck and this change making Python more strict can break it.

I read again the issue and I missed the AddressSanitizer report in the issue. It changed my mind and I know agree that it's better to backport this change.

@cmaloney: Can you try to backport the change to 3.14? The automated backport failed.

@cmaloney
Copy link
Contributor Author

👍 will work on backports today

@bedevere-app
Copy link

bedevere-app bot commented Oct 28, 2025

GH-140728 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 28, 2025
cmaloney added a commit to cmaloney/cpython that referenced this pull request Oct 28, 2025
pythonGH-140611)

While `RawIOBase.readinto` should return a count of bytes between 0 and
the length of the given buffer, it is not required to. Add validation
inside RawIOBase.read() that the returned byte count is valid.
(cherry picked from commit 0f0a362)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>
cmaloney added a commit to cmaloney/cpython that referenced this pull request Oct 28, 2025
pythonGH-140611)

While `RawIOBase.readinto` should return a count of bytes between 0 and
the length of the given buffer, it is not required to. Add validation
inside RawIOBase.read() that the returned byte count is valid.
(cherry picked from commit 0f0a362)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>
@bedevere-app
Copy link

bedevere-app bot commented Oct 28, 2025

GH-140730 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 28, 2025
vstinner added a commit that referenced this pull request Oct 29, 2025
…140611) (#140728)

* [3.14] gh-140607: Validate returned byte count in RawIOBase.read (GH-140611)

While `RawIOBase.readinto` should return a count of bytes between 0 and
the length of the given buffer, it is not required to. Add validation
inside RawIOBase.read() that the returned byte count is valid.
(cherry picked from commit 0f0a362)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>

* fixup: Use older attribute name

---------

Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this pull request Oct 29, 2025
…140611) (#140730)

* [3.13] gh-140607: Validate returned byte count in RawIOBase.read (GH-140611)

While `RawIOBase.readinto` should return a count of bytes between 0 and
the length of the given buffer, it is not required to. Add validation
inside RawIOBase.read() that the returned byte count is valid.
(cherry picked from commit 0f0a362)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>

* fixup: Use older attribute name

---------

Co-authored-by: Shamil <ashm.tech@proton.me>
Co-authored-by: Victor Stinner <vstinner@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants