Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Negative _io.BufferedReader.tell #95782

Closed
senyai opened this issue Aug 8, 2022 · 2 comments
Closed

Negative _io.BufferedReader.tell #95782

senyai opened this issue Aug 8, 2022 · 2 comments
Labels
extension-modules C modules in the Modules dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@senyai
Copy link

senyai commented Aug 8, 2022

Bug report

I believe tell should never be negative, even for /dev/urandom. And @benjaminp wrote ToDo that it shouldn't be negative.

>>> from pathlib import Path
>>> urandom = Path('/dev/urandom').open('rb')
>>> urandom.tell()
0
>>> urandom.read(1)
b'$'
>>> urandom.tell()
-4095

Environment

Fedora linux, python 3.10.6

Linked PRs

@senyai senyai added the type-bug An unexpected behavior, bug, or error label Aug 8, 2022
@corona10
Copy link
Member


Python 3.12.0a0 (heads/main:71c3d649b5, Aug 10 2022, 15:07:52) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pathlib import Path
>>> urandom = Path('/dev/urandom').open('rb')
>>> urandom.tell()
0
>>> urandom.read(1)
b'\x80'
>>> urandom.tell()
-4095
>>>

Reproducible

6t8k added a commit to 6t8k/cpython that referenced this issue Nov 22, 2022
lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
@6t8k
Copy link
Contributor

6t8k commented Nov 22, 2022

This is caused by lseek() always returning 0 for character pseudo-devices like /dev/urandom (for other non-regular files, e.g. /dev/stdin, it always returns -1, to which CPython reacts by raising appropriate exceptions). They are thus technically seekable despite not having seek semantics.

When calling _io.BufferedReader.read(1) it reads ahead, filling its buffer (the size of which is usually being determined by the underlying file's block size as returned by fstat, which it inherits from its file system and is often 4096 bytes), creating a discrepancy between the number of bytes read and the internal tell() always returning 0 for an instance that wraps such a file. I pushed a PR that fixes this, and would appreciate any feedback.


I also bumped into further not necessarily expected behavior, of which I'm not sure if it can be a problem, and which I didn't change:

  • _io.BufferedWriter.tell() and _pyio.BufferedWriter.tell() returning the number of bytes written, seek(0, (_py)io.SEEK_CUR) resetting, which is not supposed to change the offset (both noteworthy in their own right)
  • io.Buffered{Reader,Random}.seek(0, io.SEEK_CUR) returning the number of bytes read, tell() resets it, which is not supposed to change the offset (both noteworthy in their own right)

Demo:

import io
import _pyio as pyio

# io.BufferedWriter and _pyio.BufferedWriter tell() returning number of bytes written, seek(0, io.SEEK_CUR) resets it, which is not supposed to change the offset.
f = open("/dev/urandom", "wb")
print(f.tell()) # 0
print(f.write(b"1")) # 1
print(f.write(b"12")) # 2
print(f.tell()) # 3
print(f.seek(0, io.SEEK_CUR)) # 0
print(f.tell()) # 0
f.close()

print("---")

f = pyio.open("/dev/urandom", "wb")
print(f.tell()) # 0
print(f.write(b"1")) # 1
print(f.write(b"12")) # 2
print(f.tell()) # 3
print(f.seek(0, io.SEEK_CUR)) # 0
print(f.tell()) # 0
f.close()

print("---")

# io.BufferedReader/Random seek(0, io.SEEK_CUR) returning the number of bytes read, tell() resets it. Only the C implementation is affected.
f = open("/dev/urandom", "rb")
print(f.tell()) # 0
print(f.read(1)) # b'\xf1'
print(f.read(2)) # b'X\x1a'
print(f.seek(0, io.SEEK_CUR)) # 3
print(f.tell()) # 0 with the fix, -4093 without
print(f.seek(0, io.SEEK_CUR)) # 0 with the fix, -4093 without
print(f.tell()) # 0 with the fix, -4093 without
f.close()

print("---")

f = pyio.open("/dev/urandom", "rb")
print(f.tell()) # 0
print(f.read(1)) # b'\xaa'
print(f.read(2)) # b'[\x96'
print(f.seek(0, io.SEEK_CUR)) # 0
print(f.tell()) # 0
print(f.seek(0, io.SEEK_CUR)) # 0
print(f.tell()) # 0
f.close()

6t8k added a commit to 6t8k/cpython that referenced this issue Nov 22, 2022
lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
@iritkatriel iritkatriel added the extension-modules C modules in the Modules dir label Nov 24, 2023
serhiy-storchaka pushed a commit that referenced this issue Feb 17, 2024
…ets < 0 (GH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
serhiy-storchaka pushed a commit that referenced this issue Feb 17, 2024
…rn offsets < 0 (GH-99709) (GH-115600)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
serhiy-storchaka pushed a commit that referenced this issue Feb 17, 2024
…rn offsets < 0 (GH-99709) (GH-115599)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
(cherry picked from commit 26800cf)

Co-authored-by: 6t8k <58048945+6t8k@users.noreply.github.com>
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…n offsets < 0 (pythonGH-99709)

lseek() always returns 0 for character pseudo-devices like
`/dev/urandom` (for other non-regular files, e.g. `/dev/stdin`, it
always returns -1, to which CPython reacts by raising appropriate
exceptions). They are thus technically seekable despite not having seek
semantics.

When calling read() on e.g. an instance of `io.BufferedReader` that
wraps such a file, `BufferedReader` reads ahead, filling its buffer,
creating a discrepancy between the number of bytes read and the internal
`tell()` always returning 0, which previously resulted in e.g.
`BufferedReader.tell()` or `BufferedReader.seek()` being able to return
positions < 0 even though these are supposed to be always >= 0.

Invariably keep the return value non-negative by returning
max(former_return_value, 0) instead, and add some corresponding tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-IO type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants