TextFileWrapper.read reads more than requested #91

Open
MKuranowski opened this issue Feb 26, 2024 · 0 comments

Long story short

TextFileWrapper.read can return more characters than were requested. From Python this looks harmless, but it is a problem for C extensions, where the result of e.g. await afp.read(8192) is copied into a static Py_UCS4 buffer[8192]. This caused MKuranowski/aiocsv#24.

Expected behavior

TextFileWrapper.read should never return more characters than were requested. This is an explicit requirement for synchronous reads; the io.TextIOBase.read documentation says: "Read and return at most size characters from the stream [...]"
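
For comparison, here is a minimal sketch of the synchronous baseline, using only the standard library. The helper name `sync_chunks` and the in-memory sample are made up for illustration; the point is that `io.TextIOWrapper.read(size)` counts characters, not bytes.

```python
import io

READ_CHUNK = 12

# A short excerpt standing in for the sample file (illustrative only).
text = "『世界人権宣言』\n(1948.12.10 第3回国連総会採択)\n"


def sync_chunks(data: bytes, size: int) -> list[str]:
    """Read `data` as UTF-8 text in chunks of at most `size` characters."""
    # newline="" disables newline translation so the text round-trips exactly.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    chunks = []
    while chunk := stream.read(size):
        chunks.append(chunk)
    return chunks


chunks = sync_chunks(text.encode("utf-8"), READ_CHUNK)
```

Every element of `chunks` has at most `READ_CHUNK` characters, which is the behavior an asynchronous text reader would be expected to match.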

Actual behavior

TextFileWrapper.read can return more characters than were requested.

Steps to reproduce

Example text file, unhdr_jpn.txt, encoded as UTF-8. Any variable-width encoding triggers this issue, provided that a read chunk boundary falls in the middle of a character.
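
To show what "in the middle of a character" means at the byte level, here is a small standard-library sketch. It does not reproduce aiofile's internals; the sample string and the helper `strict_decode_fails` are assumptions made for illustration.

```python
import codecs

text = "ab世界"
raw = text.encode("utf-8")       # 2 ASCII bytes + 3 bytes + 3 bytes = 8 bytes
head, tail = raw[:4], raw[4:]    # the cut lands inside the 3-byte "世"


def strict_decode_fails(data: bytes) -> bool:
    """Return True if `data` is not valid UTF-8 on its own."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# An incremental decoder holds back the incomplete trailing bytes
# and emits the character once the remaining bytes arrive.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(head)
second = decoder.decode(tail, final=True)
```

Here `head` cannot be decoded on its own, so a reader working in byte-sized chunks has to either hold the partial character back or fetch more bytes. The number of characters produced therefore depends on how that boundary is handled.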

『世界人権宣言』
(1948.12.10 第3回国連総会採択)
〈前文〉
人類社会のすべての構成員の固有の尊厳と平等で譲ることのできない権利とを承
認することは、世界における自由、正義及び平和の基礎であるので、
人権の無視及び軽侮が、人類の良心を踏みにじった野蛮行為をもたらし、言論及
び信仰の自由が受けられ、恐怖及び欠乏のない世界の到来が、一般の人々の最高
の願望として宣言されたので、

Code:

import aiofile
import asyncio

READ_CHUNK = 12

async def main():
    async with aiofile.async_open("unhdr_jpn.txt", encoding="utf-8") as f:
        while data := await f.read(READ_CHUNK):
            print(f"{data=!r} {len(data)=}")
            assert len(data) <= READ_CHUNK

if __name__ == "__main__":
    asyncio.run(main())
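
Until the reader itself enforces the limit, a caller that cannot tolerate over-long results could guard against them with a thin wrapper. The following is only a sketch under assumptions: `BoundedTextReader`, `OverReadingStub` and `collect` are hypothetical names, not part of aiofile, and the stub merely simulates a reader that returns a few extra characters.

```python
import asyncio


class BoundedTextReader:
    """Hypothetical wrapper: never returns more than `size` characters per read."""

    def __init__(self, inner):
        self._inner = inner   # any object exposing `async def read(size) -> str`
        self._pending = ""    # surplus characters left over from an over-long read

    async def read(self, size: int) -> str:
        if size <= 0:
            raise ValueError("size must be positive in this sketch")
        if not self._pending:
            self._pending = await self._inner.read(size)
        # Hand out at most `size` characters and keep the rest for the next call.
        chunk, self._pending = self._pending[:size], self._pending[size:]
        return chunk


class OverReadingStub:
    """Stand-in reader that returns up to `extra` characters more than requested."""

    def __init__(self, text: str, extra: int = 3):
        self._text = text
        self._extra = extra
        self._pos = 0

    async def read(self, size: int) -> str:
        end = self._pos + size + self._extra
        chunk = self._text[self._pos:end]
        self._pos = min(end, len(self._text))
        return chunk


async def collect(reader, size: int) -> list[str]:
    """Drain `reader` in chunks of the requested size."""
    chunks = []
    while chunk := await reader.read(size):
        chunks.append(chunk)
    return chunks
```

The wrapper may return fewer characters than requested when it is draining leftovers, which is still consistent with "at most size characters". In the reproduction above, it would presumably be used as `BoundedTextReader(f)` around the object returned by `aiofile.async_open`.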

Environment info

Kernel version: Linux sedna 6.7.2-arch1-1-surface #1 SMP PREEMPT_DYNAMIC Mon, 29 Jan 2024 23:19:41 +0000 x86_64 GNU/Linux
File system: btrfs

I have reproduced this problem with the following implementations:

  • ✅ `export CAIO_IMPL=linux` - Native linux implementation
  • ✅ `export CAIO_IMPL=thread` - Thread implementation
  • ✅ `export CAIO_IMPL=python` - Pure Python implementation

Additional info

None :)
