read_to_string doesn't handle BOM #45066

@steveklabnik

Apparently, UTF-8 allows, but does not require, a BOM.

When calling read_to_string on such a file, it panics with StringError("stream did not contain valid UTF-8")

Python, however:

>>> open('before.csv', encoding='utf-8').read()
'\ufeffFirst Name,Last Name,Age,City,Eyes color,Species\nJohn,Doe,32,Tokyo,Blue,Human\nFlip,Helm,12,Canberra,Red,Unknown\nTerdos,Bendarian,165,Cracow,Blue,Magic tree\nDominik,Elpos,33,Paris,Purple,Orc\nBrad,Doe,42,Dublin,Blue,Human\nEwan,Grath,51,New Delhi,Green,Human\n'
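
For comparison, here is a minimal Rust sketch of how a caller might strip a leading UTF-8 BOM by hand. Both helper names (`decode_utf8_strip_bom`, `read_to_string_strip_bom`) are hypothetical and not part of std. The sketch assumes the file is otherwise valid UTF-8 and only uses `std::fs::read`, `std::str::from_utf8` and `str::strip_prefix`.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::str::Utf8Error;

/// Hypothetical helper (not part of std): decode `bytes` as UTF-8 and
/// drop a single leading U+FEFF (the BOM) if one is present.
fn decode_utf8_strip_bom(bytes: &[u8]) -> Result<&str, Utf8Error> {
    let text = std::str::from_utf8(bytes)?;
    // The UTF-8 BOM is the byte sequence EF BB BF, which decodes to U+FEFF.
    Ok(text.strip_prefix('\u{feff}').unwrap_or(text))
}

/// Hypothetical helper (not part of std): read a whole file into a String,
/// reporting invalid UTF-8 as `io::ErrorKind::InvalidData`.
fn read_to_string_strip_bom<P: AsRef<Path>>(path: P) -> io::Result<String> {
    let bytes = fs::read(path)?;
    decode_utf8_strip_bom(&bytes)
        .map(str::to_owned)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

With this, `read_to_string_strip_bom("before.csv")` would return the text without the leading `\ufeff` that the Python session above shows. One caveat: the bytes EF BB BF are themselves valid UTF-8, so the decode step in this sketch succeeds with or without them. It returns an error only when some other part of the input is not valid UTF-8, for example a file that actually starts with a UTF-16 BOM.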

Metadata

    Labels

    C-bug (Category: This is a bug.)
    T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
