str() will convert invalid utf8 from bytes object

I've reproduced this behavior on the Windows, pyboard (via the JavaScript emulator), and CircuitPython 'atmel-samd' ports.  The output of the 'windows' port is shown below.
If a bytes object contains values that are invalid UTF-8 (more specifically, invalid continuation bytes), CPython raises an appropriate exception:
```
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> b = b"\xf0\xe0\xed\xe8"
>>> s = str(b, "utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 0: invalid continuation byte
```
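To see why these four bytes are rejected, it helps to classify each one by its top bits, following the UTF-8 encoding rules. The sketch below is only illustrative; `classify` is a hypothetical helper, not part of CPython or MicroPython. The first byte announces a four-byte sequence, but none of the following bytes has the `10xxxxxx` form required of a continuation byte.

```python
# Hypothetical helper: classify a byte by its leading bits (UTF-8 rules).
def classify(byte):
    if byte & 0x80 == 0x00:
        return "ascii"          # 0xxxxxxx
    if byte & 0xC0 == 0x80:
        return "continuation"   # 10xxxxxx
    if byte & 0xE0 == 0xC0:
        return "lead-2"         # 110xxxxx
    if byte & 0xF0 == 0xE0:
        return "lead-3"         # 1110xxxx
    if byte & 0xF8 == 0xF0:
        return "lead-4"         # 11110xxx
    return "invalid"

kinds = [classify(x) for x in b"\xf0\xe0\xed\xe8"]
print(kinds)  # ['lead-4', 'lead-3', 'lead-3', 'lead-3']
```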
However, MicroPython (at least, the MicroPython ports mentioned above) will happily perform the conversion, with 'interesting' results:
```
MicroPython v1.9.4 on 2018-11-19; win32 version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> b = b"\xf0\xe0\xed\xe8"
>>> s = str(b, "utf8")
>>> len(s)
4
>>> s[0]
'\x00\x00\r\x08'
>>> s[1]
'\x00\r\x08'
>>> s[2]
'\r\x08\x00'
>>> s[3]
'\x08\x00\x16'
```
What's somewhat disturbing is that the values of 's[2]' and 's[3]' in the example above appear to contain the contents of memory outside of the original bytes object (vague flashbacks to the now-infamous 'Heartbleed' bug come to mind).  At any rate, this example (and similar ones) almost certainly ought to raise an appropriate exception.
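Until the decoder itself rejects such input, one possible stopgap is to validate the bytes in pure Python before calling `str()`. The following is a minimal sketch, not a proposed fix for the interpreter; `check_utf8` and `safe_str` are hypothetical names, and raising `ValueError` is an assumption chosen for portability (CPython's `UnicodeDecodeError` is a subclass of `ValueError`).

```python
def check_utf8(data):
    """Raise ValueError if data is not well-formed UTF-8 (hypothetical helper)."""
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # Decide how many continuation bytes follow and the smallest
        # code point that legitimately needs a sequence of this length.
        if b0 < 0x80:
            need, cp, lo = 0, b0, 0
        elif 0xC2 <= b0 <= 0xDF:
            need, cp, lo = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lo = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            need, cp, lo = 3, b0 & 0x07, 0x10000
        else:
            raise ValueError("invalid start byte at position %d" % i)
        if i + need >= n:
            raise ValueError("truncated sequence at position %d" % i)
        for k in range(1, need + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError("invalid continuation byte at position %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong encodings, surrogates, and values past U+10FFFF.
        if cp < lo or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("overlong or out-of-range sequence at position %d" % i)
        i += need + 1


def safe_str(data):
    # Validate first, then let the built-in decoder do the conversion.
    check_utf8(data)
    return str(data, "utf8")
```

With this sketch, `safe_str(b"\xf0\xe0\xed\xe8")` raises `ValueError` at position 1 instead of returning a string, while well-formed input is decoded as usual. The check costs one pass over the data in interpreted code, which may matter on small boards.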

str() will convert invalid utf8 from bytes object #4310
