Security issue: CWE-791 Incomplete Filtering of Special Elements #1

@psyker156

Hi! I would have reached out privately, but I did not have another way to do so. I like your parser! Nice and simple implementation.

There is, however, a security issue with the parser: it allows some illegal UTF-8 byte sequences to be parsed. Depending on where and how the parser is used, this could result in bypassing input validation, among other things.
An example of an illegal input that makes it through:
char data[] = {'\xC0', '\xA6', '\x27', '\x27', '\x27', '\x27','\x00'};

The lead byte 0xC0 causes a 2-byte character to be decoded, but 0xC0 is never legal in UTF-8: followed by 0xA6, it forms an overlong encoding of U+0026 ('&'). The resulting impact depends entirely on where the parser is used.
Potential security issues and spec limitations are documented in the RFC at the following spots:
https://datatracker.ietf.org/doc/html/rfc3629#section-4
https://datatracker.ietf.org/doc/html/rfc3629#section-10
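
To make the problem concrete, here is a minimal sketch in C. It is not this library's code: `naive_decode2` and `is_valid_2byte` are hypothetical helpers written only for illustration. The first one trusts the length bits of the lead byte, which is the kind of behavior that lets 0xC0 0xA6 through. The second applies the RFC 3629 rule that a 2-byte sequence must start with a lead byte in the range 0xC2..0xDF.

```c
#include <stdint.h>

/* Hypothetical naive 2-byte decoder (illustration only, not the library's
 * code): it strips the marker bits and combines the payload bits without
 * checking that the lead byte is one RFC 3629 actually permits. */
static uint32_t naive_decode2(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
}

/* Hypothetical strict check for a 2-byte sequence:
 * - the lead byte must be 0xC2..0xDF (0xC0 and 0xC1 could only begin
 *   overlong encodings of code points below U+0080),
 * - the second byte must be a continuation byte, 0x80..0xBF. */
static int is_valid_2byte(const unsigned char *s)
{
    return s[0] >= 0xC2 && s[0] <= 0xDF && (s[1] & 0xC0) == 0x80;
}
```

With the first two bytes of the input above (0xC0, 0xA6), `naive_decode2` returns 0x26, the code point of '&', while `is_valid_2byte` returns 0. A legitimate sequence such as 0xC3 0xA9 (U+00E9) passes the strict check and decodes to 0xE9.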

For your information, this is very similar to CVE-2025-1094 (a PostgreSQL vulnerability published last week).

I found this library as I was looking at various UTF-8 parsers for similar issues.

In the case of the utf8 library, this would map to the following CWE:
https://cwe.mitre.org/data/definitions/791.html

If you feel like it, you could create a security advisory for this issue, right here on GitHub: https://docs.github.com/en/code-security/security-advisories/working-with-repository-security-advisories/creating-a-repository-security-advisory

Thank you for your time and for sharing code with the world!
