Ill-formed byte sequences should be validated #30

Closed
masakielastic opened this issue Oct 23, 2014 · 4 comments

@masakielastic

The validator accepts ill-formed byte sequences. The definition of a well-formed UTF-8 string is given in RFC 3629 and in "Table 3-7. Well-Formed UTF-8 Byte Sequences" of the Unicode Standard (from my answer on Stack Overflow).

use Egulias\EmailValidator\EmailValidator;

$validator = new EmailValidator;
$email = "\x80\x81\x82@\x83\x84\x85.\x86\x87\x88";

var_dump(
    true === $validator->isValid($email)
);

A UTF-8 string can be validated with htmlspecialchars or preg_match.

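function utf8_validate($str) {
    // htmlspecialchars() returns '' for ill-formed input, so the round trip fails.
    // Decode with the same flags; otherwise single quotes may not be restored.
    return $str === htmlspecialchars_decode(htmlspecialchars($str, ENT_QUOTES, 'UTF-8'), ENT_QUOTES);
}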

function utf8_validate2($str) {
    return false !== preg_match('/./u', $str);
}
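
As a third option, the mbstring extension can do the same check. This is only a minimal sketch: `utf8_validate3` is a hypothetical name chosen to match the two functions above, and it assumes mbstring is installed.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is available.
function utf8_validate3(string $str): bool {
    // mb_check_encoding() reports whether the bytes are valid in the given encoding.
    return mb_check_encoding($str, 'UTF-8');
}

var_dump(utf8_validate3("\x80\x81\x82@\x83\x84\x85.\x86\x87\x88")); // bool(false)
var_dump(utf8_validate3("test@example.com"));                       // bool(true)
```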
@masakielastic
Author

Some C0 (U+0000 - U+000F) and C1 (U+0080 – U+009F) control characters are also accepted. I could not find the range U+0000 - U+000F in the definition of the local part (RFC5322BNF.html).

use Egulias\EmailValidator\EmailValidator;

$validator = new EmailValidator;

for ($i = 0; $i < 0x100; ++$i) {

    $c = utf8_chr($i);
    $email = $c .'test@example.com';

    if ($validator->isValid($email)) {

        if (preg_match('/\p{Cc}/u', $c)) {
            // Print the code point as U+XXXX, zero-padded to four hex digits.
            printf('U+%04X ', $i);
        }
    }
}

echo PHP_EOL;

function utf8_chr($code_point) {

    if ($code_point < 0 || 0x10FFFF < $code_point || (0xD800 <= $code_point && $code_point <= 0xDFFF)) {
        return '';
    }

    if ($code_point < 0x80) {
        $hex[0] = $code_point;
        $ret = chr($hex[0]);
    } else if ($code_point < 0x800) {
        $hex[0] = 0xC0 | $code_point >> 6;
        $hex[1] = 0x80  | $code_point & 0x3F;
        $ret = chr($hex[0]).chr($hex[1]);
    } else if ($code_point < 0x10000) {
        $hex[0] = 0xE0 | $code_point >> 12;
        $hex[1] = 0x80 | $code_point >> 6 & 0x3F;
        $hex[2] = 0x80 | $code_point & 0x3F;
        $ret = chr($hex[0]).chr($hex[1]).chr($hex[2]);
    } else  {
        $hex[0] = 0xF0 | $code_point >> 18;
        $hex[1] = 0x80 | $code_point >> 12 & 0x3F;
        $hex[2] = 0x80 | $code_point >> 6 & 0x3F;
        $hex[3] = 0x80 | $code_point  & 0x3F;
        $ret = chr($hex[0]).chr($hex[1]).chr($hex[2]).chr($hex[3]);
    }

    return $ret;
}
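
One way to guard against both problems before calling the validator is a single `preg_match` call with `\p{Cc}` and the `u` modifier. This is a rough sketch only: `contains_control_or_ill_formed` is a hypothetical helper, not part of the EmailValidator API, and it does not show how the library itself should fix this.

```php
<?php
// Hypothetical pre-check, not part of EmailValidator.
function contains_control_or_ill_formed(string $str): bool {
    $result = preg_match('/\p{Cc}/u', $str);

    // With the u modifier, preg_match() returns false for ill-formed UTF-8,
    // and 1 when a control character (general category Cc) is found.
    return $result === false || $result === 1;
}

var_dump(contains_control_or_ill_formed("\x01test@example.com"));     // bool(true)
var_dump(contains_control_or_ill_formed("\xC2\x80test@example.com")); // bool(true), U+0080
var_dump(contains_control_or_ill_formed("test@example.com"));         // bool(false)
```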

@egulias
Owner

egulias commented Oct 26, 2014

Hi @masakielastic !
Great report, thanks!
Sorry for my late response but I've been pretty busy these days.
I'll work on this as soon as I can. If you can provide a PR I'll gladly merge it!

@egulias egulias added the bug label Nov 2, 2014
egulias added a commit that referenced this issue Nov 17, 2014
egulias added a commit that referenced this issue Nov 29, 2014
egulias added a commit that referenced this issue Nov 29, 2014
egulias added a commit that referenced this issue Nov 29, 2014
@egulias
Owner

egulias commented Nov 29, 2014

Hi!
Can you check release 1.2.6 and close the issue if it resolves the problem? Thanks!

@egulias
Owner

egulias commented Jan 4, 2015

@masakielastic I'll close this, please check version 1.2.7. If you find more issues, please create new ones.
Thanks!
