Ill-formed byte sequences should be validated #30
Comments
Part of the C0 range (U+0000 - U+000F) and the C1 range (U+0080 - U+009F) also pass validation. I could not find the range U+0000 - U+000F in the definition of the local part (RFC5322BNF.html).
use Egulias\EmailValidator\EmailValidator;
$validator = new EmailValidator;

// Prepend each code point from U+0000 to U+00FF to a valid address and
// print the control characters (\p{Cc}) that the validator accepts.
for ($i = 0; $i < 0x100; ++$i) {
    $c = utf8_chr($i);
    $email = $c.'test@example.com';

    if ($validator->isValid($email)) {
        if (preg_match('/\p{Cc}/u', $c)) {
            // Zero-pad the hex value to four digits: U+XXXX.
            $number = strtoupper(dechex($i));
            $length = strlen($number);
            $number = str_repeat('0', 4 - $length).$number;
            echo 'U+'.$number.' ';
        }
    }
}

echo PHP_EOL;

// Encode a Unicode code point as a UTF-8 byte string.
// Returns '' for surrogates and for values outside U+0000..U+10FFFF.
function utf8_chr($code_point) {
    if ($code_point < 0 || 0x10FFFF < $code_point || (0xD800 <= $code_point && $code_point <= 0xDFFF)) {
        return '';
    }

    if ($code_point < 0x80) {
        // One byte: 0xxxxxxx
        $ret = chr($code_point);
    } elseif ($code_point < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx (the lead byte starts at 0xC0)
        $hex[0] = 0xC0 | ($code_point >> 6);
        $hex[1] = 0x80 | ($code_point & 0x3F);
        $ret = chr($hex[0]).chr($hex[1]);
    } elseif ($code_point < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $hex[0] = 0xE0 | ($code_point >> 12);
        $hex[1] = 0x80 | (($code_point >> 6) & 0x3F);
        $hex[2] = 0x80 | ($code_point & 0x3F);
        $ret = chr($hex[0]).chr($hex[1]).chr($hex[2]);
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        $hex[0] = 0xF0 | ($code_point >> 18);
        $hex[1] = 0x80 | (($code_point >> 12) & 0x3F);
        $hex[2] = 0x80 | (($code_point >> 6) & 0x3F);
        $hex[3] = 0x80 | ($code_point & 0x3F);
        $ret = chr($hex[0]).chr($hex[1]).chr($hex[2]).chr($hex[3]);
    }

    return $ret;
}
Hi @masakielastic!
Hi!
@masakielastic I'll close this, please check version 1.2.7. If you find more issues, please create new ones.
The validator accepts ill-formed byte sequences. The definition of a well-formed UTF-8 string can be found in RFC 3629 or in "Table 3-7. Well-Formed UTF-8 Byte Sequences" of the Unicode Standard (from my answer on Stack Overflow).
A UTF-8 string can be validated with htmlspecialchars or preg_match.
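
A minimal sketch of both checks follows, in case it helps. The helper names `is_valid_utf8_preg` and `is_valid_utf8_html` are made up for this example and are not part of EmailValidator. It relies on two documented behaviours: `preg_match` returns `false` when the subject is not valid UTF-8 under the `/u` modifier, and `htmlspecialchars` returns an empty string for invalid input when neither `ENT_IGNORE` nor `ENT_SUBSTITUTE` is set.

```php
<?php
// Hypothetical helpers for illustration only; not part of EmailValidator.

function is_valid_utf8_preg(string $s): bool
{
    // With the /u modifier PCRE checks the subject first. An ill-formed
    // subject makes preg_match() return false; the empty pattern
    // otherwise always matches and returns 1.
    return preg_match('//u', $s) === 1;
}

function is_valid_utf8_html(string $s): bool
{
    // Flags are passed explicitly so that ENT_SUBSTITUTE (part of the
    // default flags since PHP 8.1) is NOT set. Invalid input then
    // yields an empty string.
    return $s === '' || htmlspecialchars($s, ENT_QUOTES, 'UTF-8') !== '';
}

$samples = [
    'ascii'      => 'test@example.com',
    'two-byte'   => "\xC3\xA9test@example.com", // U+00E9, well-formed
    'overlong'   => "\xC0\x80test@example.com", // overlong encoding of U+0000
    'surrogate'  => "\xED\xA0\x80",             // encoded U+D800
    'truncated'  => "\xE3\x81",                 // missing trailing byte
];

foreach ($samples as $label => $bytes) {
    printf(
        "%-10s preg=%s html=%s\n",
        $label,
        var_export(is_valid_utf8_preg($bytes), true),
        var_export(is_valid_utf8_html($bytes), true)
    );
}
```

Either helper could be called before handing the address to the validator, so that ill-formed input is rejected up front.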