When json_encode() encounters a string that contains invalid utf8 characters it will return null. #69

Open
sevidmusic opened this issue Jul 26, 2023 · 1 comment
sevidmusic commented Jul 26, 2023

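When json_encode() encounters a string that contains invalid
UTF-8 byte sequences, it fails and returns false instead of a
JSON string (json_last_error() reports JSON_ERROR_UTF8).

The solution is to pass the JSON_INVALID_UTF8_SUBSTITUTE flag
(available since PHP 7.2), which replaces each invalid UTF-8
sequence with U+FFFD, the Unicode replacement character. It
appears as \ufffd in the encoded output.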

@see https://www.php.net/manual/en/json.constants.php
@see https://stackoverflow.com/questions/4663743/how-to-keep-json-encode-from-dropping-strings-with-invalid-characters
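
To make the behaviour described above concrete, here is a minimal sketch using only built-in PHP functions. The input string is a hypothetical example, not taken from the library's tests: 0xB1 is a UTF-8 continuation byte with no lead byte before it, so the string is not valid UTF-8.

```php
<?php

// Hypothetical input: "a", one invalid byte (0xB1), then "b".
$invalid = "a\xB1b";

// Without the flag, encoding fails and json_encode() returns false.
$plain = json_encode($invalid);
var_dump($plain === false);                       // bool(true)
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)

// With the flag (PHP >= 7.2), the invalid byte is replaced by U+FFFD,
// which json_encode() escapes as \ufffd by default.
$substituted = json_encode($invalid, JSON_INVALID_UTF8_SUBSTITUTE);
echo $substituted, PHP_EOL;                       // "a\ufffdb"
```

Casting the false return value to a string yields an empty string, which would match the string(0) "" shown under "Actual" in the result below, assuming the Json class performs such a cast.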

The following integration test should pass (it currently fails):

<?php

/**
 * Purpose of this integration test:
 *
 * Test that strings that contain invalid UTF-8 characters can be
 * encoded as JSON via a Json instance, and that a Json instance
 * used to encode a string that contains invalid UTF-8 characters
 * can be decoded back to its original value via a JsonDecoder.
 *
 */

include(
    str_replace(
        'tests' . DIRECTORY_SEPARATOR . 'integration',
        '',
        __DIR__
    ) .'vendor/autoload.php'
);

use \Darling\PHPJsonUtilities\classes\encoded\data\Json;
use \Darling\PHPJsonUtilities\classes\decoders\JsonDecoder;

/**
 * Return a string composed of a random number of randomly
 * generated characters.
 *
 * Note: The string may contain invalid utf-8 characters.
 *
 * @return string
 *
 * @example
 *
 * ```
 * echo randomChars();
 * // example output: rqEzm*g1vRI7!lz#-%q
 *
 * echo randomChars();
 * // example output: @Lz%R+bgR#79l!mz-
 *
 * ```
 *
 */
function randomChars(): string
{
    $string = str_shuffle(
        'abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_-=+'
    );
    try {
        $string .=
            random_bytes(random_int(1, 100)) .
            $string .
            random_bytes(random_int(1, 100));
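    } catch(\Exception $e) {
        // If no randomness source is available, fall back to the
        // shuffled ASCII characters generated above.
    }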
    return str_shuffle($string);
}

$string = randomChars();

$jsonEncodedString = new Json($string);

$expectedJsonString = json_encode(
    $string,
    JSON_INVALID_UTF8_SUBSTITUTE,
    2147483647
);

echo "\033[38;5;0m\033[48;5;111mRunning test " . __FILE__ . " \033[48;5;0m";

if(
    $jsonEncodedString->__toString() === $expectedJsonString
) {
    echo "\033[38;5;0m\033[48;5;84m Passed\033[48;5;0m" . PHP_EOL . PHP_EOL;
} else {
    echo "\033[38;5;0m\033[48;5;196m Failed\033[48;5;0m" . PHP_EOL . PHP_EOL;
}

echo "\033[38;5;0m\033[48;5;45m Expected: \033[48;5;0m" . PHP_EOL . PHP_EOL;
var_dump($expectedJsonString);
echo "\033[38;5;0m\033[48;5;202m Actual: \033[48;5;0m" . PHP_EOL . PHP_EOL;
var_dump($jsonEncodedString->__toString());


Current result:

Expected: 

string(629) ""\u0007\ufffd\u000fau \ufffd\u000f\ufffd%y->k\ufffdu!3\ufffd\u001b6\ufffd&^(X*65jdt\ufffd\nt1-\ufffd\ufffd39\ufffd;$r\ufffdfs6j7o\ufffdg)-\ufffdymv\ufffd*@\u0000\ufffd<`\ufffd\n\ufffdr@\ufffd\u0006r!\u0001_\ufffduc\u0012%\ufffdh;\ufffd5d\ufffdpn^\u0005\ufffdc7\ufffdrq\ufffdi#\u0015\u0014\ufffd@,)9\ufffd)\ufffd+B\ufffd\ufffdmX%wa4_\ufffd\u0012z$b0LnVpz\ufffd8\ufffdlw\u0012.h\u001f\ufffd=(\ufffdqp$\ufffdh\ufffd6\ufffd\u07ce+\fvxe\ufffdo0a`)$l&\fgl\ufffd^nb\ufffd2\ufffd4\ufffd+\ufffdxj\ufffdJ_\ufffd8o+\ufffde\"\ufffd\ufffds#\ufffd\u001a\ufffd\ufffd<\ufffdx1A\ufffd\ufffdfo\ufffd=G\u05a0$\bg>#&W\ufffd25k\ufffdi5F4\ufffd\ufffd""

Actual:

string(0) ""

sevidmusic added the bug label Jul 26, 2023
sevidmusic self-assigned this Jul 26, 2023
sevidmusic added a commit that referenced this issue Jul 26, 2023

sevidmusic commented Jul 26, 2023

As of 0dde4f8 the JSON_INVALID_UTF8_SUBSTITUTE flag is used, so strings that contain invalid UTF-8 characters are now encoded instead of producing an empty string.

However, the substitution is lossy: each invalid sequence is replaced with \ufffd, so a string that contained invalid characters cannot be decoded back to its original value.

This needs further investigation.
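
The round-trip problem can be reproduced with the built-in functions alone. This is a minimal sketch under the assumption that Json and JsonDecoder behave like json_encode() with the flag and json_decode() respectively. The input string is a hypothetical example.

```php
<?php

// Hypothetical input containing one invalid byte (0xB1).
$original = "a\xB1b";

$json    = json_encode($original, JSON_INVALID_UTF8_SUBSTITUTE);
$decoded = json_decode($json);

// The decoded string holds U+FFFD (bytes EF BF BD) where 0xB1 used to be,
// so the original byte cannot be recovered.
echo bin2hex($original), PHP_EOL;   // 61b162
echo bin2hex($decoded), PHP_EOL;    // 61efbfbd62
var_dump($decoded === $original);   // bool(false)
```

Any fix that restores the original bytes would need a reversible representation of the raw string (for example base64) rather than substitution. That is an assumption offered here for the investigation, not something the issue proposes.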

sevidmusic pinned this issue Jul 27, 2023