Description
I won't go into the details, but due to a combination of our application not handling UTF-16 surrogate pairs correctly, the non-strictness of json_decode, and the old Mongo PHP driver not caring about invalid UTF-8 sequences, our production DB has a fair number of documents that the new Mongo PHP driver cannot handle.
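A lone UTF-16 surrogate such as \ud83c has no valid UTF-8 encoding. The sketch below assumes that PHP 5's json_decode encoded each unpaired surrogate on its own as a 3-byte sequence (ED A0 BC for U+D83C, ED B5 B1 for U+DD71). That assumption fits the three replacement characters per surrogate in the write error quoted further down. encodeThreeByte and isValidUtf8 are hypothetical helpers, and the check relies on PCRE's /u mode rejecting malformed UTF-8, surrogates included.

```php
<?php
// Sketch (assumption): reproduce the bytes PHP 5's json_decode is believed to
// have produced for an unpaired surrogate, then show they are not valid UTF-8.

/**
 * Hypothetical helper: encode a code point in U+0800..U+FFFF as three
 * UTF-8-style bytes WITHOUT rejecting surrogates (U+D800..U+DFFF),
 * unlike a strict UTF-8 encoder.
 */
function encodeThreeByte($codePoint)
{
    return chr(0xE0 | ($codePoint >> 12))
        . chr(0x80 | (($codePoint >> 6) & 0x3F))
        . chr(0x80 | ($codePoint & 0x3F));
}

/** Hypothetical helper: true if $s is well-formed UTF-8 (PCRE /u check). */
function isValidUtf8($s)
{
    return preg_match('//u', $s) === 1;
}

// The two halves of U+1F171, separated by a space as in the test script.
$broken = encodeThreeByte(0xD83C) . ' ' . encodeThreeByte(0xDD71);
echo bin2hex($broken), "\n";              // eda0bc20edb5b1
var_dump(isValidUtf8($broken));           // bool(false)

// The correctly paired character U+1F171 encodes as F0 9F 85 B1.
var_dump(isValidUtf8("\xF0\x9F\x85\xB1")); // bool(true)
```

Correctly combining the pair before encoding would have produced the valid 4-byte sequence instead.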
Here is a simple test script to get you set up with a "broken" document; it does a write and a read with both the new and the old drivers to illustrate what happens.
<?php
$prefix = "{\"value\":\"";
$postfix = "\"}";
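// Two unpaired UTF-16 surrogates: the space separates the halves of U+1F171.
// PHP 5's json_decode accepts this; PHP 7+ rejects it with JSON_ERROR_UTF16.
$testString = "\\ud83c \\udd71";
$testString = "$prefix$testString$postfix";
$testString = json_decode($testString, true)["value"];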
$updateCriteria = ["_id" => "corrupt bson"];
$mongoManager = new MongoDB\Driver\Manager();
try
{
    $bulk = new MongoDB\Driver\BulkWrite();
    $bulk->update($updateCriteria, ['$set' => ["TestStringNew" => $testString]], ["upsert" => true]);
    $result = $mongoManager->executeBulkWrite("database.collection", $bulk);
    // An upsert that inserts a new document reports 0 matched and 1 upserted.
    if($result->getMatchedCount() + $result->getUpsertedCount() == 0)
    {
        print("Couldn't update account\n");
    }
    else
    {
        print("Update success\n");
    }
}
catch(MongoDB\Driver\Exception\Exception $e)
{
    echo get_class($e), ": ", $e->getMessage(), " - ", $e->getCode(), "\n";
}
print("Update with old driver:\n");
$legacyClient = new MongoClient();
try
{
    $result = $legacyClient->database->collection->update($updateCriteria, ['$set' => ["TestStringOld" => $testString]], ["upsert" => true]);
    if(isset($result['err']) || $result['n'] == 0)
    {
        print("Couldn't update account\n");
    }
    else
    {
        print("Update success\n");
    }
}
catch(MongoException $e)
{
    echo get_class($e), ": ", $e->getMessage(), " - ", $e->getCode(), "\n";
}
print("Find with new driver:\n");
try
{
    $query = new MongoDB\Driver\Query($updateCriteria, ["projection" => ["TestStringNew" => true, "TestStringOld" => true]]);
    $cursor = $mongoManager->executeQuery("database.collection", $query);
    foreach($cursor as $account)
    {
        print(json_encode($account));
        print("\n");
    }
}
catch(MongoDB\Driver\Exception\Exception $e)
{
    echo get_class($e), ": ", $e->getMessage(), " - ", $e->getCode(), "\n";
}
print("Find with old driver:\n");
try
{
    $account = $legacyClient->database->collection->findOne($updateCriteria, ["TestStringOld" => true, "TestStringNew" => true]);
    if(is_null($account))
    {
        print("null account?\n");
    }
    else
    {
        var_dump($account);
    }
}
catch(MongoException $e)
{
    echo get_class($e), ": ", $e->getMessage(), " - ", $e->getCode(), "\n";
}
?>
When writing with the new driver, the output is something like MongoDB\Driver\Exception\UnexpectedValueException: Got invalid UTF-8 value serializing '��� ���'
That isn't very helpful if you don't already know what's wrong, but at least it tries to print the unexpected value for my debugging pleasure.
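As a possible stopgap on the write side (not something either driver does), strings can be scrubbed before they reach the new driver. This minimal sketch assumes a lossy repair is acceptable: every byte that is not part of a well-formed UTF-8 sequence (the RFC 3629 byte ranges, which exclude surrogates and overlong forms) becomes U+FFFD, one per offending byte, which looks like the '��� ���' in the message above. replaceInvalidUtf8 is a hypothetical helper name.

```php
<?php
// Sketch (assumption: lossy repair is acceptable). Replace each byte that is
// not part of a well-formed UTF-8 sequence with U+FFFD (EF BF BD).
function replaceInvalidUtf8($s)
{
    // Group 1: one well-formed UTF-8 sequence per RFC 3629 (no surrogates,
    // no overlong forms, nothing above U+10FFFF). Otherwise "." (with /s)
    // consumes a single bad byte. No /u flag: the pattern works on bytes.
    $pattern = '/('
        . '[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'
        . ')|./s';

    return preg_replace_callback($pattern, function ($m) {
        // Keep valid sequences; substitute U+FFFD for a stray byte.
        return (isset($m[1]) && $m[1] !== '') ? $m[1] : "\xEF\xBF\xBD";
    }, $s);
}

// The broken value from the test script (two lone surrogates, 3 bytes each).
$clean = replaceInvalidUtf8("\xED\xA0\xBC \xED\xB5\xB1");
echo bin2hex($clean), "\n"; // efbfbdefbfbdefbfbd20efbfbdefbfbdefbfbd
```

The original surrogate code units are not recoverable from the result, so this only makes sense for data where losing those characters is acceptable.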
When reading with the new driver, the output is something like
MongoDB\Driver\Exception\UnexpectedValueException: Detected corrupt BSON data
Note that the unexpected value is not printed at all, so for large documents it can be quite time-consuming to figure out where the invalid UTF-8 is. I propose enhancing this error message to something like: Detected corrupt BSON data while deserializing property 'FirstThing.SecondThing.Whatever'
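Until the driver reports the path itself, roughly the same information can be computed in userland. That requires the document to load through something that does not validate UTF-8, such as the old driver's findOne(), which returns nested PHP arrays. This sketch walks such a structure and collects the dotted path of every key or string value that is not valid UTF-8. findInvalidUtf8Paths and isValidUtf8 are hypothetical helpers, and skipping MongoBinData (whose payload is arbitrary bytes) is an assumption about which values should be checked.

```php
<?php
// Sketch: userland version of the proposed diagnostic. Collects dotted
// paths (array indexes included, e.g. "Tags.1") of keys and string values
// that are not valid UTF-8.

/** Hypothetical helper: true if $s is well-formed UTF-8 (PCRE /u check). */
function isValidUtf8($s)
{
    return preg_match('//u', $s) === 1;
}

/** Hypothetical helper: returns an array of offending dotted paths. */
function findInvalidUtf8Paths($value, $path = '')
{
    $bad = array();
    if (is_string($value)) {
        if (!isValidUtf8($value)) {
            $bad[] = $path;
        }
        return $bad;
    }
    // Binary payloads are arbitrary bytes, so do not flag them (assumption).
    if (is_object($value) && is_a($value, 'MongoBinData')) {
        return $bad;
    }
    if (is_array($value) || is_object($value)) {
        // foreach over an object visits its public properties (e.g. stdClass).
        foreach ($value as $key => $child) {
            $childPath = $path === '' ? (string) $key : $path . '.' . $key;
            if (is_string($key) && !isValidUtf8($key)) {
                $bad[] = $childPath;
            }
            $bad = array_merge($bad, findInvalidUtf8Paths($child, $childPath));
        }
    }
    return $bad;
}

$doc = array(
    '_id' => 'corrupt bson',
    'FirstThing' => array('SecondThing' => array('Whatever' => "\xED\xA0\xBC")),
    'Tags' => array('ok', "\xED\xB5\xB1"),
);
echo implode("\n", findInvalidUtf8Paths($doc)), "\n";
// FirstThing.SecondThing.Whatever
// Tags.1
```

Running this over a whole collection via the old driver is one way to list affected documents before migrating.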