13 changes: 7 additions & 6 deletions source/fundamentals/bson/utf8-validation.txt
@@ -25,15 +25,16 @@ processing overhead since it needs to check the data.
 If you *disable* validation, your application avoids the validation processing
 overhead, but cannot guarantee consistent presentation of invalid UTF-8 data.

-The driver enables UTF-8 validation by default. It checks documents for any
-characters that are not encoded in a valid UTF-8 format when it transfers data
-between your application and MongoDB.
+By default, the driver enables UTF-8 validation on data from MongoDB.
+It checks incoming documents for any characters that are not encoded in a
+valid UTF-8 format when it parses data sent from MongoDB to your application.

 .. note::

-   The current version of the {+driver-short+} automatically substitutes
-   invalid UTF-8 characters with alternate valid UTF-8 ones before
-   validation when you send data to MongoDB. Therefore, the validation
+   This version of the {+driver-short+} automatically substitutes invalid
+   `lone surrogates <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters>`__
+   with the `replacement character <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toWellFormed>`__
+   before validation when you send data to MongoDB. Therefore, the validation
    only throws an error when the setting is enabled and the driver
    receives invalid UTF-8 document data from MongoDB.
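
The asymmetry described in the note can be illustrated with JavaScript's
built-in ``TextEncoder`` and ``TextDecoder``. This is only a sketch of the
general behavior, not the driver's internal code: the helper names
``decodeStrict`` and ``decodeLenient`` and the sample bytes are hypothetical,
and the mapping of "strict" and "lenient" decoding to the enabled and disabled
validation setting is an analogy rather than a statement about how the driver
is implemented.

```javascript
// Illustration with built-in encoders only; this is not the driver's code.
const encoder = new TextEncoder();

// Outgoing direction: a lone surrogate (half of a UTF-16 pair) has no valid
// UTF-8 encoding, so the encoder writes U+FFFD (bytes EF BF BD) in its place.
const sentBytes = encoder.encode("ab\uD800");

// Incoming direction: hypothetical bytes "received" from a server.
// The byte 0xFF never appears in well-formed UTF-8.
const receivedBytes = new Uint8Array([0x61, 0xff, 0x62]);

// Strict decoding (comparable to validation enabled) throws on invalid input.
function decodeStrict(bytes) {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// Lenient decoding (comparable to validation disabled) does not throw and
// substitutes U+FFFD for the invalid byte.
function decodeLenient(bytes) {
  return new TextDecoder("utf-8", { fatal: false }).decode(bytes);
}
```

Because the outgoing string is already repaired during encoding, only the
incoming direction can surface an error in this sketch, which mirrors the
behavior the note describes.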
