Skip to content

Commit

Permalink
pythongh-108590: Improve sqlite3 docs on encoding issues and how to h…
Browse files Browse the repository at this point in the history
…andle those (python#108699)

Add a guide for how to handle non-UTF-8 text encodings.
Link to that guide from the 'text_factory' docs.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Corvin <corvin@corvin.dev>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
  • Loading branch information
6 people committed Oct 25, 2023
1 parent 81ed80d commit 1262e41
Showing 1 changed file with 50 additions and 33 deletions.
83 changes: 50 additions & 33 deletions Doc/library/sqlite3.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1154,6 +1154,10 @@ Connection objects
f.write('%s\n' % line)
con.close()

.. seealso::

:ref:`sqlite3-howto-encoding`


.. method:: backup(target, *, pages=-1, progress=None, name="main", sleep=0.250)

Expand Down Expand Up @@ -1220,6 +1224,10 @@ Connection objects

.. versionadded:: 3.7

.. seealso::

:ref:`sqlite3-howto-encoding`

.. method:: getlimit(category, /)

Get a connection runtime limit.
Expand Down Expand Up @@ -1441,39 +1449,8 @@ Connection objects
and returns a text representation of it.
The callable is invoked for SQLite values with the ``TEXT`` data type.
By default, this attribute is set to :class:`str`.
If you want to return ``bytes`` instead, set *text_factory* to ``bytes``.

Example:

.. testcode::

con = sqlite3.connect(":memory:")
cur = con.cursor()

AUSTRIA = "Österreich"

# by default, rows are returned as str
cur.execute("SELECT ?", (AUSTRIA,))
row = cur.fetchone()
assert row[0] == AUSTRIA

# but we can make sqlite3 always return bytestrings ...
con.text_factory = bytes
cur.execute("SELECT ?", (AUSTRIA,))
row = cur.fetchone()
assert type(row[0]) is bytes
# the bytestrings will be encoded in UTF-8, unless you stored garbage in the
# database ...
assert row[0] == AUSTRIA.encode("utf-8")

# we can also implement a custom text_factory ...
# here we implement one that appends "foo" to all strings
con.text_factory = lambda x: x.decode("utf-8") + "foo"
cur.execute("SELECT ?", ("bar",))
row = cur.fetchone()
assert row[0] == "barfoo"

con.close()
See :ref:`sqlite3-howto-encoding` for more details.

.. attribute:: total_changes

Expand Down Expand Up @@ -1632,7 +1609,6 @@ Cursor objects
COMMIT;
""")


.. method:: fetchone()

If :attr:`~Cursor.row_factory` is ``None``,
Expand Down Expand Up @@ -2611,6 +2587,47 @@ With some adjustments, the above recipe can be adapted to use a
instead of a :class:`~collections.namedtuple`.


.. _sqlite3-howto-encoding:

How to handle non-UTF-8 text encodings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, :mod:`!sqlite3` uses :class:`str` to adapt SQLite values
with the ``TEXT`` data type.
This works well for UTF-8 encoded text, but it might fail for other encodings
and invalid UTF-8.
You can use a custom :attr:`~Connection.text_factory` to handle such cases.

Because of SQLite's `flexible typing`_, it is not uncommon to encounter table
columns with the ``TEXT`` data type containing non-UTF-8 encodings,
or even arbitrary data.
To demonstrate, let's assume we have a database with ISO-8859-2 (Latin-2)
encoded text, for example a table of Czech-English dictionary entries.
Assuming we now have a :class:`Connection` instance :py:data:`!con`
connected to this database,
we can decode the Latin-2 encoded text using this :attr:`~Connection.text_factory`:

.. testcode::

con.text_factory = lambda data: str(data, encoding="latin2")

For invalid UTF-8 or arbitrary data in stored in ``TEXT`` table columns,
you can use the following technique, borrowed from the :ref:`unicode-howto`:

.. testcode::

con.text_factory = lambda data: str(data, errors="surrogateescape")

.. note::

The :mod:`!sqlite3` module API does not support strings
containing surrogates.

.. seealso::

:ref:`unicode-howto`


.. _sqlite3-explanation:

Explanation
Expand Down

0 comments on commit 1262e41

Please sign in to comment.