Add a section for types
encukou committed Mar 9, 2016
1 parent c92f8c4 commit 33d18c1
Showing 3 changed files with 84 additions and 35 deletions.
41 changes: 6 additions & 35 deletions source/index.rst
@@ -44,47 +44,18 @@ The porting process
Read up on the roles of ``six``, ``sixer``, ``modernize``,
``py3c`` and ``pylint --py3k``.

* Define **data types** you are using
* :doc:`Define data types you are using <types>`

The biggest change in Python 3 is handling of the string types.
Python 3 draws a sharp distinction between *text* and *bytes*,
and requires that conversions between these are made explicitly,
with a well-defined encoding.

The consequence is that every value that was ``str`` in Python 2
must now be exactly one of two “stringy” types:
Before porting, it helps to decide, on a big-picture scale,
which data is textual and which is bytes.

*

*text* (known as ``unicode`` in Python 2 and ``str`` in Python 3):
human-readable text represented as a sequence of Unicode
codepoints; usually without embedded NULL characters.

*

``bytes`` – binary serialization format suitable for storing data
on disk or sending it over the wire, as a sequence of
integers between 0 and 255.
Most data – images, sound, configuration, even text – can be
serialized (encoded) to bytes and deserialized (decoded) from
bytes, using an appropriate protocol such as PNG, WAV, JSON
or UTF-8.

Code that supports both Python 2 and 3 in the same codebase
will “conceptually” use another type:

*

``str`` (the “native string”; text in py3, bytes in py2) – the type
Python uses internally for data like variable and attribute names,
and requires for ``__str__``/``__repr__`` output.

There are other changes to types, but those are generally minor.

Large, complex codebases may benefit from automatic optional type
checking provided by mypy_.
If your project uses verification tools like pylint_, consider adding
mypy to the mix.
Also, static type-checking tools are available to help the porting
process.

* **Modernize** your code

@@ -111,6 +82,7 @@ The porting process

tests
tools
types

.. comment:
@@ -123,5 +95,4 @@ The porting process
.. _Python 3 Q & A: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html
.. _Supporting Python 3: http://python3porting.com/
.. _mypy: http://www.mypy-lang.org/
.. _pylint: https://www.pylint.org/
2 changes: 2 additions & 0 deletions source/tools.rst
@@ -6,6 +6,8 @@ and to check for common errors.
Here is a survey of tools we recommend.


.. _six:

six
---

76 changes: 76 additions & 0 deletions source/types.rst
@@ -0,0 +1,76 @@
Types
=====

From a developer's point of view, the largest change in Python 3
is that where one would use ``str`` in Python 2,
one needs to explicitly choose between *text* and *bytes*:

* *text*: human-readable text represented as a sequence of Unicode
  codepoints. Usually, it does not contain unprintable control characters
  such as NULL.

  This type is available as ``str`` in Python 3, and ``unicode``
  in Python 2.

  In code, we will refer to this type as ``unicode``: a short, unambiguous
  name, although one that is not a built-in in Python 3.
  Some projects refer to it as ``six.text_type`` (from the :ref:`six`
  library).

* *bytes*: a binary serialization format suitable for storing data
  on disk or sending it over the wire, as a sequence of
  integers between 0 and 255.
  Most data – images, sound, configuration info, or *text* – can be
  serialized (encoded) to bytes and deserialized (decoded) from
  bytes, using an appropriate protocol such as PNG, WAV, JSON
  or UTF-8.

  In both Python 2.6+ and 3, this type is available as ``bytes``.

Ideally, every “stringy” value will explicitly be one of these types.
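
The distinction between the two types can be illustrated with a short
round trip. This is only a sketch: the sample string and the choice of UTF-8
are assumptions made for the example, and the ``u''`` prefix is used so the
same source should run on Python 2 and on Python 3.3+.

```python
# Text: a sequence of Unicode codepoints. The u'' prefix is accepted by
# Python 2 and by Python 3.3+.
text = u'na\u00efve caf\u00e9'

# Encoding serializes text to bytes; the encoding is named explicitly.
data = text.encode('utf-8')

# Decoding with the same encoding restores the original text.
roundtrip = data.decode('utf-8')

# The two accented characters take two bytes each in UTF-8,
# so the lengths differ: 10 codepoints, 12 bytes.
print(len(text), len(data))
```

Note that the text and its serialized form have different lengths, which is
one reason the two should not be treated as interchangeable.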

XXX: What to do before you start porting

Additionally, code that supports both Python 2 and 3 in the same codebase
can use what is conceptually a third type:

* The “native string” (``str`` – text in py3, bytes in py2): the type
  Python uses internally for data like variable and attribute names,
  and requires for ``__str__``/``__repr__`` output.
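
As a rough illustration of the native string, the following sketch converts
either text or bytes to whatever ``str`` means on the running interpreter.
The names ``PY2``, ``to_native`` and ``Package`` are hypothetical helpers
invented for this example, not part of any library, and UTF-8 is an assumed
default encoding.

```python
import sys

PY2 = sys.version_info[0] == 2


def to_native(value, encoding='utf-8'):
    """Convert text or bytes to the native ``str`` of the running Python."""
    if isinstance(value, str):
        return value                    # already a native string
    if PY2:
        return value.encode(encoding)   # unicode -> bytes (str on Python 2)
    return value.decode(encoding)       # bytes -> text (str on Python 3)


class Package(object):
    """Hypothetical class whose ``name`` attribute holds text."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # __repr__ has to return a native string on both versions.
        return to_native(u'<Package %s>' % self.name)
```

With this helper, ``repr(Package(u'six'))`` yields a ``str`` on either
version, while the rest of the code can keep working with text.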

Besides strings, there are some changes to other core types,
but those are generally minor and usually don't require planning before
you start porting.


Type checking
-------------

Large, complex codebases with stable interfaces may benefit from automatic
optional type checking provided by mypy_.

This allows verifying the expected types of arguments, return values, and
variables, using a syntax such as::

    def greeting(name):
        # type: (str) -> str
        return 'Hello ' + name + '!'

Mypy employs the concept of *gradual typing*: not all types need to be
specified, and where they are missing, type checking is simply not done.
Type specifications can be added selectively – for example only to stable
interfaces, or to code that uses strings heavily.
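
Selective annotation might look like the sketch below, which uses the PEP 484
type-comment syntax. The function names are made up for the example.
``typing.Text`` is the alias for the text type (``unicode`` on Python 2,
``str`` on Python 3); on Python 2 the ``typing`` module is assumed to be
installed as a separate package.

```python
from typing import Text


def decode_name(raw, encoding='utf-8'):
    # type: (bytes, str) -> Text
    """Annotated: a checker can flag callers that pass text, not bytes."""
    return raw.decode(encoding)


def shout(value):
    # No type comment here, so the function is treated as dynamically
    # typed and, by default, its body is not checked.
    return value.upper()
```

The type comments are ordinary comments at run time, so adding them does not
change how the code behaves on either Python version.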

This guide will not go into detail on mypy.
If you are interested, see the `mypy homepage <http://www.mypy-lang.org/>`_
and the `Python 2-compatible typing specification syntax <https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code>`_ (``mypy --py2``).



.. _mypy: http://www.mypy-lang.org/
