### Item 3: Know the Differences Between bytes, str, and unicode

Focus only on Python 3.

In Python 3, there are two types that represent sequences of characters:
* `bytes` and `str`
    - Instances of `bytes` contain raw 8-bit values.
    - Instances of `str` contain Unicode characters.


* There are many ways to represent Unicode characters as binary data (raw 8-bit values).
* The most common encoding is `UTF-8`.
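
As a small illustration of the point above, the snippet below encodes one string with three encodings and prints the resulting bytes. The sample word and the variable names are chosen only for this sketch.

```python
text = 'caf\u00e9'  # 'café': four Unicode characters

# The same characters become different byte sequences per encoding.
utf8 = text.encode('utf-8')
latin1 = text.encode('latin-1')
utf16 = text.encode('utf-16-le')

print(utf8)    # b'caf\xc3\xa9'
print(latin1)  # b'caf\xe9'
print(utf16)   # b'c\x00a\x00f\x00\xe9\x00'
print(len(text), len(utf8), len(latin1), len(utf16))  # 4 5 4 8
```

The character count stays at four, while the byte count depends on the encoding.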

Important!

* `str` instances in Python 3 do not have an associated binary encoding.
    * To convert Unicode characters to binary data, you must use the `encode` method.
    * To convert binary data to Unicode characters, you must use the `decode` method.

* When you're writing Python programs, it's important to do encoding and decoding of Unicode at the furthest boundary of your interfaces.

* The core of your program should use Unicode character types (`str` in Python 3) and should not assume anything about character encodings.
* This approach allows you to be very accepting of alternative text encodings while being strict about your output text encoding (ideally, UTF-8).
    * e.g.
        * Latin-1
        * Shift JIS
        * Big5
        * MS-DOS (cp437)

* The split between character types leads to two common situations in Python code:
    * Operate on raw 8-bit values that are UTF-8-encoded characters (or some other encoding).
    * Operate on Unicode characters that have no specific encoding.
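
One way to picture decoding and encoding at the boundary is the sketch below. The three function names (`decode_input`, `shout`, `encode_output`) are hypothetical and exist only for this example: input bytes are decoded with whatever encoding they arrived in, the core logic sees only `str`, and output is always UTF-8.

```python
def decode_input(raw, encoding):
    """Hypothetical input boundary: turn incoming bytes into str."""
    return raw.decode(encoding)


def shout(text):
    """Core logic: works on str only and knows nothing about encodings."""
    return text.upper()


def encode_output(text):
    """Hypothetical output boundary: always emit UTF-8."""
    return text.encode('utf-8')


# The same word ('café') arriving in three different encodings.
inputs = [
    (b'caf\xe9', 'latin-1'),
    (b'caf\x82', 'cp437'),
    (b'caf\xc3\xa9', 'utf-8'),
]

outputs = [encode_output(shout(decode_input(raw, enc))) for raw, enc in inputs]
print(outputs)  # three identical UTF-8 byte strings: b'CAF\xc3\x89'
```

Because `shout` never touches bytes, supporting another input encoding only changes the call at the boundary.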

#### You'll often need these two helper functions

* Takes a `str` or `bytes` and always returns a `str`:

In [None]:
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # instance of str

* Takes a `str` or `bytes` and always returns a `bytes`:

In [None]:
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # instance of bytes
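
A quick usage check for the two helpers might look like the following. The helper definitions are repeated in compact form so the snippet runs on its own, and the sample inputs are made up for illustration.

```python
def to_str(bytes_or_str):
    # Decode bytes as UTF-8; pass str through unchanged.
    if isinstance(bytes_or_str, bytes):
        return bytes_or_str.decode('utf-8')
    return bytes_or_str


def to_bytes(bytes_or_str):
    # Encode str as UTF-8; pass bytes through unchanged.
    if isinstance(bytes_or_str, str):
        return bytes_or_str.encode('utf-8')
    return bytes_or_str


print(repr(to_str(b'hello')))    # 'hello'
print(repr(to_str('hello')))     # 'hello'
print(repr(to_bytes('hello')))   # b'hello'
print(repr(to_bytes(b'hello')))  # b'hello'
```

Both helpers assume UTF-8; bytes in another encoding would need to be decoded with that encoding instead.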

* File handles returned by `open` default to text mode, which expects `str`. To write `bytes`, open the file in binary write mode (`'wb'`):

In [None]:
import os

with open('random.bin', 'wb') as f:
    f.write(os.urandom(10))
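
To see why the mode matters, the sketch below tries the same write in text mode (`'w'`) first and then in binary mode. It writes to a temporary directory, which is a choice made for this example only.

```python
import os
import tempfile

payload = os.urandom(10)
path = os.path.join(tempfile.mkdtemp(), 'random.bin')

# Text mode ('w') expects str, so writing bytes raises TypeError.
text_mode_error = None
try:
    with open(path, 'w') as f:
        f.write(payload)
except TypeError as exc:
    text_mode_error = exc
    print('text mode failed:', exc)

# Binary modes accept and return bytes unchanged.
with open(path, 'wb') as f:
    f.write(payload)
with open(path, 'rb') as f:
    round_trip = f.read()

print(round_trip == payload)  # True
```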

### Things to Remember

* In Python 3, `bytes` contains sequences of 8-bit values, `str` contains sequences of Unicode characters.
    * `bytes` and `str` instances can’t be used together with operators (like `>` or `+`).


* Use helper functions to ensure that the inputs you operate on are the type of character sequence you expect.
    * 8-bit values, UTF-8 encoded characters, Unicode characters, etc.
    

* If you want to read or write binary data to/from a file, always open the file using a binary mode.
    * like `'rb'` or `'wb'`
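
The first point, that `bytes` and `str` cannot be mixed with operators, can be checked with a short sketch like this one. The sample values and the `errors` list are illustrative only.

```python
# Same-type operations work as expected.
print(b'one' + b'two')  # b'onetwo'
print('one' + 'two')    # onetwo

errors = []

# Mixing the two types with + raises TypeError.
try:
    b'one' + 'two'
except TypeError as exc:
    errors.append(exc)

# Ordering comparisons between the two types also raise TypeError.
try:
    'one' > b'two'
except TypeError as exc:
    errors.append(exc)

print(len(errors))  # 2

# Equality does not raise; it simply evaluates to False.
equal = (b'foo' == 'foo')
print(equal)  # False
```

The silent `False` from `==` is worth keeping in mind, since it can hide a type mix-up that `+` or `>` would have reported.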