## Character Issues

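A string is a sequence of characters, but a character can be represented in many different ways. From Python 3 onwards, each item of a `str` is a Unicode character, identified by its code point (for example `U+00E9` for `é`). The same character also has a byte representation, which depends on the encoding used. In UTF-8, `é` takes two bytes, so `'café'` has 4 characters but encodes to 5 bytes.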

In [25]:
s = 'café'
len(s)

4

In [26]:
b = s.encode('utf_8')
b

b'caf\xc3\xa9'

In [27]:
len(b)

5
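The cells above go from text to bytes. To make the character/byte distinction concrete in the other direction, here is a small sketch that reuses the names `s` and `b` from those cells. It inspects the code points of `s` with `ord` and decodes `b` back into text:

```python
s = 'café'
b = s.encode('utf_8')

# each character is identified by its Unicode code point
code_points = [ord(ch) for ch in s]
print(code_points)             # [99, 97, 102, 233]
print(f"U+{ord('é'):04X}")     # U+00E9

# decoding the bytes with the same codec recovers the original str
print(b.decode('utf_8') == s)  # True

# the same str can also be written with a \u escape for the code point
print('caf\u00e9' == s)        # True
```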

There are two basic built-in types for binary sequences: the immutable `bytes` type and the mutable `bytearray`. Each item in both is an integer from 0 to 255, while a slice of either is a binary sequence of the same type.

In [18]:
cafe = bytes('café', encoding='utf_8')
cafe

b'caf\xc3\xa9'

In [19]:
len(cafe)

5

In [20]:
cafe[0]

99

In [21]:
for byte in cafe:
    print(byte)

99
97
102
195
169


In [22]:
cafe[:1]

b'c'

In [23]:
cafe_arr = bytearray(cafe)
cafe_arr

bytearray(b'caf\xc3\xa9')

In [24]:
cafe_arr[-1:]

bytearray(b'\xa9')

In [28]:
len(cafe_arr)

5
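The cells above show that indexing returns an `int` and slicing returns the same binary type. The remaining difference between the two types is mutability. This short sketch, which redefines `cafe` and `cafe_arr` so that it runs on its own, shows both behaviours:

```python
cafe = bytes('café', encoding='utf_8')
cafe_arr = bytearray(cafe)

# indexing yields an int; slicing yields a sequence of the same type
print(cafe[0], cafe[:1])              # 99 b'c'
print(type(cafe_arr[-1:]).__name__)   # bytearray

# bytes is immutable: item assignment raises TypeError
try:
    cafe[0] = 67
except TypeError as exc:
    print('bytes:', exc)

# bytearray is mutable: items can be replaced in place
cafe_arr[0] = ord('C')
print(cafe_arr)                       # bytearray(b'Caf\xc3\xa9')
```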

Both `bytes` and `bytearray` support every `str` method except those that do formatting (`format`, `format_map`) and a few others that depend on Unicode data, including `casefold`, `isdecimal`, `isidentifier`, `isnumeric`, `isprintable`, and `encode`. This means that you can use familiar string methods like `endswith`, `replace`, `strip`, `translate`, `upper`, and dozens of others with binary sequences, as long as you pass `bytes` and not `str` arguments. In addition, the regular expression functions in the `re` module also work on binary sequences, if the regex is compiled from a binary sequence instead of a `str`.
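Here is a minimal sketch of these rules. The byte string `data` is made up for illustration. Note that a bytes pattern's `\w` only matches ASCII letters, digits, and `_`, so the two UTF-8 bytes of `é` are not part of any match:

```python
import re

data = b'  caf\xc3\xa9 au lait  '   # UTF-8 bytes for '  café au lait  '

# familiar str methods work on bytes, with bytes arguments
print(data.strip().upper())            # b'CAF\xc3\xa9 AU LAIT'
print(data.strip().endswith(b'lait'))  # True
print(data.replace(b'lait', b'the'))   # b'  caf\xc3\xa9 au the  '

# passing str arguments raises TypeError
try:
    data.replace('lait', 'the')
except TypeError as exc:
    print(exc)

# a pattern compiled from bytes works on bytes; for bytes patterns,
# \w matches only ASCII letters, digits and '_'
words = re.findall(rb'\w+', data)
print(words)                           # [b'caf', b'au', b'lait']

# a str pattern cannot be used on bytes
try:
    re.findall(r'\w+', data)
except TypeError as exc:
    print(exc)
```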

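A `bytes` or `bytearray` object can be built in several ways. You can call the constructor with a `str` and an `encoding` keyword argument, with an iterable of integers from 0 to 255, or with an object that implements the buffer protocol (such as `array.array`). You can also call class methods such as `bytes.fromhex()`.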
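These construction routes can be compared side by side. The sketch below builds the same five UTF-8 bytes of `'café'` in three ways: from a `str`, from a list of integers, and from a hex string. It then uses `bytes.hex()` to go back to hex. The buffer-protocol route is shown in the `array` cell that follows.

```python
from_str = bytes('café', encoding='utf_8')   # str plus an encoding
from_ints = bytes([99, 97, 102, 195, 169])    # iterable of ints in range(256)
from_hex = bytes.fromhex('63 61 66 c3 a9')    # class method: hex digit pairs, spaces ignored

print(from_str == from_ints == from_hex)      # True
print(from_str.hex())                         # 636166c3a9
print(bytearray.fromhex('c3 a9'))             # bytearray(b'\xc3\xa9')

# integers outside 0..255 are rejected
try:
    bytes([256])
except ValueError as exc:
    print(exc)
```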

In [31]:
import array
# 'h' denotes short integers
nums = array.array('h', [-2, -1, 0, 1, 2])
octets = bytes(nums)
octets

b'\xfe\xff\xff\xff\x00\x00\x01\x00\x02\x00'

In [30]:
len(octets)

10
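`bytes(nums)` copies the array's raw machine representation. As a result, the order of the two bytes inside each `'h'` item follows the platform's native byte order, and the output above is little-endian (`-2` became `\xfe\xff`). Here is a sketch using `struct.pack` to make the byte order explicit instead. It assumes the usual 2-byte C `short` for `nums`:

```python
import array
import struct
import sys

nums = array.array('h', [-2, -1, 0, 1, 2])
octets = bytes(nums)   # raw copy in the machine's native byte order

print(nums.itemsize)   # bytes per 'h' item (2 on common platforms)
print(sys.byteorder)   # 'little' or 'big', depending on the platform

# struct pins the byte order: '<' little-endian, '>' big-endian, 'h' signed short
little = struct.pack('<5h', *nums)
big = struct.pack('>5h', *nums)
print(little)          # b'\xfe\xff\xff\xff\x00\x00\x01\x00\x02\x00'
print(big)             # b'\xff\xfe\xff\xff\x00\x00\x00\x01\x00\x02'

# the native copy matches whichever explicit order the platform uses
print(octets == (little if sys.byteorder == 'little' else big))
```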

Creating a `bytes` or `bytearray` object from any buffer-like source always copies the bytes. In contrast, `memoryview` objects let you share memory between binary data structures without copying. To extract structured information from binary data, use the `struct` module.
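The difference between copying and sharing is easy to observe with a mutable `bytearray`. This is a minimal sketch, and `buf`, `snapshot`, and `view` are illustrative names:

```python
buf = bytearray(b'cafe')

snapshot = bytes(buf)     # the constructor copies the data
view = memoryview(buf)    # the memoryview shares buf's memory

buf[0] = ord('C')         # change the original buffer

print(snapshot)           # b'cafe'  (the copy is unaffected)
print(view.tobytes())     # b'Cafe'  (the view sees the change)

# writing through a view of a mutable buffer modifies the original
view[1] = ord('A')
print(buf)                # bytearray(b'CAfe')

# release the view when done; while it is active, buf cannot be resized
view.release()
buf.extend(b'!')          # resizing works again after release
print(buf)                # bytearray(b'CAfe!')
```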

To show the power of `struct` and `memoryview`, let's look at an example where we parse the header of a GIF file.

In [33]:
import struct
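# '<': little-endian; '3s3s': two 3-byte sequences (signature, version);
# 'HH': two unsigned shorts (image width and height)
fmt = '<3s3sHH'
with open('spacehelmet.gif', 'rb') as fp:
    img = memoryview(fp.read())

# slicing a memoryview returns another memoryview; no bytes are copied
header = img[:10]
bytes(header)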

b'GIF89a\xf4\x01\xf4\x01'

In [34]:
struct.unpack(fmt, header)

(b'GIF', b'89a', 500, 500)

In [35]:
# drop the references so the memory held by the memoryviews can be released
del header
del img
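If `spacehelmet.gif` is not at hand, the same parsing can be tried on a synthetic header. This sketch builds the 10 header bytes with `struct.pack` and then unpacks them through a `memoryview`. The 500 x 500 size simply mirrors the output above and is not read from any file. It also shows `struct.unpack_from`, which reads at an offset without slicing:

```python
import struct

fmt = '<3s3sHH'   # same layout as the GIF header format above

# synthetic header bytes, built here rather than read from a file
data = struct.pack(fmt, b'GIF', b'89a', 500, 500)
print(data)                    # b'GIF89a\xf4\x01\xf4\x01'
print(struct.calcsize(fmt))    # 10

# the with block releases the memoryview automatically
with memoryview(data) as img:
    print(struct.unpack(fmt, img[:10]))        # (b'GIF', b'89a', 500, 500)
    # unpack_from reads width and height starting at byte offset 6
    print(struct.unpack_from('<HH', img, 6))   # (500, 500)
```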

### Encoder/Decoder

Python comes with more than 100 codecs (encoders/decoders) for converting text to bytes and back, covering the many encodings commonly seen in the wild.

In [36]:
for codec in ['latin_1', 'utf_8', 'utf_16']:
    print(codec, 'El Niño'.encode(codec), sep='\t')

latin_1	b'El Ni\xf1o'
utf_8	b'El Ni\xc3\xb1o'
utf_16	b'\xff\xfeE\x00l\x00 \x00N\x00i\x00\xf1\x00o\x00'
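Codec names are looked up case-insensitively and have aliases, and `codecs.lookup` shows which codec a name resolves to. The short sketch below does that lookup. It also checks the encode/decode round trip for the codecs in the cell above and compares the encoded lengths. The UTF-16 result includes the 2-byte byte order mark (`\xff\xfe`) seen in the output:

```python
import codecs

# different spellings resolve to the same codec
for alias in ['utf_8', 'utf8', 'utf-8', 'U8']:
    print(alias, '->', codecs.lookup(alias).name)   # each resolves to 'utf-8'

text = 'El Niño'
for codec in ['latin_1', 'utf_8', 'utf_16']:
    encoded = text.encode(codec)
    # decoding with the same codec restores the original text
    assert encoded.decode(codec) == text
    print(codec, len(encoded))   # 7, 8 and 16 bytes respectively
```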


### Problems you might encounter

When a conversion fails, Python raises `UnicodeEncodeError` (when converting a `str` to a binary sequence) or `UnicodeDecodeError` (when reading a binary sequence into a `str`). We will look into how to handle these.
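As a preview, this sketch triggers both errors. It also shows the `errors` argument of `str.encode` and `bytes.decode`, which selects a fallback strategy (`'ignore'`, `'replace'`, and `'xmlcharrefreplace'` are built-in handlers). The sample strings are made up for illustration:

```python
city = 'São Paulo'

# 'ã' has no representation in ASCII, so encoding fails
try:
    city.encode('ascii')
except UnicodeEncodeError as exc:
    print(type(exc).__name__, exc.start, exc.end)   # UnicodeEncodeError 1 2

# the errors argument selects a fallback strategy
print(city.encode('ascii', errors='ignore'))             # b'So Paulo'
print(city.encode('ascii', errors='replace'))            # b'S?o Paulo'
print(city.encode('ascii', errors='xmlcharrefreplace'))  # b'S&#227;o Paulo'

# b'\xe9' is 'é' in latin_1 but is not valid UTF-8 here
octets = b'Montr\xe9al'
try:
    octets.decode('utf_8')
except UnicodeDecodeError as exc:
    print(type(exc).__name__, exc.start)             # UnicodeDecodeError 5

# 'replace' substitutes U+FFFD for the undecodable byte
print(octets.decode('utf_8', errors='replace'))
print(octets.decode('latin_1'))                      # Montréal
```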