#### PYTHON FUNDAMENTALS | FROM BASICS TO ADVANCED ► CHAPTER 3 ► DATA TYPES
---

This notebook will cover Python **built-in types**. Among others, using built-in types offers the following advantages:
* they make programs easy to write (no need to re-invent the wheel);
* they are often more efficient than custom ones you might be tempted to create (they are actually implemented in `C`);
* you can use them as base types, extending their behaviour to create custom types.

We can divide built-in types into:

* **numeric types**: int, float, complex, bool (a subtype of int), plus `Fraction` and `Decimal` from the standard library modules `fractions` and `decimal`
* **sequence types**: str, list, tuple, ...
* **mapping types**: dict
* **others**: iterator, set, ...

Reference: https://docs.python.org/3.4/library/stdtypes.html

In this notebook we will see how to create such objects, which operations they support, and some language idioms. But first, it is important to introduce two pairs of notions: **literals/constructors** and **mutable/immutable**.

### I. Literals and constructors
To create an object of a built-in type, there are fundamentally two approaches:
* through a **literal**, which is a succinct and easily visible way to create a new value/object;
* through a **constructor**, which is a function/method (we will cover the topic later on) that produces an object of a certain type.

For example:

In [353]:
# Create a new integer object through a literal
2

2

In [354]:
# Create a new integer object through a constructor
int(2)

2

In [355]:
# List literal
[1, 2, 3]

[1, 2, 3]

In [356]:
# List constructor
list([1, 2, 3])

[1, 2, 3]

You might wonder what the point of such **constructors** is, as **literals** are a far more succinct way to create new objects.

Actually, there is one very useful use case for constructors: **type conversion/coercion**.

In [357]:
# Convert a float into an int
int(3.14)

3

In [358]:
# Convert an int into a float
float(2)

2.0

In [359]:
# Convert a tuple into a list
list((1, 2, 'spam'))

[1, 2, 'spam']

In [360]:
# Convert a string into a float
float('3.14')

3.14
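Conversions are not always lossless, and they can fail. The short sketch below (variable names are illustrative and not part of the original notebook) shows that `int()` truncates rather than rounds, and that a string that does not look like a number raises a `ValueError`:

```python
# int() drops the fractional part (truncation toward zero); it does not round
truncated_pos = int(3.99)
truncated_neg = int(-3.99)
print(truncated_pos, truncated_neg)   # 3 -3

# A string must look like a valid number to be converted
parsed = int('42')
print(parsed)                         # 42

# Otherwise the constructor raises ValueError
try:
    float('spam')
except ValueError as error:
    conversion_error = str(error)
    print('Conversion failed:', conversion_error)
```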

### II. Mutable vs. immutable data types
Simply put, **mutable** objects can change their value while keeping the same **id()** (in CPython, their memory location).

In [361]:
# For instance, let's create a string and assign it to a variable
my_string = 'internet of things'

# And let's print its memory address (in hexadecimal to make it more readable)
hex(id(my_string))

'0x10fbe56a8'

In [362]:
# Now let's concatenate new words to my_string and check the memory address of the result
hex(id(my_string + ' is the inter-networking of physical devices'))

'0x10fbc1dc0'

We see in the example above that "modifying" the original string actually created a new one at a different memory location; the original string is left untouched.

In Python, **immutable** objects include numbers, strings and tuples. Such an object cannot be altered. In contrast, lists, dictionaries, and sets are **mutable** — they can be changed in place freely.

In [363]:
# Let's create a list
my_list = [1, 3, 'iot', 3.14]
# And check its memory address
hex(id(my_list))

'0x10fbd8688'

In [364]:
# Now let's append a new list element
my_list.append('spam')
my_list

[1, 3, 'iot', 3.14, 'spam']

In [365]:
# Address of modified list is the same
hex(id(my_list))

'0x10fbd8688'
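By contrast, attempting the same kind of in-place change on an immutable object fails. The following is a minimal sketch with hypothetical variable names; it only illustrates that strings and tuples reject item assignment, and that "changing" them means building a new object:

```python
my_string = 'internet of things'
my_tuple = (1, 2, 'spam')

errors = []
try:
    my_string[0] = 'I'      # strings do not support item assignment
except TypeError as error:
    errors.append(type(error).__name__)

try:
    my_tuple[0] = 99        # neither do tuples
except TypeError as error:
    errors.append(type(error).__name__)

# "Changing" a string really means building a new object
capitalized = 'I' + my_string[1:]
print(errors)        # ['TypeError', 'TypeError']
print(capitalized)   # Internet of things
print(my_string)     # internet of things
```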

You may wonder at this stage what the point of having such immutable objects is. We will answer this question later on in this course when we cover dictionaries, shared references and functions.

### III. Types and type-specific methods
As we saw earlier, objects are just pieces of memory with values and sets of associated operations.

How can we know which operations an object supports without referring to the official documentation or a Python book?

Let's see an example with a `list` object:

In [366]:
# Let's create a list object using a literal
my_list = [1, 4, 'spam']
my_list

[1, 4, 'spam']

Once an object is created, you can access the operations it supports using the following syntax:

```
my_object.operation()
```

`operation` is called a **method**. We will introduce this notion properly when we cover classes and OOP (Object-Oriented Programming).

In [367]:
# For instance 
my_list.append('3.14')
print(my_list)

[1, 4, 'spam', '3.14']


So how do we know all operations/methods supported by a list object?

In [368]:
# Option 1 - use completion: press the "tab" key with the cursor placed right after the "."
# that follows the variable name
# my_list.

In [369]:
# Option 2 - using a Python built-in function named "dir"
dir(my_list)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Using the `dir` function to list the supported operations, you see two distinct groups of methods:

* those following the syntax `__method__`, with a double underscore before and after the name. They are **special methods**: https://docs.python.org/3.4/reference/datamodel.html#special-method-names. We will discuss them when dealing with OOP and classes.

* and all the others, which are the ones we are interested in right now.
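To make the distinction between the two groups concrete, here is a small sketch (variable names are illustrative). It filters out the names starting with an underscore, using a list comprehension that is covered later in the course, and shows that operators and built-in functions rely on the special methods behind the scenes:

```python
my_list = [1, 4, 'spam']

# Keep only the names that do not start with an underscore
public_methods = [name for name in dir(my_list) if not name.startswith('_')]
print(public_methods)

# Special methods are what operators and built-in functions call behind the scenes
same_length = len(my_list) == my_list.__len__()
same_concat = (my_list + [3.14]) == my_list.__add__([3.14])
same_membership = ('spam' in my_list) == my_list.__contains__('spam')
print(same_length, same_concat, same_membership)   # True True True
```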

In [370]:
# Now if we want to know what the "pop" method does
help(my_list.pop)

Help on built-in function pop:

pop(...) method of builtins.list instance
    L.pop([index]) -> item -- remove and return item at index (default last).
    Raises IndexError if list is empty or index is out of range.



In [371]:
#or
?my_list.pop

We see that the `pop` method will remove and return by default the last item of the list. Let's try it:

In [372]:
my_list.pop()

'3.14'

I invite you to explore operations for the various data types.

### OVERVIEW OF MAIN DATA TYPES
___

### IV. Numeric data types
#### IV.1 Integer `int`
In Python version 2.x, we had two types of integers:

* `int` for integers;
* `long` for integers of unlimited length (the only limit being the size of your machine's memory).

Since Python 3.x versions, there is no more distinction between `int` and `long` types https://www.python.org/dev/peps/pep-0237/. Python will manage the required length for you.

In [373]:
# An arbitrarily long integer will be of type `int`, as shown below:
type(97865098709874234234098723423509872342340987234234098723598072938742340987209384720394870293847)

int

Such an integer would have had type `long` in Python 2.x
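A quick way to convince yourself that Python 3 integers grow as needed is sketched below (the variable names are illustrative). Arithmetic stays exact however large the operands become:

```python
# Integers grow as needed: no overflow, only more memory
big = 2 ** 100
print(big)                # 1267650600228229401496703205376
print(type(big))          # <class 'int'>
print(big.bit_length())   # 101

# Arithmetic stays exact however large the operands are
exact = (big + 1) - big
print(exact)              # 1
```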

#### IV.2 Real numbers `float`

In [374]:
# Creating a "float" through a literal
f = 4.34567198762
print(f)

4.34567198762


Be aware, though, that floats have limited precision: most real values cannot be represented exactly in binary floating point. For further information:

1. Wikipedia entry on floating-point representation: https://en.wikipedia.org/wiki/Floating-point_arithmetic
2. "What Every Programmer Should Know About Floating-Point Arithmetic" http://floating-point-gui.de/
3. Python docs on that issue: https://docs.python.org/3.4/tutorial/floatingpoint.html

To illustrate that issue see below:

In [375]:
0.1 + 0.2

0.30000000000000004

So in general you should ask yourself whether that really matters in your situation. If it does, you can use the `Decimal` type, which represents decimal values such as `0.1` exactly: https://docs.python.org/3.4/library/decimal.html.

In [376]:
from decimal import Decimal # import the Decimal type from the decimal module

In [377]:
a = Decimal('0.1')
b = Decimal('0.2')
float(a + b)

0.3
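The difference shows up most clearly in equality tests. The sketch below compares three options; note that `math.isclose` requires Python 3.5 or later, which is an assumption beyond the 3.4 documentation linked above:

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so equality fails
float_equal = (0.1 + 0.2 == 0.3)
print(float_equal)        # False

# Option 1: compare floats with a tolerance
close_enough = math.isclose(0.1 + 0.2, 0.3)
print(close_enough)       # True

# Option 2: use Decimal, built from strings
decimal_equal = (Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))
print(decimal_equal)      # True

# Building a Decimal from a float carries the binary error along
from_float_equal = (Decimal(0.1) + Decimal(0.2) == Decimal('0.3'))
print(from_float_equal)   # False
```

This is why the `Decimal` objects above are created from strings rather than from float literals.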

#### IV.3 Booleans `bool`

In [378]:
# A boolean is simply the result of a test: either True or False
1 < 2

True

In [379]:
1 > 3

False

In [380]:
# Automatic conversion of a boolean to an integer in an expression evaluation
(3 > 2) + 1

2
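This conversion works because `bool` is a subtype of `int`. A small sketch of what that implies in practice (the `readings` list is made-up sample data):

```python
# bool is a subclass of int: True behaves like 1 and False like 0
print(isinstance(True, int))   # True
print(True == 1, False == 0)   # True True

# Handy consequence: summing booleans counts how many tests passed
readings = [12, 7, 30, 18]
above_ten = sum(value > 10 for value in readings)
print(above_ten)               # 3
```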

#### IV.4 Operations
You have obviously access to all basic operations on numbers such as:

In [381]:
# Addition
3 + 4.6798

7.6798

In [382]:
# Multiplication
4 * 7

28

And so on...

However, be aware that in Python 2.x a division between two integers such as `5/3` would yield an integer, in that case `1`. To get a "normal" division behaviour, you had to explicitly make the denominator a `float`:

```
5 / 3.
or
5 / float(3)
```

That's no longer the case in Python 3.x:

In [383]:
5 / 3

1.6666666666666667

In [384]:
# If you want a floor division in Python 3.x
5 // 3

1
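Floor division deserves a closer look with negative numbers, where it differs from simple truncation. A brief sketch, using only built-in operators and `divmod`:

```python
# Floor division rounds toward negative infinity, not toward zero
print(5 // 3)     # 1
print(-5 // 3)    # -2

# The remainder is consistent with it: a == (a // b) * b + (a % b)
print(5 % 3)      # 2
print(-5 % 3)     # 1

# divmod returns quotient and remainder in one call
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2
```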

### V. Sequence data types
Sequences are containers, i.e. objects that hold an arbitrary number of other objects. Generally, containers provide a way to access the contained objects and to iterate over them.

Examples of containers include tuple, list, set, dict; these are the built-in containers. More container types are available in the `collections` module.

**IMPORTANT**: Sequences are indexed from **0**: the first element has index 0 and the **last** one has index `len(sequence) - 1`.


#### V.1 String `str`

Strings are **immutable**. They are our first example of what Python calls a sequence: a positionally ordered collection of other objects.

In [385]:
# A string literal
my_string = 'machine to machine communication'

`str` objects support 44 different methods in the Python version used here. You are free to explore them one by one.

In [386]:
# Warning: you are not supposed to understand the expression below yet, but we will cover it soon!
#          This is called a list comprehension, if you are curious.
[item for item in dir(my_string) if '__' not in item] 

['capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [387]:
# Some examples
# Convert to upper case
my_string.upper()

'MACHINE TO MACHINE COMMUNICATION'

In [388]:
# Index of first "t" character - starting from 0
my_string.index('t')

8

In [389]:
# Count of "m" characters
my_string.count('m')

# etc...

4

We've seen so far that we can operate on objects with their associated methods, but we can use **operators** as well:

In [390]:
# Concatenation operator
my_string + "quicker and easier"

'machine to machine communicationquicker and easier'

In [391]:
# Membership testing: is substring "machine" in my_string?
'machine' in my_string

True

Lastly, we can operate on objects using **built-in functions** such as dir, len, ...
Full list of built-in functions: https://docs.python.org/3.4/library/functions.html

In [392]:
# Get the length (number of elements) of a string
len(my_string)

32

So to summarize, once you have created an object, you can operate on it via:
* **methods**: `upper, index, count, ...` and many others;
* **operators**: `+, in, ...`
* and **built-in functions**: `dir, len, ...`

Operators and built-in functions will operate differently according to data types.
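To illustrate that last point, the sketch below applies the same operators and the same built-in function to different data types (all values are made up for illustration):

```python
# '+' adds numbers but concatenates sequences
print(2 + 3)                 # 5
print('io' + 't')            # iot
print([1, 2] + [3])          # [1, 2, 3]

# '*' multiplies numbers but repeats sequences
print(3 * 2)                 # 6
print('ab' * 3)              # ababab

# 'in' looks for a substring in a str, but for an element in a list
in_string = 'net' in 'internet'
in_list = 'net' in ['internet', 'iot']
print(in_string, in_list)    # True False

# len() counts characters, elements or keys depending on the type
lengths = (len('iot'), len([1, 2]), len({'a': 1}))
print(lengths)               # (3, 2, 1)
```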

#### V.2 Slicing
This is a very important notion in Python. We will explain it using `str`, but it applies to all other types of sequences.

In [393]:
# Let's create a fresh new string
s = 'internet of things'

In [394]:
# You can access the first element of the string using this notation
s[0]

'i'

In [395]:
# or the third one
s[2]

't'

In [396]:
# Extracting first to third element
s[0:3] 

'int'

In [397]:
# From element index 4 included to element index 12 excluded
s[4:12]

'rnet of '

In [398]:
# From beginning to index 3 excluded
s[:3]

'int'

In [399]:
# From index 4 to the end
s[4:]

'rnet of things'

In [400]:
# From beginning to end (shallow copy)
s[::]

'internet of things'

In [401]:
# Slicing specifying index steps
s[5:20:2]

'nto hns'

In [402]:
# From beginning to end by steps of 2 elements
s[::2]

'itre ftig'

In [403]:
# From beginning to element index 8 (excluded) by steps of 3 elements
s[:8:3]

'iee'

In [404]:
# From index 2 to end by steps of 3 elements
s[2::3]

'tn  is'

In [405]:
# from element index -10 (from the end) to -7 (excluded)
s[-10:-7]

' of'

In [406]:
# From beginning to index -3 from the end (excluded)
s[:-3]

'internet of thi'

In [407]:
# From start to end but in reverse order (step= -1). Hence from right to left.
s[::-1]

'sgniht fo tenretni'

In [408]:
# From index 2 to 0 (excluded) in reverse order
s[2:0:-1]

'tn'

In [409]:
# From index 2 to beginning (included) in reverse order
s[2::-1]

'tni'

By now you should get the idea!
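Since slicing is not specific to strings, here is a short sketch applying the same notation to a list and a tuple (both made up for illustration). It also shows that out-of-range slice bounds are clipped rather than raising an error, and that a slice can be stored in a `slice` object and reused:

```python
numbers = [10, 20, 30, 40, 50, 60]      # a list
letters = ('a', 'b', 'c', 'd', 'e')     # a tuple

print(numbers[1:4])     # [20, 30, 40]
print(numbers[::-1])    # [60, 50, 40, 30, 20, 10]
print(letters[::2])     # ('a', 'c', 'e')

# Slices never raise IndexError: out-of-range bounds are clipped
print(numbers[4:100])   # [50, 60]

# A slice can also be stored in a slice object and reused
first_three = slice(0, 3)
print(numbers[first_three])   # [10, 20, 30]
print(letters[first_three])   # ('a', 'b', 'c')
```

Note that slicing a list returns a list and slicing a tuple returns a tuple: the result has the same type as the sequence being sliced.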

#### V.3 List
A `list` is a sequence whose elements can be of various types. Lists are **mutable**, i.e. you can modify them in place (no need to make a copy of the original list).

As a sequence, a `list` supports the operations we saw for sequences, such as membership test, length, and many others. Refer to the official documentation for more information:
* https://docs.python.org/3.4/tutorial/introduction.html#lists
* https://docs.python.org/3.4/tutorial/datastructures.html#more-on-lists

In [410]:
# To create an empty list
a = []

In [411]:
type(a)

list

In [412]:
# List elements can be of various type
i = 4
a = [i, 'spam', 3.2, True]
print(a)

[4, 'spam', 3.2, True]


In [413]:
# You can access and slice list elements as with str
a[1]

'spam'

In [414]:
a[1:3]

['spam', 3.2]

**But the big difference** is that lists are mutable: you can modify them **in-place**.

In [415]:
# Accessing + updating an element (in-place)
a[0] = a[0] + 2
a

[6, 'spam', 3.2, True]

In [416]:
# Replace the element at index 1 with the elements of a new list
a[1:2] = ['egg', 'spam'] # The slice [1:2] is removed and the new elements are inserted in its place
print(a)

[6, 'egg', 'spam', 3.2, True]


In [417]:
# Delete elements from index 1 to 3 (excluded)
a[1:3] = []
print(a)

[6, 3.2, True]


In [418]:
# Let's take a quick look at available methods
[item for item in dir(a) if '__' not in item] 

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [419]:
# Append a new element
a.append('ip_address')
a

[6, 3.2, True, 'ip_address']

In [420]:
# Remove the last element
a.pop()
print(a)

[6, 3.2, True]


Once again, I will let you explore the available methods.
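As a starting point for that exploration, here is a hedged sketch of a few of the other list methods. The point to notice is that they modify the list in place and return `None` (the `snapshot` variable is only there to keep a copy for inspection):

```python
a = [6, 3.2, True]

a.insert(1, 'spam')        # insert 'spam' before index 1
a.extend(['egg', 'egg'])   # append every element of another iterable
print(a)                   # [6, 'spam', 3.2, True, 'egg', 'egg']

print(a.count('egg'))      # 2
print(a.index('spam'))     # 1

a.remove('egg')            # remove the first matching element only
a.reverse()                # reverse in place
snapshot = a.copy()
print(snapshot)            # ['egg', True, 3.2, 'spam', 6]

# In-place methods return None
result = a.clear()
print(result, a)           # None []
```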

#### Small trick to manipulate `str` as if they were mutable.

In [422]:
# Let's create our immutable string
s = 'internet of things'

In [424]:
# First convert your string to a mutable list
s_as_list = s.split() # by default the str is split on whitespace, but another separator can be passed
s_as_list

['internet', 'of', 'things']

In [426]:
# Now you can update any elements
s_as_list[1] = 'for'
print(s_as_list)

['internet', 'for', 'things']


In [430]:
# Once you have finished processing your list you can convert it back to a `str`
processed_str = " ".join(s_as_list)
print(processed_str)

internet for things


This is interesting when you need to perform a lot of changes to a string. Remember that each time you modify a string you actually create a new one in memory, which is not efficient. Instead, if you convert your string to a list, perform your modifications and then convert it back to a string, you create only 2 new objects: the temporary list and the final result as a string.
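The same trick works at the character level. The sketch below is a variation on the example above, not part of the original notebook: it uses `list(s)` instead of `s.split()` to get one list element per character, and an empty separator to join them back:

```python
s = 'internet of things'

# list(s) yields one element per character (s.split() yields one per word)
chars = list(s)
chars[0] = 'I'        # in-place changes on the list
chars[12] = 'T'
chars.append('!')

# An empty separator glues the characters back together
result = ''.join(chars)
print(result)         # Internet of Things!
print(s)              # internet of things
```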

#### V.4 Tuple
A `tuple`, like a `list`, is a sequence. Like lists, tuples can contain elements of various types, but they are **immutable**: they cannot be modified https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences.

Why? We will discuss it when we introduce the `dictionary` type and `shared references`.


In [437]:
# To create an empty tuple
t = ()
print(t)

()


In [438]:
type(t)

tuple

In [440]:
# To create a tuple with a single element
t = (4,)
print(t)

(4,)


Notice the trailing comma in `(4,)`. Without it, you end up with an integer: Python just evaluates the expression within the parentheses, which here is an integer.

In [441]:
# If you don't specify the final comma
t = (4)
print(t)

4


In [442]:
type(t)

int

In [443]:
# But if you have several elements, the final comma is not required.
t = (4, 'spam', 4.5, True)
print(t)

(4, 'spam', 4.5, True)


In [444]:
# Actually you don't need parentheses around tuples
t = 4,
print(t)

(4,)


In [446]:
# The same for multiple elements
t = 4, 'spam', 4.3, True
print(t)

(4, 'spam', 4.3, True)


We will see why this is interesting when we cover the notion of tuple unpacking.
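As a brief preview, and only as a sketch with made-up names, this is what packing and unpacking look like:

```python
# Packing: the commas build the tuple, parentheses are optional
point = 4, 5

# Unpacking: one name per element on the left-hand side
x, y = point
print(x, y)          # 4 5

# Swapping two values without a temporary variable
x, y = y, x
print(x, y)          # 5 4
```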

### VI.  Mapping data types
Sequence-like data types are very efficient at accessing, modifying and deleting elements as long as you know the index. However, they are not well suited to membership tests. To illustrate that, let's consider the following situation:

In [469]:
# Define a function testing if an element is a member of a sequence (we do it 5 times for the sake of argument)
def f(my_list):
    for i in range(5):
        print('x' in my_list)

In [470]:
# Check if 'x' belongs to range(5), i.e. 0, 1, 2, 3, 4
f(range(5))

False
False
False
False
False


This is pretty fast, but now let's do the same with a sequence containing the first 50,000,000 integers.

In [479]:
f(range(50000000))

False
False
False
False
False


Now the membership test takes much more time. This is because Python needs to go through the sequence from beginning to end each time. We will see below how the **hash table**, and its Python implementation `dict`, removes that limitation.

The second limitation of sequence types is that elements can only be accessed by integer index. For instance, it would be convenient to store and access phone numbers using this syntax: `phone_numbers['alex']`. With a sequence, this raises an error.

The Python `dict` data type, an implementation of the **hash table**, addresses both limitations.
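The cost difference can be measured with the standard `timeit` module. The following is a rough sketch, not a rigorous benchmark: the size `n` is an arbitrary choice kept small so that it runs quickly, and the timings depend on your machine.

```python
import timeit

n = 100000                          # arbitrary size, kept small on purpose
as_list = list(range(n))
as_dict = dict.fromkeys(as_list)    # same values, used as dictionary keys

# 'x' is absent, so the list must be scanned from start to end on every test
in_list = 'x' in as_list
in_dict = 'x' in as_dict
print(in_list, in_dict)             # False False

# Time 20 membership tests on each container
list_time = timeit.timeit(lambda: 'x' in as_list, number=20)
dict_time = timeit.timeit(lambda: 'x' in as_dict, number=20)
print('list: {:.6f} s, dict: {:.6f} s'.format(list_time, dict_time))
```

On a typical machine, the dictionary lookup should be faster by several orders of magnitude, and the gap widens as `n` grows.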

#### VI.1 Hash table
In computing, a hash table (hash map) is a data structure which implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found https://en.wikipedia.org/wiki/Hash_table.

![Hashing table](img/hash_table_wiki.png)

Let's consider a list of pairs, each pair containing a name and an associated phone number:

In [495]:
phone_book = [('John Smith', '521-8976'), ('Lisa Smith', '521-1234'), ('Sandra Dee','521-9655')]
print(phone_book)

[('John Smith', '521-8976'), ('Lisa Smith', '521-1234'), ('Sandra Dee', '521-9655')]


And you want to be able to look up a phone number by name. A hash function assigns each **key** (a person's name) to a bucket. In the specific case of Python, the result of `hash()` is used as a starting point; it is not the definitive position.

In [496]:
hash('John Smith')

-825961714896365170

That hash value is then used as a basis to compute an index into the "buckets" where the associated phone number **value** is stored. Hence, you can retrieve any phone number (or perform a membership test) in constant time on average (roughly the time required to compute the hash value), as opposed to a list, which must be scanned.

In reality, this is slightly more subtle, but you get the point. For further detail, consult either the Wikipedia entry mentioned above or this blog post: http://interactivepython.org/runestone/static/pythonds/SortSearch/Hashing.html
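To make the idea of buckets tangible, here is a deliberately naive teaching sketch. The class `ToyHashTable` and its methods are hypothetical names invented for this illustration; this is **not** how Python's `dict` is actually implemented. Keys that land in the same bucket are simply chained in a small list.

```python
class ToyHashTable:
    """Naive hash table with chaining, for illustration only."""

    def __init__(self, size=8):
        # Each bucket is a list of (key, value) pairs
        self.buckets = [[] for _ in range(size)]

    def _bucket_for(self, key):
        # hash() gives an integer; the modulo maps it to a bucket index
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for position, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[position] = (key, value)   # key already present: update
                return
        bucket.append((key, value))               # new key (or collision): chain

    def get(self, key):
        # Only one bucket is searched, whatever the total number of entries
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
for name, number in [('John Smith', '521-8976'), ('Lisa Smith', '521-1234'),
                     ('Sandra Dee', '521-9655')]:
    table.put(name, number)

print(table.get('Lisa Smith'))   # 521-1234
```

A real implementation also resizes the bucket array as entries are added, so that buckets stay short and lookups stay fast.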

#### VI.2 Python dictionaries `dict`

In [498]:
# To create a dictionary using a literal:
my_dict = {'John Smith':'521-8976', 'Lisa Smith': '521-1234', 'Sandra Dee': '521-9655'}
print(my_dict)

{'Lisa Smith': '521-1234', 'John Smith': '521-8976', 'Sandra Dee': '521-9655'}


In [499]:
# Or by using the 'dict' constructor and a list of pairs:
phone_book = [('John Smith', '521-8976'), ('Lisa Smith', '521-1234'), ('Sandra Dee','521-9655')]
my_dict = dict(phone_book)
print(my_dict)

{'Lisa Smith': '521-1234', 'John Smith': '521-8976', 'Sandra Dee': '521-9655'}


Four important aspects of dictionaries:
1. they are composed of a series of **keys** and **values**, in our example names and phone numbers respectively;
2. **keys** must be **immutable** (more precisely, hashable);
3. but the dictionary itself is a **mutable** data type;
4. in the Python 3.4 used here they are not ordered, as opposed to sequences (the `collections` module provides an ordered version, `OrderedDict`); since Python 3.7, a `dict` preserves insertion order.
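The constraint on keys can be checked directly. In this sketch (the `locations` dictionary and its coordinates are made-up sample data), a tuple is accepted as a key while a list is rejected:

```python
locations = {}

# A tuple of numbers is immutable and hashable, so it can serve as a key
locations[(48.85, 2.35)] = 'Paris'

# A list is mutable, hence unhashable: using it as a key fails
try:
    locations[[51.51, -0.13]] = 'London'
except TypeError as error:
    key_error = str(error)
    print('Rejected key:', key_error)

# The dictionary itself is mutable: values can be changed in place
locations[(48.85, 2.35)] = 'Paris, France'
print(locations)    # {(48.85, 2.35): 'Paris, France'}
```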

In [500]:
# Let's add a new key, value pair
my_dict['Peter Dee'] = '521-122'
my_dict

{'John Smith': '521-8976',
 'Lisa Smith': '521-1234',
 'Peter Dee': '521-122',
 'Sandra Dee': '521-9655'}

In [501]:
# Let's remove a key, value pair
del my_dict['Peter Dee']
my_dict

{'John Smith': '521-8976', 'Lisa Smith': '521-1234', 'Sandra Dee': '521-9655'}

In [504]:
# Let's update a value
my_dict['John Smith'] = '521-8999'
my_dict

{'John Smith': '521-8999', 'Lisa Smith': '521-1234', 'Sandra Dee': '521-9655'}

In [505]:
# To perform a membership test
'William' in my_dict

False

In [506]:
# or
'William' not in my_dict

True

In [509]:
# Get all keys as a list
list(my_dict.keys())

['Lisa Smith', 'John Smith', 'Sandra Dee']

In [510]:
# Get all values as a list
list(my_dict.values())

['521-1234', '521-8999', '521-9655']

In [512]:
# Get all (key, value) pairs as a list
list(my_dict.items())

[('Lisa Smith', '521-1234'),
 ('John Smith', '521-8999'),
 ('Sandra Dee', '521-9655')]
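Two idioms come up constantly when working with dictionaries: looking up a key that may be missing, and looping over key/value pairs. A minimal sketch, reusing the phone book values from above:

```python
my_dict = {'John Smith': '521-8999', 'Lisa Smith': '521-1234', 'Sandra Dee': '521-9655'}

# my_dict['William'] would raise KeyError; get() returns a default instead
missing = my_dict.get('William')
fallback = my_dict.get('William', 'unknown')
print(missing, fallback)      # None unknown

# items() yields (key, value) pairs, convenient in a for loop
lines = []
for name, number in my_dict.items():
    lines.append('{}: {}'.format(name, number))
print(sorted(lines))
```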

### VII. Others
Finally, in this last section, we will introduce two important additional data types, namely `sets` and `iterators`.

#### VII.1 Python sets `set`
**Sets** are unordered collections of unique, immutable (hashable) objects. They are mainly useful when you want to:
1. perform membership testing (sets are optimized for that operation);
2. perform set operations (in the mathematical sense) such as union, intersection, difference, ...

In [535]:
# To create a set from a literal
s = {1, 2, 3, 'spam'}
s

{1, 2, 3, 'spam'}

In [543]:
# From constructor
l = [1, 2, 3, 3, 'spam']
s = set(l)
s

{1, 2, 3, 'spam'}

Note that duplicated values are removed.

In [544]:
# Add an element
s.add("egg")
s

{1, 2, 3, 'spam', 'egg'}

In [545]:
# Remove an element
s.remove(2)
s

{1, 3, 'spam', 'egg'}

In [546]:
# Add several elements
s.update([5, 6])
s

{1, 3, 5, 6, 'egg', 'spam'}

In [547]:
# Test membership
5 in s

True

Sets also support a series of mathematical set operations:

In [548]:
# Let's create a second set
s2 = set([1, 5, 'spam'])

In [549]:
# Difference
s - s2

{'egg', 3, 6}

In [550]:
# Union
s | s2 

{1, 'egg', 3, 5, 6, 'spam'}

In [551]:
# Intersection
s & s2

{1, 5, 'spam'}
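Each of these operators has an equivalent method, and a few more operations are available. The sketch below recreates the two sets with the values shown above so that it can run on its own:

```python
s = {1, 3, 5, 6, 'egg', 'spam'}
s2 = {1, 5, 'spam'}

# Each operator has an equivalent method
print(s.difference(s2) == s - s2)      # True
print(s.union(s2) == s | s2)           # True
print(s.intersection(s2) == s & s2)    # True

# Symmetric difference: elements present in exactly one of the two sets
only_in_one = s ^ {1, 'bacon'}
print(only_in_one == {3, 5, 6, 'egg', 'spam', 'bacon'})   # True

# Subset test
is_subset = s2 <= s
print(is_subset)                       # True

# Common idiom: count distinct values in a list
unique_count = len(set([1, 2, 3, 3, 'spam', 'spam']))
print(unique_count)                    # 4
```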

#### VII.2 Iterators and iterables
In Latin, **iter** relates to notions of route, trip, step, ... 

The concept of **iterable objects** is relatively recent in Python, but it has come to permeate the language’s design. It’s essentially a **generalization** of the notion of sequences. An object is considered **iterable** if it is either a physically stored sequence, or an object that produces one result at a time in the context of an iteration tool like a for loop. 

Examples of **iterables** include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, and file objects. Let's take an example:

In [600]:
# Let's create a list - which is iterable: we can traverse it element by element
my_list = [1, 2, 3, 'spam']
print(my_list)
type(my_list)

[1, 2, 3, 'spam']


list

In [601]:
# We can create an iterator object from an iterable in two ways
# 1. Using the built-in function 'iter'
iter(my_list)

<list_iterator at 0x10fc20ba8>

In [581]:
# 2. Or using the special method '__iter__'

In [602]:
my_list.__iter__()

<list_iterator at 0x10fc16ac8>

Note that these two iterators are different objects: they live at different addresses in memory.

In [603]:
# Now let's create an iterator and keep a reference to it
it = iter(my_list)

In [604]:
# And iterate over it element by element
next(it)

1

In [605]:
next(it)

2

In [606]:
next(it)

3

In [607]:
next(it)

'spam'

In [608]:
next(it)

StopIteration: 

Once you reach the end of the list, you will get an exception. Your iterator is exhausted. To iterate again, you need to create a new iterator.
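The sketch below, with illustrative variable names, shows two related facts: an exhausted iterator stays exhausted, and `next()` accepts an optional default value that is returned instead of raising `StopIteration`:

```python
my_list = [1, 2, 3, 'spam']
it = iter(my_list)

# list() consumes whatever the iterator still has to offer
first_pass = list(it)
second_pass = list(it)       # the iterator is now exhausted
print(first_pass)            # [1, 2, 3, 'spam']
print(second_pass)           # []

# next() accepts a default that is returned instead of raising StopIteration
leftover = next(it, 'nothing left')
print(leftover)              # nothing left

# The list itself is untouched: a fresh iterator starts over
fresh = iter(my_list)
first_again = next(fresh)
print(first_again)           # 1
```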

In [609]:
it = iter(my_list)

In [610]:
# Let's use the special method '__next__' instead; it is equivalent to the built-in function 'next'
it.__next__()

1

In [None]:
# etc, etc, ...

In [631]:
# Experiment: what happens if the list shrinks while an iterator is in use?
l = [1, 2]
it = iter(l)

In [632]:
next(it)

1

In [633]:
del l[1] 

In [634]:
next(it)

StopIteration: 

Using iterators has several advantages:
* they provide a uniform way to run through built-in objects - we will see later on that we can build our own iterable objects as well;
* they can be more memory efficient, since elements can be produced one at a time instead of being stored all at once.
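The memory argument can be checked with `sys.getsizeof`. This is only a rough sketch: `getsizeof` reports the size of the container object itself (for a list, not the integers it refers to), the exact numbers depend on the Python build, and `n` is an arbitrary size.

```python
import sys

n = 100000

as_list = list(range(n))      # every element is stored
as_range = range(n)           # elements are computed on demand
as_iterator = iter(as_list)   # only remembers a position in the list

list_size = sys.getsizeof(as_list)
range_size = sys.getsizeof(as_range)
iterator_size = sys.getsizeof(as_iterator)
print(list_size, range_size, iterator_size)

# Both still produce the same values when iterated
same_total = sum(as_list) == sum(as_range)
print(same_total)             # True
```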

We will see Python statements in the next notebook, but when you use a `for` loop to run through a sequence, Python implicitly creates an iterator from your sequence.

In [635]:
for i in [0, 1, 2, 3]:
    print(i)

0
1
2
3


In [638]:
# is 'translated' into
my_list = [0, 1, 2, 3]
it = my_list.__iter__()

In [639]:
it.__next__()

0

In [640]:
it.__next__()

1

...
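The translation started above can be written out in full. This is a minimal sketch of what a `for` loop roughly does, not the interpreter's actual implementation; the `visited` list is only there to record what happened:

```python
my_list = [0, 1, 2, 3]

# Roughly what "for i in my_list: print(i)" does behind the scenes
visited = []
it = iter(my_list)            # same as my_list.__iter__()
while True:
    try:
        i = next(it)          # same as it.__next__()
    except StopIteration:
        break                 # the loop ends silently once the iterator is exhausted
    print(i)
    visited.append(i)
```

The `StopIteration` exception seen earlier is therefore not an error in normal use: it is the signal the `for` loop relies on to know when to stop.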