# Python: Data Containers

## 1. List
A `list` object stores an ordered collection of items, and each item can be of any data type. Lists are mutable. A list can be defined using square brackets `[]` or the `list()` constructor.

In [1]:
# convert a string to a list
list('python')

['p', 'y', 't', 'h', 'o', 'n']

In [2]:
# storing different data types
['python', 3.14, None]

['python', 3.14, None]

### 1.1. Item addition

The `list.append()` method adds a single object to the end of the list.

In [3]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats.append('leopard')
cats

['jaguar', 'tiger', 'lion', 'panther', 'leopard']

The `list.insert()` method inserts a single object into the list at a given index.

In [4]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats.insert(2, 'puma')
cats

['jaguar', 'tiger', 'puma', 'lion', 'panther']

In [5]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats.insert(2, ['cheetah', 'puma'])
cats

['jaguar', 'tiger', ['cheetah', 'puma'], 'lion', 'panther']

The `list.extend()` method adds each element of the given iterable to the end of the list.

In [6]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats.extend('puma')
cats

['jaguar', 'tiger', 'lion', 'panther', 'p', 'u', 'm', 'a']

In [7]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats.extend(['cheetah', 'puma'])
cats

['jaguar', 'tiger', 'lion', 'panther', 'cheetah', 'puma']

Python supports concatenating lists using the plus sign `+`.

In [8]:
cats = ['jaguar', 'tiger', 'lion', 'panther']
cats + ['cheetah', 'puma']

['jaguar', 'tiger', 'lion', 'panther', 'cheetah', 'puma']

In [9]:
['jaguar', 'tiger', 'lion', 'panther'] + ['cheetah', 'puma']

['jaguar', 'tiger', 'lion', 'panther', 'cheetah', 'puma']

The `list.remove()` method deletes the first occurrence of the given item.

In [10]:
cats = ['tiger', 'python', 'lion', 'leopard']
cats.remove('python')
cats

['tiger', 'lion', 'leopard']

The `list.pop()` method deletes an item at the given index, and returns that deleted item.

In [11]:
cats = ['tiger', 'python', 'lion', 'leopard']
deleted = cats.pop(1)
print(cats)
print(deleted)

['tiger', 'lion', 'leopard']
python
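
When no index is passed, `list.pop()` removes and returns the last item. A minimal sketch reusing the list from the cell above (the name `last` is only illustrative):

```python
cats = ['tiger', 'python', 'lion', 'leopard']

# with no argument, pop() removes and returns the last item
last = cats.pop()

print(cats)  # ['tiger', 'python', 'lion']
print(last)  # leopard
```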


Python also supports the `del` statement, which deletes the items at a specified index or slice.

In [12]:
cats = ['tiger', 'python', 'lion', 'leopard']
del cats[::2]
cats

['python', 'leopard']

### 1.2. List slicing
List slicing works the same as string slicing: `[start:stop:step]`.

#### Slicing review

In [13]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[1:7:2]

[1, 3, 5]

In [14]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[7]

7

In [15]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[7:8]

[7]

In [16]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[:7] + digits[7:]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [17]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[::-1]

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

In [18]:
nested_list = [['tiger', 5], ['lion', 4], ['jaguar', 6]]
nested_list[1][0]

'lion'

#### Item assignment
Item assignment works only on mutable sequences. Assigning to a slice can also change the length of the list.

In [19]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[7] = None
digits

[0, 1, 2, 3, 4, 5, 6, None, 8, 9]

In [20]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
digits[1:6] = [True, False]
digits

[0, True, False, 6, 7, 8, 9]

### 1.3. List manipulation
A function or method usually does only one of these:
- Return the output as a new object.
- Alter the input data in place.
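
The difference between the two behaviors can be seen by comparing the built-in `sorted()` function with the `list.sort()` method. This is a small sketch; the names `new_digits` and `result` are only illustrative.

```python
digits = [5, 2, 1, 6, 3]

# sorted() returns a new list and leaves the input untouched
new_digits = sorted(digits)
print(new_digits)  # [1, 2, 3, 5, 6]
print(digits)      # [5, 2, 1, 6, 3]

# list.sort() reorders the list in place and returns None
result = digits.sort()
print(result)      # None
print(digits)      # [1, 2, 3, 5, 6]
```

Because in-place methods return `None`, writing `digits = digits.sort()` would discard the list.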

In [21]:
# sort in increasing order
digits = [5, 2, 1, 6, 3, 0, 7, 9, 4, 8]
digits.sort()
digits

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [22]:
# sort by item length, decreasingly
cats = ['tiger', 'lion', 'panther', 'leopard', 'jaguar', 'cheetah']
cats.sort(key=len, reverse=True)
cats

['panther', 'leopard', 'cheetah', 'jaguar', 'tiger', 'lion']

In [23]:
my_list = ['a', 'n', 'a', 'c', 'o', 'n', 'd', 'a']
my_list.count('a')

3

In [24]:
cats = ['tiger', 'lion', 'panther', 'leopard', 'jaguar', 'cheetah']
cats.index('panther')

2

In [25]:
cats = ['tiger', 'lion', 'panther', 'leopard', 'jaguar', 'cheetah']
len(cats)

6

In [26]:
cats = ['tiger', 'lion', 'panther', 'leopard', 'jaguar', 'cheetah']
min(cats)

'cheetah'

In [27]:
cats = ['tiger', 'lion', 'panther', 'leopard', 'jaguar', 'cheetah']
max(cats)

'tiger'

In [28]:
digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sum(digits)

45

### 1.4. Tuple
`tuple` is the immutable version of `list`. Therefore, the in-place methods of `list` cannot be applied to `tuple`.
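
As a quick illustration of this immutability, the sketch below shows that read-only methods still work on a tuple while item assignment raises a `TypeError`. The variable `message` is only there to capture the error text.

```python
cats = ('tiger', 'lion', 'jaguar')

# methods that only read the tuple still work
print(cats.count('lion'))    # 1
print(cats.index('jaguar'))  # 2

# item assignment is rejected because tuples are immutable
try:
    cats[0] = 'puma'
except TypeError as error:
    message = str(error)
    print(message)
```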

#### Creating tuples
Either the parentheses `()` or the `tuple()` constructor can be used to create `tuple` objects.

In [29]:
('tiger', 'lion', 'jaguar')

('tiger', 'lion', 'jaguar')

In [30]:
cats = ('tiger', 'lion', 'jaguar')
cats

('tiger', 'lion', 'jaguar')

In [31]:
tuple_1 = ('python')
tuple_2 = ('python',)
type(tuple_1) == type(tuple_2)

False

In [32]:
tuple([2, 3, 5])

(2, 3, 5)

In [33]:
tuple('235')

('2', '3', '5')

#### Packing and unpacking
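Packing is specific to `tuple`: comma-separated values are packed into a tuple, even without parentheses. Unpacking works on any iterable, as long as the number of names matches the number of items.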

In [34]:
# packing
rectangle = 5, 8, 40
rectangle

(5, 8, 40)

In [35]:
# packing
rectangle = 5, 8, 40

# unpacking
width, length, area = rectangle

print(f'Area = Width * Length = {width} * {length} = {area}')

Area = Width * Length = 5 * 8 = 40
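
Unpacking is not limited to tuples. The sketch below unpacks a list, a string and a `range`; the starred name `*rest`, which collects the leftover items into a list, goes slightly beyond what the cells above show.

```python
# unpacking a list
width, length = [5, 8]

# unpacking a string, one character per name
first, second, third = 'abc'

# a starred name collects the remaining items into a list
head, *rest = range(5)

print(width, length)         # 5 8
print(first, second, third)  # a b c
print(head, rest)            # 0 [1, 2, 3, 4]
```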


### 1.5. Range
The `range()` function creates a `range` object, an immutable sequence of equally spaced integers. The values are computed on demand rather than stored in a list.

In [36]:
digits = range(10)
digits

range(0, 10)

In [37]:
range(4, 16, 2)

range(4, 16, 2)

In [38]:
# range indexing
digits = range(10)
digits[2]

2

Converting a `range` object to a `list` shows its elements.

In [39]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
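
Because a `range` object only stores its start, stop and step, even a very large range is cheap to create and query. A minimal sketch, with `big` as an illustrative name:

```python
big = range(0, 1_000_000_000, 2)

# length, indexing and membership are computed arithmetically,
# so no list of half a billion integers is ever built
print(len(big))   # 500000000
print(big[-1])    # 999999998
print(10 in big)  # True
print(11 in big)  # False

# slicing a range returns another range
print(big[:3])    # range(0, 6, 2)
```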

## 2. Set
A `set` can be thought of as an unordered `list` with no duplicates. `set` objects can be defined using braces `{}` or the `set()` constructor.

### 2.1. Creating sets

In [40]:
{1, 1, 2, 2, 2, 3}

{1, 2, 3}

In [41]:
set([1, 3, 2, 3, 1, 1])

{1, 2, 3}

In [42]:
{1, 2, 3} == {3, 2, 1}

True
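
One caveat when creating sets: empty braces produce a dictionary, so an empty set has to come from the constructor. A short sketch (the name `empty` is illustrative):

```python
# empty braces create an empty dictionary, not an empty set
print(type({}))     # <class 'dict'>

# use the constructor for an empty set
empty = set()
print(type(empty))  # <class 'set'>
print(len(empty))   # 0
```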

### 2.2. Set manipulation

#### Single set

In [43]:
my_set = {1, 2, 3, 4, 5}
my_set.clear()
my_set

set()

In [44]:
my_set = {1, 2, 3, 4, 5}
my_set.add(6)
my_set

{1, 2, 3, 4, 5, 6}

In [45]:
my_set = {1, 2, 3, 4, 5}
my_set.discard(5)
my_set

{1, 2, 3, 4}

:::{note}

`set.remove()` and `set.discard()` both delete the given item from the set. They differ only when the item is not in the set: `set.remove()` raises a `KeyError`, while `set.discard()` does nothing.

:::
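
The behavior described in the note can be checked with a missing item. In this sketch, `raised` is only a helper flag used for illustration.

```python
my_set = {1, 2, 3}

# discard() silently ignores a missing item
my_set.discard(10)
print(my_set)  # {1, 2, 3}

# remove() raises KeyError for a missing item
raised = False
try:
    my_set.remove(10)
except KeyError:
    raised = True
print(raised)  # True
```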

In [46]:
sum({1, 2, 3, 4, 5})

15

In [47]:
min({1, 2, 3, 4, 5})

1

In [48]:
len({1, 2, 3, 4, 5})

5

#### Working with two sets

In [49]:
a = {0, 1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}
a.union(b)

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

In [50]:
a = {0, 1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}
a.intersection(b)

{3, 4, 5}

In [51]:
a = {0, 1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}
a.difference(b)

{0, 1, 2}

In [52]:
a = {0, 1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}
a.symmetric_difference(b)

{0, 1, 2, 6, 7, 8, 9}

In [53]:
a = {1, 2, 3}
b = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
a.issubset(b)

True

In [54]:
a = {1, 2, 3}
b = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b.issuperset(a)

True
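
The methods above also have operator forms when both operands are sets. The sketch below checks that each operator gives the same result as the corresponding method, reusing the sets from the earlier cells; `small` is an illustrative name.

```python
a = {0, 1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}

print((a | b) == a.union(b))                 # True
print((a & b) == a.intersection(b))          # True
print((a - b) == a.difference(b))            # True
print((a ^ b) == a.symmetric_difference(b))  # True

# <= and >= correspond to issubset() and issuperset()
small = {3, 4}
print(small <= a)  # True
print(a >= small)  # True
```

Unlike the operators, the methods also accept any iterable as their argument, for example `a.union([6, 7])`.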

## 3. Dictionary
A dictionary is a mutable object that maps keys to values, where keys are unique.

### 3.1. Creating dictionaries
`dict` objects are defined using braces `{}`, with a colon `:` separating each key from its value, or using the `dict()` constructor. When a key is repeated, the last value wins.

In [55]:
{'python': 6, 'jupyter': 0, 'jupyter': 7, 'kaggle': 6}

{'python': 6, 'jupyter': 7, 'kaggle': 6}

In [56]:
dict(python=6, jupyter=7, kaggle=6)

{'python': 6, 'jupyter': 7, 'kaggle': 6}

In [57]:
dict([['python', 6], ['jupyter', 7], ['kaggle', 6]])

{'python': 6, 'jupyter': 7, 'kaggle': 6}

In [58]:
{1: 2, 3: 4} == {3: 4, 1: 2}

True

#### Zipping
The `zip()` function pairs up the items of the given iterables and returns an iterator of tuples.

In [59]:
zipped = zip(['a', 'b', 'c'], [1, 2, 3])
list(zipped)

[('a', 1), ('b', 2), ('c', 3)]

In [60]:
zipped = zip(['a', 'b', 'c'], [1, 2, 3])
dict(zipped)

{'a': 1, 'b': 2, 'c': 3}
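
This is why the cells above recreate `zipped` before each conversion: a `zip` object is an iterator and can be consumed only once. A small sketch, with `first_pass` and `second_pass` as illustrative names:

```python
zipped = zip(['a', 'b', 'c'], [1, 2, 3])

first_pass = list(zipped)
second_pass = list(zipped)  # the iterator is already exhausted

print(first_pass)   # [('a', 1), ('b', 2), ('c', 3)]
print(second_pass)  # []

# zip() stops at the shortest input
print(list(zip('abc', [1, 2])))  # [('a', 1), ('b', 2)]
```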

### 3.2. Dictionary manipulation

#### Keys and values

In [61]:
my_dict = dict([['python', 6], ['jupyter', 7], ['kaggle', 6]])

Extracting keys and values.

In [62]:
my_dict.keys()

dict_keys(['python', 'jupyter', 'kaggle'])

In [63]:
my_dict.values()

dict_values([6, 7, 6])

In [64]:
my_dict.items()

dict_items([('python', 6), ('jupyter', 7), ('kaggle', 6)])

The `in` operator checks whether an object is one of the dictionary's keys.

In [65]:
'python' in my_dict

True
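
Membership tests on the dictionary itself look at keys only. To test values or key-value pairs, apply `in` to the corresponding view, as in this short sketch:

```python
my_dict = {'python': 6, 'jupyter': 7, 'kaggle': 6}

print('python' in my_dict)               # True, 'python' is a key
print(6 in my_dict)                      # False, 6 is a value, not a key
print(6 in my_dict.values())             # True
print(('python', 6) in my_dict.items())  # True
```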

#### Value accessing
Each key can be used to access the corresponding value or to assign a value to it. Note that this is not slicing.

In [66]:
# access value using key
my_dict = dict([['python', 6], ['jupyter', 7], ['kaggle', 6]])
my_dict['python']

6

In [67]:
# add a new key-value pair
my_dict = {'python': 6, 'jupyter': 7, 'kaggle': 6}
my_dict['anaconda'] = 8
my_dict

{'python': 6, 'jupyter': 7, 'kaggle': 6, 'anaconda': 8}

In [68]:
# change the value of an existing key
my_dict = {'python': 1, 'jupyter': 7, 'kaggle': 6}
my_dict['python'] = 6
my_dict

{'python': 6, 'jupyter': 7, 'kaggle': 6}
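
Accessing a key that does not exist raises a `KeyError`. The `dict.get()` method is a common alternative that returns `None` or a chosen default instead. A minimal sketch:

```python
my_dict = {'python': 6, 'jupyter': 7, 'kaggle': 6}

# square brackets raise KeyError for a missing key
try:
    my_dict['anaconda']
except KeyError as error:
    print('missing key:', error)  # missing key: 'anaconda'

# get() returns None, or the given default, instead of raising
print(my_dict.get('anaconda'))     # None
print(my_dict.get('anaconda', 0))  # 0
print(my_dict.get('python', 0))    # 6
```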

The `del` statement deletes an existing key-value pair.

In [69]:
my_dict = {'python': 6, 'jupyter': 7, 'kaggle': 6}
del my_dict['kaggle']
my_dict

{'python': 6, 'jupyter': 7}

#### Modifying keys
Dictionaries do not allow modifying keys. However, this can be done indirectly using the `dict.pop()` method, which deletes a key and returns the corresponding value.

In [70]:
my_dict = {'kaggle': 6, 'jupyter': 7, 'anaconda': 8}
my_dict['python'] = my_dict.pop('kaggle')
my_dict

{'jupyter': 7, 'anaconda': 8, 'python': 6}

#### Updating
The `dict.update()` method adds keys that are new, overwrites the values of keys that already exist, and leaves all other keys unchanged.

In [71]:
data1 = {
    'name': ['Taylor', 'Patrick', 'Sam'],
    'phone': ['0912345678', '01600000016']}
data2 = {
    'gender': ['female', 'male', 'male'],
    'phone': ['0912345678', '0300000016', '0312345678']}
data1.update(data2)
data1

{'name': ['Taylor', 'Patrick', 'Sam'],
 'phone': ['0912345678', '0300000016', '0312345678'],
 'gender': ['female', 'male', 'male']}

### 3.3. Looping over a dictionary

In [72]:
my_dict = {'kaggle': 6, 'jupyter': 7, 'anaconda': 8}
for key in my_dict:
    print(key)

kaggle
jupyter
anaconda


In [73]:
my_dict = {'kaggle': 6, 'jupyter': 7, 'anaconda': 8}
for key in my_dict.keys():
    print(key)

kaggle
jupyter
anaconda


In [74]:
my_dict = {'kaggle': 6, 'jupyter': 7, 'anaconda': 8}
for value in my_dict.values():
    print(value)

6
7
8


In [75]:
my_dict = {'kaggle': 6, 'jupyter': 7, 'anaconda': 8}
for key, value in my_dict.items():
    print(f'The length of "{key}" is {value}')

The length of "kaggle" is 6
The length of "jupyter" is 7
The length of "anaconda" is 8


## Recap of data structures
Object     |Syntax                |Accessing|Sliceable?|Unique?|Mutable?|
:----------|:---------------------|:--------|:--------:|:-----:|:------:|
`str`      |`'123'`               |`x[0]`   |&check;   |       |        |
`list`     |`[1, 2, 3]`           |`x[0]`   |&check;   |       |&check; |
`tuple`    |`(1, 2, 3)`           |`x[0]`   |&check;   |       |        |
`set`      |`{1, 2, 3}`           |         |          |&check;|&check; |
`frozenset`|`frozenset({1, 2, 3})`|         |          |&check;|        |
`dict`     |`{'a': 1, 'b': 2}`    |`x['a']` |          |&check;|&check; |

The data structures above are also called *iterables* (objects that can be looped over using a `for` loop). Iterables whose items are ordered and accessible by integer index, such as `str`, `list` and `tuple`, are called *sequences*.
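
One way to check these two categories is with the abstract base classes in the standard `collections.abc` module. The sketch below is only an illustration; the names `samples` and `is_sequence` are not part of the original material.

```python
from collections.abc import Iterable, Sequence

samples = ['123', [1, 2, 3], (1, 2, 3), {1, 2, 3}, frozenset({1, 2, 3}), {'a': 1}]

# every object in the recap table is iterable
print(all(isinstance(obj, Iterable) for obj in samples))  # True

# only str, list and tuple are sequences
is_sequence = {type(obj).__name__: isinstance(obj, Sequence) for obj in samples}
for name, flag in is_sequence.items():
    print(f'{name:<10} sequence={flag}')
```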

## 4. Iterable comprehensions
Comprehensions are a short and clear way to create lists and dictionaries.

### 4.1. List comprehension

#### Basic list comprehension

In [76]:
digits = []
for i in range(10):
    digits.append(i**2)
digits

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [77]:
[i*i for i in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [78]:
list1 = [1, 8, 5]
list2 = [2, 4, 9]

[x+y for x, y in zip(list1, list2)]

[3, 12, 14]

#### Printing elements

In [79]:
from datetime import date
data = [
    date(2020, 1, 1),
    date(2020, 1, 2),
    date(2020, 1, 3),
    date(2020, 1, 4),
    date(2020, 1, 5),
    date(2020, 1, 6),
    date(2020, 1, 7)
]

_ = [print(i) for i in data]

2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05
2020-01-06
2020-01-07


#### With conditions

In [80]:
[i*i for i in range(10) if i%2 == 0]

[0, 4, 16, 36, 64]

In [81]:
[i*i for i in range(10) if (i < 3) or (i > 8)]

[0, 1, 4, 81]

#### Nested list comprehension

In [82]:
# flatten a nested list
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]

[entry for row in matrix for entry in row]

[1, 2, 3, 4, 5, 6, 7, 8]

In [83]:
# create the transpose
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]

[[row[i] for row in matrix] for i in range(4)]

[[1, 5], [2, 6], [3, 7], [4, 8]]

### 4.2. Dictionary comprehension

In [84]:
felidae = ['lion', 'tiger', 'leopard', 'jaguar', 'panther', 'cheetah', 'puma']
{key: len(key) for key in felidae}

{'lion': 4,
 'tiger': 5,
 'leopard': 7,
 'jaguar': 6,
 'panther': 7,
 'cheetah': 7,
 'puma': 4}

In [85]:
{i: i**2 for i in range(10)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}