# **Chapter 3: Built-in Data Structures, Functions, and Files**

## 3.1. Data Structures and Sequences

### 3.1.1. Tuple

A tuple is a `fixed-length`, `immutable sequence` of Python objects. The easiest way to create one is with a comma-separated sequence of values:

In [107]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [108]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [109]:
#convert any sequence or iterator to a tuple
tuple([4, 0, 2])
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [110]:
#sequences are 0-indexed in Python
tup[0] #this will print out the first letter in above tuple

's'

In [111]:
#tuple is immutable - it’s not possible to modify which object is stored in each slot:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

In [None]:
#If an object inside a tuple is mutable, such as a list, you can modify it inplace:
tup[1].append(3) #tup[1] -> return [1,2] -- remember 0-index rule above -> this is a list then it could be modified
tup

('foo', [1, 2, 3], True)

In [None]:
#concatenate tuples using the + operator to produce longer tuples:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [None]:
#Multiplying a tuple by an integer, as with lists, has the effect of concatenating together that many copies of the tuple:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking tuples

In [None]:
#If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the righthand side of the equals sign:
tup = (4, 5, 6)
a, b, c = tup #P note: this assigns 4, 5, 6 to a, b, c respectively
c

6

In [None]:
#Even sequences with nested tuples can be unpacked:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

Using this functionality you can easily swap variable names, a task which in many languages might look like:
```
tmp = a #assign tmp = a -> a takes the new value b -> b takes tmp's value (tmp is the intermediate holder for the swap)
a = b
b = tmp
```

In [None]:
#in Python, the swap can be done like this:
a, b = 1, 2
a
b


2

In [None]:
b, a = a, b #this step is to swap value of a and b
a
b

1

In [None]:
#A common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    #print('a={0}, b={1}, c={2}')
    print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


The Python language recently acquired some more advanced tuple unpacking to help with situations where you may want to “pluck” a few elements from the beginning of a tuple. This uses the special syntax `*rest`, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [None]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a, b

(1, 2)

This rest bit is sometimes something you want to discard; there is nothing special about the rest name. As a matter of convention, many Python programmers will use the `underscore (_)` for unwanted variables:

In [None]:
rest

[3, 4, 5]

In [None]:
a, b, *_ = values
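
The star syntax is not limited to the tail of the sequence, and the same idea appears in function signatures. A minimal sketch, where the names `first`, `middle`, `last`, and `total` are made up for illustration:

```python
values = 1, 2, 3, 4, 5

# The starred name can sit anywhere in the target list; it always collects a list.
first, *middle, last = values
print(first, middle, last)  # 1 [2, 3, 4] 5

# In a function signature, *args collects extra positional arguments into a tuple.
def total(*args):
    return sum(args)

print(total(*values))  # 15
```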

#### Tuple methods

`Since the size and contents of a tuple cannot be modified`, it is very light on instance methods. A particularly useful one (also available on lists) is `count`, which counts the number of occurrences of a value:

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

### 3.1.2. List

In contrast with tuples, lists are `variable-length` and `their contents can be modified` in-place. You can define them using square brackets `[ ]` or using the list type function:

In [None]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [None]:
b_list[1] = 'peekaboo' #change value in position 1 'bar' to 'peekaboo'
b_list

['foo', 'peekaboo', 'baz']

Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions. <br>
The `list` function is frequently used in data processing as a way to materialize an iterator or generator expression:

In [None]:
gen = range(10)
gen

range(0, 10)

In [None]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
#P note: add from beginner's python cheatsheet
bikes = [1,2,3]

# to copy a list
copy_of_bike = bikes[:] # P question: why do we need [:]? -> it builds a new list; without it both names would point to the same list
copy_of_bike

[1, 2, 3]

In [None]:
copy_of_bike == bikes

True

In [None]:
#drop the [:] from the copy syntax above and see what happens
bikes = [1,2,3]
copy_of_bikes = bikes
copy_of_bikes

[1, 2, 3]

In [None]:
type(copy_of_bikes)

list

In [None]:
bikes

[1, 2, 3]

In [None]:
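copy_of_bikes = [4,5,6] #this rebinds copy_of_bikes to a brand-new list; it does not modify the list that bikes refers to

In [None]:
bikes #unchanged, but only because the line above rebound the name instead of mutating the shared list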

[1, 2, 3]
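
The difference between `bikes[:]` and plain assignment only shows up when the list is mutated in place. A small sketch of that case, using the made-up names `alias` and `shallow`:

```python
bikes = [1, 2, 3]

alias = bikes        # a second name for the same list object
shallow = bikes[:]   # a new list holding the same elements

alias.append(4)      # mutates the one shared list

print(bikes)         # [1, 2, 3, 4]
print(shallow)       # [1, 2, 3]
print(alias is bikes, shallow is bikes)  # True False
```

Note that `[:]` makes a shallow copy: nested mutable elements would still be shared between the two lists.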

#### Adding and removing elements

In [None]:
#Elements can be appended to the end of the list with the append method:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [None]:
#Using insert you can insert an element at a specific location in the list:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

**WARNING** <br>
`insert is computationally expensive compared with append`, because references to subsequent elements have to be shifted internally to make room for the new element. <br>
If you need to insert elements at both the beginning and end of a sequence, you may wish to explore `collections.deque`, a double-ended queue, for this purpose.
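
As a rough illustration of the `collections.deque` alternative mentioned above, the sketch below adds and removes items at both ends. The variable names are made up for the example:

```python
from collections import deque

queue = deque(['b', 'c'])
queue.appendleft('a')    # add at the front without shifting the other elements
queue.append('d')        # add at the end
print(queue)             # deque(['a', 'b', 'c', 'd'])

front = queue.popleft()  # remove from the front
back = queue.pop()       # remove from the end
print(front, back, list(queue))  # a d ['b', 'c']
```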

In [None]:
#The inverse operation to insert is pop, which removes and returns an element at a particular index:
b_list.pop(2)
b_list

['foo', 'red', 'baz', 'dwarf']

In [None]:
#Elements can be removed by value with remove, which locates the first such value and removes it from the list:
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [None]:
b_list.remove('foo') #removes only the first 'foo'; the one appended at the end stays
b_list

['red', 'baz', 'dwarf', 'foo']

In [None]:
#Check if a list contains a value using the in keyword:
'dwarf' in b_list

True

In [None]:
#The keyword not can be used to negate in:
'dwarf' not in b_list

False

#### Concatenating and combining lists

In [None]:
#Similar to tuples, adding two lists together with + concatenates them:
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [None]:
#If you have a list already defined, you can append multiple elements to it using the `extend` method:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

Note that `list concatenation by addition` is a `comparatively expensive operation` since a new list must be created and the objects copied over.<br>
Using `extend to append elements to an existing list, especially if you are building up a large list, is usually preferable`.

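In [None]:
list_of_lists = [[1, 2], [3, 4], [5]] #example data, made up so the cell runs
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
# extend grows the existing list in place, so this is faster than the concatenation version below

In [None]:
everything = []
for chunk in list_of_lists:
    everything = everything + chunk #builds a brand-new list on every pass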

#### Sorting

In [None]:
#You can sort a list in-place (without creating a new object) by calling its sort method:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

`sort` has a few options that will occasionally come in handy. One is the ability to pass a `secondary sort key` — that is, a function that produces a value to use to sort the objects. <br>
For example, we could sort a collection of strings by their lengths:

In [None]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

#### Binary search and maintaining a sorted list

The built-in `bisect` module implements binary search and insertion into a sorted list. <br>
`bisect.bisect` finds the location where an element should be inserted to keep it sorted, while `bisect.insort` actually inserts the element into that location:

In [None]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c, 2) # -> a new 2 would be inserted at index 4 (after the existing 2s, before 3)

4

In [None]:
bisect.bisect(c, 5) # -> a new 5 would go at index 6 (before 7); this only finds the location, it does not insert

6

In [None]:
bisect.insort(c, 6) #insert value 6 but still keep the list sorted

In [None]:
c

[1, 2, 2, 2, 3, 4, 6, 7]

**CAUTION** <br>
The `bisect module functions do not check whether the list is sorted`, as doing so would be computationally expensive. Thus, `using them with an unsorted list` will succeed without error but `may lead to incorrect results`.
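
To make the caution concrete, here is a small sketch with made-up lists: `insort` on an unsorted list raises no error, yet the result is still out of order.

```python
import bisect

unsorted = [7, 1, 4, 2]
bisect.insort(unsorted, 3)      # no error is raised
print(unsorted)                 # [7, 1, 3, 4, 2], still not sorted

ordered = sorted([7, 1, 4, 2])  # sort first, then insert
bisect.insort(ordered, 3)
print(ordered)                  # [1, 2, 3, 4, 7]
```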

#### Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of `start:stop` passed to the indexing operator `[]`:

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [None]:
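#slice assignment replaces the selected elements; the outputs from here on reflect this cell having been run twice, which leaves an extra 3 in seq
seq[3:4] = [6, 3]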
seq

[7, 2, 3, 6, 3, 3, 5, 6, 0, 1]

While the element at the `start index is included`, `the stop index is not included`, so that the number of elements in the result is stop - start. <br>
Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively:

In [None]:
seq[:5]

[7, 2, 3, 6, 3]

In [None]:
seq[3:]

[6, 3, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end:

In [None]:
seq[-4:]
seq[-6:-2]

[3, 3, 5, 6]

In [None]:
#A step can also be used after a second colon to, say, take every other element:
seq[::2]

[7, 3, 3, 5, 0]

In [None]:
#A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:
seq[::-1] #not only steps but also reverse the list -> quite interesting use case

[1, 0, 6, 5, 3, 3, 6, 3, 2, 7]

### 3.1.3. Built-in Sequence Functions

Python has a handful of `useful sequence functions` that you `should familiarize yourself with and use at any opportunity`.

#### 3.1.3.1 enumerate

In [None]:
# It’s common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like:
collection = [1,3,5,7,8,9]
i = 0
for value in collection:
   # do something with value
   collection[i] = collection[i] **2 #P note: this is my edit to square the element in the list
   i += 1
collection

[1, 9, 25, 49, 64, 81]

Since this is so common, Python has a built-in function, `enumerate`, which returns a sequence of (i, value) tuples:

In [None]:
collection = [1,3,5,7,8,9]
for i, value in enumerate(collection):
   # do something with value
   collection[i] = collection[i] **2
collection

[1, 9, 25, 49, 64, 81]

When you are indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence:

In [None]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping

{'foo': 0, 'bar': 1, 'baz': 2}

In [None]:
thinh = [27, 'DA', 'Single']
# enumerate() - copied from Thinh's example https://www.notion.so/phule1912/4-Lists-f7d73b799068488fb588eda4aabafd99

for index, item in enumerate(thinh):
    print('Index ' + str(index) + ' in thinh is: ' + str(item))

Index 0 in thinh is: 27
Index 1 in thinh is: DA
Index 2 in thinh is: Single


#### 3.1.3.2 sorted

The `sorted` function returns a new sorted list from the elements of any sequence:

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [None]:
sorted('horse race')

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

#### 3.1.3.3 zip

`zip` `“pairs” up the elements` of a number of lists, tuples, or other sequences to create `an iterator of tuples`, which you can materialize with `list`:

In [None]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [None]:
type(zipped) # P check the type of created sequence

zip

`zip` can take an arbitrary number of sequences, and the number of elements it produces is determined by the `shortest` sequence:

In [None]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

A `very common use of zip` is simultaneously `iterating over multiple sequences`, possibly also combined with `enumerate`:

In [None]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


Given a “zipped” sequence, `zip can be applied in a clever way to “unzip”` the sequence. `Another way to think about this is converting a list of rows into a list of columns`. The syntax, which looks a bit magical, is:

In [None]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
            ('Curt', 'Schilling')]
first_names, last_names = zip(*pitchers)
first_names

('Nolan', 'Roger', 'Curt')

In [None]:
last_names

('Ryan', 'Clemens', 'Schilling')

#### 3.1.3.4 reversed

`reversed` iterates over the elements of a sequence in reverse order:

In [None]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Keep in mind that `reversed is a generator` (to be discussed in some more detail later), so it `does not create the reversed sequence until materialized` (e.g., with list or a for loop).
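
One practical consequence of this laziness is that the object returned by `reversed` can be consumed only once. A minimal sketch, with made-up variable names:

```python
rev = reversed([1, 2, 3])

first_pass = list(rev)   # consumes the iterator
second_pass = list(rev)  # nothing left to yield

print(first_pass)        # [3, 2, 1]
print(second_pass)       # []
```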

In [None]:
reversed(range(10)) #-> P test my understanding that this generator does not create the list above without the list() function

<range_iterator at 0x28554320570>

### 3.1.4. Dict

#### 3.1.4.0. Dict introduction
<!-- P added sub heading for grouping collapsing purpose -->

`dict` is likely the most important built-in Python data structure. A more common name for it is `hash map` or `associative array`. It is a flexibly sized collection of `key-value pairs`, where key and value are Python objects.

In [None]:
empty_dict = {}
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for
accessing elements of a list or tuple:

In [None]:
d1[7] = 'an integer'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [None]:
d1['b']

[1, 2, 3, 4]

You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value:

In [None]:
'b' in d1

True

You can `delete values` either using the `del` keyword or the `pop` method (which simultaneously `returns the value and deletes the key`):

In [None]:
d1[5] = 'some value'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [None]:
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [None]:
#delete value by using del
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [None]:
#returns the value and deletes the key
ret = d1.pop('dummy')
ret

'another value'

In [None]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The `keys` and `values` methods give you iterable views of the dict’s keys and values, respectively. Since Python 3.7 a dict preserves insertion order, and both methods output the keys and values in that same order:

In [None]:
list(d1.keys())

['a', 'b', 7]

In [None]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']
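
A short sketch of how the two views line up, assuming Python 3.7 or later (where dicts keep insertion order). The dict `demo` is made up for the example:

```python
demo = {}
demo['z'] = 1
demo['a'] = 2
demo['m'] = 3

print(list(demo.keys()))    # ['z', 'a', 'm']
print(list(demo.values()))  # [1, 2, 3]

# Pairing the two views back up reproduces the dict's items.
print(list(zip(demo.keys(), demo.values())) == list(demo.items()))  # True
```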

You can `merge` one dict into another using the `update method`:

In [None]:
d1.update({'b' : 'foo', 'c' : 12})
d1
#The update method changes dicts in-place, so any existing keys in the data passed to update will have their old values discarded.

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

#### 3.1.4.1. Creating dicts from sequences

It’s common to end up with `two sequences that you want to pair up element-wise in a dict`. As a first cut, you might write code like this:

In [None]:
#example inputs, made up so the cell runs
key_list, value_list = ['a', 'b', 'c'], [1, 2, 3]
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
mapping

{'a': 1, 'b': 2, 'c': 3}

Since `a dict is essentially a collection of 2-tuples`, the dict function accepts a list of 2-tuples:

In [None]:
mapping = dict(zip(range(5), reversed(range(5))))
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Later we’ll talk about `dict comprehensions`, another `elegant way to
construct dicts`.

#### 3.1.4.2. Default values

It’s very common to have logic like:
```python
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```

Thus, the dict methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as:

In [None]:
some_dict, key, default_value = {'a': 1}, 'b', 0 #example values, made up so the cell runs
value = some_dict.get(key, default_value)
#this is a cleaner replacement for the if-else block above
value

0

`get` by default will return None if the key is not present, while `pop` will raise an exception.
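
The behaviors just described can be checked side by side. This is only a sketch, and the dict and key are made-up values:

```python
some_dict = {'a': 1}

missing_get = some_dict.get('z')     # no default given -> None
default_get = some_dict.get('z', 0)  # falls back to the supplied default
default_pop = some_dict.pop('z', 0)  # same idea; the dict is left unchanged

try:
    some_dict.pop('z')               # no default given -> KeyError
    raised = False
except KeyError:
    raised = True

print(missing_get, default_get, default_pop, raised)  # None 0 0 True
```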

For example, you could imagine categorizing a list of words by their first letters as a dict of lists:

In [None]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0] #word[0] -> return the 1st letter of each word in wordS list
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The `setdefault dict method` is for precisely this purpose. The preceding
for loop can be rewritten as:

In [None]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in `collections` module has a useful class, `defaultdict`, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:

In [None]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
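
The default factory does not have to be `list`; any callable that takes no arguments works. A minimal sketch with made-up names (`counts`, `with_four`):

```python
from collections import defaultdict

counts = defaultdict(int)           # int() returns 0, so missing keys start at 0
for word in ['apple', 'bat', 'bar', 'atom', 'book']:
    counts[word[0]] += 1
print(dict(counts))                 # {'a': 2, 'b': 3}

with_four = defaultdict(lambda: 4)  # a lambda can supply any default value
with_four['x'] += 1
print(with_four['x'])               # 5
```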

#### 3.1.4.3. Valid dict key types

While the `values of a dict can be any Python object`, `the keys generally have to be immutable objects` like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is `hashability`.<br><br>
You can check whether an object is hashable (can be used as a key in a dict) with the hash function:

In [None]:
hash('string')

6734678306481594111

In [None]:
hash((1, 2, (2, 3)))

-9209053662355515447

In [None]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:

In [None]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

In [None]:
hash(tuple([1, 2, 3]))

529344067295497451

### 3.1.5. Set

A `set` is an `unordered collection of unique elements`. You can think of them `like dicts, but keys only`, no values.

A set can be created in two ways: <br> 
* via the `set function` or <br>
* via a set literal with `curly braces`:

In [None]:
set([2, 2, 2, 1, 3, 3])
#only returns {1, 2, 3} - unique values only

{1, 2, 3}

In [None]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

`Sets` support mathematical set operations like <br>
* `union`
* `intersection`
* `difference`
* and `symmetric difference`.

In [None]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [None]:
#union
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
#intersection
a.intersection(b)

{3, 4, 5}

In [None]:
a & b

{3, 4, 5}

In [None]:
c = a.copy()
c |= b #update: Set the contents of c to be the union of the elements in c and b
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [None]:
d = a.copy()
d &= b #intersection: Set the contents of d to be elements in both d and b
d

{3, 4, 5}
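
Difference and symmetric difference, listed earlier but not demonstrated, follow the same pattern of a method plus an operator. A short sketch using the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# difference: elements in a that are not in b
print(a.difference(b))            # {1, 2}
print(a - b)                      # {1, 2}

# symmetric difference: elements in exactly one of the two sets
print(a.symmetric_difference(b))  # {1, 2, 6, 7, 8}
print(a ^ b)                      # {1, 2, 6, 7, 8}
```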

In [None]:
#Like dicts, set elements generally must be immutable. To have list-like elements, you must convert them to tuples:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [None]:
#You can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)

True

In [None]:
a_set.issuperset({1, 2, 3})

True

In [None]:
#Sets are equal if and only if their contents are equal:
{1, 2, 3} == {3, 2, 1}

True

### 3.1.6. List, Set, and Dict Comprehensions

`List comprehensions` are one of the most-loved Python language features. They allow you to `concisely form a new list` by filtering the elements of a collection, transforming the elements passing the filter in `one concise expression`. <br> <br>
They take the basic form: <br>
[`expr` **for** val **in** collection **if** `condition`] <br><br>
This is equivalent to the following `for loop`:

In [None]:
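#schematic only: collection, condition, and expr are placeholders, so running this cell as written raises the NameError shown below
result = []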
for val in collection:
    if condition:
        result.append(expr)

NameError: name 'collection' is not defined

In [None]:
#For example, given a list of strings, we could filter out strings with length 2 or less and also convert them to uppercase like this:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

`Set and dict comprehensions` are a natural extension, producing sets and dicts in an idiomatically similar way instead of lists.

A `dict comprehension` looks like this: <br><br>
    dict_comp = {`key-expr : value-expr for value in collection if condition`}

A `set comprehension` looks like the equivalent list comprehension except with curly braces instead of square brackets:<br><br>
    set_comp = {`expr for value in collection if condition`}

In [None]:
#Suppose we wanted a set containing just the lengths of the strings contained in the collection;
#we could easily compute this using a set comprehension:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

We could also express this more functionally using the `map function`, introduced shortly:

In [None]:
set(map(len, strings))

{1, 2, 3, 4, 6}

In [None]:
#As a simple dict comprehension example, we could create a lookup map of these strings to their locations in the list:
loc_mapping = {val : index for index, val in enumerate(strings)}
#P note: builds the loc_mapping dict with key = val and value = index, taking index and val from enumerate(strings)
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### 3.1.6.1 Nested list comprehensions

In [None]:
#Suppose we have a list of lists containing some English and Spanish names:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

You might have gotten these names from a couple of files and decided to organize them by language. Now, suppose we wanted to get a single list containing all names with two or more e’s in them. We could certainly do this with a `simple for loop`:

In [None]:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)
names_of_interest

['Steven']

You can actually wrap this whole operation up in a `single nested list comprehension`, which will look like:

In [None]:
result = [name for names in all_data for name in names
          if name.count('e') >= 2]
result

['Steven']

At first, nested list comprehensions are a bit hard to wrap your head around. The `for` parts of the list comprehension `are arranged according to the order of nesting`, and any filter condition is put at the end as before.

In [None]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the order of the for expressions would be the same if you wrote a nested for loop instead of a list comprehension:
```py
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)
```

You can have arbitrarily many levels of nesting, though if you have more than two or three levels of nesting you should probably start to question whether this makes sense from a code readability standpoint. <br>
It’s important to distinguish the syntax just shown from a list comprehension inside a list comprehension, which is also perfectly valid:

In [None]:
[[x for x in tup] for tup in some_tuples]
#This produces a list of lists, rather than a flattened list of all of the inner elements.

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

## 3.2. Functions

Functions are the primary and most `important method of code organization` and `reuse` in Python.

Functions are declared with the `def` keyword and returned from with the `return` keyword:

In [None]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

Each function can have `positional arguments` and `keyword arguments`. `Keyword arguments` are most commonly used to `specify default values or optional arguments`. In the preceding function, `x and y are positional arguments` while z is a keyword argument. This means that the function can be called in any of these ways:
```py
my_function(5, 6, z=0.7)
my_function(3.14, 7, 3.5)
my_function(10, 20)
```

**NOTE**
It is possible to use keywords for passing positional arguments as well. In the preceding example, we could also have written: <br>
```py
my_function(x=5, y=6, z=7)
my_function(y=6, x=5, z=7)
```
In some cases this can help with *readability*.

### 3.2.1. Namespaces, Scope, and Local Functions

Functions can access variables in two different scopes: `global` and `local`. An *alternative and more descriptive name* describing a variable scope in Python is a *namespace*. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments. <br> After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter). <br>

Consider the following function:

```
def func():
    a = []
    for i in range(5):
        a.append(i)
```
When func() is called, the empty list a is created, five elements are appended, and then a is destroyed when the function exits.

Suppose instead we had declared a as follows:

In [None]:
a = [] #P note: move a above def func()
def func():
    for i in range(5):
        a.append(i)
a

[]

`Assigning variables outside of the function’s scope is possible`, but those variables must be `declared as global` via the global keyword:<br>
[P note: I don't yet understand what this is for, since the two approaches I've tried appear to give the same result]

In [None]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

[]
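
The two approaches differ once a function tries to assign a new object to the outer name rather than mutate the existing one. The sketch below uses made-up names (`items`, `count`, and the three functions) to separate the cases:

```python
items = []
count = 0

def mutate():
    items.append(1)   # mutates the existing global list; no `global` needed

def rebind_local():
    count = 10        # creates a new local name; the global count is untouched

def rebind_global():
    global count
    count = 10        # rebinds the module-level name

mutate()
rebind_local()
after_local = count
print(items, after_local)  # [1] 0

rebind_global()
print(count)               # 10
```

In other words, `a.append(i)` works without `global` because it never assigns to `a`; only assignment (`a = ...`) inside the function needs the declaration.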


**CAUTION** <br>
I generally `discourage` use of the `global keyword`. Typically global variables are used to store some kind of state in a system. If you find yourself using a lot of them, it may `indicate a need for object-oriented programming (using classes)`.

### 3.2.2. Returning Multiple Values

When I first programmed in Python after having programmed in Java and C++, one of my favorite features was the ability to return multiple values from a function with simple syntax.<br>
Here’s an example:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

In data analysis and other scientific applications, you may find yourself doing this often. `What’s happening here is that the function is actually just returning one object, namely a tuple, which is then being unpacked into the result variables`. <br>In the preceding example, we could have done this instead:

In [None]:
return_value = f()
#In this case, return_value would be a 3-tuple with the three returned variables

A potentially attractive alternative to returning multiple values like before might be to return a dict instead:<br>
Note: This alternative technique can be useful depending on what you are trying to do.

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}

### 3.2.3. Functions Are Objects

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages.<br> Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings:

In [None]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

Anyone who has ever worked with user-submitted survey data has seen messy results like these. <br>
Lots of things need to happen to make this list of strings uniform and ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing on proper capitalization. <br>
One way to do this is to use built-in string methods along with the re standard library module for regular expressions:

In [None]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value) #regex101: Match a single character present in the list below [!#?]
        value = value.title()
        result.append(value)
    return result

In [None]:
#result
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

An `alternative approach` that you may find useful is to make a list of the operations you want to apply to a particular set of strings:

In [None]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [None]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

A more `functional pattern` like this enables you to easily modify how the strings are transformed at a very high level. The clean_strings function is also now more reusable and generic.

You can use functions as arguments to other functions like the built-in map function, which applies a function to a sequence of some kind:

In [None]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


### 3.2.4. Anonymous (Lambda) Functions

P note for other resources to learn lambda:<br>
[How to Use Python Lambda Functions](https://realpython.com/python-lambda/#lambda-calculus) <br>
[Lambda, Map, and Filter in Python](https://betterprogramming.pub/lambda-map-and-filter-in-python-4935f248593)

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a `single statement (strictly, a single expression), the result of which is the return value`.<br>
They are defined with the `lambda keyword`, which has no meaning other than `“we are declaring an anonymous function”`:

In [None]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2 #P note: anon = anonymous function

`I usually refer to these as lambda functions in the rest of the book`. They are especially convenient in data analysis because, as you’ll see, there are many cases where data transformation functions will take functions as arguments. `It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable`. <br>
For example, consider this silly case:

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

`You could also have written [x * 2 for x in ints]`, but here `we were able to succinctly pass a custom operator` to the `apply_to_list function`.

As another example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [None]:
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

**NOTE** <br>
One reason lambda functions are called anonymous functions is that, unlike functions declared with the def keyword, `the function object itself is never given an explicit name`: its `__name__` attribute is just the placeholder `'<lambda>'`.
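
The difference is easy to check directly. The short sketch below reuses the two functions defined earlier in this section and only inspects their `__name__` attributes; the variable names `def_name` and `anon_name` are introduced here for illustration.

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

# A def statement records the function's name on the object
def_name = short_function.__name__

# A lambda only gets a generic placeholder, even after assignment to a variable
anon_name = equiv_anon.__name__

print(def_name)   # short_function
print(anon_name)  # <lambda>
```

This is one reason tracebacks that involve lambdas can be harder to read than those involving named functions.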

### 3.2.5. Currying: Partial Argument Application

Currying is computer science jargon (`named after the mathematician Haskell Curry`) that means deriving new functions from existing ones by `partial argument application`.

For example, suppose we had a trivial function that adds two numbers together:

In [None]:
def add_numbers(x, y):
    return x + y

Using this function, we could `derive a new function of one variable`, add_five, that adds 5 to its argument:

In [None]:
add_five = lambda y: add_numbers(5, y)

The `second argument` to add_numbers is said `to be curried`. There’s nothing very fancy here, as all we’ve really done is define a new function that calls an existing function.<br>
The built-in `functools` module can simplify this process using the `partial` function:

In [None]:
from functools import partial
add_five = partial(add_numbers, 5)
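
As a quick check that both approaches behave the same way, the sketch below calls each version and also shows `partial` fixing a keyword argument. The names `add_five_lambda`, `add_five_partial`, and `int_from_binary` are made up for this illustration.

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Two ways to fix the first argument to 5
add_five_lambda = lambda y: add_numbers(5, y)
add_five_partial = partial(add_numbers, 5)

result_lambda = add_five_lambda(10)
result_partial = add_five_partial(10)

# partial can also fix keyword arguments, here the base used by int()
int_from_binary = partial(int, base=2)
value = int_from_binary('1010')

# Unlike a lambda, a partial object records what it wraps
wrapped_func = add_five_partial.func
fixed_args = add_five_partial.args

print(result_lambda, result_partial, value)  # 15 15 10
```

One practical difference is that the `partial` object exposes the wrapped function and the fixed arguments through `.func` and `.args`, which can help when debugging.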

### 3.2.6. Generators

`Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature`. This is accomplished by means of the `iterator protocol`, a generic way to make objects iterable. For example, iterating over a dict yields the dict keys:

In [None]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

a
b
c


When you write `for key in some_dict`, the Python interpreter first attempts to create an iterator out of `some_dict`:

In [None]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x26beced26d0>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop.<br>
Most methods expecting a `list` or `list-like` object will also accept any iterable object.<br>
This includes built-in methods such as `min`, `max`, and `sum`, and type constructors like `list` and `tuple`:

In [None]:
list(dict_iterator)

['a', 'b', 'c']
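
What a `for` loop does behind the scenes can be sketched by driving the iterator by hand with the built-in `next` function. This is a simplified illustration, not a description of the interpreter's internals; the variable names are assumptions.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

# iter() asks the object for an iterator; next() pulls one item at a time
dict_iterator = iter(some_dict)
first = next(dict_iterator)
second = next(dict_iterator)
third = next(dict_iterator)

# Once the items run out, next() raises StopIteration,
# which is the signal a for loop uses to stop
try:
    next(dict_iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Built-ins that expect list-like input accept any iterable
total = sum(some_dict.values())
largest_key = max(some_dict)

print(first, second, third, exhausted, total, largest_key)
```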

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, `generators return a sequence of multiple results lazily`, pausing after each one until the next one is requested. To create a generator, use the `yield` keyword *instead of return* in a function:

In [None]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [None]:
#When you actually call the generator, no code is immediately executed:
gen = squares()
gen

<generator object squares at 0x0000026BECECEF20>

It is not until you request elements from the generator that it begins executing its code:

In [None]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
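
The lazy behavior can also be observed by requesting values one at a time with `next`. The sketch below reuses the `squares` generator from above with a smaller `n`; the variable names are illustrative.

```python
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)      # nothing is printed yet: the body has not started

first = next(gen)     # the body runs up to the first yield, so the message prints here
rest = list(gen)      # consumes the remaining values
leftover = list(gen)  # the generator is now exhausted

print(first, rest, leftover)  # 1 [4, 9] []
```

A generator can only be iterated once; to go through the values again, call `squares` again to get a fresh generator.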

#### Generator expressions

An even more concise way to make a generator is to use a `generator expression`. This is the generator analogue of list, dict, and set comprehensions; to create one, enclose what would otherwise be a `list comprehension within parentheses` *instead of brackets* (P note: simply replace the `[]` of the list comprehension with `()`):

In [None]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x0000026BECECC510>

This is completely equivalent to the following more verbose generator:

```
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()
```

`Generator expressions` can be used instead of list comprehensions as function arguments in many cases:

In [None]:
sum(x ** 2 for x in range(100))

328350

In [None]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
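
One practical difference from a list comprehension is that a generator expression produces its values on demand and can be consumed only once. The following sketch, with illustrative variable names, contrasts the two.

```python
squares_list = [x ** 2 for x in range(5)]  # built eagerly, stored in memory
squares_gen = (x ** 2 for x in range(5))   # produces values on demand

total = sum(squares_gen)         # consumes the generator
second_pass = list(squares_gen)  # nothing left to yield

list_total = sum(squares_list)   # a list can be traversed repeatedly
list_again = sum(squares_list)

# The dict(...) call above can also be written as a dict comprehension
as_dict = {i: i ** 2 for i in range(5)}

print(total, second_pass, list_total, list_again)  # 30 [] 30 30
```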

#### itertools module

The standard library `itertools` module has a collection of generators for many common data algorithms.<br>
For example, `groupby` takes any sequence and a function, grouping consecutive elements in the sequence by the return value of the function.<br>

Here’s an example:

In [None]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is a lazy iterator; the loop variable no longer shadows names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
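
Because `groupby` only groups consecutive elements, the letter A appears twice in the output above. If one group per key is wanted, a common approach is to sort by the same key first. The sketch below assumes that goal; the `grouped` dict is introduced for illustration.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the grouping key makes equal keys adjacent,
# so groupby yields each letter exactly once
grouped = {}
for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter):
    grouped[letter] = list(group)

print(grouped)
```

Python's sort is stable, so names that share a first letter keep their original relative order within each group.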


### 3.2.7. Errors and Exception Handling

Handling Python errors or exceptions gracefully is an important part of building robust programs. In data analysis applications, many functions only work on certain kinds of input.<br>
As an example, Python’s float function is capable of casting a string to a floating-point number, but fails with ValueError on improper inputs:

In [None]:
float('1.2345')
float('something')

ValueError: could not convert string to float: 'something'

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a try/except block:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        # The code in this block runs only if float(x) raises an exception
        return x

In [None]:
attempt_float('1.2345')
attempt_float('something')

'something'

You might notice that float can raise exceptions other than `ValueError`:

In [None]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You might want to only suppress ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program.<br>
To do that, write the exception type after except:

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [None]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You can catch `multiple exception` types by writing `a tuple of exception` types instead (the parentheses are required):

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
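
To compare the two variants side by side, the sketch below gives them distinct names (`attempt_float_strict` and `attempt_float_lenient` are hypothetical names used only here) and runs each on a few inputs.

```python
def attempt_float_strict(x):
    # Suppresses only ValueError; a TypeError still propagates
    try:
        return float(x)
    except ValueError:
        return x

def attempt_float_lenient(x):
    # Suppresses both TypeError and ValueError
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

ok = attempt_float_lenient('1.2345')
bad_string = attempt_float_lenient('something')
bad_type = attempt_float_lenient((1, 2))

# The strict version lets the TypeError surface to the caller
try:
    attempt_float_strict((1, 2))
    strict_raised = False
except TypeError:
    strict_raised = True

print(ok, bad_string, bad_type, strict_raised)
```

Which variant is preferable depends on whether a non-string, non-numeric input should count as bad data or as a bug worth surfacing.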

In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether the code in the try block succeeds or not. To do this, use `finally`:

In [None]:
f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()

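NameError: name 'path' is not defined

P note: `path` and `write_to_file` are not defined at this point in the notebook, so running the cell as-is raises the `NameError` above; the `try`/`finally` pattern is what matters here.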

Here, the file handle f will always get closed. Similarly, you can have code that executes only if the `try:` block succeeds using `else`:

In [None]:
f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()
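
Since `path` and `write_to_file` are not defined in the cells above, here is a runnable sketch of the same `try`/`except`/`else`/`finally` flow. It assumes a temporary file from the standard `tempfile` module and a trivial `write_to_file` helper, both made up for illustration, and it catches `OSError` rather than using a bare `except`.

```python
import os
import tempfile

def write_to_file(f):
    # Hypothetical helper: writes a single line to the open file
    f.write('hello\n')

events = []

# Create a temporary file path so the example leaves nothing behind
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = open(path, 'w')
try:
    write_to_file(f)
except OSError:
    events.append('failed')     # runs only if the try block raises OSError
else:
    events.append('succeeded')  # runs only if the try block raises nothing
finally:
    f.close()                   # runs in every case
    events.append('closed')

os.remove(path)
print(events)  # ['succeeded', 'closed']
```

In everyday code, the `with open(...) as f:` form closes the file automatically and is usually preferred over an explicit `finally`.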

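#### Exceptions in IPython

When an exception is raised while you `%run` a script or execute a statement, IPython prints a full traceback by default, with a few lines of context around each position in the call stack: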

```py
In [10]: %run examples/ipython_bug.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/wesm/code/pydata-book/examples/ipython_bug.py in <module>()
     13     throws_an_exception()
     14
---> 15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in calling_things()
     11 def calling_things():
     12     works_fine()
---> 13     throws_an_exception()
     14
     15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in throws_an_exception()
      7     a = 5
      8     b = 6
----> 9     assert(a + b == 10)
     10
     11 def calling_things():

AssertionError:
```

## 3.3. Files and the Operating System

In [None]:
%pushd book-materials

[WinError 2] The system cannot find the file specified: 'book-materials'
d:\Github\py-mckinney


['d:\\Github\\py-mckinney']

In [None]:
path = 'examples/segismundo.txt'
f = open(path)

In [None]:
for line in f:
    pass

In [None]:
lines = [x.rstrip() for x in open(path)]
lines

In [None]:
f.close()

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]

In [None]:
f = open(path)
f.read(10)
f2 = open(path, 'rb')  # Binary mode
f2.read(10)

In [None]:
f.tell()
f2.tell()

In [None]:
import sys
sys.getdefaultencoding()

In [None]:
f.seek(3)
f.read(1)

In [None]:
f.close()
f2.close()

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)
with open('tmp.txt') as f:
    lines = f.readlines()
lines

In [None]:
import os
os.remove('tmp.txt')

### 3.3.1. Bytes and Unicode with Files

In [None]:
with open(path) as f:
    chars = f.read(10)
chars

In [None]:
with open(path, 'rb') as f:
    data = f.read(10)
data

In [None]:
data.decode('utf8')
data[:4].decode('utf8')

In [None]:
sink_path = 'sink.txt'
with open(path) as source:
    with open(sink_path, 'xt', encoding='iso-8859-1') as sink:
        sink.write(source.read())
with open(sink_path, encoding='iso-8859-1') as f:
    print(f.read(10))

In [None]:
os.remove(sink_path)

In [None]:
f = open(path)
f.read(5)
f.seek(4)
f.read(1)
f.close()

In [None]:
%popd

## 3.4. Conclusion