# Practice examples 
- Python for Data Analysis Chapter 3
- DATA-609: Ken Noppinger
---

## Data Structures and Sequences

### Tuple

Fixed-length, immutable sequence of Python objects

In [1]:
# comma-separated sequence of values
tup = 4, 5, 6
tup

(4, 5, 6)

In [2]:
# enclose the values in parentheses
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [3]:
# convert any sequence or iterator to a tuple by invoking tuple
tuple([4, 0, 2])

(4, 0, 2)

In [4]:
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

In [5]:
# Elements can be accessed with square brackets [].  Sequences are 0-indexed
tup[0]

's'

In [6]:
# Once the tuple is created it’s not possible to modify which object is stored in each slot
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

In [8]:
# If an object inside a tuple is mutable, such as a list in example above, then it can be modified in-place
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

In [9]:
# Concatenate tuples using the + operator to produce longer tuples
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [10]:
# Multiplying a tuple by an integer, as with lists, has the effect of concatenating together that many copies of the tuple
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking tuples

In [11]:
# Python will attempt to unpack the value on the right-hand side of the equals sign
tup = (4, 5, 6)
a, b, c = tup
b

5

In [12]:
# Sequences with nested tuples can be unpacked
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

In [13]:
# Swap - Integers
a, b = 1, 2
b, a = a, b
a, b

(2, 1)

In [14]:
# Swap - Strings
a, b = "cow", "calf"
b, a = a, b
a, b

('calf', 'cow')

In [15]:
# Iterating over sequences of tuples or lists
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [16]:
# Returning multiple values from a function
# TBD

In [17]:
# Extract a few elements from the beginning of a tuple. Use *rest special syntax
values = 1, 2, 3, 4, 5
a, b, *rest = values
a, b

(1, 2)

In [18]:
rest

[3, 4, 5]

In [19]:
# Style recommendation - Use *_ for unwanted variables if the plan is to discard them
a, b, *_ = values
_

[3, 4, 5]

#### Tuple methods

In [20]:
# Count the number of occurrences of a value
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

### List

Lists are variable-length and their contents can be modified in-place

In [21]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

['foo', 'bar', 'baz']

In [22]:
b_list[1] = 'peekaboo'
b_list

['foo', 'peekaboo', 'baz']

Use the list function to materialize an iterator or generator expression

In [23]:
gen = range(10)
gen

range(0, 10)

In [24]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

#### Adding and removing elements

In [25]:
# Append to end of a list
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [26]:
# Insert at a specific location
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

In [27]:
# Pop to remove and return an element at a particular index
b_list.pop(2)

'peekaboo'

In [28]:
b_list

['foo', 'red', 'baz', 'dwarf']

In [29]:
# Remove locates the first occurrence of a value and removes it from the list
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [30]:
b_list.remove('foo')
b_list

['red', 'baz', 'dwarf', 'foo']

In [31]:
# Containment - Check if a list contains a value using the "in" keyword
'dwarf' in b_list

True

In [32]:
# Containment - The keyword "not" can be used to negate "in"
'dwarf' not in b_list

False

#### Concatenating and combining lists

In [33]:
# Adding two lists together with + concatenates them.
[4, None, 'foo'] + [7, 8, (2, 3)]

[4, None, 'foo', 7, 8, (2, 3)]

In [34]:
# Append multiple elements to a list using the extend method.  
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

[4, None, 'foo', 7, 8, (2, 3)]

```extend``` is generally faster than concatenation when building up a large list, because concatenation creates a new list and copies every element into it, whereas ```extend``` grows the existing list in place.

In [35]:
list_of_lists = [[1,2,3],[4,5,6],[7,8,9]] * 2
list_of_lists

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [36]:
%time
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
everything    

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 9.3 µs


[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [37]:
%time
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
everything

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 7.87 µs


[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]
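
Note that ```%time``` on a line by itself times only that (empty) line, so the readings above do not reflect the loops; inside a notebook, ```%%time``` or ```%timeit``` would be needed. A minimal sketch of a fairer comparison using the standard-library ```timeit``` module follows. The helper names ```flatten_extend``` and ```flatten_concat``` and the workload size are illustrative assumptions, and the timings will vary by machine.

```python
import timeit


def flatten_extend(chunks):
    """Grow a single list in place with extend."""
    everything = []
    for chunk in chunks:
        everything.extend(chunk)
    return everything


def flatten_concat(chunks):
    """Build a brand-new list on every iteration with +."""
    everything = []
    for chunk in chunks:
        everything = everything + chunk
    return everything


# Hypothetical workload: 2,000 small chunks
chunks = [[1, 2, 3]] * 2000

# Run each approach 10 times and report the total elapsed seconds
t_extend = timeit.timeit(lambda: flatten_extend(chunks), number=10)
t_concat = timeit.timeit(lambda: flatten_concat(chunks), number=10)
print(f"extend: {t_extend:.4f} s, concat: {t_concat:.4f} s")
```

Both helpers return the same list; only the amount of copying differs.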

#### Sorting

In [38]:
# Sort a list in-place (without creating a new object) by calling its sort method
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

In [39]:
# Pass a secondary sort key (i.e., a function) that produces a value to use to sort the objects. 
# For example, sort a collection of strings by their lengths
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

['He', 'saw', 'six', 'small', 'foxes']

#### Binary search and maintaining a sorted list

Built-in bisect module implements binary search and insertion into a sorted list

In [40]:
import bisect
my_sorted_list = [1, 2, 2, 2, 3, 4, 7]

Use **bisect.bisect** to find the location where an element should be inserted to keep it sorted

In [41]:
bisect.bisect(my_sorted_list, 2)

4

In [42]:
bisect.bisect(my_sorted_list, 5)

6

Use **bisect.insort** to insert the element into that location

In [43]:
bisect.insort(my_sorted_list, 6)
my_sorted_list

[1, 2, 2, 2, 3, 4, 6, 7]

#### Slicing

Select sections of most sequence types by using slice notation.

Use ```start:stop``` passed to the indexing operator []

In [44]:
# Slice a sequence
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

[2, 3, 7, 5]

In [45]:
# Assign a slice of a sequence
seq[3:4] = [6, 3]
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively.

In [46]:
# Slice from beginning of sequence
seq[:5]

[7, 2, 3, 6, 3]

In [47]:
# Slice from a point in the sequence to the end
seq[3:]

[6, 3, 5, 6, 0, 1]

Negative indices slice the sequence relative to the end.

In [48]:
# Slice the last four elements of the sequence
seq[-4:]

[5, 6, 0, 1]

In [49]:
# Slice four elements of the sequence from the sixth position from the end
seq[-6:-2]

[6, 3, 5, 6]

A step can be used after a second colon to take every nth element.

In [50]:
seq[::2]

[7, 3, 3, 6, 1]

A clever use of stepping is to pass -1, which has the useful effect of reversing a list or tuple.

In [51]:
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]
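
Three slicing properties are worth checking directly: the element at ```start``` is included while the element at ```stop``` is not, out-of-range bounds are clamped rather than raising an error, and a slice is a new list. A minimal sketch, using a fresh copy of the original ```seq``` values:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

start, stop = 1, 5
chunk = seq[start:stop]      # start is included, stop is excluded
print(chunk, len(chunk) == stop - start)

clamped = seq[5:100]         # out-of-range stop is clamped, no IndexError
print(clamped)

shallow_copy = seq[:]        # a full slice makes a new list object
shallow_copy[0] = 99
print(seq[0], shallow_copy[0])
```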

### Built-in Sequence Functions

#### enumerate

It’s common when iterating over a sequence to want to keep track of the index of the current item

In [52]:
collection = [1,8,5,7] * 1000

Do-it-yourself approach

In [53]:
%time
i = 0  
total = 0
for value in collection:
    total += value
    i += 1
print(f"{i} items add up to {total}")

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 8.11 µs
4000 items add up to 21000


Use built-in function ```enumerate``` to return a sequence of tuples

In [54]:
%time
total = 0
for i, value in enumerate(collection):
    total += value
print(f"{i+1} items add up to {total}")    

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 9.06 µs
4000 items add up to 21000


When indexing data, use ```enumerate``` to compute a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence

In [55]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping    

{'foo': 0, 'bar': 1, 'baz': 2}

#### sorted

Use the ```sorted``` function to return a new sorted list from the elements of any sequence

In [56]:
sorted([7, 1, 2, 6, 0, 3, 2])

[0, 1, 2, 2, 3, 6, 7]

In [57]:
sorted('horse race')

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

Note: ```sorted``` returns a new list and leaves the original unchanged, unlike the in-place ```sort``` method, which modifies the list itself.

In [58]:
my_list = [3,2,1]
sorted(my_list)

[1, 2, 3]

In [59]:
my_list

[3, 2, 1]

#### zip

```zip``` “pairs” up elements of a number of lists, tuples, or other sequences to create an iterator of tuples (wrap it in ```list``` to materialize the result)

In [60]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)

In [61]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

```zip``` can take an arbitrary number of sequences

The number of elements it produces is determined by the shortest sequence

In [62]:
seq3 = ['False','True']
list(zip(seq1, seq2, seq3))

[('foo', 'one', 'False'), ('bar', 'two', 'True')]
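
If dropping the unmatched elements is not wanted, the standard library offers ```itertools.zip_longest```, which pads the shorter inputs instead. A minimal sketch; the choice of ```None``` as the fill value is an assumption for illustration.

```python
from itertools import zip_longest

seq1 = ['foo', 'bar', 'baz']
seq3 = ['False', 'True']

truncated = list(zip(seq1, seq3))                       # stops at the shortest input
padded = list(zip_longest(seq1, seq3, fillvalue=None))  # pads the shorter input

print(truncated)
print(padded)
```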

A very common use of ```zip``` is simultaneously iterating over multiple sequences, possibly also combined with ```enumerate```

In [63]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


Given a “zipped” sequence, ```zip``` can be applied in a clever way to “unzip” the sequence. Another way to think about this is converting a list of rows into a list of columns.

To demonstrate this, consider a list of baseball pitchers:

In [64]:
pitchers = [('Max', 'Scherzer'), ('Clayton', 'Kershaw'), ('Justin', 'Verlander')]
pitchers

[('Max', 'Scherzer'), ('Clayton', 'Kershaw'), ('Justin', 'Verlander')]

Zipped sequence representation as a list of rows

| Row | First Name | Last Name |
| --- | --- | --- |
| **Pitcher 0** | Max | Scherzer |
| **Pitcher 1** | Clayton | Kershaw |
| **Pitcher 2** | Justin | Verlander |

In [65]:
first_names, last_names = zip(*pitchers)

In [66]:
first_names

('Max', 'Clayton', 'Justin')

In [67]:
last_names

('Scherzer', 'Kershaw', 'Verlander')

Unzipped sequence representation as a list of columns

| Column | Pitcher 0 | Pitcher 1 | Pitcher 2 |
| --- | --- | --- | --- |
| **First Name** | Max | Clayton | Justin |
| **Last Name** | Scherzer | Kershaw | Verlander |

#### reversed

```reversed``` iterates over the elements of a sequence in reverse order

In [68]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

### dict

```dict``` is a flexibly sized collection (i.e., dictionary) of key-value pairs, where key and value are Python objects.

This is conceptually a ***hash map*** or ***associative array***



In [69]:
empty_dict = {}

In [70]:
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4], 'c': (14, 28, 56)}
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (14, 28, 56)}

Insert an element in the dictionary

In [71]:
d1[7] = 'an integer'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (14, 28, 56), 7: 'an integer'}

Access an element of the dictionary

In [72]:
d1['b']

[1, 2, 3, 4]

Check if dict contains a key

In [73]:
'a' in d1

True

Delete a value from the dictionary using the ```del``` keyword

In [74]:
d1[5] = 'some value'
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 'c': (14, 28, 56),
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [75]:
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 'c': (14, 28, 56),
 7: 'an integer',
 'dummy': 'another value'}

Retrieve a value and delete it from the dictionary using the ```pop``` method

In [76]:
ret = d1.pop('dummy')
ret

'another value'

In [77]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 'c': (14, 28, 56), 7: 'an integer'}

The ```keys``` and ```values``` methods provide iterators of the dict’s keys and values, respectively.

In [78]:
list(d1.keys())

['a', 'b', 'c', 7]

In [79]:
list(d1.values())

['some value', [1, 2, 3, 4], (14, 28, 56), 'an integer']

Merge one dict into another using the ```update``` method

Note: changes are made in-place and the old values of matched keys are discarded

In [80]:
d1.update({'b' : 'foo', 'c' : 12, 'animal' : 'cow'})
d1

{'a': 'some value', 'b': 'foo', 'c': 12, 7: 'an integer', 'animal': 'cow'}
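
When the original dict must stay intact, one option is to build a merged copy instead of calling ```update```. A minimal sketch, assuming Python 3.5 or later for the ```{**a, **b}``` unpacking syntax; the names ```base``` and ```overrides``` are illustrative.

```python
base = {'a': 'some value', 'b': [1, 2, 3, 4]}
overrides = {'b': 'foo', 'animal': 'cow'}

# Later entries win on key collisions, just as with update
merged = {**base, **overrides}

print(merged)
print(base)  # unchanged
```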

#### Creating dicts from sequences

Given two sequences, pair them up element-wise in a dict.

In [81]:
value_tuple = ('goat','cow','pig','hen','horse')
key_list = list(range(len(value_tuple)))

In [82]:
# Long way
mapping = {}
for key, value in zip(key_list, value_tuple):
    mapping[key] = value
mapping    

{0: 'goat', 1: 'cow', 2: 'pig', 3: 'hen', 4: 'horse'}

In [83]:
# Short way
mapping = dict(zip(key_list, value_tuple))
mapping

{0: 'goat', 1: 'cow', 2: 'pig', 3: 'hen', 4: 'horse'}

#### Dict default values 

Common scenario - need to check for existence of an element

In [84]:
# Typical logic...
key = 8
default_value = "non-animal"
if key in mapping:
    value = mapping[key]
else:
    value = default_value
value    

'non-animal'

Use ```get``` and ```pop``` methods as an alternative

In [85]:
key = 8
default_value = "non-animal"
value = mapping.get(key,default_value)
value

'non-animal'

In [86]:
value = mapping.get(3,default_value)
value

'hen'

In [87]:
value = mapping.pop(3,default_value)
value

'hen'

In [88]:
mapping

{0: 'goat', 1: 'cow', 2: 'pig', 4: 'horse'}

Categorize a list of words by their first letters as a dict of lists

In [89]:
# Long way
%time
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter        

CPU times: user 6 µs, sys: 0 ns, total: 6 µs
Wall time: 10.7 µs


{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

Use the ```setdefault``` dict method to replace the preceding for loop

In [90]:
# Short way
%time 
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter    

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 8.11 µs


{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
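
The standard-library ```collections.defaultdict``` offers another way to write the same grouping: it creates the default value automatically the first time a missing key is accessed. A minimal sketch:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# list() is called to create an empty list for each new key
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
```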

#### Valid dict key types

Values of a dict can be any Python object.

Keys of a dict have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too).

Use the ```hash``` function to check whether an object is hashable (built-in immutable types are; mutable containers such as lists are not)

In [91]:
# Hashable because a string is immutable
hash('string')

3852412887424715785

In [92]:
# Hashable because integers and tuples are immutable
hash((1, 2, (2, 3)))

-9209053662355515447

In [93]:
# Not hashable because lists are mutable
hash((1, 2, [2, 3]))

TypeError: unhashable type: 'list'

Convert a list to a tuple to use it as a key. The tuple can be hashed as long as its elements also can

In [94]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}
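
The hashability check can be wrapped in a small helper so that a candidate key is tested without letting the ```TypeError``` propagate. The function name ```is_hashable``` is hypothetical, introduced here only for illustration.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable('string'))          # strings are hashable
print(is_hashable((1, 2, (2, 3))))    # tuple of hashable objects
print(is_hashable((1, 2, [2, 3])))    # tuple containing a list is not
```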

### set

An unordered collection of unique elements. Create one in either of two ways:
- the ```set``` function
- a *set literal* with curly braces

In [95]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [96]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

Sets support mathematical operations like union, intersection, difference, and symmetric difference.

Union - set of distinct elements occurring in either set

In [97]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [98]:
# Union function
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [99]:
# Union operator
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

Intersection - contains the elements occurring in both sets

In [100]:
# Intersection function
a.intersection(b)

{3, 4, 5}

In [101]:
# Intersection operator
a & b

{3, 4, 5}

In-place operations - The operators ```|=``` and ```&=``` replace the contents of the set on the left side of the operation with the result. Copy the set first to keep the original unchanged.

In [102]:
c = a.copy()  # Make a copy
c |= b        # Update it with union of another set
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [103]:
d = a.copy()  # Make a copy
d &= b        # Update it with the intersection with the other set
d

{3, 4, 5}
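
Difference and symmetric difference follow the same pattern as union and intersection, each with a method form and an operator form. A minimal sketch using the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b          # same as a.difference(b)
in_exactly_one = a ^ b     # same as a.symmetric_difference(b)

print(only_in_a)
print(in_exactly_one)
```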

Set elements must be immutable. Convert a list to a tuple to store list-like elements in a set

In [104]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

Check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set

In [105]:
# Subset
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)

True

In [106]:
# Superset
a_set.issuperset({1, 2, 3})

True

Sets are equal if and only if their contents are equal

In [107]:
{1, 2, 3} == {3, 2, 1}

True

### List, Set, and Dict Comprehensions

Form a new list in one concise expression by filtering the elements of a collection and transforming the elements that pass the filter

```[expr for val in collection if condition]```

In [108]:
# List of strings
list_of_strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [109]:
# Long way of iterating the list
result = []
for s in list_of_strings:
    if len(s) > 2:
        result.append(s.upper())
result        

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [110]:
# List comprehension
list_comp = [s.upper() for s in list_of_strings if len(s) > 2]
list_comp

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [129]:
# List comprehension applied to list_of_lists: each element is [] + chunk, so the result is not flattened
everything = []
everything = [everything + chunk for chunk in list_of_lists]
everything

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 2, 3], [4, 5, 6], [7, 8, 9]]
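
To get one flat list from ```list_of_lists```, a comprehension needs two ```for``` clauses; ```itertools.chain.from_iterable``` from the standard library gives the same result. A minimal sketch, with ```list_of_lists``` redefined so the block runs on its own:

```python
from itertools import chain

list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] * 2

# Outer loop over the chunks, inner loop over each chunk's items
flat_comp = [x for chunk in list_of_lists for x in chunk]

# chain.from_iterable lazily walks the chunks one after another
flat_chain = list(chain.from_iterable(list_of_lists))

print(flat_comp)
print(flat_comp == flat_chain)
```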

In [111]:
# Dictionary of strings
dict_of_strings = {1:'a', 2:'as', 3:'bat', 4:'car', 5:'dove', 6:'python'}
dict_of_strings

{1: 'a', 2: 'as', 3: 'bat', 4: 'car', 5: 'dove', 6: 'python'}

In [112]:
# Dictionary comprehension 
# Note the use of the items() dict function to access each key:value pair
dict_comp = {key : value.upper() for (key ,value) in dict_of_strings.items() if len(value) > 2}  
dict_comp

{3: 'BAT', 4: 'CAR', 5: 'DOVE', 6: 'PYTHON'}

In [113]:
# Set Comprehension
set_of_strings = {'a', 'as', 'bat', 'car', 'dove', 'python'}
set_of_strings

{'a', 'as', 'bat', 'car', 'dove', 'python'}

In [114]:
set_comp = {str(s).upper() for s in set_of_strings if len(s) > 2}
set_comp           

{'BAT', 'CAR', 'DOVE', 'PYTHON'}

In [131]:
# Set comprehension example showing the unique lengths of the strings in the list
unique_lengths = {len(x) for x in list_of_strings}
unique_lengths

{1, 2, 3, 4, 6}

In [133]:
# Same result using the map function
set(map(len, list_of_strings))

{1, 2, 3, 4, 6}

In [135]:
# Lookup mapping using a dict
loc_mapping = {val : index for index, val in enumerate(list_of_strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### Nested List comprehension

Given two lists of names organized by language, create a single list of names with 2 or more of the letter 'a' in them

In [140]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar','']]
names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count('a') >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

['Maria', 'Natalia']

Now, do the same exercise using a single nested list comprehension

The ```for``` parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before.

In [141]:
result = [name for names in all_data for name in names if name.count('a') >= 2]
result

['Maria', 'Natalia']

“Flatten” a list of tuples of integers into a simple list of integers

In [142]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

The ```for``` expressions are in the same order as if a nested for loop was used instead of a list comprehension

In [144]:
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Create a list of lists for the tuples rather than a flattened list of all the inner elements

In [145]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

## Functions

Use a function to group a reusable set of Python statements as a method for reducing duplication and organizing code efficiently

In [153]:
# Note x and y are positional arguments and a and b are keyword arguments. 
def my_function(x, y, a=1.5, b=2.2):
    if a > 1:
        return b * (x + y)
    else:
        return b / (x + y)

In [160]:
my_function(5, 6, a=3.5, b=2)

22

In [156]:
my_function(3.14, 7, 3.5, 4)

40.56

In [157]:
my_function(10, 20)

66.0

In [158]:
my_function(x=5, y=6, a=7, b=3)

33

**Function Rules** 

The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any). 

Keyword arguments can be specified in any order. This frees the coder from having to remember the order in which the function arguments were specified; only their names matter.

In [161]:
my_function(5, 6, b=2, a=3.5)

22

### Namespaces, Scope, and Local Functions

Functions can access variables in two different scopes: ***global*** and ***local***

An alternative and more descriptive name describing a variable scope in Python is a ***namespace***
- Any variables that are assigned within a function by default are assigned to the local namespace.
- The local namespace is created when the function is called and immediately populated by the function’s arguments.
- After the function is finished, the local namespace is destroyed.

In [170]:
# When func() is called, the empty list a is created, five elements are appended, and then a is destroyed when the function exits
def func():
    a = []
    for i in range(5):
        a.append(i)

In [171]:
# Now, define empty list a in the global scope so that a exists after the function call
a = []
def func():
    for i in range(5):
        a.append(i)
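
A common alternative to mutating a global is to return the locally built object to the caller. The sketch below also shows that the local name is no longer reachable after the call; the names ```build_list``` and ```local_items``` are illustrative.

```python
def build_list():
    local_items = []           # lives only in build_list's local namespace
    for i in range(5):
        local_items.append(i)
    return local_items         # hand the object back to the caller


result = build_list()

try:
    local_items                # the local name is gone once the call returns
except NameError as exc:
    lookup_error = str(exc)
else:
    lookup_error = None

print(result)
print(lookup_error)
```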

### Returning Multiple Values

A function can return multiple values.

The function is actually just returning one object, namely a ***tuple***, which is then being unpacked into the result variables

In [178]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c
return_value = f()
return_value

(5, 6, 7)

In [179]:
a, b, c = f()
(a + b) * c

77

### Functions Are Objects

Data cleaning/transformation exercise

In [184]:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda','south carolina##', 'West virginia?']

In [185]:
import re

In [186]:
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()               # Strip white space
        value = re.sub('[!#?]', '', value)  # Remove punctuation symbols
        value = value.title()               # Standardize on proper capitalization
        result.append(value)
    return result

In [187]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

Make *clean_strings* more reusable by making a list of the operations to apply to particular strings

In [188]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [189]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

Use functions as arguments to other functions like the built-in map function shown below

In [190]:
for x in map(remove_punctuation, states):
    print(x)

 Alabama 
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia


### Anonymous (Lambda) Functions

***Anonymous*** or ***lambda*** functions are a way of writing functions consisting of a single expression, the result of which is the return value. 

The lambda keyword has no meaning other than declaring an anonymous function:

In [191]:
# Original function
def short_function(x):
    return x * 2

In [192]:
# Lambda function
equiv_anon = lambda x: x * 2

There are many cases where data transformation functions will take functions as arguments. 

It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable.

In [193]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

In [194]:
ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [196]:
# Sort by number of distinct letters in each string of a list
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key=lambda x: len(set(list(x))))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

### Currying: Partial Argument Application

Currying is computer science jargon that means deriving new functions from existing ones by partial argument application.

In [207]:
def add_numbers(x, y):
    return x + y

In [208]:
# Anonymous function that calls an existing function.  Variable y is curried in this example.
add_five = lambda y: add_numbers(5, y)    

In [209]:
add_five(4)

9

In [210]:
# The built-in functools module can simplify this process using the partial function
from functools import partial
add_five = partial(add_numbers, 5)

In [211]:
add_five(4)

9

### Generators

Use the iterator protocol as a generic way to make objects (such as sequences, lists, or lines in a file) iterable

In [212]:
some_dict = {'a': 1, 'b': 2, 'c': 3}

In [213]:
for key in some_dict:
    print(key)

a
b
c


In [217]:
# The syntax above is interpreted to create an iterator:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x7f896c3072c0>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a ```for``` loop.

Most methods expecting a list or list-like object will also accept any ***iterable*** object.

In [218]:
list(dict_iterator)

['a', 'b', 'c']

A ***generator*** is a concise way to construct a new iterable object.

- normal functions execute and return a single result at a time
- generators return a sequence of multiple results lazily, pausing after each one until the next one is requested

In [219]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [221]:
gen = squares()
gen

<generator object squares at 0x7f8956ddf6d0>

It is not until you request elements from the generator that it begins executing its code

In [222]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
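
The pausing behavior can be observed by pulling values one at a time with the built-in ```next``` function. A minimal sketch, using a shorter default (```n=3```) than the ```squares``` generator above to keep the trace small:

```python
def squares(n=3):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2


gen = squares()         # nothing is printed yet: the body has not started

first = next(gen)       # runs up to the first yield, printing the message
second = next(gen)      # resumes after the first yield
remaining = list(gen)   # drains whatever is left
exhausted = list(gen)   # a generator cannot be restarted once consumed

print(first, second, remaining, exhausted)
```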

#### Generator expressions

A generator expression is an analogue to list, dict, and set comprehensions

Create a generator expression by enclosing what would otherwise be a list comprehension within parentheses instead of brackets

In [227]:
# Example of verbose generator
def _make_gen():
    for x in range(10):
        yield x ** 2
gen = _make_gen()

In [228]:
# Example with generator expression
gen = (x ** 2 for x in range(10))
gen

<generator object <genexpr> at 0x7f8956ddfe40>

In [229]:
for x in gen:
    print(x, end=' ')

0 1 4 9 16 25 36 49 64 81 

Generator expressions can be used instead of list comprehensions as function arguments in many cases

In [230]:
sum(x ** 2 for x in range(100))

328350

In [231]:
dict((i, i **2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

#### itertools module

The standard library ***itertools*** module has a collection of generators for many common data algorithms

Useful ***itertools*** functions
- ```combinations(iterable, k)``` - Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement
- ```permutations(iterable, k)``` - Generates a sequence of all possible k-tuples of elements in the iterable, respecting order
- ```groupby(iterable[, keyfunc])``` - Generates (key, sub-iterator) for each run of consecutive elements that share a key, so sort the input by the key first if each key should appear only once
- ```product(*iterables, repeat=1)``` - Generates the Cartesian product of the input iterables as tuples, similar to a nested for loop

In [232]:
import itertools

In [233]:
first_letter = lambda x: x[0]

In [234]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [235]:
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is an iterator over the matching names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
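
The output above lists ```A``` twice because ```groupby``` only groups consecutive elements. Sorting by the same key first yields one group per letter. A minimal sketch:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']


def first_letter(name):
    return name[0]


# Sort by the grouping key so equal keys become adjacent
ordered = sorted(names, key=first_letter)

grouped = {letter: list(group)
           for letter, group in itertools.groupby(ordered, key=first_letter)}

print(grouped)
```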


### Errors and Exception Handling

Handling Python errors or exceptions gracefully is an important part of building robust programs

In [236]:
float('1.2345')

1.2345

In [237]:
float('something')

ValueError: could not convert string to float: 'something'

Use try-except to return gracefully

In [247]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [248]:
attempt_float('1.2345')

1.2345

In [249]:
attempt_float('something')

'something'

In [251]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

```float``` can raise exceptions other than ```ValueError```, so suppress specific errors: ```ValueError``` and ```TypeError```

In [252]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [253]:
attempt_float('something')

'something'

In [254]:
attempt_float((1, 2))

(1, 2)

In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether the code in the ```try``` block succeeds or not.

Use ```finally```

In [258]:
def write_to_file(f):
    return # Do nothing

path = "some_file"
f = open(path, 'w')
try:
    write_to_file(f)
finally:
    f.close()    

The file handle f will always get closed.

Make code execute only if the ```try``` block succeeds using ```else```

In [259]:
f = open(path, 'w')
try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close() 

Succeeded
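
The order in which the ```except```, ```else```, and ```finally``` blocks run can be traced without touching a file. In this sketch, the helper ```run_and_trace``` is hypothetical and simply records which blocks executed.

```python
def run_and_trace(action):
    """Call action() and record which handler blocks ran."""
    events = []
    try:
        action()
    except ValueError:
        events.append('except')    # runs only when the try block fails
    else:
        events.append('else')      # runs only when the try block succeeds
    finally:
        events.append('finally')   # runs in both cases
    return events


print(run_and_trace(lambda: float('1.2345')))
print(run_and_trace(lambda: float('something')))
```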


## Files and the Operating System

To open a file for reading or writing, use the built-in open function with either a relative or absolute file path

In [262]:
path = 'some_file'
f = open(path)

By default, the file is opened in read-only mode 'r'. 

Treat the file handle f like a list and iterate over the lines.

In [263]:
for line in f:
    pass

The lines come out of the file with the end-of-line (EOL) markers intact. 

In [264]:
lines = [x.rstrip() for x in open(path)]
lines

['The cow jumped over the moon.',
 'The moon orbits the earth.',
 'Does the cow orbit the earth?']

Closing the file releases its resources back to the operating system

In [265]:
f.close()

Use the with statement to automatically close the file f

In [267]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
lines    

['The cow jumped over the moon.',
 'The moon orbits the earth.',
 'Does the cow orbit the earth?']
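
A self-contained sketch of the same pattern, writing the sample file first so the block runs anywhere. The temporary directory is an assumption made for illustration; the notes above use a file named ```some_file``` in the working directory.

```python
import os
import tempfile

sample_lines = ['The cow jumped over the moon.',
                'The moon orbits the earth.',
                'Does the cow orbit the earth?']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'some_file')

    # Write the sample text; the file is closed when the block exits
    with open(path, 'w') as f:
        f.write('\n'.join(sample_lines) + '\n')

    # Read it back, stripping the end-of-line markers
    with open(path) as f:
        lines = [line.rstrip() for line in f]

    closed_after_with = f.closed   # True: with closed the handle for us

print(lines)
print(closed_after_with)
```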

Common methods for readable files
- ```read``` - returns up to the requested number of characters (bytes in binary mode) and advances the file position past what was read
- ```seek``` - changes the file position to the indicated byte in the file
- ```tell``` - provides current position

In [291]:
f = open(path)
f.read(10)

'The cow ju'

In [292]:
f2 = open(path, 'rb') # Binary mode
f2.read(10)

b'The cow ju'

In [293]:
f.tell()

10

In [294]:
f.seek(3)
f.read(5)

' cow '

In [295]:
f.readlines(5)

['jumped over the moon.\n']

In [296]:
f.close()
f2.close()
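
The interaction between ```read```, ```tell```, and ```seek``` is easiest to follow in binary mode, where positions are plain byte offsets. A minimal sketch that creates its own temporary file; the file content is sample data chosen for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'some_file')
    with open(path, 'wb') as f:
        f.write(b'The cow jumped over the moon.\n')

    with open(path, 'rb') as f:
        head = f.read(10)            # first ten bytes
        pos_after_read = f.tell()    # position advanced by the bytes read
        f.seek(4)                    # jump to the fifth byte
        word = f.read(3)
        pos_after_seek = f.tell()

print(head, pos_after_read, word, pos_after_seek)
```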