# Data Structures and Sequences

### Tuples

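A tuple is a fixed-length, immutable sequence of Python objects. Once it is created, the objects stored in its slots cannot be replaced.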

In [None]:
tup = tuple('string')
tup

In [None]:
tup[0]

In [None]:
tup = tuple(['foo', [1, 2], True])

In [None]:
tup[2] = False  # raises TypeError: 'tuple' object does not support item assignment

In [None]:
# if an object inside a tuple is mutable, such as a list, we can modify it in-place
tup[1].append(4)

In [None]:
tup
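The sketch below puts both points in one self-contained cell. Assigning to a tuple slot raises `TypeError`, but a mutable object stored in the tuple can still be changed in place. The name `error_message` is only illustrative.

```python
tup = tuple(['foo', [1, 2], True])

# Rebinding a slot of the tuple is not allowed.
try:
    tup[2] = False
except TypeError as exc:
    error_message = str(exc)

# The list stored in slot 1 is still an ordinary, mutable list.
tup[1].append(4)

print(error_message)  # 'tuple' object does not support item assignment
print(tup)            # ('foo', [1, 2, 4], True)
```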

### Unpacking tuples

In [None]:
tup = 4, 5, (6, 7)

In [None]:
a, b, (c, d) = tup

In [None]:
d

### Swapping variable values

In [None]:
a, b = 1, 2

In [None]:
a

In [None]:
b

In [None]:
b, a = a, b

In [None]:
a

In [None]:
b

In [None]:
seq = [(1, 3, 3), (4, 5, 6), (7, 8, 9)]

In [None]:
for a, b, c in seq:
    print('a={0}, b={1}, c={2}'.format(a, b, c))

In [None]:
values = 1, 2, 3, 4, 5, 2, 5, 4, 0, 3

In [None]:
a, b, *_ = values

In [None]:
_

In [None]:
values.count(3)

# List

In contrast with tuples, lists are variable-length and their contents can be modified in place. The list function is frequently used in data processing as a way to materialize an iterator or generator expression.
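Here is a small illustration of "materializing". A `range` object and a generator expression produce their values on demand, and `list` collects all of those values into a concrete list that can then be modified. The variable names are illustrative.

```python
lazy_range = range(10)             # produces values on demand
materialized = list(lazy_range)    # [0, 1, 2, ..., 9]

# A generator expression can be consumed once; list() collects every value.
evens = list(x for x in range(10) if x % 2 == 0)

# Unlike a tuple, the resulting list can be changed in place.
materialized[0] = 'zero'
print(materialized)  # ['zero', 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(evens)         # [0, 2, 4, 6, 8]
```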

In [None]:
a_list = [2, 3, 5, None]

In [None]:
tup = tuple(['foo', 'bar', 'baz'])

In [None]:
b_list = list(tup)

In [None]:
b_list

In [None]:
b_list[1] = 'peekaboo'

In [None]:
b_list

In [None]:
b_list.append('dwarf')

In [None]:
b_list

In [None]:
b_list.insert(2, 'beck')

In [None]:
b_list

In [None]:
b_list.pop(1)

In [None]:
b_list

In [None]:
b_list.append('foo')

In [None]:
b_list

In [None]:
b_list.remove('foo')

In [None]:
b_list

In [None]:
'beck' in b_list

# List concatenation

If you have a list already defined, you can append multiple elements to it using the extend method:

In [None]:
list_of_list = [[1, True, 'foo', 'ManU', 'Courage']]

In [None]:
list_of_list

In [None]:
def extend_list(list_of_list: list) -> list:
    """Concatenate the inner lists into a single flat list."""
    everything = []
    for chunk in list_of_list:
        everything.extend(chunk)
    return everything

In [None]:
everything = extend_list(list_of_list)
everything
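For comparison, lists can also be joined with `+`. That operator builds a brand-new list and leaves both inputs unchanged. `extend` instead appends to an existing list in place, which is usually preferable when building up a large list. A minimal sketch with illustrative values:

```python
x = [4, None, 'foo']
y = [7, 8, (2, 3)]

# + creates a new list; x and y are left unchanged.
combined = x + y

# extend modifies x in place and returns None.
result = x.extend(y)

print(combined)  # [4, None, 'foo', 7, 8, (2, 3)]
print(x)         # [4, None, 'foo', 7, 8, (2, 3)]
print(result)    # None
```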

# Sorting

In [None]:
b = ['saw', 'small', 'he', 'foxes', 'six']

In [None]:
b.sort(key=len)

In [None]:
b

# Slicing

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

In [None]:
seq[1:5]
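The sketch below shows a few more slicing forms on the same `seq` values. The element at `start` is included and the element at `stop` is excluded, so `seq[1:5]` has `5 - 1 = 4` elements. Either bound can be omitted. Negative indices count from the end. A third number sets the step.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

print(seq[1:5])   # [2, 3, 7, 5]   start included, stop excluded
print(seq[:5])    # [7, 2, 3, 7, 5] omitted start means 0
print(seq[3:])    # [7, 5, 6, 0, 1] omitted stop means "to the end"
print(seq[-4:])   # [5, 6, 0, 1]    negative indices count from the end
print(seq[::2])   # [7, 3, 5, 0]    every other element
print(seq[::-1])  # [1, 0, 6, 5, 7, 3, 2, 7] reversed copy
```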

# Enumerate

It’s common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like:

In [None]:
i = 0
for value in seq:
    # do something with value
    print(value * value)
    i += 1
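Because this pattern is so common, Python has a built-in `enumerate` function that yields `(i, value)` tuples, which makes the manual counter unnecessary. The sketch below computes the same squares and also records each index. The name `squares_by_index` is illustrative.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

squares_by_index = []
for i, value in enumerate(seq):
    # i is the position, value is the element at that position
    squares_by_index.append((i, value * value))

print(squares_by_index[:3])  # [(0, 49), (1, 4), (2, 9)]
```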

When you are indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence:

In [None]:
some_list = ['foo', 'bar', 'baz']
mapping = {}

In [None]:
for i, v in enumerate(some_list):
    mapping[v] = i

In [None]:
mapping

# Sorted

The sorted function returns a new sorted list from the elements of any sequence:

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])
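Unlike `list.sort`, `sorted` leaves its input untouched and accepts any iterable, including a string. It also takes the same `key` argument as `sort`. A short sketch with illustrative values:

```python
nums = [7, 1, 2, 6, 0, 3, 2]

ordered = sorted(nums)
print(ordered)   # [0, 1, 2, 2, 3, 6, 7]
print(nums)      # [7, 1, 2, 6, 0, 3, 2], unchanged

# Any iterable works; a string is sorted character by character.
print(sorted('horse race'))  # [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

# key works as it does for list.sort; ties keep their original order.
print(sorted(['saw', 'small', 'he', 'foxes', 'six'], key=len))
# ['he', 'saw', 'six', 'small', 'foxes']
```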

# Zip

zip “pairs” up the elements of a number of lists, tuples, or other sequences. It returns an iterator of tuples, which is why list() is used to display the result:

In [None]:
seq1 = ['foo', 'bar', 'baz']

In [None]:
seq2 = ['one', 'two', 'three']

In [None]:
zipped = zip(seq1, seq2)

In [None]:
list(zipped)

In [None]:
seq3 = [False, True]

In [None]:
list(zip(seq1, seq2, seq3))
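The call above yields only two tuples because `zip` stops at the end of the shortest input. The sketch below checks that behavior with the same three sequences. It also shows that a zip object is a one-shot iterator, so a second pass over it yields nothing.

```python
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
seq3 = [False, True]

triples = list(zip(seq1, seq2, seq3))
print(triples)  # [('foo', 'one', False), ('bar', 'two', True)]

# zip returns a one-shot iterator: a second pass over it yields nothing.
zipped = zip(seq1, seq2)
first_pass = list(zipped)
second_pass = list(zipped)
print(len(first_pass), len(second_pass))  # 3 0
```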

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:

In [None]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the sequence. Another way to think about this is converting a list of rows into a list of columns.

In [None]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]

In [None]:
first_names, last_names = zip(*pitchers)

In [None]:
first_names

In [None]:
last_names
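The same trick transposes any list of equal-length rows into columns. The sketch below uses a small illustrative table, `rows`. Note that `zip(*...)` returns each column as a tuple.

```python
rows = [
    (1, 'a', True),
    (2, 'b', False),
    (3, 'c', True),
]

# zip(*rows) is the same as zip(rows[0], rows[1], rows[2])
ids, letters, flags = zip(*rows)
print(ids)      # (1, 2, 3)
print(letters)  # ('a', 'b', 'c')
print(flags)    # (True, False, True)

# Zipping the columns again restores the original rows.
print(list(zip(ids, letters, flags)) == rows)  # True
```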

# dict

In [None]:
d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}

In [None]:
d1

In [None]:
d1[7] = 'an integer'
d1

In [None]:
d1['b']

In [None]:
list(d1.keys())

In [None]:
list(d1.values())

We can merge one dict into another using the update method. Keys that already exist have their old values discarded:

In [None]:
d1.update({'b': 'foo', 'c': 12})
d1
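The sketch below demonstrates `update` semantics with the same keys as above. An existing key keeps its position but gets the new value. A new key is added at the end. The dict is modified in place, and the method itself returns `None`.

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

returned = d1.update({'b': 'foo', 'c': 12})

print(d1)        # {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
print(returned)  # None
```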

# Creating dicts from sequences

In [None]:
mapping = dict(zip(range(5), reversed(range(5))))
mapping

# Default values

In [None]:
# It's very common to have logic like:
# if key in some_dict:
#     value = some_dict[key]
# else:
#     value = default_value

In [None]:
# The if-else block above can be written as follows:
# value = some_dict.get(key, default_value)
# get returns None by default if the key is not present, while pop raises a KeyError
# unless a default value is passed as its second argument
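Here is a runnable version of the pattern described in those comments, using a small illustrative dict `some_dict`. `get` falls back to `None` or to the default you pass. `pop` removes the key, and for a missing key it raises `KeyError` unless it is also given a default.

```python
some_dict = {'a': 1, 'b': 2}

# Long form
key = 'z'
if key in some_dict:
    value = some_dict[key]
else:
    value = 0

# Equivalent one-liner
assert value == some_dict.get(key, 0)

print(some_dict.get('z'))      # None
print(some_dict.get('a', 0))   # 1

print(some_dict.pop('a'))      # 1, and 'a' is removed
print(some_dict.pop('z', -1))  # -1, missing key with a default
try:
    some_dict.pop('z')         # missing key, no default
except KeyError as exc:
    print('KeyError:', exc)    # KeyError: 'z'

print(some_dict)               # {'b': 2}
```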

In [10]:
words = ['apple', 'bat', 'bar', 'atom', 'book', 'cook', 'cool', 'quick']

In [11]:
by_letter = {}

In [None]:
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

In [None]:
by_letter

In [12]:
by_letter = {}  # start over so the words are not added a second time
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

In [13]:
by_letter

{'a': ['apple', 'atom'],
 'b': ['bat', 'bar', 'book'],
 'c': ['cook', 'cool'],
 'q': ['quick']}
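The standard library's `collections` module offers another option, `defaultdict`, which this notebook does not otherwise use. It calls a factory function (here `list`) to create the value for any missing key. The sketch below produces the same grouping as above:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book', 'cook', 'cool', 'quick']

by_letter = defaultdict(list)
for word in words:
    # A missing key gets an empty list from list() before append runs.
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book'], 'c': ['cook', 'cool'], 'q': ['quick']}
```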

# Valid dict key types

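While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. We can check whether an object is hashable (can be used as a key in a dict) with the hash function. String hashes are randomized for each interpreter session, so the value you get for hash('string') will differ from the one below.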

In [14]:
hash('string')

-6347026155153897450

In [15]:
hash((1, 2, (2, 3)))

1097636502276347782

In [16]:
hash((1, 2, [2, 3])) # fails because lists are mutable 

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:

In [17]:
d = {}

In [18]:
d[tuple([1, 2, 3])] =5

In [19]:
d

{(1, 2, 3): 5}
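Following the idea above, a small helper can test hashability by calling `hash` and catching the `TypeError`. `is_hashable` is a hypothetical name, not a built-in.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (or set element)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('string'))          # True
print(is_hashable((1, 2, (2, 3))))    # True
print(is_hashable((1, 2, [2, 3])))    # False, the tuple holds a list
print(is_hashable([1, 2, 3]))         # False
print(is_hashable(tuple([1, 2, 3])))  # True

d = {tuple([1, 2, 3]): 5}
print(d[(1, 2, 3)])                   # 5
```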

# Set

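A set is an unordered collection of unique elements. You can think of a set as a dict with keys only and no values. A set can be created with the set function or with a set literal in curly braces, such as {1, 2, 3}.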

In [24]:
a = set([1, 2, 3, 4, 5, 2, 3])
a

{1, 2, 3, 4, 5}

In [25]:
b = set([3, 4, 5, 6, 7, 8, 6, 9, 8])
b

{3, 4, 5, 6, 7, 8, 9}

In [26]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [27]:
a | b # binary operator

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [28]:
a.intersection(b) # a & b

{3, 4, 5}
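Besides union and intersection, sets support difference and symmetric difference. Membership tests work like dict key lookups. The sketch below uses the same `a` and `b` as above.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8, 9}

print(a - b)     # {1, 2}              same as a.difference(b)
print(a ^ b)     # {1, 2, 6, 7, 8, 9}  same as a.symmetric_difference(b)
print(3 in a)    # True

# Sets keep only unique elements, so set() is a quick way to de-duplicate.
print(sorted(set([3, 1, 3, 2, 1])))  # [1, 2, 3]

# Like dict keys, set elements must be hashable.
print({(1, 2), (1, 2), (3, 4)} == {(1, 2), (3, 4)})  # True
```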

# List, Set, and Dict Comprehension

Comprehensions allow you to concisely form a new collection by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression.

# List Comprehension

In [29]:
# [expr for val in collection if condition]

In [30]:
#result = []
#for val in collection:
#    if condition:
#        result.append(expr)

In [31]:
# For example, given a list of strings, we could filter out strings with length 2 or less
# and also convert them to uppercase like this
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [32]:
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
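The sketch below makes the equivalence from the comments above concrete by writing the same filter-and-transform both ways. Here `expr` becomes `x.upper()` and `condition` becomes `len(x) > 2`.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Explicit loop form
result = []
for x in strings:
    if len(x) > 2:
        result.append(x.upper())

# Comprehension form
comp = [x.upper() for x in strings if len(x) > 2]

print(result == comp)  # True
print(comp)            # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```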

# Dict Comprehension

In [29]:
# dict_comp = {key-expr: value-expr for value in collection if condition}

In [35]:
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

# Set Comprehension

In [29]:
# set_comp = {expr for value in collection if condition}

In [33]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [38]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
           ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

In [39]:
names_of_interest = []
for names in all_data:
    double_a = [name for name in names if name.count('a') >= 2]
    names_of_interest.extend(double_a)

In [40]:
names_of_interest

['Maria', 'Natalia']

In [41]:
# Method 2
result = [name for names in all_data for name in names
         if name.count('a') >= 2]

In [42]:
result

['Maria', 'Natalia']

In [43]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [53]:
flattened = [x*2 for tup in some_tuples for x in tup]
flattened

[2, 4, 6, 8, 10, 12, 14, 16, 18]

In [51]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x*2)

In [52]:
flattened

[2, 4, 6, 8, 10, 12, 14, 16, 18]

In [54]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Functions

In [55]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a': a, 'b': b, 'c': c}

In [56]:
f()

{'a': 5, 'b': 6, 'c': 7}

# Functions are Objects

In [70]:
states = ['     Alabama   ',   'Georgia!', 'Georgia', 'georgia', 'Florida',
         'south  carolina##',   'West virgina?']

In [71]:
# Cleaning:
# strip whitespace, remove punctuation symbols, and standardize capitalization.

In [72]:
import re

In [78]:
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [79]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South  Carolina',
 'West Virgina']

# Reusable and Generic Functions

An alternative approach that you may find useful is to make a list of the operations you want to apply to a particular set of strings, and then apply each operation in turn:

In [91]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

In [92]:
clean_ops = [str.strip, remove_punctuation, str.title]

In [93]:
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [94]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South  Carolina',
 'West Virgina']

# Use functions as arguments to other functions

In [99]:
for x in map(remove_punctuation, states):
    print(x)

     Alabama   
Georgia
Georgia
georgia
Florida
south  carolina
West virgina


# Lambda Functions

Lambda (anonymous) functions are a way of writing functions that consist of a single expression, the result of which is the return value. They are declared with the lambda keyword:

In [105]:
def short_function(x):
    return x * 2

In [106]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

In [107]:
ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]
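The lambda passed above is equivalent to `short_function`. The sketch below checks that the two give the same results. It also shows that the single expression can be a conditional expression. The names `equiv_anon` and `sign` are illustrative.

```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

ints = [4, 0, 1, 5, 6]
print([short_function(x) for x in ints])  # [8, 0, 2, 10, 12]
print([equiv_anon(x) for x in ints])      # [8, 0, 2, 10, 12]

# Only a single expression is allowed, but it can be a conditional expression.
sign = lambda n: 'neg' if n < 0 else 'non-neg'
print(sign(-3), sign(0))  # neg non-neg
```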

In [108]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [109]:
strings.sort(key=lambda x: len(set(x)))  # sort by the number of distinct letters

In [110]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

# Partial Argument Application

In [111]:
def add_numbers(x, y):
    return x + y

In [116]:
add_five = lambda y: add_numbers(5, y)

In [117]:
from functools import partial
add_five = partial(add_numbers, 5)
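Both definitions of `add_five` above behave the same. `partial` fixes the first positional argument `x=5`, and the second argument is supplied at call time. The sketch below checks this. The example with `y=100` additionally shows that keyword arguments can be bound too; the names are illustrative.

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five_lambda = lambda y: add_numbers(5, y)
add_five_partial = partial(add_numbers, 5)

print(add_five_lambda(10))   # 15
print(add_five_partial(10))  # 15

# partial can also bind keyword arguments.
add_to_100 = partial(add_numbers, y=100)
print(add_to_100(1))         # 101
```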

# Generators

Having a consistent way to iterate over sequences, like objects in a list or lines in a file, is an important Python feature. This is accomplished by means of the iterator protocol, a generic way to make objects iterable. For example, iterating over a dict yields the dict keys:

In [118]:
some_dict = {'a':1, 'b':2, 'c': 3}

In [119]:
for key in some_dict:
    print(key)

a
b
c


An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most functions that expect a list or list-like object will also accept any iterable object.

In [120]:
dict_iterator = iter(some_dict)

In [121]:
dict_iterator

<dict_keyiterator at 0x7f90e0cc1a48>

In [122]:
list(dict_iterator)

['a', 'b', 'c']
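A for loop calls `iter` once and then calls `next` repeatedly until the iterator raises `StopIteration`. The sketch below unrolls that loop by hand for the same `some_dict`.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

it = iter(some_dict)
keys = []
while True:
    try:
        keys.append(next(it))
    except StopIteration:
        break  # the iterator is exhausted; a for loop stops here too

print(keys)      # ['a', 'b', 'c']
print(list(it))  # [] since an exhausted iterator yields nothing more
```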

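A generator is a concise way to construct a new iterable object. A normal function runs to completion and returns a single result. A generator instead produces a sequence of results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function: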

In [123]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [124]:
gen = squares()

In [125]:
gen

<generator object squares at 0x7f91096cfb48>

In [126]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
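Note that the message is printed only when iteration starts, not when `squares()` is called, because the function body does not run until the first value is requested. The sketch below records when the body runs in an illustrative list, `log`.

```python
log = []

def squares(n=10):
    log.append('started')
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
print(log)        # [] since calling the function runs none of its body
print(next(gen))  # 1
print(log)        # ['started']
print(list(gen))  # [4, 9], the remaining values
```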

In [127]:
gen = (x ** 2 for x in range(100))

In [128]:
gen

<generator object <genexpr> at 0x7f90e062e728>

In [129]:
def _make_gen():
    for x in range(100):
        yield x ** 2

In [130]:
gen = _make_gen()

In [131]:
sum(x ** 2 for x in range(100))

328350

In [132]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# itertools module

In [133]:
import itertools

In [134]:
first_letter = lambda x: x[0]

In [135]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [136]:
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator over consecutive names with that letter

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
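As the output shows, `groupby` groups only consecutive elements that share a key, so 'A' appears twice. Sorting by the same key first gives one group per letter. The sketch below collects the groups into an illustrative dict, `grouped`.

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

grouped = {}
for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter):
    grouped[letter] = list(group)  # consume each group before advancing

print(grouped)  # {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```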


# Error and Exceptions Handling

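Handling Python errors or exceptions gracefully is an important part of building robust programs. In data analysis applications, many functions only work on certain kinds of input. For example, float can convert a string to a floating-point number, but it fails with ValueError on improper input: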

In [137]:
float('1.2345')

1.2345

In [138]:
float('something')

ValueError: could not convert string to float: 'something'

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a try/except block:

In [139]:
def attempt_float(x):
    try:
        return float(x)
    except:  # a bare except catches every exception type
        return x

In [140]:
attempt_float('1.2345')

1.2345

In [141]:
attempt_float('something')

'something'

You might notice that float can raise exceptions other than ValueError:

In [142]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You might want to only suppress ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program. To do that, write the exception type after except

In [143]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [144]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

You can catch multiple exception types by writing a tuple of exception types instead (the parentheses are required):

In [145]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [146]:
attempt_float((1, 2))

(1, 2)

In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether the code in the try block succeeds or not. To do this, use finally:

In [None]:
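# Placeholders: path must be a file you are allowed to overwrite ('w' truncates it),
# and write_to_file stands for your own writing logic.
f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()  # runs whether or not write_to_file raised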

In [None]:
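f = open(path, 'w')  # path and write_to_file are placeholders, as in the previous cell

try:
    write_to_file(f)
except Exception:  # runs only if the try block raised
    print('Failed')
else:              # runs only if the try block succeeded
    print('Succeeded')
finally:           # always runs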
    f.close()
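The sketch below shows which blocks run in each case without touching any files. `run` is a hypothetical helper that records the order of events in a list.

```python
def run(x):
    """Return the names of the blocks that executed when converting x."""
    events = []
    try:
        float(x)
    except ValueError:
        events.append('except')    # runs only if the try block raised ValueError
    else:
        events.append('else')      # runs only if the try block raised nothing
    finally:
        events.append('finally')   # always runs
    return events

print(run('1.5'))        # ['else', 'finally']
print(run('something'))  # ['except', 'finally']
```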

# Files and the Operating System

In [149]:
path = '../data/example/segismundo.txt'

In [150]:
f = open(path)

By default, the file is opened in read-only mode 'r'. We can then treat the file handle f like a list and iterate over the lines like so:

In [151]:
for line in f:
    pass

In [152]:
lines = [x.rstrip() for x in open(path)]

In [153]:
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.']

In [154]:
f.close()

One of the ways to make it easier to clean up open files is to use the with statement:

In [155]:
with open(path) as f:
    lines = [x.rstrip() for x in f]

This will automatically close the file f when exiting the with block.

In [157]:
lines

['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.']

In [158]:
f = open(path)

In [159]:
f.read(10)

'Sueña el r'

In [160]:
f.tell()

11

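Even though we read 10 characters, the position is 11, because decoding those 10 characters took 11 bytes ('ñ' is two bytes in UTF-8). You can check the interpreter's default string encoding in the sys module. The encoding that open uses by default comes from the locale, and you can set it explicitly with open(path, encoding='utf-8'):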

In [161]:
import sys

In [162]:
sys.getdefaultencoding()

'utf-8'

In [163]:
f.seek(3)

3
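The difference between bytes and characters can be checked directly. In UTF-8, 'ñ' takes two bytes, so the first 10 characters span 11 bytes, and seeking to byte 3 lands just before 'ñ'. The sketch below writes a one-line sample to a temporary file. The sample and the temporary file are illustrative, not the notebook's data file.

```python
import os
import tempfile

text = 'Sueña el rico en su riqueza,\n'
print(len(text[:10]), len(text[:10].encode('utf-8')))  # 10 11

with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.txt', delete=False) as tmp:
    tmp.write(text)
    tmp_path = tmp.name

with open(tmp_path, encoding='utf-8') as f:
    first = f.read(10)     # reads 10 characters
    pos = f.tell()         # opaque position; for this file it is the byte offset, 11
    f.seek(3)              # byte offset 3, just after 'Sue'
    next_char = f.read(1)  # the two-byte character 'ñ'

os.remove(tmp_path)
print(first, pos, next_char)  # Sueña el r 11 ñ
```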

In [164]:
f.close()

# Write text to a file

In [165]:
with open('tmp.txt', 'w') as handle, open(path) as source:
    # keep only lines longer than one character, i.e. drop the blank '\n' lines
    handle.writelines(x for x in source if len(x) > 1)

In [166]:
with open('tmp.txt') as f:
    lines = f.readlines()

In [167]:
lines

['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.']