## Python Built-in Data Structures



## Tuples

Immutable sequence of elements. 

In [1]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [7]:
tuple([3, 1, 'asdf'])

(3, 1, 'asdf')

In [8]:
tuple('strings')

('s', 't', 'r', 'i', 'n', 'g', 's')

In [10]:
tup[1]

5
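To illustrate what "immutable" means in practice, here is a minimal sketch that tries to assign to a tuple element. The name `point` is made up for this illustration.

```python
# `point` is a hypothetical name used only for this illustration.
point = (4, 5, 6)

try:
    point[1] = 50          # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

# The tuple is unchanged after the failed assignment.
print(point)
```

Reading an element with `point[1]` works as shown above; only writing to a position is rejected.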

### Tuples can contain any kind of objects including a list or boolean values

In [12]:
tup_1 = ('foo', [3, 9, 4], True)

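If an object inside a tuple is mutable, such as a list, it can be modified in place. (The output below shows two 5s because the cell was evidently run twice, as the jump in cell numbers suggests; each run appends another 5.)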

In [15]:
tup_1[1].append(5)
tup_1

('foo', [3, 9, 4, 5, 5], True)

Tuples can be concatenated to produce longer tuples.

In [16]:
tup + tup_1

(4, 5, 6, 'foo', [3, 9, 4, 5, 5], True)

Multiplying a tuple by an integer concatenates that many copies of the tuple (the same holds for lists).

In [17]:
tup * 3

(4, 5, 6, 4, 5, 6, 4, 5, 6)

Assigning the elements of a tuple to variables (unpacking):

In [18]:
a, b, c = tup

print(a,b,c)

4 5 6
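Unpacking also supports a starred target that collects the remaining elements, and it gives a compact way to swap two variables. The sketch below uses made-up names (`values`, `first`, `second`, `rest`, `x`, `y`).

```python
values = (1, 2, 3, 4, 5)

# Take the first two elements and collect the remainder in a list.
first, second, *rest = values

# Swap two variables without a temporary.
x, y = 1, 2
x, y = y, x
```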


Iterating over a sequence of tuples (or lists), unpacking each one in the for statement:

In [23]:
seq = (1, 2, 3), (4, 5, 6), (7, 8, 9)

for a, b, c in seq:
    print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


### Tuple Methods

count(): counts the number of occurrences of a value in a tuple (or a list).


In [25]:
tup_2 = 1, 2, 3, 4, 2, 3, 4, 2, 5

tup_2.count(4)

2

## Lists

Mutable sequence of elements. 

A tuple can be converted into a list using list(tuple).

Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions.

In [29]:
gen = range(10)
print(gen)

list(gen)

range(0, 10)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

### append()

Elements can be added to the end of the list using append(). (In the output below, 55 appears twice because the cell was evidently run twice.)

In [33]:
a_list = [12, 32, 43]

In [35]:
a_list.append(55)
a_list

[12, 32, 43, 55, 55]

### insert()

insert() inserts a single element at a specific position. (The doubled 76 in the output below reflects a cell that was evidently run twice.)

In [38]:
a_list.insert(1, 76)
a_list

[12, 76, 76, 32, 43, 55, 55]

insert() is computationally more expensive than append(), because every element after the insertion point has to be shifted to make room.
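When elements often need to be added at the front, `collections.deque` from the standard library is a common alternative, since it supports appending at both ends without shifting elements. A minimal sketch, with `queue` and `as_list` as illustrative names:

```python
from collections import deque

queue = deque([12, 32, 43])
queue.appendleft(76)   # add at the front without shifting the other elements
queue.append(55)       # add at the end, like list.append
as_list = list(queue)
```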

### pop()
pop() removes and returns the element at a given index.

In [40]:
a_list.pop(1)
a_list

[12, 76, 32, 43, 55, 55]

### remove()

remove() deletes an element by value. If the value occurs more than once, only the first occurrence is removed.

In [41]:
a_list.remove(55)

In [42]:
a_list

[12, 76, 32, 43, 55]

### in and not in 

to see if the value is an element of the list or not.

In [43]:
76 in a_list

True

In [44]:
43 not in a_list

False

### Concatenating lists

In [45]:
a_list + ['fd', 3, 9]

[12, 76, 32, 43, 55, 'fd', 3, 9]

### extend()

To add multiple elements to the list. 

In [46]:
a_list.extend(['gh', 8, (0, 4)])

In [47]:
a_list

[12, 76, 32, 43, 55, 'gh', 8, (0, 4)]

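When concatenating, extend() is generally cheaper than +, because + builds a new list and copies the elements of both operands, while extend() appends to the existing list in place.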
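The difference between the two can be seen by checking which list object each operation touches. This is a small sketch with made-up names (`base`, `alias`, `combined`):

```python
base = [12, 76, 32]
alias = base                  # second name for the same list object

combined = base + ['fd', 3]   # + builds a new list; base is untouched
base.extend(['gh', 8])        # extend mutates base in place
```

After these lines, `alias` also sees the extended contents, while `combined` is a separate list.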

A list can be sorted in place using the **sort()** method.

In [52]:
b = [2, 54, 6, 3]
b.sort()
b

[2, 3, 6, 54]

sort(key=option) will sort the elements based on the criterion defined by option. 

In [54]:
c = ['saw', 'small', 'He', 'foxes', 'six']
c.sort(key=len)

c

['He', 'saw', 'six', 'small', 'foxes']
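If the original order has to be preserved, the built-in sorted() returns a new sorted list instead of modifying the input, and it accepts the same key and reverse arguments. A short sketch with illustrative names:

```python
words = ['saw', 'small', 'He', 'foxes', 'six']

by_length = sorted(words, key=len)                    # new list, original untouched
longest_first = sorted(words, key=len, reverse=True)  # descending by length
```

Both sorts are stable, so words of equal length keep their original relative order.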

The built-in bisect module implements binary search and insertion into a **sorted** list. bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location. Bisect itself does not sort lists. 

In [62]:
import bisect
d = [1, 2, 2, 2, 3, 4, 7]

print(bisect.bisect(d, 5))
bisect.insort(d, 5)
d

6


[1, 2, 2, 2, 3, 4, 5, 7]
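When the list contains duplicates, it matters on which side of the equal elements the insertion point falls. bisect.bisect is an alias for bisect.bisect_right; bisect.bisect_left returns the position before any equal elements. A small sketch (the variable names are illustrative):

```python
import bisect

d = [1, 2, 2, 2, 3, 4, 7]

left = bisect.bisect_left(d, 2)    # index of the first 2
right = bisect.bisect_right(d, 2)  # index just past the last 2
count_of_twos = right - left       # number of 2s in the sorted list
```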

### Slicing

In [63]:
d[2:4]

[2, 2]

Slices can also be assigned to with a sequence.

In [64]:
d[2:4] = [6, 9]

In [65]:
d

[1, 2, 6, 9, 3, 4, 5, 7]

In [66]:
d.sort()

In [67]:
d

[1, 2, 3, 4, 5, 6, 7, 9]

In [68]:
d[:4]

[1, 2, 3, 4]

In [69]:
d[2:]

[3, 4, 5, 6, 7, 9]

Negative indices slice the sequence relative to the end.

In [70]:
d[-4:]

[5, 6, 7, 9]

A *step* can also be used after a second colon to, say, take every other element. [::-1] can reverse the elements of a tuple or list. 

In [72]:
d[::3]

[1, 4, 7]

In [75]:
e = d[::-1]

In [76]:
e

[9, 7, 6, 5, 4, 3, 2, 1]

### enumerate()

enumerate returns a sequence of (i, value) tuples, which can be unpacked directly in the for statement.

In [78]:
for i, value in enumerate(d):
    print(i, value)

0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 9


In [80]:
enumerate(d)

<enumerate at 0x1066245a0>

When indexing data, a helpful pattern that uses enumerate is computing a *dict* mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence.

In [87]:
some_list = ['foo', 'bar', 'baz']
mapping = {}

for i, v in enumerate(some_list):
    mapping[v] = i
    print(v, i)
    
mapping

foo 0
bar 1
baz 2


{'foo': 0, 'bar': 1, 'baz': 2}

In [88]:

dict2 = {}

for i, v in enumerate(d):
    dict2[v] = i
    print(v, i)
    
dict2

1 0
2 1
3 2
4 3
5 4
6 5
7 6
9 7


{1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 9: 7}
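The uniqueness assumption matters: if a value repeats, the plain assignment keeps the index of its last occurrence. The sketch below shows this, along with one possible way to keep the first occurrence instead using dict.setdefault. The names `items`, `last_index`, and `first_index` are made up for this example.

```python
items = ['foo', 'bar', 'foo']

last_index = {}
for i, v in enumerate(items):
    last_index[v] = i             # a repeated value overwrites the earlier index

first_index = {}
for i, v in enumerate(items):
    first_index.setdefault(v, i)  # keep the index of the first occurrence
```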

### zip()

zip “pairs” up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns an iterator of tuples, which can be passed to list() to get a list.

In [90]:
seq1 = ['foo', 'bar', 'baz']

seq2 = ['one', 'two', 'three']

seq3= zip(seq1, seq2)
list(seq3)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate.


In [91]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the sequence. Another way to think about this is converting a list of rows into a list of columns.

In [94]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Curt', 'Schilling')]

first_names, last_names = zip(*pitchers)
first_names

('Nolan', 'Roger', 'Curt')

In [95]:
last_names

('Ryan', 'Clemens', 'Schilling')
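Two properties of zip are worth keeping in mind: it stops at the shortest input, and in Python 3 the object it returns is a one-shot iterator. A minimal sketch with illustrative names:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2]

pairs = zip(letters, numbers)
first_pass = list(pairs)    # stops at the shorter input, so 'c' is dropped
second_pass = list(pairs)   # the iterator is already exhausted
```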

## Dictionaries

*dict* is likely the most important built-in Python data structure. A more common name for it is hash map or associative array. It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach for creating one is to use curly braces {} and colons to separate keys and values.

In [96]:
d1 = {'a': 43, 'b': [2, 3, 6, 0]}
d1

{'a': 43, 'b': [2, 3, 6, 0]}

In [97]:
d1[0] = 'adf'

In [98]:
d1

{'a': 43, 'b': [2, 3, 6, 0], 0: 'adf'}

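The list stored under 'b' is shown sorted from this point on; it was presumably sorted in place in a cell that is not shown (the cell numbers jump from 98 to 104).

In [104]: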
d1['b']


[0, 2, 3, 6]

In [105]:
'a' in d1

True

In [108]:
d1[2] = 'delete'

d1

{'a': 43, 'b': [0, 2, 3, 6], 0: 'adf', 2: 'delete'}

In [109]:
del d1[2]

In [110]:
d1

{'a': 43, 'b': [0, 2, 3, 6], 0: 'adf'}

pop(key) deletes the key and returns the value associated with it.

In [111]:
d1.pop(0)

'adf'
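Indexing or popping a key that is not present raises KeyError. Both get() and pop() accept a default that is returned instead. A small sketch, using a made-up dict named `inventory`:

```python
inventory = {'a': 43, 'b': [0, 2, 3, 6]}

missing = inventory.get('z')            # None instead of KeyError
fallback = inventory.get('z', 0)        # explicit default
removed = inventory.pop('z', 'absent')  # default avoids KeyError on pop
```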

In [113]:
print(d1.keys())
print(d1.values())

dict_keys(['a', 'b'])
dict_values([43, [0, 2, 3, 6]])


### update()
to update the dictionary by changing current values and/or adding new ones. 

In [114]:
d1.update({'b' : 'foo', 'c' : 12})
d1

{'a': 43, 'b': 'foo', 'c': 12}

### Creating dicts from sequences

In [116]:
key_list = seq1
value_list = seq2

mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
    
mapping

{'foo': 'one', 'bar': 'two', 'baz': 'three'}
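Since a dict is essentially a collection of 2-tuples, the loop above can usually be shortened: dict() accepts an iterable of (key, value) pairs, and a dict comprehension does the same job. A sketch with illustrative names:

```python
keys = ['foo', 'bar', 'baz']
values = ['one', 'two', 'three']

mapping_from_dict = dict(zip(keys, values))
mapping_from_comprehension = {k: v for k, v in zip(keys, values)}
```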

### Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dict) with the hash function.

In [117]:
hash('a')

-3319291657969026086

In [118]:
hash((5, 6, [7, 6]))  # fails because lists are mutable

TypeError: unhashable type: 'list'
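To use a tuple that contains a list as a key, one option is to convert the inner list to a tuple first. The sketch below wraps the hash() check in a small helper; `is_hashable` is a hypothetical name, not a built-in.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

key_with_list = (5, 6, [7, 6])
key_as_tuples = (5, 6, tuple([7, 6]))   # convert the inner list to a tuple
lookup = {key_as_tuples: 'ok'}
```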

## List, Set, and Dict Comprehensions

[*return this* for *each element* of *the list* if *condition* is true]


In [119]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]


['BAT', 'CAR', 'DOVE', 'PYTHON']
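For comparison, the list comprehension above is equivalent to the following explicit loop, which makes the roles of the filter condition and the collected expression visible:

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

result = []
for x in strings:
    if len(x) > 2:                 # the filter condition
        result.append(x.upper())   # the expression being collected
```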

In [120]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [121]:
set(map(len, strings))

{1, 2, 3, 4, 6}

In [122]:
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

### Nested List Comprehensions

In [123]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

You might have gotten these names from a couple of files and decided to organize them by language. Now, suppose we wanted to get a single list containing all names with two or more e’s in them. We could certainly do this with a simple for loop.

In [124]:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)

The same task can be done with a single nested list comprehension.


In [125]:
result = [name for names in all_data for name in names
       if name.count('e') >= 2]

In [126]:
result

['Steven']

In [129]:
a_list = range(10)
a_list

range(0, 10)

In [133]:
[x**2 for x in a_list]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

### Squares of the even numbers in a list

In [135]:
[x*x for x in a_list if x % 2 == 0]

[0, 4, 16, 36, 64]

### Iterating over two lists combined using zip

In [139]:
b_list = range(10,19)
[x+y for x, y in zip(a_list, b_list)]

[10, 12, 14, 16, 18, 20, 22, 24, 26]

In [140]:
a_list

range(0, 10)

In [142]:
list(zip(a_list, b_list))

[(0, 10),
 (1, 11),
 (2, 12),
 (3, 13),
 (4, 14),
 (5, 15),
 (6, 16),
 (7, 17),
 (8, 18)]

Use help("json") to look up the documentation of a module.

In [156]:
list(enumerate(a_list))

[(0, 0),
 (1, 1),
 (2, 2),
 (3, 3),
 (4, 4),
 (5, 5),
 (6, 6),
 (7, 7),
 (8, 8),
 (9, 9)]

### Converting a list of tuples into a list (flattening the list of tuples)

In [157]:
# using list comprehension
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
[x for tup in some_tuples for x in tup]

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [159]:
# using for loops
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

## Functions
Each function can have *positional arguments* and *keyword arguments*. Keyword arguments are most commonly used to specify default values or optional arguments. In the following function, x and y are positional arguments while z is a keyword argument.

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages. 


In [162]:
def some_func(x, y, z=1.5):
    if z >= 3:
        return (x+y)/z
    else:
        return (x-y)/z

In [163]:
some_func(4, 2, 9)

0.6666666666666666
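The same function can be called in several ways: omitting z uses the default, and arguments passed by keyword can appear in any order. A short sketch that repeats the definition so it runs on its own (the result names are illustrative):

```python
def some_func(x, y, z=1.5):
    if z >= 3:
        return (x + y) / z
    else:
        return (x - y) / z

default_z = some_func(4, 2)               # z falls back to 1.5
keyword_z = some_func(4, 2, z=9)          # same as some_func(4, 2, 9)
all_keywords = some_func(y=2, z=9, x=4)   # keyword order does not matter
```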

## A Function to Clean Text Data

In [None]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
        'south   carolina##', 'West virginia?']

In [180]:
import re

def clean_text_data(strings):
    result = []
    for words in strings:
        words = words.strip()
        words = re.sub('[!#?]', '', words)
        words = words.title()
        result.append(words)
    return result

In [181]:
clean_text_data(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

## Making a List of Operations

In [182]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [184]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

We can use functions as arguments to other functions like the built-in map function, which applies a function to a sequence of some kind.

In [185]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


## Lambda/Anonymous Function

A lambda is a small anonymous function written inline as a single expression. As an example, suppose we wanted to sort a collection of strings by the number of **distinct** letters in each string.

In [186]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

strings.sort(key=lambda x: len(set(x)))

In [187]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

## Currying: Partial Argument Application

Currying is computer science jargon (named after the mathematician Haskell Curry) that means deriving new functions from existing ones by partial argument application. For example, suppose we had a trivial function that adds two numbers together.

In [188]:
def add_numbers(x, y):
    return x + y

In [190]:
add_five = lambda y: add_numbers(5, y)

In [192]:
add_five(4)

9
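The standard library offers functools.partial for the same purpose, which avoids writing the lambda by hand. A minimal sketch that repeats add_numbers so it is self-contained:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

# Bind the first positional argument (x) to 5; y is supplied at call time.
add_five = partial(add_numbers, 5)
```

Calling `add_five(4)` then behaves like `add_numbers(5, 4)`.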

## Generators

The *iterator protocol* is a generic way to make objects iterable in Python. For example, iterating over a dictionary yields its keys.

In [197]:
mapping

{'foo': 'one', 'bar': 'two', 'baz': 'three'}

In [198]:
for key in mapping:
    print(key)

foo
bar
baz

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function.

In [199]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [208]:
gen = squares()
gen

<generator object squares at 0x103b83930>

It is not until we request elements from the generator that it begins executing its code.

In [209]:
for x in gen:
    print(x)

Generating squares from 1 to 100
1
4
9
16
25
36
49
64
81
100
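A generator expression is a more compact way to get the same lazy behavior: it looks like a list comprehension but uses parentheses. The sketch below also shows that a generator can be consumed only once. The names `gen`, `total`, and `leftover` are illustrative.

```python
gen = (x ** 2 for x in range(1, 11))   # nothing is computed yet

total = sum(gen)       # consumes the generator
leftover = list(gen)   # already exhausted, so nothing is left
```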


## Errors and Exception Handling

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a try/except block.



In [210]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

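A bare except catches every exception, including ones you did not anticipate. To suppress only specific exceptions, list them in a tuple after except.

In [211]: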
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
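A few calls show how the function behaves for different inputs. The definition is repeated so the sketch runs on its own; the result names are made up.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

converted = attempt_float('1.2345')       # parses as a float
unparsable = attempt_float('something')   # float() raises ValueError
wrong_type = attempt_float((1, 2))        # float() raises TypeError
```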

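In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether the code in the try block succeeds or not. To do this, use finally. In the cells below, write_to_file is a placeholder that is never defined, so calling it raises NameError; the file is still closed by the finally block.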

In [218]:
path = '/Users/malikrao/Documents/Python_for_Data_Science/ml_models/a_list'

f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()

NameError: name 'write_to_file' is not defined

You can also add an else block, which runs only if the try block succeeds.

In [219]:

f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()

Failed
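For files specifically, the with statement is a common alternative to try/finally: the file is closed when the block exits, whether or not an exception occurred. The sketch below writes to a temporary directory instead of the path used above; that directory and the file name are assumptions made so the example can run anywhere.

```python
import os
import tempfile

# A temporary directory stands in for a real path (assumption for this sketch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'a_list.txt')

    with open(path, 'w') as f:
        f.write('hello')
    closed_after_block = f.closed   # the with block has already closed the file

    with open(path) as f:
        contents = f.read()
```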


In [214]:
%pwd


'/Users/malikrao/Documents/Python_for_Data_Science/ml_models'