# Built-in Data Structures, Functions, and Files

## Data Structures and Sequences

### Tuple

A tuple is a fixed-length, immutable sequence of Python objects. 
The easiest way to create one is with a comma-separated sequence of values:

In [0]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [0]:
# Parentheses are needed to group the inner tuples of a tuple of tuples:

nested_tup = (4, 5, 6), (7, 8)
nested_tup

In [0]:
#You can convert any sequence or iterator to a tuple by invoking tuple:

t = tuple([4, 0, 2])
t2 = tuple('string')
tup = tuple(['foo', [1, 2], True])
tup


('foo', [1, 2], True)

In [0]:
# Elements can be accessed with square brackets and zero-based indices, as in C:
print(tup[1])

[1, 2]


In [0]:
tup = tuple(['foo', [1, 2], True])
# Even if some elements are mutable, once the tuple is created
# it's not possible to modify which object is stored in each slot:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

In [0]:
#If an object inside a tuple is mutable, such as a list, you can modify it in-place:
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

In [0]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

In [0]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
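Note the trailing comma in ('bar',) in the concatenation example: a one-element tuple needs it, because parentheses on their own only group an expression. A quick check (the variable names are illustrative):

```python
single = ('bar',)       # the trailing comma makes this a one-element tuple
not_a_tuple = ('bar')   # parentheses alone only group: this is the string 'bar'
print(type(single), type(not_a_tuple))

# + concatenates tuples into a new, longer tuple
combined = (4, None, 'foo') + (6, 0) + single
print(combined)
```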

#### Unpacking tuples

In [0]:
tup = (4, 5, 6)
a, b, c = tup
a


4

In [0]:
b

5

In [0]:
# Swap the values of two variables in one step:
a, b = b, a
a

In [0]:
b

In [0]:
tup = 4, 5, (6, 7)
#Even sequences with nested tuples can be unpacked:
a, b, (c, d) = tup
d

In [0]:
# The traditional swap, using a temporary variable:
tmp = a
a = b
b = tmp

In [0]:
a, b = 1, 2
a
b
b, a = a, b
a
b

In [0]:
#A common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [0]:
values = 1, 2, 3, 4, 5
a, *rest, b = values  # *rest collects the leftover middle values into a list
a, b
rest

[2, 3, 4]

In [0]:
# By convention, _ is used for unwanted variables:
a, b, *_ = values

#### Tuple methods

In [0]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

### List

In contrast with tuples, lists are variable-length and their contents can be modified in-place. You can define them using square brackets [] or using the list type function:

In [0]:
a_list = [2, 3, 7, None]
tup = ('foo', 'bar', 'baz')
b_list = list(tup)

In [0]:
b_list

['foo', 'bar', 'baz']

In [0]:
b_list[1] = 'peekaboo'
b_list

['foo', 'peekaboo', 'baz']

In [0]:
# The list function is frequently used in data processing to materialize
# an iterator or generator expression:

gen = range(10)
gen
list(gen)


#### Adding and removing elements

In [0]:
b_list.append('dwarf')
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

In [0]:
b_list.insert(1, 'red')
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

In [0]:
b_list.pop(2)

'peekaboo'

In [0]:
b_list

['foo', 'red', 'baz', 'dwarf']

In [0]:
b_list.append('foo')
b_list

['foo', 'red', 'baz', 'dwarf', 'foo']

In [0]:
b_list.remove('foo')
b_list

['red', 'baz', 'dwarf', 'foo']

In [0]:
'dwarf' in b_list

True

In [0]:
'dwarf' not in b_list

False

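Checking whether a list contains a value is a lot slower than doing so with dicts and sets, because Python makes a linear scan across the values of the list, whereas dicts and sets (which are based on hash tables) can check membership in constant time on average.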
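A minimal sketch of what this means in practice, assuming you need many membership checks against the same data: converting the list to a set once turns each lookup into a hash-table probe instead of a linear scan. The names big_list, lookups, and big_set are illustrative.

```python
big_list = list(range(100_000))
lookups = [5, 99_999, -1]

# Linear scan: each `in` walks the list until it finds a match (or reaches the end)
found_in_list = [x in big_list for x in lookups]

# Build a set once; each `in` is then a hash lookup
big_set = set(big_list)
found_in_set = [x in big_set for x in lookups]

print(found_in_list, found_in_set)
```

Both approaches give the same answers; only the cost per lookup differs, which matters when the list is large and the checks are repeated.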


#### Concatenating and combining lists

In [0]:
[4, None, 'foo'] + [7, 8, (2, 3)]

In [0]:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

In [0]:
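list_of_lists = [[1, 2], [3, 4, 5], [6]]  # sample data for illustration
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)  # appends in place, without copying everything

This is faster than the concatenative alternative, because + builds a new list and copies every element over on each iteration: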

In [0]:
everything = []
for chunk in list_of_lists:
    everything = everything + chunk

#### Sorting

In [0]:
a = [7, 2, 5, 1, 3]
a.sort()
a

In [0]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

#### Binary search and maintaining a sorted list

In [1]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]
bisect.bisect(c, 2)
bisect.bisect(c, 5)
bisect.insort(c, 6)
c

[1, 2, 2, 2, 3, 4, 6, 7]
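bisect.bisect returns the position where a value would be inserted to keep the list sorted (to the right of any equal entries), while bisect.bisect_left returns the leftmost such position. The bisect functions do not check whether the list is sorted, so on unsorted input they silently give meaningless results. A small sketch of a binary-search membership test built on bisect_left; sorted_contains is a hypothetical helper, not part of the bisect module:

```python
import bisect

def sorted_contains(sorted_list, value):
    """Hypothetical helper: binary-search membership test, assumes ascending order."""
    i = bisect.bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value

c = [1, 2, 2, 2, 3, 4, 6, 7]
print(bisect.bisect_left(c, 2), bisect.bisect(c, 2))  # leftmost vs rightmost insertion point
print(sorted_contains(c, 6), sorted_contains(c, 5))
```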

#### Slicing

In [0]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

In [0]:
seq[3:4]

In [0]:
#Slices can also be assigned to with a sequence:
seq[3:4] = [6, 3]
seq

While the element at the start index is included, the stop index is not included, so
that the number of elements in the result is stop - start.
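A quick check of that rule, using a fresh copy of the original seq (the slice assignment above has already changed it):

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
start, stop = 1, 5
part = seq[start:stop]           # indices 1, 2, 3, 4: the stop index is excluded
print(part, len(part) == stop - start)

# Omitting start or stop defaults to the beginning or end of the sequence,
# so splitting at any index and rejoining gives back the whole list
print(seq[:start] + seq[start:] == seq)
```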

In [0]:
seq[:5]
seq[3:]

In [0]:
#Negative indices slice the sequence relative to the end:

seq[-4:]

In [0]:
seq[-6:-2]


In [0]:
#A step can also be used after a second colon to, say, take every other element:
seq[::2]

In [0]:
# A step of -1 reverses a list or tuple:
seq[::-1]

### Built-in Sequence Functions

#### enumerate

In [0]:
# enumerate returns a sequence of (i, value) tuples:
collection = ['foo', 'bar', 'baz']  # sample data
for i, value in enumerate(collection):
    print(i, value)  # do something with i and value

When you are indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence:

In [0]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
type(mapping)

In [0]:
for i, v in enumerate(some_list):
    print(i,v)

In [0]:
for i, v in enumerate(some_list):
    mapping[v] = i
mapping

#### sorted

The sorted function returns a new sorted list from the elements of any sequence:


In [0]:
sorted([7, 1, 2, 6, 0, 3, 2])
sorted('horse race')

#### zip

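zip “pairs” up the elements of a number of lists, tuples, or other sequences to create an iterator of tuples; wrap it in list to see them all at once. The number of tuples it produces is determined by the shortest sequence: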

In [0]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)

In [0]:
list(zipped)

In [0]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate:

In [0]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

Given a “zipped” sequence, zip can be applied in a clever way to “unzip” the sequence. Another way to think about this is converting a list of rows into a list of columns. The syntax, which looks a bit magical, is:

In [0]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
            ('Schilling', 'Curt')]
first_names, last_names = zip(*pitchers)
first_names
last_names

#### reversed

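reversed iterates over the elements of a sequence in reverse order. It returns a lazy <b>iterator</b>, so the reversed sequence is not created until it is materialized (for example with list or a for loop):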

In [0]:
list(reversed(range(10)))

### dict (aka hash map or associative array)

dict is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach for creating one is to use curly braces {} and colons to separate keys and values:

In [0]:
empty_dict = {}
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1

In [0]:
# You can access, insert, or set elements using the same square-bracket syntax as for a list or tuple:
d1[7] = 'an integer'
d1
d1[7]

In [0]:
# You can check whether a dict contains a key with the in keyword:
'b' in d1

In [0]:
d1[5] = 'some value'
d1
d1['dummy'] = 'another value'
d1

In [0]:
# You can delete values either with the del keyword
# or with the pop method (which simultaneously returns the value and deletes the key):
del d1[5]
d1
ret = d1.pop('dummy')
ret

In [0]:
d1

In [0]:
list(d1.keys())
list(d1.values())

In [0]:
d1

In [0]:
# You can merge one dict into another using the update method:
d1.update({'b' : 'foo', 'c' : 12})
d1
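update changes d1 in place, and the value for any key that was already present (like 'b') is replaced. If you would rather leave the original dicts untouched, one option is to unpack both into a new dict; Python 3.9 and later also provide the | operator for this. base and extra are illustrative names:

```python
base = {'a': 'some value', 'b': [1, 2, 3, 4]}
extra = {'b': 'foo', 'c': 12}

merged = {**base, **extra}   # later dicts win on duplicate keys
print(merged)
print(base)                  # unchanged

# Python 3.9+ only: equivalent merge operator
# merged = base | extra
```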

#### Creating dicts from sequences

In [0]:
mapping = {}
mapping['new'] = 2
mapping

In [0]:
key_list, value_list = ['a', 'b', 'c'], [1, 2, 3]  # sample data
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

In [0]:
#Since a dict is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples:
mapping = dict(zip(range(5), reversed(range(5))))
mapping
ls = [1, 2]
ls2 = ['a', 'b']
dict(zip(ls,ls2))

#### Default values

In [0]:
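# This if/else block (the sample values are illustrative)...
some_dict = {'a': 1, 'b': 2}
key, default_value = 'c', 0
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

In [0]:
# ...is equivalent to this one-liner. get returns None if no default is given,
# while pop raises a KeyError for a missing key unless you pass a default:
value = some_dict.get(key, default_value)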

In [0]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter


{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

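The setdefault dict method does exactly this, so the preceding for loop can be rewritten as:

In [0]:
by_letter = {}  # start fresh so words are not added twice
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter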


The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:

In [0]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter
    

#### Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dict) with the hash function:
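For example (the numbers printed for strings change between Python sessions because of hash randomization, so only whether the call succeeds matters here):

```python
# Strings and tuples made only of immutable objects are hashable
print(hash('string'))
print(hash((1, 2, (2, 3))))

# A tuple that contains a list is not hashable, because the list is mutable
error_message = None
try:
    hash((1, 2, [2, 3]))
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```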

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can:


In [0]:
d = {}
d[tuple([1, 2, 3])] = 5
d

### set

A set is an unordered collection of unique elements.  A set can be created in two ways: via the set function or via a set literal with curly braces:

In [0]:
set([2, 2, 2, 1, 3, 3])
{2, 2, 2, 1, 3, 3}

In [0]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference.

In [0]:
a.union(b)
a | b

In [0]:
a.intersection(b)
a & b
a
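The difference and symmetric difference mentioned above work the same way, each with a method form and an operator form:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

print(a.difference(b), a - b)              # elements in a but not in b
print(a.symmetric_difference(b), a ^ b)    # elements in exactly one of a and b
```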

All of the logical set operations have <b>in-place</b> counterparts, which enable you to replace the contents of the set on the left side of the operation with the result. For very large sets, this may be more efficient:


In [0]:
c = a.copy()
c

In [0]:
c |= b

c

In [0]:
d = a.copy()
d &= b
d

Like dicts, set elements generally must be immutable. To have list-like elements, you must convert them to tuples:

In [0]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

In [0]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)
a_set.issuperset({1, 2, 3})

Sets are equal if and only if their contents are equal:

In [0]:
{1, 2, 3} == {3, 2, 1}

### List, Set, and Dict Comprehensions

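List comprehensions let you concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, in one expression. The general form is [expr for val in collection if condition].

In [0]:
collection = [1, 2, 3, 4]  # sample data
result = [val * 10 for val in collection if val % 2 == 0]
# This is equivalent to the following for loop:
result = []
for val in collection:
    if val % 2 == 0:
        result.append(val * 10)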

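The filter condition can be omitted, leaving only the expression. For example, given a list of strings, we could filter out strings with length 2 or less and convert the rest to uppercase like this: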

In [0]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

Set and dict comprehensions are a natural extension, producing sets and dicts in an
idiomatically similar way instead of lists. A dict comprehension looks like this:<br>
<b>dict_comp = {key-expr : value-expr for value in collection if condition}</b>

A set comprehension looks like the equivalent list comprehension except with curly braces instead of square brackets:<br>
<b>set_comp = {expr for value in collection if condition}</b>

In [0]:
unique_lengths = {len(x) for x in strings}
unique_lengths

In [0]:
set(map(len, strings))

As a simple dict comprehension example, we could create a lookup map of these strings to their locations in the list:

In [0]:
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

#### Nested list comprehensions

In [0]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Meriae', 'Juan', 'Javier', 'Natalia', 'Pilar']]

Now, suppose we wanted to get a single list containing all names with two or more e’s in them. We could do this with a simple for loop:

In [0]:
names_of_interest = [] 
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)
names_of_interest

In [0]:
result = [name for names in all_data 
          for name in names
          if name.count('e') >= 2]
result

The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before. Here is another example where we “flatten” a list of tuples of integers into a simple list of integers:


In [0]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = []
for tup in some_tuples:
    for x in tup:
        flattened.append(x)

In [0]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples 
             for x in tup]
flattened

You can have arbitrarily many levels of nesting, though if you have more than two or three levels of nesting you should probably start to question whether this makes sense from a code readability standpoint. It’s important to distinguish the syntax just shown from a list comprehension inside a list comprehension, which is also perfectly valid:

In [0]:
[[x for x in tup] for tup in some_tuples]

## Functions

Functions are declared with the def keyword and returned from with the return keyword:

In [0]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In the preceding function, x and y are positional arguments while z is a keyword argument. This means that the function can be called in any of these ways:

In [0]:
my_function(5, 6, z=0.7)
my_function(3.14, 7, 3.5)
my_function(10, 20)

<p>The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any). You can specify keyword arguments in any order; this frees you from having to remember the order in which the function arguments were specified, so you only need to remember their names.</p><p>
It is possible to use keywords for passing positional arguments as well. In the preceding example, we could also have written:</p>

In [0]:
my_function(x=5, y=6, z=7)
my_function(y=6, x=5, z=7)

### Namespaces, Scope, and Local Functions

In [0]:
def func():
    a = []
    for i in range(5):
        a.append(i)

In [0]:
a = []
def func():
    for i in range(5):
        a.append(i)
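To see the difference between the two versions, call each one. In the first, a is a local variable that is created when the function runs and discarded when it returns; in the second, the function mutates the list bound to the global name a (appending does not rebind the name, so no global statement is needed). In this sketch the two versions are renamed func_local and func_global, illustrative names, so that both can coexist:

```python
def func_local():
    a = []                 # local: exists only during this call
    for i in range(5):
        a.append(i)

a = []
def func_global():
    for i in range(5):
        a.append(i)        # mutates the module-level list

func_local()
print(a)    # still empty: the local list was discarded
func_global()
print(a)
```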

Assigning variables outside of the function’s scope is possible, but those variables must be declared as global via the global keyword:

In [0]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

### Returning Multiple Values

In [0]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

In [0]:
return_value = f()

In [0]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}

### Functions Are Objects

In [0]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'W-est virginia?']

In [0]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?-]', '', value)
        value = value.title()
        result.append(value)
    return result

In [0]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [0]:
def remove_punctuation(value):
    return re.sub('[!#?-]', '', value)  # same characters as the first clean_strings

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [0]:
clean_strings(states, clean_ops)

<p>A more functional pattern like this enables you to easily modify how the strings are transformed at a very high level. The clean_strings function is also now more reusable and generic.</p>
You can use <i>functions</i> as arguments to other functions, like the built-in <b>map</b> function, which applies a function to a sequence of some kind:


In [0]:
for x in map(remove_punctuation, states):
    print(x)

### Anonymous (Lambda) Functions

lambda functions are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword.

In [0]:
def short_function(x):
    return x * 2
#equivalent 
equiv_anon = lambda x: x * 2

In [0]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

As another example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [0]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [0]:
strings.sort(key=lambda x: len(set(list(x))))
strings

### Currying: Partial Argument Application

Currying is computer science jargon (named after the mathematician Haskell Curry) that means deriving new functions from existing ones by partial argument application. For example, suppose we had a trivial function that adds two numbers together:


In [0]:
def add_numbers(x, y):
    return x + y

In [0]:
add_five = lambda y: add_numbers(5, y)

The second argument to add_numbers is said to be curried. There’s nothing very fancy here, as all we’ve really done is define a new function that calls an existing function. The built-in functools module can simplify this process using the partial function:


In [0]:
from functools import partial
add_five = partial(add_numbers, 5)
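Both versions of add_five behave the same way. partial can also fix arguments by keyword, and the resulting object keeps a reference to the wrapped function and the frozen arguments. In this sketch add_numbers is repeated so it runs on its own, and add_ten_kw is an illustrative name:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five = partial(add_numbers, 5)       # fixes x=5
add_ten_kw = partial(add_numbers, y=10)  # fixes y=10 by keyword

print(add_five(10))
print(add_ten_kw(1))
print(add_five.func is add_numbers, add_five.args, add_ten_kw.keywords)
```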

### Generators

In [0]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

When you write for key in some_dict, the Python interpreter first attempts to create an iterator out of some_dict:

In [0]:
dict_iterator = iter(some_dict)
dict_iterator


An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most methods expecting a list or list-like object will also accept any iterable object. This includes built-in functions such as min, max, and sum, and type constructors like list and tuple:

In [0]:
list(dict_iterator)
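One consequence worth knowing: an iterator is used up after a single pass, so materializing it a second time gives nothing. A short sketch (dicts keep insertion order in Python 3.7 and later, which fixes the key order here):

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}
dict_iterator = iter(some_dict)

first_pass = list(dict_iterator)    # the keys, in insertion order
second_pass = list(dict_iterator)   # empty: the iterator is exhausted
print(first_pass, second_pass)

# min, max, and sum accept any iterable, including a dict (its keys) and dict views
print(min(some_dict), max(some_dict), sum(some_dict.values()))
```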

A generator is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function:

In [0]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [0]:
gen = squares()
gen

<generator object squares at 0x10bd584f8>

It is not until you request elements from the generator that it begins executing its code:
    

In [0]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
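You can also step through a generator by hand with the built-in next. Each call resumes execution until the next yield; once the generator is exhausted, next raises StopIteration and list returns an empty list. A sketch reusing squares with a smaller n:

```python
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first = next(gen)     # runs up to the first yield (printing the message), returns 1
second = next(gen)    # resumes after that yield, returns 4
rest = list(gen)      # drains the remaining values
leftover = list(gen)  # an exhausted generator yields nothing more
print(first, second, rest, leftover)
```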

#### Generator expressions

Another even more concise way to make a generator is by using a generator expression. This is a generator analogue to list, dict, and set comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [0]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x10bd587c8>

In [0]:
#which is equivalent to
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()

Generator expressions can be used instead of list comprehensions as function arguments in many cases:


In [0]:
sum(x ** 2 for x in range(100))
dict((i, i **2) for i in range(5))

#### itertools module

In [0]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is a lazy iterator over consecutive names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
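Notice that 'A' appears twice: groupby only groups consecutive elements that share a key. To collect every name under its first letter, sort by the same key first, as in this sketch (the grouped dict is an illustrative name):

```python
import itertools

first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

grouped = {}
for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter):
    grouped[letter] = list(group)   # materialize each group before moving on
print(grouped)
```

Because sorted is stable, names within each group keep their original relative order.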


### Errors and Exception Handling

In [0]:
float('1.2345')
float('something')  # raises ValueError

In [0]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [0]:
attempt_float('1.2345')
attempt_float('something')

In [0]:
float((1, 2))  # raises TypeError, which except ValueError does not catch

In [0]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [0]:
attempt_float((1, 2))

In [0]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
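With both exception types caught, the function handles every bad input from the earlier cells without crashing. The definition is repeated so the sketch runs on its own:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

print(attempt_float('1.2345'))     # converted to a float
print(attempt_float('something'))  # ValueError caught: returned unchanged
print(attempt_float((1, 2)))       # TypeError caught: returned unchanged
```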

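If you want some code to be executed regardless of whether the code in the try block succeeds or not, use finally (path and write_to_file are hypothetical placeholders here):

In [0]:
path = 'tmp_output.txt'      # hypothetical output file

def write_to_file(f):        # hypothetical helper standing in for real work
    f.write('some data\n')

f = open(path, 'w')
try:
    write_to_file(f)
finally:
    f.close()                # runs whether or not write_to_file raised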

Similarly, you can have code that executes only if the try block succeeds by using else:

In [0]:
f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()

#### Exceptions in IPython

In [0]:
In [10]: %run examples/ipython_bug.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/wesm/code/pydata-book/examples/ipython_bug.py in <module>()
     13     throws_an_exception()
     14
---> 15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in calling_things()
     11 def calling_things():
     12     works_fine()
---> 13     throws_an_exception()
     14
     15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in throws_an_exception()
      7     a = 5
      8     b = 6
----> 9     assert(a + b == 10)
     10
     11 def calling_things():

AssertionError:

## Files and the Operating System

In [0]:
%pushd book-materials

In [0]:
path = 'examples/segismundo.txt'
f = open(path)

In [0]:
for line in f:
    pass

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll often see code to get an EOL-free list of lines in a file like:

In [0]:
lines = [x.rstrip() for x in open(path)]
lines

In [0]:
f.close()

In [0]:
with open(path) as f:
    lines = [x.rstrip() for x in f]

This will automatically close the file f when exiting the <b>with</b> block.

If we had typed f = open(path, 'w'), a new file at examples/segismundo.txt would have been created (be careful!), overwriting any one in its place. There is also the 'x' file mode, which creates a writable file but fails if the file path already exists.
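A minimal sketch of the difference between 'w' and 'x', assuming a throwaway file inside a temporary directory (demo.txt is an illustrative name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, 'demo.txt')   # throwaway file name

    with open(demo_path, 'x') as f:   # 'x': create a new file, fail if it exists
        f.write('first\n')
    with open(demo_path, 'w') as f:   # 'w': silently truncate existing contents
        f.write('second\n')
    with open(demo_path) as f:
        contents = f.read()

    try:
        with open(demo_path, 'x'):    # the file exists now, so this fails
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(contents), x_failed)
```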

In [0]:
f = open(path)
f.read(10)
f2 = open(path, 'rb')  # Binary mode
f2.read(10)

In [0]:
f.tell()
f2.tell()

In [0]:
import sys
sys.getdefaultencoding()

In [0]:
f.seek(3)
f.read(1)

In [0]:
f.close()
f2.close()

To write text to a file, you can use the file’s write or writelines methods. For example, we could create a version of the example file with no blank lines like so:

In [0]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)
with open('tmp.txt') as f:
    lines = f.readlines()
lines

In [0]:
import os
os.remove('tmp.txt')

### Bytes and Unicode with Files

In [0]:
with open(path) as f:
    chars = f.read(10)
chars

In [0]:
with open(path, 'rb') as f:
    data = f.read(10)
data

In [0]:
data.decode('utf8')
data[:4].decode('utf8')  # UnicodeDecodeError if the 4-byte slice ends inside a multibyte character
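Why can decoding a slice fail? In UTF-8 a non-ASCII character such as 'ñ' takes more than one byte, so a byte slice can stop in the middle of a character. A self-contained sketch using a short string instead of the example file:

```python
text = 'Sueña'
data = text.encode('utf-8')     # 'ñ' is encoded as the two bytes 0xC3 0xB1
print(len(text), len(data))     # 5 characters, 6 bytes
print(data.decode('utf-8'))     # decoding every byte works

error_reason = None
try:
    data[:4].decode('utf-8')    # the slice stops halfway through 'ñ'
except UnicodeDecodeError as exc:
    error_reason = exc.reason
print(error_reason)

# errors='replace' substitutes U+FFFD for the incomplete byte sequence
print(data[:4].decode('utf-8', errors='replace'))
```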

In [0]:
sink_path = 'sink.txt'
with open(path) as source:
    with open(sink_path, 'xt', encoding='iso-8859-1') as sink:
        sink.write(source.read())
with open(sink_path, encoding='iso-8859-1') as f:
    print(f.read(10))

In [0]:
os.remove(sink_path)

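Beware of using seek when a file is opened in text mode: if the file position falls in the middle of the bytes that encode a Unicode character, subsequent reads will fail with an error:

In [0]:
f = open(path)
f.read(5)
f.seek(4)
f.read(1)  # UnicodeDecodeError if position 4 is inside a multibyte character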
f.close()

In [0]:
%popd

## Conclusion