# Chapter 3.  Built-in Data Structures, Functions, Files

## 3.1  Data Structures and Sequences

### Tuple

A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values.

In [None]:
tup = 4, 5, 6
tup

When you’re defining tuples in more complicated expressions, it’s often necessary to enclose the values in parentheses.

In [None]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

You can convert any sequence or iterator to a tuple by invoking tuple.

In [None]:
tup=tuple([4, 0, 2])
print(tup)

In [None]:
tup = tuple('spring')
print(tup)

In [None]:
tup[0]

In [None]:
tup = tuple(['foo', [1, 2], True])
print(tup)

In [None]:
tup[2] = False

If an object inside a tuple is mutable, such as a list, you can modify it in-place.

In [None]:
tup[1].append(3)
tup

You can concatenate tuples using the + operator to produce longer tuples.

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

In [None]:
('foo', 'bar') * 4

#### Unpacking tuples

In [None]:
tup = (4, 5, 6)
a, b, c = tup
b

In [None]:
tup = (4, 5, 6)
(a, b, c) = tup
b

In [None]:
tup = (4, 5, 6)
a, b = tup

In [None]:
tup = (4, 5, 6)
a, *b = tup
b

In [None]:
tup = (4,5)
a, b, c = tup

In [None]:
tup = (4,5)
a, b, *c = tup
c

In [None]:
tup = 4, 5, (6, 7)
a, b, c = tup
c

In [None]:
a, b, (c, d) = tup
d

Using this functionality you can easily swap variable names, a task which in many languages might look like this:

In [None]:
a, b = 1, 2
print(a)
print(b)

tmp = a
a = b
b = tmp

print(a)
print(b)

In [None]:
a, b = 1, 2
print(a)
print(b)

In [None]:
b, a = a, b
print(a)
print(b)

A common use of variable unpacking is iterating over sequences of tuples or lists:


In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(a,b,c)
    print('a={0}, b={1}, c={2}'.format(a, b, c))

Python 3 also supports more advanced tuple unpacking for situations where you want to “pluck” a few elements from the beginning of a tuple.   
This uses the special syntax *rest, which is also used in function signatures to capture an arbitrarily long list of positional arguments.

In [None]:
values = 1, 2, 3, 4, 5
a, b, *rest = values

In [None]:
a, b

In [None]:
rest

This rest bit is sometimes something you want to discard; there is nothing special about the rest name.   
As a matter of convention, many Python programmers will use the underscore (_) for unwanted variables.

In [None]:
a, b, *_ = values

In [None]:
for _ in range(5):
    print("spring")

#### Tuple methods

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

### List

In contrast with tuples, lists are variable-length and their contents can be modified in-place.   
You can define them using square brackets [] or using the list type function.

In [None]:
a_list = [2, 3, 7, None]

In [None]:
tup = ('foo', 'bar', 'baz')
b_list = list(tup)

In [None]:
b_list

In [None]:
b_list[1] = 'peekaboo'

In [None]:
b_list

Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions. 

In [None]:
gen = range(10)
gen

In [None]:
list(gen)

#### Adding and removing elements

In [None]:
b_list

In [None]:
b_list.append('dwarf')

In [None]:
b_list

In [None]:
b_list.insert(1, 'red')
b_list

<img style="float: left;" src="pic/pic_0_1.png">

<span style="color:red">insert is computationally expensive compared with append, because references to subsequent elements have to be shifted internally to make room for the new element.</span>
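If you need to insert at the front of a sequence often, the standard library's `collections.deque` may be a better fit than a list. The sketch below builds the same sequence both ways; the variable names are illustrative only.

```python
from collections import deque

# Build [4, 3, 2, 1, 0] by repeatedly inserting at the front of a list.
as_list = []
for i in range(5):
    as_list.insert(0, i)    # every existing element shifts one slot right

# The same result with a deque, which supports fast appends at both ends.
as_deque = deque()
for i in range(5):
    as_deque.appendleft(i)

print(as_list)          # [4, 3, 2, 1, 0]
print(list(as_deque))   # [4, 3, 2, 1, 0]
```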

The inverse operation to insert is pop, which removes and returns an element at a particular index.

In [None]:
b_list.pop(2)
b_list

Elements can be removed by value with **remove**, which locates the first such value and removes it from the list.

In [None]:
b_list.append('foo')
b_list

In [None]:
b_list.remove('foo')
b_list

In [None]:
'dwarf' in b_list

In [None]:
'dwarf' not in b_list

Checking whether a list contains a value is a lot slower than doing so with dicts and sets (to be introduced shortly), as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.
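As a small illustration of the point above, the following sketch runs the same membership tests against a list and a set built from it. The names `values` and `value_set` are made up for this example; the answers are identical, only the lookup strategy differs.

```python
values = list(range(1000))
value_set = set(values)    # one-time conversion cost, then hash-based lookups

print(999 in values)       # True, found by scanning the list
print(999 in value_set)    # True, found by a hash lookup
print(-1 in values)        # False, after scanning every element
print(-1 in value_set)     # False
```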


#### Concatenating and combining lists

In [None]:
[4, None, 'foo'] + [7, 8, (2, 3)]

In [None]:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over.   
Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.

In [None]:
L1=list(range(10))
L2=[]
for _ in range(10):
    L2.append(L1)
L2

In [None]:
L1=list(range(1000))
L2=[]
for _ in range(1000):
    L2.append(L1)

In [None]:
import time
start_time = time.time()

everything = []
for chunk in L2:
    everything.extend(chunk)
    
elapsed_time = time.time() - start_time
print(elapsed_time)

In [None]:
start_time = time.time()

everything = []
for chunk in L2:
    everything = everything + chunk

elapsed_time = time.time() - start_time
print(elapsed_time)

#### Sorting

In [None]:
a = [7, 2, 5, 1, 3]
a.sort()
a

In [None]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

#### Binary search and maintaining a sorted list

The built-in **bisect** module implements binary search and insertion into a sorted list.   
**bisect.bisect** finds the location where an element should be inserted to keep it sorted, while **bisect.insort** actually inserts the element into that location.

In [None]:
import bisect

In [None]:
c = [1, 2, 2, 2, 3, 4, 7]

In [None]:
bisect.bisect(c, 2)

In [None]:
bisect.bisect(c, 5)

In [None]:
bisect.insort(c, 6)

In [None]:
c

<img style="float: left;" src="pic/pic_0_1.png">

<span style="color:red">The bisect module functions do not check whether the list is sorted, as doing so would be computationally expensive.   
Thus, using them with an unsorted list will succeed without error but may lead to incorrect results.</span>

In [None]:
c=[1,3,5,2,4,8]
bisect.insort(c, 6)

In [None]:
c
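If correctness matters more than speed, one option is to verify the ordering yourself before inserting. The helper `insort_checked` below is hypothetical (it is not part of the bisect module) and adds a linear-time check, which gives up the speed advantage of binary search.

```python
import bisect

def insort_checked(sorted_list, item):
    """Hypothetical helper: insert item, but refuse unsorted input."""
    # O(n) scan comparing each element with its successor.
    if any(a > b for a, b in zip(sorted_list, sorted_list[1:])):
        raise ValueError('list is not sorted')
    bisect.insort(sorted_list, item)

good = [1, 2, 3, 4, 5, 8]
insort_checked(good, 6)
print(good)   # [1, 2, 3, 4, 5, 6, 8]

try:
    insort_checked([1, 3, 5, 2, 4, 8], 6)
except ValueError as exc:
    print('rejected:', exc)   # rejected: list is not sorted
```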

#### Slicing

You can select sections of most sequence types by using slice notation, which in its basic form consists of **start:stop** passed to the indexing operator [ ].

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

Slices can also be assigned to with a sequence.

In [None]:
seq[3:4] = [6, 3]
seq

While the element at the **start** index is included, the **stop** index is not included, so that the number of elements in the result is **stop - start**. 
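The half-open convention can be checked directly. This sketch uses a fresh copy of the example list; note that the length rule assumes both indices fall within the bounds of the sequence.

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]
start, stop = 1, 5

piece = seq[start:stop]
print(piece)                        # [2, 3, 7, 5]
print(len(piece) == stop - start)   # True

# Two slices that share a boundary split the list with no overlap or gap.
print(seq[:3] + seq[3:] == seq)     # True
```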

Either the **start** or **stop** can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively.

In [None]:
seq[:5]

In [None]:
seq[3:]

Negative indices slice the sequence relative to the end.

In [None]:
seq[-4:]

In [None]:
seq[-6:-2]

A step can also be used after a second colon to, say, take every other element.

In [None]:
seq[::2]

In [None]:
seq[::-1]

<img style="float: left;" src="pic/pic09.png" width=600>

### Built-in Sequence Functions

#### enumerate

It’s common when iterating over a sequence to want to keep track of the index of the current item.  
Since this is so common, Python has a built-in function, **enumerate**, which returns a sequence of (i, value) tuples:

```python
for i, value in enumerate(collection):
    # do something with value
    pass
```

When you are indexing data, a helpful pattern that uses **enumerate** is computing a dict mapping the values of a sequence (which are assumed to be unique) to their locations in the sequence.

In [None]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping
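To see the (i, value) tuples themselves, you can materialize the result of **enumerate** with list. The sketch also shows the optional `start` argument, which shifts the first index.

```python
some_list = ['foo', 'bar', 'baz']

# Each item produced by enumerate is an (index, value) tuple.
pairs = list(enumerate(some_list))
print(pairs)   # [(0, 'foo'), (1, 'bar'), (2, 'baz')]

# The optional start argument changes the first index.
one_based = list(enumerate(some_list, start=1))
print(one_based)   # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
```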

#### sorted

The **sorted** function returns a new sorted list from the elements of any sequence.

In [None]:
sorted([7, 1, 2, 6, 0, 3, 2])

In [None]:
sorted('horse race')

#### zip

**zip** “pairs” up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns a lazy iterator of tuples, which you can materialize with list.

In [None]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)

In [None]:
print(zipped)

In [None]:
list(zipped)

**zip** can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence.

In [None]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))
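If dropping the unmatched elements is not what you want, `itertools.zip_longest` from the standard library pads the shorter inputs instead. A minimal sketch, reusing the shapes of the sequences above:

```python
from itertools import zip_longest

seq1 = ['foo', 'bar', 'baz']
seq3 = [False, True]

truncated = list(zip(seq1, seq3))
print(truncated)   # [('foo', False), ('bar', True)]

# zip_longest keeps going until the longest input is exhausted.
padded = list(zip_longest(seq1, seq3, fillvalue=None))
print(padded)      # [('foo', False), ('bar', True), ('baz', None)]
```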

A very common use of **zip** is simultaneously iterating over multiple sequences, possibly also combined with **enumerate**.

In [None]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

In [None]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
            ('Curt', 'Schilling')]
first_names, last_names = zip(*pitchers)

print(list(zip(*pitchers)))
print(first_names)

The * operator can be used in conjunction with zip( ) to unzip the list.

<p style="font-family: Courier New; font-size: 1.15em;">zip(*zippedList)

In [None]:
coordinate = ['x', 'y', 'z']
value = [3, 4, 5]

In [None]:
result = zip(coordinate, value)
resultList = list(result)
print(resultList)

In [None]:
print(result)

In [None]:
c, v =  zip(*resultList)
print('c =', c)
print('v =', v)

#### reversed

**reversed** iterates over the elements of a sequence in reverse order.  
Keep in mind that **reversed** returns a lazy iterator (like the generators discussed in more detail later), so it does not create the reversed sequence until materialized (e.g., with list or a for loop). 

In [None]:
list(reversed(range(10)))

In [None]:
reversed(range(10))
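One consequence of this laziness is that the object returned by **reversed** can only be consumed once. A short sketch:

```python
rev = reversed([1, 2, 3])

first_pass = list(rev)    # consumes the iterator
second_pass = list(rev)   # nothing is left to yield

print(first_pass)    # [3, 2, 1]
print(second_pass)   # []
```

If you need the reversed data more than once, store the materialized list or call **reversed** again.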

### dict

**dict** is likely the most important built-in Python data structure.   
It is a flexibly sized collection of key-value pairs, where key and value are Python objects.   
One approach for creating one is to use curly braces {} and colons to separate keys and values:

In [None]:
empty_dict = {}

In [None]:
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple.

In [None]:
d1[7] = 'an integer'
d1

In [None]:
d1['b']

You can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value.

In [None]:
'b' in d1

You can delete values either using the **del** keyword or the **pop** method (which simultaneously returns the value and deletes the key).

In [None]:
d1[5] = 'some value'
d1

In [None]:
d1['dummy'] = 'another value'
d1

In [None]:
del d1[5]

In [None]:
d1

In [None]:
ret = d1.pop('dummy')

In [None]:
ret

In [None]:
d1

The **keys** and **values** methods give you iterable views of the dict’s keys and values, respectively.   
Since Python 3.7, a dict preserves the order in which its keys were inserted, and both methods return their items in that same order.

In [None]:
list(d1.keys())

In [None]:
d1.keys()

In [None]:
list(d1.values())

In [None]:
d1.values()
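The relationship between the two methods can be checked with a small dict. This sketch assumes Python 3.7 or later, where dicts keep their keys in insertion order; the dict `d` is made up for the example.

```python
d = {'b': 1, 'a': 2}
d['c'] = 3

keys = list(d.keys())
values = list(d.values())
print(keys)     # ['b', 'a', 'c']
print(values)   # [1, 2, 3]

# Zipping the two back together reproduces the original pairs.
print(dict(zip(keys, values)) == d)   # True
```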

You can merge one dict into another using the **update** method.

In [None]:
d1.update({'b' : 'foo', 'c' : 12})
d1

#### Creating dicts from sequences

It’s common to end up with two sequences that you want to pair up element-wise in a dict.   
As a first cut, you might write code like this:

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```

Since a dict is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples.

In [None]:
mapping = dict(zip(range(5), reversed(range(5))))
mapping

#### Default values

It is very common to have logic like:

```python
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```

Thus, the dict methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as:

```python
value = some_dict.get(key, default_value)
```
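The following sketch exercises **get** and **pop** with and without a default on a throwaway dict, so you can compare what each one does when the key is missing.

```python
some_dict = {'a': 1}

print(some_dict.get('a', 0))          # 1
print(some_dict.get('missing', 0))    # 0
print(some_dict.get('missing'))       # None

# pop with a default returns the default instead of raising.
print(some_dict.pop('missing', 'fallback'))   # fallback

# pop without a default raises KeyError for a missing key.
try:
    some_dict.pop('missing')
except KeyError as exc:
    print('KeyError:', exc)
```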

**get** by default will return **None** if the key is not present, while **pop** will raise an exception. With setting values, a common case is for the values in a dict to be other collections, like lists.   
For example, you could imagine categorizing a list of words by their first letters as a dict of lists:

In [None]:
words = ['apple', 'bat', 'bar', 'atom', 'book']

In [None]:
by_letter = {}

In [None]:
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

The **setdefault** dict method is for precisely this purpose.   
The **setdefault( )** method returns the value of the item with the specified key.  
If the key does not exist, it first inserts the key with the specified value and then returns that value.  
The preceding for loop can be rewritten as:

In [None]:
by_letter = {}

In [None]:
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
    print(by_letter)
by_letter
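An alternative to **setdefault** is `collections.defaultdict` from the standard library, which creates the default value automatically the first time a missing key is accessed. A sketch of the same grouping:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = defaultdict(list)   # missing keys start out as an empty list
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```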

### set

A set is an unordered collection of unique elements.   
You can think of them like dicts, but keys only, no values.   
A set can be created in two ways: via the set function or via a set literal with curly braces.

In [None]:
set([2, 2, 2, 1, 3, 3])

In [None]:
{2, 2, 2, 1, 3, 3}

In [None]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [None]:
a.union(b)

In [None]:
a | b

In [None]:
a.intersection(b)

In [None]:
a & b

<img style="float: left;" src="pic/pic10.png" width=650>

In [None]:
c = a.copy()

In [None]:
c |= b

In [None]:
c

In [None]:
d = a.copy()

In [None]:
d &= b

In [None]:
d

You can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set.

In [None]:
a_set = {1, 2, 3, 4, 5}

In [None]:
{1, 2, 3}.issubset(a_set)

In [None]:
a_set.issuperset({1, 2, 3})

In [None]:
{1, 2, 3} == {3, 2, 1}
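Sets also support difference and symmetric difference, using the same method-or-operator pairing as union and intersection. A brief sketch with the two sets from earlier:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b          # same as a.difference(b)
in_exactly_one = a ^ b     # same as a.symmetric_difference(b)

print(only_in_a == {1, 2})                  # True
print(in_exactly_one == {1, 2, 6, 7, 8})    # True
```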

### List, Set, and Dict Comprehensions

List comprehensions are one of the most-loved Python language features. 
They allow you to form a new list in a single expression by filtering the elements of a collection and transforming the elements that pass the filter.   

They take the basic form:

<p style="font-family: Courier New; font-size: 1.15em;">[expr for val in collection if condition] 

This is equivalent to the following for loop:

<pre>
<p style="font-family: Courier New; font-size: 1.15em;">
result = []
for val in collection:
    if condition:
        result.append(expr)
</p></pre>

The filter condition can be omitted, leaving only the expression.   
For example, given a list of strings, we could filter out strings with length 2 or less and also convert them to uppercase like this:

In [None]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [None]:
[x.upper() for x in strings if len(x) > 2]
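The equivalence between a comprehension and an explicit loop can be checked directly. This sketch writes the same filter-and-transform step as a for loop and compares the two results.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Explicit loop: filter first, then transform and append.
result = []
for x in strings:
    if len(x) > 2:
        result.append(x.upper())

print(result)   # ['BAT', 'CAR', 'DOVE', 'PYTHON']
print(result == [x.upper() for x in strings if len(x) > 2])   # True
```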

Set and dict comprehensions are a natural extension, producing sets and dicts in an idiomatically similar way instead of lists.  
A dict comprehension looks like this:

<pre>
<p style="font-family: Courier New; font-size: 1.15em;">
dict_comp = {key-expr : value-expr for value in collection if condition}
</p></pre>

A set comprehension looks like the equivalent list comprehension except with curly braces instead of square brackets:

<pre>
<p style="font-family: Courier New; font-size: 1.15em;">
set_comp = {expr for value in collection if condition}
</p></pre>

Like list comprehensions, set and dict comprehensions are mostly conveniences, but they similarly can make code both easier to write and read.   
Consider the list of strings from before.   
Suppose we wanted a set containing just the lengths of the strings contained in the collection; we could easily compute this using a set comprehension:

In [None]:
unique_lengths = {len(x) for x in strings}
unique_lengths

**map** applies a function to every item of an input iterable and returns a lazy iterator of the results.

<pre><p style="font-family: Courier New; font-size: 1.15em;">
map(function_to_apply, list_of_inputs)
</p></pre>

In [None]:
set(map(len, strings))

In [None]:
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

#### Nested list comprehensions

In [None]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

In [None]:
[[x for x in tup] for tup in some_tuples]
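The order of the for clauses in a nested comprehension matches the order of the equivalent nested loops, outermost first. A sketch of the flattening example written out by hand:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

flattened = []
for tup in some_tuples:   # first for clause = outer loop
    for x in tup:         # second for clause = inner loop
        flattened.append(x)

print(flattened)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flattened == [x for tup in some_tuples for x in tup])   # True
```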

## 3.2  Functions

Functions are declared with the def keyword and returned from with the return keyword:

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
```

If Python reaches the end of a function without encountering a return statement, **None** is returned automatically.

Each function can have positional arguments and keyword arguments.   
Keyword arguments are most commonly used to specify default values or optional arguments.   
In the preceding function, x and y are positional arguments while z is a keyword argument.  
This means that the function can be called in any of these ways:

```python
my_function(5, 6, z=0.7)
my_function(3.14, 7, 3.5)
my_function(10, 20)
```
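Here is a runnable sketch of these calling conventions, together with the implicit **None** return. The function `no_return` is a made-up example, not part of the original text.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

def no_return(x):
    x + 1   # the result is discarded; there is no return statement

print(my_function(10, 20))   # 45.0, using the default z=1.5

# Keyword arguments may be passed in any order.
print(my_function(y=6, x=5, z=0.7) == my_function(5, 6, z=0.7))   # True

print(no_return(1))   # None
```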

### Returning Multiple Values

Python has the ability to return multiple values from a function with simple syntax. Here’s an example:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

x, y, z = f()

print(x,y,z)
type(x)

In data analysis and other scientific applications, you may find yourself doing this often.   
What’s happening here is that the function is actually just returning one object, namely a tuple, which is then being unpacked into the result variables.

In [None]:
k = f()
type(k)

A potentially attractive alternative to returning multiple values like before might be to return a dict instead:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}


k=f()

print(k)
type(k)

### Anonymous (Lambda) Functions

A lambda function is a small anonymous function.  
A lambda function can take any number of arguments, but can only have one expression.

<p style="font-family: Courier New; font-size: 1.15em;">lambda arguments : expression

The expression is executed and the result is returned:

In [None]:
def hannam(x):
    return x * 2

In [None]:
y=hannam(3)
print(y)

In [None]:
hnu = lambda x: x * 2

In [None]:
z=hnu(3)
print(z)

In [None]:
x = lambda a, b : a * b
print(x(5, 6))

In [None]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

In [None]:
ints = [4, 0, 1, 5, 6]

In [None]:
apply_to_list(ints, lambda x: x * 2)

In [None]:
apply_to_list(ints, hannam)

In [None]:
apply_to_list(ints, hnu)

In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [None]:
strings.sort()
strings

In [None]:
strings.sort?

Suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [None]:
strings.sort(key=lambda x: len(set(x)))
strings

In [None]:
strings.sort(key=len)
strings

### Generators

The rest is omitted.

In [None]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

In [None]:
dict_iterator = iter(some_dict)
dict_iterator

In [None]:
list(dict_iterator)

In [None]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [None]:
gen = squares()
gen

In [None]:
for x in gen:
    print(x, end=' ')
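A generator function does not run any of its body when it is called; execution starts only when the first value is requested. The sketch below uses a smaller variant of `squares` (default `n=3`, chosen for this example) to make that visible.

```python
def squares(n=3):
    print('generator body started')
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()      # nothing is printed yet
first = next(gen)    # now the message is printed and 1 is yielded
rest = list(gen)     # consumes the remaining values

print(first)       # 1
print(rest)        # [4, 9]
print(list(gen))   # [] because the generator is exhausted
```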

#### Generator expressions

In [None]:
gen = (x ** 2 for x in range(100))
gen

def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()

In [None]:
sum(x ** 2 for x in range(100))
dict((i, i **2) for i in range(5))

#### itertools module

In [None]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names)) # names is a generator
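Note that `itertools.groupby` only groups consecutive elements that share a key, so unsorted input can yield the same key more than once. A sketch with a deliberately interleaved list (the data is made up for this example):

```python
import itertools

first_letter = lambda x: x[0]
names = ['Alan', 'Wes', 'Adam', 'Will']

# Without sorting, each run of equal keys becomes its own group.
unsorted_groups = [(k, list(g))
                   for k, g in itertools.groupby(names, first_letter)]
print(unsorted_groups)
# [('A', ['Alan']), ('W', ['Wes']), ('A', ['Adam']), ('W', ['Will'])]

# Sorting by the same key first gives one group per key.
sorted_groups = [(k, list(g))
                 for k, g in itertools.groupby(sorted(names, key=first_letter),
                                               first_letter)]
print(sorted_groups)
# [('A', ['Alan', 'Adam']), ('W', ['Wes', 'Will'])]
```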

### Errors and Exception Handling

In [None]:
float('1.2345')
float('something')

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [None]:
attempt_float('1.2345')
attempt_float('something')

In [None]:
float((1, 2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [None]:
attempt_float((1, 2))

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()

f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()
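To see which clauses run in each case, here is a self-contained sketch. The function `write_with_cleanup` and its `fail` flag are hypothetical stand-ins for `write_to_file`, and a temporary directory is used so that no real file is left behind.

```python
import os
import tempfile

def write_with_cleanup(path, fail):
    """Hypothetical demo: record which clauses run; always close the file."""
    events = []
    f = open(path, 'w')
    try:
        if fail:
            raise OSError('simulated write failure')
        f.write('ok')
    except OSError:
        events.append('except')
    else:
        events.append('else')      # runs only if the try block succeeded
    finally:
        f.close()                  # runs in both cases
        events.append('finally')
    return events, f.closed

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'demo.txt')
    print(write_with_cleanup(target, fail=False))   # (['else', 'finally'], True)
    print(write_with_cleanup(target, fail=True))    # (['except', 'finally'], True)
```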

#### Exceptions in IPython

In [10]: %run examples/ipython_bug.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/wesm/code/pydata-book/examples/ipython_bug.py in <module>()
     13     throws_an_exception()
     14
---> 15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in calling_things()
     11 def calling_things():
     12     works_fine()
---> 13     throws_an_exception()
     14
     15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in throws_an_exception()
      7     a = 5
      8     b = 6
----> 9     assert(a + b == 10)
     10
     11 def calling_things():

AssertionError:

## 3.3  Files and the Operating System

In [None]:
%pushd book-materials

In [None]:
path = 'examples/segismundo.txt'
f = open(path)

for line in f:
    pass

In [None]:
lines = [x.rstrip() for x in open(path)]
lines

In [None]:
f.close()

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
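The with statement closes the file automatically when the block exits. The sketch below checks this on a temporary file; the file name and contents are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')
    with open(demo_path, 'w') as handle:
        handle.write('line one  \nline two\n')

    with open(demo_path) as f:
        lines = [x.rstrip() for x in f]   # strips trailing spaces and newlines

    print(lines)      # ['line one', 'line two']
    print(f.closed)   # True, closed on leaving the with block
```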

In [None]:
f = open(path)
f.read(10)
f2 = open(path, 'rb')  # Binary mode
f2.read(10)

In [None]:
f.tell()
f2.tell()

In [None]:
import sys
sys.getdefaultencoding()

In [None]:
f.seek(3)
f.read(1)

In [None]:
f.close()
f2.close()

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)
with open('tmp.txt') as f:
    lines = f.readlines()
lines

In [None]:
import os
os.remove('tmp.txt')

### Bytes and Unicode with Files

In [None]:
with open(path) as f:
    chars = f.read(10)
chars

In [None]:
with open(path, 'rb') as f:
    data = f.read(10)
data

In [None]:
data.decode('utf8')
data[:4].decode('utf8')
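Decoding a slice of bytes can fail when the slice cuts a multi-byte character in half. The sketch below reproduces this with a short string chosen for illustration, rather than with the contents of the example file.

```python
text = 'Sueña'
data = text.encode('utf8')   # 'ñ' takes two bytes in UTF-8

print(len(text), len(data))      # 5 6
print(data[:3].decode('utf8'))   # Sue

try:
    data[:4].decode('utf8')      # the slice ends in the middle of 'ñ'
except UnicodeDecodeError as exc:
    print('incomplete character:', exc.reason)
```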

In [None]:
sink_path = 'sink.txt'
with open(path) as source:
    with open(sink_path, 'xt', encoding='iso-8859-1') as sink:
        sink.write(source.read())
with open(sink_path, encoding='iso-8859-1') as f:
    print(f.read(10))

In [None]:
os.remove(sink_path)

In [None]:
f = open(path)
f.read(5)
f.seek(4)
f.read(1)
f.close()

In [None]:
%popd

## 3.4  Conclusion