# Built-in Data Structures, Functions, and Files

## Data Structures and Sequences
* Primitive data types (hold a single, simple value)
 * Integer
 * Float
 * String
 * Boolean
* Non-Primitive Data Structures (store a collection of values in various formats.)
 * List
 * Tuple
 * Dictionary
 * Set, FrozenSet

### Tuple
- Tuples are enclosed in parentheses ()
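- Tuples are generally faster to create and consume less memory than lists.
- Immutable: elements cannot be reassigned after creation, although a mutable object stored in a tuple can still be modified in place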
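
A quick way to check the memory claim is to compare the size of a tuple and a list holding the same elements. This is only a sketch and assumes CPython; the exact byte counts vary by version and platform, so none are shown here.

```python
import sys

as_tuple = (1, 2, 3)
as_list = [1, 2, 3]

# sys.getsizeof reports the size of the container object itself, in bytes.
# The numbers are CPython implementation details, not language guarantees.
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)
print(tuple_size, list_size)
```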

In [None]:
tup = (1, 2, 3)
tup

In [1]:
tup = 4, 5, 6
tup

(4, 5, 6)

In [None]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

In [None]:
tuple([4, 0, 2])

In [None]:
tup = tuple('string')
tup

In [None]:
tup[0]

In [None]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False # raises TypeError: tuples do not support item assignment

In [None]:
tup[1].append(3)

In [None]:
tup

In [None]:
(4, None, 'foo') + (6, 0) + ('bar',)

In [None]:
('foo', 'bar') * 4

#### Unpacking tuples

In [None]:
tup = (4, 5, 6)
a, b, c = tup
b

In [None]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

In [None]:
b

In [None]:
tmp = a
a = b
b = tmp

In [None]:
a, b = 1, 2
a
b
b, a = a, b
a
b

#### Exercise: Complete the following code by adding the sum of all three numbers as the last element of the printout

#### Desired output

### Unpacking (advanced)

In [None]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a, b

In [None]:
rest

In [None]:
a, b, *_ = values # Underscore for ignoring the specific values. (so-called “I don’t care”)

In [None]:
_

#### Tuple methods

`index()` searches the tuple for a specified value and returns the position of its first occurrence.

In [None]:
a = (42, 18, 19, 15)

In [None]:
a.index(19)

`count()` returns the number of times a specified value occurs in a tuple.

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

### List
- Lists are enclosed in square brackets []
- Mutable

In [None]:
a_list = [2, 3, 7, None]
a_list

In [None]:
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
b_list

In [None]:
b_list[1] = 'peekaboo'
b_list

In [None]:
gen = range(1000)
list(gen)

#### Adding and removing elements

The list methods make it very easy to use a list as a **stack**, where the last element added is the first element retrieved (“last-in, first-out”). To add an item to the top of the stack, use append(). To retrieve an item from the top of the stack, use pop() without an explicit index.

It is also possible to use a list as a **queue**, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).

(https://docs.python.org/3.3/tutorial/datastructures.html#using-lists-as-stacks)
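
The two usage patterns described above can be sketched as follows. The stack uses a plain list; for the queue, this sketch swaps in `collections.deque` from the standard library, which supports fast appends and pops at both ends. The sample values are illustrative only.

```python
from collections import deque

# Stack (last-in, first-out): append and pop both work on the end of the list.
stack = [3, 4, 5]
stack.append(6)
top = stack.pop()

# Queue (first-in, first-out): deque.popleft() avoids shifting every element,
# which is what list.pop(0) would have to do.
queue = deque(['Eric', 'John', 'Michael'])
queue.append('Terry')
first = queue.popleft()

print(top, stack)
print(first, list(queue))
```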

In [None]:
b_list.append('dwarf') # appends an element to the end of the list
b_list

In [None]:
b_list.insert(1, 'red') # inserts the element at the given index, shifting elements to the right
b_list

In [None]:
print(b_list.pop(2)) # removes the item at the given index from the list and returns the removed item
b_list

In [None]:
b_list.append('foo')
print(b_list)
b_list.remove('foo') # takes a single element as an argument and removes it from the list
print(b_list)

In [None]:
'dwarf' in b_list

In [None]:
'dwarf' not in b_list

#### Concatenating and combining lists

In [None]:
[4, None, 'foo'] + [7, 8, (2, 3)]

In [None]:
x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
x

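Using `extend` to append elements to an existing list is usually preferable when building up a large list:

```python
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)
```

Concatenation with `+` is comparatively expensive, because each iteration creates a new list and copies the elements over:

```python
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
```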

In [None]:
a_list = ['a', 'b', 'c']
b_list = ['d', 'e', 'f']
c_list = ['g', 'h', 'i']
list_of_lists = [a_list, b_list, c_list]
print(list_of_lists)
everything = []
for chunk in list_of_lists:
    everything = everything + chunk
everything

#### Sorting

In [None]:
a = [7, 2, 5, 1, 3]
a.sort()
a

In [None]:
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
b

#### Binary search and maintaining a sorted list

In [None]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]
print(bisect.bisect(c, 2)) # Index where the first larger value is
print(bisect.bisect(c, 5))
bisect.insort(c, 6) # Binary Insertion that keeps the list sorted
print(c)

#### Slicing

In [None]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]

| values           | 7  | 2  | 3  | 7  | 5  | 6  | 0  | 1  |
|------------------|----|----|----|----|----|----|----|----|
| indexes          | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
| negative indexes | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [None]:
seq[1:5] # start refers to the index of the element which is used as a start of our slice. stop refers to the index of the element we should stop just before to finish our slice.

In [None]:
seq

In [None]:
seq[3:4]

In [None]:
seq[:5]

In [None]:
seq[3:]

In [None]:
seq

In [None]:
seq[-4:]

In [None]:
seq[-6:-2]

In [None]:
seq[::2] # The full slice syntax is: start:stop:step.

In [None]:
seq[::-1]

In [None]:
seq[3:4] = [6, 3]
seq

In [None]:
seq[3] = [6, 3]
seq

### Exercise

Datacamp - Intro to Python for Data Science - Python Lists 

### Built-in Sequence Functions

#### enumerate

In [None]:
i = 0
for value in ['blue', 'orange', 'yellow']:
    print(i, value)
    i += 1 # The same as i = i + 1

In [None]:
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
mapping

#### sorted

In [None]:
values = [7, 1, 2, 6, 0, 3, 2]

In [None]:
sorted(values)

In [None]:
sorted(values, reverse=True)

In [None]:
sorted('horse race')

#### zip

In [None]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2) # zip() function returns a zip object, which is an iterator of tuples where the first item in each passed iterator is paired together.
list(zipped)

In [None]:
seq3 = [False, True] # shorter than the others
list(zip(seq1, seq2, seq3))

In [None]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

In [None]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
            ('Curt', 'Schilling')]
first_names, last_names = zip(*pitchers) # The * character is known as the unpacking operator. 
print(first_names)
print(last_names)

#### reversed

In [None]:
list(reversed(range(10)))

### Dictionaries
- A collection of key: value pairs with unique keys (insertion order is preserved since Python 3.7)
- Curly braces {} are used in Python to define a dictionary
- Mutable

In [1]:
empty_dict = {}
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [2]:
d1[7] = 'an integer'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [3]:
d1['b']

[1, 2, 3, 4]

In [4]:
'b' in d1 # dict key exists

True

In [5]:
d1[5] = 'some value'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [6]:
d1['dummy'] = 'another value'
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [8]:
del d1[5]
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [9]:
ret = d1.pop('dummy')
ret

'another value'

In [11]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [12]:
list(d1.keys())

['a', 'b', 7]

In [13]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

In [14]:
list(d1.items())

[('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

In [15]:
d1.update({'b' : 'foo', 'c' : 12}) # key 'b' exists and is overwritten; key 'c' does not exist and is added
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

#### Creating dicts from sequences

```python
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
```

In [16]:
mapping = {}
for key, value in zip(range(5), reversed(range(5))):
    mapping[key] = value
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

In [17]:
mapping = dict(zip(range(5), reversed(range(5))))
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

#### Default values

```python
if key in some_dict:
    value = some_dict[key]
else:
    value = default_value
```    
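
The if/else pattern above can usually be condensed with `dict.get`, which returns a default when the key is missing; `dict.pop` accepts a default in the same way. In this sketch `some_dict` and `default_value` are hypothetical example values.

```python
some_dict = {'a': 1, 'b': 2}   # hypothetical example data
default_value = 0              # hypothetical default

value_a = some_dict.get('a', default_value)   # key present: stored value
value_z = some_dict.get('z', default_value)   # key missing: the default
popped = some_dict.pop('z', default_value)    # pop with a default does not raise

print(value_a, value_z, popped)
```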

In [18]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [19]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

In [20]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

#### Valid dict key types

In [21]:
hash('string')

-4949110766905003200

In [22]:
hash((1, 2, (2, 3)))

-9209053662355515447

In [23]:
hash((1, 2, [2, 3])) # fails because lists are mutable

TypeError: unhashable type: 'list'

In [24]:
d = {}
d[tuple([1, 2, 3])] = 5
d

{(1, 2, 3): 5}

### Set
- A set is an unordered collection with no duplicate elements.
- Mutable

In [25]:
hash(set([1, 2]))

TypeError: unhashable type: 'set'

In [26]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}

In [27]:
{2, 2, 2, 1, 3, 3}

{1, 2, 3}

In [28]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [29]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [30]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [31]:
a.intersection(b)

{3, 4, 5}

In [32]:
a & b

{3, 4, 5}

In [33]:
c = a.copy()
c |= b # update
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [34]:
d = a.copy()
d &= b #intersection_update d = d & b
d

{3, 4, 5}

In [35]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

{(1, 2, 3, 4)}

In [36]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)

True

In [37]:
a_set.issuperset({1, 2, 3})

True

In [38]:
{1, 2, 3} == {3, 2, 1}

True

### FrozenSet
- An unordered collection with no duplicate elements.
- Immutable

In [39]:
hash(frozenset([1, 2]))

-1826646154956904602
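
Because a frozenset is hashable, it can serve as a dict key or as a member of another set, which a regular set cannot. A small sketch with made-up data:

```python
# A frozenset key ignores element order, like any set.
pair_scores = {frozenset(['alice', 'bob']): 3}   # hypothetical data
same_pair = frozenset(['bob', 'alice'])
score = pair_scores[same_pair]

# Frozensets can be nested inside a set; equal ones collapse to one member.
nested = {frozenset([1, 2]), frozenset([2, 1]), frozenset([3])}

# There are no mutating methods such as add() on a frozenset.
try:
    same_pair.add('carol')
except AttributeError as exc:
    error = type(exc).__name__

print(score, len(nested), error)
```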

### List, Set, and Dict Comprehensions

![](img/ch04.gif)

In [40]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [41]:
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [None]:
set(map(len, strings))

In [42]:
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### Nested list comprehensions

In [43]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

In [44]:
result = [name for names in all_data for name in names
          if name.count('e') >= 2]
result

['Steven']

In [45]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

In [46]:
[[x for x in tup] for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

## Functions
- A function is a block of code which only runs when it is called.
- You can pass data, known as parameters, into a function.
- A function can return data as a result.

In [47]:
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

In [48]:
my_function(5, 6, z=0.7) # Positional arguments come first, followed by keyword arguments

0.06363636363636363

In [49]:
my_function(z=0.7, y=6, x=5) 

0.06363636363636363

In [50]:
my_function(3.14, 7, 3.5)

35.49

In [53]:
my_function(10, 20)

45.0

### Namespaces, Scope, and Local Functions

In [54]:
def func():
    ar = []
    for i in range(5):
        ar.append(i)
    print(ar)

In [55]:
func()

[0, 1, 2, 3, 4]


In [56]:
ar

NameError: name 'ar' is not defined

In [58]:
ar2 = []
ar_string = 'petr'
def func2():
    ar_string = 'pavel'
    for i in range(5):
        ar2.append(i)

In [59]:
func2()

In [60]:
ar2

[0, 1, 2, 3, 4]

In [61]:
ar_string

'petr'

In [62]:
ar3 = None
def bind_a_variable():
    global ar3
    ar3 = []
bind_a_variable()
print(ar3)

[]


### Returning Multiple Values

In [63]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
print(a, b, c)

5 6 7


In [64]:
return_value = f()

In [65]:
return_value

(5, 6, 7)

In [66]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a' : a, 'b' : b, 'c' : c}

f()

{'a': 5, 'b': 6, 'c': 7}

### Functions Are Objects

In [68]:
states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

In [69]:
import re # regular expression matching operations

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value) # Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
        value = value.title()
        result.append(value)
    return result

clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

In [70]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

In [71]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


In [73]:
clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [74]:
statesCleaned = clean_strings(states, clean_ops)
statesCleaned

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

#### Exercise: Make the country names in the 'states' list uniform using the mapping stored in the variable 'mapping'.

In [None]:
states =['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia',
 'Holandsko',
 'Nizozemi']
mapping = {"Netherlands":['Holandsko','Nizozemi']}

### Anonymous (Lambda) Functions

In [75]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

In [76]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [77]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [78]:
strings.sort(key=lambda x: len(set(x))) # sort by the number of distinct letters
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

In [79]:
apply_to_list(states,remove_punctuation)

['   Alabama ',
 'Georgia',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south   carolina',
 'West virginia']

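### Currying: Partial Argument Application (Optional/Advanced)
Currying turns a function of several arguments into a chain of functions that each take exactly one argument and return the next function in the chain. The examples in this section use the closely related technique of partial argument application, which fixes some arguments of an existing function to derive a new one.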
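
A minimal sketch of currying in the strict sense, where every function takes one argument and returns the next function. The names `curried_add` and `add_five_curried` are hypothetical and not part of the original examples.

```python
def curried_add(x):
    """Take the first argument and return a function awaiting the second."""
    def add_y(y):
        return x + y
    return add_y

add_five_curried = curried_add(5)   # fixes x = 5
print(add_five_curried(20))
print(curried_add(1)(2))            # arguments supplied one call at a time
```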

In [80]:
def add_numbers(x, y):
    return x + y

In [81]:
add_five = lambda y: add_numbers(5, y)
add_five

<function __main__.<lambda>(y)>

In [82]:
add_five(20)

25

In [83]:
from functools import partial
add_five = partial(add_numbers, 5)
add_five

functools.partial(<function add_numbers at 0x000002649DECF550>, 5)

In [84]:
add_five(20)

25

### Generators
A generator function uses `yield` to produce values one at a time. Calling it returns an iterator, so the result can be used in a for loop.

In [85]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)

a
b
c


In [86]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x2649fb619f0>

In [87]:
list(dict_iterator)

['a', 'b', 'c']

In [88]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [89]:
gen = squares()
gen

<generator object squares at 0x000002649FB81200>

In [90]:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
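
Two properties of generators are worth seeing in isolation: values are produced only on request, and a generator can be consumed only once. The helper `countdown` below is a hypothetical example, not part of the original notebook.

```python
def countdown(n):
    """Yield n, n-1, ..., 1 lazily."""
    while n > 0:
        yield n
        n -= 1

counter = countdown(3)
first = next(counter)    # runs the body up to the first yield
rest = list(counter)     # consumes the remaining values
again = list(counter)    # the generator is now exhausted

print(first, rest, again)
```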

#### Generator expressions

In [91]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x000002649FB814A0>

In [92]:
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()
gen

<generator object _make_gen at 0x000002649FB81430>

In [93]:
sum(x ** 2 for x in range(100))

328350

In [94]:
dict((i, i **2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

#### itertools module

In [95]:
import itertools
first_letter = lambda x: x[0]
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
# groupby returns consecutive keys and groups from the iterable 'names';
# the key function 'first_letter' computes a key value for each element.
for letter, names_group in itertools.groupby(names, first_letter):
    print(letter, list(names_group)) # names_group is an iterator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
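
Note that 'A' appears twice in the output above because `groupby` only groups consecutive elements. If one group per key is wanted, a common approach is to sort by the same key first, as in this sketch:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# sorted() is stable, so names keep their original relative order within a letter.
ordered = sorted(names, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(ordered, first_letter)}
print(grouped)
```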


### Errors and Exception Handling

In [96]:
float('1.2345')
float('something')

ValueError: could not convert string to float: 'something'

In [97]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [98]:
attempt_float('1.2345')

1.2345

In [99]:
attempt_float('something')

'something'

In [100]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [101]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [102]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

In [103]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In [104]:
attempt_float((1, 2))

(1, 2)

In [None]:
path = 'file/test.txt'
f = open(path, 'w') #change to r

try:
    f.write("Všechno, Lorum Ipsum     ")
except:
    print('Failed')
else:
    print('Succeeded')
finally: # The finally block, if specified, runs whether or not the try block raises an error.
    f.close()
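
To see which blocks run in the success and failure cases without touching real files, here is a self-contained sketch that works in a temporary directory. The helper `try_write` and the file name are hypothetical; the specific exception caught, `io.UnsupportedOperation`, is what Python raises when writing to a file opened read-only.

```python
import io
import os
import tempfile

def try_write(mode):
    """Open a scratch file in the given mode, attempt a write, and record which blocks ran."""
    events = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        scratch = os.path.join(tmp_dir, 'test.txt')
        with open(scratch, 'w') as handle:   # make sure the file exists
            handle.write('seed\n')
        f = open(scratch, mode)
        try:
            f.write('Lorem ipsum')
        except io.UnsupportedOperation:      # file was not opened for writing
            events.append('failed')
        else:
            events.append('succeeded')
        finally:
            f.close()                        # runs in both cases
            events.append('closed')
    return events

print(try_write('w'))
print(try_write('r'))
```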

## Files and the Operating System

In [None]:
%pushd file 
# go back ../

In [None]:
path = 'test.txt'
f = open(path)

In [None]:
lines = [x.rstrip() for x in open(path)]
lines

In [None]:
str.rstrip?

In [None]:
f.close()

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]
lines

In [None]:
f = open(path)
f.read(10)

In [None]:
f2 = open(path, 'rb')  # Binary mode
f2.read(10)

In [None]:
print(f.tell()) # The method tell() returns the current position of the file read/write pointer within the file.
print(f2.tell())

In [None]:
import sys
sys.getdefaultencoding()

In [None]:
print(f.seek(3)) # Python file method seek() sets the file's current position at the offset.
print(f.read(1))
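
The difference between reading characters and reading bytes, and the effect of `tell()` and `seek()`, can be reproduced with a throwaway file. This sketch assumes UTF-8 encoding and uses binary mode for the position calls, because positions reported in text mode are opaque values rather than plain character counts.

```python
import os
import tempfile

text = 'Všechno, Lorem Ipsum\n'   # 'š' occupies two bytes in UTF-8

with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = os.path.join(tmp_dir, 'test.txt')
    with open(scratch, 'w', encoding='utf-8') as handle:
        handle.write(text)

    with open(scratch, 'rb') as raw:
        first_bytes = raw.read(10)   # 10 bytes, not 10 characters
        byte_pos = raw.tell()        # position after the read
        raw.seek(3)                  # jump to byte offset 3
        next_byte = raw.read(1)

    with open(scratch, encoding='utf-8') as decoded:
        first_chars = decoded.read(10)   # 10 characters

print(first_bytes, byte_pos, next_byte)
print(first_chars)
```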

In [None]:
f.close()
f2.close()

In [None]:
with open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)
with open('tmp.txt') as f:
    lines = f.readlines()
lines

In [None]:
import os
os.remove('tmp.txt')

### Bytes and Unicode with Files

In [None]:
with open(path) as f:
    chars = f.read(10)
chars

In [None]:
with open(path, 'rb') as f:
    data = f.read(10)
data

In [None]:
data.decode('utf8')

In [None]:
data[:4].decode('utf8')

In [None]:
%popd # Change to directory popped off the top of the stack.

### Solution: sum of three numbers

In [None]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a={0}, b={1}, c={2}, sum={3}'.format(a, b, c, a + b + c))

### Solution: uniform country names

In [None]:
def map_strings(strings, mapping):
    result = []
    for value in strings:
        replacement = value
        for canonic_name, list_of_alternative_vals in mapping.items():
            if value in list_of_alternative_vals:
                replacement = canonic_name
        result.append(replacement)
    return result
map_strings(states, mapping)

Code examples from "Python for Data Analysis", 2nd Edition

The MIT License (MIT)

Copyright (c) 2017 Wes McKinney

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.