### Python data structures and features used frequently
- Tuple
- List
- Dictionary
- Set

- Built-in Sequence functions
- Files

## Tuples
- A tuple is a fixed-length, immutable sequence of Python objects which, once assigned, cannot be changed

In [1]:
tup = (4, 5, 6)
tup1 = 4, 5, 6
print(tup, tup1)

(4, 5, 6) (4, 5, 6)


In [2]:
# convert any sequence/iterator to a tuple
tuple([2, 0, 1])

(2, 0, 1)

In [3]:
# while a tuple is immutable, it may contain mutable objects
tup2 = (1, [2, 3], 4)
tup2[1] = [3, 4] # Gives error

TypeError: 'tuple' object does not support item assignment

In [4]:
# whereas...
tup2[1].append(5)
tup2

(1, [2, 3, 5], 4)

In [10]:
tup3 = ('aa', 'bb') * 4
tup3

('aa', 'bb', 'aa', 'bb', 'aa', 'bb', 'aa', 'bb')

In [8]:
# Unpacking tuples
a, b, c = tup1
print(a, b, c)

4 5 6


In [9]:
# count the number of occurrences of a value
tup1.count(1)

0

In [11]:
# Unpacking to discard irrelevant values
x, y, *_ = tup3
print(x, y)

aa bb
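A few more unpacking patterns follow from the cells above. This is a small sketch; the variable names (`seq`, `totals`, `first`, `rest`) are made up for illustration.

```python
# Swap two names without a temporary variable
a, b = 1, 2
b, a = a, b  # a is now 2, b is now 1

# Unpack nested tuples while iterating over a list of records
seq = [(1, (2, 3)), (4, (5, 6))]
totals = [x + y + z for x, (y, z) in seq]

# Capture the remainder in a named list instead of discarding it with *_
first, *rest = (10, 20, 30, 40)

print(a, b, totals, first, rest)
```

Note that the starred target always collects into a list, even when the right-hand side is a tuple.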


## List
- In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are mutable.

In [18]:
l1 = [1, 2, None, 'b']
l1

[1, 2, None, 'b']

In [14]:
# The list built-in function is frequently used in data processing 
# as a way to materialize an iterator or generator expression:
gen = range(10)
print(gen)
print(list(gen))

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [19]:
# Adding and removing elements
l1.append('a')
l1.insert(2, 'x')
l1.pop(1)
l1.remove('b')
l1

[1, 'x', None, 'a']

In [20]:
# extend appends in place and is usually faster than concatenating with +,
# which has to build a new list
l1.extend([10, 15])
l1

[1, 'x', None, 'a', 10, 15]

In [22]:
# Sorting
l2 = [5, 10, 2, 1, 34]
l2.sort()
l2

[1, 2, 5, 10, 34]

In [24]:
# Slicing
l2[2:3]

[5]

In [25]:
l2[-1:]

[34]

In [27]:
l2[-1::-1]

[34, 10, 5, 2, 1]
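Sorting and slicing have a few more options worth knowing. The sketch below uses illustrative data (`words`, `nums`) that is not part of the notebook above.

```python
# sort accepts a key function; the sort is stable, so ties keep their order
words = ["saw", "small", "He", "foxes", "six"]
words.sort(key=len)

# sorted returns a new list and leaves the original untouched
nums = [5, 10, 2, 1, 34]
descending = sorted(nums, reverse=True)

# Slices can be assigned to, even with a different number of elements
nums[1:3] = [20, 30, 40]

# A step of 2 takes every other element
every_other = nums[::2]

print(words, descending, nums, every_other)
```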

## Dictionary
- A dictionary stores a collection of key-value pairs, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. 

In [29]:
empty_dict = {}
empty_dict['a'] = 'aaa'
empty_dict

{'a': 'aaa'}

In [42]:
d1 = {'a': 'first', 'b': 'second', 3: 'third'}
d1

{'a': 'first', 'b': 'second', 3: 'third'}

In [43]:
d1['b']

'second'

In [44]:
# Check if a key exists in a dict
'b' in d1, '3' in d1

(True, False)

In [45]:
# Deleting values: del or pop
del d1['b']
ret = d1.pop('a')
d1, ret

({3: 'third'}, 'first')

In [48]:
# Iterating on keys and values
# dict.keys() and dict.values() return view objects over the keys and values;
# wrap them in list() to materialize them
d1 = {'b': 'second', 'a': 'first', 3: 'third', 'd': 'fourth'}
list(d1.keys()), list(d1.values())

(['b', 'a', 3, 'd'], ['second', 'first', 'third', 'fourth'])

In [49]:
# dict.items() method to iterate over (key, value) tuples
list(d1.items())

[('b', 'second'), ('a', 'first'), (3, 'third'), ('d', 'fourth')]

In [50]:
# update merges another dict in place: existing keys are overwritten, new keys are added
d1.update({'b': 'fifth', 'c': 'sixth'})
d1

{'b': 'fifth', 'a': 'first', 3: 'third', 'd': 'fourth', 'c': 'sixth'}

In [51]:
# Creating dictionaries from sequences
mapping = {}
key_list = [1, 2, 3]
value_list = [1, 4, 9]
for key, value in zip(key_list, value_list):
    mapping[key] = value
mapping

{1: 1, 2: 4, 3: 9}

In [53]:
# Creating dictionaries from list of tuples
d3 = dict([(1,1), (2, 4), (3, 9)])
d3

{1: 1, 2: 4, 3: 9}

In [54]:
# Default values: dict.get(key, default) returns default (None if omitted) when the key is missing
val = d3.get(4, -1)
val

-1

In [56]:
# The values in a dictionary are often another kind of collection, like a list.
words = ['apple', 'banana', 'guava', 'berry', 'ark', 'book']
# Categorizing a list of words by their first letters as a dictionary of lists:
by_letter = {}
for word in words:
    letter = word[0]
    if letter in by_letter:
        by_letter[letter].append(word)
    else:
        by_letter[letter] = [word]

by_letter

{'a': ['apple', 'ark'], 'b': ['banana', 'berry', 'book'], 'g': ['guava']}

In [57]:
# Doing the same thing as above using setdefault
by_letter = {}
for word in words:
    letter = word[0]
    # Sets an empty list as default if key not found
    by_letter.setdefault(letter, []).append(word)

by_letter

{'a': ['apple', 'ark'], 'b': ['banana', 'berry', 'book'], 'g': ['guava']}

In [58]:
# The standard library collections module has a useful class, defaultdict, which makes this even easier. 
# To create one, you pass a type or function for generating the default value for each slot in the dictionary:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
by_letter

defaultdict(list,
            {'a': ['apple', 'ark'],
             'b': ['banana', 'berry', 'book'],
             'g': ['guava']})
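The same idea works with other default factories. As a small sketch, `defaultdict(int)` can count words per first letter; the name `counts` is chosen here for illustration.

```python
from collections import defaultdict

words = ['apple', 'banana', 'guava', 'berry', 'ark', 'book']

# int() returns 0, so every new key starts counting from zero
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1

# Caveat: merely looking up a missing key inserts it with the default value
missing = counts['z']
has_z = 'z' in counts

print(dict(counts), missing, has_z)
```

Because a lookup creates the key, use `in` or `get` when you only want to test for presence.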

In [62]:
# While the values of a dictionary can be any Python object, 
# the keys generally have to be immutable objects like 
# scalar types (int, float, string) or 
# tuples (all the objects in the tuple need to be immutable, too). 
# The technical term here is hashability. You can check whether an object is hashable 
# (can be used as a key in a dictionary) with the hash function:
hash("string"), hash((1, 2))

(-5583122657705587872, -3550055125485641917)

In [63]:
hash([1, 2])

TypeError: unhashable type: 'list'
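To use a list as a key, one option is to convert it to a tuple first. The sketch below assumes a made-up `coords` dictionary and also shows that a tuple containing a list is still unhashable.

```python
coords = {}
point = [3, 4]  # a list cannot be used as a key directly

# Converting to a tuple makes it hashable
coords[tuple(point)] = "treasure"

# A tuple is only hashable if everything inside it is hashable too
try:
    hash((1, [2, 3]))
except TypeError as exc:
    message = str(exc)

print(coords, message)
```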

## Set
- A set is an unordered collection of unique elements.
- A set can be created in two ways: via the set function or via a set literal with curly braces:

In [64]:
set([1, 2, 3, 3, 3, 4, 5])

{1, 2, 3, 4, 5}

In [65]:
{1, 2, 3, 3, 4, 5}

{1, 2, 3, 4, 5}

In [66]:
# Sets support mathematical set operations like union, intersection, difference, and symmetric difference.
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

In [67]:
# Union or |
a.union(b), a | b

({1, 2, 3, 4, 5, 6, 7, 8}, {1, 2, 3, 4, 5, 6, 7, 8})

In [68]:
# Intersection or &
a.intersection(b), a & b

({3, 4, 5}, {3, 4, 5})
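Difference and symmetric difference, mentioned above but not shown, follow the same pattern. A short sketch reusing the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Difference or -: elements in a that are not in b
only_a = a - b

# Symmetric difference or ^: elements in exactly one of the two sets
either = a ^ b

# Subset test
is_sub = {1, 2}.issubset(a)

# In-place union with |= modifies the set on the left
c = a.copy()
c |= b

print(only_a, either, is_sub, c)
```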

## Built-in Sequence Functions
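- Python's built-in functions enumerate, sorted, zip, and reversed work on any sequence and are worth knowing well.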

### enumerate

In [69]:
# Python has a built-in function, enumerate, which yields (i, value) tuples,
# so you can get indices while iterating over a sequence.
li1 = ['a', 'b', 'c', 'd']
for index, val in enumerate(li1):
    print(index, val)

0 a
1 b
2 c
3 d


### sorted

In [70]:
# The sorted function returns a new sorted list from the elements of any sequence:
sorted([2, 1, 10, 0, 5, 3])

[0, 1, 2, 3, 5, 10]

### zip

In [72]:
# zip “pairs” up the elements of a number of lists, tuples, or other sequences.
# It returns an iterator of tuples; wrap it in list() to materialize them:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
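Two further behaviours of zip are easy to miss: it stops at the shortest input, and combined with `*` it can "unzip" a list of pairs. A minimal sketch with illustrative names:

```python
pairs = [("foo", "one"), ("bar", "two"), ("baz", "three")]

# zip(*pairs) transposes rows into columns
firsts, seconds = zip(*pairs)

# zip stops as soon as the shortest input is exhausted
short = list(zip(range(5), "ab"))

print(firsts, seconds, short)
```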

### reversed

In [73]:
# reversed iterates over the elements of a sequence in reverse order:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

## List, Set and Dictionary Comprehensions

In [76]:
# List comprehension
strings = ["a", "as", "bat", "car", "cat", "dove"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'CAT', 'DOVE']

In [78]:
# Set comprehension
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4}

In [79]:
set(map(len, strings))

{1, 2, 3, 4}

In [81]:
# Dictionary comprehension
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'cat': 4, 'dove': 5}

In [85]:
# Nested list comprehension
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
             ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]
# Single list containing all names with two or more a’s in them
result = [name for names in all_data for name in names if name.count('a') >= 2]
result

['Maria', 'Natalia']
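The order of the `for` clauses in a nested comprehension matches the order of the equivalent nested loops. The sketch below writes the same filter as explicit loops and adds a plain flattening comprehension (`flattened` is an illustrative name).

```python
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

# Equivalent nested loops: the outer loop comes first, as in the comprehension
result = []
for names in all_data:
    for name in names:
        if name.count("a") >= 2:
            result.append(name)

# Without the filter, the same pattern flattens a list of lists
flattened = [name for names in all_data for name in names]

print(result, len(flattened))
```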

## Functions

- Python functions are objects. This means that they can be passed around as arguments, stored in lists, and assigned to variables.
- Anonymous (lambda) functions are a way of writing functions consisting of a single expression, the result of which is the return value.
- **Generator**: Many objects in Python support iteration, such as over objects in a list or lines in a file. This is accomplished by means of the iterator protocol, a generic way to make objects iterable. 

In [86]:
# Lambda functions: consist of a single expression
square = lambda x: x ** 2
square(2)

4

In [89]:
# Generators
# A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. 
# Whereas normal functions execute and return a single result at a time, 
# generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. 
# To create a generator, use the yield keyword instead of return in a function:
def squares(n=10):
    print(f"Generating squares from 1 to {n**2}")
    for i in range(1, n+1):
        yield i ** 2

gen = squares(5)
gen

<generator object squares at 0x753a5d959690>

In [90]:
list(gen)

Generating squares from 1 to 25


[1, 4, 9, 16, 25]

In [91]:
# Another way to make a generator is by using a generator expression. 
# This is a generator analogue to list, dictionary, and set comprehensions. 
# To create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:
gen = (x ** 2 for x in range(10))
gen

<generator object <genexpr> at 0x753a5d9597e0>

In [92]:
for x in gen:
    print(x, end=" ")

0 1 4 9 16 25 36 49 64 81 
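One thing to keep in mind is that a generator can only be consumed once. The sketch below shows this and also passes a generator expression directly to `sum`, which avoids building an intermediate list.

```python
gen = (x ** 2 for x in range(5))

first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # the generator is now exhausted

# A generator expression can be passed straight to a function
total = sum(x ** 2 for x in range(5))

print(first_pass, second_pass, total)
```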

### itertools module
- The standard library itertools module has a collection of generators for many common data algorithms.

In [94]:
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]
# groupby only groups consecutive items, so "Albert" ends up in a second "A" group
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
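Because `groupby` only groups consecutive elements with the same key, a common approach is to sort by the same key first. A minimal sketch, collecting the groups into a dictionary named `grouped` for illustration:

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting by the grouping key makes equal keys adjacent
ordered = sorted(names, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(ordered, first_letter)}

print(grouped)
```

Since `sorted` is stable, names within each group keep their original relative order.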


## Errors and Exception Handling
- Handle errors gracefully using try and except blocks

In [95]:
float("1.233")

1.233

In [96]:
float("something")

ValueError: could not convert string to float: 'something'

In [97]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        # Catch only the errors float() raises for bad input, not every exception
        return x
attempt_float("something")

'something'
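A `try` statement can also carry an `else` block, which runs only when no exception was raised. The sketch below uses a hypothetical helper, `parse_all`, to separate values that convert cleanly from those that do not.

```python
def parse_all(values):
    """Split values into floats that parsed and inputs that were rejected."""
    parsed, rejected = [], []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            # ValueError: bad string; TypeError: unsupported type such as a tuple
            rejected.append(value)
        else:
            # Runs only if float() succeeded
            parsed.append(number)
    return parsed, rejected

parsed, rejected = parse_all(["1.5", "abc", (1, 2), "3"])
print(parsed, rejected)
```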

## File Handling

In [100]:
# Opening files
from pathlib import Path
path = Path('numpy_basics.ipynb')

In [107]:
# By default opens it in read mode
f = open(path, "r", encoding="utf-8")

In [111]:
# Note: f was already closed by f.close() (cell In [110] ran first), hence the error below
for line in f:
    print(line)

ValueError: I/O operation on closed file.

In [109]:
lines = [x.rstrip() for x in open(path, encoding="utf-8")]
lines[:5]

['{',
 ' "cells": [',
 '  {',
 '   "cell_type": "code",',
 '   "execution_count": null,']

In [110]:
f.close()

In [112]:
# This will automatically close the file f when exiting the with block.
with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]

lines[:5]

['{',
 ' "cells": [',
 '  {',
 '   "cell_type": "code",',
 '   "execution_count": null,']

In [114]:
# Get default encoding
import sys
sys.getdefaultencoding()

'utf-8'

In [115]:
# Writing to files
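with open('tmp.txt', mode="w", encoding="utf-8") as handle, open(path, encoding="utf-8") as source:
    # Copy only non-blank lines (a blank line is just "\n", which has length 1)
    handle.writelines(x for x in source if len(x) > 1)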

In [116]:
with open('tmp.txt') as f:
    lines = f.readlines()

lines[:5]

['{\n',
 ' "cells": [\n',
 '  {\n',
 '   "cell_type": "code",\n',
 '   "execution_count": null,\n']
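The read and write steps above can be combined into a self-contained round trip. This sketch writes to a temporary directory so it does not depend on `numpy_basics.ipynb`; the file name `example.txt` and its contents are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir) / "example.txt"

    # Write three lines, one of them blank
    with open(tmp_path, mode="w", encoding="utf-8") as handle:
        handle.write("first\n\nsecond\n")

    with open(tmp_path, encoding="utf-8") as handle:
        head = handle.read(5)  # read the first 5 characters
        handle.seek(0)         # rewind to the start of the file
        non_empty = [line.rstrip() for line in handle if line.strip()]

print(head, non_empty)
```

Both files are closed automatically when their `with` blocks exit, and the temporary directory is removed afterwards.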