# Data Science From Scratch Notes
## Chapter 2: A Crash Course in Python

## Functions

Define functions using `def`:

In [1]:
def double(x):
    """
    Place an optional docstring here that explains what the function does.
    For example, this function multiplies its input by 2.
    """
    return x * 2


In [2]:
double(2)

4

Python functions are *first class*: they can be assigned to variables and passed into other functions.

In [3]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

my_double = double # refers to the previously defined function
x = apply_to_one(my_double)

In [4]:
x

2

*Lambda Functions*: Short anonymous functions.

The code below asks the lambda function to use 1 as its argument for x

In [5]:
y = apply_to_one(lambda x: x + 4) # equals 5
y

5

Function parameters can be given default arguments.

In [6]:
def my_print(message = "my default message"):
    print(message)
    
my_print("hello") # prints 'hello'
my_print() # prints 'my default message'

hello
my default message


You can specify arguments by name as well.

In [7]:
def full_name(first = "What's-his-name", last = "Something"):
    return first + " " + last

full_name("Joel", "Grus") # "Joel Grus"
full_name("Joel") # "Joel Something"
full_name(last = "Grus") # "What's-his-name Grus"

"What's-his-name Grus"

## Strings

* Strings can be delimited by single or double quotation marks

* Backslashes are used to encode special characters

In [8]:
tab_string = "\t"
tab_string


'\t'

* To write literal backslashes, use a raw string, created with the `r""` prefix

In [9]:
not_tab_string = r"\t"
not_tab_string

'\\t'

* Create multiline strings using three double quotes

In [10]:
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

multi_line_string

'This is the first line.\nand this is the second line\nand this is the third line'

* *f-string*: substitute values into strings

In [11]:
first_name = "Joel"
last_name = "Grus"

full_name1 = first_name + " " + last_name # string addition
full_name2 = "{0} {1}".format(first_name, last_name) # string.format
full_name3 = f"{first_name} {last_name}" # f-string method

print(full_name1)
print(full_name2)
print(full_name3)

Joel Grus
Joel Grus
Joel Grus


## Exceptions

*Exceptions* are raised when something goes wrong. An unhandled exception will cause the program to crash. Handle exceptions using `try` and `except`:

In [12]:
try:
    print(0/0)
except ZeroDivisionError:
    print("cannot divide by zero")

cannot divide by zero


## Lists

The fundamental data structure in Python is the *list*.

*List*: an ordered collection that can hold just about any kind of value. As with R's lists, Python lists can be nested.

Lists are 0-indexed.

In [13]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []] # [] denotes an empty list

list_length = len(integer_list) # equals 3
list_sum = sum(integer_list) # equals 6

Use square brackets to get or set the *n*th element of a list:

In [14]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

zero = x[0] # equals 0
print(zero)
one = x[1] # equals 1
print(one)
nine = x[-1] # equals 9, 'Pythonic' for last element
print(nine)
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
print(eight)
x[0] = -1 # now x is [-1, 1, 2, 3, ..., 9]
print(x)

0
1
9
8
[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]


* Square brackets also allow us to *slice* lists
* slice *i:j* means all elements from *i* (inclusive) to *j* (not inclusive)

In [15]:
first_three = x[:3] # [-1, 1, 2]
print(first_three)

three_to_end = x[3:] # [3, 4, ..., 9]
print(three_to_end)

one_to_four = x[1:5] # [1, 2, 3, 4]
print(one_to_four)

last_three = x[-3:] # [7, 8, 9]
print(last_three)

without_first_and_last = x[1:-1] # [1, 2, 3, ..., 8]
print(without_first_and_last)

copy_of_x = x[:] # [-1, 1, 2, ...,9]
print(copy_of_x)

[-1, 1, 2]
[3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4]
[7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]
[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]


* You can also slice strings and other "sequential" types

* A slice can take a third argument to indicate its *stride*, which can also be negative
    * Implicitly, the stride argument defaults to 1

In [16]:
every_third = x[::3]
print(every_third)

five_to_three = x[5:2:-1]
print(five_to_three)

[-1, 3, 6, 9]
[5, 4, 3]
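The note above says strings and other sequential types can be sliced too, but the cells only slice lists. Here is a minimal sketch of the same syntax on a string and a tuple; the value `"datascience"` is a made-up example, not something from the book.

```python
word = "datascience"  # hypothetical example string

first_four = word[:4]        # first four characters
last_seven = word[-7:]       # last seven characters
every_other = word[::2]      # stride of 2: indices 0, 2, 4, ...
reversed_word = word[::-1]   # negative stride walks backwards
middle_of_tuple = (1, 2, 3, 4)[1:3]  # tuples slice the same way

print(first_four, last_seven, every_other, reversed_word, middle_of_tuple)
```

Slicing a string gives back a string and slicing a tuple gives back a tuple, so the result always has the type of the sequence you started with.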


The *in* operator checks for list membership:

In [17]:
1 in [1,2,3] # True

True

In [18]:
0 in [1,2,3] # False

False

* There are three ways to concatenate lists together
    1. Use `extend` to modify a list in place
    2. Use list addition if you don't want to modify the list
    3. Use `append` to append to lists one item at a time

In [19]:
x = [1,2,3]
x.extend([4,5,6])
x

[1, 2, 3, 4, 5, 6]

In [20]:
x = [1,2,3]
y = x + [4,5,6]
y

[1, 2, 3, 4, 5, 6]

In [21]:
x = [1,2,3]
x.append(0)
y = x[-1]
z = len(x)
print(x)
print(y)
print(z)

[1, 2, 3, 0]
0
4


If you know how many elements a list contains, you can *unpack* the list:

In [22]:
x, y = [1, 2] # now x is 1, y is 2
print(x)
print(y)

1
2


Use an underscore for a value you're going to throw away:

In [23]:
_, y = [1, 2] # now y == 2, didn't care about the first element
y

2

## Tuples

*Tuples*: lists' immutable cousin. Everything you can do to a list except modifying it can be done to a tuple. Tuples are specified using parentheses (or nothing) instead of square brackets:

In [24]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4

my_list[1] = 3 # my_list is now [1, 3]

try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")

cannot modify a tuple


Tuples are a convenient way to return multiple values from functions:

In [25]:
def sum_and_product(x, y):
    return (x + y), (x * y)

sp = sum_and_product(2, 3) # sp is (5, 6)
print(sp)

s, p = sum_and_product(5, 10) # s is 15 and p is 50
print(s)
print(p)

(5, 6)
15
50


## Dictionaries

Dictionaries associate *values* with *keys* and allow us to quickly retrieve the value corresponding to a given key:

In [26]:
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = {"Joel": 80, "Tim": 95} # dictionary literal
grades

{'Joel': 80, 'Tim': 95}

Values for a key in a dictionary are looked up using square brackets:

In [27]:
joels_grade = grades["Joel"] # equals 80
joels_grade

80

We will get a `KeyError` if we ask for a key that's not in the dictionary:

In [28]:
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")

no grade for Kate!


Check for the existence of a key using `in`:

In [29]:
joel_has_grade = "Joel" in grades # True
joel_has_grade

True

In [30]:
kate_has_grade = "Kate" in grades # False
kate_has_grade

False

Dictionaries have a `get` method that returns a default value (instead of raising an exception) when looking up a key that's not in the dictionary:

In [31]:
joels_grade = grades.get("Joel", 0) # second argument is what you can set as the default value
print(joels_grade)

kates_grade = grades.get("Kate", 0) # Equals 0
print(kates_grade)

no_ones_grade = grades.get("No One") # if second argument is not supplied, the default is None
print(no_ones_grade)

80
0
None


Key/value pairs can be assigned using square brackets:

In [32]:
grades["Tim"] = 99 # replaces the old value
print(grades)

grades["Kate"] = 100 # adds a third entry
print(grades)

num_students = len(grades) # equals 3
print(num_students)

{'Joel': 80, 'Tim': 99}
{'Joel': 80, 'Tim': 99, 'Kate': 100}
3


Dictionaries can be used to represent structured data:

In [33]:
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "datascience", "awesome", "yolo"]
}

tweet

{'user': 'joelgrus',
 'text': 'Data Science is Awesome',
 'retweet_count': 100,
 'hashtags': ['#data', '#science', 'datascience', 'awesome', 'yolo']}

In addition to looking for specific keys, we can also look at all of the keys:

In [34]:
tweet_keys = tweet.keys() # iterable for the keys
print(tweet_keys)

tweet_values = tweet.values() # iterable for the values
print(tweet_values)

tweet_items = tweet.items() # iterable for the (key, value) tuples
print(tweet_items)

dict_keys(['user', 'text', 'retweet_count', 'hashtags'])
dict_values(['joelgrus', 'Data Science is Awesome', 100, ['#data', '#science', 'datascience', 'awesome', 'yolo']])
dict_items([('user', 'joelgrus'), ('text', 'Data Science is Awesome'), ('retweet_count', 100), ('hashtags', ['#data', '#science', 'datascience', 'awesome', 'yolo'])])


In [35]:
"user" in tweet_keys # Check for existence of "user". True, but not Pythonic


True

In [36]:
"user" in tweet # Pythonic way of checking for keys

True

In [37]:
"joelgrus" in tweet_values # True (slow but only way to check values)

True

We cannot use lists as keys (keys must be "hashable"). If we need a multipart key, use a tuple or turn the key into a string.
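A short sketch of the point about hashable keys. The `cell_values` dictionary and its coordinate keys are invented for illustration only.

```python
# Hypothetical lookup table keyed by (row, column) coordinates
cell_values = {(0, 0): "a", (0, 1): "b"}
cell_values[(1, 0)] = "c"  # tuples are hashable, so this works

try:
    cell_values[[1, 1]] = "d"  # lists are unhashable
except TypeError:
    print("cannot use a list as a dictionary key")

# Alternative: turn the multipart key into a single string
string_key = "-".join(str(part) for part in (1, 1))
cell_values[string_key] = "d"
print(cell_values)
```

The tuple form is usually easier to work with, because the parts can be unpacked again without parsing a string.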

### defaultdict

`defaultdict`: Like a regular dictionary, except that when you look up a key it doesn't contain, it first adds a value for that key using a zero-argument function you provided when you created it. Import `defaultdict` from `collections`:

In [38]:
from collections import defaultdict

# word_counts = defaultdict(int) # int() produces 0
# for word in document:
#     word_counts[word] += 1
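The cell above is commented out because `document` is never defined in these notes. A runnable version might look like the following, where the sentence assigned to `document` is a made-up stand-in.

```python
from collections import defaultdict

# Hypothetical stand-in for the `document` used in the commented-out cell
document = "the quick brown fox jumps over the lazy dog the end".split()

word_counts = defaultdict(int)  # missing keys start at int() == 0
for word in document:
    word_counts[word] += 1

print(word_counts["the"])      # 3
print(word_counts["missing"])  # 0, and this lookup adds the key as a side effect
```

Note that merely reading a missing key inserts it, which is the main behavioral difference from `dict.get`.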

`defaultdicts` can also be used with `list`, `dict`, and lambda functions

In [39]:
dd_list = defaultdict(list) # list() produces an empty list
print(dd_list)
dd_list[2].append(1) # now dd_list contains {2: [1]}
print(dd_list)

defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {2: [1]})


In [40]:
dd_dict = defaultdict(dict) # dict() produces an empty dict
print(dd_dict)
dd_dict["Joel"]["City"] = "Seattle" # {"Joel" : {"City": "Seattle"}}
print(dd_dict)

defaultdict(<class 'dict'>, {})
defaultdict(<class 'dict'>, {'Joel': {'City': 'Seattle'}})


In [41]:
dd_pair = defaultdict(lambda: [0, 0])
print(dd_pair)
dd_pair[2][1] = 2
print(dd_pair)

defaultdict(<function <lambda> at 0x1056fa9d0>, {})
defaultdict(<function <lambda> at 0x1056fa9d0>, {2: [0, 2]})


## Counters

A `Counter` converts a sequence of values into a `defaultdict(int)`-like object mapping keys to counts:

In [42]:
from collections import Counter
c = Counter([0, 1, 2, 0]) # c is {0: 2, 1: 1, 2: 1}
print(c)

Counter({0: 2, 1: 1, 2: 1})


We can also use the `most_common` method from a `Counter` instance:

In [43]:
# print the 2 most common numbers and their counts in c
c = Counter([0, 1, 2, 0, 0, 1, 1, 1, 1, 2])
print(c)

for num, count in c.most_common(2):
    print(num, count)

Counter({1: 5, 0: 3, 2: 2})
1 5
0 3


## Sets



A set is another Python data structure. It represents a collection of *distinct* elements. Define a set by listing its elements between curly braces:

In [44]:
primes_below_10 = {2, 3, 5, 7}

Keep in mind that you can't use `{}` to denote an empty set. Recall in the **Dictionaries** section that assigning `{}` to a variable denotes an `empty dict`. In order to create an empty set, we use `set()`:

In [45]:
s = set()
print(s)
s.add(1) # s is now {1}
print(s)
s.add(2) # s is now {1, 2}
print(s)
s.add(2) # s is still {1, 2} because sets represent a collection of distinct elements
print(s)

set()
{1}
{1, 2}
{1, 2}


Sets are used for two main reasons:

   1. `in` is a very fast operation on sets. Thus it is more appropriate to use a set to conduct a membership test than using a list.
   2. Use sets to find the *distinct* items in a collection.

In [46]:
# Using in
# stopwords_list = ["a", "an", "at"] + hundreds_of_other_words + ["yet", "you"]

# "zip" in stopwords_list # False, but have to check every element

# stopwords_set = set(stopwords_list)
# "zip" in stopwords_set # very fast to check
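The cell above is commented out because `hundreds_of_other_words` is not defined. A small runnable sketch of the same idea, using a deliberately tiny, made-up stopword list:

```python
# Small hypothetical stand-in for the much longer list in the commented-out cell
stopwords_list = ["a", "an", "at", "the", "yet", "you"]
stopwords_set = set(stopwords_list)

in_list = "zip" in stopwords_list  # scans the elements one by one
in_set = "zip" in stopwords_set    # hash lookup, no scan

print(in_list, in_set)
```

Both checks give the same answer. The difference is only in how much work is done, which starts to matter once the collection is large or the check is repeated many times.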

In [47]:
# Finding distinct items
item_list = [1, 2, 3, 1, 2, 3]
print(item_list)
num_items = len(item_list) # 6
item_set = set(item_list) # {1, 2, 3}
print(item_set)
num_distinct_items = len(item_set) # 3
distinct_item_list = list(item_set) # [1, 2, 3]
print(distinct_item_list)

[1, 2, 3, 1, 2, 3]
{1, 2, 3}
[1, 2, 3]


## Control Flow

You can perform an action conditionally using an `if` statement:

In [48]:
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3:
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"

*Ternary* if-then-else statements can be made on one line as well:

In [49]:
parity = "even" if x % 2 == 0 else "odd"

Python's implementation of `while` loops:

In [50]:
x = 0
while x < 10:
    print(f"{x} is less than 10")
    x += 1

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


But `for` and `in` are more commonly used:

In [51]:
# range(10) is the numbers 0, 1, 2,..., 9
for x in range(10):
    print(f"{x} is less than 10")

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


More complex logic will use `continue` and `break`:

In [52]:
for x in range(10):
    if x == 3:
        continue # go immediately to the next iteration
    if x == 5:
        break # quit the loop entirely
    print(x)

0
1
2
4


## Truthiness

Booleans are capitalized:

In [53]:
one_is_less_than_two = 1 < 2 # equals True
print(one_is_less_than_two)
true_equals_false = True == False # equals False
print(true_equals_false)

True
False


In R, `NA` is used to indicate a nonexistent value. In Python, the equivalent is the value `None`:

In [54]:
x = None
assert x == None, "this is not the Pythonic way to check for None"
assert x is None, "this is the Pythonic way to check for None"

In [55]:
x is None # prints True

True

Python lets you use any value where it expects a Boolean. The following are treated as False:

   * `False`
   * `None`
   * `[]` (an empty `list`)
   * `{}` (an empty `dict`)
   * `""` (an empty string)
   * `set()` (an empty `set`)
   * `0`
   * `0.0`

This allows us to easily use `if` statements to test for empty strings, empty dictionaries, etc. It also sometimes causes bugs.

Python has an `all` function which takes an iterable and returns `True` when every element is truthy.
Python has an `any` function, which returns `True` when at least one element is truthy:

In [56]:
all([True, 1, 3]) # True, all are truthy

True

In [57]:
all([True, 1, {}]) # False, {} is falsy

False

In [58]:
any([True, 1, {}]) # True, True is truthy

True

In [59]:
all([]) # True, no falsy elements in the list

True

In [60]:
all([[],[],[]]) # False, list of empty lists. The empty lists are falsy

False

In [61]:
any([]) # False, no truthy elements in the list

False

## Sorting

Every Python list has a `sort` method that sorts it in place. If you don't want to mess up your list, you can use the `sorted` function, which returns a new list:

In [62]:
x = [4, 1, 2, 3]
y = sorted(x) # y is [1, 2, 3, 4], x is unchanged
print(x)
print(y)

[4, 1, 2, 3]
[1, 2, 3, 4]


In [63]:
x.sort() # now x is [1, 2, 3, 4]
print(x)

[1, 2, 3, 4]


As we can see, `sort` and `sorted` order a list from smallest to largest by default. Pass `reverse=True` to sort from largest to smallest. We can also use the `key` parameter to compare the results of a function applied to each element instead of the elements themselves:

In [64]:
# sort the list by absolute value from largest to smallest
x = sorted([-4, 1, -2, 3], key=abs, reverse=True) # is [-4, 3, -2, 1]
print(x)

[-4, 3, -2, 1]
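The `key` parameter also accepts a lambda, which is handy for sorting pairs by one of their parts. A minimal sketch, assuming a made-up `word_counts` dictionary:

```python
# Hypothetical word counts, used only for illustration
word_counts = {"data": 3, "science": 5, "python": 1}

# Sort the (word, count) pairs by count, largest first
by_count = sorted(word_counts.items(),
                  key=lambda word_and_count: word_and_count[1],
                  reverse=True)
print(by_count)
```

Here each element is a `(word, count)` tuple, so the lambda picks out index 1 as the value to compare.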


## List Comprehensions

We use *list comprehensions* to transform a list into another list. These transformations can be choosing specific elements, transforming elements, or both.

In [65]:
even_numbers = [x for x in range(5) if x % 2 == 0] # [0, 2, 4]
print(even_numbers)

squares = [x * x for x in range(5)] # [0, 1, 4, 9, 16]
print(squares)

even_squares = [x * x for x in even_numbers] # [0, 4, 16]
print(even_squares)

[0, 2, 4]
[0, 1, 4, 9, 16]
[0, 4, 16]


Similarly, we can turn lists into dictionaries or sets:

In [66]:
square_dict = {x: x * x for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
print(square_dict)

square_set = {x * x for x in [1, -1]} # {1}
print(square_set)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
{1}


Use an underscore as the variable if you don't need the value from the list:

In [67]:
zeros = [0 for _ in even_numbers] # has the same length as even_numbers
print(zeros)

[0, 0, 0]


A list comprehension can include multiple `fors`:

In [68]:
pairs = [(x, y)
        for x in range(10)
        for y in range(10)] # 100 pairs (0,0), (0,1) ... (9,8), (9,9)
print(pairs)

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9), (7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9), (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9)]


Later `fors` can use the results of earlier ones:

In [69]:
increasing_pairs = [(x, y)
                   for x in range(10)
                   for y in range(x + 1, 10)]
print(increasing_pairs)

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]


## Automated Testing and `assert`

We can gain confidence that our code is correct by using *types* or *automated tests*.

We will test our code using `assert` statements, which will cause our code to raise an `AssertionError` if the specified condition is not truthy:

In [70]:
assert 1 + 1 == 2
assert 1 + 1 == 2, "1 + 1 should equal 2 but didn't"

The optional message in the `assert` statement will be printed if the assertion fails. We should assert that functions we write are doing what we expect them to do:

In [71]:
def smallest_item(xs):
    return min(xs)

assert smallest_item([10, 20, 5, 40]) == 5
assert smallest_item([1, 0, -1, 2]) == -1

We can also assert things about inputs to functions:

In [72]:
def smallest_item(xs):
    assert xs, "empty list has no smallest item"
    return min(xs)
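To see the input check fire, we can call the function with an empty list and catch the resulting `AssertionError`. This is only a sketch of what happens; the name `message` is introduced here for illustration.

```python
def smallest_item(xs):
    assert xs, "empty list has no smallest item"
    return min(xs)

try:
    smallest_item([])
except AssertionError as error:
    message = str(error)  # the optional message from the assert statement
    print(f"assertion failed: {message}")
```

Keep in mind that Python skips `assert` statements when run with the `-O` flag, so they suit tests and sanity checks better than validation that must always run.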

## Object-Oriented Programming

*Classes* allow us to encapsulate data and the functions that operate on them. Let's construct a class representing a "counting clicker".

It maintains a `count`, can be clicked (`click`) to increment the count, lets you read the count (`read`), and can be `reset` back to zero.

We define a class by using the `class` keyword and a PascalCase name:

In [73]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""
    # A class contains zero or more member functions
    # By convention, each takes a first parameter, self, that refers to the particular class instance
    # Usually a class has a constructor, named __init__.
    # It takes whatever parameters you need to construct an instance of your class
    # and does whatever setup you need.
    # Dunder ("double underscore") methods like __init__ are special methods
    # that Python calls for you; users of the class rarely call them directly.
    def __init__(self, count = 0):
        self.count = count
    
    # __repr__ produces the string representation of a class instance:
    def __repr__(self):
        return f"CountingClicker(count={self.count})"
    
    # Finally, implement the public API of our class:
    def click(self, num_times = 1):
        """Click the clicker some number of times."""
        self.count += num_times
    
    def read(self):
        return self.count
    
    def reset(self):
        self.count = 0

After we've defined the class, let's use `assert` to write test cases for our clicker:

In [74]:
clicker = CountingClicker()
assert clicker.read() == 0, "clicker should start with count 0"
clicker

CountingClicker(count=0)

In [75]:
clicker.click()
clicker.click()
assert clicker.read() == 2, "after two clicks, clicker should have count 2"
clicker.read()

2

In [76]:
clicker.reset()
assert clicker.read() == 0, "after reset, clicker should be back to 0"
clicker.read()

0

*Subclasses* can also be created, which *inherit* some of their functionality from a parent class. For example, we could create a non-reset-able clicker by using `CountingClicker` as the base class and overriding the `reset` method to do nothing:

In [77]:
# A subclass inherits all the behavior of its parent class.
class NoResetClicker(CountingClicker):
    # This class has all the same methods as CountingClicker.
    # Except that it has a reset method that does nothing.
    def reset(self):
        pass

In [78]:
clicker2 = NoResetClicker()
assert clicker2.read() == 0

In [79]:
clicker2.click()
assert clicker2.read() == 1

In [80]:
clicker2.reset()
assert clicker2.read() == 1, "reset shouldn't do anything"

## Iterables and Generators

We don't always need a list to retrieve specific elements by indices. If we only want the elements one at a time, there is no reason to keep an entire list of potentially billions of elements.

Most likely, we just need to iterate over the collection of elements using `for` and `in`. If this is the case, we can create *generators*, which can be iterated over just like lists but generate their values lazily on demand.

We can create generators with functions and the `yield` keyword:

In [81]:
def generate_range(n):
    i = 0
    while i < n:
        yield i # every call to yield produces a value of the generator
        i += 1

The following loop will consume the `yielded` values one at a time until none are left:

In [82]:
for i in generate_range(10):
    print(f"i: {i}")

i: 0
i: 1
i: 2
i: 3
i: 4
i: 5
i: 6
i: 7
i: 8
i: 9


In fact, `range` is itself lazy, so there's no point in doing this.

With a generator, we can also create an infinite sequence:

In [83]:
def natural_numbers():
    """returns 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

Although we shouldn't iterate over it without using some kind of `break` logic.
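One simple form of that `break` logic is to stop once a condition is met. The cutoff of 5 below is arbitrary.

```python
def natural_numbers():
    """returns 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

first_five = []
for n in natural_numbers():
    if n > 5:
        break  # without this, the loop would never end
    first_five.append(n)

print(first_five)
```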

**Note:** The disadvantage of laziness is that we can only iterate through a generator once. If we need to iterate through something multiple times, we'll need to either re-create the generator each time or use a list. If generating the values is expensive, that might be a good reason to use a list instead.
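A quick sketch of the single-pass behavior, reusing the `generate_range` function defined earlier (repeated here so the snippet runs on its own):

```python
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

numbers = generate_range(3)
first_pass = list(numbers)            # consumes every value
second_pass = list(numbers)           # the generator is already exhausted
fresh_pass = list(generate_range(3))  # re-creating the generator starts over

print(first_pass, second_pass, fresh_pass)
```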

Another way to create generators is by using `for` comprehensions wrapped in parentheses:

In [84]:
evens_below_20 = (i for i in generate_range(20) if i % 2 == 0)

Such a "generator comprehension" doesn't do any work until you iterate over it (using `for` or `next`). We can use this to build up elaborate data-processing pipelines:

In [85]:
# None of these computations *does* anything until we iterate
data = natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)
# and so on
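To actually pull values out of such a pipeline we have to iterate over its last stage. One option, shown here as a sketch rather than the book's approach, is `itertools.islice`, which takes a fixed number of items from any iterable and so is safe on an infinite source.

```python
from itertools import islice

def natural_numbers():
    n = 1
    while True:
        yield n
        n += 1

data = natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)

# Only now does any work happen, and only enough to produce three values
first_three = list(islice(even_squares_ending_in_six, 3))
print(first_three)
```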

Sometimes, when we're iterating over lists or generators, we'll want both the values and their indices. Python has an `enumerate` function which turns values into pairs (`index`, `value`):

In [86]:
names = ["Alice", "Bob", "Charlie", "Debbie"]

# not Pythonic
for i in range(len(names)):
    print(f"name {i} is {names[i]}")

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


In [87]:
# also not Pythonic
i = 0
for name in names:
    print(f"name {i} is {names[i]}")
    i += 1

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


In [88]:
# Pythonic
for i, name in enumerate(names):
    print(f"name {i} is {name}")

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie


## Randomness

We can generate random numbers with the `random` module:

In [90]:
import random
random.seed(10) # this ensures we get the same results every time

four_uniform_randoms = [random.random() for _ in range(4)]
print(four_uniform_randoms)

[0.5714025946899135, 0.4288890546751146, 0.5780913011344704, 0.20609823213950174]


The `random` module actually produces *pseudorandom* numbers based on an internal state that you can set with `random.seed` if you want to get reproducible results:

In [91]:
random.seed(10) # set the seed to 10
print(random.random())
random.seed(10) # reset the seed to 10
print(random.random())

0.5714025946899135
0.5714025946899135


We'll sometimes use `random.randrange`, which takes either one or two arguments and returns an element chosen randomly from the corresponding `range`:

In [92]:
random.randrange(10) # choose randomly from range(10) = [0, 1, ..., 9]
random.randrange(3, 6) # choose randomly from range(3, 6) = [3, 4, 5]

4

`random.shuffle` randomly reorders the elements of a list:

In [93]:
up_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
random.shuffle(up_to_ten)
print(up_to_ten)

[5, 6, 9, 2, 3, 7, 8, 4, 1, 10]


If we need to randomly pick one element from a list, we use `random.choice`:

In [94]:
my_best_friend = random.choice(["Alice", "Bob", "Charlie"])
print(my_best_friend)

Bob


If we need to randomly choose a sample of elements without replacement, we use `random.sample`:

In [95]:
lottery_numbers = range(60)
winning_numbers = random.sample(lottery_numbers, 6)
print(winning_numbers)

[4, 15, 47, 23, 2, 26]


To choose a sample *with* replacement, we make multiple calls to `random.choice`:

In [96]:
four_with_replacement = [random.choice(range(10)) for _ in range(4)]
print(four_with_replacement)

[2, 9, 5, 6]
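As an alternative to the comprehension above, the standard library's `random.choices` (note the plural) draws `k` elements with replacement in one call. The seed value here is arbitrary and only there so repeated runs agree with each other.

```python
import random

random.seed(0)  # arbitrary seed, chosen only for reproducibility
four_with_replacement = random.choices(range(10), k=4)
print(four_with_replacement)
```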


## Regular Expressions

Regular expressions (regex) provide a way of searching text:

In [100]:
import re

re_example = [ # All of these are true, because
    not re.match("a", "cat"), # 'cat' doesn't start with 'a'
    re.search("a", "cat"), # 'cat' has an 'a' in it
    not re.search("c", "dog"), # 'dog' doesn't have a 'c' in it.
    3 == len(re.split("[ab]", "carbs")), # split on a or b to ['c', 'r', 's']
    "R-D-" == re.sub("[0-9]", "-", "R2D2") # replace digits with dashes
]

print(re_example)

[True, <re.Match object; span=(1, 2), match='a'>, True, True, True]


## zip and Argument Unpacking

Sometimes we'll need to *zip* two or more iterables together. The `zip` function transforms multiple iterables into a single iterable of tuples of corresponding elements:

In [101]:
list1 = ['a', 'b', 'c']
list2 = [1 ,2 ,3]

# zip is lazy, so we have to do something like the following
[pair for pair in zip(list1, list2)] # is [('a', 1), ('b', 2), ('c', 3)]

[('a', 1), ('b', 2), ('c', 3)]

What if the lists have different lengths? `zip` stops as soon as the shortest one is exhausted.
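A small sketch of that truncation. It also shows `itertools.zip_longest`, which is not covered in these notes but is the standard-library option when the leftover elements should be kept.

```python
from itertools import zip_longest

letters = ['a', 'b', 'c']
numbers = [1, 2]

truncated = list(zip(letters, numbers))       # 'c' is silently dropped
padded = list(zip_longest(letters, numbers))  # missing values become None

print(truncated)
print(padded)
```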

We can also "unzip" a list using a strange trick:

In [105]:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters)
print(numbers)

('a', 'b', 'c')
(1, 2, 3)


The asterisk (\*) performs *argument unpacking*, which uses the elements of `pairs` as individual arguments to `zip`. It ends up the same as if we had called:

In [106]:
letters, numbers = zip(('a', 1), ('b', 2), ('c', 3))
print(letters)
print(numbers)

('a', 'b', 'c')
(1, 2, 3)


We can use argument unpacking with any function:

In [107]:
def add(a,b): return a + b

add(1, 2) # returns 3
try:
    add([1, 2])
except TypeError:
    print("add expects two inputs")
add(*[1, 2]) # returns 3

add expects two inputs


3

## args and kwargs

Let's say we want to create a higher-order function that takes as input some function `f` and returns a new function that for any input returns twice the value of `f`:

In [108]:
def doubler(f):
    # Here we define a new function that keeps a reference to f
    def g(x):
        return 2 * f(x)
    
    # and return that new function
    return g

This works in some cases:

In [109]:
def f1(x):
    return x + 1

g = doubler(f1)
assert g(3) == 8, "(3 + 1) * 2 should equal 8"
assert g(-1) == 0, "(-1 + 1) * 2 should equal 0"

However, it doesn't work with functions that take more than a single argument:

In [111]:
def f2(x, y):
    return x + y

g = doubler(f2)
try:
    g(1, 2)
except TypeError:
    print("as defined, g only takes one argument")

as defined, g only takes one argument


We need a way to specify a function that takes arbitrary arguments. We can do this with argument unpacking and a little bit of magic:

In [112]:
def magic(*args, **kwargs):
    print("unnamed args:", args)
    print("keyword args:", kwargs)

magic(1, 2, key="word", key2="word2")
print(magic)

unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}
<function magic at 0x1057039d0>


When we define a function like this, `args` is a tuple of its unnamed arguments and `kwargs` is a `dict` of its named arguments. It works the other way too, if we want to use a `list` or `tuple` and `dict` to *supply* arguments to a function:

In [115]:
def other_way_magic(x, y, z):
    return x + y + z

x_y_list = [1, 2]
z_dict = {"z": 3}
assert other_way_magic(*x_y_list, **z_dict) == 6, "1 + 2 + 3 should be 6"

We will only use it to produce higher-order functions whose inputs can accept arbitrary arguments:

In [116]:
def doubler_correct(f):
    """works no matter what kind of inputs f expects"""
    def g(*args, **kwargs):
        """whatever argument g is supplied, pass them through to f"""
        return 2 * f(*args, **kwargs)
    return g

g = doubler_correct(f2)
assert g(1, 2) == 6, "doubler should work now"

Our code will be more correct and readable if we are explicit about what sorts of arguments our functions require. Use `args` and `kwargs` only when we have no other option.

## Type Annotations

A *dynamically typed* language is one that doesn't care about the types of objects we use, as long as we use them in valid ways. Python is a dynamically typed language:

In [118]:
def add(a, b):
    return a + b

assert add(10, 5) == 15, "+ is valid for numbers"
assert add([1, 2], [3]) == [1, 2, 3], "+ is valid for lists"
assert add("hi ", "there") == "hi there", "+ is valid for strings"

In [119]:
try:
    add(10, "five")
except TypeError:
    print("cannot add an int to a string")

cannot add an int to a string


On the other hand, *statically typed* languages require us to specify the types for our functions and objects. Python's type annotations let us write something that looks similar:

In [120]:
def add(a: int, b: int) -> int:
    return a + b

add(10, 5) # you'd like this to be OK
add("hi ", "there") # you'd like this to be not OK

'hi there'

Notice how these type annotations don't actually *do* anything. We can still use the annotated `add` function to add strings, and the call to `add(10, "five")` will still raise the exact same `TypeError`.
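One way to see that the annotations are only metadata is to inspect the function's `__annotations__` attribute, which is where Python stores them. This is a sketch for illustration, not something the book does.

```python
def add(a: int, b: int) -> int:
    return a + b

# The hints are stored on the function but never checked at call time
print(add.__annotations__)

result = add("hi ", "there")  # still runs, despite the int annotations
print(result)
```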

That said, there are at least four good reasons to use type annotation in our Python code:

  * Types are a form of documentation. Compare the following two function stubs. The second one is more informative

In [121]:
# def dot_product(x, y): ...
    
# we have not yet defined Vector, but imagine we had
# def dot_product(x: Vector, y: Vector) -> float: ...

  * There are external tools (e.g. `mypy`) that will read our code, inspect the type annotations, and let us know about type errors *before we ever run our code*.
  * Forcing us to think about the types in our code makes us design cleaner functions and interfaces. Consider a function whose `operation` parameter is allowed to be a `str`, an `int`, a `float`, or a `bool`: it is probably fragile and hard to use, and that becomes much clearer once the types are written out.

In [122]:
# from typing import Union

# def secretly_ugly_function(value, operation): ...

# def ugly_function(value: int,
#                 operation: Union[str, int, float, bool]) -> int: ...

  * Using types allows our editor to help us with things like autocomplete and to get angry at type errors.

### How to Write Type Annotations

For built-in types like `int` and `bool` and `float`, we just use the type itself as the annotation. What if we had a `list`?

In [124]:
def total(xs: list) -> float:
    return sum(xs)

This isn't wrong, but the type is not specific enough: we want `xs` to be a `list` of `floats`, not a `list` of strings.

The `typing` module provides a number of parameterized types that we can use to do this:

In [125]:
from typing import List # note capital L

def total(xs: List[float]) -> float:
    return sum(xs)

It's usually obvious what the type is for a variable:

In [126]:
# This is how to type-annotate variables when you define them.
# But this is unnecessary; it's "obvious" x is an int.
x: int = 5

But sometimes it's not obvious:

In [127]:
values = [] # what's my type?
best_so_far = None # what's my type?

In these cases, we supply inline type hints:

In [128]:
from typing import Optional

values: List[int] = []
best_so_far: Optional[float] = None # allowed to be either a float or None

The `typing` module contains many other types.
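A few of those other types, as a rough sketch. `Dict`, `Tuple`, and `Iterable` are real names in `typing`; the variables and the `total_length` function are invented for illustration.

```python
from typing import Dict, Iterable, Tuple

counts: Dict[str, int] = {"data": 1, "science": 2}  # str keys, int values
triple: Tuple[int, str, float] = (10, "dude", 0.5)   # fixed length, one type per slot

def total_length(words: Iterable[str]) -> int:
    """Accepts any iterable of strings, such as a list or a generator."""
    return sum(len(word) for word in words)

print(total_length(["ab", "cde"]))
```

Annotating a parameter as `Iterable[str]` rather than `List[str]` signals that the function only loops over its input, so callers can pass a lazy generator as well as a list.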

Since Python has first-class functions, we need a type to represent those as well:

In [130]:
from typing import Callable

# the type hint says that repeater is a function that takes
# two arguments, a string and an int, and returns a string.
def twice(repeater: Callable[[str, int], str], s: str) -> str:
    return repeater(s, 2)

def comma_repeater(s: str, n: int) -> str:
    n_copies = [s for _ in range(n)]
    return ', '.join(n_copies)

assert twice(comma_repeater, "type hints") == "type hints, type hints"

Since type annotations are just Python objects, we can assign them to variables to make them easier to refer to:

In [131]:
Number = int
Numbers = List[Number]

def total(xs: Numbers) -> Number:
    return sum(xs)