# 02 A crash course in Python

### The Zen of Python

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Code written in accordance with this “obvious” way (which may not be obvious at all to a newcomer) is often described as “Pythonic.”

## Whitespace Formatting

Many languages use curly braces to delimit blocks of code. Python uses indentation:

In [2]:
# the pound sign marks the start of a comment. Python itself
# ignores the comments, but they're helpful for anyone reading the code
for i in [1, 2, 3, 4, 5]:
    for j in [1, 2, 3, 4, 5]:
        print(i, j, i + j)
print('done looping')

1 1 2
1 2 3
1 3 4
1 4 5
1 5 6
2 1 3
2 2 4
2 3 5
2 4 6
2 5 7
3 1 4
3 2 5
3 3 6
3 4 7
3 5 8
4 1 5
4 2 6
4 3 7
4 4 8
4 5 9
5 1 6
5 2 7
5 3 8
5 4 9
5 5 10
done looping


This makes Python code very readable, but it also means that you have to be very careful with your formatting.

Whitespace is ignored inside parentheses and brackets, which can be helpful for long-winded computations and for making code easier to read:

In [3]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

easier_to_read_list_of_lists = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

You can also use a backslash to indicate that a statement continues onto the next line, although we’ll rarely do this:

In [4]:
two_plus_three = 2 + \
                    3
two_plus_three

5

### Modules
Certain features of Python are not loaded by default. These include both features that are part of the language and third-party features that you download yourself. In order to use these features, you’ll need to `import` the modules that contain them.
One approach is to simply `import` the module itself:

In [5]:
import re
my_regex = re.compile('[0-9]+', re.I)

Here, `re` is the module containing functions and constants for working with regular expressions. After this type of `import` you must prefix those functions with `re.` in order to access them.
If you already had a different `re` in your code, you could use an alias:


In [6]:
import re as regex
my_regex = regex.compile('[0-9]+', regex.I)

You might also do this if your module has an unwieldy name or if you’re going to be typing it a lot. For example, a standard convention when visualizing data with matplotlib is:

In [7]:
import matplotlib.pyplot as plt

If you need a few specific values from a module, you can import them explicitly and use them without qualification:

In [8]:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()

If you were a bad person, you could import the entire contents of a module into your namespace, which might inadvertently overwrite variables you’ve already defined:

In [9]:
match = 0
from re import *
print(match)

<function match at 0x7feaa0112560>


However, since you are not a bad person, you won’t ever do this.
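
The import styles above can be compared side by side. The following is a minimal sketch that uses the standard-library `math` module, chosen here only for illustration since it does not appear in the text above:

```python
import math                      # qualified access: math.sqrt
import math as m                 # alias: m.sqrt
from math import sqrt, pi        # explicit names, used without a prefix

# all three spellings refer to the same function object
same_function = (math.sqrt is m.sqrt) and (m.sqrt is sqrt)

root = sqrt(16)                  # square root of 16
circle_area = pi * 2 ** 2        # area of a circle with radius 2

print(same_function, root, circle_area)
```

Whichever style you pick, the module is loaded only once; the styles differ only in which names end up in your namespace.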

### Functions
A function is a rule for taking zero or more inputs and returning a corresponding output. In Python, we typically define functions using `def`:

In [10]:
def double(x):
    """This is where you put an optional docstring
    that explains what the function does.
    For example, this function multiplies its input by 2
    """
    return x * 2

Python functions are first-class, which means that we can assign them to variables and pass them into functions just like any other arguments:

In [11]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

In [12]:
my_double = double
x = apply_to_one(my_double)

x

2

It is also easy to create short anonymous functions, or lambdas:

In [13]:
y = apply_to_one(lambda x: x + 4)
y

5

You can assign lambdas to variables, although most people will tell you that you should just use `def` instead:

In [14]:
# don't do this
another_double = lambda x: 2 * x 

print(another_double(4))

# do this instead
def another_double(x):
    return 2 * x

print(another_double(4))

8
8


Function parameters can also be given default arguments, which only need to be specified when you want a value other than the default:

In [15]:
def my_print(message='my default message'):
    print(message)

my_print('hello')
my_print()

hello
my default message


It is sometimes useful to specify arguments by name:

In [16]:
def full_name(first='Whats his name', last='something'):
    return first + ' ' + last

print(full_name('joel', 'grus'))
print(full_name('joel'))
print(full_name(last='grus'))

joel grus
joel something
Whats his name grus


### Strings

Strings can be delimited by single or double quotation marks (but the quotes have to match):

In [17]:
single_quoted_string = 'data science'
double_quoted_string = "data science"

Python uses backslashes to encode special characters. For example:

In [18]:
tab_string = '\t'
len(tab_string)

1

If you want backslashes as backslashes (which you might in Windows directory names or in regular expressions), you can create `raw` strings using `r""`:

In [19]:
not_tab_string = r'\t'
len(not_tab_string)

2

You can create multiline strings using three double quotes:

In [20]:
multi_line_string = """This is the first line.
and this is the second line
and this is the third line
"""

The f-string provides a simple way to substitute values into strings. For example, if we had the first name and last name given separately:

In [21]:
first_name = 'Joel'
last_name = 'Grus'

we might want to combine them into a full name. There are multiple ways to construct such a `full_name` string:

In [22]:
full_name1 = first_name + ' ' + last_name
full_name2 = '{0} {1}'.format(first_name, last_name)

print(full_name1)
print(full_name2)

Joel Grus
Joel Grus


but the f-string way is much less unwieldy:

In [23]:
full_name3 = f'{first_name} {last_name}'

print(full_name3)

Joel Grus
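
f-strings also accept expressions and format specifiers inside the braces. A brief sketch, where the names `price` and `quantity` are made up for illustration:

```python
price = 3.14159
quantity = 3

# a format specifier after the colon controls rounding and padding
rounded = f'{price:.2f}'
# expressions are evaluated in place before formatting
total = f'total: {price * quantity:.1f}'
# zero-pad an integer to width 3
padded = f'{quantity:03d}'

print(rounded)   # 3.14
print(total)     # total: 9.4
print(padded)    # 003
```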


### Exceptions

When something goes wrong, Python raises an exception. Unhandled, exceptions will cause your program to crash. You can handle them using `try` and `except`:

In [24]:
try:
    print(0 / 0)
except ZeroDivisionError:
    print('cannot divide by zero')

cannot divide by zero


Although in many languages exceptions are considered bad, in Python there is no shame in using them to make your code cleaner, and we will sometimes do so.
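
Catching a specific exception type, rather than everything, keeps unrelated errors visible. The sketch below uses a hypothetical helper, `safe_divide` (not part of the text above), to show `try`, `except`, `else`, and `finally` working together:

```python
calls = []   # records every attempted division

def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        # only division by zero is handled; other errors still propagate
        return None
    else:
        # runs only when no exception was raised
        return result
    finally:
        # runs in every case, even when a return is already pending
        calls.append((numerator, denominator))

print(safe_divide(6, 3))   # 2.0
print(safe_divide(1, 0))   # None
```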


### Lists
Probably the most fundamental data structure in Python is the list, which is simply an ordered collection (it is similar to what in other languages might be called an array, but with some added functionality):

In [25]:
integer_list = [1, 2, 3]
heterogeneous_list = ['string', 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

list_length = len(integer_list)
list_sum = sum(integer_list)

print(integer_list)
print(list_length)
print(list_sum)

[1, 2, 3]
3
6


You can get or set the nth element of a list with square brackets:

In [26]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

zero = x[0]
one = x[1]

nine = x[-1]
eight = x[-2]
x[0] = -1

You can also use square brackets to slice lists. The slice `i:j` means all elements from `i` (inclusive) to `j` (not inclusive). If you leave off the start of the slice, you’ll slice from the beginning of the list, and if you leave off the end of the slice, you’ll slice until the end of the list:

In [27]:
first_three = x[:3]
first_three

[-1, 1, 2]

In [28]:
three_to_end = x[3:]
three_to_end

[3, 4, 5, 6, 7, 8, 9]

In [29]:
one_to_four = x[1:5]
one_to_four

[1, 2, 3, 4]

In [30]:
last_three = x[-3:]
last_three

[7, 8, 9]

In [31]:
without_first_and_last = x[1:-1]
without_first_and_last

[1, 2, 3, 4, 5, 6, 7, 8]

In [32]:
copy_of_x = x[:]
copy_of_x

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You can similarly slice strings and other “sequential” types.
A slice can take a third argument to indicate its stride, which can be negative:


In [33]:
every_third = x[::3]
every_third

[-1, 3, 6, 9]

In [34]:
five_to_three = x[5:2:-1]
five_to_three

[5, 4, 3]
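
Slices behave the same way on strings and tuples, and a negative stride with no endpoints reverses a sequence. A short sketch with made-up values:

```python
word = 'python'
numbers = (10, 20, 30, 40, 50)

first_two_letters = word[:2]      # 'py'
reversed_word = word[::-1]        # 'nohtyp'
every_other = numbers[::2]        # (10, 30, 50)
last_two = numbers[-2:]           # (40, 50)

# out-of-range slice bounds do not raise; you get whatever exists
beyond_the_end = word[3:100]      # 'hon'
```

Note that slicing a string returns a string and slicing a tuple returns a tuple, so the result keeps the type of the original sequence.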

Python has an `in` operator to check for list membership:

In [35]:
0 in [1, 2, 3]

False

In [36]:
1 in [1, 2, 3]

True

This check involves examining the elements of the list one at a time, which means that you probably shouldn’t use it unless you know your list is pretty small (or unless you don’t care how long the check takes).
It is easy to concatenate lists together. If you want to modify a list in place, you can use `extend` to add items from another collection:

In [37]:
x = [1, 2, 3]
x.extend([4, 5, 6])

If you don’t want to modify `x`, you can use list addition:

In [38]:
x = [1, 2, 3]
y = x + [4, 5, 6]
y

[1, 2, 3, 4, 5, 6]

More frequently we will append to lists one item at a time:

In [39]:
x = [1, 2, 3]
x.append(0)
y = x[-1]
z = len(x)

print('y', y)
print('z', z)

y 0
z 4


It’s often convenient to unpack lists when you know how many elements they contain:

In [40]:
x, y = [1, 2]

print('x:', x)
print('y:', y)

x: 1
y: 2


although you will get a `ValueError` if you don’t have the same number of elements on both sides.


A common idiom is to use an underscore for a value you’re going to throw away:

In [41]:
__, y = [1, 2]

print('__:', __)
print('y:', y)

__: 1
y: 2
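
A length mismatch during unpacking raises `ValueError`. As a related aside that the text above does not cover, a starred name can absorb however many elements are left over. A minimal sketch:

```python
try:
    a, b = [1, 2, 3]            # three values, only two names
except ValueError:
    unpack_failed = True
else:
    unpack_failed = False

# a starred target collects the remaining elements into a list
first, *rest = [1, 2, 3, 4]
# unpacking works on any iterable, including strings
head, *middle, tail = 'abcde'

print(unpack_failed)            # True
print(first, rest)              # 1 [2, 3, 4]
print(head, middle, tail)       # a ['b', 'c', 'd'] e
```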


### Tuples

Tuples are lists’ immutable cousins. Pretty much anything you can do to a list that doesn’t involve modifying it, you can do to a tuple. You specify a tuple by using parentheses (or nothing) instead of square brackets:

In [42]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4
my_list[1] = 3

try:
    my_tuple[1] = 3
except TypeError:
    print('cannot modify a tuple')

cannot modify a tuple


Tuples are a convenient way to return multiple values from functions:

In [43]:
def sum_and_product(x, y):
    return (x + y), (x * y)

sp = sum_and_product(2, 3)
s, p = sum_and_product(5, 10)

Tuples (and lists) can also be used for multiple assignment:

In [44]:
x, y = 1, 2
print('x: ', x, 'y: ', y)
x, y = y, x
print('x: ', x, 'y: ', y)

x:  1 y:  2
x:  2 y:  1


### Dictionaries
Another fundamental data structure is a dictionary, which associates values with keys and allows you to quickly retrieve the value corresponding to a given key:

In [45]:
empty_dict = {}
empty_dict2 = dict()
grades = {
    'joel': 80,
    'tim': 95
}

You can look up the value for a key using square brackets:

In [46]:
joels_grade = grades['joel']
joels_grade

80

But you’ll get a `KeyError` if you ask for a key that’s not in the dictionary:

In [47]:
try:
    kates_grade = grades['kate']
except KeyError:
    print('no grade for kate.')

no grade for kate.


You can check for the existence of a key using `in`:

In [48]:
joel_has_grade = 'joel' in grades
print(joel_has_grade)
kate_has_grade = 'kate' in grades
print(kate_has_grade)

True
False


This membership check is fast even for large dictionaries.

Dictionaries have a `get` method that returns a default value (instead of raising an exception) when you look up a key that’s not in the dictionary:

In [49]:
joels_grade = grades.get('joel', 0)
kates_grade = grades.get('kate', 0)
kates_grade_var = grades.get('kate', 'No value')
no_ones_grade = grades.get('no one')

print(joels_grade)
print(kates_grade)
print(kates_grade_var)
print(no_ones_grade)

80
0
No value
None


You can assign key/value pairs using the same square brackets:

In [50]:
grades['tim'] = 99
grades['kate'] = 100
num_students = len(grades)

print(grades)
print(num_students)

{'joel': 80, 'tim': 99, 'kate': 100}
3


You can use dictionaries to represent structured data:

In [51]:
tweet = {
    'user': 'joelgrus',
    'text': 'data science is awesome',
    'retweet_count': 100,
    'hashtags': ['#data', '#science', '#datascience', '#awesome', '#yolo']
}

although we’ll soon see a better approach.

Besides looking for specific keys, we can look at all of them:

In [52]:
tweet_keys = tweet.keys()
print(tweet_keys)

tweet_values = tweet.values()
print(tweet_values)

tweet_items = tweet.items()
print(tweet_items)

print('user' in tweet_keys)       # true but not pythonic
print('user' in tweet)            # pythonic way for checking for keys
print('joelgrus' in tweet_values) # true (slow but the only way to check it)

dict_keys(['user', 'text', 'retweet_count', 'hashtags'])
dict_values(['joelgrus', 'data science is awesome', 100, ['#data', '#science', '#datascience', '#awesome', '#yolo']])
dict_items([('user', 'joelgrus'), ('text', 'data science is awesome'), ('retweet_count', 100), ('hashtags', ['#data', '#science', '#datascience', '#awesome', '#yolo'])])
True
True
True


Dictionary keys must be “hashable”; in particular, you cannot use lists as keys. If you need a multipart key, you should probably use a tuple or figure out a way to turn the key into a string.
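
To make the hashability rule concrete, here is a small sketch with toy data (the names and grades are invented for illustration). It shows a tuple key working, a list key failing with `TypeError`, and the string-key alternative:

```python
# tuples of hashable values are hashable, so they work as multipart keys
grades_by_course = {
    ('joel', 'math'): 80,
    ('joel', 'stats'): 90,
}
joels_stats_grade = grades_by_course[('joel', 'stats')]

# lists are mutable and therefore unhashable
try:
    bad_dict = {['joel', 'math']: 80}
except TypeError:
    list_key_failed = True
else:
    list_key_failed = False

# alternative: flatten the parts into a single string key
string_key = 'joel' + '/' + 'math'
grades_by_string = {string_key: 80}

print(joels_stats_grade)   # 90
print(list_key_failed)     # True
```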


### defaultdict

Imagine that you’re trying to count the words in a document. An obvious approach is to create a dictionary in which the keys are words and the values are counts. As you check each word, you can increment its count if it’s already in the dictionary and add it to the dictionary if it’s not:

In [53]:
document = """this is a medium size string text for demonstrations purpouses
of dictionaries, the repeted words will be text text text and medium medium
and other words such as words or this or that. is is is is of of size string
"""
document = document.split()
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1
word_counts

{'this': 2,
 'is': 5,
 'a': 1,
 'medium': 3,
 'size': 2,
 'string': 2,
 'text': 4,
 'for': 1,
 'demonstrations': 1,
 'purpouses': 1,
 'of': 3,
 'dictionaries,': 1,
 'the': 1,
 'repeted': 1,
 'words': 3,
 'will': 1,
 'be': 1,
 'and': 2,
 'other': 1,
 'such': 1,
 'as': 1,
 'or': 2,
 'that.': 1}

You could also use the “forgiveness is better than permission” approach and just handle the exception from trying to look up a missing key:

In [54]:
word_counts = {}
for word in document:
    try:
        word_counts[word] += 1
    except KeyError:
        word_counts[word] = 1
word_counts

{'this': 2,
 'is': 5,
 'a': 1,
 'medium': 3,
 'size': 2,
 'string': 2,
 'text': 4,
 'for': 1,
 'demonstrations': 1,
 'purpouses': 1,
 'of': 3,
 'dictionaries,': 1,
 'the': 1,
 'repeted': 1,
 'words': 3,
 'will': 1,
 'be': 1,
 'and': 2,
 'other': 1,
 'such': 1,
 'as': 1,
 'or': 2,
 'that.': 1}

A third approach is to use `get`, which behaves gracefully for missing keys:

In [55]:
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1
word_counts

{'this': 2,
 'is': 5,
 'a': 1,
 'medium': 3,
 'size': 2,
 'string': 2,
 'text': 4,
 'for': 1,
 'demonstrations': 1,
 'purpouses': 1,
 'of': 3,
 'dictionaries,': 1,
 'the': 1,
 'repeted': 1,
 'words': 3,
 'will': 1,
 'be': 1,
 'and': 2,
 'other': 1,
 'such': 1,
 'as': 1,
 'or': 2,
 'that.': 1}

Every one of these is slightly unwieldy, which is why `defaultdict` is useful. A `defaultdict` is like a regular dictionary, except that when you try to look up a key it doesn’t contain, it first adds a value for it using a zero-argument function you provided when you created it. In order to use a `defaultdict`, you have to import it from `collections`:

In [56]:
from collections import defaultdict

word_counts = defaultdict(int)
for word in document:
    word_counts[word] += 1

They can also be useful with `list` or `dict`, or even your own functions:

In [57]:
dd_list = defaultdict(list)
dd_list[2].append(1)
print(dd_list)

dd_dict = defaultdict(dict)
dd_dict['joel']['city'] = 'seattle'
print(dd_dict)

dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1
print(dd_pair)

defaultdict(<class 'list'>, {2: [1]})
defaultdict(<class 'dict'>, {'joel': {'city': 'seattle'}})
defaultdict(<function <lambda> at 0x7feab094fb00>, {2: [0, 1]})


These will be useful when we’re using dictionaries to “collect” results by some key and don’t want to have to check every time to see if the key exists yet.
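
As one possible illustration of "collecting" results by key, the sketch below groups a made-up word list by first letter. It also shows a side effect worth knowing: merely looking up a missing key inserts it.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# group words by first letter without checking whether the key exists yet
words_by_letter = defaultdict(list)
for word in words:
    words_by_letter[word[0]].append(word)

print(dict(words_by_letter))

# a plain lookup of a missing key creates an entry with the default value
lookup_result = words_by_letter['z']
print('z' in words_by_letter)   # True
```

If that insertion is unwanted, `words_by_letter.get('z')` or an `in` check avoids it, since neither triggers the default factory.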

### Counters

A `Counter` turns a sequence of values into a `defaultdict(int)`-like object mapping keys to counts:


In [58]:
from collections import Counter
c = Counter([0, 1, 2, 0])

This gives us a very simple way to solve our `word_counts` problem:

In [59]:
# recall, document is a list of words
word_counts = Counter(document)

A `Counter` instance has a `most_common` method that is frequently useful:

In [60]:
# print the 10 most common words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)

is 5
text 4
medium 3
of 3
words 3
this 2
size 2
string 2
and 2
or 2
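
A few other `Counter` behaviors are handy in practice. This sketch, which counts the letters of an arbitrary example word, shows that missing keys count as zero and that `update` adds counts from another iterable:

```python
from collections import Counter

letters = Counter('mississippi')

# missing keys count as zero instead of raising KeyError,
# and the lookup does not insert the key
z_count = letters['z']

# update adds counts from another iterable
letters.update('ss')

top_two = letters.most_common(2)
print(z_count)    # 0
print(top_two)    # [('s', 6), ('i', 4)]
```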


### Sets
Another useful data structure is the set, which represents a collection of distinct elements. You can define a set by listing its elements between curly braces:


In [61]:
primes_below_10 = {2, 3, 5, 7}

However, that doesn't work for empty sets, as `{}` already means "empty `dict`". In that case you'll need to use `set()` itself:

In [62]:
s = set()
s.add(1)
s.add(2)
s.add(3)
x = len(s)
y = 2 in s
z = 3 in s

We’ll use sets for two main reasons. The first is that `in` is a very fast operation on sets. If we have a large collection of items that we want to use for a membership test, a set is more appropriate than a list:

In [63]:
hundreds_of_other_words = ['.......']
stopwords_list = ['a', 'an', 'at'] + hundreds_of_other_words + ['yet', 'you']

In [64]:
'zip' in stopwords_list  # False, but we have to check every element

False

In [65]:
stopwords_set = set(stopwords_list)
'zip' in stopwords_set   # very fast to check

False

The second reason is to find the distinct items in a collection:

In [66]:
item_list = [1, 2, 3, 1, 2, 3]
num_items = len(item_list)
item_set = set(item_list)
num_distinct_items = len(item_set)
distinct_item_list = list(item_set)

We’ll use sets less frequently than dictionaries and lists.
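
Beyond membership tests and de-duplication, sets support the usual set algebra through operators. The text above does not cover these, so treat the following as a supplementary sketch:

```python
evens = {0, 2, 4, 6, 8}
primes = {2, 3, 5, 7}

even_primes = evens & primes          # intersection
evens_or_primes = evens | primes      # union
evens_not_prime = evens - primes      # difference

# sets are unordered; sort if you need a predictable order
deduplicated = sorted(set([3, 1, 3, 2, 1]))

print(even_primes)        # {2}
print(deduplicated)       # [1, 2, 3]
```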

### Control Flow

As in most programming languages, you can perform an action conditionally using `if`:

In [67]:
if 1 > 2:
    message = 'if only 1 were greater than two...'
elif 1 > 3:
    message = 'elif stands for "else if"'
else:
    message = 'when all else fails use else (if you want to)'

You can also write a ternary if-then-else on one line:

In [68]:
parity = 'even' if x % 2 == 0 else 'odd'

Python has a `while` loop:

In [69]:
x = 0
while x < 10:
    print(f'{x} is less than 10')
    x += 1

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


although more often we’ll use `for` and `in`:

In [70]:
for x in range(0, 10):
    print(f'{x} is less than 10')

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


If you need more complex logic, you can use `continue` and `break`:

In [71]:
for x in range(0, 10):
    if x == 3:
        continue
    if x == 5:
        break
    print(x)

0
1
2
4


### Sorting
Every Python list has a `sort` method that sorts it in place. If you don’t want to mess up your list, you can use the `sorted` function, which returns a new list:

In [72]:
x = [4, 1, 2, 3]
y = sorted(x)      # x is unchanged
x.sort()           # x is sorted in place

print(y)
print(x)

[1, 2, 3, 4]
[1, 2, 3, 4]


By default, `sort` (and `sorted`) sort a list from smallest to largest based on naively comparing the elements to one another.
If you want elements sorted from largest to smallest, you can specify a `reverse=True` parameter. And instead of comparing the elements themselves, you can compare the results of a function that you specify with `key`:

In [73]:
# sort the list by absolute value from largest to smallest
x = sorted([-4, 1, -2, 3], key=abs, reverse=True)

# sort the words and counts from highest count to lowest
wc = sorted(word_counts.items(),
            key = lambda word_and_count: word_and_count[1],
            reverse=True)
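
A `key` function may return a tuple, which gives a simple way to sort by several criteria at once. The sketch below uses an invented list of `(name, age)` pairs:

```python
people = [('alice', 30), ('bob', 25), ('carol', 30), ('dave', 25)]

# a tuple key sorts by age first, then by name to break ties
by_age_then_name = sorted(people, key=lambda person: (person[1], person[0]))

# to sort ages descending but names ascending, negate the numeric part
oldest_first = sorted(people, key=lambda person: (-person[1], person[0]))

print(by_age_then_name)
print(oldest_first)
```

Negating works only for numeric parts of the key; for mixed directions on non-numeric fields, one option is to sort twice, relying on the fact that Python's sort is stable.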

### List Comprehensions


Frequently, you’ll want to transform a list into another list by choosing only certain elements, by transforming elements, or both. The Pythonic way to do this is with list comprehensions:


In [74]:
even_numbers = [x for x in range(5) if x % 2 == 0]
squares = [x * x for x in range(5)]
even_squared = [x * x for x in even_numbers]

print(even_numbers)
print(squares)
print(even_squared)

[0, 2, 4]
[0, 1, 4, 9, 16]
[0, 4, 16]


You can similarly turn lists into dictionaries or sets:

In [75]:
square_dict = {x: x * x for x in range(0, 5)}
square_set = {x * x for x in [1, -1]}

If you don’t need the value from the list, it’s common to use an underscore as the variable:

In [76]:
zeros = [0 for __ in even_numbers]

A list comprehension can include multiple `for`s:


In [77]:
pairs = [(x, y) for x in range(0, 10)
                    for y in range(0, 10)]
print(pairs)

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9), (7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9), (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9)]


and later `for`s can use the results of earlier ones:


In [78]:
increasing_pairs = [(x, y) for x in range(0, 1)
                               for y in range(x + 1, 10)]
print(increasing_pairs)

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9)]
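
The order of multiple `for` clauses can be confusing at first: they read left to right, exactly like nested loops. A small sketch that flattens a made-up list of lists both ways:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# read the `for` clauses left to right, like nested loops
flattened = [value for row in matrix for value in row]

# the equivalent explicit loops
flattened_loop = []
for row in matrix:
    for value in row:
        flattened_loop.append(value)

# a condition can follow the last `for`
odd_values = [value for row in matrix for value in row if value % 2 == 1]

print(flattened)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(odd_values)    # [1, 3, 5, 7, 9]
```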


### Automated Testing and assert

As data scientists, we’ll be writing a lot of code. How can we be confident our code is correct? One way is with types (discussed shortly), but another way is with automated tests.
There are elaborate frameworks for writing and running tests, but in this book we’ll restrict ourselves to using `assert` statements, which will cause your code to raise an `AssertionError` if your specified condition is not truthy:

In [79]:
assert 1 + 1 == 2
assert 1 + 1 == 2, "1 + 1 should equal 2 but didn't"

As you can see in the second case, you can optionally add a message to be printed if the assertion fails.

It’s not particularly interesting to assert that 1 + 1 = 2. What’s more interesting is to assert that functions you write are doing what you expect them to:


In [80]:
def smallest_item(xs):
    return min(xs)


assert smallest_item([10, 20, 5, 40]) == 5
assert smallest_item([1, 0, -1, 2]) == -1

Another less common use is to assert things about inputs to functions:

In [81]:
def smallest_item(xs):
    assert xs, 'empty list has no smallest item'
    return min(xs)
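
One way to check that the input assertion actually fires is to call the function with an empty list and catch the resulting `AssertionError`. This is a sketch of that idea, with `failure_message` as an illustrative name:

```python
def smallest_item(xs):
    assert xs, 'empty list has no smallest item'
    return min(xs)

# the happy path still works
assert smallest_item([3, 1, 2]) == 1

# an empty input should trip the assertion inside the function
try:
    smallest_item([])
except AssertionError as error:
    failure_message = str(error)
else:
    failure_message = None

assert failure_message == 'empty list has no smallest item'
```

Keep in mind that running Python with the `-O` flag strips `assert` statements, so assertions are better suited to tests and sanity checks than to validating untrusted input.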

### Object-Oriented Programming

Like many languages, Python allows you to define classes that encapsulate data and the functions that operate on them. We’ll use them sometimes to make our code cleaner and simpler. It’s probably simplest to explain them by constructing a heavily annotated example.
Here we’ll construct a class representing a “counting clicker,” the sort that is used at the door to track how many people have shown up for the “advanced topics in data science” meetup.

It maintains a `count`, can be `click`ed to increment the count, allows you to `read` the count, and can be `reset` back to zero. (In real life one of these rolls over from 9999 to 0000, but we won’t bother with that.)

To define a class, you use the `class` keyword and a PascalCase name:

In [82]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""

A class contains zero or more member functions. By convention, each takes a first parameter, `self`, that refers to the particular class instance.

Normally, a class has a constructor, named `__init__`. It takes whatever parameters you need to construct an instance of your class and does whatever setup you need:

In [83]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""
    def __init__(self, count=0):
        self.count = count

Although the constructor has a funny name, we construct instances of the clicker using just the class name:

In [84]:
clicker1 = CountingClicker()
clicker2 = CountingClicker(100)
clicker3 = CountingClicker(count=100)

Notice that the `__init__` method name starts and ends with double underscores. These “magic” methods are sometimes called “dunder” methods (double-UNDERscore, get it?) and represent “special” behaviors.

Another such method is `__repr__`, which produces the string representation of a class instance:

In [85]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""
    def __init__(self, count=0):
        self.count = count
    
    def __repr__(self):
        return f'CountingClicker(count={self.count})'

And finally we need to implement the public API of our class:

In [86]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""
    def __init__(self, count=0):
        self.count = count
    
    def __repr__(self):
        return f'CountingClicker(count={self.count})'
    
    def click(self, num_times=1):
        """Click the clicker some number of times."""
        self.count += num_times
    
    def read(self):
        return self.count
    
    def reset(self):
        self.count = 0

Having defined it, let’s use `assert` to write some test cases for our clicker:

In [87]:
clicker = CountingClicker()
assert clicker.read() == 0, 'clicker should start with count 0'

In [88]:
clicker.click()
clicker.click()
assert clicker.read() == 2, 'after two clicks, clicker should have count 2'

In [89]:
clicker.reset()
assert clicker.read() == 0, 'after reset, clicker should be back to 0'

Writing tests like these helps us be confident that our code is working the way it’s designed to, and that it continues to do so whenever we make changes to it.


We’ll also occasionally create subclasses that inherit some of their functionality from a parent class. For example, we could create a non-resettable clicker by using `CountingClicker` as the base class and overriding the `reset` method to do nothing:


In [90]:
# a subclass inherits all the behaviour of its parent class
class NoResetClicker(CountingClicker):
    # this class has all the same methods as CountingClicker
    # except that it has a reset method that does nothing
    def reset(self):
        pass

In [91]:
clicker2 = NoResetClicker()
assert clicker2.read() == 0

In [92]:
clicker2.click()
assert clicker2.read() == 1

In [93]:
clicker2.click()
assert clicker2.read() == 2

In [94]:
clicker2.reset()
assert clicker2.read() == 2
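
A subclass can also extend a parent method rather than replace it, by calling the parent's version through `super()`. The sketch below introduces a hypothetical `LoggingClicker` (not part of the text above) and repeats a trimmed `CountingClicker` so that it runs on its own:

```python
class CountingClicker:
    """Same clicker as above, repeated so this example is self-contained."""
    def __init__(self, count=0):
        self.count = count

    def click(self, num_times=1):
        self.count += num_times

    def read(self):
        return self.count

    def reset(self):
        self.count = 0


class LoggingClicker(CountingClicker):
    """Hypothetical subclass that remembers every click it receives."""
    def __init__(self, count=0):
        super().__init__(count)      # let the parent set up self.count
        self.history = []

    def click(self, num_times=1):
        self.history.append(num_times)
        super().click(num_times)     # reuse the parent's behavior


logging_clicker = LoggingClicker()
logging_clicker.click()
logging_clicker.click(3)
assert logging_clicker.read() == 4
assert logging_clicker.history == [1, 3]
```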

### Iterables and Generators

One nice thing about a list is that you can retrieve specific elements by their indices. But you don’t always need this! A list of a billion numbers takes up a lot of memory. If you only want the elements one at a time, there’s no good reason to keep them all around. If you only end up needing the first several elements, generating the entire billion is hugely wasteful.
Often all we need is to iterate over the collection using `for` and `in`. In this case we can create generators, which can be iterated over just like lists but generate their values lazily on demand.
One way to create generators is with functions and the `yield` operator:

In [96]:
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

The following loop will consume the `yield`ed values one at a time until none are left:



In [98]:
for i in generate_range(10):
    print(f'i: {i}')

i: 0
i: 1
i: 2
i: 3
i: 4
i: 5
i: 6
i: 7
i: 8
i: 9


(In fact, `range` is itself lazy, so there’s no point in doing this.)

With a generator, you can even create an infinite sequence:

In [99]:
def generate_natural_numbers():
    """returns 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

although you probably shouldn’t iterate over it without using some kind of `break` logic.

##### Tip
The flip side of laziness is that you can only iterate through a generator once. If you need to iterate through something multiple times, you’ll need to either re-create the generator each time or use a list. If generating the values is expensive, that might be a good reason to use a list instead.
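
The one-pass behavior is easy to see directly. This sketch repeats `generate_range` from above so that it is self-contained:

```python
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

numbers = generate_range(3)
first_pass = list(numbers)     # consumes the generator
second_pass = list(numbers)    # nothing left to yield

# a list can be traversed as many times as needed
numbers_list = list(generate_range(3))
total_twice = sum(numbers_list) + sum(numbers_list)

print(first_pass)     # [0, 1, 2]
print(second_pass)    # []
print(total_twice)    # 6
```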

A second way to create generators is by using comprehensions wrapped in parentheses:

In [101]:
evens_below_20 = (i for i in generate_range(20) if i % 2 == 0)
evens_below_20

<generator object <genexpr> at 0x7feab093d8d0>

Such a “generator comprehension” doesn’t do any work until you iterate over it (using `for` or `next`). We can use this to build up elaborate data-processing pipelines:

In [103]:
# none of these computations *does* anything until we iterate
data = generate_natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)
# and so on
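
To actually pull values out of an infinite pipeline like this one, you need something that stops. One option, shown here as a sketch, is `itertools.islice` from the standard library (it is not used in the text above); the pipeline is repeated so the example runs on its own:

```python
from itertools import islice

def generate_natural_numbers():
    """yields 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

data = generate_natural_numbers()
evens = (x for x in data if x % 2 == 0)
even_squares = (x ** 2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)

# islice pulls only as many values as requested from the infinite pipeline
first_three = list(islice(even_squares_ending_in_six, 3))
print(first_three)   # [16, 36, 196]
```

Only as many natural numbers as are needed to produce three results get generated; the rest of the infinite sequence is never computed.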

Not infrequently, when we’re iterating over a list or a generator we’ll want not just the values but also their indices. For this common case Python provides an `enumerate` function, which turns values into pairs `(index, value)`:

In [105]:
names = ['Alice', 'Bob', 'Charlie', 'Debbie']

# not pythonic
for i in range(len(names)):
    print(f'name {i} is {names[i]}')
    
# also not pythonic
i = 0
for name in names:
    print(f'name {i} is {names[i]}')
    i += 1
    
# pythonic
for i, name in enumerate(names):
    print(f'name {i} is {name}')

name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie
name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie
name 0 is Alice
name 1 is Bob
name 2 is Charlie
name 3 is Debbie
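
Two further details of `enumerate` are sometimes useful: it accepts a `start` argument, and it works on lazy iterables as well as lists. A brief sketch:

```python
names = ['Alice', 'Bob', 'Charlie', 'Debbie']

# start=1 shifts the counter without touching the values
numbered = [f'{position}. {name}'
            for position, name in enumerate(names, start=1)]

# enumerate is lazy and works on generators too
lengths = (len(name) for name in names)
indexed_lengths = list(enumerate(lengths))

print(numbered)          # ['1. Alice', '2. Bob', '3. Charlie', '4. Debbie']
print(indexed_lengths)   # [(0, 5), (1, 3), (2, 7), (3, 6)]
```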


### Randomness

As we learn data science, we will frequently need to generate random numbers, which we can do with the `random` module:

In [108]:
import random
random.seed(10)  # this ensures we get the same results every time

four_uniform_randoms = [random.random() for __ in range(0, 4)]

four_uniform_randoms

[0.5714025946899135,
 0.4288890546751146,
 0.5780913011344704,
 0.20609823213950174]

The `random` module actually produces _pseudorandom_ (that is, deterministic)
numbers based on an internal state that you can set with `random.seed` if you
want to get reproducible results:

In [109]:
random.seed(10)
print(random.random())

random.seed(10)
print(random.random())

0.5714025946899135
0.5714025946899135


We’ll sometimes use `random.randrange`, which takes either one or two arguments and returns an element chosen randomly from the corresponding `range`:

In [112]:
# choose randomly from range(10) = [0, 1, 2, ..., 9]
random.randrange(10)

9

In [113]:
# choose randomly from range(3, 6) = [3, 4, 5]
random.randrange(3, 6)

3

There are a few more methods that we’ll sometimes find convenient. For example,
`random.shuffle` randomly reorders the elements of a list:

In [114]:
up_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
random.shuffle(up_to_ten)
print(up_to_ten)

[10, 5, 6, 1, 2, 3, 7, 9, 8, 4]


In [115]:
# if you need to randomly pick one element from a list
# you can use random.choice:
my_best_friend = random.choice(['Alice', 'Bob', 'Charlie'])

And if you need to randomly choose a sample of elements without replacement
(i.e., with no duplicates), you can use `random.sample`:

In [116]:
lottery_numbers = range(0, 60)
winning_numbers = random.sample(lottery_numbers, 6)

winning_numbers

[4, 15, 47, 23, 2, 26]

To choose a sample of elements with replacement 
(i.e., allowing duplicates), you can just make multiple calls to `random.choice`:


In [117]:
four_with_replacement = [random.choice(range(0, 10)) for __ in range(0, 4)]
print(four_with_replacement)

[2, 9, 5, 6]
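
As an alternative sketch (not what the cell above does), the standard library also offers `random.choices` (note the plural), available since Python 3.6, which draws `k` elements with replacement in a single call:

```python
import random

random.seed(10)  # only so that this sketch is reproducible

# random.choices samples with replacement; k is the number of draws
four_with_replacement = random.choices(range(0, 10), k=4)

assert len(four_with_replacement) == 4
assert all(0 <= x < 10 for x in four_with_replacement)
```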


### Regular Expressions

Regular expressions provide a way of searching text. They are incredibly useful, but also fairly complicated—so much so that there are entire books written about them. We will get into their details the few times we encounter them; here are a few examples of how to use them in Python:

In [118]:
import re

re_examples = [
    not re.match('a', 'cat'),
    re.search('a', 'cat'),
    not re.search('c', 'dog'),
    3 == len(re.split('[ab]', 'carbs')),
    'R-D-' == re.sub('[0-9]', '-', 'R2D2')
]

assert all(re_examples), 'all the regex examples should be True'
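
To see what the individual calls above return, here is a small sketch that looks at the values directly. `re.findall` is an extra function from the same module that does not appear in the examples above; the variable names are illustrative:

```python
import re

# re.match only looks at the start of the string;
# re.search scans the whole string
assert re.match('a', 'cat') is None
assert re.search('a', 'cat').start() == 1   # 'a' is found at index 1

# re.split and re.sub return ordinary Python objects
pieces = re.split('[ab]', 'carbs')      # split on 'a' or 'b'
masked = re.sub('[0-9]', '-', 'R2D2')   # replace each digit with a dash

# re.findall returns every non-overlapping match as a list of strings
digits = re.findall('[0-9]', 'R2D2')

assert pieces == ['c', 'r', 's']
assert masked == 'R-D-'
assert digits == ['2', '2']
```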

### zip and Argument Unpacking

Often we will need to zip two or more iterables together. The `zip` function transforms multiple iterables into a single iterable of tuples of corresponding elements:

In [121]:
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]

# zip is lazy
zip(list1, list2)

<zip at 0x7feab0bb1050>

In [122]:
[pair for pair in zip(list1, list2)]

[('a', 1), ('b', 2), ('c', 3)]

If the lists are different lengths, `zip` stops as soon as the shortest list ends.
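
A quick sketch of that truncation behavior. When dropping the extra elements is not what you want, `itertools.zip_longest` from the standard library pads the shorter input with `None` instead. The list names here are illustrative:

```python
from itertools import zip_longest

shorter = ['a', 'b']
longer = [1, 2, 3]

# zip stops at the end of the shorter input, whichever position it is in
truncated = list(zip(shorter, longer))
also_truncated = list(zip(longer, shorter))

# zip_longest keeps going and fills the gaps with None
padded = list(zip_longest(shorter, longer))

assert truncated == [('a', 1), ('b', 2)]
assert also_truncated == [(1, 'a'), (2, 'b')]
assert padded == [('a', 1), ('b', 2), (None, 3)]
```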

You can also “unzip” a list using a strange trick:

In [124]:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters, numbers)

('a', 'b', 'c') (1, 2, 3)


The asterisk (`*`) performs argument unpacking, which uses the elements of `pairs` as individual arguments to `zip`. It ends up the same as if you’d
called:


In [125]:
letters, numbers = zip(('a', 1), ('b', 2), ('c', 3))

You can use argument unpacking with any function:

In [126]:
def add(a, b):
    return a + b

print(add(1, 2))

try:
    add([1, 2])
except TypeError:
    print('add expects two inputs')

print(add(*[1, 2]))

3
add expects two inputs
3


### `args` and `kwargs`

Let’s say we want to create a higher-order function that takes as input some function `f` and returns a new function that for any input returns twice the
value of `f`:

In [127]:
def doubler(f):
    # here we define a new function that keeps a reference to f
    def g(x):
        return 2 * f(x)
    # and return that new function
    return g

In [128]:
# this works in some cases:
def f1(x):
    return x + 1

g = doubler(f1)
assert g(3) == 8, '(3 + 1) * 2 should equal 8'
assert g(-1) == 0, '(-1 + 1) * 2 should equal 0'

In [129]:
# however, it doesn't work with functions that take more than a single
# argument
def f2(x, y):
    return x + y

g = doubler(f2)
try:
    g(1, 2)
except TypeError:
    print('as defined, g only takes one argument')

as defined, g only takes one argument


What we need is a way to specify a function that takes arbitrary arguments. We can do this with argument unpacking and a little bit of magic:

In [131]:
def magic(*args, **kwargs):
    print('unnamed args:', args)
    print('keyword args:', kwargs)
    return

magic(1, 2, key='word', key2='word2')

unnamed args: (1, 2)
keyword args: {'key': 'word', 'key2': 'word2'}


That is, when we define a function like this, `args` is a `tuple` of its unnamed arguments and `kwargs` is a `dict` of its named arguments. It works the other
way too, if you want to use a `list` (or `tuple`) and `dict` to supply arguments to a function:

In [137]:
def other_way_magic(x, y, z):
    return x + y + z

x_y_list = [1, 2]
z_dict = {'z': 3}
assert other_way_magic(*x_y_list, **z_dict) == 6, '1 + 2 + 3 should be 6'

You could do all sorts of strange tricks with this; we will only use it to produce higher-order functions whose inputs can accept arbitrary arguments:

In [142]:
def doubler_correct(f):
    """works no matter what kind of inputs f expects"""
    def g(*args, **kwargs):
        """whatever arguments g is supplied, pass them through to f"""
        return 2 * f(*args, **kwargs)
    return g

g = doubler_correct(f2)
assert g(1, 2) == 6, 'doubler should work now'
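
The same wrapper also passes keyword arguments through. A minimal sketch, in which `weighted_sum` is a hypothetical function made up for illustration:

```python
def doubler_correct(f):
    """works no matter what kind of inputs f expects"""
    def g(*args, **kwargs):
        # forward positional and keyword arguments unchanged
        return 2 * f(*args, **kwargs)
    return g

# hypothetical function with a keyword argument that has a default
def weighted_sum(x, y, weight=1):
    return weight * (x + y)

g = doubler_correct(weighted_sum)

assert g(1, 2) == 6                # 2 * (1 * (1 + 2))
assert g(1, 2, weight=10) == 60    # 2 * (10 * (1 + 2))
assert g(x=1, y=2) == 6            # everything passed by keyword
```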

### Type Annotations

Python is a dynamically typed language. That means that in general it doesn’t care about the types of objects we use, as long as we use them in valid ways:

In [144]:
def add(a, b):
    return a + b

assert add(10, 5) == 15, '+ is valid for numbers'
assert add([1, 2], [3]) == [1, 2, 3], '+ is valid for lists'
assert add('hi ', 'there') == 'hi there', '+ is valid for strings'

try:
    add(10, 'five')
except TypeError:
    print('cannot add an int to a string')

cannot add an int to a string


whereas in a statically typed language our functions and objects would have specific types:


In [145]:
def add(a: int, b: int) -> int:
    return a + b

add(10, 5)
add('hi ', 'there')

'hi there'

In fact, recent versions of Python do (sort of) have this functionality. The preceding version of `add` with the type annotations is valid Python 3.6!
However, these type annotations don’t actually do anything. You can still use the annotated function to add strings, and the call to `add(10, 'five')` 
will still raise the exact same `TypeError`.
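
One way to see this for yourself is sketched below: Python stores the annotations on the function object (in its `__annotations__` attribute) but never checks them when the function is called.

```python
def add(a: int, b: int) -> int:
    return a + b

# the annotations are recorded on the function object...
annotations = add.__annotations__
assert annotations == {'a': int, 'b': int, 'return': int}

# ...but they are not enforced at runtime
result = add('hi ', 'there')
assert result == 'hi there'
```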

That said, there are still (at least) four good reasons to use type annotations in your Python code:

- Types are an important form of documentation. This is doubly true in a book that is using code to teach you theoretical and mathematical concepts. Compare the following two function stubs:

```python
def dot_product(x, y):
    ...
    
def dot_product(x: Vector, y: Vector) -> float:
    ...
```

I find the second one far more informative; hopefully you do too. (At this point I have gotten so used to type hinting that I now find untyped Python difficult to read.)



- There are external tools (the most popular is `mypy`) that will read your code, inspect the type annotations, and let you know about type errors before you ever run your code. For example, if you ran `mypy` over a file containing `add('hi ', 'there')`, it would warn you:

`error: Argument 1 to "add" has incompatible type "str"; expected "int"`


Like `assert` testing, this is a good way to find mistakes in your code before you ever run it. The narrative in the book will not involve such a type checker; however, behind the scenes I will be running one, which will help ensure that the book itself is correct.


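- Having to think about the types in your code forces you to design cleaner functions and interfaces.

- Type annotations let your editor help you, for example with autocompletion and by flagging type errors as you write.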

Sometimes people insist that type hints may be valuable on
large projects but are not worth the time for small ones.
However, since type hints take almost no additional time
to type and allow your editor to save you time, I maintain
that they actually allow you to write code more quickly, even for small projects.

#### How to Write Type Annotations

As we’ve seen, for built-in types like `int`, `bool`, and `float`, you just use the type itself as the annotation. What if you had (say) a `list`?
 

In [148]:
def total(xs: list) -> float:
    return sum(xs)


This isn’t wrong, but the type is not specific enough. It’s clear we really want
`xs` to be a `list` of `float`s, not (say) a `list` of strings.

The `typing` module provides a number of parameterized types that we can use to do just this:


In [149]:
from typing import List   # note capital L

def total(xs: List[float]) -> float:
    return sum(xs)

Up until now we’ve only specified annotations for function parameters and return types. For variables themselves it’s usually obvious what the type is:

In [151]:
# this is how to type-annotate variables when you
# define them. but this is unnecessary, it's obvious x is an int
x: int = 5

In [152]:
# however, sometimes it's not obvious
values = []           # what's my type?
best_so_far = None    # what's my type?

In such cases we will supply inline type hints:

In [153]:
from typing import Optional

values: List[int] = []
best_so_far: Optional[float] = None  # allowed to be either float or None
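
As a rough sketch of how these hints might be used together, here is a hypothetical helper, `largest`, that returns `None` for an empty list. The function is made up for illustration:

```python
from typing import List, Optional

def largest(xs: List[float]) -> Optional[float]:
    """hypothetical helper: the largest value, or None if xs is empty"""
    best_so_far: Optional[float] = None
    for x in xs:
        # the first value always wins; later ones must beat the current best
        if best_so_far is None or x > best_so_far:
            best_so_far = x
    return best_so_far

assert largest([]) is None
assert largest([1.5, 3.0, 2.0]) == 3.0
```

The `Optional[float]` return type tells the reader (and a type checker) that callers need to handle the `None` case.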

The `typing` module contains many other types, only a few of which we’ll ever use:

In [156]:
# the type annotations in this snippet are all unnecessary
from typing import Dict, Iterable, Tuple

# keys are strings, values are int
counts: Dict[str, int] = {'data': 1, 'science': 2}

# lists and generators are both iterable
lazy = random.choice([True, False])
if lazy:
    evens: Iterable[int] = (x for x in range(0, 10) if x % 2 == 0)
else:
    evens = [0, 2, 4, 6, 8]
    
# tuples specify a type for each element
triple: Tuple[int, float, int] = (10, 2.3, 5)
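
Annotating a parameter as `Iterable[int]` signals that the function only needs to loop over its input. A small sketch, where `total_of` is a hypothetical function name:

```python
from typing import Iterable

def total_of(xs: Iterable[int]) -> int:
    """hypothetical helper: accepts anything we can iterate over"""
    return sum(xs)

# a list and a generator expression are both valid Iterable[int] arguments
from_list = total_of([0, 2, 4, 6, 8])
from_generator = total_of(x for x in range(0, 10) if x % 2 == 0)

assert from_list == 20
assert from_generator == 20
```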

Finally, since Python has first-class functions, we need a type to represent those as well. Here’s a pretty contrived example:

In [159]:
from typing import Callable

# the type hint says that repeater is a function that takes
# two arguments, a string and an int, and returns a string
def twice(repeater: Callable[[str, int], str], s: str) -> str:
    return repeater(s, 2)



#             a fun | arg1 | arg2 | return
# repeater: Callable[[str,  int],   str]
# ---> Callable[[types_of_the_args, ...], return_type]


def comma_repeater(s: str, n: int) -> str:
    n_copies = [s for __ in range(0, n)]
    return ', '.join(n_copies)


assert twice(comma_repeater, 'type hints') == 'type hints, type hints'

As type annotations are just Python objects, we can assign them to variables to make them easier to refer to:

In [160]:
Number = int
Numbers = List[Number]

def total(xs: Numbers) -> Number:
    return sum(xs)
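
The same trick can give a name to the `Vector` type used in the `dot_product` stub earlier. The sketch below assumes `Vector` simply means a list of floats, and the function body is only an illustrative implementation:

```python
from typing import List

# assumption for this sketch: a Vector is just a list of floats
Vector = List[float]

def dot_product(x: Vector, y: Vector) -> float:
    """sum of the elementwise products (illustrative implementation)"""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

# 1*4 + 2*5 + 3*6 = 32
assert dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == 32.0
```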