# A Crash Course in Python

## Whitespace Formatting

Many languages use curly braces to delimit blocks of code. Python uses
indentation:

In [None]:
# The pound sign marks the start of a comment. Python itself
# ignores the comments, but they're helpful for anyone reading the code.

for i in [1,2,3,4,5]:
    print(i) # first line in "for i" block
    for j in [1,2,3,4,5]:
        print(j) # first line in "for j" block
        print(i+j) # last line in "for j" block
    print(i) # last line in "for i" block
print("Done Looping")

Whitespace is ignored inside parentheses and brackets, which can be
helpful for long-winded computations:


In [3]:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
                           13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

long_winded_computation

210

and for making code easier to read:


In [4]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [5]:
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]

## Modules

Certain features of Python are not loaded by default. These include both
features that are part of the language and third-party features that you
download yourself. In order to use these features, you’ll need to import the
modules that contain them.

One approach is to simply import the module itself:


In [7]:
import re

my_regex = re.compile("[0-9]+", re.I)

Here, re is the module containing functions and constants for working with
regular expressions. After this type of import you must prefix those
functions with re. in order to access them.


If you already had a different re in your code, you could use an alias:

In [6]:
import re as regex

my_regex = regex.compile("[0-9]+", regex.I)

You might also do this if your module has an unwieldy name or if you’re
going to be typing it a lot. For example, a standard convention when
visualizing data with matplotlib is:


In [None]:
import matplotlib.pyplot as plt
plt.plot(...)

If you need a few specific values from a module, you can import them
explicitly and use them without qualification:

In [None]:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()


If you were a bad person, you could import the entire contents of a module
into your namespace, which might inadvertently overwrite variables you’ve
already defined:

In [None]:
match = 10
from re import * # uh oh, re has a match function
print(match) # "<function match at 0x10281e6a8>"
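
By contrast, importing the module itself keeps its names behind the re. prefix, so a variable you already defined survives. A minimal sketch, with a made-up sample string:

```python
import re

match = 10                              # our own variable
found = re.match("[0-9]+", "123abc")    # the module's function stays behind the prefix

print(match)          # still our integer, 10
print(found.group())  # the matched digits, 123
```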

## Functions

A function is a rule for taking zero or more inputs and returning a
corresponding output. In Python, we typically define functions using def:

In [None]:
def double(x):
    """
    This is where you put an optional docstring that explains what the
    function does. For example, this function multiplies its input by 2.
    """
    return x * 2

Python functions are first-class, which means that we can assign them to
variables and pass them into functions just like any other arguments:

In [None]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

In [None]:
my_double = double # refers to the previously defined function
x = apply_to_one(my_double) # equals 2
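
Because functions are ordinary values, they can also be stored in lists and passed along with other arguments. The sketch below is only an illustration; apply_twice is a hypothetical helper that does not appear in the original text:

```python
def double(x):
    """Multiplies its input by 2."""
    return x * 2

def apply_twice(f, x):
    """Hypothetical helper: calls f on x, then calls f on the result."""
    return f(f(x))

functions = [double, abs]              # functions can be stored in a list
result = apply_twice(double, 3)        # double(double(3)) equals 12
outputs = [f(-4) for f in functions]   # [-8, 4]
```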

It is also easy to create short anonymous functions, or lambdas:

In [None]:
y = apply_to_one(lambda x: x + 4) # equals 5

You can assign lambdas to variables, although most people will tell you that
you should just use def instead:

In [None]:
another_double = lambda x: 2 * x # don't do this

In [None]:
def another_double(x):
    """Do this instead"""
    return 2 * x

Function parameters can also be given default arguments, which only need
to be specified when you want a value other than the default:

In [None]:
def my_print(message="my default message"):
    print(message)

my_print("hello") # prints 'hello'
my_print() # prints 'my default message'

It is sometimes useful to specify arguments by name:

In [11]:
def full_name(first = "What's-his-name", last = "Something"):
    return first + " " + last

full_name("Joel", "Grus") # "Joel Grus"
full_name("Joel") # "Joel Something"
full_name(last="Grus") # "What's-his-name Grus"

"What's-his-name Grus"
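
Positional and named arguments can be mixed, as long as the positional ones come first. A small sketch that redefines full_name so it runs on its own:

```python
def full_name(first="What's-his-name", last="Something"):
    return first + " " + last

a = full_name("Joel", last="Grus")        # positional first, then a named argument
b = full_name(last="Grus", first="Joel")  # named arguments can come in any order
```

Both calls return "Joel Grus".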

## Strings

In [1]:
# Strings can be delimited by single or double quotation marks
# (but the quotes have to match)

single_quoted_string = 'data science'
double_quoted_string = "data science" 

In [10]:
# Python uses backslashes to encode special characters. 
# For example:

tab_string = "\t" # represents the tab character
len(tab_string) # is 1

1

In [9]:
print(tab_string)

	


If you want backslashes as backslashes (which you might in Windows
directory names or in regular expressions), you can create raw strings using
r"":

In [8]:
not_tab_string = r"\t" # represents the characters '\' and 't'
len(not_tab_string) # is 2

2

In [7]:
print(not_tab_string)

\t


You can create multiline strings using three double quotes:

In [None]:
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""
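
A multiline string is an ordinary string whose line breaks are stored as newline characters. One way to see this, as a rough sketch:

```python
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

lines = multi_line_string.splitlines()        # split on the embedded newlines
num_lines = len(lines)                        # equals 3
num_newlines = multi_line_string.count("\n")  # equals 2
```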

In [12]:
first_name = "Joel"
last_name = "Grus"

In [13]:
full_name1 = first_name + " " + last_name # string addition
full_name2 = "{0} {1}".format(first_name, last_name) # string.format

String addition and the format method both work, but the f-string way
(available since Python 3.6) is much less unwieldy:

In [14]:
full_name3 = f"{first_name} {last_name}"

In [15]:
full_name1

'Joel Grus'

In [16]:
full_name2

'Joel Grus'

In [17]:
full_name3

'Joel Grus'
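
The braces in an f-string can hold any expression, optionally followed by a format specification. A short sketch; the grade value is made up for illustration:

```python
first_name = "Joel"
last_name = "Grus"
grade = 80  # hypothetical value, not from the original text

initials = f"{first_name[0]}.{last_name[0]}."               # "J.G."
summary = f"{first_name.upper()} scored {grade / 100:.1%}"  # "JOEL scored 80.0%"
```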

## Exceptions

When something goes wrong, Python raises an exception. Unhandled,
exceptions will cause your program to crash. You can handle them using
"try" and "except":

In [18]:
try:
    print(0/0)
except ZeroDivisionError:
    print("cannot divide by zero.")

cannot divide by zero.
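
The same pattern works inside a function, and "as" gives the exception a name so you can inspect it. This is only a sketch; safe_divide is a hypothetical helper, not something defined in the original text:

```python
def safe_divide(x, y):
    """Hypothetical helper: returns x / y, or None when y is zero."""
    try:
        return x / y
    except ZeroDivisionError as error:
        print(f"cannot divide: {error}")
        return None

ok = safe_divide(6, 3)   # equals 2.0
bad = safe_divide(1, 0)  # prints a message and returns None
```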


## Lists

In [33]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

In [34]:
list_length = len(integer_list) # equals 3
list_sum = sum(integer_list) # equals 6

You can get or set the "nth" element of a list with square brackets:


In [35]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

zero = x[0] # equals 0, lists are 0-indexed
one = x[1] # equals 1
nine = x[-1] # equals 9, 'Pythonic' for last element
eight = x[-2] # equals 8, 'Pythonic' for next-to-last element
x[0] = -1 # now x is [-1, 1, 2, 3, ..., 9]

You can also use square brackets to slice lists.

In [36]:
first_three = x[:3] # [-1, 1, 2]
three_to_end = x[3:] # [3, 4, ..., 9]
one_to_four = x[1:5] # [1, 2, 3, 4]
last_three = x[-3:] # [7, 8, 9]
without_first_and_last = x[1:-1] # [1, 2, ..., 8]
copy_of_x = x[:] # [-1, 1, 2, ..., 9]

You can similarly slice strings and other “sequential” types.
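
For instance, the same bracket syntax applies to a string and to a tuple (a small sketch with made-up values):

```python
s = "data science"
first_word = s[:4]   # "data"
last_word = s[-7:]   # "science"

t = (0, 1, 2, 3, 4)
middle = t[1:-1]     # (1, 2, 3); slicing a tuple gives a tuple
```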

A slice can take a third argument to indicate its stride, which can be
negative:

In [37]:
every_third = x[::3] # [-1, 3, 6, 9]
five_to_three = x[5:2:-1] # [5, 4, 3]

Python has an "in" operator to check for list membership:

In [38]:
1 in [1, 2, 3] # True
0 in [1, 2, 3] # False

False

It is easy to concatenate lists together. If you want to modify a list in place,
you can use extend to add items from another collection:

In [39]:
x = [1, 2, 3]
x.extend([4, 5, 6]) # x is now [1, 2, 3, 4, 5, 6]

If you don’t want to modify x, you can use list addition:


In [40]:
x = [1, 2, 3]
y = x + [4, 5, 6] # y is [1, 2, 3, 4, 5, 6]; x is unchanged

More frequently you will append to lists one item at a time:

In [41]:
x = [1, 2, 3]
x.append(0) # x is now [1, 2, 3, 0]
y = x[-1] # equals 0
z = len(x) # equals 4

It’s often convenient to unpack lists when you know how many elements
they contain:

In [42]:
x, y = [1, 2] # now x is 1, y is 2

You will get a ValueError, however, if you don’t have the same number of
elements on both sides.

A common idiom is to use an underscore for a value you’re going to throw
away:


In [43]:
_, y = [1, 2] # now y == 2, didn't care about the first element

In [44]:
y

2
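
The ValueError from mismatched unpacking can be seen directly. In this sketch the fallback values are arbitrary choices made for illustration:

```python
try:
    x, y = [1, 2, 3]      # three values, but only two names
except ValueError:
    print("wrong number of elements to unpack")
    x, y = None, None     # arbitrary fallback for this example

_, second, _ = [1, 2, 3]  # keep only the middle element
```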

## Tuples

Tuples are lists’ immutable cousins. Pretty much anything you can do to a
list that doesn’t involve modifying it, you can do to a tuple.

In [46]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4
my_list[1] = 3 # my_list is now [1, 3]

In [47]:
try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")

cannot modify a tuple


Tuples are a convenient way to return multiple values from functions:

In [48]:
def sum_and_product(x, y):
    return (x + y), (x * y)

In [49]:
sp = sum_and_product(2, 3) # sp is (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50

Tuples (and lists) can also be used for multiple assignment:

In [50]:
x, y = 1, 2 # now x is 1, y is 2
x, y = y, x # Pythonic way to swap variables; now x is 2, y is 1

In [51]:
x

2

In [52]:
y

1

## Dictionaries

Another fundamental data structure is a dictionary, which associates values
with keys and allows you to quickly retrieve the value corresponding to a
given key:

In [74]:
empty_dict = {} # Pythonic
empty_dict2 = dict() # less Pythonic
grades = {"Joel": 80, "Tim": 95}

In [62]:
joels_grade = grades["Joel"]
joels_grade

80

But you’ll get a KeyError if you ask for a key that’s not in the dictionary:

In [63]:
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")


no grade for Kate!


You can check for the existence of a key using "in":


In [64]:
joel_has_grade = "Joel" in grades # True
kate_has_grade = "Kate" in grades # False

In [65]:
joel_has_grade

True

In [66]:
kate_has_grade

False

Dictionaries have a "get" method that returns a default value (instead of
raising an exception) when you look up a key that’s not in the dictionary:


In [79]:
joels_grade = grades.get("Joel", 0) # equals 80
kates_grade = grades.get("Kate", 0) # equals 0
no_ones_grade = grades.get("No One") # default is None

In [80]:
kates_grade

0

In [81]:
num_students = len(grades) 
num_students

2

In [82]:
grades["Tim"] = 99 # replaces the old value
grades["Kate"] = 100 # adds a third entry
num_students = len(grades) 
num_students

3

In [83]:
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}

Besides looking for specific keys, we can look at all of them:

In [84]:
tweet_keys = tweet.keys() # iterable for the keys
tweet_values = tweet.values() # iterable for the values
tweet_items = tweet.items() # iterable for the (key, value) tuples

In [85]:
tweet_keys

dict_keys(['user', 'text', 'retweet_count', 'hashtags'])

In [86]:
tweet_values

dict_values(['joelgrus', 'Data Science is Awesome', 100, ['#data', '#science', '#datascience', '#awesome', '#yolo']])

In [87]:
tweet_items

dict_items([('user', 'joelgrus'), ('text', 'Data Science is Awesome'), ('retweet_count', 100), ('hashtags', ['#data', '#science', '#datascience', '#awesome', '#yolo'])])

In [None]:
"user" in tweet_keys # True, but not Pythonic
"user" in tweet # Pythonic way of checking for keys
"joelgrus" in tweet_values # True (slow but the only way to check)

Dictionary keys must be “hashable”; in particular, you cannot use lists as
keys. If you need a multipart key, you should probably use a tuple or figure
out a way to turn the key into a string.
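
A quick sketch of the difference; the city names and the distance are made-up illustration values:

```python
distances = {}
distances[("Seattle", "Portland")] = 174  # a tuple is hashable, so this works

try:
    distances[["Seattle", "Portland"]] = 174  # a list is not hashable
    list_key_allowed = True
except TypeError:
    list_key_allowed = False  # this branch runs
```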

### defaultdict

Imagine that you’re trying to count the words in a document. An obvious
approach is to create a dictionary in which the keys are words and the
values are counts. As you check each word, you can increment its count if
it’s already in the dictionary and add it to the dictionary if it’s not:

In [88]:
document = ["data", "science", "is", "data"] # hypothetical sample; any list of words works
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

You could also use the “forgiveness is better than permission” approach and
just handle the exception from trying to look up a missing key:

In [90]:
# document is assumed to be a list of words
word_counts = {}
for word in document:
    try:
        word_counts[word] += 1
    except KeyError:
        word_counts[word] = 1

A third approach is to use get, which behaves gracefully for missing keys:

In [None]:
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1


Every one of these is slightly unwieldy, which is why defaultdict is
useful. A defaultdict is like a regular dictionary, except that when you try
to look up a key it doesn’t contain, it first adds a value for it using a
zero-argument function you provided when you created it. In order to use
defaultdicts, you have to import them from collections:

In [None]:
from collections import defaultdict
word_counts = defaultdict(int) # int() produces 0
for word in document:
    word_counts[word] += 1
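
One consequence worth seeing: merely looking up a missing key with square brackets stores the default value, whereas get leaves the dictionary untouched. A minimal sketch:

```python
from collections import defaultdict

word_counts = defaultdict(int)
before = "science" in word_counts  # False, nothing stored yet
count = word_counts["science"]     # the lookup calls int() and stores 0
after = "science" in word_counts   # True, the lookup added the key
peek = word_counts.get("data", 0)  # get does not add the key
```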

They can also be useful with list or dict, or even your own functions:


In [None]:
dd_list = defaultdict(list) # list() produces an empty list
dd_list[2].append(1) # now dd_list contains {2: [1]}

dd_dict = defaultdict(dict) # dict() produces an empty dict
dd_dict["Joel"]["City"] = "Seattle" # {"Joel": {"City": "Seattle"}}

dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1 # now dd_pair contains {2: [0, 1]}

These will be useful when we’re using dictionaries to “collect” results by
some key and don’t want to have to check every time to see if the key exists
yet.
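
As a rough sketch of "collecting" by key, the following groups a made-up list of words by first letter without ever checking whether a letter has been seen before:

```python
from collections import defaultdict

words = ["data", "science", "dictionary", "string", "default"]  # sample input

words_by_letter = defaultdict(list)
for word in words:
    words_by_letter[word[0]].append(word)  # a missing key starts as an empty list

# words_by_letter["d"] is ["data", "dictionary", "default"]
# words_by_letter["s"] is ["science", "string"]
```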