# A Crash Course in Python

## The Basics

### Whitespace Formatting
Many languages use curly braces to delimit blocks of code. Python uses **indentation**:

In [None]:
for i in [1, 2, 3, 4, 5]:
    print(i)
    for j in [1, 2, 3, 4, 5]:
        print(j)
        print(i + j)
    print(i)
print("done looping")

Whitespace is **ignored** inside parentheses and brackets, which helps with long-winded computations:

In [None]:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
                           13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

Ignoring whitespace inside brackets also helps make code easier to read:

In [None]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
easier_to_read_list_of_lists = [ [1, 2, 3],
                                 [4, 5, 6],
                                 [7, 8, 9] ]

Use a **backslash** to indicate that a statement continues onto the next line

In [None]:
two_plus_three = 2 + \
                 3

### Modules

**Import** the modules that contain the features you need.
- re is the module containing functions and constants for working with regular expressions.

In [None]:
import re
my_regex = re.compile("[0-9]+", re.I)

You may use an **alias**

In [None]:
import re as regex
my_regex = regex.compile("[0-9]+", regex.I)

In [None]:
import matplotlib.pyplot as plt

You can import them explicitly and use them **without qualification**

In [None]:
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()

You could import the entire contents of a module into your
namespace, which might inadvertently overwrite variables you’ve already defined:

In [None]:
match = 10
from re import *     # uh oh, re has a match function
print(match)          # prints something like <function match at 0x...>, not 10

### Arithmetic

Remember the **quotient-remainder theorem**:

$$n = d \cdot q + r$$

- $d$ is the divisor, $q$ is the quotient, $r$ is the remainder,
- $0 \leq r \lt d$ when $d$ is positive and $d \lt r \leq 0$ when $d$ is negative (in Python the remainder takes the sign of the divisor)

In [None]:
print(2 ** 10)       # 1024
print(2 ** 0.5)      # 1.414...
print(2 ** -0.5)     # 0.707...
print(5 / 2)         # 2.5
print(5 % 3)         # 2
print(5 // 3)        # 1
print((-5) % 3)      # 1
print((-5) // 3)     # -2
print(5 % (-3))      # -1
print((-5) // (-3))  # 1
print((-5) % (-3))   # -2
print(7.2 // 3.5)    # 2.0
print(7.2 % 3.5)     # 0.2 (approximately, due to floating-point rounding)
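
The identity $n = d \cdot q + r$ can be checked directly for the integer cases above. This is a minimal sketch; check_division is a hypothetical helper name introduced only for illustration, and divmod is the built-in that returns the pair (n // d, n % d).

```python
def check_division(n, d):
    """Return (q, r) for n divided by d and verify the quotient-remainder identity."""
    q, r = divmod(n, d)              # same as (n // d, n % d)
    assert n == d * q + r            # n = d * q + r
    # the remainder lies between 0 and d, on the side of the divisor's sign
    assert (0 <= r < d) if d > 0 else (d < r <= 0)
    return q, r

for n, d in [(5, 3), (-5, 3), (5, -3), (-5, -3)]:
    print(n, d, check_division(n, d))
```

Floor division rounds the quotient toward negative infinity, which is why (-5) // 3 is -2 rather than -1.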

### Functions

- A function is a rule for taking zero or more inputs and returning a corresponding output

In [1]:
# This
# is 
# a
# comment
# CTRL + / toggles comment/uncomment


# for PEP on docstring, refer to https://www.python.org/dev/peps/pep-0257/#abstract
def double(x):
    """this is where you put an optional docstring
    that explains what the function does.
    for example, this function multiplies its input by 2"""
    return x * 2

double(2)

4

In [None]:
help(double)

Python functions are **first-class**, which means that we can assign them to variables and
pass them into functions just like any other arguments:

In [None]:
def apply_to_one(f):
    """calls the function f with 1 as its argument"""
    return f(1)

my_double = double
x = apply_to_one(my_double)

print(x)

**Lambda function**: short anonymous functions

In [None]:
y = apply_to_one(lambda x: x + 4)

print(y)

In [None]:
another_double = lambda x: 2 * x
def another_double(x): return 2 * x   # more readable

In [None]:
add = lambda x, y : x + y
add(1,2)

Function parameters can also be given **default arguments**

In [None]:
def my_print(message="my default message"):
    print(message)
    
my_print("hello")  # prints 'hello'
my_print()         # prints 'my default message'

In [None]:
def subtract(a=0, b=0):
    return a - b

subtract(10, 5) # returns 5
subtract(0, 5)  # returns -5
subtract(b=5)   # same as previous
subtract(b=5, a=20) # returns 15; keyword arguments can be given in any order

### Strings

- Strings can be delimited by single or double quotation marks

In [None]:
single_quoted_string = 'data science'
double_quoted_string = "data science"

In [None]:
tab_string = "\t"   # represents the tab character
len(tab_string)     # is 1

Multiline strings are written with triple double quotes:

In [None]:
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

In [None]:
multi_line_string

### Exceptions

- When something goes wrong, Python raises an exception.

In [None]:
try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")

## Lists

- the most fundamental data structure in Python

In [None]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [ integer_list, heterogeneous_list, [] ]

list_length = len(integer_list)    # equals 3
list_sum    = sum(integer_list)    # equals 6

You can get or set the nth element of a list with square brackets

In [None]:
x = list(range(10))      # is the list [0, 1, ..., 9]
zero = x[0]        # equals 0, lists are 0-indexed
one = x[1]         # equals 1
nine = x[-1]       # equals 9, 'Pythonic' for last element
eight = x[-2]      # equals 8, 'Pythonic' for next-to-last element
x[0] = -1          # now x is [-1, 1, 2, 3, ..., 9]

You can also use square brackets to “slice” lists:

In [None]:
first_three = x[:3]               # [-1, 1, 2]
three_to_end = x[3:]              # [3, 4, ..., 9]
one_to_four = x[1:5]              # [1, 2, 3, 4]
last_three = x[-3:]               # [7, 8, 9]
without_first_and_last = x[1:-1]  # [1, 2, ..., 8]
copy_of_x = x[:]                  # [-1, 1, 2, ..., 9]


In [None]:
print(x[::2])
print(x[::-1])

Use the **in** operator to check for list membership:

In [None]:
1 in [1, 2, 3]  # True
0 in [1, 2, 3]  # False

To concatenate lists together:

In [None]:
x = [1, 2, 3]
x.extend([4, 5, 6])  # x is now [1,2,3,4,5,6]

In [None]:
x = [1, 2, 3]
y = x + [4, 5, 6]  # y is [1, 2, 3, 4, 5, 6]; x is unchanged

To append to lists one item at a time:

In [None]:
x = [1, 2, 3]
x.append(0)  # x is now [1, 2, 3, 0]
y = x[-1]    # equals 0
z = len(x)   # equals 4


It is convenient to unpack lists:

In [None]:
x, y = [1, 2]  # now x is 1, y is 2

In [None]:
_, y = [1, 2] # now y == 2, didn't care about the first element

### Tuples

- Tuples are lists’ **immutable** cousins.

In [None]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4
my_list[1] = 3   # my_list is now [1, 3]

try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")


Tuples are a convenient way to **return multiple values** from functions:

In [None]:
def sum_and_product(x, y):
    return (x + y),(x * y)

sp = sum_and_product(2, 3)    # equals (5, 6)
s, p = sum_and_product(5, 10) # s is 15, p is 50

Tuples (and lists) can also be used for **multiple assignment**:

In [None]:
x, y = 1, 2   # now x is 1, y is 2
x, y = y, x   # Pythonic way to swap variables; now x is 2, y is 1


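Python tuple syntax

https://wiki.python.org/moin/TupleSyntax

A Python tuple is defined by the comma, not by the parentheses:
1,
1,2,
1,2,3,
The trailing comma is required only for a one-element tuple. Parentheses are optional, except for the empty tuple ().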
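
The comma rule is easy to verify at the prompt. A short sketch; the variable names a through e are arbitrary and used only for illustration:

```python
a = 1,          # the trailing comma makes this a one-element tuple
b = (1)         # parentheses alone only group: this is the int 1
c = (1,)        # the same one-element tuple, written with parentheses
d = ()          # the empty tuple is the one case that needs parentheses
e = 1, 2, 3     # commas without parentheses still build a tuple

print(type(a), type(b), type(c), type(d), type(e))
```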

### Dictionaries

- Another fundamental data structure which associates **values with keys**
- It allows you to quickly retrieve the value corresponding to a given key:

In [None]:
empty_dict = {}                      # Pythonic
empty_dict2 = dict()                 # less Pythonic
grades = { "Joel" : 80, "Tim" : 95 } # dictionary literal

You can look up the value for a **key using square brackets**:

In [None]:
joels_grade = grades["Joel"]         # equals 80

In [None]:
grades = { "Joel" : 80, "Tim" : 95, "Tim" : 94 } # dictionary literal
grades

In [None]:
len(grades)

In [None]:
grades.keys()

In [None]:
grades.values()

In [None]:
grades["Tim"]

**KeyError** if you ask for a key that’s not in the dictionary:

In [None]:
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")

You can **check for the existence of a key** using **in**:

In [None]:
joel_has_grade = "Joel" in grades    # True
kate_has_grade = "Kate" in grades    # False

Dictionaries have a get method that returns a default value (**instead of raising an
exception**) when you look up a key that’s not in the dictionary:

In [None]:
joels_grade = grades.get("Joel", 0)    # equals 80
kates_grade = grades.get("Kate", 0)    # equals 0
no_ones_grade = grades.get("No One")   # the default default is None
no_ones_grade is None                  # True

You assign key-value pairs using the same square brackets:

In [None]:
grades["Tim"] = 99          # replaces the old value
grades["Kate"] = 100        # adds a third entry
num_students = len(grades)  # equals 3

We will frequently use dictionaries as a simple way to represent **structured data**:

In [None]:
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}

**Iteration**: we can look at all of them

In [None]:
tweet_keys   = tweet.keys()     # view of the keys (not a list in Python 3)
tweet_values = tweet.values()   # view of the values
tweet_items  = tweet.items()    # view of (key, value) tuples

"user" in tweet_keys            # True; in Python 3 a keys view also uses hashing
"user" in tweet                 # more Pythonic, uses the fast dict in

"joelgrus" in tweet_values      # True

In [None]:
tweet_values
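
In Python 3, keys(), values(), and items() return view objects rather than lists. The sketch below uses a small made-up grades dictionary to show that a view tracks later changes to the dictionary, and that list() takes a snapshot:

```python
grades = {"Joel": 80, "Tim": 95}    # illustrative data only

keys_view = grades.keys()           # a live view, not a copy
keys_snapshot = list(grades)        # an independent list of the current keys

grades["Kate"] = 100                # modify the dictionary afterwards

print(list(keys_view))              # the view now includes "Kate"
print(keys_snapshot)                # the snapshot does not
```

Wrap a view in list() when you need indexing or a copy that stays fixed.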

**WordCount Example**: Create a dictionary in which the keys are words and the values are counts.

In [None]:
document = ['I', 'am', 'a', 'boy', 'I', 'love', 'you']

**First Approach:**

In [None]:
word_counts = {}
for word in document:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

**Second Approach:**

In [None]:
word_counts = {}
for word in document:
    try:
        word_counts[word] += 1
    except KeyError:
        word_counts[word] = 1


**Third Approach:**

In [None]:
word_counts = {}
for word in document:
    previous_count = word_counts.get(word, 0)
    word_counts[word] = previous_count + 1
    
word_counts

In [None]:
word_counts

### defaultdict

- A defaultdict is like a regular dictionary, except that when you try to look up a key it doesn’t contain, it first adds a value for it using a **zero-argument function** you provided when you created it.

In [None]:
from collections import defaultdict

word_counts = defaultdict(int)    # int() produces 0
# word_counts = defaultdict(lambda: 100)    # returns 100
for word in document:
    word_counts[word] += 1

print(word_counts)

In [None]:
int()

In [None]:
dd_list = defaultdict(list)          # list() produces an empty list
dd_list[2].append(1)                 # now dd_list contains {2: [1]}

dd_dict = defaultdict(dict)          # dict() produces an empty dict
dd_dict["Joel"]["City"] = "Seattle"  # { "Joel" : { "City" : "Seattle" } }
dd_pair = defaultdict(lambda: [0, 0])
dd_pair[2][1] = 1                    # now dd_pair contains {2: [0,1]}
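
One consequence of "adds a value on lookup" is that merely reading a missing key with square brackets grows the dictionary. A minimal sketch with a throwaway counts dictionary (the keys "a", "b", "c" are arbitrary):

```python
from collections import defaultdict

counts = defaultdict(int)

value = counts["a"]           # square-bracket lookup inserts "a" with value 0
has_b = "b" in counts         # a membership test does not insert anything
c_value = counts.get("c")     # get() does not insert either; returns None

print(dict(counts))           # only "a" was added
```

When you only want to inspect a defaultdict, use in or get rather than square brackets.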

### Counter

- A Counter turns a sequence of values into a defaultdict(int)-like object mapping keys to counts.
- We will primarily use it to create **histograms**

In [None]:
from collections import Counter
c = Counter([0, 1, 2, 0])  # c is (basically) { 0 : 2, 1 : 1, 2 : 1 }

In [None]:
word_counts = Counter(document) 
word_counts

Use the **help** function to see the built-in documentation:

In [None]:
help(word_counts)

In [None]:
# print the 10 most common words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)


### Sets

- Another data structure is **set**, which represents a collection of **distinct** elements:

In [None]:
s = set()
s.add(1)      # s is now { 1 }
s.add(2)      # s is now { 1, 2 }
s.add(2)      # s is still { 1, 2 }
x = len(s)    # equals 2
y = 2 in s    # equals True
z = 3 in s    # equals False

- For a membership test, a set is more appropriate than a list
- **in** is a very fast operation on sets.

In [None]:
hundreds_of_other_words = []
stopwords_list = ["a","an","at"] + hundreds_of_other_words + ["yet", "you"]

"zip" in stopwords_list  # False, but have to check every element

stopwords_set = set(stopwords_list)
"zip" in stopwords_set  # very fast to check; use hashing to find a member

To find the **distinct** items in a collection:

In [None]:
item_list = [1, 2, 3, 1, 2, 3]
num_items = len(item_list)          # 6
item_set = set(item_list)           # {1, 2, 3}
num_distinct_items = len(item_set)  # 3
distinct_item_list = list(item_set) # [1, 2, 3]
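
The earlier claim that **in** is fast on sets can be measured with the standard timeit module. This is a rough sketch on made-up data (100,000 generated strings); the actual timings depend on the machine, so none are shown here:

```python
import timeit

# hypothetical data: 100,000 made-up "words"
words_list = ["word" + str(i) for i in range(100_000)]
words_set = set(words_list)

# "zip" is absent, so the list has to be scanned from start to end each time,
# while the set only needs a hash lookup
list_time = timeit.timeit(lambda: "zip" in words_list, number=100)
set_time = timeit.timeit(lambda: "zip" in words_set, number=100)

print("list:", list_time, "seconds for 100 lookups")
print("set: ", set_time, "seconds for 100 lookups")
```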

### Control Flow

**if** statement:

In [None]:
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3:
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"

A **ternary** if-then-else on one line:

In [None]:
parity = "even" if x % 2 == 0 else "odd"

**while** statement:

In [None]:
x = 0
while x < 10:
    print(x, "is less than 10")
    x += 1

**for** statement

In [None]:
for x in range(10):
    print(x, "is less than 10")

**continue** and **break** statements:

In [None]:
for x in range(10):
    if x == 3:
        continue   # go immediately to the next iteration
    if x == 5:
        break      # quit the loop entirely
    print(x)

### Truthiness

In [None]:
one_is_less_than_two = 1 < 2       # equals True
true_equals_false = True == False  # equals False

Python uses the value **None** to indicate a nonexistent value

In [None]:
x = None
print(x == None)   # prints True, but is not Pythonic
print(x is None)   # prints True, and is Pythonic

The following are all “Falsy”:

```python
False
None
[] : (an empty list)
{} : (an empty dict)
""
set()
0
0.0
```

In [None]:
s = 'abc'
if s:
    first_char = s[0]
else:
    first_char = ""


In [None]:
first_char = s and s[0]   # A simpler way of doing the same

In [None]:
x = None
safe_x = x or 0    # if x is either a number or possibly None
safe_x

- Python has an **all** function, which takes a list and returns True precisely when every element is truthy, and 
- an **any** function, which returns True when at least one element is truthy:

In [None]:
all([True, 1, { 3 }])    # True
all([True, 1, {}])       # False, {} is falsy
any([True, 1, {}])       # True, True is truthy
all([])                  # True, no falsy elements in the list
any([])                  # False, no truthy elements in the list

In [None]:
all([] + [True, True]) == (all([]) and all([True, True]))   # True; parentheses needed because == binds tighter than and

In [None]:
any([] + [True, False]) == (any([]) or any([True, False]))  # True; parentheses needed because == binds tighter than or
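
Defining all([]) as True and any([]) as False is what lets both functions split across list concatenation. The sketch below checks that property over a handful of arbitrary sample lists (the samples are made up for illustration):

```python
from itertools import product

samples = [[], [True], [False], [True, False], [1, {}], [0, ""]]

for a, b in product(samples, repeat=2):
    # all of a + b is truthy exactly when all of a and all of b are
    assert all(a + b) == (all(a) and all(b))
    # some element of a + b is truthy exactly when some element of a or of b is
    assert any(a + b) == (any(a) or any(b))

print("both identities hold for", len(samples) ** 2, "pairs of lists")
```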

## The Not-So-Basics

### Sorting

In [None]:
x = [4,1,2,3]
y = sorted(x)    # is [1,2,3,4], x is unchanged
x.sort()         # now x is [1,2,3,4]

In [None]:
# sort the list by absolute value from largest to smallest
x = sorted([-4,1,-2,3], key=abs, reverse=True) # is [-4,3,-2,1]
# sort the words and counts from highest count to lowest
wc = sorted(word_counts.items(),
            key=lambda x: x[1],
            reverse=True)

wc

### List Comprehensions

- Frequently, you’ll want to transform a list into another list, by choosing only certain elements, or by transforming elements, or both. The Pythonic way of doing this is a list comprehension.
- Prefer a list comprehension whenever it expresses the transformation clearly.

In [None]:
even_numbers = [x for x in range(5) if x % 2 == 0]  # [0, 2, 4]
squares      = [x * x for x in range(5)]            # [0, 1, 4, 9, 16]
even_squares = [x * x for x in even_numbers]        # [0, 4, 16]

You can similarly turn lists into dictionaries or sets:

In [None]:
square_dict = { x : x * x for x in range(5) }  # { 0:0, 1:1, 2:4, 3:9, 4:16 }
square_set = { x * x for x in [1, -1] }        # { 1 }       

- If you don’t need the value from the list, it’s conventional to use an underscore as the variable:

In [None]:
zeroes = [0 for _ in even_numbers]    # has the same length as even_numbers

A list comprehension can include multiple **for**s:

In [None]:
pairs = [(x, y)
         for x in range(10)
         for y in range(10)] # 100 pairs (0,0) (0,1) ... (9,8), (9,9)

This makes it easy to compute a distance matrix.

Later **for**s can use the results of earlier ones:

In [None]:
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x + 1, 10)]
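
A comprehension with several **for**s reads in the same order as the equivalent nested loops. As a sketch, the loop version below builds the same list (increasing_pairs_loop is a name introduced here for comparison):

```python
increasing_pairs_loop = []
for x in range(10):                  # first "for" in the comprehension
    for y in range(x + 1, 10):       # second "for", which may use x
        increasing_pairs_loop.append((x, y))

increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x + 1, 10)]

print(increasing_pairs_loop == increasing_pairs)   # True
print(len(increasing_pairs))                       # 45
```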

### Generators and Iterators

- A problem with lists is that they can easily grow very big. list(range(1000000)) creates an actual list of 1 million elements (in Python 3, range by itself is already lazy). If you only need to deal with them one at a time, this can be a huge source of inefficiency (or of running out of memory). If you potentially only need the **first few** values, then calculating them all is a waste.
- A generator is something that you can iterate over (for us, usually using for) but whose values are produced only as needed (lazily).
- One way to create generators is with functions and the **yield** keyword:

In [None]:
def lazy_range(n):
    """a lazy version of range"""
    i = 0
    while i < n:
        yield i
        i += 1

In [None]:
# The following loop will consume the yielded values one at a time until none are left:
for i in lazy_range(10):
    print(i)

In [None]:
# This loop stops early, so only the first few values are ever computed:
for i in lazy_range(10000):
    if i == 3: break
    print(i)

In [None]:
t = lazy_range(3)
next(t)
next(t)
next(t)
#next(t)

In [None]:
def lazy_inf_range():
    i = 0
    while True:
        yield i
        i += 1
         
t = lazy_inf_range()
next(t)
next(t)
next(t)

A second way to create generators is by using for comprehensions wrapped in parentheses:

In [None]:
lazy_evens_below_20 = (i for i in lazy_range(20) if i % 2 == 0)

In [None]:
lazy_evens_below_20
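
A generator can be consumed only once, and itertools.islice is a standard way to take just the first few values. The sketch below repeats lazy_range so that it is self-contained; the names first_three, rest, and again are introduced for illustration:

```python
from itertools import islice

def lazy_range(n):
    """a lazy version of range"""
    i = 0
    while i < n:
        yield i
        i += 1

lazy_evens_below_20 = (i for i in lazy_range(20) if i % 2 == 0)

first_three = list(islice(lazy_evens_below_20, 3))   # [0, 2, 4]
rest = list(lazy_evens_below_20)                     # continues from 6 up to 18
again = list(lazy_evens_below_20)                    # [] because it is exhausted

print(first_three, rest, again)
```

If you need to iterate more than once, create a new generator or store the values in a list.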

### Randomness

- To generate random numbers, we can use the random module
- random.random() produces numbers uniformly between 0 (inclusive) and 1 (exclusive)

In [None]:
import random

four_uniform_randoms = [random.random() for _ in range(4)]
four_uniform_randoms

If you want reproducible results, set the seed:

In [None]:
random.seed(10)
print(random.random())
random.seed(10)
print(random.random())

random.randrange takes either 1 or 2 arguments and returns
an element chosen randomly from the corresponding range()

In [None]:
random.randrange(10)      # choose randomly from range(10) = [0, 1, ..., 9]
random.randrange(3, 6)    # choose randomly from range(3, 6) = [3, 4, 5]

random.shuffle randomly reorders the elements of a list:

In [None]:
up_to_ten = list(range(10))
random.shuffle(up_to_ten)
print(up_to_ten)       

To randomly pick one element from a list:

In [None]:
my_best_friend = random.choice(["Alice", "Bob", "Charlie"])

To randomly choose a sample of elements without replacement (i.e., with
no duplicates)

In [None]:
lottery_numbers = range(60)
winning_numbers = random.sample(lottery_numbers, 6)

To choose a sample of elements with replacement (i.e., allowing duplicates)

In [None]:
four_with_replacement = [random.choice(range(10)) for _ in range(4)]
four_with_replacement
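
As an alternative sketch (not the approach used above), the standard library's random.choices, available since Python 3.6, draws with replacement in a single call. The seed value 10 is arbitrary and only makes the draw repeatable:

```python
import random

random.seed(10)                                        # any fixed seed works
four_with_replacement = random.choices(range(10), k=4)

random.seed(10)                                        # same seed, same draw
same_four = random.choices(range(10), k=4)

print(four_with_replacement, same_four)
```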

### Regular Expressions

- Regular expressions provide a way of searching text.
- They are incredibly useful but also fairly complicated, so much so that there are entire books written about them.

In [None]:
import re
print(all([
    not re.match("a", "cat"),
    re.search("a", "cat"),
    not re.search("c", "dog"),
    3 == len(re.split("[ab]", "carbs")),
    "R-D-" == re.sub("[0-9]", "-", "R2D2")
    ])) # prints True

### Object-Oriented Programming

In [None]:
# by convention, we give classes PascalCase names
class Set:
    # these are the member functions
    # every one takes a first parameter "self" (another convention)
    # that refers to the particular Set object being used
    
    def __init__(self, values=None):
        """This is the constructor.
        It gets called when you create a new Set.
        You would use it like
        s1 = Set()          # empty set
        s2 = Set([1,2,2,3]) # initialize with values"""
        
        self.dict = {}  # each instance of Set has its own dict property
                        # which is what we'll use to track memberships
        if values is not None:
            for value in values:
                self.add(value)

    def __repr__(self):
        """this is the string representation of a Set object
        if you type it at the Python prompt or pass it to str()"""
        return "Set: " + str(list(self.dict.keys()))  # list() avoids the dict_keys(...) wrapper in Python 3

    # we'll represent membership by being a key in self.dict with value True
    def add(self, value):
        self.dict[value] = True

    # value is in the Set if it's a key in the dictionary
    def contains(self, value):
        return value in self.dict

    def remove(self, value):
        del self.dict[value]

In [None]:
s = Set([1,2,3])
s.add(4)
print(s.contains(4))    # True
s.remove(3)
print(s.contains(3))    # False

### Functional Tools

- When passing functions around, sometimes we’ll want to **partially apply (or curry)** functions to create new functions.

In [None]:
def exp(base, power):
    return base ** power

def two_to_the(power):
    return exp(2, power)

In [None]:
two_to_the(3)

A different approach is to use functools.partial:

In [None]:
from functools import partial

two_to_the = partial(exp, 2)     # is now a function of one variable
print(two_to_the(3))             # 8

In [None]:
square_of = partial(exp, power=2)
print(square_of(3))    # 9

We will also occasionally use **map, reduce, and filter**, which provide functional alternatives to list comprehensions:

- Use map, reduce, and filter when you already have a function to apply and the result reads more clearly than a comprehension

#### Map

In [None]:
def double(x):
    return 2 * x

xs = [1, 2, 3, 4]
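twice_xs = [double(x) for x in xs]    # list comprehension: [2, 4, 6, 8]
twice_xs = map(double, xs)            # in Python 3, a lazy iterator over the same values

list_doubler = partial(map, double)   # function that doubles every element
twice_xs = list(list_doubler(xs))     # list(...) materializes [2, 4, 6, 8]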

In [None]:
def multiply(x, y): return x * y

products = map(multiply, [1, 2], [4, 5])    # [1 * 4, 2 * 5] = [4, 10]
list(products)

In [None]:
def multiply(x, y, z): return x * y * z

products = map(multiply, [1, 2], [4, 5], [10, 20])    # [1 * 4 * 10, 2 * 5 * 20]
list(products)

#### Filter

In [None]:
def is_even(x):
    """True if x is even, False if x is odd"""
    return x % 2 == 0

x_evens = [x for x in xs if is_even(x)]
x_evens = filter(is_even, xs)
print(list(x_evens))
list_evener = partial(filter, is_even)
x_evens = list_evener(xs)
print(list(x_evens))

#### Reduce

In [None]:
from functools import reduce

def multiply(x, y): return x * y

xs = [1,2,3]
x_product = reduce(multiply, xs)
print(x_product)
list_product = partial(reduce, multiply)
x_product = list_product(xs)
print(x_product)
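
reduce also accepts an optional third argument, an initial value, which matters for empty inputs. A minimal sketch using operator.mul from the standard library in place of the hand-written multiply:

```python
from functools import reduce
import operator

product_of_three = reduce(operator.mul, [1, 2, 3], 1)   # ((1 * 1) * 2) * 3 = 6
product_of_none = reduce(operator.mul, [], 1)           # just the initial value, 1

try:
    reduce(operator.mul, [])          # no initial value and nothing to combine
    empty_failed = False
except TypeError:
    empty_failed = True               # reduce raises TypeError in this case

print(product_of_three, product_of_none, empty_failed)
```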

### enumerate 

- To iterate over a list and use both its elements and their indexes:

In [None]:
documents = ["I", "am", "a", "boy"]
# not Pythonic
for i in range(len(documents)):
    document = documents[i]
    print(i, document)
    
# also not Pythonic
i = 0
for document in documents:
    print(i, document)
    i += 1

The Pythonic solution is enumerate, which produces tuples (index, element):

In [None]:
for i, document in enumerate(documents):
    print(i, document)

In [None]:
for i in range(len(documents)): print(i)    # not Pythonic
    
for i, _ in enumerate(documents): print(i)  # Pythonic
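
enumerate takes an optional start argument, which is handy for 1-based numbering. A short sketch reusing the same sample documents list:

```python
documents = ["I", "am", "a", "boy"]

# start=1 shifts the counter; the elements themselves are unchanged
numbered = list(enumerate(documents, start=1))

for position, document in numbered:
    print(position, document)
```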

### zip and unzip

- Often we need to zip two or more lists together.
- zip transforms multiple lists into a single sequence of tuples of corresponding elements (in Python 3 it is a lazy iterator, so wrap it in list() to see the result):

In [None]:
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]
list(zip(list1, list2))        # is [('a', 1), ('b', 2), ('c', 3)]

You can also “unzip” a list using a strange trick:

In [None]:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters, numbers)

In [None]:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(('a', 1), ('b', 2), ('c', 3))
print(letters, numbers)
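
When the lists have different lengths, zip stops at the shortest one. The sketch below, on made-up inputs, contrasts that with itertools.zip_longest, which pads the shorter input with a fill value:

```python
from itertools import zip_longest

letters = ['a', 'b', 'c']
numbers = [1, 2]                      # deliberately one element shorter

truncated = list(zip(letters, numbers))
padded = list(zip_longest(letters, numbers, fillvalue=None))

print(truncated)   # [('a', 1), ('b', 2)]
print(padded)      # [('a', 1), ('b', 2), ('c', None)]
```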