# Ch. 2 - Crash Course in Python

## Basics

<ul>
<li>Whitespace is ignored inside parentheses and brackets, which is helpful for long-winded computations and for making code easier to read</li>
<li>A backslash can be used to indicate that a statement continues onto the next line</li>
</ul>

In [1]:
list_of_lists = [[1,2,3],[4,5,6],
                [7,8,9]]

two_plus_three = 2 + \
3
print(list_of_lists)
print(two_plus_three)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
5


IPython has a **magic function**, `%paste`, which *correctly* pastes whatever is on the clipboard into the shell, keeping its formatting (including whitespace) intact.

## Modules

**Modules** must be **imported** before use. We can import a whole module, or explicitly import specific values/functions so that we can use them without qualifiers.

In [2]:
from collections import defaultdict, Counter
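
As a small sketch of the two import styles described above (the alias `regex` and the sample strings are only illustrative choices):

```python
import re as regex                 # import a whole module under an alias
from collections import Counter   # import one name directly

# qualified access: the function is reached through the module alias
digits = regex.findall("[0-9]+", "a1b22c333")

# unqualified access: Counter is used without a "collections." prefix
letter_counts = Counter("banana")

print(digits)
print(letter_counts["a"])
```

Importing only the names we need keeps the code short, while the qualified form makes it obvious where a function comes from.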

In Python 2.7, `/` performs **integer division** by default (i.e. 5/2 = 2). To override this so that `/` performs true division, use

In [3]:
from __future__ import division

print(5/2)

# use // to do integer division
print(5//2)

2.5
2


## Functions

These are rules taking in 0+ inputs and returning an output, defined starting with `def`. They are also **first-class** = they can be assigned to variables and passed into other functions.

In [4]:
def double(x):
    """optional docstring to describe
    what function does"""
    return x*2

def apply_to_one(func):
    """calls provided function with an
    argument of an integer = 1"""
    return func(1)

my_double = double
x = apply_to_one(my_double)
print(x)

2


We can also create **lambdas**, which are short, anonymous (non-named) functions, which we *could* assign to a variable, but it'd be better to use `def` instead in such cases.

In [5]:
# lambda argument = *action to do to argument*
another_double = lambda x: 2*x # bad

def another_double_2(x): # good
    return 2*x

In [6]:
# make function with default value
def my_print(message = "default"):
    print(message)
    
my_print("hello")
my_print()

hello
default


In [8]:
# call function while specifying argument by name
def subtract(a=0, b=0):
    #return a-b
    print(a-b)

subtract(10,7)
subtract(5)
subtract(b=5)

3
5
-5


## Strings

Can be delimited by single *or* double quotes (but they must match). 

To encode special characters, we must **escape** them with backslashes. To use literal backslashes, create a **raw string** via `r"xxxxx"`.

Create multi-line strings with triple double quotes (`"""`).

In [10]:
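# "\t" is an escape sequence for a single tab character
tab_as_tab = "\t"
print(len(tab_as_tab))

# a raw string keeps the backslash and the t as 2 separate characters
tab_as_string = r"\t"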
print(tab_as_string)

1
\t
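
The notes mention multi-line strings without showing one, so here is a minimal sketch (the variable names and text are made up for illustration):

```python
# triple quotes let a string literal span several lines
multi_line_string = """first line
second line
third line"""

# the line breaks are stored as "\n" characters inside the string
lines = multi_line_string.split("\n")
print(len(lines))
print(lines[1])
```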


## Try and Except

To handle an **exception** (something going wrong) instead of letting it crash the program, wrap the risky code in **try** and deal with the failure in **except**:

In [11]:
try:
    print(0/0)
except ZeroDivisionError:
    print("can't divide by zero")

can't divide by zero


## Lists

**List** = ordered **collection** (like an array but with added functionality).

In [13]:
int_list = [1,2,3]
heterogenous_list = [1,"string",0.5,True]
list_of_lists = [[1,2,3],[4,5],[6]]

In [15]:
print(len(list_of_lists))
print(sum(int_list))

3
6


In [21]:
# integers from 0-9 (a list in Python 2, a lazy range object in Python 3)
x = range(10)

# get 1st and then 2nd element
print(x[0])
print(x[1])

0
1


In [22]:
# get the last and 2nd-to-last elements in the Pythonic way
print(x[-1])
print(x[-2])

9
8


In [27]:
# change 1st element to -1
# must convert to a list first: in Python 3, range() returns an
# immutable range object rather than a list
x = list(x)
x[0] = -1

i = 0
while i < 5:
    print(x[i])
    i = i+1

-1
1
2
3
4


In [32]:
# slice lists with square brackets

# get 1st 3 elements
print(x[:3])
# get from 4th element (index 3) to the end
print(x[3:])

[-1, 1, 2]
[3, 4, 5, 6, 7, 8, 9]


In [33]:
# get last 3 elements
print(x[-3:])
# cut off 1st and last
print(x[1:-1])

[7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8]


In [35]:
# make a copy of list
x2 = x[:]
print(x2)

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [37]:
# check for values within a list
print(4 in x)
print(6 in [2,2,2,3])

True
False


The above list **membership** check examines the elements one by one, so its cost grows with the length of the list. It shouldn't be used unless the list is small or we don't care how long the check takes.

In [38]:
# concatenate lists in place (x1 itself is modified)
x1 = [1,2,3]
x1.extend([3,4,5])
print(x1)

[1, 2, 3, 3, 4, 5]


In [43]:
# concatenate into a new list, leaving x unmodified
y = x + [1,1,1]
print(y)

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 1, 1]


In [44]:
# append items to list one at a time
x.append(15)
print(x)

[-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 15]


It is often convenient to **unpack lists** when we know how many elements they contain

In [45]:
x,y = [1,2]
print(x)
print(y)

1
2


The above raises a `ValueError` if there is not the same number of elements on both sides. If we're going to throw a value away while unpacking, we use an underscore.

In [46]:
_,y,z = [3,2,1] # 3 is nowhere to be found
print(y)
print(z)

2
1
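
To make the `ValueError` concrete, here is a hedged sketch of a mismatched unpack. The starred name at the end is a Python 3 feature these notes don't otherwise cover; it is included only as one possible way to cope with an unknown number of elements.

```python
try:
    a, b = [1, 2, 3]   # 2 names on the left, 3 values on the right
except ValueError as err:
    unpack_error = str(err)
    print("unpacking failed:", unpack_error)

# a starred name collects whatever is left over (Python 3 only)
first, *rest = [1, 2, 3]
print(first, rest)
```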


## Tuples

These are **immutable** versions of lists: anything we can do to a list that doesn't modify it, we can also do to a tuple.

In [47]:
my_list = [1,2]
my_tuple = (1,2)
tupl2 = 3,4 # parentheses are optional
my_list[1] = 3

In [48]:
print(my_list)

[1, 3]


In [50]:
try:
    my_tuple[1] = 3
except TypeError:
    print("can't modify a tuple")

can't modify a tuple


A good use of tuples is returning multiple values from a function.

In [51]:
def sum_and_product(x,y):
    return (x+y),(x*y)

print(sum_and_product(2,3))

(5, 6)


In [52]:
sp = sum_and_product(4,3)
# unpack sp
s,p = sp
print(s)
print(p)

7
12


In [53]:
s,p = sum_and_product(5,10)
print(s)
print(p)

15
50


Can also use tuples and lists for **multiple assignment**.

In [54]:
x,y = 1,2
print(x)
print(y)
# swap variables in the Pythonic way
x,y = y,x
print(x)
print(y)

1
2
2
1


## Dictionaries

These are containers for **key-value (kv) pairs** from which we can easily obtain values via their keys

In [1]:
empty_dict = {} # = Pythonic manner
empty_dict2 = dict() # = not-so Pythonic

grades = {"Joel":80,"Tim":95} # literal defining of dict

In [2]:
# look up values via a key in square brackets
joels_grade = grades["Joel"]
print(joels_grade)

80


In [3]:
# get KeyError if we ask for a key that doesn't exist in dict
grades["Bob"]

KeyError: 'Bob'

In [4]:
# check for existence of keys via Try/Except
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("No grade for Kate")

No grade for Kate


In [5]:
# check for existence of keys via 'in'
joel_has_grade = "Joel" in grades
rob_has_grade = "Rob" in grades

print(joel_has_grade)
print(rob_has_grade)

True
False


In [6]:
# get default value for a key instead of raising exception via .get()
joels_grade = grades.get("Joel",0)
kates_grade = grades.get("Kate",2)
no_ones_grade = grades.get("NA") # default value of None

print(joels_grade)
print(kates_grade)
print(no_ones_grade)

80
2
None


In [7]:
# assign KV pairs via square brackets
grades["Tim"] = 99 # used to be 95
grades["Kate"] = 78 # new KV pair

print(grades)

{'Joel': 80, 'Tim': 99, 'Kate': 78}


Dictionaries are frequently used to represent structured data in a simple manner.

In [8]:
tweet = {
    "user":"joel_grus"
    ,"text": "Hello world"
    ,"retweet_count":108
    ,"hashtags":["data","science","hello","world"] # list
}

print(tweet)

{'user': 'joel_grus', 'text': 'Hello world', 'retweet_count': 108, 'hashtags': ['data', 'science', 'hello', 'world']}


In [10]:
# look for ALL keys
tweet_keys = tweet.keys()

# get ALL values
tweet_vals = tweet.values()

# get all KV pairs as (key, value) tuples
tweet_tuples = tweet.items()

print(tweet_keys)
print(tweet_vals)
print(tweet_tuples)

dict_keys(['user', 'text', 'retweet_count', 'hashtags'])
dict_values(['joel_grus', 'Hello world', 108, ['data', 'science', 'hello', 'world']])
dict_items([('user', 'joel_grus'), ('text', 'Hello world'), ('retweet_count', 108), ('hashtags', ['data', 'science', 'hello', 'world'])])


In [11]:
print("joel_grus" in tweet_keys)
print("joel_grus" in tweet_vals)

False
True


Dictionary keys *must* be immutable, so we **cannot use lists as keys**. 

To create a multi-part key, we must use a tuple or we must figure out a way to turn the key into a string.
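
A minimal sketch of both options, using made-up grade data (the names `grades_by_subject` and `grades_by_string` are hypothetical, not from the notes above):

```python
# hypothetical data: one grade per (student, subject) pair
grades_by_subject = {("Joel", "math"): 80, ("Tim", "math"): 95}
grades_by_subject[("Joel", "art")] = 91   # tuple as a multi-part key

print(grades_by_subject[("Joel", "art")])

# a list is mutable (and therefore unhashable), so it is rejected as a key
try:
    grades_by_subject[["Kate", "math"]] = 78
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
    print("can't use a list as a key")

# alternative: join the parts into one string key
string_key = "Kate" + "|" + "math"
grades_by_string = {string_key: 78}
print(grades_by_string["Kate|math"])
```

The tuple version is usually easier to work with, since the parts of the key can be unpacked again without parsing a string.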

## defaultdict

This is like a regular dictionary, but it's imported from `collections`, and when we try to look up a key that doesn't exist, it first adds the key to the dictionary with a value produced by the zero-argument function we provided when the dictionary was created.

In [13]:
# try to count words in a doc in a dict where keys = words
# and vals = counts

# increment counts by 1 each time a word is checked and it's in the
# dict, and add it to the dict if it's not
word_counts = {}
# use a list of words: looping over a plain string yields characters
doc = ["Hello", "world", "hi"]
for word in doc:
    if word in word_counts:
        word_counts[word] += 1 # increment word count if present
    else:
        word_counts[word] = 1 # add word if not present

In [14]:
# Try "forgiveness is better than permission" = handle the exception
# from trying to look up a missing/invalid key
word_counts = {}
# use a list of words: looping over a plain string yields characters
doc = ["Hello", "world", "hi"]
for word in doc:
    try:
        word_counts[word] += 1 # increment word count if present
    except KeyError:
        word_counts[word] = 1 # add word if not present

In [27]:
# another approach = use .get() with a default value for missing keys
word_counts = {}
doc = ["Hello", "world", "hi"]
for word in doc:
    previous_count = word_counts.get(word,0) # 0 if word not seen yet
    word_counts[word] = previous_count + 1

In [16]:
# use defaultdict (better)
from collections import defaultdict

word_counts = defaultdict(int) # int() produces 0 for missing keys
doc = ["Hello", "world", "hi"]

for word in doc:
    word_counts[word] += 1 # missing words start at 0, so no check is needed

A `defaultdict` is also useful with `list` or `dict`, or with a user-defined function, as the source of default values.

In [17]:
dd_list = defaultdict(list) # (list) = list() =  produces empty list

# dd_list[key].append(value)
dd_list[2].append(1) 

print(dd_list)

defaultdict(<class 'list'>, {2: [1]})


In [18]:
dd_dict = defaultdict(dict) # = dict() = produces empty dict

# dd_dict[Key][InnerKey] = Value
dd_dict["Joel"]["City"] = "Seattle" # uses dictionary as a value

print(dd_dict)

defaultdict(<class 'dict'>, {'Joel': {'City': 'Seattle'}})


In [22]:
# use a lambda (anonymous) function so each missing key gets the value [0,0]
dd_pair = defaultdict(lambda: [0,0])
print(dict(dd_pair))

{}


In [23]:
# looking up key = 2 creates [0,0]; then set the 2nd value for this key = 1
dd_pair[2][1] = 1
print(dict(dd_pair))

{2: [0, 1]}


These are also useful when we're collecting results by some key and don't want to check whether the key already exists every time.
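
Collecting results by key might look like the following sketch. The `(city, name)` records are made up for illustration.

```python
from collections import defaultdict

# hypothetical (city, name) records to group by city
records = [("Seattle", "Joel"), ("Boston", "Kate"), ("Seattle", "Tim")]

names_by_city = defaultdict(list)
for city, name in records:
    names_by_city[city].append(name)   # no "if city in ..." check needed

print(dict(names_by_city))
```

With a plain `dict`, the loop body would need an explicit check (or a `try`/`except KeyError`) before the first `append` for each city.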

## Counter

A `Counter` turns a sequence of values into a `defaultdict(int)`-like object, mapping keys to counts. It's primarily used to create histograms.

In [25]:
from collections import Counter
c = Counter([0,1,2,3,0])
print(c)

Counter({0: 2, 1: 1, 2: 1, 3: 1})


This provides a simple solution to the `word_counts` problem

In [33]:
doc = ["Hello", "world", "hi","hi","hi","hi","hi","but","flyers",
      "why","suck","pizza"]

word_counts = Counter(doc)
print(word_counts)

Counter({'hi': 5, 'Hello': 1, 'world': 1, 'but': 1, 'flyers': 1, 'why': 1, 'suck': 1, 'pizza': 1})


In [34]:
# get the 10 most common words and their counts (only 8 distinct words here)
for word,count in word_counts.most_common(10):
    print(word,count)

hi 5
Hello 1
world 1
but 1
flyers 1
why 1
suck 1
pizza 1


In [35]:
word_counts.most_common(10)

[('hi', 5),
 ('Hello', 1),
 ('world', 1),
 ('but', 1),
 ('flyers', 1),
 ('why', 1),
 ('suck', 1),
 ('pizza', 1)]

## Sets

These are a collection of *distinct* elements.

In [36]:
s = set()
s.add(1)
s.add(2)
s.add(2) # does not change the set as we already have a '2'
print(s)

{1, 2}


In [37]:
x = len(s)
y = 2 in s
z = 3 in s
print(x,y,z)

2 True False


Sets are useful for two reasons: `in` works very fast on sets, and they let us find the distinct elements in a collection. That said, they are used less often than `dict`s and `list`s.

If we have a large collection of items we want to use for a **membership test**, sets are more appropriate than lists.

In [41]:
stopwords_list = ["a","an","at"] + ["yet","you"]

print("zip" in stopwords_list) # must check every element in list

stopwords_set = set(stopwords_list)
print("zip" in stopwords_set) # faster

False
False


In [43]:
# find distinct items
item_list = [1,2,3,1,2,3]
num_items = len(item_list)

# get distinct items
item_set = set(item_list)
num_distinct_items = len(item_set)

# convert back to list
distinct_item_list = list(item_set)

print(item_list)
print(num_items)
print(item_set)
print(num_distinct_items)
print(distinct_item_list)

[1, 2, 3, 1, 2, 3]
6
{1, 2, 3}
3
[1, 2, 3]


## Control Flow

Perform an action **conditionally** using `if`

In [45]:
if 1 > 3:
    message = "if only 1 were greater than 3..."
elif 1 > 4:
    message = "nope"
else:
    message = "when all else fails (if desired)"
    
print(message)

when all else fails (if desired)


Can also write a **ternary** if-then-else expression on one line

In [46]:
parity = "even" if x % 2 == 0 else "odd"
print(parity)

even


In [47]:
# WHILE loops
x = 0
while x < 10:
    print(x,"is less than 10")
    x += 1

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


In [48]:
# more common to use `for` and `in` than a `while`
for x in range(10):
    print(x,"is less than 10")

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


In [49]:
# more complex logic = use `continue` and `break`
for x in range(10):
    if x == 3:
        continue # go to next iteration (increase x)
    if x == 5:
        break # quit loop
    print(x)

0
1
2
4


## Truthiness

**Booleans** work as in most other languages, except that they're capitalized (`True` and `False`)

In [50]:
print(1 < 2)
print(True == False)

True
False


Python uses `None` to indicate a nonexistent value (similar to `null` or `NA` in other languages)

In [51]:
x = None
print(x == None) # works, but not Pythonic
print(x is None) # Pythonic

True
True


We can use any value where Python expects a Boolean

In [53]:
# all of these are treated as False
falsy = [False, None, [], {}, "", set(), 0, 0.0]

Anything else is treated as `True`. This makes it easy to use `if` statements to test for empty lists/strings/dicts, etc. But it can also cause bugs if we're not expecting this behavior.

In [57]:
def function_returning_a_string():
    return "it's a string, baby"

s = function_returning_a_string()
if s:
    first_char = s[0]
else:
    first_char = ""
    
first_char

'i'

In [62]:
# simpler way of doing the above
# `and` returns 2nd arg when 1st arg is "Truthy", returns 1st arg if not
first_char = s and s[0]
first_char

'i'

In [63]:
# if x is either a number or possibly None
safe_x = x or 0 # returns 2nd arg since x is None
safe_x

0
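
One way the `or` idiom can go wrong is when a falsy value such as `0` is a legitimate input. The sketch below contrasts it with an explicit `is None` check. Both helper functions and the default of 10 are hypothetical, chosen only to make the difference visible.

```python
def safe_value_with_or(x, default=10):
    # falls back whenever x is falsy, including a legitimate 0
    return x or default

def safe_value_with_is_none(x, default=10):
    # falls back only when x is actually None
    return default if x is None else x

print(safe_value_with_or(0))
print(safe_value_with_is_none(0))
print(safe_value_with_is_none(None))
```

When the fallback is itself `0`, as in the cell above, the two approaches give the same result, so the shortcut is safe there.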

`all()` takes an iterable (such as a list) and returns `True` precisely when *every* element is "Truthy"

`any()` returns `True` if *at least one* element is Truthy

In [66]:
print(all([True,1,{3}]))
print(all([True,1,{}]))
print(any([True,1,{}]))
print(all([]))
print(any([]))

True
False
True
True
False
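
In practice, `all()` and `any()` are often fed a condition per element rather than raw values. The sketch below assumes a made-up list of numbers and uses generator expressions, which these notes have not introduced. It also illustrates why the empty case behaves as shown above.

```python
numbers = [2, 4, 6, 7]

all_even = all(n % 2 == 0 for n in numbers)   # False: 7 is odd
any_odd = any(n % 2 == 1 for n in numbers)    # True: 7 is odd
print(all_even, any_odd)

# with nothing to check, all() finds no falsy element
# and any() finds no truthy element
empty = []
all_positive = all(n > 0 for n in empty)
any_positive = any(n > 0 for n in empty)
print(all_positive, any_positive)
```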
