##### Python Crash Course Relevant Examples

##### Dictionaries

In [7]:
#allows retrieval of data in key value pairs
grades = {"Joel":80, "Tim":95}
grades["Joel"]

#you get KeyError if you ask for a key not in the dictionary
try:
    kates_grade = grades["kate"]
except KeyError:
    print('No kate in gradebook')
    
#can check for the existence of a key with in, which gives True or False
Joel_grade = "Joel" in grades
Kate_grade = "Kate" in grades
print(Joel_grade, Kate_grade)

#get method can give a default value when a key isn't present
kate_grade = grades.get("Kate", 0)
print(kate_grade)

#dictionaries are great for structured data
#eg
tweet = {
        "user" : "joelgrus",
        "text" : "Data Science is Awesome",
        "retweet_count" : 100,
        "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
         }

#How to iterate
for key, value in tweet.items():
    print(key)
    print(value)

No kate in gradebook
True False
0
user
joelgrus
text
Data Science is Awesome
retweet_count
100
hashtags
['#data', '#science', '#datascience', '#awesome', '#yolo']
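
Besides iterating over items, a dictionary's keys and values can be inspected separately. The following is a small sketch on a trimmed copy of the tweet above; the variable names are illustrative assumptions. Note that in on a dictionary tests the keys, not the values.

```python
tweet = {
    "user": "joelgrus",
    "text": "Data Science is Awesome",
    "retweet_count": 100,
}

tweet_keys = list(tweet.keys())      # the keys, in insertion order
tweet_values = list(tweet.values())  # the values, in the same order

# "in" on a dictionary checks keys, not values
user_is_key = "user" in tweet
joelgrus_is_key = "joelgrus" in tweet
joelgrus_is_value = "joelgrus" in tweet.values()

print(tweet_keys)
print(user_is_key, joelgrus_is_key, joelgrus_is_value)
```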


##### DefaultDict

A defaultdict is like a regular dictionary, except that when you look up a key it doesn't contain, it first adds a value for that key using a zero-argument function you provided when you created it.

In [8]:
#eg
from collections import defaultdict

document = ['potato', 'tomato', 'potato']

word_counts = defaultdict(int) #int() produces 0 for a missing key
for word in document:
    word_counts[word] += 1
print(word_counts)

#other examples that will be useful
dd_list = defaultdict(list) #list creates an empty list
dd_list[2].append(1) # now dd_list contains {2: [1]}

dd_dict = defaultdict(dict) # dict() produces an empty dict
dd_dict["Joel"]["City"] = "Seattle" # {"Joel" : {"City": "Seattle"}}

dd_pair = defaultdict(lambda: [0, 0])
print(dd_pair)
dd_pair[2][1] = 1 # now dd_pair contains {2: [0, 1]}
dd_pair

defaultdict(<class 'int'>, {'potato': 2, 'tomato': 1})
defaultdict(<function <lambda> at 0x00000265C83C07B8>, {})


defaultdict(<function __main__.<lambda>()>, {2: [0, 1]})
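
One consequence worth seeing in isolation: merely looking up a missing key stores it. This is a minimal sketch, assuming the same defaultdict(int) setup as above. It also shows that the get method does not trigger the default function.

```python
from collections import defaultdict

word_counts = defaultdict(int)
present_before = "potato" in word_counts   # nothing stored yet

count = word_counts["potato"]              # the lookup itself stores the default, 0
present_after = "potato" in word_counts    # the key now exists

# get() does not call the default function, so the dictionary is unchanged
missing = word_counts.get("tomato")
tomato_stored = "tomato" in word_counts

print(present_before, count, present_after)
print(missing, tomato_stored)
```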

##### Counters

A Counter turns a sequence of values into a defaultdict(int)-like object that maps keys to counts.

In [9]:
from collections import Counter
c = Counter([0,1,2,0])
print(c)

#with this we can word count easily
word_counts = Counter(document)
print(word_counts)

#Most common method
for word,count in word_counts.most_common(10): #selects 10 most common words
    print(word,count)

Counter({0: 2, 1: 1, 2: 1})
Counter({'potato': 2, 'tomato': 1})
potato 2
tomato 1
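
A Counter differs from defaultdict(int) in one small way that is easy to check: reading a missing key gives 0 without storing it. The sketch below also uses update, which adds to existing counts rather than replacing them. The extra words are made up for illustration.

```python
from collections import Counter

word_counts = Counter(["potato", "tomato", "potato"])

# A missing key reads as 0 and, unlike with defaultdict, is not stored
carrot_count = word_counts["carrot"]
carrot_stored = "carrot" in word_counts

# update() adds counts from another sequence to the existing ones
word_counts.update(["potato", "carrot"])
top_word = word_counts.most_common(1)

print(carrot_count, carrot_stored)
print(top_word)
```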


##### Sets

Sets are another useful data structure: a collection of distinct elements. They are useful for two main reasons. First, in is a very fast operation on sets, much faster than on lists. Second, they make it easy to find the distinct items in a collection.

In [10]:
primes_below_ten = {2,3,5,7}

#to make an empty set
s = set()

#membership test with in
s.add(2)
y = 2 in s
z = 3 in s
print(y,z)

#distinct items
item_list = [1, 2, 3, 1, 2, 3]
num_items = len(item_list) # 6
item_set = set(item_list) # {1, 2, 3}
num_distinct_items = len(item_set) # 3
distinct_item_list = list(item_set) # [1, 2, 3]

True False
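
To make the membership point concrete, here is a small sketch with a made-up stopword list. Both containers give the same answer. The difference is that the list is scanned element by element while the set uses a hash lookup, which matters once the collection is large. The set algebra at the end is an extra not covered above.

```python
stopwords_list = ["a", "an", "at", "yet", "you"]
stopwords_set = set(stopwords_list)

# Same answer either way; only the lookup strategy differs
in_list = "you" in stopwords_list   # checks each element in turn
in_set = "you" in stopwords_set     # hash lookup

# Sets also support intersection (&) and union (|)
primes_below_ten = {2, 3, 5, 7}
odds_below_ten = {1, 3, 5, 7, 9}
odd_primes = primes_below_ten & odds_below_ten
either = primes_below_ten | odds_below_ten

print(in_list, in_set)
print(sorted(odd_primes), sorted(either))
```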


##### Multiple List Comprehensions

In [11]:
#eg 1
pairs = [(x,y) for x in range(10)
        for y in range(x+1,10)]
print(pairs)

#eg 2
increasing_pairs = [(x,y)
                   for x in range(10) #for each x
                   for y in range(x+1, 10)] #only y > x, so every pair has x < y

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 6), (5, 7), (5, 8), (5, 9), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9), (8, 9)]
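
A comprehension with two for clauses reads like nested loops, with the later for able to use the variable of the earlier one. The sketch below builds the same pairs both ways to show the equivalence.

```python
# Nested loops: the inner loop depends on the outer variable x
pairs_loop = []
for x in range(10):
    for y in range(x + 1, 10):
        pairs_loop.append((x, y))

# The same thing as a single comprehension
pairs_comprehension = [(x, y) for x in range(10) for y in range(x + 1, 10)]

print(pairs_loop == pairs_comprehension)
print(len(pairs_loop))
```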


##### Automated Testing and Assert

Automated tests are a way to check that code is correct. Here we use assert statements, which raise an AssertionError if the condition is not truthy.

In [12]:
#Simple example
# assert 1+1==3, "1+1 != 3"

#more useful
def smallest_item(x):
    return min(x)

#the smallest item here is 6, not 5, so this assertion fails
assert smallest_item([10,20,6,40]) == 5

AssertionError: 
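
A bare AssertionError says little about what went wrong, so it can help to attach a message and to group assertions in a test function. This is a minimal sketch; the test function name and the messages are illustrative assumptions, and the failing assertion is caught only so the message can be shown.

```python
def smallest_item(xs):
    return min(xs)

def test_smallest_item():
    # Passing assertions are silent
    assert smallest_item([10, 20, 5, 40]) == 5, "expected the minimum, 5"
    assert smallest_item([1, 0, -1, 2]) == -1, "expected the minimum, -1"

test_smallest_item()

# A failing assertion raises AssertionError carrying the message
try:
    assert smallest_item([10, 20, 6, 40]) == 5, "minimum is 6, not 5"
except AssertionError as error:
    failure_message = str(error)
    print("assertion failed:", failure_message)
```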

##### Object-Oriented Programming - Code isn't running - seek another resource

Python allows you to define classes that encapsulate data and the functions that operate on that data. Classes can help make code cleaner and simpler. We'll explain them by constructing a counting clicker that can increment a count, read it, and reset it to zero. We begin by defining the class.

In [None]:
class CountingClicker:
    """A class can and should have a docstring like a function"""


A class contains zero or more member functions. By convention, each takes a first parameter, self, that refers to the particular class instance. A class normally has a constructor called __init__. It takes whatever parameters you need to construct an instance of your class and does the setup.

In [None]:
#__init__ must be indented inside the class body to become a method
class CountingClicker:
    def __init__(self, count = 0):
        self.count = count

When we construct instances of the class, we call the class name like a function.

In [None]:
clicker1 = CountingClicker()
clicker2 = CountingClicker(count=100)
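
Put together in a single cell, the counting clicker described above could look like the following sketch. The method names click, read, and reset and the num_times parameter are assumptions for illustration; the description only says the clicker should count, be read, and be reset to zero. All methods are indented inside the class body, which is what makes them belong to the class.

```python
class CountingClicker:
    """Counts clicks; the count can be read and reset."""

    def __init__(self, count=0):
        self.count = count

    def click(self, num_times=1):
        """Increase the count by num_times."""
        self.count += num_times

    def read(self):
        """Return the current count."""
        return self.count

    def reset(self):
        """Set the count back to zero."""
        self.count = 0


clicker = CountingClicker()
clicker.click()
clicker.click()
count_after_two_clicks = clicker.read()
clicker.reset()
count_after_reset = clicker.read()
print(count_after_two_clicks, count_after_reset)
```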

##### Iterables and Generators

We often need to iterate over a collection of data. Generators let us do that lazily, producing each value on demand rather than building the whole collection up front. We can create generators with functions that use the yield keyword.

In [14]:
def generate_range(n):
    i=0
    while i<n:
        yield i #each yield produces one value of the generator
        i += 1

#using it
for i in generate_range(10):
    print(f"i:{i}")

i:0
i:1
i:2
i:3
i:4
i:5
i:6
i:7
i:8
i:9


Generators can also be created with for comprehensions wrapped in parentheses. Such a generator comprehension does no work until you iterate over it using for or next, which makes it useful for building data processing pipelines.

Tip: The flip side of laziness is that you can only iterate through a generator once. If you need to iterate through something multiple times, you'll need to either re-create the generator each time or use a list. If generating the values is expensive, that might be a good reason to use a list instead.
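
Both halves of that tip can be checked directly. The sketch below redefines generate_range so it is self-contained, pulls one value with next, and then shows that a second pass over the same generator yields nothing.

```python
def generate_range(n):
    i = 0
    while i < n:
        yield i
        i += 1

evens = (i for i in generate_range(10) if i % 2 == 0)

first = next(evens)          # work happens only when a value is requested
first_pass = list(evens)     # consumes the remaining values
second_pass = list(evens)    # the generator is exhausted, so this is empty
print(first, first_pass, second_pass)

# A list holds its values, so it can be traversed as often as needed
evens_list = [i for i in generate_range(10) if i % 2 == 0]
print(sum(evens_list), max(evens_list))
```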


In [28]:
evens_below_20 = (i for i in generate_range(20) if i%2==0)

#pipeline
import numpy as np
data = np.arange(0,100)
evens = (x for x in data if x % 2==0)
even_squares = (x**2 for x in evens)
even_squares_ending_in_six = (x for x in even_squares if x % 10 == 6)

#my attempt
even = (x**2 for x in data if x % 2==0)
even2 = (x for x in even if x % 10 == 6)

#can't print the values of even2 directly because it's a generator object
#we have to iterate over it
for i in even_squares_ending_in_six:
    print(f"i:{i}")
    
for j in even2:
    print(f"j:{j}")

i:16
i:36
i:196
i:256
i:576
i:676
i:1156
i:1296
i:1936
i:2116
i:2916
i:3136
i:4096
i:4356
i:5476
i:5776
i:7056
i:7396
i:8836
i:9216
j:16
j:36
j:196
j:256
j:576
j:676
j:1156
j:1296
j:1936
j:2116
j:2916
j:3136
j:4096
j:4356
j:5476
j:5776
j:7056
j:7396
j:8836
j:9216


When iterating over a list or generator, we sometimes want both the values and their indices. The enumerate function provides this by turning values into pairs (index, value).

In [29]:
names = ['Alice','Bob',"Charles","Bethany"]

for i, name in enumerate(names):
    print(f"name {i} is {name}")

name 0 is Alice
name 1 is Bob
name 2 is Charles
name 3 is Bethany
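
enumerate also accepts a start argument for when counting should begin somewhere other than 0. The sketch below compares it with indexing by position, which gives the same pairs but is less idiomatic.

```python
names = ["Alice", "Bob", "Charles", "Bethany"]

indexed = list(enumerate(names))             # pairs of (index, value) from 0
numbered = list(enumerate(names, start=1))   # same values, counting from 1

# The less idiomatic equivalent: index into the list by position
by_position = [(i, names[i]) for i in range(len(names))]

print(indexed[0], numbered[0])
print(by_position == indexed)
```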


##### Regular Expressions

Regular expressions provide a way of searching text. They are useful but complicated; here are a few examples of how to use them.

In [35]:
import re

re_examples = [ #All of these are true because
    not re.match("a", "cat"), #'cat' doesn't start with an 'a'
    re.search("a", "cat"), #cat has an a in it
    not re.search("c", "dog"), #dog doesn't have c
    3 == len(re.split("[ab]", "carbs")), #split on a or b to ["c","r","s"]
    "R-D-" == re.sub("[0-9]", "-", "R2D2") #replace digits with dashes
]

assert all(re_examples)
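
The examples above only test whether a match exists. As a small sketch of getting the matched text back, the following uses a made-up sentence and a hashtag pattern chosen for illustration. match and search return a match object (or None), and findall returns the matching substrings.

```python
import re

text = "Data Science is Awesome #data #science"

hashtags = re.findall(r"#\w+", text)   # every non-overlapping match, as strings
first_word = re.match(r"\w+", text)    # match only looks at the start of text
no_match = re.match(r"#\w+", text)     # None: text doesn't start with '#'
found = re.search(r"#\w+", text)       # search scans the whole string

print(hashtags)
print(first_word.group(), no_match, found.group())
```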

##### Zip and Argument Unpacking

zip lets you combine multiple iterables into a single iterable of tuples. Like a generator, zip is lazy. If the lists are of unequal length, zip stops at the end of the shortest one.

You can also "unzip" a list with a strange trick: the * performs argument unpacking, passing each element of the list to zip as a separate argument.

In [38]:
list1 = ['a','b','c']
list2 = [1,2,3]

#zip is lazy, so we have to iterate over it to see the pairs
print(zip(list1, list2))
pairs = [pair for pair in zip(list1, list2)]
print(pairs)

#the trick
letters, numbers = zip(*pairs)
print(letters)

<zip object at 0x00000265C8932EC8>
[('a', 1), ('b', 2), ('c', 3)]
('a', 'b', 'c')
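
Argument unpacking works with any function, not just zip. The sketch below uses a hypothetical add function to show it, spells out what zip(*pairs) expands to, and checks the truncation behavior for inputs of unequal length.

```python
def add(a, b):
    # hypothetical helper, only here to demonstrate unpacking
    return a + b

total = add(*[1, 2])      # same as add(1, 2)

pairs = [("a", 1), ("b", 2), ("c", 3)]
# zip(*pairs) is the same call as zip(("a", 1), ("b", 2), ("c", 3))
letters, numbers = zip(*pairs)

# zip stops at the end of the shortest input
truncated = list(zip(["a", "b", "c"], [1, 2]))

print(total, letters, numbers)
print(truncated)
```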
