# A Crash Course in Python

This notebook is created from the script `crash_course_in_python.py` (retrieved from the book's GitHub repository: https://github.com/joelgrus/data-science-from-scratch/blob/d5d0f117f41b3ccab3b07f1ee1fa21cfcf69afa1/scratch/crash_course_in_python.py) with some minor modifications.

In [None]:
"""
This is just code for the introduction to Python.
It also won't be used anywhere else in the book.
"""

## Whitespace formatting

Many languages use curly braces to delimit blocks of code. Python uses indentation. Python treats tabs and spaces as two different things, so always use spaces (your editor or Jupyter Notebook may already be set to insert spaces when you hit the Tab key).

In [None]:
for i in [1, 2, 3, 4, 5]:
    print(i)                    # first line in "for i" block
    for j in [1, 2, 3, 4, 5]:
        print(j)                # first line in "for j" block
        print(i + j)            # last line in "for j" block
    print(i)                    # last line in "for i" block
print("done looping")

In [None]:
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 +
                           13 + 14 + 15 + 16 + 17 + 18 + 19 + 20)

In [None]:
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [None]:
list_of_lists

In [None]:
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]

In [None]:
easier_to_read_list_of_lists

In [None]:
two_plus_three = 2 + \
                 3

In [None]:
for i in [1, 2, 3, 4, 5]:

    # notice the blank line
    print(i)

## Modules

Many Python features are not loaded by default, and we often use external packages for data science. You can also write your own modules. To use any of these, you first need to import the module.

In [None]:
# Import an entire module (prefix subsequent function calls with the name of the module)
import re
my_regex = re.compile("[0-9]+", re.I)
my_regex

In [None]:
# You can choose the name (alias) under which the module is imported
import re as regex
my_regex = regex.compile("[0-9]+", regex.I)
my_regex

Standard conventions for heavily used modules in data science are:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

In [None]:
# If you only need a few specific functions or classes from a module,
# you can import only those and use them without a prefix
from collections import defaultdict, Counter
lookup = defaultdict(int)
my_counter = Counter()
my_counter

You can also import everything from a module to use all its functions without a prefix (`from re import *`), but you should never do this, because it can silently overwrite names you have already defined.

## Functions

A function takes zero or more inputs and returns an output.

In Python, functions are first-class citizens, in the sense that we can assign them to variables and pass them into other functions.

In [None]:
# Definition of a function
def double(x):
    """
    This is where you put an optional docstring that explains what the
    function does. For example, this function multiplies its input by 2.
    """
    return x * 2

In [None]:
double

In [None]:
double(42)

In [None]:
?double

In [None]:
# Or press shift+tab when you're at the function to get the information on the function

In [None]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

In [None]:
my_double = double             # refers to the previously defined function
x = apply_to_one(my_double)    # equals 2

In [None]:
x

In [None]:
assert x == 2

In [None]:
assert x == 4    # raises an AssertionError, because x is 2

In [None]:
# Define short functions without naming them, using lambda notation (when you're not going to use the function again)
y = apply_to_one(lambda x: x + 4)
assert y == 5

In [None]:
# Function parameters can have default values
def my_print(message = "my default message"):
    print(message)
    
my_print("hello")   # prints 'hello'
my_print()          # prints 'my default message'

In [None]:
# You can specify an argument by name when calling a function, which can be useful sometimes
def full_name(first = "What's-his-name", last = "Something"):
    return first + " " + last

In [None]:
full_name("Joel", "Grus")     # "Joel Grus"
full_name("Joel")             # "Joel Something"
full_name(last="Grus")        # "What's-his-name Grus"

Note the difference between using `print` and `return` in a function: the calls below display their return value without any call to `print`.

In [None]:
full_name("Joel", "Grus")

In [None]:
full_name("Joel")

In [None]:
full_name(last="Joel")   # Note how this is different from the previous line
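
To make the difference between `print` and `return` concrete, here is a minimal sketch. The two helper functions `shout_print` and `shout_return` are hypothetical names introduced only for this illustration.

```python
def shout_print(text):
    """Prints the upper-cased text; implicitly returns None."""
    print(text.upper())

def shout_return(text):
    """Returns the upper-cased text without printing anything."""
    return text.upper()

printed = shout_print("hello")      # prints HELLO, but printed is None
returned = shout_return("hello")    # prints nothing, returned is "HELLO"
```

Only the returned value can be stored in a variable and used later; a printed value is just shown on the screen.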

## Strings

In [None]:
# Using single or double quotes
single_quoted_string = 'data science'
double_quoted_string = "data science"

In [None]:
single_quoted_string

In [None]:
double_quoted_string

In [None]:
# Define multi line strings
multi_line_string = """This is the first line.
and this is the second line
and this is the third line"""

multi_line_string

In [None]:
# Concatenating strings

In [None]:
first_name = "Joel"
last_name = "Grus"

full_name1 = first_name + " " + last_name             # string addition
full_name2 = "{0} AwesomeMiddleName {1}".format(first_name, last_name)  # string.format

In [None]:
full_name1

In [None]:
full_name2

In [None]:
full_name3 = f"{first_name} MoreAwesomeMiddleName {last_name}"

full_name3

## Exceptions

When things go wrong, Python raises exceptions. Unhandled exceptions stop your program, but you can handle them explicitly using `try` and `except`.

In [None]:
try:
    print(0 / 0)
except ZeroDivisionError:
    print("cannot divide by zero")
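
A slightly longer sketch of the same idea, assuming you also want to inspect the exception and run some code in every case. The function `safe_divide` is a hypothetical helper, not something from the book.

```python
def safe_divide(numerator, denominator):
    """Returns numerator / denominator, or None if the division fails."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as error:
        # the exception object carries a message describing the problem
        print(f"could not divide: {error}")
        return None
    else:
        # runs only when no exception was raised
        return result
    finally:
        # runs in every case, just before the function returns
        print("done dividing")

ok = safe_divide(6, 3)        # 2.0
failed = safe_divide(1, 0)    # None
```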

## List

Lists are probably the most fundamental data structure in Python. A list is an ordered collection (similar to what other languages call an array).

In [None]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list, heterogeneous_list, []]

list_length = len(integer_list)     # equals 3
list_sum    = sum(integer_list)     # equals 6

In [None]:
integer_list

In [None]:
heterogeneous_list

In [None]:
list_of_lists

In [None]:
list_length

In [None]:
list_sum

In [None]:
# You can get an element of a list using square brackets
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
x[0]          # equals 0, lists are 0-indexed

In [None]:
x[1]  

In [None]:
x[-1]         # equals 9, 'Pythonic' for last element

In [None]:
x[-2]        # equals 8, 'Pythonic' for next-to-last element

In [None]:
# Or to assign elements of a list
x[0] = -1            # now x is [-1, 1, 2, 3, ..., 9]

x

In [None]:
# You can slice part of a list using :
x[:3] 

In [None]:
x[3:]

In [None]:
x[1:5] # from index 1 up to, but not including, index 5

In [None]:
x[-3:]  # last_three

In [None]:
x[1:-1]  # without the first and the last element

In [None]:
x[:]  # A copy of the entire list x

In [None]:
x[5:2:-1]  # A third argument to slicing can be a stride, which can also be negative

In [None]:
# Checking for list membership
1 in [1, 2, 3]    # True

In [None]:
0 in [1, 2, 3]    # False

In [None]:
# Concatenating lists without changing the original
x = [1, 2, 3]
y = x + [4, 5, 6]       # y is [1, 2, 3, 4, 5, 6]; x is unchanged
y

In [None]:
x

In [None]:
# Extending an existing list in place
x = [1, 2, 3]
x.extend([4, 5, 6]) 
x # x is now [1, 2, 3, 4, 5, 6]

In [None]:
# Appending one element at a time
x = [1, 2, 3]
x.append(0)      # x is now [1, 2, 3, 0]
x

In [None]:
# unpacking a list
x, y = [1, 2]    # now x is 1, y is 2

In [None]:
x

In [None]:
y

In [None]:
_, y = [1, 23]    # now y == 23, didn't care about the first element
y

In [None]:
# a list can be sorted
x = [4, 1, 2, 3]
# sorted returns a new sorted list
sorted(x)

In [None]:
# the sort method sorts an existing list in place
x.sort()
x
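
Both `sorted` and `sort` accept the optional arguments `key` and `reverse`. The following is a small sketch with made-up data showing how they change the ordering.

```python
numbers = [-4, 1, -2, 3]

# sort by absolute value, from largest to smallest
by_abs_descending = sorted(numbers, key=abs, reverse=True)   # [-4, 3, -2, 1]

words = ["banana", "fig", "cherry"]

# sort by length; ties keep their original order
by_length = sorted(words, key=len)     # ['fig', 'banana', 'cherry']
```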

## List comprehensions

A concise way of creating a new list from an existing one, by choosing only certain elements, by transforming elements, or both.

In [None]:
[x for x in range(5) if x % 2 == 0]

In [None]:
[x * x for x in range(5)]

In [None]:
[x * x for x in range(5) if x % 2 == 0] 
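
It can help to read a comprehension as shorthand for a loop. The sketch below builds the same list twice, once with a comprehension and once with an explicit `for` loop; the variable names are chosen for this illustration only.

```python
# list comprehension: filter and transform in one expression
squares_of_evens = [x * x for x in range(5) if x % 2 == 0]

# the equivalent for loop
squares_loop = []
for x in range(5):
    if x % 2 == 0:
        squares_loop.append(x * x)

# both are [0, 4, 16]
```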

In [None]:
# Also works for dictionaries and sets
{x: x * x for x in range(5)}

In [None]:
{x * x for x in [2, 1, -1, -2]} 

In [None]:
# Underscore can be used if you do not need the element from the original list
[0 for _ in range(5)]      # has the same length as range(5)

In [None]:
increasing_pairs = [(x, y)                       # only pairs with x < y,
                    for x in range(10)           # range(lo, hi) equals
                    for y in range(x + 1, 10)]   # [lo, lo + 1, ..., hi - 1]
increasing_pairs

## Tuples

Tuples are like lists, but immutable: once created, they cannot be modified.

In [None]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4

In [None]:
my_list

In [None]:
my_tuple

In [None]:
other_tuple

In [None]:
my_list[1] = 3      # my_list is now [1, 3]
my_list

In [None]:
try:
    my_tuple[1] = 3
except TypeError:
    print("cannot modify a tuple")

In [None]:
def sum_and_product(x, y):
    return (x + y), (x * y)

sp = sum_and_product(2, 3)     # sp is (5, 6)
s, p = sum_and_product(5, 10)  # s is 15, p is 50

sp

In [None]:
s

In [None]:
p

In [None]:
x, y = 1, 2     # now x is 1, y is 2
x, y = y, x     # Pythonic way to swap variables; now x is 2, y is 1

In [None]:
x

In [None]:
y

## Dictionaries

Dictionaries are an important key-value data structure. 

In [None]:
empty_dict = {}                     # Pythonic
empty_dict2 = dict()                # less Pythonic
grades = {"Joel": 80, "Tim": 95}    # dictionary literal
grades

In [None]:
grades["Joel"] 

In [None]:
try:
    kates_grade = grades["Kate"]
except KeyError:
    print("no grade for Kate!")

In [None]:
"Joel" in grades

In [None]:
"Kate" in grades 

In [None]:
grades.get("Joel", 0) # The get method returns a default instead of raising an exception. This default can be specified

In [None]:
grades.get("Kate") 

In [None]:
grades.get("Kate", 0) 

In [None]:
grades["Tim"] = 99                    # replaces the old value
grades["Kate"] = 100                  # adds a third entry
grades
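
A common use of `get` is counting things. The sketch below counts words in a made-up list, first with a plain dictionary and then with `defaultdict` (imported in the Modules section above), which supplies the default value automatically.

```python
from collections import defaultdict

document = ["data", "science", "from", "scratch", "data"]   # made-up word list

# Plain dict: use get with a default of 0 for words not seen yet
word_counts = {}
for word in document:
    word_counts[word] = word_counts.get(word, 0) + 1

# defaultdict(int): a missing key starts at int(), which is 0
word_counts_dd = defaultdict(int)
for word in document:
    word_counts_dd[word] += 1
```

Both approaches end with the same counts; the `defaultdict` version just avoids spelling out the default each time.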

In [None]:
# Another example of Dictionary with more complex data
tweet = {
    "user" : "joelgrus",
    "text" : "Data Science is Awesome",
    "retweet_count" : 100,
    "hashtags" : ["#data", "#science", "#datascience", "#awesome", "#yolo"]
}
tweet

In [None]:
tweet.keys()     # iterable for the keys

In [None]:
tweet.values()   # iterable for the values

In [None]:
tweet.items()    # iterable for the (key, value) tuples

In [None]:
"user" in tweet.keys()          # True, but not Pythonic

In [None]:
"user" in tweet                 # Pythonic way of checking for keys

In [None]:
"joelgrus" in tweet.values()     # True (slow but the only way to check)

## Counters

A `Counter` counts the occurrences of the elements of a list (or any other iterable).

In [None]:
from collections import Counter
c = Counter([0, 1, 2, 0, 5, 3, 1, 1])
c

In [None]:
c.most_common(3)
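
As a sketch of a typical use, a `Counter` can produce a word count in one line. The word list here is made up for illustration.

```python
from collections import Counter

words = ["spam", "eggs", "spam", "ham", "spam", "eggs"]   # made-up data
word_counts = Counter(words)

top_two = word_counts.most_common(2)     # [('spam', 3), ('eggs', 2)]
missing = word_counts["bacon"]           # 0: a Counter does not raise KeyError
```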

## Sets

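A set is a collection of distinct elements. Set literals are written with { and } instead of the [ and ] used for lists, but an empty set must be created with `set()`, because `{}` is an empty dictionary.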

In [None]:
s = set()
s

In [None]:
s.add(1)       # s is now {1}
s

In [None]:
s.add(2)       # s is now {1, 2}
s

In [None]:
s.add(2)       # s is still {1, 2}
s

In [None]:
# The in operation that checks for membership is fast
2 in s

In [None]:
3 in s

In [None]:
# Sets are good at giving us distinct elements
set([0, 1, 2, 0, 5, 3, 1, 1])
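
Sets also support the usual operations from set theory. The following is a minimal sketch using small made-up sets.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b             # {1, 2, 3, 4}
intersection = a & b      # {3}
difference = a - b        # {1, 2}

# counting the distinct elements of a list
items = [0, 1, 2, 0, 5, 3, 1, 1]
num_distinct = len(set(items))    # 5

# careful: {} is an empty dict, not an empty set
empty_braces_type = type({})      # dict
empty_set_type = type(set())      # set
```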

## Control flow

In [None]:
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3:
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"
    
message

In [None]:
"even" if x % 2 == 0 else "odd"

In [None]:
# while loops
x = 0
while x < 10:
    print(f"{x} is less than 10")
    x += 1

In [None]:
# For loops
# range(10) is the numbers 0, 1, ..., 9
for x in range(10):
    print(f"{x} is less than 10")

In [None]:
for x in range(10):
    if x == 3:
        continue  # go immediately to the next iteration
    if x == 5:
        break     # quit the loop entirely
    print(x)

## Booleans

In [None]:
1 < 2 

In [None]:
1 >= 2 

In [None]:
True == False

In [None]:
1 == 1

In [None]:
# None represents a missing value; it is equal to neither False nor True
None == False

In [None]:
None == True

In [None]:
x = 1 < 2

In [None]:
x is True

In [None]:
x is None

In [None]:
all([1 < 2, 2 < 3, 3 < 4])

In [None]:
all([1 < 2, 2 < 3, 3 > 4])

In [None]:
any([1 < 2, 2 < 3, 3 > 4])

In [None]:
all([])

In [None]:
any([])

In [None]:
1 < 2 and 2 < 3  # conjunction "and"

In [None]:
1 < 2 and 2 > 3

In [None]:
1 < 2 or 2 > 3 # disjunction "or"

In [None]:
not 1 < 2 # negation
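
Python also lets you use any value where it expects a Boolean. The sketch below lists the built-in values that are treated as false ("falsy") and shows why `is None` is the usual way to test for a missing value; the variable names are illustrative only.

```python
# all of these count as false in an if statement or with and/or/not
falsy_values = [False, None, [], {}, "", set(), 0, 0.0]
as_booleans = [bool(value) for value in falsy_values]   # all False

# pretty much everything else counts as true
truthy = bool([0])              # True: a non-empty list, even if it holds 0

value = None
is_missing = value is None      # True, the idiomatic test
equals_false = value == False   # False: None is falsy but not equal to False
```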

## Object-Oriented Programming

Like many other languages, Python allows you to define classes that bundle data together with the methods that operate on it.

In [None]:
class CountingClicker:
    """A class can/should have a docstring, just like a function"""

    def __init__(self, count = 0):
        self.count = count

    def __repr__(self):
        return f"CountingClicker(count={self.count})"

    def click(self, num_times = 1):
        """Click the clicker some number of times."""
        self.count += num_times

    def read(self):
        return self.count

    def reset(self):
        self.count = 0

In [None]:
?CountingClicker

In [None]:
CountingClicker()

In [None]:
clicker = CountingClicker(1)

In [None]:
clicker.read() 

In [None]:
clicker.click()
clicker.click(7)
clicker.read() 

In [None]:
clicker

In [None]:
# A subclass inherits all the behavior of its parent class.
class NoResetClicker(CountingClicker):
    # This class has all the same methods as CountingClicker

    # Except that it has a reset method that does nothing.
    def reset(self):
        pass
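
To see the subclass in action, here is a self-contained sketch. It repeats condensed copies of the two classes above (without docstrings and `__repr__`) so that it runs on its own.

```python
class CountingClicker:
    def __init__(self, count=0):
        self.count = count

    def click(self, num_times=1):
        self.count += num_times

    def read(self):
        return self.count

    def reset(self):
        self.count = 0


class NoResetClicker(CountingClicker):
    def reset(self):
        pass    # overrides the parent's reset so that it does nothing


clicker = CountingClicker()
clicker.click()
clicker.reset()
count_after_reset = clicker.read()        # 0

clicker2 = NoResetClicker()
clicker2.click()                          # inherited from CountingClicker
clicker2.reset()                          # does nothing
count_without_reset = clicker2.read()     # still 1
```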

## Iterables and generators

An iterable is an object we can loop over, for instance in a `for` loop. A generator is an iterable whose values are produced lazily, one at a time, as they are needed.

In [None]:
# We can define generators as functions using the yield keyword
def generate_range(n):
    i = 0
    while i < n:
        yield i   # every call to yield produces a value of the generator
        i += 1

In [None]:
generate_range(10)

In [None]:
for i in generate_range(10):
    print(f"i: {i}")

In [None]:
# We can also create infinite generators
def natural_numbers():
    """returns 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

In [None]:
natural_numbers()
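
Because values are produced lazily, an infinite generator is harmless as long as you only ask for finitely many values. A minimal sketch, using `next` and `itertools.islice` from the standard library (the generator is repeated so the example runs on its own):

```python
from itertools import islice

def natural_numbers():
    """returns 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1

# take only the first five values of the infinite generator
first_five = list(islice(natural_numbers(), 5))    # [1, 2, 3, 4, 5]

# or pull values one at a time with next
gen = natural_numbers()
one = next(gen)     # 1
two = next(gen)     # 2
```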

In [None]:
# Generators can only be used once
gen10 = generate_range(10)

for i in gen10:
    print(f"i: {i}")

In [None]:
for i in gen10:          # prints nothing: gen10 is already exhausted
    print(f"i: {i}")

In [None]:
# Using comprehension to define  a generator
evens_below_20 = (i for i in generate_range(20) if i % 2 == 0)
evens_below_20

In [None]:
# enumerate lets us iterate over the elements of a list together with their indexes
names = ["Alice", "Bob", "Charlie", "Debbie"]
names

In [None]:
enumerate(names)

In [None]:
for i, name in enumerate(names):
    print(f"name {i} is {name}")

## Randomness

The `random` module allows us to generate random numbers - or rather "pseudorandom numbers", which are deterministic and based on an internal state that can be set with `random.seed`.

In [None]:
import random
random.random()

In [None]:
random.random()

In [None]:
# Setting a seed ensures we always get the same random numbers - essential for reproducibility
random.seed(7439)
random.random()

In [None]:
random.seed(7439)
random.random()

In [None]:
[random.random() for _ in range(4)]

In [None]:
# random drawing from a list of integers
random.randrange(10)    # choose randomly from range(10) = [0, 1, ..., 9]

In [None]:
random.randrange(3, 6)  # choose randomly from range(3, 6) = [3, 4, 5]

In [None]:
# random shuffling
up_to_ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
random.shuffle(up_to_ten)
up_to_ten

In [None]:
# random choice
random.choice(["Alice", "Bob", "Charlie"]) 

In [None]:
# Random sampling without replacement
random.sample([1,2,3,4,5,6,7,8,9,10], 7)

In [None]:
# Random sampling with replacement
[random.choice([1,2,3,4,5,6,7,8,9,10]) for _ in range(7)]
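
Two related tools from the standard library, shown here as a sketch rather than as the book's approach: `random.Random` creates an independent generator with its own seed, and `random.choices` samples with replacement in a single call.

```python
import random

# an independent generator; seeding it does not affect the global random state
rng = random.Random(7439)
first_run = [rng.random() for _ in range(3)]

rng.seed(7439)                      # reset to the same internal state
second_run = [rng.random() for _ in range(3)]
# first_run and second_run are identical

# sampling with replacement, equivalent in spirit to the comprehension above
with_replacement = random.choices([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], k=7)
```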

## Regular expressions

Regular expressions provide a way of searching text. They are useful both in Python generally and in data science.

In [None]:
import re

re_examples = [                        # all of these are true, because
    not re.match("a", "cat"),              #  'cat' doesn't start with 'a'
    re.search("a", "cat"),                 #  'cat' has an 'a' in it
    not re.search("c", "dog"),             #  'dog' doesn't have a 'c' in it
    3 == len(re.split("[ab]", "carbs")),   #  split on a or b to ['c','r','s']
    "R-D-" == re.sub("[0-9]", "-", "R2D2") #  replace digits with dashes
    ]

re_examples

In [None]:
not re.match("a", "cat")             #  'cat' doesn't start with 'a'

In [None]:
if re.search("a", "cat"):
    print("That's true")

In [None]:
re.search("a", "cat")                 #  'cat' has an 'a' in it

In [None]:
re.split("[ab]", "carbs")    #  split on a or b to ['c','r','s']

In [None]:
re.sub("[0-9]", "-", "R2D2") #  replace digits with dashes
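
A short sketch tying these functions together, using a made-up input string. It compiles the same digit pattern used in the Modules section and contrasts `re.match` with `re.search`.

```python
import re

digits = re.compile("[0-9]+")     # one or more digits

# findall returns every non-overlapping match as a string
found = digits.findall("R2D2 met C3PO 12 times")    # ['2', '2', '3', '12']

# match only looks at the start of the string, search scans all of it
at_start = re.match("a", "cat")     # None, because 'cat' starts with 'c'
anywhere = re.search("a", "cat")    # a match object
position = anywhere.start()         # 1, the index of the 'a'
```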

## zip and argument unpacking

In [None]:
# Zip iterables together
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]

zip(list1, list2)

In [None]:
[pair for pair in zip(list1, list2)]

In [None]:
zipped = [pair for pair in zip(list1, list2)]
zipped

In [None]:
letters, numbers = zip(*zipped)

In [None]:
letters

In [None]:
numbers
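
The asterisk in `zip(*zipped)` performs argument unpacking: it passes the elements of the list as separate arguments. The same works for any function, as this sketch shows; the function `add` is defined here only for illustration.

```python
def add(a, b):
    return a + b

pair = [1, 2]
total = add(*pair)               # same as add(1, 2), which is 3

# ** does the same for keyword arguments stored in a dict
keyword_args = {"a": 1, "b": 2}
total_kw = add(**keyword_args)   # same as add(a=1, b=2)

# so zip(*pairs) is the same as zip(("a", 1), ("b", 2), ("c", 3))
pairs = [("a", 1), ("b", 2), ("c", 3)]
letters, numbers = zip(*pairs)   # ('a', 'b', 'c') and (1, 2, 3)
```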

## Type annotations

Python is dynamically typed. That is, it does not care about the types of objects as long as they are used in a valid way.

In [None]:
def add(a, b):
    return a + b

In [None]:
add(10, 5)    # + is valid for numbers

In [None]:
add([1, 2], [3])   # + is valid for lists

In [None]:
add("hi ", "there")   # + is valid for strings

In [None]:
try:
    add(10, "five")
except TypeError:
    print("cannot add an int to a string")

In [None]:
# You can add type annotations, but Python does not enforce them when the code runs
def add(a: int, b: int) -> int:
    return a + b

In [None]:
add(10, 5)           # you'd like this to be OK

In [None]:
add("hi ", "there")  # you'd like this to be not OK

The book lists four reasons for still using type annotations (and it uses them throughout):

- A form of documentation
- External tools can use it
- Makes you think about designing clear functions
- Your editor might have functionality that can utilize it
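
To illustrate that annotations are stored but not checked at runtime, here is a minimal sketch. External tools can read the annotations, for example through the function's `__annotations__` attribute.

```python
def add(a: int, b: int) -> int:
    return a + b

# Python itself ignores the annotations, so this call runs without an error
result = add("hi ", "there")     # "hi there"

# the annotations are only stored on the function object
hints = add.__annotations__      # {'a': int, 'b': int, 'return': int}
```

A static type checker such as mypy would flag the call with two strings, even though Python runs it happily.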

In [None]:
# To type annotate a list of floats
from typing import List  # note capital L

def total(xs: List[float]) -> float:
    return sum(xs)

... see the book for more examples of types