# A Crash Course in Python

This is not a comprehensive Python tutorial. Instead, it highlights the parts of the language that matter most to data folk (some of which Python tutorials often skip). If you have never used Python before, you will probably want to supplement it with a beginner tutorial.

## Whitespace Formatting
Many languages use curly braces to delimit blocks of code. Python uses indentation:

In [192]:
for i in [1, 2, 3]:
    print(i)  # first line in "for i" block
    for j in [4, 5]:
        print(j)  # first line in "for j" block
        print(i + j)  # last line in "for j" block
    print(i)  # last line in "for i" block
print("done looping")

1
4
5
5
6
1
2
4
6
5
7
2
3
4
7
5
8
3
done looping
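
One related detail: Python ignores whitespace inside parentheses and brackets, so long expressions can be broken across lines. A minimal sketch (the variable names here are illustrative, not part of the notebook):

```python
# Whitespace is ignored inside parentheses, which helps with long computations
long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 +
                           11 + 12 + 13 + 14 + 15)

# It is also ignored inside brackets, which makes nested lists easier to read
easier_to_read_list_of_lists = [[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]]

print(long_winded_computation)  # 120
```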


## Modules

Certain features of Python are not loaded by default. These include both features that are included as part of the language as well as third-party features that you download yourself. In order to use these features, you’ll need to import the modules that contain them.


In [193]:
import re
my_regex = re.compile("[0-9]+", re.I)  # how to assign a variable with '='

In [None]:
import re as regex
my_regex = regex.compile("[0-9]+", regex.I)
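
If only a few names from a module are needed, they can be imported explicitly and used without a prefix. The sketch below uses the standard-library `math` module purely as an example; it is not used elsewhere in this notebook.

```python
# Import specific names so they can be used without the module prefix
from math import pi, sqrt

radius = sqrt(16)            # 4.0
area = pi * radius ** 2      # area of a circle with that radius

print(radius)  # 4.0
```

Importing everything with `from module import *` is best avoided, since it can silently overwrite names you have already defined.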

## (Main) types

In [199]:
type(1.0)

float

In [200]:
type(1)

int

In [201]:
type('Hello world')

str

In [202]:
type("Hello world double quot.")

str

### Strings

Python is great for string processing. See the [documentation](https://docs.python.org/3.7/library/stdtypes.html) for more details.


In [172]:
tab_string = "\t"  # represents the tab character

In [None]:
not_tab_string = r"\t"  # a raw string keeps backslashes as backslashes: this is the two characters '\' and 't'

In [186]:
first_name = 'Nick'
second_name = 'Staines    '

In [187]:
full_name = first_name + " " + second_name
full_name

'Nick Staines    '

In [188]:
f"{first_name} {second_name}" # f strings are super cool

'Nick Staines    '

In [189]:
full_name.split(" ")

['Nick', 'Staines', '', '', '', '']

In [190]:
full_name.lower()

'nick staines    '

In [191]:
full_name.strip()

'Nick Staines'
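
The empty strings in the `split(" ")` result above come from the trailing spaces in `second_name`. A short sketch of two ways to avoid them, reusing the same names as the cells above:

```python
first_name = "Nick"
second_name = "Staines    "
full_name = first_name + " " + second_name

# split() with no argument splits on runs of whitespace and drops empty strings
words = full_name.split()
print(words)  # ['Nick', 'Staines']

# Alternatively, strip the surrounding whitespace first, then split on single spaces
stripped_words = full_name.strip().split(" ")
print(stripped_words)  # ['Nick', 'Staines']

# join() goes the other way: it glues a list of strings together
joined = "-".join(words)
print(joined)  # Nick-Staines
```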

## Data structures

### Lists
Probably the most fundamental data structure in Python is the list, which is simply an ordered collection:

In [31]:
integer_list = [1, 2, 3]
heterogeneous_list = ["string", 0.1, True]
list_of_lists = [integer_list,
                 heterogeneous_list,
                 []]

In [32]:
# other properties 
len(integer_list)

3

In [33]:
# using built-in sum function 
sum(integer_list)

6

#### Indexing

In [34]:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

x[0]

0

In [35]:
x[-1] # Pythonic for last element

9

In [36]:
x[-2] 

8

#### Slicing 

In [37]:
first_three = x[:3]
first_three

[0, 1, 2]

In [38]:
three_to_end = x[3:]
three_to_end

[3, 4, 5, 6, 7, 8, 9]

In [39]:
without_first_and_last = x[1:-1]
without_first_and_last

[1, 2, 3, 4, 5, 6, 7, 8]

In [40]:
copy_of_x = x[:]  # a slice with no bounds makes a (shallow) copy of the whole list

#### Membership

**The `in` check on a list examines the elements one at a time, which means that you probably shouldn’t use it unless you know your list is pretty small.**

In [41]:
1 in [1,2,3] 

True

In [42]:
0 in [1,2,3]

False

#### Methods

In [44]:
x = [1, 2, 3]
x.extend([4, 5, 6])  # modify x in place
x

[1, 2, 3, 4, 5, 6]

In [45]:
x = [1, 2, 3]
x.append(0)  # also modify x in place
x

[1, 2, 3, 0]

In [46]:
x, y = [1, 2]  # unpack: the number of names must match the number of elements

In [47]:
x

1

In [48]:
y

2

### Tuples

Tuples are lists’ immutable cousins. Pretty much anything you can do to a list that doesn’t involve modifying it, you can do to a tuple. You specify a tuple by using parentheses (or nothing) instead of square brackets:

In [49]:
my_list = [1, 2]
my_tuple = (1, 2)
other_tuple = 3, 4

In [51]:
my_list[1] = 3
my_list

[1, 3]

In [53]:
my_tuple[1] = 3  # error: tuples cannot be modified

TypeError: 'tuple' object does not support item assignment
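
Tuples are a convenient way to return several values from a function. The sketch below uses a hypothetical helper, `sum_and_product`, which is not part of this notebook, to illustrate returning and unpacking a tuple:

```python
def sum_and_product(x, y):
    """Return both the sum and the product as a single tuple."""
    return (x + y), (x * y)

sp = sum_and_product(2, 3)     # sp is the tuple (5, 6)
s, p = sum_and_product(5, 10)  # unpacking: s is 15, p is 50

# Tuple unpacking also allows swapping without a temporary variable
a, b = 1, 2
a, b = b, a                    # now a is 2 and b is 1

print(sp, s, p, a, b)  # (5, 6) 15 50 2 1
```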

### Dictionaries

Another fundamental data structure is a dictionary, which associates values with keys and allows you to quickly retrieve the value corresponding to a given key:

In [54]:
empty_dict = {}
grades = {"FGS": 10, "SPC": 11,
          "NS": 20, "CH": 30}

In [55]:
grades['CH']

30

In [57]:
grades['DA'] # error

KeyError: 'DA'

In [60]:
'DA' in grades, 'FGS' in grades  # Look ma, no brackets!

(False, True)

#### Methods

In [61]:
grades.get('SPC', 0)

11

In [62]:
grades.get('DA', 0)

0

In [63]:
grades.keys() 

dict_keys(['FGS', 'SPC', 'NS', 'CH'])

In [64]:
grades.values()

dict_values([10, 11, 20, 30])

In [65]:
grades.items()  # very useful!!

dict_items([('FGS', 10), ('SPC', 11), ('NS', 20), ('CH', 30)])
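
To see why `items()` is so useful, here is a minimal sketch that rebuilds the `grades` dictionary from above, updates it, and loops over its key-value pairs. The added `'DA'` entry and its value are made up for illustration.

```python
grades = {"FGS": 10, "SPC": 11, "NS": 20, "CH": 30}

grades["DA"] = 15    # assigning to a new key adds an entry
grades["FGS"] = 12   # assigning to an existing key replaces its value

# items() yields (key, value) pairs, which unpack neatly in a for loop
for name, grade in grades.items():
    print(f"{name}: {grade}")

# Iterating over a dict iterates over its keys; grades.get looks up each value
best = max(grades, key=grades.get)
print(best)  # CH
```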

#### Counters

A Counter turns a sequence of values into a `defaultdict(int)`-like object mapping keys to counts (exploring `defaultdict` is left as **homework**):

In [66]:
from collections import Counter

In [67]:
c = Counter([0, 1, 2, 0])
c

Counter({0: 2, 1: 1, 2: 1})
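
As a starting point for the homework, this sketch counts the same values by hand with `defaultdict(int)` and checks that the result matches what `Counter` produces. It is one possible approach, not the only one.

```python
from collections import Counter, defaultdict

values = [0, 1, 2, 0]

# defaultdict(int) supplies 0 for missing keys, so += 1 works on first sight
counts = defaultdict(int)
for value in values:
    counts[value] += 1

print(dict(counts))  # {0: 2, 1: 1, 2: 1}

# Counter does the same bookkeeping in a single call
same = dict(counts) == dict(Counter(values))
print(same)  # True
```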

In [152]:
import urllib3 # ignore code for now

url = "https://gist.githubusercontent.com/provpup/2fc41686eab7400b796b/raw/b575bd01a58494dfddc1d6429ef0167e709abf9b/hamlet.txt"

http = urllib3.PoolManager()
response = http.request('GET', url)
data = response.data.decode('utf-8')

In [None]:
h = Counter(word.strip().lower()
            for word in data.split(" ")
            if word)
h  # Python is fast here; as with any task, use the right tool

In [154]:
h.most_common(10) # very useful

[('the', 1082),
 ('and', 939),
 ('to', 727),
 ('of', 670),
 ('a', 539),
 ('i', 523),
 ('my', 519),
 ('you', 433),
 ('in', 420),
 ('ham.', 358)]

### Sets

Another useful data structure is the set, which represents a collection of distinct elements. You can define a set by listing its elements between curly braces.
However, that doesn’t work for empty sets, as `{}` already means “empty dict.” In that case you’ll need to use `set()` itself:

In [155]:
primes_below_10 = {2, 3, 5, 7}

In [125]:
s = set()
s.add(1)    # s is now {1}
s.add(2)    # s is now {1, 2}
s.add(2)    # s is still {1, 2}
x = len(s)  # equals 2
2 in s      # sets are VERY FAST for membership checking!

True

We’ll use sets for two main reasons. The first is that `in` is a very fast operation on sets.
If we have a large collection of items that we want to use for a membership test, a set is more appropriate than a list.

In [141]:
big_list = list(range(1_000_000))

In [142]:
%timeit 999000 in big_list

35 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [143]:
big_set = set(range(1_000_000))  # same elements as big_list, so the comparison is like for like

In [144]:
%timeit 999000 in big_set

142 ns ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


The second reason is to find the distinct items in a collection:

In [156]:
item_list = [1, 2, 3, 1, 2, 3]
set(item_list)

{1, 2, 3}

In [161]:
len(set(data.split(" "))) # in hamlet

8829
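
Note that a count like the one above treats words that differ only in case as distinct. A small self-contained sketch (the sample sentence is made up for illustration) shows how normalizing case changes the number of distinct items:

```python
text = "To be or not to be that is the question"
words = text.split()

distinct_raw = set(words)                      # "To" and "to" count separately
distinct_lower = set(w.lower() for w in words) # case-insensitive

print(len(words), len(distinct_raw), len(distinct_lower))  # 10 9 8
```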

## Looping 

Python’s `for` loops are actually for-each loops: they iterate directly over the items of a collection rather than over index positions.

When we don’t care about the indexes, we can simply loop over the items themselves:

In [194]:
colors = ["red", "green", "blue", "purple"]
for color in colors:
    print(color)

red
green
blue
purple


In [195]:
# if we need indexes
presidents = ["Washington", "Adams", "Jefferson", "Madison", "Monroe", "Adams", "Jackson"]
for num, name in enumerate(presidents, start=1):
    print("President {}: {}".format(num, name))

President 1: Washington
President 2: Adams
President 3: Jefferson
President 4: Madison
President 5: Monroe
President 6: Adams
President 7: Jackson


## Control Flow

In [198]:
if 1 > 2:
    message = "if only 1 were greater than two..."
elif 1 > 3: 
    message = "elif stands for 'else if'"
else:
    message = "when all else fails use else (if you want to)"

In [197]:
message

'when all else fails use else (if you want to)'

In [204]:
x = 0
while x < 10:
    print(f"{x} is less than 10")
    x += 1  # shorthand for x = x + 1

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


In [205]:
# range(10) is the numbers 0, 1, ..., 9
for x in range(10):
    print(f"{x} is less than 10")

0 is less than 10
1 is less than 10
2 is less than 10
3 is less than 10
4 is less than 10
5 is less than 10
6 is less than 10
7 is less than 10
8 is less than 10
9 is less than 10


If you need more complex logic, you can use continue and break:

In [206]:
for x in range(10):
    if x == 3:  # == tests for equality in Python
        continue  # go immediately to the next iteration
    if x == 5:
        break  # exit the loop
    print(x)

0
1
2
4


## Truthiness

In [211]:
one_is_less_than_two = 1 < 2  # equals True
true_equals_false = True == False  # equals False

In [212]:
x = None

assert x == None, "this is not the Pythonic way to check for None"  # assert raises an error if the condition is false
assert x is None, "this is the Pythonic way to check for None"

In [213]:
falsy_items = [
    False,
    None,
    [],
    {},
    "",
    set(),
    0,
    0.0   
]

In [217]:
for item in falsy_items:
    print(f"item is {item!r} and its truth value is {bool(item)}")

item is False and its truth value is False
item is None and its truth value is False
item is [] and its truth value is False
item is {} and its truth value is False
item is '' and its truth value is False
item is set() and its truth value is False
item is 0 and its truth value is False
item is 0.0 and its truth value is False


Pretty much anything else gets treated as True.

In [220]:
bool(10), bool("hello"), bool([1,2,3])

(True, True, True)
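
Truthiness is what makes a few common idioms work. The sketch below is illustrative only; `first_char` and `safe_x` are hypothetical names, not part of this notebook.

```python
def first_char(s):
    """Return the first character of s, or "" if s is empty or None."""
    return s[0] if s else ""

print(first_char("hello"))  # h

x = None
safe_x = x or 0             # 0, because None is falsy

# all() and any() apply truthiness to every element of a collection
print(all([True, 1, {3}]))  # True: every element is truthy
print(all([True, 1, {}]))   # False: {} is falsy
print(any([True, 1, {}]))   # True: at least one element is truthy
print(all([]))              # True: there are no falsy elements
print(any([]))              # False: there are no truthy elements
```

One caveat with `x or 0`: it replaces every falsy value, not just `None`, so prefer an explicit `is None` check when `0` or `""` are legitimate inputs.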

## Functions

A function is a rule for taking zero or more inputs and returning a corresponding output. In Python, we typically define functions using `def`:

In [221]:
def double(x):
    """
    This is where you put an optional docstring 
    that explains what the function does. 
    For example, this function multiplies its input by 2
    """
    return x * 2

xs = [1,10,100]

for x in xs:
    print(double(x=x))

2
20
200


Python functions are first-class, which means that we can assign them to variables and pass them into functions just like any other arguments:

In [222]:
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

my_double = double  # assign a function to a variable
print(apply_to_one(my_double))
    

2
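
Because functions are ordinary values, short anonymous functions (lambdas) can be passed in the same way. A minimal sketch, redefining `apply_to_one` so the block is self-contained; `add_four` and `add_four_def` are hypothetical names used only here:

```python
def apply_to_one(f):
    """Calls the function f with 1 as its argument"""
    return f(1)

# A lambda is a short anonymous function, handy for one-off arguments
y = apply_to_one(lambda x: x + 4)
print(y)  # 5

# Assigning a lambda to a name works, but def is usually clearer
add_four = lambda x: x + 4

def add_four_def(x):
    return x + 4

print(add_four(1) == add_four_def(1))  # True
```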


Function parameters can also be given default arguments, which only need to be specified when you want a value other than the default:

In [223]:
def full_name(first="What's-his-name", last="Something"):
    return first + " " + last
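
A function with defaults like the one above can be called with some, all, or none of its arguments. The sketch below repeats the definition so it runs on its own and shows the positional and keyword forms; the names passed in are borrowed from the string examples earlier.

```python
def full_name(first="What's-his-name", last="Something"):
    return first + " " + last

both = full_name("Nick", "Staines")   # both arguments given positionally
first_only = full_name("Nick")        # last falls back to its default
last_only = full_name(last="Staines") # keyword argument skips over first

print(both)        # Nick Staines
print(first_only)  # Nick Something
print(last_only)   # What's-his-name Staines
```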

In [225]:
full_name()

"What's-his-name Something"