# Base Python Coding Review

*Author: Evan Carey, written for BH Analytics*

*Copyright 2017-2019, BH Analytics, LLC*

## Overview

We will now review base Python collections (groups of values) and functions. You will use these concepts throughout your data engineering and manipulation efforts in Python.

* Python Collections
* Slicing
* Loops / list comprehension
* Functions
* Generators
* Classes and OOP design principles

## Packages

We will load up a few packages first:

In [1]:
# load some packages
import sys
import os
import textwrap # adding this to make wrapping text easier for printed materials. 

Now that I have loaded those packages, I can access the objects inside a package by typing `package.object`. In this case, I get the Python version from `sys.version` and then call the `textwrap.fill()` function on that result so the long version string wraps for printing.

In [2]:
# Get Version information
print(textwrap.fill(sys.version),'\n')

3.6.7 | packaged by conda-forge | (default, Feb 26 2019, 03:50:56)
[GCC 7.3.0] 



I am going to add this piece of code to ensure that the output of every expression in a Jupyter notebook cell is displayed. Otherwise, the default behaviour is to display only the result of the final expression in each cell.

In [3]:
## So all output comes through from IPython
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Check your Working Directory

As always, I start by checking the working directory and changing it if needed.

In [4]:
# Working Directory
print("My working directory:\n" + os.getcwd())
# Set Working Directory (if needed)
os.chdir(r"/home/ra/host/BH_Analytics/Discover/DataEngineering/")
# Confirm it changed the working Directory
print("My working directory:\n" + os.getcwd())

My working directory:
/home/ra/host/BH_Analytics/Discover/DataEngineering/notebooks
My working directory:
/home/ra/host/BH_Analytics/Discover/DataEngineering


## Data Structures (containers or collections)

What if we want to organize multiple Python objects into a single object? We call this a 'collection' of objects. An example might be a list of customer_ids, or a list of addresses.  
There are multiple collections (or containers) in Python we can use. We will go over the basics here:

* Tuple
 * Immutable, fixed length sequence
* Lists
 * Mutable sequence of objects
* Sets
 * Unique collection of objects
* Dicts (dictionaries)
 * Mapping of unique keys to values
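
Sets appear in the list above but are not demonstrated elsewhere in this review, so here is a minimal sketch of what "unique collection" means in practice. The customer ID values are made up for illustration.

```python
# Hypothetical customer IDs containing duplicates (illustrative values only)
customer_ids = [101, 102, 102, 103, 101]

# A set keeps only the unique values
unique_ids = set(customer_ids)
print(len(unique_ids))  # 3

# Membership tests and basic set algebra
other_ids = {103, 104}
print(102 in unique_ids)       # True
print(unique_ids & other_ids)  # intersection: {103}
print(unique_ids | other_ids)  # union of both sets
```

Sets have no positional order, so they cannot be indexed or sliced like tuples and lists.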

### Tuple

A tuple is similar to a list, but it is immutable. The parentheses `()` are optional: a comma-separated sequence of values creates a tuple.

In [5]:
# Tuple
tup1 = 3, 4, 5, 6
tup2 = (3, 4, 5, 6)
tup1 == tup2

True

In [6]:
# Immutable, assignment raises error (like strings)
tup1[1] = 10

TypeError: 'tuple' object does not support item assignment

We can convert any iterable object to a tuple with the `tuple()` function. If we call this on a string, what do you expect to happen?  
Compare the three assignments below: do they produce the same result?

In [7]:
# type conversion
b = ("long string!")
print(b)
type(b) # didn't work

long string!


str

In [8]:
# Now it is a tuple
b = "long string!",
print(b)
type(b)

('long string!',)


tuple

In [9]:
# this converts it as an iterable, probably not what we wanted...
b = tuple("long string!")
print(b)

('l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g', '!')


More notes on Tuples:
* Tuples can be concatenated with `"+"`
* Tuples can be nested
* We index tuples as we have other objects
 * Use `[]`; counting starts at 0


In [10]:
# Tuple concatenation
tup1 = 1, 2
tup2 = 3, 4
tup1 + tup2

(1, 2, 3, 4)

In [11]:
# Nesting
nest = (("the", "first"), ("the", "second"))
nest[0]
nest[0][0]
nest[0][0][2]

('the', 'first')

'the'

'e'

Tuple unpacking - This is a "Pythonic" thing. Tuples will "unpack" themselves upon assignment if possible.

In [12]:
# Unpacking
a, b = nest
print(a)

(a1, a2), (b1, b2) = nest
print(a1)

('the', 'first')
the
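
Two other common uses of unpacking are sketched below: swapping values and collecting "the rest" with a starred name. The variable names are arbitrary and chosen only for illustration.

```python
# Swap two values without a temporary variable
first, second = "a", "b"
first, second = second, first
print(first, second)  # b a

# Starred unpacking: `rest` collects whatever is left over, as a list
head, *rest = (1, 2, 3, 4)
print(head)  # 1
print(rest)  # [2, 3, 4]
```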


### Lists

A list is a container that can organize other values, much like a tuple. The main difference is that lists are mutable, so we can grow them or change their elements. We create a list using square brackets `[]`.

In [13]:
#### Containers
## Lists
colList = ["white", "yellow", "red"]
colList
type(colList)

['white', 'yellow', 'red']

list

In [14]:
## First Element
colList[0]
## First and Second Element
colList[0:2]
## Last element
colList[-1]

'white'

['white', 'yellow']

'red'

We can append, insert, and remove list elements with list methods. This is possible because lists are mutable.

In [15]:
list2 = ["a", "b", "c"]
list2.extend(["d", "e"])
list2

['a', 'b', 'c', 'd', 'e']

In [16]:
### List functions
## Append an element
list2 = ["a", "b", "c"]
list2.append("d")
list2

['a', 'b', 'c', 'd']

In [17]:
## Insert an element
list2.insert(1, "a.5")
list2

['a', 'a.5', 'b', 'c', 'd']

In [18]:
## Remove an element
list2.remove("a.5")
list2

['a', 'b', 'c', 'd']

In [19]:
## Append another list...what happened?
list2.append(["d", "e", "f"])
list2
# Didn’t work as expected

['a', 'b', 'c', 'd', ['d', 'e', 'f']]

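We can add items onto the end of a list using either `append()` or `extend()`, as you just saw. `append()` adds its argument as a single element (even when that argument is a list), while `extend()` adds each element of an iterable individually.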

In [20]:
# Run help(list) to view help on the list class
list2.extend(["d", "e", "f"])
print(list2)

['a', 'b', 'c', 'd', ['d', 'e', 'f'], 'd', 'e', 'f']


What if we want to concatenate or repeat a list?

* Lists can be concatenated with the `+` symbol  

* Lists can be repeated with `*` *integer*

In [21]:
## Concatenate lists with +
print(colList + ["Next", "List"])

## repeat a list with * (must be int)
colList*2

['white', 'yellow', 'red', 'Next', 'List']


['white', 'yellow', 'red', 'white', 'yellow', 'red']

#### Tuples Versus Lists

* Tuples `()` are immutable and fixed length.  
* Lists `[]` are mutable and can grow or shrink.
* This is the main functional difference.  
* Use tuples when you want implicit read-only protection on the data (an implied constant).  
* Tuples are slightly more memory efficient and can be marginally faster to create; in practice the speed difference is usually small.  
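
One practical consequence of immutability is that tuples can be used as dictionary keys while lists cannot. The sketch below illustrates this, along with a rough look at container size; the variable names and the dictionary value are made up for illustration, and the byte counts will vary by Python build.

```python
import sys

point_tuple = (1, 2, 3)
point_list = [1, 2, 3]

# Tuples are hashable (when their elements are), so they can be dict keys
distances = {point_tuple: "origin offset"}
print(distances[(1, 2, 3)])

# Lists are mutable and therefore unhashable: using one as a key fails
try:
    {point_list: "will fail"}
    list_key_ok = True
except TypeError as err:
    list_key_ok = False
    print("TypeError:", err)

# Size in bytes of each container object itself (numbers vary by Python build)
print(sys.getsizeof(point_tuple), sys.getsizeof(point_list))
```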

#### Slicing (Indexing)

* We have seen this a few times, let’s formally address it.
* We can slice (index) all sequence-like objects in the same way
 * Use `[]`. Python starts counting at 0, so `var[0]` is the first element
 * We can get a range (a slice) using `":"` example: `var[0:2]`
 * Slices are __start__ inclusive, __finish__ exclusive

In [22]:
# Slicing: first create a list of numbers
seq1 = list(range(1, 11))
seq1

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [23]:
## Different slicing techniques
# first and second
seq1[0:2]
# Third to the end
seq1[2:]
# up to the 8th element
seq1[:8]
# only the 8th element (returned as a one-element list)
seq1[7:8]

[1, 2]

[3, 4, 5, 6, 7, 8, 9, 10]

[1, 2, 3, 4, 5, 6, 7, 8]

[8]

Slice notation has three parts separated by `":"`: *Start*, *stop*, *increment*

So far, we are leaving the increment blank and therefore it is the default value of 1. 

*Start*/*Stop* can be negative, implying count from the end. We'll explore this below. *Increment* default value is 1, but it can be any integer (including negative values).

In [24]:
# Slice stepping 
print("Starting sequence:", seq1)

Starting sequence: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [25]:
# first from the end
seq1[-1]

10

In [26]:
# first to the 9th by 2
seq1[:9:2]

[1, 3, 5, 7, 9]

In [27]:
# 10th down to (but not including) the first, by negative 2
seq1[9:0:-2]

[10, 8, 6, 4, 2]

In [28]:
# why does this not print anything?
seq1[-2:-5]

[]

In [29]:
# Because of the step! (increment)
seq1[-2:-5:-1]

[9, 8, 7]

In [30]:
# How would you Reverse the entire sequence? 
seq1[::-1]

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
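
The start/stop/increment notation maps directly onto Python's built-in `slice()` object, which can be handy when the same slice is reused. The sketch below rebuilds `seq1` so it is self-contained; the names `every_other` and `reverse` are arbitrary.

```python
seq1 = list(range(1, 11))

# seq1[a:b:c] is equivalent to indexing with slice(a, b, c)
every_other = slice(None, 9, 2)     # same as [:9:2]
print(seq1[every_other])            # [1, 3, 5, 7, 9]

reverse = slice(None, None, -1)     # same as [::-1]
print(seq1[reverse] == seq1[::-1])  # True

# Out-of-range slice bounds are clipped rather than raising an error
print(seq1[5:100])                  # [6, 7, 8, 9, 10]
```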

### Dictionaries

A dictionary, or dict, is a collection of key-value pairs: each value is mapped to a key. Dicts are not indexed by position, so we retrieve values only by using the key.

In [31]:
#### Dicts
d1 = {'id': [1, 2, 3, 4], 'SES': ["high", "low"]}

In [32]:
d1['id']

[1, 2, 3, 4]

In [33]:
d1['SES']

['high', 'low']

In [34]:
d1.keys()

dict_keys(['id', 'SES'])

In [35]:
d1.values()

dict_values([[1, 2, 3, 4], ['high', 'low']])

In [36]:
## print keys and values 
d1.items()

dict_items([('id', [1, 2, 3, 4]), ('SES', ['high', 'low'])])
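
Beyond looking values up, a few everyday dict operations are sketched here: adding a key, testing membership, reading with a default, and removing a key. The `'region'` key and its value are hypothetical additions for illustration.

```python
d1 = {'id': [1, 2, 3, 4], 'SES': ["high", "low"]}

# Add a new key, or overwrite an existing one, by assignment
d1['region'] = "west"   # hypothetical extra key for illustration

# Test for a key with `in`
print('region' in d1)   # True

# d1['missing'] would raise KeyError; .get() returns a default instead
print(d1.get('missing', "not found"))  # not found

# Remove a key and get its value back with .pop()
removed = d1.pop('region')
print(removed)          # west
print(list(d1.keys()))  # ['id', 'SES']
```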

###  Ordered Dictionaries

We may wish to track the order of items in a dictionary for some use cases. There is a subclass of dictionaries called the `OrderedDict`. This will remember the order items were added and keep them in that order during iteration. In Python < 3.6, regular dicts did not preserve insertion order. In CPython 3.6, regular dicts preserve insertion order as an implementation detail, and from Python 3.7 onward this behaviour is guaranteed by the language. An `OrderedDict` is still useful when you want to make the reliance on order explicit or need its extra reordering methods.

It does require importing the collections module however. Let's see a few differences:

In [37]:
## Regular Dict
new_dict = {}
new_dict['User'] = 'John Smith'
new_dict['Age'] = 34
new_dict['Salary'] = '60k'

## print dict results
for i, value in new_dict.items():
    print(i, value)

User John Smith
Age 34
Salary 60k


In [38]:
## ordered dict
import collections
new_dict2 = collections.OrderedDict()
new_dict2['User'] = 'John Smith'
new_dict2['Age'] = 34
new_dict2['Salary'] = '60k'

## print dict results
for i, value in new_dict2.items():
    print(i, value)

User John Smith
Age 34
Salary 60k


That looked the same, although it would not have in earlier versions of Python. However, we can explicitly move objects to the start or end of an ordered dict: 

In [39]:
new_dict2.move_to_end('Age')

## print dict results
for i, value in new_dict2.items():
    print(i, value)

User John Smith
Salary 60k
Age 34
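
Two further differences from a regular dict are sketched below: `move_to_end()` accepts `last=False` to move a key to the front, and equality between two `OrderedDict` objects takes order into account. The small `od_a` and `od_b` dictionaries are made up for illustration.

```python
import collections

od = collections.OrderedDict([('User', 'John Smith'), ('Age', 34), ('Salary', '60k')])

# last=False moves the key to the front instead of the end
od.move_to_end('Salary', last=False)
print(list(od.keys()))  # ['Salary', 'User', 'Age']

# Equality between two OrderedDicts is order-sensitive...
od_a = collections.OrderedDict([('x', 1), ('y', 2)])
od_b = collections.OrderedDict([('y', 2), ('x', 1)])
print(od_a == od_b)              # False

# ...while regular dicts compare equal regardless of insertion order
print(dict(od_a) == dict(od_b))  # True
```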


## Control Flow

### If-Else flows

The `if` statement checks a condition. If true, it executes the first code block. Additional if conditions can be checked with `elif`. If none of these are true, the `else` block is executed. 

In [40]:
the_answer = 30

if the_answer == 42:
    print("good job.")
elif the_answer == 0:
    print("nice try.")
else:
    print("better luck next time.")

better luck next time.


### While Loop

The while loop in Python works similarly to other languages. You start with a condition; the loop body runs repeatedly until the condition evaluates to false, at which point the loop terminates.  


In [41]:
## While Loops
counter = 5
while counter > 0:
    print("Still Positive...i = " + str(counter))
    counter = counter - 1
    print(counter)

Still Positive...i = 5
4
Still Positive...i = 4
3
Still Positive...i = 3
2
Still Positive...i = 2
1
Still Positive...i = 1
0


### For Loops

The Python for loop looks almost like Pseudo-code! Whitespace is used to indicate the interior of the loop. Indentation matters! In fact, the loop contents are defined by having similarly indented code. 

In [42]:
#### Control flow
## For Loops
for i in range(10):
    print(i)
    print('Next!')

0
Next!
1
Next!
2
Next!
3
Next!
4
Next!
5
Next!
6
Next!
7
Next!
8
Next!
9
Next!


We often want to enumerate within the for loop (establish a counter). You can do this using tuple unpacking and the `enumerate()` function. `enumerate()` returns a lazy iterator (an `enumerate` object) that yields tuples of length two: a counter and the corresponding value from the iterable. We haven't discussed iterators and generators yet...think of them as a lazy version of a collection. In this case, instead of building the whole sequence of tuples at once, `enumerate()` hands us one tuple at a time as we ask for it.

In [43]:
## Create a list of names
name_list = ['Ra', 'John', 'Kerry']
enumerate(name_list) # this is a lazy iterator (an enumerate object)

<enumerate at 0x7f53946f3240>

In [44]:
## ask for elements of the iterator:
for i in enumerate(name_list):
    i

(0, 'Ra')

(1, 'John')

(2, 'Kerry')

We typically use tuple unpacking to catch the counter part like this:

In [45]:
## enumeration within for loop
for counter,element in enumerate(["The", "Dog", "Cat"]):
    if counter == 0:
        print("First Word: " + element)
    else:
        print("Next Word: " + element)

First Word: The
Next Word: Dog
Next Word: Cat


You should use `enumerate()` to keep a counter in a for loop. To view more information, open the help for `enumerate` by submitting `?enumerate` in IPython or Jupyter, or `help(enumerate)` in any Python session.
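
As a small illustration of what the help text describes, `enumerate()` takes an optional `start` argument for the counter. The sketch below reuses the `name_list` values from above; `position` and `pairs` are arbitrary names.

```python
name_list = ['Ra', 'John', 'Kerry']

# start=1 makes the counter begin at 1 instead of 0
for position, name in enumerate(name_list, start=1):
    print(position, name)

# The enumerate object is lazy; wrap it in list() to see all pairs at once
pairs = list(enumerate(name_list, start=1))
print(pairs)  # [(1, 'Ra'), (2, 'John'), (3, 'Kerry')]
```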

### List Comprehension

List comprehension is a popular part of Python. Essentially, it's a for loop that returns a list. If we didn't have list comprehension, we would need to first declare the new list, then add in the elements as we iterated through something. That would look like this:

In [46]:
## Create a new variable that is x1 squared
x1 = list(range(10))
y1 = []
for i in x1:
    y1.append(i*i)
x1
y1

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Instead, we use list comprehension to accomplish this all at once. 

In [47]:
## List comprehension
[i*i for i in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We can add a filter like this:

In [48]:
[i*i for i in range(10) if i > 5]

[36, 49, 64, 81]

We can also add in a conditional return in the list comprehension:

In [49]:
## If we want a conditional we move it up front
[i*i if i > 0 else i for i in range(-5, 5)]

[-5, -4, -3, -2, -1, 0, 1, 4, 9, 16]
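
The two kinds of `if` are easy to confuse, so here is a side-by-side sketch. A trailing `if` filters items out, while a leading `x if cond else y` transforms every item. The variable names are arbitrary.

```python
values = range(-5, 5)

# Trailing `if` is a filter: it decides which items are kept
kept = [v for v in values if v > 0]
print(kept)  # [1, 2, 3, 4]

# Leading `x if cond else y` is an expression: every item is kept, but transformed
transformed = [v * v if v > 0 else 0 for v in values]
print(transformed)  # [0, 0, 0, 0, 0, 0, 1, 4, 9, 16]

# Both can be combined in one comprehension
both = [v * v if v % 2 == 0 else v for v in values if v > 0]
print(both)  # [1, 4, 3, 16]
```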

## Functions introduction

We should avoid writing the same code many times. *Computers are good at iteration :)*  
Thus, we can wrap code into functions for reuse.  

* Functions in Python:
 * Use `def` to start
 * We define the function name and the parameters (inputs)
 * Whitespace is significant as always
 * Use a `return` statement to return values (multiple values are returned together as a tuple)


This is the general syntax to define a new function:

>`def new_func(par1, par2, some_other_par="default:"):
    _do_some_stuff_
    return some_related_stuff`
    
Let's try a simple function!


In [50]:
# First Function
def mult_by_3(x, weight=1):
    """Takes in a number and multiplies it by 3.
    The result is weighted by the optional weight parameter."""
    result = x * weight * 3
    return result


mult_by_3(4)
mult_by_3(4, 0.5)
mult_by_3(weight=0.25, x=4)

12

6.0

3.0

Functions can return multiple values, which Python packs into a tuple that we can unpack on assignment.

In [51]:
# Multiple Values
def multi_func(a, b, c):
    """Takes in three numbers and returns the 
    increment of the first, adds three to the second, and 
    takes three from the third number."""
    return (a + 1, b + 3, c - 3)
    
a1, b1, c1 = multi_func(1, 1, 1)

a1

2
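
To make the mechanics explicit, the sketch below shows that a "multiple value" return is really a single tuple. `min_and_max` is a hypothetical helper written only for this illustration.

```python
def min_and_max(numbers):
    """Hypothetical helper: return the smallest and largest value."""
    return min(numbers), max(numbers)   # packed into a single tuple


result = min_and_max([4, 9, 2])
print(type(result))  # <class 'tuple'>
print(result)        # (2, 9)

# Unpack at the call site
low, high = min_and_max([4, 9, 2])
print(low, high)     # 2 9
```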

## Anonymous Functions

These are so called `"lambda"` functions.  
One line function expressions (useful construct in data science):  
>`new_func = lambda par [,par2,…]:  expression`

Let's try using a `lambda` function

In [52]:
# Lambda
lam_func = lambda x1, x2: x1 + x2
lam_func(3, 4)

# Multiple returns
lam_func2 = lambda x1, x2: (x1 + x2, x1 - x2)
lam_func2(3, 4)

7

(7, -1)
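
In practice, lambdas are most often passed directly to another function rather than assigned to a name. The sketch below uses the built-in `sorted()` and `map()`; the `people` records are made up for illustration.

```python
# Hypothetical (name, age) records for illustration
people = [("Ra", 41), ("John", 34), ("Kerry", 29)]

# Sort by the second element of each tuple
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Kerry', 29), ('John', 34), ('Ra', 41)]

# Apply a transformation to every element with map()
ages_next_year = list(map(lambda person: person[1] + 1, people))
print(ages_next_year)  # [42, 35, 30]
```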

### Iterables and Generators

You have already been exposed to many objects in Python that can be considered 'iterable'. An iterable in Python is any object capable of returning an iterator, which in turn returns the elements of the object one at a time. Here are a few examples of iterable objects:

In [53]:
# range() is an iterable object
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


In [54]:
# retrieve the iterator from the iterable
iter(range(10))
range(10).__iter__() # same thing

# get iterator
range_iterator = iter(range(10))
# ask for next object
next(range_iterator)
next(range_iterator)
next(range_iterator)
next(range_iterator)
next(range_iterator)

<range_iterator at 0x7f5394631db0>

<range_iterator at 0x7f5394631bd0>

0

1

2

3

4

In [55]:
## Lists are iterables
x1 = [3,5,9]
x1_iter = iter(x1)
next(x1_iter)
next(x1_iter)

3

5

An iterator is essentially a data-generating factory: it lazily yields one element at a time when we ask for it, until it has been exhausted. An iterable is any object that can hand us such an iterator.
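
A `for` loop performs these `iter()` and `next()` steps for us. The sketch below spells out by hand roughly what `for item in x1: print(item)` does; the `collected` list is only there to show which values were seen.

```python
x1 = [3, 5, 9]

# What `for item in x1: print(item)` does, spelled out by hand
x1_iter = iter(x1)             # ask the iterable for an iterator
collected = []
while True:
    try:
        item = next(x1_iter)   # ask the iterator for the next value
    except StopIteration:      # raised once the iterator is exhausted
        break
    collected.append(item)
    print(item)

# The list itself is untouched and can hand out a fresh iterator
print(next(iter(x1)))  # 3
```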


### Making your own iterable class

Iterables are objects whose class defines an `__iter__` method that returns an iterator; an iterator additionally defines a `__next__` method. If you want to create your own iterable, you need to write a new class with these methods. Here is a simple class that generates numbers from start to stop (inclusive) in steps of 10, raising `StopIteration` once it passes the maximum.


In [56]:
class range10:
    def __init__(self,start,stop):
        self.start = start
        self.stop = stop
        self.current = start
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current > self.stop:
            raise StopIteration
        self.value = self.current
        self.current += 10
        return self.value

In [57]:
## Test class
range10(0,50)
list(range10(0,50))

<__main__.range10 at 0x7f53945d1978>

[0, 10, 20, 30, 40, 50]

That is nifty, but seems like a lot of work. A beautiful part of Python is something called a generator, which is a simple way to create an iterator.

### Generators in Python

If you wish to get the behaviour of an iterator, but you do not want to go to the effort of creating a new class and defining the `__iter__` and `__next__` methods, you can easily create a generator by using a function combined with a `yield` statement. Here is an example of doing nearly the same thing we did above, expressed as a generator function. Note one difference: this version stops before `stop` (as `range()` does), whereas the class above included it.


In [58]:
## Create generator 
def range10(start,stop):
    current_value = start
    while current_value < stop:
        yield current_value
        current_value += 10

In [59]:
## test the result
list(range10(0,150))
x1 = range10(0,150)
next(x1)
next(x1)

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]

0

10

Note that we can 'exhaust' a generator object by moving through all of its values:

In [60]:
## establish generator object
x2 = range10(0,150)
## exhaust it
for i in x2:
    print(i)
## try to get one more
next(x2)

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140


StopIteration: 

As a quick check, can you tell me why this code works, even though the code above threw an error?

In [61]:
## why no error?
for i in range10(0,150):
    print(i)
## try to get one more
next(range10(0,150))

0
10
20
30
40
50
60
70
80
90
100
110
120
130
140


0

It is because we constructed a new generator when we called the generator function again.
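
The same idea can make a class reusable across several loops. In this sketch, which assumes we want an object that can be iterated more than once, `__iter__` is itself written as a generator function, so every loop receives a fresh generator. `Range10` is a hypothetical name, not part of the material above.

```python
class Range10:
    """Hypothetical reusable iterable: counts from start up to (not including) stop by 10."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Because this method contains `yield`, each call returns a new generator
        current = self.start
        while current < self.stop:
            yield current
            current += 10


tens = Range10(0, 50)
first_pass = list(tens)
second_pass = list(tens)   # a second pass works: the object is not exhausted
print(first_pass)          # [0, 10, 20, 30, 40]
print(second_pass)         # [0, 10, 20, 30, 40]
```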

### Generator expressions

Do you recall how list comprehension worked? It allowed us to quickly process an iterable, create a list, then populate the list with the result (optionally filtering while we go). It is an alternative to a for loop. Here is an example:


In [62]:
## list comprehension review
data1 = [-3,-2,-1,0,1,2,3]
[100*x for x in data1 if x > 0]

[100, 200, 300]

There is something called a generator expression in Python, which is an even easier way to create a generator. We can use the same sort of syntax as list comprehension, but replace the square brackets with parentheses. Here is an example of creating the same thing, but turning it into a generator: 

In [63]:
## create generator
(100*x for x in data1 if x > 0)

<generator object <genexpr> at 0x7f53945c6888>

In [64]:
## yield values from the generator
for i in (100*x for x in data1 if x > 0):
    print(i)

100
200
300


One thing to keep in mind for all of these generator objects we have constructed: they are memory efficient! They do not store the entire result in memory upon creation; they only generate each value as needed (lazy evaluation). If you are working with large objects, this can be a large memory saving.
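
A rough way to see the difference is to compare the size of a list with the size of an equivalent generator object. This is only a sketch: `sys.getsizeof()` reports the size of the container object itself, the exact byte counts vary by Python build, and the value of `n` is arbitrary.

```python
import sys

n = 100000

squares_list = [x * x for x in range(n)]   # all values stored at once
squares_gen = (x * x for x in range(n))    # values produced on demand

# Size in bytes of each object itself (numbers vary by Python build)
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Aggregations can consume a generator directly, with no intermediate list
total = sum(squares_gen)
print(total == sum(squares_list))  # True
```

Remember that the generator is exhausted after `sum()` has consumed it, so it would need to be recreated before being used again.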

## Wrap-up!

This was a non-comprehensive review of concepts from base Python and its standard library.


How do you feel about the content presented here?  


We'll be expanding these base concepts further. Be sure you're comfortable with these fundamental Python concepts so you can be successful in subsequent material.

What we covered:

* Python Collections
* Slicing
* Loops / list comprehension
* Functions
* Iterators
* Generators