# Python Notebook

Various programs and notes for future reference, many from the Google machine learning course

## Classes:

### General Advice:


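1. Only define per-instance attributes inside \__init\__ (as self.x = ...). Names assigned directly in the class body are class attributes, shared by every instance. Note that \__init\__ is not a constructor, it's an initializer: the object has already been created (by \__new\__) by the time it runs.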

2. Don't overuse classes. A class should not exist if it only has one public method.
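
The difference between class attributes and instance attributes from point 1 can be seen in a short sketch. The class names SharedTags and OwnTags are hypothetical and exist only for illustration.

```python
class SharedTags:
    tags = []                  # class attribute: one list shared by every instance

    def add(self, tag):
        self.tags.append(tag)  # mutates the shared list


class OwnTags:
    def __init__(self):
        self.tags = []         # instance attribute: a fresh list per instance

    def add(self, tag):
        self.tags.append(tag)


a, b = SharedTags(), SharedTags()
a.add('x')
print(b.tags)   # ['x'], because b sees the change made through a

c, d = OwnTags(), OwnTags()
c.add('x')
print(d.tags)   # [], because d has its own list
```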

@classmethod gives more options for controlling constructor behavior. In C++, we are able to overload the constructor, so we can initialize a class with a variety of different types of arguments. Python has no constructor overloading, but we can declare a method a classmethod, which receives the class itself as its first argument (cls) and can be used to build and return a new instance. For example, if we want our Date class to accept a date in the format 'dd-mm-yyyy' as a string, we can define a class method which performs the conversion.

In [43]:
class Date(object):

    def __init__(self, day=0, month=0, year=0):
        self.day = day
        self.month = month
        self.year = year
        
    @classmethod
    def from_string(cls, date_as_string):
        day, month, year = map(int, date_as_string.split('-'))
        date1 = cls(day, month, year)
        return date1

date = Date.from_string('11-09-2012')

print('Day: %d, Month: %d, Year: %d' % (date.day, date.month, date.year))

Day: 11, Month: 9, Year: 2012


The built-in property function can be used to allow exposed attributes to be accessed and modified safely without forcing the user to write or call explicit getter and setter methods.

In [15]:
class Celsius:
    def __init__(self, temperature): # our class takes one argument, but we want to make sure it doesn't go below abs zero.
        self.temperature = temperature # this assignment already goes through the setter defined below

    def to_fahrenheit(self): # ordinary instance method (unit conversion)
        return (self.temperature * 1.8) + 32

    def get_temperature(self): # getter method, returning the value stored in the private attribute _temperature.
        print("Getting value")
        return self._temperature

    def set_temperature(self, value): # setter method which will raise a ValueError if the temperature is below absolute zero.
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

    temperature = property(get_temperature,set_temperature) # property takes a getter, then optionally a setter, a deleter, and a docstring.

In [None]:
c = Celsius(20) # create an instance (the assignment in __init__ calls the setter)
c.temperature # read the attribute, which is intercepted by property and calls the getter instead
c.temperature = 30 # assign to the attribute, which calls the setter instead
c.temperature = -300 # raises ValueError because it is below absolute zero.

This can also be implemented more explicitly/readably using decorators, for example:

In [44]:
class Celsius:
    def __init__(self, temperature): # our class takes one argument, but we want to make sure it doesn't go below abs zero.
        self.temperature = temperature # this assignment already goes through the setter defined below

    def to_fahrenheit(self): # ordinary instance method (unit conversion)
        return (self.temperature * 1.8) + 32

    @property
    def temperature(self): # getter method, returning the value stored in the private attribute _temperature.
        print("Getting value")
        return self._temperature
    
    @temperature.setter
    def temperature(self, value): # setter method which will raise a ValueError if the temperature is below absolute zero.
        if value < -273:
            raise ValueError("Temperature below -273 is not possible")
        print("Setting value")
        self._temperature = value

In [46]:
c = Celsius(20) # create an instance (the assignment in __init__ calls the setter)
c.temperature # read the attribute, which is intercepted by property and calls the getter instead
c.temperature = 30 # assign to the attribute, which calls the setter instead
c.temperature = -300 # raises ValueError because it is below absolute zero.

Setting value
Getting value
Setting value


ValueError: Temperature below -273 is not possible

### Operator Overloading

Like C++, Python allows classes to control their own behavior as operands to the built-in operators like +, -, *, /, and +=, and as arguments to built-in functions like print( ) and int( ). These are controlled using special ("dunder") methods like \__str\__, \__add\__, and \__int\__. Here is some example usage.

In [29]:
class Book:
    def __init__(self, name, serial, cost):
        self.name = name
        self.serial = serial
        self.cost = cost

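    def __eq__(self, other): # compare by serial; other may be a Book or a raw serial string
        if isinstance(other, Book):
            other = other.serial
        return self.serial == other

    def __ne__(self, other): # optional in Python 3, which derives != from __eq__ by default
        return not self == other

    def __str__(self):
        return "The name of this book is {}".format(self.name) # used by print() and str()

    def __add__(self, other): # book + book or book + number, returning a total cost
        if isinstance(other, Book):
            other = other.cost
        return self.cost + other

    def __radd__(self, other): # number + book (tried when the left operand cannot handle the addition)
        return self.cost + other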

    def __gt__(self, other):
        if isinstance(other, Book):
            return self.name > other.name # compare the two names as strings
        else:
            raise ValueError("Invalid Operand")

    def __lt__(self, other):
        if isinstance(other, Book):
            return self.name < other.name 
        else:
            raise ValueError("Invalid Operand")

    def __ge__(self, other):
        if isinstance(other, Book):
            return self.name >= other.name
        else:
            raise ValueError("Invalid Operand")

    def __le__(self, other):
        if isinstance(other, Book):
            return self.name <= other.name
        else:
            raise ValueError("Invalid Operand")

    def __int__(self):
        return int(self.cost)

    def __iadd__(self, other):
        if isinstance(other, (int, float)):
            self.cost += other
            return self
        else:
            raise ValueError("Invalid Operand")

book1 = Book("The Lord of the Rings", "A93265GD2", 23.54) # creating objects
book2 = Book("A Game of Thrones", "F3254GS20", 14.65)
book3 = Book("Harry Potter", "A352GFD2A", 16.54)

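Now the following statements are valid. Without these methods, the arithmetic and ordering operations would raise a TypeError, while print( ) and == would fall back to the default object representation and identity comparison. Note that the recorded output appears to come from a session in which book1 += 3 had already been executed twice (a cost of 29.54 rather than 23.54), so a fresh run will print smaller cost values.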

In [32]:
print(book1) # these are now valid operations
print(book1 + book2)

print(book1 + 3) # calls book1.__add__(3)
print(3 + book1)

print(book1 == book3)
print(book1 < book3)

book1 += 3

print(book1.cost)

The name of this book is The Lord of the Rings
44.19
32.54
32.54
False
False
32.54
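
Writing all four ordering methods by hand is repetitive. As a minimal sketch, the standard library's functools.total_ordering decorator can derive the remaining comparisons from \__eq\__ and one ordering method. The Title class below is a hypothetical, stripped-down stand-in for Book, and it returns NotImplemented for foreign types (so Python raises TypeError) rather than raising ValueError as the class above does.

```python
from functools import total_ordering


@total_ordering
class Title:
    """Hypothetical stand-in for Book that orders instances by name."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, Title):
            return NotImplemented   # let Python try the other operand
        return self.name == other.name

    def __lt__(self, other):
        if not isinstance(other, Title):
            return NotImplemented
        return self.name < other.name


a = Title("A Game of Thrones")
b = Title("Harry Potter")

# __le__, __gt__ and __ge__ are filled in by total_ordering
print(a < b, a <= b, a > b, a >= b)   # True True False False
print([t.name for t in sorted([b, a])])
```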


## Loops:

Much of this code is taken from Raymond Hettinger's lectures, in particular [this transcribed talk](https://gist.github.com/JeffPaine/6213790).

### Enumerate

A Python program should use explicit indices as rarely as possible. Idiomatic Python loops over the objects in a collection directly rather than over index positions. When both the index and the object are needed, enumerate( ) provides them together without manual index bookkeeping.

In [2]:
colors = ['green', 'red', 'blue']

for i, color in enumerate(colors):
    print(i, '--->', color)

0 ---> green
1 ---> red
2 ---> blue
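
If the count should begin somewhere other than zero, enumerate( ) accepts a start argument. A small sketch (the variable name rank is just illustrative):

```python
colors = ['green', 'red', 'blue']

# start=1 changes only the counter, not which items are visited
for rank, color in enumerate(colors, start=1):
    print(rank, '--->', color)
```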


### Reversed

It is possible to loop backwards over a list in Python using the reversed( ) function instead of the usual C-style index arithmetic.

In [4]:
colors = ['green', 'red', 'blue']

for color in reversed(colors):
    print(color)

blue
red
green


### Zip

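In order to iterate over several collections in parallel, the zip( ) function can be used. Note that in Python 3, zip( ) behaves like Python 2's itertools.izip( ): it returns a lazy iterator rather than building a new, expensive list of tuples. It stops as soon as the shortest input is exhausted, so 'yellow' is never paired below.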

In [None]:
names = ['raymond', 'rachel', 'mathew']
colors = ['green', 'red', 'blue', 'yellow']

for name, color in zip(names, colors):
    print(name, '--->', color)
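
As a sketch of the truncation behavior, the same two lists can be compared under zip( ) and itertools.zip_longest( ), which pads the shorter input instead of stopping. The choice of None as the fill value is just for illustration.

```python
from itertools import zip_longest

names = ['raymond', 'rachel', 'mathew']
colors = ['green', 'red', 'blue', 'yellow']

pairs = list(zip(names, colors))                            # stops at the shortest input
padded = list(zip_longest(names, colors, fillvalue=None))   # pads the shorter input

print(len(pairs), len(padded))   # 3 4
print(padded[-1])                # (None, 'yellow')
```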

### Else

For loops can also be equipped with else clauses instead of flag variables. The else block runs only if the loop finishes without hitting a break statement. For example, a find function can be implemented this way.

In [21]:
def find(sequence, target):
    for i, value in enumerate(sequence):
        if value == target:
            break
    else:
        return -1
    return i

sequence = [1,2,3,4,5,6,7,8]

index = find(sequence, 9)

print('%d was found at index %d' % (9, index))

index = find(sequence, 8)

print('%d was found at index %d' % (8, index))

9 was found at index -1
8 was found at index 7


## Dictionaries

### Lookup and Hash Tables

Dictionaries in Python can be constructed and looped over in a variety of ways. They are useful whenever values need to be looked up by key rather than by position. Iterating over a dictionary is perfectly valid, but the main advantage over a list is fast lookup by key, so a dictionary that is only ever scanned from start to end is probably the wrong data structure.

Dictionaries are implemented as hash tables in Python. In the simplified model described here, when an entry is added to the dictionary, the key is hashed, and the (key, value) pair is stored in a collision list (bucket) selected by the hash. A lookup hashes the key again and then iterates through that collision list until it finds the key in question. Hopefully, hashing is sparse enough that no collisions occur, and the first element is the desired key. In other words, the table stores the (key, value) pairs themselves, organized by hash. (CPython's real dict uses open addressing rather than collision lists, but the idea of hashing to a slot and then comparing keys is the same.)

The following is a pseudocode hash-lookup in Python.

In [24]:
def lookup(d, key):
    '''Dictionary lookup is done in three steps:
        1. A hash value of the key is computed using a hash function.
        2. The hash value, reduced modulo the table size, addresses a
        location in d.data, which is supposed to be an array of
        "buckets" or "collision lists" containing (key, value) pairs.
        3. The collision list addressed by the hash value is searched
        sequentially until a pair is found with pair[0] == key. The
        return value of the lookup is then pair[1].'''
    h = hash(key) % len(d.data)    # step 1 (d.data is a hypothetical bucket array)
    cl = d.data[h]                 # step 2
    for pair in cl:                # step 3
        if key == pair[0]:
            return pair[1]
    raise KeyError("Key %s not found." % key)  # raised only after the whole list has been searched
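
To make the lookup above something that can actually be run, here is a minimal sketch of a chained hash table. The ToyDict class and its data attribute are hypothetical and follow the simplified bucket model only; this is not how CPython's dict is built.

```python
class ToyDict:
    """Hypothetical chained hash table, for illustration only."""

    def __init__(self, n_buckets=8):
        # one collision list (bucket) per slot
        self.data = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash the key, then reduce it to a valid bucket index
        return self.data[hash(key) % len(self.data)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: extend the collision list

    def __getitem__(self, key):
        for k, v in self._bucket(key):   # sequential search within one bucket
            if k == key:
                return v
        raise KeyError(key)


d = ToyDict()
d['green'] = 0
d['blue'] = 1
d['green'] = 5
print(d['green'], d['blue'])   # 5 1
```

With n_buckets=1 every key collides, and the table degrades to a linear search through a single list, which is why a sparse table matters for speed.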

### Accessing and Modifying Dictionaries

There are a number of ways of accessing and modifying Python dictionaries.

In [39]:
names = ['raymond', 'rachel', 'mathew']
colors = ['green', 'red', 'blue', 'yellow']

d = dict(zip(names, colors))

for k in d: # iterating over a dictionary yields its keys, not values or (key, value) tuples.
    print(k)
    
for k in list(d.keys()): # deletes entries whose keys start with 'r'. Copying the keys into a list first is required, because a dict cannot change size while it is being iterated over.
    if k.startswith('r'):
        del d[k]
        
d = {k : d[k] for k in d if not k.startswith('r')} # alternative: a dict comprehension that builds a new, filtered dict

raymond
mathew
rachel


The (key, value) pairs of a dictionary can be iterated over directly using the items() method, which avoids a separate lookup for each key.

In [41]:
for k, v in d.items():
    print(k, '--->', v)

mathew ---> blue


### Counting

Dictionaries are often used for counting elements in a list. For example, here is a possible implementation.

In [57]:
colors = ['red', 'green', 'red', 'blue', 'green', 'red']

def basic_count(colors): # a good, simple way to count
    d = {}
    for color in colors:
        if color not in d:
            d[color] = 0
        d[color] += 1
    return d

def get_count(colors): # a more compact way: get returns the current count, or 0 if the color has not been seen yet
    d = {}
    for color in colors:
        d[color] = d.get(color, 0) + 1
    return d
    
from collections import defaultdict

def default_count(colors): # returns a defaultdict, a dict subclass from collections that creates missing entries on first access by calling the given factory (int() returns 0).
    d = defaultdict(int)
    for color in colors:
        d[color] += 1
    return d

default_count(colors)

defaultdict(int, {'blue': 1, 'green': 2, 'red': 3})
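
For this particular task the standard library also provides collections.Counter, a dict subclass built for counting. A minimal sketch using the same list:

```python
from collections import Counter

colors = ['red', 'green', 'red', 'blue', 'green', 'red']

counts = Counter(colors)          # counts every element in one pass

print(counts['red'])              # 3
print(counts['purple'])           # 0, missing keys count as zero without being inserted
print(counts.most_common(2))      # [('red', 3), ('green', 2)]
```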

### Grouping

Dictionaries can also be used to group/sort objects in a list, for instance by first letter or length.

In [20]:
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

# In this example, we're grouping by name length

def basic_grouping(names): # checks for existence before appending
    d = {}
    for name in names:
        key = len(name)
        if key not in d:
            d[key] = []
        d[key].append(name)
    return d
    
print(basic_grouping(names))

def better_grouping(names): # defaults d[key] to an empty list if it doesn't exist
    d = {}
    for name in names:
        key = len(name)
        d.setdefault(key, []).append(name)
    return d

print(better_grouping(names))

from collections import defaultdict

def best_grouping(names):
    d = defaultdict(list)
    for name in names:
        key = len(name)
        d[key].append(name)
    return d
    
print(dict(best_grouping(names)))

{5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}
{5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}
{5: ['roger', 'betty'], 6: ['rachel', 'judith'], 7: ['raymond', 'matthew', 'melissa', 'charlie']}


### dict.get( )

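The dict.get( ) method retrieves a dictionary entry by key, and returns a specified default value if the key is missing (instead of raising a KeyError). This is similar to the defaultdict behavior, except that get( ) never inserts the missing key into the dictionary.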

In [8]:
d = {'green' : 0, 'blue': 1, 'orange': 3}

x = d.get('red', None) # 'red' is not a key, so get returns the default (None) instead of raising KeyError

print(repr(d)) # d is unchanged: get does not insert the missing key

{'green': 0, 'blue': 1, 'orange': 3}
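
The difference from defaultdict shows up when checking membership after a failed lookup. A short sketch (the default of -1 is an arbitrary choice for illustration):

```python
from collections import defaultdict

plain = {'green': 0}
print(plain.get('red', -1))   # -1
print('red' in plain)         # False: get did not modify the dict

dd = defaultdict(int, {'green': 0})
print(dd['red'])              # 0
print('red' in dd)            # True: the lookup inserted the default
```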


## Maps, Lambdas, and List Comprehension

There are a number of ways of applying functions to elements of lists. In general, list comprehensions are the preferred method, since they read more directly and tend to avoid lambda functions.

In [15]:
nums = [1, 2, 3, 4]

squares_a = [x**2 for x in nums] # these do the same thing
squares_b = map(lambda x: x**2, nums)  # except this returns a lazy iterator (a map object), which needs to be converted to a list for certain uses

print(squares_a, list(squares_b))

[1, 4, 9, 16] [1, 4, 9, 16]


These are both reasonable solutions, but if we need to filter the results, list comprehension is much cleaner.

In [16]:
nums = [1, 2, 3, 4, 5, 6, 7, 8, 9]

even_squares_a = [x**2 for x in nums if x % 2 == 0]
even_squares_b = map(lambda x: x**2, filter(lambda x: x % 2 == 0, nums)) # this is ugly

print(even_squares_a, list(even_squares_b))

[4, 16, 36, 64] [4, 16, 36, 64]


You should avoid list comprehensions that become unwieldy. Generally, more than two levels of nesting is bad.

In [19]:
arr = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

nested_squares = [[x**2 for x in row] for row in arr] # two levels is okay, don't do more than this (the result keeps the nested shape)

print(nested_squares)

[[1, 4, 9], [16, 25, 36], [49, 64, 81]]


### Generator Comprehension

For large lists, we do not want to store the full list in memory. That's where generator expressions come in. We can slightly modify the list comprehension syntax (parentheses instead of square brackets) to create a generator, which computes each value only when it is requested. Note that, since generators produce the next value only when next is called, they are vulnerable to changes in the underlying object they're iterating over.

In [33]:
import numpy as np

arr = 2*np.ones(1000,)

squares = (x**2 for x in arr) # generator which doesn't hold everything in memory

print(next(squares))
print(next(squares))

arr[2:4] = [4, 4] # danger of generator expressions, susceptible to changes in the underlying object.

print(next(squares))
print(next(squares))

4.0
4.0
16.0
16.0
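
Two further properties of generator expressions are worth a quick sketch: the generator object itself stays small regardless of how many values it will produce, and it can only be consumed once. The size n below is an arbitrary illustrative choice.

```python
import sys

n = 100_000
as_list = [x**2 for x in range(n)]   # all values are stored at once
as_gen = (x**2 for x in range(n))    # values are produced on demand

# the list's container grows with n, the generator object does not
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))   # True

total = sum(as_gen)                  # consumes the generator
print(total == sum(as_list))         # True
print(sum(as_gen))                   # 0, because the generator is now exhausted
```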


## Collections

The collections module provides specialized container types that complement the built-in list, dict, and tuple. It contains objects like deque, which supports fast appends and pops at both ends, and defaultdict, which removes the need for explicit missing-key checks.

### Updating Sequences

This is a bad way of storing data that is frequently changed at the front (using a list), because removing or inserting at index 0 forces every other element to shift.

In [38]:
names = ['raymond', 'rachel', 'matthew', 'roger',
         'betty', 'melissa', 'judith', 'charlie']

del names[0]
# The below are signs you're using the wrong data structure
names.pop(0)
names.insert(0, 'mark')

names

['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie']

A far better way is to use the deque object from collections, with built-in methods popleft and appendleft that run in constant time.

In [40]:
from collections import deque

names = deque(['raymond', 'rachel', 'matthew', 'roger',
               'betty', 'melissa', 'judith', 'charlie'])

# More efficient with deque
del names[0]
names.popleft()
names.appendleft('mark')

names

deque(['mark', 'matthew', 'roger', 'betty', 'melissa', 'judith', 'charlie'])

### Named Tuples

The collections module provides a factory function called namedtuple, which creates a subclass of tuple whose fields can also be accessed by name. Instances behave like ordinary tuples otherwise. This is used for code clarity/readability.

In [29]:
from collections import namedtuple
TestResults = namedtuple('TestResults', ['failed', 'attempted'])

result1 = TestResults(20, 30)

result1

TestResults(failed=20, attempted=30)
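
A short sketch of what the named fields buy in practice, reusing the TestResults type from above. The fields can be read by name or by position, and since tuples are immutable, \_replace returns a modified copy.

```python
from collections import namedtuple

TestResults = namedtuple('TestResults', ['failed', 'attempted'])
result = TestResults(failed=20, attempted=30)

print(result.failed, result[0])      # 20 20 (by name or by position)

failed, attempted = result           # unpacks like any tuple
updated = result._replace(failed=5)  # returns a new tuple, result is unchanged

print(updated)                       # TestResults(failed=5, attempted=30)
print(isinstance(result, tuple))     # True
```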

## Strings

### Concatenating Strings

A list of strings can be efficiently concatenated using the join method of the separator string, for example:

In [31]:
names = ['bob', 'sylvia', 'robert', 'haley', 'anne']
print(', '.join(names))

bob, sylvia, robert, haley, anne


### Extracting numbers from strings

Consider date information encoded in a string, for instance in the format dd-mm-yyyy, e.g. 23-05-2013. To convert this into a tuple of numbers, I could split the string by '-', and cast each piece individually as an integer, or I can use a map. Map applies the function, in this case int, to every element of the list produced by split.

In [4]:
string_date = '25-03-2013'
day, month, year = map(int, string_date.split('-'))
print("Day: %d, Month: %d, Year: %d" % (day, month, year))

Day: 25, Month: 3, Year: 2013
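
When the string really is a calendar date, an alternative worth knowing is datetime.strptime from the standard library, which parses and validates in one step. A minimal sketch, assuming the same dd-mm-yyyy format:

```python
from datetime import datetime

string_date = '25-03-2013'

# %d-%m-%Y matches day-month-four-digit-year
parsed = datetime.strptime(string_date, '%d-%m-%Y').date()
print(parsed.day, parsed.month, parsed.year)   # 25 3 2013

# unlike map(int, ...), impossible dates are rejected
try:
    datetime.strptime('31-02-2013', '%d-%m-%Y')
except ValueError as err:
    print('rejected:', err)
```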
