# More advanced concepts in Python!

# Behind the scenes of value assignment

This is a deep concept that relates to how Python manages memory under the hood, and a working knowledge of it is critical to avoid creating an unintentional mess!

The best way to understand what this is about is with the following example. What do you think will happen to objects `a` and `b` after these operations?

In [1]:
a = [1,2,5]
b = a
b[2] = 10

a,b

([1, 2, 10], [1, 2, 10])

## Copy vs assignment

What happens is that `a` and `b` really point to the same place in memory and share the same data: `b = a` creates a second name for the same list, not a second list.
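One way to check this, as a small sketch, is with the `is` operator and the built-in `id()`. The variable `c` is introduced here only for comparison and is not part of the original example.

```python
a = [1, 2, 5]
b = a              # no data is copied: b is a second name for the same list

print(a is b)            # True: both names refer to one object
print(id(a) == id(b))    # True: same identity

c = [1, 2, 5]      # a separately built list with the same contents
print(c == a)      # True: equal values
print(c is a)      # False: a different object in memory
```

`==` compares values, while `is` compares identity; only the second tells you whether a change made through one name will be visible through the other.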

To create an object that *copies* the data in `a` but does not *share* it with `a`, make a ... copy!

```python
b = a.copy()
```

Things become a little trickier (although it does make perfect sense!) when you deal with lists of lists: `copy()` is shallow, so the outer list is new but any nested lists are still shared. For this reason there is also `deepcopy`. Try at home what happens with the following example.

In [2]:
a = [1,[2,3],5,"om"] 
b = a.copy()
b[1].append(100)
## Print a, b and be surprised!
a,b

([1, [2, 3, 100], 5, 'om'], [1, [2, 3, 100], 5, 'om'])

In [6]:
def data_cleaning(a):
    b=a.copy()
    #b=a
    b[2] = 'bah'
    return b

my_raw_data = [1,2,3,4]
my_clean_data = data_cleaning(my_raw_data)

my_clean_data[2],my_raw_data[2]

('bah', 3)

In [7]:
## Try now 
from copy import deepcopy
b = deepcopy(a)
b[1].pop()
print(b)
print(a)

[1, [2, 3], 5, 'om']
[1, [2, 3, 100], 5, 'om']


In [8]:
a=[1,2,3,4]

a.pop()

a

[1, 2, 3]

# Exceptions

If we recall the Zen of Python, 2 of its 19 lines are devoted to errors:

_Errors should never pass silently._
_Unless explicitly silenced._

A program that doesn't work as expected, if there is no error raised, is very hard to debug! The problem could be anywhere. 

Errors, on the other hand, tell us _where_ things went wrong and what we need to fix. If every time something goes wrong we have an informative error, then debugging is a breeze!

In [9]:
# Errors in Python are called Exceptions. 
# Exceptions are created as follows:

e = Exception()
type(e)

Exception

In [11]:
# There is no point in creating an exception without "raising"
# the exception. 
# Exceptions are raised with the "raise" keyword: 

raise Exception('Oops!')

Exception: Oops!

In [14]:
# Every "part" of your program, for example each function, must be in charge
# of things "going as expected" inside its body. If something goes wrong, it should
# tell us what happened! 

# One way to do this is to check for possible problems before they occur: 

def age_a_person(person):
    if not hasattr(person, 'age'):
        raise Exception(f'The person must have an age attribute! Given: {person}')
    return person.age + 1

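# A dict stores 'age' as a key, not as an attribute, so this first call already raises:
age_a_person({'age':3})
# ...which means this line is never reached:
age_a_person('notaperson')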


Exception: The person must have an age attribute! Given: {'age': 3}

In [15]:
# However, Python encourages a pattern referred to as "EAFP":
# Easier to Ask Forgiveness than Permission 
# This style implies that one should first try, and catch any expected
# errors that occur, handling them then. 

# So how does one catch an error?

def age_a_person(person):
    try:
        return person.age + 1
    except AttributeError as e:
        raise Exception(f'The person must have an age attribute! Given: {person}') from e

age_a_person('notaperson')

Exception: The person must have an age attribute! Given: notaperson
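To make the contrast concrete, here is a small sketch of the same lookup written in both styles. The `ages` dictionary and the two function names are made up for illustration.

```python
ages = {"alice": 31, "bob": 27}   # hypothetical data

def age_lbyl(name):
    # Check first, then act ("Look Before You Leap")
    if name not in ages:
        raise Exception(f"No age recorded for {name!r}")
    return ages[name]

def age_eafp(name):
    # EAFP: just try, and handle the specific error if it occurs
    try:
        return ages[name]
    except KeyError as e:
        raise Exception(f"No age recorded for {name!r}") from e

print(age_lbyl("alice"))   # 31
print(age_eafp("bob"))     # 27

try:
    age_eafp("carol")
except Exception as e:
    print(e)               # No age recorded for 'carol'
```

Catching `KeyError` specifically, rather than every possible exception, keeps unrelated bugs from being swallowed by the handler.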

In [16]:
person=[1,2,3]
f'The person must have an age attribute! Given: {person}'

'The person must have an age attribute! Given: [1, 2, 3]'

# Python and Object Oriented Programming (OOP)


When people begin programming, it's natural to think procedurally, to tell the computer what to do:

1. Do one thing
2. Do the next thing
3. Do the third thing

This works for simple applications, but it does not scale.

It requires us to understand, in our heads, every step the computer should do.

To do things more complicated than we can keep in our heads at once, we need to break things down into component parts. Object Oriented Programming is a programming paradigm that does just that. 

## Understanding State

To understand object-oriented programming, it's important first to understand the term **state**.

You can think of state as "state of the world." 

Consider a data transformation pipeline. Your state consists of the data itself. At any given time, your data might be in any number of states between "fresh and useless" and "just-the-way-you-want-it." If your data comes in separate bits, each bit must be in the right state before it is combined with other bits, which then create new state. 

How do we keep track of all this state, when it becomes too complex to do procedurally? 

## Objects

Object oriented programming seeks to split the state of the world into individual "objects," which are both responsible for keeping track of their own state, and also responsible for knowing how to change it.

O-O reflects nature:

Consider the forest, with all its animals. No one individual keeps track of each fox in the forest: how much fur they have, how much they have eaten, how thirsty they are, etc.

Each fox is in charge of itself. Nobody else can put food in the fox's belly.

The other feature of the foxes of the forest: they are all alike in their technical inner workings (they have the same type of stomach, same type of mouth). But they might be in a different state at any given moment (one might be hungry, one might be full).

In O-O programming, we reflect this pattern via "classes" and "instances". "Fox" is a class. Each fox in the forest is an "instance" of the "Fox" class.

## Polymorphism

Object oriented programming becomes really powerful via polymorphism.

Again, consider all the animals in the forest. There are many different kinds of animals! But say I want to go through the forest and give them all water.

Maybe I don't need to know the details of how their stomachs work and how the water goes from their mouths to the rest of their bodies. I just need to give water over to them, and let them take care of the rest.

This is single-dispatch polymorphism!
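As a rough sketch of the idea (the class syntax itself is explained in the sections further down), the two classes `Deer` and `Bird` here are hypothetical and exist only to show one call behaving differently depending on the object that receives it.

```python
class Deer:
    def __init__(self):
        self.water = 0

    def give_water(self, amount):
        self.water += amount          # a deer drinks everything it is given

class Bird:
    def __init__(self):
        self.water = 0

    def give_water(self, amount):
        self.water += min(amount, 1)  # a bird only takes a sip

forest = [Deer(), Bird(), Deer()]

# The caller does not check which kind of animal it has:
# Python picks the right give_water based on the object's class.
for creature in forest:
    creature.give_water(5)

print([creature.water for creature in forest])   # [5, 1, 5]
```

"Single dispatch" refers to the fact that the method is chosen from the type of one object only, the one before the dot.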

## Methods

Again, in O-O programming, objects are both responsible for keeping track of their own state, and also responsible for knowing how to change it.

"Changing state," in all the programming we've seen, is done via functions.

In O-O programming, we have special functions called "methods".

Methods are functions that are defined in a class and "attached" to each instance of that class.

Methods will change or interact with the state of the instance in some way.

In [17]:
## Example of class: 

# How do we create an instance of a class? We "construct" it.
# How do we construct an instance? With a "constructor" method!
# In Python, the constructor method is called "__init__":

class Animal():
    def __init__(self, location):
        # location is a tuple: (int, int)
        self.location = location
        self.water = 0
    
    def give_water(self, water): 
        self.water += water

    def is_thirsty(self):
        return self.water < 1        

    def distance(self, loc):
        x,y = self.location
        xa,ya = loc
        # Note: this is the SQUARED Euclidean distance (no square root is taken)
        return (x - xa)**2 + (y - ya)**2

In [28]:
# Instantiating the class

animal = Animal((10, 50))

animals=[Animal((x,y)) for x,y in [(1,1),(2,3),(3,4)]]

for animal in animals:
    print(animal.distance((4,5)))

25
8
2


In [23]:
# See the methods at work! 

print(animal.is_thirsty())

# Give it water! See if it's satiated!
animal.give_water(10)
animal.is_thirsty()



True


False

In [61]:
import random

# Child class
# A child can override a parent's method by defining one with the same name;
# methods it does not redefine are inherited unchanged
# super() --> gives access to the parent's version of a method
# Example of inheritance
class Fox(Animal):
    def __init__(self, location, slyness):
        # slyness is a float between [0., 1.]
        super().__init__(location)
        self.slyness = slyness
        
    def is_thirsty(self):
        # Sly foxes lie!
        if self.slyness > random.random():
            return False
        return self.water < 1


# Try creating a fox. 

print("Random float number is ", random.random())


fox=Fox((1,1),slyness=random.random())




type(fox)
print(float(fox.water))

b=fox

b.give_water(111) # method from the parent class

print(float(fox.water))
print(float(fox.slyness))

Random float number is  0.4187335637004892
0.0
111.0
0.058803350229082896


## Models

In data science, our models are very naturally modelled as "objects."

There are many different types of models. And in a given program, you might have both: many "classes" of models, and many "instances" of each model class.

For example: you might be testing 3 different classifiers, and several different "versions" of each classifier, with different hyperparameters.

But for each model, you want to do the same thing:

1. Create it
2. Train it
3. Test it
4. (eventually) Use it

## Polymorphism in Models

Therefore, we can expect that each model class might have the same methods. For example, a "train" method, where we give it our data (assuming each model should operate on the same data, of course!).
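A minimal sketch of what a shared interface buys us: the two model classes below (`MeanModel` and `MaxModel`) are invented for illustration and are deliberately trivial. The point is only that the loop treats them identically.

```python
class MeanModel:
    # Hypothetical model: always predicts the mean of the y values it was fit on
    def fit(self, x, y):
        self.value = sum(y) / len(y)
        return self

    def predict(self, x):
        return self.value

class MaxModel:
    # Hypothetical model: always predicts the largest y value it was fit on
    def fit(self, x, y):
        self.value = max(y)
        return self

    def predict(self, x):
        return self.value

x_train, y_train = [1, 2, 3, 4], [2, 4, 6, 8]

predictions = {}
for model in [MeanModel(), MaxModel()]:
    model.fit(x_train, y_train)               # same call for every model
    predictions[type(model).__name__] = model.predict(10)

print(predictions)   # {'MeanModel': 5.0, 'MaxModel': 8}
```

Returning `self` from `fit` is a convention assumed here so that calls can be chained, as in `MeanModel().fit(x_train, y_train).predict(10)`.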


In [72]:
# Challenge: 

# Create a class called "ForgetfulClassifier". 

# This classifier should have two methods: fit, predict

# The "fit" method should accept arguments:  x,y (both lists of numbers)
# The "predict" method should accept arguments: x (a single number)

# This classifier is getting very old. For any x value it is expected to predict, it simply guesses the last Y value that it has seen (the last value in the list, y, passed to "fit").

# BONUS: Make the classifier throw an informative error if you try to call "predict" before "fit"


class ForgetfulClassifier():
    # Your code here
    def __init__(self):
        # pass
        self.prediction=None
        
    def fit(self,x,y): 
        self.prediction=y[-1]
        return y[-1]
    
    def predict(self,x):
        
        if self.prediction is None:
            raise Exception("Run fit before predicting!!")
        else:
            return self.prediction
        
        # Earlier attempt, kept for reference (not valid Python):
        # try: self.prediction is not None
        #     return self.prediction
        # except
        #     print("Run fit before predicting!!")

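# A second version without the check. This definition replaces the one above,
# so calling predict() before fit() returns None instead of raising.
class ForgetfulClassifier():
    def __init__(self):
        self.last_seen=None
    
    def fit(self,x,y):
        self.last_seen=y[-1]
        return self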
    
    def predict(self,x):
        return self.last_seen

           

clf = ForgetfulClassifier()
print(clf.predict(10))

clf.fit([1,2,3,4,5], [5,6,7,8,9])
clf.predict(10)
print(clf.predict(10))


assert(clf.predict(10) == 9)
assert(clf.predict(5000) == 9)

None
9


## Modules and imports

*Modules* are Python files, stored on the computer as filename.py

Data and functions defined in a module can become part of Python's *namespace* by using *import*

To appreciate what the namespace contains, let's experiment with the following

In [73]:
x = sin(5)

NameError: name 'sin' is not defined

In [74]:
from math import sin
x = sin(5)
print(x)

-0.9589242746631385


In [75]:
sin = 3
print(sin(3))

TypeError: 'int' object is not callable
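The error above comes from rebinding the name `sin` in our own namespace. As a short sketch, accessing the function through its module avoids the clash, and the original name can be restored by importing it again.

```python
import math

sin = 3                  # rebinds the name "sin" in our own namespace only
print(math.sin(0.0))     # 0.0: the function inside the math module is untouched

del sin                  # remove our name...
from math import sin     # ...and import the function again
print(sin(0.0))          # 0.0
```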

## Typical import structures

```python
from math import sin # imports a single function

from math import sin as sinus # nickname, useful when the imported function has a long and complicated name


import math # imports the module itself into the namespace; its functions are then accessed through it, e.g.
math.sin(3)

from math import * # imports every public name from the module into the namespace, not recommended!

```

In [80]:
# Another import example: importing my own functions

from omsuselessfunctions import themostuselessfunctionever as f1

f1()



hello world


ImportError: cannot import name 'col_val' from 'omsuselessfunctions' (/home/jovyan/work/module-intro-to-python/omsuselessfunctions.py)

## More advanced concepts: default values and variable number of arguments in functions

In Python we get to assign default values to inputs of functions. For example 

```python
def f(a=1, b=2):
    return a+b

# This can validly be called in the following ways (guess the answers!)

f()
f(10)
f(b=4)
f(10,4)
f(a=10,b=4)
f(b=4,a=10)

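# but NOT like this!!! (left commented out so that the rest of the block still runs)
# f(a=10,4)  # SyntaxError: positional argument follows keyword argument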
```

## Variable number of arguments

Sometimes we want a function to handle a number of arguments that is not known in advance.

An example: I want to write a function that returns the maximum of a set of numbers passed as arguments, and that works regardless of how many numbers are passed. 

I would also like the function to default to `inf` if no number is passed. 

In [83]:
# Function for maximum of a variable number of inputs

def maxmany(x=float("inf"),*extra): # in a definition, * packs any extra
    #                               positional arguments into a tuple!!
    runmax = x
    for y in extra: 
        if runmax<y:
            runmax = y
    return runmax

# In a call, * does the opposite: it unpacks an iterable into separate arguments!!

maxmany(-float('inf'),5,10,100)
maxmany(float('inf'),5,10,100,*[1,2,3])

inf
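The two roles of `*` are easy to mix up, so here is a small sketch that separates them. The function `collect` is a throwaway helper invented for this illustration.

```python
def collect(*args):
    # In a definition, *args PACKS the positional arguments into a tuple
    return args

print(collect(1, 2, 3))        # (1, 2, 3)
print(type(collect()))         # <class 'tuple'>

numbers = [4, 5, 6]
# In a call, * UNPACKS an iterable into separate positional arguments
print(collect(0, *numbers))    # (0, 4, 5, 6)
```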

## Passing on dictionaries as inputs

The previous idea can be taken a step further by passing on dictionaries as inputs. 

Dictionaries are unpacked by ** 

This is particularly useful since often we like to specify parameters in a function and refer to them with intuitive names. 
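Before the larger example, a minimal sketch of `**` on its own may help. The functions `describe` and `collect_params` are hypothetical helpers made up for this illustration.

```python
def describe(name, mu=0, sigmasq=1):
    return f"{name}: mu={mu}, sigmasq={sigmasq}"

params = {"mu": 2, "sigmasq": 0.5}

# In a call, ** unpacks the dictionary into keyword arguments:
# this is the same as describe("gaussian", mu=2, sigmasq=0.5)
print(describe("gaussian", **params))   # gaussian: mu=2, sigmasq=0.5

def collect_params(**params):
    # In a definition, ** packs the keyword arguments into a dictionary
    return params

print(collect_params(theta=0.1))        # {'theta': 0.1}
```

The dictionary keys must match the parameter names of the function being called, otherwise Python raises a `TypeError`.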

Consider the following example (in which we also use some advanced Python constructs, such as *decorators*) 


In [84]:
from math import log
from abc import ABC, abstractmethod


class Density(ABC):
    
    # ** in a parameter declaration collects the keyword arguments into a dict
    def __init__(self, **params):
        self.params = params
    
    @abstractmethod
    def log_density(self, some_parameter):
        pass       
    
    def dens(self, x):
        
        # ** in a function call does the opposite: it unpacks the dict into keyword arguments
        return self.log_density(x, **self.params)
    
    
class Gaussian(Density):    
    def log_density(self, x, mu, sigmasq):
        return -((x-mu)**2)/(2*sigmasq)       
    
class Exponential(Density):
    def log_density(self, x, theta):
        return -theta*x
    
class Bernoulli(Density):
    def log_density(self, x, p):
        return x*log(p)+(1-x)*log(1-p)
    
models = [
    Bernoulli(p = .2),
    Gaussian(mu = 0, sigmasq = 1),
    Exponential(theta = .1)
]

# This is polymorphism! They are different classes, but they have
# the same INTERFACE. This means that at this point in the code,
# we don't need to know how the different models were parameterized, 
# we call the same method on all!

for m in models:
    print(m.dens(.5))

-0.916290731874155
-0.125
-0.05


## Implementing K-Nearest Neighbors

K-Nearest Neighbors is a simple algorithm you can read about [on Wikipedia](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm). 

Let's implement two things in Python, to solidify different ways of doing things: 

1. The K-Nearest Neighbors algorithm, as a class.
2. The K-Nearest Neighbors algorithm, as a pure function.

This will be simple enough that the practical difference between the two won't be profound, but I would like you to focus on the difference in implementation itself and on how the idea of interfaces can be helpful even when they are in a sense "useless".

In [184]:
class KNearestNeighbors():
    
    def __init__(self, k):
        self.k = k

    def fit(self, x, y):
        # Fitting only stores the training data; the work happens in predict
        self.predictor = x
        self.target = y
        return self
    
    def predict(self, x):
        # Euclidean distance from x to every training point, paired with its index
        distances = []
        for index, point in enumerate(self.predictor):
            summa = 0
            for j in range(len(point)):
                summa += (x[j] - point[j])**2
            distances.append((summa**0.5, index))
        # Sort by distance only; the sort is stable, so ties keep their original order
        distances.sort(key=lambda pair: pair[0])
        nearest = [index for _, index in distances[:self.k]]
        # Average the TARGETS of the k nearest points (not their indices)
        return sum(self.target[i] for i in nearest) / self.k
        
        

In [185]:
X = [[1,1,1], [0,0,0], [5,5,5]]
y = [1,0,1]

In [187]:
# Let's test our KNearestNeighbors class: 

model = KNearestNeighbors(2)
model.fit(X,y)

print(model.predict([0,0,0]))
print(model.predict([3,3,3]))



model.predict([0,0,0])

assert(model.predict([0,0,0]) == 0.5)
assert(model.predict([3,3,3]) == 1.0)

0.5
1.0


In [None]:
def knn(K, X, y, new_x):
    # new_x is the point to predict for (a list of features, like one row of X)
    pass

In [None]:
# Let's test our KNearestNeighbors function:

assert(knn(2, X, y, [0,0,0]) == 0.5)
assert(knn(2, X, y, [3,3,3]) == 1.0)
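One possible way to fill in the function version is sketched below. It is only one of many valid solutions: it assumes the same `X` and `y` as above, names the last argument `new_x`, and ranks neighbors by squared Euclidean distance (which gives the same ordering as the true distance).

```python
def knn(K, X, y, new_x):
    # Squared Euclidean distance preserves the nearest-neighbor ordering
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(point, new_x))

    # Indices of the training points, nearest first
    # (sorted is stable, so ties keep their original order)
    order = sorted(range(len(X)), key=lambda i: squared_distance(X[i]))
    nearest = order[:K]
    # Average the targets of the K nearest points
    return sum(y[i] for i in nearest) / K

X = [[1, 1, 1], [0, 0, 0], [5, 5, 5]]
y = [1, 0, 1]

print(knn(2, X, y, [0, 0, 0]))   # 0.5
print(knn(2, X, y, [3, 3, 3]))   # 1.0
```

Unlike the class, the function keeps no state between calls: the training data has to be passed in every time, which is the trade-off to notice when comparing the two versions.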