<img src = "https://github.com/barcelonagse-datascience/academic_files/raw/master/bgsedsc_0.jpg">

# More advanced concepts in Python!

# Behind the scenes of value assignment

This is a deep concept that relates to how Python uses memory under the hood. A working knowledge of it is critical to avoid creating an unintentional mess!

The best way to understand what this is about is with the following example. What do you think will happen to objects a and b after these operations? 

In [None]:
a = [1,2,5]
b = a
b[2] = 10

In [None]:
print(a)
b

[1, 2, 10]


[1, 2, 10]

## Copy vs assignment

After a plain assignment such as `b = a`, both names point to the same place in memory and share the same data, so a change made through one name is visible through the other.

The way to create an object that *copies* the data in `a` but does not *share* it with `a` is to make a ... copy!

```python
b = a.copy()
```
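
A minimal sketch of the difference, using the `is` operator to check whether two names refer to the same object. The names `alias` and `clone` are only illustrative.

```python
a = [1, 2, 5]
alias = a          # assignment: a second name for the same list
clone = a.copy()   # copy: a new list holding the same values

alias[2] = 10      # modify the list through the second name

print(a is alias)  # True: both names refer to one object
print(a is clone)  # False: the copy is a separate object
print(a)           # [1, 2, 10]
print(clone)       # [1, 2, 5]
```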

Things become a little trickier (although it does make perfect sense!) when you deal with lists of lists: `copy()` only copies the outer list, so the nested lists are still shared. For this reason there is also `deepcopy`. Try at home what happens with the following example:

In [None]:
a = [1,[2,3],5,"om"] 
b = a.copy()
b[1].append(100)
## Print a, b and be surprised!

In [None]:
print(a)
print(b)

[1, [2, 3, 100], 5, 'om']
[1, [2, 3, 100], 5, 'om']


In [None]:
## Try now 
from copy import deepcopy
b = deepcopy(a)
b[1].pop()
print(b)
print(a)

[1, [2, 3], 5, 'om']
[1, [2, 3, 100], 5, 'om']
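
One way to check what is shared, sketched here with the `is` operator: a shallow copy is a new outer list whose nested list is still the original one, while a deep copy duplicates the nested list as well. The names `shallow` and `deep` are only illustrative.

```python
from copy import deepcopy

a = [1, [2, 3], 5, "om"]
shallow = a.copy()   # new outer list, same inner list
deep = deepcopy(a)   # new outer list and new inner list

print(shallow is a)        # False: the outer lists are different objects
print(shallow[1] is a[1])  # True: the inner list is shared
print(deep[1] is a[1])     # False: the inner list was copied too
```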


# Regular expressions for text analysis

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are used in many languages and tools (UNIX utilities, R, etc.), so it is worth getting used to them.

We will showcase the main special characters and functions of Python's `re` module, but please refer to the [documentation](https://docs.python.org/3/library/re.html) for a more exhaustive exploration of its functionality.


## Motivational example

Let's pretend we have an html text. Find all words tagged as bold (`<b>example</b>`) in html, turn them to italics (`<i>example</i>`) and add the word *amazing* before them. 

In [None]:
import re

my_text='I would like the importance of proper education in <b>data science</b>, <b>maths</b> and <b>statistics</b>.'

# the 'r' prefix makes a raw string, so backslashes reach the regex engine untouched
my_regex=r'<b>([a-z\s]+)</b>'
replace_by=r'<i>amazing \1</i>'

re.sub(my_regex, replace_by,my_text)

'I would like the importance of proper education in <i>amazing data science</i>, <i>amazing maths</i> and <i>amazing statistics</i>.'

Regular expression syntax can be translated to:
- Select strings that start with `<b>` and end with `</b>` and only have lowercase letters a to z or white space in between.
- Replace them with a string that starts with `<i>`, then `amazing `, then the captured text (`\1`), and finally `</i>`.
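
To see what the parentheses capture, here is a small sketch on a shortened, made-up version of the text above. With a single group in the pattern, `re.findall` returns only the captured text, and a match object exposes the whole match as `group(0)` and the captured part as `group(1)`.

```python
import re

# Shortened sample text, only for illustration
my_text = 'Education in <b>data science</b>, <b>maths</b> and <b>statistics</b>.'
my_regex = r'<b>([a-z\s]+)</b>'

captured = re.findall(my_regex, my_text)
print(captured)        # ['data science', 'maths', 'statistics']

first = re.search(my_regex, my_text)
print(first.group(0))  # <b>data science</b>  (the whole match)
print(first.group(1))  # data science         (what \1 refers to)
```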

## Some special characters to form regular expressions


*   `^` Start of string
*   `$` End of string
*   `.` One character, no matter which (except line break)
*   `*` match 0 or more repetitions of the preceding RE
*   `+` match 1 or more repetitions of the preceding RE
*   `?` match 0 or 1 repetition of the preceding RE
*   `{m,n}` Match from m to n repetitions of the preceding RE. `{m}` means exactly m repetitions and `{m,}` means m or more
*   `|` OR. E.g. `A|B` means match RE A or B
*   `(...)` Matches whatever regular expression is inside the parentheses and captures it as a group
*   `[]` Defines a set of characters to match
*   `\` Either escapes special characters (permitting you to match characters like '*', '?', and so forth), or signals a special sequence (e.g. `\s` means a white space character).
*   `\w` Matches a word character (a letter, a digit or the underscore; for ASCII text, equivalent to `[a-zA-Z0-9_]`)
*   `\W` Matches any character that is not a word character (for ASCII text, equivalent to `[^a-zA-Z0-9_]`)
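
The special characters above can be tried out one at a time. The following is a small sketch with made-up sample strings; each pair is a pattern and a text, and the comment shows what `re.findall` returns for it.

```python
import re

# (pattern, sample text) pairs; the sample texts are made up for illustration
examples = [
    (r'^\w+', 'big data rocks'),      # ['big']               start of string
    (r'\w+$', 'big data rocks'),      # ['rocks']             end of string
    (r'b.g', 'big bag b\ng'),         # ['big', 'bag']        . does not match a line break
    (r'ab*', 'a ab abbb'),            # ['a', 'ab', 'abbb']   0 or more b
    (r'ab+', 'a ab abbb'),            # ['ab', 'abbb']        1 or more b
    (r'colou?r', 'color colour'),     # ['color', 'colour']   optional u
    (r'[0-9]{2,4}', '7 42 2024'),     # ['42', '2024']        2 to 4 digits
    (r'cat|dog', 'cat, dog, bird'),   # ['cat', 'dog']        alternatives
    (r'\W', 'a-b c!'),                # ['-', ' ', '!']       non-word characters
    (r'1\+1', '1+1=2'),               # ['1+1']               escaped +
]

results = [re.findall(pattern, text) for pattern, text in examples]
for (pattern, text), found in zip(examples, results):
    print(f'{pattern!r:<14} {text!r:<20} -> {found}')
```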





## Some methods to apply with regular expressions


*   `re.search(pattern, string)`
Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object (or `None` if there is no match).
*   `re.match(pattern, string)`
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
*   `re.split(pattern, string)`
Split string by the occurrences of pattern
*   `re.sub(pattern, repl, string)`
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl
*   `re.findall(pattern, string)`
Return all non-overlapping matches of pattern in string, as a list
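
A short sketch of `re.split` and of the match object returned by `re.search`, using made-up strings. It also shows that `re.match` returns `None` when the pattern is not at the beginning of the string.

```python
import re

# Split on a comma or semicolon followed by optional white space
fields = re.split(r'[,;]\s*', 'maths, statistics;data science')
print(fields)             # ['maths', 'statistics', 'data science']

# A match object gives access to the whole match and to each group
m = re.search(r'(\w+) (\w+)', '<b>data science</b>')
print(m.group(0))         # data science
print(m.group(1))         # data
print(m.group(2))         # science

# match only looks at the beginning of the string, search looks anywhere
at_start = re.match(r'data', 'big data')
anywhere = re.search(r'data', 'big data')
print(at_start is None)   # True
print(anywhere.group())   # data
```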






## Final example of regular expression

Check here how the previous elements apply to one particular case, similar to the previous one. Now with a list of two texts.

In [None]:
my_text=['Data is the new black gold',
         'I would like the <span><span>importance</span> of proper education in <b>data science</b>, <b>big data</b>, <b>maths</b> and <b>statistics</b>.']


In [None]:
# re.match only looks for the pattern at the beginning of the string
my_regex=r'data' # or [Dd]ata if we don't use re.IGNORECASE
for i,t in enumerate(my_text):
    aux=re.match(my_regex,t, flags=re.IGNORECASE)
    if aux:
        print((str(i),aux.group()))
# If there is any match, gives the matched expression

('0', 'Data')


In [None]:
# re.search looks for the pattern in any position
for i,t in enumerate(my_text):
    aux=re.search(my_regex,t, flags=re.IGNORECASE)
    if aux:
        print((str(i),aux.group()))
# If there is any match, gives the first matched expression

('0', 'Data')
('1', 'data')


In [None]:
# re.findall looks for multiple matches in any position
for i,t in enumerate(my_text):
    aux=re.findall(my_regex,t, flags=re.IGNORECASE)
    if aux:
        print((str(i),aux)) 
# If there is any match, gives the list of all matched expressions

('0', ['Data'])
('1', ['data', 'data'])


In [None]:
# Replace <b> or <span> by <i>, but only for text starting with 'd' or 'i',
# and remove duplicated html tags
my_text=['Data is the new black gold',
         'I would like the <span><span>importance</span> of proper education in <b>data science</b>, <b>big data</b>, <b>maths</b> and <b>statistics</b>.']

# the 'r' prefix makes a raw string, so backslashes reach the regex engine untouched
my_regex=r'(<b>)+([di]{1}[a-z\s]*)(</b>)+|(<span>)+([di]{1}[a-z\s]*)(</span>)+'
# the text is captured by group 2 (<b> branch) or group 5 (<span> branch);
# the group that did not take part in the match is replaced by an empty string (Python 3.5+)
replace_by=r'<i>amazing \2\5</i>'

[re.sub(my_regex, replace_by,x) for x in my_text]

['Data is the new black gold',
 'I would like the <i>amazing importance</i> of proper education in <i>amazing data science</i>, <b>big data</b>, <b>maths</b> and <b>statistics</b>.']

# Exceptions

If we recall the Zen of Python, 2 of its 19 lines are devoted to errors:

_Errors should never pass silently._
_Unless explicitly silenced._

A program that doesn't work as expected, if there is no error raised, is very hard to debug! The problem could be anywhere. 

Errors, on the other hand, tell us _where_ things went wrong and what we need to fix. If every time something goes wrong we have an informative error, then debugging is a breeze!

In [None]:
# Errors in Python are called Exceptions. 
# Exceptions are created as follows:

e = Exception()
type(e)

Exception

In [None]:
# There is no point in creating an exception without "raising"
# the exception. 
# Exceptions are raised with the "raise" keyword: 

raise Exception('Oops!')

Exception: Oops!

In [None]:
# Every "part" of your program, for example each function, must be in charge
# of things "going as expected" inside its body. If something goes wrong, it should
# tell us what happened! 

# One way to do this is to check for possible problems before they occur: 

def age_a_person(person):
    if not hasattr(person, 'age'):
        raise Exception(f'The person must have an age attribute! Given: {person}')
    return person.age + 1

age_a_person('notaperson')

Exception: The person must have an age attribute! Given: notaperson

In [None]:
# However, Python encourages a pattern referred to as "EAFP":
# Easier to Ask Forgiveness than Permission
# This style implies that one should first try, and then catch and handle
# any expected errors that occur.

# So how does one catch an error?

def age_a_person(person):
    try:
        return person.age + 1
    except AttributeError as e:
        raise Exception(f'The person must have an age attribute! Given: {person}') from e

age_a_person('notaperson')

Exception: The person must have an age attribute! Given: notaperson
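
Catching an error does not always mean raising a new one. As a sketch of the "unless explicitly silenced" half of the Zen, the hypothetical function `age_or_none` below catches the expected `AttributeError` and returns `None` instead. `SimpleNamespace` from the standard library is used here only as a quick stand-in for an object with an `age` attribute.

```python
from types import SimpleNamespace

def age_or_none(person):
    """Return person.age + 1, or None if the object has no age attribute."""
    try:
        return person.age + 1
    except AttributeError:
        # Explicitly silenced: here we decided that a missing age is acceptable
        return None

print(age_or_none(SimpleNamespace(age=41)))  # 42
print(age_or_none('notaperson'))             # None
```

Only the error we expect is caught. Any other exception still propagates, so unexpected problems do not pass silently.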

# Modules and imports

*Modules* are Python files, stored on the computer as `filename.py`.

Data and functions defined in a module can become part of Python's *namespace* by using *import*.

To appreciate what the namespace contains, let's experiment with the following:

In [None]:

x= sin(5)

NameError: name 'sin' is not defined

In [None]:
from math import sin
x = sin(5)
print(x)

-0.9589242746631385


In [None]:
sin = 3  # rebinding the name hides the function imported above
print(sin(3))

TypeError: 'int' object is not callable

## Typical import structures

```python
from math import sin # imports a single function

from math import sin as sinus # alias, useful when the imported function has a long and complicated name


import math # imports the module into the namespace; its functions can then be accessed with the module prefix, e.g.
math.sin(3)

from math import * # imports every public name of the module into the namespace, not recommended!

```
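
One reason to prefer `import math` is sketched below: the function lives under the module's name, so a variable of your own that happens to be called `sin` does not hide it.

```python
import math

sin = 3                    # an ordinary variable that happens to be called sin

print(math.sin(0))         # 0.0: the function is still reachable through the module
print(sin)                 # 3: our own variable is untouched
print('sin' in dir(math))  # True: dir() lists the names a module defines
```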

In [None]:
# Run this only if running from Colab
# Mount Google Drive if running from Google Colab

from google.colab import drive
drive.mount('/content/drive')

# Set current directory if running from Google Colab
import os
os.chdir('/content/drive/My Drive/Classroom/19D031 Foundations in Data Science All Programs (except DS)/FDS_materials/intro_programming/Python')

In [None]:
# Another import example: importing my own functions

# this only works if you have access to your local file and your function is in a local file called omsuselessfunctions.py
from omsuselessfunctions import themostuselessfunctionever as f1

f1()


KeyError: 'packyou.github.joanDSC'

# More advanced concepts: default values and variable number of arguments in functions

In Python we can assign default values to the parameters of a function. For example:

```python
def f(a=1, b=2):
    return a+b

# This can validly be called in the following ways (guess the answers!)

f()
f(10)
f(b=4)
f(10,4)
f(a=10,b=4)
f(b=4,a=10)

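# but NOT like this!!! (left commented out so that the rest of the block still runs)
# f(a=10,4)   # SyntaxError: positional argument follows keyword argument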
```
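
The heading also mentions a variable number of arguments. As a minimal sketch (the function names `total` and `describe` are hypothetical), Python collects extra positional arguments into a tuple with `*` and extra keyword arguments into a dictionary with `**`:

```python
def total(*numbers, scale=1):
    # numbers is a tuple holding every positional argument that was passed
    return scale * sum(numbers)

def describe(*args, **kwargs):
    # args is a tuple of positional arguments, kwargs a dict of keyword arguments
    return len(args), sorted(kwargs)

print(total())                   # 0
print(total(1, 2, 3))            # 6
print(total(1, 2, 3, scale=10))  # 60
print(describe(1, 2, a=3))       # (2, ['a'])
```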

# Python and Object Oriented Programming (OOP)


When people begin programming, it's natural to think procedurally, to tell the computer what to do:

1. Do one thing
2. Do the next thing
3. Do the third thing

This works for simple applications, but it does not scale.

It requires us to understand, in our heads, every step the computer should do.

To do things more complicated than we can keep in our heads at once, we need to break things down into component parts. Object Oriented Programming is a programming paradigm to do just that.

## Understanding State

To understand object-oriented programming, it's important first to understand the term **state**.

You can think of state as "state of the world." 

Consider a data transformation pipeline. Your state consists of the data itself. At any given time, your data might be in any number of states between "fresh and useless" and "just-the-way-you-want-it." If your data comes in separate bits, each bit must be in the right state before it is combined with other bits, which then create new state. 

How do we keep track of all this state, when it becomes too complex to do procedurally?

## Objects

Object oriented programming seeks to split the state of the world into individual "objects," which are both responsible for keeping track of their own state, and also responsible for knowing how to change it.

O-O reflects nature:

Consider the forest, with all its animals. No one individual keeps track of each fox in the forest: how much fur they have, how much they have eaten, how thirsty they are, etc.

Each fox is in charge of itself. Nobody can put food in the fox's belly.

The other feature of the foxes of the forest: they are all alike in their technical inner workings (they have the same type of stomach, same type of mouth). But they might be in a different state at any given moment (one might be hungry, one might be full).

In O-O programming, we reflect this pattern via "classes" and "instances". "Fox" is a class. Each fox in the forest is an "instance" of the "Fox" class.


## Methods

Again, in O-O programming, objects are both responsible for keeping track of their own state, and also responsible for knowing how to change it.

"Changing state," in all the programming we've seen, is done via functions.

In O-O programming, we have special functions called "methods".

Methods are functions that are defined in a class and "attached" to each instance of that class.

Methods will change or interact with the state of the instance in some way.

In [None]:
## Example of class: 

# How do we create an instance of a class? We "construct" it.
# How do we construct an instance? With a "constructor" method!
# In python, the constructor method is called "__init__":

class Animal():
    def __init__(self, initial_water = 0): 
        # water is an integer (liters!)
        self.water = initial_water
    
    def give_water(self, water): 
        self.water += water

    def is_thirsty(self):
        return self.water < 1        

In [None]:
# Instantiating the class

animal = Animal()

In [None]:
# See the methods at work! 

animal.is_thirsty()

# Give it water! See if it's satiated!
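
As a sketch of how each instance keeps its own state, the block below repeats the `Animal` class from above so that it runs on its own, then creates two animals (the variable names are only illustrative) and gives water to just one of them.

```python
class Animal():
    def __init__(self, initial_water = 0):
        # water is an integer (liters!)
        self.water = initial_water

    def give_water(self, water):
        self.water += water

    def is_thirsty(self):
        return self.water < 1

first = Animal()                  # starts with 0 liters
second = Animal(initial_water=2)  # starts with 2 liters

print(first.is_thirsty())   # True
print(second.is_thirsty())  # False

first.give_water(1)         # only changes the state of `first`
print(first.is_thirsty())   # False
print(second.water)         # 2: the other instance is unaffected
```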

In [None]:
import random

# Example of inheritance
class Fox(Animal):
    def __init__(self, slyness, initial_water = 0):
        # slyness is a float between [0., 1.]
        super().__init__(initial_water)
        self.slyness = slyness
        
    def is_thirsty(self):
        # Sly foxes lie!
        if self.slyness > random.random():
            return False
        return self.water < 1


# Try creating a fox. 

## Models

In data science, our models are very naturally modelled as "objects."

There are many different types of models. In a given program, you might have both many "classes" of models and many "instances" of each model class.

For example: you might be testing 3 different classifiers, and several different "versions" of each classifier, with different hyperparameters.

But for each model, you want to do the same thing:

1. Create it
2. Train it
3. Test it
4. (eventually) Use it
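
The four steps above can be sketched with a toy model. `MeanPredictor` is a hypothetical class written only for illustration, not part of any library: it "trains" by remembering the mean of the target values and predicts that mean for every input.

```python
class MeanPredictor:
    """Hypothetical toy model: always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None  # state learned during training

    def fit(self, x, y):
        # x is ignored by this toy model; only the targets are used
        self.mean_ = sum(y) / len(y)

    def predict(self, x):
        return self.mean_

model = MeanPredictor()             # 1. create it
model.fit([1, 2, 3], [2, 4, 6])     # 2. train it
assert model.predict(10) == 4.0     # 3. test it
print(model.predict(5000))          # 4. use it -> 4.0
```

Because every model exposes the same `fit` and `predict` methods, the code that creates, trains and tests models does not need to know which class it is working with.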

### Exercise

Create a class called "ForgetfulClassifier". 

This classifier should have two methods, `fit` and `predict`:

- The `fit` method should accept arguments `x` and `y` (both lists of numbers).
- The `predict` method should accept an argument `x` (a single number).

This classifier is getting very old. For any x value it is expected to predict, it simply guesses the last Y value that it has seen (the last value in the list, y, passed to "fit").

HINT: You may not even need a constructor! 


In [None]:
# Challenge: 




# -------------------------------------------
# Your code here - define ForgetfulClassifier!
# -------------------------------------------



#test it ...

clf = ForgetfulClassifier()

clf.fit([1,2,3,4,5], [5,6,7,8,9])

assert(clf.predict(10) == 9)
assert(clf.predict(5000) == 9)