# Python/R Basics


## Python Basics
 




### Define Functions

Note that type hints are neither enforced nor used to convert arguments at runtime (unlike Cython or statically typed languages).

In [1]:
def myfunc(a:float, *args, **kwargs) -> str:
    return str(a)

In [2]:
# This should not work, but it DOES!
import numpy as np
x = np.array([1,1])
myfunc(x)

'[1 1]'
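Annotations are only stored as metadata on the function object (in `__annotations__`), so any runtime check has to be written explicitly; static checkers such as mypy can flag the mismatch before the code runs. A minimal sketch, where `myfunc_checked` is a hypothetical name:

```python
def myfunc_checked(a: float) -> str:
    # Python stores the annotations but never checks them; do it by hand
    if not isinstance(a, (int, float)):
        # Explicit runtime check: reject anything that is not a plain number
        raise TypeError(f"expected a number, got {type(a).__name__}")
    return str(a)

print(myfunc_checked.__annotations__)  # the hints, kept as plain metadata
print(myfunc_checked(1.5))             # 1.5
```

Passing an `np.array` to `myfunc_checked` now raises `TypeError` instead of silently succeeding.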

`*args` collects any extra positional arguments. Inside the function, `args` is a tuple, so you can iterate over it just like a list.

In [4]:
def mysum(*args):
    result = 0
    for x in args:
        result += x
    return result
mysum(1,2,3) # Works
mysum(2,3,5,6) #Works

16
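The `*` also works in the other direction: at the call site it unpacks a list into separate positional arguments. A short sketch; `split_first` is a hypothetical helper showing how named parameters are filled before `*rest` collects the remainder:

```python
def mysum(*args):
    # args is a tuple holding every positional argument
    return sum(args)

def split_first(first, *rest):
    # Named parameters are filled first; whatever is left over lands in rest
    return first, rest

numbers = [2, 3, 5, 6]
print(mysum(*numbers))        # 16: * unpacks the list into four arguments
print(split_first(1, 2, 3))   # (1, (2, 3))
```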

`**kwargs`, on the other hand, collects extra keyword arguments. Inside the function, `kwargs` is an ordinary Python dictionary.

In [5]:
def my_concat(**kwargs):
    result = ""
    
    for k, v in kwargs.items():
        result += v
    return result
my_concat(x="a",y="b") # works
my_concat(fff = 1, bsr=2) # fails: the values are ints, and adding an int to a str raises TypeError

TypeError: can only concatenate str (not "int") to str
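If non-string values should be accepted, one option (a design choice, not part of the original) is to convert each value with `str()` before joining. A minimal sketch:

```python
def my_concat(**kwargs):
    # kwargs is a plain dict, e.g. {'fff': 1, 'bsr': 2} for the second call below
    # str() makes ints (or anything else) safe to join
    return "".join(str(v) for v in kwargs.values())

print(my_concat(x="a", y="b"))   # ab
print(my_concat(fff=1, bsr=2))   # 12
```

`"".join(...)` is also the idiomatic replacement for building a string with repeated `+=`.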

### Exception Handling

---
The most common way to handle errors is to raise an exception (called "throwing" in some other languages) and catch it with a `try`-`except` block.

In [6]:
def raise_exception(x):
    raise Exception("I am an EXCEPTION!!!") # Something bad has happened

def catcher(x):
    try:
        raise_exception(x) # Run the code. If everything is fine, it returns normally.
    except (TypeError, NameError):  # Runs only if one of these specific exception types is raised
        print("I am ok with this!")
    except Exception: # Catch-all for unexpected exceptions; here we re-raise so the caller has to deal with it
        raise
    finally: # This will always execute no matter what
        print("Let us swallow everything when exception occurs!")
    
    

In [7]:
catcher(1)

Let us swallow everything when exception occurs!


Exception: I am an EXCEPTION!!!
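`catcher` above only uses `except` and `finally`. A minimal sketch of the full `try`/`except`/`else`/`finally` form, catching one specific exception and falling back to a default value; `parse_int` is a hypothetical helper, and the `events` list exists only to show which clauses ran:

```python
def parse_int(text, default=0):
    events = []  # records which clauses ran, for illustration only
    try:
        value = int(text)          # may raise ValueError
    except ValueError:
        events.append("except")    # handle only the error we expect here
        value = default
    else:
        events.append("else")      # runs only if the try block raised nothing
    finally:
        events.append("finally")   # always runs, whether or not an exception occurred
    return value, events

print(parse_int("42"))    # (42, ['else', 'finally'])
print(parse_int("abc"))   # (0, ['except', 'finally'])
```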

Relying on raised exceptions has several problems:



*   A single unhandled exception crashes the whole program.
*   That is fine while testing our code, but in a production system you don't want a middle-of-the-night call to restart it.
*   Once a function raises an exception, every function that calls it has to add `try-except` blocks (or let the exception propagate further).
*   Many exceptions propagate all the way to the top before they are handled, but the top-level function does not know the details of each function it calls, so it is extremely hard to devise a complete recovery plan.



An alternative is logging. There are many logging options and we will not delve into the details. The idiom is to log what went wrong and then continue with a well-defined fallback behavior.

The advantage is that the program keeps running, and by adjusting the log level you control how much gets reported. However, **someone still has to handle the exceptions!**

In [8]:
import logging
logging.info("This is some useful information.")
logging.warning("This is some warning!")
logging.error("Something went wrong!")


ERROR:root:Something went wrong!
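A sketch of the "log and fall back" idiom, assuming the standard `logging` module. `safe_ratio` is a hypothetical helper; `logger.exception` logs at ERROR level and appends the traceback. Note that `basicConfig` does nothing if the root logger already has handlers (some notebook environments install one); passing `force=True` (Python 3.8+) resets them.

```python
import logging

# Show INFO and above; the default threshold is WARNING
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def safe_ratio(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # Log the problem (with traceback) and continue with a defined fallback value
        logger.exception("safe_ratio(%r, %r) failed; returning 0.0", a, b)
        return 0.0

print(safe_ratio(1, 2))   # 0.5
print(safe_ratio(1, 0))   # 0.0, plus an ERROR log record
```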


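A final, quite popular option is to use a monad. Monads are hard to explain in the abstract, so let us look at an example. `Failure` wraps a value together with a failure flag; `|` applies the next function to the wrapped value, and once a step fails, every later step is skipped.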

In [9]:
class Failure():
    def __init__(self, value, failed=False):
        self.value = value
        self.failed = failed
    def get(self):
        return self.value
    def is_failed(self):
        return self.failed
    def __str__(self):
        return ' '.join([str(self.value), str(self.failed)])
    def __or__(self, f):
        if self.failed:
            return self
        try:
            x = f(self.get())
            return Failure(x)
        except Exception:  # ordinary errors only; a bare except would also swallow KeyboardInterrupt
            return Failure(None, True)

In [10]:
# This will work.
from operator import neg
x = '1'
y = Failure(x) | int | neg | str
print(y)

-1 False


In [11]:
# This will not
from operator import neg
x = 'hahaha'
y = Failure(x) | int | neg | str
print(y)

None True
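`Failure` throws away the exception that stopped the chain. A hypothetical variant, `Attempt`, keeps it so the caller can see what went wrong. This is a sketch of the same short-circuiting idea, not the pymonad API:

```python
from operator import neg

class Attempt:
    """Hypothetical variant of Failure that remembers the exception that stopped the chain."""
    def __init__(self, value, error=None):
        self.value = value
        self.error = error

    def is_failed(self):
        return self.error is not None

    def __or__(self, f):
        if self.is_failed():
            return self                  # short-circuit: skip every later step
        try:
            return Attempt(f(self.value))
        except Exception as e:
            return Attempt(None, e)      # keep the exception instead of a bare flag

    def __repr__(self):
        return f"Attempt({self.value!r}, error={self.error!r})"

print(Attempt('1') | int | neg | str)       # Attempt('-1', error=None)
print(Attempt('hahaha') | int | neg | str)  # the ValueError raised by int() is kept
```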


A beautiful collection of functional programming primitives can be found [here](https://github.com/jasondelaat/pymonad.git). Install it with `pip install pymonad`.


### Python Class

In [12]:
class MyClass(object):
    def __init__(self, x):
        self.x = x
    def __del__(self): # WARNING: runs only when the last reference disappears, which del alone does not guarantee; avoid relying on it
        print("I am gone")

In [13]:
my_class = MyClass(1)

In [14]:
del my_class

I am gone


In [15]:
my_class

NameError: name 'my_class' is not defined

In [16]:
my_class_a = MyClass(1)
my_class_b = my_class_a
my_class_c = MyClass(1)

In [17]:
my_class_b.x= 2
print(my_class_a.x) # my_class_b is just another name for the same object (no copy is made), so the change shows up here too

2


In [18]:
my_class_b == my_class_a

True

In [19]:
my_class_a = MyClass(1)
my_class_c = MyClass(1)
my_class_a == my_class_c

I am gone


False

In [20]:
from copy import deepcopy
my_class_a = MyClass(1)
my_class_b = deepcopy(my_class_a)
my_class_b == my_class_a

I am gone
I am gone


False

In [21]:
my_class_b.x= 2
print(my_class_a.x)

1
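The cells above mix three ideas: two names for one object, the default `==` (which compares identity unless `__eq__` is defined), and copying. A sketch that separates them and also shows `copy.copy`, whose copy still shares nested objects. `Box` is a hypothetical stand-in for `MyClass` without the `__del__` side effect:

```python
from copy import copy, deepcopy

class Box:
    # Hypothetical stand-in for MyClass, minus __del__
    def __init__(self, x):
        self.x = x

a = Box([1, 2])
b = a
print(a is b, a == b)   # True True: same object, and default == compares identity

shallow = copy(a)       # new Box, but shallow.x is the very same list object as a.x
deep = deepcopy(a)      # new Box with its own copy of the list
print(shallow is a, shallow.x is a.x)   # False True

a.x.append(3)
print(shallow.x)   # [1, 2, 3]: shared with a
print(deep.x)      # [1, 2]: independent
```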


### The Ghost Bus Incident

---
It is usually a terrible idea to use mutable objects as default arguments. The following snippet illustrates the point.

In [22]:
class GhostBus:
    def __init__(self, passengers=[]):
        self.passengers = passengers
    
    def pick(self, name):
        self.passengers.append(name)
        
    def drop(self, name):
        self.passengers.remove(name)

In [23]:
# Run this cell several times: each run shows one more 'A Ghost'
ghost_bus = GhostBus()
ghost_bus.pick('A Ghost')
ghost_bus.passengers

['A Ghost']

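What goes wrong here? The default value `[]` is evaluated once, when the function is defined, and that single list object is shared by every call that omits `passengers`. `self.passengers` is a reference to this shared list, so mutating `self.passengers` mutates the default itself, and every new `GhostBus()` starts with the ghosts picked up by earlier buses. Use `None` as the default instead and create a fresh list inside `__init__`.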
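A sketch of the fix with `None` as the default. `Bus` is a hypothetical name; copying a caller-supplied list with `list(passengers)` is an extra design choice that also keeps the bus from mutating a list it was handed:

```python
class Bus:
    def __init__(self, passengers=None):
        # None is immutable, so it is safe as a shared default; build a fresh list per bus.
        # list(passengers) copies a caller-supplied list instead of aliasing it.
        self.passengers = [] if passengers is None else list(passengers)

    def pick(self, name):
        self.passengers.append(name)

    def drop(self, name):
        self.passengers.remove(name)

bus1 = Bus()
bus1.pick('A Ghost')
bus2 = Bus()
print(bus2.passengers)   # []: no ghost carried over from bus1

team = ['Alice', 'Bob']
bus3 = Bus(team)
bus3.drop('Alice')
print(team)              # ['Alice', 'Bob']: the caller's list is untouched
```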

### Common Data Structures: List

---
A Python list is a little like a C++ `std::vector`, except that it can hold objects of any type. It is ordered and mutable.

In [38]:
a = []
# a = list()
b= [1,a,'2']

In [39]:
b[0]

1

In [40]:
b[:1]

[1]

In [41]:
b[1:]

[[], '2']

In [42]:
b[2:3]

['2']

In [43]:
b[-1]

'2'

In [44]:
b[:-2]

[1]

In [45]:
b.append(5)
b

[1, [], '2', 5]

In [46]:
b.extend([1,2])
b

[1, [], '2', 5, 1, 2]

In [47]:
b.insert(1,'haha')
b

[1, 'haha', [], '2', 5, 1, 2]

In [48]:
del b[0]
b

['haha', [], '2', 5, 1, 2]

In [49]:
b.remove(1)  # removes the first element equal to 1 (a value, not an index)

In [50]:
matrix  = [[1,2],[3,4],[5,6],[7,8]]
matrix

[[1, 2], [3, 4], [5, 6], [7, 8]]

In [53]:
transpose = [[row[i] for row in matrix] for i in range(2)]  # [[1, 3, 5, 7], [2, 4, 6, 8]]

To understand what happens, note that we have used a list comprehension. In short,

```
x = [i*2 for i in range(10)]
```

is the same as

```
x = list()
for i in range(10):
    x.append(i*2)
```
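The transpose above is a nested comprehension: the outer `for i` produces one result row per column index, and the inner `for row` walks down that column. A sketch expanding it into explicit loops, plus the common `zip(*matrix)` alternative:

```python
matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Nested comprehension: outer loop over column index i, inner loop over rows
transpose = [[row[i] for row in matrix] for i in range(2)]

# The same thing with explicit loops
transpose_loops = []
for i in range(2):
    new_row = []
    for row in matrix:
        new_row.append(row[i])
    transpose_loops.append(new_row)

# zip(*matrix) unpacks the rows and pairs up their i-th elements
transpose_zip = [list(col) for col in zip(*matrix)]

print(transpose)   # [[1, 3, 5, 7], [2, 4, 6, 8]]
```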

### Common Data Structures: Set

---
A set is essentially a hash set, which means it is unordered. The closest C++ equivalent is `std::unordered_set`. A set never contains duplicate elements, and its elements must be hashable.

In [60]:
a = {1,2,3}

In [61]:
my_set = {1, 3}
print(my_set)
my_set.add(2)
print(my_set)
my_set.update([2, 3, 4])
print(my_set)
my_set.update([4, 5], {1, 6, 8})
print(my_set)

{1, 3}
{1, 2, 3}
{1, 2, 3, 4}
{1, 2, 3, 4, 5, 6, 8}


In [62]:
my_set.add(1)
my_set

{1, 2, 3, 4, 5, 6, 8}

In [63]:
my_set.remove(1)
my_set

{2, 3, 4, 5, 6, 8}

In [64]:
set_a = {1,2,3}
set_b = {3,4,5}

Here are some set operations. Pretty self-explanatory. 

In [65]:
print(set_a|set_b)
print(set_a - set_b)
print(set_b - set_a)
print(set_a.union(set_b))
print(set_a.intersection(set_b))
print(set_a^set_b)

{1, 2, 3, 4, 5}
{1, 2}
{4, 5}
{1, 2, 3, 4, 5}
{3}
{1, 2, 4, 5}
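A few more set operations worth knowing: the `&` operator for intersection, subset tests, and the difference between `remove` (used above) and `discard` for missing elements. A short sketch:

```python
set_a = {1, 2, 3}
set_b = {3, 4, 5}

print(set_a & set_b)          # {3}: operator form of intersection()
print({1, 2} <= set_a)        # True: subset test, same as {1, 2}.issubset(set_a)

s = {1, 2}
s.discard(99)                 # discard() silently ignores a missing element
try:
    s.remove(99)              # remove() raises KeyError for a missing element
except KeyError:
    print("99 was not in the set")

# A common idiom: drop duplicates from a list (sets are unordered, hence sorted)
print(sorted(set([3, 1, 3, 2, 1])))   # [1, 2, 3]
```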


### Common Data Structures: Dict

---
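A dict is a hash map; its closest C++ equivalent is `std::unordered_map`. Unlike `std::unordered_map`, however, a Python dict preserves insertion order (guaranteed since Python 3.7). `OrderedDict` is still useful when you need order-sensitive equality or methods such as `move_to_end`.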

In [None]:
a = dict()
a = {'x':'1', 'y':'2'}

In [None]:
print(a['x'])
print(a['not_here'])  # raises KeyError: the key does not exist

In [None]:
a['new_element'] = 'haha'
print(a)

In [None]:
print(a.keys())
print(a.values())

In [None]:
del a['new_element']

In [None]:
a

In [None]:
keys = ['a','b','c']
values = [1,2,3]
dict_from_zip = dict(zip(keys, values))
print(dict_from_zip)

In [None]:
def my_concat(**kwargs):
    result = ""
    
    for k, v in kwargs.items():
        result += v
    return result
my_concat(x="a",y="b")

In [None]:
my_concat(**a)  # ** unpacks the dict into keyword arguments, i.e. my_concat(x='1', y='2')

In [None]:
# You can also use dict comprehension to shorten your code. 
odd_squares = {x: x*x for x in range(11) if x % 2 == 1}
print(odd_squares)
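Some everyday dict idioms: `get` to avoid the `KeyError` seen above, insertion-ordered iteration, and the `collections` helpers `Counter` and `defaultdict` for counting and grouping without manual key checks. A short sketch:

```python
from collections import Counter, defaultdict

a = {'x': '1', 'y': '2'}
print(a.get('not_here'))         # None instead of KeyError
print(a.get('not_here', 'n/a'))  # n/a: an explicit default

# Insertion order is preserved (guaranteed since Python 3.7)
d = {}
d['b'] = 1
d['a'] = 2
print(list(d))                   # ['b', 'a']

counts = Counter('banana')       # counts each character
groups = defaultdict(list)       # missing keys start as an empty list
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
print(counts['a'], dict(groups))
```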

### Common Data Structures: NamedTuple

In [66]:
from collections import namedtuple

In [67]:
Employee = namedtuple('Employee', ['age','place', 'education'])

In [68]:
tom = Employee(age=10, place='beijing', education='none')

In [69]:
print(tom)

Employee(age=10, place='beijing', education='none')
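A namedtuple is still a tuple: fields can be read by name or by index, it unpacks like a tuple, and it is immutable, so `_replace` returns a modified copy instead of changing it in place. A short sketch:

```python
from collections import namedtuple

Employee = namedtuple('Employee', ['age', 'place', 'education'])
tom = Employee(age=10, place='beijing', education='none')

print(tom.age, tom[0])             # 10 10: access by name and by index
older_tom = tom._replace(age=11)   # a new Employee; tom itself is unchanged
try:
    tom.age = 11                   # fields cannot be reassigned
except AttributeError:
    print("namedtuple fields are read-only")
age, place, education = tom        # unpacks like a regular tuple
```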


### Common Data Structures: dataclass

---

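A dataclass is a great way to pass many parameters to a function. It documents each field (here through the `metadata` help strings) and, through `__post_init__`, lets you range-check values, so people can't just stuff anything into it. The type annotations themselves are still not enforced.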

In [70]:
from dataclasses import dataclass, field
from typing import Optional

In [71]:
@dataclass
class MyDataClass:
    name: str = field(
        default='tom',
        metadata={'help': "Name of the person"})

    age: Optional[int] = field(
        default=None,
        metadata={'help': "Age of the person. Optional."})

    vip: int = field(
        default=100,
        metadata={'help': "Some very important field."})

    def __post_init__(self): # Runs right after __init__; use it to reject illegal arguments.
        if self.vip <= 0:
            raise ValueError("That important thing has to be larger than 0")

    @property
    def age_type(self):
        if self.age is None: # age is Optional, so guard against None before comparing
            return 'Age unknown'
        if self.age >= 100:
            return 'You are old'
        else:
            return 'You are still young'

In [72]:
my_data_class = MyDataClass(name='jerry', age = 20)
print(my_data_class)

MyDataClass(name='jerry', age=20, vip=100)


In [73]:
print(my_data_class.age)
print(my_data_class.age_type)

20
You are still young
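A trimmed-down sketch showing the two features mentioned above in action: `__post_init__` rejecting an illegal value, and the `metadata` help strings being read back with `dataclasses.fields`. `Settings` is a hypothetical class, not part of the original:

```python
from dataclasses import dataclass, field, fields

@dataclass
class Settings:
    # Hypothetical, trimmed-down version of MyDataClass
    vip: int = field(default=100, metadata={'help': "Some very important field."})

    def __post_init__(self):
        if self.vip <= 0:
            raise ValueError("vip has to be larger than 0")

try:
    Settings(vip=0)
except ValueError as e:
    print("rejected:", e)

# The metadata travels with each field and can be used to print help text
for f in fields(Settings):
    print(f.name, '-', f.metadata['help'])
```

Note that `Settings(vip=5.5)` would still be accepted: only the explicit check in `__post_init__` runs, not the `int` annotation.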


A word about docs. 

In general, using [Sphinx](https://www.sphinx-doc.org/en/master/) to generate documentation is a pretty good idea, so functions should be given docstrings. For public APIs, the docstring should include at least

1.   Functionality
2.   Argument type and explanation.
3.   Return type.
4.   (Optional) A use case. 

If a function modifies any of its input parameters in place, this **MUST** be highlighted in the docstring.
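A sketch of a docstring covering the four points above, written with the reStructuredText field lists (`:param:`, `:returns:`) that Sphinx understands out of the box. `normalize_in_place` is a hypothetical function chosen because it mutates its input, so the warning is prominent:

```python
def normalize_in_place(values, scale=1.0):
    """Rescale a list of numbers so that they sum to ``scale``.

    .. warning:: This function modifies ``values`` in place.

    :param values: Numbers to rescale. Must not sum to zero.
    :type values: list[float]
    :param scale: Target sum of the rescaled values.
    :type scale: float
    :returns: The same list object, for convenience.
    :rtype: list[float]

    Example::

        >>> data = [1.0, 3.0]
        >>> normalize_in_place(data)
        [0.25, 0.75]
    """
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total * scale  # overwrite each element of the caller's list
    return values
```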


## R 

---
Before we venture into more advanced stuff, let us briefly introduce R and notebook magic commands. To use R inside the notebook, first load the `rpy2` IPython extension.

In [74]:
%load_ext rpy2.ipython



After that, a cell that starts with the `%%R` cell magic is executed as R code.

In [75]:
%%R # This means 
install.packages('caret')

NULL


In [76]:
%%R 
library('caret')

Error in library("caret") : there is no package called ‘caret’
Calls: <Anonymous> -> <Anonymous> -> withVisible -> library


In [77]:
%%R
a  <- 1   # idiomatic R assignment
2 -> b    # rightward assignment also works
c = 1     # = works as well at the top level
a == c

[1] TRUE


In [78]:
%%R
for (i in 1:100){
    print(i)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
[1] 21
[1] 22
[1] 23
[1] 24
[1] 25
[1] 26
[1] 27
[1] 28
[1] 29
[1] 30
[1] 31
[1] 32
[1] 33
[1] 34
[1] 35
[1] 36
[1] 37
[1] 38
[1] 39
[1] 40
[1] 41
[1] 42
[1] 43
[1] 44
[1] 45
[1] 46
[1] 47
[1] 48
[1] 49
[1] 50
[1] 51
[1] 52
[1] 53
[1] 54
[1] 55
[1] 56
[1] 57
[1] 58
[1] 59
[1] 60
[1] 61
[1] 62
[1] 63
[1] 64
[1] 65
[1] 66
[1] 67
[1] 68
[1] 69
[1] 70
[1] 71
[1] 72
[1] 73
[1] 74
[1] 75
[1] 76
[1] 77
[1] 78
[1] 79
[1] 80
[1] 81
[1] 82
[1] 83
[1] 84
[1] 85
[1] 86
[1] 87
[1] 88
[1] 89
[1] 90
[1] 91
[1] 92
[1] 93
[1] 94
[1] 95
[1] 96
[1] 97
[1] 98
[1] 99
[1] 100


In [79]:
%%R
myfunc <- function(a){
    a = a+1
    return(a+1)
}

In [80]:
%%R
myfunc(a) # R copies on modify: the a inside myfunc is a local copy, so the global a stays 1

[1] 3


In [81]:
%%R
a

[1] 1
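For contrast, Python does not copy arguments: a function receives a reference to the caller's object, so mutating a list inside a function is visible outside. A sketch with two hypothetical helpers, one mutating and one copying first to mimic R's behavior:

```python
def bump(values):
    # Same list object as the caller's, so this append is visible outside
    values.append(1)
    return values

def bump_copy(values):
    # Copy first to get R-like behavior: the caller's list stays unchanged
    values = list(values)
    values.append(1)
    return values

a = [1]
bump_copy(a)
print(a)   # [1]
bump(a)
print(a)   # [1, 1]
```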


In [82]:
%%R
data(mtcars) # This is a built-in R dataset

In [83]:
%%R
summary(mtcars)

      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000  

In [84]:
%%R
mtcars$mpg

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4


## Magic Methods in Python Objects


In [85]:
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

Let us make it print nicely by defining `__repr__` (used by the REPL and inside containers) and `__str__` (used by `print` and `str()`).

In [86]:
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r,%r)' % (self.x, self.y)
    def __str__(self):                              
        return 'Vector(%r,%r)' % (self.x, self.y)

In [87]:
v = Vector(1,2)
print(str(v))
print(v)

Vector(1,2)
Vector(1,2)
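In the class above both methods return the same string, so the difference is invisible. A sketch where they differ; if `__str__` were left out, `print` would fall back to `__repr__`:

```python
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous form, ideally valid Python that recreates the object
        return f'Vector({self.x!r}, {self.y!r})'

    def __str__(self):
        # Friendly form for end users
        return f'({self.x}, {self.y})'

v = Vector(1, 2)
print(v)        # (1, 2): __str__
print([v])      # [Vector(1, 2)]: containers show the __repr__ of their items
print(repr(v))  # Vector(1, 2)
```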


How about some arithmetic?

In [88]:
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r,%r)' % (self.x, self.y)
    
    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Vector(x, y)
    
    def __sub__(self, other):
        x = self.x - other.x
        y = self.y - other.y
        return Vector(x, y)
    
    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

In [89]:
v1 = Vector(0,0)
v2 = Vector(1,2)

v1+v2

Vector(1,2)
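The `__mul__` above only handles `vector * scalar`. For `3 * v`, Python first tries `int.__mul__`, which returns `NotImplemented`, and then calls the right-hand operand's `__rmul__`; without it the expression raises `TypeError`. A minimal sketch:

```python
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r,%r)' % (self.x, self.y)

    def __mul__(self, scalar):
        # vector * scalar
        return Vector(self.x * scalar, self.y * scalar)

    def __rmul__(self, scalar):
        # scalar * vector: reuse __mul__, since scaling is commutative
        return self * scalar

v = Vector(1, 2)
print(v * 3)   # Vector(3,6)
print(3 * v)   # Vector(3,6)
```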

How about comparisons?

In [90]:
from math import hypot

class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r,%r)' % (self.x, self.y)
    
    def __add__(self, other):
        x = self.x + other.x
        y = self.y + other.y
        return Vector(x, y)
    
    def __sub__(self, other):
        x = self.x - other.x
        y = self.y - other.y
        return Vector(x, y)
    
    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)
    
    def __abs__(self):
        return hypot(self.x, self.y)
    
    def __bool__(self):
        return bool(abs(self))
    
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
    
    def __lt__(self, other):
        return abs(self) < abs(other)
    
    def __gt__(self, other):
        return abs(self) > abs(other)

In [91]:
v1 = Vector(1,1)
v2 = Vector(1,1)
v3 = Vector(1,2)

print(v1 == v2)
print(v1 == v3)

print(v3 > v1)
print(v1 < v3)

True
False
True
True
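Two details of the comparison methods above are easy to miss. First, `__eq__` assumes `other` is a `Vector`, so `Vector(1, 1) == 'a'` raises `AttributeError`; returning `NotImplemented` lets Python fall back to its default (identity, hence `False`). Second, defining `__eq__` sets `__hash__` to `None`, so instances can no longer be put in a set or used as dict keys. A sketch of both fixes; hashing is only safe if `x` and `y` are not changed after the object is stored in a set or dict:

```python
class Vector:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector(%r,%r)' % (self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented   # let Python try the other operand, then fall back
        return self.x == other.x and self.y == other.y

    def __hash__(self):
        # Equal vectors must hash equally, so hash the same fields __eq__ compares
        return hash((self.x, self.y))

print(Vector(1, 2) == 'a')                                 # False, no AttributeError
print(len({Vector(1, 1), Vector(1, 1), Vector(1, 2)}))     # 2
```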


## Basic Functional Programming in Python

### Common Higher-Order Functions

In [92]:
my_input = [1,2,3,4,5,6,6]
result = map(lambda x: x+1, my_input)
print(result) # map is lazy: this prints the map object, not the values
print(list(result))

<map object at 0x7f671a066e80>
[2, 3, 4, 5, 6, 7, 7]


In [93]:
from functools import reduce
result = reduce(lambda x, y: x+y, filter(lambda x: x > 3, map(lambda x: x+1, my_input)))

In [94]:
print(result)

29
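The `reduce` one-liner above reads inside out. A sketch that names each stage, plus the generator expression with `sum()` that is usually preferred in Python. Passing an initial value to `reduce` avoids a `TypeError` on empty input:

```python
from functools import reduce

my_input = [1, 2, 3, 4, 5, 6, 6]

incremented = map(lambda x: x + 1, my_input)       # 2, 3, 4, 5, 6, 7, 7
large = filter(lambda x: x > 3, incremented)       # 4, 5, 6, 7, 7
total = reduce(lambda acc, x: acc + x, large)      # ((4 + 5) + 6) + ...

# Idiomatic equivalent
total_idiomatic = sum(x + 1 for x in my_input if x + 1 > 3)

# With an initial value, reduce also works on an empty iterable
empty_total = reduce(lambda acc, x: acc + x, [], 0)

print(total, total_idiomatic, empty_total)   # 29 29 0
```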


### Decorators

In [95]:
def my_decorator(func):
    def my_decorator_impl(x):
        result = x if x > 0 else 0
        return func(result)
    return my_decorator_impl

@my_decorator
def myfunc(x):
    return np.sqrt(x)

In [96]:
myfunc(-1)

0.0

In [97]:
from functools import partial
def decor_impl(fun, argument):
    def impl(x):
        result = x if x > argument else argument
        return fun(result)
    return impl

decor = partial(decor_impl, argument = 2)

@decor
def myfunc(x):
    return np.sqrt(x)

In [98]:
myfunc(-1)

1.4142135623730951

In [99]:
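def para(dec):
    # Turns a decorator dec(f, n) into a decorator factory used as @dec(n)
    def layer(*args, **kwargs):      # receives the decorator arguments, e.g. (0,)
        def repl(f):                 # receives the function being decorated
            return dec(f, *args, **kwargs)
        return repl
    return layer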

@para
def decor(f, n):
    def impl(x):
        result = x if x > n else n
        return f(result)
    return impl

@decor(0)
def myfunc(x):
    return np.sqrt(x)

In [100]:
myfunc(-1)

0.0
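`partial` and `para` above are two ways to pass an argument to a decorator. The more common idiom is a plain three-level decorator factory, plus `functools.wraps` so the wrapper keeps the decorated function's `__name__` and docstring. A sketch with hypothetical names `clamp_below` and `safe_sqrt`; it uses `math.sqrt` instead of NumPy to stay self-contained:

```python
import functools
import math

def clamp_below(lower):
    """Hypothetical decorator factory: replace arguments below `lower` by `lower`."""
    def decorator(func):
        @functools.wraps(func)          # copy func's name and docstring onto the wrapper
        def wrapper(x):
            return func(max(x, lower))
        return wrapper
    return decorator

@clamp_below(0)                         # same as: safe_sqrt = clamp_below(0)(safe_sqrt)
def safe_sqrt(x):
    """Square root that never sees a negative input."""
    return math.sqrt(x)

print(safe_sqrt(-1))        # 0.0
print(safe_sqrt.__name__)   # safe_sqrt, not wrapper
```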