# Intro to Python for Data Science

_Milan Raicevic_, 14/10/2016

_milan.raicevic@accenture.com_

## General overview of the talk:
#### 1. Basics of Python language
#### 2. Differences between Python and R
#### 3. Data science toolset in Python

## So why Python ?

* __Simple__ : Python reads like pseudo-code  
* __Easy to learn__: high level language, not many complicated concepts  
* __Interpreted__: run things interactively (good for analytics)  
* __Easily extendable__: a million great libraries to do pretty much anything
    - many of these libraries are aimed at data analysis  


## Basics of Python language
* __Variables__: _numbers, strings, lists, dictionaries, tuples_
* __Program flow__: _for, if, while, try-except_
* __Functions__
* __Classes__
* __Modules__

## Variables: base types in Python
### 1) Numbers

In [11]:
a = 10     # integer
b = 12.3   # float (real number)
c = 5+4j   # complex

__Note__: no static typing; the type of every value is inferred automatically
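
A minimal sketch of what dynamic typing means in practice: the type belongs to the value, not to the name, so the same name can be rebound to values of different types. The names `x` and `kinds` are only for illustration.

```python
x = 10
kinds = [type(x).__name__]     # the built-in type() reports the current type

x = 12.3                       # rebinding the same name to a float is fine
kinds.append(type(x).__name__)

x = "ten"                      # ... and so is rebinding it to a string
kinds.append(type(x).__name__)

print(kinds)                   # ['int', 'float', 'str']
```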

### 2) Strings

In [12]:
s = "This is a test string"
print(s)

This is a test string


_There are many built-in useful string functions:_

In [13]:
print ("1: ", s.endswith("python") )      # check if string ends with string
print ("2: ", s.replace("test", "real") ) # replace a substring with string
print ("3: ", s.find("string") )          # find index of a substring 
print ("4: ", s.title() )                 # capitalize the first letter in words

1:  False
2:  This is a real string
3:  15
4:  This Is A Test String
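
A few more string methods that tend to come up in data cleaning, shown as a small sketch on the same test string (the variable names other than `s` are illustrative):

```python
s = "This is a test string"

words = s.split()            # split on whitespace into a list of words
joined = "-".join(words)     # join the words back together with a separator
message = "{} words, {} characters".format(len(words), len(s))

print(words)                 # ['This', 'is', 'a', 'test', 'string']
print(joined)                # This-is-a-test-string
print(message)               # 5 words, 21 characters
```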


### 3) Lists
A versatile container type. Elements can be of different types!

In [14]:
l = [1,2,3]               # list of same type elements
l = [[1,2], [3,4]]        # list of lists
l = ["string", 10, print] # list of mixed types
print(l)

['string', 10, <built-in function print>]


_Subsetting:_

In [15]:
l = [1,2,3,4,5,6,7]
l[2:4]

[3, 4]
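
Slices take the general form `start:stop:step`, where `stop` is excluded and negative indices count from the end. A short sketch on the same list (the result names are illustrative):

```python
l = [1, 2, 3, 4, 5, 6, 7]

first_three = l[:3]      # omitted start means "from the beginning"
last_two = l[-2:]        # negative indices count from the end
every_other = l[::2]     # a step of 2 takes every second element
reversed_copy = l[::-1]  # a step of -1 walks the list backwards

print(first_three)       # [1, 2, 3]
print(last_two)          # [6, 7]
print(every_other)       # [1, 3, 5, 7]
print(reversed_copy)     # [7, 6, 5, 4, 3, 2, 1]
```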

_Replacing subsets:_


In [16]:
l[2:4] = ["a", "b"]
l


[1, 2, 'a', 'b', 5, 6, 7]

_Deleting elements:_

In [17]:
l[3:5] = []
l

[1, 2, 'a', 6, 7]

_Adding new elements:_

In [18]:
l.append("new")
l

[1, 2, 'a', 6, 7, 'new']

_Popping elements off the list:_

In [19]:
print("Extracted last element: ", l.pop())
print("Leftover list: ",l)

Extracted last element:  new
Leftover list:  [1, 2, 'a', 6, 7]


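### 4) Dictionaries
A container of _key - value_ pairs, implemented internally as a hash table. Do not rely on key order in older Python versions; since Python 3.7, dictionaries preserve insertion order.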

In [20]:
d = {"apple": 10, "orange": 20, "fig": "none"}     # the key must be hashable,
                                                   # i.e. immutable
                                                   # value can be anything

_Accessing values by key:_

In [21]:
print( d["orange"] )

20


_Adding new values by key:_

In [22]:
d["grape"]= 15
d                # output need not be ordered!

{'apple': 10, 'fig': 'none', 'grape': 15, 'orange': 20}

_Getting all keys and values:_

In [23]:
print (d.keys())
print (d.values())

dict_keys(['apple', 'fig', 'grape', 'orange'])
dict_values([10, 'none', 15, 20])


_Testing if key exists:_

In [24]:
print ("orange" in d)
print ("lemon" in d)

True
False
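
Two related idioms, sketched on the same dictionary: `get` returns a default instead of raising `KeyError` for a missing key, and `items` iterates over key-value pairs. The names `lemons` and `numeric_total` are illustrative.

```python
d = {"apple": 10, "orange": 20, "fig": "none", "grape": 15}

lemons = d.get("lemon", 0)   # missing key: fall back to the default 0
oranges = d.get("orange", 0) # existing key: the stored value is returned

numeric_total = 0
for key, value in d.items(): # iterate over (key, value) pairs
    if isinstance(value, int):
        numeric_total += value

print(lemons, oranges, numeric_total)   # 0 20 45
```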


### 5) Tuples and sets
* A tuple is an immutable list

In [25]:
t = (1,2,3)
t[1]=10

TypeError: 'tuple' object does not support item assignment

* Sets are unordered collections of unique elements. They implement the mathematical set operations (union, intersection, difference).

In [26]:
s1 = {1,2,2,2,5,6}
print(s1)                   # duplicates are automatically removed

{1, 2, 5, 6}
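
The set operations can be sketched with the built-in operators; `s2` and the result names are made up for this example.

```python
s1 = {1, 2, 5, 6}
s2 = {2, 3, 6, 7}

union = s1 | s2            # elements in either set
intersection = s1 & s2     # elements in both sets
difference = s1 - s2       # elements in s1 but not in s2
symmetric = s1 ^ s2        # elements in exactly one of the sets

print(union == {1, 2, 3, 5, 6, 7})   # True
print(intersection == {2, 6})        # True
print(difference == {1, 5})          # True
print(symmetric == {1, 3, 5, 7})     # True
```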


## Program flow control
### 1) for loop

In [27]:
for i in [1,2,3,4]:   # can loop through any iterable
    print(i)          # Note the indentation !!!

1
2
3
4


### 2) if-else statements

In [28]:
for i in [1,2,3,4]:
    if (i%2==0):       # if
        print(i)
    elif(i==1):        # else if
        print(i*20)
    else:              # final else, all other cases
        print("This must be 3 then")

20
2
This must be 3 then
4


### 3) try-except: Basics of error catching

In [29]:
for i in [1,2,3,"four"]:
    print(i+2)

3
4
5


TypeError: Can't convert 'int' object to str implicitly

In [30]:
for i in [1,2,3,"four"]:
    try:
        print(i+2)
    except TypeError:
        print("Can't add number to string!")

3
4
5
Can't add number to string!
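
A slightly fuller sketch of the same idea: `except ... as err` gives access to the error object, `else` runs only when nothing was raised, and `finally` runs in every case. The helper name `add_two` is hypothetical.

```python
def add_two(value):
    try:
        result = value + 2
    except TypeError as err:          # err holds the exception object
        print("Could not add:", err)
        return None
    else:                             # runs only if no exception was raised
        return result
    finally:                          # always runs, even when returning
        print("Done with", repr(value))

results = [add_two(i) for i in [1, "four"]]
print(results)                        # [3, None]
```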


## List comprehensions - programming inside lists
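* Pythonic alternative to the functional programming tools Map and Filter (Reduce has no comprehension equivalent)
* Usually faster than building the same list with an explicit for loop
* Very readable
* There are also dictionary and set comprehensions; parentheses give a generator expression, not a tuple comprehension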

In [31]:
l = range(10)
[i*2 for i in l]      # map

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [32]:
[i*10 for i in l if i not in [0,9]]  # map + filter

[10, 20, 30, 40, 50, 60, 70, 80]

In [33]:
l=range(3)
[(i,j) for i in l for j in l]  # nested for loop

[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
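
The same syntax works for other containers. A small sketch with made-up data: braces give dictionary and set comprehensions, while parentheses give a lazy generator expression (wrap it in `tuple()` if a tuple is needed).

```python
words = ["apple", "fig", "orange", "fig"]

lengths = {w: len(w) for w in words}         # dictionary comprehension
unique_lengths = {len(w) for w in words}     # set comprehension
total = sum(len(w) for w in words)           # generator expression, evaluated lazily
as_tuple = tuple(i * 2 for i in range(3))    # build a tuple from a generator

print(lengths)      # {'apple': 5, 'fig': 3, 'orange': 6}
print(total)        # 17
print(as_tuple)     # (0, 2, 4)
```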

## Functions
_Define a function:_

In [34]:
def func(a):     # function definition
    print("Number: ", a)
    return(a+10) # return of the function
func(10)

Number:  10


20

_Functions can have multiple parameters and some or all can be optional:_

In [35]:
def func2(a, b=10):        # parameter with value is optional
    print("a + b =", a+b) 
func2(10)                  # if optional parameter is not given,
                           # use the default value
func2(10,100)              # if given, use that

a + b = 20
a + b = 110
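
Arguments can also be passed by name, which makes calls with several optional parameters easier to read. A minimal sketch, using a hypothetical function `add` that returns its result instead of printing it:

```python
def add(a, b=10):
    return a + b

by_position = add(1, 2)      # positional arguments
by_keyword = add(1, b=2)     # optional parameter given by name
any_order = add(b=2, a=1)    # keyword arguments can come in any order
default_b = add(1)           # b falls back to its default value

print(by_position, by_keyword, any_order, default_b)   # 3 3 3 11
```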


_Functions are first class citizens!_

In [36]:
tmp_func = func2    # functions can be assigned to other variables
tmp_func(20)

a + b = 30


In [37]:
def feed_func(c):
    return(c*10)

def compute_func(value, f):   # functions can be passed to other functions
    return(f(value))

print ( compute_func(10, feed_func) )

100
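
Being first class also means functions can be returned from other functions and stored in containers. A sketch with hypothetical names (`make_multiplier`, `operations`):

```python
def make_multiplier(factor):
    def multiply(x):              # inner function remembers `factor`
        return x * factor
    return multiply               # functions can be returned like any value

times_ten = make_multiplier(10)

# functions can be stored in containers and looked up by key
operations = {"double": make_multiplier(2), "tenfold": times_ten}
results = {name: f(5) for name, f in operations.items()}

print(times_ten(3))   # 30
print(results)        # {'double': 10, 'tenfold': 50}
```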


## Anonymous functions
* Another functional programming concept 
* define a one-shot function that does not need to be assigned a function name
* commonly used with map, reduce, filter (__this pattern can be replaced with list comprehensions__)

In [38]:
f = lambda x:x**2
f(10)

100

In [39]:
list(map(lambda x: 'Is this '+x, ['real life?', 'just fantasy?']))

['Is this real life?', 'Is this just fantasy?']

__Note__: We had to convert the result of map to a list to print it. This is because map and filter return lazy _iterators_ in __Python 3__
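
A small sketch of that laziness, together with `filter` and `functools.reduce` (in Python 3, `reduce` lives in the `functools` module). The sample list and result names are illustrative.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

squares = map(lambda x: x**2, numbers)   # nothing is computed yet
first = next(squares)                    # compute just the first value
rest = list(squares)                     # consume whatever is left

evens = list(filter(lambda x: x % 2 == 0, numbers))
total = reduce(lambda acc, x: acc + x, numbers)

same_as_map = [x**2 for x in numbers]    # list comprehension equivalent of map

print(first, rest)    # 1 [4, 9, 16, 25]
print(evens, total)   # [2, 4] 15
```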

** _Iterators, generators, decorators et al. are just some of the advanced Python topics we won't go into.
You don't need to know the nuts and bolts of Python to be productive!_


## Classes: object-oriented programming in Python
* Classes combine data and functions that should operate on it (functions are called methods when in a class).  
* The idea is for a class to represent a conceptual unit of a program.  
* Classes can inherit from each other, making more specialized units out of more basic ones.  

_The simplest possible class:_

In [40]:
class cls:               # define class name
    def __init__(self):  # define base constructor
        pass             # do nothing

__Note:__ _self_ is a pointer to the object itself (like _this_ in C++)

_A class that does something:_

In [41]:
class counter:
    def __init__(self, init_value = 0):  # constructor
        self.i = init_value              # a value
    def increment(self, increment_by=1): # a method
        self.i += increment_by

_Making class instances:_

In [42]:
c1 = counter()   # since init_value is optional, 
                 # we can initialize the object with no 
                 # parameters
print("c1: ", c1.i)

c2 = counter(10) # or we can use init_value
print("c2: ", c2.i)


c1:  0
c2:  10


_Calling methods:_

In [43]:
c1.increment()
print("1: ",c1.i)

c1.increment(5)
print("2: ",c1.i)


1:  1
2:  6


_Inheritance and polymorphism:_

In [44]:
class vehicle():                 # base class
    def __init__(self):
        self.number_of_wheels=None
    def get_number_of_wheels(self):
        return(self.number_of_wheels)

class bicycle(vehicle):          # child classes
    def __init__(self):
        self.number_of_wheels=2
        
class car(vehicle):
    def __init__(self):
        self.number_of_wheels=4

In [45]:
v1 = bicycle()
v2 = car()
for v in [v1, v2]:  
    print("I am riding on ", v.get_number_of_wheels(), " wheels")  
    # both car and bicycle inherited the method from vehicle

I am riding on  2  wheels
I am riding on  4  wheels
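
Child classes can also reuse and extend what the base class does, via `super()`. A minimal sketch; the capitalized class names `Vehicle` and `Bicycle` and the `describe` method are hypothetical and chosen so they do not clash with the classes above.

```python
class Vehicle:
    def __init__(self, number_of_wheels=None):
        self.number_of_wheels = number_of_wheels

    def describe(self):
        return "vehicle with {} wheels".format(self.number_of_wheels)

class Bicycle(Vehicle):
    def __init__(self):
        super().__init__(number_of_wheels=2)   # reuse the base constructor

    def describe(self):                        # override the base method ...
        return "bicycle, a " + super().describe()   # ... and extend it

bike = Bicycle()
print(bike.describe())             # bicycle, a vehicle with 2 wheels
print(isinstance(bike, Vehicle))   # True
```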


## Important: everything in Python is a pointer
* This is true in general, e.g. integer numbers are objects of class int
* More relevant for more complex classes:

In [46]:
class number:
    def __init__(self, number):
        self.value = number
        
a = number(10)
b = a

a.value = 20

# what is the value of b.value?

b.value

20

Not the way R would decide to handle this...
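
When an independent object is needed, it has to be copied explicitly. A sketch using the standard `copy` module on the same kind of class (`c` is an extra name introduced for the example):

```python
import copy

class number:
    def __init__(self, number):
        self.value = number

a = number(10)
b = a               # a second name for the very same object
c = copy.copy(a)    # a separate object with the same contents

a.value = 20

print(b.value, c.value)   # 20 10
print(a is b, a is c)     # True False
```

For objects that contain other mutable objects, `copy.deepcopy` copies the nested objects as well.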

## Modules: organizing the code
* Modules are like libraries / packages in R
* Modules have their own namespaces

In [47]:
import matplotlib                # import a whole module
import matplotlib.pyplot         # import a submodule
from matplotlib import colorbar  # import a single name (here, a submodule)
from matplotlib import *         # import everything !!! (pollutes your namespace)
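
How the import style affects the names you type can be sketched with the standard-library `math` module (chosen here only because it needs no installation):

```python
import math                 # names live in the module's namespace
import math as m            # same module under a shorter alias
from math import sqrt       # pull a single function into our namespace

root_a = math.sqrt(16)      # access through the module name
root_b = sqrt(9)            # access the imported function directly
same_module = m is math     # an alias refers to the same module object

print(root_a, root_b, same_module)   # 4.0 3.0 True
```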

## Differences between Python and R


## Differences between Python and R - *philosophical*

### Python 
Object-oriented first

_VS._

### R
Functional first

## The key difference is immutability
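In R, objects behave like values: modifying a copy leaves the original unchanged (copy-on-modify), so a variable only changes when it is assigned to  

In Python, every name is a reference to an object (a pointer), so a change made through one name is visible through every other name bound to the same mutable object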
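
The practical consequence shows up when passing mutable objects to functions. A minimal sketch with two hypothetical helpers, one that mutates its argument and one that returns a new list:

```python
def append_in_place(items):
    items.append("new")        # mutates the caller's list

def append_to_copy(items):
    return items + ["new"]     # builds a new list, the original is untouched

original = [1, 2, 3]

result = append_to_copy(original)
print(original, result)        # [1, 2, 3] [1, 2, 3, 'new']

append_in_place(original)
print(original)                # [1, 2, 3, 'new']
```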

## Language differences:

| #   | Python | R |
| --- | ------ | - |
| 1.  | General programming language | Statistics domain language |
| 2.  | Clear and general base types | Quirky base types (_why, oh why **[ [ ] ]**_) |
| 3.  | Straightforward class system | S3, S4, RC ? |
| 4.  | PyPI | CRAN |
| 5.  | Better for more complex software development | Better for data analysis **?** |

## Library differences:

| task | Python | R |
| ---- | ------ | - |
| Arrays| numpy | base R|
| Data wrangling | **pandas** | base R, plyr, **dplyr**, **data.table**, tidyr, reshape2 |
| Strings | Base Python | stringr |
| Statistics | scipy.stats, statsmodels | base R |
| Machine learning | **scikit-learn** | **caret**, glmnet, e1071, randomForest, nnet, rpart, ... |
| Deep learning | Caffe, Theano, TensorFlow, Keras ... | MXNet (more?) |
| Visualization | Matplotlib, seaborn, bokeh, **Altair**, **ggplot** | ggplot2 |
| IDE| Rodeo | RStudio |


## Closing notes:

- This presentation was made in Jupyter Notebook (can do both Python and R)
- Python / R for data analytics is a matter of preference! Best option: __use both__
    * If you want to make a website / do image processing / make a game / anything but analytics, Python is a better choice!


## Questions / Comments ?