# PytzMLS2018: Python for ML and DS Research - Intro to Python

<center><b>Anthony Faustine (sambaiga@gmail.com)</b> </center>

# 1.1. Jupyter notebook

Jupyter Notebook is an open-source web application for interactive and exploratory computing. It allows you to create and share documents that contain live code, equations, visualizations and explanatory text. We will work in **Jupyter notebooks** for all lab sessions.

It is a platform for data science at scale, as it covers the whole life cycle of scientific work, from ideas to publications.

A **notebook** is a collection of `cells`, each of which can hold a different kind of content (code, Markdown text, equations, HTML).


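### Basic notebook commands
- Create a cell (in command mode, **A** inserts a cell above and **B** a cell below)
- Execute a cell (**SHIFT+ENTER**)
- Change the kernel (**Kernel > Change kernel** menu)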

### Code

In [1]:
print("Hello world")

Hello world


In [2]:
x = 10
print(x + 5) 

15


### Text: Markdown

### Math equations
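Markdown cells render LaTeX math through MathJax: `$...$` for inline math and `$$...$$` for a displayed equation. For example, the z-score computed later in this notebook could be typed in a Markdown cell as:

```latex
$$z = \frac{x - \mu}{\sigma}$$
```

where $x$ is the data value, $\mu$ the mean and $\sigma$ the standard deviation.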

### HTML commands
You can also use HTML tags in Markdown cells, as in this cell:

<h3> html-adapted title with &#60;h3&#62; </h3> <p></p>
<b> Bold text &#60;b&#62; </b> or <i>italic &#60;i&#62; </i>

### Important shortcuts

- The **TAB** key is essential: after importing a library, typing its name followed by a dot and pressing TAB lists everything the library provides, and TAB is also used for autocompletion.


In [3]:
import math

#math.+TAB

- The **SHIFT+TAB** combination is just as essential: it shows information/help (signature and docstring) about the function under the cursor.

In [4]:
#math.cos+SHIFT+TAB

In [None]:
import math

math
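TAB and SHIFT+TAB have programmatic counterparts that also work outside Jupyter: the built-in `dir()` lists a module's attributes, and a function's `__doc__` attribute (or `help(math.cos)`, which prints a full page) shows its docstring. A minimal sketch:

```python
import math

# Roughly what math.+TAB shows: the public names defined in the module
names = [name for name in dir(math) if not name.startswith("_")]
print(names[:5])

# Roughly what math.cos+SHIFT+TAB shows: the function's docstring
print(math.cos.__doc__)
```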

# 1.2. Python Basics

Python is a very popular general-purpose programming language. It is:
- Open source
- Dynamically typed (rather than statically typed like Java or C/C++)
- Interpreted (rather than compiled like C/C++)
- Object-oriented
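"Dynamically typed" means the type belongs to the object, not to the variable name, so the same name can be re-bound to objects of different types without any declaration:

```python
x = 10
print(type(x))   # <class 'int'>

x = "ten"        # same name, now bound to a string
print(type(x))   # <class 'str'>

x = [1, 2, 3]    # and now to a list
print(type(x))   # <class 'list'>
```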

Why is Python such an effective tool in scientific research?
- Versatility and interoperability with other languages: you can use it in the shell for small tasks, interactively, in scripts, or to build enterprise software with GUIs.
- “Batteries included” + third-party modules: Python has built-in libraries and third-party libraries for nearly everything.
- Simplicity and dynamic nature: code is concise and readable, and the same script runs on any platform with a Python interpreter.
- Open ethos well suited to science: results are easy to share and reproduce with Python.


Why is Python such an effective tool for data science and machine learning research?
- Very rich scientific computing libraries (NumPy, Matplotlib, pandas, SciPy, etc.) and machine learning frameworks (PyTorch, TensorFlow, Keras, MXNet, etc.)
- All DS and ML tasks can be performed with Python:
  - accessing, collecting, cleaning, analysing and visualising data, modelling, evaluating models, integrating into production, scaling, etc.
 
 
**Python 2 VS. Python 3**

Two major versions of Python are in widespread use: Python 2.x and Python 3.x.
- Some features in Python 3 are not backward compatible with Python 2.
- Some Python 2 libraries have not been updated to work with Python 3.
- Bottom line: there is no wrong choice as long as all the libraries you need are supported by the version you choose, but most libraries are phasing out Python 2.x.
- In this workshop: Python 3
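Two well-known incompatibilities are that `print` became a function and that `/` between integers performs true division in Python 3 (in Python 2, `7 / 2` evaluated to `3`). The snippet below shows the Python 3 behaviour:

```python
# In Python 3, print is a function and needs parentheses
# (Python 2 also accepted the statement form: print "Hello")
print("Hello")

# True division: / returns a float in Python 3
print(7 / 2)    # 3.5

# Floor division: // gives the old integer result explicitly
print(7 // 2)   # 3
```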

## 1.2.1 Variables and Print Statements

In [5]:
# Variables and assignment
a = False
print(a)

False


In [6]:
# Variable type
type(a)

bool

In [7]:
# type casting: converting the boolean to an integer type
int(a)

0
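The same built-in constructors (`int()`, `float()`, `str()`, `bool()`) convert between the other basic types. A few common conversions:

```python
print(float(28))         # 28.0  int -> float
print(int(3.9))          # 3     float -> int truncates toward zero
print(int("176"))        # 176   str -> int
print(str(60.2))         # 60.2  float -> str
print(bool(0), bool(5))  # False True: zero is False, other numbers are True
```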

**Print Statements**

In Python 3, **print** is a function, and it can be used in the following ways:

    print(variable_name)
    print("Hello World")
    print("Hello", <variable>)
    print("Hello" + <variable containing a string>)
    print("Hello %s" % <variable containing a string>)
    print("Hello %d" % <variable containing an integer>)
    print("Hello %f" % <variable containing a float>)

In [8]:
acc = 89
fs = 60.20
model = "Random Forest"
print("The performance results for %s model: Accuracy: %d; F-score: %.3f" % (model, acc, fs))

The performance results for Random Forest model: Accuracy: 89; F-score: 60.200


Alternatively, you can use the **.format()** string method inside the print function:

In [9]:
print("The performance results for {0} model: Accuracy: {1}; F-score: {2}".format(model, acc, fs))

The performance results for Random Forest model: Accuracy: 89; F-score: 60.2
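Since Python 3.6, f-strings offer a third option: the expressions go directly inside the braces, with the same format specifiers as `.format()`. The variables are repeated here so the snippet runs on its own:

```python
acc = 89
fs = 60.20
model = "Random Forest"

# Variables and format specifiers (:.3f) are written inside the braces
message = f"The performance results for {model} model: Accuracy: {acc}; F-score: {fs:.3f}"
print(message)
```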


## 1.2.2 Python Conditional Statements and Loops

### Indentation

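Python uses indentation, rather than braces, to mark which statements belong to a block (the body of an `if`, a loop or a function), so a good understanding of indentation is essential to maintain the structure and order of your code. We will touch on this topic again when we start building functions!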
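A minimal illustration: the indented lines form the body of the `if`, and the first line back at the outer level ends the block.

```python
x = 5
if x > 3:
    # These two indented lines run only when the condition is True
    print("x is greater than 3")
    x = x * 2
# Back at the outer level: this line runs regardless of the condition
print("x is now", x)
```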


### if, elif and else Statements

**if**
```python
if some_condition:
    algorithm
```

**if-else**
```python
if some_condition:
    algorithm
else:
    algorithm
```

**if-elif-else**
```python
if some_condition:
    algorithm
elif some_other_condition:
    algorithm
else:
    algorithm
```

In [10]:
# Example: avoid division by zero
val = 0
num = 10
if val == 0:
    val += 2e-07   # replace zero by a tiny value before dividing
result = num / val
print("{0} divided by {1} = {2}".format(num, val, result))


10 divided by 2e-07 = 50000000.0
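The example above only needs an `if`. A minimal sketch of a full `if`/`elif`/`else` chain, labelling an accuracy score (the thresholds are arbitrary, chosen for illustration):

```python
acc = 89

# Conditions are tested top to bottom; only the first True branch runs
if acc >= 90:
    label = "excellent"
elif acc >= 75:
    label = "good"
else:
    label = "needs improvement"

print(label)  # good
```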


### For loops

A `for` loop iterates over the items of a sequence (such as a list or a string) or of any other iterable object, one item at a time.

```python
for variable in something:
    algorithm
```

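Two important built-in functions:
- `range()`: generates a sequence of integers (`range(6)` gives 0, 1, 2, 3, 4, 5)
- `enumerate()`: returns (index, item) pairs from an iterable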

In [11]:
for i in range(6):
    print(i)

0
1
2
3
4
5


In [12]:
for i, val in enumerate(range(2,10)):
    print("Index:{0} Value {1}".format(i, val))

Index:0 Value 2
Index:1 Value 3
Index:2 Value 4
Index:3 Value 5
Index:4 Value 6
Index:5 Value 7
Index:6 Value 8
Index:7 Value 9
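Both functions accept optional arguments: `range(start, stop, step)` and `enumerate(iterable, start=0)`. The region names below are made-up sample data:

```python
# range(start, stop, step): the stop value is excluded
print(list(range(2, 10, 3)))    # [2, 5, 8]

# enumerate(iterable, start): start sets the first index
regions = ["Dodoma", "Arusha", "Mwanza"]
for i, region in enumerate(regions, start=1):
    print(i, region)
```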


### While loops
A `while` loop repeats its body as long as `some_condition` is True.
```python
while some_condition:
    algorithm
```

In [13]:
# Example: print the squares of 1 to 4
i = 1
while i < 5:
    print("The square of {0} is {1}".format(i, i**2))
    i += 1

The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16


### Progress bar

In [14]:
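# tqdm.notebook provides a Jupyter progress-bar widget
# (older tqdm releases used: from tqdm import tqdm_notebook)
from tqdm.notebook import tqdm as pbar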

In [15]:
sum_of_n = 0
for i in pbar(range(10000000)):   # the progress bar wraps any iterable
    sum_of_n += i
print(sum_of_n)


49999995000000
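The loop adds 0 + 1 + ... + (n - 1) for n = 10,000,000, so the printed total can be checked against the closed form n(n - 1)/2, or computed directly with the built-in `sum()`:

```python
n = 10_000_000

loop_total = sum(range(n))          # built-in sum over the same range
formula_total = n * (n - 1) // 2    # closed form for 0 + 1 + ... + (n - 1)

print(loop_total, formula_total)    # 49999995000000 49999995000000
```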


## 1.2.3 Data Structures

### Lists

A list is an ordered, mutable sequence whose items can have different types.

- Create a list
- Add an item to a list
- Access the elements of a list

In [16]:
# name, gender, height, age, region, status
data = ["James John", "M", 176, 28, "Dodoma", 1]
print(data)

['James John', 'M', 176, 28, 'Dodoma', 1]


In [17]:
data[0]

'James John'

In [18]:
data[0:2]

['James John', 'M']

In [19]:
# What does a negative index return? Try:
# data[-1]

In [20]:
data.append("350K")
data

['James John', 'M', 176, 28, 'Dodoma', 1, '350K']

In [21]:
# access all elements in a list
for item in data:
    print(item)

James John
M
176
28
Dodoma
1
350K
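Because lists are mutable, elements can also be changed or removed in place. A short sketch on a copy of the same record (the updated age is made up):

```python
data = ["James John", "M", 176, 28, "Dodoma", 1, "350K"]

data[3] = 29                   # replace the age in place
print(data.index("Dodoma"))    # 4: position of the first matching element
removed = data.pop()           # remove and return the last element
print(removed)                 # 350K
print(len(data))               # 6
print("Dodoma" in data)        # True
print(data)                    # ['James John', 'M', 176, 29, 'Dodoma', 1]
```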


### Dictionary

In [22]:
# create dictionary
dic = {
    'name':"James John",
    'age': 28,
    'gender':"M",
    'region':"Dodoma",
    'status':1
}
print(dic)

{'name': 'James John', 'age': 28, 'gender': 'M', 'region': 'Dodoma', 'status': 1}


In [23]:
dic.values()

dict_values(['James John', 28, 'M', 'Dodoma', 1])

In [24]:
# Access value of element by key - most important feature!
print(dic['age'])

28
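Indexing with a key that does not exist raises a `KeyError`; the `get()` method returns a default value instead. A minimal sketch on the same record ('height' is a key the dictionary deliberately lacks):

```python
dic = {'name': "James John", 'age': 28, 'gender': "M", 'region': "Dodoma", 'status': 1}

# dic['height'] would raise KeyError; get() falls back to the default
height = dic.get('height', "unknown")
print(height)         # unknown
print('age' in dic)   # True: membership tests look at the keys
```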


In [25]:
dic['salary'] = "310K"
print(dic)

{'name': 'James John', 'age': 28, 'gender': 'M', 'region': 'Dodoma', 'status': 1, 'salary': '310K'}


In [26]:
# get the keys of a dictionary (a view object; wrap it in list() to get a list)
dic.keys()

dict_keys(['name', 'age', 'gender', 'region', 'status', 'salary'])

In [27]:
# get the values of a dictionary
dic.values()

dict_values(['James John', 28, 'M', 'Dodoma', 1, '310K'])

In [28]:
# iterate over the (key, value) pairs of a dictionary
for key, value in dic.items():
    print("{0}: {1}".format(key, value))

name: James John
age: 28
gender: M
region: Dodoma
status: 1
salary: 310K


## 1.2.4 Functions and Modules

Functions will be one of our main building blocks when we construct larger amounts of code to solve problems.

- A function groups together a set of statements so they can be run more than once. It can also take parameters that serve as inputs to the function.

- Functions save us from writing the same code again and again.
- A function in Python is defined by a ``def`` statement. The general syntax looks like this:

In [29]:
def name_of_function(arg1, arg2):
    '''
    This is where the function's documentation string (docstring) goes
    '''
    # Do stuff here
    # return desired result


In [30]:
def normalize(data=None, mean=None, std=None):
    '''
    Normalization function
    arguments:
             data: the data value you want to normalize
             mean: mean value of your data
             std: standard deviation of your data
    return:
          z-score: normalized value   
    '''
    
    return (data - mean)/ std
        

In [31]:
result = normalize(data=27.8, mean=18, std=6)
print("Normalized value is {:.2f}".format(result))

Normalized value is 1.63
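Arguments can also be passed by position, in the order of the parameters, and the function can be applied to several values with a list comprehension. The function is repeated so the snippet runs on its own, and the values in `scores` are made up:

```python
def normalize(data=None, mean=None, std=None):
    '''Same z-score function as above: (data - mean) / std.'''
    return (data - mean) / std

# Positional arguments are matched to data, mean, std in that order
print("{:.2f}".format(normalize(27.8, 18, 6)))   # 1.63

# Apply the function to several values with a list comprehension
scores = [12, 18, 24, 30]
z_scores = [normalize(x, mean=18, std=6) for x in scores]
print(z_scores)   # [-1.0, 0.0, 1.0, 2.0]
```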


**Module**

Modules are organized units (written as files) which contain functions, statements and other definitions.

* Any file ending in `.py` is treated as a module: for example, `my_function.py` can be imported as the module `my_function` (which may in turn define a function named `my_function`).

* Each module has its own global namespace, so you can name things whatever you want there without conflicting with the same names in other modules.
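The namespace point can be seen with the standard-library `math` module: a global name `pi` defined in the notebook does not affect `math.pi`.

```python
import math

pi = 3            # a global name in this notebook
print(pi)         # 3
print(math.pi)    # 3.141592653589793: math keeps its own pi
```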



In [32]:
%%writefile normalizer.py
def normalize(data=None, mean=None, std=None):
    '''
    Normalization function
    arguments:
             data: the data value you want to normalize
             mean: mean value of your data
             std: standard deviation of your data
    return:
          z-score: normalized value   
    '''
    
    return (data - mean)/ std

Overwriting normalizer.py


**Packages**
- Packages are namespaces that contain modules and other (sub)packages. They are simply directories, but with a twist.
- A regular package is a directory that contains a special file called `__init__.py`. This file can be empty; it indicates that the directory is a Python package, so it can be imported the same way a module can be imported. (Since Python 3.3, a directory without `__init__.py` can also be imported as a *namespace package*.)
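A minimal sketch of a regular package built on disk: the package name `mypkg`, its module `stats` and the `mean` function are all hypothetical, and a temporary directory is used so nothing in your project is touched.

```python
import importlib
import os
import sys
import tempfile

# Layout created below:
#   <tmp>/mypkg/__init__.py   (empty: marks mypkg as a regular package)
#   <tmp>/mypkg/stats.py      (a module inside the package)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")

with open(os.path.join(pkg_dir, "stats.py"), "w") as f:
    f.write("def mean(values):\n    return sum(values) / len(values)\n")

# Make the temporary directory searchable, then import from the package
sys.path.insert(0, root)
importlib.invalidate_caches()   # files were created after start-up
from mypkg.stats import mean

print(mean([2, 4, 6]))  # 4.0
```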

To import modules and packages, different options are available:

* <span style="color:green">import <i>package-name</i></span>  <p> imports the package as a whole; its contents are accessed as <i>package-name</i>.<i>function</i>
* <span style="color:green">from <i>package-name</i> import <i>specific function</i></span>  <p> imports a specific function or a subset of the package/module
* <span style="color:green">from <i>package-name</i> import *  </span>   <p> imports all public definitions of the package into the current namespace (convenient, but it can silently shadow existing names, so use it sparingly)
* <span style="color:green">import <i>package-name</i> as <i>short-package-name</i></span>    <p> a good way to keep track of which package each function comes from
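The same import forms applied to the standard-library `math` module (the wildcard form is left out because it silently pulls every public name into the namespace):

```python
# import package-name: names are accessed through the module
import math
print(math.sqrt(16))    # 4.0

# from package-name import specific-function
from math import sqrt
print(sqrt(25))         # 5.0

# import package-name as short-package-name
import math as m
print(m.floor(2.7))     # 2
```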

In [33]:
import normalizer as norm
result = norm.normalize(data=27.8, mean=18, std=6)
print("Normalized value is {:.2f}".format(result))

Normalized value is 1.63


In [34]:
#Alternatively
from normalizer import normalize
result = normalize(data=27.8, mean=18, std=6)
print("Normalized value is {:.2f}".format(result))

Normalized value is 1.63


#### Import modules from another directory

In [35]:
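import sys
# Add the src/ sub-directory to the module search path (assumes normalizer.py lives there).
# Note: a module already imported in this session is reused from sys.modules, not reloaded.
sys.path.append('src/')
import normalizer as norm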
result = norm.normalize(data=27.8, mean=18, std=6)
print("Normalized value is {:.2f}".format(result))

Normalized value is 1.63


## 1.2.5 Python object-oriented programming
Python supports object-oriented programming (OOP). The goals of OOP are:

* to organize the code, and
* to re-use code in similar contexts.


Here is a small example: we create a `DataProcess` class, an object that gathers several custom functions (methods) and variables (attributes) that we can then use:

In [36]:
class DataProcess:
    """Data processing class: normalizes or scales a single value."""

    def __init__(self, data, mean, std):
        self.data = data
        self.mean = mean
        self.std = std
        self.max = 100   # fixed maximum used by scale()

    def normalize(self):
        """Return the z-score (data - mean) / std."""
        return (self.data - self.mean) / self.std

    def scale(self):
        """Return data divided by the fixed maximum."""
        return self.data / self.max

In [37]:
process = DataProcess(data=80, mean=45, std=15) 

In [38]:
result_normalize = process.normalize()
print("Normalized data: {:.2f}".format(result_normalize))

Normalized data: 2.33


In [39]:
result_scaled = process.scale()
print("Scaled data: {:.2f}".format(result_scaled))

Scaled data: 0.80
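The second goal of OOP, re-using code in similar contexts, is typically achieved through inheritance. The sketch below repeats a condensed `DataProcess` so it runs on its own and adds a hypothetical subclass `MinMaxProcess` (its name and the `min_value`/`max_value` parameters are illustrative assumptions) that re-uses `normalize()` and overrides `scale()`:

```python
class DataProcess:
    """Condensed copy of the class above."""

    def __init__(self, data, mean, std):
        self.data = data
        self.mean = mean
        self.std = std
        self.max = 100

    def normalize(self):
        return (self.data - self.mean) / self.std

    def scale(self):
        return self.data / self.max


class MinMaxProcess(DataProcess):
    """Hypothetical subclass: inherits normalize(), overrides scale()."""

    def __init__(self, data, mean, std, min_value, max_value):
        super().__init__(data, mean, std)   # re-use the parent's set-up
        self.min = min_value
        self.max = max_value                # replaces the fixed maximum of 100

    def scale(self):
        # min-max scaling to [0, 1] instead of dividing by a fixed maximum
        return (self.data - self.min) / (self.max - self.min)


process = MinMaxProcess(data=80, mean=45, std=15, min_value=40, max_value=120)
print("Normalized data: {:.2f}".format(process.normalize()))  # 2.33 (inherited)
print("Scaled data: {:.2f}".format(process.scale()))          # 0.50 (overridden)
```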

