<a href="https://colab.research.google.com/github/mickaeltemporao/reproducible-research-in-python/blob/master/notebooks/20191025_rrip.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reproducible Research in Python
Workshop McMaster University, October 25th 2019

This notebook was created as part of a workshop on *Reproducible Research in Python*. The workshop page is available here: [Reproducible Research in Python](XXX)

### Important Note

This is a hands-on workshop. You will get the most out of it if you code along with me during the workshop, run into bugs, and try to understand your errors.
Learn by doing: do not copy/paste code, type everything yourself.

Feel free to ask me questions at any time during the workshop.

### Prerequisites
Before the workshop, participants need:
- [ ] An account on [Google Colab](https://colab.research.google.com/)
- [ ] An account on [GitHub](https://github.com/)

### Structure
This workshop is divided into three parts. The first part is an introduction to the [Python](https://www.python.org/) programming language, where you will learn the basics of the language and how to use its built-in libraries. In the second part, you will learn how to acquire, explore, and transform data (Pandas, Seaborn). In the third and last part, you will learn how to create pipelines to train models on the data and how to create a Python package to share your code (scikit-learn, poetry).

### Software
- [ ] [Python 3.6.8+](https://docs.python-guide.org/starting/installation/) 

### Resources
- The Python Package Index: https://pypi.org/
- Installing Python Packages: https://packaging.python.org/tutorials/installing-packages/
- Minimal Package Structure: https://python-packaging.readthedocs.io/en/latest/minimal.html
- Guidelines to document your code: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html

### License and credit
XXX

### Contact
For any follow-up questions:
- twitter: [@mickaeltemporao](https://twitter.com/mickaeltemporao)
- email: mickael.temporao@gmail.com

# Introduction

![Draw the Owl](http://www.forimpact.org/wp-content/uploads/2014/01/HowToDrawOwl.jpg)

## Agenda

- [Python Basics]() 
- [Data Acquisition, Analysis, & Transformation]()
- [Modeling & Sharing]()






# Python Basics

**Learning Objectives:**
- Learn how code is executed in an interactive Python environment
- Get familiar with Python and some of its data types
- Learn how to use functions, modules, and packages


## Python 

- Open Source 
- General Purpose Programming Language
- Created by Guido van Rossum
- Interpreted
- Large Community 


## IPython Shell

- Run Python commands interactively

Magic Commands

```
- %lsmagic
- %who
- %history
- %save
- %run
- %?
```

## Hello World


In [182]:
print("Greetings!")

Greetings!


If you run the code below, what is the output?


In [183]:
# Press CTRL/CMD+ENTER to run this cell
print("5" + "3")

53


In Python, the `+` operator concatenates (joins) *strings*. Because `"5"` and `"3"` are strings rather than numbers, the result is the string `53`, not the number 8.
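To see the difference between joining strings and adding numbers, here is a small illustrative cell. Converting with the built-in `int()` turns the text into numbers before adding. Mixing a string and a number raises a `TypeError`; the exact wording of that message varies between Python versions, so it is not shown.

```python
# "+" joins strings but adds numbers
text_sum = "5" + "3"                  # string concatenation
number_sum = 5 + 3                    # integer addition
converted_sum = int("5") + int("3")   # convert the strings to int, then add

print(text_sum)       # 53
print(number_sum)     # 8
print(converted_sum)  # 8

# Python does not silently convert between str and int
try:
    "5" + 3
except TypeError as error:
    print("TypeError:", error)
```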

If you run the code below, what do you see?


In [184]:
print(" _____")
# This is a comment.
print("|     |")
# Here's another comment.
print("|     |")
"This is a string"
print("|_____|")


 _____
|     |
|     |
|_____|


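Instructions are executed sequentially, from top to bottom. Lines starting with `#` are comments and are ignored, and the bare string on its own line is evaluated but not displayed, because it is not the last expression in the cell.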


### Hack Time

In [0]:
# Your code here.
# Print your first and last name.


## Arithmetic with Python

In its most basic form, Python can be used as a simple calculator. Consider the following arithmetic operators:

- Addition: +
- Subtraction: -
- Multiplication: *
- Division: /
- Exponentiation: **
- Modulo: % 
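A quick illustration of each operator, using the arbitrary values 10 and 4 (chosen so as not to spoil the exercise below):

```python
a = 10
b = 4

print(a + b)   # 14
print(a - b)   # 6
print(a * b)   # 40
print(a / b)   # 2.5 (division always returns a float)
print(a ** b)  # 10000 (10 raised to the power of 4)
print(a % b)   # 2 (remainder of 10 divided by 4)
```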

### Hack Time 

In [0]:
# Your code here.
# Divide 7 by 3.

# Raise 2 to the power of 5.


## Variable assignment

Variables are named containers that let you store a value (e.g. 5) or an object (e.g. a function).

Python uses the **"="** symbol for assignment.


In [185]:
x = 24
x

24
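Assignment stores the value on the right-hand side under the name on the left, and a name can be reassigned at any time. A short illustrative cell (the name `y` is just an example):

```python
x = 24
x = x + 1      # the right-hand side is evaluated first, then stored in x
print(x)       # 25

y = x          # y now refers to the same value as x
x = 0          # rebinding x does not change y
print(y)       # 25

print(x == 0)  # True: "==" compares values, "=" assigns them
```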

### Hack Time



In [0]:
# Your code here.
# Assign numerical values to two variables named `day_1` and `day_2`.

# Add these two variables together.

# Create a `my_total` variable containing the sum of the `day_1` and `day_2` variables.

# Print the contents of the `my_total` variable.


## Basic data types in Python

Python works with numerous data types. Some of the most basic types to get started are:

- Whole numbers like 2 or -7 are called integers (*int*).
- Decimal values like 2.5 are called floating-point numbers (*float*).
- Textual values like "orange" or 'bananas' are called strings (*str*).
- Logical values (True or False) are called booleans (*bool*).
- Lists are ordered collections that can hold values of any Python type, including other lists (*list*).

In [186]:
day_1 = 20
type(day_1)


int

In [187]:
day_2 = 30
type(day_2)


int

In [188]:
description = "Voting Intentions"
type(description)


str

In [189]:
increasing = True
type(increasing)


bool

In [190]:
data = [description, increasing, "Tuesday", day_1, "Wednesday", day_2]
type(data)

list
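The cells above cover *int*, *str*, *bool*, and *list*. A *float* works the same way, and the built-in functions `int()`, `float()`, `str()`, and `bool()` convert between types. The variable `share` is an illustrative name, not part of the workshop data:

```python
share = 25.6                 # illustrative value: a decimal number is a float
print(type(share))           # <class 'float'>

print(int(share))            # 25 (int() truncates toward zero, it does not round)
print(float("30"))           # 30.0
print(str(20) + " votes")    # 20 votes
print(bool(0), bool(20))     # False True (0 is falsy, other numbers are truthy)
```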

### Hack Time

In [0]:
# Your Code Here.
## Create a variable containing the average of the voting intentions in `day_1` and `day_2`.

## What is its type?


## List Manipulation

You can select, slice or edit elements in a list.

Note that Python uses zero-based indexing: the first element of a list is at index 0.


In [191]:
# Select an element in a list
data
data[3]


20

In [192]:
# Slicing lists: list[begin:end], where the element at `end` is excluded
data[2:]


['Tuesday', 20, 'Wednesday', 30]
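A few more indexing and slicing patterns on a small illustrative list `days` (hypothetical, separate from `data`). The element at the `end` position of a slice is never included, and negative indices count from the end of the list:

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # illustrative list

print(days[0])     # Mon  (first element, index 0)
print(days[-1])    # Fri  (negative indices count from the end)
print(days[1:3])   # ['Tue', 'Wed']  (end index 3 is excluded)
print(days[:2])    # ['Mon', 'Tue']  (an omitted begin defaults to 0)
print(days[::2])   # ['Mon', 'Wed', 'Fri']  (optional third value is the step)
```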

In [193]:
# Editing a list
data[0] = "Monday"
data[1] = 25.6
data


['Monday', 25.6, 'Tuesday', 20, 'Wednesday', 30]

In [194]:
# Adding to a list
day_3 = ["Thursday", 40]
day_3
data = data + day_3
data


['Monday', 25.6, 'Tuesday', 20, 'Wednesday', 30, 'Thursday', 40]


### Unpacking from lists


In [195]:
# Unpack contents of a list into multiple variables
a, b = range(2)

print("a:", a)
print("b:", b)



a: 0
b: 1


In [196]:
# You can use the asterisk to unpack multiple elements
a, b, *c = range(20)

print("a:", a)
print("b:", b)
print("c:", c)




a: 0
b: 1
c: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
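Unpacking works the same way with an actual list such as `day_3`. The names `day`, `intention`, `first`, `middle`, and `last` are only illustrative:

```python
day_3 = ["Thursday", 40]
day, intention = day_3         # one variable per element
print(day, intention)          # Thursday 40

first, *middle, last = [20, 30, 40, 35]
print(first, middle, last)     # 20 [30, 40] 35

# The number of variables must match the number of elements,
# unless one of them is starred.
try:
    day, intention, extra = day_3
except ValueError as error:
    print("ValueError:", error)
```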


## Functions and Methods
We have already used some functions (e.g. `print()`, `type()`, `range()`).

- A function is a group of related statements that performs a specific task.
- Functions help break a program into smaller, modular chunks.
- They make your code more organized and manageable.
- They avoid repetition and make code reusable.


The general form that functions take is:

```
output = function_name(input)
```



In [197]:
result = type(day_3)
result


list

In [198]:
# Note that help is also a function!
help(help)


Help on _Helper in module _sitebuiltins object:

class _Helper(builtins.object)
 |  Define the builtin 'help'.
 |  
 |  This is a wrapper around pydoc.help that provides a helpful message
 |  when 'help' is typed at the Python interactive prompt.
 |  
 |  Calling help() at the Python prompt starts an interactive help session.
 |  Calling help(thing) prints help for the python object 'thing'.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwds)
 |      Call self as a function.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)



In [0]:
# Alternatively IPython offers a shortcut
?print


In [200]:
data[1]


25.6

In [201]:
round(data[1])


26
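Functions can take more than one input. `round()` accepts an optional second argument, the number of decimal places to keep, and in Python 3 exact ties are rounded to the nearest even number (so-called banker's rounding):

```python
print(round(25.6))         # 26
print(round(3.14159, 2))   # 3.14 (second argument: number of decimal places)
print(round(2.5))          # 2 (a tie rounds to the nearest even number)
print(round(3.5))          # 4
```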

### Defining your own functions

#### Function syntax
```
def function_name(parameters):
    """The summary line for a function docstring should fit in one line."""
    tmp = first_statement(s)
    output = second_statement(tmp)
    return output
```
A function definition consists of the following components:
- The keyword `def`, which marks the start of the function header.
- A function name that uniquely identifies it.
- Parameters (arguments) through which values are passed to the function. They are optional.
- A colon (`:`) to mark the end of the function header.
- An optional documentation string (docstring) describing what the function does.
- One or more valid Python statements that make up the function body. Statements must have the same indentation level (usually 4 spaces).
- An optional `return` statement that returns a value from the function. Without it, the function returns `None`.
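As an illustration of these components, here is a small function, `percent_change`, a hypothetical helper that is not part of the workshop material, computing the percentage change between two values:

```python
def percent_change(old, new):
    """Return the percentage change from `old` to `new`."""
    # hypothetical helper for illustration only
    difference = new - old
    change = difference / old * 100
    return change


result = percent_change(20, 30)
print(result)  # 50.0
```

Calling `percent_change(40, 30)` would return `-25.0`, since the value decreased.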


### Hack Time


In [0]:
# Your code here.
## Create a function that returns the mean of the items in a list.


### Methods
Methods are functions that belong to objects.

The general form that methods take is:
```
object.method(input)
```


In [202]:
data.index("Tuesday")


2

In [203]:
help(data.index)


Help on built-in function index:

index(...) method of builtins.list instance
    L.index(value, [start, [stop]]) -> integer -- return first index of value.
    Raises ValueError if the value is not present.



Each type of data has its own set of methods.



In [204]:
print(description)
type(description)


Voting Intentions


str

In [205]:
description.upper()


'VOTING INTENTIONS'

In [206]:
description.count("i")


2

You can also chain methods: each method is called on the result of the previous one.


In [207]:
description.lower().count("i")


3

In [208]:
day_4 = ['Friday', 35]
data.extend(day_4)
data


['Monday', 25.6, 'Tuesday', 20, 'Wednesday', 30, 'Thursday', 40, 'Friday', 35]
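`extend()` adds each element of its argument to the list, whereas `append()` adds its argument as a single element. An illustrative comparison on a separate list `week`:

```python
week = ["Monday", 25.6]           # illustrative list
week.extend(["Tuesday", 20])      # adds each element separately
print(week)       # ['Monday', 25.6, 'Tuesday', 20]

week.append(["Wednesday", 30])    # adds the whole list as one element
print(week)       # ['Monday', 25.6, 'Tuesday', 20, ['Wednesday', 30]]
print(len(week))  # 5
```

Both methods modify the list in place and return `None`, so writing `week = week.append(...)` would replace your list with `None`.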

## Modules and Packages

A module is a file containing Python code (e.g. `script.py`).
You can load a module and access its contents at any time using the command `import module`.

Packages are a standardized way of organizing code and usually consist of multiple modules.
- Minimal Package Structure: https://python-packaging.readthedocs.io/en/latest/minimal.html

Python comes with a standard library of pre-installed modules and packages that you can load directly.



In [210]:
import math
pi = math.pi
pi


3.141592653589793
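Besides `import module`, Python supports importing a single name with `from ... import ...` and giving a module a shorter alias with `import ... as ...`. A short sketch using the standard-library modules `math` and `datetime` (the alias `dt` is a common convention, not a requirement):

```python
import math                       # use its contents as math.<name>
from math import sqrt             # import one name directly
import datetime as dt             # import a module under a shorter alias

print(math.floor(2.7))                 # 2
print(sqrt(16))                        # 4.0
workshop_day = dt.date(2019, 10, 25)   # date of this workshop
print(workshop_day.weekday())          # 4 (Monday is 0, so 4 is Friday)
```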

### Interacting with the file system and paths


In [228]:
import os
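# Execute a shell command. The return value is its exit status (0 means success);
# the echoed text goes to the kernel's standard output, not to the notebook cell.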
os.system("echo 'Hello Again!'")


0

In [229]:
# Return the current working directory
os.getcwd()


'/content'

In [230]:
# List all of the files and sub-directories in a particular folder
os.listdir()


['.config', 'tmp_script.py', 'sample_data']

In [231]:
# Create folders recursively
my_path = "my_tmp_project/test1/test2/test3"
os.makedirs(my_path)
os.listdir()


['.config', 'my_tmp_project', 'tmp_script.py', 'sample_data']

In [232]:
# Remove the deepest directory, then each parent directory that is now empty.
os.removedirs(my_path)
os.listdir()


['.config', 'tmp_script.py', 'sample_data']

In [233]:
# Create an empty file with the shell command `touch`
os.system("touch test_script.py")
os.listdir()


['.config', 'test_script.py', 'tmp_script.py', 'sample_data']

In [234]:
# Rename a file (on Linux/macOS this silently replaces an existing tmp_script.py)
os.rename("test_script.py", "tmp_script.py")
os.listdir()


['.config', 'tmp_script.py', 'sample_data']

In [235]:
# Delete a file
os.remove("tmp_script.py")
os.listdir()


['.config', 'sample_data']

In [237]:
# Join folder and file names with the operating system's path separator
file = "process.py"
folder = "Documents/project1"
full_path = os.path.join(folder, file)
full_path


'Documents/project1/process.py'

In [239]:
# Get the directory and file name from a full path
file = os.path.basename(full_path)
folder = os.path.dirname(full_path)
print(file, folder)


process.py Documents/project1


In [240]:
# Check if a file or folder exists
os.path.exists(full_path)


False

In [241]:
# Get the extension of a file
name, extension = os.path.splitext(file)
print(name, extension)


process .py
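The cells above run in Colab's `/content` folder and rely on the shell command `touch`, which is not available on Windows. A minimal sketch of the same steps, assuming you would rather work in a throwaway folder: `tempfile.TemporaryDirectory()` creates a temporary directory that is deleted at the end of the `with` block, and `open()` creates the file portably. The folder and file names are illustrative:

```python
import os
import tempfile

# Work inside a throwaway directory so nothing leaks into the current folder
with tempfile.TemporaryDirectory() as base:
    # illustrative folder names
    nested = os.path.join(base, "my_tmp_project", "test1", "test2")
    os.makedirs(nested, exist_ok=True)      # no error if it already exists

    script = os.path.join(nested, "test_script.py")
    with open(script, "w") as handle:       # portable alternative to `touch`
        handle.write("print('Hello Again!')\n")
    created = os.path.exists(script)

    renamed = os.path.join(nested, "tmp_script.py")
    os.rename(script, renamed)
    contents = os.listdir(nested)           # only the renamed file is left

    os.remove(renamed)
    empty_after_remove = os.listdir(nested) == []

print(created, contents, empty_after_remove)  # True ['tmp_script.py'] True
```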


### Installing packages

To install a package, run the command `pip install package_name` in your terminal.

There are thousands of packages available, such as:
- matplotlib
- numpy
- pandas
- PyTorch
- scikit-learn
- ...

For more packages, see:
- The Python Package Index: https://pypi.org/


In [223]:
# In IPython and Colab, a line starting with `!` is run as a shell command.
!pip install pandas




# Data Acquisition, Analysis, & Transformation
## Pandas
## Seaborn


# Modeling & Sharing
## scikit-learn
## poetry