# Basic Programming in Python
Presented at [ISMRM 2021](https://www.ismrm.org/21m/) by [Saige Rutherford](https://www.beingsaige.com/).

This notebook presents a very brief overview of the Python programming language, with a particular slant on tools and applications relevant for data science. It's assumed that the reader has at least a little bit of prior programming experience; the emphasis is primarily on (a) demonstrating how basic things are done in Python, and (b) reviewing the many strengths of Python (and okay, also a few weaknesses). This notebook was forked from [Tal Yarkoni's teaching materials](https://github.com/neurohackademy/introduction-to-python), and has been adapted for this course. 

## Install instructions

Install options for setting up Python on your own computer (Windows, Mac, and Linux) can be found [here](https://python.land/installing-python).

In this tutorial we won't use a Python installation on our own computers. Since we aren't meeting in person, I can't help you troubleshoot the individual errors that can come up when installing on your own machine. Instead, we'll run Python in the browser via [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb). Hopefully you have followed the instructions for launching this notebook and are reading this message in Google Colab!

## What is Python?

* Python is a programming language
* Specifically, it's a widely used, very flexible, high-level, general-purpose, dynamic programming language
* That's a mouthful! Let's explore each of these points in more detail...

### Widely-used
* Python is one of the fastest-growing major programming languages
* Top 3 overall, alongside JavaScript and Java ([source of these rankings](https://redmonk.com/sogrady/2021/03/01/language-rankings-1-21/))

<img src="https://redmonk.com/sogrady/files/2021/03/lang.rank_.0121.wm_.png" width="800px" style="margin-bottom: 10px;">

### High-level
Python features a high level of abstraction
* Many operations that are explicit in lower-level languages (e.g., C/C++) are implicit in Python
* E.g., memory allocation, garbage collection, etc.
* Python lets you write code faster

#### File reading in Java
```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
 
public class ReadFile {
    public static void main(String[] args) throws IOException{
        String fileContents = readEntireFile("./foo.txt");
    }
 
    private static String readEntireFile(String filename) throws IOException {
        StringBuilder contents = new StringBuilder();
        char[] buffer = new char[4096];
        // try-with-resources closes the reader even if reading fails
        try (FileReader in = new FileReader(filename)) {
            int read;
            // read() returns the number of chars read, or -1 at end of file
            while ((read = in.read(buffer)) != -1) {
                contents.append(buffer, 0, read);
            }
        }
        return contents.toString();
    }
}
```

#### File reading in Python
```python
# The with-block closes the file automatically when it ends
with open("./foo.txt") as f:
    file_contents = f.read()
```

### General-purpose
You can do almost everything in Python
* Comprehensive standard library
* Enormous ecosystem of third-party packages
* Widely used in many areas of software development (web, dev-ops, data science, etc.)

### Dynamic
Code is interpreted at run-time
* No separate compilation step (strictly speaking, CPython compiles source to bytecode behind the scenes); statements are executed one at a time as the program runs
* Eliminates delays between development and execution
* The downside: poorer performance compared to compiled languages

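Because nothing checks the code ahead of time, some mistakes only surface when the offending line actually runs. A minimal sketch (the function name `add_label` is made up for illustration):

```python
# Defining this function succeeds: Python doesn't evaluate the body yet
def add_label(n):
    return n + " items"  # int + str raises TypeError, but only when executed

caught_error = None
try:
    add_label(3)  # the error appears here, at run time
except TypeError as err:
    caught_error = err
    print("Caught at run time:", err)
```

Calling `add_label("3")` works fine, since string + string is defined; the same function succeeds or fails depending on what it receives at run time.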
## Variables and data types
* In Python, we declare a variable by assigning it a value with the `=` sign
    * Variables are names that refer to objects (a bit like pointers), not containers that store data!
* Python supports a variety of data types and structures:
    * booleans (True or False)
    * numbers (ints, floats, etc.)
    * strings
    * lists 
    * dictionaries
    * many others!
* We don't specify a variable's type at assignment; Python is dynamically typed and relies on [duck typing](https://en.wikipedia.org/wiki/Duck_typing)

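To see what it means for variables to be pointers rather than data stores, here is a small sketch; the names `a`, `b`, and `thing` are arbitrary. The last loop shows duck typing: `len()` works on any object that supports it, whatever its type.

```python
# Two names bound to the same list object
a = [1, 2, 3]
b = a
b.append(4)
print(a)       # [1, 2, 3, 4]: the change is visible through both names
print(a is b)  # True: a and b refer to the same object

# Rebinding b to a new object leaves a untouched
b = [0]
print(a)       # [1, 2, 3, 4]

# Duck typing: len() only cares that the object supports it
for thing in ["spam", [1, 2], {"key": "value"}]:
    print(type(thing).__name__, len(thing))
```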
### Basic types

Variable naming convention: use lowercase letters and separate words with underscores (`snake_case`, as recommended by PEP 8)

In [2]:
# An integer. Notice the variable naming convention.
age_in_years = 28

In [3]:
# A float
almost_pi = 3.14

In [4]:
# A string
proton = "P is for proton"

In [5]:
# A boolean takes on only the values True or False
enjoying_tutorial = True

### Data structures
* Most code requires more complex structures built out of basic data types
* Python provides built-in support for many common structures
    * Many additional structures can be found in the [collections](https://docs.python.org/3/library/collections.html) module

#### Lists
* An ordered, heterogeneous collection of objects
* List elements can be accessed by position
* The syntax for creating a list is square brackets: `my_list = []`
    * Technically you can also create a list with `my_list = list()`, but the square-bracket form is more common (avoid naming a variable `list`, since that hides the built-in `list` type)

In [1]:
random_stuff = [] # Fill the list with stuff!

In [2]:
# We index lists by numerical position, starting at 0


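One possible way to fill in the indexing cell, assuming `random_stuff` was filled with some hypothetical mixed values:

```python
random_stuff = [42, "hello", 3.14, True]  # hypothetical contents

print(random_stuff[0])   # 42: the first element is at position 0
print(random_stuff[1])   # hello
print(random_stuff[-1])  # True: negative indices count back from the end
```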
In [3]:
# We can also slice lists


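A sketch of slicing, again with hypothetical list contents. A slice `start:stop` includes `start` and excludes `stop`, and always returns a new list:

```python
random_stuff = [42, "hello", 3.14, True, None]  # hypothetical contents

print(random_stuff[1:3])  # ['hello', 3.14]
print(random_stuff[:2])   # [42, 'hello']: omitted start means "from the beginning"
print(random_stuff[2:])   # [3.14, True, None]: omitted stop means "to the end"
print(random_stuff[::2])  # [42, 3.14, None]: a third number sets the step
```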
In [10]:
# Append an element


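A sketch of appending (the contents are hypothetical). Note that `append()` changes the list in place and returns `None`, so don't write `random_stuff = random_stuff.append(...)`:

```python
random_stuff = [42, "hello"]  # hypothetical contents

random_stuff.append("new item")  # adds one element to the end
print(random_stuff)  # [42, 'hello', 'new item']

# extend() adds each element of another iterable
random_stuff.extend([1, 2])
print(random_stuff)  # [42, 'hello', 'new item', 1, 2]
```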
#### Tuples
* Very similar to lists
* Key difference: tuples are *immutable*
    * They can't be modified once they're created
* Syntax for declaring a tuple is with parentheses: `my_tuple = ()`
    * A one-element tuple needs a trailing comma, `(5,)`; without it, `(5)` is just the integer 5

In [12]:
random_tuple = ()

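A short sketch of how tuples behave (the values are arbitrary): indexing works as with lists, but assignment to an element fails.

```python
random_tuple = (1, "two", 3.0)
print(random_tuple[1])  # two: indexing works just like with lists

try:
    random_tuple[0] = 100  # tuples are immutable, so this raises TypeError
except TypeError as err:
    print("Can't modify a tuple:", err)

# The comma, not the parentheses, is what makes a tuple
single = (5,)
not_a_tuple = (5)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```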
#### Dictionaries (dict)
* Collection of key-to-value pairs (since Python 3.7, dicts preserve insertion order)
* dict elements can be accessed by key, but *not* by position
* Syntax for creating a dictionary is curly brackets: `my_dictionary = {}`
    * You could also declare it as `my_dictionary = dict()`, but again, the curly-bracket form is more common

In [13]:
# A dictionary maps keys to values
fruit_prices = {
    'apple': 0.65,
    'mango': 1.50,
    'strawberry': '$3/lb',
    'durian': 'unavailable'
}

In [5]:
# What's the price of a mango?


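Looking up a value by key might look like this. Square brackets raise `KeyError` for a missing key, while `.get()` returns a default instead (`'kiwi'` is just an example of a missing key):

```python
fruit_prices = {
    'apple': 0.65,
    'mango': 1.50,
    'strawberry': '$3/lb',
    'durian': 'unavailable'
}

print(fruit_prices['mango'])                    # 1.5
print(fruit_prices.get('kiwi', 'not listed'))   # not listed
```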
In [None]:
# Add a new entry for pears


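Adding an entry is just an assignment to a new key. In this sketch the pear price of 0.80 is made up, and the dictionary is shortened for brevity:

```python
fruit_prices = {'apple': 0.65, 'mango': 1.50}

# Assigning to a new key adds an entry; assigning to an existing key overwrites it
fruit_prices['pear'] = 0.80
print(fruit_prices)  # {'apple': 0.65, 'mango': 1.5, 'pear': 0.8}
```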
### Everything is an object in Python
* All of these 'data types' are actually just objects in Python
* *Everything* is an object in Python!
* The operations you can perform with a variable depend on the object's definition
* E.g., the multiplication operator `*` is defined for some objects but not others

In [None]:
# Multiply an int by 2

In [None]:
# Multiply a float by 2

In [None]:
# What about a string?

In [None]:
# A list?

In [None]:
# A dictionary?

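One way the five multiplication cells might play out, with arbitrary example values. Strings and lists define `*` as repetition, while dictionaries don't define it at all:

```python
int_result = 7 * 2          # 14
float_result = 3.5 * 2      # 7.0
str_result = "ab" * 2       # 'abab': strings repeat
list_result = [1, 2] * 2    # [1, 2, 1, 2]: so do lists
print(int_result, float_result, str_result, list_result)

dict_error = None
try:
    {"a": 1} * 2            # dict doesn't define *, so this fails
except TypeError as err:
    dict_error = err
    print("TypeError:", err)
```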
## Control structures
* Language features that allow us to control how code is executed
* Iteration (e.g., for-loops, while statements...)
* Conditionals (if-then-else statements)
* [Etc](https://docs.python.org/3/tutorial/controlflow.html)...

In [None]:
# Write an if-elif-else statement...

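A possible if-elif-else, reusing the `age_in_years` value from earlier; the age cutoffs and labels are arbitrary:

```python
age_in_years = 28

# Conditions are checked top to bottom; the first true branch runs
if age_in_years < 13:
    stage = "child"
elif age_in_years < 20:
    stage = "teenager"
else:
    stage = "adult"

print(stage)  # adult
```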
In [None]:
# Loop over the random_stuff list we created earlier and print each value.
# Alternatively, loop over integers and index into the random_stuff list.

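A sketch of both looping styles, with hypothetical list contents. Iterating over the elements directly is the idiomatic choice; `enumerate()` is the usual way to get positions too:

```python
random_stuff = [42, "hello", 3.14]  # hypothetical contents

# Loop over the values directly
for item in random_stuff:
    print(item)

# Loop over integer positions and index into the list
for i in range(len(random_stuff)):
    print(i, random_stuff[i])

# enumerate() yields (position, value) pairs
for i, item in enumerate(random_stuff):
    print(i, item)
```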
In [None]:
# Now do the same thing as above, but with a list comprehension

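List comprehensions are designed to *build* lists, so rather than printing from inside one, this sketch builds new lists from hypothetical `random_stuff` contents:

```python
random_stuff = [42, "hello", 3.14]  # hypothetical contents

# [expression for item in iterable] builds a new list in one expression
as_strings = [str(item) for item in random_stuff]
print(as_strings)  # ['42', 'hello', '3.14']

# An optional if-clause filters which items are kept
numbers_only = [item for item in random_stuff if isinstance(item, (int, float))]
print(numbers_only)  # [42, 3.14]
```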
## Namespaces and imports
* Python is very serious about maintaining orderly namespaces
* If you want to use some code outside the current scope, you need to explicitly "import" it
* Python's import system often annoys beginners, but it substantially increases code clarity
    * Almost completely eliminates naming conflicts and confusion
    * If you know R, consider the horrors wreaked by liberal use of `attach()`

In [None]:
# Three different ways to import and access the defaultdict class
from collections import defaultdict
a = defaultdict(list)

In [None]:
from collections import defaultdict as dd
b = dd(list)

In [None]:
import collections
c = collections.defaultdict(list)

In [None]:
# Verify that the resulting objects are equivalent
a == b == c

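The three import styles don't just produce equal objects: all three names refer to the very same class. The sketch below checks that and shows what `defaultdict(list)` is useful for; the word list is only an example.

```python
from collections import defaultdict
from collections import defaultdict as dd
import collections

# All three names point at one and the same class object
print(defaultdict is dd is collections.defaultdict)  # True

# defaultdict(list) creates an empty list the first time a missing key is used
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)  # group words by first letter
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```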
## Functions
* A block of code that only runs when explicitly called
* Can accept arguments (or parameters) that alter its behavior
* Can accept any number/type of inputs, but always return exactly one object (`None` if there is no `return` statement)
    * Note: functions can return tuples (may *look like* multiple objects)

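To illustrate the tuple note above, here is a sketch using a hypothetical `min_and_max` function. Writing `return a, b` returns a single tuple, which the caller can unpack into separate names:

```python
def min_and_max(values):
    # Returns ONE object: a tuple holding two values
    return min(values), max(values)

result = min_and_max([3, 1, 4, 1, 5])
print(result)        # (1, 5)
print(type(result))  # <class 'tuple'>

# Tuple unpacking makes it look like two return values
lowest, highest = min_and_max([3, 1, 4, 1, 5])
print(lowest, highest)  # 1 5
```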
In [None]:
# We'll need the random module for this
import random

def add_noise(x, mu, sd):
    ''' Adds Gaussian noise to the input.
    
    Parameters:
        x (number): The number to add noise to
        mu (float): The mean of the Gaussian noise distribution
        sd (float): The standard deviation of the noise distribution
    
    Returns: A float.
    '''
    noise = random.normalvariate(mu, sd)
    return x + noise

In [None]:
# Let's try calling it...

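Calling the function could look like the sketch below. The function is repeated in shortened form so the snippet runs on its own, and `random.seed(0)` is an addition so that reruns produce the same "random" numbers:

```python
import random

def add_noise(x, mu, sd):
    '''Adds Gaussian noise to the input (shortened copy of the function above).'''
    return x + random.normalvariate(mu, sd)

random.seed(0)  # make the random draws reproducible

noisy = add_noise(5, 0, 1)     # all three arguments passed by position
print(noisy)                   # 5 plus a random draw from N(0, 1)

no_noise = add_noise(5, 0, 0)  # with sd=0, the noise term is zero
print(no_noise)                # 5.0
```

Leaving out any of the three arguments raises a `TypeError`, because none of them has a default value.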
### Positional vs. keyword arguments
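* Positional arguments are matched by position: they fill the parameters in the function signature in order
    * Parameters without a default value (like `x` below) *must* be passed
* Keyword arguments are matched by name, and parameters can declare a default value (like `mu=0` below)
    * Keyword arguments can be passed in arbitrary order (after any positional arguments), and parameters with defaults can be left out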

In [None]:
def add_noise_with_defaults(x, mu=0, sd=1):
    ''' Adds Gaussian noise to the input.
    
    Parameters:
        x (number): The number to add noise to
        mu (float): The mean of the Gaussian noise distribution
        sd (float): The standard deviation of the noise distribution
    
    Returns: A float.
    '''
    noise = random.normalvariate(mu, sd)
    return x + noise

In [None]:
# Let's call it again

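A sketch of the different ways to call the version with defaults. The function is repeated in shortened form so the snippet is self-contained, and the seed value is arbitrary:

```python
import random

def add_noise_with_defaults(x, mu=0, sd=1):
    '''Adds Gaussian noise to the input (shortened copy of the function above).'''
    return x + random.normalvariate(mu, sd)

random.seed(1)  # reproducible draws

default_noise = add_noise_with_defaults(5)        # mu=0, sd=1 (both defaults)
shifted = add_noise_with_defaults(5, 10)          # mu=10 by position; sd keeps its default
small_noise = add_noise_with_defaults(5, sd=0.1)  # override only sd, by name
exact = add_noise_with_defaults(5, sd=0, mu=100)  # keywords in any order; sd=0 means no noise
print(default_noise, shifted, small_noise, exact)  # the last value is 105.0
```

The reverse order, e.g. `add_noise_with_defaults(mu=100, 5)`, is a `SyntaxError`: positional arguments can't follow keyword arguments.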
## Classes
* A template for a particular kind of object
* A class defines the variables an object contains and what it can do with them
* To illustrate, let's define a `Circle` class...
* Note: object-oriented programming can be a bit hard to understand at first, and we're moving quickly

In [None]:
# We need pi!
import math

# Write a Circle class that takes a radius argument at initialization
# and has area() and copy() instance methods that return the circle's
# area and a copy of the circle, respectively.
class Circle:
    pass

In [None]:
# Now let's make use of our class. First, initialize a new Circle.

In [None]:
# Now print the circle's radius.

In [None]:
# Assign a copy of the circle instance to a new variable.

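One possible solution to the `Circle` exercise and the three cells that follow it. This is a minimal sketch: the radius of 2 is arbitrary, and `copy()` here simply builds a new `Circle` with the same radius.

```python
import math

class Circle:
    def __init__(self, radius):
        # Store the radius as an instance attribute
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def copy(self):
        # Return a new, independent Circle with the same radius
        return Circle(self.radius)

# Initialize a new Circle
circle = Circle(2)

# Print the circle's radius (and its area)
print(circle.radius)  # 2
print(circle.area())  # 4 * pi, about 12.566

# Assign a copy of the circle to a new variable
circle_copy = circle.copy()
print(circle_copy.radius, circle_copy is circle)  # 2 False
```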
### Magic methods
* Methods padded with `__` have a variety of special functions in Python
* E.g., `__new__` is called to create a new object, and `__init__` is then called to initialize it
* Most operators in Python are actually just cleverly-disguised method calls
* E.g., the code `age_in_years * 2` is roughly equivalent to `age_in_years.__mul__(2)` (if that returns `NotImplemented`, Python tries the right operand's `__rmul__` instead)
* Any object that implements the `__mul__` method can use the `*` operator

In [None]:
# Multiply a circle by 2 and print the resulting circle's area.
# Note: we'll need to add a magic method for __mul__ to our Circle class.

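A sketch of adding `__mul__` to the `Circle` class. What "multiplying a circle" should mean is a design choice; this version assumes it scales the radius, so doubling a circle quadruples its area. Returning `NotImplemented` for non-numbers lets Python raise the usual `TypeError`, and `__rmul__` makes `2 * circle` work as well as `circle * 2`.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def __mul__(self, factor):
        # Assumption: multiplying a Circle by a number scales its radius
        if not isinstance(factor, (int, float)):
            return NotImplemented  # Python then raises TypeError
        return Circle(self.radius * factor)

    # Reuse the same logic for the reflected form: number * Circle
    __rmul__ = __mul__

circle = Circle(1)
bigger = circle * 2
print(bigger.radius)  # 2
print(bigger.area())  # 4 * pi, about 12.566
```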
# Why do data science in Python?

## Easy to learn
* Readable, explicit syntax
* Most packages are very well documented
    * e.g., scikit-learn's [documentation](http://scikit-learn.org/stable/documentation.html) is widely held up as a model
* A huge number of tutorials, guides, and other educational materials

## Comprehensive standard library
* The [Python standard library](https://docs.python.org/3/library/) contains a huge number of high-quality modules
* When in doubt, check the standard library first before you write your own tools!
* For example:
    * os: operating system tools
    * re: regular expressions
    * collections: useful data structures
    * multiprocessing: simple parallelization tools
    * pickle: serialization
    * json: reading and writing JSON

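As a taste of the standard library, this sketch combines three of the modules listed above on a made-up sentence, with no third-party packages needed:

```python
import json
import re
from collections import Counter

text = "the cat sat on the mat with the hat"

# re: find all three-letter words ending in "at"
at_words = re.findall(r"\b\wat\b", text)
print(at_words)  # ['cat', 'sat', 'mat', 'hat']

# collections: count word frequencies
word_counts = Counter(text.split())
print(word_counts.most_common(1))  # [('the', 3)]

# json: serialize to a JSON string, then parse it back
encoded = json.dumps({"counts": dict(word_counts)})
decoded = json.loads(encoded)
print(decoded["counts"]["the"])  # 3
```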
## Exceptional external libraries

* Python has very good (often best-in-class) external packages for almost everything
* Particularly important for data science, which draws on a very broad toolkit
* Package management is easy (conda, pip)
* Examples:
    * Web development: flask, Django
    * Database ORMs: SQLAlchemy, Django ORM (w/ adapters for all major DBs)
    * Scraping/parsing text/markup: beautifulsoup, scrapy
    * Natural language processing (NLP): nltk, gensim, textblob
    * Numerical computation and data analysis: numpy, scipy, pandas, xarray
    * Machine learning: scikit-learn, TensorFlow, Keras
    * Image processing: pillow, scikit-image, OpenCV
    * Plotting: matplotlib, seaborn, altair, ggplot, Bokeh
    * GUI development: PyQt, wxPython
    * Testing: pytest
    * Etc. etc. etc.

# Python vs. other data science languages

* Python competes for mind share with many other languages
* Most notably, R
* To a lesser extent, Matlab, Mathematica, SAS, Julia, Java, Scala, etc.

### R
* [R](https://www.r-project.org/) is dominant in traditional statistics and some fields of science
    * Has attracted many SAS, SPSS, and Stata users
* Exceptional statistics support; hundreds of best-in-class libraries
* Designed to make data analysis and visualization as easy as possible
* Can be slow, especially for loop-heavy code that isn't vectorized
* Language quirks drive many experienced software developers crazy
* Less support for most things non-data-related

### MATLAB
* A proprietary numerical computing language widely used by engineers
* Good performance and very active development, but expensive
* Closed ecosystem, relatively few third-party libraries
    * There is an open-source port (Octave)
* Not suitable for use as a general-purpose language

## So, why Python?
Why choose Python over other languages?
* Arguably none of these offers the same combination of readability, flexibility, libraries, and performance
* Python is sometimes described as "the second best language for everything"
* Doesn't mean you should always use Python
    * Depends on your needs, community, etc.

## You can have your cake _and_ eat it!
* Many languages, particularly R, now interface smoothly with Python (e.g., via the rpy2 package from Python, or the reticulate package from R)
* You can work primarily in Python, fall back on R when you need it (or vice versa)
* The best of all possible worlds?

# The Jupyter notebook
* "The [Jupyter Notebook](http://jupyter.org) is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text."
    * You can [try it online](http://jupyter.org/try)
* Formerly the IPython Notebook
* Supports [many different languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)
* A living document wrapped around a command prompt
* Various extensions and [widgets](http://ipywidgets.readthedocs.io/en/latest/index.html)

# Summary
* Python is one of the world's most popular programming languages
* It's increasingly dominant in the world of data science
* It's (relatively) easy to learn, fast enough for most work (especially with compiled libraries like numpy), and has an enormous ecosystem
* "The second best language for everything"

# Resources/further reading

There are hundreds of excellent resources online for learning Python and/or data science. A few good ones:

* Codecademy offers interactive programming courses for many languages and tools, including [Python](https://www.codecademy.com/learn/python) and [git](https://www.codecademy.com/learn/learn-git)
* [A Whirlwind Tour of Python](http://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf) is an excellent intro to Python by [Jake VanderPlas](https://staff.washington.edu/jakevdp/); Jupyter notebooks are available [here](https://github.com/jakevdp/WhirlwindTourOfPython)
* Another excellent and free online book is Allen Downey's ["Think Python"](http://greenteapress.com/wp/think-python-2e/)
* Jake VanderPlas's [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook) is also available online as a set of notebooks
* Kaggle maintains a nice list of [data science and Python tutorials](https://www.kaggle.com/learn/overview)