# Intro to Python

* You probably need more than one language in your tool kit
* Python is a good "general purpose" language to complement R
* Very popular in "data science"
* Not bad to have on your resume, both for academic and non-academic career paths

# Today

* Overview of language features
* scipy
* numpy and pandas get special mentions
* plotting
* Interoperability with R

In [1]:
x = 1
type(x)

int

In [2]:
x = 1.0
type(x)

float

In [3]:
x = "Hello"
type(x)

str

# Tuples

In [4]:
x = (1,)
type(x)

tuple

In [5]:
x = 1,2 # Note: the comma, not the parentheses, makes a tuple, so x = 1, is also a tuple; this can be a frustrating bug
type(x)

tuple

In [6]:
x[0]

1

In [7]:
x[0]=3

TypeError: 'tuple' object does not support item assignment
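The trailing-comma pitfall from the `x = 1,2` comment can be checked directly. This is a minimal sketch: the comma, not the parentheses, creates the tuple. Tuples also support packing and unpacking, which gives a one-line variable swap.

```python
a = 1,        # trailing comma: a one-element tuple, not an int
b = (1)       # parentheses alone do nothing: this is just the int 1
c = (1,)      # the explicit, more readable way to write a one-element tuple

print(type(a), type(b), type(c))  # <class 'tuple'> <class 'int'> <class 'tuple'>

# Tuple packing and unpacking: swap two values without a temporary variable
p, q = 1, 2
p, q = q, p
print(p, q)  # 2 1
```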

# Lists

In [8]:
x = [1, 3.0, "Hello"]
type(x)


list

In [9]:
x[0]

1

In [10]:
x[-1]

'Hello'

In [11]:
x[-2:]

[3.0, 'Hello']

In [12]:
x[0] = 13 # Works: unlike tuples, lists are mutable

# List traversal

In [13]:
for i in x:
    print(i)

13
3.0
Hello


In [14]:
for i in range(len(x)):
    print(x[i])

13
3.0
Hello


In [15]:
for i in range(0,len(x),2):
    print(x[i])

13
Hello
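Indexing with `range(len(x))` works, but when you need both the position and the value, the built-in `enumerate` is usually the more idiomatic choice. A step slice can likewise replace `range(0, len(x), 2)`. A short sketch using the same three-element list:

```python
x = [13, 3.0, "Hello"]

# enumerate yields (index, value) pairs, so no manual indexing is needed
for i, value in enumerate(x):
    print(i, value)

# A slice with step 2 visits every other element
for value in x[::2]:
    print(value)
```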


# Traversing multiple lists

In [16]:
y = ["Jane", "Mickey", "Aliah"]

In [17]:
for i,j in zip(x,y):
    print(i,j)

13 Jane
3.0 Mickey
Hello Aliah


# List comprehensions

In [18]:
numbers = [i for i in range(5)]

In [19]:
print(numbers)

[0, 1, 2, 3, 4]
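Comprehensions can also filter with a trailing `if`, and the same syntax with braces builds a dictionary. A small sketch:

```python
# Transform and filter in one expression: squares of the even numbers below 5
evens_squared = [i ** 2 for i in range(5) if i % 2 == 0]
print(evens_squared)  # [0, 4, 16]

# A dictionary comprehension maps each number to its square
squares = {i: i ** 2 for i in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```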


# Dictionaries

In [20]:
x = {'a': 3, 4: "foo"}

In [21]:
x['a']

3

In [22]:
x[4]

'foo'

In [23]:
x[17]

KeyError: 17
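To avoid the `KeyError`, you can test membership with `in` or use `dict.get` with a fallback value. Iterating over `.items()` gives key/value pairs (in insertion order, since Python 3.7). A minimal sketch with the same dictionary:

```python
x = {'a': 3, 4: "foo"}

# Membership tests check the keys
print(17 in x)                # False

# .get() returns a default instead of raising KeyError
print(x.get(17, "missing"))   # missing

# Iterate over key/value pairs
for key, value in x.items():
    print(key, value)
```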

# Files

* Extensive support for files.  We'll do some of this in lab
* gzip supported "out of the box"
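As a rough sketch of the gzip support, the standard-library `gzip.open` works much like the built-in `open`, with compression handled transparently. The file name `example.txt.gz` is a hypothetical placeholder, and the temporary directory is only there to keep the example self-contained.

```python
import gzip
import os
import tempfile

lines = ["first line", "second line"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt.gz")  # hypothetical file name

    # "wt" = write in text mode; the data is compressed on the way out
    with gzip.open(path, "wt") as f:
        for line in lines:
            f.write(line + "\n")

    # "rt" = read in text mode; iterating the file yields one line at a time
    with gzip.open(path, "rt") as f:
        read_back = [line.rstrip("\n") for line in f]

print(read_back)  # ['first line', 'second line']
```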

# Functions

In [24]:
def add(x,y):
    return x + y
add(2,2)

4

In [25]:
def add(x,y=2):
    return x + y
add(2)

4

In [26]:
add(y=3,x=-7)

-4
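One classic gotcha with default arguments: defaults are evaluated once, when the function is defined, so a mutable default such as a list is shared between calls. A short sketch of the problem and the usual `None` idiom (both function names are made up for illustration):

```python
# The default list is created once and reused by every call
def append_bad(item, target=[]):
    target.append(item)
    return target

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2]  <- same list as the first call

# Use None as the default and create a fresh list inside the function
def append_good(item, target=None):
    if target is None:
        target = []
    target.append(item)
    return target

print(append_good(1))  # [1]
print(append_good(2))  # [2]
```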

# There is a lot more, but this covers some basics

* classes and object oriented programming
* Nice interfaces with [argparse](https://docs.python.org/3/howto/argparse.html) and [click](https://click.palletsprojects.com/en/7.x/).
* [pybind11](http://pybind11.readthedocs.io) is to Python as [Rcpp](http://www.rcpp.org) is to R.
* For those of you with some skills already, [Fluent Python](https://www.amazon.com/Fluent-Python-Concise-Effective-Programming/dp/1491946008/) is an amazing intermediate/advanced book.
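To give a flavor of the first two bullets, here is a minimal sketch of a small class and an `argparse` command-line interface. The `Greeter` class, `parse_args` helper, and the `--shout` flag are hypothetical examples, not part of any library; only `argparse.ArgumentParser`, `add_argument`, and `parse_args` are the real standard-library API.

```python
import argparse


class Greeter:
    """Hypothetical example class: stores a name and builds a greeting."""

    def __init__(self, name, punctuation="!"):
        # Instance attributes are stored on self
        self.name = name
        self.punctuation = punctuation

    def greet(self):
        return f"Hello, {self.name}{self.punctuation}"

    def __repr__(self):
        # Controls how the object is displayed in the REPL or a notebook
        return f"Greeter(name={self.name!r})"


def parse_args(argv=None):
    """Build a small command-line interface with argparse."""
    parser = argparse.ArgumentParser(description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    # argv=None means "read sys.argv"; passing a list is handy for testing
    return parser.parse_args(argv)


args = parse_args(["Jane", "--shout"])
message = Greeter(args.name).greet()
print(message.upper() if args.shout else message)  # HELLO, JANE!
```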

# "Scientific Python"

* [scipy](http://scipy.org) (Show them the website!)
* Lots of numerical routines: optimization, integration, interpolation, linear algebra, statistics, signal processing, and more.
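A small taste of two of those areas, as a sketch: `scipy.integrate.quad` for numerical integration and `scipy.optimize.minimize` applied to SciPy's built-in Rosenbrock test function `optimize.rosen` (with its gradient `optimize.rosen_der`). The integrand and starting point are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration of t^2 from 0 to 1 (the exact answer is 1/3)
value, abs_error = integrate.quad(lambda t: t ** 2, 0, 1)
print(value, abs_error)

# Minimize the Rosenbrock function, whose true minimum is at (1, 1)
result = optimize.minimize(
    optimize.rosen,
    x0=np.array([-1.2, 1.0]),
    jac=optimize.rosen_der,  # supplying the gradient speeds up BFGS
    method="BFGS",
)
print(result.x)  # should be very close to [1. 1.]
```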

# Numeric Python

Provides types that are based on C arrays and functions to act on them.

In [27]:
import numpy as np
x = np.array([i for i in range(3)], dtype=np.int32) # You can select the numeric type; the standard C numeric types are available
x

array([0, 1, 2], dtype=int32)

In [28]:
x = np.identity(2)
x[1,0] = 0.5
x[0,1] = 0.5 # Cholesky requires a symmetric (positive-definite) matrix
x

array([[1. , 0.5],
       [0.5, 1. ]])

In [29]:
np.linalg.cholesky(x)

array([[1.       , 0.       ],
       [0.5      , 0.8660254]])
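A Cholesky factor `L` satisfies `L @ L.T == x`, so the result can be checked with matrix multiplication. The same sketch also shows that ordinary arithmetic on arrays is element-wise and vectorized, with no explicit loop.

```python
import numpy as np

x = np.array([[1.0, 0.5],
              [0.5, 1.0]])
L = np.linalg.cholesky(x)

# '@' is matrix multiplication; L @ L.T should reproduce x
print(np.allclose(L @ L.T, x))  # True

# Arithmetic acts on every element at once
v = np.arange(5)          # array([0, 1, 2, 3, 4])
print(v * 2 + 1)          # [1 3 5 7 9]
print(v.sum(), v.mean())  # 10 2.0
```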

# Pandas

Provides the analog to R's `data.frame` *and* a lot of the `dplyr`-like aggregation functionality

In [30]:
import pandas as pd
x = pd.DataFrame({'x':[1.,2.,3.,1.,2.,3.],'y':['A','A','A','B','B','B']})

In [31]:
x.groupby(['y']).mean()

     x
y
A  2.0
B  2.0
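To illustrate the `dplyr`-like side, here is a rough sketch that chains a filter, a new column, a grouping, and a named aggregation, loosely mirroring `filter() |> mutate() |> group_by() |> summarise()`. The column names `x2`, `mean_x`, and `total_x2` are made up for the example; named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({'x': [1., 2., 3., 1., 2., 3.],
                   'y': ['A', 'A', 'A', 'B', 'B', 'B']})

summary = (
    df[df['x'] > 1]                                        # like filter(x > 1)
    .assign(x2=lambda d: d['x'] ** 2)                      # like mutate(x2 = x^2)
    .groupby('y')                                          # like group_by(y)
    .agg(mean_x=('x', 'mean'), total_x2=('x2', 'sum'))     # like summarise(...)
)
print(summary)
```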


# Plotting

Lots of options.  We will play with some in lab.

* [matplotlib](http://www.matplotlib.org) is the standard plotting library.  Complex.  Extremely powerful.  Idiosyncratic.
* [seaborn](https://seaborn.pydata.org/) is AWESOME.  Based on matplotlib, but higher-level.
* There are ports of ggplot2's grammar of graphics.  [plotnine](https://github.com/has2k1/plotnine) has been recommended to me.
* [holoviews](http://holoviews.org/) and [bokeh](https://bokeh.pydata.org/en/latest/) make nice web-centric graphics powered by JavaScript.
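A minimal matplotlib sketch using its object-oriented interface (`plt.subplots`, `Axes.plot`, `Figure.savefig`). The `Agg` backend and the in-memory buffer are assumptions made so the example runs without a display; in a notebook you would normally just let the figure render inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display window needed
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi, 100)

# One Figure containing one Axes; draw on the Axes explicitly
fig, ax = plt.subplots()
ax.plot(t, np.sin(t), label="sin(t)")
ax.plot(t, np.cos(t), label="cos(t)")
ax.set_xlabel("t")
ax.set_ylabel("value")
ax.legend()

# Save as PNG to an in-memory buffer; a filename such as "trig.png" also works
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```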