# Introduction to Python for Scientific Computing

### PEARC19 / Tutorial for Student Modeling Challenge

<hr></hr>

**If you haven't yet installed the Anaconda Python Distribution (Python 3.7) on your laptop, do so now:**

https://www.anaconda.com/distribution/<br/>

**You can also download these slides and related material in a zip file at:**

https://tinyurl.com/PEARC19Py


**We have Anaconda Python and course materials on USB drives if you cannot download.**

# Python Intro

### Based on slides from PEARC19 

#### Chris Myers
*Center for Advanced Computing, Cornell University*

#### Kate Cahill 
*Ohio Supercomputer Center*

#### Aaron Weeden
*Shodor Education Foundation*
    
<br>

*based in part on previous PEARC Python tutorials developed by Steve Lantz (Cornell CAC) and others*

# Downloads

### Anaconda Python Distribution (Python 3.7) :  
https://www.anaconda.com/distribution/

### These slides and associated material in a zip file : 
https://tinyurl.com/PEARC19Py

* Unzip to create folder named: **PEARC19_IntroPython**
* Within that folder, find:
    * Jupyter notebook for these slides: **PEARC19_IntroPython.ipynb**
    * Notebook as static HTML page: **PEARC19_IntroPython.html**
    * Notebook as static Reveal.js HTML slides: **PEARC19_IntroPython.slides.html**
    * python files for exercises in subdirectory: **pyfiles**
    * associated images and data files in subdirectory: **otherfiles**
    

# Outline

1. Introduction to Python
2. Input / output
3. Control flow
4. Functions
5. Objects, classes, and built-in datatypes
6. Modules, import, and the Python ecosystem
7. Plotting
8. A concluding exercise

# Questions

* How many of you have prior experience programming in Python?
* If not Python, what other programming languages do you have experience with?

# Questions?

* If something is confusing, not working, or piques your curiosity, please let us know.
* Feel free to ask your neighbors for help, too.  Computing should be a communal activity.

# Python as a programming language

### General-purpose
* Python was not designed as a language for scientific/technical computing (unlike MATLAB, R, Mathematica)
* Additional functionality is provided by external libraries that can be imported

### Interpreted
* Code is executed by another program (an interpreter), not compiled into an executable
* Can be run standalone (<code>python filename.py</code>) or in an interactive mode

# Python as a programming language

### Dynamically typed
* Variables acquire the type of whatever object is assigned to them, and do not need to be declared
* Code will run if objects are able to carry out the operations asked of them ("duck typing")
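
A minimal sketch of both points, using only built-in types. The helper name `describe` is made up for illustration and is not part of the tutorial files.

```python
# Dynamic typing: the same name can refer to objects of different types.
x = 3
print(type(x))        # <class 'int'>
x = 'three'
print(type(x))        # <class 'str'>

# Duck typing: describe() (a hypothetical helper) works on any object
# that supports len(), whatever its type.
def describe(obj):
    return len(obj)

n_list = describe([1, 2, 3])      # a list has a length
n_str = describe('hello')         # so does a string
n_dict = describe({'a': 1})       # and a dictionary
print(n_list, n_str, n_dict)

# An int does not support len(), so the call fails at run time with a TypeError.
try:
    describe(42)
except TypeError as err:
    print('TypeError:', err)
```

Nothing is checked until the line actually runs, which is why the last call only fails when it is executed.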

### Object-oriented
* Everything\* is an object, with an associated namespace (\*except for keywords that are part of language)
* Programming support for defining new classes of objects
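
As a rough illustration of "everything is an object", the sketch below inspects an integer and a function, then defines a small class. The names `square` and `Point` are invented for this example.

```python
# Even a plain integer is an object with attributes and methods.
n = 255
print(type(n))            # <class 'int'>
print(n.bit_length())     # number of bits needed to represent 255

# Functions are objects too, and carry their own attributes.
def square(x):
    """Return x squared."""
    return x * x

print(square.__name__, '-', square.__doc__)

# Defining a new class of objects (Point is a hypothetical example).
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        """Distance from the origin."""
        return (self.x**2 + self.y**2) ** 0.5

p = Point(3, 4)
print(p.norm())
```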

# Python as an ecosystem

### Flexible and expressive programming language

### Rich set of external packages
* Python Standard Library
* Third-party libraries and tools for scientific computing, data science, visualization, web programming, etc.
* numpy, scipy, pandas, matplotlib, seaborn, scikit-learn, sympy, networkx, ipython, jupyter, etc.

### Enthusiastic and productive developer and user communities
* https://www.python.org

# Development and execution environments

We need to be able to write code, and to run code.  There are many ways to do this.

### Code Editors
* too many to name, but you want one that provides support for Python programming

### Python Interpreters
* python: the default interpreter (also known as CPython)
* ipython: enhanced interpreter to facilitate interactive exploration and development

### Notebooks and integrated development environments (IDEs)
* Jupyter: browser-based notebooks (integration of ipython, markdown and graphics)
* Spyder: IDE with editor, ipython interpreter, various explorer panes
* many others

# Development and execution environments (continued)

### Standalone editor + python/ipython in terminal
* Flexibility to use editor of choice and lightweight environment for running code
* ipython for exploratory data analysis or algorithm development / python for command line jobs

### Integrated development environment (IDE)
* Useful integration of editor and interpreter with additional support (e.g., file/variable explorers)
* Similar to environments developed for other languages (e.g., MATLAB GUI, RStudio)

### Jupyter notebooks
* Integrate code, documentation and graphics -- great for presenting computations and results
* Less useful as a development environment for creating large libraries or programs (but can run/load code developed elsewhere)


# Editors, terminals and applications

* If you don't have an editor and terminal of your own, jupyter notebook provides both
* Notebook will open in your default browser

### Starting jupyter notebook from the command line

* ```cd PEARC19_IntroPython``` before starting jupyter notebook
* enter ```jupyter notebook``` on the command line

### Starting jupyter notebook from Anaconda-Navigator

* select jupyter notebook from graphical panel
* navigate to ```PEARC19_IntroPython``` once you've started up jupyter notebook

![](otherfiles/anaconda-navigator-img.png)

# Writing and running our first program

<code># Our first program
print('Hello, world!')
</code>

<hr>

* Standalone editor + python/ipython in terminal
    * (either use your own editor and terminal, or...)
    * from jupyter Home, select ```New > Text File```
    * navigate to file, rename as ```hello.py```, add code, and save file
    * from jupyter Home, select ```New > Terminal```
    * run ```python hello.py``` in terminal
    * start ipython and enter ```%run hello.py```
    

# Running our program in spyder

### Starting spyder from the command line

* ```cd PEARC19_IntroPython```
* enter ```spyder``` on the command line

### Starting spyder from Anaconda-Navigator

* select spyder from graphical panel
* navigate to ```PEARC19_IntroPython``` once you've started up spyder

### Working in spyder

* Open existing file ```hello.py```  (or create a new file if you want)
* Click "Run file" button (green right triangle) and observe output in the console window

# Running our program in a Jupyter notebook

* We're running in one now... 
* ( rendered as live slides via https://github.com/damianavila/RISE )
* After starting jupyter notebook, select **PEARC19_IntroPython.ipynb** and scroll down to here
* This cell is a **Markdown** cell, for text and documentation
* The cells below are **Code** cells
* Cell mode can be changed with the buttons/menus at the top of the web page
* A cell is **executed** via ```Shift+Enter```
* Any line starting with ```%``` or ```%%``` is a "magic" function in ipython/jupyter, not part of the Python language

In [1]:
%run hello.py

Hello, world!


In [2]:
2 + 2

4

# Help

* Online Python documentation: https://docs.python.org/3/
* Interactive help function and ipython/jupyter ?/??

In [3]:
#help()

In [4]:
#print?
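
The commented-out cells above are interactive. As a small non-interactive sketch, the built-in `help`, `__doc__` and `dir` can be called on a specific object; `len` and `str` are used here only as convenient examples.

```python
# help(obj) prints documentation for any object; here, the built-in len function.
help(len)

# The same text is stored in the object's __doc__ attribute (its "docstring").
doc = len.__doc__
print(doc)

# dir(obj) lists the names in an object's namespace, e.g. the methods of a string.
string_methods = [name for name in dir(str) if not name.startswith('_')]
print(string_methods[:5])
```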

# The Python Scientific Computing Ecosystem

* Python Standard Library: system/os interaction, many utilities
* Third-party libraries (many wrappers around libraries written in **compiled** languages)
    * numpy: multidimensional arrays & array syntax; linear algebra; random numbers
    * scipy: numerical routines for many common algorithms
    * pandas: DataFrames and Series for dealing with tabular data
    * matplotlib: plotting and data visualization
    * etc.

* Anaconda and other distributions come bundled with many packages
* use ```conda``` package manager for installing/updating packages in Anaconda
* use ```pip``` package manager for installing from Python Package Index (PyPI) with any distribution

# NumPy

In [None]:
import numpy as np   # imports numpy, but calls it np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print( a )
print()

print( a[1,1] )
print( a[2] )
print( a[1:,1:] )
print()
print( a.shape, a.dtype )

# Array operations

* element-wise arithmetic operations
* equivalent to looping over all the elements and executing the operation
* array-level operations are cleaner and faster
* avoid for loops over arrays if you can

In [None]:
b = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
print( a, '\n')
print( b, '\n')

c = a + b    # a and b must be of the same shape

print( c, '\n')

d = a * b    # element-wise array multiplication: NOT matrix multiplication
print( d, '\n')

print( 100 * b )
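
To make the "equivalent to looping" point concrete, here is a short sketch that rebuilds the same `a` and `b` arrays, computes the element-wise product with explicit loops, and compares it with `a * b`. It also shows the `@` operator, which is what performs matrix multiplication.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])

# Explicit loops: what the one-line expression a * b does for us.
d_loop = np.empty(a.shape)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        d_loop[i, j] = a[i, j] * b[i, j]

d = a * b                          # element-wise product
print(np.allclose(d, d_loop))      # the two results agree

# Matrix multiplication uses the @ operator (or np.dot) instead.
m = a @ b
print(m)
print(np.allclose(m, np.dot(a, b)))
```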

# Array operations (continued)

* Mathematical functions (e.g., trig, log/exp, etc.)
* Utility functions

In [None]:
x = np.linspace(0., 1., 11)
print( x, '\n')
print( np.sin(x), '\n')

y = np.ones((4,4))
print( y )

# Array operations (continued)

* methods on arrays (over entire array, or along a specific axis, e.g., row or column)
* <code>sum(), mean(), std(), min(), max(), etc.</code>

In [None]:
print(a, '\n')

print( a.sum() )
print( a.sum(axis=0) )  # axis = 0: sum down the rows, giving one total per column
print( a.sum(axis=1) )  # axis = 1: sum across the columns, giving one total per row

print()

print( a.mean() )
print( a.mean(axis=1) )
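
Because `a` is square, it can be hard to see which axis is being collapsed. The sketch below uses a made-up 2 x 3 array `r` so that the shapes of the results make the meaning of `axis` visible.

```python
import numpy as np

r = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 rows, 3 columns

col_sums = r.sum(axis=0)           # collapse axis 0 (rows): one sum per column
row_sums = r.sum(axis=1)           # collapse axis 1 (columns): one sum per row

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```

The axis named in the call is the one that disappears from the shape of the result.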

# Random numbers

In [None]:
x = np.random.random(10)

print( x )
print( x.mean() )
print( np.mean(x) )

In [None]:
# generate N random steps in x-y plane (D=2)
N = 10000
steps = np.random.random((N,2))-0.5

# make a walk by cumulatively summing each step
walk = np.cumsum(steps, axis=0)

# plot the walk in the x-y plane
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(walk[:,0], walk[:,1])

# Exercise: more fun with random numbers

For values of N equal to [1, 10, 100, 1000, 10000, 100000, 1000000]:

compute the mean of N random numbers

print the value of N and the computed mean

In [None]:
# YOUR CODE HERE

In [None]:
# SOLUTION

for N in [1,10,100,1000,10000,100000,1000000]:
    rmean = np.mean(np.random.random(N))
    print(N, rmean)

In [None]:
# ALTERNATIVE SOLUTION

for N in 10**np.arange(0,7,1):   # exponents 0 through 6, so N runs from 1 to 1000000
    rmean = np.mean(np.random.random(N))
    print(N, rmean)
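
What should the output look like? For numbers drawn uniformly from [0, 1), the mean approaches 0.5, and the typical deviation of the sample mean shrinks like 1/sqrt(12 N). The sketch below prints both side by side; fixing the seed (the value 0 is an arbitrary choice) just makes the run repeatable.

```python
import numpy as np

np.random.seed(0)   # fix the seed so the run is repeatable (seed value is arbitrary)

results = []
for N in 10**np.arange(0, 7):
    rmean = np.mean(np.random.random(N))
    expected_spread = 1.0 / np.sqrt(12 * N)   # standard deviation of the sample mean
    results.append((N, rmean, expected_spread))
    print(N, rmean, abs(rmean - 0.5), expected_spread)
```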

# SciPy

In [None]:
# Root finding using fsolve

from scipy.optimize import fsolve

# Finds solutions of f(x)=0 (x can be array of values)

def myfunc(x):
    return 0.8 + 3.2*x - 2.5*x**2

root1 = fsolve(myfunc, x0=1.)
print( root1 )

root2 = fsolve(myfunc, x0=-1.)
print( root2 )
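
Since `myfunc` is a quadratic, the roots can be cross-checked independently. This sketch uses `np.roots`, which takes the polynomial coefficients from the highest power down, and then substitutes the roots back into `myfunc`.

```python
import numpy as np

def myfunc(x):
    return 0.8 + 3.2*x - 2.5*x**2

# np.roots takes polynomial coefficients from the highest power down.
exact_roots = np.sort(np.roots([-2.5, 3.2, 0.8]))
print(exact_roots)

# Substituting the roots back into myfunc should give values very close to zero.
print(myfunc(exact_roots))
```

For functions that are not polynomials there is no such shortcut, which is where a general root finder such as `fsolve` is useful.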

In [None]:
import matplotlib.pyplot as plt
xs = np.arange(-1, 2, 0.1)
plt.plot(xs, myfunc(xs))   # myfunc works on arrays too!
plt.plot(root1[0], 0., 'ro')
plt.plot(root2[0], 0., 'ro')

In [None]:
# ODE integration using odeint

from scipy.integrate import odeint

# Integrates general differential equation of the form dy/dt = f(y,t)
# y can have multiple components, stored in an array

def f(y,t):
    return -y

ts = np.linspace(0., 5., 51)
trajectory = odeint(f, y0=[1.0], t=ts)

plt.plot(ts, trajectory)
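
The equation dy/dt = -y with y(0) = 1 has the exact solution y(t) = exp(-t), so the numerical result can be checked. A brief sketch of that comparison follows; the names `ts`, `exact` and `max_error` are chosen here for illustration.

```python
import numpy as np
from scipy.integrate import odeint

def f(y, t):
    return -y

ts = np.linspace(0., 5., 51)
trajectory = odeint(f, y0=[1.0], t=ts)   # shape (51, 1): one column per component of y

exact = np.exp(-ts)
max_error = np.max(np.abs(trajectory[:, 0] - exact))
print(trajectory.shape, max_error)
```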

# Pandas

### DataFrames (and Series)

* Tabular data as you would find in a spreadsheet or csv-formatted file
* Each column is a Series, with a particular type (like a NumPy array)
* Columns can be different types
* Row and column labels (df.index and df.columns)
* Rows and columns can be indexed (accessed) by labels or position
* Follows similar logic as NumPy: axis=0 (rows) and axis=1 (columns)

In [30]:
import numpy as np
import pandas as pd

dates = ['2019-06-01', '2019-06-02', '2019-06-03', '2019-06-04', '2019-06-05', '2019-06-06',
         '2019-06-07', '2019-06-08', '2019-06-09', '2019-06-10']
observers = ['Bob', 'Carol', 'Ted', 'Alice', 'Bob', 'Alice', 'Ted', 'Alice', 'Bob', 'Carol']
temperatures = np.round(70 + 10.*(np.random.random(10) - 0.5), 1)   # random values between 65 and 75
rainfall = [0.,0.12,0.11,0.,0.51,0.43,0.02,0.,np.nan,0.32]          # np.nan marks a missing value
data_tuples = list(zip(dates,observers,temperatures,rainfall))
df = pd.DataFrame(data_tuples, columns=['Date', 'Observer', 'Temperature', 'Rainfall'])
df['Date'] = pd.to_datetime(df['Date'])   # convert date strings to datetime values
print( df )
df

        Date Observer  Temperature  Rainfall
0 2019-06-01      Bob         74.4      0.00
1 2019-06-02    Carol         67.7      0.12
2 2019-06-03      Ted         66.3      0.11
3 2019-06-04    Alice         74.5      0.00
4 2019-06-05      Bob         69.5      0.51
5 2019-06-06    Alice         66.7      0.43
6 2019-06-07      Ted         73.6      0.02
7 2019-06-08    Alice         66.9      0.00
8 2019-06-09      Bob         67.3       NaN
9 2019-06-10    Carol         73.4      0.32


|   | Date | Observer | Temperature | Rainfall |
|---|---|---|---|---|
| 0 | 2019-06-01 | Bob | 74.4 | 0.0 |
| 1 | 2019-06-02 | Carol | 67.7 | 0.12 |
| 2 | 2019-06-03 | Ted | 66.3 | 0.11 |
| 3 | 2019-06-04 | Alice | 74.5 | 0.0 |
| 4 | 2019-06-05 | Bob | 69.5 | 0.51 |
| 5 | 2019-06-06 | Alice | 66.7 | 0.43 |
| 6 | 2019-06-07 | Ted | 73.6 | 0.02 |
| 7 | 2019-06-08 | Alice | 66.9 | 0.0 |
| 8 | 2019-06-09 | Bob | 67.3 | NaN |
| 9 | 2019-06-10 | Carol | 73.4 | 0.32 |


In [33]:
print(df['Temperature'])

0    74.4
1    67.7
2    66.3
3    74.5
4    69.5
5    66.7
6    73.6
7    66.9
8    67.3
9    73.4
Name: Temperature, dtype: float64


In [32]:
df[['Date', 'Temperature']]

|   | Date | Temperature |
|---|---|---|
| 0 | 2019-06-01 | 74.4 |
| 1 | 2019-06-02 | 67.7 |
| 2 | 2019-06-03 | 66.3 |
| 3 | 2019-06-04 | 74.5 |
| 4 | 2019-06-05 | 69.5 |
| 5 | 2019-06-06 | 66.7 |
| 6 | 2019-06-07 | 73.6 |
| 7 | 2019-06-08 | 66.9 |
| 8 | 2019-06-09 | 67.3 |
| 9 | 2019-06-10 | 73.4 |


In [34]:
df.loc[3:7, 'Rainfall']    # label-based .loc slicing includes the stop label, unlike lists and numpy arrays

3    0.00
4    0.51
5    0.43
6    0.02
7    0.00
Name: Rainfall, dtype: float64
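
The inclusive stop applies to label-based slicing with `.loc`. Position-based slicing with `.iloc` follows the usual Python rule and excludes the stop. A small sketch, using the first eight rainfall values from the table above in a standalone Series `s`:

```python
import pandas as pd

s = pd.Series([0.00, 0.12, 0.11, 0.00, 0.51, 0.43, 0.02, 0.00],
              name='Rainfall')

by_label = s.loc[3:7]        # labels 3, 4, 5, 6, 7: stop label included
by_position = s.iloc[3:7]    # positions 3, 4, 5, 6: stop position excluded

print(len(by_label), len(by_position))
```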

In [35]:
df.groupby('Observer').mean(numeric_only=True)   # an example of split-apply-combine (numeric columns only)

| Observer | Temperature | Rainfall |
|---|---|---|
| Alice | 69.366667 | 0.143333 |
| Bob | 70.4 | 0.255 |
| Carol | 70.55 | 0.22 |
| Ted | 69.95 | 0.065 |
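
To show what "split-apply-combine" means, the sketch below carries out the three steps by hand on a tiny made-up DataFrame (`df_small`, with invented temperatures) and compares the result with the one-line `groupby` call.

```python
import pandas as pd

# Made-up data for illustration only.
df_small = pd.DataFrame({
    'Observer': ['Bob', 'Carol', 'Bob', 'Carol'],
    'Temperature': [70.0, 68.0, 72.0, 66.0],
})

# Split: iterate over the groups; apply: take the mean; combine: collect into a Series.
manual = {}
for name, group in df_small.groupby('Observer'):
    manual[name] = group['Temperature'].mean()
manual = pd.Series(manual, name='Temperature')

# groupby(...).mean() performs the same three steps in one call.
auto = df_small.groupby('Observer')['Temperature'].mean()

print(manual)
print(auto)
```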
