# Getting Started with Python for Earth Sciences: Jupyter Notebooks and Numpy
## Eviatar Bach
PhD student, Department of Atmospheric and Oceanic Science, University of Maryland, College Park

[My website](http://eviatarbach.com/) | [My email](mailto:eviatarbach@protonmail.com)

Notes prepared with Rebekah Esmaili and Kriti Bhargava

# Introduction

---

## Why Python?

Pros

* General-purpose, cross-platform
* Free and open source
* Reasonably easy to learn
* Expressive and succinct code; significant indentation encourages a consistent style
* Being interpreted and dynamically typed makes it great for data analysis
* Robust ecosystem of scientific libraries, including powerful statistical and visualization packages
* Large community of scientific users and large existing codebases
* Major investment into Python ecosystem by Earth science research agencies, including NASA, NCAR, UK Met Office, and Lamont-Doherty Earth Observatory. See [Pangeo](https://pangeo.io/collaborators.html).
* Reads Earth science data formats like HDF, NetCDF, GRIB

Cons

* Performance penalties for interpreted languages, although many libraries are wrappers for compiled languages. Avoid large loops in favor of matrix/vector operations when possible.
* Multithreading is limited due to the Global Interpreter Lock, but other parallelism is available
* See [Julia](https://julialang.org/) for a modern scientific language which is trying to overcome these challenges
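
The advice about avoiding large loops can be illustrated with a small sketch. It assumes NumPy (introduced later in these notes) is installed; the array contents and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical data: 100,000 evenly spaced values between 0 and 1
values = np.linspace(0.0, 1.0, 100_000)

# Explicit Python loop: every iteration is executed by the interpreter
total_loop = 0.0
for v in values:
    total_loop += v * v

# Vectorized equivalent: the loop runs inside NumPy's compiled code
total_vectorized = np.sum(values * values)

# Both approaches agree up to floating-point rounding
print(np.isclose(total_loop, total_vectorized))
```

Timing the two versions (for example with the `%timeit` magic in Jupyter) typically shows the vectorized form to be much faster, although the exact ratio depends on the machine.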

---

## Objective: working with Earth science datasets

* You won't learn how to code in Python
* You will learn to:
	* Read and write ASCII data
	* Create basic plots and visualizations
	* Filter data
    
---

Python is an interpreted language, so at a minimum you need Python installed on your computer.

## What is Anaconda?

* Conda is a package manager
* Anaconda comes with conda, as well as Python, a lot of useful scientific/mathematical packages, and development environments.
* Easiest place to start if you're new

## Development environments

* Spyder: most Matlab-like
* Jupyter notebooks: web based, runs code inline. Can also be run remotely over SSH; see [here](https://fizzylogic.nl/2017/11/06/edit-jupyter-notebooks-over-ssh/).
* Text editor + run with command line for scripting ([IPython interpreter](https://ipython.org/) highly recommended)

---
## Launching Jupyter Notebook

### Linux/Mac

* Open terminal, **cd to the directory where you have your notebooks and data**, and type:
```
jupyter notebook    
```

### Windows

* Start &rarr; Anaconda3 &rarr; Jupyter Notebook


## Jupyter Home Screen

* This will launch your default web browser with a local webserver that displays the contents of the directory that you're working in.

* Note: the paths in all the examples assume that Jupyter was launched from the notebook directory. If you launched it from somewhere else, change the paths so that they point to your data.

* Click on New on the top right.

![](figures/jn-screenshot.png)
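
If you are unsure which directory a notebook is running from, one way to check is sketched below. The folder name `data` is an assumption made for illustration; substitute the name of the folder that holds your files.

```python
from pathlib import Path

# Directory the notebook (or script) was launched from
working_dir = Path.cwd()
print(working_dir)

# "data" is an assumed folder name; replace it with your actual data folder
data_dir = working_dir / "data"

# True if the folder exists relative to the working directory
print(data_dir.exists())
```

If the second line prints `False`, relative paths to the data will not resolve until the notebook or the folder is moved.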

<div class="alert alert-block alert-info">

<b> Exercise 1: Set up your environment and create a notebook </b>

* For your operating system, launch Jupyter Notebooks
* Create a new notebook
* Change the name from "untitled" to something better
* Save it in the __same directory as the data folder__ that we provided (or move the data directory to the same place as the notebook file, because we'll need it later!).

</div>

## Basic Python commands


In [2]:
# This is how we comment and below is how we print
print("Hello, world!")

Hello, world!


In [3]:
# for loop
for i in range(5):
    print(i)

0
1
2
3
4


In [17]:
# iterating over list elements
print("List of hurricanes in 2019:")

# hurricanes is a list (in computer science terminology, a dynamic array, not a linked list)
hurricanes = ["Barry", "Dorian", "Humberto", "Jerry", "Lorenzo", "Pablo"]

for idx, name in enumerate(hurricanes): # notice the colon at the end
    print(idx + 1, name) # add 1 because indices start from zero in Python

List of hurricanes in 2019:
1 Barry
2 Dorian
3 Humberto
4 Jerry
5 Lorenzo
6 Pablo
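
As an alternative to adding 1 by hand, the built-in `enumerate` accepts a `start` argument. A minimal sketch using the same list:

```python
hurricanes = ["Barry", "Dorian", "Humberto", "Jerry", "Lorenzo", "Pablo"]

# start=1 makes the counter begin at 1 instead of 0
for number, name in enumerate(hurricanes, start=1):
    print(number, name)

# The underlying list indices still begin at 0
print(hurricanes[0])
```

Only the counter changes; indexing into the list itself remains zero-based.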


In [6]:
a = [1, 2, 3, 4, 5, 6]
b = a*2

In [7]:
# Is this what you expected to happen?
b

[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]

* Python's built-in list can hold elements of any type and is not designed for numeric operations; `*` repeats the list instead of multiplying each element
* We need an additional package to do vector/matrix operations
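
For comparison, doubling each element with plain Python lists takes an explicit loop or a list comprehension. A minimal sketch (the variable names `repeated` and `doubled` are chosen here for illustration):

```python
a = [1, 2, 3, 4, 5, 6]

# Multiplying a list by an integer repeats it
repeated = a * 2

# A list comprehension applies the operation to each element instead
doubled = [2 * value for value in a]

print(repeated)
print(doubled)
```

This works for small lists, but it is slower and more verbose than array arithmetic for large datasets.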

## Importing packages
Packages give us additional functionality, saving us the trouble of writing procedures ourselves. There are ~6000 packages in the [conda-forge repository](https://anaconda.org/conda-forge/repo) alone!

We'll now import **NumPy**. NumPy provides high-performance multidimensional arrays and linear algebra operations (similar to Matlab). It is a fundamental package for scientific computing with Python.

In [9]:
# Importing NumPy
import numpy as np # np becomes the alias for numpy

In [10]:
# NumPy arrays
a = np.array(a) # a is now a NumPy array
b = a*2

print(b)

[ 2  4  6  8 10 12]


In [11]:
# Reshaping arrays
a_reshaped = a.reshape(3, 2)
print(a_reshaped)

[[1 2]
 [3 4]
 [5 6]]
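
Reshaping only rearranges the existing elements, so the new shape must hold exactly the same number of values. The sketch below also shows that one dimension can be given as `-1`, in which case NumPy infers it from the array size.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array size
two_columns = a.reshape(-1, 2)
print(two_columns.shape)

# The total number of elements must match; 6 elements cannot fill 4 x 2
try:
    a.reshape(4, 2)
except ValueError as error:
    print("reshape failed:", error)
```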


In [12]:
# Sum vertically downwards across rows (axis 0)
a_reshaped.sum(axis=0)

array([ 9, 12])

In [14]:
print(a_reshaped.sum(axis=1))  # sum horizontally across columns (axis 1)
print(a_reshaped.sum())  # sum entire array

[ 3  7 11]
21


In [15]:
# Get the maximum horizontally across columns (axis 1)
a_reshaped.max(axis=1)

array([2, 4, 6])
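
One way to keep the axis convention straight is to look at shapes: the axis passed to a reduction is the one that disappears from the result. A small sketch that rebuilds the same 3-by-2 array:

```python
import numpy as np

a_reshaped = np.array([1, 2, 3, 4, 5, 6]).reshape(3, 2)

print(a_reshaped.shape)              # 3 rows, 2 columns
print(a_reshaped.sum(axis=0).shape)  # axis 0 (rows) collapsed: one value per column
print(a_reshaped.sum(axis=1).shape)  # axis 1 (columns) collapsed: one value per row

# min works the same way as max and sum
print(a_reshaped.min(axis=1))
```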

In [16]:
# Boolean operations
a_reshaped > 1

array([[False,  True],
       [ True,  True],
       [ True,  True]])
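
Boolean arrays like this one are the basis for data filtering. The sketch below shows two common patterns under the same setup: indexing with the mask, and `np.where`. The replacement value `0` is an arbitrary choice for illustration.

```python
import numpy as np

a_reshaped = np.array([1, 2, 3, 4, 5, 6]).reshape(3, 2)

# The comparison produces a boolean array with the same shape
mask = a_reshaped > 1

# Indexing with the mask returns the selected elements as a 1-D array
selected = a_reshaped[mask]
print(selected)

# np.where keeps the shape, replacing unselected elements (here with 0)
replaced = np.where(mask, a_reshaped, 0)
print(replaced)
```

Mask indexing flattens the result because the selected elements no longer form a rectangle; `np.where` is useful when the original shape has to be preserved.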

In [22]:
# NumPy has many capabilities; for example, the code below creates a random linear system and solves it

# Standard Gaussian distributed 10-by-10 matrix (A) and 10-element vector (b)
A = np.random.randn(10, 10)
b = np.random.randn(10)

# Solve A*x == b
x = np.linalg.solve(A, b)

print(x)

[ 1.3154697   0.02241344  0.97221131  0.57315349 -1.30022562 -0.02177485
  0.00327059 -2.22128829  0.28494407  0.19229305]


In [27]:
# Verify that x is an (approximate) solution. Note @ operator for matrix multiplication; np.dot works too.
A@x - b

array([-2.22044605e-16,  6.66133815e-16, -1.11022302e-16, -5.55111512e-16,
       -9.15933995e-16,  0.00000000e+00, -1.27675648e-15, -3.33066907e-16,
        5.75928194e-16, -8.88178420e-16])
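
The residuals above are on the order of machine precision rather than exactly zero. Instead of inspecting them by eye, the check can be done with `np.allclose`. The sketch below is self-contained and uses a seeded random generator so that it is reproducible; the seed value is arbitrary and not part of the original example.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
b = rng.standard_normal(10)

# Solve A*x == b
x = np.linalg.solve(A, b)

# True if A @ x matches b within NumPy's default tolerances
print(np.allclose(A @ x, b))
```

Comparing floating-point arrays with `==` would usually report a mismatch here, which is why a tolerance-based comparison is the safer check.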

For more examples, work through the [NumPy quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html)