# <center> Getting Started with Python for Earth Sciences: Jupyter Notebooks and Numpy </center>
## <center> Kriti Bhargava and Eviatar Bach </center>
<center>Postdoctoral associate, UMD | kritib@umd.edu</center>
<center>PhD student, UMD | eviatarbach@protonmail.com</center>


# Introduction

## Why Python?

Pros

* General-purpose, cross-platform
* Free and open source
* Reasonably easy to learn
* Expressive and succinct code, forces good style
* Being interpreted and dynamically typed makes it great for data analysis
* Robust ecosystem of scientific libraries, including powerful statistical and visualization packages
* Large community of scientific users and large existing codebases
* Major investment into Python ecosystem by Earth science research agencies, including NASA, NCAR, UK Met Office, and Lamont-Doherty Earth Observatory. See [Pangeo](https://pangeo.io/collaborators.html).
* Reads Earth science data formats like HDF, NetCDF, GRIB

Cons

* Performance penalties for interpreted languages, although many libraries are wrappers for compiled languages. Avoid large loops in favor of matrix/vector operations when possible.
* Multithreading is limited due to the Global Interpreter Lock, but other parallelism is available
* See Julia for a modern scientific language which is trying to overcome these challenges
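The advice to prefer vector operations over large loops can be illustrated with a minimal sketch. It uses NumPy, which is introduced later in this notebook; the array `temps_k` is synthetic data made up for the illustration.

```python
import numpy as np

# Hypothetical data: 100,000 synthetic temperatures in kelvin
temps_k = np.linspace(250.0, 310.0, 100_000)

# Loop version: each element is converted one at a time by the interpreter
temps_c_loop = np.empty_like(temps_k)
for i in range(temps_k.size):
    temps_c_loop[i] = temps_k[i] - 273.15

# Vectorized version: a single expression, with the loop run in compiled code
temps_c_vec = temps_k - 273.15

# Both approaches give the same values
print(np.allclose(temps_c_loop, temps_c_vec))
```

Both versions compute the same result; the vectorized form is shorter and is usually much faster on large arrays.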

### Why do we use Python 3?
* Python 2 reached end of life in January 2020
    * No more updates or bugfixes
    * No further official support
* Subtle differences: https://www.geeksforgeeks.org/important-differences-between-python-2-x-and-python-3-x-with-examples/

---

## Objective

* You won't learn how to code in Python
* You will learn to:
	* Read and write ASCII data
	* Make basic plots and visualizations
	* Save files and data
* By the end of this class, you should be able to __analyze and visualize satellite datasets.__
    
---

Python is an interpreted language, so at a minimum you need to have Python installed on your computer.

## Packages

Packages give us additional functionality, saving us the trouble of writing procedures ourselves. There are ~6000 packages in the [conda-forge repository](https://anaconda.org/conda-forge/repo) alone!

In this workshop, along with others, we'll discuss:

* [NumPy](http://www.numpy.org/) Fast mathematical operations on large datasets.
* [xarray](http://xarray.pydata.org/) Makes working with multidimensional data easy and efficient. Introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays.
* [Matplotlib](https://matplotlib.org) The primary Python plotting/visualization package. You can generate plots, histograms, scatterplots, etc., with just a few lines of code.
* [Cartopy](https://scitools.org.uk/cartopy/) Package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
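One way to check which of these packages are available in your environment is sketched below. It only uses the standard library; the module names in the list are the names used in `import` statements, which is an assumption that holds for these four packages.

```python
import importlib.util

# Import names of the packages discussed above
modules = ["numpy", "xarray", "matplotlib", "cartopy"]

# find_spec returns None when a top-level module cannot be found
availability = {
    name: importlib.util.find_spec(name) is not None for name in modules
}

for name, found in availability.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can typically be installed through Anaconda before continuing.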

---

## What is Anaconda?

* Anaconda is a Python distribution that includes the conda package manager
* Comes bundled with Python, a lot of useful scientific/mathematical packages, and development environments.
* Easiest place to start if you are new

---

## Development environments

* Spyder: most Matlab-like
* Jupyter notebooks: web based. Similar to Mathematica, runs code inline
* Text editor + run with command line for scripting

---
## Launching Jupyter Notebook

### Linux/Mac

* Open terminal, **cd to the directory where you have your notebooks and data**, and type:
```
jupyter notebook    
```

### Windows

* Start &rarr; Anaconda3 &rarr; Jupyter Notebook


## Jupyter Home Screen

* This will launch your default web browser with a local webserver that displays the contents of the directory that you're working in.

* Note: in all the examples, the paths assume that Jupyter is launched from the notebook directory. If you launch it from somewhere else, you will need to change the paths to point to your data.

* Click on New on the top right.

<div class="alert alert-block alert-info">

<b> Exercise 1: Set-up your environment and create a notebook </b>

* For your operating system, launch Jupyter Notebook
* Create a new notebook
* Change the name from "untitled" to something better
* Save it in the __same directory as the data folder__ that we provided (or move the data directory to the same place as the notebook file, because we'll need it later!).

</div>

## Very Basic Python Commands


In [2]:
# This is how we comment, and below is how we print
print("Hello, World!")

Hello, World!


In [12]:
# for loop
n_max = 5  # avoid the name "max", which would shadow Python's built-in max()
for i in range(n_max):
    print(i)
    

0
1
2
3
4


In [8]:
# iterating over list elements
print ("List of hurricanes in 2019.")
list_name=["Barry","Dorian","Humberto", "Jerry","Lorenzo", "Pablo"]
for idx, name in enumerate(list_name):  # notice the colon at the end
    print(idx + 1, ".", name)  # add 1 because indexing starts from zero in Python

List of hurricanes in 2019.
1 . Barry
2 . Dorian
3 . Humberto
4 . Jerry
5 . Lorenzo
6 . Pablo
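As an alternative to adding 1 by hand, `enumerate` accepts a `start` argument. The short sketch below uses a shortened, illustrative copy of the hurricane list and an f-string to control the spacing.

```python
# Shortened copy of the hurricane list, for illustration only
hurricanes = ["Barry", "Dorian", "Humberto"]

# start=1 makes the counter begin at 1 instead of 0
lines = [f"{number}. {name}" for number, name in enumerate(hurricanes, start=1)]

for line in lines:
    print(line)
# 1. Barry
# 2. Dorian
# 3. Humberto
```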


In [21]:
# lists vs array
a= [1,2,3,4,5,6]
b=a*2
print (b)


[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]


* Math didn't "happen" because `a` is a list, not a numeric array or matrix: multiplying a list by 2 repeats its contents instead of doubling each element.
* We need an additional package to do array and matrix operations.
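For comparison, element-wise doubling is possible with plain lists, but it has to be written out explicitly. A minimal sketch using a list comprehension (the variable names are illustrative):

```python
values = [1, 2, 3, 4, 5, 6]

# The * operator on a list repeats the list
repeated = values * 2

# A list comprehension applies the multiplication to each element
doubled = [v * 2 for v in values]

print(repeated)  # [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
print(doubled)   # [2, 4, 6, 8, 10, 12]
```

This works for small lists, but it becomes slow and verbose for large multidimensional data, which is where an array package helps.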



## Numpy
* Provides a high-performance multidimensional array object and tools for working with these arrays.
* Fundamental package for scientific computing with Python
* Included in the Anaconda installation

For more examples than those presented below, please refer to https://numpy.org/devdocs/user/quickstart.html


In [18]:
# Importing numpy
import numpy as np # np becomes the alias for numpy


In [22]:
# Numpy arrays
a=np.array(a) # a is now a numpy array
b=a*2
print (b)


[ 2  4  6  8 10 12]


In [23]:
# Reshaping arrays
a_reshaped = a.reshape(3,2)
print (a_reshaped)

[[1 2]
 [3 4]
 [5 6]]
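Two related details about reshaping are sketched below: passing `-1` lets NumPy infer one dimension, and the total number of elements must stay the same. The array is rebuilt here so the block runs on its own.

```python
import numpy as np

a = np.arange(1, 7)  # array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer this dimension: 6 elements / 2 rows = 3 columns
grid = a.reshape(2, -1)
print(grid.shape)  # (2, 3)

# ravel() flattens the array back to one dimension
flat = grid.ravel()
print(flat)

# A shape that does not match the number of elements raises ValueError
try:
    a.reshape(4, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```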


In [30]:
# Sum of array elements
print(np.sum(a_reshaped, axis=0))  # Sum along axis 0: one total per column
print(a_reshaped.sum(axis=1))      # Sum along axis 1: one total per row
print(a_reshaped.sum())            # Sum of all elements

[ 9 12]
[ 3  7 11]
21
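The `axis` argument names the dimension that is collapsed by the operation. The sketch below, which rebuilds the 3x2 array so it is self-contained, checks that summing along axis 0 is the same as adding the rows together.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4],
                 [5, 6]])

# axis=0 collapses the rows, leaving one value per column
col_totals = grid.sum(axis=0)
manual = grid[0] + grid[1] + grid[2]  # adding the rows by hand

# axis=1 collapses the columns, leaving one value per row
row_totals = grid.sum(axis=1)

print(col_totals, manual)  # [ 9 12] [ 9 12]
print(grid.shape, col_totals.shape, row_totals.shape)  # (3, 2) (2,) (3,)
```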


In [36]:
# Other basic math functions
print (a_reshaped.max(axis=0))

[5 6]
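A few other reductions follow the same pattern. This is a small sketch on the same 3x2 array, rebuilt here so the block runs on its own.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4],
                 [5, 6]])

row_min = grid.min(axis=1)    # minimum of each row
col_mean = grid.mean(axis=0)  # mean of each column
print(row_min)   # [1 3 5]
print(col_mean)  # [3. 4.]

# argmax returns the position of the maximum in the flattened array;
# unravel_index converts that position to (row, column)
flat_pos = grid.argmax()
row, col = (int(i) for i in np.unravel_index(flat_pos, grid.shape))
print(row, col)  # 2 1
```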


### Numpy masked arrays
* For doing NumPy-like operations on arrays with missing or invalid values
* Operations on masked arrays are generally slower than on a plain numpy.ndarray
* A masked array has two components
    * numpy.ndarray
    * mask
    
For details, refer to:
https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html

In [39]:
import numpy.ma as ma

# let's mask values smaller than or equal to 2
masked_a = ma.masked_array(a, mask=[1, 1, 0, 0, 0,0]) # with an explicit mask
print (masked_a)



[-- -- 3 4 5 6]
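The two components of a masked array can be inspected directly through its `data` and `mask` attributes. The sketch below rebuilds the same array so it is self-contained.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 2, 3, 4, 5, 6])
masked_a = ma.masked_array(a, mask=[1, 1, 0, 0, 0, 0])

# The underlying ndarray still holds every value
print(masked_a.data)  # [1 2 3 4 5 6]

# The mask is a boolean array: True marks a masked (hidden) element
print(masked_a.mask)  # [ True  True False False False False]

# count() gives the number of unmasked elements
print(masked_a.count())  # 4
```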


In [41]:
# Create mask using conditions
masked_a2 = ma.masked_where(a <= 2, a)
print (masked_a2)

[-- -- 3 4 5 6]
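Masked arrays are useful because statistics skip the masked elements. The sketch below uses made-up `readings` with a NaN standing in for a missing value, and `-999.0` as an arbitrary fill value chosen for the illustration.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical readings; NaN marks a missing measurement
readings = np.array([1.5, np.nan, 3.0, 4.5])

# A plain ndarray propagates NaN through the mean
print(np.isnan(readings.mean()))  # True

# masked_invalid masks NaN and infinite values
masked = ma.masked_invalid(readings)
mean_valid = masked.mean()  # average of the 3 valid values only
print(mean_valid)  # 3.0

# filled() returns a plain ndarray with masked entries replaced
filled = masked.filled(-999.0)
print(filled)
```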
