![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)
![DEVELOP](../../DEVELOP_logo.png)

---

# Basic Python Continued

### Goddard Space Flight Center

#### September 25, 2017

---

### Recap

---

* Anaconda installation - virtual environment - using the Jupyter notebook (not the only way to execute Python code)
* Python as a calculator (simple math)
* Imports and the power they possess
* Strings, formatting, and printing
* Data types
* Conditionals
* Loops

### Getting this lecture...

---

You can download this lecture here by copying all the text and then saving it in an ASCII file (using a text editor) with the .ipynb extension. We will be using the code in this notebook interactively and you will probably want to run it yourself.

# Important

Please install the _netCDF4_ Python package first from your Anaconda command prompt via:

```bash
conda install netCDF4
```

### File I/O

---

File types:
* __ASCII/Binary__ - simple to read and write (binary is only simple if you know the file's format)
* __CSV/JSON files__ - need a format-specific reader/writer package
* __Earth Science structured data (HDF, netCDF4, etc.)__ - more complex

### ASCII/Binary

---

Old (you have to remember to close the file yourself):
```python
f = open('filename.ascii', 'w')
f.write('Hi there.')
f.close()
```

New (the `with` block closes the file for you, even if an error occurs):
```python
with open('filename.ascii', 'w') as f:
    f.write('Hi there.')
```

__Note:__ To read or write binary data, add a _'b'_ to the mode used to open the file (e.g. 'wb' for writing binary).  
> __File modes:__ r (read), w (write, replacing any existing content), a (append), the + versions (read and write), and the b versions (binary)
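
As a quick illustration of the file modes, the sketch below writes, appends to, and reads back a small text file, then round-trips a few raw bytes in binary mode. The file names and the temporary directory are placeholders chosen for this example only.

```python
import os
import tempfile

# Work in a throwaway directory so the example does not clutter your project.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'example.ascii')
binary_path = os.path.join(workdir, 'example.bin')

# 'w' creates the file (or empties an existing one) before writing.
with open(text_path, 'w') as f:
    f.write('Hi there.\n')

# 'a' appends to the end instead of overwriting.
with open(text_path, 'a') as f:
    f.write('Hello again.\n')

# 'r' reads; iterating over the file yields one line at a time.
with open(text_path, 'r') as f:
    lines = [line.rstrip('\n') for line in f]
print(lines)

# Adding 'b' switches to binary: you write and read bytes, not str.
with open(binary_path, 'wb') as f:
    f.write(bytes([0, 1, 2, 255]))

with open(binary_path, 'rb') as f:
    raw = f.read()
print(list(raw))
```

Note that binary mode hands back exactly the bytes that were written; interpreting them (as integers, floats, records, and so on) is up to you, which is why you need to know the format.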

Although I'm not going to cover them, the CSV and JSON formats are widely used in applications today. Most web applications exchange data as JSON in their GET/POST requests. The __[csv](http://docs.python.org/2/library/csv.html)__ and __[json](http://docs.python.org/2/library/json.html)__ packages, both part of the standard library, are very useful for reading and writing these formats.
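
For a taste of what these two packages do, here is a minimal sketch that round-trips a couple of records through CSV and JSON. The station records are made up for illustration, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io
import json

# Hypothetical station records used only for this illustration.
records = [
    {'station': 'GSFC', 'temp_c': 21.5},
    {'station': 'JPL', 'temp_c': 25.0},
]

# --- CSV: write to an in-memory text buffer, then read it back ---
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['station', 'temp_c'])
writer.writeheader()
writer.writerows(records)

buffer.seek(0)
rows = list(csv.DictReader(buffer))
# csv returns every field as a string, so convert numbers yourself.
temps = [float(row['temp_c']) for row in rows]
print(temps)

# --- JSON: dumps() serializes to a string, loads() parses it back ---
text = json.dumps(records)
restored = json.loads(text)
print(restored == records)
```

The main practical difference shows up on the way back in: JSON keeps numbers as numbers, while CSV gives you strings that you have to convert.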

> _pickles..._ Python's `pickle` module lets you "temporarily" store Python objects by saving them to a binary file and loading them back later. It is best kept for small amounts of data that are only needed for a short time.
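
A minimal sketch of pickling, assuming a small dictionary of intermediate results (the keys and values are invented for this example):

```python
import pickle

# Hypothetical intermediate result we want to stash between runs.
results = {'scene_id': 'demo_001', 'mean_wind': 7.25, 'bands': [1, 2, 3]}

# dumps() turns the object into bytes; dump() would instead write
# to a file opened in 'wb' mode.
blob = pickle.dumps(results)

# loads() rebuilds an equal (but separate) Python object.
restored = pickle.loads(blob)
print(restored == results)
print(type(blob))
```

Only unpickle data that you created or otherwise trust, because loading a pickle can execute arbitrary code.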

### NumPy: Multidimensional Arrays in Python

---

__[numpy](https://docs.scipy.org/doc/numpy-1.13.0/reference/)__ is short for numerical Python, and is a very powerful and well-supported package that adds multidimensional arrays and numerous mathematical and statistical operations to Python. An even more feature-rich and powerful option is __[scipy](https://docs.scipy.org/doc/scipy/reference/)__, which expands on __```numpy```__.

The core element of __```numpy```__ is the n-dimensional array, or ndarray. To create an array, one has multiple options, including entering numbers directly:

In [1]:
import numpy as np
a = np.array([[1,5,8],[5,7,2]])
print(a)

[[1 5 8]
 [5 7 2]]


...converting from a list or other iterable:

In [2]:
l = [5,6,8,3,8,4]
a = np.array(l)
print(a)

[5 6 8 3 8 4]


...or using one of the specialized constructor functions to make arrays full of ones, random values, or, in this case, zeros:

In [4]:
a = np.zeros((3,3))
print(a)

[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
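
The other constructor functions follow the same pattern. The sketch below shows a few of them; the shapes, fill value, and random seed are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3))         # 2x3 array filled with 1.0
sevens = np.full((2, 2), 7)    # 2x2 array filled with a chosen value
steps = np.arange(0, 10, 2)    # evenly spaced integers: 0, 2, 4, 6, 8

# Seeding makes the "random" values repeatable from run to run.
np.random.seed(0)
noise = np.random.rand(3, 3)   # uniform samples in [0, 1)

print(ones.shape, sevens.shape, steps, noise.shape)
```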


There are also ways to read text files directly into NumPy arrays (```loadtxt``` or ```genfromtxt```).
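
Here is a rough sketch of both readers. The text below is made-up data held in an in-memory buffer so the example runs anywhere; in practice you would pass a file name instead.

```python
import io
import numpy as np

# Hypothetical comma-separated text standing in for a file on disk.
# The second data row is missing its last value on purpose.
text = """lat,lon,wind
10.0,20.0,5.5
11.0,21.0,
12.0,22.0,7.5
"""

# genfromtxt tolerates missing fields and fills them with nan.
data = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(data.shape)
print(np.isnan(data[1, 2]))

# loadtxt is stricter, so give it only complete rows
# (whitespace is its default delimiter).
clean = "10.0 20.0 5.5\n12.0 22.0 7.5\n"
table = np.loadtxt(io.StringIO(clean))
print(table[:, 2])
```

As a rule of thumb, ```loadtxt``` suits clean, regular files, while ```genfromtxt``` is the more forgiving choice when values may be missing.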

Once you have an array, there are numerous operations available to you, both element-wise (i.e. performed on each value of the array):

In [9]:
a = np.array([[1,5,8],[5,7,2]])
b = np.array([[5,3,9],[0,3,7]])
c = a + b
print(c)
d = b * 3
print(d)

[[ 6  8 17]
 [ 5 10  9]]
[[15  9 27]
 [ 0  9 21]]


...and across an entire array:

In [7]:
total = np.sum(a)
print(total)
avg = np.mean(b)
print(avg)

28
4.5


NumPy has many other cool abilities, like performing math along specific axes of an array (e.g. summing all of the columns of a 2D array) and using masks to select only the values that meet a condition. Finally, particularly enthusiastic geospatial analysts can apply NumPy's extensive capabilities to raster images by reading the rasters into NumPy arrays and then manipulating them to their heart's content. Two ways to perform this conversion are 1) __[GDAL](https://github.com/edmondb/developython/blob/master/Archive/Lectures_201702/Week_05/week_5.ipynb)__, a powerful raster analysis library covered in a lecture from a previous term (see link), or 2) ```arcpy.RasterToNumPyArray```, a simpler but more constrained option.

However, for analysis problems where NumPy's or SciPy's mathematical and statistical capabilities are not necessary, processing rasters directly in ArcPy (to be covered in the next lecture) will be the most straightforward approach.
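
The axis-wise math and masking mentioned above can be sketched as follows, reusing the small array from the earlier examples. The threshold of 4 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([[1, 5, 8],
              [5, 7, 2]])

# axis=0 collapses the rows, giving one total per column.
column_sums = a.sum(axis=0)
# axis=1 collapses the columns, giving one total per row.
row_sums = a.sum(axis=1)
print(column_sums, row_sums)

# A boolean mask is an array of True/False with the same shape as a.
mask = a > 4
print(a[mask])    # only the values greater than 4, flattened to 1D

# np.where keeps the shape: values failing the test become 0 here.
thresholded = np.where(mask, a, 0)
print(thresholded)
```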

### Earth Science Structured Data

---

The __[h5py](http://docs.h5py.org/en/latest/)__ and __[netCDF4](http://unidata.github.io/netcdf4-python/)__ Python packages are very useful for reading structured data (multidimensional, multivariate, time-series, etc.). Here, we are going to read and visualize some structured data.

#### ISS RapidScat Data - netCDF4

---

- Make sure you have installed the _netCDF4_ Python package.
- Retrieve the data via FTP:

[Link for manual download via FTP](ftp://podaac-ftp.jpl.nasa.gov/allData/rapidscat/L2B12/v1.3)

In [None]:
import ftplib

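ftp = ftplib.FTP('podaac-ftp.jpl.nasa.gov')
ftp.login()  # no arguments means an anonymous login
ftp.cwd('allData/rapidscat/L2B12/v1.3/2016/232')
# Open the local file in a with-block so it is flushed and closed after the download.
with open('ISS.nc.gz', 'wb') as out:
    ftp.retrbinary('RETR rs_l2b_v1.3_10827_201609290531.nc.gz', out.write)
ftp.quit()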
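
The file retrieved above is gzip-compressed. If you would rather not uncompress it by hand, the standard `gzip` and `shutil` modules can do the job. The sketch below is one possible approach: `gunzip` is a helper defined here (not a library function), and the example builds a tiny placeholder archive so it runs without the download.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(src_path, dst_path):
    """Decompress the gzip file at src_path into dst_path."""
    with gzip.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # Stream in chunks so large files need not fit in memory.
        shutil.copyfileobj(src, dst)


# Build a tiny stand-in archive so the example runs without the download.
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, 'demo.nc.gz')
out_path = os.path.join(workdir, 'demo.nc')
payload = b'not real netCDF data, just placeholder bytes'

with gzip.open(gz_path, 'wb') as f:
    f.write(payload)

gunzip(gz_path, out_path)

with open(out_path, 'rb') as f:
    print(f.read() == payload)
```

For the lecture data, calling `gunzip('ISS.nc.gz', 'ISS.nc')` should produce the `ISS.nc` file that is opened in the next step.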

- Manually unzip/uncompress this file so that you end up with `ISS.nc`.
- Let's read the data now...

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import netCDF4 as nc

f = nc.Dataset('ISS.nc', 'r')
print(f.variables.keys())

In [None]:
z = f.variables['retrieved_wind_speed']

In [None]:
z.dimensions

### Visualizing

---

Matplotlib is basically Python's counterpart to MATLAB's plotting capabilities. Here is an example of plotting data from our file.

In [None]:
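fig = plt.figure(figsize=(20, 20))
ax = fig.add_subplot(111)
# [:] reads the whole variable into an array; transpose() swaps its two axes for display.
# The string 'none' turns interpolation off (None would fall back to Matplotlib's default).
img = ax.imshow(f.variables['retrieved_wind_speed'][:].transpose(), interpolation='none')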

We could extend this to actually visualize the data plotted on a map (using the Basemap package) and manipulate the NumPy array to give us statistics or further insight into what the data shows us and how it is characterized.
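
As a starting point for the statistics side, here is a minimal sketch using a NumPy masked array. netCDF4 commonly returns variables that have fill values as masked arrays, so the same calls should carry over; the small grid and the fill value of -9999.0 below are assumptions made up for this example, not values from the RapidScat file.

```python
import numpy as np

# Synthetic stand-in for a wind-speed grid (m/s); -9999.0 is an assumed
# fill value marking cells with no retrieval.
FILL = -9999.0
wind = np.array([[3.0, 5.0, FILL],
                 [7.0, FILL, 9.0]])

# Mask the fill values so they are ignored by the statistics below.
valid = np.ma.masked_equal(wind, FILL)

print(valid.count())             # number of usable cells
print(valid.mean())              # mean over valid cells only
print(valid.min(), valid.max())  # range of the valid data
print(valid.mean(axis=0))        # per-column mean, skipping masked cells
```

Without the mask, the fill values would be treated as real wind speeds and drag every statistic far below zero.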