# Python interpreter

The standard Python interpreter can be started by running `python` in a terminal. Code can be developed interactively by typing statements at its prompt. This helps developers code faster and see errors immediately.

However, this interpreter has some limitations compared to other options (see [IPython](#Ipython-interpreter)).

<center>
    
![](images/python.png)

</center>

Python scripts can also be executed by running 

```
python analysis_script.py
```
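As a rough illustration of what such a script could contain, here is a minimal sketch. The contents of `analysis_script.py` are not given in this notebook, so the `summarize` function and the sample values are purely hypothetical.

```python
# analysis_script.py (hypothetical contents, for illustration only)
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of a sequence of numbers."""
    return statistics.mean(values), statistics.stdev(values)


# This block runs only when the file is executed as a script,
# not when it is imported as a module.
if __name__ == "__main__":
    mean, stdev = summarize([2.0, 4.0, 6.0])
    print(f"mean={mean:.2f} stdev={stdev:.2f}")
```

The `if __name__ == "__main__":` guard lets the same file be run with `python analysis_script.py` and also imported from an interactive session without triggering the printout.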

# Ipython interpreter

IPython is an improved version of the Python interpreter. It has the following extra features compared to the standard Python interpreter:

- tab completion
- object introspection (type `object_name?`)
- magic functions, for example:
  - `%timeit range(1000)`
  - `%pastebin 3 18-20 ~1/1-5`
- command history
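Magic functions only work inside IPython and Jupyter. For comparison, a roughly similar timing of `range(1000)` can be made in plain Python with the standard `timeit` module. This is a sketch, not a reproduction of what `%timeit` reports; the loop and repeat counts below are arbitrary choices.

```python
import timeit

# Run the statement 1000 times per measurement and take 5 measurements.
loops = 1000
durations = timeit.repeat("range(1000)", number=loops, repeat=5)

# Report the fastest measurement, converted to seconds per single call.
best = min(durations) / loops
print(f"best of 5: {best * 1e9:.0f} ns per loop")
```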

<center>

![IPython](images/ipython.png)

</center>

# Jupyter notebooks

Jupyter notebooks have many advantages over the Python and IPython interpreters:

- In-browser editing for code, with automatic syntax highlighting, indentation, and tab completion/introspection.

- The ability to execute code from the browser, with the results of computations attached to the code which generated them.

- Display of computation results in rich media formats such as HTML, LaTeX, PNG, SVG, etc.

- Results and figures can be included inline.

- Markdown markup language support for documentation.

- Mathematical notation can be included using LaTeX.

- Results are easy to share (including computation results and generated figures).
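The rich output mentioned above relies on objects describing themselves in several formats. A minimal sketch: when an object defines a `_repr_html_` method, Jupyter displays the returned HTML instead of the plain-text `repr`. The `Temperature` class here is a hypothetical example and not part of this workshop's data.

```python
class Temperature:
    """Hypothetical value object with plain-text and HTML representations."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __repr__(self):
        # Used by the standard Python and IPython interpreters.
        return f"Temperature({self.kelvin} K)"

    def _repr_html_(self):
        # Used by Jupyter when the object is the last expression in a cell.
        return f"<b>{self.kelvin - 273.15:.1f}</b> &deg;C"


t = Temperature(293.15)
print(repr(t))
print(t._repr_html_())
```

In a notebook, ending a cell with `t` would render the bold Celsius value; in a terminal only the plain-text form is shown.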

<center>

![Jupyter notebook](images/jupyter.png)

</center>

# JupyterLab

JupyterLab is a web-based IDE that provides a more complete environment for code development.


- Browse the files on your computer
- Open multiple notebooks and terminals in separate tabs
- Install extensions using the extension browser
- Run, save, and export notebooks

<center>

| ![Slides](images/jupyterlab_slide.png) | ![Extensions](images/jupyterlab_extensions.png) |

</center>


# Reading data

Python has extensive support for various data formats. Most common scientific data formats can be used after installing relevant modules.

## Generic examples using pandas

### Load pandas

```python
# Load the pandas module with the alias 'pd'
import pandas as pd
```

### Example - Reading CSV file

```python
# read data from 'data.csv'
data = pd.read_csv("data.csv")
```
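Since `data.csv` is not shipped with this notebook, the following sketch reads a small made-up table from a string instead, which makes it runnable anywhere. The column names and values are invented for illustration. `parse_dates` and `index_col` are optional `read_csv` arguments that are often useful for time series.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = """date,station,t2m
2019-01-15,A,271.3
2019-02-15,A,273.9
2019-03-15,A,277.2
"""

# Parse the 'date' column as timestamps and use it as the row index.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(data.dtypes)         # column types inferred by pandas
print(data["t2m"].mean())  # mean of the temperature column
```

With a real file, the first argument would simply be the file path, as in the example above.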

### Reading other formats

<center>

| ![pandas](images/pandas_io.png) |

</center>

## Reading NetCDF files for this workshop

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

NetCDF is a popular data format in **climate research**.

For more info:

[https://en.wikipedia.org/wiki/NetCDF](https://en.wikipedia.org/wiki/NetCDF)

[https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/netcdf-overview](https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/netcdf-overview)

## Import modules to read NetCDF

Note: The SciPy module also has NetCDF support, but it only supports NetCDF version 3. We will use the **netCDF4** module to load NetCDF version 4 files.

```python
import netCDF4
from netCDF4 import num2date, date2num, date2index # to manipulate time data
import numpy as np
from datetime import datetime, timedelta
```

We will use the following data sets:

* ERA-Interim reanalysis, 1979 - now, 2-meter temperature: http://climexp.knmi.nl/ERA-interim/erai_t2m.nc
* Sea ice cover, 1981 - now: http://climexp.knmi.nl/NCEPData/iceoi_v2.nc

## Reanalysis data

```python
# open a NetCDF Dataset object:
f1 = netCDF4.Dataset('data/erai_t2m.nc')
```

```python
print(f1.variables.keys()) # get all variable names
```

```
dict_keys(['lon', 'lat', 'time', 't2m'])
```

```python
time = f1.variables['time'] # time variable
print(time)
```

```
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
    standard_name: time
    units: months since 1979-01-15 00:00:00
    calendar: proleptic_gregorian
unlimited dimensions: time
current shape = (484,)
filling on, default _FillValue of 9.969209968386869e+36 used
```


## Sea Ice data

```python
# open a NetCDF Dataset object:
f2 = netCDF4.Dataset('data/iceoi_v2.nc')
```

```python
print(f2.variables.keys()) # get all variable names
```

```
dict_keys(['time', 'lon', 'lat', 'ice'])
```

```python
ice = f2.variables['ice'] # sea ice cover variable
print(ice)
```

```
<class 'netCDF4._netCDF4.Variable'>
float32 ice(time, lat, lon)
    long_name: Reynolds OI ice cover
    units: 1
    _FillValue: 3e+33
unlimited dimensions: 
current shape = (454, 180, 360)
filling on
```
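The `_FillValue: 3e+33` attribute in the output marks cells that carry no data. By default netCDF4 returns such cells as masked entries of a NumPy masked array, so they are skipped in statistics. The sketch below imitates this behaviour with NumPy alone; the array values are made up and do not come from `iceoi_v2.nc`.

```python
import numpy as np

FILL_VALUE = 3e33  # same value as the _FillValue attribute printed above

# Made-up raw values: three valid ice fractions and one missing cell.
raw = np.array([0.0, 0.5, FILL_VALUE, 1.0], dtype=np.float32)

# Mask every element equal to the fill value.
masked = np.ma.masked_values(raw, FILL_VALUE)

print(masked)          # the missing cell is shown as '--'
print(masked.count())  # number of valid (unmasked) values
print(masked.mean())   # mean over valid values only
```

Without the mask, the single fill value would dominate the mean and make it meaningless.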


## Formatting time

```python
time = f2.variables['time']
print('time shape = %s' % time.shape)
print('time units = %s' % time.units)
print('time calendar = %s' % time.calendar)
print(time[8])
print(time[15])

# convert the time variable to dates; each month is approximated as
# 365.25/12 days, so the results drift away from the first of the month
month = timedelta(seconds=365.25 / 12 * 24.0 * 3600.0)
dates = [datetime(1981, 11, 1) + n * month for n in range(time.shape[0])]
print([date.strftime('%Y-%m-%d %H:%M:%S') for date in dates[:10]]) # print only first ten elements

ice = f2.variables['ice']
print('ice dimensions = %s, ice shape = %s' % (ice.dimensions, ice.shape))
```

```
time shape = 454
time units = months since 1981-11-01
time calendar = gregorian
8.0
15.0
['1981-11-01 00:00:00', '1981-12-01 10:30:00', '1981-12-31 21:00:00', '1982-01-31 07:30:00', '1982-03-02 18:00:00', '1982-04-02 04:30:00', '1982-05-02 15:00:00', '1982-06-02 01:30:00', '1982-07-02 12:00:00', '1982-08-01 22:30:00']
ice dimensions = ('time', 'lat', 'lon'), ice shape = (454, 180, 360)
```
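The conversion above treats every month as 365.25/12 days, which is why the printed dates are not all on the first of a month. If exact calendar months are preferred, one possible alternative is to step the year and month directly. This is a minimal sketch using only the standard library; the helper `add_months` is hypothetical, and it assumes the offsets are whole numbers of months (as the printed values `8.0` and `15.0` suggest) and that the origin falls on the first day of a month.

```python
from datetime import datetime


def add_months(start, n_months):
    """Return `start` shifted forward by a whole number of calendar months.

    Assumes `start` falls on a day that exists in every month (e.g. day 1).
    """
    total = start.year * 12 + (start.month - 1) + int(n_months)
    year, month_index = divmod(total, 12)
    return start.replace(year=year, month=month_index + 1)


origin = datetime(1981, 11, 1)  # from "months since 1981-11-01"
month_offsets = [0.0, 1.0, 2.0, 8.0, 15.0]  # sample values of the time variable

dates = [add_months(origin, m) for m in month_offsets]
print([d.strftime('%Y-%m-%d') for d in dates])
```

With the real data, `month_offsets` would be replaced by the values of the `time` variable, for example `time[:]`.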


# Next notebook

- reading NetCDF data using Iris
- data visualization