# Day 2: From text files to plots
Today we'll learn how to read text files into Python, how to extract data from those files, and how to create plots.

Session outline:
1. Introduction to `numpy` (Matlab-style matrices in Python)
2. Creating plots with `matplotlib` (functionally similar to Matlab plots)
3. Loading data from text files with `numpy`
4. More plotting

## Arrays with `numpy`
NumPy is an array and matrix manipulation package for Python. NumPy arrays are similar to Matlab matrices, although there are some notable differences, which are outlined in:
* https://numpy.org/doc/stable/user/numpy-for-matlab-users.html

In [1]:
import numpy as np # import the numpy package

# creating one-dimensional numpy arrays

# array indexing

# numpy arrays vs. Python lists

# element-wise operations

# creating 2-dimensional arrays

# matrix transposition and multiplication
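
The topics listed in the cell above can be sketched as follows. This is only one possible illustration; the variable names and values are made up for the example.

```python
import numpy as np

# creating one-dimensional numpy arrays
a = np.array([1.0, 2.0, 3.0, 4.0])   # from a Python list
b = np.arange(4)                     # the integers 0, 1, 2, 3
c = np.linspace(0.0, 1.0, 5)         # 5 evenly spaced values from 0 to 1

# array indexing: zero-based (Matlab is one-based), slices exclude the end index
first = a[0]
last = a[-1]
middle = a[1:3]                      # elements with index 1 and 2

# numpy arrays vs. Python lists: "+" concatenates lists but adds arrays
list_sum = [1, 2] + [3, 4]                        # [1, 2, 3, 4]
array_sum = np.array([1, 2]) + np.array([3, 4])   # array([4, 6])

# element-wise operations
squares = a ** 2
product = a * b

# creating 2-dimensional arrays
M = np.array([[1, 2],
              [3, 4]])

# matrix transposition and multiplication
Mt = M.T
MM = M @ M          # matrix product; M * M would be element-wise
```

Note that `*` is element-wise for NumPy arrays, whereas in Matlab `*` is the matrix product.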


## Plotting with `matplotlib`
Using `matplotlib` to plot data stored in `numpy` arrays.

In [2]:
import numpy as np
import matplotlib.pyplot as plt

# needed to use matplotlib in Jupyter notebooks
%matplotlib inline 
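
A minimal sketch of a line plot, assuming a sine curve as example data (the data and labels are not from the course material). The `%matplotlib inline` magic is left out here because it only works inside Jupyter.

```python
import numpy as np
import matplotlib.pyplot as plt

# example data: one period of a sine curve
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# create a figure with a single set of axes and draw the line
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")

# label the axes and show the legend
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```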

## Loading text files
First, let's print the contents of the file.

In [3]:
%%bash
cat D2/Dovre1-Snoheim.txt

30.06.2012 14:16:00	+6.460000e+002	+7.158069e+000	+1.118880e+001	+2.454168e+002
30.06.2012 14:26:00	+6.460000e+002	+6.973492e+000	+5.361300e+000	+2.334624e+002
30.06.2012 14:36:00	+6.460000e+002	+7.065771e+000	+5.128200e+000	+2.334624e+002
30.06.2012 14:46:00	+6.460000e+002	+7.158069e+000	+4.972800e+000	+2.310012e+002
30.06.2012 14:56:00	+6.460000e+002	+6.973492e+000	+4.662000e+000	+2.376816e+002
30.06.2012 15:06:00	+6.460000e+002	+6.881232e+000	+4.662000e+000	+2.443620e+002
30.06.2012 15:16:00	+6.460000e+002	+6.788991e+000	+4.195800e+000	+2.327592e+002
30.06.2012 15:26:00	+6.460000e+002	+6.696770e+000	+3.807300e+000	+2.341656e+002
30.06.2012 15:36:00	+6.460000e+002	+6.512383e+000	+4.584300e+000	+2.447136e+002
30.06.2012 15:46:00	+6.460000e+002	+6.696770e+000	+4.895100e+000	+2.253756e+002
30.06.2012 15:56:00	+6.460000e+002	+6.973492e+000	+4.195800e+000	+2.362752e+002
30.06.2012 16:06:00	+6.460000e+002	+7.527452e+000	+4.428900e+000	+2.183436e+002
30.06.2012 16:16:00	+6.460000e+002	+7.71

Next, let's load the data into Python with the `loadtxt` function in `numpy`.
* https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html

In [4]:
# loading text files with np.loadtxt
filename = 'D2/Dovre1-Snoheim.txt'
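
A minimal sketch of a first attempt, using one line of the listing above held in memory instead of the file so that it runs anywhere. The tab delimiter is an assumption based on how the listing appears.

```python
import io
import numpy as np

# one line copied from the listing, with tab-separated columns (assumed)
sample = ("30.06.2012 14:16:00\t+6.460000e+002\t+7.158069e+000"
          "\t+1.118880e+001\t+2.454168e+002\n")

error_message = None
try:
    # by default, loadtxt tries to parse every field as a float
    np.loadtxt(io.StringIO(sample), delimiter="\t")
except ValueError as err:
    error_message = str(err)

print(error_message)
```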

That didn't work!

`numpy.loadtxt` doesn't know what to do with the datetime strings in the first column. By default it tries to parse every field as a float, so we need to convert the strings into numbers when loading the file.

In [6]:
# converting datetime strings to and from floats
from datetime import datetime
from matplotlib.dates import num2date, date2num
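
One way to do the conversion is sketched below, assuming the `%d.%m.%Y %H:%M:%S` date format and tab-separated columns seen in the listing. The helper name `str2date` and the in-memory sample are assumptions for illustration; with the real file, the filename would be passed to `np.loadtxt` instead.

```python
import io
from datetime import datetime

import numpy as np
from matplotlib.dates import date2num, num2date

def str2date(s):
    """Convert a 'dd.mm.yyyy HH:MM:SS' string to a matplotlib date number (float days)."""
    return date2num(datetime.strptime(s, "%d.%m.%Y %H:%M:%S"))

# two lines copied from the listing, held in memory (tab-separated, assumed)
sample = (
    "30.06.2012 14:16:00\t+6.460000e+002\t+7.158069e+000\t+1.118880e+001\t+2.454168e+002\n"
    "30.06.2012 14:26:00\t+6.460000e+002\t+6.973492e+000\t+5.361300e+000\t+2.334624e+002\n"
)

# converters maps a column index to the function used to parse that column;
# passing encoding explicitly makes the converter receive str rather than bytes
data = np.loadtxt(io.StringIO(sample), delimiter="\t", encoding="utf-8",
                  converters={0: str2date})

# num2date turns the float back into a (timezone-aware) datetime
first_time = num2date(data[0, 0])
```

The result is an ordinary float array, so the date column can be sliced and plotted like any other column.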

# Plotting the data we loaded
1. Extract the data we want to plot
2. Convert floats back to dates
3. Create the plot

In [57]:
# make the columns available as variables
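
A sketch of the first two steps, using a small made-up array in place of the loaded data. Which column of the file holds which quantity is not stated in the text, so the column indices and the names `airtemp` and `windspeed` below are assumptions to be adapted.

```python
from datetime import datetime

import numpy as np
from matplotlib.dates import date2num, num2date

# stand-in for the array returned by np.loadtxt (made-up values):
# one row per observation, date number in column 0
data = np.array([
    [date2num(datetime(2012, 6, 30, 14, 16)), 7.2, 11.2],
    [date2num(datetime(2012, 6, 30, 14, 26)), 7.0, 5.4],
])

# data[:, k] selects column k for all rows
time_num = data[:, 0]
airtemp = data[:, 1]      # assumed column index
windspeed = data[:, 2]    # assumed column index

# convert the floats back to datetime objects (returned as timezone-aware, UTC)
time = num2date(time_num)
```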


In [7]:
# create the plot
import matplotlib.pyplot as plt

# add y-axis label

# change date format
import matplotlib.dates as mdates
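
The three steps in the cell above might look like this. The time series is synthetic so that the sketch is self-contained, and the label text and the `%H:%M` date format are example choices rather than the course solution.

```python
from datetime import datetime, timedelta

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# synthetic stand-in data: 12 values at 10-minute intervals
start = datetime(2012, 6, 30, 14, 16)
time = [start + timedelta(minutes=10 * i) for i in range(12)]
airtemp = 7.0 + 0.3 * np.sin(np.arange(12) / 2.0)

# create the plot
fig, ax = plt.subplots()
ax.plot(time, airtemp, color="tab:red")

# add y-axis label
ax.set_ylabel("Air temperature")

# change date format: show hours and minutes on the x-axis ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()   # rotate the tick labels so they do not overlap
```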


## Windspeed on the right y-axis

In [8]:
# create the plot

# add y-axis label

# change date format
import matplotlib.dates as mdates

# add windspeed on the right y-axis
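
A second y-axis that shares the x-axis can be created with `ax.twinx()`. The sketch below uses synthetic data, and the colors and labels are example choices.

```python
from datetime import datetime, timedelta

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# synthetic stand-in data: 12 values at 10-minute intervals
start = datetime(2012, 6, 30, 14, 16)
time = [start + timedelta(minutes=10 * i) for i in range(12)]
airtemp = 7.0 + 0.3 * np.sin(np.arange(12) / 2.0)
windspeed = 5.0 + np.cos(np.arange(12) / 3.0)

# left y-axis: air temperature
fig, ax = plt.subplots()
ax.plot(time, airtemp, color="tab:red")
ax.set_ylabel("Air temperature", color="tab:red")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# right y-axis: twinx() returns new axes that share the x-axis with ax
ax2 = ax.twinx()
ax2.plot(time, windspeed, color="tab:blue")
ax2.set_ylabel("Wind speed", color="tab:blue")
```

Coloring each label like its line is one simple way to show which curve belongs to which axis.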


# Exercise: Airtemp vs. windspeed scatter plot
1. Create a scatter plot with air temperature (`airtemp`) on the x axis and wind speed (`windspeed`) on the y axis. Add axis labels, set axis limits, adjust colors, etc.
2. Save the figure as a PDF using the `plt.savefig` function
    * https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.savefig.html

# Exercise: recreate the following figure
* Data: `D2/rro_Bulken.txt`
* Runoff (`Vannføring`) is the third column from the left
<img src="D2/bulken.png">

In [115]:
%%bash
cat D2/rro_Bulken.txt

# Døgn-verdier for stasjon Bulken (Vangsvatnet), fra 2018.11.9 til 2018.9.11
# Dato              Vannstand          Vannføring                 75p              median                 25p
11092018               1.7705            186.1244              87.999              48.721              26.566
12092018               2.7034            288.1534              96.158              48.961              25.687
13092018               3.1211            335.3013             101.853              50.090              24.708
14092018               3.1829            342.3569             103.798              50.019              23.547
15092018               3.1459            338.1367             106.052              49.866              22.447
16092018               2.4483            259.8688             113.321              53.413              21.875
17092018               2.0864            220.4581             123.205              56.713              21.501
18092018               2.1872            23

In [9]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from datetime import datetime
from matplotlib.dates import num2date, date2num

def str2date_rro(s):
    date = datetime.strptime(s, "%d%m%Y")
    return date2num(date)

def missing_to_NaN(istr):
    ''' Convert a string containing a number to a float, interpreting unparsable strings as NaN '''
    try:
        val = float(istr)
    except ValueError:
        val = float('NaN')
    
    return val

data = np.loadtxt("D2/rro_Bulken.txt", encoding='latin1', converters={
    0: str2date_rro,
    1: missing_to_NaN,
    2: missing_to_NaN,
})

### YOUR CODE HERE ###

# Exercise: recreate the following figure
* Data: `D2/rr24_Bulken.txt`
* You can create bar plots with `plt.bar`. The width of the bars is set with the `width=value` keyword argument; when the x-axis holds dates, the width is given in days.
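
A small sketch of how `width` behaves on a date axis, using the first five precipitation values from the listing below. The figure styling is an example and not the target figure.

```python
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import date2num

# first five daily values (RR, in mm) from the listing
days = [datetime(2018, 8, 1) + timedelta(days=i) for i in range(5)]
precip = [36.7, 1.2, 0.7, 11.5, 3.5]

fig, ax = plt.subplots()

# x positions are date numbers (floats in days), so width=0.8 means 0.8 days
ax.bar(date2num(days), precip, width=0.8)

# tell matplotlib to treat the x-axis values as dates and format the ticks
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m"))
ax.set_ylabel("Precipitation (mm)")
```

With daily data, a width slightly below 1 leaves a small gap between neighboring bars.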
 
 <img src="D2/bulken_precip.png">

In [133]:
%%bash
cat D2/rr24_Bulken.txt

DØGNVERDIER

Stasjoner
  Stnr Navn   I drift fra I drift til Hoh Breddegrad Lengdegrad Kommune Fylke     Region
 51470 BULKEN jan 1895                328    60.6455     6.2220 Voss    Hordaland VESTLANDET


Elementer
 Kode Navn       Enhet
 RR   Nedbør     mm
 SA   Snødybde   cm
 SD   Snødekke   kode
 SLAG Nedbørslag kode



****************** MELDING *****************
Dataverdi merket x betyr manglende tilgang eller at kvaliteten er 'Svært usikker, modelldata' (Nivå 6 eller mer).
********************************************

    Stnr       Dato         RR         SA SD SLAG
   51470 01.08.2018       36.7          0  -    -
   51470 02.08.2018        1.2          0  -    -
   51470 03.08.2018        0.7          0  -    -
   51470 04.08.2018       11.5          0  -    -
   51470 05.08.2018        3.5          0  -    -
   51470 06.08.2018        1.5          0  -    -
   51470 07.08.2018        4.4          0  -    -
   51470 08.08.2018        0.5          0  -    -
   51470 09.08.201

In [10]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from datetime import datetime
from matplotlib.dates import num2date, date2num

def str2date_rr24(s):
    date = datetime.strptime(s, "%d.%m.%Y")
    return date2num(date)

def missing_to_NaN(istr):
    ''' Convert a string containing a number to a float, interpreting unparsable strings as NaN '''
    try:
        val = float(istr)
    except ValueError:
        val = float('NaN')
    
    return val

data = np.loadtxt("D2/rr24_Bulken.txt", encoding='latin1', usecols=(1, 2, 3), skiprows=21, converters={
    1: str2date_rr24,
    2: missing_to_NaN,
    3: missing_to_NaN,
})

### YOUR CODE HERE ###