# Day 2: From text files to plots
Today we'll learn how to read text files into Python, how to extract data from those files, and how to create plots.

Session outline:
1. Introduction to `numpy` (Matlab-style matrices in Python)
2. Creating plots with `matplotlib` (functionally similar to Matlab plots)
3. Loading data from text files with `numpy`
4. More plotting

## Arrays with `numpy`
NumPy is an array and matrix manipulation package for Python. NumPy arrays are similar to Matlab matrices, although there are some notable differences, which are outlined in:
* https://numpy.org/doc/stable/user/numpy-for-matlab-users.html

In [None]:
import numpy as np # import the numpy package

# creating one-dimensional numpy arrays
v = np.zeros(10)
print(v)
v = np.ones(10)
print(v)
print(2*v)

# array indexing
print("array indexing")
v = np.array([1,2,3], dtype=float) # create an array from a list
print(v)
print(v[2])

# numpy arrays vs. Python lists
v[1] = 10
print(v)
# v[1] = "foo" # ValueError: np arrays have fixed type

# element-wise operations
print("element-wise operations")
v1 = np.ones(5)
v2 = 10*np.ones(5)
print(v1)
print(v2)
print(v1+v2)
print(v1*v2)

# creating 2-dimensional arrays
A = np.zeros((10, 5)) # note that argument is a tuple
print(A)
B = np.ones((10, 5))
print(A + B)

# matrix transposition and multiplication
print(A.transpose())
print(np.dot(B.transpose(), B))
print(np.eye(5)) # the identity matrix (as in Matlab)
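
Two of the differences from Matlab show up directly in the cell above: indexing starts at 0 rather than 1, and `*` multiplies element-wise. Matrix products need `np.dot` or, for 2-D arrays, the equivalent `@` operator. A small sketch with made-up arrays (the names `M`, `N` and `w` are illustrative only):

```python
import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])
N = np.eye(2)  # 2x2 identity matrix

elementwise = M * N   # element-wise product: keeps only the diagonal of M
matrix_prod = M @ N   # matrix product: same result as np.dot(M, N)

print(elementwise)
print(matrix_prod)

w = np.array([10.0, 20.0, 30.0])
first = w[0]    # indexing starts at 0, unlike Matlab
last = w[-1]    # negative indices count from the end
print(first, last)
```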

## Plotting with `matplotlib`
Using `matplotlib` to plot data stored in `numpy` arrays.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
# plt.style.use("ggplot")

# needed to use matplotlib in Jupyter notebooks
%matplotlib inline 
ys = np.random.randn(100)
xs = np.arange(0, 100)
plt.plot(xs, ys, ".")
plt.xlim(0, 100)
plt.ylim(-3, 3)
plt.grid()
plt.title("Scatter plot")
plt.xlabel("x")
plt.ylabel("y")
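
The `plt.*` calls above act on an implicit "current" figure. `matplotlib` also offers an object-oriented interface in which the figure and axes are explicit objects, which tends to be easier to follow once a figure has more than one axis. A sketch of the same scatter plot in that style, again with random data:

```python
import numpy as np
import matplotlib.pyplot as plt

ys = np.random.randn(100)
xs = np.arange(0, 100)

fig, ax = plt.subplots()   # a figure containing a single axes object
ax.plot(xs, ys, ".")
ax.set_xlim(0, 100)
ax.set_ylim(-3, 3)
ax.grid()
ax.set_title("Scatter plot")
ax.set_xlabel("x")
ax.set_ylabel("y")
```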

## Loading text files
First, let's print the contents of the file.

In [None]:
%%bash
cat D2/Dovre1-Snoheim.txt

Next, let's load the data into Python with the `loadtxt` function in `numpy`.
* https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html

In [None]:
# loading text files with np.loadtxt
filename = 'D2/Dovre1-Snoheim.txt'
np.loadtxt(filename, delimiter="\t", encoding="utf8")

That didn't work!

`numpy.loadtxt` doesn't know what to do with the datetime strings in the first column. By default it tries to parse every field as a float, so we need to convert the strings into numbers when loading the file.
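
To see the failure in isolation, the sketch below feeds `np.loadtxt` two made-up lines with the same general shape (a datetime string followed by tab-separated numbers). The sample values are invented for illustration and are not taken from the data file.

```python
import io
import numpy as np

# two invented lines: datetime string, then two numbers, tab-separated
sample = "01.06.2020 12:00:00\t5.2\t3.1\n01.06.2020 13:00:00\t5.8\t2.7\n"

try:
    np.loadtxt(io.StringIO(sample), delimiter="\t", encoding="utf8")
    failed = False
except ValueError as err:
    # every field is parsed as a float by default, so the date string fails
    failed = True
    print("loadtxt failed:", err)

# restricting the read to the numeric columns works without a converter
numbers = np.loadtxt(io.StringIO(sample), delimiter="\t", usecols=(1, 2), encoding="utf8")
print(numbers.shape)
```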

In [None]:
# converting datetime strings to and from floats
from datetime import datetime
from matplotlib.dates import num2date, date2num
def str2date(s):
    date = datetime.strptime(s, "%d.%m.%Y %H:%M:%S")
    return date2num(date)
data = np.loadtxt('D2/Dovre1-Snoheim.txt', encoding='utf8', delimiter='\t', converters={
    0: str2date
})

# indexing into the data
print(data.shape)
print(data[0,0], "converts to", num2date(data[0,0]))
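
`date2num` and `num2date` are inverse operations: the first turns a `datetime` into a float counting days from matplotlib's reference epoch, the second turns that float back into a timezone-aware `datetime` (UTC by default). A short round-trip sketch with an arbitrary example timestamp:

```python
from datetime import datetime, timedelta
from matplotlib.dates import date2num, num2date

stamp = datetime.strptime("01.06.2020 12:00:00", "%d.%m.%Y %H:%M:%S")
as_float = date2num(stamp)   # days (as a float) since matplotlib's epoch
back = num2date(as_float)    # timezone-aware datetime

print(as_float)
print(back)

# moving one day ahead corresponds to adding 1.0 to the float
next_day = date2num(stamp + timedelta(days=1))
print(next_day - as_float)
```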

# Plotting the data we loaded
1. Extract the data we want to plot
2. Convert floats back to dates
3. Create the plot

In [None]:
# make the columns available as variables
dates = num2date(data[:, 0])
airtemp = data[:, 2]
windspeed = data[:, 3]

In [None]:
# create the plot
import matplotlib.pyplot as plt
plt.plot(dates, airtemp)

# add y-axis label
plt.ylabel("Temperature [C]")
plt.grid()

# change date format
import matplotlib.dates as mdates
ax = plt.gca() # gca: Get Current Axes
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))

## Windspeed on the right y-axis

In [None]:
# create the plot
plt.figure(figsize=(9.6, 3.2), dpi=96)
plt.plot(dates, airtemp, color="C0", alpha=0.7)

# add y-axis label
plt.ylabel("Temperature [C]", color="C0")
plt.grid()

# add windspeed on the right y-axis
ax1 = plt.gca() # gca: Get Current Axes
ax2 = ax1.twinx() # second axes sharing the same x-axis
ax2.plot(dates, windspeed, color="C1", alpha=0.7)
ax2.set_ylabel("Windspeed [m/s]", color="C1")

# change date format
import matplotlib.dates as mdates
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))

plt.xlim(dates[0], dates[-1])
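
With two y-axes, each axes object keeps track of its own lines, so a legend built from one of them misses the other. One common way around this is to collect the handles from both axes and pass them to a single legend. A sketch with synthetic stand-in data (the arrays below are generated numbers, not the measurements from the data file):

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic stand-ins for the temperature and wind speed columns
t = np.arange(48)
temp = 5 + 3 * np.sin(t / 8)
wind = 4 + np.random.rand(48)

fig, ax1 = plt.subplots(figsize=(9.6, 3.2))
ax1.plot(t, temp, color="C0", label="Temperature")
ax1.set_ylabel("Temperature [C]", color="C0")

ax2 = ax1.twinx()  # second axes sharing the same x-axis
ax2.plot(t, wind, color="C1", label="Windspeed")
ax2.set_ylabel("Windspeed [m/s]", color="C1")

# gather the legend entries from both axes into one legend
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax2.legend(h1 + h2, l1 + l2, loc="upper right")
```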

# Exercise: Airtemp vs. windspeed scatter plot
1. Create a scatter plot with air temperature (`airtemp`) on the x axis and wind speed (`windspeed`) on the y axis. Add axis labels, set axis limits, adjust colors, etc.
2. Save the figure as a pdf using the `plt.savefig` command
    * https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.savefig.html

In [None]:
plt.figure(figsize=(9.6, 9.6), dpi=96)
plt.plot(airtemp, windspeed, ".")
plt.xlim(-5, 17.5)
plt.ylim(0, 14)
plt.grid()
plt.xlabel("Air temperature [C]")
plt.ylabel("Wind speed [m/s]")
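
The cell above covers the first part of the exercise. For the second part, `plt.savefig` writes the current figure to disk and infers the format from the file extension. A sketch with random stand-in data; the output filename `airtemp_vs_windspeed.pdf` is just an example:

```python
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

# random stand-ins for airtemp and windspeed
x = np.random.uniform(-5, 17.5, 200)
y = np.random.uniform(0, 14, 200)

plt.figure(figsize=(4.8, 4.8))
plt.plot(x, y, ".")
plt.xlabel("Air temperature [C]")
plt.ylabel("Wind speed [m/s]")

# the ".pdf" extension tells savefig to write a PDF file
out_path = Path("airtemp_vs_windspeed.pdf")
plt.savefig(out_path, bbox_inches="tight")
```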

# Exercise: recreate the following figure
* Data: `D2/rro_Bulken.txt`
* Runoff is the third column from the right
<img src="D2/bulken.png">

In [None]:
%%bash
cat D2/rro_Bulken.txt

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from datetime import datetime
from matplotlib.dates import num2date, date2num

def str2date_rro(s):
    date = datetime.strptime(s, "%d%m%Y")
    return date2num(date)

def missing_to_NaN(istr):
    ''' Convert a string containing a number to a float, interpreting unparsable strings as NaN '''
    try:
        val = float(istr)
    except ValueError:
        val = float('NaN')
    
    return val

data = np.loadtxt("D2/rro_Bulken.txt", encoding='latin1', converters={
    0: str2date_rro,
    1: missing_to_NaN,
    2: missing_to_NaN,
})

# extract columns
dates = num2date(data[:, 0])
observed = data[:, 2] # observed daily flow
median = data[:, 4] # median flow
p25 = data[:, 3]
p75 = data[:, 5]

# plot
plt.figure(figsize=(9.6, 3.2), dpi=96)
plt.plot(dates, median, label="Median")
plt.plot(dates, observed, label="Observed")
plt.fill_between(dates, p25, p75, color="0.85", label="25th to 75th percentile range")

# configure
plt.legend()
plt.grid()
plt.ylabel("Runoff at Bulken [m³/s]")
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
plt.ylim(0, 600)
plt.xlim(dates[0], dates[-1])
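
The `missing_to_NaN` converter matters because a single unparsable field would otherwise abort the whole load. Once missing values are NaN, `plt.plot` leaves a gap in the line and the `np.nan*` functions skip them. A sketch with invented input strings (the `"-"` placeholder is an assumption about how a missing value might look, not taken from the file):

```python
import numpy as np

def missing_to_NaN(istr):
    """Convert a numeric string to float; unparsable strings become NaN."""
    try:
        return float(istr)
    except ValueError:
        return float("nan")

raw = ["12.5", "-", "14.0", ""]   # invented example fields
values = np.array([missing_to_NaN(s) for s in raw])

print(values)
print(np.isnan(values))     # marks the missing entries
print(np.mean(values))      # ordinary reductions propagate NaN
print(np.nanmean(values))   # mean of the valid entries only
```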

# Exercise: recreate the following figure
* Data: `D2/rr24_Bulken.txt`
* You can create bar plots with ``plt.bar``. The width of the bars is set with the ``width=value`` keyword argument, which is given in days when the x-axis holds dates.
 
 <img src="D2/bulken_precip.png">

In [None]:
%%bash
cat D2/rr24_Bulken.txt

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from datetime import datetime
from matplotlib.dates import num2date, date2num

def str2date_rr24(s):
    date = datetime.strptime(s, "%d.%m.%Y")
    return date2num(date)

def missing_to_NaN(istr):
    ''' Convert a string containing a number to a float, interpreting unparsable strings as NaN '''
    try:
        val = float(istr)
    except ValueError:
        val = float('NaN')
    
    return val

data = np.loadtxt("D2/rr24_Bulken.txt", encoding='latin1', usecols=(1, 2, 3), skiprows=21, converters={
    1: str2date_rr24,
    2: missing_to_NaN,
    3: missing_to_NaN,
})

dates = num2date(data[:, 0])
precip = data[:, 1]

fig = plt.figure(figsize=(9.6, 3.2), dpi=96)
plt.bar(dates, precip, width=0.8)

plt.ylabel("Daily precipitation [mm]")
plt.grid()
plt.ylim(0, 70)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
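
The `loadtxt` call above combines three options: `skiprows` drops header lines, `usecols` selects columns, and the `converters` keys refer to column numbers in the file rather than positions in the returned array. The sketch below shows the interplay on an invented, whitespace-separated sample with one header line; the layout is an assumption for illustration and does not reproduce `rr24_Bulken.txt`.

```python
import io
from datetime import datetime

import numpy as np
from matplotlib.dates import date2num, num2date

sample = (
    "station date precip flag\n"   # header line, skipped below
    "A 01.05.2020 0.0 0\n"
    "A 02.05.2020 12.4 0\n"
    "A 03.05.2020 3.1 1\n"
)

def str2date(s):
    return date2num(datetime.strptime(s, "%d.%m.%Y"))

data = np.loadtxt(
    io.StringIO(sample),
    encoding="utf8",
    skiprows=1,                 # drop the header line
    usecols=(1, 2, 3),          # ignore the text in column 0
    converters={1: str2date},   # key 1 is the column number in the file
)

print(data.shape)               # three rows, three selected columns
print(num2date(data[0, 0]))     # the date ends up in output column 0
print(data[:, 1])               # precipitation values
```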