# Oklahoma Mesonet Data

This notebook processes Oklahoma Mesonet weather data from 1996-1997, colocated in time and space with the Carlson field study.

- Carlson Data
    - Sent by Derek van der Kamp (DVDK)
    - Includes air temperature for all times (open question: where did he get this for the earlier period?)
- Oklahoma Mesonet data
    - Slapout station
    - Half-hour sensor data from 1996-1997
    - Weather averaged over the hour except rain, which is accumulated over the period
    - No air temperature in the older records, so DVDK temperatures are used, linearly interpolated to half-hour resolution

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from src.utils import read_yml, str2time, time_intp

In [None]:
dweather = pd.read_excel("data/processed_data/dvdk_weather.xlsx")
ok_weather = pd.read_csv("data/OK_Mesonet/Slapout_96-97_weather.csv")
ok_rain = pd.read_csv("data/OK_Mesonet/Slapout_96-97_rain.csv")
nlist = read_yml("etc/nlists/ok_mesonet.yaml")

## Join 

Join the rain and weather tables.

**Data Processing**
- Standardize column names (lookup table in `etc/nlists/ok_mesonet.yaml`)
- Convert the missing-value sentinels -999 and -99 to NA
- Convert units to metric:
    - temperature, °F to K
    - wind, mph to m/s
    - rain, in to mm
- Date: CDT to UTC

In [None]:
print(f"Number of weather observations: {ok_weather.shape[0]}")
print(f"Number of rain observations: {ok_rain.shape[0]}")
print()
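# np.array_equal returns False (instead of raising) when the lengths differ
print(f"Times match: {np.array_equal(ok_weather.TIME.to_numpy(), ok_rain.TIME.to_numpy())}")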
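
The positional comparison above assumes both files list the same timestamps in the same order. A sketch of an order-independent check uses the `validate` and `indicator` options of `pandas.merge`; the frames, the station ID `SLAP`, and the `TAIR`/`RAIN` columns below are made-up stand-ins, not the real file contents.

```python
import pandas as pd

# Synthetic stand-ins for ok_weather and ok_rain (illustrative values only)
weather = pd.DataFrame({"STID": ["SLAP"] * 3,
                        "TIME": ["1996-06-01 00:00", "1996-06-01 00:30", "1996-06-01 01:00"],
                        "TAIR": [70.0, 69.5, 69.0]})
rain = pd.DataFrame({"STID": ["SLAP"] * 2,
                     "TIME": ["1996-06-01 00:30", "1996-06-01 00:00"],
                     "RAIN": [0.0, 0.1]})

# validate="one_to_one" raises a MergeError if either side has duplicate keys;
# indicator=True adds a "_merge" column saying where each row was found
check = weather.merge(rain, how="outer", on=["STID", "TIME"],
                      validate="one_to_one", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["TIME", "_merge"]])
```

Here the rain file is shuffled and missing one record; the merge still pairs the shuffled rows correctly and reports only the 01:00 weather row as `left_only`.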

In [None]:
df = ok_weather.merge(ok_rain, how="left", on=["STID", "TIME"])
# Missing-value sentinels -> NaN
df = df.replace([-999, -99], np.nan)

# Rename columns using the lookup table
df = df.rename(columns=nlist)

# Units: °F -> K, mph -> m/s, in -> mm
for col in ["temp", "tmax", "tmin"]:
    df[col] = (df[col] - 32) * 5/9 + 273.15
df["wind"] = df["wind"] * 0.44704
df["gust"] = df["gust"] * 0.44704
df["rain"] = df["rain"] * 25.4
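
The conversions listed under Data Processing can be checked on a few synthetic values. This is a minimal standalone sketch (the column names and values are made up); both factors are exact by definition, since 1 mile = 1609.344 m and 1 in = 25.4 mm.

```python
import numpy as np
import pandas as pd

MPH_TO_MS = 0.44704  # 1609.344 m / 3600 s
IN_TO_MM = 25.4

def f_to_k(temp_f):
    """Convert degrees Fahrenheit to kelvin."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Synthetic raw values, including the -999 / -99 missing-value sentinels
raw = pd.DataFrame({"temp": [32.0, 212.0, -999.0],
                    "wind": [10.0, -99.0, 0.0],
                    "rain": [1.0, 0.0, -999.0]})

clean = raw.replace([-999, -99], np.nan)
clean["temp"] = f_to_k(clean["temp"])
clean["wind"] = clean["wind"] * MPH_TO_MS
clean["rain"] = clean["rain"] * IN_TO_MM
print(clean)
```

Freezing and boiling points map to 273.15 K and 373.15 K. As in the notebook cell, any genuine reading exactly equal to -99 or -999 would also become NaN.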

## Handle Times

Steps:
- Confirm time zones
    - All times from the OK Mesonet are CDT
    - Investigate whether the DVDK data switch between CST and CDT

NOTE: CDT is UTC-5 and CST is UTC-6.

In [None]:
df["date"] = pd.to_datetime(df["date"])
print(f"Unique time steps (expected 30 min): {df.date.diff().unique()}")
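
`unique()` shows which step sizes occur but not where. This sketch counts each step size and lists the timestamps that follow a gap longer than the nominal 30 minutes. The timestamps are synthetic, with the 01:00 record deliberately removed.

```python
import pandas as pd

# Synthetic half-hourly timestamps with one missing record (01:00)
dates = pd.Series(pd.to_datetime(["1996-06-01 00:00", "1996-06-01 00:30",
                                  "1996-06-01 01:30", "1996-06-01 02:00"]))

steps = dates.diff()
print(steps.value_counts())  # how often each step size occurs (NaT dropped)

# Timestamps that come right after a gap longer than 30 minutes
gaps = dates[steps > pd.Timedelta(minutes=30)]
print(gaps)
```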

In [None]:
# Check time series of solar radiation to see if they match
# Looks like 1hr time diff in this period
t0 = dweather.date.min()
print(f"Start time from Carlson data: {t0}")
t1 = t0 + pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.solar, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.solar, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Solar Radiation")

The Carlson observations start at 1996-03-26 15:00:00. Daylight saving time did not begin until 7 April 1996, so Oklahoma was on CST at that time, which explains the time mismatch in this period. Below we plot the solar radiation for a period in CDT to check whether the two series line up.

In [None]:
t0 = pd.Timestamp('1996-06-15 12:00:00')
t1 = t0 + pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.solar, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.solar, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Solar Radiation")

Still off by 1 hr, so it looks like the DVDK data are in CST throughout. Checking a period in 1997 to be sure.

In [None]:
t0 = pd.Timestamp('1997-01-15 12:00:00')
t1 = t0 + pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.solar, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.solar, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Solar Radiation")
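
The CST/CDT reasoning above can be checked against the IANA time-zone database that pandas uses. This minimal sketch localizes the local start times of the three comparison periods, assuming `America/Chicago` is the civil time zone in effect at Slapout.

```python
import pandas as pd

# Local start times of the three comparison periods above
local = pd.to_datetime(["1996-03-26 15:00", "1996-06-15 12:00", "1997-01-15 12:00"])

# Assumption: America/Chicago is the civil time zone observed at Slapout
civil = local.tz_localize("America/Chicago")
for ts in civil:
    print(ts, ts.tzname(), ts.utcoffset())
```

Only the June period falls in CDT. A DST-aware zone like this describes civil time, which is why the next section uses fixed offsets for data that stay on one offset year-round.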

### Set time zones, adjust to UTC

Time zones:
- OK Mesonet: CDT
- Carlson data from DVDK: CST

Localize each dataset accordingly, then convert both to UTC so they line up.

In [None]:
# OK Mesonet: CDT, a fixed UTC-5 offset.
# "Etc/GMT+5" uses the POSIX sign convention, so it means UTC-5.
df['date'] = df['date'].dt.tz_localize('Etc/GMT+5')
df['date'] = df['date'].dt.tz_convert('UTC')

In [None]:
# DVDK: CST, a fixed UTC-6 offset ("Etc/GMT+6" means UTC-6)
dweather['date'] = dweather['date'].dt.tz_localize('Etc/GMT+6')
dweather['date'] = dweather['date'].dt.tz_convert('UTC')
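
The `Etc/GMT+N` names follow the POSIX sign convention, which is the reverse of the usual notation: `Etc/GMT+5` is UTC-5 and `Etc/GMT+6` is UTC-6. This quick sanity check uses a made-up timestamp. It shows that the same wall-clock reading lands one hour apart in UTC depending on which offset is assumed, which is the shift the localization above is meant to remove.

```python
import pandas as pd

wall = pd.Series(pd.to_datetime(["1996-06-15 12:00"]))

# POSIX sign convention: "Etc/GMT+5" is UTC-5 (CDT), "Etc/GMT+6" is UTC-6 (CST)
as_cdt = wall.dt.tz_localize("Etc/GMT+5").dt.tz_convert("UTC")
as_cst = wall.dt.tz_localize("Etc/GMT+6").dt.tz_convert("UTC")

print(as_cdt.iloc[0])  # 1996-06-15 17:00:00+00:00
print(as_cst.iloc[0])  # 1996-06-15 18:00:00+00:00
```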

In [None]:
# Check time series of solar radiation again
t0 = dweather.date.min()
print(f"Start time from Carlson data: {t0}")
t1 = t0 + pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.solar, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.solar, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Solar Radiation")

Closer than before, but a mismatch remains, possibly due to differences between the data sources.
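
The remaining offset can be estimated numerically instead of by eye. This sketch searches for the shift that maximizes the correlation between two series. It assumes both series have already been put on a common, gap-free half-hour grid (for example, via an inner merge on `date`). `best_lag` is a hypothetical helper, and the test data are a synthetic daily cycle.

```python
import numpy as np

def best_lag(a, b, max_lag):
    """Shift k (in samples) in [-max_lag, max_lag] maximising corr(a[t], b[t + k]).

    Assumes a and b are equal-length arrays on the same regular time grid with
    no missing values. Positive k means b lags a by k samples.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    best_k, best_r = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        # Overlapping parts of the two series for this shift
        x, y = (a[: n - k], b[k:]) if k >= 0 else (a[-k:], b[: n + k])
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r

# Synthetic check: a clipped daily cycle sampled every 30 min over 4 days,
# and a copy delayed by 2 samples (1 hour)
t = np.arange(4 * 48)
a = np.clip(np.sin(2 * np.pi * t / 48), 0, None)
b = np.roll(a, 2)
print(best_lag(a, b, max_lag=4))
```

With 30-minute data, a lag of 2 samples corresponds to 1 hour. A best lag of 0 combined with a low correlation would point to differences in the data themselves rather than a time shift.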

In [None]:
t1 = dweather.date.max()
print(f"End time from Carlson data: {t1}")
t0 = t1 - pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.wind, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.wind, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Wind")

In [None]:
t1 = dweather.date.max()
print(f"End time from Carlson data: {t1}")
t0 = t1 - pd.Timedelta(hours=96)
x1 = df[(df.date >= t0) & (df.date <= t1)]
x2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

plt.plot(x1.date, x1.temp, label="Mesonet", linestyle='-', marker='o')
plt.plot(x2.date, x2.temp, label="Carlson", linestyle='-', marker='o')
plt.xticks(rotation=90);
plt.legend()
plt.title("Temp")
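
To put a number on the mismatches seen in these plots, this sketch reports the count, mean difference, and RMSE of one variable at the timestamps both frames share. `compare_series` is a hypothetical helper, and the frames are synthetic. In the notebook, a call would look like `compare_series(df, dweather, "temp")`, assuming both `date` columns are already in UTC.

```python
import numpy as np
import pandas as pd

def compare_series(a, b, col):
    """Count, mean difference (a - b) and RMSE of `col` at timestamps found in both frames."""
    both = (a[["date", col]]
            .merge(b[["date", col]], on="date", suffixes=("_a", "_b"))
            .dropna())
    diff = both[f"{col}_a"] - both[f"{col}_b"]
    return {"n": len(diff), "bias": float(diff.mean()),
            "rmse": float(np.sqrt((diff ** 2).mean()))}

# Synthetic frames standing in for the Mesonet and Carlson data (values made up)
meso = pd.DataFrame({"date": pd.to_datetime(["1996-06-01 00:00", "1996-06-01 00:30", "1996-06-01 01:00"]),
                     "temp": [290.0, 291.0, 292.0]})
carl = pd.DataFrame({"date": pd.to_datetime(["1996-06-01 00:30", "1996-06-01 01:00", "1996-06-01 01:30"]),
                     "temp": [290.0, 292.0, 293.0]})
stats = compare_series(meso, carl, "temp")
print(stats)
```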

### Temp from 1996 

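Air temperature is not available in the older Mesonet data, so DVDK temperatures are used instead. NOTE: the data mismatches noted above still need to be reconciled.

Linearly interpolate the DVDK temperatures to fill the gaps in the OK Mesonet data.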

In [None]:
# Rows with missing temperature in the OK Mesonet data
df2 = df[df.temp.isna()].copy()

# Time of earliest temp record from DVDK data
t0 = dweather.date.min()
# Time of the last missing OK Mesonet temperature
t1 = df2.date.max()

# Filter to dates
df2 = df2[(df2.date >= t0) & (df2.date <= t1)]
dw2 = dweather[(dweather.date >= t0) & (dweather.date <= t1)]

In [None]:
print(f"Missing Temps in DVDK Data: {dw2.temp.isna().sum()}")

In [None]:
# Linearly interpolate DVDK temperatures to the times of the missing Mesonet temperatures
temp2 = time_intp(
    t1 = dw2.date.to_numpy(),
    v1 = dw2.temp,
    t2 = df2.date.to_numpy()
)
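
`time_intp` comes from the project's `src.utils`, and its implementation is not shown here. If it does plain linear interpolation in time, a self-contained equivalent could look like the following. `linear_time_interp` is a hypothetical name. Skipping NaN sources and returning NaN outside the source range are design choices of this sketch, not necessarily what `time_intp` does.

```python
import numpy as np
import pandas as pd

def _seconds_since_epoch(t):
    """Timestamps -> float seconds since 1970-01-01 UTC (naive times treated as UTC)."""
    idx = pd.DatetimeIndex(t)
    idx = idx.tz_localize("UTC") if idx.tz is None else idx.tz_convert("UTC")
    return (idx - pd.Timestamp("1970-01-01", tz="UTC")).total_seconds().to_numpy()

def linear_time_interp(t_src, v_src, t_dst):
    """Hypothetical stand-in for time_intp: linear interpolation in time.

    t_src must be increasing. NaNs in v_src are skipped; targets outside
    the source range get NaN rather than being extrapolated.
    """
    x_src = _seconds_since_epoch(t_src)
    x_dst = _seconds_since_epoch(t_dst)
    v = np.asarray(v_src, dtype=float)
    ok = ~np.isnan(v)
    return np.interp(x_dst, x_src[ok], v[ok], left=np.nan, right=np.nan)

hourly = ["1996-06-01 00:00", "1996-06-01 01:00", "1996-06-01 02:00"]
out = linear_time_interp(hourly, [290.0, np.nan, 294.0],
                         ["1996-06-01 00:30", "1996-06-01 01:00", "1996-06-01 03:00"])
print(out)
```

Converting to seconds since the epoch sidesteps the object arrays that `.to_numpy()` returns for tz-aware columns.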

In [None]:
df2["temp"] = temp2

In [None]:
# Copy into main df
df.set_index('date', inplace=True)
df.update(df2.set_index('date')[['temp']])  # aligns on index, touches only 'temp'
df.reset_index(inplace=True)
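
`DataFrame.update` aligns on the index and only writes non-NA values from the other frame. As a result, rows where the interpolated temperature is NaN keep their original value, and rows absent from `df2` are left alone. A small synthetic illustration:

```python
import numpy as np
import pandas as pd

main = pd.DataFrame({"temp": [np.nan, np.nan, 300.0], "rh": [40, 50, 60]},
                    index=pd.to_datetime(["1996-06-01 00:00", "1996-06-01 00:30", "1996-06-01 01:00"]))
fill = pd.DataFrame({"temp": [295.0, np.nan]},
                    index=pd.to_datetime(["1996-06-01 00:00", "1996-06-01 00:30"]))

main.update(fill)  # aligns on index; NaN in `fill` never overwrites `main`
print(main)
```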

## Calculate Other Features and Save

In [None]:
# Hour of day and day of year, in UTC
df["hod"] = df.date.dt.hour
df["doy"] = df.date.dt.dayofyear

# Geographic features from Slapout
df["elev"] = 774
df["lon"] = -100.261920
df["lat"] = 36.597490

# Drying (Ed) and wetting (Ew) equilibrium moisture content; rh in %, temp in K
rh = df.rh
temp = df.temp
temp_term = 0.18 * (21.1 + 273.15 - temp) * (1 - np.exp(-0.115 * rh))  # shared by Ed and Ew
Ed = 0.924 * rh**0.679 + 0.000499 * np.exp(0.1 * rh) + temp_term
Ew = 0.618 * rh**0.753 + 0.000454 * np.exp(0.1 * rh) + temp_term

df["Ed"] = Ed
df["Ew"] = Ew
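
For reference, the cell above computes the following. Here $H$ is the relative humidity, assumed to be in percent, and $T$ is the air temperature in kelvin:

$$
\begin{aligned}
E_d &= 0.924\,H^{0.679} + 0.000499\,e^{0.1H} + 0.18\,(21.1 + 273.15 - T)\,\bigl(1 - e^{-0.115H}\bigr) \\
E_w &= 0.618\,H^{0.753} + 0.000454\,e^{0.1H} + 0.18\,(21.1 + 273.15 - T)\,\bigl(1 - e^{-0.115H}\bigr)
\end{aligned}
$$

The temperature term vanishes at $T = 294.25$ K (21.1 °C). Above that temperature it lowers both equilibria, and below it raises them.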

In [None]:
# Excel cannot store tz-aware datetimes: drop the tz info, keeping UTC wall-clock times
df['date'] = df['date'].dt.tz_localize(None)
df.to_excel("data/processed_data/mesonet.xlsx")