# Building Samples from Irregularly Sampled Timeseries

This notebook proposes ways to build samples for sequence-learning tasks from an observed timeseries with irregular sampling intervals. The fuel moisture content (FMC) data from the Carlson field study was sampled twice daily, and we want to train models that predict FMC hourly or even subhourly.
1. Fixed-Length: fixed sliding window of weather data, ignore 
2. Temporal downscaling with a physical model: translate low-temporal-resolution observations to high resolution using physics-informed interpolation.
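
To make the idea of temporal downscaling concrete, here is a minimal sketch that upsamples twice-daily values to hourly with plain linear interpolation in time. This is only a naive baseline for comparison, not the physics-informed interpolation described above, and the observation values are made up for illustration.

```python
import pandas as pd

# Hypothetical twice-daily FMC observations (values are invented for illustration)
obs = pd.Series(
    [10.0, 14.0, 12.0],
    index=pd.to_datetime(["2000-01-01 06:00", "2000-01-01 18:00", "2000-01-02 06:00"]),
)

# Upsample to an hourly grid, then fill the gaps by linear interpolation in time.
# A physical model would replace this step with weather-driven dynamics.
hourly = obs.resample("60min").asfreq().interpolate(method="time")

print(hourly.head())
```

A linear fill ignores the weather between observations entirely, which is the gap a physical model is meant to close.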

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from src.utils import time_intp

In [None]:
weather = pd.read_excel("data/processed_data/mesonet.xlsx")
fm = pd.read_excel("data/processed_data/ok_100h.xlsx")

## Join

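Convert both tables to UTC, then align the FM observations to the nearest half hour so they can be matched against the weather timestamps.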

In [None]:
fm['date'] = fm['date'].dt.tz_localize('Etc/GMT+6')
fm['date'] = fm['date'].dt.tz_convert('UTC')

weather['date'] = weather['date'].dt.tz_localize('UTC')
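
The sign in `Etc/GMT+6` is easy to misread: POSIX-style zone names invert it, so `Etc/GMT+6` is a fixed offset six hours behind UTC with no daylight saving. A small sketch on hypothetical timestamps, assuming the FM times were recorded in a fixed UTC-6 local time:

```python
import pandas as pd

# Hypothetical naive timestamps, assumed to be recorded in fixed UTC-6 local time
local = pd.Series(pd.to_datetime(["2000-01-01 08:00", "2000-07-01 14:00"]))

# 'Etc/GMT+6' means UTC-6 (the sign is inverted) and never applies daylight saving
utc = local.dt.tz_localize("Etc/GMT+6").dt.tz_convert("UTC")

print(utc)
```

Both the winter and the summer timestamp shift by exactly six hours. A regional zone such as `America/Chicago` would shift the summer one by five.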

In [None]:
## ROUNDING to half hour for now...
fm["date"] = fm["date"].dt.round("30min")

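# Use single quotes inside the f-string: nested double quotes are a syntax error before Python 3.12
print(f"All fm dates in weather dates: {fm['date'].isin(weather['date']).mean()}")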
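
Rounding moves each observation by up to 15 minutes, and the coverage check only passes for timestamps that exist in the weather table. The sketch below shows both effects on hypothetical toy data. The 30-minute weather frequency and the `*_toy` names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: weather every 30 minutes, FM observed at irregular times
weather_toy = pd.DataFrame(
    {"date": pd.date_range("2000-01-01 00:00", periods=6, freq="30min", tz="UTC")}
)
fm_toy = pd.DataFrame(
    {"date": pd.to_datetime(
        ["2000-01-01 00:12", "2000-01-01 01:50", "2000-01-01 05:01"], utc=True
    )}
)

# Round to the nearest half hour and record how far each timestamp moved (minutes)
fm_toy["rounded"] = fm_toy["date"].dt.round("30min")
fm_toy["shift_min"] = (fm_toy["rounded"] - fm_toy["date"]).dt.total_seconds() / 60

# True where the rounded time has a matching weather row
fm_toy["matched"] = fm_toy["rounded"].isin(weather_toy["date"])

print(fm_toy)
print(f"Fraction matched: {fm_toy['matched'].mean():.2f}")
```

The third observation rounds cleanly but falls outside the weather record, so it stays unmatched. A fraction below 1 in the real check would point to gaps of this kind.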

## Build samples

Given a set of sparsely observed FM values, get the weather data at each observation time and over a lookback period before it.

In [None]:
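def build_samples_fixed(X, y, response_col="fm10", lookback=24, features_list=("Ed", "Ew", "rain")):
    """
    Build fixed-length samples: one lookback window of weather per FM observation.

    Inputs:
        - X: dataframe of weather features with date column
        - y: dataframe of FM observations with date column
        - response_col: str, name of the response column in y
        - lookback: int, number of hours to get weather data back from target time
        - features_list: names of the columns of X to use as features
    Returns:
        - XX: array of shape (n_samples, n_timesteps, n_features)
        - yy: array of shape (n_samples, 1, 1)
        - times: array of target times, shape (n_samples,)
    Note:
        Every window must contain the same number of rows, otherwise np.stack raises a ValueError.
        TODO: If incomplete samples based on lookback, fill with NA
    """
    features_list = list(features_list)  # tuple default avoids a shared mutable default argument
    end = y.date
    start = end - pd.Timedelta(hours=lookback)

    Xs = []
    for t0, t1 in zip(start, end):
        # Half-open window (t0, t1]: excludes the start time, includes the target time
        Xi = X[(X.date > t0) & (X.date <= t1)][features_list].to_numpy()
        Xs.append(Xi)

    XX = np.stack(Xs)
    yy = y[response_col].to_numpy().reshape(-1, 1, 1)
    times = y.date.to_numpy()
    return XX, yy, times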
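
The TODO about incomplete windows could be handled in several ways. One possible sketch reindexes each window onto a regular time grid so that missing rows become NaN. The function name `build_samples_padded`, the `freq` parameter, and the toy data are all hypothetical, and the approach assumes the weather timestamps are unique and fall on a regular grid.

```python
import numpy as np
import pandas as pd

def build_samples_padded(X, y, response_col, lookback=24, freq="30min",
                         features_list=("Ed", "Ew", "rain")):
    """Hypothetical variant: pad incomplete lookback windows with NaN."""
    # Number of time steps in one window, assuming a regular weather grid
    n_steps = int(pd.Timedelta(hours=lookback) / pd.Timedelta(freq))
    Xi = X.set_index("date")[list(features_list)]
    Xs = []
    for t1 in y["date"]:
        # Expected timestamps in (t1 - lookback, t1]; absent rows become NaN
        grid = pd.date_range(end=t1, periods=n_steps, freq=freq)
        Xs.append(Xi.reindex(grid).to_numpy(dtype=float))
    XX = np.stack(Xs)
    yy = y[response_col].to_numpy().reshape(-1, 1, 1)
    return XX, yy, y["date"].to_numpy()

# Toy data: 8 half-hourly weather rows, 2 FM observations (values invented)
dates = pd.date_range("2000-01-01", periods=8, freq="30min")
weather_toy = pd.DataFrame(
    {"date": dates, "Ed": np.arange(8.0), "Ew": np.arange(8.0), "rain": np.zeros(8)}
)
fm_toy = pd.DataFrame({"date": [dates[1], dates[7]], "fm100": [12.0, 15.0]})

XX, yy, times = build_samples_padded(weather_toy, fm_toy, response_col="fm100", lookback=2)
print(XX.shape, yy.shape, times.shape)
print(np.isnan(XX[0]).sum(), "missing values in the first window")
```

The first toy observation sits only 30 minutes into the weather record, so half of its two-hour window is padded. Downstream models would then need masking or imputation for those NaN entries.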

In [None]:
XX, yy, times = build_samples_fixed(weather, fm, response_col="fm100")

In [None]:
print(f"{XX.shape=}")
print(f"{yy.shape=}")
print(f"{times.shape=}")