# FarmVibes.AI Micro Climate Prediction

This notebook demonstrates how to train a model to forecast weather. It is configured to train models and run inference for temperature and wind speed.


### Conda environment setup
Before running this notebook, let's build a conda environment. If you do not have conda installed, please follow the instructions in the [Conda User Guide](https://docs.conda.io/projects/conda/en/latest/user-guide/index.html).

```
$ conda env create -f ./deepmc_env.yaml
$ conda activate deepmc
```

### Notebook outline
The notebook is configured to train a model and run inference for weather parameters such as temperature and wind speed. To execute it, users must provide input data downloaded from a weather station. The required weather features are datetime, humidity, wind speed, and temperature. Model training requires at least 2 years of historical input data; inference requires 552 historical data points.
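As a rough illustration of these input requirements, the sketch below checks a DataFrame for the required feature columns and a minimum number of rows before it is handed to the model. The helper name `check_weather_input` is hypothetical and not part of `notebook_lib`; the column names follow the constants used later in this notebook, and the datetime column is assumed to be handled by the CSV loader.

```python
import pandas as pd

# Feature columns, named as in the constants defined later in this notebook.
REQUIRED_FEATURES = ["humidity", "wind_speed", "temperature"]
MIN_INFERENCE_POINTS = 552  # figure quoted above for inference


def check_weather_input(df, min_points):
    """Hypothetical helper: return a list of problems (empty if the data looks usable)."""
    problems = []
    missing = [c for c in REQUIRED_FEATURES if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if len(df) < min_points:
        problems.append(f"only {len(df)} rows, need at least {min_points}")
    return problems


# Tiny synthetic frame, purely for illustration (wind_speed is left out on purpose).
sample = pd.DataFrame({
    "date": pd.date_range("2022-06-01", periods=600, freq="60min"),
    "humidity": 50.0,
    "temperature": 20.0,
})
print(check_weather_input(sample, MIN_INFERENCE_POINTS))
# ["missing columns: ['wind_speed']"]
```

Running a check like this before training or inference surfaces missing features early instead of failing deep inside the preprocessing code.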


Below are the main libraries used for this example and other useful links:
- [TensorFlow](https://github.com/tensorflow/tensorflow) is used as our deep learning framework.
- [Scikit-Learn](https://github.com/scikit-learn/scikit-learn) is a Python package for machine learning built on top of SciPy. It provides simple and efficient tools for predictive data analysis.
- [pandas](https://github.com/pandas-dev/pandas) is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.
- [NumPy](https://github.com/numpy/numpy) is a Python package that provides a powerful N-dimensional array object, broadcasting functions, and useful linear algebra, Fourier transform, and random number capabilities.
- [pywt](https://github.com/PyWavelets/pywt) is a Python package for wavelet transforms, which are built on mathematical basis functions localized in both time and frequency.


### Code organization
The training script relies on the package in the directory `./notebook_lib`:

- `notebook_lib/preprocess.py` preprocesses and transforms the input data and bundles it into chunks that can be used to train the Micro Climate Prediction model. For more information on wavelets, check the [pywavelets documentation](https://pywavelets.readthedocs.io/en/latest/).
- `notebook_lib/models.py`, `notebook_lib/transformer_models_ts.py`, and `notebook_lib/post_models.py` contain the code that defines the model as a set of TensorFlow layers executed sequentially. If you want to change the design of the model or understand its TensorFlow layers, this is probably where you should look. Also check the [tensorflow documentation](https://www.tensorflow.org/learn).
- `notebook_lib/train.py` and `notebook_lib/prediction.py` contain the code for running, training, and evaluating the neural network: instantiating the network, the training steps, computing metrics, and so on. If you want to understand the model performance, the loss, and generally how the model is trained, this is probably where you should look. Also check the [tensorflow documentation](https://www.tensorflow.org/learn).
- `notebook_lib/utils.py` contains commonly used functions to read CSV files, scale and split data, etc.

### Imports & Constants

In [32]:
import importlib
import os
import pickle
import warnings
from datetime import datetime, timedelta
from enum import Enum

import pandas as pd
from matplotlib import pyplot as plt

from notebook_lib import prediction, preprocess, train, utils

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [33]:
# Features kept from the weather dataset and used for model training
MODEL_TRAIN_FEATURES = ['humidity', 'wind_speed', 'temperature']

# Features the trained models predict
MODEL_OUT_FEATURES = ['wind_speed', 'temperature']

# Column used as the index to align the historical data
INDEX = "date"

### AGWeatherNet
In this notebook, we use data downloaded from AGWeatherNet for the station "Palouse". The training data ranges from May 2020 to June 2022. For more information, check the [AGWeatherNet documentation](http://weather.wsu.edu/?p=92850&desktop).

In [34]:
# AGWeatherNet station
STATION_NAME = "Palouse"

### Data
The data downloaded from AGWeatherNet has a 15-minute frequency. The following preprocessing steps are applied to it:

1. The index variable is converted to datetime.
2. Missing values are filled by interpolating from neighboring values.
3. The data is grouped to a 60-minute frequency, since this notebook trains the model on 60-minute data.
4. The data is scaled using the scikit-learn `StandardScaler`. For more information, check the [scikit-learn documentation](https://github.com/scikit-learn/scikit-learn).
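The four steps above can be sketched with pandas alone. This is a minimal illustration on synthetic data, not the code in `notebook_lib/preprocess.py`: linear interpolation and mean aggregation are assumptions, and the manual z-score stands in for scikit-learn's `StandardScaler` (which also uses the population standard deviation).

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute readings with two gaps, standing in for a station export.
temperature = [20.0 + 0.25 * i for i in range(16)]
humidity = [60.0 - 0.5 * i for i in range(16)]
temperature[5] = np.nan
humidity[10] = np.nan
raw = pd.DataFrame({
    "date": pd.date_range("2022-06-01", periods=16, freq="15min").astype(str),
    "temperature": temperature,
    "humidity": humidity,
})

# 1. Convert the index variable to datetime.
df = raw.copy()
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# 2. Fill missing values from neighboring readings (linear interpolation assumed).
df = df.interpolate(method="linear", limit_direction="both")

# 3. Group to a 60-minute frequency (mean aggregation assumed).
hourly = df.resample("60min").mean()

# 4. Standardize each column to zero mean and unit variance.
scaled = (hourly - hourly.mean()) / hourly.std(ddof=0)

print(hourly)
print(scaled.round(3))
```

In the real workflow the scaler fitted on the training data would need to be reused at inference time, which is one reason to prefer a fitted `StandardScaler` object over the inline arithmetic shown here.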

In [35]:
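# Path to the training CSV for the station
file_path = f"./data/{STATION_NAME}/training.csv"

# "%s" is kept as a literal placeholder in root_path (presumably filled in
# with each output feature name by the training and inference code)
predict = "%s"
root_path = f"./data/model_{predict}/"
data_export_path = root_path + "train_data.pkl"

input_df = utils.get_csv_data(path=file_path)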

### Training

The notebook is configured to train the Micro Climate prediction model for a 24-hour forecast, using weather station data points at a 60-minute frequency. The inputs below vary with the number of prediction hours and the frequency of the weather station data points.

1. `chunk_size` - The chunk size, which depends on the frequency of the weather station data points. For 60-minute data, the minimum number of data points required is 528. For 15-minute data, it is 528*4 = 2112. This is the minimum number of data points that must be provided as input during inference.
2. `ts_lookahead` - Used during data preprocessing. It sets how many weather data points ahead of a given time period are considered when grouping the data.
3. `ts_lookback` - Used during data preprocessing. It sets how many weather data points back from a given time period are considered when grouping the data.
4. `total_models` - A 24-hour prediction from weather data with a 60-minute frequency requires 24 models, one for each 60 minutes. If the number of prediction hours is increased, the total number of data points increases as well.
5. `wavelet` - Name of the wavelet object used to perform the discrete transformation of the data. The notebook is configured to use `bior3.5`. For more information, check the [Discrete Wavelet Transform documentation](https://pywavelets.readthedocs.io/en/latest/ref/dwt-discrete-wavelet-transform.html).
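The arithmetic behind `chunk_size` and `total_models` can be written out as a short sketch. The function names are hypothetical and not part of `notebook_lib`; the only figures taken from the text are 528 points at a 60-minute frequency and one model per 60-minute step. Extending the model count to other step sizes is an assumption.

```python
BASE_POINTS_AT_60_MIN = 528  # minimum points quoted above for 60-minute data


def min_inference_points(frequency_minutes):
    """Minimum number of readings needed at a given sampling frequency."""
    if frequency_minutes <= 0 or 60 % frequency_minutes != 0:
        raise ValueError("frequency must be a positive divisor of 60 minutes")
    return BASE_POINTS_AT_60_MIN * (60 // frequency_minutes)


def models_for_horizon(hours, step_minutes=60):
    """One model per forecast step (assumed to generalize beyond 60-minute steps)."""
    return hours * 60 // step_minutes


print(min_inference_points(60))  # 528
print(min_inference_points(15))  # 2112
print(models_for_horizon(24))    # 24
```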

### Model Types
The training process creates two different types of models:
1. `Micro climate prediction model` - Used to predict the weather forecast.
2. `Micro climate post-prediction model` - Scales the predicted weather forecast values using the training input data and reduces the error in the prediction output.

In [None]:
train_weather = train.ModelTrainWeather(
    train_features=MODEL_TRAIN_FEATURES,
    out_features=MODEL_OUT_FEATURES,
    root_path=root_path,
    data_export_path=data_export_path,
    station_name=STATION_NAME)

train_weather.train_model(input_df)

### Predicting Weather forecast
The notebook is configured to run inference with the Micro Climate prediction model for a 24-hour forecast, using weather station data points at a 60-minute frequency.

In [None]:
file_path = f"./data/{STATION_NAME}/prediction.csv"

input_df = utils.get_csv_data(path=file_path)

base_data_df = input_df[MODEL_TRAIN_FEATURES]

In [None]:
# The forecast starts from the last timestamp in the historical data
forecast_start_datetime = base_data_df.index[-1]

df_output_merge = pd.DataFrame(columns=base_data_df.columns)

weather_forecast = prediction.InferenceWeather(
    root_path=root_path,
    data_export_path=data_export_path,
    station_name=STATION_NAME,
    predicts=MODEL_OUT_FEATURES)

df_out = weather_forecast.inference(
    base_data_df,
    start_datetime=forecast_start_datetime)

In [None]:
for predict in MODEL_OUT_FEATURES:
    # without using the scaler
    plt.figure(figsize=(20, 5))
    plt.plot(df_out["date"].values, df_out[predict].values)
    # Only the prediction is plotted, so label the figure per feature
    plt.title(f"24 Models {predict} Prediction")
    plt.legend(["Predict"])
    plt.show()