<a href="https://colab.research.google.com/github/realnus/scikit_learn/blob/main/FF_Forecasting_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic Traffic Forecasting Tutorial
In this tutorial we use the Flow Forecast library to perform some basic traffic flow forecasting. In other notebooks we will go over how to use saved models and more complex parameter configurations.

Flow Forecast is a general-purpose deep learning package for time series forecasting, written in PyTorch.

In [None]:
!git clone http://github.com/AIStream-Peelout/flow-forecast #-b remove_versions # You can use a custom branch
import os
os.chdir('flow-forecast')
!pip install -r requirements.txt
!python setup.py install develop
from flood_forecast.trainer import train_function

Cloning into 'flow-forecast'...
remote: Enumerating objects: 18558, done.
remote: Counting objects: 100% (971/971), done.
remote: Compressing objects: 100% (353/353), done.
remote: Total 18558 (delta 628), reused 826 (delta 563), pack-reused 17587
Receiving objects: 100% (18558/18558), 7.10 MiB | 17.35 MiB/s, done.
Resolving deltas: 100% (13333/13333), done.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting shap==0.40.0
  Downloading shap-0.40.0-cp38-cp38-manylinux2010_x86_64.whl (571 kB)
Collecting tb-nightly
  Downloading tb_nightly-2.12.0a20230105-py3-none-any.whl (5.7 MB)
Collecting wandb==0.13.7
  Downloading wandb-0.13.7-py2.py3-none-any.whl (1.9 MB)

## Step One: Install and authenticate
In this first step we install the library and authenticate with Weights and Biases. Additionally, our code features built-in GCP integration.

In [None]:
#!pip install --upgrade --force-reinstall wandb
!wandb login
# To have your weights and JSON files stashed automatically, uncomment the lines below.
# os.environ["MODEL_BUCKET"] = "my-gcp-bucket-name"
# os.environ["ENVIRONMENT_GCP"] = "Colab"
# os.environ["GCP_PROJECT"] = "project_id"


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [None]:
!wget -O train.csv https://raw.githubusercontent.com/xiaochus/TrafficFlowPrediction/master/data/train.csv

--2023-01-05 20:33:11--  https://raw.githubusercontent.com/xiaochus/TrafficFlowPrediction/master/data/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 199681 (195K) [text/plain]
Saving to: ‘train.csv’


2023-01-05 20:33:11 (17.8 MB/s) - ‘train.csv’ saved [199681/199681]



In [None]:
# Basic feature engineering: derive the weekday from each timestamp.
import pandas as pd
import datetime
df = pd.read_csv("train.csv")
df["day_of_week"] = df["5 Minutes"].map(lambda x: datetime.datetime.strptime(x, '%d/%m/%Y %H:%M').weekday())
df["datetime"] = df['5 Minutes']
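
The format string in the cell above reads each timestamp day first. As a minimal sketch of why that matters, the snippet below parses one made-up timestamp with the day-first format and with a month-first format and compares the weekday each yields. The helper `weekday_of` and the sample value are illustrative assumptions and are not part of the dataset or of Flow Forecast.

```python
from datetime import datetime

DAY_FIRST = "%d/%m/%Y %H:%M"    # the format used in the cell above
MONTH_FIRST = "%m/%d/%Y %H:%M"  # shown only for contrast

def weekday_of(timestamp: str, fmt: str = DAY_FIRST) -> int:
    """Return the weekday index (Monday=0 ... Sunday=6) of a timestamp string."""
    return datetime.strptime(timestamp, fmt).weekday()

stamp = "04/01/2016 00:05"  # made-up sample value, not taken from train.csv

day_first_result = weekday_of(stamp)                 # read as 4 January 2016
month_first_result = weekday_of(stamp, MONTH_FIRST)  # read as 1 April 2016
print(day_first_result, month_first_result)
```

If the two numbers differ for an ambiguous date like this one, a wrong format string would silently corrupt the `day_of_week` feature rather than raise an error.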

In [None]:
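# Overwrite train.csv so the new day_of_week and datetime columns are on disk.
# By default to_csv also writes the DataFrame index as an extra unnamed column.
df.to_csv('train.csv')
# NOTE: an unexplained error was seen at this step; its cause is not recorded.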

## Step Two: Define the Configuration File
Now that we have everything installed and our data is ready, we need to define a configuration file. The configuration file is composed of three major required sub-parts: model_params, dataset_params, and inference_params. The model name and the model type are also required.

Flow Forecast uses configuration files because they enable reproducible results. With the JSON file you can easily see all the parameters that you pass to your model, and the configuration is logged to W&B and/or saved locally. This is a purposeful design choice: in many other libraries parameters become difficult to manage, which leads to irreproducible results.
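
To make the reproducibility point concrete, here is a minimal sketch of saving a configuration dictionary to a JSON file and reading it back unchanged. It uses only the Python standard library. The helpers `save_config` and `load_config` and the trimmed `example_config` are hypothetical names for illustration and are not Flow Forecast functions.

```python
import json
import os
import tempfile

def save_config(config: dict, path: str) -> None:
    """Write the config as indented, key-sorted JSON so diffs stay readable."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2, sort_keys=True)

def load_config(path: str) -> dict:
    """Read a config back from disk."""
    with open(path) as f:
        return json.load(f)

# A trimmed, illustrative config; the values mirror the single-value sweep used in this tutorial.
example_config = {
    "model_name": "MultiAttnHeadSimple",
    "model_type": "PyTorch",
    "training_params": {"criterion": "MSE", "optimizer": "Adam", "lr": 0.01, "epochs": 10},
}

path = os.path.join(tempfile.mkdtemp(), "example_config.json")
save_config(example_config, path)
restored = load_config(path)
print(restored == example_config)
```

Because the file round-trips exactly, the JSON saved alongside a run is enough to rebuild the same parameter set later.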

In [None]:
def make_config_file(file_path, train_end, valid_end):
  run = wandb.init(project="library_demos")
  wandb_config = wandb.config
  config_default={
    "model_name": "MultiAttnHeadSimple",
    "model_type": "PyTorch",
    "model_params": {
      "number_time_series":2,
      "seq_len":wandb_config["forecast_history"],
      "output_seq_len":wandb_config["out_seq_length"],
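      # "forecast_length" is intentionally not passed here. The run recorded in Step Four
      # failed with TypeError("__init__() got an unexpected keyword argument 'forecast_length'"),
      # most likely because model_params is forwarded to the model constructor.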
     },
    "dataset_params":
    {  "class": "default",
       "training_path": file_path,
       "validation_path": file_path,
       "test_path": file_path,
       "batch_size":wandb_config["batch_size"],
       "forecast_history":wandb_config["forecast_history"],
       "forecast_length":wandb_config["out_seq_length"],
       "train_end": train_end,
       "valid_start":int(train_end+1),
       "valid_end": int(valid_end),
       "test_start":int(valid_end) + 1,
       "target_col": ["Lane 1 Flow (Veh/5 Minutes)"],
       "relevant_cols": ["Lane 1 Flow (Veh/5 Minutes)", "day_of_week"],
       "scaler": "StandardScaler",
       "interpolate": False
    },
    "training_params":
    {
       "criterion":"MSE",
       "optimizer": "Adam",
       "optim_params":
       {

       },
       "lr": wandb_config["lr"],
       "epochs": 10,
       "batch_size":wandb_config["batch_size"]
    },
    "GCS": False,
    "sweep":True,
    "wandb":False,
    "forward_params":{},
   "metrics":["MSE"],
   "inference_params":
   {
         "datetime_start":"2016-02-24",
          "hours_to_forecast":150,
          "test_csv_path":file_path,
          "decoder_params":{
              "decoder_function": "simple_decode",
            "unsqueeze_dim": 1
          },
          "dataset_params":{
             "file_path": file_path,
             "forecast_history":wandb_config["forecast_history"],
             "forecast_length":wandb_config["out_seq_length"],
             "relevant_cols": ["Lane 1 Flow (Veh/5 Minutes)", "day_of_week"],
             "target_col": ["Lane 1 Flow (Veh/5 Minutes)"],
             "scaling": "StandardScaler",
             "interpolate_param": False
          }
      }
  }
  wandb.config.update(config_default)
  return config_default

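Here is a brief explanation of what is going on in this config file. `model_name` and `model_type` select the model class and the framework. `model_params` holds the arguments for the model itself, such as the number of input series and the input and output sequence lengths. `dataset_params` points to the CSV file, sets the row boundaries for the train, validation, and test splits, and names the target column, the input columns, and the scaler. `training_params` sets the loss, optimizer, learning rate, number of epochs, and batch size. `inference_params` controls the forecast produced after training: the start date, how far ahead to forecast, and the decoder settings. Values read from `wandb_config` are supplied by the sweep defined in the next step.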
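
The boundary fields in `dataset_params` (`train_end`, `valid_start`, `valid_end`, `test_start`) are derived from just two numbers. The sketch below shows one way to read them, assuming they are inclusive row positions in the CSV. The helper `split_ranges` and the row count of 7000 are made up for illustration, and the library's own slicing may treat the endpoints differently.

```python
def split_ranges(n_rows: int, train_end: int, valid_end: int) -> dict:
    """Return (first_row, last_row) pairs for each split, mirroring the config arithmetic."""
    if not 0 < train_end < valid_end < n_rows - 1:
        raise ValueError("expected 0 < train_end < valid_end < n_rows - 1")
    valid_start = train_end + 1  # same as "valid_start": int(train_end + 1)
    test_start = valid_end + 1   # same as "test_start": int(valid_end) + 1
    return {
        "train": (0, train_end),
        "valid": (valid_start, valid_end),
        "test": (test_start, n_rows - 1),
    }

# 4500 and 6000 are the boundaries used later in this tutorial; 7000 rows is a made-up total.
ranges = split_ranges(n_rows=7000, train_end=4500, valid_end=6000)
for name, (first, last) in ranges.items():
    print(f"{name}: rows {first} to {last}")
```

Keeping the splits contiguous and in time order means the validation and test rows always come after the training rows, which avoids leaking future values into training.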

## Step Three: Define Wandb Sweep Config
Now that we have our global configuration file, we define a second configuration that lists every parameter we want to sweep over and the values to try for each. You can find out more about Weights and Biases sweeps on their website.

In [None]:
"""
sweep_config = {
  "name": "Default sweep",
  "method": "random",
  "parameters": {
        "batch_size": {
            "values": [2, 3, 4]
        },
        "lr":{
            "values":[0.001, 0.01]
        },
        "forecast_history":{
            "values":[1, 2, 3, 5]
        },
        "out_seq_length":{
            "values":[1, 2, 3, 4]
        }
    }
}
"""

sweep_config = {
  "name": "Default sweep",
  "method": "random",
  "parameters": {
        "batch_size": {
            "values": [3]
        },
        "lr":{
            "values":[0.01]
        },
        "forecast_history":{
            "values":[5]
        },
        "out_seq_length":{
            "values":[4]
        }
    }
}
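
To get a feel for the size of the search space, the sketch below enumerates every combination in the larger, commented-out sweep configuration and then picks one at random. This is only a rough stand-in for what a `"random"` sweep does, written with the standard library. The helper `all_combinations` is a hypothetical name and is not part of wandb.

```python
import itertools
import random

# Parameter grid copied from the commented-out sweep configuration above.
parameters = {
    "batch_size": {"values": [2, 3, 4]},
    "lr": {"values": [0.001, 0.01]},
    "forecast_history": {"values": [1, 2, 3, 5]},
    "out_seq_length": {"values": [1, 2, 3, 4]},
}

def all_combinations(params: dict) -> list:
    """Return one dict per point in the Cartesian product of the listed values."""
    names = list(params)
    value_lists = [params[name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]

combos = all_combinations(parameters)
print(len(combos))  # number of distinct configurations in the grid

rng = random.Random(0)  # fixed seed so the illustration is repeatable
print(rng.choice(combos))
```

The single-value configuration used in this tutorial corresponds to exactly one point of this grid, which keeps the demo run short.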

## Step Four: Run code and log results
Now that we have both config files it is time to train our model and
log the results to Weights and Biases to analyze later.

In [None]:
import os
from flood_forecast.trainer import train_function
import wandb
# Register the sweep with W&B and keep its ID for the agent.
sweep_id = wandb.sweep(sweep_config)
os.environ["SWEEP_ID"] = sweep_id
#!wandb agent $SWEEP_ID
os.environ['WANDB_NOTEBOOK_NAME'] = 'FF_Forecasting_Tutorial.ipynb'
# Each agent run builds a fresh config from the sampled sweep values and trains on it.
wandb.agent(sweep_id, lambda: train_function("PyTorch", make_config_file("train.csv", 4500, 6000)))

Create sweep with ID: 6ikspohz
Sweep URL: https://wandb.ai/nusretarazstudent/uncategorized/sweeps/6ikspohz

wandb: Agent Starting Run: ivnamm9k with config:
wandb: 	batch_size: 3
wandb: 	forecast_history: 5
wandb: 	lr: 0.01
wandb: 	out_seq_length: 4

wandb: ERROR Run ivnamm9k errored: TypeError("__init__() got an unexpected keyword argument 'forecast_length'")
wandb: Agent Starting Run: 6avli185 with config:
wandb: 	batch_size: 3
wandb: 	forecast_history: 5
wandb: 	lr: 0.01
wandb: 	out_seq_length: 4

wandb: ERROR Run 6avli185 errored: TypeError("__init__() got an unexpected keyword argument 'forecast_length'")
wandb: Agent Starting Run: rpmdtxhv with config:
wandb: 	batch_size: 3
wandb: 	forecast_history: 5
wandb: 	lr: 0.01
wandb: 	out_seq_length: 4

wandb: ERROR Run rpmdtxhv errored: TypeError("__init__() got an unexpected keyword argument 'forecast_length'")
wandb: ERROR Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: To disable this check set WANDB_AGENT_DISABLE_FLAPPING=true

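
The `TypeError` in the log above is the kind of failure Python raises when a dictionary is unpacked into a constructor that does not accept one of its keys. As a minimal sketch of how to spot such a key before launching a sweep, the snippet below compares a parameter dict against a constructor signature with `inspect`. `ToyModel` is a hypothetical stand-in whose signature is invented for illustration. It is not the Flow Forecast model class, and `unexpected_keys` is not a library function.

```python
import inspect

class ToyModel:
    """Hypothetical model class; its signature is an assumption, not the library's."""
    def __init__(self, number_time_series, seq_len=10, output_seq_len=1):
        self.number_time_series = number_time_series
        self.seq_len = seq_len
        self.output_seq_len = output_seq_len

def unexpected_keys(cls, params: dict) -> list:
    """List the keys in params that cls.__init__ would reject."""
    signature = inspect.signature(cls.__init__)
    # A constructor that takes **kwargs accepts any key, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in signature.parameters.values()):
        return []
    accepted = set(signature.parameters) - {"self"}
    return sorted(set(params) - accepted)

# Same keys as the model_params block that produced the recorded error.
model_params = {"number_time_series": 2, "seq_len": 5, "output_seq_len": 4, "forecast_length": 4}

extra = unexpected_keys(ToyModel, model_params)
print(extra)

# Drop the rejected keys and build the model with what remains.
clean_params = {k: v for k, v in model_params.items() if k not in extra}
model = ToyModel(**clean_params)
```

Running a check like this once against the real model class, before calling `wandb.agent`, would surface the offending key without spending three failed sweep runs on it.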