## Examining Solar Energy Data with Transformer AutoEncoder
In this notebook we will use [Flow-Forecast, a deep learning framework for time series forecasting and classification built in PyTorch](https://github.com/AIStream-Peelout/flow-forecast) to create embeddings of solar energy time series data. We will then use these embeddings to cluster different parts of the solar energy data.

In [None]:
import os
!git clone https://github.com/AIStream-Peelout/flow-forecast.git
os.chdir('flow-forecast')
!pip install -r  requirements.txt
!python setup.py develop

In [None]:
the_config = {                 
    "model_name": "CustomTransformerDecoder",
    "model_type": "PyTorch",
    "model_params": {
      "n_time_series":4,
      "d_model":32,
      "seq_length":5,
      "output_seq_length": 5, 
      "n_layers_encoder": 6,
      "output_dim":4,
      "squashed_embedding": True
     }, 
    "n_targets":4,
    "dataset_params":
    {  "class": "AutoEncoder",
       "training_path": "example.csv",
       "validation_path": "example.csv",
       "test_path": "example.csv",
       "forecast_history":5,
       "forecast_length":5,
       "train_end": 100,
       "valid_start":101,
       "valid_end": 201,
       "test_start": 202,
       "test_end": 290,
       "target_col": ["DC_POWER", "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION"],
       "relevant_cols": ["DC_POWER", "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION"],
       "no_scale": True,
       "scaler": "StandardScaler",
       "interpolate": False
    },
    "training_params":
    {
       "criterion":"MAPE",
       "optimizer": "Adam",
       "optim_params":
       {

       },
       "lr": 0.3,
       "epochs": 1,
       "batch_size":4
    
    },
    "GCS": False,
    
    "wandb": {
       "name": "flood_forecast_circleci",
       "project": "repo-flood_forecast",
       "tags": ["dummy_run", "circleci"]
    },
   "metrics":["MSE"],
   "inference_params":
   {     
         "datetime_start":"2016-05-31",
          "hours_to_forecast":5, 
          "test_csv_path":"tests/test_data/keag_small.csv",
          "decoder_params":{
            "decoder_function": "simple_decode", 
            "unsqueeze_dim": 1},
         
   }
       
   
}
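Several values in this configuration appear to depend on one another: the model is told how many series and time steps to expect, and the dataset section says which columns and how much history to supply. The block below is a minimal sketch of a consistency check. `check_config` is a hypothetical helper, not part of Flow Forecast, and the rules it applies are assumptions made for illustration rather than checks the framework is known to perform.

```python
def check_config(config):
    """Return a list of human-readable problems found in a config dict.

    The rules are illustrative assumptions, not Flow Forecast behaviour.
    """
    problems = []
    model_params = config["model_params"]
    data_params = config["dataset_params"]

    # Assumed rule: one input series per relevant column.
    n_cols = len(data_params["relevant_cols"])
    if model_params["n_time_series"] != n_cols:
        problems.append(
            f"n_time_series={model_params['n_time_series']} but {n_cols} relevant_cols"
        )

    # Assumed rule: the model's sequence length matches the history window.
    if model_params["seq_length"] != data_params["forecast_history"]:
        problems.append("seq_length differs from forecast_history")

    # Assumed rule: train, validation and test row ranges are ordered and disjoint.
    bounds = [
        data_params["train_end"],
        data_params["valid_start"],
        data_params["valid_end"],
        data_params["test_start"],
        data_params["test_end"],
    ]
    if bounds != sorted(bounds) or len(set(bounds)) != len(bounds):
        problems.append("train/valid/test row ranges overlap or are out of order")
    return problems


# A trimmed-down copy of the relevant keys, using the values from the config above.
example_config = {
    "model_params": {"n_time_series": 4, "seq_length": 5},
    "dataset_params": {
        "relevant_cols": ["DC_POWER", "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION"],
        "forecast_history": 5,
        "train_end": 100,
        "valid_start": 101,
        "valid_end": 201,
        "test_start": 202,
        "test_end": 290,
    },
}
print(check_config(example_config))
```

An empty list means none of the assumed rules were violated. Running a check like this before `train_function` can catch a mismatched column count early.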

In [None]:
import pandas as pd
solar_data = pd.read_csv("../../input/solar-power-generation-data/Plant_1_Generation_Data.csv")
weather_data = pd.read_csv("../../input/solar-power-generation-data/Plant_1_Weather_Sensor_Data.csv")
weather_data["DATE_TIME"] = pd.to_datetime(weather_data["DATE_TIME"])
solar_data["DATE_TIME"] = pd.to_datetime(solar_data["DATE_TIME"])
mrged_df = solar_data.merge(weather_data, left_on="DATE_TIME", right_on="DATE_TIME", how="left")



In [None]:
# Save the merged frame built in the previous cell instead of merging a second time
mrged_df.to_csv("merged_file.csv")

dropped = mrged_df[mrged_df["SOURCE_KEY_x"]=="1BY6WEcLGh8j5v7"].dropna()
dropped["datetime"] = dropped["DATE_TIME"]
dropped.to_csv("example.csv")
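The filter on `SOURCE_KEY_x` works because both input files carry a `SOURCE_KEY` column, so pandas adds `_x` and `_y` suffixes when merging. The sketch below reproduces that behaviour on two tiny, made-up frames. The values and key names are invented for illustration and are not taken from the Kaggle dataset.

```python
import pandas as pd

# Toy stand-ins for the generation and weather files (invented values).
generation = pd.DataFrame({
    "DATE_TIME": pd.to_datetime(["2020-05-15 00:00", "2020-05-15 00:15"]),
    "SOURCE_KEY": ["inverter_a", "inverter_a"],
    "DC_POWER": [0.0, 12.5],
})
weather = pd.DataFrame({
    "DATE_TIME": pd.to_datetime(["2020-05-15 00:00", "2020-05-15 00:15"]),
    "SOURCE_KEY": ["sensor_1", "sensor_1"],
    "IRRADIATION": [0.0, 0.1],
})

# A left merge keeps every generation row; the clashing SOURCE_KEY columns
# are renamed SOURCE_KEY_x (left frame) and SOURCE_KEY_y (right frame).
toy_merged = generation.merge(weather, on="DATE_TIME", how="left")
merged_columns = list(toy_merged.columns)
print(merged_columns)
```

In the merged frame, `SOURCE_KEY_x` therefore identifies the inverter from the generation file, while `SOURCE_KEY_y` identifies the weather sensor.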



In [None]:
from flood_forecast.meta_train import train_function 
import wandb
from kaggle_secrets import UserSecretsClient
import os
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("WANDB_KEY")
os.environ["WANDB_API_KEY"] = secret_value_0
model = train_function("PyTorch", the_config)

## Examining Solar Data Embeddings

Now that we have trained an autoencoder to produce temporal embeddings, we will examine them and get a preliminary sense of how effective they are for predicting future power generation. To do this we will use the inference mode in Flow Forecast to create useful representations. First, let's go back and look at some of the periods with the highest energy generation.

In [None]:
dropped = dropped.reset_index()
# Sort from highest to lowest DC power; `ascending` must be passed by keyword
dropped.sort_values("DC_POWER", ascending=False)

Based on this w

In [None]:
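import torch

def make_embedding(model, df, row_idx):
    """
    Generate an embedding from a trained temporal model for one window of data.

    The window covers `forecast_history` consecutive rows starting at `row_idx`.
    """
    relevant_cols = model.params["dataset_params"]["relevant_cols"]
    f_history = model.params["dataset_params"]["forecast_history"]
    window = df.iloc[row_idx:row_idx + f_history][relevant_cols]
    print(window)
    # Scale the window with the scaler held by the training data loader
    n_vals = model.training.scale.transform(window)
    # Convert to a float32 tensor, then add the batch dimension
    # (NumPy arrays have no unsqueeze method)
    embed = model.model.make_embedding(torch.from_numpy(n_vals).float().unsqueeze(0))
    return embed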
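The slicing in `make_embedding` is meant to take `forecast_history` consecutive rows starting at `row_idx` and pass them to the model as a batch of one. The sketch below shows the same windowing on a plain NumPy array so the resulting shape is easy to inspect. `window_to_batch` is a hypothetical helper and the toy array is invented; the `(batch, time steps, features)` layout is an assumption about what the model expects.

```python
import numpy as np


def window_to_batch(values, row_idx, history):
    """Slice `history` consecutive rows starting at `row_idx` and add a batch axis."""
    window = values[row_idx:row_idx + history]
    if window.shape[0] != history:
        raise ValueError("not enough rows left for a full window")
    # float32 is the usual dtype for PyTorch model inputs
    return np.expand_dims(window.astype(np.float32), axis=0)


# 10 rows of 4 features, filled with 0..39 so each value reveals its position.
toy_values = np.arange(40, dtype=np.float64).reshape(10, 4)
batch = window_to_batch(toy_values, row_idx=3, history=5)
print(batch.shape)  # (1, 5, 4)
```

The explicit length check matters near the end of the data, where a slice would otherwise silently return fewer rows than the model was trained on.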

In [None]:
from flood_forecast.transformer_xl.transformer_basic import *
make_embedding(model, dropped, 653)
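The introduction mentions clustering parts of the solar data by their embeddings. The block below is a minimal sketch of that step, assuming each window's embedding has been flattened into one row of a 2-D array. It uses plain Lloyd's k-means with farthest-first initialisation, written in NumPy to stay self-contained; this is a substitute chosen for illustration, not the method used by the notebook's author. The random vectors stand in for real embeddings, and `farthest_first_centers` and `kmeans` are hypothetical helpers.

```python
import numpy as np


def farthest_first_centers(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(k - 1):
        diffs = points[:, None, :] - np.array(centers)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centers.append(points[nearest.argmax()])
    return np.array(centers, dtype=float)


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    diffs = points[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diffs, axis=2).argmin(axis=1)


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm: alternate between assignment and center updates."""
    centers = farthest_first_centers(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return assign(points, centers), centers


# Two well-separated blobs of 8-dimensional vectors as stand-in embeddings.
rng = np.random.default_rng(0)
toy_embeddings = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(5.0, 0.1, size=(20, 8)),
])
labels, centers = kmeans(toy_embeddings, k=2)
print(labels)
```

With real embeddings, each row would come from one call to `make_embedding`, detached and flattened to a vector. A library implementation such as scikit-learn's `KMeans` would be the more usual choice outside a sketch.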

## Using embeddings in basic model