# Testing ABM timeseries scaling

This notebook takes the raw EU heating/cooling demand profiles (`ideal_demands_XXXX.csv`)
and process COP profiles (`process_cops_XXXX.csv`) from the
[Mopo AmBIENCe2ABM Demo](https://zenodo.org/records/10518294)
and experiments with them to determine how they should be handled.

The outline of this notebook is as follows:
1. Julia environment setup to install the necessary dependencies.
2. First attempt at normalising the demand data: normalising by a single year doesn't work.
3. Second attempt at normalising the demand data using Eurostat HDD and CDD data: the data doesn't cover all the countries in scope.
4. Brief inspection of the COP data.

Open question: should the COP timeseries be scaled?


## Julia environment setup

In [None]:
using Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()

using CSV
using DataFrames
using Dates
using Statistics

## Read, inspect, and reorganize the raw data (ATTEMPT 1)

First, we'll need to define paths to the files we want to read.
If the setup is done according to the instructions in `README.md`,
then the following paths should work:

In [None]:
## Config for reading and normalising data.

norm_year = 2012 # Year used to normalise the results.
years = [1995, 2008, 2009, 2012, 2015]
dem_paths = [
    year => "input-data/abm-raw-data/ideal_demands_$(year).csv"
    for year in years
]
cop_paths = [
    year => "input-data/abm-raw-data/process_cops_$(year).csv"
    for year in years
]
hm_path = "input-data/scen_current_building_demand/data/scen_current_building_demand.csv";
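
Before reading anything, it can help to fail early if some of the input files are missing. A minimal sketch, assuming the relative paths defined in the config cell above; the helper `missing_inputs` is hypothetical and not part of the original notebook:

```julia
# Hypothetical helper (not part of the original notebook): return the
# `year => path` pairs whose files cannot be found on disk.
function missing_inputs(paths::AbstractVector{<:Pair})
    return [p for p in paths if !isfile(last(p))]
end

# Same demand paths as in the config cell above.
years = [1995, 2008, 2009, 2012, 2015]
dem_paths = [year => "input-data/abm-raw-data/ideal_demands_$(year).csv" for year in years]

missing_files = missing_inputs(dem_paths)
isempty(missing_files) || @warn "Some input files are missing" missing_files
```

The same check can be run on `cop_paths`.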

In [None]:
## Read and organise demands data.

cols = [:timestamp, :year, :country, :category, :demand, :value]
dem_data = DataFrame()
for (year, path) in dem_paths
    df = stack(DataFrame(CSV.File(path; header=[1,2])))
    s = split.(df[!, :variable], '_')
    df[!,:country] = string.(getindex.(s,1))
    df[!,:category] = string.(getindex.(s,2))
    df[!,:demand] = string.(getindex.(s,6))
    df[!,:year] .= year
    rename!(
        df,
        :building_archetype_building_process => :timestamp,
    )
    append!(dem_data, df[!, cols])
end
dem_data

In [None]:
## Calculate sums over timestamp for the normalisation year.

dem_sums = combine(
    groupby(
        filter(r -> r.year == norm_year, dem_data),
        cols[2:end-1] # Skip timestamp and value.
    ),
    :value => sum
)

In [None]:
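## Normalise dem_data using the normalisation year.

# Look up each group's sum by its key instead of relying on group order,
# as `groupby` does not guarantee the same group order across calls.
sum_lookup = Dict(
    (r.country, r.category, r.demand) => r.value_sum for r in eachrow(dem_sums)
)
dem_normalised = deepcopy(dem_data)
for df in groupby(dem_normalised, cols[3:end-1]) # skip timestamp, year, and value
    df.value ./= sum_lookup[(df.country[1], df.category[1], df.demand[1])]
end
dem_normalised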

In [None]:
## Inspect the results of the normalisation

dem_normalised_sums = combine(
    groupby(dem_normalised, cols[2:end-1]), # skip timestamp and value
    :value => sum
)
dem_normalised_sums = unstack(dem_normalised_sums, :demand, :value_sum)
describe(dem_normalised_sums)

**Well that's bad.**

DHW and heating behave more or less as I'd have expected,
but cooling demand varies quite a bit depending on the year.
This seems to be the case in reality as well: based on Eurostat HDD
and CDD data, CDDs can range from ~0 to ~33x the mean.
Consequently, it is critical that our normalisation coefficients
match the underlying scenario data as closely as possible.
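
Why a single normalisation year is fragile: dividing every year's profile by the normalisation year's annual total leaves each year's normalised sum equal to the ratio of its own annual total to that of the normalisation year. A toy sketch with made-up annual cooling totals (illustrative numbers, not taken from the data):

```julia
# Made-up annual cooling demand totals for a single country/category/demand
# combination (illustrative only, not from the AmBIENCe2ABM data).
annual_totals = Dict(1995 => 40.0, 2008 => 120.0, 2012 => 10.0, 2015 => 300.0)
norm_year = 2012

# After normalising by the normalisation year's total, each year's
# normalised sum is simply its total divided by that reference total.
normalised_sums = Dict(y => s / annual_totals[norm_year] for (y, s) in annual_totals)

# A mild normalisation year inflates every other year: 2015 ends up at 30x.
normalised_sums[2015]  # 30.0
```

With a hot normalisation year the effect flips and the other years shrink towards zero, which is why the choice of year matters so much for cooling.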


### Solution?

Perhaps we need to calculate the yearly normalisation coefficients based on historical heating and cooling degree day averages, as seems to have been done in Hotmaps?

>the HDD and CDD on the NUTS3 level are calculated based on the average HDD (18.5/18.5) and CDD (22.5/22.5) calculated from the observed daily temperatures on a 25 x 25 km grid for the period 2002-2012 (see (Haylock, M.R. et al., 2011)).

This should give us ratios that we can use to scale the profiles according to the climate year.
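
A minimal sketch of how such ratios could be applied, assuming each profile is first normalised to sum to one and then rescaled to a baseline annual demand (for instance an average-year value such as Hotmaps provides) multiplied by the ratio of the climate year's degree days to the 2002-2012 average. The helper `scale_profile` and its `baseline_annual` parameter are hypothetical, not part of the original workflow:

```julia
"""
    scale_profile(profile, dd_year, dd_mean, baseline_annual)

Hypothetical helper: rescale `profile` so that its sum equals
`baseline_annual * dd_year / dd_mean`, i.e. an average-year annual demand
adjusted by the climate year's degree-day ratio.
"""
function scale_profile(profile::AbstractVector{<:Real}, dd_year::Real,
                       dd_mean::Real, baseline_annual::Real)
    # With zero average degree days (e.g. no CDDs at all in the reference
    # period) the ratio is undefined, so fall back to a ratio of 1.
    ratio = iszero(dd_mean) ? 1.0 : dd_year / dd_mean
    # Normalise the profile to unit sum, then rescale to the target total.
    return profile ./ sum(profile) .* (baseline_annual * ratio)
end

# Toy numbers (made up): a 4-step profile in a year with 10% more HDDs
# than the multi-year average, and a baseline annual demand of 100.
scaled = scale_profile([1.0, 2.0, 3.0, 4.0], 3300.0, 3000.0, 100.0)
sum(scaled)  # ≈ 110
```

Whether a ratio of 1 is the right fallback for zero-mean degree days is itself a modelling choice.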


## Read, inspect, and reorganize the raw data (ATTEMPT 2)

So it seems we need to be more careful about normalising our demand time series if we want them to vary reasonably with the weather year.
Instead, we'll use historical HDD and CDD data from Eurostat for the normalisation.


In [None]:
## New settings for normalising based on HDD.

norm_years = 2002:2012 # Hotmaps D5.2 year range for HDD and CDD calculations.
hdd_path = "input-data/eurostat/estat_nrg_chdd_a.tsv"; # Heating and cooling degree days for scaling weather years

In [None]:
## Read and reformat HDD and CDD data

hdd_data = DataFrame(CSV.File(hdd_path))
s = split.(hdd_data[!,1], ',')
hdd_data[!, :country] = string.(getindex.(s, 4))
hdd_data[!, :variable] = string.(getindex.(s, 3))
cols = [:country, :variable, :year, :value]
hdd_data = stack(hdd_data; variable_name=:year)[!, cols]
hdd_data[!, :year] = parse.(Int64, hdd_data[!, :year])
filter!(r -> r.year in norm_years, hdd_data)


In [None]:
## Calculate 2002-2012 averages per country

hdd_means = combine(
    groupby(
        hdd_data,
        cols[1:2] # Group by country and variable
    ),
    :value => mean
)

In [None]:
## Calculate HDD and CDD scaling factors per country per year.

hdd_scaling = deepcopy(hdd_data)
for (i, df) in enumerate(groupby(hdd_scaling, cols[1:2]))
    if hdd_means.value_mean[i] ≈ 0 # This is required to avoid issues with Irish CDDs.
        df.value .= 1.0
    else
        df.value ./= hdd_means.value_mean[i]
    end
end
hdd_scaling

In [None]:
## Inspect the scaling factors

describe(
    unstack(
        hdd_scaling,
        :variable,
        :value
    )
)

**This is more manageable.**

Heating demand seems to vary between roughly -35% and +37% from the 2002-2012 averages,
while cooling demand still varies considerably, between -100% and +7500%.
Ireland still causes some problems, as it doesn't have ANY cooling degree days during this period.
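
The scaling factors are ratios to the 2002-2012 mean, so a factor f corresponds to a deviation of (f - 1) x 100% from the average year: -100% means no degree days at all (f = 0), and +7500% means 76 times the average. A quick sketch of that conversion, using made-up factors chosen only to reproduce the ranges quoted above:

```julia
# Percentage deviation from the 2002-2012 mean for a scaling factor
# f = (degree days in year) / (mean degree days over 2002-2012).
pct_deviation(f::Real) = (f - 1) * 100

# Made-up factors for illustration only, not taken from the Eurostat data.
factors = Dict("HDD" => [0.65, 1.0, 1.37], "CDD" => [0.0, 1.0, 76.0])

# Minimum and maximum percentage deviation per degree-day variable.
deviation_ranges = Dict(var => extrema(pct_deviation.(fs)) for (var, fs) in factors)
deviation_ranges["CDD"]  # (-100.0, 7500.0)
```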

Let's look at things on country-level next.

In [None]:
## Inspect country HDD scaling ranges

describe(
    filter(
        r -> r.variable == "HDD",
        unstack(
            hdd_scaling,
            :country,
            :value
        )
    )
)

In [None]:
## Inspect country CDD scaling ranges

describe(
    filter(
        r -> r.variable == "CDD",
        unstack(
            hdd_scaling,
            :country,
            :value
        )
    )
)

### Conclusion

In order to preserve weather variability and Hotmaps compatibility,
the heating and cooling demands need to be scaled based on something like average HDDs and CDDs
instead of any particular weather year.

Unfortunately, the Eurostat HDD and CDD data doesn't cover all the desired countries (see the check below),
and thus cannot be used for scaling the demands for Mopo WP5.
Instead, the second-best option is probably to normalise each demand time series by its overall mean across all the available data.

In [None]:
setdiff(Set(dem_data.country), Set(hdd_data.country))
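
The fallback suggested in the conclusion, normalising by the overall mean across all available years, could look roughly like the sketch below. It is a hypothetical helper operating on one country/category/demand combination at a time, with each weather year's timeseries stored as a plain vector. Every year is divided by the same pooled mean, so the reference is an average over all available years rather than a single, possibly atypical, year:

```julia
using Statistics

"""
    normalise_by_overall_mean(series_by_year)

Hypothetical helper: divide every weather year's timeseries by the mean
value pooled over *all* available years, so that the pooled data has a
mean of 1 while relative differences between weather years are kept.
"""
function normalise_by_overall_mean(series_by_year::AbstractDict{Int,<:AbstractVector{<:Real}})
    overall_mean = mean(vcat(values(series_by_year)...))
    # An all-zero series (e.g. no cooling at all) cannot be normalised.
    iszero(overall_mean) && error("Cannot normalise timeseries with a zero overall mean.")
    return Dict(y => s ./ overall_mean for (y, s) in series_by_year)
end

# Toy data (made up): a mild year and a hotter year for one
# country/category/demand combination.
toy = Dict(2012 => [1.0, 1.0], 2015 => [3.0, 3.0])
normalised = normalise_by_overall_mean(toy)
# normalised[2012] == [0.5, 0.5], normalised[2015] == [1.5, 1.5]
```

Raising an error on an all-zero series is a deliberate choice here; the degree-day cells above instead substitute a factor of 1.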

## Brief inspection of COP data

Let's also quickly check the process COP data.

In [None]:
## Read and organize COP data

cols = [:timestamp, :year, :country, :category, :process, :value]
cop_data = DataFrame()
for (year, path) in cop_paths
    df = stack(DataFrame(CSV.File(path; header=[1,2])))
    s = split.(df[!, :variable], '_')
    df[!, :country] = string.(getindex.(s, 1))
    df[!, :category] = string.(getindex.(s, 2))
    df[!, :process] = string.(getindex.(s, 5)) .* '_' .* string.(get.(s, 6, "air"))
    df[!, :year] .= year
    rename!(
        df,
        :building_archetype_building_process => :timestamp
    )
    append!(cop_data, df[!, cols])
end
cop_data

In [None]:
## Inspect process data

describe(
    unstack(cop_data, :process, :value)
)

**Honestly, these ranges seem surprisingly reasonable.**
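
If these COP timeseries are used as-is, a cheap guard when re-running this check on other data is to flag anything non-finite or outside a plausible range. A minimal sketch; the bounds below are arbitrary assumptions, not values from the source data, and would need tuning per process:

```julia
# Arbitrary plausibility bounds for process COPs (an assumption; tune per process).
const COP_BOUNDS = (0.5, 10.0)

# Return the indices of COP values that are non-finite or outside `bounds` (inclusive).
function implausible_cops(cops::AbstractVector{<:Real}; bounds=COP_BOUNDS)
    lo, hi = bounds
    return findall(v -> !isfinite(v) || !(lo <= v <= hi), cops)
end

implausible_cops([2.5, 3.1, 0.4, 12.0, NaN])  # [3, 4, 5]
```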