# data_testscript.ipynb

A Jupyter notebook for testing and development of the AmBIENCe2ABM module.
First, let's simply import the module.

In [None]:
import ambience2abm as amb

## Read the raw data and assumptions

The raw data and assumptions are handled using the `AmBIENCeDataset` object,
with the constructor requiring values for the assumed `interior_node_depth` and `period_of_variations` *(explained in the docstring)*.

The following cells demonstrate `AmBIENCeDataset` functionality.

In [None]:
### Read the raw data and assumptions.

ambience = amb.AmBIENCeDataset(
    interior_node_depth=0.1,
    period_of_variations=1209600,
)
ambience.data

In [None]:
### Create unique building periods

building_periods = ambience.building_periods()
building_periods

In [None]:
### Check the building type to stock mappings.

ambience.building_type_mappings

In [None]:
### Check the building stock statistics

ambience.calculate_building_stock_statistics()

In [None]:
### Check structure type assumptions

ambience.structure_types

In [None]:
### Calculate the structure statistics

ambience.calculate_structure_statistics()

In [None]:
### Check fenestration assumptions.

ambience.fenestration

In [None]:
### Check ventilation assumptions.

ambience.ventilation

In [None]:
### Calculate ventilation and fenestration statistics

ambience.calculate_ventilation_and_fenestration_statistics()

## Check building envelope dimension data

Since the database provides us with data about the assumed dimensions of the
building envelope, we can check it against the assumptions detailed in the AmBIENCe
D4.1 deliverable. Mostly I'm worried about the ground floor, roof, and number of storeys.

In [None]:
### Check if ground floor and ceiling areas match

cols = [
    "REFERENCE BUILDING GROUND FLOOR AREA (m2)",
    "REFERENCE BUILDING ROOF AREA (m2)"
]
inds = abs(ambience.data[cols[0]] - ambience.data[cols[1]]) > 1
df1 = ambience.data.loc[inds,cols]
df1

So roughly 10% of the reference buildings don't seem to make perfect sense.
According to the deliverable:

>The roof is considered to be flat.

>The building is assumed to be a cuboid.

Which for these buildings means that the walls can't be perpendicular to the ground.
Regardless, if we further examine the ground floor area vs the useful floor area
and the number of storeys:

In [None]:
### Check useful floor area vs ground floor area and storeys.

cols = [
    "REFERENCE BUILDING USEFUL FLOOR AREA (m2)",
    "NUMBER OF REFERENCE BUILDING STOREYS",
    "REFERENCE BUILDING GROUND FLOOR AREA (m2)",
]
inds = abs(
    ambience.data[cols[0]] / ambience.data[cols[1]] - ambience.data[cols[2]]
) > 1
df2 = ambience.data.loc[inds,cols]
df2

And almost 20% of the reference buildings have a number of storeys that doesn't match the
useful and ground floor areas, if the buildings are assumed to be cuboids with walls
perpendicular to the ground.
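
To spell out the criterion behind the storey check, here's a minimal standalone sketch. The function name `cuboid_mismatch` and the example numbers are hypothetical and not part of `ambience2abm`; only the 1 m2 tolerance is taken from the cells above.

```python
def cuboid_mismatch(useful_floor_area, storeys, ground_floor_area, tol=1.0):
    """Return True if the useful floor area per storey deviates from the
    ground floor area by more than `tol` (m2), i.e. the building can't be
    a cuboid with walls perpendicular to the ground."""
    return abs(useful_floor_area / storeys - ground_floor_area) > tol


# Hypothetical reference buildings:
# (useful floor area m2, number of storeys, ground floor area m2)
examples = {
    "consistent": (300.0, 3, 100.0),
    "inconsistent": (300.0, 2, 100.0),
}
flags = {name: cuboid_mismatch(*vals) for name, vals in examples.items()}
print(flags)
```

The same function covers the roof check with `storeys=1`, comparing the roof area against the ground floor area.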

In [None]:
### Check common cases?

len(set(df1.index.to_list() + df2.index.to_list()))

So at least most of the cases seem to have this erroneous geometry in common.
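
Note that the cell above counts the union of the flagged rows, so the overlap has to be inferred by comparing the result against `len(df1)` and `len(df2)`. A minimal sketch of that reasoning with made-up row labels (the sets below are hypothetical stand-ins for `df1.index` and `df2.index`):

```python
# Hypothetical row labels flagged by the two geometry checks.
roof_mismatch = {"A", "B", "C"}
storey_mismatch = {"B", "C", "D", "E"}

union = roof_mismatch | storey_mismatch
common = roof_mismatch & storey_mismatch

# Inclusion-exclusion: the overlap can be recovered from the union size alone.
inferred_common = len(roof_mismatch) + len(storey_mismatch) - len(union)
print(len(union), len(common), inferred_common)
```

In the notebook itself, `df1.index.intersection(df2.index)` should give the common rows directly.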


## Process the full ArchetypeBuildingModel.jl dataset

The `ABMDataset` object contains the final processed data compatible with `ArchetypeBuildingModel.jl`,
as well as functions for exporting the Data Package containing said processed data.
The `ABMDataset` objects are constructed based on the raw `AmBIENCeDataset` object.


In [None]:
### Process the full ABM Dataset

abmdata = amb.ABMDataset(ambience)
abmdata

In [None]:
### Inspect location ids

abmdata.location_id

In [None]:
### Inspect building periods

abmdata.building_period

In [None]:
### Inspect building stocks

abmdata.building_stock

In [None]:
### Inspect structure types

abmdata.structure_type

In [None]:
### Inspect building stock statistics

abmdata.building_stock_statistics

In [None]:
### Inspect structure statistics

abmdata.structure_statistics

In [None]:
### Inspect ventilation and fenestration statistics

abmdata.ventilation_and_fenestration_statistics

In [None]:
### Try exporting to csvs and creating the datapackage

abmdata.export_csvs()
pkg = abmdata.create_datapackage()
pkg

## Check processed data against the original.

Let's do a few comparisons to try and see that the data processing is performing as intended.
For starters, let's make sure that aggregating over the `BUILDING MATERIAL COMBINATION CODE`
and disaggregating over the `HEATING SYSTEM PREVALENCY ON BUILDING STOCK` haven't distorted the
total number of buildings.

In [None]:
### Calculate equally aggregated numbers of buildings both from the original data and the processed output.

# Declare cols to aggregate over
cols = [
    "building_type",
    "building_period",
    "location_id"
]

# Renaming and aggregation of original data.
ambience_total_numbers = ambience.data.rename(
    columns={
        "REFERENCE BUILDING USE CODE": "building_type",
        "REFERENCE BUILDING COUNTRY CODE": "location_id",
        "NUMBER OF REFERENCE BUILDINGS IN THE BUILDING STOCK SEGMENT": "number_of_buildings"
    }
).groupby(cols).agg({"number_of_buildings": "sum"})

# Aggregation of processed data.
abm_total_numbers = abmdata.building_stock_statistics.reset_index().groupby(cols).agg({"number_of_buildings": "sum"})

# Check the differences (neglecting near-floating-point level)
num_diff = ambience_total_numbers - abm_total_numbers
num_diff[num_diff["number_of_buildings"].abs() > 1e-6]

Which should be empty.
There was a bug in the raw heatsys data for `DE-OTH-2011-2021`
which required normalizing the heating system prevalencies.
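
To illustrate why the normalization matters, here's a minimal sketch with made-up numbers. The helper `normalise_shares`, the heating system names, and the shares are all hypothetical; this is not the actual `ambience2abm` implementation, just the idea that shares must sum to one for the disaggregation to preserve the building count.

```python
def normalise_shares(shares):
    """Scale the shares so that they sum to exactly one."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("Shares must sum to a positive number.")
    return {key: value / total for key, value in shares.items()}


# Hypothetical raw prevalencies for one segment, summing to 1.2 instead of 1.0.
raw_shares = {"gas_boiler": 0.6, "district_heating": 0.4, "heat_pump": 0.2}
number_of_buildings = 1000.0

shares = normalise_shares(raw_shares)
disaggregated = {key: number_of_buildings * s for key, s in shares.items()}

# Without normalization the disaggregated total would be inflated by 20%.
print(sum(disaggregated.values()))
```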

Next, let's check that the reference building useful floor areas haven't been distorted.

In [None]:
### Check that reference building areas don't get distorted.

# Declare cols to aggregate over
cols = [
    "building_type",
    "building_period",
    "location_id"
]

# Renaming original dataset fields for join and resetting indices.
ambience_ufa = ambience.data.rename(
    columns={
        "REFERENCE BUILDING USE CODE": "building_type",
        "REFERENCE BUILDING COUNTRY CODE": "location_id",
    }
).reset_index().set_index(cols)

# Join with processed building stocks statistics
abm = abmdata.building_stock_statistics.reset_index().set_index(cols)
ufa = ambience_ufa.join(abm)

# Check rows where the original reference building floor area doesn't match the processed floor area.
vals = [
    "REFERENCE BUILDING USEFUL FLOOR AREA (m2)",
    "average_gross_floor_area_m2_per_building",
] 
ufa[ufa[vals[0]] != ufa[vals[1]]][vals]

Which should again be empty.

If both of the above checks produced empty dataframes,
the building stock statistics processing should be working as intended.


### Check structural data processing.

Unfortunately, the structural data differs a bit between the AmBIENCe raw data
and the format required by `ArchetypeBuildingModel.jl`.
The final `structure_statistics` doesn't include different building materials separately,
and instead aggregates them based on their assumed prevalency in the building stock.
Furthermore, the base floors are assumed to be ground-coupled, with their effective U-value
calculated using the simplified method by
*Kissock, K., Simplified Model for Ground Heat Transfer from Slab-on-Grade Buildings, (c) 2013 ASHRAE*.
Similarly, interior structures are assumed to omit insulation, so their U-values won't match either.
Still, we can compare the processed total U-values to the saved design U-values from the original data.

In [None]:
### Check design vs total U-values of the processed structures.

vals = ["design_U_value_W_m2K", "total_U_value_W_m2K"]

uvals = abmdata.structure_statistics[
    (
        abmdata.structure_statistics[vals[0]]
        - abmdata.structure_statistics[vals[1]]
    ).abs() > 1e-6
]
uvals[vals]

As we can see, there are considerable differences in the design U-values in the raw data
versus the processed total U-values for a significant number of rows.
However, checking the problematic `structure_type`:

In [None]:
### Check problematic structure types

uvals.reset_index()["structure_type"].unique()

We should only get `base_floor`, `partition_wall`, and `separating_floor`,
as we know them to be calculated differently from the AmBIENCe data.


#### Test interior node depth and period of variations

The assumed values of the `interior_node_depth` and `period_of_variations` impact the processing
of the interior and exterior U-values, as well as the effective thermal mass.

1. Increasing the `interior_node_depth` should decrease the internal U-value and increase the external U-value, and vice versa. However, the total U-value should remain unaffected.
2. Increasing the `period_of_variations` should increase the effective thermal mass.
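
The first expectation can be sketched with a simple series-resistance model. This is only an illustration under stated assumptions: the temperature node is assumed to split the structure's resistance linearly by relative depth, the surface resistances `R_SI` and `R_SE` are typical textbook values, and the function `split_u_values` is hypothetical rather than the module's actual code.

```python
R_SI = 0.13  # Assumed interior surface resistance [m2K/W]
R_SE = 0.04  # Assumed exterior surface resistance [m2K/W]


def split_u_values(r_structure, depth, r_si=R_SI, r_se=R_SE):
    """Return (internal, external, total) U-values [W/m2K] for a
    temperature node at relative `depth` (0 = interior surface)."""
    r_in = r_si + depth * r_structure           # Indoor air -> node
    r_out = (1.0 - depth) * r_structure + r_se  # Node -> ambient air
    return 1.0 / r_in, 1.0 / r_out, 1.0 / (r_in + r_out)


r_structure = 2.0  # Hypothetical structure resistance [m2K/W]
results = {d: split_u_values(r_structure, d) for d in (0.0, 0.5, 1.0)}
for depth, (u_int, u_ext, u_tot) in results.items():
    print(f"depth={depth}: U_int={u_int:.3f}, U_ext={u_ext:.3f}, U_tot={u_tot:.3f}")
```

In this sketch, a depth of 0.0 leaves only the surface resistance on the interior side, so the internal U-value no longer depends on the structure at all.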

In [None]:
### Check structure properties with different assumptions.

# Assumption tuples and initialize results dictionary.
tups = [(0.0, 60*60), (0.5, 24*60*60), (1.0, 7*24*60*60)]
resd = {}

# Calculate structure statistics for different assumption tuples.
for (ind, pov) in tups:
    ss = amb.AmBIENCeDataset(
        interior_node_depth=ind,
        period_of_variations=pov,
    ).calculate_structure_statistics()
    resd[(ind, pov)] = ss

In [None]:
### Check internal U-value progression

cols = [
    "internal_U_value_to_structure_W_m2K"
]
df = resd[tups[0]][cols]
for t in tups[1:]:
    df = df.join(resd[t][cols], rsuffix=(" " + str(t)))
df


The internal U-value should be seen to decrease as the interior node depth is increased.
Furthermore, the interior U-value should be the same regardless of country,
period, and building type when interior node depth = 0.0, as the only
thermal resistance remaining is the assumed interior surface resistance
*(which is independent of the building properties)*.

Let's check external U-values next.

In [None]:
### Check external U-value progression

cols = [
    "external_U_value_to_ambient_air_W_m2K",
    "external_U_value_to_ground_W_m2K"
]
df = resd[tups[0]][cols]
for t in tups[1:]:
    df = df.join(resd[t][cols], rsuffix=(" " + str(t)))
df = df.reindex(sorted(df.columns), axis = 1)
df

The external U-values should be seen to increase as the interior node depth increases.
The effect on the ground U-values is noticeably smaller,
as the ground resistance accounts for a significant portion of the total resistance.

Using large interior node depths seems a bit problematic with AmBIENCe data,
as it would seem that some exterior wall structures barely contain any thermal insulation.
With an interior node depth of 1.0, the exterior U-values can be seen to skyrocket up to 25 W/m2K,
which is not really ideal.

Next, let's look at the total U-values.

In [None]:
### Check total U-value progression

cols = [
    "total_U_value_W_m2K",
]
df = resd[tups[0]][cols]
for t in tups[1:]:
    df = df.join(resd[t][cols], rsuffix=(" " + str(t)))
df = df.reindex(sorted(df.columns), axis = 1)
df

The total U-values should remain the same regardless of the assumed interior node depth.
This is because we're only tweaking the relative thermal resistances
to and from the temperature node inside the structures, but not the total thermal
resistance through the structure.

Next, let's check the effective thermal mass.

In [None]:
### Check effective thermal mass progression

cols = [
    "effective_thermal_mass_J_m2K",
]
df = resd[tups[0]][cols]
for t in tups[1:]:
    df = df.join(resd[t][cols], rsuffix=(" " + str(t)))
df = df.reindex(sorted(df.columns), axis = 1)
df

Which should increase as the period of variations is increased.
Personally, I'm not sure if the period of variations has a lot of meaning for this
type of building modelling, but it was a "convenient" parameter to tweak the thermal mass
of the structures.
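
For intuition on why a longer period gives more thermal mass, here's a rough sketch based on the periodic penetration depth. To be clear, this is an assumption for illustration and not necessarily how `ambience2abm` uses `period_of_variations` *(see the docstring for that)*: the active layer is simply capped at the penetration depth, and the function names and material properties below are made up.

```python
import math


def penetration_depth(conductivity, density, specific_heat, period):
    """Periodic penetration depth [m] of a homogeneous material."""
    return math.sqrt(conductivity * period / (math.pi * density * specific_heat))


def effective_thermal_mass(thickness, conductivity, density, specific_heat, period):
    """Areal heat capacity [J/m2K], counting only the layer the
    temperature variations are assumed to reach."""
    active = min(thickness, penetration_depth(conductivity, density, specific_heat, period))
    return density * specific_heat * active


# Hypothetical 0.5 m concrete-like layer: W/mK, kg/m3, J/kgK.
layer = dict(thickness=0.5, conductivity=1.7, density=2300.0, specific_heat=1000.0)
periods = [60 * 60, 24 * 60 * 60, 7 * 24 * 60 * 60]
masses = {p: effective_thermal_mass(period=p, **layer) for p in periods}
for period, mass in masses.items():
    print(f"period={period} s: {mass:.0f} J/m2K")
```

Under these assumptions, the effective thermal mass grows with the square root of the period until the full layer thickness is reached.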


### Internal structure total U-values

In `ArchetypeBuildingModel.jl`, internal structures like `partition_wall` and `separating_floor` use the internal U-value for one surface, and the external U-value
for the other surface. Thus, the total U-value between the internal structure and the
indoor air is the sum of the two separate U-values. Conveniently,
this sum is dependent on the interior node depth assumption:

In [None]:
### Check internal structure total U-value progression

cols = [
    "internal_U_value_to_structure_W_m2K",
    "external_U_value_to_ambient_air_W_m2K"
]
df = resd[tups[0]][cols].copy()  # Copy to avoid modifying a slice of the results.
df["total_interior_U_value"] = df.sum(axis=1)
df = df[["total_interior_U_value"]]
for t in tups[1:]:
    temp = resd[t][cols].copy()
    temp["total_interior_U_value"] = temp.sum(axis=1)
    df = df.join(temp[["total_interior_U_value"]], rsuffix=(" " + str(t)))
df

Where we can see that the `partition_wall` and `separating_floor` total interior U-value
does indeed change a bit along with the assumed interior node depth.
For internal structures, the insulation layer is neglected,
as internal structures aren't typically thermally insulated.
The interior node depth is interpreted as the depth up until the middle of the structure,
so the minimum total interior U-value should be reached with interior node depth of 1.0.

Note that the values for the external structure types don't really mean anything.
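
The expected minimum at an interior node depth of 1.0 can be sketched as follows. This is a simplified illustration under my own assumptions: both surfaces face indoor air with the same assumed surface resistance `R_SI`, the node depth scales linearly up to the middle of the structure, and the function `total_interior_u` and the numbers are hypothetical.

```python
R_SI = 0.13  # Assumed interior surface resistance on both sides [m2K/W]


def total_interior_u(r_structure, depth, r_si=R_SI):
    """Sum of the U-values from the node to the indoor air via both
    surfaces [W/m2K], with depth 1.0 placing the node in the middle."""
    r_near = depth * r_structure / 2.0  # Resistance to the nearer surface
    r_far = r_structure - r_near        # Resistance to the other surface
    return 1.0 / (r_si + r_near) + 1.0 / (r_si + r_far)


r_structure = 0.2  # Hypothetical uninsulated internal structure [m2K/W]
totals = {d: total_interior_u(r_structure, d) for d in (0.0, 0.5, 1.0)}
for depth, u in totals.items():
    print(f"depth={depth}: total interior U-value={u:.3f}")
```

The sum of the two reciprocals is smallest when the resistances on both sides are equal, which is why the node in the middle gives the minimum.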