# Example Work Flow Advanced Interface


In this notebook we will go over the basic workflow for creating a surrogate model from an EnergyPlus simulation. We will train a neural network to predict daily electricity use from the window-to-wall ratio, the solar heat gain coefficient, and the insulation resistance. Finally, we will use this surrogate model to explore and optimize the building design.

![Image](image/flow_diagram.PNG)

In [None]:
#!pip install besos --user
%matplotlib inline

import time

import numpy as np
import pandas as pd
import plotly
import tensorflow as tf
import tensorflow_docs as tfdocs
import tensorflow_docs.modeling
import tensorflow_docs.plots
from besos import eppy_funcs as ef, sampling
from besos.evaluator import EvaluatorEP, EvaluatorGeneric
from besos.parameters import FieldSelector, Parameter, RangeParameter, wwr
from besos.problem import EPProblem
from dask.distributed import Client
from matplotlib import pyplot as plt
from plotly import express as px
from tensorflow import keras
from tensorflow.keras import layers

In [None]:
# Upgrade TensorFlow if needed
#!pip install --upgrade tensorflow --user

# Use some functions from tensorflow_docs
#!pip install git+https://github.com/tensorflow/docs --user

# (1) Set up the building from idf

The building is defined by an Input Data File (IDF) or by a file in the newer EnergyPlus format (epJSON).

In [None]:
# Open the IDF file
building = ef.get_building("Medium_Office.idf")
building.view_model()

In [None]:
# You can convert an idf to epJSON using the following code.
# !energyplus -c "Medium_Office.idf"

# (2) Evaluator
## Set up the inputs and outputs of your exploration

The evaluator defines how we will evaluate the building:
- what external weather conditions is the building experiencing,
- what properties of the building will we be changing, and
- what are some of the performance metrics of the building that we want to explore.

The weather conditions are specified in the EnergyPlus Weather (EPW) file. The properties we will change in the building are defined in the parameter space. In the objectives we specify which output performance metrics we wish to extract so that we can explore them later.

In [None]:
# building.idfobjects

In [None]:
# for materials in building.idfobjects["MATERIAL:NOMASS"]:
#     print("{} {}".format(materials.Name,materials.Thermal_Resistance))

# for materials in building.idfobjects["BUILDINGSURFACE:DETAILED"]:
#     if materials.Sun_Exposure!="NoSun": print(materials.Construction_Name )

# for materials in building.idfobjects['CONSTRUCTION']:
#     if materials.Name=="BTAP-Ext-Wall-Mass:U-0.315": print(materials)

![Image](image/setting_up_the_evaluator.PNG)

In [None]:
# Here we change all the external insulation of the building
insu1 = FieldSelector(
    class_name="MATERIAL:NOMASS",
    object_name="Typical Insulation 2",
    field_name="Thermal Resistance",
)


# Setup the parameters, Solar Heat Gain Coefficient
parameters = [
    Parameter(
        FieldSelector("Window", "*", "Solar Heat Gain Coefficient"),
        value_descriptor=RangeParameter(0.01, 0.99),
        name="Solar Gain Coefficient",
    ),
    Parameter(
        insu1, value_descriptor=RangeParameter(1, 15), name="Insulation Resistance"
    ),
]


# Add window-to-wall ratio as a parameter between 0.1 and 0.9 using a custom function
parameters.append(wwr(RangeParameter(0.1, 0.9)))


# Construct the objective
objective = ["Electricity:Facility"]


# Build the problem
problem = EPProblem(parameters, objective)

In [None]:
# setup the evaluator
evaluator = EvaluatorEP(
    problem,
    building,
    epw_file="victoria.epw",
    multi=True,
    progress_bar=True,
    distributed=True,
    out_dir="outputdirectory",
)

# (3) Generate the Dataset

1. Sample the problem space
2. Setup the parallel processing
3. Generate the Samples
4. Store and recover the expensive runs

In [None]:

# Use Latin hypercube sampling to take 100 samples
inputs = sampling.dist_sampler(sampling.lhs, problem, 100)


# sample of the inputs
print(inputs.head())
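
For intuition about what Latin hypercube sampling does, here is a minimal NumPy sketch. It is not the besos implementation: `latin_hypercube` is a hypothetical helper written for illustration, and the bounds are simply the three parameter ranges defined earlier in this notebook. Each dimension is cut into as many equal strata as there are samples, and every stratum receives exactly one point.

```python
import numpy as np


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points with exactly one point per stratum in each dimension."""
    rng = np.random.default_rng(seed)
    n_dims = len(bounds)
    # One uniform draw inside each of the n_samples equal-width strata of [0, 1).
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently per dimension so the columns are not correlated.
    for d in range(n_dims):
        unit[:, d] = rng.permutation(unit[:, d])
    lows = np.array([low for low, _ in bounds], dtype=float)
    highs = np.array([high for _, high in bounds], dtype=float)
    # Rescale from the unit cube to the requested ranges.
    return lows + unit * (highs - lows)


# Ranges copied from the parameters above: SHGC, insulation resistance, WWR.
bounds = [(0.01, 0.99), (1, 15), (0.1, 0.9)]
samples = latin_hypercube(10, bounds, seed=0)
print(samples.round(3))
```

Compared with plain uniform sampling, this spreads a small number of expensive simulations more evenly over each parameter range.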

In [None]:
# Setup the parallel processing in the notebook.
client = Client(threads_per_worker=1)
client

Run the samples

In [None]:
t1 = time.time()
# Run Energyplus
outputs = evaluator.df_apply(inputs)
t2 = time.time()
time_of_sim = t2 - t1

Calculate the time

In [None]:
def niceformat(seconds):
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    return hour, minutes, seconds


hours, mins, secs = niceformat(time_of_sim)

print(
    "The total running time: {:2.0f} hours {:2.0f} min {:2.0f} seconds".format(
        hours, mins, secs
    )
)

# Build a results DataFrame

In [None]:
results = inputs.join(outputs)
results.head()

## Take a look at the results

In [None]:
total_electricity_use = results["Electricity:Facility"]


def norm_res(values):
    # Standardize to zero mean and unit variance so all inputs share one x-axis
    return (values - np.mean(values)) / np.std(values)


plt.scatter(
    norm_res(results["Solar Gain Coefficient"]),
    total_electricity_use,
    label="solar gain",
)
plt.scatter(
    norm_res(results["Window to Wall Ratio"]),
    total_electricity_use,
    label="w2w ratio",
)
plt.scatter(
    norm_res(results["Insulation Resistance"]),
    total_electricity_use,
    label="Insulation Resistance",
)

plt.xlabel("normalized parameter value")
plt.ylabel("Electricity:Facility")
plt.legend()

## Store the expensive calculations

Since this can be quite a big run, let's store the results so that we don't have to rerun the simulations.

In [None]:
inputs.to_pickle("inputs.pkl")
outputs.to_pickle("outputs.pkl")

In [None]:
inputs_ = pd.read_pickle("inputs.pkl")
outputs_ = pd.read_pickle("outputs.pkl")
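
The store-and-recover step can also be wrapped in a small caching helper. The sketch below uses only the standard library; `load_or_run` and `expensive_run` are hypothetical names, and `expensive_run` merely stands in for the call to `evaluator.df_apply(inputs)`.

```python
import os
import pickle
import tempfile


def load_or_run(path, run):
    """Return the result cached at `path`, or call `run()` and cache what it returns."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = run()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_run():
    # Stand-in for the EnergyPlus run; records each call so we can count them.
    calls.append(1)
    return {"Electricity:Facility": [1.0, 2.0, 3.0]}


cache_path = os.path.join(tempfile.mkdtemp(), "outputs.pkl")
first = load_or_run(cache_path, expensive_run)   # runs the function and stores the result
second = load_or_run(cache_path, expensive_run)  # loads the stored result from disk
print(len(calls))
```

With this pattern, rerunning the notebook only triggers the simulations when the cache file is missing.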

# (4) Set up the dataset for the Surrogate Model

The outputs are packed in a single column, which will not work for TensorFlow as is.

In [None]:
print(outputs_.head())
print(inputs_.head())

We join the inputs and outputs into a single dataset, with the output columns serving as the labels, so that it is easy to split into training and test sets. The training set is used to fit the model, while the test set shows how well the model generalizes.

In [None]:
dataset = inputs_.join(outputs_)
dataset.head()

# Split dataset into test and training

In [None]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

training_labels = train_dataset[outputs_.columns]
testing_labels = test_dataset[outputs_.columns]
training_labels
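
The split above relies on `DataFrame.sample` and `DataFrame.drop`. The same idea, reduced to row indices, looks roughly like the sketch below; `split_indices` is a hypothetical helper and the 80/20 fraction mirrors the cell above.

```python
import numpy as np


def split_indices(n_rows, frac=0.8, seed=0):
    """Shuffle the row indices and cut them into disjoint train and test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_train = int(round(frac * n_rows))
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the split reproducible, which matters when comparing different surrogate models on the same test rows.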

# Normalize the Data (Inputs of the model)

We will normalize the inputs and the outputs

In [None]:
train_stats = train_dataset[inputs_.columns]
train_stats = train_stats.describe()
train_stats = train_stats.transpose()
train_stats

In [None]:

# use the stats we calculated to do the normalization on the input.
def norm_input(x):
    return (x - train_stats["mean"]) / train_stats["std"]


def unnorm_input(x):
    return (x * train_stats["std"]) + train_stats["mean"]


normed_train_data = norm_input(train_dataset[inputs_.columns])
normed_test_data = norm_input(test_dataset[inputs_.columns])


print(test_dataset[inputs_.columns].head())
print(normed_test_data.head())
print(unnorm_input(normed_test_data.head()))
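
The normalization is a plain z-score using statistics from the training set only. The sketch below repeats it on made-up numbers to show the round trip; the array values are illustrative and not taken from the simulation.

```python
import numpy as np

# Made-up rows with two input columns, for illustration only.
train = np.array([[0.2, 3.0], [0.4, 9.0], [0.9, 12.0]])
test = np.array([[0.5, 6.0]])

# Statistics come from the training rows only, so the test set stays unseen.
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)  # ddof=1 matches the "std" row of pandas describe()


def normalize(x):
    return (x - mean) / std


def denormalize(z):
    return z * std + mean


normed_train = normalize(train)
normed_test = normalize(test)
restored = denormalize(normed_test)
print(normed_test, restored)
```

After normalization each training column has zero mean and unit standard deviation, and `denormalize` recovers the original values.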

# Normalize the labels (Outputs of the model)

Labels are the actual outputs that we are interested in.

In [None]:
# Both statistics must come from the training labels only
train_mean = training_labels.mean()
train_std = training_labels.std(ddof=0)
train_mean, train_std

In [None]:
def norm_output(x):
    return (x - train_mean) / train_std


def unnorm_output(x):
    return (x * train_std) + train_mean


train_labels = norm_output(training_labels)
test_labels = norm_output(testing_labels)
train_labels.head()

# (5) Build & Train Surrogate model architecture

In [None]:
def build_model():
    model = keras.Sequential(
        [
            layers.Dense(5, input_shape=[len(train_dataset[inputs_.columns].keys())]),
            layers.Dense(5),
            layers.Dense(1),
        ]
    )

    optimizer = tf.keras.optimizers.RMSprop(0.0001)

    model.compile(loss="mse", optimizer=optimizer, metrics=["mae", "mse"])
    return model
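
One property of this architecture is worth checking: Keras `Dense` layers apply no activation by default, so three stacked layers without activations compose into a single affine map. The NumPy sketch below demonstrates this with random weights shaped like a 3 -> 5 -> 5 -> 1 network, assuming the three input parameters defined earlier. The weights are random stand-ins, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random weights and biases shaped like the 3 -> 5 -> 5 -> 1 network.
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 5)), rng.normal(size=5)
W3, b3 = rng.normal(size=(5, 1)), rng.normal(size=1)


def stacked(x):
    # Three linear layers applied one after another, no activation in between.
    return ((x @ W1 + b1) @ W2 + b2) @ W3 + b3


# The same map written as one weight matrix and one bias.
W = W1 @ W2 @ W3
b = (b1 @ W2 + b2) @ W3 + b3


def collapsed(x):
    return x @ W + b


x = rng.normal(size=(4, 3))
print(np.allclose(stacked(x), collapsed(x)))
```

If the electricity use is expected to respond nonlinearly to the inputs, one option would be to pass an activation such as `activation="relu"` to the hidden layers.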

In [None]:
model = build_model()

In [None]:
model.summary()

In [None]:
EPOCHS = 1000

history = model.fit(
    normed_train_data,
    train_labels,
    epochs=EPOCHS,
    validation_split=0.2,
    verbose=0,
    callbacks=[tfdocs.modeling.EpochDots()],
)

In [None]:
plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)

In [None]:
plotter.plot({"Basic": history}, metric="loss")
plt.ylabel("loss")

# (6) Surrogate Model & Validate against the Test dataset

In [None]:
# See -> https://en.wikipedia.org/wiki/Coefficient_of_determination
# R squared score: 1 - SS_res / SS_tot, computed over the whole test set
y_true = test_labels.values.ravel()
y_pred = model.predict(normed_test_data.values).ravel()
y_mean = np.mean(y_true)
ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
ss_tot = np.sum((y_true - y_mean) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

plt.scatter(y_pred, y_true)
plt.xlabel("predicted values")
plt.ylabel("test values")
print("R squared score: {}".format(r_squared))
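
To help read the score, it is useful to see the coefficient of determination on two reference cases: a perfect predictor and a predictor that always returns the mean. The sketch below is a standalone helper on made-up numbers; `r_squared` here is a hypothetical function name, not part of besos or TensorFlow.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)        # predictions equal the targets
mean_only = r_squared(y_true, [2.5] * 4)   # always predict the mean of the targets
print(perfect, mean_only)
```

A score close to 1 means the surrogate explains most of the variance in the test labels, while a score near 0 means it does no better than predicting the mean.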

# (7) Sample Surrogate Model

In [None]:
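def evaluation_func(ind):
    # Normalize the raw parameter values the same way as the training inputs
    vals = norm_input(list(ind))
    # Predict on a batch of one, then convert the result back to physical units
    output = unnorm_output(model.predict(np.array([list(vals)]))[0][0])
    # Return a tuple of objectives and an empty tuple of constraints
    return ((output.values[0],), ())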


GP_SM = EvaluatorGeneric(evaluation_func, problem)
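
The evaluation function chains three steps: normalize the inputs, predict, and undo the label normalization. The sketch below shows that chain in isolation, assuming made-up statistics and a stand-in `fake_model_predict` in place of the trained Keras model. All names and numbers here are hypothetical.

```python
import numpy as np

# Made-up normalization statistics, for illustration only.
in_mean = np.array([0.5, 8.0, 0.5])
in_std = np.array([0.28, 4.0, 0.23])
out_mean, out_std = 2.0e9, 1.5e8


def fake_model_predict(batch):
    # Stand-in for model.predict: maps an (n, 3) batch to an (n, 1) array.
    return batch.sum(axis=1, keepdims=True)


def surrogate(ind):
    # 1. Normalize the raw parameter values.
    z = (np.asarray(ind, dtype=float) - in_mean) / in_std
    # 2. Predict on a batch of one.
    y_normed = fake_model_predict(z[None, :])[0, 0]
    # 3. Convert the prediction back to physical units.
    y = y_normed * out_std + out_mean
    # Same return shape as evaluation_func above: (objectives, constraints).
    return ((float(y),), ())


at_mean = surrogate([0.5, 8.0, 0.5])
print(at_mean)
```

Evaluating at the input means gives a normalized input of zero, so the stand-in model returns zero and the result equals `out_mean`.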

In [None]:
srinputs = sampling.dist_sampler(sampling.lhs, problem, 100)
sroutputs = GP_SM.df_apply(srinputs)
srresults = srinputs.join(sroutputs)
srresults.head()

# (8) Exploration

In [None]:
plotly.offline.init_notebook_mode(connected=True)

In [None]:
fig = px.parallel_coordinates(
    srresults,
    color="Electricity:Facility",
    dimensions=[
        "Window to Wall Ratio",
        "Insulation Resistance",
        "Solar Gain Coefficient",
        "Electricity:Facility",
    ],
    color_continuous_scale=px.colors.diverging.Tealrose,
)
fig.show()
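
Because the surrogate is cheap to evaluate, a simple way to look for a good design is to sample many candidates and keep the one with the lowest predicted electricity use. The sketch below illustrates this with a toy function; `toy_surrogate` and its coefficients are invented for the example and say nothing about the real building.

```python
import random


def toy_surrogate(shgc, insulation, wwr):
    # Hypothetical stand-in for the trained surrogate; coefficients are made up.
    return 100.0 + 40.0 * shgc * wwr - 2.0 * insulation


random.seed(0)
# Draw candidates from the same parameter ranges used in this notebook.
candidates = [
    (random.uniform(0.01, 0.99), random.uniform(1, 15), random.uniform(0.1, 0.9))
    for _ in range(200)
]
best = min(candidates, key=lambda c: toy_surrogate(*c))
print(best, toy_surrogate(*best))
```

With the real surrogate, the same selection could be applied to the `Electricity:Facility` column of `srresults`.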