# Schema Usage

The schema does not actually store any data.  Instead, it is an interface which allows us to interact with numpy/torch tensors in a semantic manner.  It lets us convert between storage vectors (i.e. how we store the building parameters numerically on disk), simulation objects (e.g. Archetypal Templates and PyUmi Shoeboxes) and machine learning model imports (i.e. torch tensors with full hourly schedule data).

## Notebook setup

We need some jank to get relative imports working.

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
import numpy as np

## Initialize the Schema

In [3]:
from schema import Schema, ShoeboxGeometryParameter
schema = Schema()

  return warn(


Let's see what's in the schema:

In [4]:
schema.parameter_names

['id',
 'base_template',
 'base_epw',
 'width',
 'height',
 'facade_2_footprint',
 'perim_2_footprint',
 'roof_2_footprint',
 'footprint_2_ground',
 'shading_fact',
 'wwr_n',
 'wwr_e',
 'wwr_s',
 'wwr_w',
 'orientation',
 'LightingPowerDensity',
 'EquipmentPowerDensity',
 'PeopleDensity',
 'FacadeRValue',
 'RoofRValue',
 'PartitionRValue',
 'SlabRValue',
 'schedules_seed',
 'schedules']

We can access a schema parameter from the schema with list indexing:

In [5]:
print(schema["width"])
print(schema["schedules"])
print(schema["orientation"])

---width---
shape_storage=(1,), shape_ml=(1,), dtype=scalar
Width [m]
---schedules---
shape_storage=(8, 16), shape_ml=(8, 8760), dtype=matrix
A matrix in the storage vector with operations to apply to schedules; a matrix of timeseries in ml vector
---orientation---
shape_storage=(1,), shape_ml=(4,), dtype=onehot
Shoebox Orientation


We see that each parameter may have multiple different lengths in the storage vector and ML vector.

We can also print a summary of the whole schema:

In [6]:
print(schema)

-------- Schema --------
---- id ----
shape storage: (1,) / shape ml: (0,)
location storage: 0->1 / location ml: 0->0

---- base_template ----
shape storage: (1,) / shape ml: (0,)
location storage: 1->2 / location ml: 0->0

---- base_epw ----
shape storage: (1,) / shape ml: (0,)
location storage: 2->3 / location ml: 0->0

---- width ----
shape storage: (1,) / shape ml: (1,)
location storage: 3->4 / location ml: 0->1

---- height ----
shape storage: (1,) / shape ml: (1,)
location storage: 4->5 / location ml: 1->2

---- facade_2_footprint ----
shape storage: (1,) / shape ml: (1,)
location storage: 5->6 / location ml: 2->3

---- perim_2_footprint ----
shape storage: (1,) / shape ml: (1,)
location storage: 6->7 / location ml: 3->4

---- roof_2_footprint ----
shape storage: (1,) / shape ml: (1,)
location storage: 7->8 / location ml: 4->5

---- footprint_2_ground ----
shape storage: (1,) / shape ml: (1,)
location storage: 8->9 / location ml: 5->6

---- shading_fact ----
shape storage: (1,) /

We see that the length of the storage vector is significantly smaller than the length the vector the ML model will see.

## Generating new design vectors in storage space

First let's generate a new, empty design vector, and update the Roof R-Value, and then check that it updated correctly:

In [7]:
storage_vector = schema.generate_empty_storage_vector()
schema.update_storage_vector(storage_vector=storage_vector, parameter="RoofRValue", value=25)
schema["RoofRValue"].extract_storage_values(storage_vector)

25.0

If we print out the full vector, we should be able to see th 25 and a whole bunch of zeros:

In [8]:
print(storage_vector)

[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0. 25.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.]


Let's create a new batch of designs:

In [9]:
batch_size = 20
storage_batch = schema.generate_empty_storage_batch(batch_size)
storage_batch.shape

(20, 151)

Great, we see that it has 20 design vectors with 151 values each.

Let's try updating all of the facade R-values values in a batch with the same value:

In [10]:
schema.update_storage_batch(storage_batch, parameter="FacadeRValue", value=14)
schema["FacadeRValue"].extract_storage_values_batch(storage_batch)

array([[14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.],
       [14.]])

Now let's try updating an entire batch with random values.  We can also unnormalize the uniform random variable into the desired range:

In [12]:
parameter = "SlabRValue"
n = batch_size
shape = (n, *schema[parameter].shape_storage)
values = np.random.rand(*shape) # create a random sample with appropriate shape
values = schema[parameter].unnormalize(values) # schema parameter must be a numeric type with min/max defined for unnormalize to work
schema.update_storage_batch(storage_batch, parameter=parameter, value=values)
schema[parameter].extract_storage_values_batch(storage_batch)

array([[28.24246904],
       [27.64297097],
       [ 1.71066259],
       [10.89511267],
       [31.1011037 ],
       [16.069414  ],
       [ 6.88066935],
       [ 2.95355916],
       [22.77858407],
       [46.37330167],
       [42.81798686],
       [ 3.79681516],
       [31.99608393],
       [29.40154383],
       [ 7.02707961],
       [ 0.81727163],
       [43.43749833],
       [32.04873754],
       [22.61740054],
       [33.73726791]])

Finally, let's try updating just a subset of the batch by using the `index` parameter:

*nb: we can also use an int instead of a tuple for `index` to only update a single vector's parameter*

In [13]:
start = 2
n = 8
end = start + n
parameter = "PartitionRValue"
shape = (n, *schema[parameter].shape_storage)
values = np.random.rand(*shape) # create a random sample with appropriate shape

schema.update_storage_batch(storage_batch, index=(start,end), parameter=parameter, value=values)
schema[parameter].extract_storage_values_batch(storage_batch) 

array([[0.        ],
       [0.        ],
       [0.77113849],
       [0.14133498],
       [0.10039915],
       [0.95804992],
       [0.7599215 ],
       [0.96962187],
       [0.89322071],
       [0.21677888],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.        ]])

A useful technique will be to start with a small batch, and then duplicate it in concatenations along `axis=0` as we build up our mixed grid/hypercube/random samples.  Let's start by creating a new batch with a single vector.

In [14]:
storage_batch = schema.generate_empty_storage_batch(1)
storage_batch.shape

(1, 151)

Now let's say some baseline parameters (e.g. pulled from ResStock)

In [15]:
schema.update_storage_batch(storage_batch, parameter="FacadeRValue", value=20)
schema.update_storage_batch(storage_batch, parameter="RoofRValue", value=30)
schema.update_storage_batch(storage_batch, parameter="LightingPowerDensity", value=7.2)
schema.update_storage_batch(storage_batch, parameter="base_template", value=15)
storage_batch = np.concatenate([storage_batch for _ in range(4)], axis=0)
storage_batch

array([[ 0. , 15. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  7.2,  0. ,  0. , 20. , 30. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,  0. ,
         0. ,  0. ,  0. ,  0. ,  0. , 

Now let's set the orientations:

In [17]:
values = np.arange(4).reshape(-1,1)
parameter = "orientation"
schema.update_storage_batch(storage_batch, parameter=parameter, value=values)
schema[parameter].extract_storage_values_batch(storage_batch)

array([[0.],
       [1.],
       [2.],
       [3.]])

Looks good!  Now let's stack this up and begin generating some geometric variations.

In [18]:
orientations_per_base = 4
geometric_variations_per_orientation = 5

In [19]:
storage_batch = np.repeat(storage_batch, geometric_variations_per_orientation, axis=0)
storage_batch.shape

(20, 151)

In [20]:
schema["orientation"].extract_storage_values_batch(storage_batch)

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [2.],
       [2.],
       [2.],
       [2.],
       [2.],
       [3.],
       [3.],
       [3.],
       [3.],
       [3.]])

Looks good!  let's start populating this: if we wanted to use repeating values, we could do nested loops:

In [21]:
for i in range(orientations_per_base):
	n = geometric_variations_per_orientation # how many design vectors in this mini batch
	start = i*n # where this mini batch starts in the parent batch
	end = start + n # where this mini batch ends in the parent batch
	for j,parameter in enumerate(schema.parameters):
		if isinstance(parameter, ShoeboxGeometryParameter):
			name = parameter.name
			shape = parameter.shape_storage
			np.random.seed(j+20923) # arbitrary but reliable seed
			values = np.random.rand(n, *shape) 
			values = parameter.unnormalize(values)
			schema.update_storage_batch(storage_batch, index=(start,end), parameter=name, value=values)


In [22]:
schema["wwr_e"].extract_storage_values_batch(storage_batch)

array([[0.73061709],
       [0.49898089],
       [0.93358064],
       [0.31829062],
       [0.86278618],
       [0.73061709],
       [0.49898089],
       [0.93358064],
       [0.31829062],
       [0.86278618],
       [0.73061709],
       [0.49898089],
       [0.93358064],
       [0.31829062],
       [0.86278618],
       [0.73061709],
       [0.49898089],
       [0.93358064],
       [0.31829062],
       [0.86278618]])

In [23]:
schema["width"].extract_storage_values_batch(storage_batch)

array([[2.47060375],
       [1.55185543],
       [3.44850939],
       [4.77931689],
       [2.03592266],
       [2.47060375],
       [1.55185543],
       [3.44850939],
       [4.77931689],
       [2.03592266],
       [2.47060375],
       [1.55185543],
       [3.44850939],
       [4.77931689],
       [2.03592266],
       [2.47060375],
       [1.55185543],
       [3.44850939],
       [4.77931689],
       [2.03592266]])

Great, these are repeating correctly!  Now, suppose we want to just slightly perturb all of these so that they aren't perfectly repeating, but are close to repeating:

In [24]:
for i,parameter in enumerate(schema.parameters):
	n = storage_batch.shape[0]
	name = parameter.name
	shape = parameter.shape_storage
	perturbations = np.random.rand(n,*shape)*0.2 - 0.1
	values = parameter.extract_storage_values_batch(storage_batch)
	values += perturbations
	schema.update_storage_batch(storage_batch,parameter=name,value=values)

schema["width"].extract_storage_values_batch(storage_batch)

array([[2.51081291],
       [1.46387821],
       [3.36970034],
       [4.76810012],
       [1.99254176],
       [2.54458022],
       [1.59163977],
       [3.52996215],
       [4.87626705],
       [2.06258373],
       [2.3725389 ],
       [1.60041238],
       [3.47399806],
       [4.74630787],
       [2.05948589],
       [2.42965623],
       [1.51820695],
       [3.54316544],
       [4.81372785],
       [2.06793996]])

Great!  We see that they are close to their previous values, but not identical.  

Alternatively, we might prefer to simply use fully random geometric variations for all of our orientation duplicates, rather than repeating the geometry across orientations:

In [28]:
for i,parameter in enumerate(schema.parameters):
	n = storage_batch.shape[0]
	name = parameter.name
	shape = parameter.shape_storage
	values = np.random.rand(n,*shape)
	values = parameter.unnormalize(values)
	schema.update_storage_batch(storage_batch,parameter=name,value=values)

schema["width"].extract_storage_values_batch(storage_batch)

array([[3.38143363],
       [4.8801458 ],
       [3.66225697],
       [4.52423927],
       [3.03800846],
       [1.78578402],
       [2.1275881 ],
       [4.57793393],
       [4.39557918],
       [1.70377606],
       [2.42212725],
       [4.9376061 ],
       [3.38556819],
       [3.59143862],
       [4.68360025],
       [1.79530624],
       [2.73102336],
       [3.60768661],
       [2.44653576],
       [4.88556948]])

Suppose this was our finished batch.  We can save it to an HDF5 file.  Let's say this was building 23 from our ResStock database.

In [33]:
import h5py
from storage import upload_to_bucket

In [37]:
# Update the building IDs
batch_id = 23 # suppose this is the base building we are drawing from
n = storage_batch.shape[0]
building_ids = batch_id + np.arange(n)
schema.update_storage_batch(storage_batch,parameter="id",value=building_ids)

# Write to an HDF5 file
slug = f"batch_{batch_id:04d}.hdf5"
outfile = f"./data/{slug}"
with h5py.File(outfile,"w") as f:
    f.create_dataset(name="storage_vectors", shape=storage_batch.shape, dtype=storage_batch.dtype, data=storage_batch)

# upload to cloud bucket for easy backup
destination = f"demo-batch-data/{slug}"
upload_to_bucket(destination, outfile)
