sdgresearch/NebulaStockModel

NEBULA Pipeline

NEBULA Dataset Generation

This repository contains scripts for generating the NEBULA dataset, a postcode-level dataset for neighbourhood energy modelling.

Prerequisites

Environment Setup

# Create new environment
conda create -n nebula python=3.10

# Activate environment
conda activate nebula

# Install requirements
pip install -r requirements.txt
conda install conda-forge::libgdal==3.6.4
#  libtiff==4.5.0

Required Data Sources

User-Provided Data (Non-Open License)

  • Building Stock Data (Verisk)
  • Postcode Shapefiles (Edina)

Conversations with OS indicated that postcode shapefiles are open access data, but we recommend that users download them themselves from accredited sources.

Provided Data (Open Government License)

Place these files in the input_data_sources directory, or download from our Zip:

  1. Gas and Electricity Data (DESNZ, 2022)
  2. ONS UPRN to Postcode Mapping (2022)
  3. Building Floor Count Global Averages
  4. Census 2021 Statistics
  5. Census 2021 Postcode-Output Area-Region Mappings
  6. Output Areas 2011-2021 Mapping
  7. Postcode Areas: area of postcodes (derived from postcode shapefiles)
  8. Climate Data (HAD-UK Monthly Temperature, 2022)

Directory Structure

input_data_sources/                   # Input data files
├── census_2021/
├── climate_data/
├── energy_data/
├── lookups/
│   ├── oa_lsoa_2021/               # OA to LSOA mapping
│   └── oa_2011_2021/               # OA conversion lookup
├── ONS_UPRN_DATABASE/
├── postcode_areas/
└── urban_rural_2011/

batches/                         # Processing batch lists

src/                              # Source code


intermediate_data/                # Temporary processing files - sub-themes results stored here
├── age/
├── census_attrs/
├── fuel/ 
├── temp_data/         
└── type/

final_dataset/                   # Output files
├── NEBULA_data_filtered.csv
├── Unfiltered_processed_data.csv
└── attribute_logs/             # Logs for building stock batch calculations - shows counts of records in each batch 
    ├── age_log_file.csv
    └── fuel_log_file.csv


main.py                     # Process for generating whole dataset if running locally 

split_onsud.py               # If running on HPC - stage 1 generates batch files 
generate_building_stock.py   # HPC python wrapper 
nebula_job.sh                # If running on HPC - bash script to submit multiple batches 
submit_nebula.sh            # If running on HPC - slurm submit for single batch 

create_global_averages.py    # Script for generating the global averages table. We include the 2022 global averages in intermediate data. Script provided for reference.
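
Before running the pipeline it can help to confirm that the input folders from the tree above are in place. The following is a minimal sketch and is not part of the repository: the helper name `missing_input_dirs` is hypothetical, and the list of sub-directories is copied from the tree above on the assumption that the pipeline expects exactly these paths.

```python
from pathlib import Path

# Input sub-directories as listed in the directory tree above.
EXPECTED_INPUT_DIRS = [
    "census_2021",
    "climate_data",
    "energy_data",
    "lookups/oa_lsoa_2021",
    "lookups/oa_2011_2021",
    "ONS_UPRN_DATABASE",
    "postcode_areas",
    "urban_rural_2011",
]


def missing_input_dirs(root="input_data_sources"):
    """Return the expected sub-directories that do not exist under root.

    Hypothetical helper for illustration; not part of the NEBULA code base.
    """
    root_path = Path(root)
    return [d for d in EXPECTED_INPUT_DIRS if not (root_path / d).is_dir()]


if __name__ == "__main__":
    missing = missing_input_dirs()
    if missing:
        print("Missing input directories:")
        for name in missing:
            print(f"  input_data_sources/{name}")
    else:
        print("All expected input directories are present.")
```

Run it from the repository root so that the relative path `input_data_sources` resolves correctly.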

License

© 2024 Grace Colverd

This code is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/

For commercial use, please contact: gb669@cam.ac.uk.

The processed dataset is available under an open licence - please see the accompanying paper for details.

Usage

  1. Install dependencies from requirements.txt
  2. Place input data in appropriate directories

If running locally

  1. Configure variables in main.py as needed
  2. Run the processing pipeline:
    python main.py

If running on HPC

  1. Generate the batches of 10k postcodes:
    python split_onsud.py
  2. Update the Slurm scripts nebula_job.sh and submit_nebula.sh to run the fuel, age and typology calculations
  3. Submit multiple jobs using nebula_job.sh
  4. When all themes have finished calculating, update main.py so that it only calls the post-processing section
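
The batching in step 1 can be pictured with a short sketch. This is an illustration only and does not reproduce split_onsud.py: the function name `split_into_batches` is hypothetical, and the synthetic identifiers stand in for real postcodes.

```python
def split_into_batches(postcodes, batch_size=10_000):
    """Split a list of postcodes into consecutive batches of at most batch_size.

    Hypothetical helper; the real split_onsud.py may work differently.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [postcodes[i:i + batch_size] for i in range(0, len(postcodes), batch_size)]


if __name__ == "__main__":
    # Synthetic identifiers stand in for real postcodes.
    demo = [f"PC{i:05d}" for i in range(25_000)]
    batches = split_into_batches(demo)
    print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Each batch can then be handed to a separate job, which is what makes per-batch logging and parallel submission possible.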

Output Dataset

The pipeline generates postcode-level statistics including:

  • Building age and type distributions
  • Temperature data (HDD/CDD)
  • Census demographics
  • Building statistics and averages
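
For readers unfamiliar with HDD/CDD, the sketch below shows one common way to approximate monthly heating and cooling degree days from a monthly mean temperature. It is not taken from the pipeline: the function name, the base temperatures (15.5 °C for heating, 22.0 °C for cooling) and the monthly-mean approximation are all assumptions chosen for illustration, and the method used to build NEBULA may differ.

```python
import calendar


def monthly_degree_days(mean_temp_c, year, month, hdd_base_c=15.5, cdd_base_c=22.0):
    """Approximate (HDD, CDD) for one month from its mean temperature in Celsius.

    Assumption: every day in the month is treated as having the monthly mean
    temperature, which underestimates degree days when daily values straddle
    the base temperature. Base temperatures are illustrative defaults.
    """
    days = calendar.monthrange(year, month)[1]
    hdd = max(0.0, hdd_base_c - mean_temp_c) * days
    cdd = max(0.0, mean_temp_c - cdd_base_c) * days
    return hdd, cdd


if __name__ == "__main__":
    print(monthly_degree_days(5.5, 2022, 1))   # (310.0, 0.0)
    print(monthly_degree_days(24.0, 2022, 7))  # (0.0, 62.0)
```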

Notes

  • We batch the conversion of the building stock dataset into postcode attributes (themes: fuel, typology and age). This enables better logging and multithreading. The current setup processes each region separately and splits it into batches of 10k postcodes.
  • We provide two generation routes: local and HPC. Running locally takes an estimated 48 hours for one region. Multithreading can speed this up.
  • When running on HPC, we submit each theme / region / batch as a separate job. Using an 8 GB (3 CPU) job, each 10k batch takes approx. 1.5 hours for fuel and 20 minutes (rounded to 0.3 hours) for each of age and type. Total run time: (152 * 1.5) + (2 * 152 * 0.3) ≈ 319 hours.
  • Check overlapping_pcs.txt for postcode boundary issues
  • See global_avs/ for reference statistics
  • Intermediate files can be safely deleted after final dataset generation
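
The HPC run time estimate in the notes above can be reproduced with a few lines of arithmetic. This is a sketch under one assumption: the figure 152 is read as the total number of 10k batches, which the notes imply but do not state. The function name is hypothetical.

```python
def total_hpc_hours(n_batches, fuel_hours, other_hours, n_other_themes=2):
    """Total job hours: one fuel job plus n_other_themes jobs (age, type) per batch."""
    return n_batches * fuel_hours + n_other_themes * n_batches * other_hours


if __name__ == "__main__":
    # Figures from the notes, with 20 minutes rounded to 0.3 hours.
    print(round(total_hpc_hours(152, 1.5, 0.3), 1))      # 319.2
    # Using an exact 20 minutes (1/3 hour) gives a slightly larger total.
    print(round(total_hpc_hours(152, 1.5, 20 / 60), 1))  # 329.3
```

These are summed job hours, not wall-clock time; jobs that run concurrently on the cluster finish sooner.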
