# Accessing Data

The following code was used to generate the input data for the diffusion model. It was adapted from https://github.com/RupaKurinchiVendhan/WiSoSuper.

More information for accessing data from the WIND Toolkit and NSRDB can be found at the following resources:
1. WIND Toolkit: https://www.nrel.gov/grid/wind-toolkit.html
2. NSRDB: https://nsrdb.nrel.gov/
3. Stand up your own HSDS server: https://github.com/HDFGroup/hsds
4. Use the HDF Group's Kita Lab (a managed HSDS service on AWS with higher rate limits, available on a free-trial basis): https://www.hdfgroup.org/solutions/hdf-kita/
5. HSDS Wind Examples: https://github.com/NREL/hsds-examples/blob/master/notebooks/01_WTK_introduction.ipynb
6. HSDS Solar Examples: https://github.com/NREL/hsds-examples/blob/master/notebooks/03_NSRDB_introduction.ipynb

## Wind Data

Wind velocity data consist of the northerly and easterly wind components, denoted $v$ and $u$ respectively, calculated from the wind speed and direction at a height of 100 m. The WIND Toolkit has a spatial resolution of approximately 2 km $\times$ 2 km. The training data were sampled at a 4-hour temporal resolution, starting at 12 am on January 1, 2007.
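To illustrate how two components can be derived from speed and direction, here is a minimal sketch. It assumes the direction is given in degrees and that the angle is rotated by 90 degrees before taking the cosine and sine. The helper `wind_components` and its `offset_deg` parameter are hypothetical names introduced for this example, and the sketch is not necessarily identical to the extraction loop later in this notebook, which was used to generate the data.

```python
import numpy as np

def wind_components(speed, direction_deg, offset_deg=90.0):
    """Split speed/direction into two components (hypothetical helper).

    Assumption: direction is in degrees and the offset is applied in degrees
    before converting to radians.
    """
    theta = np.radians(np.asarray(direction_deg, dtype=float) + offset_deg)
    ua = speed * np.cos(theta)
    va = speed * np.sin(theta)
    return ua, va

# Two synthetic samples: 10 m/s at 0 degrees and 10 m/s at 90 degrees
speed = np.array([10.0, 10.0])
direction = np.array([0.0, 90.0])
ua, va = wind_components(speed, direction)

# The magnitude of the two components recovers the original speed
recovered_speed = np.hypot(ua, va)
```

Whatever offset is used, the two components are orthogonal, so `np.hypot(ua, va)` should reproduce the input speed up to floating-point error.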

In [2]:
%matplotlib inline
import h5pyd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random
import os

In [3]:
os.environ["HS_ENDPOINT"] = "https://developer.nrel.gov/api/hsds"
os.environ["HS_USERNAME"] = "None" 
os.environ["HS_PASSWORD"] = "None"
os.environ["HS_API_KEY"] =  "8ALikbk9fUHqWvrq5vcc9VRFy0wXLd5Sl4X5vwjY"

In [5]:
f = h5pyd.File("/nrel/wtk-us.h5", 'r', bucket="nrel-pds-hsds")

In [1]:
wind_timesteps = range(0, 61368, 4) # sample data in four hour intervals

In [6]:
dset_speed = f['windspeed_100m']
dset_dir = f['winddirection_100m']

In [7]:
hr_img_size = 64
lr_img_size = 16

## Code to extract only middle patch

### Sizes
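Note that the cropped wind field has size 1600x1600, which corresponds to a 25x25 grid of 64x64 patches. For each timestep, we extract only the middle patch (row 12, column 12 of that grid), which is of size 64x64. The low-res images are obtained by downsampling the 64x64 images (keeping every 4th pixel) and are of size 16x16.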
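The patch arithmetic can be checked on a synthetic array without downloading anything. The sketch below uses random data in place of a real wind field (an assumption made purely for illustration) and confirms that taking the middle 16x16 patch of the downsampled field gives the same result as keeping every 4th pixel of the middle 64x64 patch.

```python
import numpy as np

field_size, hr, lr = 1600, 64, 16
scale = hr // lr                 # downsampling factor: 4
n_per_side = field_size // hr    # number of 64x64 patches per side: 25
mid = n_per_side // 2            # index of the middle patch: 12

# Synthetic stand-in for one wind component (not real WIND Toolkit data)
rng = np.random.default_rng(0)
field = rng.random((field_size, field_size))

# Middle high-resolution patch
hr_patch = field[mid * hr:(mid + 1) * hr, mid * hr:(mid + 1) * hr]

# Downsample the whole field, then take the middle low-resolution patch
lr_field = field[::scale, ::scale]
lr_patch = lr_field[mid * lr:(mid + 1) * lr, mid * lr:(mid + 1) * lr]
```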

### Note on timesteps
For each timestep, there are two wind components: westward (`ua`) and southward (`va`). Each component is saved as a separate image. Overall, `range(0, 61368, 4)` yields 15,342 timesteps, which would give 30,684 images. However, because the data download was very slow and had already taken many hours, we downloaded only roughly the first 12k images.
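As a quick check of these counts, the sketch below computes the number of sampled timesteps and maps a timestep index to a timestamp. It assumes that each index is an hourly offset from January 1, 2007 at 12 am, which is consistent with sampling every 4th index for a 4-hour resolution but is not stated explicitly here. `timestep_to_datetime` is a hypothetical helper.

```python
from datetime import datetime, timedelta

START = datetime(2007, 1, 1, 0, 0)    # start of the record, as stated above
wind_timesteps = range(0, 61368, 4)   # every 4th index

def timestep_to_datetime(t):
    """Map a timestep index to a timestamp (assumes hourly indices)."""
    return START + timedelta(hours=t)

n_timesteps = len(wind_timesteps)     # number of sampled timesteps
n_images = 2 * n_timesteps            # one image per component (ua, va)
first = timestep_to_datetime(wind_timesteps[0])
last = timestep_to_datetime(wind_timesteps[-1])
```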

In [None]:
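# iterate over timesteps
for timestep in wind_timesteps:
    # read the full field for this timestep, then crop it to 1600x1600
    speed_HR = dset_speed[timestep,::,::]
    direction_HR = dset_dir[timestep,::,::]
    speed_HR = speed_HR[:1600,500:2100]
    direction_HR = direction_HR[:1600,500:2100]
    # NOTE: np.radians() treats its argument as degrees, so adding np.pi/2 here
    # shifts the angle by about 1.57 degrees, not 90 degrees. The expression is
    # kept unchanged because it is what generated the data.
    ua_HR = np.multiply(speed_HR, np.cos(np.radians(direction_HR+np.pi/2)))
    va_HR = np.multiply(speed_HR, np.sin(np.radians(direction_HR+np.pi/2)))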
    
    h_HR = hr_img_size
    w_HR = hr_img_size
    h_LR = lr_img_size
    w_LR = lr_img_size
    
    # downsample to LR image - take every 4th pixel
    ua_LR = ua_HR[::4, ::4]
    va_LR = va_HR[::4, ::4]
    
    n_patches = 1
    
    ua_wind_data_HR = np.zeros(shape=(n_patches, h_HR, w_HR))
    ua_wind_data_LR = np.zeros(shape=(n_patches, h_LR, w_LR))
    va_wind_data_HR = np.zeros(shape=(n_patches, h_HR, w_HR))
    va_wind_data_LR = np.zeros(shape=(n_patches, h_LR, w_LR))
    wind_data = np.zeros((n_patches, h_HR, h_HR, 2))
    
    # take middle patch
    idx = 0
    row = 12
    col = 12
    ua_wind_data_HR[idx] = ua_HR[(col*h_HR):(h_HR+col*h_HR), (row*w_HR):(w_HR+row*w_HR)]
    ua_wind_data_LR[idx] = ua_LR[(col*h_LR):(h_LR+col*h_LR), (row*w_LR):(w_LR+row*w_LR)]
    va_wind_data_HR[idx] = va_HR[(col*h_HR):(h_HR+col*h_HR), (row*w_HR):(w_HR+row*w_HR)]
    va_wind_data_LR[idx] = va_LR[(col*h_LR):(h_LR+col*h_LR), (row*w_LR):(w_LR+row*w_LR)]
    wind_data[idx] = np.dstack([ua_wind_data_HR[idx],va_wind_data_HR[idx]])

    # file names contain only the component and the timestep
    ua_filename = "ua_{timestep}.png".format(timestep=timestep)
    va_filename = "va_{timestep}.png".format(timestep=timestep)

    plt.imsave("train/wind/middle_patch/LR/"+ua_filename, ua_wind_data_LR[idx], origin='lower', format="png")
    plt.imsave("train/wind/middle_patch/HR/"+ua_filename, ua_wind_data_HR[idx], origin='lower', format="png")
    plt.imsave("train/wind/middle_patch/LR/"+va_filename, va_wind_data_LR[idx], origin='lower', format="png")
    plt.imsave("train/wind/middle_patch/HR/"+va_filename, va_wind_data_HR[idx], origin='lower', format="png")

Wind data file names are structured as `{component}_{timestep}.png`, for example `ua_0.png` and `va_0.png`. The patch index is not part of the name because only one patch is saved per timestep.
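The save calls in the loop write into `train/wind/middle_patch/LR/` and `train/wind/middle_patch/HR/`, and these directories need to exist before saving. A minimal sketch for preparing them, and for building file names in the `ua_{timestep}.png` pattern used by the loop, is given below. `make_output_dirs` and `wind_filename` are hypothetical helpers, and the example writes under a temporary directory rather than the working directory.

```python
import os
import tempfile

def make_output_dirs(base_dir):
    """Create the LR and HR output directories under base_dir (hypothetical helper)."""
    paths = {}
    for res in ("LR", "HR"):
        path = os.path.join(base_dir, "train", "wind", "middle_patch", res)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        paths[res] = path
    return paths

def wind_filename(component, timestep):
    """Build a file name such as 'ua_4.png' (hypothetical helper)."""
    return "{component}_{timestep}.png".format(component=component, timestep=timestep)

# Demonstration in a temporary directory; pass "." to use the working directory
demo_base = tempfile.mkdtemp()
demo_paths = make_output_dirs(demo_base)
example_name = wind_filename("ua", 4)
```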