## Using HSDS

The Highly Scalable Data Service (HSDS) is a cloud-optimized service that provides API access to .h5 files hosted on [AWS](https://registry.opendata.aws). The HSDS software was developed by the [HDF Group](https://www.hdfgroup.org/) and is hosted on Amazon Web Services (AWS) using a combination of EC2 (Elastic Compute Cloud) and S3 (Simple Storage Service). You can read more about the HSDS service [in this slide deck](https://www.slideshare.net/HDFEOS/hdf-cloud-services).


### Setting up HSDS

To get started, install the h5pyd library:

```bash
pip install h5pyd
```

Next, configure h5pyd by running ``hsconfigure`` from the command line, or by
creating a configuration file at ``~/.hscfg``:

```bash
hsconfigure

# Values to enter at the prompts (or to save in ~/.hscfg):
hs_endpoint = https://developer.nrel.gov/api/hsds
hs_username =
hs_password =
hs_api_key = 3K3JQbjZmWctY0xmIfSYvYgtIcM3CN0cb1Y2w9bf
```

**The example API key here is for demonstration and is rate-limited per IP. To
get your own API key, visit https://developer.nrel.gov/signup/**

**Please note that our HSDS service is for demonstration purposes only. If you
would like to use HSDS for production runs of reV, please set up your own
service (https://github.com/HDFGroup/hsds) and point it to our public HSDS
bucket: s3://nrel-pds-hsds**
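
Before opening any files, it can help to confirm that the configuration holds the values you expect. The sketch below parses text in the `key = value` layout shown above and reports any empty required settings. The helper `read_hscfg` and the placeholder `YOUR_API_KEY` are hypothetical names introduced for illustration; this is not how h5pyd itself reads the file.

```python
def read_hscfg(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


# Hypothetical file contents mirroring the keys shown above;
# YOUR_API_KEY is a placeholder, not a real key.
example = """
hs_endpoint = https://developer.nrel.gov/api/hsds
hs_username =
hs_password =
hs_api_key = YOUR_API_KEY
"""

cfg = read_hscfg(example)
# Assumption: only the endpoint and the API key need to be non-empty here
missing = [k for k in ("hs_endpoint", "hs_api_key") if not cfg.get(k)]
print("missing settings:", missing)
```

To check a real configuration, pass the text of ``~/.hscfg`` to `read_hscfg` instead of the example string.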


Clone the reV repository (https://github.com/NREL/reV.git) and point ``TESTDATADIR`` to the ``tests/data`` directory on your computer.

NOTE: In all of these examples, the ``sam_file`` input points to files in
the reV test directory [`TESTDATADIR`](https://github.com/NREL/reV/tree/master/tests/data) that may not be copied in your install. You may want to download the relevant SAM system configs from that directory and point the ``sam_file`` variable to the correct filepath on your computer.


Also install rex, the National Renewable Energy Laboratory (NREL) REsource eXtraction tool:

```bash
pip install NREL-rex
```

In [1]:
import rex
from rex import init_logger
import h5pyd
import pandas as pd
import logging
from scipy.spatial import cKDTree
import numpy as np

In [2]:
# Point this to the tests/data directory of your local clone of the reV GitHub repo.
TESTDATADIR = '/Users/lijjumathew/Code/SMU/reV/tests/data'

In [3]:
# Open the NSRDB 2019 file on the HSDS server in read-only mode
fp = '/nrel/nsrdb/v3/nsrdb_2019.h5'
f = h5pyd.File(fp, 'r')

In [4]:
list(f.attrs)

['version']

In [5]:
f.attrs['version'] 

'3.1.1'

In [6]:
list(f)

['air_temperature',
 'alpha',
 'aod',
 'asymmetry',
 'cld_opd_dcomp',
 'cld_reff_dcomp',
 'clearsky_dhi',
 'clearsky_dni',
 'clearsky_ghi',
 'cloud_press_acha',
 'cloud_type',
 'coordinates',
 'dew_point',
 'dhi',
 'dni',
 'fill_flag',
 'ghi',
 'meta',
 'ozone',
 'relative_humidity',
 'solar_zenith_angle',
 'ssa',
 'surface_albedo',
 'surface_pressure',
 'time_index',
 'total_precipitable_water',
 'wind_direction',
 'wind_speed']

### Nearest Time Series for a Given Lat/Lon

In [7]:
# Build a KD-tree over the (latitude, longitude) pair of every site
dset_coords = f['coordinates'][...]
tree = cKDTree(dset_coords)

def nearest_site(tree, lat_coord, lon_coord):
    """Return the index of the site closest to (lat_coord, lon_coord)."""
    lat_lon = np.array([lat_coord, lon_coord])
    dist, pos = tree.query(lat_lon)
    return pos

# Dallas is in the western hemisphere, so its longitude must be negative
DallasCity = (32.77, -96.79)
DallasCity_idx = nearest_site(tree, DallasCity[0], DallasCity[1])

print("Site index for Dallas City: \t\t {}".format(DallasCity_idx))
print("Coordinates of Dallas City: \t {}".format(DallasCity))
print("Coordinates of nearest point: \t {}".format(dset_coords[DallasCity_idx]))
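
The KD-tree query above measures Euclidean distance in degrees, which only approximates distance on the ground. As a sanity check on a lookup result, a great-circle distance can be computed with the haversine formula. The sketch below uses only the standard library; the function names, the mean Earth radius of 6371 km, and the candidate sites are assumptions made for illustration, not values read from the NSRDB.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_by_great_circle(coords, lat, lon):
    """Brute-force nearest neighbour; only suitable for small candidate sets."""
    return min(range(len(coords)),
               key=lambda i: haversine_km(lat, lon, *coords[i]))


# Hypothetical candidate sites as (lat, lon); not taken from the NSRDB
sites = [(32.80, -96.80), (29.76, -95.37), (51.17, 179.42)]
target = (32.77, -96.79)  # Dallas, with a negative (western) longitude

idx = nearest_by_great_circle(sites, *target)
print("nearest site:", idx)
print("distance (km):", round(haversine_km(*target, *sites[idx]), 1))
```

With these candidates, dropping the minus sign from the longitude moves the query into the eastern hemisphere, and the nearest candidate becomes the third site, thousands of kilometres from Dallas. A large distance between the requested and returned coordinates is therefore a useful warning sign.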


In [9]:
# Open the .h5 file (the same NSRDB 2019 file as above)
with h5pyd.File('/nrel/nsrdb/v3/nsrdb_2019.h5', mode='r') as f:
    # Extract time_index and convert to datetime
    # NOTE: time_index is saved as byte-strings and must be decoded
    time_index = pd.to_datetime(f['time_index'][...].astype(str))
    # Initialize DataFrame to store time-series data
    time_series = pd.DataFrame(index=time_index)
    # Extract variables needed to compute generation from SAM:
    variables = ['air_temperature', 'alpha', 'aod', 'asymmetry',
                 'cld_opd_dcomp', 'cld_reff_dcomp', 'clearsky_dhi',
                 'clearsky_dni', 'clearsky_ghi', 'cloud_press_acha',
                 'cloud_type', 'dew_point', 'dhi', 'dni', 'fill_flag',
                 'ghi', 'ozone', 'relative_humidity', 'solar_zenith_angle',
                 'ssa', 'surface_albedo', 'surface_pressure',
                 'total_precipitable_water', 'wind_direction', 'wind_speed']
    for var in variables:
        # Get dataset
        ds = f[var]
        # Read the full time series for the hardcoded site index 1244690
        site_data = ds[:, 1244690]
        # Datasets with a 'psm_scale_factor' attribute are divided by it
        if 'psm_scale_factor' in ds.attrs:
            site_data = site_data / ds.attrs['psm_scale_factor']
        time_series[var] = site_data

In [30]:
time_series
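
The extraction cell notes that `time_index` is stored as byte-strings and must be decoded before use. The sketch below shows that decoding step on its own, using only the standard library. The sample values and their exact format are hypothetical, as is the helper name `decode_time_index`; the timestamps in the real file may be formatted differently.

```python
from datetime import datetime


def decode_time_index(raw):
    """Decode byte-string timestamps into timezone-aware datetime objects."""
    return [datetime.fromisoformat(value.decode("utf-8")) for value in raw]


# Hypothetical sample of raw values, as they might come back from the file
raw_time_index = [
    b"2019-01-01 00:00:00+00:00",
    b"2019-01-01 00:30:00+00:00",
    b"2019-01-01 01:00:00+00:00",
]

decoded = decode_time_index(raw_time_index)
print(decoded[0].isoformat())
print("step:", decoded[1] - decoded[0])
```

In the notebook the same result comes from `.astype(str)` followed by `pd.to_datetime`, which handles the whole array at once.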

In [19]:
time_series.index.name = 'timestamp_solar'
time_series.to_csv('dallas_solar_energy_2019.csv')
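
The scale-factor handling in the extraction loop can be isolated into a small helper. The sketch below assumes a dataset's attributes behave like a dictionary and uses made-up raw values; `unscale` is a hypothetical name, and a plain `dict` stands in for the dataset's `attrs`.

```python
def unscale(raw_values, attrs, key="psm_scale_factor"):
    """Divide raw stored values by the scale factor, if the dataset has one."""
    scale = attrs.get(key)
    if scale is None:
        # No scale factor: values are returned unchanged
        return list(raw_values)
    return [value / scale for value in raw_values]


# Hypothetical raw values and attributes, for illustration only
scaled = unscale([215, 230, 198], {"psm_scale_factor": 10.0})
unscaled = unscale([0, 1, 4], {})

print(scaled)
print(unscaled)
```

Keeping this logic in one place makes it harder to forget the division for some variables and apply it for others.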