## Data Download

This notebook downloads the data used in this project: PV system data from PVDAQ, irradiance and weather data from the NSRDB, and cloud cover from ERA5.

The PVDAQ data archive is hosted in a public Amazon S3 bucket (`oedi-data-lake`) and can be accessed with packages such as `boto3`. Because the data sets are public, no API key or AWS credentials are needed. The data can also be browsed at https://openei.org/wiki/PVDAQ/PVData_Map and downloaded manually. Because the environment data file is large and standard Git storage limits file sizes, the raw file provided in this repository is a reduced version.
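
Because the bucket is public, its contents can be listed without credentials before anything is downloaded, which helps when judging file sizes. The following is a minimal sketch that assumes the `pvdaq/2023-solar-data-prize/<site>_OEDI/data/` layout used by the download cell in this notebook. `pvdaq_prefix` and `list_pvdaq_files` are hypothetical helper names, not part of any library. The sketch uses an unsigned `boto3` client and the `list_objects_v2` paginator:

```python
from typing import List, Tuple

BUCKET = "oedi-data-lake"


def pvdaq_prefix(site: str) -> str:
    """S3 key prefix for a 2023 Solar Data Prize site (layout assumed from this notebook)."""
    return f"pvdaq/2023-solar-data-prize/{site}_OEDI/data/"


def list_pvdaq_files(site: str, bucket: str = BUCKET) -> List[Tuple[str, int]]:
    """Return (key, size in bytes) for every object under the site's data prefix.

    boto3/botocore are imported here so that pvdaq_prefix() works without them.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned (anonymous) requests: the bucket is public, so no credentials are used
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")

    files = []
    for page in paginator.paginate(Bucket=bucket, Prefix=pvdaq_prefix(site)):
        # "Contents" is missing from a page when nothing matches the prefix
        for obj in page.get("Contents", []):
            files.append((obj["Key"], obj["Size"]))
    return files
```

For example, `for key, size in list_pvdaq_files("9068"): print(f"{size / 1e6:.1f} MB  {key}")` prints each file with its size. An unsigned client config and registering `disable_signing` on a resource are two ways of sending the same anonymous requests.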

NSRDB data can be selected in the data viewer at https://nsrdb.nrel.gov/data-viewer, which requires an email address to which the chosen data are sent. This notebook also includes a script, based on code provided by NSRDB, that requests the data through the NREL developer API; this additionally requires an API key.
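
For a single point, the API can also return CSV directly from a GET request, which is what the CSV branch of the NSRDB script relies on. The sketch below builds such a URL. It assumes the `.csv` variant of the endpoint used in the script, and it reads the API key and email from environment variables so they are not hard-coded in the notebook; `NREL_API_KEY`, `NREL_EMAIL`, and `nsrdb_csv_url` are hypothetical names chosen here.

```python
import os
from typing import Sequence
from urllib.parse import urlencode

# Assumed: the .csv variant of the GOES CONUS v4 download endpoint
NSRDB_CSV_URL = (
    "https://developer.nrel.gov/api/nsrdb/v2/solar/"
    "nsrdb-GOES-conus-v4-0-0-download.csv?"
)


def nsrdb_csv_url(
    location_id: str, year: str, attributes: Sequence[str], interval: str = "5"
) -> str:
    """Build a GET URL for a single-point NSRDB CSV download.

    Credentials come from the hypothetical NREL_API_KEY / NREL_EMAIL
    environment variables; a KeyError is raised if either is unset.
    """
    params = {
        "api_key": os.environ["NREL_API_KEY"],
        "email": os.environ["NREL_EMAIL"],
        "location_ids": location_id,
        "names": year,
        "interval": interval,  # minutes between records
        "attributes": ",".join(attributes),
    }
    # urlencode percent-escapes values, e.g. "," -> "%2C" and "@" -> "%40"
    return NSRDB_CSV_URL + urlencode(params)
```

The resulting URL can be passed to `pd.read_csv`, as the script's CSV branch does with its own URL.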

The .grib file comes from the ERA5 single-levels reanalysis dataset at https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=download. It can be downloaded through the Copernicus Climate Data Store (CDS) API or requested manually on that page. Because ERA5 requests can become very large, cloud cover was chosen as the primary variable to collect.
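
The following is a minimal sketch of a scripted request. It assumes the `cdsapi` package, a CDS account whose URL and key are stored in `~/.cdsapirc`, and the current CDS request format (the `data_format`/`download_format` keys). `era5_cloud_cover_request` and `download_era5` are hypothetical helpers. Restricting the request to one variable, one year, and a small bounding box (`area`, ordered north, west, south, east) keeps its size manageable.

```python
from typing import Any, Dict, List

DATASET = "reanalysis-era5-single-levels"


def era5_cloud_cover_request(
    year: int, months: List[int], north: float, west: float, south: float, east: float
) -> Dict[str, Any]:
    """Build a CDS request for hourly ERA5 total cloud cover over a small box.

    Field names assume the current CDS request format.
    """
    return {
        "product_type": ["reanalysis"],
        "variable": ["total_cloud_cover"],
        "year": [str(year)],
        "month": [f"{m:02d}" for m in months],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "data_format": "grib",
        "download_format": "unarchived",
        # Bounding box order used by CDS: north, west, south, east
        "area": [north, west, south, east],
    }


def download_era5(request: Dict[str, Any], target: str) -> None:
    """Submit the request and save the result to `target` (needs network access)."""
    import cdsapi  # imported here so the request builder works without cdsapi installed

    client = cdsapi.Client()  # reads the CDS URL and key from ~/.cdsapirc
    client.retrieve(DATASET, request, target)
```

For example, `download_era5(era5_cloud_cover_request(2024, list(range(1, 13)), north, west, south, east), "era5_cloud_cover_2024.grib")` requests a full year, with the bounding box set around the PV site.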

```python
# Import packages and select the PVDAQ site ID and download directory
import os

import boto3
from botocore.handlers import disable_signing

site = "9068"
path = f"./Data_{site}"  # download directory for this site
```

```python
# Anonymous (unsigned) access: the OEDI bucket is public
s3 = boto3.resource("s3")
s3.meta.client.meta.events.register("choose-signer.s3.*", disable_signing)
bucket = s3.Bucket("oedi-data-lake")

# List every file in the site's data folder
target_dir = site + "_OEDI"
prefix = "pvdaq/2023-solar-data-prize/" + target_dir + "/data/"
objects = bucket.objects.filter(Prefix=prefix)
year = "2024"

# Make sure the download directory exists before writing to it
os.makedirs(path, exist_ok=True)

# Download the files whose names contain the chosen year
for obj in objects:
    filename = obj.key[len(prefix):]  # key relative to the data folder
    if year in filename:
        bucket.download_file(obj.key, os.path.join(path, os.path.basename(obj.key)))
        print("Downloaded", filename)
```

```
Downloaded 9068_irradiance_data_20240101_20250430.csv
```
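
The download loop keeps any file whose name contains the string `"2024"`. That test also matches names that merely contain those digits, and it misses files whose date range spans 2024 without starting or ending in it. The stricter sketch below assumes file names end in `_<YYYYMMDD>_<YYYYMMDD>.csv`, as in the output above. `file_date_range` and `covers_year` are hypothetical helpers:

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed naming pattern, e.g. 9068_irradiance_data_20240101_20250430.csv
_RANGE = re.compile(r"_(\d{8})_(\d{8})\.csv$")


def file_date_range(filename: str) -> Optional[Tuple[date, date]]:
    """Return the (start, end) dates encoded in the file name, or None if absent."""
    m = _RANGE.search(filename)
    if m is None:
        return None
    start, end = (date(int(s[:4]), int(s[4:6]), int(s[6:])) for s in m.groups())
    return start, end


def covers_year(filename: str, year: int) -> bool:
    """True if the file's date range overlaps any part of `year`."""
    rng = file_date_range(filename)
    if rng is None:
        return False
    start, end = rng
    return start <= date(year, 12, 31) and end >= date(year, 1, 1)
```

Inside the loop, the substring test would become `covers_year(os.path.basename(obj.key), 2024)`.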


```python
# Python code adapted from the example provided by NSRDB

import time
import urllib.parse

import pandas as pd
import requests

API_KEY = "{{YOUR_API_KEY}}"
EMAIL = "insert.your.email@fake.com"
BASE_URL = "https://developer.nrel.gov/api/nsrdb/v2/solar/nsrdb-GOES-conus-v4-0-0-download.json?"
POINTS = [
    '1770199'
]


def main():
    input_data = {
        'attributes': 'air_temperature,alpha,aod,asymmetry,clearsky_dhi,clearsky_dni,clearsky_ghi,cloud_fill_flag,cloud_type,dew_point,dhi,dni,fill_flag,ghi,ozone,relative_humidity,solar_zenith_angle,ssa,surface_albedo,surface_pressure,total_precipitable_water,wind_direction,wind_speed',
        'interval': '5',  # minutes between records
        'api_key': API_KEY,
        'email': EMAIL,
    }
    for name in ['2024', '2023']:  # data years to request
        print(f"Processing name: {name}")
        for i, location_ids in enumerate(POINTS):
            input_data['names'] = [name]
            input_data['location_ids'] = location_ids
            print(f'Making request for point group {i + 1} of {len(POINTS)}...')

            if '.csv' in BASE_URL:
                # Note: CSV format is only supported for single point requests
                url = BASE_URL + urllib.parse.urlencode(input_data, doseq=True)
                # Consider appending each result to a larger DataFrame
                data = pd.read_csv(url)
                print(f'Response data (you should replace this print statement with your processing): {data}')
                # You can use the following code to write it to a file
                # data.to_csv('SingleBigDataPoint.csv')
            else:
                headers = {
                    'x-api-key': API_KEY
                }
                data = get_response_json_and_handle_errors(requests.post(BASE_URL, input_data, headers=headers))
                download_url = data['outputs']['downloadUrl']
                # The file is generated asynchronously, so the URL only works once processing has finished
                print(data['outputs']['message'])
                print(f"Data can be downloaded from this url when ready: {download_url}")

                # Delay for 1 second to prevent rate limiting
                time.sleep(1)
            print('Processed')


def get_response_json_and_handle_errors(response: requests.Response) -> dict:
    """Check the response for errors and return its parsed JSON body.

    Raises an exception instead of calling exit(), so errors surface as a
    traceback in the notebook.

    Parameters
    ----------
    response : requests.Response
        The response object

    Returns
    -------
    dict
        The resulting json
    """
    if response.status_code != 200:
        raise RuntimeError(
            "An error has occurred with the server or the request. "
            f"The request response code/status: {response.status_code} {response.reason}\n"
            f"The response body: {response.text}"
        )

    try:
        response_json = response.json()
    except ValueError:
        raise RuntimeError(f"The response couldn't be parsed as JSON, likely an issue with the server, here is the text: {response.text}")

    if len(response_json['errors']) > 0:
        errors = '\n'.join(response_json['errors'])
        raise RuntimeError(f"The request errored out, here are the errors: {errors}")
    return response_json


if __name__ == "__main__":
    main()
```
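
The JSON branch above only returns a link, and the data are fetched later from it. The sketch below assumes the downloaded CSV follows the usual NSRDB layout: the first row holds metadata field names, the second row holds their values, and the third row is the column header for the time series. Under that assumption, a file can be split into site metadata and records with the standard library. `read_nsrdb_csv` is a hypothetical helper:

```python
import csv
import io
from typing import Dict, List, Tuple


def read_nsrdb_csv(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split NSRDB CSV text into (site metadata, time-series rows).

    Assumes row 1 = metadata names, row 2 = metadata values,
    row 3 = column header, remaining rows = data.
    """
    rows = list(csv.reader(io.StringIO(text)))
    meta = dict(zip(rows[0], rows[1]))
    header = rows[2]
    # Skip any blank lines; values stay as strings for the caller to convert
    data = [dict(zip(header, r)) for r in rows[3:] if r]
    return meta, data
```

With pandas, the same split is typically `pd.read_csv(path, nrows=1)` for the metadata and `pd.read_csv(path, skiprows=2)` for the time series.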

<a id='step3'></a>