The NOAA HRRR is a real-time 3km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation.

This notebook provides an example of accessing HRRR data, including (1) finding the data file corresponding to a date and time, (2) retrieving a portion of that file from blob storage which includes the surface temperature variable, (3) opening the file using the xarray library, and (4) rendering an image of the forecast.

This dataset is documented at http://aka.ms/ai4edata-hrrr.
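Steps (1) and (2) above can be sketched without any network access. The snippet below is a minimal illustration, not the code used by `climatedata_functions`: it assumes the public AWS bucket layout `hrrr.YYYYMMDD/conus/hrrr.tHHz.wrfsfcfFF.grib2` and the usual `.idx` sidecar format (`message:byte_offset:d=date:variable:level:forecast:`). The function names, the base URL default and the byte offsets in the sample text are illustrative assumptions.

```python
from datetime import date


def hrrr_url(day, cycle_hour, forecast_hour=0,
             base="https://noaa-hrrr-bdp-pds.s3.amazonaws.com"):
    """Build the URL of one HRRR surface file (bucket layout is an assumption)."""
    return (f"{base}/hrrr.{day:%Y%m%d}/conus/"
            f"hrrr.t{cycle_hour:02d}z.wrfsfcf{forecast_hour:02d}.grib2")


def find_byte_range(idx_text, param_layer):
    """Return (start, end) byte offsets of the first message matching param_layer.

    `end` is None when the match is the last message in the index,
    meaning "read to the end of the file".
    """
    lines = idx_text.strip().splitlines()
    for i, line in enumerate(lines):
        if param_layer in line:
            start = int(line.split(":")[1])
            if i + 1 < len(lines):
                end = int(lines[i + 1].split(":")[1]) - 1
            else:
                end = None
            return start, end
    raise KeyError(f"{param_layer!r} not found in index")


# Made-up index text, only to show the format; real offsets differ.
SAMPLE_IDX = """\
1:0:d=2018051418:REFC:entire atmosphere:anl:
2:100:d=2018051418:TMP:surface:anl:
3:200:d=2018051418:WIND:10 m above ground:0-0 day max fcst:
4:350:d=2018051418:DSWRF:surface:anl:
"""

print(hrrr_url(date(2018, 5, 14), 18))
print(find_byte_range(SAMPLE_IDX, ":WIND:10 m above ground"))
```

The returned offsets could then be sent as an HTTP `Range` header so that only the requested variable is downloaded instead of the whole file.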

In [4]:
import climatedata_functions as climf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
metadata = pd.read_csv("../data/metadata.csv")
ds = climf.get_ds()  # only works with recent dates

### Make metadata file with grid points

In the next cell the grid points are calculated. This needs to be executed only once: the result is saved to a file, which is simply loaded when the notebook is run another time. (Note: takes 6-8 minutes)

In [3]:
# transform longitude coordinates
metadata['longitude_trans'] = metadata['longitude'] + 360
# add columns for grid points
metadata['x_grid'] = ''
metadata['y_grid'] = ''

# make new metadata file and save as csv
#metadata_new = climf.save_grids(metadata, ds)  # (takes approx. 6 minutes)
#metadata_new.to_csv('../data/metadata_grids.csv')
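The helper `climf.save_grids` is not shown in this notebook. As a rough sketch of what such a lookup might do, the function below finds the grid cell nearest to a site, assuming 2-D latitude and longitude arrays with longitudes in the 0-360 range (which would explain the `+360` shift applied to the western longitudes above). The function name and the synthetic grid are hypothetical, and the distance is computed in plain degrees, which is only an approximation.

```python
import numpy as np


def nearest_grid_point(lat2d, lon2d, lat, lon):
    """Return (row, col) of the grid cell closest to (lat, lon).

    Longitudes are wrapped to 0-360 to match the grid convention
    assumed here. Distance is squared degrees, not a geodesic.
    """
    lon = lon % 360
    dist2 = (lat2d - lat) ** 2 + (lon2d - lon) ** 2
    return np.unravel_index(np.argmin(dist2), dist2.shape)


# Small synthetic grid for illustration only.
lats = np.arange(30.0, 35.0)    # 30, 31, ..., 34
lons = np.arange(250.0, 255.0)  # 250, 251, ..., 254
lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")

row, col = nearest_grid_point(lat2d, lon2d, 32.1, -107.2)
print(int(row), int(col))
```

Which of the two indices ends up in `x_grid` and which in `y_grid` depends on how the helper orders the array axes, so that mapping should be checked against the actual data.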

In [None]:
#load new metadata file that includes the gridpoints
metadata = pd.read_csv("../data/metadata_grids.csv")
metadata.head(3)

### Get temperatures for gridpoints

Make a function that gets the mean/median temperature for one date and the previous x days, for all the places sampled on that date at once (to save time).
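As a small sketch of the mean/median step, the function below summarises one window of daily values while ignoring days without data. The function name is hypothetical, and it assumes missing days are stored as `NaN`.

```python
import numpy as np


def summarize_window(values):
    """Mean and median of a window of daily values, ignoring NaN entries."""
    arr = np.asarray(values, dtype=float)
    if np.isnan(arr).all():
        # No usable day in the window: report NaN instead of warning.
        return {"mean": float("nan"), "median": float("nan")}
    return {"mean": float(np.nanmean(arr)), "median": float(np.nanmedian(arr))}


print(summarize_window([280.0, float("nan"), 284.0, 282.0]))
```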

In [8]:
metadata = pd.read_csv("../data/metadata_temp_12_01_06_complete_without_a.csv")

**Data list:**

**Radiation**
* DSWRF:surface (Downward Short-Wave Radiation Flux [W/m^2])
* DLWRF:surface (Downward Long-Wave Radiation Flux [W/m^2])

**Wind**
* WIND:10 m above ground (0-0 day max f: Wind Speed [m/s])
* UGRD:10 m above ground (analysis: U-Component of Wind [m/s]; also available as 0-0 day max)
* VGRD:10 m above ground (analysis: V-Component of Wind [m/s]; also available as 0-0 day max)

**Temperature**
* Maybe think about using a different time (UTC vs. the local time at a specific place)

We want to get the model output for hours 6, 12, 18, and 24 (UTC).
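The UTC versus local time question can be explored with a rough calculation. The sketch below is an assumption-laden approximation: it uses local solar time (15 degrees of longitude per hour) rather than real time zones, and both function names are hypothetical. It maps a local hour at a site to UTC and then picks the closest of the hours listed above.

```python
def utc_hour_for_local_solar_hour(local_hour, longitude_deg):
    """Approximate UTC hour for a local solar hour (15 degrees per hour).

    Western longitudes are negative, so they shift the result later in UTC.
    """
    return (local_hour - longitude_deg / 15.0) % 24


def nearest_available_hour(utc_hour, available=(6, 12, 18, 24)):
    """Pick the closest available hour, measuring distance around the clock."""
    def circular_distance(h):
        d = abs(h - utc_hour) % 24
        return min(d, 24 - d)
    return min(available, key=circular_distance)


utc = utc_hour_for_local_solar_hour(12, -105.0)  # local noon at 105 degrees W
print(utc, nearest_available_hour(utc))
```

For exact results a time zone database would be needed, since legal time zones and daylight saving time do not follow longitude precisely.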

In [16]:
# define download parameters
days = 7   # how many days to go back
hour = 18  # hour of the day (UTC!); we want to test 6, 12, 18, 24
# GRIB2 variable and level to download; other options: ':TMP:surface' (surface temperature), ':DSWRF:surface'
# available parameters and layers: https://www.nco.ncep.noaa.gov/pmb/products/hrrr/hrrr.t00z.wrfsfcf00.grib2.shtml
param_layer = ':WIND:10 m above ground'
# short name used for the metadata column and in the grib2 data:
# 't' for temperature, 'si10' for wind at 10 m above ground, 'dswrf' for DSWRF
forecast_param = 'si10'
# dswrf = Downward Short-Wave Radiation Flux [W/m^2] https://www.goes-r.gov/products/baseline-DSR.html
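Because `param_layer` and `forecast_param` have to be changed together, one option is to keep the pairs in a single lookup table. The sketch below only uses the three pairs mentioned in the comments above. The dictionary keys and the function name are hypothetical.

```python
# Pairs of (GRIB2 search string, short name), taken from the comments above.
PARAMETERS = {
    "temperature": (":TMP:surface", "t"),
    "wind": (":WIND:10 m above ground", "si10"),
    "shortwave": (":DSWRF:surface", "dswrf"),
}


def select_parameter(name, hour):
    """Return (param_layer, forecast_param, column_name) for one download run."""
    param_layer, forecast_param = PARAMETERS[name]
    return param_layer, forecast_param, f"{forecast_param}_{hour}"


print(select_parameter("wind", 18))
```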

In [17]:
#from tqdm import tqdm
#pbar.set_description(f"Processing {start_date} --> {temp_list[0][0]}")

from datetime import timedelta
import warnings
warnings.simplefilter(action='ignore')

# dates that have already been processed
done_list = []

column = forecast_param + '_' + str(hour)
metadata[column] = ''  # new (object-dtype) column to store the downloaded values

# download the data, one pass per unique sampling date
for idx in metadata.index:
    start_date = metadata.date[idx]
    if start_date in done_list:  # data for this date was already downloaded
        continue
    # indexes of all rows sampled on this date
    date_index_list = metadata.index[metadata.date == start_date]
    # one list of `days` values per row
    temp_list = [[0] * days for _ in range(len(date_index_list))]
    done_list.append(start_date)
    start_date = climf.get_start_date(start_date)  # convert to a date/time object
    print(start_date)
    for x in range(days):
        # x = x*2 (to take only every second day)
        day_date = start_date - timedelta(days=x)
        # array of the requested parameter for this date and hour
        ds, stop = climf.get_ds_aws_array(day_date, hour, param_layer, forecast_param)
        for count, index in enumerate(date_index_list):
            if stop:  # the helper flagged this date; store NaN
                temp_list[count][x] = np.nan
            else:
                temp_list[count][x] = ds[metadata.x_grid[index]][metadata.y_grid[index]]
    # all days collected: write one list of values per row into the metadata
    for i, index in enumerate(date_index_list):
        metadata.at[index, column] = temp_list[i]

2018-05-14
2016-08-31
2020-11-19
2016-08-24
2019-07-23
2021-08-23
2017-11-15
2020-06-10
2014-08-12
2018-06-27
2013-11-06
2015-08-24
2014-11-01
2021-10-18
2015-08-26
2019-08-26
2018-01-08
2015-08-18
2017-07-28
2015-07-07
2018-06-13
2018-01-09
2013-07-09
2020-08-13
2013-08-17
2018-08-13
2017-03-09
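The loop above scans every row and skips dates it has already seen. A more compact way to express the same idea is to group the rows by date, so that each day's array is loaded once and shared by all sites sampled on that date. The sketch below is an illustration under stated assumptions: `collect_history` and the `load_field` callback are hypothetical stand-ins for `climf.get_ds_aws_array`, and `load_field` is assumed to return a 2-D array or `None` when no data is available.

```python
from datetime import timedelta

import numpy as np
import pandas as pd


def collect_history(metadata, load_field, days, column):
    """Return a copy of metadata with a list of `days` values per row.

    load_field(day) is a hypothetical callback returning a 2-D array,
    or None when the data for that day could not be loaded.
    """
    results = {}
    for date_str, group in metadata.groupby("date"):
        start = pd.to_datetime(date_str)
        # Load each day once and reuse it for every site in the group.
        fields = [load_field(start - timedelta(days=d)) for d in range(days)]
        for row in group.itertuples():
            results[row.Index] = [
                np.nan if f is None else float(f[row.x_grid, row.y_grid])
                for f in fields
            ]
    out = metadata.copy()
    out[column] = pd.Series(results, dtype=object)
    return out


def demo_loader(day):
    """Stand-in loader: a constant field, with one day missing."""
    if day.day == 13:
        return None
    return np.full((3, 3), float(day.day))


demo = pd.DataFrame({
    "date": ["2018-05-14", "2018-05-02", "2018-05-14"],
    "x_grid": [0, 1, 2],
    "y_grid": [2, 1, 0],
})
print(collect_history(demo, demo_loader, days=3, column="si10_18"))
```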


In [15]:
metadata.to_csv(f'../data/metadata_{forecast_param}_{str(hour)}_complete.csv', index=False)
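Each cell of the new column holds a list, which a CSV file can only store as text. One possible alternative, not what this notebook does, is to spread the list over one column per day before saving. The function name and the `_d0`, `_d1`, ... suffixes in the sketch below are hypothetical.

```python
import pandas as pd


def expand_list_column(df, column):
    """Replace a column of equal-length lists with one column per element."""
    wide = pd.DataFrame(df[column].tolist(), index=df.index)
    wide.columns = [f"{column}_d{i}" for i in range(wide.shape[1])]
    return df.drop(columns=column).join(wide)


demo = pd.DataFrame({
    "date": ["2018-05-14", "2016-08-31"],
    "si10_18": [[1.0, 2.0], [3.0, float("nan")]],
})
print(expand_list_column(demo, "si10_18"))
```

Here `_d0` would be the sampling date itself and `_d1` the day before, matching the order in which the loop fills the lists.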

If the download was interrupted, the code in the next cell can be used to continue it. Since the dates already downloaded are saved in `done_list`, the download can resume from there without starting from scratch.

In [None]:
#done_list.pop()  # remove the last element (in case that date did not finish)
#len(done_list)  # how many dates are already done (1637 unique dates in total)
done_list
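In the download loop a date is added to `done_list` before its data has been fetched, which is why the last entry may need to be popped after an interruption. A sketch of an alternative is shown below: a date is marked as done only after its processing has finished, so a rerun picks up the interrupted date automatically. The names `process_pending` and `process_one` are hypothetical.

```python
def process_pending(dates, done, process_one):
    """Process each date not yet in `done`, marking it only after success."""
    for d in dates:
        if d in done:
            continue
        process_one(d)  # may raise if the download is interrupted
        done.add(d)     # reached only when process_one finished
    return done


# Demo: the second date fails on the first attempt only.
attempts = []


def flaky(d):
    attempts.append(d)
    if d == "2016-08-31" and attempts.count(d) == 1:
        raise RuntimeError("interrupted")


done = set()
dates = ["2018-05-14", "2016-08-31", "2018-05-14", "2020-11-19"]
try:
    process_pending(dates, done, flaky)
except RuntimeError:
    pass
print(sorted(done))  # the interrupted date is not marked as done

process_pending(dates, done, flaky)
print(sorted(done))
```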

In [None]:
from datetime import timedelta
import warnings
warnings.simplefilter(action='ignore')

temp_list = []
# done_list is deliberately not reset: we want to continue from the old done_list
days = 7
hour = 12  # hour of the day (UTC)

column = forecast_param + '_' + str(hour)
if column not in metadata.columns:
    metadata[column] = ''  # create the column if it does not exist yet

for idx in metadata.index:
    start_date = metadata.date[idx]
    if start_date in done_list:  # this date was already processed
        continue
    # indexes of all rows sampled on this date
    date_index_list = metadata.index[metadata.date == start_date]
    # one list of `days` values per row
    temp_list = [[0] * days for _ in range(len(date_index_list))]
    done_list.append(start_date)
    start_date = climf.get_start_date(start_date)  # convert to a date/time object
    print(start_date)
    for x in range(days):
        day_date = start_date - timedelta(days=x)
        # array of the requested parameter for this date and hour
        ds, stop = climf.get_ds_aws_array(day_date, hour, param_layer, forecast_param)
        for count, index in enumerate(date_index_list):
            if stop:  # the helper flagged this date; store NaN
                temp_list[count][x] = np.nan
            else:
                temp_list[count][x] = ds[metadata.x_grid[index]][metadata.y_grid[index]]
    # all days collected: write one list of values per row into the metadata
    for i, index in enumerate(date_index_list):
        metadata.at[index, column] = temp_list[i]

In [None]:
#metadata.to_csv(f'../data/metadata_{forecast_param}_{str(hour)}_complete.csv', index=False)