# Pre-processing the data
This notebook demonstrates transforming the data extracted from MASS into a tabular dataset that is (almost) ready for use in machine learning. It does this for a single forecast reference time and a single realisation, which makes development easier and lets the notebook run on a smaller compute instance. A separate notebook will then be created to do the actual "batch" processing of the data.

We are using data from around Storm Dennis ([Met Office](https://www.metoffice.gov.uk/weather/warnings-and-advice/uk-storm-centre/storm-dennis), [Wikipedia](https://en.wikipedia.org/wiki/Storm_Dennis)).

The key steps in this process are as follows:
* Prepare radar data
  * Load in radar data files extracted from MASS and aggregated into one file per day of data
  * Accumulate into 3 hour accumulations to match the 3 hour frequency of the model data
  * Load a sample MOGREPS-G UK cutout grid to use as a regridding target
  * Regrid radar data
  * Transform into tabular data
* Prepare MOGREPS-G data
  * Load in data extracted from MASS for the forecast reference times and lead time of interest (in this case around Storm Dennis, 15/16 Feb 2020)
  * Extract UK data
  * For each forecast reference time, load in all single level variables and transform to tabular form (using `xarray.Dataset.to_dataframe`)
  * For each forecast reference time, load in variables on height levels and transform to a data frame
  * The xarray function by default puts variables on different heights on different rows, whereas we want all levels of a variable for a particular time/lat/lon/realization to be on the same row as separate features/columns. Transform by selecting the different heights, renaming variables to include name and height, and merging together.
  * Merge single level and height level variables
  * Concatenate different times into a single data frame and save to disk.

In [2]:
import pathlib
import datetime
import functools
import os

In [3]:
import numpy

In [4]:
import pandas

In [5]:
import xarray
import iris
import iris.quickplot
import iris.coord_categorisation

In [6]:
import matplotlib.pyplot

# Set parameters for notebook
Set the paths and lists of things to process

In [7]:
project_name = 'precip_rediagnosis'
mogreps_g_name = 'mogreps-g'
ilab_project_dir = pathlib.Path('/project/informatics_lab/')
output_dir = pathlib.Path('/scratch') / os.environ['USER'] / project_name

In [8]:
root_data_dir = ilab_project_dir / project_name
mogreps_g_data_dir = root_data_dir / mogreps_g_name
radar_data_dir = root_data_dir / 'radar'

In [9]:
output_fname_template = 'prd_{lt:03d}H_{vt.year:04d}{vt.month:02d}{vt.day:02d}T{vt.hour:02d}{vt.minute:02d}Z.csv'

In [10]:
variables_single_level = [
    "cloud_amount_of_total_cloud",
    "rainfall_accumulation-PT03H",
    "snowfall_accumulation-PT03H",
    "rainfall_rate",
    "snowfall_rate",
    "height_of_orography",
    "pressure_at_mean_sea_level",
]

variables_height_levels = [
    "cloud_amount_on_height_levels",
    "pressure_on_height_levels",
    "temperature_on_height_levels",
    "relative_humidity_on_height_levels",
    "wind_direction_on_height_levels",
    "wind_speed_on_height_levels",
    
]

In [11]:
num_periods = 10
start_ref_time = datetime.datetime(2020,2,14,12)
forecast_ref_time_range = [start_ref_time + datetime.timedelta(hours=6)*i1 for i1 in range(num_periods)]
leadtime_hours = 15
realizations_list = list(range(35))

In [12]:
dataset = 'mogreps-g'
subset = 'lev1'
forecast_ref_template = '{frt.year:04d}{frt.month:02d}{frt.day:02d}T{frt.hour:02d}00Z.nc.file'
fname_template = '{vt.year:04d}{vt.month:02d}{vt.day:02d}T{vt.hour:02d}00Z-PT{lead_time:04d}H00M-{var_name}.nc'

In [13]:
variables_to_extract = variables_height_levels + variables_single_level

In [14]:
path_lists_vars = {
    var_name: [f1 for f1 in mogreps_g_data_dir.iterdir() if var_name in str(f1)]
    for var_name in variables_to_extract
}


In [15]:
uk_bounds={'latitude':(50,58), 'longitude': (-6,2)}
xarray_select_uk = {k1: slice(*v1) for k1,v1 in uk_bounds.items()}

## Create a dataset from MOGREPS-G data
Information on Met Office ensemble forecasts - https://www.metoffice.gov.uk/research/weather/ensemble-forecasting#
Paper - https://www.metoffice.gov.uk/research/weather/ensemble-forecasting 

In [15]:
fcst_ref_time = forecast_ref_time_range[0]
real1 = realizations_list[10]
validity_time = fcst_ref_time + datetime.timedelta(hours=leadtime_hours)

In [16]:
validity_time

datetime.datetime(2020, 2, 15, 3, 0)

The file names do not match the variable names within the files, so we need to create a mapping to work with.
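
As a concrete illustration of the file naming, the sketch below expands the file name template from the parameters section for the example validity time and lead time. It only formats a string and reads no files; the names `example_validity_time` and `example_fname` are introduced here purely for illustration.

```python
import datetime

# Template copied from the parameters section of this notebook.
fname_template = '{vt.year:04d}{vt.month:02d}{vt.day:02d}T{vt.hour:02d}00Z-PT{lead_time:04d}H00M-{var_name}.nc'

# Validity time = first forecast reference time (2020-02-14 12Z) + 15 hours.
example_validity_time = datetime.datetime(2020, 2, 15, 3)

example_fname = fname_template.format(
    vt=example_validity_time,
    lead_time=15,
    var_name='rainfall_accumulation-PT03H',
)
print(example_fname)
# 20200215T0300Z-PT0015H00M-rainfall_accumulation-PT03H.nc
```

The file name carries the name `rainfall_accumulation-PT03H`, while the variable loaded from it is named differently, which is why the mapping is built.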

In [17]:
%%time
# load a cube for each variable in iris to get the actual variable name, and populate dictionary mapping from the var name in the file name to the variable as loaded into iris/xarray
file_to_var_mapping = {
    var_file_name: iris.load_cube(str(mogreps_g_data_dir / fname_template.format(vt=validity_time,
                                                                                 lead_time=leadtime_hours,
                                                                                 var_name=var_file_name))).name()
    for var_file_name in variables_single_level + variables_height_levels}
file_to_var_mapping

CPU times: user 739 ms, sys: 70 ms, total: 809 ms
Wall time: 1.77 s


{'cloud_amount_of_total_cloud': 'cloud_area_fraction',
 'rainfall_accumulation-PT03H': 'thickness_of_rainfall_amount',
 'snowfall_accumulation-PT03H': 'lwe_thickness_of_snowfall_amount',
 'rainfall_rate': 'rainfall_rate',
 'snowfall_rate': 'lwe_snowfall_rate',
 'height_of_orography': 'surface_altitude',
 'pressure_at_mean_sea_level': 'air_pressure_at_sea_level',
 'cloud_amount_on_height_levels': 'cloud_volume_fraction_in_atmosphere_layer',
 'pressure_on_height_levels': 'air_pressure',
 'temperature_on_height_levels': 'air_temperature',
 'relative_humidity_on_height_levels': 'relative_humidity',
 'wind_direction_on_height_levels': 'wind_from_direction',
 'wind_speed_on_height_levels': 'wind_speed'}

In [18]:
single_level_var_mappings = {v1: file_to_var_mapping[v1] for v1 in variables_single_level}
height_level_var_mappings = {v1: file_to_var_mapping[v1] for v1 in variables_height_levels}

In [19]:
for fcst_ref_time in forecast_ref_time_range:
    print(fcst_ref_time)

2020-02-14 12:00:00
2020-02-14 18:00:00
2020-02-15 00:00:00
2020-02-15 06:00:00
2020-02-15 12:00:00
2020-02-15 18:00:00
2020-02-16 00:00:00
2020-02-16 06:00:00
2020-02-16 12:00:00
2020-02-16 18:00:00


Create a function to load the data for a particular location (because this project is interested in the UK) and realization (to keep data manageable during development).

In [20]:
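def load_realization_ds(ds_path, realization, selected_bounds):
    """Load one file and select a single realization within the given bounds.

    Returns None if the selection raises a KeyError, i.e. if one of the
    selected coordinates is missing from the file.
    """
    try:
        subset1 = dict(selected_bounds)
        subset1['realization'] = realization
        # select a single entry along bnds so that dimension is dropped
        subset1['bnds'] = 0
        single_level_ds = xarray.load_dataset(ds_path).sel(**subset1)
    except KeyError:
        single_level_ds = None
    return single_level_ds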

In [21]:
%%time
ds_list = [load_realization_ds(
    ds_path= mogreps_g_data_dir / fname_template.format(vt=validity_time,
                                                     lead_time=leadtime_hours,
                                                     var_name=var1),
    realization=real1,
    selected_bounds=xarray_select_uk,
)
    for var1 in variables_single_level]
single_level_ds = xarray.merge([ds1 for ds1 in ds_list if ds1 is not None])


CPU times: user 2.6 s, sys: 396 ms, total: 3 s
Wall time: 3.63 s


In [22]:
single_level_ds

In [26]:
single_level_ds['air_pressure_at_sea_level'].to_iris()

Air Pressure At Sea Level (Pa),latitude,longitude
Shape,42,28
Dimension coordinates,,
latitude,x,-
longitude,-,x
Scalar coordinates,,
forecast_period,15 hours,
forecast_reference_time,2020-02-14 12:00:00,
realization,10,
time,2020-02-15 03:00:00,


First process the single level variables.

In [None]:
%%time
single_level_df = single_level_ds.to_dataframe().reset_index()
single_level_df

Load the variables on height levels and create a data frame using the xarray `to_dataframe` function.

In [None]:
%%time
ds_list1 = [load_realization_ds(
    ds_path=mogreps_g_data_dir / fname_template.format(vt=validity_time,
                                                       lead_time=leadtime_hours,
                                                       var_name=var1),
    realization=real1,
    selected_bounds=xarray_select_uk,
)
            for var1 in variables_height_levels]


In [None]:
height_levels_ds =  xarray.merge([ds1 for ds1 in ds_list1 if ds1 is not None])

In [None]:
%%time
hl_df_multirow = height_levels_ds.to_dataframe().reset_index()

In [None]:
hl_df_multirow

As shown above, the xarray method puts variables at different heights in different rows, where really we want them in the same row in separate columns e.g. `air_temperature_5m`, `air_temperature_10m` etc. So here we select each of the heights in turn and merge the resulting data frames to get what we want.
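
For comparison, the same long-to-wide reshaping can be sketched with pandas `set_index` and `unstack` on a small synthetic table. This is an alternative illustration under assumed data, not the approach the notebook uses below; the names `long_df`, `wide_df` and `index_cols` and all values are made up.

```python
import pandas

# Synthetic long-format table: one row per (latitude, longitude, height).
long_df = pandas.DataFrame({
    'latitude': [50.0, 50.0, 51.0, 51.0],
    'longitude': [-6.0, -6.0, -6.0, -6.0],
    'height': [5.0, 10.0, 5.0, 10.0],
    'air_temperature': [280.0, 279.5, 281.0, 280.4],
    'wind_speed': [3.0, 4.5, 2.0, 3.5],
})

index_cols = ['latitude', 'longitude']

# Move height from the row index into the columns, giving a
# (variable, height) column MultiIndex with one row per location.
wide_df = long_df.set_index(index_cols + ['height']).unstack('height')

# Flatten the MultiIndex into names such as air_temperature_5.0
wide_df.columns = [f'{var}_{height:.1f}' for var, height in wide_df.columns]
wide_df = wide_df.reset_index()
print(wide_df)
```

Each location now occupies a single row, with one column per variable and height.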

In [None]:
heights = hl_df_multirow['height'].unique()

In [None]:
print('\n'.join([f'{h1:.2f}m' for h1 in heights]))

In [None]:
merge_coords = ['latitude', 'longitude', 'time', 'realization']

In [None]:
coords = list(set(hl_df_multirow.columns) - set(height_level_var_mappings.values()))
print(coords)

In [None]:
%%time
var_df_merged = []
for var1 in height_level_var_mappings.values():
    print(var1)
    # one data frame per height, with the variable renamed to include the height
    var_at_heights = [
        hl_df_multirow[hl_df_multirow.height == h1][merge_coords + [var1]]
        .rename({var1: f'{var1}_{h1:.1f}'}, axis='columns')
        for h1 in heights
    ]
    # merge the per-height frames so that each height becomes its own column
    var_df_merged += [functools.reduce(lambda x, y: x.merge(y, on=merge_coords), var_at_heights)]
# merge the per-variable frames into a single data frame
height_levels_df = functools.reduce(lambda x, y: x.merge(y, on=merge_coords), var_df_merged)

In [None]:
height_levels_df

Now that we have created the correct dataframe for variables on height levels, we can merge this with the dataframe for single level variables. We are merging on the following coordinates:
* location (latitude and longitude)
* time (validity time)
* realization
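
As an optional sanity check on a merge like this, pandas can verify that the keys are unique on both sides through the `validate` argument of `merge`. The sketch below uses two tiny synthetic data frames; `single_df`, `heights_df` and their values are made up for illustration.

```python
import pandas

keys = ['latitude', 'longitude', 'time', 'realization']

# Synthetic stand-in for the single level data frame.
single_df = pandas.DataFrame({
    'latitude': [50.0, 51.0],
    'longitude': [-6.0, -6.0],
    'time': pandas.to_datetime(['2020-02-15 03:00', '2020-02-15 03:00']),
    'realization': [10, 10],
    'air_pressure_at_sea_level': [99000.0, 99100.0],
})

# Synthetic stand-in for the height level data frame, sharing the same keys.
heights_df = single_df[keys].assign(**{'air_temperature_5.0': [280.0, 281.0]})

# validate='one_to_one' raises pandas.errors.MergeError if a key
# combination is duplicated in either data frame.
merged = single_df.merge(heights_df, on=keys, validate='one_to_one')
print(merged.shape)
```

A one-to-one merge should leave the row count unchanged, so comparing `len(merged)` with `len(single_df)` is a cheap extra check.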

In [None]:
mogreps_g_single_ts_uk_df = single_level_df.merge(height_levels_df, on=merge_coords)
mogreps_g_single_ts_uk_df

In [None]:
prd_single_timestep_df = single_level_df.merge(height_levels_df, on=merge_coords)

In [None]:
prd_single_timestep_df

In [None]:
prd_single_timestep_df['time'].min(), prd_single_timestep_df['time'].max()

### Load radar data

Now we want to load the radar data, so that we can add radar rainfall measurements to each row.

In [16]:
radar_days = [datetime.datetime(2020,2,14) + datetime.timedelta(days=d1) for d1 in range(5)]
radar_days

[datetime.datetime(2020, 2, 14, 0, 0),
 datetime.datetime(2020, 2, 15, 0, 0),
 datetime.datetime(2020, 2, 16, 0, 0),
 datetime.datetime(2020, 2, 17, 0, 0),
 datetime.datetime(2020, 2, 18, 0, 0)]

In [17]:
radar_fname_template = 'composite_rainfall_{dt.year:04d}{dt.month:02d}{dt.day:02d}.nc'

Create a single cube of all the radar data for the period concerned that has been extracted from MASS.
(See `extract_mass_radar.py` for details.)

In [18]:
radar_cube = iris.cube.CubeList([iris.load_cube(str(radar_data_dir / radar_fname_template.format(dt=dt1))) for dt1 in radar_days] ).concatenate_cube()

In [19]:
radar_cube

Rainfall Rate Composite (mm/h),time,projection_y_coordinate,projection_x_coordinate
Shape,1440,2175,1725
Dimension coordinates,,,
time,x,-,-
projection_y_coordinate,-,x,-
projection_x_coordinate,-,-,x
Auxiliary coordinates,,,
forecast_reference_time,x,-,-
Scalar coordinates,,,
forecast_period,0 second,,
Attributes,,,
Conventions,CF-1.7,,
field_code,213,,
institution,Met Office,,
nimrod_version,2,,
probability_period_of_event,0,,
source,Plr single site radars,,
title,Unknown,,


In [20]:
radar_cube.coord('time')

DimCoord(array([1581638400, 1581638700, 1581639000, ..., 1582069500, 1582069800,
       1582070100]), standard_name='time', units=Unit('seconds since 1970-01-01 00:00:00', calendar='gregorian'), var_name='time')

In [None]:
min([datetime.datetime(c1.point.year, c1.point.month, c1.point.day, c1.point.hour, c1.point.minute) for c1 in radar_cube.coord('time').cells()])

Radar data are instantaneous rainfall rates, measured every 5 minutes, whereas model data is available every three hours. To match the two, we will calculate "pseudo-accumulations". They are "pseudo" because we treat each instantaneous rate (in mm/h) as if it held for the whole 5 minute period, so that dividing by 12 gives a 5 minute accumulation, although the rain rate will not really be constant within a 5 minute period. We could consider a better statistical model to interpolate and calculate accumulations, but this is a starting point.

In [None]:
iris.coord_categorisation.add_hour(radar_cube, coord='time')
iris.coord_categorisation.add_day_of_year(radar_cube, coord='time')

In [None]:
coord_3hr = iris.coords.AuxCoord(
    radar_cube.coord('hour').points // 3,
    long_name='3hr',
    units='hour',
)
radar_cube.add_aux_coord(coord_3hr, data_dims=0)

Now we aggregate the instantaneous values (which we are using as a proxy for 5 minute accumulations) over three hours. Each instantaneous value represents the accumulation expected over an hour if the rate stayed constant, so to use it as a 5 minute accumulation we have to divide by 12. I'm doing this after the aggregation step, because the division causes the lazy data to be loaded and there is less data to load after aggregation.
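
The arithmetic can be checked on synthetic numbers with plain numpy. The sketch below assumes made-up rates of 2.4 mm/h for the first three hours and 0.6 mm/h for the next three, sampled every 5 minutes, and uses the same `hour // 3` binning idea; `numpy.bincount` stands in for the Iris aggregation.

```python
import numpy

# 72 samples, 5 minutes apart, covering 6 hours.
minutes = numpy.arange(0, 6 * 60, 5)
hours = minutes // 60

# 3 hour bin index: 0 for 00:00-02:55, 1 for 03:00-05:55.
bin_3hr = hours // 3

# Synthetic instantaneous rates in mm/h.
rates = numpy.where(bin_3hr == 0, 2.4, 0.6)

# Sum the rates in each bin, then divide by 12 because each rate
# stands in for a 5 minute (1/12 hour) accumulation.
accum_3hr = numpy.bincount(bin_3hr, weights=rates) / 12.0
print(accum_3hr)  # approximately [7.2, 1.8] mm
```

With 36 samples per bin, a constant 2.4 mm/h gives 36 × 2.4 / 12 = 7.2 mm, which is the same as 2.4 mm/h sustained for 3 hours.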

In [None]:
%%time
radar_agg_3hr = radar_cube.aggregated_by(['3hr', 'day_of_year'],iris.analysis.SUM)
radar_agg_3hr.data = radar_agg_3hr.data * (1.0 /12.0)
radar_agg_3hr

We now need to get our radar data onto the same grid as our MOGREPS-G data. Iris has regridding functionality, but it doesn't do exactly what we want, so we are going to calculate the values directly. The values we want for each MOGREPS-G grid box are:
* the fraction of the grid box where the recorded precipitation fell in a particular range. This is essentially a histogram, but with the counts normalised to add up to 1.0, rather than to the total number of samples as in a normal histogram.
* the maximum recorded rainfall in the grid box
* the average recorded rainfall in the grid box

To do this we will
* load in a sample of MOGREPS-G data as a target
* create latitude and longitude coordinates for the radar data, which doesn't have them initially because it is not on a lat/lon grid
* for each radar grid cell, calculate which MOGREPS-G cell it maps to
* for each accumulation range, calculate which radar cells fall in that range
  * count those cells for each MOGREPS-G cell, then divide by the total number of radar cells in that MOGREPS-G cell to get the normalised histogram value
* for each MOGREPS-G cell, also calculate the max and average.
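
The steps above can be sketched on a toy example: a synthetic 4x4 "radar" grid in which each 2x2 block maps to one coarse cell. All values, the three bands and the index arrays are made up for illustration; the bands are treated as half-open ranges so that the fractions for each coarse cell add up to 1.0.

```python
import numpy

# Synthetic fine-grid accumulations (mm).
fine = numpy.array([
    [0.0, 0.5, 2.0, 2.0],
    [0.0, 0.5, 2.0, 6.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
])

# Index of the coarse cell that each fine cell maps to.
block = numpy.array([0, 0, 1, 1])
lat_index = numpy.repeat(block[:, None], 4, axis=1)
lon_index = numpy.repeat(block[None, :], 4, axis=0)

# Hypothetical rain bands as half-open ranges [low, high).
bands = [(0.0, 1.0), (1.0, 4.0), (4.0, 10.0)]

fraction = numpy.zeros((2, 2, len(bands)))
max_rain = numpy.zeros((2, 2))
mean_rain = numpy.zeros((2, 2))

for i_lat in range(2):
    for i_lon in range(2):
        # fine cells belonging to this coarse cell
        values = fine[(lat_index == i_lat) & (lon_index == i_lon)]
        for i_band, (low, high) in enumerate(bands):
            in_band = numpy.count_nonzero((values >= low) & (values < high))
            fraction[i_lat, i_lon, i_band] = in_band / values.size
        max_rain[i_lat, i_lon] = values.max()
        mean_rain[i_lat, i_lon] = values.mean()

print(fraction[0, 1])  # fractions per band for the top-right coarse cell
```

The top-right coarse cell contains the values 2, 2, 2 and 6, so three quarters of it fall in the middle band and one quarter in the top band.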



In [None]:
mogreps_g_example = iris.load_cube(
    str(mogreps_g_data_dir / fname_template.format(
        vt=forecast_ref_time_range[0] + datetime.timedelta(hours=leadtime_hours),
        lead_time=leadtime_hours,
        var_name=variables_single_level[0])),
    iris.Constraint(
        latitude=lambda cell1: uk_bounds['latitude'][0] < cell1 < uk_bounds['latitude'][1],
        longitude=lambda cell1: uk_bounds['longitude'][0] < cell1 < uk_bounds['longitude'][1],
        realization=0,
    )
)


In [None]:
radar_crs = radar_cube.coord_system().as_cartopy_crs()

In [None]:
proj_y_grid = numpy.tile(radar_cube.coord('projection_y_coordinate').points.reshape(radar_cube.shape[1],1), [1, radar_cube.shape[2]])
proj_x_grid = numpy.tile(radar_cube.coord('projection_x_coordinate').points.reshape(1,radar_cube.shape[2]), [ radar_cube.shape[1],1])

In [None]:
# transform_points expects the x coordinates first, then the y coordinates
ret_val = mogreps_g_example.coord_system().as_cartopy_crs().transform_points(
    radar_crs,
    proj_x_grid,
    proj_y_grid,
    )

In [None]:

lat_vals = ret_val[:,:,1]
lon_vals = ret_val[:,:,0]

In [None]:
lon_coord = iris.coords.AuxCoord(
    lon_vals,
    standard_name='longitude',
    units='degrees',
)
lat_coord = iris.coords.AuxCoord(
    lat_vals,
    standard_name='latitude',
    units='degrees',
)

In [None]:
radar_agg_3hr.add_aux_coord(lon_coord,[1,2])

In [None]:
radar_agg_3hr.add_aux_coord(lat_coord,[1,2])

In [None]:
radar_agg_3hr

Calculate the mapping from radar cells to MOGREPS-G cells, to make the histogram calculations easier later.
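
For a single axis, the idea of assigning each fine cell to a coarse cell can be sketched with `numpy.searchsorted`. This is a vectorised illustration only and assumes contiguous, ascending cell edges; the edges and latitudes are made up, and values outside the coarse grid are given the index -1.

```python
import numpy

# Edges of two hypothetical coarse cells: [50, 52) and [52, 54).
lat_edges = numpy.array([50.0, 52.0, 54.0])

# Latitudes of some fine (radar-like) cell centres.
fine_lat = numpy.array([49.5, 50.0, 51.9, 52.0, 53.9, 54.0])

# Index of the coarse cell whose half-open range [low, high) holds each value.
lat_cell_index = numpy.searchsorted(lat_edges, fine_lat, side='right') - 1

# Mark values outside the coarse grid with -1.
outside = (fine_lat < lat_edges[0]) | (fine_lat >= lat_edges[-1])
lat_cell_index[outside] = -1

print(lat_cell_index)  # [-1  0  0  1  1 -1]
```

Applying the same idea to longitude would give the second index.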

In [None]:
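# initialise with -1 so radar cells outside every MOGREPS-G cell are not assigned to cell 0
lat_mog_g_index = numpy.full((radar_cube.shape[1], radar_cube.shape[2]), -1, dtype=int)
lon_mog_g_index = numpy.full((radar_cube.shape[1], radar_cube.shape[2]), -1, dtype=int)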

In [None]:
%%time
for i_lon, bnd_lon in enumerate(mogreps_g_example.coord('longitude').bounds):
    print(i_lon)
    for i_lat, bnd_lat in enumerate(mogreps_g_example.coord('latitude').bounds):
        # radar cells whose lat/lon falls inside this MOGREPS-G cell
        arr1, arr2 = numpy.where(
            (lat_vals >= bnd_lat[0]) & (lat_vals < bnd_lat[1]) &
            (lon_vals >= bnd_lon[0]) & (lon_vals < bnd_lon[1])
        )
        lon_mog_g_index[arr1, arr2] = i_lon
        lat_mog_g_index[arr1, arr2] = i_lat

In [None]:
def compare_time(t1, t2):
    """Return True if two datetime-like objects match to the minute."""
    return (
        (t1.year == t2.year) and (t1.month == t2.month) and (t1.day == t2.day)
        and (t1.hour == t2.hour) and (t1.minute == t2.minute)
    )

To demonstrate, we pick a particular time. In the batch processing this would have to be done for all times.

In [None]:
validity_time

In [None]:
radar_select_time = radar_agg_3hr.extract(iris.Constraint(time=lambda c1: compare_time(c1.bound[0], validity_time )))
radar_select_time 

In [None]:
# these are modified compared to the actual improver thresholds, to remove the fuzziness of the range boundaries
improver_thresholds = {
"0.0": [0.0, 0.027],
"0.03": [0.027, 0.033],
"0.09": [0.033, 0.099],
"0.1": [0.099, 0.11],
"0.25": [0.11, 0.275],
"0.3": [0.275, 0.33],
"0.5": [0.33, 0.55],
"1.0": [0.55, 1.1],
"2.0": [1.1, 2.2],
"3.0": [2.2, 3.3],
"4.0": [3.3, 4.4],
"8.0": [4.4, 8.8],
"12.0": [8.8, 13.2],
"16.0": [13.2, 17.6],
"20.0": [17.6, 22.0],
"25.0": [22.0, 27.5],
"30.0": [27.5, 33.0],
"40.0": [33.0, 44.0],
"50.0": [44.0, 55.0],
"75.0": [55.0, 82.5],
"100.0": [82.5, 110.0],
"150.0": [110.0, 165.0],
"200.0": [165.0, 220.0]
}


In [None]:
bands_data = numpy.zeros([mogreps_g_example.shape[0], mogreps_g_example.shape[1], len(improver_thresholds)])

In [None]:
max_rain_data = numpy.zeros([mogreps_g_example.shape[0], mogreps_g_example.shape[1]])
mean_rain_data = numpy.zeros([mogreps_g_example.shape[0], mogreps_g_example.shape[1]])

In [None]:
radar_data1 = radar_select_time.data.data

In [None]:
masked_radar = numpy.ma.MaskedArray(
            radar_data1,
            radar_agg_3hr[0,:,:].data.mask)

In [None]:
%%time
for i_lat in range(mogreps_g_example.shape[0]):
    print(i_lat)
    for i_lon in range(mogreps_g_example.shape[1]):
        # valid (unmasked) radar cells that map to this MOGREPS-G cell
        selected_cells = (~(radar_agg_3hr[0,:,:].data.mask)) & (lat_mog_g_index == i_lat) & (lon_mog_g_index == i_lon)
        masked_radar.mask = ~selected_cells
        radar_cells_in_mg = numpy.count_nonzero(selected_cells)
        if radar_cells_in_mg > 0:
            for imp_ix, (imp_key, imp_bounds) in enumerate(improver_thresholds.items()):
                # half-open range [lower, upper), so a value on a boundary is counted in one band only
                num_in_band = numpy.count_nonzero((radar_data1 >= imp_bounds[0]) & (radar_data1 < imp_bounds[1]) & selected_cells)
                bands_data[i_lat, i_lon, imp_ix] = num_in_band / radar_cells_in_mg
            # calculate the max radar cell within each mogreps-g cell
            max_rain_data[i_lat, i_lon] = masked_radar.max()
            # calculate the average radar cell within each mogreps-g cell
            mean_rain_data[i_lat, i_lon] = masked_radar.sum() / radar_cells_in_mg

In [None]:
mg_lat_coord = mogreps_g_example.coord('latitude')
mg_lon_coord = mogreps_g_example.coord('longitude')


In [None]:
band_coord = iris.coords.DimCoord(
    [float(b1) for b1 in improver_thresholds.keys()],
    bounds=list(improver_thresholds.values()),
    var_name='band',
    units='mm',  # bands are rainfall amounts, matching the max/mean rain cubes
)

In [None]:
fraction_rain_band = iris.cube.Cube(
    data=bands_data, 
    dim_coords_and_dims=((mg_lat_coord, 0),(mg_lon_coord, 1),  (band_coord, 2)),
    units=None,
    var_name='fraction_in_band',
    long_name='Fraction radar rainfall cells in specified rain band',
)

In [None]:
max_rain_cube = iris.cube.Cube(
    data=max_rain_data, 
    dim_coords_and_dims=((mg_lat_coord, 0),(mg_lon_coord, 1),),
    units='mm',
    var_name='max_rain',
    long_name='maximum rain in radar cells within mogreps-g cell',
)

In [None]:
mean_rain_cube = iris.cube.Cube(
    data=mean_rain_data, 
    dim_coords_and_dims=((mg_lat_coord, 0),(mg_lon_coord, 1),),
    units='mm',
    var_name='mean_rain',
    long_name='average rain in radar cells within mogreps-g cell',
)

In [None]:
fig1 = matplotlib.pyplot.figure(figsize=(18,10))
ax1 = fig1.add_subplot(1,3,1, projection=mogreps_g_example.coord_system().as_cartopy_projection())
iris.quickplot.contourf(fraction_rain_band[:,:,4])
ax1.coastlines()
ax1 = fig1.add_subplot(1,3,2, projection=mogreps_g_example.coord_system().as_cartopy_projection())
iris.quickplot.contourf(fraction_rain_band[:,:,7])
ax1.coastlines()
ax1 = fig1.add_subplot(1,3,3, projection=mogreps_g_example.coord_system().as_cartopy_projection())
iris.quickplot.contourf(fraction_rain_band[:,:,10])
ax1.coastlines()

In [None]:
fig1 = matplotlib.pyplot.figure(figsize=(6,10))
ax1 = fig1.add_subplot(1,1,1, projection=mogreps_g_example.coord_system().as_cartopy_projection())
iris.quickplot.contourf(mean_rain_cube)
ax1.coastlines()

For comparison, Iris can also regrid the radar data directly onto the sample MOGREPS-G grid loaded above.

In [None]:
# this is an example regrid operation, which we're not using currently
radar_mggrid = radar_agg_3hr.regrid(mogreps_g_example, iris.analysis.Linear())
radar_mggrid

In [None]:
iris.quickplot.contourf(radar_mggrid [14,:,:])
matplotlib.pyplot.gca().coastlines()

It is slightly tortuous, but having used the useful Iris functionality, we now switch to xarray to use its `to_dataframe` functionality. We could probably stick to one library or the other, and that is a potential future refactoring (if possible).

In [None]:
frac_df = xarray.DataArray.from_iris(fraction_rain_band).to_dataframe().reset_index()

In [None]:
imp_bands = list(improver_thresholds.keys())

In [None]:
radar_df = frac_df[frac_df['band'] ==  float(imp_bands[0])][['latitude','longitude','fraction_in_band']]
radar_df = radar_df.rename({'fraction_in_band': f'fraction_in_band_{imp_bands[0]}'},axis='columns')
radar_df

In [None]:
for band1 in imp_bands[1:]:
    df1 = frac_df[frac_df['band'] ==  float(band1)][['latitude','longitude','fraction_in_band']]
    df1 = df1.rename({'fraction_in_band': f'fraction_in_band_{band1}'},axis='columns')
    radar_df = pandas.merge(radar_df, df1, on=['latitude', 'longitude'])

In [None]:
radar_df = pandas.merge(radar_df, xarray.DataArray.from_iris(mean_rain_cube).to_dataframe().reset_index(), on=['latitude', 'longitude'])

In [None]:
radar_df = pandas.merge(radar_df, xarray.DataArray.from_iris(max_rain_cube).to_dataframe().reset_index(), on=['latitude', 'longitude'])

In [None]:
radar_df['time'] = validity_time

In [None]:
radar_df

In [None]:
prd_single_timestep_df['time'][0]

In [None]:
# radar_df = radar_df[['latitude', 'longitude','time','rainfall_rate_composite']]

In [None]:
single_ts_ds = xarray.merge([single_level_ds, height_levels_ds])

Now we have the radar data, we can merge with model data and save to disk.

In [None]:
merged_dataset = pandas.merge(prd_single_timestep_df, radar_df, on=['latitude', 'longitude', 'time'])
merged_dataset

# Output to disk (CSV for now, parquet planned)

In [None]:
output_fname = output_fname_template.format(lt=leadtime_hours, vt=validity_time)
output_path = output_dir / output_fname
# save the merged model and radar data, not just the model data
merged_dataset.to_csv(output_path)

Currently I don't have parquet set up in our environments, so I'm just using CSV for now (this is a small data set so far).
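
One way to move to parquet later without breaking environments that lack a parquet engine is to try `DataFrame.to_parquet` and fall back to CSV. The sketch below is an assumption about how that could look, not part of the current pipeline; `save_table` is a hypothetical helper and the data frame is synthetic.

```python
import pathlib
import tempfile

import pandas


def save_table(df, path_without_suffix):
    """Write df as parquet if an engine is installed, otherwise as CSV.

    Hypothetical helper for illustration; returns the path written.
    """
    path_without_suffix = pathlib.Path(path_without_suffix)
    try:
        out_path = path_without_suffix.with_suffix('.parquet')
        # pandas raises ImportError when neither pyarrow nor fastparquet is installed
        df.to_parquet(out_path)
    except ImportError:
        out_path = path_without_suffix.with_suffix('.csv')
        df.to_csv(out_path, index=False)
    return out_path


demo_df = pandas.DataFrame({'latitude': [50.0], 'longitude': [-6.0], 'max_rain': [1.2]})
with tempfile.TemporaryDirectory() as tmp_dir:
    written = save_table(demo_df, pathlib.Path(tmp_dir) / 'prd_example')
    written_suffix = written.suffix
    written_exists = written.exists()
print(written_suffix)
```

The suffix of the returned path shows which format was actually written in the current environment.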