# Final Report
##### Kaleb Guillot
##### May 30, 2023
---
This report details the work done and code produced during the FA2022 and SP2023 semesters. This work was done under the supervision of Dr. Tzeng. Much of the code supports the research of Fudong Lin on his deep neural network and dataset projects.

My work has focused on understanding and working with WRF-HRRR GRIB2 files. These files store meteorological data in a highly compressed format. Their significance lies in their use by the machine learning models produced for the NSF-funded [PREFER](https://prefer-nsf.org/) project. To decompress and format these files, I have used both C-based and Python-based implementations.

<div style="display:flex; justify-content:center;">
    <div style="flex:1; margin-right:50px; margin-left:200px">
        <img src="figs/grib_cube.png" alt="Grib Cube Structure" style="width:80%">
    </div>
    <div style="flex:1; margin-left:50px; margin-right:100px">
        <img src="figs/pic_grib.png" alt="Grib Parameter Structure" style="width:58%">
    </div>  
</div>

This report will include:
- [HRRR-WRF GRIB2 File Overview](#overview)
- [Basics of Downloading / Extracting GRIB2 Files](#basics)
    - [Python Based Implementation](#python-based)
    - [C Based Implementation](#C-based)
- [CUDA](#cuda)
    - [ECCODES-CUDA](#eccodes-cuda)
- [Outline of customExtraction](#outline-of-ce)


### HRRR-WRF GRIB2 File Overview <a class="anchor" id="overview"></a>
---
The Weather Research and Forecasting (WRF) Model is a numerical weather prediction system developed by a collaboration of research organizations for simulating and forecasting weather conditions. It is an open-source model, allowing users to modify it to fit their applications. The High Resolution Rapid Refresh (HRRR) model is a derivative of the WRF model that provides higher-resolution predictions generated every hour. The numerical weather observations of the HRRR model span the entire continental United States. The model not only forecasts the weather but also captures rapidly evolving weather phenomena and produces severe weather predictions.

The numerical weather measurements and predictions from the HRRR model are stored in the GRIdded Binary version 2 (GRIB2) file format, a highly processed and compressed file format. The steps taken to minimize the space used by each hourly WRF-HRRR file include preprocessing, packing, spatial differencing, quantization, compression, and encoding. The initial preprocessing removes any incorrect or irrelevant data, including data measured outside the continental United States. During packing, the data is encoded into a binary format and the headers are added to the metadata. Spatial differencing saves space by encoding the differences between adjacent points rather than storing each point's full value. Quantization reduces the precision of the data, allowing values to take up fewer bits. Compression methods are then used to further shrink the file's size. These methods include a combination of both LZ77 and Huffman encoding. Finally, the file is encoded into a binary-to-text scheme, representing the binary data in ASCII format.
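To make the quantization and spatial differencing steps concrete, the following is a minimal sketch of the idea on a single row of grid values. It is a simplified illustration, not the actual GRIB2 encoder: the function names `pack` and `unpack`, the `decimals` parameter, and the sample temperatures are all hypothetical.

```python
import numpy as np


def pack(values, decimals=1):
    """Quantize values to `decimals` places, then keep the first value
    plus the differences between adjacent points (hypothetical helper)."""
    scale = 10 ** decimals
    # Quantization: scale and round so each value becomes a small integer.
    quantized = np.round(np.asarray(values, dtype=np.float64) * scale).astype(np.int64)
    # Spatial differencing: neighbours are similar, so differences are tiny.
    diffs = np.diff(quantized)
    return quantized[0], diffs, scale


def unpack(first, diffs, scale):
    """Invert pack(): rebuild the quantized integers and rescale them."""
    quantized = np.concatenate(([first], first + np.cumsum(diffs)))
    return quantized / scale


# Made-up temperatures (in kelvin) along one row of the grid.
temps = [281.43, 281.47, 281.52, 281.50, 281.61]
first, diffs, scale = pack(temps)
restored = unpack(first, diffs, scale)

print(diffs)     # small integers that need far fewer bits than full values
print(restored)  # original values, rounded to one decimal place
```

The differences are small integers, which is what lets a later compression stage store them in few bits. The precision lost to rounding is the cost of quantization.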

The data in this file structure is often represented as a 3D cube containing Cartesian points in an XYZ coordinate system. For each GRIB2 file, the X and Y coordinates correspond to latitudes and longitudes across the continental United States, while points along the Z axis represent the weather parameters that the HRRR model has measured. These weather parameters are sometimes called GRIB messages; they contain the name of the parameter, the unit of measurement, and the level above the Earth's surface. The current version of the model stores approximately 150 weather parameters.
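The cube layout described above can be sketched as a NumPy array with one 2D layer per parameter. This is only an illustration under stated assumptions: the toy grid size, the axis order (parameter, Y, X), the parameter names, the filled-in values, and the helper `value_at` are all hypothetical and do not come from a real file.

```python
import numpy as np

# Toy dimensions for illustration; a real HRRR grid is far larger.
n_params, n_y, n_x = 3, 4, 5

# Hypothetical parameter names, one per layer along the Z axis.
param_names = ["2 m temperature", "2 m relative humidity", "10 m wind speed"]

# Axis 0 = parameter (Z), axis 1 = Y grid index, axis 2 = X grid index.
cube = np.zeros((n_params, n_y, n_x), dtype=np.float32)
cube[0] = 280.0  # made-up temperature layer
cube[1] = 65.0   # made-up relative humidity layer
cube[2] = 3.5    # made-up wind speed layer


def value_at(cube, param_names, name, y, x):
    """Look up one parameter's value at grid cell (y, x)."""
    return cube[param_names.index(name), y, x]


humidity = value_at(cube, param_names, "2 m relative humidity", y=2, x=3)
print(cube.shape, humidity)
```

Slicing along axis 0 returns a full 2D layer for one parameter, while fixing the Y and X indices returns every parameter at one location.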

IMPORTANT NOTE: WRF-HRRR data contains some parameters that are marked as "accumulation" measurements, such as `Total Precipitation`. In real-time files (files that hold observed measurements, not predictions), the values for such a parameter will always be 0. A complete list of GRIB2 parameters can be found on [the ecCodes webpage](https://codes.ecmwf.int/grib/param-db/?filter=grib2), where accumulation measurements are marked as such.


### Basics of Downloading / Extracting GRIB2 Files <a class="anchor" id="basics"></a>
The Python module Herbie was used to download the WRF-HRRR GRIB2 files from a number of repositories. The files were then stored on canpc39 (now canpc40) under the directory `/mnt/wrf/`. The file used is mentioned later in the report under [Outline of customExtraction](#outline-of-ce).

The Herbie module was used due to its flexibility and ease of use. With this library, a single command can download a file and, if need be, download only a subset of the file as specified by a regex string. An example of using Herbie is as follows:


```python
from herbie import Herbie

date_time = "20220101 10:00"  # format: "yyyymmdd hh:MM"
save_dir = "/directory/to/save/wrf/files/"

# Describe which file to fetch and where to look for it.
herb_obj = Herbie(date_time, model="hrrr",
                  product="sfc", save_dir=save_dir,
                  priority=['pando', 'pando2', 'aws', 'nomads',
                            'google', 'azure'],
                  fxx=0)

# Download only the messages that match the regex string.
downloaded = herb_obj.download(":(?:TMP|RH):2 m")
```

The first argument is a string of the date, formatted as "yyyymmdd hh:MM". The next argument, `model`, specifies that the HRRR contiguous United States model will be downloaded. The `product` argument selects the surface fields product with its approximately 150 parameters. The `priority` argument gives the list of repositories to search for the files. Each time the specified file is not found for the given date, the next repository in the list is searched. The `fxx` argument is the forecast lead time in hours, specifying how many hours into the future the selected file will predict. Here `fxx` is set to 0 to download the real-time values. The argument passed to `download()` is a regex string which specifies that only a subset of the file will be downloaded. This subset consists of the two layers `2 metre temperature` and `2 metre relative humidity`.
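To show how a regex string like the one above narrows a file down to a few layers, the sketch below applies the same pattern to a handful of index-style lines using Python's standard `re` module. The lines are made up for illustration, in the colon-separated style commonly used for GRIB2 index files; the message numbers and byte offsets are hypothetical, and this does not reproduce Herbie's internal matching code.

```python
import re

# Hypothetical index lines: message number, byte offset, date,
# parameter short name, level, and forecast type.
index_lines = [
    "1:0:d=2022010110:REFC:entire atmosphere:anl:",
    "2:1000:d=2022010110:TMP:2 m above ground:anl:",
    "3:2000:d=2022010110:RH:2 m above ground:anl:",
    "4:3000:d=2022010110:TMP:surface:anl:",
    "5:4000:d=2022010110:UGRD:10 m above ground:anl:",
]

# Same pattern as in the Herbie example: TMP or RH, at the 2 m level.
pattern = re.compile(r":(?:TMP|RH):2 m")

# Keep only the lines where the pattern occurs anywhere in the line.
selected = [line for line in index_lines if pattern.search(line)]

for line in selected:
    print(line)
```

Only the 2 m temperature and 2 m relative humidity lines are kept. The surface temperature line is rejected because its level does not start with `2 m`.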

Important Note: Herbie is still a very new module and is prone to problems. Oftentimes when downloading data with Herbie, execution will halt while Herbie tries to communicate with the repository. When downloading data with this module, set an automatic timer so that if a download exceeds a time limit, it is stopped and restarted.
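One possible way to implement such a timer is to run each download in a child process and let the standard library's `subprocess.run` enforce the time limit. This is a minimal sketch, not the approach used in this project: the helper name `run_with_retries` and the script name `download_one.py` mentioned in the comments are hypothetical.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_s, max_attempts=3):
    """Run `cmd` in a child process; if it exceeds `timeout_s` seconds,
    it is killed and started again. Returns True once a run succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so a hung
            # download does not linger; move on to the next attempt.
            print(f"attempt {attempt} timed out, retrying")
            continue
        if result.returncode == 0:
            return True
    return False


# Stand-in command that finishes immediately. In practice the command
# would launch a script that performs one Herbie download, for example
# [sys.executable, "download_one.py", "20220101 10:00"] (hypothetical).
ok = run_with_retries([sys.executable, "-c", "print('downloaded')"], timeout_s=30)
print(ok)
```

Running the download in a separate process matters because a stalled network call inside the same process cannot be interrupted reliably, whereas a child process can always be killed.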