# AMOS 2024 - Introducing the loaddata Python module

##### This notebook demonstrates the use of the loaddata module.
##### The module simplifies locating and loading data files from the BARRA2 regional reanalysis and the BARPA regional projections in the NCI Data Collection.

##### BARPA:  https://opus.nci.org.au/pages/viewpage.action?pageId=264241161

##### BARRA2: https://opus.nci.org.au/pages/viewpage.action?pageId=264241166

##### Before using this notebook, users must join the ob53 and py18 projects:

##### To access BARRA2 data: https://my.nci.org.au/mancini/project/ob53/join

##### To access BARPA data: https://my.nci.org.au/mancini/project/py18/join
        

In [1]:
import os
from datetime import datetime as dt

# Work from the directory that holds loaddata.py so the import below resolves
os.chdir("/g/data/hd50/chs548/BARRA2_evaluation/jt/notebooks/")
import loaddata

In [2]:
# Print documentation, which lists the available methods
help(loaddata)

Help on module loaddata:

NAME
    loaddata

DESCRIPTION
    NAME
        loaddata
      
    DESCRIPTION
        loaddata is a Python module for interfacing with the BARRA2 and BARPA
        data sets in the NCI data collection.
        
    PREREQUISITE
        Users must join ob53 and py18 projects via,
          https://my.nci.org.au/mancini/project/ob53/join
          https://my.nci.org.au/mancini/project/py18/join
          
    AUTHOR
        Chun-Hsu Su, chunhsu.su@bom.gov.au, Bureau of Meteorology

FUNCTIONS
    get_barpa_files(rcm, gcm, scenario, freq, variable, version='*', tstart=None, tend=None)
        Returns all the matching BARPA files in the NCI data collection.
        
        Parameters:
            rcm (str): Regional model, e.g. BARPA-R, BARPA-C
            gcm (str): Driving GCM name, e.g. ACCESS-CM2
            scenario (str): GCM experiment, e.g. historical, ssp370, ssp126, evaluation
            freq (str): Time frequency of data, e.g., 1hr, day, mon
        

In [3]:
# List the available BARRA2 experiments published so far
loaddata.list_experiments("BARRA2")

# Did you know?
# domain=AUS-11 says it is over Australia at 0.11 deg
# era5_mem=hres says the experiment is nested in ERA5 HRES reanalysis
# model=BARRA-R2 is one of the BARRA2 systems

{domain}   {era5_mem}   {model}
AUS-11   hres   BARRA-R2


In [4]:
# List the available BARPA experiments published so far
loaddata.list_experiments("BARPA")

# Did you know?
# domain=AUS-15 says it is over Australia at around 0.15 deg
# gcm indicates the driving global model from CMIP6
# scenario distinguishes whether this is historical or ssp* or ERA5-evaluation run
# ens indicates which global model ensemble member
# rcm=BARPA-R is one of the BARPA systems

{domain}   {gcm}   {scenario}   {ens}   {rcm}
AUS-15   CESM2   historical   r11i1p1f1   BARPA-R
AUS-15   NorESM2-MM   historical   r1i1p1f1   BARPA-R
AUS-15   ACCESS-CM2   historical   r4i1p1f1   BARPA-R
AUS-15   CMCC-ESM2   historical   r1i1p1f1   BARPA-R
AUS-15   ERA5   evaluation   r1i1p1f1   BARPA-R
AUS-15   MPI-ESM1-2-HR   historical   r1i1p1f1   BARPA-R
AUS-15   EC-Earth3   historical   r1i1p1f1   BARPA-R
AUS-15   ACCESS-ESM1-5   historical   r6i1p1f1   BARPA-R
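
The listing above is plain whitespace-separated text, so it can be turned into records for filtering. The following is a minimal sketch that assumes the text has been copied from the printed output; whether `list_experiments` also returns a value is not shown here, and the names `table`, `rows` and `historical_gcms` are illustrative only.

```python
# Text copied from the printed output of loaddata.list_experiments("BARPA")
table = """\
{domain}   {gcm}   {scenario}   {ens}   {rcm}
AUS-15   CESM2   historical   r11i1p1f1   BARPA-R
AUS-15   NorESM2-MM   historical   r1i1p1f1   BARPA-R
AUS-15   ACCESS-CM2   historical   r4i1p1f1   BARPA-R
AUS-15   CMCC-ESM2   historical   r1i1p1f1   BARPA-R
AUS-15   ERA5   evaluation   r1i1p1f1   BARPA-R
AUS-15   MPI-ESM1-2-HR   historical   r1i1p1f1   BARPA-R
AUS-15   EC-Earth3   historical   r1i1p1f1   BARPA-R
AUS-15   ACCESS-ESM1-5   historical   r6i1p1f1   BARPA-R
"""

lines = table.strip().splitlines()

# Header fields are wrapped in braces, e.g. "{gcm}" -> "gcm"
header = [field.strip("{}") for field in lines[0].split()]

# One dictionary per experiment, keyed by the header fields
rows = [dict(zip(header, line.split())) for line in lines[1:]]

# Driving models that have a historical run
historical_gcms = sorted(r["gcm"] for r in rows if r["scenario"] == "historical")
print(historical_gcms)
```

Each record carries the `rcm`, `gcm` and `scenario` values that the BARPA functions in this notebook take as their first arguments.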


In [5]:
# Printing BARRA-R2 files for a given variable and time period
files = loaddata.get_barra2_files('BARRA-R2',
                                  '1hr',
                                  'tasmean',
                                  tstart=dt(2010, 1, 5),
                                  tend=dt(2010, 10, 10))
print("\n".join(files))

# Did you know?
# Run help(loaddata.get_barra2_files)
# to see how to use this method

/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS-11_ERA5_historical_hres_BOM_BARRA-R2_v1_1hr_201001-201001.nc
/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS-11_ERA5_historical_hres_BOM_BARRA-R2_v1_1hr_201002-201002.nc
/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS-11_ERA5_historical_hres_BOM_BARRA-R2_v1_1hr_201003-201003.nc
/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS-11_ERA5_historical_hres_BOM_BARRA-R2_v1_1hr_201004-201004.nc
/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS-11_ERA5_historical_hres_BOM_BARRA-R2_v1_1hr_201005-201005.nc
/g/data/ob53/BARRA2/output/reanalysis/AUS-11/BOM/ERA5/historical/hres/BARRA-R2/v1/1hr/tasmean/v20231001/tasmean_AUS
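
The file names above suggest that the hourly BARRA-R2 data are stored one calendar month per file, stamped `YYYYMM-YYYYMM`. As a rough sketch of how a `tstart`/`tend` pair maps onto such stamps (the helper `monthly_stamps` is hypothetical and is not part of loaddata; how the module selects files internally is not shown in this notebook):

```python
from datetime import datetime as dt

def monthly_stamps(tstart, tend):
    """Return a 'YYYYMM-YYYYMM' stamp for every calendar month touched by [tstart, tend]."""
    stamps = []
    year, month = tstart.year, tstart.month
    while (year, month) <= (tend.year, tend.month):
        stamps.append(f"{year:04d}{month:02d}-{year:04d}{month:02d}")
        # Roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return stamps

stamps = monthly_stamps(dt(2010, 1, 5), dt(2010, 10, 10))
print(stamps[0], stamps[-1], len(stamps))
# prints: 201001-201001 201010-201010 10
```

Under that assumption, the request above spans ten monthly files, from January to October 2010.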

In [6]:
# Listing the files of a given BARPA-R experiment for a given variable and time period
files = loaddata.get_barpa_files('BARPA-R',
                                 'ACCESS-CM2',
                                 'historical',
                                 'day',
                                 'pr',
                                 tstart=dt(2010, 1, 5),
                                 tend=dt(2011, 2, 10))
print("\n".join(files))

/g/data/py18/BARPA/output/CMIP6/DD/AUS-15/BOM/ACCESS-CM2/historical/r4i1p1f1/BARPA-R/v1-r1/day/pr/v20231001/pr_AUS-15_ACCESS-CM2_historical_r4i1p1f1_BOM_BARPA-R_v1-r1_day_201001-201012.nc
/g/data/py18/BARPA/output/CMIP6/DD/AUS-15/BOM/ACCESS-CM2/historical/r4i1p1f1/BARPA-R/v1-r1/day/pr/v20231001/pr_AUS-15_ACCESS-CM2_historical_r4i1p1f1_BOM_BARPA-R_v1-r1_day_201101-201112.nc
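
The file names encode the experiment metadata as underscore-separated fields. A minimal sketch of splitting one of the paths above into its parts follows; the helper `parse_barpa_filename` is hypothetical, and the field labels `institution`, `version` and `period` are names chosen here for illustration (the others follow the `list_experiments` header).

```python
import os

# Labels for the underscore-separated parts of a BARPA file name
FIELDS = ("variable", "domain", "gcm", "scenario", "ens",
          "institution", "rcm", "version", "freq", "period")

def parse_barpa_filename(path):
    """Split a BARPA file name into a dictionary of labelled fields."""
    stem = os.path.basename(path)
    if stem.endswith(".nc"):
        stem = stem[:-3]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"Unexpected file name layout: {stem}")
    return dict(zip(FIELDS, parts))

path = ("/g/data/py18/BARPA/output/CMIP6/DD/AUS-15/BOM/ACCESS-CM2/historical/"
        "r4i1p1f1/BARPA-R/v1-r1/day/pr/v20231001/"
        "pr_AUS-15_ACCESS-CM2_historical_r4i1p1f1_BOM_BARPA-R_v1-r1_day_201001-201012.nc")

info = parse_barpa_filename(path)
print(info["gcm"], info["ens"], info["period"])
# prints: ACCESS-CM2 r4i1p1f1 201001-201012
```

Note that the daily BARPA-R files shown here each span a full year (`201001-201012`), which is why a 13-month request returns only two files.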


In [7]:
# Loading BARRA-R2 data over a subdomain.
# The data are returned as an xarray.Dataset object.
ds = loaddata.load_barra2_data('BARRA-R2',
                               'day',
                               'pr',
                               tstart=dt(2010, 1, 5),
                               tend=dt(2014, 10, 10),
                               latrange=(-40, -30), lonrange=(110, 124))

print(ds)

<xarray.Dataset>
Dimensions:    (time: 1739, lat: 91, lon: 127, bnds: 2)
Coordinates:
  * time       (time) datetime64[ns] 2010-01-05T12:00:00 ... 2014-10-09T12:00:00
  * lat        (lat) float64 -39.93 -39.82 -39.71 -39.6 ... -30.25 -30.14 -30.03
  * lon        (lon) float64 110.0 110.1 110.3 110.4 ... 123.6 123.7 123.8 123.9
  * bnds       (bnds) float64 0.0 1.0
Data variables:
    pr         (time, lat, lon) float64 dask.array<chunksize=(27, 91, 127), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(27, 2), meta=np.ndarray>
Attributes: (12/56)
    axiom_version:             0.1.0
    axiom_schemas_version:     0.1.0
    axiom_schema:              cordex-1D.json
    Conventions:               CF-1.10, ACDD-1.3
    activity_id:               reanalysis
    source:                    Data from Met Office Unified Model (UM) and Jo...
    ...                        ...
    creator_institution:       Bureau of Meteorology
    keywords:                  Ear
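
A quick sanity check on the result is to compare the length of the time dimension with the number of days requested. The sketch below uses only the first and last timestamps printed above; the variable names are illustrative.

```python
from datetime import date

first_day = date(2010, 1, 5)   # first timestamp in the output: 2010-01-05T12:00
last_day = date(2014, 10, 9)   # last timestamp in the output: 2014-10-09T12:00

# Number of daily steps, counting both end days
n_days = (last_day - first_day).days + 1
print(n_days)
# prints: 1739
```

This matches `time: 1739` in the dataset summary. The daily values are stamped at 12:00, so with `tend=dt(2014, 10, 10)` (midnight) the last step included is 9 October 2014.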

In [8]:
# Loading BARPA-R data over a subdomain.
# The data are returned as an xarray.Dataset object.
ds = loaddata.load_barpa_data('BARPA-R',
                              'ACCESS-CM2',
                              'historical',
                              'day',
                              'pr',
                              tstart=dt(2010, 1, 5),
                              tend=dt(2014, 10, 10),
                              latrange=(-40, -30), lonrange=(110, 124))

print(ds)

<xarray.Dataset>
Dimensions:    (time: 1739, lat: 65, lon: 90, bnds: 2)
Coordinates:
  * time       (time) datetime64[ns] 2010-01-05T12:00:00 ... 2014-10-09T12:00:00
  * lat        (lat) float64 -39.98 -39.83 -39.67 -39.52 ... -30.4 -30.25 -30.09
  * lon        (lon) float64 110.1 110.3 110.4 110.6 ... 123.4 123.6 123.7 123.9
Dimensions without coordinates: bnds
Data variables:
    pr         (time, lat, lon) float64 dask.array<chunksize=(361, 65, 90), meta=np.ndarray>
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(361, 2), meta=np.ndarray>
Attributes: (12/57)
    axiom_version:             0.1.0
    axiom_schemas_version:     0.1.0
    axiom_schema:              cordex-1D.json
    Conventions:               CF-1.10, ACDD-1.3
    activity_id:               RCM
    title:                     Bureau of Meteorology Atmospheric Regional Pro...
    ...                        ...
    creator_institution:       Bureau of Meteorology
    keywords:                  Earth Scien

In [9]:
# But what is the variable pr? 
_ = loaddata.whatis('1hr', 'pr')

Short name: pr
long_name: Precipitation
standard_name: precipitation_flux
units: kg m-2 s-1
cell_methods: time: mean (interval: 1 hour)
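
Because `pr` is a flux in kg m-2 s-1, it usually needs converting before it reads as a rainfall amount. One kilogram of water spread over one square metre is a layer 1 mm deep, so multiplying the flux by the number of seconds in the accumulation period gives millimetres. A minimal sketch with a made-up flux value (the helper name and the sample number are illustrative, not taken from the data):

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

def flux_to_depth_mm(pr_flux, seconds):
    """Convert a mean precipitation flux (kg m-2 s-1) to a depth in mm over `seconds`."""
    # 1 kg m-2 of liquid water corresponds to a depth of 1 mm
    return pr_flux * seconds

example_flux = 5.0e-5  # kg m-2 s-1, an arbitrary illustrative value

mm_per_hour = flux_to_depth_mm(example_flux, SECONDS_PER_HOUR)
mm_per_day = flux_to_depth_mm(example_flux, SECONDS_PER_DAY)
print(mm_per_hour, mm_per_day)
```

The same scaling can be applied to a loaded dataset with ordinary xarray arithmetic, for example `ds["pr"] * 86400` for daily means.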


In [10]:
# Repeating the same query, this time for a variable in a BARPA experiment
# Generally the meaning is the same in BARRA2 and BARPA
_ = loaddata.whatis('1hr', 'tas', model="BARPA")

Short name: tas
long_name: Near-Surface Air Temperature
standard_name: air_temperature
units: K
cell_methods: time: point (interval: 1H)


In [11]:
# How do I know which time frequencies are available for a given experiment?
# For BARRA-R2...
_ = loaddata.list_barra2_freqs('BARRA-R2')

fx, mon, 3hr, 1hr, day


In [12]:
# Then we can drill down to see which variables are available at this time frequency
_ = loaddata.list_barra2_variables('BARRA-R2', '1hr')

rsut, rluscs, ta1500m, va400, pr, ta200m, ta700, clwvi, hus500, tas, uasmean, mrsos, ua400, zg925, ta250m, rlus, prc, va150m, ua300, zg300, zg400, zg850, ua850, va250m, va200m, ta600, rsdscs, wa300, ua50m, hus200, ps, ta50m, omega500, prsn, va600, tasmean, cll, hus400, vasmean, wa700, wsgsmax, uasmax, vasmax, ua250m, CAPE, va850, ua700, ua1000, hus1000, ua200m, va700, hus950, ua150m, ua1500m, prw, va300, wa1000, ua600, ua100m, ua925, zg200, wa600, va100m, va200, rsutcs, rsds, evspsblpot, hus600, zg1000, ua500, sfcWind, mrfsos, hfls, ta850, ta100m, ua200, rsdt, clh, va925, zg600, hus300, zg500, hurs, psl, ts, vas, huss, rldscs, ta925, ta150m, wa500, va1500m, uas, hus700, ta500, zg700, clivi, wa200, va50m, ta200, rlutcs, tasmin, ta300, clm, CIN, ta400, rlds, wa400, zmla, va1000, rsdsdir, rsuscs, hfss, rlut, wa850, ta1000, va500, clt, ta950, wa925, tasmax, hus850, rsus, hus925
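
Many of the short names above are a base variable followed by a level, such as `ta850` or `ta50m`. The sketch below splits such names apart so they can be grouped. It assumes that a bare number is a pressure level in hPa and that a trailing `m` marks a height in metres; that reading of the naming convention is an assumption here and can be confirmed per variable with `loaddata.whatis`. The helper `split_level` is hypothetical.

```python
import re
from collections import defaultdict

# Letters, then digits, then an optional trailing "m" (assumed to mean metres)
LEVEL_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)(m?)$")

def split_level(name):
    """Return (base, level, unit); level and unit are None for single-level variables."""
    match = LEVEL_PATTERN.match(name)
    if match is None:
        return name, None, None
    base, level, metres = match.groups()
    return base, int(level), "m" if metres else "hPa"

# A few names taken from the listing above
sample = ["ta850", "ta50m", "ua1500m", "zg500", "pr", "tasmean", "hus925"]

levels_by_base = defaultdict(list)
for name in sample:
    base, level, unit = split_level(name)
    if level is not None:
        levels_by_base[base].append((level, unit))

for base, levels in sorted(levels_by_base.items()):
    print(base, levels)
```

Names without a numeric suffix, such as `pr` or `tasmean`, are passed through unchanged.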


In [13]:
# But what is the variable hfss? 
_ = loaddata.whatis('1hr', 'hfss')

Short name: hfss
long_name: Surface Upward Sensible Heat Flux
standard_name: surface_upward_sensible_heat_flux
units: W m-2
cell_methods: time: mean (interval: 1 hour) time: mean (interval: 1H)


In [14]:
# There are also a few static variables!
_ = loaddata.list_barra2_variables('BARRA-R2', 'fx')

orog, sftlf


In [15]:
# Repeating the same for BARPA
_ = loaddata.list_barpa_freqs('BARPA-R', 'ERA5', 'evaluation')

day, fx, 1hr, mon, 6hr
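
When comparing BARRA2 with BARPA it helps to know which time frequencies the two share. A small sketch using the two listings printed in this notebook (copied in as strings, since the return values of the `list_*_freqs` functions are not shown):

```python
# Frequencies as printed by list_barra2_freqs and list_barpa_freqs above
barra2_text = "fx, mon, 3hr, 1hr, day"
barpa_text = "day, fx, 1hr, mon, 6hr"

barra2_freqs = {f.strip() for f in barra2_text.split(",")}
barpa_freqs = {f.strip() for f in barpa_text.split(",")}

common = sorted(barra2_freqs & barpa_freqs)
only_barra2 = sorted(barra2_freqs - barpa_freqs)
only_barpa = sorted(barpa_freqs - barra2_freqs)

print(common)       # ['1hr', 'day', 'fx', 'mon']
print(only_barra2)  # ['3hr']
print(only_barpa)   # ['6hr']
```

For the two experiments queried here, hourly, daily, monthly and static data are available from both, while 3-hourly data are listed only for BARRA-R2 and 6-hourly data only for BARPA-R.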


In [16]:
# Which variables are available in this BARPA-R experiment?
_ = loaddata.list_barpa_variables('BARPA-R', 'ERA5', 'evaluation', 'mon')

snm, ta600, hus300, ta50, hus850, tsl, ua250m, rsut, ta10, ta50m, ta200m, rsds, ua20, wa150, ta100m, va850, va925, mrsos, hfss, ta500, rluscs, zg600, va100, ua1500m, ta150m, clivi, zmla, wa30, clm, va100m, sund, sfcWind, zg100, wa250, va150m, ua500, ua700, ta200, ta30, rlus, hus20, ta400, tasmax, ts, ta100, wa10, va250, hus500, zg400, va30, ua100, ua850, tauu, va1000, hfls, ua200, prhmax, ta300, rsuscs, zg20, clh, hus30, zg200, ua300, rsdscs, wa1000, ua400, va300, ua1000, hus150, wa50, mrfso, va70, wa20, ta1000, va10, rldscs, rsutcs, zg70, prw, uas, wsgsmax, va20, ta20, CAPE, hus10, rsdsdir, hus600, mrfsol, hus700, mrros, psl, va500, evspsblpot, va400, hus250, ua10, wa100, rlut, tas, zg1000, rlutcs, va50m, ta250m, mrso, wa600, cll, hus1000, ta1500m, wa700, zg30, mrsol, ps, wa400, ua50m, zg300, rlds, ua150m, ta150, wa200, zg700, hus70, hus100, va700, va1500m, hus50, omega500, zg250, zg925, va250m, wa500, va200, ua50, ua70, zg10, hus925, tauv, hus400, ua600, tasmin, ta70, ua150, wa300, p

In [17]:
# So what is this variable?
_ = loaddata.whatis('mon', 'mrfso')

Short name: mrfso
long_name: Soil Frozen Water Content
standard_name: soil_frozen_water_content
units: kg m-2
cell_methods: time: point (interval: 3H) depth: sum time: mean (interval: 1M)


In [18]:
# Getting more information about how to use a method...
help(loaddata.list_barpa_variables)

Help on function list_barpa_variables in module loaddata:

list_barpa_variables(rcm, gcm, scenario, freq)
    Prints a listing of the variables available for BARPA model.
    
    Parameters:
        rcm (str): Regional model, e.g. BARPA-R, BARPA-C
        gcm (str): Driving GCM name, e.g. ACCESS-CM2
        scenario (str): GCM experiment, e.g. historical, ssp370, ssp126, evaluation
        freq (str): Time frequency of data, e.g., 1hr, day, mon

