# Concatenation of output files

**Author:** Michael Needham, US EPA Region 7 Air and Radiation Division

**Contact:** needham.michael@epa.gov

**Description:** Auxiliary example showing how to concatenate daily CMAQ output files into a single dataset along the time dimension.

<div class="alert alert-block alert-warning">
This notebook is provided for informational purposes only: the input files this example requires are too numerous to host on GitHub.
</div>

In [3]:
import numpy as np
import xarray as xr
from pathlib import Path

In [19]:
from src.utils.cmaq import get_cmaq_metadata

In [18]:
case = 'DDM_2022_36US3.011'

data_dir = Path("/work/REGIONS/users/mneedham/cmaq/analysis/examples/example_data/")

# Collect the HR2DAY files for this case, sorted by name (and therefore by date)
files = sorted(x for x in data_dir.glob("*HR2DAY*") if case in x.name)

# Display information about the files
for f in files[:3]:
    print(f.name)
print("...")
for f in files[-3:]:
    print(f.name)

print(f"\nNUMBER OF FILES: {len(files)}")

DDM_2022_36US3.011.HR2DAY.2022120.nc
DDM_2022_36US3.011.HR2DAY.2022121.nc
DDM_2022_36US3.011.HR2DAY.2022122.nc
...
DDM_2022_36US3.011.HR2DAY.2022215.nc
DDM_2022_36US3.011.HR2DAY.2022216.nc
DDM_2022_36US3.011.HR2DAY.2022217.nc

NUMBER OF FILES: 98


In [6]:
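# Read each file, format the metadata, and concatenate into a single xarray dataset

dset_list = []
for file in files:
    # proj is overwritten on each pass; the files are assumed to share one projection
    dset_tmp, proj = get_cmaq_metadata(xr.open_dataset(file), return_proj=True)
    dset_list.append(dset_tmp)

# Join along the time dimension, then keep only the first layer
dset = xr.concat(dset_list, dim='time').isel(LAY=0)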

dset
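
Because the input files are not distributed with this notebook, the concatenation step can be tried on synthetic data instead. The sketch below is only an illustration: the variable name `O3`, the `ROW` and `COL` dimension names, the array shape, and the helper `make_daily_dataset` are assumptions made for this example and are not taken from the real files or from `get_cmaq_metadata`.

```python
import numpy as np
import xarray as xr


def make_daily_dataset(day_index):
    """Build a small synthetic dataset standing in for one daily file (hypothetical)."""
    # Assumed layout: (time, LAY, ROW, COL) with one time step per file
    data = np.full((1, 2, 3, 4), float(day_index))
    time = [np.datetime64("2022-04-30") + np.timedelta64(day_index, "D")]
    return xr.Dataset(
        {"O3": (("time", "LAY", "ROW", "COL"), data)},
        coords={"time": time},
    )


daily = [make_daily_dataset(i) for i in range(3)]

# Join along the existing time dimension, then keep only the first layer
combined = xr.concat(daily, dim="time").isel(LAY=0)

print(dict(combined.sizes))
```

Selecting `LAY=0` with an integer drops the layer dimension entirely, which is why the concatenated result has no `LAY` axis.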

In [8]:
print(f"Total size of the dataset: {dset.nbytes / 1e6} MB")

Total size of the dataset: 19.758744 MB


In [13]:
start_date = files[0].name.split(".")[-2]
end_date = files[-1].name.split(".")[-2]

print(f"{start_date=}")
print(f"{end_date=}")

start_date='2022120'
end_date='2022217'
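
The date stamps in the file names appear to follow a `YYYYJJJ` pattern (four-digit year followed by day of year). Assuming that reading is correct, the standard library can convert them to calendar dates and check that no days are missing between the first and last file. The helpers `parse_julian` and `missing_stamps` are hypothetical names introduced for this sketch.

```python
from datetime import datetime, timedelta


def parse_julian(stamp):
    """Convert a YYYYJJJ string (year plus day of year) to a date."""
    return datetime.strptime(stamp, "%Y%j").date()


def missing_stamps(stamps):
    """Return the YYYYJJJ stamps absent between the first and last entry."""
    present = {parse_julian(s) for s in stamps}
    first, last = min(present), max(present)
    n_days = (last - first).days + 1
    all_days = (first + timedelta(days=i) for i in range(n_days))
    return [d.strftime("%Y%j") for d in all_days if d not in present]


start = parse_julian("2022120")
end = parse_julian("2022217")
expected_days = (end - start).days + 1

print(start, end, expected_days)  # 2022-04-30 2022-08-05 98
```

Under this assumption the expected count of 98 days matches the number of files found above. In the notebook, `missing_stamps([f.name.split(".")[-2] for f in files])` would list any gaps before concatenating.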


In [20]:
fname = f"./tutorial_data/{case}.HR2DAY.{start_date}-{end_date}.nc"
dset.to_netcdf(fname)
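
Writing the file fails if the output directory does not exist yet. One possible way to guard against that is sketched below; `build_output_path` is a hypothetical helper, and a temporary directory stands in for `./tutorial_data` so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def build_output_path(out_dir, case, start_date, end_date):
    """Create out_dir if needed and return the path of the combined file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{case}.HR2DAY.{start_date}-{end_date}.nc"


# A temporary directory is used here only to keep the sketch self-contained
with tempfile.TemporaryDirectory() as tmp:
    out_path = build_output_path(
        Path(tmp) / "tutorial_data", "DDM_2022_36US3.011", "2022120", "2022217"
    )
    dir_exists = out_path.parent.is_dir()

print(out_path.name)
```

The returned path could then be passed directly to `dset.to_netcdf`.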