### Fire history
First, let's get the MTBS data and beat it into shape. Data for the Trinity and South Fork Trinity basins was selected using the MTBS interactive viewer ([https://www.mtbs.gov/viewer/index.html](https://www.mtbs.gov/viewer/index.html)) on 2023-10-17 at about 10:38 a.m. After receiving the download link, the zipped data was downloaded using wget to the `/media/storage/MTBS/Trinity_and_S_Trinity` directory and unzipped, resulting in a directory, `mtbs`, full of directories for years ranging from 1985 to 2021. Within the directory for a given year there are boundaries and burn indices for each fire attributed to that year.
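
As a quick sanity check on the unzipped tree, the number of fire boundaries per year can be tallied. This is only a sketch: it assumes the year directories are named with four digits (e.g. `1985`), which is not stated above, and `count_boundaries_by_year` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter
from pathlib import Path


def count_boundaries_by_year(root):
    """Count *_burn_bndy.shp files under each four-digit year directory."""
    root = Path(root)
    counts = Counter()
    for shp in root.rglob('*_burn_bndy.shp'):
        # walk up from the file until a directory named like a year is found
        # (assumption: year directories are named e.g. '1985')
        for parent in shp.relative_to(root).parents:
            if len(parent.name) == 4 and parent.name.isdigit():
                counts[int(parent.name)] += 1
                break
    return dict(sorted(counts.items()))
```

Calling `count_boundaries_by_year('/media/storage/MTBS/Trinity_and_S_Trinity/mtbs')` should then return one entry per year that contains at least one boundary file.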

In [2]:
from datetime import datetime
from pathlib import Path


import pandas as pd
import geopandas as gpd
import xarray as xr
import rioxarray

import warnings
warnings.filterwarnings('ignore')

We will only look at the fires within the Trinity River basin.

In [3]:
# Trinity River basin as AOI
aoi_path = Path('/media/storage/TrinityCounty/Trinity_county_boundary_26910.gpkg')
aoi = gpd.read_file(aoi_path)
aoi_poly = aoi.geometry.values[0]

# get crs (26910)
crs = aoi.crs

# path to root dir of fire data
mtbs_dir = Path('/media/storage/MTBS/Trinity_and_S_Trinity/mtbs/')

In [4]:
# glob needed files
boundary_files = mtbs_dir.rglob('*_burn_bndy.shp')
burn_files = list(mtbs_dir.rglob('*_dnbr.tif'))

# read all shapes intersecting AOI into gdf, and associate dnbr tifs
df_list = []
for bound_path in boundary_files:
    df = gpd.read_file(bound_path).to_crs(crs)
    if df.geometry.intersects(aoi_poly).any():
        event_id, pre_date, post_date, _, _ = bound_path.stem.split('_')
        df['tif_path'] = [str(tif) for tif in burn_files if event_id in str(tif)][0]
        df['pre_date'], df['post_date'] = pre_date, post_date
        df_list.append(df)

fires = pd.concat(df_list)
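
The loop above stores `pre_date` and `post_date` as plain strings taken from the file name. If real dates are wanted later, the same split can feed `datetime.strptime`. A minimal sketch, assuming the stem layout implied by the five-way split (`<event id>_<pre date>_<post date>_burn_bndy`); the example stem is made up from the values of the FLAT fire and `parse_boundary_stem` is a hypothetical helper.

```python
from datetime import datetime


def parse_boundary_stem(stem):
    """Split an MTBS boundary file stem into event id and scene dates."""
    # assumption: the first three underscore-separated fields are
    # event id, pre-fire scene date and post-fire scene date
    event_id, pre_date, post_date, *_ = stem.split('_')
    fmt = '%Y%m%d'
    return {
        'event_id': event_id,
        'pre_date': datetime.strptime(pre_date, fmt),
        'post_date': datetime.strptime(post_date, fmt),
    }


# hypothetical stem built from the FLAT fire's event id and dates
info = parse_boundary_stem('ca4079212335020120711_20110723_20130712_burn_bndy')
print(info['event_id'], info['pre_date'].date(), info['post_date'].date())
```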

In [5]:
fires.head()

| | Event_ID | irwinID | Incid_Name | Incid_Type | Map_ID | Map_Prog | Asmnt_Type | BurnBndAc | BurnBndLat | BurnBndLon | ... | NoData_T | IncGreen_T | Low_T | Mod_T | High_T | Comment | geometry | tif_path | pre_date | post_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CA4079212335020120711 | | FLAT | Wildfire | 829 | MTBS | Extended | 1833 | 40.792 | -123.335 | ... | -970 | -150 | 100 | 332 | 600 | | POLYGON ((471448.099 4517290.303, 471465.694 4... | /media/storage/MTBS/Trinity_and_S_Trinity/mtbs... | 20110723 | 20130712 |
| 0 | CA4054612309020120905 | | STAFFORD | Wildfire | 839 | MTBS | Extended | 4509 | 40.541 | -123.107 | ... | -970 | -150 | 120 | 328 | 580 | | POLYGON ((489114.061 4485768.818, 489099.707 4... | /media/storage/MTBS/Trinity_and_S_Trinity/mtbs... | 20110723 | 20130712 |
| 0 | CA4061012305619920820 | | BARKER | Wildfire | 7749 | MTBS | Extended | 5276 | 40.61 | -123.053 | ... | -970 | -150 | 100 | 358 | 650 | Barker | POLYGON ((491763.775 4493399.286, 491980.594 4... | /media/storage/MTBS/Trinity_and_S_Trinity/mtbs... | 19920803 | 19930806 |
| 0 | CA4060412308120150731 | F01696EE-036C-40D4-AF42-B39C440AAB84 | BARKER | Wildfire | 25326 | MTBS | Extended | 3943 | 40.628 | -123.102 | ... | -970 | -150 | 50 | 265 | 510 | | POLYGON ((492882.683 4496886.854, 492884.745 4... | /media/storage/MTBS/Trinity_and_S_Trinity/mtbs... | 20150616 | 20160602 |
| 0 | CA4069112352420150610 | 1166A376-514D-4391-AC77-52D9A3914510 | SADDLE | Wildfire | 25324 | MTBS | Extended | 1923 | 40.687 | -123.534 | ... | -970 | -150 | 50 | 259 | 500 | | POLYGON ((452982.277 4504148.656, 452969.732 4... | /media/storage/MTBS/Trinity_and_S_Trinity/mtbs... | 20140706 | 20160711 |


In [6]:
# save as geoparquet
parq_path = mtbs_dir / 'trinity_basin_fires_1885-2021.parquet'
fires.to_parquet(parq_path)

### Climate data
Now let's get the climate data.  It has already been downloaded and unzipped.

The directory `/media/storage/CBCM` contains California Basin Characterization Model data downloaded from
[https://www.sciencebase.gov/catalog/item/5f29c62d82cef313ed9edb39](https://www.sciencebase.gov/catalog/item/5f29c62d82cef313ed9edb39).  
The following files were downloaded on 2023-09-27 at about 9:40:

	aet_WY1990_99.zip
	aet_WY2000_09.zip
	aet_WY2010_20.zip
	str_WY1990_99.zip
	str_WY2000_09.zip
	str_WY2010_20.zip

and the following files were downloaded 2023-10-17 at about 10:20:

	cwd_WY1990_99.zip
	cwd_WY2000_09.zip
	cwd_WY2010_20.zip

+ Zips were each unzipped into a directory of the same name.
+ The directories were then entered and a directory called `tifs` was created within.
+ The .asc files were then converted to tifs.

Prefixes have the following meanings:
 
`aet` : monthly actual evapotranspiration  
`str` : monthly soil storage  
`cwd` : monthly climatic water deficit  

Here is an example workflow:
```
mkdir aet_WY2000_09
unzip aet_WY2000_09.zip -d aet_WY2000_09
cd aet_WY2000_09
mkdir tifs
ls *.asc | parallel --progress gdal_translate -of GTiff -co "TILED=YES" -a_srs EPSG:3310 {} tifs/{.}.tif
```
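
For anyone without GNU parallel, the same conversion can be driven from Python. The sketch below only rebuilds the `gdal_translate` call shown above, with the same flags; `translate_cmd` and `convert_all` are hypothetical helpers, the conversion runs serially rather than in parallel, and `gdal_translate` is assumed to be on the `PATH`.

```python
import subprocess
from pathlib import Path


def translate_cmd(asc_path, out_dir='tifs'):
    """Build the gdal_translate call used above for a single .asc file."""
    asc_path = Path(asc_path)
    out_path = asc_path.parent / out_dir / f'{asc_path.stem}.tif'
    return ['gdal_translate', '-of', 'GTiff', '-co', 'TILED=YES',
            '-a_srs', 'EPSG:3310', str(asc_path), str(out_path)]


def convert_all(directory):
    """Convert every .asc file in a directory, one after another."""
    directory = Path(directory)
    (directory / 'tifs').mkdir(exist_ok=True)
    for asc in sorted(directory.glob('*.asc')):
        # check=True raises if gdal_translate exits with an error
        subprocess.run(translate_cmd(asc), check=True)
```

Running `convert_all('aet_WY2000_09')` after unzipping would stand in for the last two lines of the shell workflow.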

We will once again need to crawl around inside of the directory tree globbing things.


In [7]:
# path to root dir of climate data
cbcm_dir = Path('/media/storage/CBCM')


The dates are present in the file names, which are formatted like `aet1993feb.tif`.  We need to change the three-letter month to a number.

In [8]:
# will need for date format
def m2n(m):
    '''changes three-letter month to a number'''
    months = {
        'jan': 1,
        'feb': 2,
        'mar': 3,
        'apr': 4,
        'may': 5,
        'jun': 6,
        'jul': 7,
        'aug': 8,
        'sep': 9,
        'oct': 10,
        'nov': 11,
        'dec': 12,
    }
    return months[m]
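
The whole file stem can also be taken apart in one step with a regular expression, which fails loudly on unexpected names. This is a sketch only; it assumes every stem follows the `<prefix><yyyy><mon>` pattern of `aet1993feb`, and `parse_cbcm_stem` is a hypothetical helper.

```python
import re

MONTHS = ('jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec')
# assumption: lowercase prefix, four-digit year, three-letter month
NAME_RE = re.compile(r'^([a-z]+)(\d{4})([a-z]{3})$')


def parse_cbcm_stem(stem):
    """Return (variable, year, month number) from a stem like 'aet1993feb'."""
    match = NAME_RE.match(stem.lower())
    if match is None:
        raise ValueError(f'unexpected file stem: {stem!r}')
    var, yyyy, mon = match.groups()
    # MONTHS.index raises ValueError for an unknown month abbreviation
    return var, int(yyyy), MONTHS.index(mon) + 1


print(parse_cbcm_stem('aet1993feb'))
```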

Now we can make a big data cube of climate data. Resolution is 270 m, so it should fit in memory.
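
The memory claim can be checked with some quick arithmetic. In the sketch below the month count follows from the downloaded water years (WY1990 through WY2020), but the 600 x 500 grid is a made-up placeholder and 4 bytes per value (float32) is an assumption; substitute the real shape of the clipped template.

```python
def cube_size_mb(n_time, n_y, n_x, n_vars=3, bytes_per_value=4):
    """Rough in-memory size of dense (time, y, x) cubes in megabytes."""
    return n_time * n_y * n_x * n_vars * bytes_per_value / 1e6


# 31 water years of monthly layers; the grid size is a hypothetical placeholder
n_months = 31 * 12
size_mb = cube_size_mb(n_months, n_y=600, n_x=500)
print(n_months, size_mb)
```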

In [9]:
# open file and reproject as template
template_path = next(cbcm_dir.rglob('aet*.tif'))
template = rioxarray.open_rasterio(template_path).rio.reproject(crs).rio.clip(aoi.geometry.values)

def stack_by_time(prefix):
    '''stacks all monthly tifs for a variable prefix (e.g. aet) along time'''
    lyrs = []
    for path in cbcm_dir.rglob(f'{prefix}*.tif'):
        # get date from a stem like aet1993feb; slice off the prefix rather
        # than lstrip, since lstrip removes a set of characters, not a prefix
        date_part = path.stem[len(prefix):]
        yyyy, m = date_part[:4], date_part[4:]
        yyyymm = [pd.Period(f'{yyyy}-{m2n(m)}').to_timestamp()]

        # match the template grid, then add a length-one time dimension
        lyr = rioxarray.open_rasterio(path).squeeze(dim='band')
        lyrs.append(lyr.rio.reproject_match(template).expand_dims(time=yyyymm))
    return xr.concat(lyrs, dim='time').sortby('time')

# stack each variable by time
aet_temporal_cube = stack_by_time('aet')
str_temporal_cube = stack_by_time('str')
cwd_temporal_cube = stack_by_time('cwd')

In [10]:
ds = xr.Dataset({
    'AET': aet_temporal_cube,
    'STR': str_temporal_cube,
    'CWD': cwd_temporal_cube
})

In [11]:
ds

In [12]:
ds.time

As can be seen above, the datacube has 372 monthly entries.  Let's save it as a netCDF so we won't have to create it again.

In [13]:
# start and end months
start = ''.join(str(ds.time.min().values).split('-')[:2])
end = ''.join(str(ds.time.max().values).split('-')[:2])

# path to netcdf 
netcdf_path = cbcm_dir / f'cbcm_{start}_{end}.nc'

# save
ds.to_netcdf(netcdf_path)