# Trying to bring all the different datasets together for analysis
Should I use weighted computations in xarray? Or resize/resample everything so that the SoilGrids data sets the base grid size (250 m x 250 m)?
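For the weighted option, xarray's `DataArray.weighted()` handles weighted reductions. Below is a minimal sketch, assuming the weights are the fraction of each coarse cell covered by a trail buffer. The 2 x 2 values and the coverage fractions are made up for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 2 patch of a coarse raster (e.g. PRISM at 800 m).
coords = {"y": [1.0, 0.0], "x": [0.0, 1.0]}
values = xr.DataArray(np.array([[1.0, 2.0], [3.0, 4.0]]), dims=("y", "x"), coords=coords)

# Hypothetical weights: fraction of each cell covered by a trail buffer
# (0 = not covered, 1 = fully covered). Weights must not contain NaN.
coverage = xr.DataArray(np.array([[1.0, 0.0], [0.0, 1.0]]), dims=("y", "x"), coords=coords)

# Weighted mean over the patch: sum(w * v) / sum(w) = (1 + 4) / 2
weighted_mean = values.weighted(coverage).mean(dim=("y", "x"))
print(float(weighted_mean))  # 2.5
```

This avoids resampling the coarse grids at all; the trade-off is that the coverage fractions have to be computed for every grid separately.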

In [None]:
import xarray as xr
import rioxarray as rxr
import requests
import zipfile
import io
import datetime 
import os
import earthaccess


## Constants and shared variables

In [None]:
today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
date = yesterday.strftime("%Y%m%d")
print(date)

## Fetch Data
Fetch PRISM data for yesterday. Note that the PRISM day ends at 12:00 GMT, which is 7:00 am Eastern Standard Time (8:00 am during daylight saving time).

PRISM serves its files as Cloud Optimized GeoTIFFs now, but they are delivered inside zip files, so you can't stream them directly into rioxarray with open_rasterio(url). They need to be downloaded, as done here: the zip file is loaded into memory, the .tif file is extracted from memory and saved to disk as a temporary local file, and that path is then passed to open_rasterio.
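The two download cells below repeat the same unzip-and-save steps. A minimal sketch of a shared helper, `extract_first_tif` (a hypothetical name, not part of PRISM or rioxarray), that takes the raw zip bytes such as `response_ppt.content` and returns the path to hand to `open_rasterio`:

```python
import io
import os
import shutil
import zipfile


def extract_first_tif(zip_bytes: bytes, out_dir: str = ".") -> str:
    """Write the first .tif member of an in-memory zip archive to out_dir.

    Hypothetical helper; returns the path of the extracted file.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        tif_members = [name for name in z.namelist() if name.lower().endswith(".tif")]
        if not tif_members:
            raise ValueError("No .tif file found in the zip archive")
        member = tif_members[0]
        # Keep only the base name so a folder inside the zip doesn't change where the file lands.
        out_path = os.path.join(out_dir, os.path.basename(member))
        # Stream the member to disk in chunks instead of reading it all with .read().
        with z.open(member) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

Usage would look like `ppt_filename = extract_first_tif(response_ppt.content)`, and likewise for tmean.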

In [None]:
url_ppt = f"https://services.nacse.org/prism/data/get/us/800m/ppt/{date}"
url_tmean = f"https://services.nacse.org/prism/data/get/us/800m/tmean/{date}"

In [None]:
response_ppt = requests.get(url_ppt)
response_ppt.raise_for_status()

In [None]:
with zipfile.ZipFile(io.BytesIO(response_ppt.content)) as z:
    ppt_filename = [f for f in z.namelist() if f.endswith(".tif")][0]
    with z.open(ppt_filename) as ppt_file:
        with open(ppt_filename, "wb") as f:
            f.write(ppt_file.read())

In [None]:
response_tmean = requests.get(url_tmean)
response_tmean.raise_for_status()

In [None]:
with zipfile.ZipFile(io.BytesIO(response_tmean.content)) as z:
    tmean_filename = [f for f in z.namelist() if f.endswith(".tif")][0]
    with z.open(tmean_filename) as tmean_file:
        with open(tmean_filename, "wb") as f:
            f.write(tmean_file.read())

In [None]:
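# open_rasterio reads lazily from disk; .load() pulls the pixels into memory now
# so the local .tif files can be deleted safely further down.
ds_prism_ppt = rxr.open_rasterio(ppt_filename, masked=True).load()
ds_prism_tmean = rxr.open_rasterio(tmean_filename, masked=True).load()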


In [None]:
# Check that the two PRISM grids line up exactly (tip from the "Xarray in 45 minutes" tutorial):
# xr.align(..., join="exact") raises a ValueError if the coordinate indexes are not equal.
xr.align(ds_prism_ppt, ds_prism_tmean, join="exact")
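A small synthetic demo of what `join="exact"` catches: two grids with the same 800 m cell size, one of them shifted by half a cell. The coordinates are made up for illustration. Because the comparison is exact, even floating-point noise in otherwise matching coordinates will make it fail.

```python
import numpy as np
import xarray as xr


def make_grid(x0: float) -> xr.DataArray:
    """Synthetic 3 x 3 grid with 800 m spacing whose x origin is x0."""
    x = x0 + np.arange(3) * 800.0
    y = np.arange(3)[::-1] * 800.0
    return xr.DataArray(np.zeros((3, 3)), dims=("y", "x"), coords={"y": y, "x": x})


a = make_grid(0.0)
b = make_grid(0.0)
c = make_grid(400.0)  # same cell size, shifted by half a cell

xr.align(a, b, join="exact")  # identical grids: no error

try:
    xr.align(a, c, join="exact")
    aligned = True
except ValueError:
    aligned = False
print(aligned)  # False
```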


Remove the local .tif files once the rasters have been read into memory (open_rasterio is lazy, so call .load() first). The standard-library tempfile module could also handle this cleanup automatically.
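A minimal sketch of the tempfile idea, using a hypothetical helper `load_from_temp_file`. The file lives in a `tempfile.TemporaryDirectory`, which is deleted when the `with` block exits, so the `loader` callable has to read everything into memory before returning (for a raster, something like `lambda p: rxr.open_rasterio(p, masked=True).load()`).

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def load_from_temp_file(data: bytes, filename: str, loader: Callable[[str], T]) -> T:
    """Write data to a temporary directory, return loader(path), then clean up.

    Hypothetical helper: the directory and file are removed automatically when
    the with block exits, so loader must not keep reading from the path lazily.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        with open(path, "wb") as fh:
            fh.write(data)
        return loader(path)
```

With this pattern the explicit os.remove calls are no longer needed.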

In [None]:
os.remove(ppt_filename)
os.remove(tmean_filename)


## SoilGrids

Load the processed SoilGrids dataset (the filename suggests it has already been reprojected to EPSG:5070, CONUS Albers).

In [None]:
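# decode_coords="all" keeps the grid-mapping variable (e.g. spatial_ref) as a coordinate,
# so rioxarray can still find the CRS on the individual DataArrays.
ds_soil = xr.open_dataset("data/soils/soil_ds_5070.nc", decode_coords="all")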

## SMAP for baseline soil moisture

Download the latest SMAP data for CONUS and load it into an xarray Dataset. SPL4SMGP appears to be a full-coverage (Level 4, model-based) product updated every 3 hours, so even though its latency is relatively high, I should still be able to get fairly recent data. earthaccess looks promising, but it is not fully documented yet.

In [None]:
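# Granule times are in UTC, so build the search window from the current UTC time
# rather than local time.
today = datetime.datetime.now(datetime.timezone.utc)
start_date = (today - datetime.timedelta(days=4)).strftime("%Y-%m-%dT%H")
end_date = today.strftime("%Y-%m-%dT%H")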

sort_key="-end_date" sorts the granules in descending order of end date, so the first result is the latest. Since the data is 3-hourly, I need to include the hour in the end_date of the request too. This makes sure I get more than just the 9 pm to midnight UTC window.
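To double-check client-side that `granules[0]` really is the newest, a minimal sketch of a hypothetical `latest_granule` helper. It assumes each granule exposes CMR UMM-G metadata at `granule["umm"]["TemporalExtent"]["RangeDateTime"]["EndingDateTime"]` as an ISO 8601 UTC string, and that all granules use the same string format, so a plain string comparison orders them chronologically.

```python
from typing import Any, Mapping, Sequence


def latest_granule(granules: Sequence[Mapping[str, Any]]) -> Mapping[str, Any]:
    """Return the granule with the latest ending time.

    Assumption: the UMM-G path below exists on every granule and all
    EndingDateTime strings share one ISO 8601 format (e.g. "...T04:30:00.000Z").
    """
    def end_time(g: Mapping[str, Any]) -> str:
        return g["umm"]["TemporalExtent"]["RangeDateTime"]["EndingDateTime"]

    return max(granules, key=end_time)
```

Usage would be `url = latest_granule(granules).data_links()[0]` in place of indexing `granules[0]`.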

In [None]:
auth = earthaccess.login(strategy="netrc")

granules = earthaccess.search_data(
    short_name='SPL4SMGP',
    version='008',
    daac='NSIDC',
    provider='NSIDC_ECS',
    doi='10.5067/T5RUATAQREF8',
    bounding_box=(-126, 24, -65, 50),
    temporal=(f"{start_date}", f"{end_date}"),
    sort_key="-end_date",
)

In [None]:
fs = earthaccess.get_fsspec_https_session()
url = granules[0].data_links()[0] 
with fs.open(url, mode="rb") as f:
    ds_smap = xr.open_dataset(f, engine="h5netcdf", group='Geophysical_Data')
    ds_vars_smap = ds_smap[["sm_surface_wetness", "vegetation_greenness_fraction", "surface_temp"]]
    ds_vars_smap.load()

The .load() method has to be called to read the data into memory before leaving the with statement; otherwise you get an "I/O operation on closed file" error. I select the variables I want before loading to limit how much memory they take up.

If I need metadata or other information about the SMAP data, I need to open the root level of the HDF5 file by removing group='Geophysical_Data'. The root level holds the attributes and metadata.


## Merge all the dataarrays together 

Get all of the DataArrays into one Dataset with the same CRS and grid. I think the best option for my use case is upsampling to the highest spatial resolution (the 250 m x 250 m SoilGrids grid) so that as many cells as possible overlap the trail vectors plus buffer. Upsampling doesn't add information, though; it only subdivides the coarser cells. Call .rio.reproject_match(ds_soil) on each of the other datasets to put them all on the same grid, then combine them with xr.merge().