## Retrieval of PRISM Precipitation Data for a Watershed of Interest

**Authors**:  
- Irene Garousi-Nejad <igarousi@cuahsi.org>

**Last Updated**: 05.08.2023

**Description**: 
    
This notebook fetches PRISM precipitation data and extracts a subset for a specific region using the spatial extent of the watershed shapefile. The output is a CSV file containing the monthly normal precipitation, taken from the 800 m resolution PRISM grids and averaged spatially across the watershed.

---

TODO: update the image

![watershed](https://www.hydroshare.org/resource/b1379f00121e456f958f9e22e913aa8a/data/contents/case-study-logan-river-watershed.png)

In [47]:
# install the following libraries
# make sure the kernel is set to conda env: iguide
!pip install cartopy --quiet
!pip install rasterstats --quiet
!pip install geopandas --quiet

In [None]:
# import libraries
import os
import subprocess

import pandas as pd
import rasterio as rio
from rasterstats import zonal_stats
from geopandas import GeoSeries, GeoDataFrame, read_file

import cartopy.crs as ccrs
from shapely.geometry import MultiPolygon
from cartopy.io.shapereader import Reader
from cartopy.feature import ShapelyFeature

import matplotlib.pyplot as plt
from matplotlib import colors

## 1. Load the Watershed of Interest 

In [None]:
# load the watershed
mp = MultiPolygon(Reader('./GISBasins/WeberRiverBasin.shp').geometries())

# visualize data on the map
plt.figure(figsize=(10, 10))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()

# read the geometries for plotting
shape_feature = ShapelyFeature(mp.geoms,
                                ccrs.PlateCarree(), facecolor='none')
ax.add_feature(shape_feature, zorder=1)

gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--');

# modify the x and y limits based on the watershed's bounding box information
ax.set_ylim([40.5, 41.5]);
ax.set_xlim([-112.35, -110.75]);
ax.set_aspect('equal');
ax.coastlines();

The watershed of interest contains seven HUC8 catchments. If we apply the `zonal_stats` function from the `rasterstats` library to a shapefile with multiple geometries, it calculates statistics separately for each geometry. In our case, we want statistics for the entire watershed rather than for each individual catchment, so we dissolve the catchments into a single polygon that represents the whole watershed.

In [5]:
# read data as a dataframe
watershed = read_file('./GISBasins/WeberRiverBasin.shp')

# add a column with a constant value that will be used to dissolve the shapefile
watershed['temp']=1

# dissolve
watershed_dis = watershed.dissolve(by = 'temp', aggfunc = 'sum')

In [None]:
watershed_dis

In [None]:
# visualize data on the map
plt.figure(figsize=(10, 10))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.set_global()

# read the dissolved geometry for plotting
shape_feature = ShapelyFeature(watershed_dis.geometry,
                                ccrs.PlateCarree(), facecolor='none')
ax.add_feature(shape_feature, zorder=1)

gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True,
                  linewidth=2, color='gray', alpha=0.5, linestyle='--');

# modify the x and y limits based on the watershed's bounding box information
ax.set_ylim([40.5, 41.5]);
ax.set_xlim([-112.35, -110.75]);
ax.set_aspect('equal');
ax.coastlines();

In [None]:
# gdalinfo is meant for rasters; use ogrinfo to summarize a vector shapefile
!ogrinfo -so -al ./GISBasins/WeberRiverBasin.shp

## 2. Download PRISM Monthly Normals Precipitation Data

The PRISM web service provides a single file (i.e., one grid in BIL format) per request, so a bulk download requires one request per month. The cells below download the PRISM precipitation (`ppt`) monthly normals and save them into the `PRISM_monthly_normals` folder. Each downloaded file is named `*_bil.zip`.

#### Create a folder

In [None]:
%%bash

# define the folder name
folder="./PRISM_monthly_normals"

# check if the folder already exists or not
if [ ! -d "$folder" ]; then
    mkdir -p "$folder"
    echo "Directory created: $folder"
else
    echo "Directory already exists: $folder"
fi


#### Print the data links

In [None]:
%%bash

for m in {01..12};do
    echo https://ftp.prism.oregonstate.edu/normals_800m/ppt/PRISM_ppt_30yr_normal_800mM4_${m}_bil.zip
done

In [None]:
# TODO: debug the loop below; it does not terminate when run.
# Note: if re-enabled, %%bash must be the first line of the cell.
# %%bash

# m=1
# while [[ $m -le 12 ]]; do
#     month=$(printf "%02d" "$m")  # Format month with leading zero if needed
#     echo "Downloading data for Month: $month"
#     url="https://ftp.prism.oregonstate.edu/normals_800m/ppt/PRISM_ppt_30yr_normal_800mM4_${month}_bil.zip"
    
#     # Use 'wget' to download the file using the generated URL
#     wget "$url" -P ./PRISM_monthly_normals
    
#     sleep 4
    
#     # Increment the month
#     m=$((m+1))
# done

In [None]:
# download each month manually (repeat with 02 through 12 in the file name)
!wget https://ftp.prism.oregonstate.edu/normals_800m/ppt/PRISM_ppt_30yr_normal_800mM4_01_bil.zip -P ./PRISM_monthly_normals
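
As a possible alternative to the manual downloads, the loop can be written in Python using only the standard library. This is a minimal sketch, not the notebook's original method: the helper names `prism_normal_url` and `download_normals`, the `fetch` and `pause` parameters, and the skip-if-present behavior are all assumptions introduced here. The URL pattern and the 4-second pause are taken from the cells above.

```python
import time
from pathlib import Path
from urllib.request import urlretrieve

# URL pattern copied from the cells above; only the two-digit month changes.
BASE_URL = "https://ftp.prism.oregonstate.edu/normals_800m/ppt"


def prism_normal_url(month):
    """Return the URL of one monthly-normal precipitation archive (hypothetical helper)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    return f"{BASE_URL}/PRISM_ppt_30yr_normal_800mM4_{month:02d}_bil.zip"


def download_normals(folder="./PRISM_monthly_normals", months=range(1, 13),
                     fetch=urlretrieve, pause=4):
    """Download each month's archive into `folder`, skipping files already present."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for month in months:
        url = prism_normal_url(month)
        destination = target / url.rsplit("/", 1)[-1]
        if not destination.exists():
            fetch(url, destination)   # one request per file
            time.sleep(pause)         # wait between requests, as in the bash loop
        saved.append(destination)
    return saved
```

Calling `download_normals()` in a cell would request all twelve archives. Because the loop is a bounded `for` over `months`, it ends after the last month regardless of what each request prints.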

#### Unzip files

In [9]:
%%bash
folder="./PRISM_monthly_normals"

for file in "$folder"/*.zip; do
    python -c "import zipfile; zipfile.ZipFile('$file', 'r').extractall('$folder')"  # the unzip command is not available here
done
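
The same extraction can also be done directly in a Python cell, which avoids embedding file names inside a `python -c` string. The sketch below assumes the archives sit in `./PRISM_monthly_normals`; the function name `extract_all_zips` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_all_zips(folder="./PRISM_monthly_normals"):
    """Extract every .zip archive found in `folder` into that same folder."""
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())  # keep track of what was unpacked
    return extracted
```

The returned list of member names gives a quick way to confirm that each archive produced a `.bil` file and its sidecar files.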

#### Visualize one file

In [None]:
# Use rasterio to import the data as img
with rio.open("./PRISM_monthly_normals/PRISM_ppt_30yr_normal_800mM4_01_bil.bil") as src:
    boundary = src.bounds
    img_precip = src.read()
    nodata = src.nodata

print(img_precip[0].min(), img_precip[0].max())
x1=((img_precip[0].max())-0)/5
x2=x1*2
x3=x1*3
x4=x1*4
print(x1, x2, x3, x4)
    
# plot
plt.figure(figsize=(20,8))
plt.title("Precipitation", size=16)
cmap = colors.ListedColormap(['cyan', 'skyblue', 'deepskyblue', 'royalblue', 'navy'])
cmap.set_under('w')
# bounds=[0, x1, x2, x3, x4, img_precip[0].max()]
bounds=[0, 50, 100, 200, 600, img_precip[0].max()]
norm = colors.BoundaryNorm(bounds, cmap.N)
imgplot = plt.imshow(img_precip[0], cmap=cmap, norm=norm) 
cbar = plt.colorbar()
cbar.set_label('Precipitation (mm)', size=16)

plt.show()

## 3. Use Consistent Projections

As is often the case with GIS, the datasets need consistent projections before they can be combined. The following GDAL command examines the projection of the precipitation data. Note that this data is not projected; it has only a geographic coordinate system. That is why the UNIT is "Degree" and the Pixel Size is 0.008333333333333 degrees.
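
To relate the pixel size in degrees to the nominal 800 m grid, the short calculation below converts 0.008333333333333 degrees (1/120 of a degree, or 30 arc-seconds) to approximate ground distances. It is a rough sketch that assumes a spherical Earth with a mean radius of 6,371 km and a latitude of 41°N, near the middle of the map extent used earlier; the function name `pixel_size_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical approximation)


def pixel_size_m(pixel_deg, latitude_deg):
    """Approximate east-west and north-south pixel size in meters."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = pixel_deg * meters_per_degree
    # meridians converge toward the poles, so east-west distance shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


east_west, north_south = pixel_size_m(1 / 120, 41.0)
print(f"east-west: {east_west:.0f} m, north-south: {north_south:.0f} m")
```

Under these assumptions a pixel is roughly 0.7 km wide and 0.9 km tall at this latitude, which is why "800 m" is best read as a nominal label for the grid rather than an exact cell size.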

In [None]:
# Check the stats of the PRISM data
!gdalinfo -stats ./PRISM_monthly_normals/PRISM_ppt_30yr_normal_800mM4_01_bil.bil 

In [None]:
# Examine the projection of the watershed
!ogrinfo -al ./GISBasins/WeberRiverBasin.shp

Use the information above in conjunction with the `gdalwarp` command to reproject each PRISM file to the coordinate reference system of the shapefile (EPSG:4269, i.e., NAD83, in the command below).

In [None]:
# Get a list of all .bil files in the folder
bil_files = [file for file in os.listdir("./PRISM_monthly_normals") if file.endswith('.bil')]
bil_files

In [None]:
# define the input folder and create the output folder if it does not exist yet
input_folder = './PRISM_monthly_normals'
output_folder = './PRISM_monthly_normals/outputs'
os.makedirs(output_folder, exist_ok=True)

# Loop through each .bil file and use gdalwarp to convert the projection
for bil_file in bil_files:

    # Specify the input and output file paths
    input_file = os.path.join(input_folder, bil_file)
    output_file = os.path.join(output_folder, bil_file)

    # Construct the gdalwarp command as an argument list (avoids shell quoting issues)
    gdalwarp_cmd = ['gdalwarp', '-overwrite', '-t_srs', 'EPSG:4269', input_file, output_file]

    # Execute the gdalwarp command; check=True raises an error if gdalwarp fails
    subprocess.run(gdalwarp_cmd, check=True)
    

## 4. Subset PRISM Data for the Watershed

Use `zonal_stats` to compute the statistics of the PRISM data within the watershed boundary. Note that we are interested in the `mean` values. Create a dataframe that contains the month and the spatially averaged monthly normal precipitation.
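
Conceptually, the zonal `mean` is the average of the raster cells that fall inside the polygon, ignoring cells flagged as nodata. The dependency-free sketch below illustrates that idea on a tiny hand-made grid. It is not how `rasterstats` is implemented: the `inside` mask is supplied by hand here, whereas `zonal_stats` derives it by rasterizing the polygon. The function name `zonal_mean` and the nodata value of -9999 are assumptions for illustration.

```python
def zonal_mean(grid, inside, nodata=-9999):
    """Mean of the grid cells flagged as inside the zone, skipping nodata cells."""
    values = [
        value
        for row, mask_row in zip(grid, inside)
        for value, is_inside in zip(row, mask_row)
        if is_inside and value != nodata
    ]
    # return None when no valid cell falls inside the zone
    return sum(values) / len(values) if values else None


# toy precipitation grid (mm) and a hand-made mask of cells inside the zone
grid = [[10.0, 20.0, -9999],
        [30.0, 40.0, 50.0]]
inside = [[True, True, True],
          [False, True, False]]

# averages the three valid cells inside the zone: 10, 20 and 40
print(zonal_mean(grid, inside))
```

This also shows why dissolving matters: the mean is computed per zone, so a shapefile with seven catchments yields seven separate means rather than one watershed-wide value.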

In [None]:
# Get a list of all reprojected .bil files in the outputs folder
bil_files = sorted(file for file in os.listdir("./PRISM_monthly_normals/outputs") if file.endswith('.bil'))
print(bil_files)

month = []
p = []

# Loop through each .bil file
for bil_file in bil_files:

    # use the dissolved watershed so the mean covers the whole basin
    # rather than only the first catchment in the shapefile
    stats = zonal_stats(watershed_dis, f"./PRISM_monthly_normals/outputs/{bil_file}", stats=['mean'])

    # the month is the second-to-last underscore-separated token in the file name
    month.append(int(bil_file.split("_")[-2]))

    p.append(stats[0]['mean'])

df = pd.DataFrame({'Month': month, 'Precipitation (mm)': p})

In [None]:
df

In [43]:
# sort the dataframe by month
df = df.sort_values(by='Month')

In [45]:
# save the dataframe as a CSV file
df.to_csv('./PRISM_monthly_normals/outputs/PRISM_Monthly_Normal_Precipitation.csv', index=False)

## 5. Plot the Precipitation Timeseries

In [None]:
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(df['Month'], df['Precipitation (mm)'], color='b')
ax.set_xlabel('Month', size=18)
ax.set_ylabel('Depth (mm)', size=18)
ax.tick_params(axis='y', labelsize=14)
ax.tick_params(axis='x', labelsize=14)
ax.set_title('PRISM monthly normals precipitation averaged across the watershed', size=16)
plt.show()