# San Joaquin Valley Water Shortages
This notebook performs the ETL of the California water shortage reports dataset in order to compute the number of water shortage reports per Township-Range and per year.

Related links:
* For the documentation about this dataset, its source, how to download, and the features of interest, please refer to our [Water Shortage Reports Dataset](doc/assets/shortage.md) documentation.

In [1]:
import sys
sys.path.append('..')

In [2]:
import numpy as np
import altair as alt

from lib.shortage import ShortageReportsDataset
from lib.viz import draw_missing_data_chart, view_trs_side_by_side, display_data_on_map

In [3]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [4]:
shortage = ShortageReportsDataset()

Loading local datasets. Please wait...
Data not found locally.
Downloading the water shortage reports dataset. Please wait...
Downloads complete.
Loading local datasets. Please wait...
Loading of datasets complete.


In [5]:
print(shortage.map_df.shape)
shortage.map_df.sample(4)

(4792, 10)


Unnamed: 0,REPORT_DATE,COUNTY,LATITUDE,LONGITUDE,STATUS,SHORTAGE_TYPE,PRIMARY_USAGES,YEAR,MONTH,geometry
4469,SHORTAGE_REPORTED_2022-07-18,Fresno,36.71677,-119.54044,Interim Solution,Dry well (groundwater),Household,2022,7,POINT (-119.54044 36.71677)
2358,SHORTAGE_REPORTED_2016-05-23,Tulare,36.098636,-119.107032,Interim Solution,Dry well (groundwater),Household,2016,5,POINT (-119.10703 36.09864)
4788,SHORTAGE_REPORTED_2022-08-23,Santa Clara,37.128409,-121.705521,Undefined,Dry well (groundwater),Household,2022,8,POINT (-121.70552 37.12841)
3887,SHORTAGE_REPORTED_2022-02-18,Fresno,36.790898,-119.897879,Undefined,Dry well (groundwater),Household,2022,2,POINT (-119.89788 36.79090)


Pre-process the shortage report data and their geospatial locations into the geospatial map_df dataset, then overlay the San Joaquin Valley boundaries to keep only the reports located in the San Joaquin Valley.

In [6]:
shortage.preprocess_map_df(
    features_to_keep=[
        "REPORT_DATE", "COUNTY", "LATITUDE", "LONGITUDE", "STATUS",
        "SHORTAGE_TYPE", "PRIMARY_USAGES", "YEAR", "MONTH", "geometry",
    ]
)
shortage.keep_only_sjv_data()
shortage.map_df

Unnamed: 0,REPORT_DATE,COUNTY,LATITUDE,LONGITUDE,STATUS,SHORTAGE_TYPE,PRIMARY_USAGES,YEAR,MONTH,geometry
0,SHORTAGE_REPORTED_2017-10-31,Kern,35.280117,-118.896376,Undefined,Dry well (groundwater),Household,2017,10,POINT (-118.89638 35.28012)
1,SHORTAGE_REPORTED_2018-06-01,Kern,35.290510,-118.627383,Undefined,Dry well (groundwater),Household,2018,6,POINT (-118.62738 35.29051)
2,SHORTAGE_REPORTED_2018-05-24,Kern,35.290852,-118.628775,Undefined,Owner of well will not fix problem with well,Household,2018,5,POINT (-118.62878 35.29085)
3,SHORTAGE_REPORTED_2019-05-29,Kern,35.304669,-118.914434,Undefined,"smell like oil, sand in water",Household,2019,5,POINT (-118.91443 35.30467)
4,SHORTAGE_REPORTED_2017-08-07,Kern,35.325274,-118.911520,Undefined,Dry well (groundwater),Household,2017,8,POINT (-118.91152 35.32527)
...,...,...,...,...,...,...,...,...,...,...
2571,SHORTAGE_REPORTED_2014-10-16,San Joaquin,37.827564,-120.937342,Undefined,Dry well (groundwater),Household,2014,10,POINT (-120.93734 37.82756)
2572,SHORTAGE_REPORTED_2014-10-13,San Joaquin,37.828512,-120.952090,Outage,Dry well (groundwater),Combination of Household/Agriculture,2014,10,POINT (-120.95209 37.82851)
2573,SHORTAGE_REPORTED_2014-10-16,San Joaquin,37.841925,-120.978688,Undefined,Dry well (groundwater),Household,2014,10,POINT (-120.97869 37.84193)
2574,SHORTAGE_REPORTED_2014-09-12,San Joaquin,37.842171,-120.976857,Outage,Dry well (groundwater),Household,2014,9,POINT (-120.97686 37.84217)


Look at missing data in the dataset

In [7]:
draw_missing_data_chart(shortage.map_df)
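
As a numeric complement to the chart, the share of missing values per column can be computed directly with pandas. This is a minimal sketch on a small hypothetical sample (the `reports` DataFrame and its values are made up for illustration); it does not show how `draw_missing_data_chart` is implemented.

```python
import pandas as pd

# Hypothetical sample mimicking a few columns of the shortage reports.
reports = pd.DataFrame({
    "COUNTY": ["Kern", "Tulare", None, "Fresno"],
    "STATUS": ["Undefined", None, None, "Outage"],
    "YEAR": [2017, 2016, 2022, 2014],
})

# isna() flags missing cells; the column-wise mean is the missing fraction.
missing_share = reports.isna().mean().sort_values(ascending=False)
print(missing_share)
```

The same expression could be applied to `shortage.map_df` to list the columns that need attention before aggregation.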

Map of the water shortage reports filed in 2015 in the San Joaquin Valley, colored by county

In [8]:
display_data_on_map(shortage.map_df, feature="COUNTY", year=2015, color_scheme="tab10")


Next, for all Township-Ranges and every year:

1. Count the number of water shortage reports
2. Township-Ranges missing from the result had no reports, so we set their count of water shortage reports to 0

In [9]:
shortage.compute_features_by_township()
shortage.fill_townships_with_no_data(features_to_fill=["SHORTAGE_COUNT"], feature_value=0)
shortage.map_df

Unnamed: 0,TOWNSHIP_RANGE,YEAR,geometry,SHORTAGE_COUNT
0,T01N R07E,2021,"POLYGON ((-121.25785 37.88341, -121.25785 37.9...",1
1,T01N R08E,2021,"POLYGON ((-121.14913 37.88453, -121.14913 37.9...",1
2,T01S R08E,2015,"POLYGON ((-121.14772 37.88617, -121.03392 37.8...",2
3,T01S R08E,2021,"POLYGON ((-121.03392 37.88617, -121.03392 37.7...",4
4,T01S R09E,2014,"POLYGON ((-120.92335 37.88683, -120.92335 37.7...",4
...,...,...,...,...
473,T32S R26E,2019,"POLYGON ((-119.23510 35.09371, -119.23510 35.1...",0
474,T32S R27E,2019,"POLYGON ((-119.12837 35.09439, -119.12837 35.1...",0
475,T32S R28E,2019,"POLYGON ((-119.02170 35.09292, -119.02170 35.1...",0
476,T32S R29E,2019,"POLYGON ((-118.91470 35.09263, -118.91470 35.1...",0
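
The two steps above (count reports per Township-Range and year, then fill the gaps with 0) can be sketched with plain pandas. This is only an illustration under assumptions: the sample data, `all_townships`, and `all_years` are hypothetical, and the actual `compute_features_by_township` and `fill_townships_with_no_data` methods may decide differently which Township-Range and year combinations to fill.

```python
import pandas as pd

# Hypothetical reports already tagged with their Township-Range.
reports = pd.DataFrame({
    "TOWNSHIP_RANGE": ["T01S R08E", "T01S R08E", "T01S R08E", "T01N R07E"],
    "YEAR": [2015, 2015, 2021, 2021],
})
# Hypothetical reference lists of all Township-Ranges and years of interest.
all_townships = ["T01N R07E", "T01S R08E", "T32S R26E"]
all_years = [2015, 2021]

# Step 1: count the reports per Township-Range and year.
counts = reports.groupby(["TOWNSHIP_RANGE", "YEAR"]).size()

# Step 2: reindex on the full grid; combinations without reports get 0.
full_index = pd.MultiIndex.from_product(
    [all_townships, all_years], names=["TOWNSHIP_RANGE", "YEAR"]
)
shortage_count = (
    counts.reindex(full_index, fill_value=0)
    .rename("SHORTAGE_COUNT")
    .reset_index()
)
print(shortage_count)
```

Filling with 0 rather than leaving the rows out keeps Township-Ranges without reports visible on the maps and in later normalization.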


## Visualizing the Normalized Shortage Report Count
### Comparison per year
Compute the SHORTAGE_COUNT normalized by year

In [10]:
normalized_df = shortage.return_yearly_normalized_township_feature(feature_name="SHORTAGE_COUNT", normalize_method="minmax")
normalized_df

Unnamed: 0,TOWNSHIP_RANGE,YEAR,geometry,SHORTAGE_COUNT,SHORTAGE_COUNT_NORMALIZED
0,T01N R07E,2021,"POLYGON ((-121.25785 37.88341, -121.25785 37.9...",1,0.031250
1,T01N R08E,2021,"POLYGON ((-121.14913 37.88453, -121.14913 37.9...",1,0.031250
2,T01S R08E,2015,"POLYGON ((-121.14772 37.88617, -121.03392 37.8...",2,0.013423
3,T01S R08E,2021,"POLYGON ((-121.03392 37.88617, -121.03392 37.7...",4,0.125000
4,T01S R09E,2014,"POLYGON ((-120.92335 37.88683, -120.92335 37.7...",4,0.027586
...,...,...,...,...,...
473,T32S R26E,2019,"POLYGON ((-119.23510 35.09371, -119.23510 35.1...",0,0.000000
474,T32S R27E,2019,"POLYGON ((-119.12837 35.09439, -119.12837 35.1...",0,0.000000
475,T32S R28E,2019,"POLYGON ((-119.02170 35.09292, -119.02170 35.1...",0,0.000000
476,T32S R29E,2019,"POLYGON ((-118.91470 35.09263, -118.91470 35.1...",0,0.000000
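
For reference, min-max normalization within each year maps a count x to (x - min) / (max - min), where min and max are taken over all Township-Ranges of that year. The sketch below shows one way to do this with pandas; the sample data and the `minmax` helper are hypothetical and are not taken from `return_yearly_normalized_township_feature`. The 2021 values were chosen so that a count of 1 maps to 0.03125, as in the table above, which would be consistent with a 2021 minimum of 0 and maximum of 32.

```python
import pandas as pd

# Hypothetical yearly counts for three Township-Ranges.
counts = pd.DataFrame({
    "TOWNSHIP_RANGE": ["A", "B", "C", "A", "B", "C"],
    "YEAR": [2014, 2014, 2014, 2021, 2021, 2021],
    "SHORTAGE_COUNT": [0, 4, 8, 0, 1, 32],
})

def minmax(values: pd.Series) -> pd.Series:
    """Scale values to [0, 1]; return zeros if all values are equal."""
    span = values.max() - values.min()
    if span == 0:
        return pd.Series(0.0, index=values.index)
    return (values - values.min()) / span

# Apply the scaling separately within each year.
counts["SHORTAGE_COUNT_NORMALIZED"] = (
    counts.groupby("YEAR")["SHORTAGE_COUNT"].transform(minmax)
)
print(counts)
```

Because each year is scaled independently, a normalized value of 1.0 marks the Township-Range with the most reports in that year, not the same absolute count across years.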


In [11]:
view_trs_side_by_side(normalized_df, feature="YEAR", value="SHORTAGE_COUNT_NORMALIZED", title="Water Shortage Reports Count")

### Map visualization for 2017

In [12]:
display_data_on_map(normalized_df, feature="SHORTAGE_COUNT_NORMALIZED", year=2017)

### Map visualization for 2021

In [13]:
display_data_on_map(normalized_df, feature="SHORTAGE_COUNT_NORMALIZED", year=2021)

In [14]:
shortage.prepare_output_from_map_df()
shortage.output_dataset_to_csv(output_filename="../assets/outputs/shortage_reports.csv")