# San Joaquin Valley Township Precipitation Data and Stations

Related links:

* For documentation about this dataset, its source, how to download it, and the features of interest, please refer to our [Precipitation Dataset](/doc/assets/precipitation.md) documentation.
* For an explanation of the Public Land Survey System Township Range, please refer to our [Public Land Survey System](../assets/plss_sanjoaquin_riverbasin.md) documentation.
* For an explanation of how we transform point measurements of precipitation by weather stations into township precipitation estimates, please refer to our [Transforming Point Values into Township Values](doc/etl/from_point_to_region_values.md) documentation.


In [17]:
import sys
sys.path.append('..')

In [18]:
import matplotlib.pyplot as plt
import altair as alt
from lib.precipitation_v2 import PrecipitationDataset
from lib.viz_matplotlib import plot_townships_feature_per_year

Initializing the PrecipitationDataset class automatically:
* scrapes the web to retrieve the precipitation data for the state of California (by default, between 2013 and 2022)
* scrapes the web to retrieve the geospatial data of the precipitation stations in California

In [None]:
precipitation_dataset = PrecipitationDataset()

Here is an overview of the monthly precipitation data scraped from the web.

In [None]:
precipitation_dataset.data_df

By default, the precipitation data are collected for the following years:

In [None]:
list(precipitation_dataset.data_df.YEAR.unique())

Here is an overview of the precipitation station geospatial data scraped from the web.

In [None]:
precipitation_dataset.map_df

In [None]:
precipitation_dataset.preprocess_map_df()
precipitation_dataset.merge_map_with_data("inner", dropna=True)
precipitation_dataset.map_df

Let's compare the locations of California's precipitation recording stations with the San Joaquin Valley townships for the year 2021.

In [None]:
fig, ax = plt.subplots(figsize=(30,30))
precipitation_dataset.ca_boundaries.plot(ax=ax, facecolor="none", edgecolor='black', linewidth = 1, cmap=None, legend=None)
precipitation_dataset.sjv_boundaries.plot(ax=ax, facecolor="grey", edgecolor='black', linewidth = 1, cmap=None, legend=None)
precipitation_dataset.map_df[precipitation_dataset.map_df["YEAR"]==2021].plot(ax=ax, edgecolor='black', linewidth = 1, cmap="rainbow", legend=True)
plt.show()

Next, based on the precipitation station points, we compute a Voronoi diagram, which gives the Thiessen polygon for each station.

In [None]:
precipitation_dataset.compute_areas_from_points()
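
The defining property of a Thiessen polygon is that every location inside it is closer to its own station than to any other station. The sketch below illustrates only that property with plain Python; it is not the implementation of `compute_areas_from_points()`. The station names, coordinates, and precipitation values are hypothetical, and distances are assumed to be Euclidean in a planar coordinate system.

```python
import math

# Hypothetical stations: name -> ((x, y) position, yearly precipitation value).
# Coordinates are assumed to be planar, so Euclidean distance is meaningful.
stations = {
    "STATION_A": ((0.0, 0.0), 10.0),
    "STATION_B": ((4.0, 0.0), 14.0),
    "STATION_C": ((0.0, 4.0), 6.0),
}


def nearest_station(point):
    """Return the station whose Thiessen polygon contains `point`,
    i.e. the station at the smallest distance from it."""
    return min(stations, key=lambda name: math.dist(point, stations[name][0]))


def precipitation_at(point):
    """Estimate precipitation at `point` from the station owning its polygon."""
    return stations[nearest_station(point)][1]


print(nearest_station((1.0, 0.5)))   # STATION_A
print(precipitation_at((3.5, 1.0)))  # 14.0
```

Under this view, the polygon boundaries are simply the set of locations that are equally distant from two stations.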

In [None]:
fig, ax = plt.subplots(figsize=(30,30))
precipitation_dataset.map_df[precipitation_dataset.map_df["YEAR"]==2021].plot(ax=ax, column="AVERAGE_YEARLY_PRECIPITATION",
                                                                              edgecolor='black', linewidth = 1, cmap="Blues", legend=True)
precipitation_dataset.map_df[precipitation_dataset.map_df["YEAR"]==2021].points.plot(ax=ax, facecolor="black", edgecolor='black', linewidth = 1)
precipitation_dataset.ca_boundaries.plot(ax=ax, facecolor="none", edgecolor='black', linewidth = 1, cmap=None, legend=None)
plt.show()

Then we clip the data to the San Joaquin Valley boundaries and overlay the township boundaries.

In [None]:
precipitation_dataset.overlay_township_boundries()
precipitation_dataset.map_df

In [None]:
fig, ax = plt.subplots(figsize=(30,30))
precipitation_dataset.map_df[precipitation_dataset.map_df["YEAR"]==2021].plot(ax=ax, column="AVERAGE_YEARLY_PRECIPITATION",
                                                                              edgecolor='black', linewidth = 1, cmap="Blues", legend=True)
precipitation_dataset.map_df[precipitation_dataset.map_df["YEAR"]==2021].points.plot(ax=ax, facecolor="black", edgecolor='black', linewidth = 1)
plt.show()

Because of the way we [transform point measurements of precipitation by weather stations into township precipitation estimates](doc/etl/from_point_to_region_values.md), although there are 16 weather stations in the San Joaquin Valley, a total of 33 stations are used to estimate township precipitation. The chart below shows each station's average yearly precipitation measurement for each year.

In [None]:
station_precipitation_per_year_df = precipitation_dataset.map_df[["YEAR", "AVERAGE_YEARLY_PRECIPITATION", "STATION_NAME"]].drop_duplicates()
alt.Chart(station_precipitation_per_year_df).mark_bar().encode(x="YEAR:N", y="AVERAGE_YEARLY_PRECIPITATION:Q", color="STATION_NAME:N")

We then compute the precipitation value at the township level. Because some townships intersect several Voronoi areas, for every year and every township we take the mean of the values of the Voronoi areas that intersect that township.

In [None]:
precipitation_dataset.aggregate_feature_at_township_level(group_by_features=["TOWNSHIP", "YEAR"],
                                                          feature_to_aggregate_on="AVERAGE_YEARLY_PRECIPITATION")
precipitation_dataset.map_df
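
The per-township mean described above can be sketched with a plain pandas `groupby`. This is an illustration under assumed data, not the code behind `aggregate_feature_at_township_level()`: the `pieces` DataFrame and its township labels and values are hypothetical, with one row per intersection between a township and a Voronoi area for a given year.

```python
import pandas as pd

# Hypothetical input: one row per (township, Voronoi area) intersection and year.
pieces = pd.DataFrame({
    "TOWNSHIP": ["T01S R09E", "T01S R09E", "T02S R10E", "T01S R09E"],
    "YEAR": [2021, 2021, 2021, 2022],
    "AVERAGE_YEARLY_PRECIPITATION": [1.0, 3.0, 2.5, 4.0],
})

# Unweighted mean of the Voronoi-area values for each township and year.
township_means = pieces.groupby(
    ["TOWNSHIP", "YEAR"], as_index=False
)["AVERAGE_YEARLY_PRECIPITATION"].mean()

print(township_means)
```

In this toy input, township `T01S R09E` intersects two Voronoi areas in 2021 (values 1.0 and 3.0), so its 2021 estimate is their mean, 2.0. The mean is unweighted: it does not account for how much of the township each area covers.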

In [None]:
plot_townships_feature_per_year(precipitation_dataset.map_df, feature_name="AVERAGE_YEARLY_PRECIPITATION", cmap="Blues")

The dataset is now ready for output.

In [None]:
precipitation_dataset.prepare_output_from_map_df()
precipitation_dataset.output_df.to_csv("../assets/outputs/precipitations.csv")
precipitation_dataset.output_df