# San Joaquin Valley Township Crop Classes
Related links:
* For documentation about these datasets, their source, how to download them, and the features of interest, please refer to our [Crops Datasets](doc/assets/crops.md) documentation.
* For an explanation of how the crop mapping datasets are overlaid with township boundaries to obtain the land area used by each crop type in each township, please refer to our [Overlaying San Joaquin Valley Township Boundaries](doc/etl/township_overlay.md) documentation.

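__WARNING:__ The Crops datasets are made up of a very large number of small fields (the combined 2014, 2016 and 2018 data shown below has over a million rows). Overlaying the TRS Township boundaries on the Crops geospatial data, which cuts every field along the township boundaries and groups the pieces by township, therefore takes a long time.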


In [1]:
# For Deepnote to be able to use the custom libraries in the parent ../lib folder
import sys
sys.path.append('..')

In [2]:
import matplotlib.pyplot as plt
from lib.crops import CropsDataset

Load the data by instantiating the `CropsDataset` class, which loads:
* the geospatial map data for the years 2014, 2016 and 2018,
* the soil CSV dataset and the GeoJSON map data of the San Joaquin Valley

__Note:__ If the data are not available locally, they will be downloaded from their source first, which can take some time.
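
The download-if-missing behavior described in the note can be sketched as follows. This is not the `CropsDataset` code, which is not shown in this notebook: `load_or_download` and its `download` callback are hypothetical names, and the callback stands in for whatever actually fetches the data from its source.

```python
from pathlib import Path
from typing import Callable


def load_or_download(local_path: Path, download: Callable[[Path], None]) -> Path:
    """Return local_path, downloading the file first only if it is not already on disk.

    `download` is a hypothetical callback that fetches the dataset from its
    source and writes it to the path it is given.
    """
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)  # create the cache folder if needed
        download(local_path)  # slow path: fetch from the source
    return local_path  # later runs reuse the local copy
```

The first call invokes `download`; later calls find the file on disk and skip the download entirely.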

In [3]:
crops_dataset = CropsDataset()

Loading local datasets. Please wait...
Loading of datasets complete.


Pre-process the crops dataset to keep only the selected features for the final analysis.

In [4]:
crops_dataset.preprocess_map_df(features_to_keep=["YEAR", "CROP_TYPE", "geometry"])
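
Keeping only the selected features amounts to a column selection. A minimal pandas sketch, assuming `map_df` behaves like a pandas (or GeoPandas) DataFrame; `keep_features` is a hypothetical helper, not part of `lib.crops`, and the `ACRES` column in the toy data is invented:

```python
import pandas as pd


def keep_features(df: pd.DataFrame, features_to_keep: list) -> pd.DataFrame:
    """Return a copy of df restricted to features_to_keep (hypothetical helper).

    Selecting a list of labels raises KeyError if any of them is missing,
    so a typo in a feature name fails loudly instead of being ignored.
    """
    return df.loc[:, features_to_keep].copy()


# Toy data: the "ACRES" column is invented for illustration
df = pd.DataFrame({"YEAR": [2014, 2018], "CROP_TYPE": ["T", "P"], "ACRES": [12.5, 3.0]})
print(keep_features(df, ["YEAR", "CROP_TYPE"]).columns.tolist())  # ['YEAR', 'CROP_TYPE']
```

With GeoPandas, keeping `geometry` in the list should preserve the geometry column, which the overlay step relies on.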

In [5]:
crops_dataset.map_df

| | YEAR | CROP_TYPE | geometry |
|---|---|---|---|
| 0 | 2014 | T | POLYGON ((-121.80922 36.88657, -121.80976 36.8... |
| 1 | 2014 | D | POLYGON ((-119.44511 36.54089, -119.44511 36.5... |
| 2 | 2014 | P | POLYGON ((-115.29447 32.75199, -115.29842 32.7... |
| 3 | 2014 | V | POLYGON ((-119.97019 36.77553, -119.97018 36.7... |
| 4 | 2014 | T | POLYGON ((-121.69127 38.82860, -121.69131 38.8... |
| ... | ... | ... | ... |
| 1159974 | 2018 | P | POLYGON ((-120.90513 40.13510, -120.90503 40.1... |
| 1159975 | 2018 | P | POLYGON ((-120.87495 40.14633, -120.87445 40.1... |
| 1159976 | 2018 | P | POLYGON ((-120.93540 40.14685, -120.93581 40.1... |
| 1159977 | 2018 | V | POLYGON ((-121.94810 40.51928, -121.94915 40.5... |


Overlay the San Joaquin Valley township boundaries on the Crops dataset: each crop field is cut along the township boundaries, which gives the crop types, and the land area each one covers, in every township.

In [6]:
crops_dataset.overlay_township_boundries()

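
Conceptually, the overlay clips every crop field to each township it intersects and sums the clipped areas per township and crop type. The sketch below simplifies fields and townships to axis-aligned rectangles so it runs with the standard library only; real field polygons need a true polygon intersection (for instance GeoPandas' `overlay(..., how="intersection")`), and this is not a claim about how `overlay_township_boundries` is implemented. `Rect`, `intersection_area` and `crop_area_per_township` are hypothetical names. The nested loop also illustrates the warning at the top: the work grows with the number of fields times the number of townships.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle: a simplified stand-in for a polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two rectangles (0 if they do not overlap)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return width * height if width > 0 and height > 0 else 0.0


def crop_area_per_township(fields: List[Tuple[str, Rect]],
                           townships: Dict[str, Rect]) -> Dict[Tuple[str, str], float]:
    """Clip each (crop_type, field) to every township and sum areas per (township, crop_type)."""
    areas: Dict[Tuple[str, str], float] = defaultdict(float)
    for crop_type, field in fields:
        for name, township in townships.items():  # every field/township pair is tested
            area = intersection_area(field, township)
            if area > 0:
                areas[(name, crop_type)] += area
    return dict(areas)


townships = {"TWP_A": Rect(0, 0, 6, 6), "TWP_B": Rect(6, 0, 12, 6)}
fields = [("T", Rect(4, 0, 8, 2)),  # straddles the TWP_A/TWP_B boundary
          ("P", Rect(7, 3, 9, 5))]  # lies entirely inside TWP_B
result = crop_area_per_township(fields, townships)
print(result)  # {('TWP_A', 'T'): 4.0, ('TWP_B', 'T'): 4.0, ('TWP_B', 'P'): 4.0}
```

A production version would typically add a spatial index so that field/township pairs whose bounding boxes do not overlap are skipped without being tested.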

Display a map of the 2018 Crops dataset with the township boundaries overlaid.

In [None]:
# This geospatial dataset has too many small, complex features to display on a Folium map or with Altair, so we use Matplotlib here
fig, ax = plt.subplots(figsize=(40, 40))
# 2018 crop fields, colored by crop type
crops_2018 = crops_dataset.map_df[crops_dataset.map_df["YEAR"] == 2018]
crops_2018.plot(ax=ax, column="CROP_TYPE", edgecolor="grey", linewidth=1, legend=True)
# Township boundaries drawn as black outlines on top
crops_dataset.sjv_township_range_df.plot(ax=ax, facecolor="none", edgecolor="black", linewidth=1)
plt.show()

The Crops dataset is then further modified as follows:
1. Townships with no crop data are filled with the "X - Unclassified" class from the Crops datasets so that they are included in the dataset.
2. The 2014 data are used to fill in 2015, the 2016 data to fill in 2017, and the 2018 data to fill in 2019 onward.
3. The dataframe is then pivoted so that each `CROP_TYPE` becomes a feature (one `CROP_*` column per crop type).
4. Features are dropped if either:
    * they cover less than 5% of the land surface of every township, in every year, or
    * they belong to the `["X - Not Classified", "U - Urban", "NR - Native Riparian"]` classes.
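
Step 2 can be sketched in pandas as copying the rows of each source year onto the years it stands in for. The long-format columns (`TOWNSHIP`, `YEAR`, `CROP_TYPE`) and the `fill_missing_years_sketch` name are assumptions, as is 2020 as the last year in the mapping, since the notebook does not state the final year:

```python
import pandas as pd


def fill_missing_years_sketch(df: pd.DataFrame, year_map: dict) -> pd.DataFrame:
    """Append a copy of the source-year rows for each target year in year_map."""
    parts = [df]
    for target_year, source_year in year_map.items():
        copied = df[df["YEAR"] == source_year].copy()
        copied["YEAR"] = target_year  # relabel the copied rows with the missing year
        parts.append(copied)
    return pd.concat(parts, ignore_index=True)


# Assumed mapping: 2020 is an illustrative end year, not taken from the source
YEAR_MAP = {2015: 2014, 2017: 2016, 2019: 2018, 2020: 2018}

df = pd.DataFrame({
    "TOWNSHIP": ["T01", "T01", "T01"],  # hypothetical township identifiers
    "YEAR": [2014, 2016, 2018],
    "CROP_TYPE": ["T", "D", "P"],
})
filled = fill_missing_years_sketch(df, YEAR_MAP)
print(filled.sort_values("YEAR"))
```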

Note that, as a result of steps 1 and 4, the Township-Ranges that had no crop data are included in the dataset with a land surface value of `0` for every crop type.
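
The effect described in this note can be checked on toy data with a pandas sketch of step 1, step 3 and the unwanted-class part of step 4. The `AREA` column, the zero area given to the filler `X` rows, and the `fill_pivot_sketch` name are all assumptions made for illustration:

```python
import pandas as pd


def fill_pivot_sketch(df: pd.DataFrame, all_townships: list, years: list) -> pd.DataFrame:
    # Step 1: add an "X" (unclassified) row for each township/year with no crop data
    present = set(zip(df["TOWNSHIP"], df["YEAR"]))
    filler = pd.DataFrame(
        [{"TOWNSHIP": t, "YEAR": y, "CROP_TYPE": "X", "AREA": 0.0}
         for t in all_townships for y in years if (t, y) not in present],
        columns=["TOWNSHIP", "YEAR", "CROP_TYPE", "AREA"],
    )
    df = pd.concat([df, filler], ignore_index=True)
    # Step 3: one column per crop type; missing township/crop combinations become 0
    wide = df.pivot_table(index=["TOWNSHIP", "YEAR"], columns="CROP_TYPE",
                          values="AREA", aggfunc="sum", fill_value=0).add_prefix("CROP_")
    wide.columns.name = None
    # Step 4 (partial): drop the unclassified column, leaving filled townships all-zero
    return wide.drop(columns=["CROP_X"], errors="ignore").reset_index()


df = pd.DataFrame({"TOWNSHIP": ["T01", "T01"], "YEAR": [2018, 2018],
                   "CROP_TYPE": ["T", "P"], "AREA": [30.0, 10.0]})
out = fill_pivot_sketch(df, all_townships=["T01", "T02"], years=[2018])
print(out)  # T02 has no crop data, so its CROP_P and CROP_T values are 0
```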

In [None]:
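# Step 2: reuse the 2014, 2016 and 2018 data for the years without crop maps
crops_dataset.fill_missing_years()
# Step 1: add the townships with no crop data, using the "X" (unclassified) class
crops_dataset.fill_townships_with_no_data(features_to_fill=["CROP_TYPE"], feature_value="X")
# Step 3: one column per crop type, prefixed with "CROP"
crops_dataset.pivot_township_categorical_feature_for_output(feature_name="CROP_TYPE", feature_prefix="CROP")
# Step 4: drop classes under 5% as well as the unclassified, urban and riparian classes
crops_dataset.drop_features(drop_rate=0.05, unwanted_features=["CROP_X", "CROP_U", "CROP_NR"])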
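
The 5% rule of step 4 can be read as: a crop column is kept only if it reaches at least 5% of a township's land in at least one township and year. A pandas sketch under that reading, assuming each `CROP_*` column already holds a land share between 0 and 1 (the notebook does not say whether the values are shares or areas); `drop_features_sketch` is a hypothetical stand-in, not the actual `drop_features` code:

```python
from typing import Sequence

import pandas as pd


def drop_features_sketch(wide: pd.DataFrame, drop_rate: float = 0.05,
                         unwanted_features: Sequence[str] = ()) -> pd.DataFrame:
    crop_cols = [c for c in wide.columns if c.startswith("CROP_")]
    # A column is "rare" if its largest share over all townships and years is below drop_rate
    rare = [c for c in crop_cols if wide[c].max() < drop_rate]
    # Unwanted classes are dropped regardless of their share; absent ones are ignored
    to_drop = set(rare) | {c for c in unwanted_features if c in wide.columns}
    return wide.drop(columns=[c for c in wide.columns if c in to_drop])


wide = pd.DataFrame({
    "TOWNSHIP": ["T01", "T02"],  # hypothetical township identifiers
    "YEAR": [2018, 2018],
    "CROP_T": [0.40, 0.10],  # reaches 40% somewhere: kept
    "CROP_D": [0.01, 0.03],  # never reaches 5%: dropped
    "CROP_U": [0.50, 0.20],  # unwanted class: dropped
})
result = drop_features_sketch(wide, drop_rate=0.05,
                              unwanted_features=["CROP_X", "CROP_U", "CROP_NR"])
print(result.columns.tolist())  # ['TOWNSHIP', 'YEAR', 'CROP_T']
```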

In [None]:
crops_dataset.output_df

In [None]:
crops_dataset.output_dataset_to_csv("../assets/outputs/crops_classes.csv")