# San Joaquin Valley Township Individual Crops
This notebook performs the same ETL as the "San Joaquin Valley Township Crop Classes" `crops.ipynb` notebook but extracts the crops at the individual crop name level (e.g. "strawberries", "potatoes", etc.) instead of the class level (e.g. "D", "R", etc.).

Related links:
* For the documentation about these datasets, their source, how to download them, and the features of interest, please refer to our [Crops Datasets](doc/assets/crops.md) documentation
* For an explanation of how the crop mapping datasets are overlaid with township boundaries to obtain the amount of land used for each crop type in each township, please refer to our [Overlaying San Joaquin Valley Township Boundaries](doc/etl/township_overlay.md) documentation



In [1]:
# For Deepnote to be able to use the custom libraries in the parent ../lib folder
import sys
sys.path.append('..')

In [2]:
import matplotlib.pyplot as plt
from lib.crops import CropsDataset

Load the data by instantiating the CropsDataset class based on
* the geospatial map data for the years 2014, 2016 and 2018,
* the soil CSV dataset and the GeoJSON map data of the San Joaquin Valley

In [3]:
crops_dataset = CropsDataset()


Pre-process the crops dataset to keep only the selected features for the final analysis.

In [None]:
crops_dataset.preprocess_map_df(features_to_keep = ["YEAR", "CROP_TYPE", "geometry"], get_crops_details=True)

In [None]:
crops_dataset.map_df

Overlay the San Joaquin Valley township boundaries on the Crops dataset to cut the crop land areas along the township boundaries, thus extracting all the crop types per township.

In [None]:
crops_dataset.overlay_township_boundries()
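
The idea behind the overlay can be illustrated with a small, library-free sketch. It is not the implementation in `lib.crops`, which is not shown in this notebook. It assumes axis-aligned rectangles in place of real polygons, and the `Rect`, `intersect` and `overlay` names as well as the township labels and coordinates are hypothetical.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a polygon (hypothetical helper)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin >= xmax or ymin >= ymax:
        return None
    return Rect(xmin, ymin, xmax, ymax)


def overlay(fields, townships):
    """Cut each crop field by each township and sum the crop area per township."""
    areas = {}
    for crop, field in fields:
        for name, boundary in townships.items():
            piece = intersect(field, boundary)
            if piece is not None:
                key = (name, crop)
                areas[key] = areas.get(key, 0.0) + piece.area
    return areas


# Toy data (assumed): two adjacent 10x10 townships and a field straddling their border.
townships = {"T1": Rect(0, 0, 10, 10), "T2": Rect(10, 0, 20, 10)}
fields = [("strawberries", Rect(8, 2, 14, 6)), ("potatoes", Rect(1, 1, 3, 3))]
crop_area = overlay(fields, townships)
```

Here the strawberry field is split into two pieces, one per township, while the potato field stays entirely within the first township. With real geometries the same intersection step would typically be delegated to a geospatial library.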

Display the map of the 2018 Crops dataset

In [None]:
# This geospatial dataset has too many small complex features to display on a Folium map or with Altair so we use Matplotlib here
fig, ax = plt.subplots(figsize=(40,40))
crops_dataset.map_df[crops_dataset.map_df["YEAR"]==2018].plot(ax=ax, column="CROP_TYPE", edgecolor='grey', linewidth = 1, cmap=None, legend=True)
crops_dataset.sjv_township_range_df.plot(ax=ax, facecolor="none", edgecolor='black', linewidth = 1, cmap=None, legend=None)
plt.show()

The Crops dataset is further modified as follows:
1. Missing townships are filled with the "X - Unclassified" class from the Crops dataset so that they are included in the dataset.
2. Data for the year 2014 are used to fill the 2015 data, the 2016 data to fill 2017, and the 2018 data to fill the years 2019 onward.
3. The dataframe is then pivoted so that each `CROP_TYPE` becomes a feature.
4. Features matching either of the criteria below are dropped:
    * features that cover less than 5% of the land surface of every township for any given year
    * the `["X - Not Classified", "U - Urban", "NR - Native Riparian"]` classes

Note that the result of steps 1 and 4 is that the missing Township-Ranges will then be in the dataset with a land surface value of `0` for every type of crop.
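
The four steps can be sketched on toy data with plain Python. This is only an illustration under stated assumptions, not the code of `CropsDataset`: the record layout, the function bodies, the `last_year=2020` bound, the filler share of `1.0` and the reading of the 5% rule (a feature is dropped when it never reaches the threshold in any township and year) are all assumptions made for the example.

```python
# Hypothetical long-format records: (township, year, crop, share of township land).
records = [
    ("T1", 2014, "strawberries", 0.40),
    ("T1", 2014, "potatoes", 0.02),
    ("T1", 2016, "strawberries", 0.35),
    ("T1", 2018, "strawberries", 0.30),
    ("T1", 2018, "X", 0.50),
]


def fill_missing_years(rows, last_year):
    """Copy each surveyed year's rows forward into the following unsurveyed years."""
    surveyed = sorted({year for _, year, _, _ in rows})
    filled = list(rows)
    for i, year in enumerate(surveyed):
        next_surveyed = surveyed[i + 1] if i + 1 < len(surveyed) else last_year + 1
        for gap_year in range(year + 1, next_surveyed):
            filled += [(t, gap_year, c, s) for t, y, c, s in rows if y == year]
    return filled


def fill_missing_townships(rows, all_townships, filler="X"):
    """Give townships with no data one row of the filler class per year."""
    present = {t for t, _, _, _ in rows}
    years = sorted({y for _, y, _, _ in rows})
    extra = [(t, y, filler, 1.0) for t in all_townships if t not in present for y in years]
    return list(rows) + extra


def pivot(rows, prefix="CROP"):
    """Turn each crop into a column keyed by (township, year); absent crops become 0."""
    columns = sorted({f"{prefix}_{c}" for _, _, c, _ in rows})
    table = {}
    for t, y, c, s in rows:
        row = table.setdefault((t, y), dict.fromkeys(columns, 0.0))
        row[f"{prefix}_{c}"] += s
    return table


def drop_features(table, drop_rate, unwanted):
    """Drop unwanted columns and columns that never reach drop_rate in any row."""
    columns = next(iter(table.values())).keys()
    keep = [
        c for c in columns
        if c not in unwanted and any(row[c] >= drop_rate for row in table.values())
    ]
    return {key: {c: row[c] for c in keep} for key, row in table.items()}


filled = fill_missing_townships(fill_missing_years(records, last_year=2020), ["T1", "T2"])
output = drop_features(pivot(filled), drop_rate=0.05, unwanted={"CROP_X"})
```

In this toy run the township `T2` has no source data, so it only receives the filler class, which is then dropped. It therefore remains in `output` with a value of `0` for every remaining crop feature, which is the effect described in the note above.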

In [4]:
crops_dataset.fill_missing_years()
crops_dataset.fill_townships_with_no_data(features_to_fill=["CROP_TYPE"], feature_value="X")
crops_dataset.pivot_township_categorical_feature_for_output(feature_name="CROP_TYPE", feature_prefix="CROP")
crops_dataset.drop_features(drop_rate=0.05, unwanted_features=["CROP_X", "CROP_U", "CROP_NR"])


In [None]:
crops_dataset.output_df

In [None]:
crops_dataset.output_dataset_to_csv("../assets/outputs/crops_detailed.csv")