<img src="https://brand.gatech.edu/sites/default/files/inline-images/extended-RGB.png" 
    width="700" 
    height="200" />

# MGT-6203 Group Project "Climate Avengers"
### County-Level Spatial Analysis for Effective Crop Production Planning

<b>Author:</b> Steven Wasserman
<b>Date Created:</b> 04/12/2024

<b>Purpose of script:</b> To determine the current crop production totals in each county of California, then recommend suitable counties for producing each crop using cardinal temperature and precipitation conditions and a spatial (nearest-neighbor) analysis of 10-year and 30-year normals.

<b>Copyright (c) Steven Wasserman, 2024
Email: [swasserman9@gatech.edu](mailto:swasserman9@gatech.edu)</b>

### Environment Set-up

This section loads all required packages and libraries and reads in the data files. <b>NOTE:</b> The data files are read from the parent directory of the current working directory, so a local clone of the GitHub repository is required to properly load the data.
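The import cell builds its file paths by concatenating strings onto the parent of the current working directory. A minimal `pathlib` sketch of the same lookup follows, assuming (as the original code does) that the notebook runs one folder below the repository root; `data_path` is a hypothetical helper, not part of the project.

```python
from pathlib import Path
from typing import Optional


def data_path(relative: str, start: Optional[Path] = None) -> Path:
    """Resolve a repository-relative data file against the parent of `start`.

    Hypothetical helper: mirrors os.path.normpath(os.getcwd() + os.sep + os.pardir)
    followed by appending a path such as '/Main_Data/...'. `start` defaults to the
    current working directory.
    """
    base = (start if start is not None else Path.cwd()).parent
    # Strip the leading '/' so the path is joined relative to `base`, not treated as absolute
    return base / relative.lstrip("/")


# The three files the notebook reads, resolved from the current working directory
weather_csv = data_path("/Main_Data/Imputed_Combined_Daily_Normals.csv")
crops_csv = data_path("/Crop_Data/CaliforniaCropsCountyReady.csv")
cardinal_csv = data_path("/Crop_Data/CaliforniaCardinalData.csv")
```

`pd.read_csv` accepts `Path` objects directly, so `weather_csv` could be passed in place of the concatenated string.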

In [461]:
# Load packages
import numpy as np
import os
import pandas as pd
import scipy.constants
from scipy.constants import convert_temperature
from scipy import spatial

In [462]:
# Import data

## File paths
file_path = os.path.normpath(os.getcwd() + os.sep + os.pardir)
Imputed_Combined_Daily_Normals = '/Main_Data/Imputed_Combined_Daily_Normals.csv'
CaliforniaCropsCountyReady = '/Crop_Data/CaliforniaCropsCountyReady.csv'
CaliforniaCardinalData = '/Crop_Data/CaliforniaCardinalData.csv'

## File imports
# `Weather_df` describes the combined daily weather recordings as well as 30-year weather normals for each weather station. There is also data on observed weather patterns for each day (e.g., snow, hail, tornadoes), geographic data, and more. The data has been imputed to reduce or remove NA values as appropriate.  
Weather_df = pd.read_csv(file_path + Imputed_Combined_Daily_Normals, dtype = {"STATION": "string", "DailyStation": "string"})

# `CACrops_Counts` describes the acreage in each CA county used to grow each crop of interest to this analysis (i.e., the top 10 highest value crops for CA)
# NOTE : The columns 'ALAND' and 'AWATER' are in sq. meters, and 'Count' describes parcels of land equivalent to 0.22 acres of land
CACrops_Counts = pd.read_csv(file_path + CaliforniaCropsCountyReady).rename(columns = {'CropTypes' : 'Crop'})

# `CACrops_Descr` describes the cardinal temperature and rainfall requirements and the growing months of each crop of interest to this analysis (i.e., the top 10 highest-value crops for CA)
CACrops_Descr = pd.read_csv(file_path + CaliforniaCardinalData).rename(columns = {'CropTypes' : 'Crop'})

### Current Crop Production Patterns in CA, 2010-2020

This section prepares the data to describe the current distribution of crop production across CA. Three dataframes are created:
- `CACrops_Descr_tr` - Describes the top crops produced in each county for each year between 2010 and 2020
- `CACrops_Descr_tr_agg` - Describes the top crops produced in each county between 2010 and 2020
- `CACrops_Descr_tr_agg_total` - Describes the top crops produced across CA between 2010 and 2020

The three datasets are merged to create `CACrops_df`. Additionally, `CAFarmland` is created to rank CA counties by farmland area: it averages each county's annual total crop area (in sq. mi.) between 2010 and 2020, ranks the counties accordingly, and computes each county's ratio of farmland to total land area.
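All of these rankings follow the same pattern: aggregate with `groupby`, then apply a dense, descending `rank` so that the largest area gets rank 1 and ties share a rank. A minimal sketch on made-up toy data (counties "A" and "B" and their areas are illustrative, not study data), using pandas' built-in `sum`/`mean` aggregations where the notebook uses `apply(np.sum)`/`apply(np.mean)`:

```python
import pandas as pd

# Made-up toy data: crop area (sq. mi.) per county, crop and year
df = pd.DataFrame({
    "County": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Year":   [2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011],
    "Crop":   ["Rice", "Tomatoes"] * 4,
    "Total_Crop_Area_sq_mi": [5.0, 3.0, 7.0, 2.0, 1.0, 4.0, 1.0, 6.0],
})

# Per-county ranking of crops by total area over all years (1 = largest)
by_county = df.groupby(["County", "Crop"], as_index=False)["Total_Crop_Area_sq_mi"].sum()
by_county["Crop_Prevalence_by_County_rank"] = (
    by_county.groupby("County")["Total_Crop_Area_sq_mi"]
    .rank(method="dense", ascending=False)
    .astype(int)
)

# Farmland: total crop area per county-year, averaged over years, then ranked across counties
farmland = (
    df.groupby(["County", "Year"])["Total_Crop_Area_sq_mi"].sum()
    .groupby(level="County").mean()
    .rename("Avg_Farmland_sq_mi")
    .reset_index()
)
farmland["Total_Farmland_rank"] = (
    farmland["Avg_Farmland_sq_mi"].rank(method="dense", ascending=False).astype(int)
)
```

Here county A ranks Rice first and county B ranks Tomatoes first, and A (an average of 8.5 sq. mi. per year) outranks B (6.0).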

In [463]:
# This code converts TIGER land/water areas (sq. meters) and crop parcel counts (0.22 acres each) to square miles
CACrops_Descr_tr = CACrops_Counts.copy()
# Source: https://stackoverflow.com/questions/31246602/shape-area-and-aland-awater-in-tiger-census-data
CACrops_Descr_tr['Total_Land_Area_sq_mi'] = CACrops_Descr_tr['ALAND'] / scipy.constants.mile**2
CACrops_Descr_tr['Total_Water_Area_sq_mi'] = CACrops_Descr_tr['AWATER'] / scipy.constants.mile**2
# NOTE : The 'Count' column represents parcels of land that are 0.22 acres in dimension; 1 sq. mi. = 640 acres
CACrops_Descr_tr['Total_Crop_Area_sq_mi'] = (CACrops_Descr_tr['Count'] * 0.22) / 640
CACrops_Descr_tr.drop(columns = ['ALAND', 'AWATER', 'Count'], inplace = True)
# Creates the ranking for production of each crop in each county for each year
CACrops_Descr_tr['Crop_Prevalence_by_County_per_Year'] = (CACrops_Descr_tr.groupby(by = ['County', 'Year'])['Total_Crop_Area_sq_mi'].rank(method = "dense", ascending = False)).astype(int)

# Creates the ranking for production of each crop in each county
CACrops_Descr_tr_agg = CACrops_Descr_tr[['County', 'Crop', 'Total_Crop_Area_sq_mi']].copy()
CACrops_Descr_tr_agg = CACrops_Descr_tr_agg.set_index(['County', 'Crop'])
CACrops_Descr_tr_agg = CACrops_Descr_tr_agg.groupby(by = ['County', 'Crop']).apply(np.sum, axis = 0)
CACrops_Descr_tr_agg.reset_index(inplace = True)
CACrops_Descr_tr_agg['Crop_Prevalence_by_County_rank'] = (CACrops_Descr_tr_agg.groupby(by = ['County'])['Total_Crop_Area_sq_mi'].rank(method = "dense", ascending = False)).astype(int)
CACrops_Descr_tr_agg.sort_values(by = ['County', 'Crop_Prevalence_by_County_rank'], inplace = True)

# Creates the ranking for production of each crop of interest in the CA study
CACrops_Descr_tr_agg_total = CACrops_Descr_tr[['Crop', 'Total_Crop_Area_sq_mi']].copy()
CACrops_Descr_tr_agg_total = CACrops_Descr_tr_agg_total.set_index('Crop')
CACrops_Descr_tr_agg_total = CACrops_Descr_tr_agg_total.groupby('Crop').apply(np.sum, axis = 0)
CACrops_Descr_tr_agg_total.reset_index(inplace = True)
CACrops_Descr_tr_agg_total['Crop_Prevalence_CA_rank'] = (CACrops_Descr_tr_agg_total['Total_Crop_Area_sq_mi'].rank(method = "dense", ascending = False)).astype(int)
CACrops_Descr_tr_agg_total.sort_values(by = ['Crop_Prevalence_CA_rank'], inplace = True)

# Aggregates all the data together
CACrops_df = CACrops_Descr_tr.merge(CACrops_Descr_tr_agg[['County', 'Crop', 'Crop_Prevalence_by_County_rank']], how = 'left', on = ['County', 'Crop']).merge(CACrops_Descr_tr_agg_total[['Crop', 'Crop_Prevalence_CA_rank']], how = 'left', on = 'Crop').sort_values(by = ['County','Year', 'Crop_Prevalence_by_County_per_Year'])

# Describes the farmland use in each county in CA
CAFarmland = CACrops_Descr_tr[['County', 'Year', 'Total_Crop_Area_sq_mi']].copy()
CAFarmland = CAFarmland.set_index(['County', 'Year'])
CAFarmland = CAFarmland.groupby(by = ['County', 'Year']).apply(np.sum, axis = 0).rename(columns = {'Total_Crop_Area_sq_mi' : 'Total_Annual_Farmland_sq_mi'}, inplace = False)
CAFarmland.reset_index(inplace = True)
CAFarmland.drop(columns = ['Year'], inplace = True)
CAFarmland = CAFarmland.set_index('County')
CAFarmland = CAFarmland.groupby('County').apply(np.mean, axis = 0).rename(columns = {'Total_Annual_Farmland_sq_mi' : 'Avg_Farmland_sq_mi'}, inplace = False)
CAFarmland.reset_index(inplace = True)
CAFarmland['Total_Farmland_rank'] = (CAFarmland['Avg_Farmland_sq_mi'].rank(method = "dense", ascending = False)).astype(int)
CAFarmland.sort_values(by = ['Total_Farmland_rank'], inplace = True)
CAFarmland = CAFarmland.merge(CACrops_Descr_tr[['County', 'Total_Land_Area_sq_mi']].drop_duplicates(), on = 'County')
CAFarmland['Farmland_to_Land_ratio'] = CAFarmland['Avg_Farmland_sq_mi'].div(CAFarmland['Total_Land_Area_sq_mi'])
CAFarmland = CAFarmland.sort_values(by = 'Total_Farmland_rank')
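
As a sanity check on the conversions above: TIGER `ALAND`/`AWATER` are in square meters, one international mile is 1,609.344 m, and one square mile is 640 acres, so each 0.22-acre parcel in `Count` corresponds to 0.22/640 = 0.00034375 sq. mi. The helper names in this sketch are hypothetical.

```python
# Hypothetical helpers that mirror the unit conversions used in the notebook
METERS_PER_MILE = 1609.344            # international mile (same value as scipy.constants.mile)
SQ_M_PER_SQ_MI = METERS_PER_MILE ** 2  # about 2,589,988.11 sq. meters
ACRES_PER_SQ_MI = 640
PARCEL_ACRES = 0.22                   # each unit of 'Count' is a 0.22-acre parcel


def sq_m_to_sq_mi(area_sq_m):
    """TIGER ALAND/AWATER (sq. meters) -> square miles."""
    return area_sq_m / SQ_M_PER_SQ_MI


def parcels_to_sq_mi(count):
    """Number of 0.22-acre parcels -> square miles."""
    return count * PARCEL_ACRES / ACRES_PER_SQ_MI


# One parcel is 0.22 / 640 sq. mi., roughly 3.44e-4 sq. mi.
one_parcel = parcels_to_sq_mi(1)
# 1,000,000 counted parcels correspond to roughly 343.75 sq. mi. of cropland
million_parcels = parcels_to_sq_mi(1_000_000)
```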

### Spatial Analysis: What <i>Should</i> Crop Production in CA Be?

This section curates the observed weather data for each county. Each crop's cardinal temperature and rainfall values are converted to Fahrenheit and inches, each county's observed weather is summarized over that crop's growing months, and a k-d tree nearest-neighbor search then compares each crop's attributes with the observed weather patterns in every CA county. The resulting ranking suggests, according to observed weather patterns, which counties best suit each crop. This is then compared with actual current production to arrive at a recommendation that suggests:
- Which crops each county in California should be producing, based on observed weather conditions (temperature and rainfall) from 2010 to 2020
- Which counties are currently producing according to model expectations, and which are not
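The ranking in this section is a nearest-neighbor search: each crop is a point (absolute minimum temperature, optimal-midpoint temperature, absolute maximum temperature, rainfall) and each county is a point built from its observed weather in the same column order, so SciPy's `KDTree.query` returns the counties closest in Euclidean distance. The toy sketch below uses made-up county values. It also shows how SciPy reports missing neighbors when fewer than `k` counties survive a rainfall filter (distance `inf`, index equal to the number of points), which the hypothetical helper `top_k_counties` skips.

```python
import numpy as np
import pandas as pd
from scipy import spatial

# Made-up county weather: MinTemp, AvgTemp, MaxTemp (deg F) and rainfall (in)
counties = pd.DataFrame({
    "County":  ["A", "B", "C"],
    "MinTemp": [45.0, 50.0, 30.0],
    "AvgTemp": [60.0, 65.0, 45.0],
    "MaxTemp": [75.0, 80.0, 60.0],
    "Precip":  [10.0, 12.0, 30.0],
})
# Made-up crop profile in the same column order
crop_stats = np.array([48.0, 63.0, 78.0, 11.0])

tree = spatial.KDTree(counties[["MinTemp", "AvgTemp", "MaxTemp", "Precip"]].to_numpy())
names = counties["County"].tolist()


def top_k_counties(tree, names, point, k):
    """Return up to k nearest county names, skipping padded (missing) neighbors."""
    dist, idx = tree.query(point, k=list(range(1, k + 1)))
    return [names[i] for d, i in zip(dist, idx) if np.isfinite(d)]


print(top_k_counties(tree, names, crop_stats, k=2))  # ['B', 'A']
print(top_k_counties(tree, names, crop_stats, k=5))  # ['B', 'A', 'C']
```

With only three counties, asking for five neighbors returns three names instead of raising an `IndexError` on the padded index.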

In [465]:
## This code converts the crops' cardinal temperature and precipitation data into Fahrenheit and inches so that they can be compared with the curated county weather datasets

# Constants for conversion
mm = scipy.constants.milli; inch = scipy.constants.inch

# Convert temperature from Celsius to Fahrenheit
CA_Crops = CACrops_Descr[["Crop", "OptimalTempMin_C", "OptimalTempMax_C",
                            "AbsTempMin_C", "AbsTempMax_C", "OptimalRainfallMin_mm",
                            "OptimalRainfallMax_mm", "AbsRainfallMin_mm", "AbsRainfallMax_mm"]].copy()
CA_Crops['AbsMinTemp_F'] = CA_Crops['AbsTempMin_C'].apply(lambda x: convert_temperature(x, 'C', 'F'))
CA_Crops['AbsMaxTemp_F'] = CA_Crops['AbsTempMax_C'].apply(lambda x: convert_temperature(x, 'C', 'F'))
CA_Crops['OptimalMinTemp_F'] = CA_Crops['OptimalTempMin_C'].apply(lambda x: convert_temperature(x, 'C', 'F'))
CA_Crops['OptimalMaxTemp_F'] = CA_Crops['OptimalTempMax_C'].apply(lambda x: convert_temperature(x, 'C', 'F'))
CA_Crops['AvgTemp_F'] = ((CA_Crops['OptimalTempMin_C'] + CA_Crops['OptimalTempMax_C'])/2).apply(lambda x: convert_temperature(x, 'C', 'F'))

#Convert rainfall from millimeters to inches
CA_Crops['AbsMinRainfall_in'] = CA_Crops['AbsRainfallMin_mm'].apply(lambda x: (x*mm)/inch)
CA_Crops['AbsMaxRainfall_in'] = CA_Crops['AbsRainfallMax_mm'].apply(lambda x: (x*mm)/inch)
CA_Crops['OptimalMinRainfall_in'] = CA_Crops['OptimalRainfallMin_mm'].apply(lambda x: (x*mm)/inch)
CA_Crops['OptimalMaxRainfall_in'] = CA_Crops['OptimalRainfallMax_mm'].apply(lambda x: (x*mm)/inch)
CA_Crops['AvgRainfall_in'] = ((CA_Crops['OptimalRainfallMin_mm'] + CA_Crops['OptimalRainfallMax_mm'])/2).apply(lambda x: (x*mm)/inch)

# Convert growing-season month names to month numbers (1-12); missing or unrecognized names become NaN
CACrops_growing_months = CACrops_Descr[['Crop', 'GrowingMonth_Start', 'GrowingMonth_End']].copy()
month_names = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
month_numbers = {name: number for number, name in enumerate(month_names, start = 1)}
CACrops_growing_months['GrowingMonth_Start_no'] = CACrops_growing_months['GrowingMonth_Start'].map(month_numbers).astype(float)
CACrops_growing_months['GrowingMonth_End_no'] = CACrops_growing_months['GrowingMonth_End'].map(month_numbers).astype(float)

# Keep the converted columns in `CA_Crops_ct` and merge in the growing-month numbers
CA_Crops_ct = CA_Crops[['Crop', 'AbsMinTemp_F', 'AvgTemp_F', 'AbsMaxTemp_F',
                        'AbsMinRainfall_in', 'AvgRainfall_in', 'AbsMaxRainfall_in']] \
               .merge(CACrops_growing_months, how = 'left', on = 'Crop')
# CA_Crops_ct
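The spatial-analysis loop turns these month numbers into a list of growing months, rotating the calendar so that seasons wrapping past December (e.g. November to February) are handled. A minimal, standalone sketch of that wraparound using modular arithmetic instead of list rotation (`MONTH_NO` and `growing_months` are hypothetical names):

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_NO = {name: i for i, name in enumerate(MONTHS, start=1)}


def growing_months(start: str, end: str) -> list[int]:
    """Month numbers from `start` to `end` inclusive, wrapping past December."""
    s, e = MONTH_NO[start], MONTH_NO[end]
    length = (e - s) % 12 + 1          # number of months in the season
    return [(s - 1 + offset) % 12 + 1 for offset in range(length)]


print(growing_months("March", "June"))         # [3, 4, 5, 6]
print(growing_months("November", "February"))  # [11, 12, 1, 2]
```

As in the notebook's loop, a season whose start and end month coincide yields a single month.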

In [466]:
# This code computes the monthly averages for each county's avg/min/max observed temperatures
CA_Counties_temps = Weather_df[['Date', 'DailyCounty', 'MinTemp', 'AvgTemp', 'MaxTemp']].copy().rename(columns = {'DailyCounty' : 'County'}, inplace = False)
CA_Counties_temps['Month'] = pd.DatetimeIndex(CA_Counties_temps['Date']).month
CA_Counties_temps = CA_Counties_temps.set_index(['Month', 'County'])
CA_Counties_temps.drop(['Date'], axis = 1, inplace = True)
CA_Counties_temps_agg = CA_Counties_temps.groupby(by = ['Month', 'County']).apply(np.mean, axis = 0)
CA_Counties_temps_agg.reset_index(inplace = True)

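# This code computes each county's observed rainfall totals by calendar month
# NOTE : The sum below runs over every daily record in a county-month, i.e. across all years from 2010 to 2020 and all stations in the county, so it is a cumulative total rather than a per-year monthly average
CA_Counties_precip = Weather_df[['Date', 'DailyCounty', 'Precipitation']].copy().rename(columns = {'DailyCounty' : 'County'}, inplace = False)
# NOTE : The documentation in '~/Team-95/Data/Daily_Weather_Data/Daily_Source_Documentation.txt' states the following:
#   "Many stations do not report “0” on days with no precipitation, therefore “99.99” will often appear on these days."
# We will infill '0' for these instances, changing only the 'Precipitation' column (not the whole row)
CA_Counties_precip.loc[CA_Counties_precip['Precipitation'] == 99.99, 'Precipitation'] = 0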
CA_Counties_precip['Month'] = pd.DatetimeIndex(CA_Counties_precip['Date']).month
CA_Counties_precip = CA_Counties_precip.set_index(['Month', 'County'])
CA_Counties_precip.drop(['Date'], axis = 1, inplace = True)
CA_Counties_precip_agg = CA_Counties_precip.groupby(by = ['Month', 'County']).apply((np.sum), axis = 0)
CA_Counties_precip_agg.reset_index(inplace = True)
# We will compare this with the 30-year normals on month-to-date precipitation...
# For this analysis, we take the average of the precipitation normals across all stations in each county
CA_Counties_precip_mtd = Weather_df[['MonthDay', 'NormalCounty', 'normalMtdPrcp']].copy().rename(columns = {'NormalCounty' : 'County', 'normalMtdPrcp' : 'Precipitation_month_to_date'}, inplace = False).drop_duplicates()
CA_Counties_precip_mtd = CA_Counties_precip_mtd.set_index(['MonthDay', 'County'])
CA_Counties_precip_mtd = CA_Counties_precip_mtd.groupby(by = ['MonthDay', 'County']).apply((np.mean), axis = 0)
CA_Counties_precip_mtd.reset_index(inplace = True)
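# ... then we find the monthly totals by summing over the days of each month...
# NOTE : If 'normalMtdPrcp' is a cumulative month-to-date normal, this sum overstates the monthly total; the month-end (maximum) value would equal the monthly normal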
CA_Counties_precip_mtd['Month'] = (CA_Counties_precip_mtd['MonthDay'].str.split('-', expand = True)[0]).astype(int)
CA_Counties_precip_mtd.drop(['MonthDay'], axis = 1, inplace = True)
CA_Counties_precip_mtd = CA_Counties_precip_mtd.set_index(['Month', 'County'])
CA_Counties_precip_mtd_agg = CA_Counties_precip_mtd.groupby(by = ['Month', 'County']).apply((np.sum), axis = 0)
CA_Counties_precip_mtd_agg.reset_index(inplace = True)
# 'CA_Rainfall' describes both the observed 2010-2020 precipitation totals and the 1991-2020 (30-year) precipitation normals for CA counties
CA_Rainfall = CA_Counties_precip_agg.merge(CA_Counties_precip_mtd_agg, how = 'left', on = ['County', 'Month']).rename(columns = {'Precipitation' : 'Avg_Precipitation_2010_to_2020', 'Precipitation_month_to_date' : 'Avg_Precipitation_1991_to_2020'})
# 'CA_Counties_Weather' describes monthly statistics on CA county observed weather, including average daily min/max/avg. temperature and monthly precipitation totals
CA_Counties_Weather = CA_Counties_temps_agg.merge(CA_Rainfall, how = 'left', on = ['County', 'Month']).sort_values(by = ['County', 'Month'], inplace = False)
# CA_Counties_Weather
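The cell above sums daily precipitation over every record in a county-month, and sums the month-to-date normals over the days of each month. If a per-year monthly total is what the `Avg_Precipitation_*` column names intend, one way to compute it is sketched below on made-up data: average the stations in a county for each day, total each calendar month within each year, then average those totals across years. For a cumulative month-to-date normal, the month-end (maximum) value already equals the monthly normal. The column names follow the notebook; the toy values are illustrative.

```python
import pandas as pd

# Made-up daily records for one county: two stations report on 2010-01-01
daily = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-01", "2010-01-01", "2010-01-02",
                            "2011-01-01", "2011-01-02"]),
    "County": ["A"] * 5,
    "Precipitation": [1.0, 3.0, 99.99, 1.0, 3.0],
})
# Replace the 99.99 "not reported" code with 0, in the Precipitation column only
daily.loc[daily["Precipitation"] == 99.99, "Precipitation"] = 0.0

# 1) Average across stations for each county-day
per_day = daily.groupby(["County", "Date"], as_index=False)["Precipitation"].mean()
# 2) Total each calendar month within each year
per_day["Year"] = per_day["Date"].dt.year
per_day["Month"] = per_day["Date"].dt.month
monthly = per_day.groupby(["County", "Year", "Month"], as_index=False)["Precipitation"].sum()
# 3) Average those monthly totals across years
avg_monthly = monthly.groupby(["County", "Month"], as_index=False)["Precipitation"].mean()
# avg_monthly has one row: County 'A', Month 1, Precipitation 3.0
# (the whole-period sum of the raw rows would be 8.0)

# Made-up cumulative month-to-date normals: the month-end value is the monthly normal
mtd = pd.DataFrame({
    "MonthDay": ["01-01", "01-02", "01-31"],
    "County": ["A"] * 3,
    "Precipitation_month_to_date": [0.10, 0.20, 0.35],
})
mtd["Month"] = mtd["MonthDay"].str.split("-").str[0].astype(int)
monthly_normal = mtd.groupby(["County", "Month"], as_index=False)["Precipitation_month_to_date"].max()
# monthly_normal: County 'A', Month 1 -> 0.35 (summing the days would give about 0.65)
```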

In [473]:
# Perform spatial analysis
## The 'CA_Counties_Weather' dataset is reduced into two matrices: One with 10-yr. precipitation normals and the other with 30-yr. precipitation normals
## The 'CA_Crops_ct' dataset is parsed then reduced to columns needed for analysis

# `k_factor` - Determines the top 'k' number of results to return
k_factor = 3
output = pd.DataFrame()

for index, row in CA_Crops_ct.iterrows():
    ## Object instantiation for each loop
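    # All months of the year (1-12); narrowed to the growing season below when one is given
    growing_months = list(range(1, 13))
    # Crop name, its absolute rainfall bounds, and the query point (min/avg/max temperature in F, rainfall in inches)
    crop = row['Crop']; crop_max_rain = row['AbsMaxRainfall_in']; crop_min_rain = row['AbsMinRainfall_in']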
    crop_stats = row[['AbsMinTemp_F', 'AvgTemp_F', 'AbsMaxTemp_F', 'AvgRainfall_in']].copy()
    # Copy of the weather data
    County_Weather_crop = CA_Counties_Weather.copy()
    
    # When the crop has specific growing months, keep only observations in those months
    # (crops without growing months use all 12 months)
    if pd.notna(row['GrowingMonth_Start_no']):
        # Rotate the calendar to start at the first growing month, then cut at the last one (handles seasons that wrap past December)
        start = growing_months.index(int(row['GrowingMonth_Start_no']))
        months_reordered = growing_months[start:] + growing_months[:start]
        end = months_reordered.index(int(row['GrowingMonth_End_no']))
        growing_months = months_reordered[:end+1]
        # Take only observations that are in the growing months
        County_Weather_crop = County_Weather_crop[County_Weather_crop['Month'].isin(growing_months)].drop(columns = ['Month'])
    
    ## Curate the aggregate statistics to compare to the crop
    # Determine the average observed temperatures for the growing months of the crop
    County_Weather_crop_temp = County_Weather_crop[['County', 'MinTemp', 'AvgTemp', 'MaxTemp']].copy()
    County_Weather_crop_temp = County_Weather_crop_temp.set_index('County')
    County_Weather_crop_temp = County_Weather_crop_temp.groupby('County').apply(np.mean, axis = 0)
    County_Weather_crop_temp.reset_index(inplace = True)
    # Determine the total observed rainfall for the growing months of the crop 
    County_Weather_crop_rain = County_Weather_crop[['County', 'Avg_Precipitation_2010_to_2020', 'Avg_Precipitation_1991_to_2020']].copy()
    County_Weather_crop_rain = County_Weather_crop_rain.set_index('County')
    County_Weather_crop_rain = County_Weather_crop_rain.groupby('County').apply(np.sum, axis = 0)
    County_Weather_crop_rain.reset_index(inplace = True)
    # Put it together into two matrices: One looking at more recent rainfall patterns, and the other at longer patterns
    County_Weather_10yr = County_Weather_crop_temp.merge(County_Weather_crop_rain[['County', 'Avg_Precipitation_2010_to_2020']].copy(), how = 'left', on = 'County')
    County_Weather_30yr = County_Weather_crop_temp.merge(County_Weather_crop_rain[['County', 'Avg_Precipitation_1991_to_2020']].copy(), how = 'left', on = 'County')
    # Create datasets that keep only counties whose observed rainfall lies strictly between the crop's absolute minimum and maximum
    County_Weather_10yr_s = County_Weather_10yr[(County_Weather_10yr['Avg_Precipitation_2010_to_2020'] < crop_max_rain) & \
                                                (County_Weather_10yr['Avg_Precipitation_2010_to_2020'] > crop_min_rain)]
    County_Weather_30yr_s = County_Weather_30yr[(County_Weather_30yr['Avg_Precipitation_1991_to_2020'] < crop_max_rain) & \
                                                (County_Weather_30yr['Avg_Precipitation_1991_to_2020'] > crop_min_rain)]
    # Build a k-d tree for each dataset on the weather features (all columns except 'County')...
    tree_10yr_all = spatial.KDTree(County_Weather_10yr.to_numpy()[:,1:])
    tree_30yr_all = spatial.KDTree(County_Weather_30yr.to_numpy()[:,1:])
    tree_10yr_ranged = spatial.KDTree(County_Weather_10yr_s.to_numpy()[:,1:])
    tree_30yr_ranged = spatial.KDTree(County_Weather_30yr_s.to_numpy()[:,1:])
    # Query spatial analysis results from each tree
    tree_10yr_all_top_3_search = tree_10yr_all.query(crop_stats, k = list(range(1,k_factor + 1)))
    tree_10yr_all_top_3_results = [County_Weather_10yr.iloc[i,0] for i in tree_10yr_all_top_3_search[1].tolist()]
    
    tree_30yr_all_top_3_search = tree_30yr_all.query(crop_stats, k = list(range(1,k_factor + 1)))
    tree_30yr_all_top_3_results = [County_Weather_30yr.iloc[i,0] for i in tree_30yr_all_top_3_search[1].tolist()]

    tree_10yr_ranged_top_3_search = tree_10yr_ranged.query(crop_stats, k = list(range(1,k_factor + 1)))
    tree_10yr_ranged_top_3_results = [County_Weather_10yr_s.iloc[i,0] for i in tree_10yr_ranged_top_3_search[1].tolist()]
    
    tree_30yr_ranged_top_3_search = tree_30yr_ranged.query(crop_stats, k = list(range(1,k_factor + 1)))
    tree_30yr_ranged_top_3_results = [County_Weather_30yr_s.iloc[i,0] for i in tree_30yr_ranged_top_3_search[1].tolist()]
    # Put results of analysis into DataFrame
    results = pd.DataFrame({'Ranked_Counties_10yr_precip_minmax_inclusive' : tree_10yr_all_top_3_results,
                            'Ranked_Counties_30yr_precip_minmax_inclusive' : tree_30yr_all_top_3_results,
                            'Ranked_Counties_10yr_precip_minmax_exclusive' : tree_10yr_ranged_top_3_results,
                            'Ranked_Counties_30yr_precip_minmax_exclusive' : tree_30yr_ranged_top_3_results})
    results['Crop'] = crop
    # Get the rankings of current production
    CACountyRankings_percrop = CACrops_Descr_tr_agg[['County', 'Crop', 'Total_Crop_Area_sq_mi']].copy()
    CACountyRankings_percrop['Rank'] = (CACountyRankings_percrop.groupby('Crop')['Total_Crop_Area_sq_mi'].rank(method = "dense", ascending = False)).astype(int)
    CACountyRankings_percrop = CACountyRankings_percrop[(CACountyRankings_percrop['Rank'] <= k_factor) & (CACountyRankings_percrop['Crop'] == crop)].sort_values(by = ['Rank'], inplace = False)
    CACountyRankings_percrop.index = CACountyRankings_percrop['Rank']-1
    # Append current production rankings to spatial analysis results
    results = results.join(CACountyRankings_percrop[['County']], how = 'left').rename(columns = {'County' : 'Current_County_ranked'})
    results = results.loc[:, ['Crop', 'Current_County_ranked', 'Ranked_Counties_10yr_precip_minmax_inclusive', 'Ranked_Counties_30yr_precip_minmax_inclusive',
                              'Ranked_Counties_10yr_precip_minmax_exclusive', 'Ranked_Counties_30yr_precip_minmax_exclusive']]
    # Append to output dataframe
    output = pd.concat([output, results], ignore_index = True)
# Display output data frame
output

|   | Crop | Current_County_ranked | Ranked_Counties_10yr_precip_minmax_inclusive | Ranked_Counties_30yr_precip_minmax_inclusive | Ranked_Counties_10yr_precip_minmax_exclusive | Ranked_Counties_30yr_precip_minmax_exclusive |
|---|---|---|---|---|---|---|
| 0 | Rice | Colusa | Sacramento | Sacramento | Sacramento | Sacramento |
| 1 | Rice | Sutter | Yuba | Calaveras | Yuba | Calaveras |
| 2 | Rice | Butte | Nye | Solano | Nye | Solano |
| 3 | Tomatoes | Fresno | Fresno | Riverside | Fresno | Riverside |
| 4 | Tomatoes | Yolo | Madera | Kings | Madera | Kings |
| 5 | Tomatoes | San Joaquin | Stanislaus | San Bernardino | Stanislaus | San Bernardino |
| 6 | Oranges | Tulare | Imperial | Mohave | Imperial | Mohave |
| 7 | Oranges | Kern | Inyo | Clark | Inyo | Clark |
| 8 | Oranges | Fresno | Yuma | Yuma | Yuma | Yuma |
| 9 | Lettuce | Imperial | Churchill | Mineral | Churchill | Mineral |
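
The k-d tree query in the loop measures Euclidean distance on raw units, so the rainfall column (inches, totalled over the growing season) and the temperature columns (F) are effectively weighted by their numeric scale. One common adjustment, not part of the original analysis, is to z-score each feature using the county means and standard deviations before building the tree and to apply the same scaling to the crop profile. A sketch on made-up values (`standardized_top_k` is a hypothetical helper):

```python
import numpy as np
from scipy import spatial


def standardized_top_k(county_features, crop_point, k):
    """Indices of the k nearest counties after z-scoring each feature column.

    county_features: (n_counties, n_features) array; crop_point: (n_features,) array.
    Columns with zero spread are left unscaled to avoid division by zero.
    """
    X = np.asarray(county_features, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    tree = spatial.KDTree((X - mean) / std)
    point = (np.asarray(crop_point, dtype=float) - mean) / std
    _, idx = tree.query(point, k=list(range(1, k + 1)))
    return idx.tolist()


# Made-up counties: MinTemp, AvgTemp, MaxTemp (F), growing-season rainfall (in)
X = np.array([
    [40.0, 55.0, 70.0, 100.0],
    [50.0, 65.0, 80.0, 110.0],
    [41.0, 56.0, 71.0, 300.0],
])
crop = np.array([41.0, 56.0, 71.0, 120.0])

print(spatial.KDTree(X).query(crop, k=[1])[1].tolist())  # [1]: raw units, rainfall dominates
print(standardized_top_k(X, crop, k=1))                   # [0]: all features on a common scale
```

Whether equal weighting of the standardized features is appropriate is itself a modeling choice; per-feature weights could be applied after scaling.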


### Conclusion & Next Steps

As the output shows, few of the recommended counties are currently the top producers of the highest-value crops in CA: in the rows shown, the only match is Fresno County, which is both the largest current producer of tomatoes and the top 10-year recommendation for tomatoes. Increasing `k_factor` might reveal more matches, but the mismatch is more likely driven by factors outside the scope of this project's data. Next steps could seek to increase the fidelity of the current data or identify new data that could explain the difference.
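Several recommended counties in the output are not California counties (for example Nye, Churchill, Mineral and Clark appear to be Nevada counties, and Mohave and Yuma Arizona counties), which suggests the weather data include out-of-state stations. One way to restrict the candidates is to keep only weather rows whose county also appears in the California crop data before building the trees. The toy frame and the helper name `restrict_to_counties` below are hypothetical; in the notebook the allowed set could come from `CACrops_Counts['County'].unique()`.

```python
import pandas as pd


def restrict_to_counties(weather, allowed_counties):
    """Keep only rows whose 'County' is in `allowed_counties`, renumbered 0..n-1."""
    allowed = set(allowed_counties)
    return weather[weather["County"].isin(allowed)].reset_index(drop=True)


# Made-up weather rows mixing California and out-of-state counties
weather = pd.DataFrame({"County": ["Fresno", "Nye", "Yolo", "Clark"],
                        "AvgTemp": [64.0, 55.0, 61.0, 68.0]})
ca_weather = restrict_to_counties(weather, ["Fresno", "Yolo", "Colusa"])
print(ca_weather["County"].tolist())  # ['Fresno', 'Yolo']
```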

Further work in this area could analyze soil composition and insect (pest) invasion patterns to refine these recommendations. Codifying the cyclical nature of annual pest invasions could help farmers protect harvests through preventative measures, and including soil composition data could lead to more bountiful harvests, more sustained success, and better outcomes for individual farmers.


In [475]:
output.to_csv(f"CA_Crops_Spatial_Analysis_Top_{k_factor}.csv")