<img src = "https://escp.eu/sites/default/files/logo/ESCP-logo-white-misalign.svg" width = 400 style="background-color: #240085;">
<h1 align=center><font size = 6>ESCP Business School</font></h1>
<h3 align=center><font size = 5>SCOR Datathon</font><br/>
<font size = 3>The Data Science Challenge Bridging Indian Agricultural Protection Gap</font></h3>
<h6 align=center>Chapter 1 - Preprocessing (cont'd)</h6>

Last Updated: December 20, 2021\
Author: Group 21 - Anniek Brink, Jeanne Dubois, and Resha Dirga

<h3>Chapter Objectives</h3>

<p>In this chapter, the merged datasets are processed based on the findings and next steps from the Data Exploration chapter. By the end of this chapter, the dataset should be ready for clustering. The treatment applied to the dataset includes:</p>
<ul>
    <li>Data type correction</li>
    <li>Data imputation, and</li>
    <li>Review of dataset consistency after imputation</li>
</ul>
<p>In addition to the three items above, the most suitable level of granularity for the analysis is determined during data imputation.</p>

<u>Note:</u> This chapter is a continuation of <b><i>Chapter 1 - Preprocessing</i></b>

<h3>Chapter 1: Import modules</h3>
<p>This chapter lists all modules used in this document and performs the module imports.</p>

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

<h3>Chapter 2: Read datasets</h3>
<p>This chapter reads the merged datasets and provides a quick overview of each dataset: its first five rows, its data types, and a summary of its numerical columns.</p>

<h5>Chapter 2.1 - Read datasets from the preprocessing checkpoint</h5>

In [2]:
# List of file directories
checkpoint_preprocessed_filenames = [
    "datasets_preprocessed/df_kharif_preprocessed_checkpoint_2.csv",
    "datasets_preprocessed/df_rabi_preprocessed_checkpoint_2.csv"
]

In [3]:
# Store datasets in a dataframe
df = {}
for filename in checkpoint_preprocessed_filenames:
    df[filename] = pd.read_csv(filename, delimiter=";",index_col = 0)



In [4]:
# Create a function for indexing the specific dataset
def df_shorten(season):
    """Return the dataframe for the given season ("kharif" or "rabi")."""
    seasons = {"kharif": 0, "rabi": 1}
    if season not in seasons:
        # Fail loudly instead of silently returning None
        raise ValueError("Data is out of range. Available season data: kharif, rabi (case sensitive)")
    return df[checkpoint_preprocessed_filenames[seasons[season]]]
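As a possible alternative to positional lookup, the datasets could be keyed by season name when they are read. The sketch below is only an illustration: `load_by_season` is a hypothetical helper, and the in-memory CSV strings stand in for the `;`-delimited checkpoint files.

```python
import io
import pandas as pd

def load_by_season(sources, delimiter=";"):
    """Read each source into a dataframe keyed by its season name."""
    return {
        season: pd.read_csv(src, delimiter=delimiter, index_col=0)
        for season, src in sources.items()
    }

# Hypothetical in-memory stand-ins for the checkpoint CSV files
sources = {
    "kharif": io.StringIO(";State;Crop\n0;andhra pradesh;paddy\n"),
    "rabi": io.StringIO(";State;Crop\n0;west bengal;wheat\n"),
}
frames = load_by_season(sources)
print(frames["kharif"])
```

With this layout, `frames["kharif"]` replaces `df_shorten("kharif")`, and a misspelled season raises a `KeyError` immediately.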

<h5>Chapter 2.2 - Dataset overview</h5>

In [5]:
# Setup viewing parameter for dataset overview
pd.set_option('display.max_columns', None)

In [6]:
# Print datasets info
for index in checkpoint_preprocessed_filenames:
    print('')
    print(index)
    print('----------')
    df[index].info()
    print('<<=====================>>')


datasets_preprocessed/df_kharif_preprocessed_checkpoint_2.csv
----------
<class 'pandas.core.frame.DataFrame'>
Int64Index: 511242 entries, 1850 to 511241
Data columns (total 35 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   State               511242 non-null  object 
 1   Cluster             511242 non-null  int64  
 2   District            511242 non-null  object 
 3   Sub-District        429814 non-null  object 
 4   Block               365178 non-null  object 
 5   GP                  374588 non-null  object 
 6   Season              511242 non-null  object 
 7   Crop                511242 non-null  object 
 8   Area Sown (Ha)      235991 non-null  float64
 9   Area Insured (Ha)   463017 non-null  float64
 10  SI Per Ha (Inr/Ha)  483684 non-null  float64
 11  Sum Insured (Inr)   490574 non-null  float64
 12  Indemnity Level     511242 non-null  float64
 13  2000 Yield          13682 non-null   float64
 14  2001 Yi

In [7]:
# Print datasets first five lines
for index in checkpoint_preprocessed_filenames:
    print('')
    print(index)
    print('----------')
    print(df[index].head())
    print('<<=====================>>')


datasets_preprocessed/df_kharif_preprocessed_checkpoint_2.csv
----------
               State  Cluster  District Sub-District Block                 GP  \
1850  andhra pradesh        1  chittoor  b.kothakota   NaN    badikayalapalle   
1851  andhra pradesh        1  chittoor  b.kothakota   NaN  bayyappagaripalle   
1852  andhra pradesh        1  chittoor  b.kothakota   NaN           beerangi   
1853  andhra pradesh        1  chittoor  b.kothakota   NaN              gattu   
1854  andhra pradesh        1  chittoor  b.kothakota   NaN         gollapalle   

      Season   Crop  Area Sown (Ha)  Area Insured (Ha)  SI Per Ha (Inr/Ha)  \
1850  kharif  paddy        9.336967           7.330287             70000.0   
1851  kharif  paddy        9.336967           7.330287             70000.0   
1852  kharif  paddy        9.336967           7.330287             70000.0   
1853  kharif  paddy        9.336967           7.330287             70000.0   
1854  kharif  paddy        9.336967           7.3

In [8]:
# Print datasets summary for numerical columns
for index in checkpoint_preprocessed_filenames:
    print('')
    print(index)
    print('----------')
    print(df[index].describe())
    print('<<=====================>>')


datasets_preprocessed/df_kharif_preprocessed_checkpoint_2.csv
----------
             Cluster  Area Sown (Ha)  Area Insured (Ha)  SI Per Ha (Inr/Ha)  \
count  511242.000000   235991.000000      463017.000000       483684.000000   
mean        4.812815      414.274412         173.325412        38769.535964   
std         2.889222     1188.494481         830.693864        21618.886084   
min         1.000000        0.000000           0.000000         1400.000000   
25%         2.000000       78.093023           1.667800        20524.000000   
50%         4.000000      170.689798          35.017224        36000.000000   
75%         7.000000      376.251451         150.433460        51686.000000   
max        12.000000    85824.000000       85300.000000       250000.000000   

       Sum Insured (Inr)  Indemnity Level    2000 Yield    2001 Yield  \
count       4.905740e+05    511242.000000  13682.000000  13664.000000   
mean        7.813047e+06         0.810371   1547.075772   2334.42837

<h3>Chapter 4: Preprocessing 3 - External data integration</h3>
<p>Some external data are complementary to the clustering and analysis. This chapter therefore contains the procedures for adding external data to the main dataset that will be used for clustering later on.</p>
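The general idea of the integration can be sketched on toy data: district-level attributes are attached to every finer-grained row of the main dataset through a shared key. The frames below are small illustrative stand-ins, not the real data; only the column names `State`, `District`, `GP`, and `Integration_ID` come from this notebook.

```python
import pandas as pd

# Toy GP-level rows (several rows can share one district)
main = pd.DataFrame({
    "State": ["andhra pradesh", "andhra pradesh", "west bengal"],
    "District": ["chittoor", "chittoor", "uttar dinajpur"],
    "GP": ["gattu", "beerangi", "itahar"],
})
# Toy district-level attributes, one row per district (illustrative values)
climate = pd.DataFrame({
    "Integration_ID": ["andhra pradesh_chittoor", "west bengal_uttar dinajpur"],
    "Precipitation June": [85.0, 136.0],
})

# Build the shared key, left-join, then drop the helper key again
main["Integration_ID"] = main["State"] + "_" + main["District"]
merged = main.merge(climate, on="Integration_ID", how="left")
merged = merged.drop(columns="Integration_ID")
print(merged)
```

Each district value is repeated on every row of that district, so the row count of the main dataset stays unchanged as long as the key is unique on the external side.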

In [9]:
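# Create function to check null-values
def check_null(data):
    """Plot a heatmap of null values and return null / non-null counts per column."""
    try:
        sns.heatmap(data.isnull(), yticklabels=False, cbar=False, cmap='viridis')
    except Exception:
        # Plotting is best-effort; the tabular summary below is still returned
        pass

    df_check_null = {
        'Null': data.isna().sum(),
        'Non-null': data.count(),
    }

    return pd.DataFrame(df_check_null).transpose()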
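To show what the tabular part of this helper returns, here is a minimal sketch on a toy frame. `null_summary` is a hypothetical, plot-free variant written only for this illustration, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def null_summary(data):
    """Return null and non-null counts per column (no heatmap)."""
    counts = {"Null": data.isna().sum(), "Non-null": data.count()}
    return pd.DataFrame(counts).transpose()

# Toy frame with missing values in both columns
toy = pd.DataFrame({
    "Area Sown (Ha)": [9.3, np.nan, np.nan],
    "Crop": ["paddy", "paddy", None],
})
summary = null_summary(toy)
print(summary)
```

The result has one row for the null counts and one for the non-null counts, with one column per input column.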

In [10]:
# List of file directories
additional_file_filenames = [
    "./../01_Additional_data/cities_climate/SCOR_Cities_Climate_integrated_kharif.csv",
    "./../01_Additional_data/cities_climate/SCOR_Cities_Climate_integrated_rabi.csv"
]

In [11]:
# Store datasets in a dataframe
df_additionals = {}
for filename in additional_file_filenames:
    df_additionals[filename] = pd.read_csv(filename)

In [12]:
# Assign additional data to a variable
df_adds_climate_kharif = df_additionals[list(df_additionals.keys())[0]]
df_adds_climate_rabi = df_additionals[list(df_additionals.keys())[1]]

In [13]:
# Prepare the additional data for integration
df_adds_climate_kharif['Integration_ID'] = df_adds_climate_kharif.apply(lambda x: x['District ID'].split('_')[0].lower() + "_" + x['District ID'].split('_')[2].lower(), axis=1)
# Keep only the climate columns; .copy() avoids SettingWithCopyWarning on the assignments below
df_adds_climate_kharif = df_adds_climate_kharif[df_adds_climate_kharif.columns.tolist()[15:]].copy()

for col in df_adds_climate_kharif.columns.tolist()[:-3]:
    # Keep the text before the first space (drops a trailing unit), then convert to numeric
    df_adds_climate_kharif[col] = df_adds_climate_kharif[col].astype(str).str.split(" ").str[0]
    df_adds_climate_kharif[col] = pd.to_numeric(df_adds_climate_kharif[col], errors='coerce')

df_adds_climate_kharif = df_adds_climate_kharif.drop(columns=['index_ref'])
df_adds_climate_kharif = df_adds_climate_kharif.replace('unknown', np.nan)
df_adds_climate_kharif


Unnamed: 0,Average Temperature January (C),Average Temperature February (C),Average Temperature March (C),Average Temperature April (C),Average Temperature May (C),Average Temperature June (C),Average Temperature July (C),Average Temperature August (C),Average Temperature September (C),Average Temperature October (C),Average Temperature November (C),Average Temperature December (C),Min. Temperature January (C),Min. Temperature February (C),Min. Temperature March (C),Min. Temperature April (C),Min. Temperature May (C),Min. Temperature June (C),Min. Temperature July (C),Min. Temperature August (C),Min. Temperature September (C),Min. Temperature October (C),Min. Temperature November (C),Min. Temperature December (C),Max. Temperature January (C),Max. Temperature February (C),Max. Temperature March (C),Max. Temperature April (C),Max. Temperature May (C),Max. Temperature June (C),Max. Temperature July (C),Max. Temperature August (C),Max. Temperature September (C),Max. Temperature October (C),Max. Temperature November (C),Max. Temperature December (C),Precipitation January,Precipitation February,Precipitation March,Precipitation April,Precipitation May,Precipitation June,Precipitation July,Precipitation August,Precipitation September,Precipitation October,Precipitation November,Precipitation December,Humidity (%) January,Humidity (%) February,Humidity (%) March,Humidity (%) April,Humidity (%) May,Humidity (%) June,Humidity (%) July,Humidity (%) August,Humidity (%) September,Humidity (%) October,Humidity (%) November,Humidity (%) December,Rainy days (d) January,Rainy days (d) February,Rainy days (d) March,Rainy days (d) April,Rainy days (d) May,Rainy days (d) June,Rainy days (d) July,Rainy days (d) August,Rainy days (d) September,Rainy days (d) October,Rainy days (d) November,Rainy days (d) December,Average Sun Hours January,Average Sun Hours February,Average Sun Hours March,Average Sun Hours April,Average Sun Hours May,Average Sun Hours June,Average Sun Hours July,Average Sun Hours August,Average Sun Hours September,Average Sun Hours October,Average Sun Hours November,Average Sun Hours December,Integration_ID
0,22.8,26.1,29.6,33.6,35.5,31.0,27.0,26.2,26.6,26.4,24.6,22.5,16.1,19.1,22.4,26.5,29.6,27.2,24.7,24.0,23.6,21.7,19.1,16.2,29.2,32.7,36.3,40.1,41.3,35.3,29.9,29.0,30.1,31.0,29.9,28.5,13.0,8.0,13.0,10.0,15.0,185.0,294.0,282.0,156.0,60.0,16.0,3.0,,,,,,,,,,,,,1.0,1.0,2.0,2.0,2.0,10.0,16.0,15.0,11.0,5.0,2.0,1.0,9.6,10.1,10.7,11.3,11.7,10.6,8.7,8.1,8.5,9.5,9.6,9.6,telangana_adilabad
1,18.0,21.1,26.0,31.4,33.7,30.8,26.2,25.1,25.8,25.9,22.8,19.1,11.1,13.9,18.4,23.8,27.1,26.2,23.8,23.0,22.4,19.8,16.4,12.4,25.0,28.2,33.2,38.3,40.0,36.0,29.3,27.9,29.8,31.9,29.3,26.0,5.0,7.0,5.0,3.0,8.0,103.0,343.0,288.0,122.0,17.0,8.0,7.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,15.0,16.0,9.0,2.0,1.0,1.0,9.3,10.0,10.7,11.4,11.8,10.4,6.0,5.0,8.1,10.1,9.7,9.4,madhya pradesh_agar malwa
2,14.1,17.8,23.7,29.9,33.4,33.5,29.9,28.6,28.2,26.1,21.2,15.9,8.0,11.1,16.0,21.7,26.2,28.4,26.9,25.9,24.4,19.8,14.7,9.6,20.6,24.6,31.2,37.7,40.2,38.6,33.5,32.0,32.5,32.5,28.0,22.7,15.0,21.0,12.0,8.0,12.0,78.0,225.0,231.0,108.0,22.0,6.0,8.0,,,,,,,,,,,,,2.0,2.0,2.0,2.0,3.0,6.0,15.0,15.0,9.0,2.0,1.0,1.0,8.5,9.6,10.6,11.5,12.1,11.7,9.3,8.6,9.1,10.0,9.6,9.0,uttar pradesh_agra
3,20.4,22.8,27.2,31.2,33.0,31.9,28.5,27.5,28.0,27.9,25.0,21.5,13.9,15.7,19.6,23.5,26.4,27.6,26.0,25.0,24.5,22.1,18.9,15.2,27.6,30.2,34.8,38.8,40.2,37.1,31.7,30.4,31.9,34.2,32.0,28.7,1.0,1.0,1.0,1.0,1.0,73.0,307.0,242.0,109.0,17.0,3.0,1.0,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,5.0,14.0,15.0,7.0,2.0,1.0,0.0,9.7,10.2,10.8,11.4,11.3,9.5,7.1,6.2,8.0,10.1,9.9,9.6,gujarat_ahmedabad
4,22.0,24.5,27.7,30.7,30.4,26.4,24.2,23.7,24.1,24.7,23.6,22.0,15.3,17.3,20.4,23.2,23.6,22.9,22.0,21.4,20.9,19.8,17.9,15.8,28.5,31.3,34.5,37.8,38.0,31.4,27.6,27.1,28.2,29.7,29.2,28.2,2.0,2.0,6.0,7.0,16.0,171.0,174.0,149.0,151.0,79.0,22.0,6.0,,,,,,,,,,,,,0.0,0.0,1.0,1.0,3.0,12.0,16.0,15.0,12.0,7.0,3.0,1.0,9.8,10.3,10.8,11.3,11.5,8.6,6.7,6.0,7.0,9.3,9.5,9.5,maharashtra_ahmednagar
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,18.7,22.6,27.1,30.9,31.5,29.5,26.8,26.5,26.3,25.1,22.3,19.1,12.9,16.2,19.9,23.8,25.8,25.9,24.6,24.2,23.6,21.1,17.3,13.7,25.1,29.1,34.1,38.0,38.0,34.0,30.1,29.9,29.9,29.4,27.6,24.9,16.0,17.0,20.0,32.0,73.0,245.0,305.0,280.0,201.0,78.0,11.0,11.0,,,,,,,,,,,,,2.0,2.0,3.0,5.0,9.0,15.0,19.0,20.0,16.0,7.0,1.0,1.0,9.3,9.7,10.6,11.2,11.4,9.8,7.6,7.1,7.5,8.6,9.2,9.2,jharkhand_west singhbhum
467,24.3,25.4,27.5,29.9,31.6,31.0,30.3,29.6,29.3,27.4,25.5,24.5,20.9,21.2,22.9,25.9,27.6,27.2,26.5,26.0,25.8,24.5,22.9,21.8,28.0,30.0,32.8,35.0,37.0,35.9,35.0,34.2,33.9,31.3,28.5,27.5,34.0,16.0,21.0,31.0,44.0,43.0,52.0,86.0,91.0,233.0,323.0,174.0,,,,,,,,,,,,,5.0,3.0,3.0,4.0,5.0,7.0,8.0,10.0,11.0,15.0,15.0,11.0,7.3,8.1,8.9,9.8,10.9,11.1,11.0,10.8,10.3,8.8,7.2,6.9,telangana_yadadri
468,23.9,26.3,29.6,32.4,33.1,28.5,26.6,26.0,26.0,25.8,25.0,23.8,17.6,19.7,22.8,26.0,27.6,24.9,23.7,23.2,22.8,21.5,19.4,17.6,30.2,32.8,35.9,38.4,38.9,33.1,30.5,29.7,29.9,30.6,30.6,30.0,5.0,2.0,6.0,14.0,36.0,145.0,177.0,178.0,153.0,106.0,21.0,4.0,,,,,,,,,,,,,1.0,0.0,1.0,2.0,5.0,12.0,15.0,15.0,12.0,9.0,2.0,1.0,9.7,10.1,10.7,11.2,11.4,9.1,8.0,7.8,7.9,9.0,9.2,9.3,karnataka_yadgir
469,12.6,15.8,21.1,27.5,31.2,31.6,28.8,28.0,26.9,23.8,19.1,14.3,6.7,9.4,13.7,19.2,23.3,25.8,25.7,25.1,22.8,17.5,12.7,8.2,19.2,22.6,28.5,35.3,38.4,37.0,32.5,31.6,31.2,30.4,26.4,21.4,48.0,62.0,42.0,27.0,28.0,103.0,281.0,275.0,156.0,14.0,5.0,22.0,,,,,,,,,,,,,3.0,4.0,4.0,3.0,5.0,9.0,18.0,18.0,9.0,1.0,1.0,1.0,7.9,9.2,10.6,11.5,12.1,11.4,9.4,9.1,9.2,10.0,9.5,8.5,haryana_yamunanagar
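Because the same preparation is applied to both the kharif and the rabi climate tables, it could be wrapped in one reusable, vectorized function. The following is a minimal sketch under stated assumptions: `prepare_climate` is a hypothetical helper, and the raw `District ID` tokens and unit formats in the toy frame are invented for illustration, since the raw file is not shown here.

```python
import pandas as pd

def prepare_climate(df, value_cols, id_col="District ID"):
    """Build the join key and convert unit-suffixed strings to numbers."""
    out = df.copy()  # work on a copy so the caller's frame is not modified
    parts = out[id_col].str.split("_", expand=True)
    # Key = lower-cased first and third token of the ID, as in the cell above
    out["Integration_ID"] = parts[0].str.lower() + "_" + parts[2].str.lower()
    for col in value_cols:
        first_token = out[col].astype(str).str.split(" ").str[0]
        out[col] = pd.to_numeric(first_token, errors="coerce")
    return out.drop(columns=[id_col])

# Hypothetical raw rows; the real ID tokens and unit formats may differ
raw = pd.DataFrame({
    "District ID": ["Telangana_X_Adilabad", "Gujarat_Y_Ahmedabad"],
    "Average Temperature January (C)": ["22.8 °C", "unknown"],
    "Humidity (%) January": ["45%", "50%"],
})
clean = prepare_climate(
    raw, ["Average Temperature January (C)", "Humidity (%) January"]
)
print(clean)
```

In this sketch a value without a space before its unit, such as the hypothetical `45%`, is coerced to `NaN`. That may be worth checking against the raw file, given that the Humidity columns are empty in all rows shown above.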


In [14]:
# Prepare the main dataset for integration (vectorized string concatenation)
df_shorten("kharif")['Integration_ID'] = df_shorten("kharif")['State'] + "_" + df_shorten("kharif")['District']
df_shorten("kharif")

Unnamed: 0,State,Cluster,District,Sub-District,Block,GP,Season,Crop,Area Sown (Ha),Area Insured (Ha),SI Per Ha (Inr/Ha),Sum Insured (Inr),Indemnity Level,2000 Yield,2001 Yield,2002 Yield,2003 Yield,2004 Yield,2005 Yield,2006 Yield,2007 Yield,2008 Yield,2009 Yield,2010 Yield,2011 Yield,2012 Yield,2013 Yield,2014 Yield,2015 Yield,2016 Yield,2017 Yield,2018 Yield,ID,GP_Group,District_Group,Integration_ID
1850,andhra pradesh,1,chittoor,b.kothakota,,badikayalapalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3804.00,3716.00,3943.0,,,1850,andhra pradesh|chittoor|b.kothakota|nan|badika...,andhra pradesh|chittoor|paddy,andhra pradesh_chittoor
1851,andhra pradesh,1,chittoor,b.kothakota,,bayyappagaripalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1851,andhra pradesh|chittoor|b.kothakota|nan|bayyap...,andhra pradesh|chittoor|paddy,andhra pradesh_chittoor
1852,andhra pradesh,1,chittoor,b.kothakota,,beerangi,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1852,andhra pradesh|chittoor|b.kothakota|nan|beeran...,andhra pradesh|chittoor|paddy,andhra pradesh_chittoor
1853,andhra pradesh,1,chittoor,b.kothakota,,gattu,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1853,andhra pradesh|chittoor|b.kothakota|nan|gattu|...,andhra pradesh|chittoor|paddy,andhra pradesh_chittoor
1854,andhra pradesh,1,chittoor,b.kothakota,,gollapalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3804.00,3716.00,3943.0,,,1854,andhra pradesh|chittoor|b.kothakota|nan|gollap...,andhra pradesh|chittoor|paddy,andhra pradesh_chittoor
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
511237,west bengal,2,uttar dinajpur,,hemtabad,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1223.840000,1533.2600,1657.400000,1479.010000,1814.60,1748.97,1781.37,1942.86,2086.49,2346.79,,,,511237,west bengal|uttar dinajpur|nan|hemtabad|nan|au...,west bengal|uttar dinajpur|aus paddy,west bengal_uttar dinajpur
511238,west bengal,2,uttar dinajpur,,itahar,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1223.840000,1533.2600,1657.400000,1479.010000,1814.60,1748.97,1781.37,1942.86,2086.49,2346.79,,,,511238,west bengal|uttar dinajpur|nan|itahar|nan|aus ...,west bengal|uttar dinajpur|aus paddy,west bengal_uttar dinajpur
511239,west bengal,2,uttar dinajpur,,kaliaganj,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,924.460000,1841.0700,2069.640000,1053.080000,1641.16,1406.63,1781.37,1942.86,2086.49,2346.79,,,,511239,west bengal|uttar dinajpur|nan|kaliaganj|nan|a...,west bengal|uttar dinajpur|aus paddy,west bengal_uttar dinajpur
511240,west bengal,2,uttar dinajpur,,karandighi,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1323.630000,1456.3100,1554.340000,1479.010000,1814.60,1748.97,1781.37,1942.86,2086.49,2346.79,,,,511240,west bengal|uttar dinajpur|nan|karandighi|nan|...,west bengal|uttar dinajpur|aus paddy,west bengal_uttar dinajpur
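Before joining, it can be useful to check how many keys of the main dataset actually find a partner in the external table, since spelling differences in district names would silently produce empty climate values. The sketch below uses the `indicator` option of `DataFrame.merge`; the key lists are toy data and the mismatched spelling is a made-up example.

```python
import pandas as pd

# Toy key columns; the last pair differs in spelling on purpose
main_keys = pd.DataFrame({"Integration_ID": [
    "andhra pradesh_chittoor", "west bengal_uttar dinajpur", "odisha_anugul",
]})
climate_keys = pd.DataFrame({"Integration_ID": [
    "andhra pradesh_chittoor", "west bengal_uttar dinajpur", "odisha_angul",
]})

# indicator=True adds a "_merge" column telling where each key was found
check = main_keys.drop_duplicates().merge(
    climate_keys.drop_duplicates(),
    on="Integration_ID", how="left", indicator=True,
)
unmatched = check.loc[check["_merge"] == "left_only", "Integration_ID"].tolist()
print(unmatched)
```

Keys listed in `unmatched` would receive only missing values for the external columns after a left join.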


In [15]:
# Join the additional data with the main dataset
df_kharif = pd.merge(df_shorten("kharif"),
                     df_adds_climate_kharif,
                     on='Integration_ID',
                     how='left')
df_kharif = df_kharif.drop(columns=['Integration_ID'])
df_kharif

Unnamed: 0,State,Cluster,District,Sub-District,Block,GP,Season,Crop,Area Sown (Ha),Area Insured (Ha),SI Per Ha (Inr/Ha),Sum Insured (Inr),Indemnity Level,2000 Yield,2001 Yield,2002 Yield,2003 Yield,2004 Yield,2005 Yield,2006 Yield,2007 Yield,2008 Yield,2009 Yield,2010 Yield,2011 Yield,2012 Yield,2013 Yield,2014 Yield,2015 Yield,2016 Yield,2017 Yield,2018 Yield,ID,GP_Group,District_Group,Average Temperature January (C),Average Temperature February (C),Average Temperature March (C),Average Temperature April (C),Average Temperature May (C),Average Temperature June (C),Average Temperature July (C),Average Temperature August (C),Average Temperature September (C),Average Temperature October (C),Average Temperature November (C),Average Temperature December (C),Min. Temperature January (C),Min. Temperature February (C),Min. Temperature March (C),Min. Temperature April (C),Min. Temperature May (C),Min. Temperature June (C),Min. Temperature July (C),Min. Temperature August (C),Min. Temperature September (C),Min. Temperature October (C),Min. Temperature November (C),Min. Temperature December (C),Max. Temperature January (C),Max. Temperature February (C),Max. Temperature March (C),Max. Temperature April (C),Max. Temperature May (C),Max. Temperature June (C),Max. Temperature July (C),Max. Temperature August (C),Max. Temperature September (C),Max. Temperature October (C),Max. Temperature November (C),Max. Temperature December (C),Precipitation January,Precipitation February,Precipitation March,Precipitation April,Precipitation May,Precipitation June,Precipitation July,Precipitation August,Precipitation September,Precipitation October,Precipitation November,Precipitation December,Humidity (%) January,Humidity (%) February,Humidity (%) March,Humidity (%) April,Humidity (%) May,Humidity (%) June,Humidity (%) July,Humidity (%) August,Humidity (%) September,Humidity (%) October,Humidity (%) November,Humidity (%) December,Rainy days (d) January,Rainy days (d) February,Rainy days (d) March,Rainy days (d) April,Rainy days (d) May,Rainy days (d) June,Rainy days (d) July,Rainy days (d) August,Rainy days (d) September,Rainy days (d) October,Rainy days (d) November,Rainy days (d) December,Average Sun Hours January,Average Sun Hours February,Average Sun Hours March,Average Sun Hours April,Average Sun Hours May,Average Sun Hours June,Average Sun Hours July,Average Sun Hours August,Average Sun Hours September,Average Sun Hours October,Average Sun Hours November,Average Sun Hours December
0,andhra pradesh,1,chittoor,b.kothakota,,badikayalapalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3804.00,3716.00,3943.0,,,1850,andhra pradesh|chittoor|b.kothakota|nan|badika...,andhra pradesh|chittoor|paddy,22.3,24.4,27.2,29.6,30.9,29.2,28.4,27.8,27.3,25.7,23.5,22.1,16.4,17.7,20.4,23.6,25.9,25.4,24.8,24.3,23.6,22.0,19.4,17.3,28.4,31.3,34.4,36.4,37.1,34.2,33.1,32.4,32.0,30.2,28.0,27.1,9.0,7.0,13.0,21.0,62.0,85.0,83.0,96.0,121.0,153.0,107.0,52.0,,,,,,,,,,,,,2.0,1.0,2.0,3.0,8.0,10.0,9.0,10.0,11.0,13.0,10.0,5.0,7.6,8.8,9.8,10.3,11.0,10.9,10.2,9.9,9.7,8.3,6.7,6.3
1,andhra pradesh,1,chittoor,b.kothakota,,bayyappagaripalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1851,andhra pradesh|chittoor|b.kothakota|nan|bayyap...,andhra pradesh|chittoor|paddy,22.3,24.4,27.2,29.6,30.9,29.2,28.4,27.8,27.3,25.7,23.5,22.1,16.4,17.7,20.4,23.6,25.9,25.4,24.8,24.3,23.6,22.0,19.4,17.3,28.4,31.3,34.4,36.4,37.1,34.2,33.1,32.4,32.0,30.2,28.0,27.1,9.0,7.0,13.0,21.0,62.0,85.0,83.0,96.0,121.0,153.0,107.0,52.0,,,,,,,,,,,,,2.0,1.0,2.0,3.0,8.0,10.0,9.0,10.0,11.0,13.0,10.0,5.0,7.6,8.8,9.8,10.3,11.0,10.9,10.2,9.9,9.7,8.3,6.7,6.3
2,andhra pradesh,1,chittoor,b.kothakota,,beerangi,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1852,andhra pradesh|chittoor|b.kothakota|nan|beeran...,andhra pradesh|chittoor|paddy,22.3,24.4,27.2,29.6,30.9,29.2,28.4,27.8,27.3,25.7,23.5,22.1,16.4,17.7,20.4,23.6,25.9,25.4,24.8,24.3,23.6,22.0,19.4,17.3,28.4,31.3,34.4,36.4,37.1,34.2,33.1,32.4,32.0,30.2,28.0,27.1,9.0,7.0,13.0,21.0,62.0,85.0,83.0,96.0,121.0,153.0,107.0,52.0,,,,,,,,,,,,,2.0,1.0,2.0,3.0,8.0,10.0,9.0,10.0,11.0,13.0,10.0,5.0,7.6,8.8,9.8,10.3,11.0,10.9,10.2,9.9,9.7,8.3,6.7,6.3
3,andhra pradesh,1,chittoor,b.kothakota,,gattu,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3802.00,3848.00,4094.0,,,1853,andhra pradesh|chittoor|b.kothakota|nan|gattu|...,andhra pradesh|chittoor|paddy,22.3,24.4,27.2,29.6,30.9,29.2,28.4,27.8,27.3,25.7,23.5,22.1,16.4,17.7,20.4,23.6,25.9,25.4,24.8,24.3,23.6,22.0,19.4,17.3,28.4,31.3,34.4,36.4,37.1,34.2,33.1,32.4,32.0,30.2,28.0,27.1,9.0,7.0,13.0,21.0,62.0,85.0,83.0,96.0,121.0,153.0,107.0,52.0,,,,,,,,,,,,,2.0,1.0,2.0,3.0,8.0,10.0,9.0,10.0,11.0,13.0,10.0,5.0,7.6,8.8,9.8,10.3,11.0,10.9,10.2,9.9,9.7,8.3,6.7,6.3
4,andhra pradesh,1,chittoor,b.kothakota,,gollapalle,kharif,paddy,9.336967,7.330287,70000.0,513120.118930,0.8,,,,,,,2218.792203,2880.0469,2423.768505,2185.001109,2292.00,4050.00,3742.00,4000.00,3804.00,3716.00,3943.0,,,1854,andhra pradesh|chittoor|b.kothakota|nan|gollap...,andhra pradesh|chittoor|paddy,22.3,24.4,27.2,29.6,30.9,29.2,28.4,27.8,27.3,25.7,23.5,22.1,16.4,17.7,20.4,23.6,25.9,25.4,24.8,24.3,23.6,22.0,19.4,17.3,28.4,31.3,34.4,36.4,37.1,34.2,33.1,32.4,32.0,30.2,28.0,27.1,9.0,7.0,13.0,21.0,62.0,85.0,83.0,96.0,121.0,153.0,107.0,52.0,,,,,,,,,,,,,2.0,1.0,2.0,3.0,8.0,10.0,9.0,10.0,11.0,13.0,10.0,5.0,7.6,8.8,9.8,10.3,11.0,10.9,10.2,9.9,9.7,8.3,6.7,6.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
541945,west bengal,2,uttar dinajpur,,kaliaganj,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,924.460000,1841.0700,2069.640000,1053.080000,1641.16,1406.63,1781.37,1942.86,2086.49,2346.79,,,,511239,west bengal|uttar dinajpur|nan|kaliaganj|nan|a...,west bengal|uttar dinajpur|aus paddy,17.7,21.0,26.2,31.8,34.7,31.9,26.9,25.7,26.4,26.2,22.8,18.8,10.9,13.8,18.4,23.9,28.2,27.4,24.5,23.6,22.8,20.0,16.3,12.2,24.7,28.2,33.6,38.8,41.0,37.1,30.0,28.6,30.5,32.4,29.5,25.8,8.0,13.0,6.0,3.0,8.0,136.0,394.0,337.0,125.0,19.0,10.0,8.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,16.0,17.0,10.0,2.0,1.0,1.0,9.2,10.0,10.7,11.4,11.9,10.7,6.5,5.4,8.4,10.1,9.7,9.3
541946,west bengal,2,uttar dinajpur,,karandighi,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1323.630000,1456.3100,1554.340000,1479.010000,1814.60,1748.97,1781.37,1942.86,2086.49,2346.79,,,,511240,west bengal|uttar dinajpur|nan|karandighi|nan|...,west bengal|uttar dinajpur|aus paddy,17.7,21.0,26.2,31.8,34.7,31.9,26.9,25.7,26.4,26.2,22.8,18.8,10.9,13.8,18.4,23.9,28.2,27.4,24.5,23.6,22.8,20.0,16.3,12.2,24.7,28.2,33.6,38.8,41.0,37.1,30.0,28.6,30.5,32.4,29.5,25.8,8.0,13.0,6.0,3.0,8.0,136.0,394.0,337.0,125.0,19.0,10.0,8.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,16.0,17.0,10.0,2.0,1.0,1.0,9.2,10.0,10.7,11.4,11.9,10.7,6.5,5.4,8.4,10.1,9.7,9.3
541947,west bengal,2,uttar dinajpur,,karandighi,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1323.630000,1456.3100,1554.340000,1479.010000,1814.60,1748.97,1781.37,1942.86,2086.49,2346.79,,,,511240,west bengal|uttar dinajpur|nan|karandighi|nan|...,west bengal|uttar dinajpur|aus paddy,17.7,21.0,26.2,31.8,34.7,31.9,26.9,25.7,26.4,26.2,22.8,18.8,10.9,13.8,18.4,23.9,28.2,27.4,24.5,23.6,22.8,20.0,16.3,12.2,24.7,28.2,33.6,38.8,41.0,37.1,30.0,28.6,30.5,32.4,29.5,25.8,8.0,13.0,6.0,3.0,8.0,136.0,394.0,337.0,125.0,19.0,10.0,8.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,16.0,17.0,10.0,2.0,1.0,1.0,9.2,10.0,10.7,11.4,11.9,10.7,6.5,5.4,8.4,10.1,9.7,9.3
541948,west bengal,2,uttar dinajpur,,raiganj,,kharif,aus paddy,,1.857143,60500.0,112357.142857,0.9,,,,,,1591.45,1323.630000,1456.3100,1554.340000,1620.980000,1872.41,1863.08,1781.37,1942.86,2086.49,2346.79,,,,511241,west bengal|uttar dinajpur|nan|raiganj|nan|aus...,west bengal|uttar dinajpur|aus paddy,17.7,21.0,26.2,31.8,34.7,31.9,26.9,25.7,26.4,26.2,22.8,18.8,10.9,13.8,18.4,23.9,28.2,27.4,24.5,23.6,22.8,20.0,16.3,12.2,24.7,28.2,33.6,38.8,41.0,37.1,30.0,28.6,30.5,32.4,29.5,25.8,8.0,13.0,6.0,3.0,8.0,136.0,394.0,337.0,125.0,19.0,10.0,8.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,16.0,17.0,10.0,2.0,1.0,1.0,9.2,10.0,10.7,11.4,11.9,10.7,6.5,5.4,8.4,10.1,9.7,9.3
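The merged output above shows row labels up to 541948 although the input has 511242 rows, and ID 511240 appears on two rows. A left join only adds rows when a key occurs more than once on the right side, so repeated `Integration_ID` values in the climate table are a plausible cause (an assumption, not verified here). The sketch below shows one way to detect and guard against this on toy data using the `validate` option of `DataFrame.merge`; the frames and values are illustrative.

```python
import pandas as pd

main = pd.DataFrame({
    "ID": [511239, 511240, 511241],
    "Integration_ID": ["west bengal_uttar dinajpur"] * 3,
})
# Hypothetical lookup table with a repeated key, which multiplies matching rows
climate = pd.DataFrame({
    "Integration_ID": ["west bengal_uttar dinajpur"] * 2,
    "Precipitation June": [136.0, 136.0],
})

# 1) List the keys that occur more than once in the lookup table
dupes = climate[climate.duplicated("Integration_ID", keep=False)]
print(len(dupes))

# 2) validate="many_to_one" raises if the right-hand keys are not unique
try:
    main.merge(climate, on="Integration_ID", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

# 3) After de-duplicating the lookup table, the row count is preserved
deduped = climate.drop_duplicates("Integration_ID")
merged = main.merge(deduped, on="Integration_ID", how="left", validate="many_to_one")
print(len(merged) == len(main))
```

Whether dropping duplicates is appropriate depends on why the keys repeat; if the duplicated rows carry different values, they would need to be reconciled first.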


In [19]:
# Prepare the additional data for integration
df_adds_climate_rabi['Integration_ID'] = df_adds_climate_rabi.apply(lambda x: x['District ID'].split('_')[0].lower() + "_" + x['District ID'].split('_')[2].lower(), axis=1)
# Keep only the climate columns; .copy() avoids SettingWithCopyWarning on the assignments below
df_adds_climate_rabi = df_adds_climate_rabi[df_adds_climate_rabi.columns.tolist()[15:]].copy()

for col in df_adds_climate_rabi.columns.tolist()[:-3]:
    # Keep the text before the first space (drops a trailing unit), then convert to numeric
    df_adds_climate_rabi[col] = df_adds_climate_rabi[col].astype(str).str.split(" ").str[0]
    df_adds_climate_rabi[col] = pd.to_numeric(df_adds_climate_rabi[col], errors='coerce')

df_adds_climate_rabi = df_adds_climate_rabi.drop(columns=['index_ref'])
df_adds_climate_rabi = df_adds_climate_rabi.replace('unknown', np.nan)
df_adds_climate_rabi


Unnamed: 0,Average Temperature January (C),Average Temperature February (C),Average Temperature March (C),Average Temperature April (C),Average Temperature May (C),Average Temperature June (C),Average Temperature July (C),Average Temperature August (C),Average Temperature September (C),Average Temperature October (C),Average Temperature November (C),Average Temperature December (C),Min. Temperature January (C),Min. Temperature February (C),Min. Temperature March (C),Min. Temperature April (C),Min. Temperature May (C),Min. Temperature June (C),Min. Temperature July (C),Min. Temperature August (C),Min. Temperature September (C),Min. Temperature October (C),Min. Temperature November (C),Min. Temperature December (C),Max. Temperature January (C),Max. Temperature February (C),Max. Temperature March (C),Max. Temperature April (C),Max. Temperature May (C),Max. Temperature June (C),Max. Temperature July (C),Max. Temperature August (C),Max. Temperature September (C),Max. Temperature October (C),Max. Temperature November (C),Max. Temperature December (C),Precipitation January,Precipitation February,Precipitation March,Precipitation April,Precipitation May,Precipitation June,Precipitation July,Precipitation August,Precipitation September,Precipitation October,Precipitation November,Precipitation December,Humidity (%) January,Humidity (%) February,Humidity (%) March,Humidity (%) April,Humidity (%) May,Humidity (%) June,Humidity (%) July,Humidity (%) August,Humidity (%) September,Humidity (%) October,Humidity (%) November,Humidity (%) December,Rainy days (d) January,Rainy days (d) February,Rainy days (d) March,Rainy days (d) April,Rainy days (d) May,Rainy days (d) June,Rainy days (d) July,Rainy days (d) August,Rainy days (d) September,Rainy days (d) October,Rainy days (d) November,Rainy days (d) December,Average Sun Hours January,Average Sun Hours February,Average Sun Hours March,Average Sun Hours April,Average Sun Hours May,Average Sun Hours June,Average Sun Hours July,Average Sun Hours August,Average Sun Hours September,Average Sun Hours October,Average Sun Hours November,Average Sun Hours December,Integration_ID
0,22.8,26.1,29.6,33.6,35.5,31.0,27.0,26.2,26.6,26.4,24.6,22.5,16.1,19.1,22.4,26.5,29.6,27.2,24.7,24.0,23.6,21.7,19.1,16.2,29.2,32.7,36.3,40.1,41.3,35.3,29.9,29.0,30.1,31.0,29.9,28.5,13.0,8.0,13.0,10.0,15.0,185.0,294.0,282.0,156.0,60.0,16.0,3.0,,,,,,,,,,,,,1.0,1.0,2.0,2.0,2.0,10.0,16.0,15.0,11.0,5.0,2.0,1.0,9.6,10.1,10.7,11.3,11.7,10.6,8.7,8.1,8.5,9.5,9.6,9.6,telangana_adilabad
1,18.0,21.1,26.0,31.4,33.7,30.8,26.2,25.1,25.8,25.9,22.8,19.1,11.1,13.9,18.4,23.8,27.1,26.2,23.8,23.0,22.4,19.8,16.4,12.4,25.0,28.2,33.2,38.3,40.0,36.0,29.3,27.9,29.8,31.9,29.3,26.0,5.0,7.0,5.0,3.0,8.0,103.0,343.0,288.0,122.0,17.0,8.0,7.0,,,,,,,,,,,,,1.0,1.0,1.0,1.0,2.0,8.0,15.0,16.0,9.0,2.0,1.0,1.0,9.3,10.0,10.7,11.4,11.8,10.4,6.0,5.0,8.1,10.1,9.7,9.4,madhya pradesh_agar malwa
2,14.1,17.8,23.7,29.9,33.4,33.5,29.9,28.6,28.2,26.1,21.2,15.9,8.0,11.1,16.0,21.7,26.2,28.4,26.9,25.9,24.4,19.8,14.7,9.6,20.6,24.6,31.2,37.7,40.2,38.6,33.5,32.0,32.5,32.5,28.0,22.7,15.0,21.0,12.0,8.0,12.0,78.0,225.0,231.0,108.0,22.0,6.0,8.0,,,,,,,,,,,,,2.0,2.0,2.0,2.0,3.0,6.0,15.0,15.0,9.0,2.0,1.0,1.0,8.5,9.6,10.6,11.5,12.1,11.7,9.3,8.6,9.1,10.0,9.6,9.0,uttar pradesh_agra
3,20.4,22.8,27.2,31.2,33.0,31.9,28.5,27.5,28.0,27.9,25.0,21.5,13.9,15.7,19.6,23.5,26.4,27.6,26.0,25.0,24.5,22.1,18.9,15.2,27.6,30.2,34.8,38.8,40.2,37.1,31.7,30.4,31.9,34.2,32.0,28.7,1.0,1.0,1.0,1.0,1.0,73.0,307.0,242.0,109.0,17.0,3.0,1.0,,,,,,,,,,,,,0.0,0.0,0.0,0.0,0.0,5.0,14.0,15.0,7.0,2.0,1.0,0.0,9.7,10.2,10.8,11.4,11.3,9.5,7.1,6.2,8.0,10.1,9.9,9.6,gujarat_ahmedabad
4,22.0,24.5,27.7,30.7,30.4,26.4,24.2,23.7,24.1,24.7,23.6,22.0,15.3,17.3,20.4,23.2,23.6,22.9,22.0,21.4,20.9,19.8,17.9,15.8,28.5,31.3,34.5,37.8,38.0,31.4,27.6,27.1,28.2,29.7,29.2,28.2,2.0,2.0,6.0,7.0,16.0,171.0,174.0,149.0,151.0,79.0,22.0,6.0,,,,,,,,,,,,,0.0,0.0,1.0,1.0,3.0,12.0,16.0,15.0,12.0,7.0,3.0,1.0,9.8,10.3,10.8,11.3,11.5,8.6,6.7,6.0,7.0,9.3,9.5,9.5,maharashtra_ahmednagar
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
435,18.7,22.6,27.1,30.9,31.5,29.5,26.8,26.5,26.3,25.1,22.3,19.1,12.9,16.2,19.9,23.8,25.8,25.9,24.6,24.2,23.6,21.1,17.3,13.7,25.1,29.1,34.1,38.0,38.0,34.0,30.1,29.9,29.9,29.4,27.6,24.9,16.0,17.0,20.0,32.0,73.0,245.0,305.0,280.0,201.0,78.0,11.0,11.0,,,,,,,,,,,,,2.0,2.0,3.0,5.0,9.0,15.0,19.0,20.0,16.0,7.0,1.0,1.0,9.3,9.7,10.6,11.2,11.4,9.8,7.6,7.1,7.5,8.6,9.2,9.2,jharkhand_west singhbhum
436,24.3,25.4,27.5,29.9,31.6,31.0,30.3,29.6,29.3,27.4,25.5,24.5,20.9,21.2,22.9,25.9,27.6,27.2,26.5,26.0,25.8,24.5,22.9,21.8,28.0,30.0,32.8,35.0,37.0,35.9,35.0,34.2,33.9,31.3,28.5,27.5,34.0,16.0,21.0,31.0,44.0,43.0,52.0,86.0,91.0,233.0,323.0,174.0,,,,,,,,,,,,,5.0,3.0,3.0,4.0,5.0,7.0,8.0,10.0,11.0,15.0,15.0,11.0,7.3,8.1,8.9,9.8,10.9,11.1,11.0,10.8,10.3,8.8,7.2,6.9,telangana_yadadri
437,23.9,26.3,29.6,32.4,33.1,28.5,26.6,26.0,26.0,25.8,25.0,23.8,17.6,19.7,22.8,26.0,27.6,24.9,23.7,23.2,22.8,21.5,19.4,17.6,30.2,32.8,35.9,38.4,38.9,33.1,30.5,29.7,29.9,30.6,30.6,30.0,5.0,2.0,6.0,14.0,36.0,145.0,177.0,178.0,153.0,106.0,21.0,4.0,,,,,,,,,,,,,1.0,0.0,1.0,2.0,5.0,12.0,15.0,15.0,12.0,9.0,2.0,1.0,9.7,10.1,10.7,11.2,11.4,9.1,8.0,7.8,7.9,9.0,9.2,9.3,karnataka_yadgir
438,12.6,15.8,21.1,27.5,31.2,31.6,28.8,28.0,26.9,23.8,19.1,14.3,6.7,9.4,13.7,19.2,23.3,25.8,25.7,25.1,22.8,17.5,12.7,8.2,19.2,22.6,28.5,35.3,38.4,37.0,32.5,31.6,31.2,30.4,26.4,21.4,48.0,62.0,42.0,27.0,28.0,103.0,281.0,275.0,156.0,14.0,5.0,22.0,,,,,,,,,,,,,3.0,4.0,4.0,3.0,5.0,9.0,18.0,18.0,9.0,1.0,1.0,1.0,7.9,9.2,10.6,11.5,12.1,11.4,9.4,9.1,9.2,10.0,9.5,8.5,haryana_yamunanagar


In [20]:
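# Prepare the main dataset for integration: build the "<state>_<district>" key
# that matches Integration_ID in the climate dataset (vectorised, no row-wise apply)
df_shorten("rabi")['Integration_ID'] = df_shorten("rabi")['State'] + "_" + df_shorten("rabi")['District']
df_shorten("rabi")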

Unnamed: 0,State,Cluster,District,Sub-District,Block,GP,Season,Crop,Area Sown (Ha),Area Insured (Ha),SI Per Ha (Inr/Ha),Sum Insured (Inr),Indemnity Level,2002 Yield,2003 Yield,2004 Yield,2005 Yield,2006 Yield,2007 Yield,2008 Yield,2009 Yield,2010 Yield,2011 Yield,2012 Yield,2013 Yield,2014 Yield,2015 Yield,2016 Yield,2000 Yield,2001 Yield,ID,2017 Yield,2018 Yield,GP_Group,District_Group,Integration_ID
5679,andhra pradesh,1.0,east godavari,1,,,rabi,red chilli,,132.9471,112500.0,14956548.75,0.9,,,,,2987.00,2480.00,2480.00,4183.50,820.00,1180.00,1212.00,547.00,865.00,,,,,5679,,,andhra pradesh|east godavari|1|nan|nan|red chilli,andhra pradesh|east godavari|red chilli,andhra pradesh_east godavari
5685,andhra pradesh,1.0,vishakhapatnam,1,,,rabi,red chilli,,3.3880,93750.0,317625.00,0.9,,,,,2766.00,829.00,1243.00,3835.50,1177.00,1155.00,1212.00,940.00,762.00,,,,,5685,,,andhra pradesh|vishakhapatnam|1|nan|nan|red ch...,andhra pradesh|vishakhapatnam|red chilli,andhra pradesh_vishakhapatnam
6113,bihar,1.0,arwal,,,,rabi,bengal gram (chana),402.33,,27950.0,,0.8,,,,773.00,967.00,892.00,810.00,781.00,670.00,835.00,895.00,379.00,596.00,,,,,6113,,,bihar|arwal|nan|nan|nan|bengal gram (chana),bihar|arwal|bengal gram (chana),bihar_arwal
6114,bihar,2.0,aurangabad,,,,rabi,bengal gram (chana),3689.33,,28750.0,,0.8,,,,452.00,1082.00,1075.00,801.00,482.00,502.50,955.00,1173.00,415.00,888.00,,,,,6114,,,bihar|aurangabad|nan|nan|nan|bengal gram (chana),bihar|aurangabad|bengal gram (chana),bihar_aurangabad
6115,bihar,3.0,banka,,,,rabi,bengal gram (chana),1129.50,,25988.0,,0.8,,,,794.00,873.00,830.00,1027.00,500.00,652.50,1165.00,1515.00,419.00,991.00,,,,,6115,,,bihar|banka|nan|nan|nan|bengal gram (chana),bihar|banka|bengal gram (chana),bihar_banka
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
616785,west bengal,1.0,howrah,bally-jagachha,,,rabi,wheat,,0.2000,56810.0,11362.00,0.9,,,,1610.15,1475.36,2179.77,1604.97,952.84,2559.50,3060.84,1541.83,1941.00,2940.80,1987.48,,,,616785,,,west bengal|howrah|bally-jagachha|nan|nan|wheat,west bengal|howrah|wheat,west bengal_howrah
616786,west bengal,1.0,howrah,uluberia-ii,,,rabi,wheat,,0.2000,56810.0,11362.00,0.9,,,,1582.98,1522.03,2179.77,1604.97,2084.77,715.10,502.04,819.81,866.19,797.49,1151.15,,,,616786,,,west bengal|howrah|uluberia-ii|nan|nan|wheat,west bengal|howrah|wheat,west bengal_howrah
616787,west bengal,3.0,purba medinipur,haldia,,,rabi,wheat,,0.7500,38557.0,28917.75,0.9,,,,2136.41,2323.66,2599.19,2432.22,2084.77,1987.31,1417.77,2804.98,2382.03,2518.48,1362.95,,,,616787,,,west bengal|purba medinipur|haldia|nan|nan|wheat,west bengal|purba medinipur|wheat,west bengal_purba medinipur
616788,west bengal,3.0,purba medinipur,mohisadal-ii,,,rabi,wheat,,0.7500,38557.0,28917.75,0.9,,,,2136.41,2323.66,2599.19,2432.22,2450.12,1987.31,1417.77,2804.98,2382.03,2518.48,1362.95,,,,616788,,,west bengal|purba medinipur|mohisadal-ii|nan|n...,west bengal|purba medinipur|wheat,west bengal_purba medinipur
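
The key built above only matches the climate table when state and district names are spelled identically on both sides. The following is a minimal sketch on made-up rows, not the notebook's own code: `build_integration_id` is a hypothetical helper that trims whitespace and lower-cases both parts before joining them with an underscore.

```python
import pandas as pd


def build_integration_id(df, state_col="State", district_col="District"):
    """Return a '<state>_<district>' key with whitespace trimmed and lower-cased.

    Hypothetical helper for illustration; the column names are assumptions.
    """
    state = df[state_col].str.strip().str.lower()
    district = df[district_col].str.strip().str.lower()
    return state + "_" + district


# Toy rows with inconsistent casing and stray spaces (illustrative only)
toy = pd.DataFrame({
    "State": ["Bihar ", "west bengal"],
    "District": ["Arwal", " Howrah"],
})
toy["Integration_ID"] = build_integration_id(toy)
print(toy["Integration_ID"].tolist())
```

If the names are already normalised in an earlier preprocessing step, the plain concatenation is sufficient and this extra cleaning is redundant.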


In [21]:
# Left-join the additional climate dataset onto the main dataset via the shared Integration_ID key
df_rabi = pd.merge(df_shorten("rabi"),
                   df_adds_climate_rabi,
                   on='Integration_ID',
                   how='left')
# The key is only needed for the join, so drop it afterwards
df_rabi = df_rabi.drop(columns=['Integration_ID'])
df_rabi

Unnamed: 0,State,Cluster,District,Sub-District,Block,GP,Season,Crop,Area Sown (Ha),Area Insured (Ha),SI Per Ha (Inr/Ha),Sum Insured (Inr),Indemnity Level,2002 Yield,2003 Yield,2004 Yield,2005 Yield,2006 Yield,2007 Yield,2008 Yield,2009 Yield,2010 Yield,2011 Yield,2012 Yield,2013 Yield,2014 Yield,2015 Yield,2016 Yield,2000 Yield,2001 Yield,ID,2017 Yield,2018 Yield,GP_Group,District_Group,Average Temperature January (C),Average Temperature February (C),Average Temperature March (C),Average Temperature April (C),Average Temperature May (C),Average Temperature June (C),Average Temperature July (C),Average Temperature August (C),Average Temperature September (C),Average Temperature October (C),Average Temperature November (C),Average Temperature December (C),Min. Temperature January (C),Min. Temperature February (C),Min. Temperature March (C),Min. Temperature April (C),Min. Temperature May (C),Min. Temperature June (C),Min. Temperature July (C),Min. Temperature August (C),Min. Temperature September (C),Min. Temperature October (C),Min. Temperature November (C),Min. Temperature December (C),Max. Temperature January (C),Max. Temperature February (C),Max. Temperature March (C),Max. Temperature April (C),Max. Temperature May (C),Max. Temperature June (C),Max. Temperature July (C),Max. Temperature August (C),Max. Temperature September (C),Max. Temperature October (C),Max. Temperature November (C),Max. Temperature December (C),Precipitation January,Precipitation February,Precipitation March,Precipitation April,Precipitation May,Precipitation June,Precipitation July,Precipitation August,Precipitation September,Precipitation October,Precipitation November,Precipitation December,Humidity (%) January,Humidity (%) February,Humidity (%) March,Humidity (%) April,Humidity (%) May,Humidity (%) June,Humidity (%) July,Humidity (%) August,Humidity (%) September,Humidity (%) October,Humidity (%) November,Humidity (%) December,Rainy days (d) January,Rainy days (d) February,Rainy days (d) March,Rainy days (d) April,Rainy days (d) May,Rainy days (d) June,Rainy days (d) July,Rainy days (d) August,Rainy days (d) September,Rainy days (d) October,Rainy days (d) November,Rainy days (d) December,Average Sun Hours January,Average Sun Hours February,Average Sun Hours March,Average Sun Hours April,Average Sun Hours May,Average Sun Hours June,Average Sun Hours July,Average Sun Hours August,Average Sun Hours September,Average Sun Hours October,Average Sun Hours November,Average Sun Hours December
0,andhra pradesh,1.0,east godavari,1,,,rabi,red chilli,,132.9471,112500.0,14956548.75,0.9,,,,,2987.00,2480.00,2480.00,4183.50,820.00,1180.00,1212.00,547.00,865.00,,,,,5679,,,andhra pradesh|east godavari|1|nan|nan|red chilli,andhra pradesh|east godavari|red chilli,23.8,25.3,27.8,29.9,31.9,30.4,28.6,28.1,28.0,27.3,25.8,24.2,20.3,21.4,23.8,26.4,28.5,27.6,26.3,26.0,25.8,24.9,23.0,21.2,27.5,29.5,32.4,34.4,36.3,33.8,31.3,30.7,30.7,30.1,28.7,27.5,12.0,14.0,12.0,15.0,62.0,160.0,222.0,218.0,220.0,194.0,84.0,21.0,,,,,,,,,,,,,2.0,2.0,2.0,2.0,4.0,11.0,15.0,15.0,15.0,13.0,6.0,2.0,8.6,9.0,9.5,9.5,10.2,10.6,10.1,9.7,9.0,8.7,8.9,8.7
1,andhra pradesh,1.0,vishakhapatnam,1,,,rabi,red chilli,,3.3880,93750.0,317625.00,0.9,,,,,2766.00,829.00,1243.00,3835.50,1177.00,1155.00,1212.00,940.00,762.00,,,,,5685,,,andhra pradesh|vishakhapatnam|1|nan|nan|red ch...,andhra pradesh|vishakhapatnam|red chilli,22.7,24.7,27.5,29.4,30.9,29.8,28.2,27.7,27.6,26.8,24.9,23.1,18.9,20.6,23.9,26.3,27.9,27.3,26.2,25.8,25.6,24.3,21.8,19.6,26.6,29.0,31.8,33.6,34.9,32.9,30.6,30.0,30.0,29.6,28.1,26.6,11.0,12.0,10.0,10.0,51.0,125.0,187.0,205.0,182.0,163.0,87.0,28.0,,,,,,,,,,,,,2.0,2.0,2.0,1.0,3.0,10.0,14.0,14.0,14.0,12.0,5.0,2.0,9.0,9.4,10.1,10.4,10.7,11.0,10.7,10.3,9.4,8.9,9.1,9.0
2,bihar,1.0,arwal,,,,rabi,bengal gram (chana),402.33,,27950.0,,0.8,,,,773.00,967.00,892.00,810.00,781.00,670.00,835.00,895.00,379.00,596.00,,,,,6113,,,bihar|arwal|nan|nan|nan|bengal gram (chana),bihar|arwal|bengal gram (chana),16.1,20.0,25.9,31.3,33.0,32.0,29.0,28.5,27.7,25.3,21.6,17.5,9.9,13.2,18.2,23.6,26.6,27.5,26.4,25.9,25.0,20.9,15.5,11.2,22.5,26.6,33.1,38.6,39.1,36.7,32.3,31.9,31.1,29.9,27.6,23.9,16.0,18.0,9.0,9.0,25.0,177.0,326.0,290.0,208.0,56.0,7.0,7.0,,,,,,,,,,,,,2.0,2.0,2.0,2.0,4.0,11.0,19.0,19.0,16.0,5.0,1.0,1.0,8.6,9.7,10.7,11.3,11.4,10.3,8.3,8.0,8.2,8.9,9.3,8.7
3,bihar,2.0,aurangabad,,,,rabi,bengal gram (chana),3689.33,,28750.0,,0.8,,,,452.00,1082.00,1075.00,801.00,482.00,502.50,955.00,1173.00,415.00,888.00,,,,,6114,,,bihar|aurangabad|nan|nan|nan|bengal gram (chana),bihar|aurangabad|bengal gram (chana),16.2,20.3,26.0,31.5,33.5,32.2,28.8,28.3,27.6,25.4,21.5,17.5,9.6,13.1,18.0,23.4,26.7,27.8,26.2,25.8,24.9,20.9,15.4,10.9,22.7,27.1,33.3,38.8,39.7,36.9,32.1,31.6,31.1,30.1,27.5,23.9,16.0,17.0,10.0,9.0,21.0,169.0,315.0,264.0,204.0,58.0,8.0,7.0,,,,,,,,,,,,,2.0,2.0,2.0,2.0,4.0,11.0,18.0,18.0,15.0,5.0,1.0,1.0,8.8,9.8,10.7,11.4,11.7,10.6,8.5,8.0,8.3,9.1,9.5,9.0
4,bihar,3.0,banka,,,,rabi,bengal gram (chana),1129.50,,25988.0,,0.8,,,,794.00,873.00,830.00,1027.00,500.00,652.50,1165.00,1515.00,419.00,991.00,,,,,6115,,,bihar|banka|nan|nan|nan|bengal gram (chana),bihar|banka|bengal gram (chana),16.3,20.0,25.3,29.5,30.1,29.7,28.0,27.8,27.3,25.3,21.5,17.7,10.5,13.7,18.5,23.0,25.0,26.2,25.7,25.5,24.7,21.4,16.2,12.1,22.2,26.2,31.9,36.1,35.6,33.8,31.2,31.1,30.7,29.4,26.9,23.4,17.0,17.0,20.0,26.0,93.0,204.0,291.0,249.0,228.0,80.0,7.0,7.0,,,,,,,,,,,,,2.0,2.0,2.0,4.0,9.0,14.0,20.0,19.0,17.0,6.0,1.0,1.0,8.8,9.6,10.4,9.9,8.8,8.6,7.9,7.7,7.7,8.8,9.1,8.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
638796,west bengal,1.0,howrah,bally-jagachha,,,rabi,wheat,,0.2000,56810.0,11362.00,0.9,,,,1610.15,1475.36,2179.77,1604.97,952.84,2559.50,3060.84,1541.83,1941.00,2940.80,1987.48,,,,616785,,,west bengal|howrah|bally-jagachha|nan|nan|wheat,west bengal|howrah|wheat,19.0,22.7,27.1,29.9,30.6,29.5,28.1,27.9,27.6,26.3,23.3,20.1,12.9,16.5,21.4,25.5,26.9,27.0,26.1,25.8,25.3,23.0,18.4,14.5,25.3,29.0,33.3,35.7,35.4,33.1,31.1,31.0,30.8,30.2,28.5,25.9,12.0,27.0,35.0,63.0,127.0,277.0,365.0,312.0,260.0,135.0,32.0,11.0,,,,,,,,,,,,,1.0,2.0,3.0,6.0,9.0,16.0,21.0,21.0,18.0,10.0,2.0,1.0,9.1,9.3,9.7,9.7,8.9,8.7,8.2,7.8,7.8,8.3,8.9,8.8
638797,west bengal,1.0,howrah,uluberia-ii,,,rabi,wheat,,0.2000,56810.0,11362.00,0.9,,,,1582.98,1522.03,2179.77,1604.97,2084.77,715.10,502.04,819.81,866.19,797.49,1151.15,,,,616786,,,west bengal|howrah|uluberia-ii|nan|nan|wheat,west bengal|howrah|wheat,19.0,22.7,27.1,29.9,30.6,29.5,28.1,27.9,27.6,26.3,23.3,20.1,12.9,16.5,21.4,25.5,26.9,27.0,26.1,25.8,25.3,23.0,18.4,14.5,25.3,29.0,33.3,35.7,35.4,33.1,31.1,31.0,30.8,30.2,28.5,25.9,12.0,27.0,35.0,63.0,127.0,277.0,365.0,312.0,260.0,135.0,32.0,11.0,,,,,,,,,,,,,1.0,2.0,3.0,6.0,9.0,16.0,21.0,21.0,18.0,10.0,2.0,1.0,9.1,9.3,9.7,9.7,8.9,8.7,8.2,7.8,7.8,8.3,8.9,8.8
638798,west bengal,3.0,purba medinipur,haldia,,,rabi,wheat,,0.7500,38557.0,28917.75,0.9,,,,2136.41,2323.66,2599.19,2432.22,2084.77,1987.31,1417.77,2804.98,2382.03,2518.48,1362.95,,,,616787,,,west bengal|purba medinipur|haldia|nan|nan|wheat,west bengal|purba medinipur|wheat,19.4,23.1,27.3,30.1,30.8,29.8,28.3,28.0,27.7,26.4,23.5,20.3,13.4,17.2,22.0,25.9,27.3,27.2,26.2,25.9,25.5,23.1,18.6,14.7,25.7,29.3,33.4,35.9,35.7,33.3,31.1,31.0,30.7,30.2,28.6,26.1,13.0,28.0,36.0,55.0,111.0,258.0,349.0,308.0,258.0,134.0,36.0,12.0,,,,,,,,,,,,,1.0,2.0,3.0,5.0,8.0,15.0,20.0,21.0,18.0,10.0,2.0,1.0,9.0,9.2,9.8,9.8,9.3,9.2,8.5,8.1,8.1,8.4,8.9,8.8
638799,west bengal,3.0,purba medinipur,mohisadal-ii,,,rabi,wheat,,0.7500,38557.0,28917.75,0.9,,,,2136.41,2323.66,2599.19,2432.22,2450.12,1987.31,1417.77,2804.98,2382.03,2518.48,1362.95,,,,616788,,,west bengal|purba medinipur|mohisadal-ii|nan|n...,west bengal|purba medinipur|wheat,19.4,23.1,27.3,30.1,30.8,29.8,28.3,28.0,27.7,26.4,23.5,20.3,13.4,17.2,22.0,25.9,27.3,27.2,26.2,25.9,25.5,23.1,18.6,14.7,25.7,29.3,33.4,35.9,35.7,33.3,31.1,31.0,30.7,30.2,28.6,26.1,13.0,28.0,36.0,55.0,111.0,258.0,349.0,308.0,258.0,134.0,36.0,12.0,,,,,,,,,,,,,1.0,2.0,3.0,5.0,8.0,15.0,20.0,21.0,18.0,10.0,2.0,1.0,9.0,9.2,9.8,9.8,9.3,9.2,8.5,8.1,8.1,8.4,8.9,8.8
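
A left join keeps every row of the main dataset, but it adds rows whenever a key occurs more than once in the right-hand table, and it leaves the climate columns empty for districts that have no match. The sketch below shows one way to check both cases on toy data. It uses the `validate` and `indicator` parameters of `pd.merge`; the frames `main` and `climate` are made up for illustration and are not the notebook's datasets.

```python
import pandas as pd

# Toy stand-ins for the main and climate tables (illustrative values only)
main = pd.DataFrame({
    "Integration_ID": ["bihar_arwal", "bihar_arwal", "bihar_nowhere"],
    "Crop": ["wheat", "bengal gram (chana)", "wheat"],
})
climate = pd.DataFrame({
    "Integration_ID": ["bihar_arwal", "bihar_banka"],
    "Precipitation January": [16.0, 17.0],
})

# Count duplicated keys in the climate table; any duplicate would multiply rows
n_dupes = int(climate["Integration_ID"].duplicated().sum())

# validate="many_to_one" raises a MergeError if the right-hand key is not unique;
# indicator=True adds a "_merge" column recording where each row came from
merged = pd.merge(main, climate, on="Integration_ID", how="left",
                  validate="many_to_one", indicator=True)

# Keys of main-dataset rows that found no climate record
unmatched = merged.loc[merged["_merge"] == "left_only", "Integration_ID"].unique().tolist()

print(n_dupes, len(main), len(merged), unmatched)
```

When the row count after the merge equals the row count before it and `unmatched` is empty, the join neither duplicated nor lost climate information.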


In [22]:
# Export both datasets as semicolon-separated CSV checkpoints for further preprocessing
df_kharif.to_csv("datasets_preprocessed/df_kharif_preprocessed_checkpoint_3.csv", sep=';')
df_rabi.to_csv("datasets_preprocessed/df_rabi_preprocessed_checkpoint_3.csv", sep=';')
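
Because the checkpoints are written with `sep=';'` and include the index, they have to be read back with the same separator. The sketch below is a small round trip on a made-up one-row frame written to a temporary directory; the file name `toy_checkpoint.csv` is a placeholder, not one of the project files.

```python
import os
import tempfile

import pandas as pd

# One illustrative row; the index plays the role of the record ID
toy = pd.DataFrame(
    {"State": ["bihar"], "Crop": ["bengal gram (chana)"], "2005 Yield": [773.0]},
    index=[6113],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_checkpoint.csv")
    # Write with the same separator as the checkpoint export
    toy.to_csv(path, sep=";")
    # sep must match the writer; index_col=0 restores the saved index
    restored = pd.read_csv(path, sep=";", index_col=0)

print(restored)
```

Omitting `index_col=0` when reading would load the saved index as an ordinary column named `Unnamed: 0`.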