<img src = "https://escp.eu/sites/default/files/logo/ESCP-logo-white-misalign.svg" width = 400 style="background-color: #240085;">
<h1 align=center><font size = 6>ESCP Business School</font></h1>
<h3 align=center><font size = 5>SCOR Datathon</font><br/>
<font size = 3>The Data Science Challenge Bridging Indian Agricultural Protection Gap</font></h3>
<h6 align=center>Additional Data - Web Scraping (Merging with the main dataset)</h6>

Last Updated: February 15, 2022\
Author: Group 21 - Anniek Brink, Jeanne Dubois, and Resha Dirga

<h3>Chapter Objectives</h3>

<p>Once the external data has been consolidated into a single file, it needs to be mapped onto the main dataset. This document integrates the consolidated external data file into the main dataset, which contains yields, insurance data, etc.</p>

<h3>Chapter 1: Import modules</h3>
<p>This chapter lists and imports all modules used in this document.</p>

In [1]:
# Import modules
import pandas as pd
import numpy as np

<h3>Chapter 2: Load datasets</h3>
<p>This chapter loads the datasets required to perform the integration:
<ul>
    <li>Consolidated external data file</li>
    <li>Mapping references file</li>
    <li>Main dataset references</li>
</ul>
</p>

<p><u><i>Note:</i></u> The external data was added after a first clustering run that used only the data provided by SCOR, so the 'Main dataset references' file is the dataset that resulted from that clustering. This document does not use the clusters generated in that earlier step, and the clustering result may change once the external dataset is added.</p>

In [2]:
# Define the paths to the required datasets
files_to_merge = [
    'cities_climate/Final Dataset (Additional Data) - Kharif.csv',
    'cities_climate/Final Dataset (Additional Data) - Rabi.csv',
    'cities_climate/SCOR_Cities_Climate.csv'
]

In [3]:
# Load each dataset into its own DataFrame
df_kharif = pd.read_csv(files_to_merge[0])
df_rabi = pd.read_csv(files_to_merge[1])
df_references = pd.read_csv(files_to_merge[2])
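
<p>The <code>Unnamed: 0</code> columns that show up in the <code>info()</code> output of the next chapter are what pandas typically produces when a CSV was saved together with its index and then read back without <code>index_col</code>. The following is a minimal sketch of that behaviour, assuming a made-up in-memory CSV (<code>csv_text</code> is hypothetical and not one of the project files).</p>

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for one of the project files;
# the leading empty header is what to_csv() writes for an unnamed index.
csv_text = ",State,City\n0,Telangana,Adilabad\n1,Gujarat,Ahmedabad\n"

# Default read: the saved index comes back as a column named "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))

# Reading with index_col=0 restores it as the index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_default.columns.tolist())
print(df_indexed.columns.tolist())
```

<p>Whether dropping these columns is appropriate here depends on whether a later step relies on them, so this is only an option to consider.</p>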

<h3>Chapter 3: Merge datasets</h3>
<p>This chapter merges the external data into the main dataset through the reference file. The generated files are then used in the main process in Chapter 2C - Preprocessing.</p>

In [4]:
# Replace empty strings and NaN values with "unknown" before merging
# df_references = df_big_ref
df_references.replace('', 'unknown', inplace=True)
df_references = df_references.fillna("unknown")
df_references.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5664 entries, 0 to 5663
Data columns (total 14 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Unnamed: 0                        5664 non-null   int64  
 1   State                             5664 non-null   object 
 2   City                              5664 non-null   object 
 3   Month                             5664 non-null   object 
 4   Avg. Temperature °C (°F)          5664 non-null   object 
 5   Min. Temperature °C (°F)          5664 non-null   object 
 6   Max. Temperature °C (°F)          5664 non-null   object 
 7   Precipitation / Rainfall mm (in)  5664 non-null   float64
 8   Humidity(%)                       5664 non-null   object 
 9   Rainy days (d)                    5664 non-null   float64
 10  avg. Sun hours (hours)            5664 non-null   float64
 11  Unnamed: 0.1                      5664 non-null   float64
 12  Unname

<h5><u>For Kharif</u></h5>

In [5]:
# Create the list of reference indices to map
list_index = df_kharif['Index/ref'].to_list()
len(list_index)

435
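
<p>Before mapping, it can help to check whether any reference key is shared by several districts and whether every key actually exists in the reference table. The sketch below assumes two small made-up frames (<code>df_main</code> and <code>df_ref</code> are hypothetical stand-ins for <code>df_kharif</code> and <code>df_references</code>) and reuses only the column names <code>Index/ref</code> and <code>index</code> from this notebook.</p>

```python
import pandas as pd

# Hypothetical miniature stand-ins for df_kharif and df_references
df_main = pd.DataFrame({
    "District ID": ["A", "B", "C", "D"],
    "Index/ref": ["url_1", "url_2", "url_2", "url_9"],
})
df_ref = pd.DataFrame({
    "index": ["url_1", "url_1", "url_2", "url_2"],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
})

keys = df_main["Index/ref"]

# Keys used by more than one district
repeated_keys = keys[keys.duplicated(keep=False)].unique().tolist()

# Keys with no climate rows in the reference table
missing_keys = keys[~keys.isin(df_ref["index"])].unique().tolist()

print(repeated_keys)
print(missing_keys)
```

<p>Repeated keys matter later on, because a key that appears several times on both sides of a merge multiplies the matching rows.</p>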

In [6]:
# Transpose long dataset to wide dataset
list_avg_temp_jan = []
list_avg_temp_feb = []
list_avg_temp_mar = []
list_avg_temp_apr = []
list_avg_temp_may = []
list_avg_temp_jun = []
list_avg_temp_jul = []
list_avg_temp_agt = []
list_avg_temp_sep = []
list_avg_temp_oct = []
list_avg_temp_nov = []
list_avg_temp_dec = []

list_min_temp_jan = []
list_min_temp_feb = []
list_min_temp_mar = []
list_min_temp_apr = []
list_min_temp_may = []
list_min_temp_jun = []
list_min_temp_jul = []
list_min_temp_agt = []
list_min_temp_sep = []
list_min_temp_oct = []
list_min_temp_nov = []
list_min_temp_dec = []

list_max_temp_jan = []
list_max_temp_feb = []
list_max_temp_mar = []
list_max_temp_apr = []
list_max_temp_may = []
list_max_temp_jun = []
list_max_temp_jul = []
list_max_temp_agt = []
list_max_temp_sep = []
list_max_temp_oct = []
list_max_temp_nov = []
list_max_temp_dec = []

list_prec_jan = []
list_prec_feb = []
list_prec_mar = []
list_prec_apr = []
list_prec_may = []
list_prec_jun = []
list_prec_jul = []
list_prec_agt = []
list_prec_sep = []
list_prec_oct = []
list_prec_nov = []
list_prec_dec = []

list_humid_jan = []
list_humid_feb = []
list_humid_mar = []
list_humid_apr = []
list_humid_may = []
list_humid_jun = []
list_humid_jul = []
list_humid_agt = []
list_humid_sep = []
list_humid_oct = []
list_humid_nov = []
list_humid_dec = []

list_rainy_days_jan = []
list_rainy_days_feb = []
list_rainy_days_mar = []
list_rainy_days_apr = []
list_rainy_days_may = []
list_rainy_days_jun = []
list_rainy_days_jul = []
list_rainy_days_agt = []
list_rainy_days_sep = []
list_rainy_days_oct = []
list_rainy_days_nov = []
list_rainy_days_dec = []

list_sun_hour_jan = []
list_sun_hour_feb = []
list_sun_hour_mar = []
list_sun_hour_apr = []
list_sun_hour_may = []
list_sun_hour_jun = []
list_sun_hour_jul = []
list_sun_hour_agt = []
list_sun_hour_sep = []
list_sun_hour_oct = []
list_sun_hour_nov = []
list_sun_hour_dec = []

index_ref = []

for ref_index in list_index:
    df_temp = df_references[df_references['index'] == ref_index]
    if len(df_temp):
        index_ref.append(ref_index)

        # Avg temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_dec.append(value)

        # Min temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_dec.append(value)

        # Max temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_dec.append(value)

        # Precipitation
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_dec.append(value)

        # Humidity
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Humidity(%)'].values[0]
        list_humid_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Humidity(%)'].values[0]
        list_humid_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Humidity(%)'].values[0]
        list_humid_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Humidity(%)'].values[0]
        list_humid_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Humidity(%)'].values[0]
        list_humid_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Humidity(%)'].values[0]
        list_humid_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Humidity(%)'].values[0]
        list_humid_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Humidity(%)'].values[0]
        list_humid_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Humidity(%)'].values[0]
        list_humid_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Humidity(%)'].values[0]
        list_humid_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Humidity(%)'].values[0]
        list_humid_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Humidity(%)'].values[0]
        list_humid_dec.append(value)

        # Rainy days
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Rainy days (d)'].values[0]
        list_rainy_days_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Rainy days (d)'].values[0]
        list_rainy_days_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Rainy days (d)'].values[0]
        list_rainy_days_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Rainy days (d)'].values[0]
        list_rainy_days_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Rainy days (d)'].values[0]
        list_rainy_days_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Rainy days (d)'].values[0]
        list_rainy_days_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Rainy days (d)'].values[0]
        list_rainy_days_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Rainy days (d)'].values[0]
        list_rainy_days_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Rainy days (d)'].values[0]
        list_rainy_days_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Rainy days (d)'].values[0]
        list_rainy_days_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Rainy days (d)'].values[0]
        list_rainy_days_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Rainy days (d)'].values[0]
        list_rainy_days_dec.append(value)

        # Sun hour
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_dec.append(value)
    
result = {
    'index_ref': index_ref,
    'Average Temperature January (C)': list_avg_temp_jan,
    'Average Temperature February (C)': list_avg_temp_feb,
    'Average Temperature March (C)': list_avg_temp_mar,
    'Average Temperature April (C)': list_avg_temp_apr, 
    'Average Temperature May (C)': list_avg_temp_may,
    'Average Temperature June (C)': list_avg_temp_jun, 
    'Average Temperature July (C)': list_avg_temp_jul,
    'Average Temperature August (C)': list_avg_temp_agt, 
    'Average Temperature September (C)': list_avg_temp_sep,
    'Average Temperature October (C)': list_avg_temp_oct, 
    'Average Temperature November (C)': list_avg_temp_nov,
    'Average Temperature December (C)': list_avg_temp_dec, 
    'Min. Temperature January (C)': list_min_temp_jan,
    'Min. Temperature February (C)': list_min_temp_feb, 
    'Min. Temperature March (C)': list_min_temp_mar,
    'Min. Temperature  April (C)': list_min_temp_apr, 
    'Min. Temperature May (C)': list_min_temp_may,
    'Min. Temperature June (C)': list_min_temp_jun, 
    'Min. Temperature July (C)': list_min_temp_jul,
    'Min. Temperature August (C)': list_min_temp_agt, 
    'Min. Temperature September (C)': list_min_temp_sep,
    'Min. Temperature October (C)': list_min_temp_oct, 
    'Min. Temperature November (C)': list_min_temp_nov,
    'Min. Temperature December (C)': list_min_temp_dec, 
    'Max. Temperature January (C)': list_max_temp_jan,
    'Max. Temperature February (C)': list_max_temp_feb, 
    'Max. Temperature March (C)': list_max_temp_mar,
    'Max. Temperature April (C)': list_max_temp_apr, 
    'Max. Temperature May (C)': list_max_temp_may,
    'Max. Temperature June (C)': list_max_temp_jun, 
    'Max. Temperature July (C)': list_max_temp_jul,
    'Max. Temperature August (C)': list_max_temp_agt, 
    'Max. Temperature September (C)': list_max_temp_sep,
    'Max. Temperature October (C)': list_max_temp_oct, 
    'Max. Temperature November (C)': list_max_temp_nov,
    'Max. Temperature December (C)': list_max_temp_dec, 
    'Precipitation January': list_prec_jan,
    'Precipitation February': list_prec_feb, 
    'Precipitation March': list_prec_mar, 
    'Precipitation April': list_prec_apr,
    'Precipitation May': list_prec_may, 
    'Precipitation June': list_prec_jun, 
    'Precipitation July': list_prec_jul,
    'Precipitation August': list_prec_agt, 
    'Precipitation September': list_prec_sep,
    'Precipitation October': list_prec_oct, 
    'Precipitation November': list_prec_nov,
    'Precipitation December': list_prec_dec, 
    'Humidity (%) January': list_humid_jan,
    'Humidity (%) February': list_humid_feb, 
    'Humidity (%) March': list_humid_mar, 
    'Humidity (%) April': list_humid_apr,
    'Humidity (%) May': list_humid_may, 
    'Humidity (%) June': list_humid_jun, 
    'Humidity (%) July': list_humid_jul,
    'Humidity (%) August': list_humid_agt, 
    'Humidity (%) September': list_humid_sep, 
    'Humidity (%) October': list_humid_oct,
    'Humidity (%) November': list_humid_nov, 
    'Humidity (%) December': list_humid_dec,
    'Rainy days (d) January': list_rainy_days_jan, 
    'Rainy days (d) February': list_rainy_days_feb,
    'Rainy days (d) March': list_rainy_days_mar, 
    'Rainy days (d) April': list_rainy_days_apr, 
    'Rainy days (d) May': list_rainy_days_may,
    'Rainy days (d) June': list_rainy_days_jun, 
    'Rainy days (d) July': list_rainy_days_jul, 
    'Rainy days (d) August': list_rainy_days_agt,
    'Rainy days (d) September': list_rainy_days_sep, 
    'Rainy days (d) October': list_rainy_days_oct,
    'Rainy days (d) November': list_rainy_days_nov, 
    'Rainy days (d) December': list_rainy_days_dec,
    'Average Sun Hours January': list_sun_hour_jan, 
    'Average Sun Hours February': list_sun_hour_feb,
    'Average Sun Hours March': list_sun_hour_mar,
    'Average Sun Hours April': list_sun_hour_apr,
    'Average Sun Hours May': list_sun_hour_may,
    'Average Sun Hours June': list_sun_hour_jun,
    'Average Sun Hours July': list_sun_hour_jul,
    'Average Sun Hours August': list_sun_hour_agt,
    'Average Sun Hours September': list_sun_hour_sep,
    'Average Sun Hours October': list_sun_hour_oct,
    'Average Sun Hours November': list_sun_hour_nov,
    'Average Sun Hours December': list_sun_hour_dec
}
    
df_result = pd.DataFrame(result)
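
<p>The cell above builds one list per measure and month by hand. As an alternative, the same long-to-wide reshape could be sketched with <code>DataFrame.pivot</code>. This is not the method used in this notebook: <code>df_long</code>, <code>month_names</code> and <code>prefixes</code> are hypothetical, and only two measures and two months are included to keep the example short.</p>

```python
import pandas as pd

# Hypothetical long-format reference table: one row per (index, Month)
df_long = pd.DataFrame({
    "index": ["url_1", "url_1", "url_2", "url_2"],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
    "Humidity(%)": ["40%", "35%", "60%", "55%"],
    "Rainy days (d)": [1.0, 2.0, 3.0, 4.0],
})

# Month abbreviation -> full name used in the wide column labels
month_names = {"Jan": "January", "Feb": "February"}

# Source column -> prefix of the wide column label
prefixes = {"Humidity(%)": "Humidity (%)", "Rainy days (d)": "Rainy days (d)"}

# One row per index, one column per (measure, month) pair
df_wide = df_long.pivot(index="index", columns="Month", values=list(prefixes))

# Flatten the two-level column index into "<prefix> <month name>" labels
df_wide.columns = [
    f"{prefixes[measure]} {month_names[month]}"
    for measure, month in df_wide.columns
]

# Expose the key as a regular column named like the one in df_result
df_wide = df_wide.rename_axis("index_ref").reset_index()

print(df_wide.columns.tolist())
```

<p>Two differences from the loop are worth noting: <code>pivot</code> raises an error if an (index, month) pair occurs more than once, and the resulting frame has one row per distinct key and its month columns in alphabetical order, rather than one row per entry of <code>list_index</code>.</p>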

In [7]:
# List columns of external dataset to merge into the main dataset
list_columns = [
       'Average Temperature January (C)',
       'Average Temperature February (C)', 'Average Temperature March (C)',
       'Average Temperature April (C)', 'Average Temperature May (C)',
       'Average Temperature June (C)', 'Average Temperature July (C)',
       'Average Temperature August (C)', 'Average Temperature September (C)',
       'Average Temperature October (C)', 'Average Temperature November (C)',
       'Average Temperature December (C)', 'Min. Temperature January (C)',
       'Min. Temperature February (C)', 'Min. Temperature March (C)',
       'Min. Temperature  April (C)', 'Min. Temperature May (C)',
       'Min. Temperature June (C)', 'Min. Temperature July (C)',
       'Min. Temperature August (C)', 'Min. Temperature September (C)',
       'Min. Temperature October (C)', 'Min. Temperature November (C)',
       'Min. Temperature December (C)', 'Max. Temperature January (C)',
       'Max. Temperature February (C)', 'Max. Temperature March (C)',
       'Max. Temperature April (C)', 'Max. Temperature May (C)',
       'Max. Temperature June (C)', 'Max. Temperature July (C)',
       'Max. Temperature August (C)', 'Max. Temperature September (C)',
       'Max. Temperature October (C)', 'Max. Temperature November (C)',
       'Max. Temperature December (C)', 'Precipitation January',
       'Precipitation February', 'Precipitation March', 'Precipitation April',
       'Precipitation May', 'Precipitation June', 'Precipitation July',
       'Precipitation August', 'Precipitation September',
       'Precipitation October', 'Precipitation November',
       'Precipitation December', 'Humidity (%) January',
       'Humidity (%) February', 'Humidity (%) March', 'Humidity (%) April',
       'Humidity (%) May', 'Humidity (%) June', 'Humidity (%) July',
       'Humidity (%) August', 'Humidity (%) September', 'Humidity (%) October',
       'Humidity (%) November', 'Humidity (%) December',
       'Rainy days (d) January', 'Rainy days (d) February',
       'Rainy days (d) March', 'Rainy days (d) April', 'Rainy days (d) May',
       'Rainy days (d) June', 'Rainy days (d) July', 'Rainy days (d) August',
       'Rainy days (d) September', 'Rainy days (d) October',
       'Rainy days (d) November', 'Rainy days (d) December',
       'Average Sun Hours January', 'Average Sun Hours February',
       'Average Sun Hours March', 'Average Sun Hours April',
       'Average Sun Hours May', 'Average Sun Hours June',
       'Average Sun Hours July', 'Average Sun Hours August',
       'Average Sun Hours September', 'Average Sun Hours October',
       'Average Sun Hours November', 'Average Sun Hours December'
]

In [8]:
df_result

Unnamed: 0,index_ref,Average Temperature January (C),Average Temperature February (C),Average Temperature March (C),Average Temperature April (C),Average Temperature May (C),Average Temperature June (C),Average Temperature July (C),Average Temperature August (C),Average Temperature September (C),...,Average Sun Hours March,Average Sun Hours April,Average Sun Hours May,Average Sun Hours June,Average Sun Hours July,Average Sun Hours August,Average Sun Hours September,Average Sun Hours October,Average Sun Hours November,Average Sun Hours December
0,https://en.climate-data.org/asia/india/adilaba...,22.8 °C,26.1 °C,29.6 °C,33.6 °C,35.5 °C,31 °C,27 °C,26.2 °C,26.6 °C,...,10.7,11.3,11.7,10.6,8.7,8.1,8.5,9.5,9.6,9.6
1,https://en.climate-data.org/asia/india/madhya-...,18 °C,21.1 °C,26 °C,31.4 °C,33.7 °C,30.8 °C,26.2 °C,25.1 °C,25.8 °C,...,10.7,11.4,11.8,10.4,6.0,5.0,8.1,10.1,9.7,9.4
2,https://en.climate-data.org/asia/india/uttar-p...,14.1 °C,17.8 °C,23.7 °C,29.9 °C,33.4 °C,33.5 °C,29.9 °C,28.6 °C,28.2 °C,...,10.6,11.5,12.1,11.7,9.3,8.6,9.1,10.0,9.6,9.0
3,https://en.climate-data.org/asia/india/gujarat...,20.4 °C,22.8 °C,27.2 °C,31.2 °C,33 °C,31.9 °C,28.5 °C,27.5 °C,28 °C,...,10.8,11.4,11.3,9.5,7.1,6.2,8.0,10.1,9.9,9.6
4,https://en.climate-data.org/asia/india/maharas...,22 °C,24.5 °C,27.7 °C,30.7 °C,30.4 °C,26.4 °C,24.2 °C,23.7 °C,24.1 °C,...,10.8,11.3,11.5,8.6,6.7,6.0,7.0,9.3,9.5,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
430,https://en.climate-data.org/asia/india/jharkha...,18.7 °C,22.6 °C,27.1 °C,30.9 °C,31.5 °C,29.5 °C,26.8 °C,26.5 °C,26.3 °C,...,10.6,11.2,11.4,9.8,7.6,7.1,7.5,8.6,9.2,9.2
431,https://en.climate-data.org/asia/india/tamil-n...,24.3 °C,25.4 °C,27.5 °C,29.9 °C,31.6 °C,31 °C,30.3 °C,29.6 °C,29.3 °C,...,8.9,9.8,10.9,11.1,11.0,10.8,10.3,8.8,7.2,6.9
432,https://en.climate-data.org/asia/india/karnata...,23.9 °C,26.3 °C,29.6 °C,32.4 °C,33.1 °C,28.5 °C,26.6 °C,26 °C,26 °C,...,10.7,11.2,11.4,9.1,8.0,7.8,7.9,9.0,9.2,9.3
433,https://en.climate-data.org/asia/india/haryana...,12.6 °C,15.8 °C,21.1 °C,27.5 °C,31.2 °C,31.6 °C,28.8 °C,28 °C,26.9 °C,...,10.6,11.5,12.1,11.4,9.4,9.1,9.2,10.0,9.5,8.5
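
<p>The temperature columns in the table above are stored as text such as <code>22.8 °C</code>. If a later step needs numeric values, one possible conversion is sketched below. The series <code>temps</code> is made up, and the <code>unknown</code> entry mirrors the placeholder written by the <code>fillna</code> call in this chapter.</p>

```python
import pandas as pd

# Hypothetical values in the format shown in the table above
temps = pd.Series(["22.8 °C", "31 °C", "unknown"])

# Pull out the leading number; entries without one become NaN
numeric = pd.to_numeric(
    temps.str.extract(r"(-?\d+(?:\.\d+)?)", expand=False),
    errors="coerce",
)

print(numeric.tolist())
```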


In [9]:
# Merge main dataset references with external dataset
df_kharif_new = pd.merge(df_kharif, df_result, left_on='Index/ref', right_on='index_ref', how='left', suffixes=['', '_new'])
df_kharif_new

Unnamed: 0.1,Unnamed: 0,District ID,Threshold Yield,Loss Calculation,KMN Clusters,KMN Distance to Centroid,SHC Clusters,DBS Clusters,MSC Clusters,Country,...,Average Sun Hours March_new,Average Sun Hours April_new,Average Sun Hours May_new,Average Sun Hours June_new,Average Sun Hours July_new,Average Sun Hours August_new,Average Sun Hours September_new,Average Sun Hours October_new,Average Sun Hours November_new,Average Sun Hours December_new
0,311,Telangana_1_Adilabad,4.013751e-03,0.030947,5,393.000000,4,0,0,India,...,10.7,11.3,11.7,10.6,8.7,8.1,8.5,9.5,9.6,9.6
1,159,Madhya Pradesh_1_Agar Malwa,4.513338e-04,0.120004,4,0.000738,1,0,0,India,...,10.7,11.4,11.8,10.4,6.0,5.0,8.1,10.1,9.7,9.4
2,367,Uttar Pradesh_2_Agra,3.104472e-02,0.133326,4,0.000813,1,-1,0,India,...,10.6,11.5,12.1,11.7,9.3,8.6,9.1,10.0,9.6,9.0
3,69,Gujarat_4_Ahmedabad,6.950000e+10,0.426229,1,0.009760,0,-1,4,India,...,10.8,11.4,11.3,9.5,7.1,6.2,8.0,10.1,9.9,9.6
4,239,Maharashtra_6_Ahmednagar,1.614336e-03,0.142123,4,0.001803,1,0,0,India,...,10.8,11.3,11.5,8.6,6.7,6.0,7.0,9.3,9.5,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
466,128,Jharkhand_4_West Singhbhum,4.810000e+09,0.158840,2,0.001633,1,0,0,India,...,10.6,11.2,11.4,9.8,7.6,7.1,7.5,8.6,9.2,9.2
467,320,Telangana_2_Yadadri,4.899660e-03,0.033456,5,146.000000,4,0,0,India,...,8.9,9.8,10.9,11.1,11.0,10.8,10.3,8.8,7.2,6.9
468,149,Karnataka_6_Yadgir,2.310801e-03,0.166794,2,0.001060,1,0,0,India,...,10.7,11.2,11.4,9.1,8.0,7.8,7.9,9.0,9.2,9.3
469,104,Haryana_3_Yamunanagar,1.570000e+11,0.179593,2,0.000655,1,1,0,India,...,10.6,11.5,12.1,11.4,9.4,9.1,9.2,10.0,9.5,8.5
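
<p>The merged table above ends at row label 469, while the index list built from <code>df_kharif</code> has 435 entries. This suggests that some reference keys occur more than once in <code>df_result</code>, so the left merge repeats the matching district rows. The sketch below reproduces the effect on made-up data (<code>df_left</code> and <code>df_right</code> are hypothetical) and shows one possible guard: deduplicating the right-hand keys and passing <code>validate="many_to_one"</code> to <code>pd.merge</code>.</p>

```python
import pandas as pd

# Hypothetical main table: districts B and C share the same reference key
df_left = pd.DataFrame({
    "District ID": ["A", "B", "C"],
    "Index/ref": ["url_1", "url_2", "url_2"],
})

# Hypothetical wide table with a repeated key, one row per district lookup
df_right = pd.DataFrame({
    "index_ref": ["url_1", "url_2", "url_2"],
    "Precipitation January": [5.0, 9.0, 9.0],
})

# Unchecked left merge: every repeated key multiplies the matching rows
df_grown = pd.merge(
    df_left, df_right,
    left_on="Index/ref", right_on="index_ref", how="left",
)

# Deduplicate the right keys, then ask pandas to verify the relationship
df_right_unique = df_right.drop_duplicates(subset="index_ref")
df_checked = pd.merge(
    df_left, df_right_unique,
    left_on="Index/ref", right_on="index_ref",
    how="left", validate="many_to_one",
)

print(len(df_left), len(df_grown), len(df_checked))
```

<p>With <code>validate="many_to_one"</code>, pandas raises a <code>MergeError</code> if the right-hand keys are not unique, which makes unintended row growth visible at merge time.</p>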


In [10]:
# Overwrite the original columns with the merged external values, then drop the "_new" copies
for column in list_columns:
    df_kharif_new[column] = df_kharif_new[column + "_new"]
    df_kharif_new = df_kharif_new.drop(columns=[column + "_new"])
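
<p>The loop above copies each <code>_new</code> column over its original and then drops the copy. A loop-free variant is sketched below on made-up data (<code>df_merged</code> and <code>list_cols</code> are hypothetical). It drops the stale originals first and then renames the <code>_new</code> columns.</p>

```python
import pandas as pd

# Hypothetical merged frame: original columns plus "_new" copies from the merge
df_merged = pd.DataFrame({
    "District ID": ["A", "B"],
    "Precipitation January": [None, None],
    "Precipitation January_new": [5.0, 9.0],
    "Humidity (%) January": ["?", "?"],
    "Humidity (%) January_new": ["40%", "60%"],
})
list_cols = ["Precipitation January", "Humidity (%) January"]

# Names of the suffixed copies created by the merge
new_cols = [f"{col}_new" for col in list_cols]

# Drop the stale originals, then strip the suffix from their replacements
df_clean = (
    df_merged
    .drop(columns=list_cols)
    .rename(columns=dict(zip(new_cols, list_cols)))
)

print(df_clean.columns.tolist())
```

<p>Unlike the loop, this variant leaves the refreshed columns where the <code>_new</code> copies were, so the column order can differ from the original frame.</p>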

<h5><u>For Rabi</u></h5>

In [11]:
# Create the list of reference indices to map
list_index = df_rabi['Index/ref'].to_list()
len(list_index)

426

In [12]:
# Transpose long dataset to wide dataset
list_avg_temp_jan = []
list_avg_temp_feb = []
list_avg_temp_mar = []
list_avg_temp_apr = []
list_avg_temp_may = []
list_avg_temp_jun = []
list_avg_temp_jul = []
list_avg_temp_agt = []
list_avg_temp_sep = []
list_avg_temp_oct = []
list_avg_temp_nov = []
list_avg_temp_dec = []

list_min_temp_jan = []
list_min_temp_feb = []
list_min_temp_mar = []
list_min_temp_apr = []
list_min_temp_may = []
list_min_temp_jun = []
list_min_temp_jul = []
list_min_temp_agt = []
list_min_temp_sep = []
list_min_temp_oct = []
list_min_temp_nov = []
list_min_temp_dec = []

list_max_temp_jan = []
list_max_temp_feb = []
list_max_temp_mar = []
list_max_temp_apr = []
list_max_temp_may = []
list_max_temp_jun = []
list_max_temp_jul = []
list_max_temp_agt = []
list_max_temp_sep = []
list_max_temp_oct = []
list_max_temp_nov = []
list_max_temp_dec = []

list_prec_jan = []
list_prec_feb = []
list_prec_mar = []
list_prec_apr = []
list_prec_may = []
list_prec_jun = []
list_prec_jul = []
list_prec_agt = []
list_prec_sep = []
list_prec_oct = []
list_prec_nov = []
list_prec_dec = []

list_humid_jan = []
list_humid_feb = []
list_humid_mar = []
list_humid_apr = []
list_humid_may = []
list_humid_jun = []
list_humid_jul = []
list_humid_agt = []
list_humid_sep = []
list_humid_oct = []
list_humid_nov = []
list_humid_dec = []

list_rainy_days_jan = []
list_rainy_days_feb = []
list_rainy_days_mar = []
list_rainy_days_apr = []
list_rainy_days_may = []
list_rainy_days_jun = []
list_rainy_days_jul = []
list_rainy_days_agt = []
list_rainy_days_sep = []
list_rainy_days_oct = []
list_rainy_days_nov = []
list_rainy_days_dec = []

list_sun_hour_jan = []
list_sun_hour_feb = []
list_sun_hour_mar = []
list_sun_hour_apr = []
list_sun_hour_may = []
list_sun_hour_jun = []
list_sun_hour_jul = []
list_sun_hour_agt = []
list_sun_hour_sep = []
list_sun_hour_oct = []
list_sun_hour_nov = []
list_sun_hour_dec = []

index_ref = []

for ref_index in list_index:
    df_temp = df_references[df_references['index'] == ref_index]
    if len(df_temp):
        index_ref.append(ref_index)

        # Avg temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Avg. Temperature °C (°F)'].values[0]
        list_avg_temp_dec.append(value)

        # Minimum temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Min. Temperature °C (°F)'].values[0]
        list_min_temp_dec.append(value)

        # Maximum temperature
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Max. Temperature °C (°F)'].values[0]
        list_max_temp_dec.append(value)

        # Precipitation
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Precipitation / Rainfall mm (in)'].values[0]
        list_prec_dec.append(value)

        # Humidity
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Humidity(%)'].values[0]
        list_humid_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Humidity(%)'].values[0]
        list_humid_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Humidity(%)'].values[0]
        list_humid_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Humidity(%)'].values[0]
        list_humid_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Humidity(%)'].values[0]
        list_humid_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Humidity(%)'].values[0]
        list_humid_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Humidity(%)'].values[0]
        list_humid_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Humidity(%)'].values[0]
        list_humid_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Humidity(%)'].values[0]
        list_humid_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Humidity(%)'].values[0]
        list_humid_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Humidity(%)'].values[0]
        list_humid_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Humidity(%)'].values[0]
        list_humid_dec.append(value)

        # Rainy days
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'Rainy days (d)'].values[0]
        list_rainy_days_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'Rainy days (d)'].values[0]
        list_rainy_days_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'Rainy days (d)'].values[0]
        list_rainy_days_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'Rainy days (d)'].values[0]
        list_rainy_days_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'Rainy days (d)'].values[0]
        list_rainy_days_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'Rainy days (d)'].values[0]
        list_rainy_days_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'Rainy days (d)'].values[0]
        list_rainy_days_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'Rainy days (d)'].values[0]
        list_rainy_days_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'Rainy days (d)'].values[0]
        list_rainy_days_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'Rainy days (d)'].values[0]
        list_rainy_days_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'Rainy days (d)'].values[0]
        list_rainy_days_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'Rainy days (d)'].values[0]
        list_rainy_days_dec.append(value)

        # Sun hour
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jan'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jan.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Feb'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_feb.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Mar'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_mar.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Apr'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_apr.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'May'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_may.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jun'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jun.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Jul'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_jul.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Aug'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_agt.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Sep'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_sep.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Oct'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_oct.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Nov'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_nov.append(value)
        value = df_temp.loc[df_temp.index[df_temp['Month'] == 'Dec'], 'avg. Sun hours (hours)'].values[0]
        list_sun_hour_dec.append(value)
    
result = {
    'index_ref': index_ref,
    'Average Temperature January (C)': list_avg_temp_jan,
    'Average Temperature February (C)': list_avg_temp_feb,
    'Average Temperature March (C)': list_avg_temp_mar,
    'Average Temperature April (C)': list_avg_temp_apr, 
    'Average Temperature May (C)': list_avg_temp_may,
    'Average Temperature June (C)': list_avg_temp_jun, 
    'Average Temperature July (C)': list_avg_temp_jul,
    'Average Temperature August (C)': list_avg_temp_agt, 
    'Average Temperature September (C)': list_avg_temp_sep,
    'Average Temperature October (C)': list_avg_temp_oct, 
    'Average Temperature November (C)': list_avg_temp_nov,
    'Average Temperature December (C)': list_avg_temp_dec, 
    'Min. Temperature January (C)': list_min_temp_jan,
    'Min. Temperature February (C)': list_min_temp_feb, 
    'Min. Temperature March (C)': list_min_temp_mar,
    'Min. Temperature  April (C)': list_min_temp_apr, 
    'Min. Temperature May (C)': list_min_temp_may,
    'Min. Temperature June (C)': list_min_temp_jun, 
    'Min. Temperature July (C)': list_min_temp_jul,
    'Min. Temperature August (C)': list_min_temp_agt, 
    'Min. Temperature September (C)': list_min_temp_sep,
    'Min. Temperature October (C)': list_min_temp_oct, 
    'Min. Temperature November (C)': list_min_temp_nov,
    'Min. Temperature December (C)': list_min_temp_dec, 
    'Max. Temperature January (C)': list_max_temp_jan,
    'Max. Temperature February (C)': list_max_temp_feb, 
    'Max. Temperature March (C)': list_max_temp_mar,
    'Max. Temperature April (C)': list_max_temp_apr, 
    'Max. Temperature May (C)': list_max_temp_may,
    'Max. Temperature June (C)': list_max_temp_jun, 
    'Max. Temperature July (C)': list_max_temp_jul,
    'Max. Temperature August (C)': list_max_temp_agt, 
    'Max. Temperature September (C)': list_max_temp_sep,
    'Max. Temperature October (C)': list_max_temp_oct, 
    'Max. Temperature November (C)': list_max_temp_nov,
    'Max. Temperature December (C)': list_max_temp_dec, 
    'Precipitation January': list_prec_jan,
    'Precipitation February': list_prec_feb, 
    'Precipitation March': list_prec_mar, 
    'Precipitation April': list_prec_apr,
    'Precipitation May': list_prec_may, 
    'Precipitation June': list_prec_jun, 
    'Precipitation July': list_prec_jul,
    'Precipitation August': list_prec_agt, 
    'Precipitation September': list_prec_sep,
    'Precipitation October': list_prec_oct, 
    'Precipitation November': list_prec_nov,
    'Precipitation December': list_prec_dec, 
    'Humidity (%) January': list_humid_jan,
    'Humidity (%) February': list_humid_feb, 
    'Humidity (%) March': list_humid_mar, 
    'Humidity (%) April': list_humid_apr,
    'Humidity (%) May': list_humid_may, 
    'Humidity (%) June': list_humid_jun, 
    'Humidity (%) July': list_humid_jul,
    'Humidity (%) August': list_humid_agt, 
    'Humidity (%) September': list_humid_sep, 
    'Humidity (%) October': list_humid_oct,
    'Humidity (%) November': list_humid_nov, 
    'Humidity (%) December': list_humid_dec,
    'Rainy days (d) January': list_rainy_days_jan, 
    'Rainy days (d) February': list_rainy_days_feb,
    'Rainy days (d) March': list_rainy_days_mar, 
    'Rainy days (d) April': list_rainy_days_apr, 
    'Rainy days (d) May': list_rainy_days_may,
    'Rainy days (d) June': list_rainy_days_jun, 
    'Rainy days (d) July': list_rainy_days_jul, 
    'Rainy days (d) August': list_rainy_days_agt,
    'Rainy days (d) September': list_rainy_days_sep, 
    'Rainy days (d) October': list_rainy_days_oct,
    'Rainy days (d) November': list_rainy_days_nov, 
    'Rainy days (d) December': list_rainy_days_dec,
    'Average Sun Hours January': list_sun_hour_jan, 
    'Average Sun Hours February': list_sun_hour_feb,
    'Average Sun Hours March': list_sun_hour_mar,
    'Average Sun Hours April': list_sun_hour_apr,
    'Average Sun Hours May': list_sun_hour_may,
    'Average Sun Hours June': list_sun_hour_jun,
    'Average Sun Hours July': list_sun_hour_jul,
    'Average Sun Hours August': list_sun_hour_agt,
    'Average Sun Hours September': list_sun_hour_sep,
    'Average Sun Hours October': list_sun_hour_oct,
    'Average Sun Hours November': list_sun_hour_nov,
    'Average Sun Hours December': list_sun_hour_dec
}
    
df_result = pd.DataFrame(result)
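
<p><u><i>Note:</i></u> The loop above reshapes a long table (one row per reference and month) into a wide table (one row per reference). The sketch below shows the same reshape with <code>DataFrame.pivot</code> on a small made-up sample. The frame <code>df_long</code>, its values, and the two label dictionaries are illustrative assumptions, not the project data, and only two of the seven metrics are shown. Unlike <code>.values[0]</code>, a missing month gives <code>NaN</code> rather than an <code>IndexError</code>, while a duplicated (index, Month) pair makes <code>pivot</code> raise a <code>ValueError</code>.</p>

```python
import pandas as pd

# Hypothetical long-format sample mirroring df_references: one row per (index, Month).
df_long = pd.DataFrame({
    'index': [1, 1, 2, 2],
    'Month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'Humidity(%)': [40, 35, 60, 55],
    'Rainy days (d)': [1, 0, 3, 2],
})

# Assumed label mappings; extend to all twelve months and all metrics in practice.
month_names = {'Jan': 'January', 'Feb': 'February'}
metric_names = {'Humidity(%)': 'Humidity (%)', 'Rainy days (d)': 'Rainy days (d)'}

# One row per reference index, one column per (metric, month) pair.
df_wide = df_long.pivot(index='index', columns='Month', values=list(metric_names))

# Flatten the two-level column index into names such as 'Humidity (%) January'.
df_wide.columns = [f'{metric_names[metric]} {month_names[month]}'
                   for metric, month in df_wide.columns]

# Turn the reference index back into a regular 'index_ref' column.
df_wide = df_wide.rename_axis('index_ref').reset_index()
```

<p>The temperature columns in <code>result</code> place the month in the middle of the name (for example 'Average Temperature January (C)'), so they would need a slightly different naming rule than the one assumed here.</p>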

In [13]:
# Merge main dataset references with the external dataset
df_rabi_new = pd.merge(df_rabi, df_result, left_on='Index/ref', right_on='index_ref', how='left', suffixes=['', '_new'])
df_rabi_new

Unnamed: 0.1,Unnamed: 0,District ID,Threshold Yield,Loss Calculation,KMN Clusters,KMN Distance to Centroid,SHC Clusters,MSC Clusters,DBS Clusters,Country,...,Average Sun Hours March_new,Average Sun Hours April_new,Average Sun Hours May_new,Average Sun Hours June_new,Average Sun Hours July_new,Average Sun Hours August_new,Average Sun Hours September_new,Average Sun Hours October_new,Average Sun Hours November_new,Average Sun Hours December_new
0,303,Telangana_1.0_Adilabad,7.082626e-04,0.005499,5,268.000000,2,0,0,India,...,10.7,11.3,11.7,10.6,8.7,8.1,8.5,9.5,9.6,9.6
1,185,Madhya Pradesh_1.0_Agar Malwa,4.509780e-04,0.000000,5,0.000111,2,0,0,India,...,10.7,11.4,11.8,10.4,6.0,5.0,8.1,10.1,9.7,9.4
2,359,Uttar Pradesh_2.0_Agra,1.760558e-01,0.327953,4,0.026577,3,10,-1,India,...,10.6,11.5,12.1,11.7,9.3,8.6,9.1,10.0,9.6,9.0
3,96,Gujarat_4.0_Ahmedabad,4.040000e+09,0.137692,3,0.002406,0,0,3,India,...,10.8,11.4,11.3,9.5,7.1,6.2,8.0,10.1,9.9,9.6
4,236,Maharashtra_1.0_Ahmednagar,8.777762e-04,0.029695,5,0.000380,2,0,0,India,...,10.8,11.3,11.5,8.6,6.7,6.0,7.0,9.3,9.5,9.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
435,155,Jharkhand_4.0_West Singhbhum,3.230000e+10,0.009204,5,8.120000,2,0,0,India,...,10.6,11.2,11.4,9.8,7.6,7.1,7.5,8.6,9.2,9.2
436,312,Telangana_2.0_Yadadri,9.784601e-04,0.025987,5,0.000249,2,0,0,India,...,8.9,9.8,10.9,11.1,11.0,10.8,10.3,8.8,7.2,6.9
437,174,Karnataka_6.0_Yadgir,9.341962e-02,0.086040,3,0.003987,0,0,-1,India,...,10.7,11.2,11.4,9.1,8.0,7.8,7.9,9.0,9.2,9.3
438,131,Haryana_3.0_Yamunanagar,5.600000e+09,0.105680,3,0.001047,0,0,1,India,...,10.6,11.5,12.1,11.4,9.4,9.1,9.2,10.0,9.5,8.5
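
<p><u><i>Note:</i></u> A left merge keeps every row of the main dataset, so districts without a matching external record silently receive <code>NaN</code> in the new columns. The sketch below shows one possible check using the <code>indicator</code> and <code>validate</code> parameters of <code>pd.merge</code>. The frames <code>df_main</code> and <code>df_ext</code> are made-up stand-ins for <code>df_rabi</code> and <code>df_result</code>.</p>

```python
import pandas as pd

# Hypothetical stand-ins for df_rabi and df_result.
df_main = pd.DataFrame({'Index/ref': [1, 2, 3], 'District ID': ['A', 'B', 'C']})
df_ext = pd.DataFrame({'index_ref': [1, 2], 'Humidity (%) January': [40, 60]})

# indicator=True adds a '_merge' column describing where each row came from.
# validate='m:1' raises a MergeError if 'index_ref' is duplicated in df_ext.
df_check = pd.merge(df_main, df_ext, left_on='Index/ref', right_on='index_ref',
                    how='left', indicator=True, validate='m:1')

# Rows flagged 'left_only' found no external record.
unmatched = df_check.loc[df_check['_merge'] == 'left_only', 'District ID'].tolist()
print(unmatched)
```

<p>In this sample only district 'C' is reported as unmatched. The <code>_merge</code> column would have to be dropped before continuing with the integrated dataset.</p>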


In [14]:
# Overwrite each original column with its merged "_new" counterpart, then drop the "_new" copy
for column in list_columns:
    df_rabi_new[column] = df_rabi_new[column + "_new"]
    df_rabi_new = df_rabi_new.drop(columns=[column + "_new"])
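
<p><u><i>Note:</i></u> The cleanup above relies on <code>list_columns</code> matching exactly the columns that received the <code>_new</code> suffix during the merge. As a hedged alternative, the sketch below derives those columns from the suffix itself. The frames are made-up examples, and the approach assumes that no column in the main dataset already ends with <code>_new</code>.</p>

```python
import pandas as pd

# Hypothetical frames: the main dataset already holds an outdated humidity column.
df_main = pd.DataFrame({'Index/ref': [1, 2], 'Humidity (%) January': [0, 0]})
df_ext = pd.DataFrame({'index_ref': [1, 2], 'Humidity (%) January': [40, 60]})

# Overlapping column names from the right-hand frame receive the '_new' suffix.
df_merged = pd.merge(df_main, df_ext, left_on='Index/ref', right_on='index_ref',
                     how='left', suffixes=('', '_new'))

# Collect the suffixed columns, copy each over its original, then drop the copies.
suffix = '_new'
new_cols = [col for col in df_merged.columns if col.endswith(suffix)]
for col in new_cols:
    df_merged[col[:-len(suffix)]] = df_merged[col]
df_merged = df_merged.drop(columns=new_cols)
```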

In [15]:
# Export the integrated datasets to CSV as a checkpoint for further preprocessing
df_kharif_new.to_csv('cities_climate/SCOR_Cities_Climate_integrated_kharif.csv')
df_rabi_new.to_csv('cities_climate/SCOR_Cities_Climate_integrated_rabi.csv')
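
<p><u><i>Note:</i></u> The merged output above contains the columns <code>Unnamed: 0.1</code> and <code>Unnamed: 0</code>, which is the usual trace of a DataFrame being written with its row index and read back without declaring it. The sketch below reproduces the effect on a made-up frame and shows that <code>index=False</code> avoids it. Whether the checkpoint files should drop the index depends on how later notebooks read them, so this is an option to consider rather than a required change.</p>

```python
import io
import pandas as pd

# Hypothetical one-column frame standing in for the integrated dataset.
df = pd.DataFrame({'District ID': ['A', 'B']})

# Default export: the row index is written as an unnamed first column.
buf_default = io.StringIO()
df.to_csv(buf_default)
buf_default.seek(0)
reloaded_default = pd.read_csv(buf_default)

# index=False: only the data columns are written.
buf_clean = io.StringIO()
df.to_csv(buf_clean, index=False)
buf_clean.seek(0)
reloaded_clean = pd.read_csv(buf_clean)

print(list(reloaded_default.columns))  # ['Unnamed: 0', 'District ID']
print(list(reloaded_clean.columns))    # ['District ID']
```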