**Scenarios (SSPs):**

To capture the space of possible futures, different Shared Socio-Economic Pathways
(SSPs) (Rosa et al. 2017) were considered, each with its corresponding Representative
Concentration Pathway (RCP). Four climate models were used to calculate the bioclimatic
variables for each RCP, again to capture the range of uncertainty.
SSPs
1. SSP1 “Sustainability”, coupled with RCP2.6
2. SSP2 “Middle of the road”, coupled with RCP4.5
3. SSP3 “Regional rivalry”, generally coupled with RCP7.0, here with RCP6.0
because of bioclimatic data (WorldClim) availability
4. SSP4 “Inequality”, coupled with RCP6.0
5. SSP5 “Fossil fuel development”, coupled with RCP8.5


The corresponding RCP scenarios are named after the radiative forcing they model (e.g.
RCP4.5 for a radiative forcing of +4.5 W/m²).
Climate models
Chosen to capture a wide space, and subject to data availability for the 4 RCPs:
- GISS-E2-R (gs)
- HadGEM2-ES (he)
- MIROC-ESM (mr)
- CCSM4 (cc)

If you want to simplify, just pick one climate model (e.g. “he”) and focus on it.
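
The pairing of pathways, forcing levels, and model codes listed above can be kept in a small lookup table. The sketch below is only illustrative: the names `SSP_TO_RCP`, `MODEL_CODES`, and `scenario_file` are hypothetical, and the file-name pattern is inferred from the two files this notebook loads (`2050_ssp1_he.csv` and `2050_ssp2_he.csv`), so other combinations are assumed to follow it.

```python
# SSP -> RCP pairing as listed above (SSP3 uses RCP6.0 here instead of RCP7.0)
SSP_TO_RCP = {1: 2.6, 2: 4.5, 3: 6.0, 4: 6.0, 5: 8.5}

# Short codes of the four climate models
MODEL_CODES = {
    "gs": "GISS-E2-R",
    "he": "HadGEM2-ES",
    "mr": "MIROC-ESM",
    "cc": "CCSM4",
}


def scenario_file(ssp, model, folder="NewDataForStudents/", year=2050):
    """Build a CSV path; the naming pattern is an assumption, not a documented rule."""
    if ssp not in SSP_TO_RCP:
        raise ValueError(f"unknown SSP: {ssp}")
    if model not in MODEL_CODES:
        raise ValueError(f"unknown model code: {model}")
    return f"{folder}{year}_ssp{ssp}_{model}.csv"


path = scenario_file(1, "he")
print(path, "-> RCP", SSP_TO_RCP[1], "W/m2")
```

Validating the arguments up front means a mistyped model code fails immediately rather than as a missing-file error later.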
Columns
Response variable:

In [1]:
import pandas as pd
import numpy as np

import json

Let's have a look at two different scenarios for the same climate model: he

In [2]:
# Load the dataset
data_folder = 'NewDataForStudents/'
raster_folder = 'rasters_for_sanity_check/'

ssp1 = pd.read_csv(data_folder+'2050_ssp1_he.csv')
ssp2 = pd.read_csv(data_folder+'2050_ssp2_he.csv')

**Response variable:** Production

You probably just want to visualize production and/or change in production (ΔCalories,
Calories 2000, Calories 2050), though yields may be interesting as well. Production is in
calories. Yields are in calories/hectare. Log of yield is also provided.
The symbol Δ means change between 2000 and 2050.
All production and yields are annual.

'%cropland 2050': % of the pixel that is cropland in 2050

'log(cal_per_ha) 2050' : log of the yields in 2050

'cal_per_ha 2050': yield in 2050

'Calories 2050' : Production in 2050

'ΔCalories' : Change in production between 2000 and 2050
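
As a sanity check on how these columns relate, the values in the first row of the `ssp1.head()` output in this notebook are consistent with production = yield × pixel area × cropland share, and with the log column being the natural logarithm of the yield. This relation is inferred from the displayed numbers, not stated by the data provider, so treat the sketch below as an assumption to verify on the full dataset.

```python
import math

# Values copied from the first row of ssp1.head() in this notebook
cal_per_ha = 4785939000.0   # 'cal_per_ha 2050'
ha_per_pixel = 2215.1946    # 'ha_per_pixel'
cropland = 0.005203         # '%cropland 2050' (appears to be a fraction, not a percentage)
calories = 55160180000.0    # 'Calories 2050'
log_yield = 22.288948       # 'log(cal_per_ha) 2050'

# Assumed relation: production = yield * pixel area * cropland share
estimated = cal_per_ha * ha_per_pixel * cropland
production_matches = math.isclose(estimated, calories, rel_tol=1e-3)

# Assumed relation: the log column is the natural log of the yield
log_matches = math.isclose(math.log(cal_per_ha), log_yield, abs_tol=1e-4)

print(production_matches, log_matches)
```

The tolerances are loose because the displayed values are rounded.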

**Predictors of yields**
To simplify, I gave you just a few predictors, for year 2000 and 2050 each time.
Climate variables: temperature and precipitation (Temperature in °C*10, precip in mm)
Soil variable: workability_index + Slope and altitude

'Temperature 2050', 'Precipitation 2050', 'workability_index 2050', 'slope', 'altitude',

In [3]:
ssp1.head(5)

Unnamed: 0,pixel_id,Calories 2050,ΔCalories,%cropland 2050,log(cal_per_ha) 2050,cal_per_ha 2050,Δlog(cal_per_ha),Δcal_per_ha,lat,lon,ha_per_pixel,Temperature 2050,Precipitation 2050,workability_index 2050,slope,altitude,Population 2050
0,776595,55160180000.0,55160180000.0,0.005203,22.288948,4785939000.0,22.288948,4785939000.0,-75.0,96.25,2215.1946,-85,324,6,89.85227,106,0.0
1,780822,11767650000.0,11767650000.0,0.001041,22.348055,5077348000.0,22.348055,5077348000.0,-74.92,88.5,2227.2866,-74,313,6,89.91569,77,0.0
2,780823,127689100000.0,127689100000.0,0.011446,22.334404,5008508000.0,22.334404,5008508000.0,-74.92,88.58,2227.2866,-74,314,6,89.90728,90,0.0
3,780827,34350640000.0,34350640000.0,0.003122,22.32071,4940386000.0,22.320709,4940386000.0,-74.92,88.92,2227.2866,-75,313,6,89.12331,91,0.0
4,780915,77411260000.0,77411260000.0,0.007284,22.285923,4771483000.0,22.285923,4771483000.0,-74.92,96.25,2227.2866,-83,323,6,89.63344,88,0.0


In [4]:
ssp2.head()

Unnamed: 0,pixel_id,Calories 2050,ΔCalories,%cropland 2050,log(cal_per_ha) 2050,cal_per_ha 2050,Δlog(cal_per_ha),Δcal_per_ha,lat,lon,ha_per_pixel,Temperature 2050,Precipitation 2050,workability_index 2050,slope,altitude,Population 2050
0,359844,9029212000.0,9029212000.0,0.001041,22.848263,8372872000.0,22.848263,8372872000.0,-83.08,-73.0,1036.3317,-121,139,6,89.9716,116,0.0
1,359846,9067024000.0,9067024000.0,0.001041,22.852442,8407935000.0,22.852442,8407935000.0,-83.08,-72.83,1036.3317,-123,148,6,89.976685,179,0.0
2,359847,9542353000.0,9542353000.0,0.001041,22.903538,8848712000.0,22.903538,8848712000.0,-83.08,-72.75,1036.3317,-126,160,6,89.97851,245,0.0
3,359866,9679635000.0,9679635000.0,0.001041,22.917822,8976015000.0,22.917822,8976015000.0,-83.08,-71.17,1036.3317,-121,160,6,89.94319,206,0.0
4,364133,9200451000.0,9200451000.0,0.001041,22.855133,8430594000.0,22.855133,8430594000.0,-83.0,-75.58,1048.7557,-128,142,6,89.97267,182,0.0
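
The first rows of the two scenarios do not list the same pixels, so comparing them row by row would be misleading. One possible way to compare production between scenarios is to join on `pixel_id`. The sketch below uses made-up toy frames (`ssp1_toy`, `ssp2_toy`) rather than the real data, and the `diff` column name is hypothetical.

```python
import pandas as pd

# Toy stand-ins for ssp1 and ssp2 (values are made up for illustration)
ssp1_toy = pd.DataFrame({"pixel_id": [1, 2, 3], "Calories 2050": [10.0, 20.0, 30.0]})
ssp2_toy = pd.DataFrame({"pixel_id": [2, 3, 4], "Calories 2050": [25.0, 27.0, 40.0]})

# Keep only the pixels present in both scenarios
both = ssp1_toy.merge(
    ssp2_toy, on="pixel_id", how="inner", suffixes=("_ssp1", "_ssp2")
)

# Difference in projected production between the two scenarios
both["diff"] = both["Calories 2050_ssp2"] - both["Calories 2050_ssp1"]
print(both)
```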


In [5]:
# Drop rows with missing values
ssp1 = ssp1.dropna()

In [6]:
Test = pd.DataFrame()

In [7]:
# Log-scale the production: natural log times 1000, truncated to an integer
Test['Calories'] = ssp1['Calories 2050'].apply(lambda x: int(np.log(x) * 1000))
Test['Cropland'] = ssp1['%cropland 2050']
#Test['log(cal_per_ha) 2050'] = ssp1['log(cal_per_ha) 2050']
# Temperature is stored in °C*10; convert it to °C
Test['Temp'] = ssp1['Temperature 2050'].apply(lambda x: x / 10)
Test['Population'] = ssp1['Population 2050']
Test['lon'] = ssp1['lon']
# Flip the sign of the latitude
Test['lat'] = -ssp1['lat']

In [8]:
Test.head()

Unnamed: 0,Calories,Cropland,Temp,Population,lon,lat
0,24733,0.005203,-8.5,0.0,96.25,75.0
1,23188,0.001041,-7.4,0.0,88.5,74.92
2,25572,0.011446,-7.4,0.0,88.58,74.92
3,24259,0.003122,-7.5,0.0,88.92,74.92
4,25072,0.007284,-8.3,0.0,96.25,74.92


In [9]:
Test.columns

Index(['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'], dtype='object')

In [10]:
# All different values of lat
vec_lat = Test['lat'].unique()

In [11]:
# Init the Dataframe 
reduced_calories = pd.DataFrame(columns=['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])

In [12]:
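# Compress the data by averaging runs of neighbouring points that share the same latitude.
# A run keeps growing while each new point lies within 'distance' degrees of longitude
# of the run's first point, and is closed once it holds 'sample' + 1 points.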

sample = 5
distance = 1
iterat = 0

# Iterate through all the different 'lat' values
for lat in vec_lat:
    lat_ind = np.where(Test['lat']==lat)[0]
    temp_df_lat = Test.iloc[lat_ind]
    
    # index will iterate from 0 to 'sample' to create a mean of the values
    index = 0
    
    # For each group in samples using mean of near distance
    for i in range(temp_df_lat.shape[0]):
        
        temp_cal = temp_df_lat.iloc[i]['Calories']
        temp_crop = temp_df_lat.iloc[i]['Cropland']
        temp_temp= temp_df_lat.iloc[i]['Temp']
        temp_pop = temp_df_lat.iloc[i]['Population']
        temp_lon = temp_df_lat.iloc[i]['lon']
        temp_lat = temp_df_lat.iloc[i]['lat']
        
        # First value of the subsample
        if(index == 0):
            # the 'prev_X' values will be tested with the next point to check the longitude distance
            prev_cal = temp_cal
            prev_crop = temp_crop
            prev_temp = temp_temp
            prev_pop = temp_pop
            prev_lon = temp_lon
            prev_lat = temp_lat
            
            mean_df = pd.DataFrame([[temp_cal, temp_crop, temp_temp, temp_pop, temp_lon,temp_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
            index =index +1
        
        else:
            # Test if the longitude distance between the points is less than 'distance'
            if( (max(temp_lon,prev_lon) - min(temp_lon,prev_lon) <= distance) ):  
                if(index < sample):
                    temp_df = pd.DataFrame([[temp_cal, temp_crop, temp_temp, temp_pop, temp_lon,temp_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
                    mean_df = pd.concat([mean_df, temp_df])  # DataFrame.append was removed in pandas 2.0
                    index = index + 1
                    
                elif(index == sample):
                    # Add the last value of the subsample
                    temp_df = pd.DataFrame([[temp_cal, temp_crop, temp_temp, temp_pop, temp_lon,temp_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
                    mean_df = pd.concat([mean_df, temp_df])  # DataFrame.append was removed in pandas 2.0
                    
                    # Compute the mean for the 5 ('sample') subsamples 
                    temp_mean = mean_df.mean()
                    mean_cal = temp_mean[0]
                    mean_crop = temp_mean[1]
                    mean_temp = temp_mean[2]
                    mean_pop = temp_mean[3]
                    mean_long = temp_mean[4]
                    mean_lat = temp_mean[5]
                    
                    # Append the values to the dataframe
                    temp_df2 = pd.DataFrame([[mean_cal, mean_crop, mean_temp, mean_pop, mean_long,mean_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
                    reduced_calories = pd.concat([reduced_calories, temp_df2], ignore_index=True)
                    index = 0
                
            # The point is too far from the previous point : create a new subsample and store the previous one
            else:
                # Compute the mean for the 5 ('sample') subsamples 
                temp_mean = mean_df.mean()
                mean_cal = temp_mean[0]
                mean_crop = temp_mean[1]
                mean_temp = temp_mean[2]
                mean_pop = temp_mean[3]
                mean_long = temp_mean[4]
                mean_lat = temp_mean[5]
                
                # Append the values to the dataframe
                temp_df2 = pd.DataFrame([[mean_cal, mean_crop, mean_temp, mean_pop, mean_long,mean_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
                reduced_calories = pd.concat([reduced_calories, temp_df2], ignore_index=True)

                # Start new subsample
                prev_cal = temp_cal
                prev_crop = temp_crop
                prev_temp = temp_temp
                prev_pop = temp_pop
                prev_lon = temp_lon
                prev_lat = temp_lat
                mean_df = pd.DataFrame([[temp_cal, temp_crop, temp_temp, temp_pop, temp_lon,temp_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
                index = 1
                
        if(i == temp_df_lat.shape[0]-1):
            # If final point of the dataframe, store it
            temp_mean = mean_df.mean()
            mean_cal = temp_mean[0]
            mean_crop = temp_mean[1]
            mean_temp = temp_mean[2]
            mean_pop = temp_mean[3]
            mean_long = temp_mean[4] 
            mean_lat = temp_mean[5]
            
            temp_df2 = pd.DataFrame([[mean_cal, mean_crop, mean_temp, mean_pop, mean_long,mean_lat]], columns = ['Calories', 'Cropland', 'Temp', 'Population', 'lon', 'lat'])
            reduced_calories = pd.concat([reduced_calories, temp_df2], ignore_index=True)
                
    # Observe the progression
    iterat = iterat +1
    print("Iteration %d / %d" % (iterat, len(vec_lat)))
            

Iteration 1 / 1564
Iteration 2 / 1564
Iteration 3 / 1564
...
Iteration 1563 / 1564
Iteration 1564 / 1564


In [13]:
reduced_calories.head()

Unnamed: 0,Calories,Cropland,Temp,Population,lon,lat
0,24733.0,0.005203,-8.5,0.0,96.25,75.0
1,24339.666667,0.005203,-7.433333,0.0,88.666667,74.92
2,24446.5,0.004683,-8.45,0.0,96.46,74.92
3,24938.0,0.006243,-9.5,0.0,101.67,74.92
4,24848.5,0.006764,-9.2,0.0,103.29,74.92
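
The row-by-row loop above is slow on large frames. A similar reduction can be sketched with `groupby`. Note that this is a different technique from the loop: it averages all points that fall into the same fixed-width longitude bin at a given latitude, instead of growing runs from a starting point, so it will not reproduce the numbers above exactly. The function name `compress_by_bins` and the toy data are hypothetical.

```python
import pandas as pd


def compress_by_bins(df, distance=1.0):
    """Average all points that share a latitude and a longitude bin.

    Uses fixed-width longitude bins, not the run-based grouping of the loop.
    """
    binned = df.copy()
    # Integer index of the longitude bin each point falls into
    binned["lon_bin"] = (binned["lon"] // distance).astype(int)
    reduced = (
        binned.groupby(["lat", "lon_bin"], sort=False)
        .mean()
        .reset_index()
        .drop(columns="lon_bin")
    )
    # Restore the original column order
    return reduced[list(df.columns)]


# Toy data, made up for illustration
toy = pd.DataFrame({
    "Calories": [10.0, 20.0, 30.0, 40.0],
    "lon": [88.5, 88.6, 96.2, 96.3],
    "lat": [74.92, 74.92, 74.92, 75.0],
})
reduced_toy = compress_by_bins(toy)
print(reduced_toy)
```

The bin width plays the role of `distance`; there is no equivalent of `sample`, since a bin may contain any number of points.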


In [23]:
# Write to json file 
reduced_calories.to_json('data.json', orient='records')

In [25]:
# Transform the json file into geojson file

in_file = 'data.json'
out_file = 'final.geojson'


# Use a context manager so the file is closed after reading
with open(in_file) as f:
    data = json.load(f)

geojson = {
    "type": "FeatureCollection",
    "features": [
    {
        "type": "Feature",
        "geometry" : {
            "type": "Point",
            "coordinates": [d["lon"], d["lat"]]
            },
        "properties" : {
            "calories": d["Calories"],
            "cropland": d["Cropland"],
            "temperature": d["Temp"],
            "population": d["Population"], 
            }
     } for d in data]
}


# Use a context manager so the file is flushed and closed after writing
with open(out_file, 'w') as output:
    json.dump(geojson, output)

In [14]:
# Min-max normalization: rescale every column to the range [0, 1]
normalized_df = (reduced_calories - reduced_calories.min()) / (reduced_calories.max() - reduced_calories.min())

In [18]:
# Restore the original coordinates, which should not be normalized
normalized_df['lon'] = reduced_calories['lon']
normalized_df['lat'] = reduced_calories['lat']
normalized_df.head()

Unnamed: 0,Calories,Cropland,Temp,Population,lon,lat
0,0.205251,0.004199,0.176121,0.0,96.25,75.0
1,0.164128,0.004199,0.19723,0.0,88.666667,74.92
2,0.175298,0.003678,0.177111,0.0,96.46,74.92
3,0.226684,0.005241,0.156332,0.0,101.67,74.92
4,0.217327,0.005762,0.162269,0.0,103.29,74.92
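
The normalization above rescales every column and then restores `lon` and `lat`. An alternative sketch, assuming only the value columns should be rescaled, applies the min-max formula (x - min) / (max - min) to a chosen subset of columns. The helper name `min_max` and the toy values are hypothetical.

```python
import pandas as pd


def min_max(df, columns):
    """Rescale the given columns to [0, 1]; other columns are left untouched."""
    out = df.copy()
    span = df[columns].max() - df[columns].min()
    out[columns] = (df[columns] - df[columns].min()) / span
    return out


# Toy data, made up for illustration
toy = pd.DataFrame({
    "Calories": [24000.0, 25000.0, 26000.0],
    "Temp": [-10.0, 0.0, 30.0],
    "lon": [88.5, 96.25, 101.67],
    "lat": [74.92, 74.92, 75.0],
})
normalized_toy = min_max(toy, ["Calories", "Temp"])
print(normalized_toy)
```

A column whose values are all identical has a zero span and would come out as NaN, so such columns may need separate handling.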


In [19]:
# Write to json file 
normalized_df.to_json('normalized_data.json', orient='records')

In [22]:
# Transform the json file into geojson file

in_file = 'normalized_data.json'
out_file = 'final_normalized.geojson'


# Use a context manager so the file is closed after reading
with open(in_file) as f:
    data = json.load(f)

geojson = {
    "type": "FeatureCollection",
    "features": [
    {
        "type": "Feature",
        "geometry" : {
            "type": "Point",
            "coordinates": [d["lon"], d["lat"]]
            },
        "properties" : {
            "calories": d["Calories"],
            "cropland": d["Cropland"],
            "temperature": d["Temp"],
            "population": d["Population"], 
            }
     } for d in data]
}


# Use a context manager so the file is flushed and closed after writing
with open(out_file, 'w') as output:
    json.dump(geojson, output)
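
The JSON-to-GeoJSON conversion is written out twice in this notebook with only the file names changing. One way to avoid the duplication is a small helper that builds the FeatureCollection from a list of records. This is a sketch, not the notebook's own code: `records_to_geojson` and `PROPERTY_MAP` are hypothetical names, and the sample record is copied from the first row of the `reduced_calories.head()` output.

```python
import json


def records_to_geojson(records, property_map):
    """Turn a list of dicts with 'lon'/'lat' keys into a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON stores coordinates as [longitude, latitude]
                "geometry": {"type": "Point", "coordinates": [d["lon"], d["lat"]]},
                "properties": {out: d[src] for out, src in property_map.items()},
            }
            for d in records
        ],
    }


# Output property name -> source column name
PROPERTY_MAP = {
    "calories": "Calories",
    "cropland": "Cropland",
    "temperature": "Temp",
    "population": "Population",
}

# One record copied from the first row of reduced_calories.head()
sample_records = [{"Calories": 24733.0, "Cropland": 0.005203, "Temp": -8.5,
                   "Population": 0.0, "lon": 96.25, "lat": 75.0}]
collection = records_to_geojson(sample_records, PROPERTY_MAP)
print(json.dumps(collection, indent=2))
```

With a DataFrame, the records could be obtained directly through `reduced_calories.to_dict(orient='records')`, which would skip the intermediate JSON file.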