# Define the working environment

The following cells are used to: 
- Import required libraries
- Set the environment variables for Python, Anaconda, GRASS GIS and R 
- Define the ["GRASSDATA" folder](https://grass.osgeo.org/grass73/manuals/helptext.html), along with the name of the "location" and the "mapset" in which you will work.

**Import libraries**

In [1]:
## Import libraries needed for setting parameters of operating system 
import os
import sys

## Import library for temporary files creation 
import tempfile 

## Import Pandas library
import pandas as pd

## Import Numpy library
import numpy as np

## Import subprocess
import subprocess

## Import multiprocessing
import multiprocessing

**Set 'Python' and 'GRASS GIS' environment variables**

Here, we set [the environment variables that allow GRASS GIS to be used](https://grass.osgeo.org/grass64/manuals/variables.html) inside this Jupyter notebook. Please modify the directory paths so that they match your own system configuration.

If you are working on Windows, with the GRASS GIS [stand-alone installation](https://grass.osgeo.org/download/software/ms-windows/), the paths displayed below should be similar. 

The setting of environment variables could be improved as proposed on [this GRASS wiki page](https://grasswiki.osgeo.org/wiki/Working_with_GRASS_without_starting_it_explicitly#Python:_GRASS_GIS_7_without_existing_location_using_metadata_only).
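The repeated `os.environ['PATH'] = ... + os.pathsep + os.environ['PATH']` pattern used in the next cell could be wrapped in a small helper. The sketch below is only an illustration: `prepend_to_path` is a hypothetical name, the demonstration works on a plain dictionary rather than the real environment, and the directories are the ones used in this notebook, which you would adapt to your own installation.

```python
import os

def prepend_to_path(directories, env=None):
    """Prepend directories to PATH, keeping their order and skipping entries already present."""
    env = os.environ if env is None else env
    current = [p for p in env.get('PATH', '').split(os.pathsep) if p]
    new_entries = [d for d in directories if d not in current]
    env['PATH'] = os.pathsep.join(new_entries + current)
    return env['PATH']

# Raw strings (r'...') keep backslashes literal, so Windows paths need no doubling.
grass_base = r'C:\Program Files\QGIS 3.12\apps\grass\grass78'  # adapt to your system
demo_env = {'PATH': r'C:\Windows\system32'}  # a plain dict, so the real PATH is untouched
prepend_to_path([os.path.join(grass_base, 'bin'),
                 os.path.join(grass_base, 'lib')], env=demo_env)
print(demo_env['PATH'])
```

Calling `prepend_to_path([...])` without the `env` argument would modify `os.environ` itself, which is what the notebook cell does.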

In [2]:
### Define GRASS GIS environment variables
os.environ['GISBASE'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78'
os.environ['PATH'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\lib;C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\bin;C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\extrabin' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\etc;C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\etc\\python\\grass;C:\\Python27' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\etc\\python\\grass;C:\\Users\\s2080249\\AppData\\Roaming\\GRASS7\\addons\\scripts' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = 'C:\\ProgramData\\Anaconda3\\Lib\\site-packages' + os.pathsep + os.environ['PATH']
os.environ['PATH'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\scripts' + os.pathsep + os.environ['PATH']
os.environ['PYTHONLIB'] = 'C:\\Python27'
os.environ['PYTHONPATH'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\etc\\python\\grass'
os.environ['GIS_LOCK'] = '$$'
os.environ['GISRC'] = 'C:\\Users\\s2080249\\AppData\\Roaming\\GRASS7\\rc'
os.environ['GDAL_DATA'] = 'C:\\Program Files\\QGIS 3.12\\apps\\grass\\grass78\\share\\gdal'

In [3]:
## Define GRASS-Python environment
sys.path.append(os.path.join(os.environ['GISBASE'],'etc','python'))

Please note that the paths will differ if you installed GRASS through the 'OSGeo4W package'. Here are some environment variables identified for use with an OSGeo4W installation:
- grass7bin_win = 'C:\\OSGeo4W64\\bin\\grass73svn.bat'
- os.environ['GISBASE'] = 'C:\\OSGeo4W64\\apps\\grass\\grass-7.3.svn'
- os.environ['PATH'] = 'C:\\OSGeo4W64\\bin' + os.pathsep + os.environ['PATH']
- os.environ['PYTHONLIB'] = 'C:\\OSGeo4W64\\apps\\Python27'
- os.environ['GDAL_DATA'] = 'C:\\OSGeo4W64\\share\\gdal'

**Set 'R statistical computing software' environment variables**

Here, we set [the environment variables that allow the R statistical computing software to be used](https://stat.ethz.ch/R-manual/R-devel/library/base/html/EnvVar.html) inside this Jupyter notebook. Please change the directory paths to match your system configuration. If you are working on Windows, the paths below should be similar.

Please note that you will probably also have to set the R_LIBS_USER path directly in the R interface. To do so, open R (or [RStudio](https://www.rstudio.com/)) and enter the following command at the command prompt, adapting the path to match your own configuration: **.libPaths('C:\\R_LIBS_USER\\win-library\\3.3')**

## Add the R software directory to the general PATH
os.environ['PATH'] = 'C:\\Program Files\\R\\R-3.3.0\\bin' + os.pathsep + os.environ['PATH']

## Set R software specific environment variables
os.environ['R_HOME'] = 'C:\\Program Files\\R\\R-3.3.0'
os.environ['R_ENVIRON'] = 'C:\\Program Files\\R\\R-3.3.0\\etc\\x64'
os.environ['R_DOC_DIR'] = 'C:\\Program Files\\R\\R-3.3.0\\doc'
os.environ['R_LIBS'] = 'C:\\Program Files\\R\\R-3.3.0\\library'
os.environ['R_LIBS_USER'] = 'C:\\R_LIBS_USER\\win-library\\3.3'
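Windows paths written as ordinary Python string literals need care, because a backslash starts an escape sequence. The short check below, which uses illustrative paths only, shows that `\x64` is read as the character `d` and `\a` as the bell character, whereas doubled backslashes and raw strings both give the intended path.

```python
# '\x64' is a hexadecimal escape for the character 'd'
broken = 'C:\\R\\etc\x64'    # the last backslash is left unescaped on purpose
escaped = 'C:\\R\\etc\\x64'  # every backslash doubled
raw = r'C:\R\etc\x64'        # raw string: backslashes are literal

print(repr(broken))    # the path now ends in 'etcd'
print(escaped == raw)  # both spellings give the same string

# '\a' is the ASCII bell character (code 7), not a backslash followed by 'a'
print(ord('\a'))
```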

# User inputs

In [4]:
## Define an empty dictionary for saving user inputs
user={}

**Display current environment variables of your computer**

In [5]:
## Display the current defined environment variables
for key in os.environ.keys():
    print ("%s = %s \t" % (key,os.environ[key]))

ALLUSERSPROFILE = C:\ProgramData 	
APPDATA = C:\Users\s2080249\AppData\Roaming 	
CLIENTNAME = DESKTOP-3DN038Q 	
COMMONPROGRAMFILES = C:\Program Files\Common Files 	
COMMONPROGRAMFILES(X86) = C:\Program Files (x86)\Common Files 	
COMMONPROGRAMW6432 = C:\Program Files\Common Files 	
COMPUTERNAME = UT153204 	
COMSPEC = C:\Windows\system32\cmd.exe 	
CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1 	
CUDA_PATH_V10_1 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1 	
DRIVERDATA = C:\Windows\System32\Drivers\DriverData 	
FPS_BROWSER_APP_PROFILE_STRING = Internet Explorer 	
FPS_BROWSER_USER_PROFILE_STRING = Default 	
HOMEDRIVE = C: 	
HOMEPATH = \Users\s2080249 	
LOCALAPPDATA = C:\Users\s2080249\AppData\Local 	
LOGONSERVER = \\DC23AD 	
NUMBER_OF_PROCESSORS = 32 	
NVCUDASAMPLES10_1_ROOT = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1 	
NVCUDASAMPLES_ROOT = C:\ProgramData\NVIDIA Corporation\CUDA Samples\v10.1 	
NVTOOLSEXT_PATH = C:\Program Files\NVIDIA Corpora

**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

## Output location

In [6]:
#outputfolder="E:\\internship19\\Results_2017\classification_17"
# Define output folders for results
#outputfolder="E:\\internship19\\Results_2017\\classification17_supervised"
#outputfolder_classification17_supervised = os.path.join(outputfolder,'classification17_supervised')


In the cell below:
- Enter the path to the directory you want to use as "[GRASSDATA](https://grass.osgeo.org/programming7/loc_struct.png)". 
- Enter the name of the location in which you want to work and its projection information in [EPSG code](http://spatialreference.org/ref/epsg/) format. Please note that the GRASSDATA folder and locations will be automatically created if they do not yet exist. If the location name already exists, the projection information will not be used.  
- Enter the name you want for the mapsets which will be used later for Unsupervised Segmentation Parameter Optimization (USPO), Segmentation and Classification steps.

In [6]:
## Enter the path to GRASSDATA folder
user["gisdb"] = "E:\\UsersData\\owusu\\NEW_GRASSDATA"
## Enter the name of the location (existing or for a new one)
user["location"] = "Accra_32630"
## Enter the EPSG code for this location 
user["locationepsg"] = "32630"
## Enter the name of the mapset to use for Unsupervised Segmentation Parameter Optimization (USPO) step
#user["uspo_mapsetname"] = "accra_USPO13"
## Enter the name of the mapset to use for segmentation step
#user["segmentation_mapsetname"] = "accra_Seg13"
## Enter the name of the mapset to use for classification step
#user["classification_mapsetname"] = "CLASSIFICATION_LC_13"

**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

# Define the GRASSDATA folder and create GRASS location and mapsets

Below, the Python script checks whether the GRASSDATA folder, location and mapsets already exist. If not, they are created automatically.

**Import GRASS Python packages**

In [7]:
## Import libraries needed to launch GRASS GIS in the jupyter notebook
import grass.script.setup as gsetup

## Import libraries needed to call GRASS using Python
import grass.script as grass

**Define GRASSDATA folder and create location and mapsets**

In [8]:
## Automatic creation of GRASSDATA folder
if os.path.exists(user["gisdb"]):
    print ("GRASSDATA folder already exists") 
else: 
    os.makedirs(user["gisdb"]) 
    print ("GRASSDATA folder created in "+user["gisdb"])

GRASSDATA folder already exists


In [9]:
## Automatic creation of GRASS location if it doesn't exist
if os.path.exists(os.path.join(user["gisdb"],user["location"])):
    print ("Location "+user["location"]+" already exists") 
else : 
    if sys.platform.startswith('win'):
        # grass7bin_win must hold the path to the GRASS start-up script (see the OSGeo4W example above)
        grass7bin = grass7bin_win
        startcmd = grass7bin + ' -c epsg:' + user["locationepsg"] + ' -e ' + os.path.join(user["gisdb"],user["location"])
        p = subprocess.Popen(startcmd, shell=True, 
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
        if p.returncode != 0:
            print ('ERROR: %s' % err, file=sys.stderr)
            print ('ERROR: Cannot generate location (%s)' % startcmd, file=sys.stderr)
            sys.exit(-1)
        else:
            print ('Created location %s' % os.path.join(user["gisdb"],user["location"]))
    else:
        print ('This notebook was developed for use with Windows. It seems you are using another OS.')

Location Accra_32630 already exists
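The location-creation cell passes a single command string through the shell. A possible alternative, sketched below under the assumption that the GRASS start-up script accepts the same `-c epsg:<code> -e <path>` arguments used above, is to build the command as a list so that paths containing spaces need no quoting. `build_create_location_cmd`, `create_location`, the launcher name `grass78.bat` and the example folder names are hypothetical.

```python
import os
import subprocess

def build_create_location_cmd(grass_bin, epsg, gisdb, location):
    """Return the argument list that creates a location from an EPSG code and exits."""
    return [grass_bin, '-c', 'epsg:' + str(epsg), '-e', os.path.join(gisdb, location)]

def create_location(grass_bin, epsg, gisdb, location):
    """Run the command and raise with the captured stderr if it fails.

    Assumption: on Windows, a .bat launcher may still need shell=True.
    """
    cmd = build_create_location_cmd(grass_bin, epsg, gisdb, location)
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        raise RuntimeError('Cannot generate location: %s'
                           % result.stderr.decode(errors='replace'))
    return os.path.join(gisdb, location)

# Only the command is built here; nothing is executed.
cmd = build_create_location_cmd('grass78.bat', '32630', 'NEW_GRASSDATA', 'Accra_32630')
print(cmd)
```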



### Automatic creation of GRASS GIS mapsets

## Import library for file copying 
import shutil

## The mapset names used below must first be defined in the "User inputs" cell

## USPO mapset
mapsetname=user["uspo_mapsetname"]
if os.path.exists(os.path.join(user["gisdb"],user["location"],mapsetname)):
    print ("'"+mapsetname+"' mapset already exists")
else: 
    os.makedirs(os.path.join(user["gisdb"],user["location"],mapsetname))
    shutil.copy(os.path.join(user["gisdb"],user["location"],'PERMANENT','WIND'),
                os.path.join(user["gisdb"],user["location"],mapsetname,'WIND'))
    print ("'"+mapsetname+"' mapset created in location "+user["location"])

## SEGMENTATION mapset
mapsetname=user["segmentation_mapsetname"]
if os.path.exists(os.path.join(user["gisdb"],user["location"],mapsetname)):
    print ("'"+mapsetname+"' mapset already exists")
else: 
    os.makedirs(os.path.join(user["gisdb"],user["location"],mapsetname))
    shutil.copy(os.path.join(user["gisdb"],user["location"],'PERMANENT','WIND'),
                os.path.join(user["gisdb"],user["location"],mapsetname,'WIND'))
    print ("'"+mapsetname+"' mapset created in location "+user["location"])

## CLASSIFICATION mapset
mapsetname=user["classification_mapsetname"]
if os.path.exists(os.path.join(user["gisdb"],user["location"],mapsetname)):
    print ("'"+mapsetname+"' mapset already exists")
else: 
    os.makedirs(os.path.join(user["gisdb"],user["location"],mapsetname))
    shutil.copy(os.path.join(user["gisdb"],user["location"],'PERMANENT','WIND'),
                os.path.join(user["gisdb"],user["location"],mapsetname,'WIND'))
    print ("'"+mapsetname+"' mapset created in location "+user["location"])
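The three blocks above follow the same pattern: create the mapset directory, then copy the `WIND` file from `PERMANENT`. As a sketch only, the pattern could be factored into a helper. `ensure_mapset` is a hypothetical name, and the demonstration runs in a temporary directory with a dummy `WIND` file rather than in a real GRASS location.

```python
import os
import shutil
import tempfile

def ensure_mapset(gisdb, location, mapsetname):
    """Create the mapset directory and copy WIND from PERMANENT if the mapset is missing.

    Returns True when the mapset was created, False when it already existed.
    """
    mapset_dir = os.path.join(gisdb, location, mapsetname)
    if os.path.exists(mapset_dir):
        return False
    os.makedirs(mapset_dir)
    shutil.copy(os.path.join(gisdb, location, 'PERMANENT', 'WIND'),
                os.path.join(mapset_dir, 'WIND'))
    return True

# Demonstration on a throw-away directory tree (not a real GRASS database)
demo_gisdb = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_gisdb, 'demo_location', 'PERMANENT'))
with open(os.path.join(demo_gisdb, 'demo_location', 'PERMANENT', 'WIND'), 'w') as f:
    f.write('dummy region definition\n')

for name in ['demo_uspo', 'demo_segmentation', 'demo_classification']:
    created = ensure_mapset(demo_gisdb, 'demo_location', name)
    print("'%s' mapset %s" % (name, 'created' if created else 'already exists'))
```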

**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

# Define functions

This section of the notebook is dedicated to defining functions which will then be called later in the script. If you want to create your own functions, define them here.

## Function for computing processing time

The "print_processing_time" function computes the processing time of a stage of the processing chain and returns it as a formatted message. At the beginning of each major step, the current time is stored in a new variable using the [time.time() function](https://docs.python.org/2/library/time.html). At the end of that step, "print_processing_time" is called with two arguments: the variable holding the start time, and the message to prepend to the elapsed time.

In [10]:
## Import library for managing time in python
import time  

## Function "print_processing_time()" computes the processing time and returns it as a formatted message.
# The argument "begintime" expects the start time of the process (result of time.time()).
# The argument "printmessage" expects a string with information about the process. 
def print_processing_time(begintime, printmessage):    
    endtime=time.time()           
    processtime=endtime-begintime
    remainingtime=processtime

    days=int((remainingtime)/86400)
    remainingtime-=(days*86400)
    hours=int((remainingtime)/3600)
    remainingtime-=(hours*3600)
    minutes=int((remainingtime)/60)
    remainingtime-=(minutes*60)
    seconds=round((remainingtime)%60,1)

    if processtime<60:
        finalprintmessage=str(printmessage)+str(seconds)+" seconds"
    elif processtime<3600:
        finalprintmessage=str(printmessage)+str(minutes)+" minutes and "+str(seconds)+" seconds"
    elif processtime<86400:
        finalprintmessage=str(printmessage)+str(hours)+" hours and "+str(minutes)+" minutes and "+str(seconds)+" seconds"
    elif processtime>=86400:
        finalprintmessage=str(printmessage)+str(days)+" days, "+str(hours)+" hours and "+str(minutes)+" minutes and "+str(seconds)+" seconds"
    
    return finalprintmessage
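The successive divisions and subtractions in `print_processing_time` can also be written with `divmod`. The sketch below is an alternative formulation, not the notebook's function: `split_duration` is a hypothetical helper that takes a duration in seconds instead of a start time.

```python
def split_duration(total_seconds):
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(days), int(hours), int(minutes), round(seconds, 1)

# 1 day + 1 hour + 1 minute + 1.5 seconds
print(split_duration(86400 + 3600 + 60 + 1.5))
```

In the notebook itself, the pattern is to store `time.time()` in a variable before a step and to call `print(print_processing_time(begintime, "..."))` after it, since the function returns the message rather than printing it.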

In [11]:
"""
This function is used to compute ERP (Equivalent Reference Probability) from a csv file with membership probabilities to different classes
"""

import csv, os, sys, math, tempfile

def ComputeERPfromCsv(in_path, delimiter=',', erp_name="ERP", start_index=1, stop_index=False):
    # "start_index" is the index of the first probability column,
    # "stop_index" the index of the last one (False = use all remaining columns).

    # Open the input file and create the reader (the file is closed when the "with" block ends)
    with open(in_path, 'r', newline='') as infile:
        reader = csv.reader(infile, delimiter=delimiter)

        # Csv header
        out_header = next(reader) #Get the header (first line) of input csv
        out_header.append(erp_name)

        # Create list containing output content
        out_content = []
        out_content.append(out_header) #Add the output header

        # Compute ERP values from probabilities and store new lines in a list
        for row in reader:
            out_line = row  # Reuse the input line (the ERP value is appended to it)
            if stop_index:
                frow = [float(x) for x in row[start_index:stop_index+1]] #Convert the probability columns to float
            else:
                frow = [float(x) for x in row[start_index:]] #Convert the probability columns to float
            probs = [x/sum(frow) for x in frow] #Make sure that the sum of probabilities equal to 1
            maxpistar = max(probs) #Take the maximum probability
            temp = [x for x in probs if x!=maxpistar] #Take probability of all but the class with the max probability
            tempEnt = [y*math.log(y) for y in temp if y>0] # If the probability is equal to zero, then the entropy will be zero
            if (1-maxpistar) > 0:
                EDI = math.log(maxpistar)-sum(tempEnt)/(1-maxpistar)
                out_line.append(math.exp(EDI)/(math.exp(EDI)+len(probs)-1))
            else:
                out_line.append(1)
            out_content.append(out_line)

    # Create output file (closing it guarantees that all rows are written to disk)
    out_path = "%s_ERP.csv"%os.path.splitext(in_path)[0]
    with open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter=delimiter)
        writer.writerows(out_content)

    # Return
    return out_path
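To make the formula in `ComputeERPfromCsv` easier to follow, the sketch below applies the same steps to a single list of class probabilities held in memory. `erp_from_probs` is a hypothetical helper written for illustration; it mirrors the loop body above, including the removal of every value equal to the maximum before the entropy term is computed.

```python
import math

def erp_from_probs(values):
    """Apply the ERP steps of ComputeERPfromCsv to one row of class probabilities."""
    total = sum(values)
    probs = [v / total for v in values]           # rescale so the probabilities sum to 1
    maxpistar = max(probs)                        # highest class probability
    temp = [p for p in probs if p != maxpistar]   # probabilities of the other classes
    temp_ent = [p * math.log(p) for p in temp if p > 0]
    if (1 - maxpistar) > 0:
        edi = math.log(maxpistar) - sum(temp_ent) / (1 - maxpistar)
        return math.exp(edi) / (math.exp(edi) + len(probs) - 1)
    return 1

print(erp_from_probs([0.7, 0.15, 0.15]))
print(erp_from_probs([1.0, 0.0, 0.0]))
```

When the remaining probability is spread evenly over the other classes, as in the first call, the formula returns the maximum probability itself (0.7, up to floating-point rounding). When one class has probability 1, as in the second call, the function returns 1.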

In [12]:
## Saving current time for processing time management
begintime_full=time.time()

**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

# Preparing csv files for classification 

In [13]:
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
os.chdir("E:\\UsersData\\owusu\\new_results\\streetblocks_stats\\streetblock_stats_13")

In [14]:
## location of csv files 

segment_stat17="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\full_streetblocks17.csv" 
zonal_stat17="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\zonal_stat17.csv"  
Trainset_segment_stat17="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\sample_streetblocks17.csv" 
Trainset_zonal_stat17="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\sample_zonal_stat17.csv" 
all_outputcsv="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\merge_all_17.csv"
trainset_outputcsv="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\trainset_all_17.csv"

In [15]:
label17="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\labels17b.csv"

In [16]:
# import labels 
label17=pd.read_csv(label17, sep=',',header=0)


In [17]:
label17.head()

Unnamed: 0,cat,Class_num
0,7,5
1,96,5
2,120,6
3,138,5
4,166,2


In [18]:
# Load the statistics of all segments into pandas dataframes
segment_stat17=pd.read_csv(segment_stat17, sep=',',header=0)
zonal_stat17=pd.read_csv(zonal_stat17, sep=',',header=0)
# Load the statistics of the training segments into pandas dataframes
Trainset_segment_stat17=pd.read_csv(Trainset_segment_stat17, sep=',',header=0)
Trainset_zonal_stat17=pd.read_csv(Trainset_zonal_stat17, sep=',',header=0)

In [19]:
segment_stat17.head()

Unnamed: 0,cat,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_min,nir17_27_Var_max,nir17_27_Var_mean,nir17_27_Var_stddev,nir17_27_Var_variance,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart
0,0,0.587133,1.378078,832588.087644,623953.605067,30,223,111.730995,25.866555,669.078664,...,14.914319,1691.458618,310.347842,284.356356,80858.536953,91.625047,7944284.0,137.551,213.016,358.889
1,1,0.810431,1.331882,832537.157363,624237.046915,35,195,123.057114,22.199643,492.824134,...,41.722458,541.961426,153.719555,77.097077,5943.959271,50.154372,2317322.0,108.406,135.481,168.209
2,2,0.656656,1.443556,832625.412855,624602.496236,55,221,124.240012,23.527559,553.546042,...,90.362854,359.71225,184.826422,53.815348,2896.091728,29.116697,638390.5,148.795,166.322,210.34
3,3,0.723537,1.312804,833320.933644,624886.522991,29,252,126.483011,27.426131,752.192653,...,7.140357,893.29718,228.623647,153.540157,23574.579945,67.158476,12797890.0,117.53,182.537,301.848
4,4,0.767842,1.378478,832641.231578,624820.628383,48,216,120.823769,18.215214,331.794016,...,52.436619,220.298401,113.934704,35.946147,1292.125486,31.549779,698875.5,82.5776,113.063,138.132


In [20]:
## Join between tables (pandas dataframe) on column 'cat'
training_sample_all=pd.merge(segment_stat17, zonal_stat17, on='cat')
training_sample_all.head()

Unnamed: 0,cat,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,0,0.587133,1.378078,832588.087644,623953.605067,30,223,111.730995,25.866555,669.078664,...,91.625047,7944284.0,137.551,213.016,358.889,1,0.85335,0.09489,0.03567,0.0161
1,1,0.810431,1.331882,832537.157363,624237.046915,35,195,123.057114,22.199643,492.824134,...,50.154372,2317322.0,108.406,135.481,168.209,1,0.88491,0.03502,0.07662,0.00345
2,2,0.656656,1.443556,832625.412855,624602.496236,55,221,124.240012,23.527559,553.546042,...,29.116697,638390.5,148.795,166.322,210.34,1,0.95455,0.00637,0.03764,0.00145
3,3,0.723537,1.312804,833320.933644,624886.522991,29,252,126.483011,27.426131,752.192653,...,67.158476,12797890.0,117.53,182.537,301.848,1,0.92701,0.01617,0.05625,0.00057
4,4,0.767842,1.378478,832641.231578,624820.628383,48,216,120.823769,18.215214,331.794016,...,31.549779,698875.5,82.5776,113.063,138.132,1,0.96299,0.00994,0.02543,0.00163
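`pd.merge` performs an inner join by default, so segments whose `cat` value is missing from one of the two tables are dropped without warning. The sketch below, on two small made-up tables (the values are invented for illustration), shows one way to count unmatched rows with an outer join and `indicator=True`.

```python
import pandas as pd

# Toy tables standing in for the segment and zonal statistics
segments = pd.DataFrame({'cat': [0, 1, 2, 3], 'fd': [1.37, 1.33, 1.44, 1.31]})
zonal = pd.DataFrame({'cat': [0, 1, 2, 4], 'mode': [1, 1, 1, 2]})

# The default inner join keeps only the categories present in both tables
inner = pd.merge(segments, zonal, on='cat')
print(len(inner))

# An outer join with indicator=True adds a '_merge' column naming each row's origin
check = pd.merge(segments, zonal, on='cat', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['cat', '_merge']])
```

Comparing `len()` of the merged table with the length of each input is a quick first check. The `_merge` column then identifies which categories are missing on which side.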


In [21]:
    
## Check and count for NaN values by column in the table
if training_sample_all.isnull().any().any():
    for column in list(training_sample_all.columns.values):
        if training_sample_all[column].isnull().any():
            print ("Column '"+str(column)+"' has "+str(training_sample_all[column].isnull().sum())+" NULL values")
else:
    print ("No missing values in dataframe")

## Check and count for Inf values by column in the table
if np.isinf(training_sample_all).any().any():
    for column in list(training_sample_all.columns.values):
        if np.isinf(training_sample_all[column]).any():
            print ("Column '"+str(column)+"' has "+str(np.isinf(training_sample_all[column]).sum())+" Infinite values")
else:
    print ("No infinite values in dataframe")

No missing values in dataframe
No infinite values in dataframe
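The per-column loops above can also be expressed with vectorised pandas and NumPy calls. The following is a sketch on a small invented table. `count_missing_and_infinite` is a hypothetical helper, and it restricts the infinity test to numeric columns because `np.isinf` is not defined for text columns.

```python
import numpy as np
import pandas as pd

def count_missing_and_infinite(df):
    """Return two Series: NaN counts and Inf counts per column (zero counts omitted)."""
    nan_counts = df.isnull().sum()
    numeric = df.select_dtypes(include=[np.number])
    inf_counts = np.isinf(numeric).sum()
    return nan_counts[nan_counts > 0], inf_counts[inf_counts > 0]

# Invented values, for illustration only
demo = pd.DataFrame({'cat': [1, 2, 3],
                     'ndvi_mean': [0.2, np.nan, 0.4],
                     'ratio': [1.0, np.inf, -np.inf],
                     'label': ['a', 'b', 'c']})
nan_counts, inf_counts = count_missing_and_infinite(demo)
print(nan_counts)
print(inf_counts)
```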


In [22]:
## Check if there are NaN values in the table and print basic information
if training_sample_all.isnull().any().any():
    print ("WARNING: Some values are missing in the dataset")
else: 
    # Write dataframe in a .csv file
    training_sample_all.to_csv(path_or_buf="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\merge_all_17.csv", 
                       sep=',', header=True,  quoting=None, decimal='.', index=False)
    print ("A new csv table called 'merge_all_17', to be used for classification, has been created with "+str(len(training_sample_all))+" rows.")
    
## Display table
training_sample_all.head()

A new csv table called 'merge_all_17', to be used for classification, has been created with 18880 rows.


Unnamed: 0,cat,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,0,0.587133,1.378078,832588.087644,623953.605067,30,223,111.730995,25.866555,669.078664,...,91.625047,7944284.0,137.551,213.016,358.889,1,0.85335,0.09489,0.03567,0.0161
1,1,0.810431,1.331882,832537.157363,624237.046915,35,195,123.057114,22.199643,492.824134,...,50.154372,2317322.0,108.406,135.481,168.209,1,0.88491,0.03502,0.07662,0.00345
2,2,0.656656,1.443556,832625.412855,624602.496236,55,221,124.240012,23.527559,553.546042,...,29.116697,638390.5,148.795,166.322,210.34,1,0.95455,0.00637,0.03764,0.00145
3,3,0.723537,1.312804,833320.933644,624886.522991,29,252,126.483011,27.426131,752.192653,...,67.158476,12797890.0,117.53,182.537,301.848,1,0.92701,0.01617,0.05625,0.00057
4,4,0.767842,1.378478,832641.231578,624820.628383,48,216,120.823769,18.215214,331.794016,...,31.549779,698875.5,82.5776,113.063,138.132,1,0.96299,0.00994,0.02543,0.00163


In [23]:
## Join between tables (pandas dataframe) on column 'cat'
training_sample17=pd.merge(Trainset_segment_stat17, Trainset_zonal_stat17, on='cat')
training_sample17.head()

Unnamed: 0,cat,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,7,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,604.19522,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,96,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,723.865424,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,120,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,142.372332,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,138,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,2012.503428,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,166,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,679.682447,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669


In [24]:
    
## Check and count for NaN values by column in the table
if training_sample17.isnull().any().any():
    for column in list(training_sample17.columns.values):
        if training_sample17[column].isnull().any():
            print ("Column '"+str(column)+"' has "+str(training_sample17[column].isnull().sum())+" NULL values")
else:
    print ("No missing values in dataframe")

## Check and count for Inf values by column in the table
if np.isinf(training_sample17).any().any():
    for column in list(training_sample17.columns.values):
        if np.isinf(training_sample17[column]).any():
            print ("Column '"+str(column)+"' has "+str(np.isinf(training_sample17[column]).sum())+" Infinite values")
else:
    print ("No infinite values in dataframe")

No missing values in dataframe
No infinite values in dataframe


In [25]:
## Check if there are NaN values in the table and print basic information
if training_sample17.isnull().any().any():
    print ("WARNING: Some values are missing in the dataset")
else: 
    # Write dataframe in a .csv file
    training_sample17.to_csv(path_or_buf="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\sample17_merge.csv", 
                       sep=',', header=True,  quoting=None, decimal='.', index=False)
    print ("A new csv table called 'sample17_merge', to be used for classification, has been created with "+str(len(training_sample17))+" rows.")
    
## Display table
training_sample17.head()

A new csv table called 'sample17_merge', to be used for classification, has been created with 735 rows.


Unnamed: 0,cat,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,7,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,604.19522,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,96,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,723.865424,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,120,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,142.372332,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,138,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,2012.503428,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,166,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,679.682447,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669


In [26]:
## Join between tables (pandas dataframe) on column 'cat'
trainset17_label=pd.merge(label17, training_sample17, on='cat')
trainset17_label.head()

Unnamed: 0,cat,Class_num,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,7,5,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,96,5,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,120,6,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,138,5,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,166,2,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669


In [27]:
## Check if there are NaN values in the table and print basic information
if trainset17_label.isnull().any().any():
    print ("WARNING: Some values are missing in the dataset")
else: 
    # Write dataframe in a .csv file
    trainset17_label.to_csv(path_or_buf="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\labels_17.csv", 
                       sep=',', header=True,  quoting=None, decimal='.', index=False)
    print ("A new csv table called 'labels_17', containing the labelled training set for classification, has been created with "+str(len(trainset17_label))+" rows.")
    
## Display table
trainset17_label.head()

A new csv table called 'labels_17', containing the labelled training set for classification, has been created with 735 rows.


Unnamed: 0,cat,Class_num,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,7,5,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,96,5,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,120,6,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,138,5,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,166,2,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669


**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

**-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-**

# Land Use classification 

In [28]:
## Import the scikit-learn tools used for the Random Forest classification

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier 
import matplotlib.pyplot as plt
%matplotlib inline

## User input

In [29]:
## Define the input and output file paths for the Random Forest classification

segments_file="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\merge_all_17.csv"
training_file="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\labels_17.csv" 
outputcsv="E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\rules17"

In [30]:
raster_segments_map="sstreetblocks_all@PERMANENT"
#classified_map="indiv_classification17"
train_class_column="Class_num"

## Preprocessing of tables

In [31]:
# Load the training set and the full set of segments into pandas dataframes
training_csv=pd.read_csv(training_file, sep=',',header=0)
segments_csv=pd.read_csv(segments_file, sep=',',header=0)

In [32]:
training_csv.drop('cat',axis=1, inplace=True)
training_csv.head()

Unnamed: 0,Class_num,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,5,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,604.19522,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,5,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,723.865424,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,6,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,142.372332,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,5,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,2012.503428,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,2,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,679.682447,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669


In [33]:
x=training_csv.drop(train_class_column,axis=1)
y=training_csv[train_class_column]
x.head()

Unnamed: 0,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,img17_red_coeff_var,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,0.594937,1.34319,832258.872221,628995.657224,45,240,127.151214,24.580383,604.19522,19.331615,...,72.446672,6171929.0,39.8849,78.725,128.976,1,0.44512,0.3196,0.23494,0.00035
1,0.718376,1.382459,807626.665123,618046.813705,57,228,139.900945,26.904747,723.865424,19.231283,...,65.439819,1119288.0,74.3441,115.694,175.373,1,0.94556,0.01273,0.04171,0.0
2,0.684108,1.318037,810405.520705,623055.27134,20,162,41.841665,11.931988,142.372332,28.517001,...,30.093968,31633500.0,377.518,467.476,565.526,2,0.00642,0.9813,0.00434,0.00794
3,0.784419,1.393579,812710.364779,620161.980701,24,248,115.590147,44.860934,2012.503428,38.810344,...,43.453157,4158810.0,687.83,1133.4,1427.13,1,0.8804,0.00914,0.00025,0.11021
4,0.763016,1.348544,821654.879173,619250.341108,38,239,117.496543,26.07072,679.682447,22.1885,...,36.489184,3126160.0,168.649,225.924,286.233,1,0.80619,0.10631,0.08081,0.00669
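
The `Unnamed: 0` column in the tables above is most likely a row index written to disk by an earlier `to_csv` call. Left in place, it stays in `x` and is used as a predictor. The following is a minimal sketch, on a made-up three-row CSV (the values are placeholders, not the notebook's data), of reading such a file with `index_col=0` so that the saved index is restored instead of being treated as a feature.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the training file: the first,
# unnamed column is an index saved by an earlier DataFrame.to_csv() call.
csv_text = ",cat,Class_num,fd\n0,96,5,1.38\n1,120,6,1.32\n2,138,5,1.39\n"

# Default read: the saved index comes back as a data column 'Unnamed: 0'
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# index_col=0 restores it as the index, so it cannot end up among the features
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```

Whether the column should be dropped in this workflow is a judgment call; note that the model further down was trained with it included.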


### Get train and test set

In [34]:
#use a train-test-split to split the data into training and testing (validation data)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=66)
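
The split above is purely random, so a rare class can end up under-represented in one of the two subsets. As a hedged variant, assuming class balance matters here, the `stratify` argument of `train_test_split` keeps the class proportions the same in both subsets. The sketch below uses made-up labels (`X_demo` and `y_demo` are placeholders).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 90 samples of class 1 and 30 of class 3
y_demo = np.array([1] * 90 + [3] * 30)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the 3:1 class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=66, stratify=y_demo)

# Share of the rare class in each subset
print((y_tr == 3).mean(), (y_te == 3).mean())
```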

## RANDOM FOREST

### Fit the model to train data

In [None]:
#WITH PARAMETER SEARCH

#hyperparameter search
rfc = RandomForestClassifier()
# number of trees in random forest
n_estimators = [int(v) for v in np.linspace(start = 200, stop = 2000, num = 10)]
# number of features at every split
# NOTE: for classifiers 'auto' is an alias of 'sqrt', so the two options are equivalent;
# 'auto' was removed in scikit-learn 1.3, where 'sqrt' should be used instead
max_features = ['auto', 'sqrt']

# max depth
max_depth = [int(v) for v in np.linspace(100, 500, num = 11)]
max_depth.append(None)
# create random grid
random_grid = {
 'n_estimators': n_estimators,
 'max_features': max_features,
 'max_depth': max_depth
 }
# Random search of parameters
rfc_random = RandomizedSearchCV(estimator = rfc, param_distributions = random_grid, n_iter = 100, cv = 3, verbose=2, random_state=42, n_jobs = -1)
# Fit the model
rfc_random.fit(X_train, y_train)
# print results
print(rfc_random.best_params_)
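
Instead of copying the printed parameters by hand into a new classifier, the fitted search object can be reused directly. The sketch below runs on synthetic data with a deliberately tiny grid (`X_demo`, `y_demo` and `small_grid` are placeholders, not the notebook's settings) and shows that `best_estimator_` is already refitted with the best parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic placeholder data, not the street-block table
X_demo, y_demo = make_classification(n_samples=120, n_features=8,
                                     n_informative=4, random_state=0)

# Deliberately tiny grid so the sketch runs in seconds
small_grid = {'n_estimators': [10, 20, 30], 'max_depth': [3, 5, None]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions=small_grid,
                            n_iter=4, cv=3, random_state=42)
search.fit(X_demo, y_demo)

# With the default refit=True, best_estimator_ has been refitted on all the
# data passed to fit(), using best_params_
best_rfc = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```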

### Independent test set accuracy assessment

In [36]:
#printing the accuracy metrics
#plugging back the parameters into the model
rfc = RandomForestClassifier(n_estimators=1000, max_depth=500, max_features='auto')
#rfc = RandomForestClassifier(n_estimators=400, max_features='sqrt')
rfc.fit(X_train,y_train)
rfc_predict = rfc.predict(X_test)
# 10-fold cross-validation on the full table; for a classifier the default
# scoring of cross_val_score is accuracy, not AUC
rfc_cv_score = cross_val_score(rfc, x, y, cv=10)
print("=== Confusion Matrix ===")
print(confusion_matrix(y_test, rfc_predict))
print('\n')
print("=== Classification Report ===")
print(classification_report(y_test, rfc_predict))
print('\n')
print("=== All CV Accuracy Scores ===")
print(rfc_cv_score)
print('\n')
print("=== Mean CV Accuracy Score ===")
print("Mean CV Accuracy Score - Random Forest: ", rfc_cv_score.mean())

=== Confusion Matrix ===
[[45  0  1  6  0  0]
 [ 1 34  0  0  4  1]
 [ 3  0  3  5  0  1]
 [ 2  0  0 44  0  0]
 [ 0  3  0  0 37  5]
 [ 0  1  1  1  3 42]]


=== Classification Report ===
              precision    recall  f1-score   support

           1       0.88      0.87      0.87        52
           2       0.89      0.85      0.87        40
           3       0.60      0.25      0.35        12
           4       0.79      0.96      0.86        46
           5       0.84      0.82      0.83        45
           6       0.86      0.88      0.87        48

    accuracy                           0.84       243
   macro avg       0.81      0.77      0.78       243
weighted avg       0.84      0.84      0.84       243



=== All CV Accuracy Scores ===
[0.80769231 0.89333333 0.90410959 0.78082192 0.90410959 0.78082192
 0.89041096 0.78082192 0.81944444 0.83333333]


=== Mean CV Accuracy Score ===
Mean CV Accuracy Score - Random Forest:  0.8394899309214379

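The figures in the classification report follow directly from the confusion matrix. As a quick check, this sketch recomputes overall accuracy and per-class recall and precision from the matrix printed above. It assumes the scikit-learn convention of rows for true classes and columns for predicted classes.

```python
import numpy as np

# Confusion matrix printed above (rows = true class, columns = predicted class)
cm = np.array([[45, 0, 1, 6, 0, 0],
               [1, 34, 0, 0, 4, 1],
               [3, 0, 3, 5, 0, 1],
               [2, 0, 0, 44, 0, 0],
               [0, 3, 0, 0, 37, 5],
               [0, 1, 1, 1, 3, 42]], dtype=float)

overall_accuracy = np.trace(cm) / cm.sum()   # correct predictions / all samples
recall = np.diag(cm) / cm.sum(axis=1)        # per true class (row sums)
precision = np.diag(cm) / cm.sum(axis=0)     # per predicted class (column sums)

print(round(overall_accuracy, 2))
print(np.round(recall, 2))
print(np.round(precision, 2))
```

Class 3 stands out: only 3 of its 12 test samples are recovered (recall 0.25), and most of the rest are assigned to classes 1 and 4.
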

## Prediction on all street blocks

In [37]:
# Drop the 'cat' column
segments_csv_cat=segments_csv['cat']
segments_csv.drop('cat',axis=1, inplace=True)
segments_csv.head()

Unnamed: 0,compact_square,fd,xcoords,ycoords,img17_red_min,img17_red_max,img17_red_mean,img17_red_stddev,img17_red_variance,img17_red_coeff_var,...,nir17_27_Var_coeff_var,nir17_27_Var_sum,nir17_27_Var_first_quart,nir17_27_Var_median,nir17_27_Var_third_quart,mode,prop_1,prop_2,prop_3,prop_4
0,0.587133,1.378078,832588.087644,623953.605067,30,223,111.730995,25.866555,669.078664,23.150743,...,91.625047,7944284.0,137.551,213.016,358.889,1,0.85335,0.09489,0.03567,0.0161
1,0.810431,1.331882,832537.157363,624237.046915,35,195,123.057114,22.199643,492.824134,18.040113,...,50.154372,2317322.0,108.406,135.481,168.209,1,0.88491,0.03502,0.07662,0.00345
2,0.656656,1.443556,832625.412855,624602.496236,55,221,124.240012,23.527559,553.546042,18.937184,...,29.116697,638390.5,148.795,166.322,210.34,1,0.95455,0.00637,0.03764,0.00145
3,0.723537,1.312804,833320.933644,624886.522991,29,252,126.483011,27.426131,752.192653,21.683648,...,67.158476,12797890.0,117.53,182.537,301.848,1,0.92701,0.01617,0.05625,0.00057
4,0.767842,1.378478,832641.231578,624820.628383,48,216,120.823769,18.215214,331.794016,15.075853,...,31.549779,698875.5,82.5776,113.063,138.132,1,0.96299,0.00994,0.02543,0.00163


In [38]:
#make a prediction of the segments_csv
rfc_predict = rfc.predict(segments_csv)

#PRINT LENGTH OF THE PREDICTIONS 
print("%s records have been classified" % len(rfc_predict))
#Create a dataframe with the predicted label and the 'cat' value of each segment
softmax = pd.DataFrame({'cat': segments_csv_cat, 'predicted_label': rfc_predict})
softmax.set_index('cat',inplace=True)
# Export results to file
softmax_csv = 'E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\LU_13_RF_softmax.csv'
softmax.to_csv(softmax_csv, index=True)
# Show table
softmax.head() 

18880 records have been classified


Unnamed: 0_level_0,predicted_label
cat,Unnamed: 1_level_1
0,5
1,1
2,1
3,5
4,4


## Creation of reclass rules

#CREATION OF THE RECLASS RULES

# df_dummy is the DataFrame containing the predictions, indexed by 'cat'
df_dummy=softmax

reclass_rules=[]
for cat, row in df_dummy.iterrows():
    # One rule per segment: <segment cat>=<predicted class>
    # ('cat' is the index of the DataFrame, not a column)
    reclass_rules.append(str(cat) + '=' + str(row['predicted_label']))
# Any category without a prediction is set to NULL
reclass_rules.append('*'+'='+'NULL')


# Write one rule per line; the 'with' block closes the file automatically
with open(outputcsv, 'w') as outfile:
    for s in reclass_rules:
        outfile.write("%s\n" % s)
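
To make the expected content of the rules file concrete, this sketch builds the rules for three made-up predictions (`pred_demo` is a placeholder) and prints them. The commented-out `r.reclass` call is an assumption about how the file would then be applied inside a GRASS session. It is not part of this notebook, and the output map name is hypothetical.

```python
import pandas as pd

# Made-up predictions indexed by segment 'cat' (placeholder values)
pred_demo = pd.DataFrame({'predicted_label': [5, 1, 4]},
                         index=pd.Index([10, 11, 12], name='cat'))

# One "cat=label" rule per segment, then a catch-all NULL rule
rules = ['%s=%s' % (cat, label)
         for cat, label in zip(pred_demo.index, pred_demo['predicted_label'])]
rules.append('*=NULL')
print('\n'.join(rules))

# Assumption, not run here: with a GRASS session active, the rules file
# could be applied to the segments raster with r.reclass, for example
#   import grass.script as gs
#   gs.run_command('r.reclass', input=raster_segments_map,
#                  output='classified_map', rules=outputcsv)
```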

## Predict class membership probabilities and compute ERP index

In [39]:
#GENERATE THE CLASS PROBABILITIES
rfc_predict_proba = rfc.predict_proba(segments_csv)

In [40]:
# Formatting results in a pandas dataframe 
df = pd.DataFrame(data=rfc_predict_proba[:,:],    # class membership probabilities
             index=segments_csv_cat,    # 'cat' value of each segment as index
             columns=['soft_prob_1','soft_prob_2','soft_prob_3','soft_prob_4','soft_prob_5','soft_prob_6'])  # one column per class
# Export results to file
softprob_csv = 'E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\LU_13_RF_softprob.csv'
df.to_csv(softprob_csv, index=True)
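
The hard-coded column names above assume that the model was trained on exactly six classes labelled 1 to 6. The columns of `predict_proba` follow the order of the fitted model's `classes_` attribute, so the names can be derived from it instead. The sketch below uses a tiny made-up training set (`X_demo` and `y_demo` are placeholders).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up training set with class labels 1, 2 and 5 (placeholders)
X_demo = np.array([[0.1], [0.2], [0.8], [0.9], [0.5], [0.55]])
y_demo = np.array([1, 1, 2, 2, 5, 5])
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# predict_proba columns follow model.classes_, so build the names from it
proba = model.predict_proba(X_demo)
columns = ['soft_prob_%s' % c for c in model.classes_]
proba_df = pd.DataFrame(proba, columns=columns)
print(proba_df.columns.tolist())
```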

In [41]:
# Compute confidence index (ERP)
softprob_ERP_csv = ComputeERPfromCsv(softprob_csv, delimiter=',', erp_name="ERP", start_index=1, stop_index=False)
softprob_ERP = pd.read_csv(softprob_ERP_csv)
softprob_ERP.set_index('cat',inplace=True)
softprob_ERP.head()

Unnamed: 0_level_0,soft_prob_1,soft_prob_2,soft_prob_3,soft_prob_4,soft_prob_5,soft_prob_6,ERP
cat,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
0,0.008,0.05,0.043,0.024,0.836,0.039,0.817962
1,0.351,0.085,0.084,0.254,0.184,0.042,0.311841
2,0.577,0.015,0.311,0.089,0.003,0.005,0.368444
3,0.0,0.0,0.0,0.0,1.0,0.0,1.0
4,0.366,0.016,0.083,0.515,0.011,0.009,0.318111


## Merge the RF predicted labels (SoftMAX), class probabilities (SoftPROB) and ERP into a single csv output

In [42]:
all_RF_outputs = pd.merge(softprob_ERP, softmax, left_index=True, right_index=True)
# Export results to file
all_RF_outputs_csv = 'E:\\UsersData\\owusu\\new_results\\landcover\\LCLU_17\\LCLU_clasif17b_csv\\LCLU_subclass.csv'
all_RF_outputs.to_csv(all_RF_outputs_csv, index=True)
# Show table
all_RF_outputs.head()

Unnamed: 0_level_0,soft_prob_1,soft_prob_2,soft_prob_3,soft_prob_4,soft_prob_5,soft_prob_6,ERP,predicted_label
cat,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,0.008,0.05,0.043,0.024,0.836,0.039,0.817962,5
1,0.351,0.085,0.084,0.254,0.184,0.042,0.311841,1
2,0.577,0.015,0.311,0.089,0.003,0.005,0.368444,1
3,0.0,0.0,0.0,0.0,1.0,0.0,1.0,5
4,0.366,0.016,0.083,0.515,0.011,0.009,0.318111,4
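
`pd.merge` performs an inner join by default, so segments missing from either table would be dropped silently. This sketch uses the first three rows of the tables above, with only the `ERP` and `predicted_label` columns kept for brevity, and shows one way to guard against that. The `_demo` names are placeholders.

```python
import pandas as pd

# First three rows of the tables above, indexed by 'cat'
idx = pd.Index([0, 1, 2], name='cat')
erp_demo = pd.DataFrame({'ERP': [0.817962, 0.311841, 0.368444]}, index=idx)
label_demo = pd.DataFrame({'predicted_label': [5, 1, 1]}, index=idx)

# validate='one_to_one' raises a MergeError if a 'cat' value is duplicated
merged = pd.merge(erp_demo, label_demo, left_index=True, right_index=True,
                  validate='one_to_one')

# The default inner join drops unmatched rows, so compare the lengths
assert len(merged) == len(erp_demo) == len(label_demo)
print(merged)
```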


## Get feature importance of the model

In [43]:
#VIEW THE FEATURE IMPORTANCES
feature_importances = pd.DataFrame(rfc.feature_importances_,
                                   index=X_train.columns,
                                   columns=['importance']).sort_values('importance', ascending=False)
# Display
feature_importances

Unnamed: 0,importance
prop_1,0.045059
prop_2,0.034018
ndvi17_median,0.024421
ndvi17_mean,0.022790
mode,0.022260
...,...
nir17_21_MOC2_max,0.001656
nir17_21_Var_max,0.001580
img17_green_min,0.001523
img17_blue_min,0.001521
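
The impurity-based importances returned by `feature_importances_` are normalised to sum to 1 over all features, so they can be read as shares. As a small worked example, the sketch below sums the five largest values listed above. The table is truncated, so the remaining features are not included.

```python
import pandas as pd

# The five largest importances listed above (the table itself is truncated)
importances = pd.Series({'prop_1': 0.045059, 'prop_2': 0.034018,
                         'ndvi17_median': 0.024421, 'ndvi17_mean': 0.022790,
                         'mode': 0.022260})

# Share of the total importance carried by the five top-ranked features
top5_share = importances.sum()
print(round(top5_share, 3))
```

The five top-ranked features carry only about 15% of the total importance, which suggests that the model spreads its decisions over a large number of features rather than relying on a handful.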
