# **Lecospy Data Munging**

## Notation:
Throughout this notebook, variables starting with `img_` hold UAV-based information (data, file paths, etc.), and variables starting with `grd_` hold data collected on the ground.

Variable names also encode the data transformations that have been applied:
* `robust` in a variable name refers to data centered on the median and scaled by the inter-quartile range (à la sklearn's `RobustScaler`)
* `minmax` (and its ilk) are min-max scaled data, i.e. scaled to the interval [0,1] by subtracting the minimum and dividing by the range.
* `standard(ized)` refers to data treated with the z-score transform: centered on the mean and scaled by the standard deviation (like sklearn's `StandardScaler`)
* `corrected` means that a linear transformation has been applied to account for differences in sensor calibration.
* `raw` means that no transformations have been applied
* `clipped` means that outliers have been clipped to the upper and lower fence values given by the inter-quartile range (IQR) method.
* `imputed` means that outliers have been removed and replaced with imputed values
* `dropped` means that dataframe rows containing outliers have been removed

Example: `img_robust_indices` refers to vegetation indices from the UAV images treated with the robust scaler. 
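
The conventions above correspond to a handful of simple column-wise transforms. The sketch below is a minimal pandas version for a single column. The helper names (`robust_scale`, `minmax_scale`, `standard_scale`, `iqr_fences`, `clip_outliers`, `drop_outliers`), the toy series, and the fence multiplier `k=1.5` are illustrative assumptions; none of them come from this notebook.

```python
import pandas as pd


def robust_scale(s: pd.Series) -> pd.Series:
    """Center on the median and scale by the inter-quartile range."""
    iqr = s.quantile(0.75) - s.quantile(0.25)
    return (s - s.median()) / iqr


def minmax_scale(s: pd.Series) -> pd.Series:
    """Scale to [0, 1] by subtracting the minimum and dividing by the range."""
    return (s - s.min()) / (s.max() - s.min())


def standard_scale(s: pd.Series) -> pd.Series:
    """Z-score transform; ddof=0 uses the population standard deviation."""
    return (s - s.mean()) / s.std(ddof=0)


def iqr_fences(s: pd.Series, k: float = 1.5):
    """Lower and upper fences; k=1.5 is a conventional choice (assumption)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Pull values outside the fences back onto the nearest fence."""
    lower, upper = iqr_fences(s, k)
    return s.clip(lower=lower, upper=upper)


def drop_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Keep only the values that lie inside the fences."""
    lower, upper = iqr_fences(s, k)
    return s[s.between(lower, upper)]


# Toy data (invented): four ordinary values and one obvious outlier.
demo = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0], name="toy_index")
print(robust_scale(demo).tolist())
print(clip_outliers(demo).tolist())
print(drop_outliers(demo).tolist())
```

On this toy series the median is 3 and the inter-quartile range is 2, so the fences sit at -1 and 7: the outlier is clipped to 7 in the `clipped` variant and removed in the `dropped` variant.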

### Setting Working Directory

In [1]:
"""
Sets the working directory to the project root, "../lecospy/"
"""
import os

import spyndex as spy

# Move up one level from the notebook's folder to the project root.
os.chdir('../')
print(os.getcwd())

/Users/kalyankhatiwada/lecospy
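
Because `os.chdir('../')` is relative, re-running the cell above moves one level further up each time. A possible alternative, sketched below, walks up from the current directory until it finds a folder with the project's name. `enter_project_root` is a hypothetical helper, and the default name `"lecospy"` is taken from the path printed above.

```python
import os
from pathlib import Path


def enter_project_root(name: str = "lecospy", start=None) -> Path:
    """Change into the nearest enclosing directory called `name`.

    `start` defaults to the current working directory. Calling the
    function again from inside the project leaves the directory unchanged.
    """
    here = Path(start).resolve() if start is not None else Path.cwd().resolve()
    for candidate in (here, *here.parents):
        if candidate.name == name:
            os.chdir(candidate)
            return candidate
    raise FileNotFoundError(f"no directory named {name!r} at or above {here}")


# Intended use at the top of the notebook (hypothetical):
# print(enter_project_root())
```

The function raises instead of guessing when no matching folder is found, so a notebook opened from an unexpected location fails early rather than reading from the wrong `Data/` directory.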


### Defining Data Locations

In [2]:
import pandas as pd
grd_speclib = pd.read_csv("Data/C_001_SC3_Cleaned_SpectralLib.csv")
grd_speclib.dropna(subset = ["Functional_group1"], inplace=True)
print(len(grd_speclib))
grd_speclib.head()


1343



Unnamed: 0.1,Unnamed: 0,ScanID,Area,Code_name,Species_name,Functional_group1,Functional_group2,Species_name_Freq,Functional_group1_Freq,Functional_group2_Freq,...,Radiometric.Calibration,Units,Latitude,Longitude,Altitude,GPS.Time,Satellites,Calibrated.Reference.Correction.File,Channels,ScanNum
0,1,aleoch_Murph_061,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6.0,453.0,118.0,...,,,,,,,,,,
1,2,aleoch_Murph_063,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6.0,453.0,118.0,...,,,,,,,,,,
2,3,aleoch_Murph_064,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6.0,453.0,118.0,...,,,,,,,,,,
3,4,aleoch_Murph_065,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6.0,453.0,118.0,...,,,,,,,,,,
4,5,aleoch_Murph_066,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6.0,453.0,118.0,...,,,,,,,,,,
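
The `Unnamed: 0` columns in the preview above are typical of a CSV that was written with its DataFrame index included. The sketch below uses an invented in-memory CSV to show how `index_col=0` absorbs such a column on read, and what `dropna(subset=...)` does to rows with a missing functional group. The toy rows and the `350` band column are made up for illustration; only the column names `ScanID` and `Functional_group1` come from the data above.

```python
import io

import pandas as pd

# Invented stand-in for a spectral library file; the leading empty header
# cell is what a saved DataFrame index looks like in a CSV.
toy_csv = io.StringIO(
    ",ScanID,Functional_group1,350\n"
    "0,scan_a,Lichen,0.05\n"
    "1,scan_b,,0.07\n"
    "2,scan_c,Moss,0.04\n"
)

# index_col=0 turns the first column into the index instead of "Unnamed: 0".
toy = pd.read_csv(toy_csv, index_col=0)

# Rows without a functional group cannot be used as labelled samples.
labelled = toy.dropna(subset=["Functional_group1"])

print(list(toy.columns))
print(len(toy), "->", len(labelled))
```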


In [3]:
img_speclib = pd.read_csv("Data/PFT_Image_SpectralLib_Clean.csv", header=0, low_memory=False)
img_speclib.head()


Unnamed: 0.1,Unnamed: 0,UID,ScanNum,sample_name,PFT,FncGrp1,X398,X399,X400,X401,...,X990,X991,X992,X993,X994,X995,X996,X997,X998,X999
0,1,BisonGulchPFTsBetula1,1,spec_1,Betula,TreeBroadleaf,0.05243,0.045161,0.039098,0.034829,...,0.563683,0.571786,0.56324,0.54851,0.538068,0.540019,0.556112,0.587042,0.633502,0.696187
1,2,BisonGulchPFTsBetula1,1,spec_2,Betula,TreeBroadleaf,0.032806,0.032797,0.03279,0.032783,...,0.465257,0.465524,0.465757,0.46596,0.466138,0.466296,0.46644,0.466572,0.466699,0.466825
2,3,BisonGulchPFTsBetula1,1,spec_3,Betula,TreeBroadleaf,0.024152,0.024453,0.024753,0.025051,...,0.471305,0.470406,0.469606,0.468903,0.468295,0.467775,0.46733,0.466943,0.4666,0.466283
3,4,BisonGulchPFTsBetula1,1,spec_4,Betula,TreeBroadleaf,0.030132,0.03042,0.030709,0.030979,...,0.428292,0.431782,0.438075,0.447661,0.461028,0.478373,0.499107,0.52251,0.547862,0.574442
4,5,BisonGulchPFTsBetula1,1,spec_5,Betula,TreeBroadleaf,0.027987,0.028189,0.028389,0.028585,...,0.434414,0.435332,0.436237,0.437123,0.437982,0.43881,0.439612,0.440393,0.441159,0.441917


In [4]:
grd_bands = grd_speclib.drop(columns=['Unnamed: 0',
        'ScanID',
        'Area',
        'Code_name',
        'Species_name',
        'Functional_group1',
        'Functional_group2',
        'Species_name_Freq',
        'Functional_group1_Freq',
        'Functional_group2_Freq',
        'Genus',
        'Version',
        'File.Name',
        'Instrument',
        'Detectors',
        'Measurement',
        'Date',
        'Time',
        'Battery.Voltage',
        'Averages',
        'Integration1',
        'Integration2',
        'Integration3',
        'Dark.Mode',
        'Foreoptic',
        'Radiometric.Calibration',
        'Units',
        'Latitude',
        'Longitude',
        'Altitude',
        'GPS.Time',
        'Satellites',
        'Calibrated.Reference.Correction.File',
        'Channels',
        'ScanNum'])

img_bands = img_speclib.drop(columns=[
        "Unnamed: 0",
    	"UID",
        "ScanNum",
    	"sample_name",
    	"PFT",
    	"FncGrp1"])
img_bands.head()

Unnamed: 0,X398,X399,X400,X401,X402,X403,X404,X405,X406,X407,...,X990,X991,X992,X993,X994,X995,X996,X997,X998,X999
0,0.05243,0.045161,0.039098,0.034829,0.032859,0.032877,0.034097,0.035726,0.037113,0.038184,...,0.563683,0.571786,0.56324,0.54851,0.538068,0.540019,0.556112,0.587042,0.633502,0.696187
1,0.032806,0.032797,0.03279,0.032783,0.032776,0.032769,0.032762,0.032755,0.032747,0.03274,...,0.465257,0.465524,0.465757,0.46596,0.466138,0.466296,0.46644,0.466572,0.466699,0.466825
2,0.024152,0.024453,0.024753,0.025051,0.025347,0.02564,0.02593,0.026219,0.026505,0.02679,...,0.471305,0.470406,0.469606,0.468903,0.468295,0.467775,0.46733,0.466943,0.4666,0.466283
3,0.030132,0.03042,0.030709,0.030979,0.031215,0.031402,0.031534,0.031601,0.031596,0.031526,...,0.428292,0.431782,0.438075,0.447661,0.461028,0.478373,0.499107,0.52251,0.547862,0.574442
4,0.027987,0.028189,0.028389,0.028585,0.028777,0.028963,0.029143,0.029315,0.02948,0.029636,...,0.434414,0.435332,0.436237,0.437123,0.437982,0.43881,0.439612,0.440393,0.441159,0.441917
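
The two `drop` calls above list every metadata column by name. If the band columns follow the naming seen in this notebook (bare wavelengths such as `"890"` for ground data, `X`-prefixed names such as `"X890"` for image data), they can instead be selected by pattern. This is a sketch under that assumption; `select_band_columns` is a hypothetical helper, the toy frames are invented, and the approach only works if no metadata column has a purely numeric name.

```python
import pandas as pd


def select_band_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep columns named like '890' or 'X890' and discard all others."""
    return df.filter(regex=r"^X?\d+$")


# Invented miniature versions of the image and ground spectral libraries.
toy_img = pd.DataFrame(
    {"Unnamed: 0": [1], "UID": ["a"], "PFT": ["Betula"], "X398": [0.05], "X399": [0.04]}
)
toy_grd = pd.DataFrame(
    {"ScanID": ["s1"], "Area": ["Murphy"], "350": [0.05], "351": [0.06]}
)

print(list(select_band_columns(toy_img).columns))
print(list(select_band_columns(toy_grd).columns))
```

Selecting by pattern keeps working when a metadata column is added or renamed upstream, whereas an explicit drop list would let the new column slip into the band table.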


### Getting vegetation indices

In [34]:
indices_to_calculate = ["NDVI","GNDVI", "AVI", "BNDVI", "CVI", "DVI", "DVIplus", "ExGR", "FCVI", "GARI", "GBNDVI", "GOSAVI", "GRNDVI", "GRVI", "GSAVI", "IPVI", "MGRVI", "MNLI", "MRBVI", "MSAVI", "MTVI1", "MTVI2", "NDVI705", "NDREI", "NDDI", "NDGI", "ND705", "MTCI", "MSR705", "MSR", "MCARIOSAVI705", "MCARIOSAVI", "MCARI705", "MCARI1" ,"MCARI2", "MCARI", "IRECI", "IKAW", "GM1", "GM2", "GLI", "GEMI", "GCC", "ExR", "ExG", "ExGR", "CIG", "CIRE", "BCC", "MGRVI", "MNLI", "MRBVI", "MSAVI", "MSR", "MSR705", "MTCI", "MTVI1", "MTVI2", "ND705", "NDDI", "NDGI", "NDREI", "NDVI705", "NDYI", "NGRDI", "NIRv", "NLI", "NormG", "NormNIR", "NormR", "OSAVI", "PSRI", "RCC", "RDVI", "REDSI", "RENDVI", "RGBVI", "RGRI", "RI", "RVI", "S2REP", "SARVI", "SAVI", "SI", "SIPI", "SR", "SR2", "SR555" , "SR705", "TCARI", "TCARIOSAVI", "TCARIOSAVI705", "TCI", "TDVI", "TGI", "TRRVI", "TVI", "TriVI", "VARI", "VARI700", "VI700", "VIG", "mND705", "mSR705"]
# Several index names appear twice in the list above; keep the first occurrence of each, in order.
indices_to_calculate = list(dict.fromkeys(indices_to_calculate))
grd_indices = spy.computeIndex(
    index = indices_to_calculate,
    params = {
        "N": grd_bands["890"],
        "R": grd_bands["685"],
        "A" : grd_bands["550"],
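        # NOTE: "G" was given twice (540 nm, then 500 nm). A dict literal keeps only the
        # last value, so the 500 nm band is the one actually used. Confirm the intended band.
        "G" : grd_bands["500"],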
        "L" : 0.5,
        "RE1": grd_bands["705"],
        "RE2" : grd_bands["750"],
        "RE3": grd_bands["758"],
        "B" : grd_bands["480"],
        "lambdaN" : grd_bands["900"],
        "lambdaR" : grd_bands["650"],
        "lambdaG" : grd_bands["560"]
    }
)
img_indices = spy.computeIndex(
    index = indices_to_calculate,
    params = {
        "N": img_bands["X890"],
        "R": img_bands["X685"],
        "A" : img_bands["X550"],
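        # NOTE: "G" was given twice (X540, then X500). A dict literal keeps only the
        # last value, so the X500 band is the one actually used. Confirm the intended band.
        "G" : img_bands["X500"],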
        "L" : 0.5,
        "RE1": img_bands["X705"],
        "RE2" : img_bands["X750"],
        "RE3": img_bands["X758"],
        "B" : img_bands["X480"],
        "lambdaN" : img_bands["X900"],
        "lambdaR" : img_bands["X650"],
        "lambdaG" : img_bands["X560"]
    }
)
grd_indices.head()

Unnamed: 0,NDVI,GNDVI,AVI,BNDVI,CVI,DVI,DVIplus,ExGR,FCVI,GARI,...,TGI,TRRVI,TVI,TriVI,VARI,VARI700,VI700,VIG,mND705,mSR705
0,0.239575,0.394829,,0.407701,3.258889,9.9765,-2.399951,-13.6817,13.179367,0.492114,...,-141.9245,0.14463,0.859985,413.186,-0.286623,-0.009905,0.092524,-0.171474,0.128388,0.269826
1,0.269978,0.355078,,0.360375,2.537779,8.7445,-1.586315,-7.49678,10.139967,0.667107,...,-64.1015,0.165153,0.877484,443.306,-0.170349,0.065008,0.100022,-0.094123,0.170055,0.239739
2,0.198892,0.294267,,0.304904,2.247387,10.5446,-2.739754,-13.7841,13.282567,0.540784,...,-112.72,0.147609,0.835998,476.404,-0.180567,0.041694,0.092201,-0.101304,0.133925,0.207331
3,0.203858,0.279593,,0.290261,2.086426,11.452,-2.835785,-12.92146,13.814267,0.594159,...,-90.1515,0.151501,0.838962,554.124,-0.145834,0.059025,0.091566,-0.080313,0.145703,0.197409
4,0.205305,0.3087,,0.322026,2.362926,9.8383,-2.494809,-12.83943,12.5106,0.526088,...,-105.816,0.151956,0.839824,438.858,-0.194293,0.032037,0.092289,-0.110391,0.141724,0.221882
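
A quick way to sanity-check a computed table is to recompute one or two indices by hand from the same band columns. The sketch below does this for NDVI, (N - R) / (N + R), and GNDVI, (N - G) / (N + G), on invented reflectance values. It does not call spyndex; the function names and toy numbers are illustrative assumptions.

```python
import pandas as pd


def ndvi(nir: pd.Series, red: pd.Series) -> pd.Series:
    """Normalized Difference Vegetation Index: (N - R) / (N + R)."""
    return (nir - red) / (nir + red)


def gndvi(nir: pd.Series, green: pd.Series) -> pd.Series:
    """Green NDVI: (N - G) / (N + G)."""
    return (nir - green) / (nir + green)


# Invented reflectances; column names mimic the ground band labels.
toy = pd.DataFrame(
    {"890": [0.50, 0.40], "685": [0.10, 0.20], "540": [0.10, 0.10]}
)
toy["NDVI"] = ndvi(toy["890"], toy["685"])
toy["GNDVI"] = gndvi(toy["890"], toy["540"])

# Ratio indices do not change when every band is multiplied by the same
# constant (e.g. fractions vs. percent); plain differences such as N - R do.
toy["NDVI_percent_scale"] = ndvi(100 * toy["890"], 100 * toy["685"])

print(toy.round(4))
```

The same functions can be applied to the real band columns and compared against the corresponding column of the computed table, for example with `numpy.allclose`, to confirm that the band-to-parameter mapping is the one intended.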


In [35]:
grd_indices.to_csv("Data/training/grd_indices.csv")
img_indices.to_csv("Data/training/img_indices.csv")
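
Two details of the save step are easy to trip over: `to_csv` raises an error if the target folder does not exist, and by default it writes the index as an unnamed first column, which comes back as `Unnamed: 0` on the next read. The sketch below shows one way to handle both, using a temporary directory and an invented table so that nothing in the project is touched; the file name `toy_indices.csv` is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Invented stand-in for a table of vegetation indices.
toy_indices = pd.DataFrame({"NDVI": [0.24, 0.27], "GNDVI": [0.39, 0.36]})

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "Data" / "training"
    # Create the folder (and any missing parents) before writing.
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / "toy_indices.csv"
    # index=False avoids the extra unnamed column on the next read.
    toy_indices.to_csv(out_file, index=False)
    round_trip = pd.read_csv(out_file)

print(list(round_trip.columns))
```

If the row index carries information that is needed later, writing with the index and reading back with `index_col=0` is the alternative.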

In [26]:
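# NOTE: incomplete cell. This binds the DataFrame.resample method without calling it,
# and resample also requires a datetime-like index. The table below appears to be
# output left over from an earlier run.
grd_resampled_indices = grd_indices.resample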


Unnamed: 0,NDVI,GNDVI,AVI,BNDVI,CVI,DVI,DVIplus,ExGR,FCVI,GARI,...,TGI,TRRVI,TVI,TriVI,VARI,VARI700,VI700,VIG,mND705,mSR705
0,0.744172,0.842745,0.60223,0.857122,20.141023,0.449427,-0.074898,-0.083362,0.472439,0.858308,...,-0.865184,0.413593,1.115424,25.673394,-0.395534,0.300295,0.369112,-0.264376,0.573968,0.638023
1,0.729619,0.828165,0.587085,0.850347,17.694219,0.430725,-0.084002,-0.080891,0.454171,0.846258,...,-0.712032,0.411338,1.108882,24.570629,-0.367865,0.298182,0.369644,-0.249006,0.555922,0.630714
2,0.723168,0.823058,0.589451,0.848364,17.0541,0.432938,-0.086334,-0.082718,0.457389,0.839815,...,-0.683254,0.409238,1.105969,24.66417,-0.36201,0.306456,0.378278,-0.24677,0.533042,0.630924
3,0.744675,0.844864,0.591241,0.855437,20.695684,0.436703,-0.079636,-0.082994,0.458988,0.860958,...,-0.925029,0.421778,1.11565,24.928265,-0.408163,0.29847,0.365216,-0.270159,0.605303,0.64496
4,0.764536,0.847255,0.608662,0.873762,19.517152,0.458526,-0.077829,-0.066791,0.479131,0.865626,...,-0.45373,0.41641,1.124516,26.4373,-0.341193,0.333218,0.401413,-0.234837,0.557785,0.657176
