# **Lecospy Data Munging**

## Notation:
Throughout this notebook, variables starting with `img_` hold UAV-based information (data, file paths, etc.), and variables starting with `grd_` hold data collected from the ground.

Variable names also encode the data transformations that have been applied:
* `robust` refers to data centered on the median and scaled by the inter-quartile range (à la sklearn's `RobustScaler`).
* `minmax` (and its ilk) refers to min-max scaled data, i.e. scaled to the interval [0, 1] by subtracting the minimum and dividing by the range.
* `standard(ized)` refers to data treated with the z-score transform, i.e. centered on the mean and scaled by the standard deviation (like sklearn's `StandardScaler`).
* `corrected` means that a linear transformation has been applied to account for differences in sensor calibration.
* `raw` means that no transformations have been applied.
* `clipped` means that outliers have been clipped to the upper and lower fence values based on the inter-quartile range method.
* `imputed` means that outliers have been removed and replaced with imputed values.
* `dropped` means that dataframe rows containing outliers have been removed.

Example: `img_robust_indices` refers to vegetation indices from the UAV images treated with the robust scaler. 
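
As a rough illustration of what these suffixes mean in practice, the sketch below applies the scalings and two of the outlier treatments to a small toy dataframe. Everything in it is an assumption made for illustration: the `toy_` names, the helper functions, the made-up values, and the fence multiplier `k=1.5` (the usual convention for the IQR method, which the notes above do not state). It is not the code used later in this notebook, and imputation is left out because the imputation method is not specified here.

```python
import numpy as np
import pandas as pd


def robust_scale(df):
    """Center each column on its median and scale by its inter-quartile range."""
    iqr = df.quantile(0.75) - df.quantile(0.25)
    return (df - df.median()) / iqr


def minmax_scale(df):
    """Rescale each column to [0, 1]: subtract the minimum, divide by the range."""
    return (df - df.min()) / (df.max() - df.min())


def standardize(df):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    # ddof=0 gives the population standard deviation, which is what sklearn's
    # StandardScaler uses; the pandas default is ddof=1.
    return (df - df.mean()) / df.std(ddof=0)


def iqr_fences(df, k=1.5):
    """Lower and upper fences per column; k=1.5 is an assumed convention."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(df, k=1.5):
    """Replace values outside the fences with the nearest fence value."""
    lower, upper = iqr_fences(df, k)
    return df.clip(lower=lower, upper=upper, axis=1)


def drop_outliers(df, k=1.5):
    """Remove every row that has at least one value outside the fences."""
    lower, upper = iqr_fences(df, k)
    inside = ((df >= lower) & (df <= upper)).all(axis=1)
    return df[inside]


# Made-up values; the last "ndvi" entry is a deliberate outlier.
toy_raw = pd.DataFrame({
    "ndvi": [0.1, 0.2, 0.3, 0.4, 5.0],
    "evi": [1.0, 2.0, 3.0, 4.0, 5.0],
})

toy_robust = robust_scale(toy_raw)
toy_minmax = minmax_scale(toy_raw)
toy_standard = standardize(toy_raw)
toy_clipped = clip_outliers(toy_raw)
toy_dropped = drop_outliers(toy_raw)
```

Note that clipping keeps all five rows, whereas dropping removes the whole row containing the outlier, including its perfectly ordinary `evi` value.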

### Setting Working Directory

In [1]:
"""
Sets the working directory to "../lecospy/"
"""
import os
os.chdir('../')
print(os.getcwd())
import Functions.spectral_operations as fncs

/Users/kalyankhatiwada/lecospy
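
Because `os.chdir('../')` is relative, running that cell a second time moves one more level up and the later `Data/...` paths stop resolving. One possible way to make the step repeatable is sketched below. The helper `find_project_root` is hypothetical (it is not part of this project), and it assumes the project folder is literally named `lecospy`, as the printed path suggests.

```python
import tempfile
from pathlib import Path


def find_project_root(start, name="lecospy"):
    """Return the nearest directory called `name` at or above `start`.

    Hypothetical helper: `name` is assumed to be the project folder name.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if candidate.name == name:
            return candidate
    raise FileNotFoundError(f"No directory named {name!r} at or above {start}")


# Demonstration on a throwaway directory tree, so nothing here changes the
# notebook's actual working directory.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "lecospy" / "Notebooks"
    nested.mkdir(parents=True)
    demo_root = find_project_root(nested)
    # Calling it again from the root itself returns the same directory.
    demo_is_repeatable = find_project_root(demo_root) == demo_root
```

In the notebook this would be used as `os.chdir(find_project_root(Path.cwd()))`, which gives the same result however many times the cell is run.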


### Defining Data Locations

In [2]:
import pandas as pd
grd_speclib = pd.read_csv("Data/C_001_SC3_Cleaned_SpectralLib.csv")
grd_speclib.dropna(subset = ["Functional_group1"], inplace=True)
print(len(grd_speclib))
grd_speclib.head()


1343



| Unnamed: 0.1 | Unnamed: 0 | ScanID | Area | Code_name | Species_name | Functional_group1 | Functional_group2 | Species_name_Freq | Functional_group1_Freq | Functional_group2_Freq | ... | Radiometric.Calibration | Units | Latitude | Longitude | Altitude | GPS.Time | Satellites | Calibrated.Reference.Correction.File | Channels | ScanNum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | aleoch_Murph_061 | Murphy | aleoch | Alectoria ochroleuca | Lichen | LightTerrestrialMacrolichen | 6.0 | 453.0 | 118.0 | ... | | | | | | | | | | |
| 1 | 2 | aleoch_Murph_063 | Murphy | aleoch | Alectoria ochroleuca | Lichen | LightTerrestrialMacrolichen | 6.0 | 453.0 | 118.0 | ... | | | | | | | | | | |
| 2 | 3 | aleoch_Murph_064 | Murphy | aleoch | Alectoria ochroleuca | Lichen | LightTerrestrialMacrolichen | 6.0 | 453.0 | 118.0 | ... | | | | | | | | | | |
| 3 | 4 | aleoch_Murph_065 | Murphy | aleoch | Alectoria ochroleuca | Lichen | LightTerrestrialMacrolichen | 6.0 | 453.0 | 118.0 | ... | | | | | | | | | | |
| 4 | 5 | aleoch_Murph_066 | Murphy | aleoch | Alectoria ochroleuca | Lichen | LightTerrestrialMacrolichen | 6.0 | 453.0 | 118.0 | ... | | | | | | | | | | |


In [11]:
img_speclib = pd.read_csv("Data/PFT_Image_SpectralLib_Clean.csv", header=0, low_memory=False)
img_speclib.head()


| Unnamed: 0.1 | Unnamed: 0 | UID | ScanNum | sample_name | PFT | FncGrp1 | X398 | X399 | X400 | X401 | ... | X990 | X991 | X992 | X993 | X994 | X995 | X996 | X997 | X998 | X999 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | BisonGulchPFTsBetula1 | 1 | spec_1 | Betula | TreeBroadleaf | 0.05243 | 0.045161 | 0.039098 | 0.034829 | ... | 0.563683 | 0.571786 | 0.56324 | 0.54851 | 0.538068 | 0.540019 | 0.556112 | 0.587042 | 0.633502 | 0.696187 |
| 1 | 2 | BisonGulchPFTsBetula1 | 1 | spec_2 | Betula | TreeBroadleaf | 0.032806 | 0.032797 | 0.03279 | 0.032783 | ... | 0.465257 | 0.465524 | 0.465757 | 0.46596 | 0.466138 | 0.466296 | 0.46644 | 0.466572 | 0.466699 | 0.466825 |
| 2 | 3 | BisonGulchPFTsBetula1 | 1 | spec_3 | Betula | TreeBroadleaf | 0.024152 | 0.024453 | 0.024753 | 0.025051 | ... | 0.471305 | 0.470406 | 0.469606 | 0.468903 | 0.468295 | 0.467775 | 0.46733 | 0.466943 | 0.4666 | 0.466283 |
| 3 | 4 | BisonGulchPFTsBetula1 | 1 | spec_4 | Betula | TreeBroadleaf | 0.030132 | 0.03042 | 0.030709 | 0.030979 | ... | 0.428292 | 0.431782 | 0.438075 | 0.447661 | 0.461028 | 0.478373 | 0.499107 | 0.52251 | 0.547862 | 0.574442 |
| 4 | 5 | BisonGulchPFTsBetula1 | 1 | spec_5 | Betula | TreeBroadleaf | 0.027987 | 0.028189 | 0.028389 | 0.028585 | ... | 0.434414 | 0.435332 | 0.436237 | 0.437123 | 0.437982 | 0.43881 | 0.439612 | 0.440393 | 0.441159 | 0.441917 |


In [4]:
grd_bands = grd_speclib.drop(columns=['Unnamed: 0',
        'ScanID',
        'Area',
        'Code_name',
        'Species_name',
        'Functional_group1',
        'Functional_group2',
        'Species_name_Freq',
        'Functional_group1_Freq',
        'Functional_group2_Freq',
        'Genus',
        'Version',
        'File.Name',
        'Instrument',
        'Detectors',
        'Measurement',
        'Date',
        'Time',
        'Battery.Voltage',
        'Averages',
        'Integration1',
        'Integration2',
        'Integration3',
        'Dark.Mode',
        'Foreoptic',
        'Radiometric.Calibration',
        'Units',
        'Latitude',
        'Longitude',
        'Altitude',
        'GPS.Time',
        'Satellites',
        'Calibrated.Reference.Correction.File',
        'Channels',
        'ScanNum'])

img_bands = img_speclib.drop(columns=[
        "Unnamed: 0",
        "UID",
        "ScanNum",
        "sample_name",
        "PFT",
        "FncGrp1"])
grd_speclib.head()

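| Unnamed: 0.1 | Unnamed: 0 | ScanID | Area | Code_name | Species_name | Functional_group1 | Functional_group2 | Species_name_Freq | Functional_group1_Freq | Functional_group2_Freq | ... | Radiometric.Calibration | Units | Latitude | Longitude | Altitude | GPS.Time | Satellites | Calibrated.Reference.Correction.File | Channels | ScanNum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |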
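
The drop lists above enumerate every metadata column by name. If `Unnamed: 0.1` is a real column in the CSVs, as the printed headers suggest, it is not in either list and would stay in the band tables. An alternative for the image library, sketched here on a toy dataframe, is to select the band columns by pattern instead. The `toy_` names and values are made up, the pattern assumes every band column is an `X` followed by digits (as in `X398` to `X999`), and reading those digits as wavelengths is also an assumption. The ground library's band column names are not visible in the output above, so this is not shown for `grd_speclib`.

```python
import pandas as pd

# Toy stand-in with the same column layout as img_speclib (made-up values).
toy_img_speclib = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [1, 2],
    "UID": ["toy_scan", "toy_scan"],
    "ScanNum": [1, 1],
    "sample_name": ["spec_1", "spec_2"],
    "PFT": ["Betula", "Betula"],
    "FncGrp1": ["TreeBroadleaf", "TreeBroadleaf"],
    "X398": [0.05, 0.03],
    "X399": [0.04, 0.03],
    "X999": [0.69, 0.46],
})

# Keep only columns named "X" followed by digits; every metadata column,
# including both "Unnamed" ones, is left out without being listed.
toy_img_bands = toy_img_speclib.filter(regex=r"^X\d+$")

# Assumption: the digits after "X" are the band's wavelength.
toy_wavelengths = toy_img_bands.columns.str[1:].astype(int)
```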


### Getting vegetation indices

In [5]:
img_indices = fncs.get_vegetation_indices()
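
For readers unfamiliar with vegetation indices, here is a minimal sketch of one of the most common, NDVI, computed as (NIR - Red) / (NIR + Red). It is not the implementation of `fncs.get_vegetation_indices`, whose arguments and outputs are not shown here. The `ndvi` helper, the toy values, and the choice of `X670` and `X800` as the red and near-infrared bands are all assumptions for illustration.

```python
import pandas as pd


def ndvi(bands, red="X670", nir="X800"):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    The default column names are assumed band choices, not taken from
    the project's own code.
    """
    red_refl = bands[red]
    nir_refl = bands[nir]
    return (nir_refl - red_refl) / (nir_refl + red_refl)


# Made-up reflectance values for two spectra.
toy_bands = pd.DataFrame({"X670": [0.05, 0.10], "X800": [0.45, 0.30]})
toy_indices = pd.DataFrame({"NDVI": ndvi(toy_bands)})
```

Higher values indicate a stronger contrast between near-infrared and red reflectance, which is typical of green vegetation.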