# **Lecospy Data Munging**

## Notation:
Throughout this notebook, variables starting with `img_` hold UAV-based information (data, file paths, etc.) and variables starting with `grd_` hold data collected on the ground.

Other naming conventions indicate which data transformations have been applied:
* `robust` in a variable name refers to data centred on the median and scaled by the inter-quartile range (à la scikit-learn's `RobustScaler`).
* `minmax` (and its ilk) refers to min-max scaled data, i.e. data scaled to the interval [0, 1] by subtracting the minimum and dividing by the range.
* `standard(ized)` refers to data treated with the z-score transform, i.e. centred on the mean and scaled by the standard deviation (like scikit-learn's `StandardScaler`).
* `corrected` means that a linear transformation has been applied to account for differences in sensor calibration.
* `raw` refers to data with no transformations applied.
* `clipped` means that outliers have been clipped to the lower and upper fence values of the inter-quartile range (IQR) method.
* `imputed` means that outliers have been removed and replaced with imputed values.
* `dropped` means that dataframe rows containing outliers have been removed
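The transformations named above can be sketched column-wise in pandas as follows. This is a minimal illustration, not the code that produced the notebook's variables: the 1.5 × IQR fence multiplier (Tukey's usual convention) and median imputation for `imputed` are assumptions, since neither is stated here.

```python
import pandas as pd


def robust_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Centre each column on its median and divide by its IQR (cf. RobustScaler defaults)."""
    iqr = df.quantile(0.75) - df.quantile(0.25)
    return (df - df.median()) / iqr


def minmax_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Subtract each column's minimum and divide by its range, mapping it onto [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """z-score: subtract the mean, divide by the population std (ddof=0, as StandardScaler does)."""
    return (df - df.mean()) / df.std(ddof=0)


def iqr_fences(df: pd.DataFrame, k: float = 1.5):
    """Per-column fences Q1 - k*IQR and Q3 + k*IQR (k=1.5 is an assumption)."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def inside_fences(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Boolean frame: True where a value lies within its column's fences."""
    lower, upper = iqr_fences(df, k)
    return df.ge(lower, axis=1) & df.le(upper, axis=1)


def clip_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """'clipped': pull each outlier back to the nearest fence of its column."""
    lower, upper = iqr_fences(df, k)
    return df.clip(lower=lower, upper=upper, axis=1)


def impute_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """'imputed': blank out outliers, then fill them (hypothetically) with the column median."""
    masked = df.where(inside_fences(df, k))
    return masked.fillna(masked.median())


def drop_outlier_rows(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """'dropped': remove every row that has an outlier in any column."""
    return df[inside_fences(df, k).all(axis=1)]


demo = pd.DataFrame({"b1": [1.0, 2.0, 3.0, 4.0, 100.0],
                     "b2": [0.1, 0.2, 0.3, 0.4, 0.5]})
print(clip_outliers(demo)["b1"].tolist())    # [1.0, 2.0, 3.0, 4.0, 7.0]
print(len(drop_outlier_rows(demo)))          # 4
print(impute_outliers(demo)["b1"].tolist())  # [1.0, 2.0, 3.0, 4.0, 2.5]
```

In the demo, column `b1` has Q1 = 2 and Q3 = 4, so its fences are -1 and 7; the value 100 is clipped to 7, dropped with its row, or replaced by the median of the remaining values (2.5), depending on the treatment.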

Example: `img_robust_indices` refers to vegetation indices from the UAV images treated with the robust scaler. 

### Setting Working Directory

In [None]:
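"""
Sets the working directory to the repository root ("../lecoscopy/") so that the
relative data paths and the local Functions package resolve.
"""
import os

os.chdir('../')  # move up one level from the notebook folder
print(os.getcwd())

# Project helper module in Functions/ under the repository root
import Functions.spectral_operations as fncs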

### Defining Data Locations

In [None]:
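import pandas as pd

# Ground-based spectral library
grd_base_path = pd.read_csv("Data/C_001_SC3_Cleaned_SpectralLib.csv", header=0, low_memory=False)
print(len(grd_base_path))
# NOTE: read_csv parses empty cells as NaN by default, so comparing with "" may select no rows;
# .isna() / .notna() are the usual way to test for missing values.
grd_speclib = grd_base_path[grd_base_path["Functional_group1"] == ""]
print(len(grd_speclib))
grd_base_path.head()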
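The empty-string filter above depends on how pandas reads blank cells. A small self-contained check, using a hypothetical three-row CSV (the scan IDs and group names are made up), shows that `read_csv` turns blank fields into `NaN` by default, so `== ""` selects nothing while `.isna()` and `.notna()` behave as expected:

```python
import io

import pandas as pd

# Hypothetical extract: one row has a blank Functional_group1 cell.
csv_text = "ScanID,Functional_group1\nS1,Lichen\nS2,\nS3,Moss\n"
lib = pd.read_csv(io.StringIO(csv_text))

# The blank cell is parsed as NaN, not "", so the equality test matches no rows.
n_equal_empty = int((lib["Functional_group1"] == "").sum())

# Missing-value tests select the unlabelled and labelled rows reliably.
unlabelled = lib[lib["Functional_group1"].isna()]
labelled = lib[lib["Functional_group1"].notna()]
print(n_equal_empty, len(unlabelled), len(labelled))  # 0 1 2
```

If blank cells really should stay as empty strings, passing `keep_default_na=False` to `read_csv` disables the default NaN markers.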


In [None]:
img_speclib = pd.read_csv("Data/PFT_Image_SpectralLib_Clean.csv", header=0, low_memory=False)
img_speclib.head()



In [None]:
# Keep only the spectral band columns by dropping the metadata columns
grd_bands = grd_base_path.drop(columns=['Unnamed: 0',
        'ScanID',
        'Area',
        'Code_name',
        'Species_name',
        'Functional_group1',
        'Functional_group2',
        'Species_name_Freq',
        'Functional_group1_Freq',
        'Functional_group2_Freq',
        'Genus',
        'Version',
        'File.Name',
        'Instrument',
        'Detectors',
        'Measurement',
        'Date',
        'Time',
        'Battery.Voltage',
        'Averages',
        'Integration1',
        'Integration2',
        'Integration3',
        'Dark.Mode',
        'Foreoptic',
        'Radiometric.Calibration',
        'Units',
        'Latitude',
        'Longitude',
        'Altitude',
        'GPS.Time',
        'Satellites',
        'Calibrated.Reference.Correction.File',
        'Channels',
        'ScanNum'])

# Same for the image library: drop metadata, keep the band columns
img_bands = img_speclib.drop(columns=[
        "Unnamed: 0",
        "UID",
        "ScanNum",
        "sample_name",
        "PFT",
        "FncGrp1"])
grd_speclib.head()
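Listing every metadata column works, but the list has to be updated whenever a column is added or renamed. An alternative sketch keeps the columns whose names parse as numbers; it assumes (hypothetically, since the band headers are not shown here) that band columns are named by their wavelength in nm, and `split_bands` is an illustrative helper, not part of `spectral_operations`:

```python
import pandas as pd


def split_bands(df: pd.DataFrame):
    """Split a spectral library into (metadata, bands), assuming band columns are
    named by wavelength, e.g. "400" or "401.5" (hypothetical naming)."""
    # Column names that cannot be read as numbers become NaN and count as metadata.
    wavelengths = pd.to_numeric(pd.Series(df.columns, dtype=str), errors="coerce")
    is_band = wavelengths.notna().to_numpy()
    return df.loc[:, ~is_band], df.loc[:, is_band]


demo = pd.DataFrame({
    "ScanID": ["S1", "S2"],
    "Functional_group1": ["Moss", "Lichen"],
    "400": [0.05, 0.06],
    "401.5": [0.07, 0.08],
})
meta, bands = split_bands(demo)
print(list(meta.columns), list(bands.columns))  # ['ScanID', 'Functional_group1'] ['400', '401.5']
```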

### Getting vegetation indices

In [None]:
img_indices = fncs.get_vegetation_indices()
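The signature and index set of `fncs.get_vegetation_indices` are not shown here. As an illustration of a vegetation index computed from band columns, the sketch below computes NDVI, (NIR − Red) / (NIR + Red), from the bands nearest 670 nm and 800 nm. The wavelength-named columns, the helpers `nearest_band` and `ndvi`, and the 670/800 nm choices are all assumptions, not the behaviour of `spectral_operations`:

```python
import numpy as np
import pandas as pd


def nearest_band(bands: pd.DataFrame, target_nm: float) -> str:
    """Hypothetical helper: the column whose numeric name is closest to target_nm."""
    wavelengths = np.array([float(c) for c in bands.columns])
    return bands.columns[int(np.argmin(np.abs(wavelengths - target_nm)))]


def ndvi(bands: pd.DataFrame, red_nm: float = 670.0, nir_nm: float = 800.0) -> pd.Series:
    """NDVI = (NIR - Red) / (NIR + Red) using the bands nearest the chosen wavelengths."""
    red = bands[nearest_band(bands, red_nm)]
    nir = bands[nearest_band(bands, nir_nm)]
    return ((nir - red) / (nir + red)).rename("NDVI")


demo = pd.DataFrame({"650": [0.04, 0.09], "670": [0.05, 0.10], "800": [0.45, 0.30]})
print(ndvi(demo).round(3).tolist())  # [0.8, 0.5]
```

Picking the nearest band keeps the sketch usable when the sensor's band centres do not fall exactly on the target wavelengths.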