# LecoSpec Data Munging

In [1]:
#source("Functions/lecospectR.R", echo = FALSE)
packageVersion("tidyverse")

[1] ‘2.0.0’

In [2]:
# notebooks use their location as their working directory, so
# if we are in a subfolder, move to the main folder.  
# This however can safely be run multiple times
#setwd(M:/lecospec/lecospec)
if(!dir.exists("Functions/")){
    setwd("../../")
    if(!dir.exists("Functions")){
        setwd("M:/lecospec/lecospec/")
    }
}
source("Functions/lecospectR.R", echo = FALSE)



Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: compiler

Loading required package: raster

Loading required package: sp


Attaching package: ‘raster’


T

## Notation

Throughout the notebook, variables starting with `img_` hold UAV image-based information (data, file paths, etc.).  Similarly, variables beginning with `grd_` relate to data collected on the ground.  

Also, some other naming conventions for variables with data transformations:
* `robust` in a variable name refers to data centered on the median and scaled by the inter-quartile range (à la scikit-learn's `RobustScaler`)
* `minmax` (and its ilk) are min-max scaled data, i.e. scaled to the interval [0,1] by subtracting the minimum and dividing by the range.
* `standard(ized)` refers to data treated with the z-score transform: centered on the mean and scaled by the standard deviation (like scikit-learn's `StandardScaler`)
* `corrected` means that a linear transformation has been applied to account for differences in sensor calibration.
* `raw` refers to having no transformations applied
* `clipped` means that outliers have been clipped to the upper and lower fence values based on the Inter-Quartile Range method. 
* `imputed` means that outliers have been removed and imputed
* `dropped` means that dataframe rows containing outliers have been removed

Example: `img_robust_indices` refers to vegetation indices from the UAV images treated with the robust scaler. 
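
As a rough illustration of the three scaling conventions named above, here is a minimal base R sketch applied to a single numeric vector. The helper names (`robust_scale`, `min_max_scale`, `standard_scale`) are hypothetical and are not the lecospectR functions; they only mirror the definitions given in the list.

```r
# Hypothetical helpers illustrating the three scalers on one numeric vector.
# These are sketches of the definitions above, not the lecospectR implementations.
robust_scale <- function(x) {
  # center on the median, scale by the inter-quartile range
  (x - median(x, na.rm = TRUE)) / IQR(x, na.rm = TRUE)
}

min_max_scale <- function(x) {
  # subtract the minimum, divide by the range
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

standard_scale <- function(x) {
  # z-score: center on the mean, scale by the standard deviation
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(1, 2, 3, 4, 100)   # toy column with one extreme value
robust_scale(x)
min_max_scale(x)
standard_scale(x)
```

Note that all three divide by a spread measure, so a constant column would produce a division by zero; a real implementation would need to guard against that case.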

## Define data locations


In [3]:
# spectral library
grd_base_path <- "./Output/C_001_SC3_Cleaned_SpectralLib.csv"
grd_speclib <- read.csv(grd_base_path, header = TRUE)
#grd_index_path <- ./Data/D_002_SpecLib_Derivs.csv
#grd_indices <- read.csv(grd_index_path)
# this data has some lines that have no labels, so we remove them 
grd_speclib <- grd_speclib[!is.na(grd_speclib$Functional_group1),]
head(grd_speclib)

Unnamed: 0_level_0,X,ScanID,Area,Code_name,Species_name,Functional_group1,Functional_group2,Species_name_Freq,Functional_group1_Freq,Functional_group2_Freq,⋯,Radiometric.Calibration,Units,Latitude,Longitude,Altitude,GPS.Time,Satellites,Calibrated.Reference.Correction.File,Channels,ScanNum
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<int>,<int>,<int>,⋯,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<chr>,<chr>,<chr>,<int>,<int>
1,1,aleoch_Murph_061,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6,453,118,⋯,,,,,,,,,,
2,2,aleoch_Murph_063,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6,453,118,⋯,,,,,,,,,,
3,3,aleoch_Murph_064,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6,453,118,⋯,,,,,,,,,,
4,4,aleoch_Murph_065,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6,453,118,⋯,,,,,,,,,,
5,5,aleoch_Murph_066,Murphy,aleoch,Alectoria ochroleuca,Lichen,LightTerrestrialMacrolichen,6,453,118,⋯,,,,,,,,,,
6,6,alnfru_00003,Yukon_Delta,alnfru,Alnus sp.,ShrubDecid,ShrubAlder,82,360,82,⋯,,,,,,,,,,


In [4]:
img_base_path <- "Data/Ground_Validation/PFT_image_spectra/PFT_Image_SpectralLib_Clean.csv"
img_speclib <- read.csv(img_base_path)

# currently not using the old pre-processing scheme; the processing is done here instead.
#img_index_path <- Data/D_002_Image_SpecLib_Derivs.csv
#img_speclib <- read.csv(img_base_path)
head(img_speclib)

Unnamed: 0_level_0,X,UID,ScanNum,sample_name,PFT,FncGrp1,Site,X398,X399,X400,⋯,X990,X991,X992,X993,X994,X995,X996,X997,X998,X999
Unnamed: 0_level_1,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,1,BisonGulchPFTsBetula1,1,spec_1,Betula,TreeBroadleaf,BisonGulch,0.05814769,0.05926529,0.06028869,⋯,0.6815182,0.681166,0.689047,0.7040298,0.7249807,0.7507566,0.7801884,0.8121027,0.8453261,0.8786852
2,2,BisonGulchPFTsBetula1,1,spec_2,Betula,TreeBroadleaf,BisonGulch,0.04456014,0.04778814,0.05079318,⋯,0.6706666,0.6683159,0.6786394,0.7000307,0.7308801,0.7695067,0.8140391,0.8625739,0.9132079,0.9640378
3,3,BisonGulchPFTsBetula1,1,spec_3,Betula,TreeBroadleaf,BisonGulch,0.03929324,0.04265593,0.04557066,⋯,0.5152525,0.5091915,0.5178217,0.5395294,0.5726982,0.6156166,0.6663192,0.7227978,0.7830447,0.845052
4,4,BisonGulchPFTsBetula1,1,spec_4,Betula,TreeBroadleaf,BisonGulch,0.13230228,0.11122692,0.09129034,⋯,0.5120581,0.511388,0.5348292,0.5745538,0.6227243,0.6723311,0.718586,0.7570701,0.7833644,0.7930498
5,5,BisonGulchPFTsBetula1,1,spec_5,Betula,TreeBroadleaf,BisonGulch,0.05211388,0.05565497,0.05878525,⋯,0.6863419,0.6680365,0.6509006,0.634445,0.6181806,0.6017555,0.5851848,0.5685449,0.5519121,0.5353626
6,6,BisonGulchPFTsBetula1,1,spec_6,Betula,TreeBroadleaf,BisonGulch,0.06955397,0.06788242,0.06631141,⋯,0.7354495,0.7371508,0.7445194,0.7567953,0.7732173,0.7930235,0.8154512,0.8397375,0.8651196,0.8908347


Okay, there are some metadata columns that should not be there for the next step, so let's remove them with `subset` (after first checking the site names and splitting the image library by site).

In [5]:
RawUID<- img_speclib %>% 
  dplyr::select(UID) %>% as.data.frame() #%>%

SiteNames<-str_split(RawUID[,1], "PFT") %>% 
  as.data.frame() %>% 
  t %>% 
  as.data.frame() %>%
  dplyr::rename(Site = V1) %>% 
  dplyr::select(Site)
print(unique(SiteNames))

                                      Site
c..BisonGulch....sBetula1..     BisonGulch
c..Chatanika....sBetula_nana1..  Chatanika
c..EightMile....sBetula_nana1..  EightMile
c..Bonanza....sLarix1..            Bonanza
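
The same site prefix can probably be recovered more directly with a regular expression. The sketch below assumes every UID follows the pattern seen above (site name, then `PFT`, then the sample name); the example UIDs are written out by hand in that pattern.

```r
# Example UIDs in the same pattern as the UID column shown above
uid <- c("BisonGulchPFTsBetula1", "ChatanikaPFTsBetula_nana1", "BonanzaPFTsLarix1")

# Remove the first "PFT" and everything after it, leaving the site prefix
site <- sub("PFT.*$", "", uid)
unique(site)
```

This avoids the transpose step and the awkward row names in the printed result.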


In [6]:
bg_speclib <- img_speclib[img_speclib$Site == "BisonGulch",]
ch_speclib <- img_speclib[img_speclib$Site == "Chatanika",]
em_speclib <- img_speclib[img_speclib$Site == "EightMile",]
bz_speclib <- img_speclib[img_speclib$Site == "Bonanza",]

In [7]:
unique(bz_speclib$FncGrp1)
unique(bg_speclib$FncGrp1)
unique(em_speclib$FncGrp1)
unique(ch_speclib$FncGrp1)

In [8]:
img_bands <- subset(
    img_speclib, 
    select=-c(
        X,
    	UID,
        ScanNum,
    	sample_name,
    	PFT,
    	FncGrp1,
        Site
    ))


grd_bands <- subset(
    grd_speclib, 
    select=-c(
        X,
        ScanID,
        Area,
        Code_name,
        Species_name,
        Functional_group1,
        Functional_group2,
        Species_name_Freq,
        Functional_group1_Freq,
        Functional_group2_Freq,
        Genus,
        Version,
        File.Name,
        Instrument,
        Detectors,
        Measurement,
        Date,
        Time,
        Battery.Voltage,
        Averages,
        Integration1,
        Integration2,
        Integration3,
        Dark.Mode,
        Foreoptic,
        Radiometric.Calibration,
        Units,
        Latitude,
        Longitude,
        Altitude,
        GPS.Time,
        Satellites,
        Calibrated.Reference.Correction.File,
        Channels,
        ScanNum
    )
)

bg_bands <- subset(
    bg_speclib, 
    select=-c(
        X,
    	UID,
        ScanNum,
    	sample_name,
    	PFT,
    	FncGrp1,
        Site
    ))


em_bands <- subset(
    em_speclib, 
    select=-c(
        X,
    	UID,
        ScanNum,
    	sample_name,
    	PFT,
    	FncGrp1,
        Site
    ))
    
bz_bands <- subset(
    bz_speclib, 
    select=-c(
        X,
    	UID,
        ScanNum,
    	sample_name,
    	PFT,
    	FncGrp1,
        Site
    ))
    
ch_bands <- subset(
    ch_speclib, 
    select=-c(
        X,
    	UID,
        ScanNum,
    	sample_name,
    	PFT,
    	FncGrp1,
        Site
    ))
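
The repeated `subset` calls above all drop the same metadata columns. One way to avoid the repetition is a small helper that takes the column names as a character vector; this is only a sketch, and `drop_columns` and `toy_speclib` are hypothetical names introduced for the illustration.

```r
# Hypothetical helper: drop named metadata columns, ignoring names that are absent
drop_columns <- function(df, cols) {
  df[, !(colnames(df) %in% cols), drop = FALSE]
}

# Metadata columns of the image-based spectral library, as listed above
img_meta_cols <- c("X", "UID", "ScanNum", "sample_name", "PFT", "FncGrp1", "Site")

# Toy data frame standing in for one site's spectral library
toy_speclib <- data.frame(
  X = 1:2,
  UID = c("a", "b"),
  ScanNum = c("1", "1"),
  sample_name = c("spec_1", "spec_2"),
  PFT = "Betula",
  FncGrp1 = "TreeBroadleaf",
  Site = "BisonGulch",
  X398 = c(0.058, 0.045),
  X399 = c(0.059, 0.048)
)

toy_bands <- drop_columns(toy_speclib, img_meta_cols)
colnames(toy_bands)
```

The same helper could then be applied to each per-site library with the one shared vector of column names.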

Calculate the vegetation indices from the spectral libraries. It's easy with lecospectR!

Note that the image-based spectra are normalized to the range zero to one, while the ground spectra are on the range zero to one hundred.  

In [10]:
img_indices <- get_vegetation_indices(img_bands, NULL) # the second argument really ought to default to NULL
grd_indices <- get_vegetation_indices(grd_bands, NULL)
# use the band-only data frames (metadata removed), as for img and grd above
bg_indices <- get_vegetation_indices(bg_bands, NULL)
ch_indices <- get_vegetation_indices(ch_bands, NULL)
bz_indices <- get_vegetation_indices(bz_bands, NULL)
em_indices <- get_vegetation_indices(em_bands, NULL)

In [11]:
write.csv(img_indices, file="Data/gs/x_train/img_indices_only.csv")

write.csv(grd_indices, file="Data/gs/x_train/grd_indices_only.csv")

write.csv(bg_indices, file = "Data/gs/x_train/bison_gulch_indices.csv")

write.csv(ch_indices, file = "Data/gs/x_train/chatanika_indices.csv")

write.csv(em_indices, file = "Data/gs/x_train/eight_mile_indices.csv")

write.csv(bz_indices, file = "Data/gs/x_train/bonanza_indices.csv")

In [12]:
head(img_indices)

Unnamed: 0_level_0,Boochs,Boochs2,CARI,Carter,Carter2,Carter3,Carter4,Carter5,Carter6,CI,⋯,TCARI,TCARIOSAVI,TCARI2,TCARI2OSAVI2,TGI,TVI,Vogelmann,Vogelmann2,Vogelmann3,Vogelmann4
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.01118062,0.01252303,0.5520621,2.540064,0.2065954,0.1994095,0.4599336,1.64092,0.1548045,1.212547,⋯,0.2615574,0.3456065,0.4178572,0.9235753,8.286956,35.80558,1.439152,-0.08239895,1.064248,-0.0900997
2,0.0111179,0.01355575,0.5766599,2.62515,0.2065916,0.2070232,0.4513357,1.656177,0.1700914,1.175201,⋯,0.2831688,0.3720026,0.4496971,0.9654107,9.286506,38.17785,1.446272,-0.0868058,1.135065,-0.09518428
3,0.0108476,0.01378545,0.5614974,3.251237,0.19875,0.202689,0.4400645,1.585012,0.1803148,1.173642,⋯,0.2865432,0.3797346,0.4659473,0.9777083,10.063824,38.92187,1.454211,-0.07940521,1.153111,-0.0877541
4,0.01181043,0.01221932,0.5747244,2.602032,0.1809264,0.1767072,0.4379739,1.650827,0.1572204,1.084016,⋯,0.2773505,0.3524937,0.4235281,0.8672684,8.906298,40.04949,1.449666,-0.10812234,1.245379,-0.11775728
5,0.01119492,0.01259771,0.4934165,2.215177,0.1763305,0.1782588,0.4198506,1.671405,0.1517899,1.214389,⋯,0.253067,0.3235553,0.3943958,0.7973731,8.621081,37.23711,1.486982,-0.09404361,1.060612,-0.10349363
6,0.01124955,0.01239871,0.5616888,2.578908,0.2044972,0.192817,0.4591589,1.658333,0.1515471,1.234711,⋯,0.2609731,0.3423476,0.4145336,0.9106479,8.084553,36.27321,1.438798,-0.08098956,1.066427,-0.08849283


This is actually enough to start training models, since we have the vegetation indices. Instead of doing that, let's transform the data and write it to file.  Then we will proceed to creating the model corrections, etc.

In [13]:
img_resampled_bands <- resample_df(img_bands, drop_existing=TRUE)
# the 0.01 factor brings the 0-100 ground spectra onto the 0-1 image scale (a crude correction)
grd_resampled_bands <- resample_df(0.01*grd_bands, drop_existing=TRUE)
bg_resampled_bands <- resample_df(bg_bands, drop_existing=TRUE)
ch_resampled_bands <- resample_df(ch_bands, drop_existing=TRUE)
bz_resampled_bands <- resample_df(bz_bands, drop_existing=TRUE)
em_resampled_bands <- resample_df(em_bands, drop_existing=TRUE)

head(img_resampled_bands)
head(grd_resampled_bands)

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.



Unnamed: 0_level_0,X402.593_5nm,X407.593_5nm,X412.593_5nm,X417.593_5nm,X422.593_5nm,X427.593_5nm,X432.593_5nm,X437.593_5nm,X442.593_5nm,X447.593_5nm,⋯,X947.593_5nm,X952.593_5nm,X957.593_5nm,X962.593_5nm,X967.593_5nm,X972.593_5nm,X977.593_5nm,X982.593_5nm,X987.593_5nm,X992.593_5nm
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.06212404,0.06180653,0.05693654,0.05199935,0.04943319,0.04810434,0.04814491,0.04839997,0.04902924,0.05122705,⋯,0.7765625,0.6580751,0.5365841,0.5196822,0.6809964,0.8697628,0.9139739,0.8369247,0.7141234,0.6972216
2,0.05749476,0.06297455,0.05918532,0.05290789,0.05137537,0.0521278,0.05244476,0.05221252,0.05233717,0.05398912,⋯,0.7950162,0.6789633,0.605308,0.6241,0.7584388,0.9404683,1.0002115,0.9083915,0.7245195,0.6902076
3,0.04999572,0.04854917,0.04249125,0.03980269,0.04341043,0.04719443,0.04936688,0.05086627,0.05222365,0.05477898,⋯,0.7504691,0.7452264,0.6078203,0.5917521,0.8167859,0.9920923,1.0107302,0.8651387,0.5893411,0.529233
4,0.05353346,0.04877967,0.05867343,0.04561053,0.04907692,0.04761848,0.04514567,0.04761746,0.05044233,0.04838392,⋯,0.8047186,0.7281673,0.4420411,0.4745701,0.8036229,0.8446532,0.8059575,0.8769885,0.6174775,0.5569951
5,0.06426926,0.06406918,0.05702433,0.05276894,0.04848679,0.0442458,0.04301885,0.04311201,0.04557514,0.04929684,⋯,0.7491398,0.737952,0.6675188,0.6551306,0.7406565,0.8998303,0.9689108,0.8887908,0.7388946,0.6410263
6,0.06286275,0.05956411,0.05707863,0.05208934,0.04850874,0.04746565,0.04748539,0.04717902,0.04801729,0.05065653,⋯,0.7746607,0.676634,0.5459924,0.5157587,0.6739425,0.8702676,0.9066815,0.8294759,0.7525363,0.7513382


Unnamed: 0_level_0,X402.593_5nm,X407.593_5nm,X412.593_5nm,X417.593_5nm,X422.593_5nm,X427.593_5nm,X432.593_5nm,X437.593_5nm,X442.593_5nm,X447.593_5nm,⋯,X947.593_5nm,X952.593_5nm,X957.593_5nm,X962.593_5nm,X967.593_5nm,X972.593_5nm,X977.593_5nm,X982.593_5nm,X987.593_5nm,X992.593_5nm
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.04786391,0.04478547,0.04214955,0.04121219,0.04261843,0.04609276,0.05056049,0.05609938,0.06384817,0.0735287,⋯,0.2659656,0.267123,0.2678679,0.2684674,0.2692454,0.2704128,0.2720405,0.2738601,0.275566,0.2769153
2,0.04820802,0.04623885,0.04461647,0.04424838,0.04589012,0.04933364,0.05350729,0.05836299,0.06480323,0.07251967,⋯,0.2141355,0.2152685,0.216214,0.2170847,0.2180113,0.2190688,0.2202912,0.2216226,0.2229784,0.2242359
3,0.05849977,0.05496151,0.05154779,0.04986053,0.05175844,0.05731791,0.06499972,0.07520754,0.09011172,0.10881299,⋯,0.3184974,0.3190839,0.3198678,0.3205158,0.3208263,0.3212036,0.3219709,0.3229237,0.3238387,0.3246974
4,0.06252788,0.0581724,0.0539931,0.05201525,0.05424306,0.06092052,0.07068518,0.08364273,0.10123249,0.12223676,⋯,0.3418603,0.3421727,0.3423974,0.3425062,0.3425074,0.3425434,0.3427278,0.3430264,0.3434254,0.344041
5,0.05055413,0.047328,0.04457956,0.04357744,0.04533789,0.04989059,0.05639766,0.06534486,0.07839488,0.09474391,⋯,0.2914092,0.2920809,0.2927031,0.2931446,0.2934322,0.2940614,0.2953721,0.2970353,0.2986498,0.2999403
6,0.06056797,0.05514715,0.05001802,0.04665174,0.04565842,0.04598097,0.04511881,0.04293868,0.04154127,0.04142498,⋯,0.4787045,0.4774562,0.4759385,0.4748448,0.4745815,0.4744645,0.4739159,0.4731926,0.4726754,0.4726835
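
To give a feel for what spline-based resampling does, here is a minimal base R sketch that interpolates one synthetic spectrum from 1 nm bands onto 5 nm band centres. It is not the `resample_df` implementation: the synthetic reflectance curve and the choice of a natural spline are assumptions made for the illustration, while the band centres are taken from the column names in the output above.

```r
# Sketch only: resample one synthetic spectrum from 1 nm bands to 5 nm bands
wavelengths <- 398:999                       # 1 nm bands, matching X398 ... X999

# Synthetic reflectance with a red-edge-like rise (made up for this example)
reflectance <- 0.05 + 0.5 * plogis((wavelengths - 710) / 15)

# 5 nm band centres, read off the resampled column names (X402.593_5nm, ...)
new_bands <- seq(402.593, 992.593, by = 5)

# stats::spline evaluates an interpolating spline at the requested wavelengths
resampled <- spline(x = wavelengths, y = reflectance,
                    xout = new_bands, method = "natural")

head(data.frame(wavelength = resampled$x, reflectance = resampled$y))
```

As the messages above warn, predicting values with a spline partially smooths the spectra, so narrow features can be attenuated.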


In [14]:
img_raw_with_na <- cbind(img_resampled_bands, img_indices)
grd_raw_with_na <- cbind(grd_resampled_bands, grd_indices)
bg_raw_with_na <- cbind(bg_resampled_bands, bg_indices)
ch_raw_with_na <- cbind(ch_resampled_bands, ch_indices)
em_raw_with_na <- cbind(em_resampled_bands, em_indices)
bz_raw_with_na <- cbind(bz_resampled_bands, bz_indices)

In [15]:
img_raw <- impute_spectra(img_raw_with_na)
grd_raw <- impute_spectra(inf_to_na(grd_raw_with_na))# note: also converting an Inf to NA first (likely a division by 0 in a vegetation index)
bg_raw <- impute_spectra(bg_raw_with_na)
bz_raw <- impute_spectra(bz_raw_with_na)
em_raw <- impute_spectra(em_raw_with_na)
ch_raw <- impute_spectra(ch_raw_with_na)
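
The two steps used above (turning `Inf` into `NA`, then imputing) can be sketched in base R as follows. The helper names are hypothetical, and imputing with the column median is an assumption made for this example; the notebook does not say which method `impute_spectra` uses.

```r
# Hypothetical helpers, not the lecospectR implementations
replace_inf_with_na <- function(df) {
  # Inf and -Inf become NA so that later summaries are not distorted
  df[] <- lapply(df, function(col) {
    col[is.infinite(col)] <- NA
    col
  })
  df
}

impute_with_median <- function(df) {
  # assumption: fill each missing value with its column median
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

toy <- data.frame(ndvi = c(0.2, Inf, 0.6), band = c(NA, 0.1, 0.3))
cleaned <- impute_with_median(replace_inf_with_na(toy))
cleaned
```

Converting `Inf` first matters because a median or mean computed over a column containing `Inf` can itself be infinite.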

In [16]:
write.csv(bg_raw, file="Data/gs/x_train/bison_gulch.csv")
write.csv(as.data.frame(bg_speclib$FncGrp1), file="Data/gs/y_train/bison_gulch.csv")
write.csv(bz_raw, file="Data/gs/x_train/bonanza.csv")
write.csv(as.data.frame(bz_speclib$FncGrp1), file="Data/gs/y_train/bonanza.csv")
write.csv(ch_raw, file="Data/gs/x_train/chatanika.csv")
write.csv(as.data.frame(ch_speclib$FncGrp1), file="Data/gs/y_train/chatanika.csv")
write.csv(em_raw, file="Data/gs/x_train/eight_mile.csv")
write.csv(as.data.frame(em_speclib$FncGrp1), file="Data/gs/y_train/eight_mile.csv")

Apply the outlier transforms

In [17]:
grd_clipped <- clip_outliers(grd_raw)
grd_imputed <- impute_outliers_and_na(grd_raw)
grd_dropped <- grd_raw[detect_outliers_columnwise(grd_raw),]
img_clipped <- clip_outliers(img_raw)
img_imputed <- impute_outliers_and_na(img_raw)
img_dropped <- img_raw[detect_outliers_columnwise(img_raw),]
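
For reference, the IQR fence method behind `clipped` and `dropped` can be sketched in base R like this. The helper names and the fence multiplier `k = 1.5` are assumptions (1.5 is the conventional Tukey value; the notebook does not state which multiplier lecospectR uses), and the row selector simply mimics how `detect_outliers_columnwise` is used above, to index the rows that are kept.

```r
# Hypothetical helper: clip one numeric vector to its IQR fences
clip_to_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence_width <- k * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence_width), q[2] + fence_width)
}

# Hypothetical helper: TRUE for rows with no value outside the fences in any column
rows_without_outliers <- function(df, k = 1.5) {
  inside <- vapply(df, function(col) {
    q <- quantile(col, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    fence_width <- k * (q[2] - q[1])
    col >= q[1] - fence_width & col <= q[2] + fence_width
  }, logical(nrow(df)))
  rowSums(!inside) == 0
}

x <- c(1, 2, 3, 4, 100)
clipped <- clip_to_fences(x)

toy <- data.frame(a = x, b = c(5, 6, 7, 8, 9))
keep <- rows_without_outliers(toy)
toy[keep, ]
```

Clipping keeps every row and only moves extreme values to the fence, whereas dropping shortens the data frame, which is why the labels must be subset with the same selector.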

Now the center/scale transforms

In [18]:
grd_raw_robust <- columnwise_robust_scale(grd_raw)
img_raw_robust <- columnwise_robust_scale(img_raw)
grd_raw_minmax <- columnwise_min_max_scale(grd_raw)
img_raw_minmax <- columnwise_min_max_scale(img_raw)
grd_raw_standard <- standardize_df(grd_raw)
img_raw_standard <- standardize_df(img_raw)

grd_clipped_robust <- columnwise_robust_scale(grd_clipped)
grd_imputed_robust <- columnwise_robust_scale(grd_imputed)
grd_dropped_robust <- columnwise_robust_scale(grd_dropped)
img_clipped_robust <- columnwise_robust_scale(img_clipped)
img_imputed_robust <- columnwise_robust_scale(img_imputed)
img_dropped_robust <- columnwise_robust_scale(img_dropped)

grd_clipped_minmax <- columnwise_min_max_scale(grd_clipped)
grd_imputed_minmax <- columnwise_min_max_scale(grd_imputed)
grd_dropped_minmax <- columnwise_min_max_scale(grd_dropped)
img_clipped_minmax <- columnwise_min_max_scale(img_clipped)
img_imputed_minmax <- columnwise_min_max_scale(img_imputed)
img_dropped_minmax <- columnwise_min_max_scale(img_dropped)

grd_clipped_standard <- standardize_df(grd_clipped)
grd_imputed_standard <- standardize_df(grd_imputed)
grd_dropped_standard <- standardize_df(grd_dropped)
img_clipped_standard <- standardize_df(img_clipped)
img_imputed_standard <- standardize_df(img_imputed)
img_dropped_standard <- standardize_df(img_dropped)


Now, let's save all these data to disk

In [19]:
BASE_PATH <- "Data/gs/"
X_TRAIN_PATH <- paste0(BASE_PATH, "x_train/")
Y_TRAIN_PATH <- paste0(BASE_PATH, "y_train/")

X_TEST_PATH <- paste0(BASE_PATH, "x_test/")
Y_TEST_PATH <- paste0(BASE_PATH, "y_test/")

if(!dir.exists(BASE_PATH)){
    dir.create(BASE_PATH)
}
if(!dir.exists(X_TRAIN_PATH)){
    dir.create(X_TRAIN_PATH)
}
if(!dir.exists(Y_TRAIN_PATH)){
    dir.create(Y_TRAIN_PATH)
}
if(!dir.exists(X_TEST_PATH)){
    dir.create(X_TEST_PATH)
}
if(!dir.exists(Y_TEST_PATH)){
    dir.create(Y_TEST_PATH)
}
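
The directory checks above could likely be condensed into a loop. This sketch uses a temporary directory as a stand-in for `Data/gs/` so that it can be run anywhere; `recursive = TRUE` creates missing parent folders and `showWarnings = FALSE` keeps the call quiet when a folder already exists.

```r
# Temporary stand-in for "Data/gs/" so the example does not touch real data
base_path <- file.path(tempdir(), "gs")
subdirs <- c("x_train", "y_train", "x_test", "y_test")

for (d in file.path(base_path, subdirs)) {
  # safe to run repeatedly: an existing directory is left untouched
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

all(dir.exists(file.path(base_path, subdirs)))
```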


In [20]:
write.csv(grd_clipped, file=paste0(X_TRAIN_PATH, "grd_clipped_raw.csv"))
write.csv(grd_clipped_minmax, file=paste0(X_TRAIN_PATH, "grd_clipped_minmax.csv"))
write.csv(grd_clipped_robust, file=paste0(X_TRAIN_PATH, "grd_clipped_robust.csv"))
write.csv(grd_clipped_standard, file=paste0(X_TRAIN_PATH, "grd_clipped_standard.csv"))

write.csv(grd_imputed, file=paste0(X_TRAIN_PATH, "grd_imputed_raw.csv"))
write.csv(grd_imputed_minmax, file=paste0(X_TRAIN_PATH, "grd_imputed_minmax.csv"))
write.csv(grd_imputed_robust, file=paste0(X_TRAIN_PATH, "grd_imputed_robust.csv"))
write.csv(grd_imputed_standard, file=paste0(X_TRAIN_PATH, "grd_imputed_standard.csv"))

write.csv(grd_dropped, file=paste0(X_TRAIN_PATH, "grd_dropped_raw.csv"))
write.csv(grd_dropped_minmax, file=paste0(X_TRAIN_PATH, "grd_dropped_minmax.csv"))
write.csv(grd_dropped_robust, file=paste0(X_TRAIN_PATH, "grd_dropped_robust.csv"))
write.csv(grd_dropped_standard, file=paste0(X_TRAIN_PATH, "grd_dropped_standard.csv"))

write.csv(grd_raw, file=paste0(X_TRAIN_PATH, "grd_raw_raw.csv"))
write.csv(grd_raw_minmax, file=paste0(X_TRAIN_PATH, "grd_raw_minmax.csv"))
write.csv(grd_raw_robust, file=paste0(X_TRAIN_PATH, "grd_raw_robust.csv"))
write.csv(grd_raw_standard, file=paste0(X_TRAIN_PATH, "grd_raw_standard.csv"))
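
Because each file name mirrors a variable name, the writes above could also be driven by a named list. A minimal sketch, with toy data frames and a temporary output folder standing in for the real data and `X_TRAIN_PATH`:

```r
# Temporary output folder standing in for X_TRAIN_PATH
out_dir <- file.path(tempdir(), "x_train_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Toy data frames; the list names become the file names
datasets <- list(
  grd_clipped_raw = data.frame(a = 1:3),
  grd_clipped_minmax = data.frame(a = c(0, 0.5, 1))
)

for (name in names(datasets)) {
  write.csv(datasets[[name]], file = file.path(out_dir, paste0(name, ".csv")))
}

list.files(out_dir)
```

Keeping the name and the data together in one list makes it harder to write a data frame under the wrong file name.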

In [21]:
write.csv(grd_raw[,colnames(grd_indices)], file=paste0(X_TRAIN_PATH, "grd_indices_only.csv"))


## Labels for the above Data

In [22]:
img_targets <- img_speclib$FncGrp1 %>% as.factor()
grd_targets <- grd_speclib$Functional_group1 %>% as.factor()

In [23]:
write.csv(img_targets, file="Data/gs/y_train/img_indices_only.csv")
write.csv(grd_targets, file="Data/gs/y_train/grd_indices_only.csv")

In [24]:
img_targets %>% table()

.
       Abiotic      Graminoid         Lichen           Moss     ShrubDecid 
           797            145             97             92            107 
ShrubEvergreen  TreeBroadleaf    TreeConifer 
           137            100           2401 

In [25]:
grd_targets %>% table()

.
       Abiotic           Forb      Graminoid         Lichen           Moss 
            94            158            112            417            122 
    ShrubDecid ShrubEvergreen  TreeBroadleaf    TreeConifer 
           297            105             21             17 

In [26]:
# drop entries with outliers to match training data
img_targets_dropped <- img_targets[detect_outliers_columnwise(img_raw)]
grd_targets_dropped <- grd_targets[detect_outliers_columnwise(grd_raw)]

In [27]:
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_clipped_raw.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_clipped_minmax.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_clipped_robust.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_clipped_standard.csv"))

write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_imputed_raw.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_imputed_minmax.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_imputed_robust.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_imputed_standard.csv"))

write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_raw_raw.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_raw_minmax.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_raw_robust.csv"))
write.csv(grd_targets, file=paste0(Y_TRAIN_PATH, "grd_raw_standard.csv"))

write.csv(grd_targets_dropped, file=paste0(Y_TRAIN_PATH, "grd_dropped_raw.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TRAIN_PATH, "grd_dropped_minmax.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TRAIN_PATH, "grd_dropped_robust.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TRAIN_PATH, "grd_dropped_standard.csv"))

In [28]:
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_clipped_raw.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_clipped_minmax.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_clipped_robust.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_clipped_standard.csv"))

write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_imputed_raw.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_imputed_minmax.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_imputed_robust.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_imputed_standard.csv"))

write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_raw_raw.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_raw_minmax.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_raw_robust.csv"))
write.csv(img_targets, file=paste0(Y_TRAIN_PATH, "img_raw_standard.csv"))

write.csv(img_targets_dropped, file=paste0(Y_TRAIN_PATH, "img_dropped_raw.csv"))
write.csv(img_targets_dropped, file=paste0(Y_TRAIN_PATH, "img_dropped_minmax.csv"))
write.csv(img_targets_dropped, file=paste0(Y_TRAIN_PATH, "img_dropped_robust.csv"))
write.csv(img_targets_dropped, file=paste0(Y_TRAIN_PATH, "img_dropped_standard.csv"))

## Test Data

Build the test data, and save it with the same names as the training data

In [29]:
set.seed(61718L)

permutation <-  permute::shuffle(length(img_targets))
sample <- create_stratified_sample(
    img_targets, 
    permutation = permutation,
    samples_per_pft = 15)
# split the data based on the above sample
img_targets_test <- img_targets[permutation][sample]
img_targets_train <- img_targets[permutation][-sample]
img_raw_test <- img_raw[permutation,][sample,]
img_raw_train <- img_raw[permutation,][-sample,]


[1] "Moss"        "Abiotic"     "ShrubDecid"  "TreeConifer" "Abiotic"    
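
The idea behind the stratified split can be sketched in base R: shuffle the labels, then take a fixed number of samples from each class as the test set. This is not the `create_stratified_sample` implementation; the toy labels and the class size are made up, and base `sample.int` is used in place of `permute::shuffle` to keep the example self-contained.

```r
set.seed(61718L)

# Toy labels with unequal class sizes
labels <- factor(rep(c("Abiotic", "Lichen", "Moss"), times = c(50, 20, 30)))

# Random permutation of the row positions
permutation <- sample.int(length(labels))
shuffled <- labels[permutation]

# For each class, take the positions of its first n members in the shuffled order
samples_per_class <- 5
test_idx <- unlist(
  lapply(split(seq_along(shuffled), shuffled), head, n = samples_per_class),
  use.names = FALSE
)

test_labels <- shuffled[test_idx]
train_labels <- shuffled[-test_idx]
table(test_labels)
```

The same `permutation` and `test_idx` must be applied to every feature table so that the rows stay aligned with the labels, which is what the cells here do with `[permutation,][sample,]`.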


In [30]:
img_targets_test %>% as.factor() %>% table()

.
       Abiotic      Graminoid         Lichen           Moss     ShrubDecid 
            15             15             15             15             15 
ShrubEvergreen  TreeBroadleaf    TreeConifer 
            15             15             15 

In [31]:
# create the subsampled data and save them for each processing type/treatment

# clipped
img_clipped_train <- img_clipped[permutation,][-sample,]
img_clipped_test <- img_clipped[permutation,][sample,]
img_clipped_minmax_train <- img_clipped_minmax[permutation,][-sample,]
img_clipped_minmax_test <- img_clipped_minmax[permutation,][sample,]
img_clipped_robust_train <- img_clipped_robust[permutation,][-sample,]
img_clipped_robust_test <- img_clipped_robust[permutation,][sample,]
img_clipped_standard_train <- img_clipped_standard[permutation,][-sample,]
img_clipped_standard_test <- img_clipped_standard[permutation,][sample,]

# raw (note one is done in the previous cell)
img_raw_minmax_train <- img_raw_minmax[permutation,][-sample,]
img_raw_minmax_test <- img_raw_minmax[permutation,][sample,]
img_raw_robust_train <- img_raw_robust[permutation,][-sample,]
img_raw_robust_test <- img_raw_robust[permutation,][sample,]
img_raw_standard_train <- img_raw_standard[permutation,][-sample,]
img_raw_standard_test <- img_raw_standard[permutation,][sample,]

#imputed
img_imputed_train <- img_imputed[permutation,][-sample,]
img_imputed_test <- img_imputed[permutation,][sample,]
img_imputed_minmax_train <- img_imputed_minmax[permutation,][-sample,]
img_imputed_minmax_test <- img_imputed_minmax[permutation,][sample,]
img_imputed_robust_train <- img_imputed_robust[permutation,][-sample,]
img_imputed_robust_test <- img_imputed_robust[permutation,][sample,]
img_imputed_standard_train <- img_imputed_standard[permutation,][-sample,]
img_imputed_standard_test <- img_imputed_standard[permutation,][sample,]



In [32]:
print(length(img_targets_test))
print(nrow(img_clipped_robust_test))

[1] 120
[1] 120


### Image-based Training Data

In [33]:
write.csv(img_clipped_train, file=paste0(X_TRAIN_PATH, "img_clipped_raw.csv"))
write.csv(img_clipped_minmax_train, file=paste0(X_TRAIN_PATH, "img_clipped_minmax.csv"))
write.csv(img_clipped_robust_train, file=paste0(X_TRAIN_PATH, "img_clipped_robust.csv"))
write.csv(img_clipped_standard_train, file=paste0(X_TRAIN_PATH, "img_clipped_standard.csv"))

write.csv(img_imputed_train, file=paste0(X_TRAIN_PATH, "img_imputed_raw.csv"))
write.csv(img_imputed_minmax_train, file=paste0(X_TRAIN_PATH, "img_imputed_minmax.csv"))
write.csv(img_imputed_robust_train, file=paste0(X_TRAIN_PATH, "img_imputed_robust.csv"))
write.csv(img_imputed_standard_train, file=paste0(X_TRAIN_PATH, "img_imputed_standard.csv"))

write.csv(img_dropped, file=paste0(X_TRAIN_PATH, "img_dropped_raw.csv"))
write.csv(img_dropped_minmax, file=paste0(X_TRAIN_PATH, "img_dropped_minmax.csv"))
write.csv(img_dropped_robust, file=paste0(X_TRAIN_PATH, "img_dropped_robust.csv"))
write.csv(img_dropped_standard, file=paste0(X_TRAIN_PATH, "img_dropped_standard.csv"))

write.csv(img_raw_train, file=paste0(X_TRAIN_PATH, "img_raw_raw.csv"))
write.csv(img_raw_minmax_train, file=paste0(X_TRAIN_PATH, "img_raw_minmax.csv"))
write.csv(img_raw_robust_train, file=paste0(X_TRAIN_PATH, "img_raw_robust.csv"))
write.csv(img_raw_standard_train, file=paste0(X_TRAIN_PATH, "img_raw_standard.csv"))

In [34]:
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_clipped_raw.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_clipped_minmax.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_clipped_robust.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_clipped_standard.csv"))

write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_imputed_raw.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_imputed_minmax.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_imputed_robust.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_imputed_standard.csv"))

#write.csv(img_dropped, file=paste0(X_TRAIN_PATH, "img_dropped_raw.csv"))
#write.csv(img_dropped_minmax, file=paste0(X_TRAIN_PATH, "img_dropped_minmax.csv"))
#write.csv(img_dropped_robust, file=paste0(X_TRAIN_PATH, "img_dropped_robust.csv"))
#write.csv(img_dropped_standard, file=paste0(X_TRAIN_PATH, "img_dropped_standard.csv"))

write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_raw_raw.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_raw_minmax.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_raw_robust.csv"))
write.csv(img_targets_train, file=paste0(Y_TRAIN_PATH, "img_raw_standard.csv"))

### Image-based Test Data
Note: this image-based test set is used for all the models (ground included)

In [35]:
write.csv(img_clipped_test, file=paste0(X_TEST_PATH, "img_clipped_raw.csv"))
write.csv(img_clipped_minmax_test, file=paste0(X_TEST_PATH, "img_clipped_minmax.csv"))
write.csv(img_clipped_robust_test, file=paste0(X_TEST_PATH, "img_clipped_robust.csv"))
write.csv(img_clipped_standard_test, file=paste0(X_TEST_PATH, "img_clipped_standard.csv"))

write.csv(img_imputed_test, file=paste0(X_TEST_PATH, "img_imputed_raw.csv"))
write.csv(img_imputed_minmax_test, file=paste0(X_TEST_PATH, "img_imputed_minmax.csv"))
write.csv(img_imputed_robust_test, file=paste0(X_TEST_PATH, "img_imputed_robust.csv"))
write.csv(img_imputed_standard_test, file=paste0(X_TEST_PATH, "img_imputed_standard.csv"))

write.csv(img_dropped, file=paste0(X_TEST_PATH, "img_dropped_raw.csv"))
write.csv(img_dropped_minmax, file=paste0(X_TEST_PATH, "img_dropped_minmax.csv"))
write.csv(img_dropped_robust, file=paste0(X_TEST_PATH, "img_dropped_robust.csv"))
write.csv(img_dropped_standard, file=paste0(X_TEST_PATH, "img_dropped_standard.csv"))

write.csv(img_raw_test, file=paste0(X_TEST_PATH, "img_raw_raw.csv"))
write.csv(img_raw_minmax_test, file=paste0(X_TEST_PATH, "img_raw_minmax.csv"))
write.csv(img_raw_robust_test, file=paste0(X_TEST_PATH, "img_raw_robust.csv"))
write.csv(img_raw_standard_test, file=paste0(X_TEST_PATH, "img_raw_standard.csv"))

### Ground test (from the images)

In [36]:
write.csv(img_clipped_test, file=paste0(X_TEST_PATH, "grd_clipped_raw.csv"))
write.csv(img_clipped_minmax_test, file=paste0(X_TEST_PATH, "grd_clipped_minmax.csv"))
write.csv(img_clipped_robust_test, file=paste0(X_TEST_PATH, "grd_clipped_robust.csv"))
write.csv(img_clipped_standard_test, file=paste0(X_TEST_PATH, "grd_clipped_standard.csv"))

write.csv(img_imputed_test, file=paste0(X_TEST_PATH, "grd_imputed_raw.csv"))
write.csv(img_imputed_minmax_test, file=paste0(X_TEST_PATH, "grd_imputed_minmax.csv"))
write.csv(img_imputed_robust_test, file=paste0(X_TEST_PATH, "grd_imputed_robust.csv"))
write.csv(img_imputed_standard_test, file=paste0(X_TEST_PATH, "grd_imputed_standard.csv"))

write.csv(img_dropped, file=paste0(X_TEST_PATH, "grd_dropped_raw.csv"))
write.csv(img_dropped_minmax, file=paste0(X_TEST_PATH, "grd_dropped_minmax.csv"))
write.csv(img_dropped_robust, file=paste0(X_TEST_PATH, "grd_dropped_robust.csv"))
write.csv(img_dropped_standard, file=paste0(X_TEST_PATH, "grd_dropped_standard.csv"))

write.csv(img_raw_test, file=paste0(X_TEST_PATH, "grd_raw_raw.csv"))
write.csv(img_raw_minmax_test, file=paste0(X_TEST_PATH, "grd_raw_minmax.csv"))
write.csv(img_raw_robust_test, file=paste0(X_TEST_PATH, "grd_raw_robust.csv"))
write.csv(img_raw_standard_test, file=paste0(X_TEST_PATH, "grd_raw_standard.csv"))

In [37]:
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_clipped_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_clipped_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_clipped_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_clipped_standard.csv"))

write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_imputed_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_imputed_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_imputed_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_imputed_standard.csv"))

write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_raw_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_raw_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_raw_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "grd_raw_standard.csv"))

write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "grd_dropped_raw.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "grd_dropped_minmax.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "grd_dropped_robust.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "grd_dropped_standard.csv"))

In [38]:
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_clipped_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_clipped_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_clipped_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_clipped_standard.csv"))

write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_imputed_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_imputed_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_imputed_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_imputed_standard.csv"))

write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_raw_raw.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_raw_minmax.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_raw_robust.csv"))
write.csv(img_targets_test, file=paste0(Y_TEST_PATH, "img_raw_standard.csv"))

write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "img_dropped_raw.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "img_dropped_minmax.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "img_dropped_robust.csv"))
write.csv(grd_targets_dropped, file=paste0(Y_TEST_PATH, "img_dropped_standard.csv"))

In [39]:
bg_raw <- read.csv("Data/gs/x_train/bison_gulch.csv", header = TRUE)
bg_targets <- read.csv("Data/gs/y_train/bison_gulch.csv")$bg_speclib.FncGrp1

bz_raw <- read.csv("Data/gs/x_train/bonanza.csv", header = TRUE)
bz_targets <- read.csv("Data/gs/y_train/bonanza.csv")$bz_speclib.FncGrp1

ch_raw <- read.csv("Data/gs/x_train/chatanika.csv", header = TRUE)
ch_targets <- read.csv("Data/gs/y_train/chatanika.csv")$ch_speclib.FncGrp1

em_raw <- read.csv("Data/gs/x_train/eight_mile.csv", header = TRUE)
em_targets <- read.csv("Data/gs/y_train/eight_mile.csv")$em_speclib.FncGrp1

In [40]:
print(bg_targets)

   [1] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
   [5] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
   [9] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [13] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [17] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [21] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [25] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [29] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [33] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [37] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [41] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [45] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [49] "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf"  "TreeBroadleaf" 
  [53] "TreeBroadleaf"  "

In [41]:
print(length(bg_targets))
print(length(bz_targets))
print(length(ch_targets))
print(length(em_targets))

[1] 1778
[1] 519
[1] 1143
[1] 436


In [42]:
bg_permutation <-  permute::shuffle(length(bg_targets)) %>% as.vector()
bg_sample <- create_stratified_sample(
    bg_targets, 
    permutation = bg_permutation,
    samples_per_pft = 18)

bz_permutation <-  permute::shuffle(length(bz_targets)) %>% as.vector()
bz_sample <- create_stratified_sample(
    bz_targets, 
    permutation = bz_permutation,
    samples_per_pft = 18)

ch_permutation <-  permute::shuffle(length(ch_targets)) %>% as.vector()
ch_sample <- create_stratified_sample(
    ch_targets, 
    permutation = ch_permutation,
    samples_per_pft = 18)

em_permutation <-  permute::shuffle(length(em_targets)) %>% as.vector()
em_sample <- create_stratified_sample(
    em_targets, 
    permutation = em_permutation,
    samples_per_pft = 18)

[1] "TreeBroadleaf" "TreeConifer"   "TreeConifer"   "TreeConifer"  
[5] "TreeConifer"  
[1] "TreeConifer" "TreeConifer" "TreeConifer" "TreeConifer" "TreeConifer"
[1] "Abiotic"    "Abiotic"    "Abiotic"    "Abiotic"    "ShrubDecid"
[1] "Lichen"      "TreeConifer" "Lichen"      "TreeConifer" "Graminoid"  
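`create_stratified_sample` is defined in `lecospectR.R` and is not shown in this notebook. As a rough illustration of the idea, the sketch below picks at most a fixed number of positions per class from the permuted labels. The function name `stratified_positions` and the toy data are assumptions for illustration, and the real helper may behave differently.

```r
# Hypothetical stand-in for create_stratified_sample(): returns positions
# (in permuted order) of at most `samples_per_pft` items per class.
stratified_positions <- function(targets, permutation, samples_per_pft) {
    shuffled <- targets[permutation]
    positions_by_class <- split(seq_along(shuffled), shuffled)
    picked <- lapply(positions_by_class, function(pos) head(pos, samples_per_pft))
    sort(unlist(picked, use.names = FALSE))
}

set.seed(42)
toy_targets <- c(rep("Abiotic", 30), rep("Lichen", 25), rep("Moss", 5))
toy_permutation <- sample(length(toy_targets))
toy_sample <- stratified_positions(toy_targets, toy_permutation, samples_per_pft = 18)

# Same indexing pattern as the notebook: permute first, then select or exclude
toy_test <- toy_targets[toy_permutation][toy_sample]
toy_train <- toy_targets[toy_permutation][-toy_sample]
table(toy_test)
```

In this sketch a class with fewer than 18 members contributes all of them, so the held-out set can be unbalanced for rare classes and the remainder then contains none of that class.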


In [43]:
print(bg_permutation)

   [1]   27  601  377  854  582 1601 1046 1058 1368 1323   90 1104 1570 1473
  [15] 1127  302   68 1676 1136  357  767 1406   81 1038 1284 1071 1225  931
  [29]  631  549 1639  757 1300  733 1385  748  390 1495 1091  222  569  795
  [43] 1191  501  815 1308   96 1141  259  622  680 1221   25 1084 1062 1767
  [57] 1699 1408 1766  801 1093 1303 1521 1170  576 1524   12 1117  324 1190
  [71] 1266 1644 1185   37 1507  360   79  838  714 1007 1343  465  234  547
  [85]  513 1296 1392 1412 1463 1713 1378  398  936   88 1434  917  132 1383
  [99] 1120 1768  662 1241 1331 1674 1607  849 1720 1410  492  912  274  655
 [113] 1743  870 1594  852  348  788 1515  495 1474 1319 1206  677 1041 1330
 [127] 1167 1741 1081  588  277  974  496  735  657  697  632  416  503 1312
 [141] 1556  208  165 1672 1131 1440   11  937  554 1027  962 1327 1197 1765
 [155] 1160  102 1731  166 1624  292  282  312  471 1678 1494 1549  925  520
 [169] 1637  804 1298  340  874  123 1586 1506 1128 1199 1215 1000  863 1024

In [44]:
bg_targets_test <- bg_targets[bg_permutation][bg_sample]
bg_targets_train <- bg_targets[bg_permutation][-bg_sample]
bg_raw_test <- bg_raw[bg_permutation,][bg_sample,]
bg_raw_train <- bg_raw[bg_permutation,][-bg_sample,]

bz_targets_test <- bz_targets[bz_permutation][bz_sample]
bz_targets_train <- bz_targets[bz_permutation][-bz_sample]
bz_raw_test <- bz_raw[bz_permutation,][bz_sample,]
bz_raw_train <- bz_raw[bz_permutation,][-bz_sample,]

ch_targets_test <- ch_targets[ch_permutation][ch_sample]
ch_targets_train <- ch_targets[ch_permutation][-ch_sample]
ch_raw_test <- ch_raw[ch_permutation,][ch_sample,]
ch_raw_train <- ch_raw[ch_permutation,][-ch_sample,]

em_targets_test <- em_targets[em_permutation][em_sample]
em_targets_train <- em_targets[em_permutation][-em_sample]
em_raw_test <- em_raw[em_permutation,][em_sample,]
em_raw_train <- em_raw[em_permutation,][-em_sample,]

In [45]:
bg_targets_test %>% as.factor() %>% table()
bz_targets_test %>% as.factor() %>% table()
ch_targets_test %>% as.factor() %>% table()
em_targets_test %>% as.factor() %>% table()

.
       Abiotic         Lichen     ShrubDecid ShrubEvergreen  TreeBroadleaf 
            18             18             18             18             18 
   TreeConifer 
            18 

.
       Moss TreeConifer 
         18          18 

.
       Abiotic      Graminoid     ShrubDecid ShrubEvergreen  TreeBroadleaf 
            18             18             18             12             18 
   TreeConifer 
            18 

.
       Abiotic      Graminoid         Lichen           Moss     ShrubDecid 
            10             18             18             18              6 
ShrubEvergreen    TreeConifer 
             4             18 

In [46]:
write.csv(bg_targets_test, file=paste0(Y_TRAIN_PATH, "bison_gulch_stratified.csv"), row.names=FALSE )
write.csv(bg_raw_test, file=paste0(X_TRAIN_PATH, "bison_gulch_stratified.csv"), row.names = FALSE )

write.csv(bz_targets_test, file=paste0(Y_TRAIN_PATH, "bonanza_stratified.csv"), row.names=FALSE )
write.csv(bz_raw_test, file=paste0(X_TRAIN_PATH, "bonanza_stratified.csv"), row.names = FALSE )

write.csv(ch_targets_test, file=paste0(Y_TRAIN_PATH, "chatanika_stratified.csv"), row.names=FALSE )
write.csv(ch_raw_test, file=paste0(X_TRAIN_PATH, "chatanika_stratified.csv"), row.names = FALSE )

write.csv(em_targets_test, file=paste0(Y_TRAIN_PATH, "eight_mile_stratified.csv"), row.names=FALSE )
write.csv(em_raw_test, file=paste0(X_TRAIN_PATH, "eight_mile_stratified.csv"), row.names = FALSE )

# Features (X) go to X_TEST_PATH and labels (Y) go to Y_TEST_PATH
write.csv(bg_raw_train, file=paste0(X_TEST_PATH, "bison_gulch.csv"))
write.csv(bg_targets_train, file=paste0(Y_TEST_PATH, "bison_gulch.csv"))

write.csv(bz_raw_train, file=paste0(X_TEST_PATH, "bonanza.csv"))
write.csv(bz_targets_train, file=paste0(Y_TEST_PATH, "bonanza.csv"))

write.csv(ch_raw_train, file=paste0(X_TEST_PATH, "chatanika.csv"))
write.csv(ch_targets_train, file=paste0(Y_TEST_PATH, "chatanika.csv"))

write.csv(em_raw_train, file=paste0(X_TEST_PATH, "eight_mile.csv"))
write.csv(em_targets_train, file=paste0(Y_TEST_PATH, "eight_mile.csv"))



In [47]:
# need to write the targets for training
clip_transform <- create_clip_transform(
    img_raw
)

“number of items to replace is not a multiple of replacement length”
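R emits this warning when the right-hand side of a subset assignment has a length that does not evenly divide the number of slots being replaced. Whether it is harmless inside `create_clip_transform` cannot be determined from the output alone. The minimal reproduction below only shows the mechanism and does not use any lecospec code.

```r
# Assigning 2 values into 3 slots triggers the warning.
x <- numeric(3)
caught <- tryCatch(
    {
        x[1:3] <- c(10, 20)
        NULL
    },
    warning = function(w) conditionMessage(w)
)
print(caught)
```

If the same pattern occurs inside the clip transform, it may be worth checking that the per-column bounds have the length the assignment expects.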

In [48]:
save(clip_transform, file="./mle/clip_transform.rda")

In [49]:
clipped_2 <- clip_transform(img_raw)# clipped 2

## Sensor Correction

In this section, we fit the sensor-correction models (with a few supporting data transforms) and use them to create the corrected ground data. We do this three times: for the raw, clipped, and outlier-dropped data.

We do this first for the raw (including outliers) data.

In [50]:
grd_resampled_to_match_img_bands <- resample_df(
    grd_bands,
    min_wavelength = 398,
    max_wavelength = 999,
    delta=1,
    drop_existing = TRUE
)
head(grd_resampled_to_match_img_bands)
head(img_bands)

Unnamed: 0_level_0,X398_5nm,X399_5nm,X400_5nm,X401_5nm,X402_5nm,X403_5nm,X404_5nm,X405_5nm,X406_5nm,X407_5nm,⋯,X990_5nm,X991_5nm,X992_5nm,X993_5nm,X994_5nm,X995_5nm,X996_5nm,X997_5nm,X998_5nm,X999_5nm
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,4.8846,4.8925,4.9184,4.9628,4.9102,4.7963,4.6673,4.5396,4.4311,4.373,⋯,27.6272,27.6523,27.6757,27.6977,27.7181,27.737,27.7544,27.7703,27.7846,27.7974
2,4.9516,4.943,4.9209,4.8555,4.8264,4.7953,4.7073,4.6387,4.5811,4.5128,⋯,22.3594,22.3834,22.4069,22.4301,22.4529,22.4753,22.4974,22.5191,22.5404,22.5613
3,6.0398,6.0197,6.0054,5.988,5.9352,5.839,5.6852,5.5654,5.4625,5.3369,⋯,32.4257,32.4426,32.4593,32.4758,32.4922,32.5085,32.5246,32.5405,32.5563,32.5719
4,6.4706,6.4441,6.4293,6.4678,6.3939,6.2293,5.9987,5.8117,5.6694,5.5691,⋯,34.3678,34.381,34.3951,34.4103,34.4264,34.4435,34.4616,34.4807,34.5008,34.5219
5,5.2403,5.224,5.2112,5.175,5.125,5.0416,4.8941,4.7927,4.72,4.6447,⋯,29.9326,29.9557,29.9775,29.9979,30.017,30.0348,30.0513,30.0665,30.0803,30.0928
6,6.3948,6.3681,6.3422,6.3502,6.2172,5.9899,5.724,5.5414,5.4176,5.3077,⋯,47.2582,47.2609,47.266,47.2734,47.2832,47.2953,47.3097,47.3265,47.3457,47.3671


Unnamed: 0_level_0,X398,X399,X400,X401,X402,X403,X404,X405,X406,X407,⋯,X990,X991,X992,X993,X994,X995,X996,X997,X998,X999
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.05814769,0.05926529,0.06028869,0.06116253,0.06183523,0.06229323,0.06254515,0.06259993,0.06246344,0.06212942,⋯,0.6815182,0.681166,0.689047,0.7040298,0.7249807,0.7507566,0.7801884,0.8121027,0.8453261,0.8786852
2,0.04456014,0.04778814,0.05079318,0.0535672,0.05609795,0.0583302,0.06018362,0.06157761,0.06245665,0.06286571,⋯,0.6706666,0.6683159,0.6786394,0.7000307,0.7308801,0.7695067,0.8140391,0.8625739,0.9132079,0.9640378
3,0.03929324,0.04265593,0.04557066,0.04787494,0.04942413,0.05025526,0.05051135,0.05033675,0.04985763,0.04912738,⋯,0.5152525,0.5091915,0.5178217,0.5395294,0.5726982,0.6156166,0.6663192,0.7227978,0.7830447,0.845052
4,0.13230228,0.11122692,0.09129034,0.07379609,0.05996264,0.05014222,0.04418159,0.04192114,0.04302032,0.04641466,⋯,0.5120581,0.511388,0.5348292,0.5745538,0.6227243,0.6723311,0.718586,0.7570701,0.7833644,0.7930498
5,0.05211388,0.05565497,0.05878525,0.06139855,0.06339694,0.06476632,0.06554153,0.06575801,0.06545919,0.06472041,⋯,0.6863419,0.6680365,0.6509006,0.634445,0.6181806,0.6017555,0.5851848,0.5685449,0.5519121,0.5353626
6,0.06955397,0.06788242,0.06631141,0.06486368,0.0635617,0.06242523,0.06147244,0.06072147,0.06017513,0.05977473,⋯,0.7354495,0.7371508,0.7445194,0.7567953,0.7732173,0.7930235,0.8154512,0.8397375,0.8651196,0.8908347


In [51]:
colnames(grd_resampled_to_match_img_bands) <- colnames(img_bands)

In [52]:
grd_resampled_to_match_img_bands$targets <- grd_targets
img_bands_with_targets <- img_bands
img_bands_with_targets$targets <- img_targets

In [53]:
matched_data <- create_matched_data(
    img_bands_with_targets,
    grd_resampled_to_match_img_bands,
    cols=c("targets","targets")# assumes joining on columns named "targets" in each data.frame
)
head(matched_data$left)
head(matched_data$right)

Unnamed: 0_level_0,X398,X399,X400,X401,X402,X403,X404,X405,X406,X407,⋯,X991,X992,X993,X994,X995,X996,X997,X998,X999,targets
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
2723,0.04412797,0.04412019,0.04411244,0.04410477,0.04409721,0.04408985,0.04408278,0.04407612,0.04406997,0.04406448,⋯,0.5532735,0.5553518,0.5574322,0.5595137,0.5615955,0.5636775,0.5657596,0.5678421,0.5699247,Abiotic
2385,0.10629752,0.10681751,0.10733749,0.10785748,0.10837747,0.10889748,0.10941753,0.10993765,0.11045786,0.11097821,⋯,0.4422921,0.4437139,0.4451362,0.4465589,0.4479816,0.4494044,0.4508273,0.4522501,0.4536729,Abiotic
2215,0.11031005,0.11054407,0.11077813,0.11101224,0.11124644,0.11148078,0.11171536,0.11195024,0.11218553,0.11242134,⋯,0.5079262,0.5095,0.511071,0.5126393,0.5142051,0.5157688,0.5173308,0.5188918,0.5204521,Abiotic
2283,0.1092469,0.10969933,0.11015177,0.11060425,0.11105678,0.11150939,0.11196211,0.11241499,0.11286806,0.11332138,⋯,0.4870817,0.4891108,0.4911408,0.4931714,0.4952025,0.4972338,0.4992654,0.501297,0.5033286,Abiotic
2008,0.10056707,0.10114187,0.10171667,0.10229147,0.1028663,0.10344113,0.104016,0.10459089,0.10516583,0.10574082,⋯,0.3719111,0.3727113,0.3735114,0.3743114,0.3751113,0.3759113,0.3767111,0.377511,0.3783108,Abiotic
1972,0.10202284,0.10261381,0.10320478,0.10379573,0.10438667,0.10497759,0.1055685,0.10615941,0.10675031,0.10734122,⋯,0.4244295,0.4254504,0.4264714,0.4274926,0.4285137,0.4295349,0.4305561,0.4315774,0.4325985,Abiotic


Unnamed: 0_level_0,X398,X399,X400,X401,X402,X403,X404,X405,X406,X407,⋯,X991,X992,X993,X994,X995,X996,X997,X998,X999,targets
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
1298,40.9935,41.0121,41.0342,41.0064,40.9953,41.0124,41.0756,41.097,41.1055,41.1534,⋯,53.7841,53.7964,53.8082,53.8195,53.8303,53.8405,53.8503,53.8596,53.8684,Abiotic
1311,47.4986,47.604,47.7318,47.8013,47.8371,47.848,47.8429,47.8557,47.8743,47.8745,⋯,62.0103,62.041,62.0654,62.0836,62.0955,62.1011,62.1004,62.0935,62.0803,Abiotic
1310,41.0942,41.2318,41.4002,41.5877,41.7252,41.8295,41.9273,41.9425,41.9317,41.9846,⋯,65.7259,65.7487,65.7651,65.7751,65.7786,65.7757,65.7663,65.7504,65.7281,Abiotic
370,4.9008,4.8571,4.8127,4.7923,4.6549,4.4511,4.2475,4.0447,3.8735,3.7895,⋯,43.6444,43.645,43.6477,43.6525,43.6594,43.6685,43.6796,43.6929,43.7083,Abiotic
1305,61.5948,61.7196,61.9005,62.0813,62.2871,62.4971,62.6899,62.7786,62.7855,62.7352,⋯,65.3148,65.3324,65.3441,65.3498,65.3494,65.3431,65.3308,65.3125,65.2883,Abiotic
114,7.5235,7.5526,7.5827,7.5996,7.5863,7.5462,7.4829,7.4147,7.361,7.3573,⋯,21.191,21.208,21.2243,21.2399,21.2548,21.269,21.2825,21.2952,21.3073,Abiotic
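`create_matched_data` comes from `lecospectR.R`, so its exact pairing rule is not visible here. One plausible way to pair rows from two tables by class label is to draw the same number of rows per class from each side, so that the rows line up one-to-one. The sketch below takes that approach; `match_by_class` and the toy data are hypothetical and may not reflect what the real function does.

```r
# Hypothetical sketch: pair rows from two data frames by class label,
# sampling the same number of rows per class from each side.
match_by_class <- function(left, right, col = "targets") {
    shared <- intersect(unique(as.character(left[[col]])),
                        unique(as.character(right[[col]])))
    left_rows <- integer(0)
    right_rows <- integer(0)
    for (cls in sort(shared)) {
        left_idx <- which(left[[col]] == cls)
        right_idx <- which(right[[col]] == cls)
        n <- min(length(left_idx), length(right_idx))
        # Index into the candidate positions to avoid sample()'s
        # special handling of length-one inputs
        left_rows <- c(left_rows, left_idx[sample.int(length(left_idx), n)])
        right_rows <- c(right_rows, right_idx[sample.int(length(right_idx), n)])
    }
    list(left = left[left_rows, , drop = FALSE],
         right = right[right_rows, , drop = FALSE])
}

set.seed(3)
toy_img <- data.frame(X398 = runif(8),
                      targets = rep(c("Abiotic", "Lichen"), c(5, 3)))
toy_grd <- data.frame(X398 = runif(6, 0, 60),
                      targets = rep(c("Abiotic", "Lichen", "Moss"), c(2, 3, 1)))
toy_matched <- match_by_class(toy_img, toy_grd)
```

Classes present on only one side (here `Moss`) are dropped, and both outputs end up with the same number of rows in the same class order, which is what a row-wise regression between the two tables needs.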


In [54]:
correction_model <- build_columnwise_sensor_correction_model(
    matched_data$left,
    matched_data$right,
    grouping_variables =c("targets","targets")
)


Call:
lm(formula = left_vec ~ right_vec)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.054047 -0.015424 -0.009652  0.010052  0.076724 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.909e-02  1.177e-03   24.72   <2e-16 ***
right_vec   2.149e-03  8.887e-05   24.19   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0228 on 643 degrees of freedom
Multiple R-squared:  0.4764,	Adjusted R-squared:  0.4756 
F-statistic:   585 on 1 and 643 DF,  p-value: < 2.2e-16


Call:
lm(formula = left_vec ~ right_vec)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.054036 -0.015260 -0.009461  0.010419  0.066147 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.926e-02  1.151e-03   25.43   <2e-16 ***
right_vec   2.137e-03  8.691e-05   24.59   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 

In [55]:
print(correction_model)

$X398

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029093     0.002149  


$X399

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029257     0.002137  


$X400

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029317     0.002133  


$X401

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029305     0.002134  


$X402

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029450     0.002135  


$X403

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.029747     0.002134  


$X404

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.030121     0.002131  


$X405

Call:
lm(formula = left_vec ~ right_vec)

Coefficients:
(Intercept)    right_vec  
   0.030453     0.002129  


$X406

Call:
lm(formula = left_vec ~ right_vec)


In [56]:
grd_corrected_bands <- apply_sensor_correction_model(
    correction_model,
    grd_resampled_to_match_img_bands,
    ignore_cols=c("targets")
)
head(grd_corrected_bands)

[1] "Correcting X398"
[1] "Correcting X399"
[1] "Correcting X400"
[1] "Correcting X401"
[1] "Correcting X402"
[1] "Correcting X403"
[1] "Correcting X404"
[1] "Correcting X405"
[1] "Correcting X406"
[1] "Correcting X407"
[1] "Correcting X408"
[1] "Correcting X409"
[1] "Correcting X410"
[1] "Correcting X411"
[1] "Correcting X412"
[1] "Correcting X413"
[1] "Correcting X414"
[1] "Correcting X415"
[1] "Correcting X416"
[1] "Correcting X417"
[1] "Correcting X418"
[1] "Correcting X419"
[1] "Correcting X420"
[1] "Correcting X421"
[1] "Correcting X422"
[1] "Correcting X423"
[1] "Correcting X424"
[1] "Correcting X425"
[1] "Correcting X426"
[1] "Correcting X427"
[1] "Correcting X428"
[1] "Correcting X429"
[1] "Correcting X430"
[1] "Correcting X431"
[1] "Correcting X432"
[1] "Correcting X433"
[1] "Correcting X434"
[1] "Correcting X435"
[1] "Correcting X436"
[1] "Correcting X437"
[1] "Correcting X438"
[1] "Correcting X439"
[1] "Correcting X440"
[1] "Correcting X441"
[1] "Correcting X442"
[1] "Corre

Unnamed: 0_level_0,X398,X399,X400,X401,X402,X403,X404,X405,X406,X407,⋯,X991,X992,X993,X994,X995,X996,X997,X998,X999,targets
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
1,0.03959205,0.03971448,0.03980649,0.03989402,0.03993598,0.03998372,0.04006842,0.04011957,0.0401284,0.04011659,⋯,0.3481901,0.3478993,0.3477545,0.3474089,0.3466302,0.3449703,0.3420228,0.3383148,0.333472,Lichen
2,0.03973606,0.03982242,0.03981182,0.03966507,0.03975703,0.03998159,0.04015367,0.04033059,0.04044774,0.04041452,⋯,0.3224225,0.321807,0.3212139,0.3202917,0.3188401,0.3162827,0.3121268,0.3070582,0.3006102,Lichen
3,0.04207511,0.04212382,0.04212466,0.04208156,0.04212483,0.0422091,0.04223788,0.04230394,0.04232418,0.04217074,⋯,0.3716171,0.3715887,0.3718289,0.3719968,0.3718313,0.3710014,0.3691804,0.3667553,0.3634369,Lichen
4,0.04300111,0.04303096,0.04302868,0.04310534,0.04310437,0.0430421,0.04290604,0.04282841,0.04276466,0.04266558,⋯,0.3810969,0.3811752,0.3815758,0.3819585,0.3820511,0.3815716,0.3802263,0.378345,0.3756751,Lichen
5,0.04035661,0.04042305,0.04043093,0.04034681,0.04039468,0.04050725,0.0405518,0.04065852,0.04074345,0.04069561,⋯,0.3594549,0.3592983,0.359344,0.3592489,0.3587662,0.3575045,0.3550955,0.3519978,0.347878,Lichen
6,0.04283818,0.04286851,0.04284293,0.04285441,0.04272704,0.04253116,0.04232057,0.04225283,0.04222859,0.04210852,⋯,0.4440861,0.4449147,0.4463862,0.4481746,0.4499289,0.451684,0.4533596,0.4549035,0.4562919,ShrubDecid
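The printed summaries suggest that the correction fits one simple linear model per band (`left_vec ~ right_vec`) and then predicts image-scale values from the ground spectra. A minimal sketch of that idea follows. The functions `fit_columnwise_lm` and `apply_columnwise_lm` are hypothetical names, the data are synthetic, and the actual lecospec functions may handle grouping and edge cases differently.

```r
# Hypothetical sketch of a column-wise linear sensor correction.
# `left` plays the role of image spectra, `right` the ground spectra.
fit_columnwise_lm <- function(left, right, band_cols) {
    models <- lapply(band_cols, function(band) {
        band_data <- data.frame(left_vec = left[[band]], right_vec = right[[band]])
        lm(left_vec ~ right_vec, data = band_data)
    })
    names(models) <- band_cols
    models
}

apply_columnwise_lm <- function(models, df) {
    corrected <- df
    for (band in names(models)) {
        corrected[[band]] <- as.numeric(
            predict(models[[band]], newdata = data.frame(right_vec = df[[band]]))
        )
    }
    corrected
}

# Toy data on two different scales, loosely mimicking the printed magnitudes
set.seed(1)
toy_right <- data.frame(X398 = runif(50, 0, 60), X399 = runif(50, 0, 60))
toy_left <- data.frame(
    X398 = 0.03 + 0.002 * toy_right$X398 + rnorm(50, sd = 0.001),
    X399 = 0.03 + 0.002 * toy_right$X399 + rnorm(50, sd = 0.001)
)
toy_models <- fit_columnwise_lm(toy_left, toy_right, c("X398", "X399"))
toy_corrected <- apply_columnwise_lm(toy_models, toy_right)
```

Because each band gets its own intercept and slope, the correction can absorb a wavelength-dependent offset and gain, but it cannot model interactions between bands.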


In [57]:
grd_corrected_indices <- get_vegetation_indices(grd_corrected_bands, NULL)
head(grd_corrected_indices)

Unnamed: 0_level_0,Boochs,Boochs2,CARI,Carter,Carter2,Carter3,Carter4,Carter5,Carter6,CI,⋯,TCARI,TCARIOSAVI,TCARI2,TCARI2OSAVI2,TGI,TVI,Vogelmann,Vogelmann2,Vogelmann3,Vogelmann4
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.002196477,0.002079549,0.3630003,3.801885,0.5772805,0.4101567,0.6963317,1.274685,0.10026139,0.9978871,⋯,0.07574705,0.2195668,0.1087256,0.6744134,1.288693,7.251936,1.160498,-0.04383203,1.0033465,-0.04507436
2,0.002378005,0.001991773,0.344496,3.428038,0.5608689,0.3976169,0.6955992,1.317687,0.09504679,0.9959828,⋯,0.08382284,0.2349098,0.1066267,0.6560503,1.447752,7.553373,1.157619,-0.04230283,0.9407876,-0.04351081
3,0.002176439,0.002301126,0.3881572,4.024595,0.5876588,0.4509694,0.6959063,1.210377,0.11828953,1.0006837,⋯,0.0684369,0.2088921,0.1262637,0.7663169,1.392134,7.37529,1.167095,-0.04484599,1.081629,-0.04617443
4,0.002182063,0.00236265,0.3932392,4.060701,0.5846644,0.4542775,0.6906997,1.198023,0.12285862,0.9998079,⋯,0.06707799,0.2052467,0.132689,0.7924633,1.429377,7.502604,1.169579,-0.04547061,1.1231909,-0.04683782
5,0.002243419,0.002256761,0.3763932,3.980261,0.5794007,0.4373599,0.6944381,1.226075,0.11204526,0.9981685,⋯,0.07051883,0.2115107,0.1215889,0.7373261,1.383191,7.374976,1.164694,-0.04408384,1.0444781,-0.04538139
6,0.004361676,0.004650146,0.3681003,2.965488,0.3517078,0.2270121,0.5313232,1.550199,0.09025157,0.9819091,⋯,0.12975551,0.2322985,0.2130935,0.7039762,2.820055,15.597647,1.331485,-0.07767473,1.0358824,-0.08243052


In [58]:
grd_corrected_resampled_bands <- resample_df(grd_corrected_bands, drop_existing=TRUE)
head(grd_corrected_resampled_bands)

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.



Unnamed: 0_level_0,X402.593_5nm,X407.593_5nm,X412.593_5nm,X417.593_5nm,X422.593_5nm,X427.593_5nm,X432.593_5nm,X437.593_5nm,X442.593_5nm,X447.593_5nm,⋯,X947.593_5nm,X952.593_5nm,X957.593_5nm,X962.593_5nm,X967.593_5nm,X972.593_5nm,X977.593_5nm,X982.593_5nm,X987.593_5nm,X992.593_5nm
Unnamed: 0_level_1,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,0.03997865,0.04008886,0.0398418,0.03998454,0.04040493,0.04103805,0.0423296,0.04407765,0.04668026,0.04952288,⋯,0.3485803,0.3433482,0.3397158,0.3444559,0.354293,0.3606696,0.3612798,0.3625277,0.353809,0.3479747
2,0.03986755,0.0404323,0.04040138,0.04070967,0.04105187,0.04175938,0.04304328,0.04463349,0.04683042,0.04931992,⋯,0.3275439,0.3227699,0.3194364,0.3239677,0.3321934,0.335955,0.3348371,0.336612,0.3284429,0.3216728
3,0.04217691,0.04215716,0.04205933,0.04182551,0.04238416,0.04357998,0.04566227,0.04853224,0.0526642,0.05755068,⋯,0.3699315,0.3639649,0.3600244,0.3653635,0.3764961,0.3850428,0.3868898,0.3868407,0.377073,0.3718431
4,0.04306894,0.04269936,0.04269213,0.04238961,0.04277229,0.04441601,0.04704013,0.05043478,0.05520521,0.06063779,⋯,0.3794017,0.3729807,0.3691143,0.3740085,0.3857844,0.3954036,0.3974995,0.3967954,0.3865012,0.3815084
5,0.04044334,0.04064261,0.04036975,0.04061239,0.04085267,0.04197409,0.04361247,0.04633942,0.04990837,0.05436886,⋯,0.3589756,0.3531977,0.349439,0.3543469,0.3647115,0.3720984,0.3730677,0.3741297,0.3649251,0.3594685
6,0.04260478,0.04208439,0.04176607,0.04124917,0.0409275,0.04095372,0.04119483,0.04112227,0.04142324,0.04211313,⋯,0.4350184,0.4266156,0.4214468,0.4270427,0.4424289,0.4591485,0.4644475,0.4614233,0.4487721,0.4457366
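The log message above indicates that the resampling to 5 nm bands uses a spline. As an illustration of the general technique (not of `resample_df` itself), the sketch below interpolates one synthetic 1 nm spectrum onto 5 nm band centres with `stats::spline`. The band centres are chosen to mirror the column names printed above, and the reflectance curve is made up.

```r
# Hypothetical sketch: resample one spectrum from 1 nm to 5 nm spacing
# with a cubic spline. The reflectance curve is synthetic.
toy_reflectance <- function(wavelength) {
    0.05 + 0.4 / (1 + exp(-(wavelength - 715) / 15))
}

wavelengths_1nm <- 398:999
reflectance_1nm <- toy_reflectance(wavelengths_1nm)

# 5 nm band centres, starting at 402.593 as in the printed column names
new_wavelengths <- seq(402.593, 992.593, by = 5)
resampled <- stats::spline(
    x = wavelengths_1nm,
    y = reflectance_1nm,
    xout = new_wavelengths,
    method = "fmm"
)

resampled_df <- data.frame(wavelength = resampled$x, reflectance = resampled$y)
head(resampled_df)
```

Pure interpolation like this does not average over a band width. The "partially smoothed" note in the log suggests the real function may do more than this.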


In [59]:
write.csv(
    cbind(grd_corrected_resampled_bands, grd_corrected_indices), 
    file=paste0(X_TRAIN_PATH, "grd_raw_corrected.csv")
    )

# save labels also
write.csv(
    grd_resampled_to_match_img_bands$targets,
    file=paste0(Y_TRAIN_PATH, "grd_raw_corrected.csv"))

With the raw data done, we move on to the clipped data.

In [60]:
grd_resampled_to_img_clipped <- resample_df(
    clip_outliers(grd_bands),
    min_wavelength = 398,
    max_wavelength = 999,
    delta=1,
    drop_existing = TRUE
)

colnames(grd_resampled_to_img_clipped) <- colnames(img_bands)

grd_resampled_to_img_clipped$targets <- grd_targets
img_bands_with_targets <- img_bands
img_bands_with_targets$targets <- img_targets

matched_data_clipped <- create_matched_data(
    img_bands_with_targets,
    grd_resampled_to_img_clipped,
    cols=c("targets","targets")# assumes joining on columns named "targets" in each data.frame
)

correction_model <- build_columnwise_sensor_correction_model(
    matched_data_clipped$left,
    matched_data_clipped$right
)
# Apply the clipped-data model to the clipped ground spectra
grd_corrected_clipped_bands <- apply_sensor_correction_model(
    correction_model,
    grd_resampled_to_img_clipped,
    ignore_cols=c("targets")
)
# Derive indices and resampled bands from the clipped, corrected spectra
grd_corrected_clipped_indices <- get_vegetation_indices(grd_corrected_clipped_bands, NULL)
grd_corrected_clipped_resampled_bands <- resample_df(grd_corrected_clipped_bands, drop_existing=TRUE)





Call:
lm(formula = left_vec ~ right_vec)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.062877 -0.018860 -0.005963  0.015414  0.145496 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.0242788  0.0019965   12.16   <2e-16 ***
right_vec   0.0033725  0.0002498   13.50   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02754 on 643 degrees of freedom
Multiple R-squared:  0.2209,	Adjusted R-squared:  0.2197 
F-statistic: 182.3 on 1 and 643 DF,  p-value: < 2.2e-16


Call:
lm(formula = left_vec ~ right_vec)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.062523 -0.018941 -0.005552  0.015477  0.138100 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.024197   0.001966   12.31   <2e-16 ***
right_vec   0.003410   0.000247   13.81   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.

Using spline to predict value at new bands...

Beware the spectra are now partially smoothed.



In [61]:
write.csv(
    cbind(
        grd_corrected_clipped_indices,
        grd_corrected_clipped_resampled_bands
    ),
    file=paste0(
        X_TRAIN_PATH,
        "grd_clipped_corrected.csv"
    )
)

# save labels also
write.csv(
    grd_resampled_to_match_img_bands$targets,
    file=paste0(Y_TRAIN_PATH, "grd_clipped_corrected.csv"))

And finally, the data with outliers dropped.

Notes for later: it is probably worth trying PCA here, with the pipeline clip -> scale -> PCA -> subset (scaling again afterwards for models like SVM and kNN).
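A minimal sketch of that pipeline on synthetic data is given below. It makes several assumptions: clipping is done per column at the 1st and 99th percentiles (the notebook's `clip_outliers` may use a different rule), "subset" is read as keeping the leading principal components, and the 90% variance threshold is arbitrary.

```r
# Hypothetical sketch of clip -> scale -> PCA -> subset on synthetic data.
set.seed(7)
toy_bands <- as.data.frame(matrix(rnorm(200 * 10), nrow = 200, ncol = 10))
toy_bands[1, 1] <- 50  # inject an outlier

# Clip each column to its 1st and 99th percentiles (assumed clipping rule)
clip_column <- function(x, lower = 0.01, upper = 0.99) {
    limits <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
    pmin(pmax(x, limits[[1]]), limits[[2]])
}
toy_clipped <- as.data.frame(lapply(toy_bands, clip_column))

# Standardize the columns and run PCA
toy_pca <- prcomp(toy_clipped, center = TRUE, scale. = TRUE)

# Keep the leading components explaining at least 90% of the variance
explained <- cumsum(toy_pca$sdev^2) / sum(toy_pca$sdev^2)
n_keep <- which(explained >= 0.90)[1]
toy_scores <- toy_pca$x[, seq_len(n_keep), drop = FALSE]

# Rescale the retained scores for distance-based models such as SVM or kNN
toy_scores_scaled <- scale(toy_scores)
```

The second scaling step matters because principal component scores have very different variances, which would otherwise let the first few components dominate any distance calculation.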

...to be continued