# Hyperspectral Band Selection

In this notebook, we demonstrate how to use our Inter-Band Redundancy Analysis (IBRA) method and [Greedy Spectral Selection](https://www.mdpi.com/2072-4292/13/18/3649) (GSS) for hyperspectral band selection. IBRA can also be combined with Principal Component Analysis (PCA) or Partial Least Squares (PLS) for dimensionality reduction.

## Installation

Execute `!pip install git+https://github.com/NISL-MSU/HSI-BandSelection`



In [None]:
import torch
torch.cuda.is_available()

In [None]:
# !pip install -q git+https://github.com/NISL-MSU/HSI-BandSelection
import warnings
warnings.filterwarnings("ignore")

## Load your data

You can bring your own HSI classification dataset. Format the input data as a set of image data cubes of shape $(N, w, h, b)$, where $N$ is the number of data cubes, $w$ and $h$ are the width and the height of the cubes, and $b$ is the number of spectral bands. You could use the `createImageCubes` method, provided [here](https://github.com/NISL-MSU/HSI-BandSelection/blob/master/src/HSIBandSelection/readSAT.py#L47), as a reference to format your data.
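To make the $(N, w, h, b)$ format concrete, here is a minimal NumPy sketch that cuts one cube around every labeled pixel of a toy image. The helper name `extract_cubes`, the zero padding at the borders, and the convention that label `0` means "unlabeled" are assumptions made for illustration; this is not the package's `createImageCubes` implementation.

```python
import numpy as np

def extract_cubes(image, labels, window=5):
    """Cut one (window, window, b) cube around every labeled pixel.

    Hypothetical helper for illustration only (not the package's createImageCubes).
    Assumes an odd window size and that label 0 marks unlabeled pixels.
    """
    margin = window // 2
    # Zero-pad the spatial borders so that edge pixels also get a full window.
    padded = np.pad(image, ((margin, margin), (margin, margin), (0, 0)))
    rows, cols = np.nonzero(labels)
    cubes = np.stack([padded[r:r + window, c:c + window, :]
                      for r, c in zip(rows, cols)])
    return cubes, labels[rows, cols]

rng = np.random.default_rng(0)
image = rng.random((20, 20, 8))               # toy image: 20x20 pixels, 8 bands
labels = rng.integers(0, 3, size=(20, 20))    # toy labels; 0 = unlabeled
X_demo, Y_demo = extract_cubes(image, labels, window=5)
print(X_demo.shape, Y_demo.shape)             # (N, 5, 5, 8) and (N,)
```

Each cube is centered on its labeled pixel, so `X_demo[i, 2, 2]` holds the spectrum of the pixel whose label is `Y_demo[i]`.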

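In this example, we load the Salinas scene (`name='SA'`), which has 204 spectral bands. To work with the Indian Pines dataset instead, an image with shape `(145, 145, 200)`, use the commented-out `loadata(name='IP')` call.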

In [None]:
from HSIBandSelection.readSAT import loadata, createImageCubes
# X, Y = loadata(name='IP')
X, Y = loadata(name='SA')
print('Initial image shape: ' + str(X.shape))

X, Y = createImageCubes(X, Y, window=5)
print('Processed dataset shape: ' + str(X.shape))

In this case, we loaded a hyperspectral image that ships with our package. It doesn't matter where your data comes from; you only need to provide the $X$ (input data) and $Y$ (target labels) arrays. In addition, give your dataset a name; otherwise, it will be called `temp`. With these three elements, we create a data object:

In [None]:
from HSIBandSelection.utils import Dataset
# dataset = Dataset(train_x=X, train_y=Y, name='IP')
dataset = Dataset(train_x=X, train_y=Y, name='SA')

## Execute the Band Selection / Dimensionality Reduction Algorithm

We'll use the `SelectBands` class. **Parameters**:

*   `dataset`: utils.Dataset object
*   `method`: Method name. Options: 'IBRA', 'GSS' (IBRA+GSS), 'PCA' (IBRA+PCA), and 'PLS' (IBRA+PLS)
*   `classifier`: Classifier type. Options: 'CNN' (if data is 2D), 'ANN', 'RF', 'SVM'. *Default:* 'CNN'
*   `nbands`: How many spectral bands you want to select or reduce to. *Default:* 5
*   `transform`: If True, the final selected bands undergo a Gaussian transformation to simulate multispectral bands. *Default:* False
*   `average`: If True, average consecutive bands to halve the initial number of bands. *Default:* False
*   `epochs`: Number of epochs used to train the NN models. *Default:* 150
*   `batch_size`: Batch size used to train the NN models. *Default:* 128
*   `scratch`: If True, execute the IBRA process from scratch and replace previously saved results. *Default:* True
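As an illustration of what the `average` option describes, the following NumPy sketch averages each pair of neighboring bands, which halves the spectral dimension. The helper `average_consecutive_bands` is hypothetical, and dropping a trailing odd band is an assumption; the package may handle these details differently.

```python
import numpy as np

def average_consecutive_bands(cubes):
    """Average each pair of neighboring bands along the last axis.

    Assumption: if the band count is odd, the last band is dropped.
    """
    b = cubes.shape[-1] - cubes.shape[-1] % 2        # largest even band count
    paired = cubes[..., :b].reshape(*cubes.shape[:-1], b // 2, 2)
    return paired.mean(axis=-1)

cubes = np.arange(2 * 3 * 3 * 6, dtype=float).reshape(2, 3, 3, 6)
halved = average_consecutive_bands(cubes)
print(cubes.shape, '->', halved.shape)
```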












In [None]:
from HSIBandSelection.SelectBands import SelectBands
selector = SelectBands(dataset=dataset, method='GSS', nbands=5)
# No bands have been selected yet; this only prints the configured selector object
print('Selector: ' + str(selector))

Next, we call the `run_selection` method of the `SelectBands` object. **Parameters**:

*   `init_vf`: Initial Variance Inflation Factor (VIF) threshold (used for IBRA). *Default:* 12
*   `final_vf`: Final Variance Inflation Factor (VIF) threshold (used for IBRA). *Default:* 5
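For readers unfamiliar with the VIF, the sketch below computes it from its standard definition, $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ is the coefficient of determination obtained by regressing band $j$ on the other bands. A high value means the band is largely predictable from the others, that is, redundant. The function `variance_inflation_factors` and the toy data are assumptions for illustration; the package's internal computation may differ (for example, it may compare bands pairwise).

```python
import numpy as np

def variance_inflation_factors(bands):
    """Return the VIF of each column of a (n_samples, n_bands) matrix."""
    n, k = bands.shape
    vifs = np.empty(k)
    for j in range(k):
        y = bands[:, j]
        others = np.delete(bands, j, axis=1)
        design = np.column_stack([np.ones(n), others])   # add an intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ coef
        r2 = 1.0 - residual.var() / y.var()
        vifs[j] = 1.0 / max(1.0 - r2, 1e-12)             # guard against division by zero
    return vifs

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = a + 0.05 * rng.normal(size=1000)   # nearly a copy of a: redundant
c = rng.normal(size=1000)              # unrelated to a and b
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs.round(2))
```

In this toy example, the two nearly identical columns receive large VIF values, while the independent column stays close to 1.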



**Return**:

If the selected method is IBRA:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the selected bands

If the selected method is GSS:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `GSS_best`: The best combination of bands obtained using GSS
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the selected bands

If the selected method is PCA or PLS:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `reduced_dataset`: The reduced dataset after applying PCA or PLS to the pre-selected bands
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the reduced bands
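The "5x2 CV" mentioned above refers to five repetitions of 2-fold cross-validation, which gives 10 training runs per evaluation. The sketch below generates such splits with NumPy. It is a simplified illustration: the generator `five_by_two_splits` is hypothetical, and the splits are not stratified by class, which the package may well do.

```python
import numpy as np

def five_by_two_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) for 5 repetitions of 2-fold CV (10 runs)."""
    rng = np.random.default_rng(seed)
    for _ in range(5):
        order = rng.permutation(n_samples)   # reshuffle for every repetition
        half = n_samples // 2
        first, second = order[:half], order[half:]
        yield first, second                  # train on the first half, test on the second
        yield second, first                  # then swap the roles

splits = list(five_by_two_splits(100))
print(len(splits))
```

The reported metric is then the mean over the 10 runs.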

**IMPORTANT**: This code is implemented using PyTorch. If you're running this on Google Colab, change the runtime type to GPU to accelerate the training process!

In [12]:
VIF_best, IBRA_best, GSS_best, stats_best = selector.run_selection(init_vf=11, final_vf=9)

*************************************
Testing VIF threshold: 11
*************************************


100%|██████████| 204/204 [10:51<00:00,  3.19s/it]


Selecting bands:  [2, 17, 22, 28, 37, 60, 63, 91, 104, 106, 127, 147, 175, 202]
Executing IBRA + GSS (Greddy Spectral Selection)
	Analyzing candidate combination 1. 5x2 CV using bands: [22, 28, 37, 60, 63]


100%|██████████| 10/10 [25:03<00:00, 150.31s/it]


	Mean F1: 0.993396685934445
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 251.57
			Multicolinearity analysis. Variance Inflation Factor of band 28 is 255.49
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 21.3
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 270.36
			Multicolinearity analysis. Variance Inflation Factor of band 63 is 283.75
	Analyzing candidate combination 2. 5x2 CV using bands: [22, 28, 37, 60, 91]


100%|██████████| 10/10 [24:44<00:00, 148.45s/it]


	Mean F1: 0.9946371705559475
	Best selection so far: [22, 28, 37, 60, 91]with an F1 score of 0.9946371705559475
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 266.17
			Multicolinearity analysis. Variance Inflation Factor of band 28 is 291.75
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.26
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 23.11
			Multicolinearity analysis. Variance Inflation Factor of band 91 is 22.81
	Analyzing candidate combination 3. 5x2 CV using bands: [17, 22, 37, 60, 91]


100%|██████████| 10/10 [24:30<00:00, 147.03s/it]


	Mean F1: 0.993061268723444
	Best selection so far: [22, 28, 37, 60, 91]with an F1 score of 0.9946371705559475
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 170.25
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 248.79
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.66
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 30.9
			Multicolinearity analysis. Variance Inflation Factor of band 91 is 20.82
	Analyzing candidate combination 4. 5x2 CV using bands: [17, 37, 60, 91, 175]


100%|██████████| 10/10 [24:23<00:00, 146.32s/it]


	Mean F1: 0.994193484073962
	Best selection so far: [22, 28, 37, 60, 91]with an F1 score of 0.9946371705559475
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 20.58
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 21.27
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 17.04
			Multicolinearity analysis. Variance Inflation Factor of band 91 is 23.04
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 11.38
	Analyzing candidate combination 5. 5x2 CV using bands: [17, 37, 60, 127, 175]


100%|██████████| 10/10 [24:22<00:00, 146.21s/it]


	Mean F1: 0.9946080504878069
	Best selection so far: [22, 28, 37, 60, 91]with an F1 score of 0.9946371705559475
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 17.39
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.48
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 2.91
			Multicolinearity analysis. Variance Inflation Factor of band 127 is 74.59
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 66.56
	Analyzing candidate combination 6. 5x2 CV using bands: [17, 37, 60, 104, 175]


100%|██████████| 10/10 [24:22<00:00, 146.22s/it]


	Mean F1: 0.9949819332179792
	Best selection so far: [17, 37, 60, 104, 175]with an F1 score of 0.9949819332179792
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 23.24
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 23.98
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 5.76
			Multicolinearity analysis. Variance Inflation Factor of band 104 is 15.07
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 17.63
	Analyzing candidate combination 7. 5x2 CV using bands: [2, 17, 60, 104, 175]


100%|██████████| 10/10 [24:26<00:00, 146.66s/it]


	Mean F1: 0.9941119026456293
	Best selection so far: [17, 37, 60, 104, 175]with an F1 score of 0.9949819332179792
			Multicolinearity analysis. Variance Inflation Factor of band 2 is 4.0
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 8.84
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 5.22
			Multicolinearity analysis. Variance Inflation Factor of band 104 is 5.06
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 9.52
	Analyzing candidate combination 8. 5x2 CV using bands: [2, 17, 60, 104, 202]


100%|██████████| 10/10 [24:32<00:00, 147.25s/it]


	Mean F1: 0.9919818007762276
	Best selection so far: [17, 37, 60, 104, 175]with an F1 score of 0.9949819332179792
			Multicolinearity analysis. Variance Inflation Factor of band 2 is 3.82
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 7.63
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 2.41
			Multicolinearity analysis. Variance Inflation Factor of band 104 is 4.43
			Multicolinearity analysis. Variance Inflation Factor of band 202 is 2.5
	Analyzing candidate combination 9. 5x2 CV using bands: [2, 60, 104, 147, 202]


100%|██████████| 10/10 [24:36<00:00, 147.66s/it]


	Mean F1: 0.9847654558893577
	Best selection so far: [17, 37, 60, 104, 175]with an F1 score of 0.9949819332179792
			Multicolinearity analysis. Variance Inflation Factor of band 2 is 2.39
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 2.31
			Multicolinearity analysis. Variance Inflation Factor of band 104 is 2.91
			Multicolinearity analysis. Variance Inflation Factor of band 147 is 2.27
			Multicolinearity analysis. Variance Inflation Factor of band 202 is 2.5
	Analyzing candidate combination 10. 5x2 CV using bands: [2, 60, 106, 147, 202]


100%|██████████| 10/10 [24:35<00:00, 147.58s/it]


	Mean F1: 0.9318810117085363
Selecting bands:  [17, 37, 60, 104, 175]

 Training a model using 5x2 CV using the final selected or reduced bands...
Mean F1 score: 99.50417325483932

*************************************
Testing VIF threshold: 10
*************************************


100%|██████████| 204/204 [00:13<00:00, 15.46it/s]


Selecting bands:  [2, 17, 22, 29, 37, 60, 64, 92, 103, 106, 127, 147, 175, 202]
Executing IBRA + GSS (Greddy Spectral Selection)
	Analyzing candidate combination 1. 5x2 CV using bands: [22, 29, 37, 64, 92]


100%|██████████| 10/10 [24:32<00:00, 147.30s/it]


	Mean F1: 0.9940913159993954
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 229.62
			Multicolinearity analysis. Variance Inflation Factor of band 29 is 251.58
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.39
			Multicolinearity analysis. Variance Inflation Factor of band 64 is 41.49
			Multicolinearity analysis. Variance Inflation Factor of band 92 is 39.9
	Analyzing candidate combination 2. 5x2 CV using bands: [22, 37, 60, 64, 92]


100%|██████████| 10/10 [24:26<00:00, 146.67s/it]


	Mean F1: 0.9941600304998535
	Best selection so far: [22, 37, 60, 64, 92]with an F1 score of 0.9941600304998535
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 15.29
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 20.13
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 344.76
			Multicolinearity analysis. Variance Inflation Factor of band 64 is 604.76
			Multicolinearity analysis. Variance Inflation Factor of band 92 is 57.36
	Analyzing candidate combination 3. 5x2 CV using bands: [17, 22, 37, 60, 92]


100%|██████████| 10/10 [24:36<00:00, 147.65s/it]


	Mean F1: 0.9931382094204737
	Best selection so far: [22, 37, 60, 64, 92]with an F1 score of 0.9941600304998535
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 170.37
			Multicolinearity analysis. Variance Inflation Factor of band 22 is 247.99
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.78
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 30.87
			Multicolinearity analysis. Variance Inflation Factor of band 92 is 20.79
	Analyzing candidate combination 4. 5x2 CV using bands: [17, 37, 60, 92, 175]


100%|██████████| 10/10 [24:25<00:00, 146.55s/it]


	Mean F1: 0.9944234228352459
	Best selection so far: [17, 37, 60, 92, 175]with an F1 score of 0.9944234228352459
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 20.72
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 21.29
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 16.98
			Multicolinearity analysis. Variance Inflation Factor of band 92 is 22.88
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 11.28
	Analyzing candidate combination 5. 5x2 CV using bands: [17, 37, 60, 127, 175]


100%|██████████| 10/10 [24:28<00:00, 146.80s/it]


	Mean F1: 0.9946080504878069
	Best selection so far: [17, 37, 60, 127, 175]with an F1 score of 0.9946080504878069
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 17.39
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 17.48
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 2.91
			Multicolinearity analysis. Variance Inflation Factor of band 127 is 74.59
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 66.56
	Analyzing candidate combination 6. 5x2 CV using bands: [17, 37, 60, 103, 175]


100%|██████████| 10/10 [24:26<00:00, 146.67s/it]


	Mean F1: 0.9947591954003474
	Best selection so far: [17, 37, 60, 103, 175]with an F1 score of 0.9947591954003474
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 22.99
			Multicolinearity analysis. Variance Inflation Factor of band 37 is 23.89
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 7.29
			Multicolinearity analysis. Variance Inflation Factor of band 103 is 15.07
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 15.5
	Analyzing candidate combination 7. 5x2 CV using bands: [2, 17, 60, 103, 175]


100%|██████████| 10/10 [24:28<00:00, 146.85s/it]


	Mean F1: 0.9944733183310219
	Best selection so far: [17, 37, 60, 103, 175]with an F1 score of 0.9947591954003474
			Multicolinearity analysis. Variance Inflation Factor of band 2 is 4.0
			Multicolinearity analysis. Variance Inflation Factor of band 17 is 8.91
			Multicolinearity analysis. Variance Inflation Factor of band 60 is 5.96
			Multicolinearity analysis. Variance Inflation Factor of band 103 is 5.09
			Multicolinearity analysis. Variance Inflation Factor of band 175 is 9.08
	Analyzing candidate combination 8. 5x2 CV using bands: [2, 17, 60, 103, 202]


 20%|██        | 2/10 [06:29<25:59, 194.97s/it]


KeyboardInterrupt: 

In [None]:
print('The best metrics were obtained using a VIF value of {}'.format(VIF_best))
print('The pre-selected bands obtained by IBRA were {}'.format(IBRA_best))
print('The bands selected by IBRA+GSS were {}'.format(GSS_best))
print('The best classification metrics were as follows:')
print(stats_best)
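Once a band list is available, the data cubes can be reduced to those bands with plain NumPy indexing. The sketch below uses a random stand-in array with 204 bands and the band list `[17, 37, 60, 104, 175]` printed in the log above. It assumes the reported band numbers are 0-based positions along the last axis, which should be verified against the package before relying on it.

```python
import numpy as np

cubes = np.random.default_rng(0).random((10, 5, 5, 204))   # toy stand-in for X
selected = [17, 37, 60, 104, 175]                          # band list taken from the log
# Assumption: band numbers are 0-based indices along the last (spectral) axis.
reduced = cubes[..., selected]
print(reduced.shape)
```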