# Hyperspectral Band Selection

In this notebook, we demonstrate how to use our Inter-Band Redundancy Analysis (IBRA) method and [Greedy Spectral Selection](https://www.mdpi.com/2072-4292/13/18/3649) (GSS) for hyperspectral band selection. IBRA can also be combined with Principal Component Analysis (PCA) or Partial Least Squares (PLS) for dimensionality reduction.

## Installation

Execute `!pip install git+https://github.com/NISL-MSU/HSI-BandSelection`



In [1]:
import torch
torch.cuda.is_available()

True

In [18]:
# !pip install -q git+https://github.com/NISL-MSU/HSI-BandSelection
import warnings
warnings.filterwarnings("ignore")

## Load your data

You can bring your own HSI classification dataset. Format the input data as a set of image data cubes of shape $(N, w, h, b)$, where $N$ is the number of data cubes, $w$ and $h$ are the width and the height of the cubes, and $b$ is the number of spectral bands. You could use the `createImageCubes` method, provided [here](https://github.com/NISL-MSU/HSI-BandSelection/blob/master/src/HSIBandSelection/readSAT.py#L47), as a reference to format your data.
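
As a rough illustration of the expected $(N, w, h, b)$ layout, the sketch below cuts one cube around every labelled pixel of a small synthetic image. `extract_cubes` is a hypothetical helper written for this example, not the package's `createImageCubes`; it assumes an odd `window`, zero-padding at the image borders, and that label `0` marks unlabelled pixels.

```python
import numpy as np

def extract_cubes(image, labels, window=5):
    """Hypothetical helper: cut a (H, W, b) image into (N, window, window, b)
    cubes centred on every labelled pixel (label > 0)."""
    margin = window // 2
    # Zero-pad the two spatial axes so that border pixels also get a full window
    padded = np.pad(image, ((margin, margin), (margin, margin), (0, 0)), mode="constant")
    rows, cols = np.nonzero(labels)
    # In padded coordinates, the window centred on (r, c) starts at (r, c)
    cubes = np.stack([padded[r:r + window, c:c + window, :] for r, c in zip(rows, cols)])
    targets = labels[rows, cols]
    return cubes, targets

# Tiny synthetic example: a 10 x 12 image with 7 bands and three labelled pixels
rng = np.random.default_rng(0)
demo_image = rng.random((10, 12, 7))
demo_labels = np.zeros((10, 12), dtype=int)
demo_labels[0, 0], demo_labels[4, 6], demo_labels[9, 11] = 1, 2, 3

demo_cubes, demo_targets = extract_cubes(demo_image, demo_labels, window=5)
print(demo_cubes.shape)  # (3, 5, 5, 7)
```

The centre pixel of each cube, `demo_cubes[i, 2, 2]`, is the spectrum of the labelled pixel itself; the remaining entries are its spatial neighbourhood.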

In this example, we will load the Indian Pines dataset, which is an image with shape `(145, 145, 200)`.

In [4]:
from HSIBandSelection.readSAT import loadata, createImageCubes
X, Y = loadata(name='IP')
print('Initial image shape: ' + str(X.shape))

X, Y = createImageCubes(X, Y, window=5)
print('Processed dataset shape: ' + str(X.shape))

Initial image shape: (145, 145, 200)
Processed dataset shape: (10249, 5, 5, 200)


In this case, we loaded a hyperspectral image that ships with our package. The source of your data does not matter: you only need to provide the $X$ (input data) and $Y$ (target labels) arrays. In addition, give your dataset a name; otherwise, it will be called `temp`. With these three elements, we create a dataset object:

In [10]:
from HSIBandSelection.utils import Dataset
dataset = Dataset(train_x=X, train_y=Y, name='IP')

## Execute the Band Selection / Dimensionality Reduction Algorithm

We'll use the `SelectBands` class. **Parameters**:

*   `dataset`: utils.Dataset object
*   `method`: Method name. Options: 'IBRA', 'GSS' (IBRA+GSS), 'PCA' (IBRA+PCA), and 'PLS' (IBRA+PLS)
*   `classifier`: Classifier type. Options: 'CNN' (if data is 2D), 'ANN', 'RF', 'SVM'. *Default:* 'CNN'
*   `nbands`: How many spectral bands you want to select or reduce to. *Default:* 5
*   `transform`: If True, a Gaussian transformation is applied to the final selected bands to simulate multispectral bands. *Default:* False
*   `average`: If True, average consecutive bands to halve the initial number of bands. *Default:* False
*   `epochs`: Number of iterations used to train the NN models. *Default:* 150
*   `batch_size`: Batch size used to train the NN models. *Default:* 128
*   `scratch`: If True, execute the IBRA process from scratch and replace previously saved results. *Default:* True
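
To make the `average` option more concrete, here is a minimal NumPy sketch of averaging consecutive band pairs so that the number of bands is halved. `average_consecutive_bands` is a hypothetical helper, not the package's implementation, and it assumes that a trailing unpaired band is simply dropped when the number of bands is odd.

```python
import numpy as np

def average_consecutive_bands(cubes):
    """Hypothetical helper: average bands (0, 1), (2, 3), ... along the last axis."""
    n_pairs = cubes.shape[-1] // 2
    # Group the spectral axis into (n_pairs, 2) and average within each pair
    paired = cubes[..., :2 * n_pairs].reshape(*cubes.shape[:-1], n_pairs, 2)
    return paired.mean(axis=-1)

# Two 1 x 1 cubes with 6 bands each, filled with 0..11
demo = np.arange(12, dtype=float).reshape(2, 1, 1, 6)
halved = average_consecutive_bands(demo)
print(halved.shape)     # (2, 1, 1, 3)
print(halved[0, 0, 0])  # [0.5 2.5 4.5]
```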

In [20]:
from HSIBandSelection.SelectBands import SelectBands
# Configure the selector; no bands are selected until run_selection() is called
selector = SelectBands(dataset=dataset, method='GSS', nbands=5)


Next, we call the `run_selection` method of the `SelectBands` object. **Parameters**:

*   `init_vf`: Initial Variance Inflation Factor (VIF) threshold (used for IBRA). *Default:* 12
*   `final_vf`: Final Variance Inflation Factor (VIF) threshold (used for IBRA). *Default:* 5
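
For intuition about what these thresholds measure: the variance inflation factor is defined as $1 / (1 - R^2)$, where $R^2$ is the coefficient of determination of a regression of one variable on others. The sketch below computes it for a pair of bands, using the fact that for a simple linear regression $R^2$ equals the squared Pearson correlation. `pairwise_vif` is a hypothetical helper and the data are synthetic; how the package computes VIF internally may differ.

```python
import numpy as np

def pairwise_vif(band_a, band_b):
    """Hypothetical helper: VIF between two bands, 1 / (1 - r^2)."""
    r = np.corrcoef(band_a.ravel(), band_b.ravel())[0, 1]
    r2 = r ** 2
    # Perfectly collinear bands give an unbounded VIF
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
band = rng.normal(size=1000)
redundant = band + 0.1 * rng.normal(size=1000)  # nearly a copy of `band`
unrelated = rng.normal(size=1000)               # independent of `band`

vif_redundant = pairwise_vif(band, redundant)
vif_unrelated = pairwise_vif(band, unrelated)
print(vif_redundant > 12, vif_unrelated < 5)
```

A highly redundant pair yields a large VIF, while an unrelated pair stays close to 1, so a lower threshold is a stricter redundancy criterion.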



**Return**:

If the selected method is IBRA:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the selected bands

If the selected method is GSS:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `GSS_best`: The best combination of bands obtained using GSS
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the selected bands

If the selected method is PCA or PLS:

*   `VIF_best`: The VIF threshold at which the best results were obtained
*   `IBRA_best`: The best pre-selected bands using inter-band redundancy
*   `reduced_dataset`: The reduced dataset after applying PCA or PLS to the pre-selected bands
*   `stats_best`: The best performance metric values obtained after 5x2 CV using the reduced bands
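
The `stats_best` values come from 5x2 cross-validation: five repetitions of a 2-fold split, giving ten train/test runs per candidate. A minimal NumPy sketch of how such splits can be generated follows. `five_by_two_splits` is a hypothetical helper, and it shuffles without stratifying by class, which may differ from what the package does.

```python
import numpy as np

def five_by_two_splits(n_samples, seed=0):
    """Hypothetical helper: yield 10 (train_idx, test_idx) pairs for 5x2 CV."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(5):
        # Reshuffle, split in two halves, and use each half once for testing
        perm = rng.permutation(n_samples)
        first, second = perm[:half], perm[half:]
        yield first, second
        yield second, first

splits = list(five_by_two_splits(n_samples=10249))
print(len(splits))  # 10
```

Each split can then be used to train a classifier on `X[train_idx]` and evaluate it on `X[test_idx]`, with the ten scores summarized afterwards.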

**IMPORTANT**: This code is implemented in PyTorch. If you're running this on Google Colab, change the runtime type to GPU to speed up training!

In [None]:
# With method='GSS', run_selection returns four values
VIF_best, IBRA_best, GSS_best, stats_best = selector.run_selection(init_vf=11, final_vf=9)

*************************************
Testing VIF threshold: 11
*************************************
Selecting bands:  [0, 7, 11, 15, 17, 20, 26, 34, 37, 39, 47, 56, 58, 60, 67, 74, 78, 89, 99, 104, 109, 125, 142, 144, 146, 148, 150, 169, 191, 198]
Executing IBRA + GSS (Greddy Spectral Selection)
	Analyzing candidate combination 1. 5x2 CV using bands: [17, 20, 26, 37, 47]


  0%|          | 0/10 [00:00<?, ?it/s]

In [None]:
print('The best metrics were obtained using a VIF value of {}'.format(VIF_best))
print('The pre-selected bands obtained by IBRA were {}'.format(IBRA_best))
print('The bands selected by IBRA+GSS were {}'.format(GSS_best))
print('The best classification metrics were as follows:')
print(stats_best)
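
Once a set of band indices is available, reducing a dataset to those bands is a single indexing operation on the last axis. The sketch below uses a synthetic array and, purely as example indices, the first candidate combination printed in the log above; in practice you would index your own `X` with `GSS_best`, which is assumed here to be a list of integer band indices.

```python
import numpy as np

# Example indices only: the first candidate combination from the log, not a final result
example_bands = [17, 20, 26, 37, 47]

# Synthetic stand-in for X: 4 cubes of 5 x 5 pixels with 200 bands,
# where band k is filled with the value k
X_demo = np.zeros((4, 5, 5, 200)) + np.arange(200)

# Keep only the chosen bands along the spectral (last) axis
X_reduced = X_demo[..., example_bands]
print(X_reduced.shape)  # (4, 5, 5, 5)
```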