<a href="https://colab.research.google.com/github/piyuss/hyperspectral-image-demo/blob/main/hyper_image_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Hyperspectral Image Analysis

In this tutorial, we will walk through the basic steps of hyperspectral image processing, using images of plants as examples. First, let's download the files we will work with. Hyperspectral image files tend to be large, so the download may take a while. Run the code cell below by clicking the **Play icon** to the left of the cell, or by clicking inside the cell and pressing **CTRL/CMD+ENTER**.

In [None]:
# Download the zipped image files and extract them (this creates the images/ folder)
!wget -O download.zip "https://www.dropbox.com/s/s21q6uoococleai/images.zip?dl=1"
!unzip download.zip

Let us look at the files that we downloaded. 

In [85]:
from glob import glob  # Returns the filenames that match a pattern
file_list = glob("images/*.*")  # Every file with an extension inside images/
print("List of filenames:", file_list)

List of filenames: ['images/N0_A_B.bil', 'images/N8_A_B.bil.hdr', 'images/N8_A_B.bil', 'images/N0_A_B.bil.hdr']
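Notice that `glob` does not guarantee any particular order, so picking files by position (`file_list[0]`, `file_list[1]`) depends on the order it happens to return. A sturdier approach, sketched below, pairs each header with its data file by name. It assumes the naming used in this dataset, where the header for `X.bil` is `X.bil.hdr`; `pair_headers` is a hypothetical helper, not part of any library.

```python
def pair_headers(filenames):
    """Map each .hdr header to its data file, assuming headers are named <data file>.hdr.

    Hypothetical helper for illustration; headers without a matching data file are skipped.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(filenames):            # sort for a reproducible order
        if name.endswith(".hdr"):
            data_file = name[: -len(".hdr")]  # "images/N0_A_B.bil.hdr" -> "images/N0_A_B.bil"
            if data_file in names:
                pairs[name] = data_file
    return pairs

files = ['images/N0_A_B.bil', 'images/N8_A_B.bil.hdr', 'images/N8_A_B.bil', 'images/N0_A_B.bil.hdr']
print(pair_headers(files))
# {'images/N0_A_B.bil.hdr': 'images/N0_A_B.bil', 'images/N8_A_B.bil.hdr': 'images/N8_A_B.bil'}
```

In the notebook, `pair_headers(file_list)` would give the same mapping for the downloaded files regardless of the order `glob` returns them in.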


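We have two different types of files in the directory:
1. BIL (Band Interleaved by Line) files are binary files containing uncompressed pixel data. For each image row, a BIL file stores that row's pixel values for the first band, then the same row for the second band, and so on through every band, before moving on to the next row. The file itself records no dimensions, so we need to know the number of rows, columns, and bands to read it.
2. HDR files, or header files, are plain-text files that describe the matching BIL file: its dimensions, data type, byte order, wavelengths, and other metadata.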

<p align="center"><img src="https://github.com/piyuss/hyperspectral-image-demo/blob/main/illustrations/bil_format.PNG?raw=true" alt="Structure of a BIL file"></p>

<p align="center"><i>Structure of a Band Interleaved by Line (BIL) file</i></p>


**Image source:** https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/bil-bip-and-bsq-raster-files.htm

*This page is also useful for reading about other image formats you may come across, such as BIP and BSQ, depending on the type of data and the sensor you are using.*
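The layout in the figure can be written as a formula for where each value sits in the file. Assuming BIL order, no header offset, and `bytes_per_value` bytes per value, the value at (row, column, band) starts at byte `((row * bands + band) * samples + column) * bytes_per_value`, where `samples` is the number of columns. The sketch below checks this formula against a small synthetic cube written in BIL order with NumPy; `bil_offset` is a hypothetical helper.

```python
import numpy as np

def bil_offset(row, col, band, samples, bands, bytes_per_value):
    """Byte offset of one value in a BIL file (assumes no header offset)."""
    return ((row * bands + band) * samples + col) * bytes_per_value

lines, samples, bands = 3, 4, 5
# A toy cube in (row, column, band) order, the order used for analysis
cube = np.arange(lines * samples * bands, dtype=np.uint16).reshape(lines, samples, bands)

# BIL order on disk: for each row, all columns of band 0, then band 1, ...
raw = np.ascontiguousarray(cube.transpose(0, 2, 1)).tobytes()

row, col, band = 2, 1, 3
start = bil_offset(row, col, band, samples, bands, cube.itemsize)
value = np.frombuffer(raw[start:start + cube.itemsize], dtype=np.uint16)[0]
print(value == cube[row, col, band])  # True
```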

Comparing the file sizes shows that the BIL file is a very large binary file, while the header is a small text file containing metadata.

In [113]:
import os
# Compare the size of a binary data file with that of a text header file
print(f"The size of {file_list[0]} is {os.path.getsize(file_list[0])} bytes.")
print(f"The size of {file_list[1]} is {os.path.getsize(file_list[1])} bytes.")

The size of images/N0_A_B.bil is 2956800000 bytes.
The size of images/N8_A_B.bil.hdr is 4053 bytes.


Let us look at one of the header files to understand the file format before we read in the data.

In [83]:
import pandas as pd
# Quick look at the header: each line of the text file becomes one row of a table
header_file = pd.read_table(file_list[1])
print(header_file)

                                                 ENVI
0                                    interleave = bil
1                                      data type = 12
2                                        lines = 2000
3                                      samples = 1600
4                                         bands = 462
5                                      bit depth = 12
6                                      ceiling = 4095
7                                  sample binning = 1
8                                spectral binning = 2
9                                    line binning = 1
10                                    shutter = 19.56
11                                         gain = 0.0
12                     framerate = 50.090162292125825
13                   imager serial number = 100124-73
14                                     byte order = 0
15                                  header offset = 0
16  wavelength = {389.63, 390.95, 392.27, 393.58, ...
17                          

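The header describes how the image data are stored. The image dimensions come from `lines` (rows), `samples` (columns), and `bands` (spectral channels). The `data type`, `bit depth`, and `byte order` fields matter if we read the binary data ourselves: in ENVI headers, data type 12 denotes unsigned 16-bit integers and byte order 0 denotes little-endian, while a bit depth of 12 means the sensor uses only 12 of the 16 bits, so values range from 0 up to the `ceiling` of 4095.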
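As a sanity check, the header values predict the size of the BIL file. The sketch below assumes ENVI `data type = 12` is unsigned 16-bit (2 bytes per value) and uses `header offset = 0` from the header; the `ENVI_BYTES` table is a partial list of ENVI data-type codes written out for illustration. The result matches the 2,956,800,000 bytes reported for `N0_A_B.bil` above, assuming that image has the same dimensions as the header printed here.

```python
# Values copied from the header printed above
header = {"lines": 2000, "samples": 1600, "bands": 462,
          "data type": 12, "bit depth": 12, "header offset": 0}

# Bytes per value for a few common ENVI data-type codes (partial table)
ENVI_BYTES = {1: 1, 2: 2, 3: 4, 4: 4, 5: 8, 12: 2}

bytes_per_value = ENVI_BYTES[header["data type"]]
expected_size = (header["header offset"]
                 + header["lines"] * header["samples"] * header["bands"] * bytes_per_value)
print(expected_size)  # 2956800000

# 12 significant bits inside each 16-bit value give the largest possible reading
ceiling = 2 ** header["bit depth"] - 1
print(ceiling)  # 4095, the "ceiling" listed in the header
```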

From the first line of the header file, we learn that the format of the header is ENVI. Detailed information about this format can be found here: https://www.l3harrisgeospatial.com/docs/enviheaderfiles.html
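To see how the header's `key = value` lines, including brace-delimited lists such as `wavelength`, turn into usable values, here is a minimal parser sketch. It is not SPy's implementation: it treats every `{...}` value as a comma-separated list and ignores other details of the ENVI format, and `parse_envi_header` is a hypothetical name. SPy ships its own reader, `spectral.io.envi.read_envi_header`, which is the safer choice in practice.

```python
def parse_envi_header(text):
    """Parse ENVI header text into a dict of strings (lists of strings for {...} values)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("not an ENVI header")
    header = {}
    pending_key, pending_value = None, ""
    for line in lines[1:]:
        if pending_key is not None:            # continuing a multi-line {...} value
            pending_value += " " + line.strip()
        elif "=" in line:
            key, value = line.split("=", 1)
            pending_key, pending_value = key.strip().lower(), value.strip()
        else:
            continue                           # skip blank or unrecognized lines
        if pending_value.startswith("{") and not pending_value.endswith("}"):
            continue                           # brace not closed yet: keep reading
        if pending_value.startswith("{"):
            items = pending_value[1:-1].split(",")
            header[pending_key] = [item.strip() for item in items if item.strip()]
        else:
            header[pending_key] = pending_value
        pending_key, pending_value = None, ""
    return header

# A shortened excerpt; the real wavelength list has 462 entries
sample = """ENVI
interleave = bil
lines = 2000
samples = 1600
bands = 462
wavelength = {389.63, 390.95,
 392.27, 393.58}
"""
hdr = parse_envi_header(sample)
print(int(hdr["lines"]), hdr["interleave"])   # 2000 bil
print([float(w) for w in hdr["wavelength"]])  # [389.63, 390.95, 392.27, 393.58]
```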


We could use this information to write our own program for reading BIL files, but it is easier and less error-prone to use an existing library. For this demonstration, we will use the open-source Spectral Python (SPy) library, which is designed for hyperspectral image analysis. Its documentation is here: http://www.spectralpython.net/
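For comparison, here is roughly what writing our own reader could look like. This sketch assumes the header values shown above (BIL interleave, unsigned 16-bit data, byte order 0 meaning little-endian, no header offset) and uses `numpy.memmap`, so values are read from disk only when they are accessed. `read_bil` is a hypothetical helper, and the demo writes a small synthetic file instead of using the tutorial data.

```python
import os
import tempfile
import numpy as np

def read_bil(path, lines, samples, bands, dtype="<u2", offset=0):
    """Memory-map a BIL file and return it as a (lines, samples, bands) view.

    Hypothetical helper. dtype "<u2" = little-endian unsigned 16-bit, matching
    ENVI data type 12 with byte order 0; offset = header offset in bytes.
    """
    raw = np.memmap(path, dtype=dtype, mode="r", offset=offset,
                    shape=(lines, bands, samples))  # on-disk BIL order
    return raw.transpose(0, 2, 1)                   # view as (row, column, band)

# Demo on a small synthetic cube instead of the multi-gigabyte tutorial file
lines, samples, bands = 4, 3, 5
cube = np.arange(lines * samples * bands, dtype="<u2").reshape(lines, samples, bands)
path = os.path.join(tempfile.mkdtemp(), "toy.bil")
cube.transpose(0, 2, 1).tofile(path)   # tofile writes in C order, i.e. BIL order here

toy_img = read_bil(path, lines, samples, bands)
print(toy_img.shape)                   # (4, 3, 5)
print(np.array_equal(toy_img, cube))   # True
```

Getting any of the header details wrong (dimensions, data type, byte order, offset) silently produces garbage values, which is a good reason to rely on a tested library instead.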

The source code is here: https://github.com/spectralpython 

While the specific functions differ from library to library, the general principles remain the same. SPy provides convenient abstractions: it can open most common hyperspectral image formats given just a filename, and it offers functions for many common image operations, so most tasks require only a basic knowledge of Python and its scientific computing libraries such as NumPy. Resources for learning Python are plentiful; one free resource that I recommend is here: https://greenteapress.com/wp/think-python/

We will first install SPy from the Python Package Index (PyPI) and import it.

In [None]:
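!pip install spectral
from spectral import *           # SPy's top-level functions
import spectral.io.envi as envi  # SPy's ENVI reader, used below as envi.open()
import numpy as np               # NumPy, used below for array operations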

SPy provides a dedicated function, `envi.open()`, for opening an image described by an ENVI header:

In [89]:
img = envi.open(file_list[1])  # Pass the .hdr file; SPy locates the matching .bil data file

You will notice that the `envi.open()` function returned almost instantly. This is because the hyperspectral data have not been read into the computer's memory; only the metadata in the header file has been read. Even so, we can use the `img` object to get information about the image and to extract specific channels or pixels.

The object supports NumPy-style attributes and indexing, such as `shape` and `[]` subscripts.

We can check the dimensions of the image:

In [94]:
img.shape

(2000, 1600, 462)

We can inspect the intensity value at row index 200, column index 1000, and band index 100. Python indices start at 0, so this is the 201st row, 1001st column, and 101st band:

In [107]:
img[200,1000,100]

732
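The same subscript notation selects larger pieces of the cube: fixing a row and column and taking all bands gives that pixel's spectrum, while taking all rows and columns at one band gives a single-band image. The sketch below shows the pattern on a small random NumPy array standing in for the image (an assumption for illustration, not the tutorial data). It also picks the band closest to a target wavelength, using the first four wavelengths from the header above; the target of 392.0 is arbitrary, and the units are presumably nanometres.

```python
import numpy as np

# Stand-in cube with the same (row, column, band) axis order as img.shape
rows, cols, n_bands = 6, 8, 4
cube = np.random.default_rng(0).integers(0, 4096, size=(rows, cols, n_bands), dtype=np.uint16)

spectrum = cube[2, 5, :]     # every band at row 2, column 5 -> shape (n_bands,)
band_image = cube[:, :, 1]   # band 1 at every pixel -> shape (rows, cols)
print(spectrum.shape, band_image.shape)  # (4,) (6, 8)

# Choosing a band by wavelength: the first four band centers from the header
wavelengths = np.array([389.63, 390.95, 392.27, 393.58])
band = int(np.argmin(np.abs(wavelengths - 392.0)))  # index of the nearest band
print(band, wavelengths[band])  # 2 392.27
```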

Now that we can access the numbers in the hyperspectral image cube, we can begin to do more interesting things. First, it is a good idea to read the entire image into memory so that subsequent operations are faster. Until now, values were read from disk only when specifically requested with subscript (`[]`) notation. If you check the RAM usage of your Colab session from the menu on the top right before and after running the next code cell, you will see that the image is read into memory only when the `load()` method is called. When working with files this large, keep in mind the size of your data, the amount of memory on your computer, and how that memory is being used.
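Before loading, it helps to estimate how much RAM the cube will need. The arithmetic below uses the dimensions from `img.shape`. This tutorial does not state which data type `load()` produces, so the sketch prints estimates for a few candidate types; `estimate_gb` is a hypothetical helper.

```python
import numpy as np

def estimate_gb(shape, dtype):
    """Memory needed to hold an array of this shape and dtype, in gigabytes (1e9 bytes)."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize / 1e9

shape = (2000, 1600, 462)   # img.shape from above
for dtype in ("uint16", "float32", "float64"):
    print(f"{dtype}: {estimate_gb(shape, dtype):.2f} GB")
# uint16: 2.96 GB
# float32: 5.91 GB
# float64: 11.83 GB
```

The `uint16` figure matches the size of the BIL file on disk, since that is how the raw values are stored.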

In [91]:
arr = img.load()  # Read the entire image cube into memory

In [92]:
arr.shape

(2000, 1600, 462)

In [102]:
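import numpy as np
# Find the (row, column, band) indices where the image reaches its maximum value
np.where(arr == np.max(arr))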

(array([145]), array([1025]), array([122]))
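`np.where(arr == np.max(arr))` returns every location that attains the maximum, but it first builds a boolean array as large as the whole image. When one location is enough, `np.argmax` combined with `np.unravel_index` returns the first maximum (in row-major order) without that extra array. The sketch uses a small random array in place of `arr`, with a maximum planted at a known position for the demonstration.

```python
import numpy as np

cube = np.random.default_rng(1).integers(0, 4096, size=(5, 6, 7), dtype=np.uint16)
cube[3, 4, 2] = 5000   # plant a unique maximum for the demo

# All locations of the maximum, as in the cell above
print(np.where(cube == cube.max()))

# First location of the maximum as a (row, column, band) tuple
row, col, band = np.unravel_index(np.argmax(cube), cube.shape)
print(row, col, band)  # 3 4 2
```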