# Part 1: Raster data

### An introduction to working with raster data in rasterio

In the first part of this workshop, we will explore raster data. 

Raster data is a digital data representation in which information is organized in a grid of cells. These cells can be rendered on a map as pixels, in which each pixel value corresponds to an actual area on the Earth’s surface. 

The satellite imagery that we will be working with today is an example of raster data. To analyse the imagery, and with it the impact of wildfires, we need to learn how to work with raster data in Python.

We will use the open-source geospatial Python library [rasterio](https://rasterio.readthedocs.io/en/stable/), which allows for the reading and writing of raster data. 

Rasterio is a GDAL and NumPy-based library that provides a straightforward and efficient way to handle raster datasets.

In this notebook, we will work with a Planet satellite image captured on August 20, 2023, displaying an area right outside the city of Alexandroupoli, during the wildfire. 


Let's start by importing the rasterio library:

In [None]:
import rasterio

## Opening the raster dataset

Next, open the Planet satellite image of the scene, which is stored in the `data` folder.

For now, we will look at the image captured on August 20, 2023, during the wildfire.

In [None]:
dataset = rasterio.open('data/20230820_visual_RGB.tif')
dataset

The `open()` function of Rasterio takes a path string or path-like object that points to a raster dataset stored in any of the supported file formats. The function mimics Python's built-in `open()` function and returns a dataset object that has some of the same attributes as a Python file object, such as `name`, `mode` and `closed`.

In [None]:
dataset.name

In [None]:
dataset.mode

`mode` refers to the file access mode used when opening the raster dataset using the `rasterio.open()` function.

In this case, it indicates that the file is being opened for reading.

In [None]:
dataset.closed

`dataset.closed` checks if the dataset is closed (`True`) or not (`False`). After the `rasterio.open()` function is called, `dataset.closed` will be `False`, indicating that the dataset is open and operational for reading or writing operations. Once you are done working with the dataset, it is advisable to close it, using the `close()` method, to release the associated memory resources. Once the dataset is closed, it cannot be accessed for further operations.


## Plot the dataset

Rasterio has its own plotting function, `rasterio.plot.show()`, which handles common tasks such as displaying multiband rasters as RGB images and labeling the axes with georeferenced extents. 

In [None]:
from rasterio.plot import show
show(dataset)


Notice that the axis labels generated by the `show` function automatically correspond to the geospatial coordinates of the data.

RGB images consist of multiple bands. Therefore, every pixel in an RGB image contains 3 values: one for red, one for green and one for blue. By combining these values we get a visual image, like the pictures you take with your smartphone.

You can generate a figure containing multiple subplots by passing a matplotlib axes object to the `ax` argument of the `show` function (for example `ax=axr`). The example below illustrates how you can plot the bands in our raster dataset individually.

In [None]:
from matplotlib import pyplot as plt

# prepare plot
fig, (axr, axg, axb) = plt.subplots(1,3, figsize=(21,7))

# plot red band
show((dataset, 1), ax=axr, cmap='Reds', title='red band')

# plot green band
show((dataset, 2), ax=axg, cmap='Greens', title='green band')

# plot blue band
show((dataset, 3), ax=axb, cmap='Blues', title='blue band')


plt.show()
plt.close()

We have created a subplot for every band. The data in each band represents the light intensity of a specific color band. 

As we are looking at an 8-bit per band image, the values range from 0 to 255. 0 indicates zero light and 255 indicates maximum light. 

Each band in an RGB image contributes vital color information, and the combination of these bands creates the colorful and visually interpretable representation of the image.

For more information, see rasterio's [documentation](https://rasterio.readthedocs.io/en/stable/topics/plotting.html) on plotting raster data.

## Attributes

The dataset object contains the properties that were stored in the GeoTIFF file. These properties can be accessed through attributes of the opened dataset object.

Some examples:

In [None]:
dataset.shape

The `shape` attribute holds a tuple representing the dimensions of the raster dataset. Specifically, the tuple has the form (height, width), indicating the number of rows (height) and columns (width) in the raster dataset. The height and the width are also attributes that can be accessed separately.

In [None]:
dataset.height, dataset.width

In [None]:
dataset.count

`count` represents the number of bands in the raster dataset. This raster has three bands, Red, Green and Blue. All bands of a dataset have the same height, width and cell/pixel size.

In [None]:
dataset.dtypes

`dtypes` provides information on the data types of the individual bands in the raster dataset. 

The three bands in our example image all contain 8-bit unsigned integers, which support the value range 0 to 255. 

The GeoTIFF format can also accommodate signed integers and floats of varying sizes.
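The value range of a data type matters as soon as you start doing arithmetic on the bands. The following sketch uses NumPy only, with made-up pixel values, to show the range of `uint8` and what happens when a result does not fit in 8 bits.

```python
import numpy as np

# Value range of 8-bit unsigned integers
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# Two made-up pixel values stored as uint8
a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)

# 250 + 10 = 260 does not fit in 8 bits, so the result wraps around to 4
wrapped = a + b

# Converting to a wider type first keeps the true value
safe = a.astype(np.int16) + b.astype(np.int16)

print(wrapped, safe)
```

Converting to a wider integer or float type before calculating is one simple way to avoid this wrap-around.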

In [None]:
dataset.crs

Geospatial raster data is different from a regular image, as its pixels are mapped, through a coordinate reference system (CRS), to locations on the earth's surface.
A CRS defines how spatial data relates to real-world geographic locations.

You can access the CRS of the raster dataset through the `crs` attribute.

EPSG:32635 refers to the [WGS 84/UTM zone 35N](https://epsg.io/32635) coordinate reference system. The unit of measurement is meters.  

In [None]:
dataset.meta

A comprehensive dictionary containing a summary of essential metadata information can be accessed through the `meta` attribute. This attribute makes it easy to inspect various properties of the dataset all together.

The `transform` describes the mapping of the pixel locations in the array (row, col) to the real world spatial coordinates (x, y). More information on the transform and an exercise to work with the transform are provided in the "Extra" section at the bottom of this notebook.

## Reading the data

Rasterio's `read()` method can be used to load the data of all the raster bands into a multidimensional NumPy array.

Working with imagery poses several computational challenges due to the large volume of data and the complex operations involved in image processing and analysis. 
NumPy provides multidimensional arrays (ndarrays) that allow efficient storage and manipulation of large datasets. In addition, it supports element-wise operations.
NumPy's arrays enable fast computations and seamless integration with various scientific computing and image processing tools. 
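To make "element-wise operations" concrete, here is a small sketch on a made-up 3-band image of 2 by 2 pixels. The array `rgb` and its values are purely illustrative; the same operations work on the full-size array of a real image.

```python
import numpy as np

# Made-up image with shape (bands, rows, cols) = (3, 2, 2)
rgb = np.array(
    [[[34, 40], [90, 200]],     # red
     [[60, 70], [110, 210]],    # green
     [[30, 35], [95, 220]]],    # blue
    dtype=np.uint8,
)

# Element-wise division: every value is rescaled from 0-255 to 0-1 in one statement
scaled = rgb / 255.0

# Mean over the band axis: one brightness value per pixel
brightness = rgb.mean(axis=0)

print(scaled.dtype, scaled.max())
print(brightness.shape)
```

No explicit loop over pixels is needed, which is what keeps these operations fast on large images.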

In [None]:
array = dataset.read()
array.shape

The first value (in our case "3") corresponds to the number of bands of the NumPy array of our raster dataset (this is the same number as the `count` attribute).

The second value (in our case "2554") refers to the number of cells/pixels in the height direction of the NumPy array of our raster dataset (this is the same number as the `height` attribute).

The third value (in our case "3864") refers to the number of cells/pixels in the width direction of the NumPy array of our raster dataset (this is the same number as the `width` attribute).
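Note that this (bands, rows, columns) order differs from the (rows, columns, bands) order that many image libraries, including matplotlib's `imshow`, expect. The sketch below uses a small stand-in array (named `stack` here so that it does not overwrite the notebook's `array`) to show how the axes can be rearranged with NumPy.

```python
import numpy as np

# Small stand-in for an array of shape (3, 2554, 3864)
bands, rows, cols = 3, 4, 6
stack = np.arange(bands * rows * cols).reshape(bands, rows, cols)

# Move the band axis to the end: (bands, rows, cols) -> (rows, cols, bands)
image = np.moveaxis(stack, 0, -1)

print(stack.shape, image.shape)

# The values of a single pixel are the same in both layouts
print(stack[:, 1, 2], image[1, 2])
```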

In [None]:
# Let's print the NumPy array containing the values of all the bands
array

Multi-band imagery often contains more information than is necessary for a specific analysis. Therefore, it is also possible to load a single band into one NumPy array. The bands can be accessed by their index number, starting from 1 (following GDAL's convention).


In [None]:
red_band = dataset.read(1)
red_band.shape

In [None]:
# Let's print the NumPy array containing the red band data
red_band

As with every NumPy array, you can access values by using row and column indexes.

We have seen that at the top left part of the image a forest area is present. Let's extract the RGB values of a forest pixel and plot the color!

In [None]:
# extract RGB values from NumPy array containing all the bands
forest_pixel = array[:, 10, 100]
forest_pixel

In [None]:
# plot the values as a color
plt.imshow([[forest_pixel]])

plt.axis('off')
plt.show()
plt.close()

The main advantage of using rasterio to load raster data into NumPy arrays is the seamless integration between geospatial data and the powerful numerical computing capabilities of NumPy. 

### Exercise

Above we have extracted a forest pixel. Now try to plot the color of a smoke pixel!
A smoke pixel can be found at row `1684` and column `2234` in our NumPy array.


In [None]:
# Fill in your solution below
# Extract smoke pixel from our NumPy array containing all 3 bands
# smoke_pixel=[YOUR SOLUTION]

# Plot the pixel
# [YOUR SOLUTION]


In order to get the array indices corresponding to points in georeferenced space, the `index()` method can be used. More information and an exercise can be found in the "Extra" section, under "Spatial indexing".

## Writing data

Rasterio also allows for the creation of new raster data files. 

To do so, call `rasterio.open()` with the path of the file to be created and `'w'` to specify writing mode (instead of reading).

In the example below, we will save the green band of our input raster dataset as a separate file. 

Extracting only the relevant bands reduces the dataset's size, making it more manageable and faster to process. This matters most for applications where real-time or near-real-time analysis is crucial.

In [None]:
# read the green band into a NumPy array
green_band = dataset.read(2)

# make a copy of the metadata of our input dataset
meta_green_band = dataset.meta.copy()

# update the metadata: our new raster data file will only contain one band, the green band.
meta_green_band.update(
    count=1,
    )

# open a new GeoTIFF file in writer mode, and write the data to the file
with rasterio.open('data/green_band.tif', 'w', **meta_green_band) as dst:
    print(f"Mode of dataset 'dst': '{dst.mode}'")
    dst.write(green_band, 1)

Opening the raster dataset in a `with` statement ensures that the dataset object will be closed automatically. 

It is then not necessary to call the `close()` method to sync the data to disk.

The "Extra" section at the bottom of this notebook contains an exercise in which you will work with the newly created dataset. Feel free to take a look after the workshop!

## Closing the dataset

The last step is to close the dataset after you're done with operations. 

By calling `dataset.close()`, you are ensuring that the opened file is properly closed and all associated memory resources are released, making your code more efficient and reliable.

In [None]:
dataset.close()

In [None]:
dataset.closed

## Extra 

If you want to explore and practise more, go through the sections below!

### Georeferencing

Geospatial raster data is different from a regular image, as its pixels are mapped, through a coordinate reference system (CRS), to locations on the earth's surface.

You can access the CRS of the raster dataset through the `crs` attribute.

In [None]:
dataset.crs

EPSG:32635 refers to the [WGS 84/UTM zone 35N](https://epsg.io/32635) coordinate reference system. The unit of measurement is meters.  

The geographic area covered by the raster dataset can be extracted through the `bounds` attribute. 

In [None]:
dataset.bounds

This information tells us that our raster image covers the region on the earth's surface from 413661 meters to 425253 meters, left to right, and from 4028985 meters to 4265115 meters, bottom to top. But what does this mean exactly? 

These coordinates are defined in relation to the origin point (the 0,0 location) of the dataset's CRS. Therefore, '413661' refers to 413661 meters away from the origin point of the EPSG:32635 CRS.
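As a quick check, the horizontal extent can be related to the pixel grid. The sketch below uses the left and right bounds quoted above together with the width of 3864 pixels reported by `array.shape`; the variable names are only illustrative.

```python
# Values quoted in this notebook for the workshop image
left, right = 413661, 425253   # x bounds in meters (EPSG:32635)
width_px = 3864                # number of columns

# Horizontal extent in meters and the size of one pixel in the x direction
extent_m = right - left
pixel_width_m = extent_m / width_px

print(extent_m, pixel_width_m)  # 11592 3.0
```

A pixel width of 3 meters should correspond to the horizontal resolution stored in the dataset's `transform`.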

In [None]:
dataset.transform

The `transform` describes the mapping of the pixel locations in the array (row, col) to the real world spatial coordinates (x, y). The returned affine transformation matrix consists of a set of six coefficients used to map pixel coordinates to geographic coordinates and vice versa. The transform matrix allows you to perform spatial analysis and geoprocessing tasks on the raster data.


So what are we looking at?

Affine(a, b, c,
        d, e, f)

* `a` = Horizontal Pixel Resolution (X-axis): This represents the width of a pixel in the dataset, usually in the units of the coordinate reference system (e.g., meters).
* `b` = Rotation Term: This term accounts for any rotation or skew in the dataset. Typically, this term is 0 if the dataset is not rotated.
* `c` = Horizontal Coordinate of the Top-left Pixel (X-coordinate): This represents the X-coordinate in the coordinate reference system of the top-left corner of the top-left pixel of the dataset.
* `d` = Vertical Rotation Term: This term accounts for any rotation or skew in the dataset. Typically, this term is 0 if the dataset is not rotated.
* `e` = Vertical Pixel Resolution (Y-axis): This represents the height of a pixel in the dataset, usually in the units of the coordinate reference system (e.g., meters). It is negative for a north-up image, in which the row number increases from the top-left corner downwards while the Y-coordinate decreases.
* `f` = Vertical Coordinate of the Top-left Pixel (Y-coordinate): This represents the Y-coordinate in the coordinate reference system of the top-left corner of the top-left pixel of the dataset.

Therefore, the spatial coordinates of the upper left corner can be obtained through: 

In [None]:
dataset.transform * (0, 0)

For more information on the Affine transformation matrix, see [rasterio's documentation](https://rasterio.readthedocs.io/en/stable/topics/transforms.html#using-affine-transformation-matrix) and [rasterio's GitHub page](https://github.com/rasterio/affine) on the topic.
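The role of the six coefficients can be sketched in plain Python. The values below are hypothetical (a north-up raster with 3 m pixels and a made-up top-left corner), and `pixel_to_xy` is an illustrative helper, not a rasterio function. Note the argument order: like `dataset.transform * (col, row)`, it takes the column first and the row second.

```python
# Hypothetical coefficients for a north-up raster with 3 m pixels
a, b, c = 3.0, 0.0, 413661.0
d, e, f = 0.0, -3.0, 4530000.0


def pixel_to_xy(col, row):
    """Apply the affine transform to a (col, row) position."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Top-left corner of the top-left pixel
print(pixel_to_xy(0, 0))        # (413661.0, 4530000.0)

# Centre of the pixel in column 10, row 20 (add 0.5 to reach the middle of the cell)
print(pixel_to_xy(10.5, 20.5))  # (413692.5, 4529938.5)
```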

### Exercise

Try to access the spatial coordinates of the lower right corner of the dataset, using the affine transformation matrix.


Hint: Remember, the spatial coordinates of the upper left corner can be obtained through:  `dataset.transform * (0, 0)`

In [None]:
# Fill in your solution below
#[YOUR SOLUTION]

### Spatial indexing

In order to get the array indices corresponding to points in georeferenced space, the `index()` method can be used.

Let's say you are interested in the light intensity value of the red band at a specific location in your image, namely for the pixel that is 1 km east and 7 km south of the upper left corner of our dataset. How would you be able to retrieve this information from the NumPy array? 

First we will use the `index()` method to get the array indices.

In [None]:
# define x and y
x, y = (dataset.bounds.left + 1000, dataset.bounds.top - 7000)
print(f"The x coordinate is {x}, the y coordinate is {y}")

# extract array indices corresponding to spatial coordinates
row, col = dataset.index(x, y)
print(f"The corresponding row and column are ({row}, {col})")

With these indices, we can now extract the value from the `red_band` array.

In [None]:
red_band[row,col]

The `xy()` method can be used to obtain the spatial coordinates of a pixel. 

The example below shows how to extract the coordinates of the center of the raster image. 

In [None]:
dataset.xy(dataset.height // 2, dataset.width // 2)
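To see what these two methods do conceptually, here is a plain-Python sketch for an unrotated, north-up raster. The coefficients are hypothetical and the helper names `xy_to_rowcol` and `rowcol_to_center` are made up for illustration; the sketch assumes that indices are rounded down and that coordinates are returned for the pixel centre, so treat it as an approximation of the behaviour rather than rasterio's actual implementation.

```python
import math

# Hypothetical transform coefficients: 3 m pixels, made-up top-left corner
a, e = 3.0, -3.0                # pixel width and (negative) pixel height
c, f = 413661.0, 4530000.0      # x and y of the top-left corner


def xy_to_rowcol(x, y):
    """Spatial coordinates -> array indices (rounded down)."""
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    return row, col


def rowcol_to_center(row, col):
    """Array indices -> spatial coordinates of the pixel centre."""
    x = c + (col + 0.5) * a
    y = f + (row + 0.5) * e
    return x, y


# A point 1 km east and 7 km south of the top-left corner
row_idx, col_idx = xy_to_rowcol(c + 1000, f - 7000)
print(row_idx, col_idx)                    # 2333 333
print(rowcol_to_center(row_idx, col_idx))  # (414661.5, 4522999.5)
```

Going from coordinates to indices and back does not return the original point exactly, because every coordinate inside a pixel maps to the same row and column.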

### Exercise 

What is the light intensity value of the blue band at the spatial coordinates `x=417110.0` and `y=4530136.6`?

In [None]:
# Fill in your solution below
# open the dataset
#[YOUR SOLUTION]

# set x and y
#[YOUR SOLUTION]

# extract array indices corresponding to spatial coordinates using the .index() method
#[YOUR SOLUTION]

# read the blue band into a NumPy array, using the .read() method
#[YOUR SOLUTION]

# extract value from blue band array
#[YOUR SOLUTION]

# close the dataset
#[YOUR SOLUTION]

# Hint: follow the same steps as presented above with extracting a value at x and y for the red band
# The band index for the red band was 1, so what will the band index of the blue band be?

### Exercise

In the "Writing data" section, we have created and saved a GeoTIFF containing the green band data (`'data/green_band.tif'`).

As an exercise, open this raster dataset and retrieve the number of bands.

In [None]:
# Fill in your solution below
# open the newly created raster dataset and check the number of bands
#[YOUR SOLUTION]

# retrieve the bands count
#[YOUR SOLUTION]

# Don't forget to close the dataset
#[YOUR SOLUTION]


The newly created raster file has a `count` of 1, meaning that the dataset has one band. 

### Exercise

Try to print the metadata of both the original raster dataset (stored in `dataset`) and of your newly created dataset containing only the green band. 

Is the output what you expected?

In [None]:
# Fill in your solution below
#[YOUR SOLUTION]

When printing both the metadata of the newly created raster file and the metadata of the original raster file, we can see that only `count` has changed. This was expected, as the only change we made is storing one band (green), instead of all three (red, green & blue).