# Development and Application of a Machine Learning-based Remote Sensing System for Deforestation Monitoring

#### _This project focuses on using remote sensing data and machine learning algorithms to detect deforestation in a specific region. It will involve programming, handling GIS data, processing remote sensing data, developing algorithms, and using machine learning models. It will provide a tangible demonstration of my ability to handle the types of tasks I might encounter in a new role as a GIS and Remote Sensing Specialist._

## Step 1: Data Collection
Start by acquiring satellite images of a specific area over a set period from the NASA/USGS Landsat archive, available through the USGS (usgs.gov). These satellite platforms provide multispectral images that can be used to detect changes in vegetation cover.

In [None]:
# I made a USGS API request for satellite imagery covering Loreto to Maynas and downloaded 3 years of imagery from EROS.
# https://www.usgs.gov/centers/eros/science/usgs-eros-archive-vegetation-monitoring-eros-visible-infrared-imaging

## Step 2: Data Preprocessing
Use GIS software (QGIS) and programming languages (Python or R) to preprocess the satellite images. This could involve tasks such as cloud removal, atmospheric correction, and image normalization.

In [None]:
#Loading the images and importing the Metadata

In [2]:
#Metadata
import xml.etree.ElementTree as ET
tree = ET.parse('C:/Users/calve/Documents/DeforestationProject/eviirs_ndvi/eviirs_ndvi_64666a87ad722cd3.xml')
root = tree.getroot()

In [12]:
root

<Element 'fgdcs' at 0x000001AF818D4090>

In [13]:
tree

<xml.etree.ElementTree.ElementTree at 0x1af814a19a0>

In [6]:
root.tag

'fgdcs'

In [8]:
root.attrib

{}

In [14]:
for child in root:
    print(child.tag, child.attrib)

fgdc {}
fgdc {}
fgdc {}
[... 122 more identical lines; 125 child elements in total, all tagged 'fgdc' with no attributes]
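Printing only the tag and attributes of each child shows nothing useful here, because every child is an `fgdc` element with no attributes. A minimal sketch of walking the whole tree and printing the text of each leaf element follows. The `SAMPLE_XML` string and the element names inside it are hypothetical stand-ins, not the contents of the real metadata file, and `leaf_values` is a helper name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the metadata file; the real file's layout may differ.
SAMPLE_XML = """
<fgdcs>
  <fgdc><title>eVIIRS NDVI</title></fgdc>
  <fgdc><scale>10000</scale><fill>-2000</fill></fgdc>
</fgdcs>
"""


def leaf_values(element, path=""):
    """Yield (path, text) pairs for every leaf element under `element`."""
    current = f"{path}/{element.tag}"
    children = list(element)
    if not children:
        # A leaf element: report its stripped text (empty string if none).
        yield current, (element.text or "").strip()
    for child in children:
        yield from leaf_values(child, current)


root = ET.fromstring(SAMPLE_XML)
metadata = list(leaf_values(root))
for path, text in metadata:
    print(path, "=", text)
```

With the real file, `ET.parse(...).getroot()` from the cell above can be passed to `leaf_values` in place of the sample root.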


Remember, always start by processing a small chunk of your data to make sure your code is working correctly before scaling up to the full dataset.

In [None]:
#Loading and Reading the Images

Load the GeoTIFF files: Use the rasterio library to load the GeoTIFF files, which might include the NDVI image, the quality band image, and the acquisition band image.

In [None]:
import rasterio

ndvi_file = "path_to_your_ndvi_file"
ndvi_data = rasterio.open(ndvi_file)

quality_file = "path_to_your_quality_file"
quality_data = rasterio.open(quality_file)

acquisition_file = "path_to_your_acquisition_file"
acquisition_data = rasterio.open(acquisition_file)

In [None]:
#Chunking: You can process your data in smaller chunks instead of loading the whole dataset into memory at once. Python libraries such as Dask and xarray are designed for this type of work and integrate well with existing data science libraries in Python.

#Rasterio Windows: The rasterio library, which is used to read GeoTIFF files, includes a feature that lets you read a small window of the data at a time, greatly reducing memory usage.
from rasterio.windows import Window

with rasterio.open('large_image.tif') as src:
    w = src.read(1, window=Window(0, 0, 1024, 1024))  # Adjust the size of the window as needed

#Downsampling: If the high resolution of your data isn't necessary for your analysis, you can downsample the data to a lower resolution to decrease its size.

#Use GDAL's virtual file systems: GDAL (and by extension rasterio) supports reading files in chunks directly from compressed files or over the network, which can reduce the amount of space needed on your local machine.
import rasterio

# Open a dataset from a .zip file
with rasterio.open('/vsizip/path/to/archive.zip/large_image.tif') as src:
    w = src.read(1, window=Window(0, 0, 1024, 1024))
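The single window above reads only the top-left corner of the image. To cover a whole raster chunk by chunk, the window offsets have to be stepped across the image, with smaller windows at the right and bottom edges. A minimal sketch of that bookkeeping in plain Python follows. The `iter_tiles` helper and the raster size used in the example are assumptions made for illustration.

```python
def iter_tiles(width, height, tile_size):
    """Yield (col_off, row_off, tile_width, tile_height) tuples covering a raster.

    Tiles on the right and bottom edges are clipped so they never extend
    past the raster bounds.
    """
    for row_off in range(0, height, tile_size):
        for col_off in range(0, width, tile_size):
            yield (
                col_off,
                row_off,
                min(tile_size, width - col_off),
                min(tile_size, height - row_off),
            )


# Hypothetical raster of 2500 x 1100 pixels, read in 1024-pixel tiles.
tiles = list(iter_tiles(width=2500, height=1100, tile_size=1024))
for tile in tiles:
    print(tile)
```

Each tuple is in the same argument order as `Window(col_off, row_off, width, height)`, so it can be unpacked as `Window(*tile)` and passed to `src.read(1, window=...)`, using `src.width` and `src.height` as the raster size.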

Interpret the Quality Band: The Quality Band GeoTIFF contains important information about the quality of each pixel in the NDVI image. You'll need to interpret this file to identify which pixels are of good quality, which are cloudy, which have bad band quality, and which are filled with snow or are empty (fill). You could create a mask to separate the good quality pixels from the bad.

In [None]:
import numpy as np

# Assuming a quality value of 0 indicates good quality
good_quality_mask = quality_data.read(1) == 0

#Apply the Quality Mask to the NDVI Data: Using the good quality mask, you can filter out the bad quality pixels in the NDVI data.
ndvi_data_masked = np.where(good_quality_mask, ndvi_data.read(1), np.nan)
#Cloud Removal
#Atmospheric Correction

#Normalize the NDVI Data: The NDVI data provided is scaled by a factor of 10,000. You'll need to rescale it to the typical range of -1 to +1.
ndvi_data_normalized = ndvi_data_masked / 10000.0

#Deal with Special NDVI Values: Some NDVI values are flagged with special values to indicate undefined/background (-2000) or negative surface reflectance (-3000). Compare against the raw integer flags rather than the rescaled floats, then set those pixels to NaN.
special_value_mask = np.isin(ndvi_data.read(1), (-2000, -3000))
ndvi_data_final = np.where(special_value_mask, np.nan, ndvi_data_normalized)
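The masking, special-value handling, and rescaling steps can also be combined into one function and checked on a tiny synthetic array before running them on a full image. This is a sketch under the same assumptions as the cell above: a quality value of 0 means good quality, -2000 and -3000 are the special flags, and the scale factor is 10,000. The `clean_ndvi` helper and the sample arrays are made up for illustration.

```python
import numpy as np

SCALE_FACTOR = 10000.0           # NDVI is stored scaled by 10,000
SPECIAL_VALUES = (-2000, -3000)  # undefined/background, negative surface reflectance
GOOD_QUALITY = 0                 # assumption: 0 in the quality band means good quality


def clean_ndvi(raw_ndvi, quality):
    """Return NDVI rescaled to [-1, 1], with NaN for flagged or low-quality pixels."""
    raw_ndvi = np.asarray(raw_ndvi)
    quality = np.asarray(quality)
    # Keep a pixel only if its quality is good and it is not a special flag value.
    keep = (quality == GOOD_QUALITY) & ~np.isin(raw_ndvi, SPECIAL_VALUES)
    return np.where(keep, raw_ndvi / SCALE_FACTOR, np.nan)


# Synthetic 2 x 2 example: one valid pixel, one background flag,
# one pixel with bad quality, one negative-reflectance flag.
raw = np.array([[8500, -2000], [4200, -3000]], dtype=np.int16)
quality = np.array([[0, 0], [1, 0]], dtype=np.uint8)
cleaned = clean_ndvi(raw, quality)
print(cleaned)
```

Testing the flags on the raw integers avoids relying on exact floating-point equality after the division.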

Interpret the Acquisition Band: The Acquisition Band GeoTIFF provides information about the acquisition date (day of year, DOY) and number for each pixel. You might want to extract this information if you're interested in when the data for each pixel was collected.
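A day-of-year value only becomes a calendar date once the year is known. The sketch below converts a DOY to a date with the standard library. It assumes the year comes from elsewhere (for example, the file name or the metadata), since the text above only says the band stores the day of year. The `doy_to_date` helper name is introduced here for illustration.

```python
from datetime import date, timedelta


def doy_to_date(year, day_of_year):
    """Convert a 1-based day of year to a calendar date in the given year."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


# Day 60 falls on different dates in common and leap years.
print(doy_to_date(2021, 60))
print(doy_to_date(2020, 60))
```

Applied to the acquisition band, the same conversion would be run on the unique DOY values in the array rather than pixel by pixel.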

In [None]:
#Calculating Vegetation Indices

In [None]:
#Other Preprocessing needed?

## Step 3: Feature Extraction
Using the preprocessed images, extract relevant features for analysis. This could include spectral indices related to vegetation health, such as the Normalized Difference Vegetation Index (NDVI), which can be calculated using the appropriate bands in the multi-spectral images.
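For imagery that does not ship with a ready-made NDVI band, the index is computed from the near-infrared and red bands as (NIR - Red) / (NIR + Red). A minimal NumPy sketch follows. The `compute_ndvi` helper and the reflectance values are illustrative assumptions, and which band numbers hold NIR and red depends on the sensor.

```python
import numpy as np


def compute_ndvi(nir, red):
    """Compute NDVI = (NIR - Red) / (NIR + Red), with NaN where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    # Start from NaN so pixels with a zero denominator stay undefined.
    ndvi = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=ndvi, where=total != 0)
    return ndvi


# Illustrative reflectance values: dense vegetation, bare surface, empty pixel.
nir = np.array([0.5, 0.3, 0.0])
red = np.array([0.1, 0.3, 0.0])
ndvi = compute_ndvi(nir, red)
print(ndvi)
```

Values near +1 indicate dense green vegetation, and values near 0 or below indicate bare ground, water, or cloud.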

## Export for Model Development