### Question 3

As part of estimating tree biomass, we need the tree trunk diameter, that is, the
diameter at breast height (DBH). However, DBH cannot be obtained from the drone
orthomosaic, so we decided to derive it from the three variables below:
a. Tree species
b. Tree height
c. Tree crown size
Some ground data has been collected and uploaded to https://drive.google.com/drive/folders/1zYBCBJlYS-5KCU4AUpMXHSS4jHadfuZS?usp=drive_link.
Please develop and train a machine learning model that predicts tree DBH from tree
species, tree height, and tree crown size.
Share the code used to train the model, along with a description of the methodology
and results, in the document.

#### About the code:
This code performs a regression task using a Random Forest algorithm to predict the diameter at breast height (`TreeDBH_cm`) of trees from the remaining columns of the dataset (per the task description: tree species, tree height, and tree crown size). Here's a breakdown of the code:

1. Import necessary libraries:
   - `pandas` is imported as `pd` for data manipulation.
   - `train_test_split` from `sklearn.model_selection` is used to split the dataset into training and testing sets.
   - `RandomForestRegressor` from `sklearn.ensemble` is the regression model used.
   - `mean_absolute_error` from `sklearn.metrics` is used to evaluate the model's performance.

2. Load the dataset:
   - The dataset is loaded from a CSV file located at '/home/sushil/Desktop/F2F/data_for_assignment.csv' into a pandas DataFrame named `data`.

3. Preprocessing:
   - Categorical variables are converted into numerical format using one-hot encoding. This is done using the `pd.get_dummies()` function, where the column 'Tree species' is one-hot encoded.

4. Split the dataset into features (X) and target variable (y):
   - The features (`X`) are obtained by dropping the 'TreeDBH_cm' column from the DataFrame.
   - The target variable (`y`) is set to the 'TreeDBH_cm' column.

5. Split the data into training and testing sets:
   - The dataset is split into training and testing sets using `train_test_split()`, with 80% of the data used for training (`X_train`, `y_train`) and 20% for testing (`X_test`, `y_test`). The parameter `random_state` is set to 42 for reproducibility.

6. Model training:
   - A Random Forest regression model is instantiated with `RandomForestRegressor(random_state=42)` (the fixed seed makes the run reproducible) and trained on the training data (`X_train`, `y_train`) using the `fit()` method.

7. Predictions:
   - The trained model is then used to make predictions on the testing features (`X_test`) using the `predict()` method, resulting in predicted values (`y_pred`).

8. Evaluation:
   - Mean Absolute Error (MAE) is calculated to evaluate the performance of the model. MAE measures the average absolute difference between the actual and predicted values. The calculated MAE is then printed out.
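
The one-hot encoding in step 3 can be illustrated on a toy table. This is only a sketch: the species names, the `Tree height` column name, and all values are hypothetical, and only `Tree species` is a column name confirmed by the notebook. It also shows one possible way to re-align a new sample to the training columns before prediction, since a sample containing fewer species produces fewer dummy columns.

```python
import pandas as pd

# Hypothetical rows: species names, the "Tree height" column and all values
# are made up for illustration; only "Tree species" comes from the notebook.
train = pd.DataFrame({
    "Tree species": ["Teak", "Mango", "Teak"],
    "Tree height": [12.0, 8.5, 15.2],
})
train_encoded = pd.get_dummies(train, columns=["Tree species"])
print(list(train_encoded.columns))

# A new sample with a single species yields fewer dummy columns, so it is
# re-aligned to the training columns (missing ones filled with 0).
new = pd.DataFrame({"Tree species": ["Mango"], "Tree height": [9.0]})
new_encoded = pd.get_dummies(new, columns=["Tree species"])
new_aligned = new_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(list(new_aligned.columns))
```

Each species becomes its own indicator column, named `Tree species_<name>`, while the numeric column is left unchanged.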

In [13]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Load the dataset from the CSV file
data = pd.read_csv('/home/sushil/Desktop/F2F/data_for_assignment.csv')

# Preprocessing: Convert categorical variables to numerical using one-hot encoding
data = pd.get_dummies(data, columns=['Tree species'])

# Split the dataset into features (X) and target variable (y)
X = data.drop(columns=['TreeDBH_cm'])
y = data['TreeDBH_cm']

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)


Mean Absolute Error: 2.1564795321936985
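
To make the metric concrete, MAE can be computed by hand on a few values. The numbers below are hypothetical and not taken from the assignment data. Because the target column is `TreeDBH_cm`, the result is in centimetres.

```python
# Hypothetical DBH values in cm, not taken from the assignment data.
actual = [20.0, 35.5, 12.0, 28.0]
predicted = [22.0, 33.0, 13.5, 28.0]

# MAE is the mean of the absolute differences between actual and predicted.
abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)
print(mae)
```

Read the same way, the value printed above means that predictions on the held-out 20% of this dataset differ from the measured DBH by about 2.16 cm on average.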


### Question 2

Suppose we have 3000 plots under agroforestry plantation, and we want to calculate
the NDVI of every field from the multispectral drone data. QGIS/ArcGIS works for a
small number of plots, but we need code to automate the calculation for a large
number of plots. Please write code to calculate the NDVI from the multispectral
drone data provided at https://drive.google.com/drive/folders/1n7sp_eJIevkJ-Fsp97AuN_C_jBpkndG8?usp=drive_link

#### About the code
This code calculates the Normalized Difference Vegetation Index (NDVI) from two input raster files containing near-infrared (NIR) and red band data. Here's a breakdown of what each part of the code does:

1. Import Libraries:
   - `numpy` is imported as `np` for numerical operations.
   - `rasterio` is imported for reading and writing raster files.

2. Function Definition (`calculate_ndvi`):
   - This function takes three arguments: `red_file`, `nir_file`, and `output_file`, representing the file paths for the red band, NIR band, and the output NDVI file, respectively.

3. Opening Raster Files:
   - The red and NIR bands are opened using `rasterio.open()`.
   - The `with` statement is used to ensure proper closing of the raster files after use.

4. Reading Band Data:
   - The red and NIR bands are read as arrays using `.read(1)` (assuming single-band images) and converted to `np.float32` data type.

5. Masking Nodata Values:
   - Nodata values in the red and NIR bands are identified and replaced with `np.nan` (Not a Number) to exclude them from calculations.

6. Calculating NDVI:
   - NDVI is calculated using the formula: $\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}$.

7. Writing NDVI to Output File:
   - The metadata (profile) of the red band raster is copied to ensure consistency.
   - The datatype of the output raster is set to `rasterio.float32`.
   - The NDVI array is written to the output file using `rasterio.open()` in write mode (`'w'`).

8. Example Usage:
   - An example call to the `calculate_ndvi` function is provided, specifying the file paths for the red band, NIR band, and the output NDVI file.
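
Before the full function, the formula in step 6 can be checked on a few hand-picked values. This is a minimal sketch with hypothetical reflectance values. It also shows one way, assumed here and not part of the original code, to leave pixels with `NIR + Red == 0` as `NaN` rather than dividing by zero.

```python
import numpy as np

# Hypothetical reflectance values: vegetation-like, neutral, and empty pixel.
red = np.array([0.1, 0.3, 0.0], dtype=np.float32)
nir = np.array([0.5, 0.3, 0.0], dtype=np.float32)

denom = nir + red
# Start from NaN everywhere and divide only where the denominator is non-zero.
ndvi = np.full(denom.shape, np.nan, dtype=np.float32)
np.divide(nir - red, denom, out=ndvi, where=denom != 0)

print(ndvi)
```

The first pixel gives (0.5 - 0.1) / (0.5 + 0.1), roughly 0.67, the second gives 0, and the third stays `NaN` because both bands are zero.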

In [14]:
import numpy as np
import rasterio

def calculate_ndvi(red_file, nir_file, output_file):
    # Open red and NIR bands
    with rasterio.open(red_file) as red_src, rasterio.open(nir_file) as nir_src:
        # Read band data as arrays
        red_band = red_src.read(1).astype(np.float32)
        nir_band = nir_src.read(1).astype(np.float32)

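        # Mask out nodata values (only when the source file defines one)
        if red_src.nodata is not None:
            red_band[red_band == red_src.nodata] = np.nan
        if nir_src.nodata is not None:
            nir_band[nir_band == nir_src.nodata] = np.nan

        # Calculate NDVI; pixels where NIR + Red == 0 become NaN
        with np.errstate(divide='ignore', invalid='ignore'):
            ndvi = (nir_band - red_band) / (nir_band + red_band)
        ndvi[~np.isfinite(ndvi)] = np.nan

        # Write NDVI to output file as a single float32 band with NaN as nodata
        profile = red_src.profile
        profile.update(dtype=rasterio.float32, count=1, nodata=np.nan)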

        with rasterio.open(output_file, 'w', **profile) as dst:
            dst.write(ndvi.astype(rasterio.float32), 1)

# Example usage
calculate_ndvi("/home/sushil/Desktop/F2F/MultispectralDroneData-20240420T061142Z-001/MultispectralDroneData/DJI_20240318112444_0001_MS_R.TIF", "/home/sushil/Desktop/F2F/MultispectralDroneData-20240420T061142Z-001/MultispectralDroneData/DJI_20240318112444_0001_MS_NIR.TIF", "output_ndvi.tif")
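
The call above handles a single red/NIR pair, while the task asks for many plots. One possible way to automate this is to pair files by name and loop over the pairs. The helper `find_band_pairs` below is hypothetical. It assumes that every red file ending in `_MS_R.TIF` has a NIR counterpart with the same prefix ending in `_MS_NIR.TIF`, as in the example file names above; whether this holds for the whole dataset is not confirmed.

```python
from pathlib import Path


def find_band_pairs(folder, red_suffix="_MS_R.TIF", nir_suffix="_MS_NIR.TIF"):
    """Return (red_path, nir_path) pairs that share the same file-name prefix.

    Hypothetical helper: the suffixes are assumptions based on the example
    file names used in this notebook.
    """
    folder = Path(folder)
    pairs = []
    for red_path in sorted(folder.glob(f"*{red_suffix}")):
        prefix = red_path.name[: -len(red_suffix)]
        nir_path = folder / f"{prefix}{nir_suffix}"
        # Skip red files that have no matching NIR file.
        if nir_path.exists():
            pairs.append((red_path, nir_path))
    return pairs


# Possible usage with the calculate_ndvi function defined above
# (the folder name and output naming are illustrative):
# for red_path, nir_path in find_band_pairs("MultispectralDroneData"):
#     out_path = red_path.with_name(red_path.name.replace("_MS_R.TIF", "_NDVI.tif"))
#     calculate_ndvi(red_path, nir_path, out_path)
```

Keeping the pairing separate from the NDVI computation makes it easy to check which files would be processed before any raster is written.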


