# Environment Setup

Follow these steps to set up and activate the development environment for this project:

1. **Create the Environment**

   Open a terminal in the project root directory and run one of the following commands:

   - For the standard environment:
     ```bash
     conda env create -f environment-dev.yml
     ```
   - For GPU support (e.g., with Sockeye):
     ```bash
     conda env create -f environment-dev-gpu.yml
     ```

2. **Activate the Environment**

   - For the standard environment:
     ```bash
     conda activate mds-afforest-dev
     ```
   - For GPU support:
     ```bash
     conda activate mds-afforest-dev-gpu
     ```

Creating the environment installs all dependencies required for development; activating it makes them available in your current shell.  
**Note:** Ensure you have [conda](https://docs.conda.io/en/latest/miniconda.html) installed before proceeding.
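
As a quick pre-flight check, the sketch below reports whether `conda` is on your `PATH`. The function name `check_conda` is hypothetical and not part of this project.

```bash
# Hypothetical helper (not part of this repository):
# report whether the conda executable can be found.
check_conda() {
    if command -v conda >/dev/null 2>&1; then
        echo "conda found"
    else
        echo "conda not found; install Miniconda first"
        return 1
    fi
}

check_conda || true   # do not abort the shell if conda is missing
```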


# Running the Scripts

To run the scripts, make sure you are in the project root directory with the environment activated, then invoke them with `make`. The available targets are defined in the `Makefile` in the project root and are organized into sections such as data processing, model training, and evaluation.

## Pre-requisites
Before running the scripts, ensure that:
- The required environment is installed as described above.
- All necessary data files are available in the expected directories. The raw data is not included in the repository by default, but you can download it from: [Google Drive Link](https://drive.google.com/file/d/1GengsSVG29m0wH9EET1oaVhadv48dgGj/view?usp=drive_link)
- The data files are placed in the `data/raw` directory of the project.
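
The last prerequisite can be checked from the shell before running any `make` target. This is a minimal sketch: the helper name `check_raw_data` is hypothetical, and the default file name is the one used in the load command of the next section, assuming the download provides that file.

```bash
# Hypothetical helper: confirm the raw data file exists before running make.
# The default path is an assumption based on the load command in this README.
check_raw_data() {
    local raw_file="${1:-data/raw/AfforestationAssessmentDataUBCCapstone.rds}"
    if [ -f "$raw_file" ]; then
        echo "found: $raw_file"
    else
        echo "missing: $raw_file"
        return 1
    fi
}

check_raw_data || true   # report only; do not abort the shell
```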

## Load the Data
To convert the raw data to Parquet format, run the following command in the terminal:
```bash
make data/raw/raw_data.parquet RAW_DATA_PATH=data/raw/AfforestationAssessmentDataUBCCapstone.rds
```

## Preprocess the Data
To preprocess the data, you can use the following command:
```bash
make preprocess_features
```

## Pivot the Data
To pivot the data, you can use the following command:
```bash
make pivot_data
```

## Split the Data  
To split the processed data into training and test sets:  
```bash
make data_split
```  
This will execute the `data_split.py` script to generate the train and test datasets in the specified directory.
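
The four data steps above (load, preprocess, pivot, split) can also be chained so that a failure stops the sequence. The wrapper below is a sketch only: `run_data_steps` is a hypothetical name, and it assumes the targets are meant to run in the order this README lists them.

```bash
# Hypothetical wrapper: run the data targets in README order.
# The && chain stops at the first target that fails.
run_data_steps() {
    local raw_path="${1:-data/raw/AfforestationAssessmentDataUBCCapstone.rds}"
    make data/raw/raw_data.parquet RAW_DATA_PATH="$raw_path" &&
        make preprocess_features &&
        make pivot_data &&
        make data_split
}
```

Call `run_data_steps` from the project root, optionally passing a different raw data path as the first argument.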

## Train the Models

### Classical Machine Learning Models
To train the models using the provided pipelines, run the following commands:

- **Logistic Regression:**
    ```bash
    make logistic_regression_pipeline
    ```
    This will train a logistic regression model and save it to `models/logistic_regression.joblib`.

- **Random Forest:**
    ```bash
    make random_forest_pipeline
    ```
    This will train a random forest model and save it to the `models/` directory.

- **Gradient Boosting:**
    ```bash
    make gradient_boosting_pipeline
    ```
    This will train a gradient boosting model and save it to the `models/` directory.
- **All models**:
    ```bash
    make all_classical_models
    ```
    This will train all the models defined in the `Makefile` and save them to the `models/` directory.
- **Fine-tune the models**:
    ```bash
    make tune_classical_models
    ```
    This will fine-tune the models using the `fine_tune.py` script and save the best models to the `models/` directory. This might take some time depending on the dataset size and the number of hyperparameter combinations.

### Deep Learning Models (RNNs)


- **Prepare data for RNN models:**  
    To generate the time series train and test datasets required for RNN models, run:  
    ```bash
    make data_for_RNN_models
    ```
    This will create `data/processed/train_lookup.parquet` and `data/processed/test_lookup.parquet` for use in RNN training.

    To split the processed data specifically for RNN models, run:  
    ```bash
    make data_split_RNN
    ```
    This will generate the appropriate train and test splits for RNN-based workflows.

- **Train RNN models:**
    To train the RNN models, you can use the following command:
    ```bash
    make rnn_pipeline RNN_TYPE=GRU RNN_PIPELINE_PATH=models/gru_no_sites.pth
    ```
    This will train the RNN model (a GRU in this example) and save it to the path given by `RNN_PIPELINE_PATH`.
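
To avoid retyping both variables, the training command can be wrapped in a small function. This is a sketch under stated assumptions: `train_rnn` is a hypothetical name, the `<type>_no_sites.pth` naming pattern is extrapolated from the single GRU example above, and which `RNN_TYPE` values other than `GRU` the `Makefile` accepts is not stated here.

```bash
# Hypothetical helper: run the rnn_pipeline target for a given RNN type.
# The output file name pattern is an assumption based on the GRU example.
train_rnn() {
    local rnn_type="${1:-GRU}"
    local lower
    # Lowercase the type for use in the file name (GRU -> gru).
    lower=$(printf '%s' "$rnn_type" | tr '[:upper:]' '[:lower:]')
    make rnn_pipeline RNN_TYPE="$rnn_type" RNN_PIPELINE_PATH="models/${lower}_no_sites.pth"
}
```

With no argument, `train_rnn` runs the same command as shown above.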

## Run the Tests
To run the tests for the project, you can use the following command:

```bash
make test
```

## Clean Up

To clean up generated data and models, you can use the following commands:

- **Clean data files:**
    ```bash
    make clean_data
    ```
    This will remove the raw, interim, and processed data files and recreate the necessary directories with `.gitkeep` files.

- **Clean model files:**
    ```bash
    make clean_models
    ```
    This will remove all files in the `models` directory.
- **Clean all generated files:**
    ```bash
    make clean_all
    ```
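
Because the clean targets delete files, it can help to preview them first. The sketch below uses Make's `-n` (dry-run) flag, which prints a target's commands instead of executing them. The function name `preview_clean` is hypothetical.

```bash
# Hypothetical helper: print what a clean target would run, without running it.
preview_clean() {
    make -n "${1:-clean_all}"
}
```

For example, `preview_clean clean_data` shows the commands behind `clean_data`; with no argument it previews `clean_all`.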