# Task 6 – Decision Tree vs Rule-Based Model Comparison

This notebook compares two possible approaches for implementing the exclusion rules:
a rule-based model and a decision tree model.

The goal is to evaluate flexibility, maintainability, and suitability for future rule changes,
rather than to build a production-ready system.


## Background

The exclusion rules are used to filter out unsuitable species before scoring.
In this project, the rules may change over time and new rules may be added as more data becomes available.

Because of this, flexibility and ease of updates are important design considerations.


## Example Exclusion Rules (Small Sample)

For this comparison, a small set of example exclusion rules is used:

- Rainfall must be within a minimum and maximum range
- Soil type must be supported by the species
- A habitat flag (e.g. coastal) must be satisfied
- Species dependencies may exist (assumed for now; not implemented in the prototypes below)


In [None]:
# Rule-based prototype (conceptual example)

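def rule_based_exclusion(farm, species):
    """Check every rule and collect all failure reasons.

    Returns (is_suitable, reasons); reasons is empty when the species passes.
    """
    reasons = []

    # Rainfall must fall within the species' range.
    # `rainfall_max` is assumed to mirror `rainfall_min`.
    if farm["rainfall"] < species["rainfall_min"]:
        reasons.append("rainfall below minimum")
    elif farm["rainfall"] > species["rainfall_max"]:
        reasons.append("rainfall above maximum")

    # Soil type must be one the species supports.
    if farm["soil_type"] not in species["allowed_soil_types"]:
        reasons.append("soil type not supported")

    # Habitat flag: a coastal-only species needs a coastal farm.
    if species.get("requires_coastal") and not farm.get("is_coastal"):
        reasons.append("habitat requirement not met")

    # The species is suitable only if no rule failed.
    return not reasons, reasons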


### Rule-Based Approach – Observations

- Each rule is independent
- New rules can be added without changing existing ones
- Missing data can be handled by skipping the affected rule (the prototype above does not do this yet and raises a `KeyError` on a missing key)
- Failure reasons are easy to record
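
The independence of rules noted above can be made concrete by keeping each rule as its own small function in a list. The following is a minimal sketch under assumptions, not the project's implementation: the names `BASE_RULES`, `evaluate_rules`, the `check_*` functions, and the `rainfall_max` key are hypothetical, and the farm and species values are illustrative only.

```python
# Hypothetical sketch: each rule is a standalone function that returns a
# failure reason (str) when the rule fails, or None when it passes.

def check_rainfall_min(farm, species):
    if farm["rainfall"] < species["rainfall_min"]:
        return "rainfall below minimum"
    return None


def check_soil_type(farm, species):
    if farm["soil_type"] not in species["allowed_soil_types"]:
        return "soil type not supported"
    return None


def check_coastal(farm, species):
    if species.get("requires_coastal") and not farm.get("is_coastal"):
        return "habitat requirement not met"
    return None


BASE_RULES = [check_rainfall_min, check_soil_type, check_coastal]


def evaluate_rules(farm, species, rules):
    """Run every rule in `rules` and collect all failure reasons."""
    reasons = []
    for rule in rules:
        reason = rule(farm, species)
        if reason is not None:
            reasons.append(reason)
    return not reasons, reasons


# Adding a rule means writing one function and appending it to the list;
# the existing rules are not edited. `rainfall_max` is an assumed key.
def check_rainfall_max(farm, species):
    if farm["rainfall"] > species["rainfall_max"]:
        return "rainfall above maximum"
    return None


extended_rules = BASE_RULES + [check_rainfall_max]

# Illustrative data only.
farm = {"rainfall": 1500, "soil_type": "clay", "is_coastal": True}
species = {
    "rainfall_min": 500,
    "rainfall_max": 1200,
    "allowed_soil_types": ["loam", "clay"],
    "requires_coastal": True,
}

base_result = evaluate_rules(farm, species, BASE_RULES)
extended_result = evaluate_rules(farm, species, extended_rules)
print(base_result)      # (True, [])
print(extended_result)  # (False, ['rainfall above maximum'])
```

With this layout, removing a rule is the reverse operation: drop its function from the list and leave the others untouched.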


In [None]:
# Decision tree prototype (conceptual example)

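def decision_tree_exclusion(farm, species):
    """Check rules in a fixed order and stop at the first failure.

    Returns (is_suitable, message); only one reason is ever reported.
    """
    # Rainfall range (`rainfall_max` is assumed to mirror `rainfall_min`).
    if farm["rainfall"] < species["rainfall_min"]:
        return False, "rainfall below minimum"
    if farm["rainfall"] > species["rainfall_max"]:
        return False, "rainfall above maximum"

    # Soil type is only reached if rainfall passed.
    if farm["soil_type"] not in species["allowed_soil_types"]:
        return False, "soil type not supported"

    # Habitat flag is only reached if rainfall and soil both passed.
    if species.get("requires_coastal") and not farm.get("is_coastal"):
        return False, "habitat requirement not met"

    return True, "passed all checks"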


### Decision Tree Approach – Observations

- Rules are evaluated in a fixed order, and evaluation stops at the first failure
- The structure becomes harder to manage as the number of rules grows
- Updating logic often requires changing the tree structure
- Less flexible when rules are frequently added or removed


## Comparison of Rule-Based and Decision Tree Approaches

Both approaches can be used to implement exclusion logic, but they differ in structure and behaviour.

### Maintainability
In a rule-based approach, each rule is implemented as a separate check.
This makes the code easier to read and maintain, especially when the number of rules grows.
In contrast, a decision tree requires changes to the tree structure when new rules are added,
which can make maintenance more difficult over time.

### Ease of Updates
Rule-based logic allows new rules to be added or removed with minimal impact on existing rules.
Each rule can be updated independently.
With a decision tree, updates often require restructuring the tree or reordering conditions,
which increases the risk of introducing errors.

### Transparency and Debugging
Rule-based models are generally easier to understand and debug.
It is clear which rule caused a species to be excluded, and multiple failure reasons can be recorded.
Decision trees follow a fixed path, so only the first failing condition is usually captured,
which makes detailed explanations harder.
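
The difference in reported reasons can be seen on a farm that fails two checks at once. The sketch below uses trimmed-down versions of the two styles; the function names `collect_all` and `first_failure` and the sample values are hypothetical and chosen only for illustration.

```python
def collect_all(farm, species):
    """Rule-based style: evaluate every check and keep every failure."""
    reasons = []
    if farm["rainfall"] < species["rainfall_min"]:
        reasons.append("rainfall below minimum")
    if farm["soil_type"] not in species["allowed_soil_types"]:
        reasons.append("soil type not supported")
    return not reasons, reasons


def first_failure(farm, species):
    """Decision-tree style: return as soon as one check fails."""
    if farm["rainfall"] < species["rainfall_min"]:
        return False, "rainfall below minimum"
    if farm["soil_type"] not in species["allowed_soil_types"]:
        return False, "soil type not supported"
    return True, "passed all checks"


# Illustrative farm that fails both the rainfall and the soil check.
farm = {"rainfall": 300, "soil_type": "clay"}
species = {"rainfall_min": 500, "allowed_soil_types": ["loam", "sand"]}

all_result = collect_all(farm, species)
first_result = first_failure(farm, species)
print(all_result)    # (False, ['rainfall below minimum', 'soil type not supported'])
print(first_result)  # (False, 'rainfall below minimum')
```

Both styles agree that the species is excluded, but only the first reports the soil problem, which would otherwise surface only after the rainfall issue is resolved.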

### Handling Missing Data
Rule-based logic can easily skip rules when required data is missing,
without excluding the species.
In a decision tree, missing data often needs special handling at each node,
which adds complexity to the implementation.
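
One possible way to skip rules on missing data is to have each rule declare the fields it needs, so the evaluator can check for them before running the rule. This is a sketch under assumptions: the rule dictionary layout (`name`, `farm_keys`, `species_keys`, `check`, `reason`) and the function `evaluate_with_missing_data` are hypothetical, and a value of `None` is treated the same as an absent key.

```python
# Hypothetical rule table: each rule lists the fields it depends on.
RULES = [
    {
        "name": "rainfall_min",
        "farm_keys": ["rainfall"],
        "species_keys": ["rainfall_min"],
        "check": lambda farm, species: farm["rainfall"] >= species["rainfall_min"],
        "reason": "rainfall below minimum",
    },
    {
        "name": "soil_type",
        "farm_keys": ["soil_type"],
        "species_keys": ["allowed_soil_types"],
        "check": lambda farm, species: farm["soil_type"] in species["allowed_soil_types"],
        "reason": "soil type not supported",
    },
]


def evaluate_with_missing_data(farm, species, rules=RULES):
    """Return (is_suitable, reasons, skipped_rule_names).

    A rule whose required fields are missing is skipped and does not
    exclude the species; its name is recorded so the gap stays visible.
    """
    reasons, skipped = [], []
    for rule in rules:
        missing = [k for k in rule["farm_keys"] if farm.get(k) is None]
        missing += [k for k in rule["species_keys"] if species.get(k) is None]
        if missing:
            skipped.append(rule["name"])
            continue
        if not rule["check"](farm, species):
            reasons.append(rule["reason"])
    return not reasons, reasons, skipped


# Illustrative farm with no soil type recorded.
farm = {"rainfall": 800}
species = {"rainfall_min": 500, "allowed_soil_types": ["loam"]}

result = evaluate_with_missing_data(farm, species)
print(result)  # (True, [], ['soil_type'])
```

Returning the skipped rule names alongside the reasons keeps a "passed" result distinguishable from a "passed because nothing could be checked" result.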

### Future Rule Growth and Dependencies
As more rules and species dependencies are introduced,
a rule-based approach scales more naturally, because each new rule is one more independent check.
Decision trees can become large and rigid as more conditions are added,
making them harder to modify when requirements change.

### Overall Fit for This Project
Because exclusion rules are expected to change over time and may depend on incomplete data,
the rule-based approach provides better flexibility and long-term maintainability for this project.


## Recommendation

Based on this comparison, the rule-based approach is more suitable for the exclusion rules module.

The main reasons are:
- Rules are expected to change and grow over time
- Each rule can be added or updated independently
- Rules can be skipped when the required data is missing
- Failure reasons are easy to track and explain

For these reasons, a rule-based model provides better flexibility and maintainability for this project.
