# Test Methods for Decision Trees and Regression

## **Metrics**

### 1.1 Metrics for Classification (Decision Trees)

1. **Accuracy**  
   The percentage of correctly classified samples.

    $\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Samples}}$


2. **Precision**  
   The number of true positive results divided by all results predicted as positive. It measures the proportion of correctly predicted positive outcomes out of all predicted positive outcomes.

    $\text{Precision} = \frac{TP}{TP + FP}$


3. **Recall (Sensitivity, Coverage)**  
   The number of true positive results divided by all actual positive cases. It measures the proportion of true positive cases correctly identified by the model.

    $\text{Recall} = \frac{TP}{TP + FN}$


4. **F1-Score**  
   The harmonic mean of precision and recall. **F1** is used for balanced evaluation of models in situations where maintaining both high precision and recall is important.

    $F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$


5. **ROC-AUC (Area Under Curve)**  
   The area under the Receiver Operating Characteristic curve, measuring the model's ability to distinguish between classes.


6. **Log Loss**  
   A metric that penalizes predicted probabilities that diverge from the true labels. The more confident and wrong a prediction is, the higher the penalty.
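
The formulas above can be sketched in plain Python. This is a minimal illustration, assuming binary labels encoded as 0/1 and a convention (chosen here, not taken from the project code) that a metric with a zero denominator is reported as 0.0. The function names are hypothetical.

```python
import math

def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Assumed convention: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss; y_prob is the predicted probability of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

metrics = binary_classification_metrics(
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],
)
print(metrics)
```

In this example there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 3/4.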

---

### 1.2 Metrics for Regression

1. **Mean Absolute Error (MAE)**  
   The average of absolute differences between predicted and actual values. It indicates how much (on average) predictions deviate from actual values, in the same units as the output (e.g., USD, km/h). For example, MAE = 1000 USD means the model is off by 1000 USD on average.

    $MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$


2. **Mean Squared Error (MSE)**  
   The average of squared differences between predicted and actual values. MSE penalizes large errors more heavily, as each difference is squared. It helps capture situations where the model makes significant errors.

    $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$


3. **Root Mean Squared Error (RMSE)**  
   The square root of MSE. The result is in the same units as the output data (e.g., USD, kg), which makes it easier to interpret and indicates the average magnitude of prediction errors.

    $RMSE = \sqrt{MSE}$


4. **R-squared (R²)**  
   Indicates what portion of the variance in the dependent variable is explained by the variables included in the model. It is a measure of how well the model fits the data. The closer R² is to 1, the better the fit.

    $R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$


5. **Mean Absolute Percentage Error (MAPE)**  
   The average absolute percentage difference between predictions and actual values. It reflects the relative size of the error and is undefined when any actual value $y_i$ is 0.

    $MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|$
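
As a rough sketch, the five regression metrics can be computed directly from their definitions. The function name `regression_metrics` is hypothetical, and the code assumes that the actual values are not all identical (otherwise R² is undefined) and contain no zeros (otherwise MAPE is undefined).

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R^2 and MAPE from their definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                         # assumes ss_tot > 0

    # Assumes no actual value is zero.
    mape = 100 / n * sum(abs(e / t) for e, t in zip(errors, y_true))
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "mape": mape}

print(regression_metrics([100, 200, 300, 400], [110, 190, 310, 380]))
```

For these sample values the absolute errors are 10, 10, 10 and 20, so MAE is 12.5 and MSE is 175.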

---

## **How to compare regression models and classification models among themselves?**

### Formula for Ideal Distance
The distance of a model's metrics from their ideal values is calculated the same way for classification and regression models:

$$
\text{Distance} = \sqrt{\sum_{i=1}^n (\text{Metric}_i - \text{IdealMetric}_i)^2}
$$

Where:
- $ \text{Metric}_i $ is the value of the $ i $-th metric.
- $ \text{IdealMetric}_i $ is the ideal value of the $ i $-th metric (e.g., 1.0 for accuracy, 0.0 for MAE).

Ideal values are:
- Regression: MAE, RMSE, MAPE = 0.0; R² = 1.0
- Classification: accuracy, precision, recall, f1_score = 1.0

**Comparator Logic**

The comparator ($m1 > m2$) returns `True` if $m1$ has a smaller distance to the ideal metrics than $m2$.
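
A minimal sketch of the distance and the comparator, assuming metrics are stored in a dictionary keyed by metric name. The class `ModelResult` and the constant `IDEAL` are hypothetical names used for illustration and may not match the project's code.

```python
import math

# Ideal values as listed above.
IDEAL = {
    "regression": {"mae": 0.0, "rmse": 0.0, "mape": 0.0, "r2": 1.0},
    "classification": {"accuracy": 1.0, "precision": 1.0,
                       "recall": 1.0, "f1_score": 1.0},
}

def distance_to_ideal(metrics, ideal):
    """Euclidean distance between the metric values and their ideal values."""
    return math.sqrt(sum((metrics[name] - target) ** 2
                         for name, target in ideal.items()))

class ModelResult:
    """Holds a model's metrics; m1 > m2 means m1 is closer to the ideal."""

    def __init__(self, metrics, ideal):
        self.metrics = metrics
        self.ideal = ideal

    def distance(self):
        return distance_to_ideal(self.metrics, self.ideal)

    def __gt__(self, other):
        return self.distance() < other.distance()

m1 = ModelResult({"accuracy": 0.9, "precision": 0.8,
                  "recall": 0.85, "f1_score": 0.82}, IDEAL["classification"])
m2 = ModelResult({"accuracy": 0.7, "precision": 0.6,
                  "recall": 0.65, "f1_score": 0.62}, IDEAL["classification"])
print(m1 > m2)  # True: m1 is closer to the ideal point
```

Note that MAE and RMSE are unbounded and expressed in the units of the output, so without rescaling they can dominate the regression distance over R².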

**Scoring Function**

Additionally, the `get_score` function calculates a weighted score for each model by assigning a separate weight to each metric. This provides an alternative way to rank models according to customized priorities.
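
The exact signature of `get_score` is not given here, so the following is only one possible reading: a weighted average of metric values, with the weights passed as a dictionary. Both the signature and the normalization by the total weight are assumptions.

```python
def get_score(metrics, weights):
    """Weighted average of the metrics named in `weights` (assumed design)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * metrics[name] for name in weights)
    return weighted_sum / total_weight

metrics = {"accuracy": 0.9, "precision": 0.8, "recall": 0.6, "f1_score": 0.7}
# Hypothetical priorities: recall counts twice as much as precision.
score = get_score(metrics, {"precision": 1.0, "recall": 2.0})
print(score)
```

A plain weighted average like this only makes sense when all weighted metrics point in the same direction (higher is better); error metrics such as MAE would first have to be inverted or rescaled.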

## **What will the testing process look like?**

#### 1. Data Splitting
   Split the data into training (70-80%) and test (20-30%) sets. For more reliable estimates, k-fold cross-validation can be used in addition.
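
A minimal sketch of both splitting strategies using only the standard library. The helper names `split_train_test` and `k_fold_indices` are hypothetical; a project using scikit-learn would more likely rely on that library's own utilities.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

train, test = split_train_test(list(range(10)), test_fraction=0.2)
print(len(train), len(test))  # 8 2

for train_idx, val_idx in k_fold_indices(10, k=5):
    print(sorted(val_idx))
```

Each sample appears in exactly one validation fold, so every observation is used for evaluation once.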

#### 2. Model Training  
   Train the decision tree on the training data. For regression, train the appropriate model (e.g., linear regression, regression tree, etc.).

#### 3. Prediction  
   Use the test data to generate predictions (regression) or classes (classification).

#### 4. **Model Evaluation** 

##### 4.1 Metrics for Classification - `evaluation\evaluate_classification.py`
- Compute **Accuracy**, **Precision**, **Recall**, and **F1-Score** for the classification model's results.  
- For the regression model's results, map predicted values to intervals using the `values_to_class_labels` function. Then compute the same classification metrics (based on assigned classes).  
- Compare classification metrics for both models to assess their ability to classify data.
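
The mapping from predicted values to intervals could look roughly like the sketch below. Only the name `values_to_class_labels` comes from this document; the signature, the `bin_edges` parameter, the half-open intervals and the clamping of out-of-range values are all assumptions.

```python
from bisect import bisect_right

def values_to_class_labels(values, bin_edges):
    """Map each value to the index of the interval [edge_i, edge_{i+1})."""
    last = len(bin_edges) - 2  # index of the last interval
    labels = []
    for value in values:
        idx = bisect_right(bin_edges, value) - 1
        # Assumed behavior: clamp out-of-range values to the outer intervals.
        labels.append(min(max(idx, 0), last))
    return labels

edges = [0, 100, 200, 300]  # three intervals: [0,100), [100,200), [200,300)
print(values_to_class_labels([50, 100, 250, 350, -5], edges))
# [0, 1, 2, 2, 0]
```

Once the regression outputs are converted to interval indices, the same accuracy, precision, recall and F1 computations can be applied to both models.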

##### Visualization of Classification Metrics
- **X-axis:** Metric names (Accuracy, Precision, Recall, F1-Score).  
- **Y-axis:** Metric values (on a scale from 0 to 1).
- **Goal:** Compare Accuracy, Precision, Recall, and F1-Score for the classification and regression models.
- Two sets of bars representing results for the classification and regression models. Bars should be clearly labeled and include a legend.

##### 4.2 Metrics for Regression - `evaluation\evaluate_regression.py`
- Compute **MSE**, **RMSE**, **MAE**, **MAPE**, and **R²** for the regression model's results.  
- For the classification model's results, map intervals to single values (interval midpoints) using the `labels_to_midpoints` function. Then compute the same regression metrics.  
- Compare regression metrics for both models to determine which model generates smaller errors.  
- Pay special attention to **MAE** and **MAPE**, as they are the most intuitive and practical for interpretation (e.g., average error in USD or percentage).
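
The reverse mapping, from interval labels to midpoints, can be sketched as follows. As above, only the name `labels_to_midpoints` is taken from this document; the signature and the representation of intervals by a sorted list of edges are assumptions.

```python
def labels_to_midpoints(labels, bin_edges):
    """Replace each interval index with the midpoint of that interval."""
    return [(bin_edges[i] + bin_edges[i + 1]) / 2 for i in labels]

edges = [0, 100, 200, 300]
midpoints = labels_to_midpoints([0, 1, 2], edges)
print(midpoints)  # [50.0, 150.0, 250.0]

# The midpoints can then be scored like any other point prediction,
# for example with the mean absolute error:
actual = [40, 160, 290]
mae = sum(abs(a - m) for a, m in zip(actual, midpoints)) / len(actual)
print(mae)  # 20.0
```

Because a midpoint can be off by up to half the interval width even when the class is correct, the classification model's regression errors have a built-in lower bound that depends on the interval sizes.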

##### Additional Analysis:
- Assess whether the magnitude of the error (e.g., **RMSE = 5000 USD**) is acceptable in the given context.  
- Analyze whether the differences between the models' results are **practically significant** or only statistically significant.  
- Examine the distribution of errors (residuals) as a function of actual values:  
  - Are errors random, or do they follow patterns (e.g., larger deviations for higher values)?  
  - If patterns exist, consider model adjustments or feature modifications.


**Actual vs. Predicted Plot**

   - **X-axis:** Values predicted by the model.  
   - **Y-axis:** Actual values.  
   - **Goal:** Points should align along the diagonal line ($ y = x $) for an ideal model.

**Residual Plot**
   - **X-axis:** Predicted values.  
   - **Y-axis:** Residuals $(y_{\text{actual}} - y_{\text{predicted}})$.  
   - **Goal:** Residuals should be randomly distributed around zero, without visible patterns.
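
Before plotting, the residuals can be computed and checked numerically. The sketch below uses the Pearson correlation between predicted values and absolute residuals as one rough indicator of a pattern; this particular check is an illustrative choice, not something prescribed by the procedure above, and the helper names are hypothetical.

```python
import math

def residuals(y_true, y_pred):
    """Residuals defined as actual minus predicted."""
    return [t - p for t, p in zip(y_true, y_pred)]

def pearson(xs, ys):
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

y_pred = [100, 200, 300, 400]
y_true = [101, 198, 305, 390]
res = residuals(y_true, y_pred)
print(res)  # [1, -2, 5, -10]

# A value near +1 suggests that errors grow with the predicted value.
print(pearson(y_pred, [abs(r) for r in res]))
```

A correlation close to zero does not rule out non-linear patterns, so the residual plot remains the primary diagnostic.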

#### 5. **Evaluate Consistency between Models** - `evaluation\evaluate_consistency.py`

##### 5.1 Define Consistency
- The **linear regression model** outputs a point value $ p_{\text{regression}} $.  
- The **decision tree model** outputs a price interval $ [l_{\text{tree}}, u_{\text{tree}}] $, where:
  - $ l_{\text{tree}} $: lower bound of the interval.  
  - $ u_{\text{tree}} $: upper bound of the interval.  

Consistency occurs if:
$$
  l_{\text{tree}} \leq p_{\text{regression}} \leq u_{\text{tree}}
$$

##### 5.2 Percentage of Consistency
- Calculate the percentage of cases in the dataset where $ p_{\text{regression}} $ falls within the interval $ [l_{\text{tree}}, u_{\text{tree}}] $.  

Formula:
$$
  \text{Consistency Percentage} = \frac{\text{Number of Consistent Cases}}{\text{Total Cases}} \times 100\%
$$

##### 5.3 Distance from the Interval
If $ p_{\text{regression}} $ does not fall within the interval:
- If $ p_{\text{regression}} < l_{\text{tree}} $, distance $ = l_{\text{tree}} - p_{\text{regression}} $.  
- If $ p_{\text{regression}} > u_{\text{tree}} $, distance $ = p_{\text{regression}} - u_{\text{tree}} $.  

Average distance from the interval for all inconsistent cases:
$$
  \text{Average Distance} = \frac{\sum \text{Distances for Inconsistent Cases}}{\text{Number of Inconsistent Cases}}
$$
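
The consistency percentage and the average distance can be computed in one pass. This is a sketch under the assumption that regression predictions and tree intervals are available as parallel lists; the function name `consistency_report` is hypothetical, and returning 0.0 as the average distance when there are no inconsistent cases is a convention chosen here.

```python
def consistency_report(points, intervals):
    """Return (consistency percentage, average distance of inconsistent cases)."""
    consistent = 0
    distances = []
    for p, (low, high) in zip(points, intervals):
        if low <= p <= high:
            consistent += 1
        elif p < low:
            distances.append(low - p)   # prediction below the interval
        else:
            distances.append(p - high)  # prediction above the interval
    percentage = 100 * consistent / len(points)
    # Assumed convention: 0.0 when every case is consistent.
    average_distance = sum(distances) / len(distances) if distances else 0.0
    return percentage, average_distance

points = [150, 90, 320, 200]
intervals = [(100, 200), (100, 200), (200, 300), (200, 300)]
print(consistency_report(points, intervals))  # (50.0, 15.0)
```

Here two of the four predictions fall inside their intervals, and the two misses are 10 and 20 units away, giving an average distance of 15.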

##### 5.4 Interpretation of Results
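**High Consistency**: The regression prediction usually falls inside the interval predicted by the tree, so the two models agree on most cases.

**Low Consistency**: The models frequently disagree, which may indicate that they fit the data differently. The average distance from the interval shows how large the disagreement is.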