### Summary of Decision Trees

A decision tree is a machine learning model that predicts an outcome by learning simple "if-then" rules from data. It visually resembles a flowchart, where each internal node represents a test on an attribute (e.g., "Is income > $50k?"), each branch represents the test's outcome, and each leaf node represents the final class label (e.g., "Approve Loan"). The tree structure is built by recursively splitting the data into purer subsets based on the features that provide the most information. This model is highly popular because its logic is transparent and easy for humans to understand and interpret.

This document explains the structure, construction, and common pitfalls of decision trees, primarily in the context of financial credit analysis.

#### Core Concepts

* **Application:** Decision trees are presented as a competitive tool for credit analysis, where a single incorrect decision can erase the profits from many successful ones. The goal is to compare the cost of granting versus denying a loan.
* **Structure:** A decision tree is composed of nodes (representing attributes), branches (representing attribute values), and leaf nodes (representing the final class). Each path from the root to a leaf forms an IF-THEN classification rule.
* **Construction:** Trees are typically built using a recursive partitioning approach. This process involves two main phases:
    1.  **Tree Construction:** Recursively splitting the data.
    2.  **Tree Pruning (Simplification):** Making the tree more compact to improve its predictive power.

#### Building the Tree: Entropy and Information Gain

The key to the construction phase is selecting the best attribute for each split.
* **Entropy:** This measures the impurity of a data set. A "pure" set (all one class) has an entropy of 0, while a perfectly balanced two-class set (a 50/50 split) has the maximum entropy of 1. The formula is $\text{Entropy}(t) = -\sum_{i=1}^{c} p(i \mid t)\,\log_{2} p(i \mid t)$, where $p(i \mid t)$ is the fraction of records at node $t$ that belong to class $i$ and $c$ is the number of classes. For example, a set of 10 instances (6 positive, 4 negative) has an entropy of 0.971.
* **Information Gain:** This metric determines the best attribute for a split. It is defined as the expected reduction in entropy achieved by partitioning the data on a specific attribute. The attribute providing the highest information gain (the biggest difference in impurity before and after the split) is chosen.
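
As a worked check of the figure quoted above, the entropy of the 10-instance set follows directly from the formula. The information gain expression underneath is the standard size-weighted form, stated here as an assumption because the summary does not reproduce it; $N$ (records at the parent node), $N(v_j)$ (records sent to child $v_j$), and $k$ (number of children) are notation introduced only for this illustration.

$$
\text{Entropy}(t) = -\tfrac{6}{10}\log_{2}\tfrac{6}{10} - \tfrac{4}{10}\log_{2}\tfrac{4}{10} \approx 0.442 + 0.529 = 0.971
$$

$$
\text{Gain} = \text{Entropy}(t) - \sum_{j=1}^{k} \frac{N(v_j)}{N}\,\text{Entropy}(v_j)
$$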

#### Common Problems: Underfitting and Overfitting

The goal is to find a tree with the lowest error rate on *both* the training and test datasets.
* **Underfitting:** This occurs when the tree is not complex enough (too few nodes), and the error rate remains high.
* **Overfitting:** This occurs when the tree becomes too complex. It causes the training error to decrease while the *test error increases*. This means the model has become too specific to the training data and cannot generalize to new, unseen data.

#### Causes of Overfitting

1.  **Noise:** Mislabeled records in the training data ("noise") can cause overfitting. A complex model (M1) might fit these false patterns and reach 100% training accuracy, while a simpler model (M2) that ignores the noise reaches only 80%. On unseen test data, however, M2 may perform significantly better (e.g., 90% test accuracy for M2 vs. 70% for M1).
2.  **Lack of Representative Samples:** Overfitting can also happen if the training data is not representative. For example, if the data contains no instances of "insolvent" clients with an income above 1600, the algorithm cannot learn the rules to identify them.

#### Algorithm Comparison

The document studies three financial datasets (Australian Credit, German Credit, BB-Paraná). It compares the mean accuracy of the J48 decision tree algorithm against Naive Bayes Simple and Back-Propagation. In this experiment, Back-Propagation achieved the highest final average accuracy (77.748%), followed by J48 (75.991%) and Naive Bayes (69.986%).

### Summary of Decision Trees in Credit Analysis

This document is a computer science monograph that explores the use of the decision tree technique for credit analysis.

#### Introduction and Motivation

The proliferation of large-scale databases, driven by low-cost storage and the internet, has created a need for automated tools to analyze data. This field is known as Knowledge Discovery in Databases (KDD), with Data Mining being one of its principal steps.

In high-stakes domains like credit-granting, a single incorrect classification decision can eliminate the gains from dozens of successful ones. This work focuses on the classification task, specifically using decision trees, because their results are highly intelligible and easy to interpret, which is a significant advantage in financial domains.

#### Classification and Model Evaluation

Classification is a data mining task aimed at finding a model, or function, that maps data records to predefined categorical labels (classes). This model can then be used for predictive purposes, such as determining if a new client will be a good or bad payer.

Evaluating a model's performance is critical. Common metrics include:
* **Accuracy:** The percentage of correctly classified records. This can be misleading in datasets with a heavy class imbalance. For example, if 99% of the clients in a dataset are good payers, a model that always predicts "good payer" reaches 99% accuracy but is useless at its real goal of finding bad payers.
* **Confusion Matrix:** A table that provides a detailed breakdown of performance, showing the counts of True Positives, False Positives, True Negatives, and False Negatives.
* **Cost Matrix:** This addresses the limitations of accuracy by assigning a specific penalty (cost) to each type of error. In credit analysis, with "good payer" as the positive class, classifying a bad payer as good (a False Positive) costs much more than classifying a good payer as bad (a False Negative).
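
The three metrics above can be compared on the always-predict-"good" example. The following is a minimal sketch, not the monograph's procedure: the function name `confusion_counts`, the class labels, and the cost values (20 and 1) are hypothetical and chosen only for illustration. "good" is treated as the positive class.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="good"):
    """Count TP, FP, TN, FN, treating `positive` as the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["TN" if a != positive else "FN"] += 1
    return counts

# 99 good payers and 1 bad payer; the model always predicts "good".
actual = ["good"] * 99 + ["bad"]
predicted = ["good"] * 100

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

# Hypothetical cost matrix: accepting a bad payer (FP) costs 20 units,
# rejecting a good payer (FN) costs 1 unit, correct decisions cost nothing.
COSTS = {"TP": 0, "TN": 0, "FP": 20, "FN": 1}
total_cost = sum(COSTS[kind] * n for kind, n in counts.items())

print(accuracy)    # 0.99
print(total_cost)  # 20
```

Accuracy looks excellent, yet the single accepted bad payer accounts for the entire cost, which is the asymmetry the cost matrix is meant to expose.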

Models are typically evaluated using methods like **Holdout** (a single train/test split), **K-Fold Cross-Validation** (the data is split into *k* folds and the model is trained *k* times, each time tested on a different held-out fold), or **Bootstrap** (sampling with replacement).
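
To make the k-fold procedure concrete, here is a small sketch of how the record indices could be partitioned. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the records have already been shuffled; it says nothing about how WEKA builds its folds.

```python
def k_fold_indices(n_records, k):
    """Yield (train, test) index lists; each record is tested exactly once."""
    indices = list(range(n_records))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("test:", test, "train size:", len(train))
```

A model would be fitted on each `train` list and scored on the matching `test` list, and the *k* scores averaged.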

#### Decision Tree Concepts

Decision trees are a popular classification technique that represents knowledge in a hierarchical structure. They employ a "divide-and-conquer" strategy.
* **Structure:** A tree consists of nodes (which represent attributes), branches (representing attribute values), and leaves (representing the final class).
* **Interpretability:** Each path from the tree's root to a leaf can be read as an explicit IF-THEN rule, making them easy to understand.
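
The correspondence between paths and rules can be sketched as follows. The nested-dictionary layout, the function name `extract_rules`, and the attribute names in `toy_tree` are all hypothetical; they are not taken from the monograph's datasets.

```python
def extract_rules(tree, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):  # a leaf holds the class label
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN {tree}"]
    rules = []
    for value, subtree in tree["branches"].items():
        test = f"{tree['attribute']} = {value}"
        rules.extend(extract_rules(subtree, conditions + (test,)))
    return rules

# Hypothetical toy tree; attribute names and values are illustrative only.
toy_tree = {
    "attribute": "income",
    "branches": {
        "high": "approve",
        "low": {
            "attribute": "has_guarantor",
            "branches": {"yes": "approve", "no": "deny"},
        },
    },
}

for rule in extract_rules(toy_tree):
    print(rule)
# IF income = high THEN approve
# IF income = low AND has_guarantor = yes THEN approve
# IF income = low AND has_guarantor = no THEN deny
```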

#### Tree Construction and Pruning

Trees are built using a recursive partitioning process. At each node, the algorithm must choose the "best" attribute to split the data. This is done using a measure of impurity, such as **Entropy**. The algorithm selects the attribute that provides the highest **Information Gain**, which is the greatest reduction in entropy.
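
The attribute-selection step can be illustrated with a short sketch. It assumes nominal attributes and the standard size-weighted definition of information gain; the records, attribute names, and function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, labels, attribute):
    """Parent entropy minus the size-weighted entropy of each partition."""
    partitions = {}
    for record, label in zip(records, labels):
        partitions.setdefault(record[attribute], []).append(label)
    weighted = sum(len(part) / len(labels) * entropy(part)
                   for part in partitions.values())
    return entropy(labels) - weighted

# Hypothetical records: "income" separates the classes, "region" does not.
records = [
    {"income": "high", "region": "north"},
    {"income": "high", "region": "south"},
    {"income": "low", "region": "north"},
    {"income": "low", "region": "south"},
]
labels = ["good", "good", "bad", "bad"]

gains = {a: information_gain(records, labels, a) for a in ("income", "region")}
best = max(gains, key=gains.get)
print(best)  # income
```

Splitting on `income` yields two pure subsets (gain of 1 bit), while `region` leaves both subsets as mixed as the parent (gain of 0), so `income` is chosen for the node.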

This process can lead to **overfitting**, where the model becomes overly complex and learns noise or non-representative patterns from the training data. To prevent this, a simplification technique called **pruning** is used to remove branches.
* **Pre-Pruning:** Stops the tree-building process early based on a specific condition.
* **Post-Pruning:** Grows the tree to its full size and then removes branches in a bottom-up fashion. This method often yields better results.
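
Bottom-up post-pruning can be sketched as below. This sketch uses reduced-error pruning against a held-out set, chosen for brevity; it is not claimed to be the pruning procedure of C4.5 or J48. The tree layout, the `majority` field (the most common training class at a node), and all data are hypothetical.

```python
def classify(tree, record):
    """Follow matching branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        child = tree["children"].get(record[tree["attr"]])
        if child is None:  # unseen attribute value: fall back to the majority class
            return tree["majority"]
        tree = child
    return tree

def errors(tree, records):
    """Number of held-out records the (sub)tree misclassifies."""
    return sum(1 for rec, label in records if classify(tree, rec) != label)

def prune(tree, records):
    """Replace a subtree by its majority leaf when that is no worse on `records`."""
    if not isinstance(tree, dict):
        return tree
    # Prune the children first, each with the records that reach it.
    children = {}
    for value, child in tree["children"].items():
        subset = [(r, y) for r, y in records if r[tree["attr"]] == value]
        children[value] = prune(child, subset)
    candidate = {"attr": tree["attr"], "majority": tree["majority"], "children": children}
    leaf_errors = sum(1 for _, y in records if y != tree["majority"])
    # Ties favor the leaf, so subtrees reached by no held-out record collapse too.
    return tree["majority"] if leaf_errors <= errors(candidate, records) else candidate

tree = {
    "attr": "income", "majority": "solvent",
    "children": {
        "high": "solvent",
        "low": {
            "attr": "owns_home", "majority": "insolvent",
            "children": {"yes": "solvent", "no": "insolvent"},
        },
    },
}
validation = [
    ({"income": "high", "owns_home": "no"}, "solvent"),
    ({"income": "high", "owns_home": "yes"}, "solvent"),
    ({"income": "low", "owns_home": "yes"}, "insolvent"),
    ({"income": "low", "owns_home": "no"}, "insolvent"),
]

pruned = prune(tree, validation)
print(pruned["children"]["low"])  # insolvent
```

The `owns_home` split misclassifies one held-out record while a plain "insolvent" leaf misclassifies none, so that branch is collapsed; the root split is kept because removing it would add errors.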

#### Algorithms: ID3 and C4.5

* **ID3:** A foundational algorithm by Ross Quinlan that uses Information Gain as its splitting criterion. Its main limitation is that it only works with discrete (nominal) attributes.
* **C4.5:** The successor to ID3, which can handle continuous attributes by creating binary splits based on a "threshold" value.
    * **Gain Ratio:** C4.5 (Release 7) uses *gain ratio* instead of information gain. This corrects a bias in the original metric that favored attributes with many unique values (like an ID column).
    * **MDL Principle:** C4.5 (Release 8) incorporates the Minimum Description Length (MDL) principle, which adds a penalty to the gain calculation for continuous attributes, further improving attribute selection.
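
The effect of gain ratio on an ID-like attribute, and the search for a binary threshold on a continuous attribute, can be sketched as follows. This is an illustration under simplifying assumptions, not C4.5 itself: candidate thresholds are taken as midpoints between consecutive distinct values, the MDL penalty is not implemented, and all names and data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (base 2) of a list of labels or attribute values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())

def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) for one nominal attribute."""
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    n = len(labels)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions.values())
    split_info = entropy(values)  # entropy of the partition sizes
    return gain, (gain / split_info if split_info > 0 else 0.0)

def best_threshold(numbers, labels):
    """Try midpoints between distinct values; keep the highest-gain binary split."""
    distinct = sorted(set(numbers))
    best = (None, -1.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        sides = ["<=" if x <= threshold else ">" for x in numbers]
        gain, _ = gain_and_ratio(sides, labels)
        if gain > best[1]:
            best = (threshold, gain)
    return best

labels = ["good", "good", "bad", "bad"]
record_id = ["a", "b", "c", "d"]            # unique per record, like an ID column
income_band = ["high", "high", "low", "low"]

print(gain_and_ratio(record_id, labels))    # (1.0, 0.5)
print(gain_and_ratio(income_band, labels))  # (1.0, 1.0)
print(best_threshold([30, 25, 12, 8], labels))  # (18.5, 1.0)
```

Both attributes reach the same information gain, but the ID-like column is split four ways and its gain ratio is halved, so `income_band` would be preferred.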

#### Case Study and Results

The study used the **WEKA** software, a Java-based data mining tool, for its experiments. The C4.5 Release 8 algorithm, implemented in WEKA as **J48**, was the primary technique studied.

The J48 algorithm was run on three financial datasets (Australian Credit, German Credit, and BB-Paraná) and evaluated using 10-fold cross-validation. Its accuracy was compared against Back-Propagation (Neural Networks) and Naive Bayes Simple.

The final mean accuracy results were:
* **Back-Propagation:** 77.748%
* **J48 (Decision Tree):** 75.991%
* **Naive Bayes Simple:** 69.986%

#### Conclusion

Although the neural network (Back-Propagation) achieved a slightly higher average accuracy, the study concludes that decision trees (J48) offer a decisive advantage for credit analysis: **intelligibility**.

Decision trees provide transparent, human-readable rules (e.g., "IF account_time <= 25 months AND risk = C... THEN insolvent"). This ability to identify the most important factors and understand the model's logic is of great value to business analysts in the financial domain.