## Decision tree types, ordered from simpler to more complex, with brief guidance on when to use each

1. **Decision Stump:**
   - A one-level decision tree: a single decision node with two leaves. It is often used as a weak learner in ensemble methods such as AdaBoost.
   - Simplicity: Makes one split on a single feature.
   - Use Case: Quick baseline models, or when simplicity matters more than accuracy.

2. **ID3 (Iterative Dichotomiser 3):**
   - Primarily used for classification tasks. It builds trees top-down by selecting, at each node, the attribute with the highest information gain (the largest reduction in entropy).
   - Simplicity: Relatively straightforward; creates one branch per attribute value and, in its original form, performs no pruning.
   - Use Case: Suitable for categorical data and when interpretability is essential.

3. **CHAID (Chi-squared Automatic Interaction Detector):**
   - Primarily used for classification tasks. It uses chi-squared tests of independence to select the attribute most significantly associated with the target at each node.
   - Simplicity: Relies on statistical tests rather than impurity measures and can create multiway splits; best suited to categorical data.
   - Use Case: When working with categorical data and looking for statistically significant interactions between predictors and the target.

4. **CART (Classification and Regression Trees):**
   - Versatile decision trees that can be used for both classification (assigning labels to items) and regression (predicting numerical values).
   - Simplicity: Builds binary trees; uses Gini impurity for classification and mean squared error for regression.
   - Use Case: Balanced choice for both classification and regression tasks, especially when the data includes a mix of categorical and numerical features.

5. **C4.5:**
   - An extension of ID3 that handles both categorical and numerical data. It uses information gain ratio for attribute selection.
   - Simplicity: Builds on ID3 by adding threshold splits for numerical attributes, handling of missing values, and post-pruning.
   - Use Case: Suitable for datasets with a mix of categorical and numerical features.

6. **Random Forest:**
   - An ensemble of decision trees where multiple trees are built, and their predictions are combined to improve accuracy and reduce overfitting.
   - Simplicity: Ensemble method combining multiple decision trees.
   - Use Case: Robust choice for various tasks, especially when dealing with large datasets and a high number of features. Effective for reducing overfitting.

7. **Gradient Boosted Trees:**
   - A boosting algorithm that builds decision trees sequentially, with each tree correcting the errors of the previous ones. It is commonly used for regression and classification tasks.
   - Complexity: Each new tree is fitted to the negative gradient of the loss (the residuals, for squared error) of the current ensemble, so the boosting rounds must run one after another.
   - Use Case: When high predictive accuracy justifies extra computation and hyperparameter tuning, for both regression and classification.

8. **M5:**
   - A model-tree learner for numeric prediction (regression), developed by Quinlan, the author of ID3 and C4.5. Instead of a constant value, each leaf holds a linear regression model.
   - Complexity: Grows the tree C4.5-style but splits on standard deviation reduction, then fits linear models at the leaves.
   - Use Case: Numeric prediction tasks where the target is roughly linear within subregions of the data.

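9. **Conditional Inference Trees:**
   - Decision trees that select split variables with permutation-based statistical tests and stop growing when no test is significant. This avoids the bias of impurity-based criteria toward variables with many possible split points.
   - Complexity: Requires a significance test at every node, but needs no separate pruning step.
   - Use Case: When unbiased variable selection and statistically grounded stopping matter, for example with predictors that have very different numbers of categories.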

10. **Cost-sensitive Decision Trees:**
    - Decision trees designed to consider the costs associated with different types of errors during classification.
    - Complexity: Needs a cost matrix or class weights, which are incorporated into split selection, pruning, or leaf labeling.
    - Use Case: When different classification errors have unequal costs, for example when missing a fraud case is far worse than raising a false alarm.
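One common way to approximate a cost-sensitive tree in scikit-learn is the `class_weight` parameter of `DecisionTreeClassifier`, which weights each class's samples when computing impurity and leaf predictions. Note that this is per-class weighting, not a full misclassification cost matrix. The toy data and the 5:1 cost ratio below are hypothetical; the sketch only shows that a leaf holding three "ok" samples and one "fraud" sample flips its prediction once fraud is weighted more heavily.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: identical feature values, so the tree cannot split
# and all four samples end up in a single leaf.
X = [[0.0], [0.0], [0.0], [0.0]]
y = ["ok", "ok", "ok", "fraud"]

# Plain tree: the leaf predicts the majority class by count (3 "ok" vs 1 "fraud").
plain = DecisionTreeClassifier().fit(X, y)

# Assumed costs: missing a "fraud" case is 5x worse than a false alarm,
# encoded as class weights (3 * 1 for "ok" vs 1 * 5 for "fraud").
weighted = DecisionTreeClassifier(class_weight={"ok": 1, "fraud": 5}).fit(X, y)

print(plain.predict([[0.0]]))     # ['ok']
print(weighted.predict([[0.0]]))  # ['fraud']
```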

Remember that the choice of a specific decision tree type depends on the characteristics of your data, the nature of your problem (classification or regression), and your priorities (interpretability, accuracy, computational efficiency). It's often beneficial to experiment with multiple types to find the most suitable one for your specific task.
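A minimal sketch of such an experiment, comparing several of the tree types above with 5-fold cross-validation. scikit-learn has no ID3, C4.5, CHAID, or M5 implementation, so its estimators serve as stand-ins (the entropy tree only approximates ID3). The Iris dataset and the hyperparameters are illustrative assumptions, and accuracy rankings on Iris say little about other datasets.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Candidate models; hyperparameters are illustrative, not tuned.
candidates = {
    "Decision stump": DecisionTreeClassifier(max_depth=1, random_state=0),
    "Entropy tree (ID3-like)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "Gini tree (CART)": DecisionTreeClassifier(criterion="gini", random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 folds (stratified by default for classifiers)
    scores[name] = cross_val_score(model, X_iris, y_iris, cv=5).mean()
    print(f"{name:<25} mean accuracy = {scores[name]:.3f}")
```

A stump can predict at most two of the three Iris classes, so its accuracy is capped at about 2/3, which makes it a useful lower reference point.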

# 1. Decision Stump

In [2]:
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

In [3]:
# Create a DataFrame with the given dataset
data = {
    'Hours_of_Study': [2, 3, 4, 1, 5],
    'Exam_Result': ['Fail', 'Pass', 'Pass', 'Fail', 'Pass']
}

df = pd.DataFrame(data)

# Separate features (X) and target variable (y)
X = df[['Hours_of_Study']]
y = df['Exam_Result']
df.head()

|   | Hours_of_Study | Exam_Result |
|---|---|---|
| 0 | 2 | Fail |
| 1 | 3 | Pass |
| 2 | 4 | Pass |
| 3 | 1 | Fail |
| 4 | 5 | Pass |


In [12]:
# Create a Decision Stump model (max_depth=1)
decision_stump_model = DecisionTreeClassifier(max_depth=1)  # one decision node and two leaves

# Fit the model to the data
decision_stump_model.fit(X, y)



In [13]:
# Predictions for new data
new_data = pd.DataFrame({'Hours_of_Study': [2.5, 4.5]})
predictions = decision_stump_model.predict(new_data)


# Display the predictions for new data
print("Predictions for new data:")
for hours, prediction in zip(new_data['Hours_of_Study'], predictions):
    print(f"Hours of Study: {hours} => Predicted Result: {prediction}")

Predictions for new data:
Hours of Study: 2.5 => Predicted Result: Fail
Hours of Study: 4.5 => Predicted Result: Pass


In [14]:
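# Display the decision stump rule, reading the leaf classes from the fitted tree
tree = decision_stump_model.tree_
threshold = tree.threshold[0]               # split threshold at the root node
feature_name = X.columns[tree.feature[0]]   # feature used at the root node
classes = decision_stump_model.classes_
left_class = classes[tree.value[tree.children_left[0]].argmax()]    # leaf for <= threshold
right_class = classes[tree.value[tree.children_right[0]].argmax()]  # leaf for > threshold
print(f"If {feature_name} <= {threshold}, then predict {left_class}.")
print(f"If {feature_name} > {threshold}, then predict {right_class}.")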

If Hours_of_Study <= 2.5, then predict Fail.
If Hours_of_Study > 2.5, then predict Pass.
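To see where the threshold of 2.5 comes from, here is a hand-written sketch of fitting a stump: try the midpoint between each pair of adjacent distinct values and keep the split with the lowest weighted Gini impurity (Gini is the default criterion of `DecisionTreeClassifier`). The helper names `gini` and `fit_stump` are hypothetical, and scikit-learn's own implementation is considerably more optimized.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) minimizing weighted Gini impurity."""
    values = sorted(set(xs))
    best = None
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2  # midpoint between adjacent distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            # Each leaf predicts the majority class of the samples it receives
            best = (score, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    _, t, left_label, right_label = best
    return t, left_label, right_label


hours = [2, 3, 4, 1, 5]
results = ['Fail', 'Pass', 'Pass', 'Fail', 'Pass']
threshold, left_label, right_label = fit_stump(hours, results)
print(threshold, left_label, right_label)  # 2.5 Fail Pass
```

The split at 2.5 separates the two classes perfectly (weighted impurity 0), so no other midpoint can beat it.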


In [15]:
new_data2 = pd.DataFrame({'Hours_of_Study': [3]})

predictions = decision_stump_model.predict(new_data2)
for i in predictions:
    print(i)


Pass


In [16]:
threshold

2.5
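The stump's role as a building block in ensembles can be illustrated with AdaBoost. scikit-learn's `AdaBoostClassifier` uses `DecisionTreeClassifier(max_depth=1)` as its default weak learner and fits a sequence of stumps, each trained on sample weights that emphasize the previous stumps' mistakes. A minimal sketch on the Iris dataset (also used in the next section); `n_estimators=50` is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# No estimator is passed, so AdaBoost falls back to its default weak learner,
# a depth-1 decision tree (a decision stump).
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_iris, y_iris)

# Every fitted weak learner should be a stump (depth at most 1)
depths = {stump.get_depth() for stump in booster.estimators_}
print("Depths of fitted weak learners:", depths)
print("Number of stumps:", len(booster.estimators_))
print("Training accuracy:", booster.score(X_iris, y_iris))
```

A single stump cannot separate all three Iris classes, but the weighted vote of many stumps can.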

# 2. ID3 (Iterative Dichotomiser 3)
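In scikit-learn, an ID3-like tree can be approximated with `DecisionTreeClassifier(criterion='entropy')`. This is still scikit-learn's CART-style algorithm (binary splits on numeric thresholds), but it selects splits by information gain instead of Gini impurity.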
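As a worked check of the split criteria from the overview, the sketch below computes the quantities behind the root split `petal width (cm) <= 0.80` of the entropy tree trained in this section: information gain (ID3), gain ratio (C4.5), and decrease in Gini impurity (CART). All 50 setosa samples go left and the other 100 split 50/50 on the right, so the information gain is log2(3) - 2/3, about 0.918 bits. The helpers `entropy` and `gini` are written here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())


iris = load_iris()
y = iris.target
mask = iris.data[:, 3] <= 0.8  # column 3 is petal width (cm)
left, right = y[mask], y[~mask]
w_left, w_right = len(left) / len(y), len(right) / len(y)

info_gain = entropy(y) - (w_left * entropy(left) + w_right * entropy(right))  # ID3
split_info = entropy(mask)        # entropy of the left/right partition itself
gain_ratio = info_gain / split_info                                           # C4.5
gini_decrease = gini(y) - (w_left * gini(left) + w_right * gini(right))       # CART

print(f"information gain: {info_gain:.4f}")      # 0.9183
print(f"gain ratio:       {gain_ratio:.4f}")     # 1.0000
print(f"Gini decrease:    {gini_decrease:.4f}")  # 0.3333
```

The gain ratio is exactly 1 here because the split's own entropy (1/3 of samples left, 2/3 right) happens to equal the information gain.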

In [1]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text


In [2]:
# Load the Iris dataset
iris = load_iris()

X = iris.data
y = iris.target

iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [3]:
X[0]

array([5.1, 3.5, 1.4, 0.2])

In [4]:
y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [5]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [6]:
# Create and train the decision tree model (ID3-like)
model = DecisionTreeClassifier(criterion='entropy')
model.fit(X, y)

# Display the decision tree rules
tree_rules = export_text(model, feature_names=iris.feature_names)
print("Decision Tree Rules:")
print(tree_rules)


Decision Tree Rules:
|--- petal width (cm) <= 0.80
|   |--- class: 0
|--- petal width (cm) >  0.80
|   |--- petal width (cm) <= 1.75
|   |   |--- petal length (cm) <= 4.95
|   |   |   |--- petal width (cm) <= 1.65
|   |   |   |   |--- class: 1
|   |   |   |--- petal width (cm) >  1.65
|   |   |   |   |--- class: 2
|   |   |--- petal length (cm) >  4.95
|   |   |   |--- petal width (cm) <= 1.55
|   |   |   |   |--- class: 2
|   |   |   |--- petal width (cm) >  1.55
|   |   |   |   |--- sepal length (cm) <= 6.95
|   |   |   |   |   |--- class: 1
|   |   |   |   |--- sepal length (cm) >  6.95
|   |   |   |   |   |--- class: 2
|   |--- petal width (cm) >  1.75
|   |   |--- petal length (cm) <= 4.85
|   |   |   |--- sepal length (cm) <= 5.95
|   |   |   |   |--- class: 1
|   |   |   |--- sepal length (cm) >  5.95
|   |   |   |   |--- class: 2
|   |   |--- petal length (cm) >  4.85
|   |   |   |--- class: 2



In [7]:
# Make predictions
sample_data = [[5.1, 3.5, 1.4, 0.2], [6.2, 2.8, 4.8, 1.8]]
predictions = model.predict(sample_data)

# Display predictions
for i, prediction in enumerate(predictions):
    print(f"Sample {i + 1}: Predicted Class - {iris.target_names[prediction]}")

Sample 1: Predicted Class - setosa
Sample 2: Predicted Class - virginica


# 3. CHAID (Chi-squared Automatic Interaction Detector)

In [None]:
import pandas as pd
from scipy.stats import chi2_contingency

In [None]:
# Load the Titanic dataset
url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
titanic_data = pd.read_csv(url)

# Select relevant columns for the example
columns_of_interest = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Survived']

new_column_names = {'Siblings/Spouses Aboard': 'SibSp', 'Parents/Children Aboard': 'Parch'}

titanic_data.rename(columns=new_column_names, inplace=True)



In [None]:
titanic_data.head()

|   | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Mr. Owen Harris Braund | male | 22.0 | 1 | 0 | 7.25 |
| 1 | 1 | 1 | Mrs. John Bradley (Florence Briggs Thayer) Cum... | female | 38.0 | 1 | 0 | 71.2833 |
| 2 | 1 | 3 | Miss. Laina Heikkinen | female | 26.0 | 0 | 0 | 7.925 |
| 3 | 1 | 1 | Mrs. Jacques Heath (Lily May Peel) Futrelle | female | 35.0 | 1 | 0 | 53.1 |
| 4 | 0 | 3 | Mr. William Henry Allen | male | 35.0 | 0 | 0 | 8.05 |


In [None]:
titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 887 entries, 0 to 886
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Survived  887 non-null    int64  
 1   Pclass    887 non-null    int64  
 2   Name      887 non-null    object 
 3   Sex       887 non-null    object 
 4   Age       887 non-null    float64
 5   SibSp     887 non-null    int64  
 6   Parch     887 non-null    int64  
 7   Fare      887 non-null    float64
dtypes: float64(2), int64(4), object(2)
memory usage: 55.6+ KB


In [None]:
titanic_data.describe()

|   | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|
| count | 887.0 | 887.0 | 887.0 | 887.0 | 887.0 | 887.0 |
| mean | 0.385569 | 2.305524 | 29.471443 | 0.525366 | 0.383315 | 32.30542 |
| std | 0.487004 | 0.836662 | 14.121908 | 1.104669 | 0.807466 | 49.78204 |
| min | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 2.0 | 20.25 | 0.0 | 0.0 | 7.925 |
| 50% | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.1375 |
| max | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


In [None]:
titanic_data.isnull().sum()

Survived    0
Pclass      0
Name        0
Sex         0
Age         0
SibSp       0
Parch       0
Fare        0
dtype: int64

In [None]:
titanic_data = titanic_data[columns_of_interest].dropna()

In [None]:
# Chi-squared test of independence between one feature and the target
def chi_square_test(data, feature, target):
    contingency_table = pd.crosstab(data[feature], data[target])
    _, p_value, _, _ = chi2_contingency(contingency_table)
    return p_value

# Recursive function to build a simplified CHAID-style tree.
# Note: full CHAID also merges similar categories, applies a Bonferroni
# correction, and bins continuous predictors such as Age; this version
# splits on every unique value of the most significant feature.
def build_chaid_tree(data, target, features, alpha=0.05):
    if len(features) == 0:
        print("No more features to split.")
        return

    best_feature = None
    best_p_value = 1

    # Pick the feature most strongly associated with the target (smallest p-value)
    for feature in features:
        p_value = chi_square_test(data, feature, target)
        if p_value < best_p_value:
            best_p_value = p_value
            best_feature = feature

    if best_p_value < alpha:  # significance level; adjust as needed
        print(f"Splitting on {best_feature} (p-value: {best_p_value})")
        # Multiway split: one subgroup per value of the chosen feature
        for value in data[best_feature].unique():
            subset_data = data[data[best_feature] == value]
            print(f"  Subgroup '{best_feature}'={value} - {subset_data.shape[0]} samples")
            build_chaid_tree(subset_data, target, features.difference({best_feature}), alpha)
    else:
        print("No statistically significant split found.")
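The function above is a simplified CHAID. One step it omits is category merging: for a nominal predictor, full CHAID repeatedly merges the pair of categories whose target distributions differ least, until every remaining pair differs significantly. Below is a minimal sketch of that step on hypothetical toy data in which groups A and B have nearly identical survival rates and group C differs. The helper name `merge_categories` and the merge threshold `alpha_merge` are assumptions, and the Bonferroni adjustment used by full CHAID is left out.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency


def merge_categories(data, feature, target, alpha_merge=0.05):
    """Greedy CHAID-style merging of a nominal predictor's categories (simplified sketch)."""
    groups = [[value] for value in sorted(data[feature].unique())]
    while len(groups) > 1:
        best_pair, best_p = None, -1.0
        # Find the pair of groups whose target distributions are least different
        for i, j in combinations(range(len(groups)), 2):
            subset = data[data[feature].isin(groups[i] + groups[j])]
            in_first = subset[feature].isin(groups[i])
            table = pd.crosstab(in_first, subset[target])
            _, p_value, _, _ = chi2_contingency(table)
            if p_value > best_p:
                best_pair, best_p = (i, j), p_value
        if best_p <= alpha_merge:
            break  # every remaining pair differs significantly
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Hypothetical toy data: survival counts per group
rows = ([("A", 1)] * 30 + [("A", 0)] * 10 +
        [("B", 1)] * 29 + [("B", 0)] * 11 +
        [("C", 1)] * 5 + [("C", 0)] * 35)
toy = pd.DataFrame(rows, columns=["Group", "Survived"])
print(merge_categories(toy, "Group", "Survived"))  # [['A', 'B'], ['C']]
```

After merging, the tree would split on `Group` into two branches ({A, B} and {C}) rather than three, which keeps subgroups larger and the tests more reliable.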





In [None]:
# Specify target variable and features
target_variable = 'Survived'
all_features = set(columns_of_interest) - {target_variable}

# Build CHAID decision tree
build_chaid_tree(titanic_data, target_variable, all_features)


Splitting on Sex (p-value: 3.847574039733745e-58)
  Subgroup 'Sex'=male - 573 samples
Splitting on Pclass (p-value: 9.552405032058647e-08)
  Subgroup 'Pclass'=3 - 343 samples
Splitting on Age (p-value: 0.03819988633936868)
  Subgroup 'Age'=22.0 - 21 samples
No statistically significant split found.
  Subgroup 'Age'=35.0 - 7 samples
No statistically significant split found.
  Subgroup 'Age'=27.0 - 11 samples
No statistically significant split found.
  Subgroup 'Age'=2.0 - 3 samples
No statistically significant split found.
  Subgroup 'Age'=20.0 - 20 samples
Splitting on SibSp (p-value: 0.024498330005453244)
  Subgroup 'SibSp'=0 - 16 samples
No statistically significant split found.
  Subgroup 'SibSp'=1 - 3 samples
No statistically significant split found.
  Subgroup 'SibSp'=8 - 1 samples
No statistically significant split found.
  Subgroup 'Age'=39.0 - 5 samples
No statistically significant split found.
  Subgroup 'Age'=26.0 - 13 samples
No statistically significant split found.
  Subgr