# Assignment-4 COMP-5630 Jacob Murrah
## README
This notebook loads the housing dataset and computes and tabulates Naive Bayes conditional probabilities for both discrete and continuous features. It then implements a Naive Bayes classifier (Laplace-smoothed count tables for discrete features, Gaussian likelihoods for continuous ones), runs a Decision Tree classifier with varying max depths to study overfitting, and reports performance metrics on the training and test splits.

## Dependencies
- **Python 3.x**
- **pandas**
- **numpy**
- **matplotlib**
- **scikit-learn** (`sklearn`)

*Note: If you are running this notebook in Google Colab, all the required packages are pre-installed.*

## Instructions
1. **Run All Cells:** Please click on "Runtime" > "Run all" to execute the entire notebook sequentially.
2. **Review the Outputs:** The notebook is organized into several sections. Ensure that all cells run without errors.


In [69]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from math import sqrt, pi, exp, log

# Part 1. Naive Bayes
See the attached housing data (Asssignment4_Data.xlsx). The Excel file has separate tabs for the training and test splits. Your goal is to construct a Naïve Bayes classifier for this data.

In [70]:
train_data = pd.read_excel("Asssignment4_Data.xlsx", sheet_name="Train")
test_data = pd.read_excel("Asssignment4_Data.xlsx", sheet_name="Test")

## Part 1. (1) Compute and show the conditional probability distribution for each feature.<br>
Note: You are expected to do this part of the question by hand.<br>
**Answer:**<br>
Using Naive Bayes, we will predict the price of a house given its other features. The first step in building the conditional probabilities is separating the housing prices into classes. Since housing prices are continuous, I need a threshold to split them into groups. I chose to split the data into two equal-sized groups (Low and High), using the median as the separator.<br><br>
Median House Price = (5.6039 + 5.8282) / 2 = **5.7161**<br>
Low Price House Ids: 1, 2, 3, 4, 5, 6, 8, 12, 15, 16<br>
High Price House Ids: 7, 9, 10, 11, 13, 14, 17, 18, 19, 20<br>

| Price Category | Prior Probability |
| -------------- | ----------------- |
| Low Price      | 10/20 = 0.5       |
| High Price     | 10/20 = 0.5       |
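
The median split and the priors can be cross-checked with pandas. The sketch below is a minimal version that assumes the prices are held in a pandas Series. `median_split` is a hypothetical helper. On the assignment data it would be called as `median_split(train_data["Local Price"])`, which should reproduce a threshold of about 5.7161 and priors of 0.5 for each class.

```python
import pandas as pd

def median_split(prices: pd.Series):
  """Label each price 'low' (<= median) or 'high' (> median) and return class priors."""
  # Series.median averages the two middle values when the length is even,
  # which matches the hand calculation (5.6039 + 5.8282) / 2.
  threshold = prices.median()
  labels = prices.le(threshold).map({True: "low", False: "high"})
  priors = labels.value_counts(normalize=True)  # fraction of houses in each class
  return threshold, labels, priors

# Illustrative toy prices (hypothetical, not the assignment data)
threshold, labels, priors = median_split(pd.Series([1.0, 2.0, 3.0, 4.0]))
print(threshold, list(labels), priors.to_dict())
```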

Now I will calculate the conditional probabilities for each feature. For discrete features I will use count tables, and for continuous features I will assume normality and calculate the mean and variance within each class. Note that I use Add-One (Laplace) smoothing for ALL of the discrete features, where $|V|$ is the number of distinct values of the feature.
$$P(x_i | y) = \frac{count(x_i, y) + 1}{count(y) + |V|}$$
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \bigl(x_i - \bar{x}\bigr)^2$$
$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\,\sigma^2}}\exp\!\Bigl(-\frac{(x - \mu)^2}{2\,\sigma^2}\Bigr)$$
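
The smoothing formula above can be applied to a whole feature at once. This sketch assumes a DataFrame containing the feature column and a class column named `label`. That column name is hypothetical, because the Naive Bayes part of the notebook does not create one. The sketch rebuilds the # Bathrooms counts from the tables below, so its output should match those probabilities.

```python
import pandas as pd

def laplace_table(df: pd.DataFrame, feature: str, label_col: str = "label") -> pd.DataFrame:
  """P(feature = v | class) with add-one (Laplace) smoothing; one column per class."""
  # rows: distinct feature values, columns: classes, cells: counts (0 if unseen)
  counts = pd.crosstab(df[feature], df[label_col])
  n_values = counts.shape[0]          # |V|: distinct values of this feature
  class_totals = counts.sum(axis=0)   # count(y) for each class
  # (count(x_i, y) + 1) / (count(y) + |V|); dividing by a Series aligns on columns
  return (counts + 1) / (class_totals + n_values)

# Rebuild the bathroom counts from the tables (10 Low houses, 10 High houses)
bathrooms = [1] * 10 + [1] * 5 + [1.5] * 3 + [2.5] * 2
classes = ["low"] * 10 + ["high"] * 10
table = laplace_table(pd.DataFrame({"Bathrooms": bathrooms, "label": classes}), "Bathrooms")
print(table.round(3))
```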

**Bathrooms Feature (Discrete)**:

| # Bathrooms (LOW PRICE) | Count | P(# Bath = X \| LOW)  |
|-------------------------|-------|-----------------------|
| 1                       | 10    | (10+1)/(10+3) ≈ 0.846 |
| 1.5                     | 0     | (0+1)/(10+3) ≈ 0.077  |
| 2.5                     | 0     | (0+1)/(10+3) ≈ 0.077  |

| # Bathrooms (HIGH PRICE) | Count | P(# Bath = X \| HIGH) |
|--------------------------|-------|-----------------------|
| 1                        | 5     | (5+1)/(10+3) ≈ 0.462  |
| 1.5                      | 3     | (3+1)/(10+3) ≈ 0.308  |
| 2.5                      | 2     | (2+1)/(10+3) ≈ 0.231  |

<br>**Garages Feature (Discrete)**:

| # Garages (LOW PRICE) | Count | P(# Garages = X \| LOW) |
|-----------------------|-------|-------------------------|
| 0                     | 3     | (3+1)/(10+4) ≈ 0.286    |
| 1                     | 6     | (6+1)/(10+4) ≈ 0.500    |
| 1.5                   | 0     | (0+1)/(10+4) ≈ 0.071    |
| 2                     | 1     | (1+1)/(10+4) ≈ 0.143    |

| # Garages (HIGH PRICE) | Count | P(# Garages = X \| HIGH) |
|------------------------|-------|--------------------------|
| 0                      | 0     | (0+1)/(10+4) ≈ 0.071     |
| 1                      | 3     | (3+1)/(10+4) ≈ 0.286     |
| 1.5                    | 2     | (2+1)/(10+4) ≈ 0.214     |
| 2                      | 5     | (5+1)/(10+4) ≈ 0.429     |

<br>**Rooms Feature (Discrete)**:

| # Rooms (LOW PRICE)     | Count | P(# Rooms = X \| LOW)  |
|-------------------------|-------|------------------------|
| 5                       | 1     | (1+1)/(10+6) ≈ 0.125   |
| 6                       | 7     | (7+1)/(10+6) ≈ 0.500   |
| 7                       | 2     | (2+1)/(10+6) ≈ 0.1875  |
| 8                       | 0     | (0+1)/(10+6) ≈ 0.0625  |
| 9                       | 0     | (0+1)/(10+6) ≈ 0.0625  |
| 10                      | 0     | (0+1)/(10+6) ≈ 0.0625  |

| # Rooms (HIGH PRICE)    | Count | P(# Rooms = X \| HIGH) |
|-------------------------|-------|------------------------|
| 5                       | 1     | (1+1)/(10+6) ≈ 0.125   |
| 6                       | 3     | (3+1)/(10+6) ≈ 0.250   |
| 7                       | 3     | (3+1)/(10+6) ≈ 0.250   |
| 8                       | 1     | (1+1)/(10+6) ≈ 0.125   |
| 9                       | 1     | (1+1)/(10+6) ≈ 0.125   |
| 10                      | 1     | (1+1)/(10+6) ≈ 0.125   |

<br>**Bedrooms Feature (Discrete)**:

| # Bedrooms (LOW PRICE)  | Count | P(# Bedrooms = X \| LOW) |
|-------------------------|-------|--------------------------|
| 2                       | 1     | (1+1)/(10+4) ≈ 0.143     |
| 3                       | 7     | (7+1)/(10+4) ≈ 0.571     |
| 4                       | 2     | (2+1)/(10+4) ≈ 0.214     |
| 5                       | 0     | (0+1)/(10+4) ≈ 0.0714    |

| # Bedrooms (HIGH PRICE) | Count | P(# Bedrooms = X \| HIGH) |
|-------------------------|-------|---------------------------|
| 2                       | 1     | (1+1)/(10+4) ≈ 0.143      |
| 3                       | 6     | (6+1)/(10+4) ≈ 0.500      |
| 4                       | 1     | (1+1)/(10+4) ≈ 0.143      |
| 5                       | 2     | (2+1)/(10+4) ≈ 0.214      |

<br>**Construction Type Feature (Discrete)**:

| Construction Type (LOW PRICE) | Count | P(Construction Type = X \| LOW)  |
|-------------------------|-------|-----------------------------|
| Apartment               | 4     | (4+1)/(10+3) ≈ 0.385        |
| Condo                   | 2     | (2+1)/(10+3) ≈ 0.231        |
| House                   | 4     | (4+1)/(10+3) ≈ 0.385        |

| Construction Type (HIGH PRICE) | Count | P(Construction Type = X \| HIGH)  |
|-------------------------|-------|-----------------------------|
| Apartment               | 3     | (3+1)/(10+3) ≈ 0.308 |
| Condo                   | 4     | (4+1)/(10+3) ≈ 0.385 |
| House                   | 3     | (3+1)/(10+3) ≈ 0.308 |

<br>**Land Area Feature (Continuous)**:

| Price Type | Sample Mean ($\bar{x}$)  | Sample Variance ($s^2$) |
|------------|------------------|---------------------------|
| Low Price  | $\frac{51.2663}{10}\approx 5.1267$ | $\frac{54.1155}{10-1}\approx 6.0128$ |
| High Price | $\frac{74.025}{10}\approx 7.4025$  | $\frac{47.8861}{10-1}\approx 5.3207$ |

| House Index | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 15 | 16 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Low Price   | 0.129566 | 0.131648 | 0.082734 | 0.147742 | 0.156703 | 0.156703 | 0.032684 | 0.162435 | 0.162477 | 0.032684 |

| House Index | 7 | 9 | 10 | 11 | 13 | 14 | 17 | 18 | 19 | 20 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| High Price  | 0.137898 | 0.100772 | 0.011193 | 0.158388 | 0.123964 | 0.164357 | 0.107363 | 0.168928 | 0.171491 | 0.170403 |

<br>**Living Area Feature (Continuous)**:

| Price Type | Sample Mean ($\bar{x}$)  | Sample Variance ($s^2$) |
|------------|------------------|---------------------------|
| Low Price  | $\frac{12.588}{10}\approx 1.2588$ | $\frac{0.4865}{10-1}\approx 0.0541$ |
| High Price | $\frac{17.009}{10}\approx 1.7009$  | $\frac{6.1377}{10-1}\approx 0.6820$ |

| House Index | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 15 | 16 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Low Price   | 0.914756 | 1.001839 | 1.607403 | 1.703839 | 1.439109 | 0.870899 | 0.997374 | 0.774924 | 1.012563 | 0.997374 |

| House Index | 7 | 9 | 10 | 11 | 13 | 14 | 17 | 18 | 19 | 20 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| High Price  | 0.413410 | 0.055342 | 0.140175 | 0.409173 | 0.328280 | 0.377525 | 0.482597 | 0.467290 | 0.447103 | 0.468994 |

<br>**Age of Home Feature (Continuous)**:

| Price Type | Sample Mean ($\bar{x}$)  | Sample Variance ($s^2$) |
|------------|------------------|---------------------------|
| Low Price  | $\frac{436}{10}\approx 43.6$ | $\frac{1078.4}{10-1}\approx 119.8222$ |
| High Price | $\frac{313}{10}\approx 31.3$  | $\frac{1514.1}{10-1}\approx 168.2333$ |

| House Index | 1 | 2 | 3 | 4 | 5 | 6 | 8 | 12 | 15 | 16 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| Low Price   | 0.036058 | 0.008873 | 0.034527 | 0.023208 | 0.036058 | 0.019186 | 0.020787 | 0.016844 | 0.035580 | 0.020787 |

| House Index | 7 | 9 | 10 | 11 | 13 | 14 | 17 | 18 | 19 | 20 |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| High Price  | 0.009706 | 0.021886 | 0.012637 | 0.030713 | 0.030604 | 0.030713 | 0.010879 | 0.023786 | 0.016750 | 0.025063 |


## Part 1. (1) CONTINUED.
Explain how you got the probability distribution for at least two features in detail.<br>
**Answer:**<br>
I will explain how I got the probability distribution for the # Bathrooms (discrete) and the Land Area (continuous). Note that all of the values are calculated using the equations and processes I describe below.<br><br>
**# Bathrooms (discrete)**: The first step is to list every value of the feature that appears in the dataset. Next, I count how many times each value appears in each class. I then compute the probability of each value within a class using Add-One (Laplace) smoothing. In the smoothing equation, $|V|$ is the number of distinct bathroom values (3 here: 1, 1.5, and 2.5) and $count(y)$ is the number of houses in the class (10). I repeat this for both classes, Low and High. The same process applies to every discrete feature in the dataset.<br>
Example: I will calculate the probability of the high price class having 1.5 bathrooms.<br>
$P(1.5 | HIGH) = \frac{3 + 1}{10 + 3} ≈ 0.308$
<br><br>
**Land Area (continuous):** For continuous features, counting exact values no longer works, so we need a model of the distribution. I ASSUME NORMALITY: within each class, each continuous feature is modeled as normally distributed, which is the Gaussian Naive Bayes assumption. This requires the sample mean and sample variance of the feature within each class, computed with the equations above. Once we have the class-conditional normal pdf, we plug each continuous data point into it to obtain its likelihood. Note that I do this in my code.<br>
Example: I will calculate the mean and variance of a LOW PRICE home for the land area feature and predict using the normal equation. Let's use the land area datapoint of 9.52 as an example.
$\bar{x} = \frac{1}{10} \sum_{i=1}^{10} x_i = \frac{51.2663}{10} ≈ 5.1267$<br>
$s^2 = \frac{1}{10 - 1} \sum_{i=1}^{10} \bigl(x_i - 5.1267\bigr)^2 ≈ \frac{54.1155}{10 - 1} ≈ 6.0128$<br>
$f(9.52 \mid 5.1267, 6.0128) = \frac{1}{\sqrt{2\pi\,6.0128}}\exp\!\Bigl(-\frac{(9.52 - 5.1267)^2}{2 \cdot 6.0128}\Bigr) ≈ 0.03268$
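
The mean, variance and density steps in the worked example can be sketched with NumPy. `fit_gaussian` and `gaussian_pdf` are hypothetical helpers. `ddof=1` gives the $n - 1$ denominator used for $s^2$. Passing an array to `gaussian_pdf` returns one density per value, which is one way the per-house tables above could be produced.

```python
import numpy as np

def fit_gaussian(values):
  """Sample mean and sample variance with the n - 1 denominator (ddof=1)."""
  x = np.asarray(values, dtype=float)
  return x.mean(), x.var(ddof=1)

def gaussian_pdf(x, mean, var):
  """Normal density f(x | mean, var); accepts a scalar or a NumPy array."""
  x = np.asarray(x, dtype=float)
  return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

# Worked example above: LOW-price land area parameters, evaluated at 9.52
print(round(float(gaussian_pdf(9.52, 5.1267, 6.0128)), 5))  # → 0.03268
```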

## Part 1. (2) Classify the Test Data using your conditional probability distributions and the MAP rule.

In [71]:
THRESHOLD = 5.7161

In [72]:
class NaiveBayesClassifier:
  def __init__(self):
    self.probabilities = self.get_probabilities()
    self.all_features = self.get_column_key_mapping()

  def get_column_key_mapping(self):
    return {
      "discrete": {
        "bathrooms": "Bathrooms",
        "garages": "# Garages",
        "rooms": "# Rooms",
        "bedrooms": "# Bedrooms",
        "construction": "Construction type"
      },
      "continuous": {
        "land_area": "Land Area",
        "living_area": "Living area",
        "home_age": "Age of home"
      }
    }

  def get_probabilities(self):
    # prior probability
    prob_prior = {
      "low": 0.5,
      "high": 0.5
    }

    # discrete features
    prob_bathrooms = {
      1: {"low": 0.846, "high": 0.462},
      1.5: {"low": 0.077, "high": 0.308},
      2.5: {"low": 0.077, "high": 0.231},
    }
    prob_garages = {
      0: {"low": 0.286, "high": 0.071},
      1: {"low": 0.500, "high": 0.286},
      1.5: {"low": 0.071, "high": 0.214},
      2: {"low": 0.143, "high": 0.429}
    }
    prob_rooms = {
      5: {"low": 0.125, "high": 0.125},
      6: {"low": 0.500, "high": 0.250},
      7: {"low": 0.1875, "high": 0.250},
      8: {"low": 0.0625, "high": 0.125},
      9: {"low": 0.0625, "high": 0.125},
      10: {"low": 0.0625, "high": 0.125}
    }
    prob_bedrooms = {
      2: {"low": 0.143, "high": 0.143},
      3: {"low": 0.571, "high": 0.500},
      4: {"low": 0.214, "high": 0.143},
      5: {"low": 0.0714, "high": 0.214}
    }
    prob_construction = {
      "Apartment": {"low": 0.385, "high": 0.308},
      "Condo": {"low": 0.231, "high": 0.385},
      "House": {"low": 0.385, "high": 0.308}
    }

    # continuous features
    prob_land_area = {
      "low": {"mean": 5.1267, "var": 6.0128},
      "high": {"mean": 7.4025, "var": 5.3207}
    }
    prob_living_area = {
      "low": {"mean": 1.2588, "var": 0.0541},
      "high": {"mean": 1.7009, "var": 0.682}
    }
    prob_home_age = {
      "low": {"mean": 43.6, "var": 119.8222},
      "high": {"mean": 31.3, "var": 168.2333}
    }

    return {
      "prior": prob_prior,
      "bathrooms": prob_bathrooms,
      "garages": prob_garages,
      "rooms": prob_rooms,
      "bedrooms": prob_bedrooms,
      "construction": prob_construction,
      "land_area": prob_land_area,
      "living_area": prob_living_area,
      "home_age": prob_home_age
    }

  def calculate_normal_pdf(self, x, mean, var):
    return (1 / sqrt(2 * pi * var)) * exp(-((x - mean) ** 2) / (2 * var))

  def process_feature(self, data_type, log_probs, prob_key, feature):
    if data_type == "discrete":
      log_probs["low"] += log(self.probabilities[prob_key][feature]["low"])
      log_probs["high"] += log(self.probabilities[prob_key][feature]["high"])
    else:
      pdf_low = self.calculate_normal_pdf(
        feature,
        self.probabilities[prob_key]["low"]["mean"],
        self.probabilities[prob_key]["low"]["var"]
      )
      pdf_high = self.calculate_normal_pdf(
        feature,
        self.probabilities[prob_key]["high"]["mean"],
        self.probabilities[prob_key]["high"]["var"]
      )
      log_probs["low"] += log(pdf_low)
      log_probs["high"] += log(pdf_high)

  def predict(self, house_data):
    log_probs = {
      "low": log(self.probabilities["prior"]["low"]),
      "high": log(self.probabilities["prior"]["high"])
    }
    for data_type, features in self.all_features.items():
      for prob_key, column_name in features.items():
        self.process_feature(
          data_type, log_probs, prob_key, house_data[column_name]
        )

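    # convert log-posteriors back to probabilities; subtracting the max first
    # keeps exp() from underflowing to 0 when the log values are very negative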
    max_log_prob = max(log_probs["low"], log_probs["high"])
    probs = {
      "low": exp(log_probs["low"] - max_log_prob),
      "high": exp(log_probs["high"] - max_log_prob)
    }

    # normalize
    total_prob = probs["low"] + probs["high"]
    probs["low"] /= total_prob
    probs["high"] /= total_prob

    return {
      "prob_low": probs["low"],
      "prob_high": probs["high"],
      "pred": "Low Price" if probs["low"] > probs["high"] else "High Price"
    }

  def predict_batch(self, data, label):
    results = []
    for _, house_data in data.iterrows():
      prediction = self.predict(house_data)  # use this instance, not the global nbc
      expectation = (
        "Low Price" if house_data["Local Price"] <= THRESHOLD else "High Price"
      )
      result = {
        "House ID": house_data["House ID"],
        "Price": house_data["Local Price"],
        "Probability Low": prediction["prob_low"],
        "Probability High": prediction["prob_high"],
        "Prediction": prediction["pred"],
        "Expectation": expectation,
        "Correct": int(prediction["pred"] == expectation)
      }
      results.append(result)
      if label == "Testing Data":
        print(
          f"House {house_data['House ID']}: P(Low)={result['Probability Low']:.4f} P(High)={result['Probability High']:.4f} → Prediction: {result['Prediction']}"
        )

    return results

  def print_metrics(self, results, label):
    print(f"\n--- {label} Prediction Results ---")
    total_correct = sum(r["Correct"] for r in results)
    print(f"Total CORRECT Predictions: {total_correct}")
    print(f"Total INCORRECT Predictions: {len(results) - total_correct}")
    print(f"{label} Accuracy: {total_correct / len(results)}")

In [73]:
print(f"Low Prices <= {THRESHOLD}\tHigh Prices > {THRESHOLD}\n")
test_label = "Testing Data"
train_label = "Training Data"

nbc = NaiveBayesClassifier()

print("--- Testing Data House Price Probabilities ---")
test_results = nbc.predict_batch(test_data, test_label)
train_results = nbc.predict_batch(train_data, train_label)

nbc.print_metrics(test_results, test_label)
nbc.print_metrics(train_results, train_label)

Low Prices <= 5.7161	High Prices > 5.7161

--- Testing Data House Price Probabilities ---
House 24: P(Low)=0.6174 P(High)=0.3826 → Prediction: Low Price
House 25: P(Low)=0.0186 P(High)=0.9814 → Prediction: High Price
House 26: P(Low)=0.0088 P(High)=0.9912 → Prediction: High Price
House 27: P(Low)=0.0052 P(High)=0.9948 → Prediction: High Price
House 28: P(Low)=0.4666 P(High)=0.5334 → Prediction: High Price

--- Testing Data Prediction Results ---
Total CORRECT Predictions: 4
Total INCORRECT Predictions: 1
Testing Data Accuracy: 0.8

--- Training Data Prediction Results ---
Total CORRECT Predictions: 16
Total INCORRECT Predictions: 4
Training Data Accuracy: 0.8
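
Accuracy alone does not show which class the errors fall in. The minimal sketch below assumes the list-of-dicts format returned by `predict_batch`, with keys `Expectation` and `Prediction`. `confusion_table` is a hypothetical helper. For example, `confusion_table(train_results)` would break the 4 training errors down by true and predicted class.

```python
import pandas as pd

def confusion_table(results):
  """Cross-tabulate true class (rows) against predicted class (columns), with totals."""
  df = pd.DataFrame(results)
  return pd.crosstab(df["Expectation"], df["Prediction"], margins=True)

# Hypothetical toy results in the same format as predict_batch output
toy = [
  {"Expectation": "Low Price", "Prediction": "Low Price"},
  {"Expectation": "High Price", "Prediction": "Low Price"},
  {"Expectation": "High Price", "Prediction": "High Price"},
]
print(confusion_table(toy))
```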


# Part 2. Decision Tree
Using the same housing data (Asssignment4_Data.xlsx), construct a decision tree classifier. You can use the implementation available in scikit-learn.

In [74]:
THRESHOLD = 5.7161
RANDOM_STATE = 2000

In [75]:
# data preprocessing
train_data = pd.read_excel('Asssignment4_Data.xlsx', sheet_name='Train')
test_data = pd.read_excel('Asssignment4_Data.xlsx', sheet_name='Test')

train_data['label'] = (train_data['Local Price'] > THRESHOLD).astype(int)
test_data['label'] = (test_data['Local Price'] > THRESHOLD).astype(int)

features = [
  "Bathrooms",
  "# Garages",
  "# Rooms",
  "# Bedrooms",
  "Construction type",
  "Land Area",
  "Living area",
  "Age of home",
]

X_train = train_data[features]
y_train = train_data['label']

X_test  = test_data[features]
y_test  = test_data['label']

X_train_encoded = pd.get_dummies(X_train, columns=['Construction type'])
X_test_encoded  = pd.get_dummies(X_test,  columns=['Construction type'])
X_test_encoded = X_test_encoded.reindex(
  columns=X_train_encoded.columns, fill_value=0
)
feature_names = X_train_encoded.columns.tolist()
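
The `reindex` call matters when a category is missing from the test split. `pd.get_dummies` only creates columns for the values it sees, so the test matrix could otherwise end up with fewer or differently ordered columns than the model was trained on. The toy frames below are hypothetical and are not the assignment data.

```python
import pandas as pd

train = pd.DataFrame({"Construction type": ["Apartment", "Condo", "House"]})
test = pd.DataFrame({"Construction type": ["House", "House"]})

train_enc = pd.get_dummies(train, columns=["Construction type"])
test_enc = pd.get_dummies(test, columns=["Construction type"])
print(list(test_enc.columns))  # only the House dummy exists here

# add the missing dummy columns (filled with 0) in the training column order
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_enc.columns))
```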

## Part 2. (1) Use the default parameters.

In [76]:
dtc_default = DecisionTreeClassifier(random_state=RANDOM_STATE)
dtc_default.fit(X_train_encoded, y_train)
train_accuracy_default = dtc_default.score(X_train_encoded, y_train)
test_accuracy_default = dtc_default.score(X_test_encoded, y_test)

print("Default parameters:")
print(f"(a) Training Data Accuracy = {train_accuracy_default:.3f}")
print(f"(b) Testing Data Accuracy = {test_accuracy_default:.3f}\n")

Default parameters:
(a) Training Data Accuracy = 1.000
(b) Testing Data Accuracy = 0.800



## Part 2. (2) What is the effect of restricting the maximum depth of the tree? Try different depths and find the best value.

In [77]:
results = []
for depth in range(1, 11):
  dtc = DecisionTreeClassifier(max_depth=depth, random_state=RANDOM_STATE)
  dtc.fit(X_train_encoded, y_train)

  train_acc = dtc.score(X_train_encoded, y_train)
  test_acc = dtc.score(X_test_encoded,  y_test)
  results.append((depth, train_acc, test_acc))

print("Accuracy at Varying Depths:")
for depth, train_acc, test_acc in results:
  print(f"depth {depth:2d}: train = {train_acc:.3f}, test = {test_acc:.3f}")

# max returns the first maximum, so ties in test accuracy go to the shallowest depth
best_depth, best_train, best_test = max(results, key=lambda x: x[2])
print(f"\nBest Value: max_depth = {best_depth}, train = {best_train:.3f}, test = {best_test:.3f}")

Accuracy at Varying Depths:
depth  1: train = 0.900, test = 0.800
depth  2: train = 1.000, test = 0.800
depth  3: train = 1.000, test = 0.800
depth  4: train = 1.000, test = 0.800
depth  5: train = 1.000, test = 0.800
depth  6: train = 1.000, test = 0.800
depth  7: train = 1.000, test = 0.800
depth  8: train = 1.000, test = 0.800
depth  9: train = 1.000, test = 0.800
depth 10: train = 1.000, test = 0.800

Best Value: max_depth = 1, train = 0.900, test = 0.800


## Part 2. (3) Why does restricting the depth have such a strong effect on the classifier performance?
**Answer:**<br>
In general, restricting the depth of a decision tree is a good way to prevent overfitting. Each extra level lets the tree make finer splits, so an unrestricted tree can keep splitting until it memorizes the training set. That is what happens above. From depth 2 onward the tree reaches 100% training accuracy, but test accuracy stays at 80%, so the extra splits fit the training data without improving generalization. A depth-1 tree (a single split) reaches the same test accuracy as a depth-10 tree. We should therefore keep the depth at 1 for this example, since the simpler tree is more likely to generalize than a tree of depth 10. In practice, pruned or shallow trees act as a regularizer that captures the main patterns in the data. Our test set has only 5 houses, and each one is worth 20% accuracy. This makes it difficult to see the benefit of pruning here, since the test accuracy is identical for every depth. On larger datasets, shallow trees often generalize better than deep ones.
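
With only five test houses, the test accuracy above cannot tell the depths apart. One common alternative, sketched here as an assumption rather than part of the assignment, is to choose `max_depth` by k-fold cross-validation on the training split. `cv_scores_by_depth` is a hypothetical helper, and the toy data from `make_classification` stands in for the housing features. On the notebook's objects the call would be `cv_scores_by_depth(X_train_encoded, y_train, range(1, 11))`. With 20 rows and 5 folds, each fold holds only 4 houses, so the estimates would still be noisy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def cv_scores_by_depth(X, y, depths, cv=5, random_state=2000):
  """Mean k-fold cross-validated accuracy for each candidate max_depth."""
  scores = {}
  for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=random_state)
    # accuracy on each held-out fold, averaged over the cv folds
    scores[depth] = cross_val_score(tree, X, y, cv=cv).mean()
  return scores

# Toy data standing in for X_train_encoded / y_train
X, y = make_classification(n_samples=100, n_features=8, random_state=0)
scores = cv_scores_by_depth(X, y, range(1, 6))
best = max(scores, key=scores.get)  # first (shallowest) depth with the top score
print(scores, "best max_depth:", best)
```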

## Part 2. (4) For the test data point below, perform inference on the decision tree.

In [78]:
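test_datapoint = {
  "Local Price": 9.0384,
  "Bathrooms": 1,
  "Land Area": 7.8,
  "Living area": 1.5,  # key must match the training column name exactly
  "# Garages": 1.5,
  "# Rooms": 7,
  "# Bedrooms": 3,
  "Age of home": 23
}
# keep only the model's feature columns: "Local Price" is dropped, and the one-hot
# "Construction type_*" columns (not given for this datapoint) become NaN, which
# DecisionTreeClassifier accepts at prediction time in scikit-learn >= 1.3
df_datapoint = pd.DataFrame(
  [test_datapoint], columns=feature_names
)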

probs = dtc_default.predict_proba(df_datapoint)[0]
pred = dtc_default.predict(df_datapoint)[0]
label = "Low Price" if pred == 0 else "High Price"

print("Inference on the given test datapoint:")
print(f"P(Low) = {probs[0]:.3f}")
print(f"P(High) = {probs[1]:.3f}")
print(f"Prediction: {label}")

Inference on the given test datapoint:
P(Low) = 0.000
P(High) = 1.000
Prediction: High Price
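
To see why the tree makes this prediction, scikit-learn can print the learned splits with `export_text` and report the nodes a sample passes through with `decision_path`. The sketch uses toy data from `make_classification` in place of the housing features. On the notebook's objects the equivalent calls would be `export_text(dtc_default, feature_names=feature_names)` and `dtc_default.decision_path(df_datapoint)`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data standing in for X_train_encoded / y_train
X, y = make_classification(n_samples=50, n_features=4, random_state=0)
names = ["f0", "f1", "f2", "f3"]
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text view of every split threshold and the class predicted at each leaf
rules = export_text(tree, feature_names=names)
print(rules)

# Node ids visited by the first sample, and the leaf it ends up in
visited = tree.decision_path(X[:1]).indices
leaf = tree.apply(X[:1])[0]
print("visited nodes:", sorted(visited), "leaf:", leaf)
print("P(class):", tree.predict_proba(X[:1])[0])
```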
