Problem 1
The file linked below contains a data set of 150 samples. The first column contains a single continuous
feature, X, assumed to have been drawn from an unknown probability density. The second column contains
the binary class label Y.

https://f000.backblazeb2.com/file/jeldridge-data/011-univariate_density_estimation/data.csv

In this problem, use a histogram estimator with bins [0, 1.5), [1.5, 3), ..., [13.5, 15) to estimate the requested
probabilities. In each part, show your code and provide your reasoning.
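As a reminder of how a histogram estimator assigns a density to each bin, the following is a minimal sketch. It runs on synthetic stand-in data rather than the assignment's data.csv, and the names `density` and `density_at` are hypothetical helpers introduced here only for illustration.

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the assignment's data.csv)
rng = np.random.default_rng(0)
x = rng.uniform(0, 15, size=150)

# Bin edges 0, 1.5, ..., 15 built from integers to avoid float drift
bin_edges = np.arange(10 + 1) * 1.5
bin_width = np.diff(bin_edges)

# Count the samples falling in each bin
counts, _ = np.histogram(x, bins=bin_edges)

# Histogram density estimate: count / (n * bin width), constant within a bin
density = counts / (counts.sum() * bin_width)


def density_at(value):
    """Return the estimated density for the bin containing `value` (0 <= value < 15)."""
    idx = np.searchsorted(bin_edges, value, side="right") - 1
    return density[idx]


print(density_at(6.271))
```

The estimate is piecewise constant, so every x inside [6, 7.5) receives the same density value, and the bin densities times the bin widths sum to 1.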

a) Estimate P(Y = 1 | X = 6.271) directly

In [2]:
import numpy as np
import pandas as pd

# Load the data
data = pd.read_csv('data.csv', header=None, names=['X', 'Y'])
# Bin edges 0, 1.5, ..., 15 (the stop value 15.1 ensures 15 itself is included)
bin_edges = np.arange(0, 15.1, 1.5)

# Target value
target_x = 6.271

# Find which bin contains our target value
for i in range(len(bin_edges) - 1):
    if bin_edges[i] <= target_x < bin_edges[i + 1]:
        target_bin_start = bin_edges[i]
        target_bin_end = bin_edges[i + 1]
        break

# Filter data for the target bin
bin_data = data[(data['X'] >= target_bin_start) & (data['X'] < target_bin_end)]

# Count total samples in the bin
bin_total = len(bin_data)

# Count samples with Y = 1 in the bin
bin_class_1 = sum(bin_data['Y'] == 1)

# Calculate the probability P(Y = 1 | X in the target bin)
probability = bin_class_1 / bin_total

# Final answer
print("\nAnswer to part a):")
print(f"P(Y = 1 | X = 6.271) = {probability:.4f}")


Answer to part a):
P(Y = 1 | X = 6.271) = 0.7143
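The direct estimate is simply the fraction of samples in the target bin that have Y = 1. The sketch below wraps that idea in a hypothetical helper, `direct_posterior`, and checks it on a small hand-made toy data set (not data.csv). It assumes the target lies in [0, 15) and returns NaN for an empty bin instead of dividing by zero.

```python
import numpy as np


def direct_posterior(x, y, bin_edges, target):
    """Estimate P(Y = 1 | X = target) as the class-1 fraction in target's bin."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    # Index of the half-open bin [lo, hi) that contains the target
    idx = np.searchsorted(bin_edges, target, side="right") - 1
    lo, hi = bin_edges[idx], bin_edges[idx + 1]

    in_bin = (x >= lo) & (x < hi)
    n_bin = in_bin.sum()
    if n_bin == 0:
        # No samples in the bin: the estimate is undefined
        return float("nan")
    return float((y[in_bin] == 1).sum() / n_bin)


# Toy data (hypothetical): four samples land in [6, 7.5), three of them with Y = 1
toy_x = [6.1, 6.5, 7.0, 7.4, 8.0, 2.0]
toy_y = [1, 1, 0, 1, 0, 1]
edges = np.arange(11) * 1.5

estimate = direct_posterior(toy_x, toy_y, edges, 6.271)
print(estimate)  # 0.75
```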


b) Estimate P(Y = 1 | X = 6.271) by estimating all of: 1) the marginal density pX(x), 2) the class-conditional density pX(x | Y = 1), and 3) the class prior P(Y = 1), and then applying Bayes’ rule.

In [None]:
bin_edges = np.arange(0, 15.1, 1.5)
bin_width = 1.5

# Target value
target_x = 6.271

# Find target bin
for i in range(len(bin_edges) - 1):
    if bin_edges[i] <= target_x < bin_edges[i + 1]:
        target_bin_start = bin_edges[i]
        target_bin_end = bin_edges[i + 1]
        break

# Count samples for full data and for Y = 1
total_samples = len(data)
class1_total = data['Y'].sum()  
p_y1 = class1_total / total_samples

# Count samples in the target bin (using the bin width to estimate densities)
in_bin = (data['X'] >= target_bin_start) & (data['X'] < target_bin_end)
bin_total = in_bin.sum()
class1_in_bin = data.loc[in_bin, 'Y'].sum()

# Estimate pX(x) and pX(x|Y=1) in the target bin
p_x = bin_total / (total_samples * bin_width)
p_x_given_y1 = class1_in_bin / (class1_total * bin_width)

# Apply Bayes' rule: P(Y = 1 | X) = [p(X | Y = 1) * P(Y = 1)] / p(X)
p_y1_given_x = (p_x_given_y1 * p_y1) / p_x

print(f"P(Y = 1 | X = {target_x}) = {p_y1_given_x:.4f}")

P(Y = 1 | X = 6.271) = 0.7143
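The Bayes' rule estimate matches the direct estimate from part a), and this is not a coincidence. Write n for the total number of samples, n_1 for the number with Y = 1, h for the bin width, and n_B and n_{1,B} for the total and class-1 counts in the bin containing x. This notation is introduced here for illustration and does not come from the problem statement. With the three histogram estimates computed in the code above, the bin width and the totals cancel:

```latex
\hat{p}_X(x) = \frac{n_B}{n\,h}, \qquad
\hat{p}_X(x \mid Y = 1) = \frac{n_{1,B}}{n_1\,h}, \qquad
\hat{P}(Y = 1) = \frac{n_1}{n}

\hat{P}(Y = 1 \mid X = x)
  = \frac{\hat{p}_X(x \mid Y = 1)\,\hat{P}(Y = 1)}{\hat{p}_X(x)}
  = \frac{\dfrac{n_{1,B}}{n_1\,h} \cdot \dfrac{n_1}{n}}{\dfrac{n_B}{n\,h}}
  = \frac{n_{1,B}}{n_B}
```

The final ratio is exactly the class-1 fraction within the bin, which is the quantity computed directly in part a).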


c) For what values of x ∈ [0, 15] will the Bayes classifier predict y = 1?

In [None]:
bin_edges = np.arange(0, 15.1, 1.5)

# Class counts and priors P(Y = 1), P(Y = 0)
n_class1 = sum(data['Y'] == 1)
n_class0 = sum(data['Y'] == 0)
p_y1 = n_class1 / len(data)
p_y0 = 1 - p_y1

# Collect every bin in which the classifier predicts y = 1
y1_bins = []
for i in range(len(bin_edges) - 1):
    bin_start, bin_end = bin_edges[i], bin_edges[i+1]
    bin_width = bin_end - bin_start

    # Per-class sample counts in this bin
    in_bin = (data['X'] >= bin_start) & (data['X'] < bin_end)
    class1_in_bin = sum((data['Y'] == 1) & in_bin)
    class0_in_bin = sum((data['Y'] == 0) & in_bin)

    # Histogram estimates of the class-conditional densities
    p_x_given_y1 = class1_in_bin / (n_class1 * bin_width)
    p_x_given_y0 = class0_in_bin / (n_class0 * bin_width)

    # Unnormalized posteriors p(x | Y = k) P(Y = k); the shared factor 1 / pX(x) is omitted
    posterior_y1 = p_x_given_y1 * p_y1
    posterior_y0 = p_x_given_y0 * p_y0

    # Strict inequality: ties, including bins with no samples, default to y = 0
    prediction = "y=1" if posterior_y1 > posterior_y0 else "y=0"
    print(f"[{bin_start}, {bin_end}) | {posterior_y1:.6f} | {posterior_y0:.6f} | {prediction}")
    if prediction == "y=1":
        y1_bins.append([bin_start, bin_end])

# Merge adjacent y = 1 bins into contiguous intervals
merged_regions = []
if y1_bins:
    merged_regions = [y1_bins[0]]
    for i in range(1, len(y1_bins)):
        if y1_bins[i][0] == merged_regions[-1][1]:
            merged_regions[-1][1] = y1_bins[i][1]
        else:
            merged_regions.append(y1_bins[i])
# Report the rule and the merged regions where it predicts y = 1
regions_text = " U ".join(f"[{start}, {end})" for start, end in merged_regions)
print("\nSolution: The Bayes classifier predicts y = 1 whenever pX(x | Y = 1)P(Y = 1) > pX(x | Y = 0)P(Y = 0)")
print(f"Based on the histogram estimates, this holds for x in {regions_text}")

[0.0, 1.5) | 0.035556 | 0.000000 | y=1
[1.5, 3.0) | 0.093333 | 0.000000 | y=1
[3.0, 4.5) | 0.137778 | 0.004444 | y=1
[4.5, 6.0) | 0.106667 | 0.013333 | y=1
[6.0, 7.5) | 0.044444 | 0.017778 | y=1
[7.5, 9.0) | 0.026667 | 0.088889 | y=0
[9.0, 10.5) | 0.000000 | 0.066667 | y=0
[10.5, 12.0) | 0.000000 | 0.031111 | y=0
[12.0, 13.5) | 0.000000 | 0.000000 | y=0
[13.5, 15.0) | 0.000000 | 0.000000 | y=0

Solution: The Bayes classifier predicts y = 1 whenever pX(x | Y = 1)P(Y = 1) > pX(x | Y = 0)P(Y = 0)
Based on the histogram estimates, this holds for x in [0.0, 7.5)
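Because pX(x | Y = k)P(Y = k) reduces to (class-k count in the bin) / (n × bin width) under the histogram estimator, the decision rule amounts to comparing per-bin class counts. The sketch below uses that shortcut in a hypothetical helper, `y1_regions`, and is demonstrated on hand-made toy data rather than data.csv. One assumption to keep in mind: `np.histogram` treats the last bin as closed on the right, so a sample exactly equal to 15 would be counted, unlike with the half-open bins in the problem statement.

```python
import numpy as np


def y1_regions(x, y, bin_edges):
    """Return merged [lo, hi] intervals where class 1 strictly outnumbers class 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)

    # Per-bin counts for each class
    counts1, _ = np.histogram(x[y == 1], bins=bin_edges)
    counts0, _ = np.histogram(x[y == 0], bins=bin_edges)

    # Ties (including empty bins) fall to y = 0, mirroring a strict inequality
    predict_1 = counts1 > counts0

    regions = []
    for i in np.flatnonzero(predict_1):
        lo, hi = float(bin_edges[i]), float(bin_edges[i + 1])
        if regions and regions[-1][1] == lo:
            # Adjacent to the previous interval: extend it
            regions[-1][1] = hi
        else:
            regions.append([lo, hi])
    return regions


# Toy data (hypothetical): class 1 dominates [0, 3) and [7.5, 9)
toy_x = [0.5, 1.0, 2.0, 2.5, 4.0, 4.2, 8.0, 8.5, 9.5]
toy_y = [1, 1, 1, 1, 0, 0, 1, 1, 0]
edges = np.arange(11) * 1.5

regions = y1_regions(toy_x, toy_y, edges)
print(regions)  # [[0.0, 3.0], [7.5, 9.0]]
```

Each returned pair should be read as the half-open interval [lo, hi).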
