# Demonstrating the Beta Mixture Model Fitter

This notebook shows how to use the `BetaMixtureFitter` from the `irbstudio.simulation.distribution` module.

The process involves:
1. **Generating Synthetic Data**: We'll create a bimodal dataset by combining samples from two different Beta distributions. This mimics a portfolio with two distinct clusters of risk (e.g., 'low risk' and 'medium risk').
2. **Fitting the Model**: We'll instantiate and fit the `BetaMixtureFitter` to this synthetic data.
3. **Visualizing the Results**: We'll plot the original data's histogram against the probability density function (PDF) of the fitted mixture model to visually assess the quality of the fit.

In [1]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from scipy.stats import beta

# Add the project root to the Python path to allow importing irbstudio
import sys
import os
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from irbstudio.simulation.distribution import BetaMixtureFitter

print("Imports successful.")

Imports successful.


### 1. Generate Synthetic Bimodal Data

In [5]:
np.random.seed(42)

# Define the parameters for two underlying Beta distributions
# Component 1: Represents a 'low-risk' group
a1, b1 = 2, 50  # Mean a/(a+b) ≈ 0.038, low variance
size1 = 950

# Component 2: Represents a 'medium-risk' group
a2, b2 = 10, 30  # Mean a/(a+b) = 0.25, higher variance
size2 = 50

# Generate data from each component
data1 = np.random.beta(a=a1, b=b1, size=size1)
data2 = np.random.beta(a=a2, b=b2, size=size2)

# Combine them into a single dataset
synthetic_data = np.concatenate([data1, data2])

print(f"Generated {len(synthetic_data)} data points.")
print(f"Data mean: {synthetic_data.mean():.4f}")

Generated 1000 data points.
Data mean: 0.0501
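
As a quick sanity check on the generated sample, the theoretical mean of the mixture can be computed from the generating parameters and compared with the sample mean printed above. The sketch below uses only the standard Beta moment formulas; the helper names `beta_mean` and `beta_var` are illustrative and not part of `irbstudio`.

```python
# Generating parameters copied from the data-generation cell above
a1, b1, size1 = 2, 50, 950
a2, b2, size2 = 10, 30, 50


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_var(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Mixture weights implied by the component sample sizes
w1 = size1 / (size1 + size2)
w2 = 1 - w1

# The mean of a mixture is the weight-averaged mean of its components
mixture_mean = w1 * beta_mean(a1, b1) + w2 * beta_mean(a2, b2)

print(f"Component 1: mean={beta_mean(a1, b1):.4f}, var={beta_var(a1, b1):.6f}")
print(f"Component 2: mean={beta_mean(a2, b2):.4f}, var={beta_var(a2, b2):.6f}")
print(f"Theoretical mixture mean: {mixture_mean:.4f}")
```

The theoretical mixture mean is about 0.049, so a sample mean of 0.0501 is in line with what the generating parameters imply.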


### 2. Fit the Beta Mixture Model

In [6]:
# Initialize the fitter with 2 components
fitter = BetaMixtureFitter(n_components=2, max_iter=150, tol=1e-5)

# Fit the model to our data
fitter.fit(synthetic_data)

# Print the learned parameters
print("Fitted Model Parameters:")
for i in range(fitter.n_components):
    print(f"  Component {i+1}:")
    print(f"    Weight: {fitter.weights_[i]:.4f}")
    print(f"    Alpha:  {fitter.alphas_[i]:.4f}")
    print(f"    Beta:   {fitter.betas_[i]:.4f}")

2025-09-09 14:37:56 - irbstudio.simulation.distribution - INFO - Converged after 7 iterations.
Fitted Model Parameters:
  Component 1:
    Weight: 0.9530
    Alpha:  2.1350
    Beta:   51.5835
  Component 2:
    Weight: 0.0470
    Alpha:  12.6203
    Beta:   36.0280
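
Raw alpha and beta values are hard to compare by eye, so it can help to translate them into the mean each component implies. The sketch below hard-codes the fitted values from the output above and the true generating parameters from section 1; it does not call `BetaMixtureFitter`, and the variable names are illustrative.

```python
# (weight, alpha, beta) per component
# Fitted values are copied from the printed output above
fitted = [(0.9530, 2.1350, 51.5835), (0.0470, 12.6203, 36.0280)]
# True values come from the data-generation cell (950 and 50 of 1000 samples)
true = [(0.95, 2.0, 50.0), (0.05, 10.0, 30.0)]

# Mean of Beta(a, b) is a / (a + b)
fitted_means = [a / (a + b) for _, a, b in fitted]
true_means = [a / (a + b) for _, a, b in true]

for i, (f, t) in enumerate(zip(fitted_means, true_means), start=1):
    w_fit, w_true = fitted[i - 1][0], true[i - 1][0]
    print(
        f"Component {i}: weight {w_fit:.3f} (true {w_true:.3f}), "
        f"mean {f:.4f} (true {t:.4f})"
    )
```

On this sample, the recovered weights and component means sit close to the generating values, even though the individual alpha and beta estimates differ somewhat from the true parameters.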


### 3. Visualize the Fit

Now, let's plot the histogram of our synthetic data and overlay the probability density function (PDF) of the fitted mixture model. If the fit is good, the combined PDF curve closely follows the shape of the histogram.
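
For reference, the density being overlaid is the weighted sum of the component Beta densities. The notation below ($w_k$, $\alpha_k$, $\beta_k$ for the fitted weights and shape parameters, $K = 2$ components) is introduced here for illustration and mirrors the `weights_`, `alphas_`, and `betas_` attributes used in the code:

$$
f(x) = \sum_{k=1}^{K} w_k \, \frac{x^{\alpha_k - 1} (1 - x)^{\beta_k - 1}}{B(\alpha_k, \beta_k)}, \qquad \sum_{k=1}^{K} w_k = 1, \quad 0 < x < 1,
$$

where $B(\alpha, \beta)$ is the Beta function. Because the weights sum to one, $f$ integrates to one, which is why it can be compared directly with a density-normalized histogram.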

In [7]:
# Create a range of x-values for plotting the PDF
x_plot = np.linspace(0, 1, 1000)

# Calculate the PDF for each component
pdf1 = beta.pdf(x_plot, fitter.alphas_[0], fitter.betas_[0])
pdf2 = beta.pdf(x_plot, fitter.alphas_[1], fitter.betas_[1])

# Calculate the combined PDF of the mixture model
combined_pdf = (fitter.weights_[0] * pdf1) + (fitter.weights_[1] * pdf2)

# Create the plot
fig = go.Figure()

# 1. Add the histogram of the original data
fig.add_trace(go.Histogram(
    x=synthetic_data,
    name='Synthetic Data',
    histnorm='probability density', # Normalize to compare with PDF
    marker_color='#a9d1f7',
    opacity=0.7
))

# 2. Add the combined PDF of the fitted model
fig.add_trace(go.Scatter(
    x=x_plot, 
    y=combined_pdf, 
    name='Fitted Mixture PDF',
    line=dict(color='navy', width=3)
))

# 3. (Optional) Add the individual component PDFs, each scaled by its weight
#    so that the two dashed curves sum to the combined PDF
fig.add_trace(go.Scatter(
    x=x_plot, 
    y=fitter.weights_[0] * pdf1, 
    name='Component 1 PDF',
    line=dict(color='red', dash='dash')
))
fig.add_trace(go.Scatter(
    x=x_plot, 
    y=fitter.weights_[1] * pdf2, 
    name='Component 2 PDF',
    line=dict(color='green', dash='dash')
))

# Update layout
fig.update_layout(
    title_text='Fitted Beta Mixture Model vs. Original Data',
    xaxis_title='Value (e.g., PD Score)',
    yaxis_title='Density',
    legend_title='Components',
    template='plotly_white'
)

fig.show()
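
The visual check can be complemented with a numeric one. The following is a minimal sketch, not part of `irbstudio`, that compares the empirical distribution of the data with the CDF of the fitted mixture using SciPy's one-sample Kolmogorov-Smirnov test. It regenerates the synthetic data with the same seed and hard-codes the fitted parameters printed in section 2, so it runs without the fitter; `mixture_cdf` is an illustrative helper name.

```python
import numpy as np
from scipy.stats import beta, kstest

# Recreate the synthetic data from section 1
np.random.seed(42)
data = np.concatenate([
    np.random.beta(a=2, b=50, size=950),
    np.random.beta(a=10, b=30, size=50),
])

# Fitted parameters copied from the printed output in section 2
weights = np.array([0.9530, 0.0470])
alphas = np.array([2.1350, 12.6203])
betas = np.array([51.5835, 36.0280])


def mixture_cdf(x):
    """CDF of the mixture: the weighted sum of the component Beta CDFs."""
    x = np.asarray(x, dtype=float)
    # Broadcast x against the per-component parameters, then sum over components
    return (weights * beta.cdf(x[..., None], alphas, betas)).sum(axis=-1)


result = kstest(data, mixture_cdf)
print(f"KS statistic: {result.statistic:.4f}")
```

The KS statistic is the largest vertical gap between the empirical CDF and the fitted CDF, so smaller values indicate a closer fit. Because the parameters were estimated from the same data, the accompanying p-value is optimistic and is best read as a rough indicator rather than a formal test.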