<a href="https://colab.research.google.com/github/supsi-dacd-isaac/TeachDecisionMakingUncertainty/blob/main/07/data_driven_ball_enclosing_problem.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data-Driven Enclosing Sphere Problem

In this notebook, we address a robust data-driven optimization problem where the goal is to determine a credible set that encloses most of the data scenarios. This is achieved by finding the smallest sphere (or ball) that contains the majority of data points, while also allowing for a few outliers using slack variables.

## Problem Formulation

### Without discarded samples

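Consider a data set of $N$ scenarios $u^{(1)}, \dots, u^{(N)} \in \mathbb{R}^n$. The cell below generates a synthetic example with $n = 2$:
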

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic 2D data from a correlated Gaussian, then warp it nonlinearly
np.random.seed(42)  # fixed seed for reproducibility
num_samples = 300
# The covariance matrix is symmetric (as it must be) but anisotropic and correlated
X = np.random.multivariate_normal([2, 3], [[1, 0.5], [0.5, 2]], num_samples)
# Scale the x-coordinate by sin(y) so the resulting cloud is clearly non-Gaussian
X[:, 0] = np.sin(X[:, 1]) * X[:, 0]
plt.scatter(X[:, 0], X[:, 1], color='blue', label="Data Points", s=10)
plt.grid()
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.title("Synthetic 2D Data (Warped Correlated Gaussian)")
plt.show()




We define the enclosing set as a sphere:
$$
\mathcal{U} = \{ u \in \Omega : \|u - c\| \le r \}
$$
where:
- $c \in \mathbb{R}^n$ is the center.
- $r \ge 0$ is the radius.

The optimization problem is:
$$
\begin{aligned}
\min_{c, r} \quad & r \\
\text{subject to} \quad & \|u^{(i)} - c\|_2 \le r, \quad i = 1, \dots, N, \\
& r \ge 0.
\end{aligned}
$$

where $\|u^{(i)} - c\|_2$ is the Euclidean norm (distance) of each scenario from the ball center, e.g., $\sqrt{(u_x^{(i)} - c_x)^2 + (u_y^{(i)} - c_y)^2}$ for a 2-dimensional case.
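To build intuition for the constraints, note that for a fixed center $c$ the smallest feasible radius is simply the largest distance $\max_i \|u^{(i)} - c\|_2$, so the optimization is really a search over centers. The sketch below illustrates this with NumPy only. The helper name `smallest_radius_for_center` and the three toy points are assumptions made for illustration and are not part of the notebook's data.

```python
import numpy as np

def smallest_radius_for_center(U, c):
    """Smallest r such that every row of U lies within distance r of c."""
    distances = np.linalg.norm(U - c, axis=1)  # Euclidean distance of each scenario from c
    return distances.max()

# Hypothetical toy scenarios (not the notebook's data set)
U_toy = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])

# Compare two candidate centers: the centroid and the midpoint of the longest side
r_at_centroid = smallest_radius_for_center(U_toy, U_toy.mean(axis=0))
r_at_midpoint = smallest_radius_for_center(U_toy, np.array([1.0, 0.0]))

print("Radius needed at the centroid:", r_at_centroid)
print("Radius needed at (1, 0):      ", r_at_midpoint)
```

The centroid is not the best center here: the midpoint $(1, 0)$ needs a smaller radius, which is why the center must be a decision variable rather than a fixed statistic of the data.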



In [None]:
import cvxpy as cp

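This problem is convex, but its constraints are not linear: each one bounds a Euclidean norm (equivalently, it is quadratic once both sides are squared). A linear-programming solver is therefore not enough, and we need a solver that handles such constraints. The `cvxpy` package, imported above as `cp`, provides access to suitable solvers.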

In [None]:
# Decision variables: center (c) and radius (r)
c = cp.Variable(2)
r = cp.Variable(nonneg=True)

# Constraints: each data point must lie within Euclidean distance r of the center
constraints = [cp.norm(X[i] - c, 2) <= r for i in range(X.shape[0])]

# Objective function: Minimize the radius of the sphere
objective = cp.Minimize(r)

# Define and solve the optimization problem
prob = cp.Problem(objective, constraints)
result = prob.solve()

print("Optimal center:", c.value)
print("Optimal radius:", r.value)

# Plot the data points and the enclosing sphere
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], color='blue', label="Data Points", s=10)
# In 2D the enclosing sphere is a circle with center c and radius r
circle = plt.Circle(c.value, r.value, color='green', fill=False, label="Enclosing Sphere", linewidth=2)
plt.gca().add_patch(circle)
plt.axis('equal')  # equal axis scaling so the circle is not drawn as an ellipse
plt.title("Minimum Enclosing Sphere")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.legend()
plt.grid(True)
plt.show()


### With discarded samples (Outliers)

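To account for outliers, a slack variable $\xi_i \ge 0$ is introduced for each sample, relaxing its constraint to:
$$
\|u^{(i)} - c\|_2 \le r + \xi_i.
$$
The robust formulation becomes:
$$
\begin{aligned}
\min_{c, r, \xi} \quad & r + \lambda \sum_{i=1}^{N} \xi_i \\
\text{subject to} \quad & \|u^{(i)} - c\|_2 \le r + \xi_i, \quad i = 1, \dots, N, \\
& \xi_i \ge 0, \quad i = 1, \dots, N, \\
& r \ge 0.
\end{aligned}
$$
where $\lambda > 0$ is a penalty parameter that trades off the radius against the total slack: the larger $\lambda$, the more costly it is to leave a sample outside the ball, so fewer samples are discarded.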


To find the discarded outliers, proceed as follows:
* If $\xi_i > \epsilon$, where $\epsilon$ is a sufficiently small tolerance, the $i$-th sample has been discarded.
* If $\xi_i \leq \epsilon$, the $i$-th sample satisfies the enclosing constraint to within numerical accuracy.
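The thresholding rule above can be sketched with NumPy alone. For a given ball $(c, r)$, the smallest slack that satisfies the relaxed constraint is $\xi_i = \max(0, \|u^{(i)} - c\|_2 - r)$, and since $\lambda > 0$ an optimal solution uses exactly this value. The helper name `flag_discarded`, the toy points, the hand-picked ball, and the default `eps=1e-6` are all assumptions for illustration. With a solved `cvxpy` problem one would normally threshold the slack variable's `.value` directly.

```python
import numpy as np

def flag_discarded(U, c, r, eps=1e-6):
    """Return the implied slacks and a boolean mask of discarded samples."""
    distances = np.linalg.norm(U - c, axis=1)  # distance of each sample from the center
    xi = np.maximum(distances - r, 0.0)        # smallest slack satisfying ||u - c|| <= r + xi
    discarded = xi > eps                       # slack above tolerance: sample lies outside the ball
    return xi, discarded

# Hypothetical toy scenarios: three points on a unit circle around (1, 0), plus one far away
U_toy = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [6.0, 0.0]])

# Hand-picked ball (assumed, not the output of a solver)
xi, discarded = flag_discarded(U_toy, c=np.array([1.0, 0.0]), r=1.0)

print("Slacks:", xi)
print("Discarded sample indices:", np.flatnonzero(discarded))
```

Only the far-away point receives a positive slack here, so it is the only one flagged as discarded.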

In [28]:
# COMPLETE CODE

# Decision variables

# Constraints: Enclosing sphere condition for each data point

# Objective function: Minimize the radius of the sphere


# Define and solve the optimization problem