X and A are the parameter and data matrices, respectively; d is the number of features and n is the number of points we want to cluster. Each column of A is a data point, and the corresponding column of X is that point's "centroid". After solving this problem and obtaining the optimal X, two data points i and j are assigned to the same cluster if

$\|x_i - x_j\| \le \epsilon,$

i.e., if their centroids are close enough to each other, for some predefined hyperparameter $\epsilon$.
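Concretely, once the optimal X is found, the pairwise check can be sketched in NumPy as follows (the names `same_cluster` and `eps` are illustrative placeholders, not the repo's actual API):

```python
import numpy as np

def same_cluster(X, i, j, eps):
    """Return True if the optimal centroids of points i and j
    (columns of X) are within eps of each other."""
    return np.linalg.norm(X[:, i] - X[:, j]) <= eps

# Toy example: 2 features, 3 points; the first two centroids coincide.
X = np.array([[1.0, 1.0, 5.0],
              [2.0, 2.0, 6.0]])
print(same_cluster(X, 0, 1, eps=1e-3))  # True
print(same_cluster(X, 0, 2, eps=1e-3))  # False
```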
We provide implementations of the following two optimization algorithms for convex clustering:
- Accelerated Gradient Method (AGM)
- Newton-CG
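As a rough illustration of the first method (a generic Nesterov-style sketch, not the repository's exact routine), an AGM loop on a smooth objective might look like:

```python
import numpy as np

def agm(grad, X0, step, n_iters=500):
    """Nesterov-style accelerated gradient descent (a generic sketch).
    `grad` maps the current iterate X to the gradient dF/dX."""
    X = X0.copy()
    Y = X0.copy()      # extrapolation point
    t = 1.0
    for _ in range(n_iters):
        X_new = Y - step * grad(Y)                  # gradient step at Y
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2    # momentum schedule
        Y = X_new + (t - 1) / t_new * (X_new - X)   # extrapolation
        X, t = X_new, t_new
    return X

# Sanity check on a quadratic: minimize 0.5 * ||X - A||_F^2.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
X_opt = agm(lambda X: X - A, np.zeros_like(A), step=0.5)
print(np.allclose(X_opt, A))  # True
```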
AGM requires the gradient of the objective function, while Newton-CG also requires its Hessian. The detailed mathematical derivations are lengthy and are therefore included in grad_hess_derivation.pdf; the NumPy implementation is in huber_obj.py.
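For reference, a common Huber-smoothed form of the convex clustering objective and its gradient can be sketched as below. The exact objective form, the parameter names `gamma` and `delta`, and the uniform pairwise weights are our assumptions here; consult grad_hess_derivation.pdf and huber_obj.py for the formulation actually used.

```python
import numpy as np

def huber(r, delta):
    """Huber function of a scalar radius r >= 0."""
    return np.where(r <= delta, r**2 / (2 * delta), r - delta / 2)

def huber_obj_grad(X, A, gamma, delta):
    """Objective 0.5*||X - A||_F^2 + gamma * sum_{i<j} huber(||x_i - x_j||)
    and its gradient w.r.t. X (sketch with uniform pairwise weights)."""
    d, n = X.shape
    obj = 0.5 * np.sum((X - A) ** 2)
    grad = X - A
    for i in range(n):
        for j in range(i + 1, n):
            diff = X[:, i] - X[:, j]
            r = np.linalg.norm(diff)
            obj += gamma * huber(r, delta)
            # d/dr huber = r/delta if r <= delta else 1; chain rule gives
            # direction diff/r, so the product simplifies as below.
            g = (diff / delta) if r <= delta else (diff / r)
            grad[:, i] += gamma * g
            grad[:, j] -= gamma * g
    return obj, grad

# Finite-difference check of one gradient entry at a random point.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4)); X = rng.standard_normal((2, 4))
obj, grad = huber_obj_grad(X, A, gamma=1.0, delta=0.5)
eps = 1e-6
E = np.zeros_like(X); E[0, 0] = eps
num = (huber_obj_grad(X + E, A, 1.0, 0.5)[0] - obj) / eps
print(np.isclose(num, grad[0, 0], atol=1e-4))  # True
```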
We also provide a cluster function in utils.py that builds a graph over the centroids and uses DFS to find clusters.
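That utility can be sketched as follows: connect points whose centroids are within a threshold, then run an iterative DFS to read off connected components as cluster labels (the function name and `eps` argument here are illustrative; see utils.py for the actual interface):

```python
import numpy as np

def cluster(X, eps):
    """Group columns of X (centroids) into clusters: connect i and j when
    ||x_i - x_j|| <= eps, then DFS the graph for connected components."""
    n = X.shape[1]
    adj = [[j for j in range(n)
            if j != i and np.linalg.norm(X[:, i] - X[:, j]) <= eps]
           for i in range(n)]
    labels = [-1] * n
    k = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        stack = [s]              # iterative DFS from each unvisited node
        while stack:
            u = stack.pop()
            if labels[u] != -1:
                continue
            labels[u] = k
            stack.extend(v for v in adj[u] if labels[v] == -1)
        k += 1
    return labels

# Two well-separated groups of near-identical centroids.
X = np.array([[0.0, 0.01, 5.0, 5.02],
              [0.0, 0.00, 5.0, 5.00]])
print(cluster(X, eps=0.1))  # [0, 0, 1, 1]
```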