From 50fdb9b43229e54e5d2780facde65245e0e15cbf Mon Sep 17 00:00:00 2001
From: Jere Lavikainen <61044352+jerela@users.noreply.github.com>
Date: Wed, 29 Nov 2023 14:56:32 +0200
Subject: [PATCH] Added column vector util func, fixed clustering

- added column() function to utils to create a column vector from a 1D list
- fixed typos in density clustering equations
- made hard k-means algorithm also return the cluster label for each data point
- improved type hints and docstrings
---
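Note (not part of the commit message): the two behavioural changes listed above can be illustrated with a minimal, standalone sketch in plain Python. It does not use mola's Matrix class, and the names to_column, squared_euclidean and k_means_with_labels are hypothetical helpers chosen for illustration only; the actual signatures and return types in mola/utils.py and mola/clustering.py may differ.

```python
def to_column(values):
    """Turn a flat list into a column vector stored as a list of one-element rows."""
    return [[v] for v in values]


def squared_euclidean(p1, p2):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2))


def k_means_with_labels(points, initial_centers, max_iterations=100):
    """Hard k-means that returns both the centers and a label per point.

    Assumption: initial centers are supplied by the caller, so the sketch
    stays deterministic (no random initialization).
    """
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: label each point with the index of its closest center.
        labels = [
            min(range(len(centers)), key=lambda k: squared_euclidean(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep a center unchanged if no point was assigned to it.
                new_centers.append(old_center)
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return centers, labels


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = k_means_with_labels(points, initial_centers=[[0, 0], [10, 10]])
```

Returning the labels alongside the centers lets a caller group or colour the data points without recomputing the nearest center for each of them.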
documentation/mola.clustering.html | 18 +++++-----------
documentation/mola.matrix.html | 4 ++--
documentation/mola.regression.html | 4 ++--
documentation/mola.utils.html | 15 ++++++++++++--
mola/clustering.py | 33 ++++++++++++++++++------------
mola/matrix.py | 2 +-
mola/regression.py | 4 ++--
mola/utils.py | 29 ++++++++++++++++++++++----
tests/clustering_test.py | 2 +-
9 files changed, 71 insertions(+), 40 deletions(-)
diff --git a/documentation/mola.clustering.html b/documentation/mola.clustering.html
index 1d68d87..61f1ee6 100644
--- a/documentation/mola.clustering.html
+++ b/documentation/mola.clustering.html
@@ -41,7 +41,7 @@
Arguments:
p1 -- list: the first point
p2 -- list: the second point
-
- find_c_means(data: mola.matrix.Matrix, num_centers=2, max_iterations=100, distance_function=<function distance_euclidean_pow at 0x0000023FFD4614C0>, initial_centers=None)
- Return the cluster centers and the membership matrix of points using soft k-means clustering (also known as fuzzy c-means).
+ - find_c_means(data: mola.matrix.Matrix, num_centers=2, max_iterations=100, distance_function=<function distance_euclidean_pow at 0x000002B30AB56670>, initial_centers=None)
- Return the cluster centers and the membership matrix of points using soft k-means clustering (also known as fuzzy c-means).
Fuzzy c-means clustering is an iterative algorithm that finds the cluster centers by first assigning each point to each cluster center with a certain membership value (0 to 1) and then updating the cluster centers to be the weighted mean of the points assigned to them. This process is repeated for a set number of iterations or until the cluster centers converge. The initial cluster centers are either randomized or given by the user.
A major difference between hard k-means clustering and fuzzy c-means clustering is that in fuzzy c-means clustering, the points may belong partially to several clusters instead of belonging completely to one cluster, like in hard k-means clustering. Therefore, this algorithm is well-suited to cluster data that is not clearly separable into distinct clusters (e.g., symmetric distribution of data points).
@@ -59,9 +59,9 @@
Arguments:
data -- Matrix: the data containing the points to be clustered
num_centers -- int: the number of cluster centers to be found (default 2)
-beta -- float: the width of the Gaussian function (default 0.5)
-sigma -- float: the width of the Gaussian function (default 0.5)
- - find_k_means(data: mola.matrix.Matrix, num_centers=2, max_iterations=100, distance_function=<function distance_euclidean_pow at 0x0000023FFD4614C0>, initial_centers=None)
- Return the cluster centers using hard k-means clustering.
+beta -- float: the width of the Gaussian function (default 0.5) used to destruct the mountain function
+sigma -- float: the width of the Gaussian function (default 0.5) used to construct the mountain function
+ - find_k_means(data: mola.matrix.Matrix, num_centers=2, max_iterations=100, distance_function=<function distance_euclidean_pow at 0x000002B30AB56670>, initial_centers=None) -> mola.matrix.Matrix
- Return the cluster centers using hard k-means clustering.
K-means clustering is an iterative algorithm that finds the cluster centers by first assigning each point to the closest cluster center and then updating the cluster centers to be the mean of the points assigned to them. This process is repeated for a set number of iterations or until the cluster centers converge. The initial cluster centers are either randomized or given by the user.
@@ -74,13 +74,5 @@
distance_function -- function: the distance function to be used (default Euclidean distance); options are squared Euclidean distance (distance_euclidean_pow) and taxicab distance (distance_taxicab)
initial_centers -- Matrix: the initial cluster centers; if not specified, they are initialized randomly (default None)
- random() method of random.Random instance
- random() -> x in the interval [0, 1).
-
-
-
-
-Data |
-
- | |
-INFINITE = 4294967295
-INFINITY = inf |
+