> __Purpose:__ This NB serves as a repository of notes/resources worth pursuing, and outlines the overarching and ongoing goal of the project.

# Current Goal
1. Subspace clustering:
    1. Reduce the dimensionality of the data. Ideally to remove noise and decrease the required computation time.
        - Determine the optimal (or at least a sufficient) number of dimensions. Dimensionality reduction may not matter much on its own; it would matter more if we could show that we are reducing to a specific manifold or doing some kind of joint embedding. Basic approaches like PCA and t-SNE probably aren't as well informed.
        - How many PCs are required to explain at least 80% of the variance? 
        - Remember to normalize/scale the data first, especially when combining the EMG and IMU. Plus demean the EMG.
    2. Cluster the latent representations: we are searching for clusters (presumably ability-based or anatomical in nature), looking to develop archetypes or a set of hierarchical groupings, so that we can direct model training and initialization on the basis of these groupings/clusters.
        - Determine the optimal number of clusters...
        - We don't have ground truth, so need to find some (possibly combination of) metric(s) to compare different cluster outputs
    3. Finally, we need a way to do few-shot cluster assignment. Candidates: template matching (basically just KNN), few-shot learning, meta-learning, mixture of experts, hierarchical FL?
        - When testing, need to think about how to break up the data. Cross validation? Or at least hold out a few entire users, and a few entire gestures of some included users, and half of the gestures of some included users.
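
The three steps above can be sketched end to end on synthetic data. This is only an illustration under assumptions: the data, the three "archetypes", the 16/8 split between EMG and IMU columns, `n_clusters=3`, and `n_neighbors=5` are all made up, and KMeans plus KNN are just one possible choice for steps 2 and 3.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real features: 3 hypothetical archetypes,
# 24 columns (pretend the first 16 are EMG, the last 8 are IMU).
n_per_group, n_features = 100, 24
group_means = rng.normal(scale=3.0, size=(3, n_features))
X = np.vstack([m + rng.normal(size=(n_per_group, n_features)) for m in group_means])
X[:, :16] *= 1e-3  # EMG lives on a much smaller scale than IMU

# Step 1a: demean and scale every column so neither sensor dominates.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Step 1b: smallest number of PCs explaining at least 80% of the variance.
cum_var = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)
n_pcs_80 = int(np.searchsorted(cum_var, 0.80) + 1)
pca = PCA(n_components=n_pcs_80).fit(X_scaled)
Z = pca.transform(X_scaled)

# Step 2: cluster the latent representations (k is assumed known here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Z)

# Step 3: few-shot assignment of a new user by KNN majority vote.
knn = KNeighborsClassifier(n_neighbors=5).fit(Z, kmeans.labels_)
new_shots = group_means[1] + rng.normal(size=(3, n_features))  # 3 shots
new_shots[:, :16] *= 1e-3
votes = knn.predict(pca.transform(scaler.transform(new_shots)))
assigned_cluster = int(np.bincount(votes).argmax())

print(f"{n_pcs_80} PCs -> {cum_var[n_pcs_80 - 1]:.1%} variance explained")
print("new user assigned to cluster", assigned_cluster)
```

The new user's shots go through the same fitted `scaler` and `pca` as the training data; refitting either on the new shots would put them in a different latent space.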

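One way to set up the hold-out scheme described above is sketched below. The user/gesture/trial counts, the choice of which gestures to hide, and the names `partial_users` and `held_out_gestures` are hypothetical; only the idea (entire users held out, plus entire gestures held out for some included users) comes from the notes.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Synthetic index: 10 users x 6 gestures x 4 trials, one row per trial.
n_users, n_gestures, n_trials = 10, 6, 4
user_ids = np.repeat(np.arange(n_users), n_gestures * n_trials)
gesture_ids = np.tile(np.repeat(np.arange(n_gestures), n_trials), n_users)
X = rng.normal(size=(user_ids.size, 8))

# Hold out entire users: grouping by user keeps all of a user's rows together.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=user_ids))
held_out_users = np.unique(user_ids[test_idx])

# For a few of the remaining users, also hold out some gestures entirely.
train_users = np.unique(user_ids[train_idx])
partial_users = train_users[:3]        # arbitrary choice of users
held_out_gestures = np.array([4, 5])   # arbitrary choice of gestures
unseen_gesture_mask = (
    np.isin(user_ids, partial_users) & np.isin(gesture_ids, held_out_gestures)
)

train_mask = np.zeros(user_ids.size, dtype=bool)
train_mask[train_idx] = True
train_mask &= ~unseen_gesture_mask

print("held-out users:", held_out_users)
print("training rows:", int(train_mask.sum()))
```

For cross validation rather than a single split, `GroupKFold` with the same `groups=user_ids` argument would rotate which users are held out.
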

# Subspace Clustering Resources
## Misc Good StatsStackExchanges I haven't Read Yet
- https://stats.stackexchange.com/questions/241381/clustering-methods-that-do-not-require-pre-specifying-the-number-of-clusters
- https://stats.stackexchange.com/questions/95782/what-are-the-most-common-metrics-for-comparing-two-clustering-algorithms-especi
- https://stats.stackexchange.com/questions/88550/using-the-gap-statistic-to-compare-algorithms
- https://stats.stackexchange.com/questions/21807/evaluation-measures-of-goodness-or-validity-of-clustering-without-having-truth
- https://stats.stackexchange.com/questions/195456/how-to-select-a-clustering-method-how-to-validate-a-cluster-solution-to-warran
- https://stats.stackexchange.com/questions/23472/how-to-decide-on-the-correct-number-of-clusters

## Sklearn Dimensionality Reduction Algorithms
- https://scikit-learn.org/stable/modules/manifold.html
    - This one isn't code/docs, it's sort of more of a tutorial?
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
    - Maybe there's an interesting angle about doing feature extraction on the data (effectively a form of dim reduction) and clustering based on that? Not sure if we could use feature extraction for real time tho... probably?
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster
- https://scikit-learn.org/stable/modules/classes.html#clustering-metrics

## Sklearn Clustering Algorithms
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster
- AffinityPropagation
- AgglomerativeClustering
- Birch
- DBSCAN
    - DBSCAN on Wikipedia but also shows other approaches in the side bar: https://en.wikipedia.org/wiki/DBSCAN
- HDBSCAN
    - HDBSCAN Demo: https://scikit-learn.org/stable/auto_examples/cluster/plot_hdbscan.html
- FeatureAgglomeration
- KMeans
- BisectingKMeans
- MiniBatchKMeans
- MeanShift
- OPTICS
- SpectralClustering
- SpectralBiclustering
- SpectralCoclustering

## SciPy Clustering Algorithms
- Lots of different hierarchical algorithms: https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html
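
A minimal sketch of the SciPy hierarchical workflow, assuming synthetic 2-D blobs and Ward linkage (both arbitrary choices for illustration): build the linkage matrix once, then cut it at whatever number of clusters is of interest.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)

# Three well-separated synthetic blobs, 30 points each.
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Linkage matrix: one row per merge, (n_samples - 1) rows in total.
Z = linkage(X, method="ward")

# Cut the tree into at most 3 flat clusters (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")

# Dendrogram layout without plotting; drop no_plot to draw it with matplotlib.
tree = dendrogram(Z, no_plot=True)

print("cluster sizes:", np.bincount(labels)[1:])
```

Because the full merge tree is stored in `Z`, trying a different number of clusters only means calling `fcluster` again with another `t`.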

## Misc Clustering Algorithms To Look In To
- "In addition, I recommend that you use any ensemble clustering technique that can assign to more partitions, so called consensus partitions, where objects are often better partitioned than in the initial partitions."
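
One common form of ensemble/consensus clustering is a co-association matrix: run many base clusterings, count how often each pair of points lands in the same cluster, and cluster that matrix. The sketch below assumes KMeans base partitions with several values of k and an average-linkage cut at 3 clusters; these choices and the synthetic data are illustrative, not something the quote prescribes.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
n = X.shape[0]

# Co-association: fraction of base partitions that put points i and j together.
coassoc = np.zeros((n, n))
n_runs = 0
for k in [2, 3, 4, 5]:
    for seed in range(5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        coassoc += labels[:, None] == labels[None, :]
        n_runs += 1
coassoc /= n_runs

# Turn agreement into a distance and cluster it hierarchically.
dist = 1.0 - coassoc
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
consensus_labels = fcluster(Z, t=3, criterion="maxclust")

print("consensus cluster sizes:", np.bincount(consensus_labels)[1:])
```

The base partitions do not have to come from the same algorithm; any set of label vectors over the same points can be accumulated into `coassoc`.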

## Sklearn Clustering Metrics
- Internal Evaluation Metrics: Internal evaluation metrics assess the quality of the clustering based on the data itself, without relying on external information or the true number of clusters. Examples of internal evaluation metrics include the Silhouette coefficient, Davies-Bouldin index, Calinski-Harabasz index, and Dunn index (sklearn implements the first three but not the Dunn index). These metrics provide a measure of compactness, separation, or overall clustering quality.
- External Evaluation Metrics: External evaluation metrics compare the clustering results with some external reference, such as known class labels if available. Classification-style metrics (accuracy, precision, recall, F1 score) require mapping each cluster to a class first, so they are awkward when the number of clusters is not equal to the number of classes. Scores such as ARI, AMI, and NMI compare the two labelings directly and do not need that mapping.
- Stability-based Methods: Stability-based methods evaluate clustering stability by assessing the consistency of clustering results across multiple iterations or subsets of the data. Techniques like stability index, bootstrapping, or consensus clustering can provide insights into the robustness of the clustering algorithm and help determine the optimal number of clusters.
- Visual Inspection: Visual inspection can be a useful approach to evaluate clustering results. Plotting the data points in a low-dimensional space using dimensionality reduction techniques (e.g., t-SNE, PCA) and coloring the points according to the clustering results can provide an intuitive visualization of the clusters. However, this method is subjective and might not provide a quantitative measure of performance.
- The AMI (Adjusted Mutual Information) score is adjusted for chance, so a random clustering scores about 0 (it can dip slightly below 0). The NMI (Normalized Mutual Information) score can compare labelings with different numbers of clusters and is therefore often treated as a gold standard in the clustering community; it ranges from 0 to 1 but is not adjusted for chance, so random labelings can score above 0, especially with many clusters. For both, 1 means the two labelings match perfectly.
- There exist also other measures such as F-measure or Purity.
- https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
- Adjusted Mutual Information Score: 
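    - Needs two labelings. It measures the chance-adjusted agreement between two clusterings, so it needs either ground truth labels or a second clustering to compare against; it cannot score a single clustering on its own.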
- Adjusted Rand Score: 
    - Needs two labelings. It quantifies the chance-adjusted similarity between two clusterings (ground truth vs. prediction, or two clustering runs).
- Calinski-Harabasz Score: 
    - Unsupervised. It measures the ratio of between-cluster dispersion to within-cluster dispersion, making it an unsupervised metric.
- Davies-Bouldin Score: 
    - Unsupervised. It evaluates the average similarity between each cluster and its most similar cluster, making it an unsupervised metric.
- Completeness Score: 
    - Supervised. It measures the completeness of a clustering given the ground truth labels, making it a supervised metric.
- Contingency Matrix: 
    - Needs two labelings. It tabulates how the labels of one labeling overlap with the labels of another; it is a tool for inspection rather than a score.
- Pair Confusion Matrix: 
    - Needs two labelings. It counts the pairs of samples on which two clusterings agree or disagree about being in the same cluster; it is a tool for inspection rather than a score.
- Fowlkes-Mallows Score: 
    - Needs two labelings. It computes the similarity between two clusterings (ground truth vs. prediction, or two clustering runs).
- Homogeneity, Completeness, and V-Measure Scores: 
    - Supervised. These metrics evaluate the agreement between a clustering and the ground truth labels, making them supervised metrics.
- Homogeneity Score: 
    - Supervised. It measures the homogeneity of a clustering given the ground truth labels, making it a supervised metric.
- Mutual Information Score: 
    - Needs two labelings. It quantifies the mutual information between two clusterings, without normalization or adjustment for chance.
- Normalized Mutual Information Score: 
    - Needs two labelings. It computes the mutual information between two clusterings, normalized to the range 0 to 1 but not adjusted for chance.
- Rand Score: 
    - Needs two labelings. It computes the Rand index, which measures the similarity between two clusterings without adjustment for chance.
- Silhouette Score and Silhouette Samples: 
    - Unsupervised. These metrics evaluate the quality of clusters based on intra-cluster and inter-cluster distances, making them unsupervised.
    - Silhouette analysis demo: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
- V-Measure Score: 
    - Supervised. It evaluates the harmonic mean of homogeneity and completeness, making it a supervised metric.
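
A short sketch of how these metrics are called in practice, on synthetic blobs with arbitrary algorithm choices. The practical difference is in the arguments: internal metrics take the data plus one labeling, while ARI/AMI-style scores take two labelings (here, the outputs of two different algorithms, since there is no ground truth).

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_ag = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Internal metrics: data + a single labeling, no reference needed.
internal = {
    "silhouette (higher is better)": silhouette_score(X, labels_km),
    "davies_bouldin (lower is better)": davies_bouldin_score(X, labels_km),
    "calinski_harabasz (higher is better)": calinski_harabasz_score(X, labels_km),
}

# Agreement metrics: two labelings, invariant to how clusters are numbered.
agreement = {
    "ARI": adjusted_rand_score(labels_km, labels_ag),
    "AMI": adjusted_mutual_info_score(labels_km, labels_ag),
}

for name, value in {**internal, **agreement}.items():
    print(f"{name}: {value:.3f}")
```

High agreement between two algorithms says they found the same structure, not that the structure is meaningful; the internal scores and a visual check are still worth looking at alongside it.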
> Other metrics:
- https://scikit-learn.org/stable/modules/classes.html#pairwise-metrics
    - These are not clustering metrics but pairwise/distance metrics. People frequently take the cosine similarity between matrices, which is essentially what we are looking to do between gestures. They are basically just different distance metrics we could use.
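
A sketch of gesture-to-gesture cosine similarity, assuming each gesture is a (timesteps x channels) matrix that can simply be flattened into one vector; the shapes are made up, and flattening is only one possible way to compare matrices.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

rng = np.random.default_rng(0)

# Synthetic stand-in: 5 gestures, each 64 timesteps x 16 channels.
n_gestures, n_timesteps, n_channels = 5, 64, 16
gestures = rng.normal(size=(n_gestures, n_timesteps, n_channels))

# Flatten each gesture matrix into a single row vector.
flat = gestures.reshape(n_gestures, -1)

# (n_gestures x n_gestures) similarity matrix; 1.0 on the diagonal.
sim = cosine_similarity(flat)

# The same information as a distance (1 - similarity); other metrics
# such as "euclidean" or "manhattan" can be swapped in by name.
dist = pairwise_distances(flat, metric="cosine")

print(np.round(sim, 2))
```

A precomputed distance matrix like `dist` can be passed to clustering algorithms that accept `metric="precomputed"`, such as DBSCAN.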
    
## Choosing Optimal Number of Clusters
- Partitioning clustering methods, like k-means and Partitioning Around Medoids (PAM), require that you specify the number of clusters to be generated.
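- The Sum of Squares Method: choose the number of clusters by looking at the within-cluster sum of squares (a measure of how tight each cluster is) and the between-cluster sum of squares (a measure of how separated each cluster is from the others). The within-cluster sum of squares keeps shrinking as more clusters are added, so look for the "elbow" where extra clusters stop helping much rather than for the raw minimum.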
- Clustree (does it exist in Python or only in R?) and dendrograms
- "For example, many of the above heuristics contradicted each other for what the optimal number of clusters was. Keep in mind these were all evaluating the k-means algorithm at different numbers of k. This could potentially mean that the k-means algorithm fails and no k is good. The k-means algorithm is not a very robust algorithm that is sensitive to outliers and this data set is quit small."
- Also you could use an algorithm which does not require the number of clusters as input. DBSCAN or HDBSCAN should scale fine to your dataset size.
- https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
- https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/
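
A sketch of two of the options above on synthetic blobs: sweeping k for KMeans while recording the within-cluster sum of squares (`inertia_`) and the silhouette score, and running DBSCAN, which needs no k. The k range and the DBSCAN `eps`/`min_samples` values are assumptions tuned to this toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Sweep k: inertia for an elbow plot, silhouette for a direct comparison.
inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print("k with the highest silhouette:", best_k)

# DBSCAN infers the number of clusters from density; label -1 marks noise.
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_db_clusters = len(set(db_labels.tolist()) - {-1})
print("clusters found by DBSCAN:", n_db_clusters)
```

DBSCAN trades the choice of k for the choice of `eps`, so it removes one parameter search but introduces another.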

## Misc Dimensionality Reduction GitHub repos I haven't really looked at
- https://github.com/heucoder/dimensionality_reduction_alo_codes
- https://github.com/azampagl/ai-ml-clustering/tree/master
- https://github.com/deeptime-ml/deeptime
    - This one is a whole library, you have to build the docs to view them, probably too much work, but if we could use some of their algos / evals that may be helpful

## Random subspace clustering GitHub repos I haven't really looked at
> May or may not be useful. These are probably paper implementations so are likely too specific to be useful for us
- https://github.com/ChongYou/subspace-clustering
- https://github.com/panji530/Deep-subspace-clustering-networks
- https://github.com/jeya-maria-jose/Overcomplete-Deep-Subspace-Clustering

## Gesture Clustering GitHub repos I haven't really looked at
- https://github.com/pjyazdian/Gesture2Vec
- https://github.com/hamzaiqbal786/FYP-Adaptive-Clustering-For-Gesture-Analysis