Comparing the performance of different clustering algorithms on toy datasets on adding high dimensional gaussian noise #6

sree0917 · 2019-12-09T05:13:08Z

Aim: Analyze how the performance of different clustering algorithms on different datasets changes when noise of different dimensions is added.

This demo is a Jupyter Notebook that documents the effect of adding noise of different dimensions to a dataset. Several types of synthetic datasets are generated, and the experiment is performed on each of them. Gaussian noise of different dimensions is added to these datasets, and the performance of each clustering algorithm is measured after the noise is added. This is repeated for noise with different variances.

Output: The plots that compare the effect of varying noise dimensions on different clustering algorithms for each of the datasets. In this set of subplots, the variance of the added noise changes along the column and the dataset changes along the row.
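The experiment described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes that "adding noise of dimension d" means appending d extra zero-mean Gaussian columns to the data, uses the adjusted Rand index (ARI) as the performance measure, and uses KMeans on `make_blobs` as a stand-in for the algorithms and datasets compared. The names `add_noise` and `ari_vs_noise_dims` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def add_noise(X, n_noise_dims, variance, rng):
    """Append n_noise_dims columns of zero-mean Gaussian noise to X.

    Assumption: this is one reading of "noise of different dimensions";
    the original columns of X are left unchanged.
    """
    noise = rng.normal(0.0, np.sqrt(variance), size=(X.shape[0], n_noise_dims))
    return np.hstack([X, noise])


def ari_vs_noise_dims(X, y, noise_dims, variance, seed=0):
    """Return {n_noise_dims: ARI} for KMeans fitted on the noisy data."""
    rng = np.random.default_rng(seed)
    n_clusters = len(np.unique(y))
    scores = {}
    for d in noise_dims:
        X_noisy = add_noise(X, d, variance, rng)
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels = model.fit_predict(X_noisy)
        # ARI compares predicted labels with the true labels,
        # independent of how the clusters are numbered.
        scores[d] = adjusted_rand_score(y, labels)
    return scores


# Stand-in toy dataset; the notebook uses several synthetic datasets.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
scores = ari_vs_noise_dims(X, y, noise_dims=[0, 10, 100], variance=1.0)
```

Repeating the call for several values of `variance` and several datasets would give the grid of results that the subplots summarize.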

Link to the demo: https://nbviewer.jupyter.org/github/sree0917/scikit-learn/blob/master/clustering_comparison_pr.ipynb


bdpedigo · 2019-12-12T20:26:37Z

When you plot the data, remove the ticks and tick labels, and use only one legend to the right of the entire figure (rather than 3). This will give you a lot more space, and you can then make the titles bigger.
Python uses snake_case as the convention for naming variables and functions (see PEP 8). For example, addNoise should be add_noise. Many other variables like this should be changed.
I think the last plot is the only ARI one that we need (so that's all I'm going to give feedback on).
Can you add the datasets plot on top of the last ARI plot, and then remove the standalone datasets plot?
Is the variance of DBSCAN always 0? That seems weird to me, but I also don't know much about DBSCAN.
I'm not sure I agree with your claim about spectral clustering winning.

Overall, nice plots and nice work! These are fairly minor comments.
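The plotting suggestions above (no ticks or tick labels, one shared legend to the right of the figure, larger titles) could look something like this in matplotlib. It is only a sketch: the scatter data is random placeholder data, and the panel titles and cluster labels are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

fig, axes = plt.subplots(2, 3, figsize=(9, 5))
for i, ax in enumerate(axes.ravel()):
    # Placeholder data: two random point clouds per panel.
    for k in range(2):
        points = rng.normal(loc=3 * k, scale=1.0, size=(50, 2))
        ax.scatter(points[:, 0], points[:, 1], s=8, label=f"cluster {k}")
    # Removing the ticks also removes the tick labels.
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_title(f"panel {i}", fontsize=14)  # placeholder title

# One legend for the whole figure, built from the first panel's handles.
handles, labels = axes[0, 0].get_legend_handles_labels()
fig.legend(handles, labels, loc="center left", bbox_to_anchor=(0.88, 0.5))
fig.subplots_adjust(right=0.86)  # leave room for the legend on the right
```

Using `fig.legend` instead of `ax.legend` keeps the legend outside the individual panels, which frees up space for the larger titles.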

sree0917 · 2019-12-15T03:45:39Z

Thanks for the feedback. I have made the necessary changes.

bdpedigo · 2019-12-15T04:30:37Z

It's unclear which file is the right one now. The notebook in the link at the top looks the same. Are you trying to merge two files?

bdpedigo · 2019-12-15T13:19:52Z

Also, this PR should be made into the NeuroDataDesign fork.

bdpedigo · 2019-12-15T13:20:12Z

seems like there are still plotting things that you didn't change, if I am looking at the right notebook


sree0917 · 2019-12-15T20:59:12Z

Hi! I have made the necessary changes and made a new PR into the NeuroDataDesign fork.
Link to the PR - NeuroDataDesign@0a71e4d

sree0917 · 2019-12-16T19:54:26Z

Hi! I have made the PR the right way now.

Link - NeuroDataDesign#25

sree0917 added 2 commits December 9, 2019 00:08

Add files via upload

ddcbef0

Demo on the effect of high dimensional noise on different clustering algorithms

ad0a944

sree0917 added 7 commits December 14, 2019 22:25

Add files via upload

c4d2c11

Delete clustering_comparison.ipynb.json

31cdc78

Add files via upload

7eb50ed

Delete clustering_comparison.ipynb.json

5499d58

Add files via upload

de5ccd3

Delete clustering_comparison_pr.ipynb

a55f522

File upload - demo on clustering on high dimensional data

cbf362f

Demo on comparison of clustering algorithms

0a71e4d

Comparing how the performance of different clustering algorithms changes on adding high dimensional noise.

adam2392 deleted the branch neurodata:master September 1, 2021 16:05

adam2392 closed this Sep 1, 2021
