Comparing the performance of different clustering algorithms on toy datasets after adding high-dimensional Gaussian noise #6

Closed
wants to merge 10 commits into from


@sree0917 commented Dec 9, 2019

Aim: Analyze how the performance of different clustering algorithms on different datasets changes when noise of different dimensions is added.

This demo is a Jupyter Notebook describing the effect of adding noise of different dimensions to a dataset. Several types of synthetic datasets are generated, and the experiment is performed on each of them. Gaussian noise of different dimensions is added to these datasets, and the performance of each clustering algorithm is measured after the noise is added. This is repeated for noise with different variances.

Output: Plots that compare the effect of varying noise dimensions on different clustering algorithms for each of the datasets. In this set of subplots, the variance of the added noise changes along the column and the dataset changes along the row.

Link to the demo: https://nbviewer.jupyter.org/github/sree0917/scikit-learn/blob/master/clustering_comparison_pr.ipynb
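
For readers without the notebook open, the experiment described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's code. It assumes that "adding noise of different dimensions" means appending extra Gaussian-noise columns to a 2-D toy dataset, and that performance is scored with the adjusted Rand index (ARI). The `add_noise` signature, the choice of `make_moons` and `KMeans`, and all parameter values are assumptions made for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score


def add_noise(X, n_noise_dims, variance, rng):
    """Append n_noise_dims columns of zero-mean Gaussian noise to X.

    Hypothetical helper: the notebook's own function may differ.
    """
    noise = rng.normal(0.0, np.sqrt(variance), size=(X.shape[0], n_noise_dims))
    return np.hstack([X, noise])


rng = np.random.default_rng(0)
# One toy dataset (an assumption; the notebook uses several).
X, y = make_moons(n_samples=200, noise=0.05, random_state=0)

scores = {}
for n_noise_dims in (0, 5, 50):
    X_noisy = add_noise(X, n_noise_dims, variance=1.0, rng=rng)
    # One clustering algorithm as an example; others would be swapped in here.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_noisy)
    scores[n_noise_dims] = adjusted_rand_score(y, labels)

print(scores)
```

Repeating the loop over several variances, datasets, and algorithms would give the grid of comparisons described in the output.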

@bdpedigo

  • When you plot the data, remove the ticks and tick labels, and use only one legend to the right of the entire figure (rather than 3). This will give you a lot more space, and you can then make the titles bigger.
  • Python uses snake_case as the convention for naming variables (you can look this up). For example, addNoise should be add_noise. Many other variables like this should be changed.
  • I think the last plot is the only ARI one that we need (so that's all I'm going to give feedback on).
  • Can you add the datasets plot on top of the last ARI plot, and then remove the standalone datasets plot?
  • Is the variance of DBSCAN always 0? That seems weird to me, but I also don't know much about DBSCAN.
  • I'm not sure I agree with your claim about spectral clustering winning.

Overall, nice plots and nice work! These are fairly minor comments.
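
The plotting suggestions above (no ticks or tick labels, one legend to the right of the whole figure, bigger titles) might look roughly like this in matplotlib. It is only a sketch: the three datasets, the figure size, and the legend labels are assumptions, not taken from the notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs, make_circles, make_moons

# Example toy datasets (assumed; the notebook's datasets may differ).
datasets = {
    "moons": make_moons(n_samples=200, noise=0.05, random_state=0),
    "circles": make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0),
    "blobs": make_blobs(n_samples=200, centers=2, random_state=0),
}

fig, axes = plt.subplots(1, len(datasets), figsize=(12, 4))
for ax, (name, (X, y)) in zip(axes, datasets.items()):
    for label in (0, 1):
        mask = y == label
        ax.scatter(X[mask, 0], X[mask, 1], s=10, color=f"C{label}",
                   label=f"cluster {label}")
    ax.set_title(name, fontsize=16)  # larger titles, using the space freed below
    ax.set_xticks([])  # removes both ticks and tick labels
    ax.set_yticks([])

# One figure-level legend instead of one legend per subplot.
handles, labels = axes[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="center right")
fig.subplots_adjust(right=0.85)  # leave room on the right for the legend
fig.savefig("datasets.png")
```

Taking the handles from a single axis works here because every subplot uses the same two labels and colors.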

@sree0917
Author

Thanks for the feedback. I have made the necessary changes.

@bdpedigo

It's unclear which file is the right one now. The notebook in the link at the top looks the same. Are you trying to merge two files?

@bdpedigo

Also, this PR should be made against the NeuroDataDesign fork.

@bdpedigo

It seems like there are still plotting things that you didn't change, if I am looking at the right notebook.

Comparing how the performance of different clustering algorithms change on adding high dimensional noise.
@sree0917
Author

Hi! I have made the necessary changes and made a new PR into the NeuroDataDesign fork.
Link to the PR - NeuroDataDesign@0a71e4d

@sree0917
Author

Hi! I have made the PR the right way now.

Link - NeuroDataDesign#25

@adam2392 deleted the branch neurodata:master on September 1, 2021, 16:05
@adam2392 closed this on Sep 1, 2021