[BUG] DBSCAN results incorrect #80

Closed
cjnolet opened this issue Jan 11, 2019 · 8 comments
Labels: 1 - On Deck (to be worked on next), bug (something isn't working)

Comments

Member

cjnolet commented Jan 11, 2019

@daxiongshu ran our DBSCAN & k-means implementations against [1] and found that our results do not match, even for datasets as small as size 2^10.

[1] https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html

cjnolet added the "bug" and "? - Needs Triage" labels on Jan 11, 2019
Member Author

cjnolet commented Jan 11, 2019

branch-0.5 should be compared against the 0.4 release.

Member

dantegd commented Jan 11, 2019

what is [1]?

Member Author

cjnolet commented Jan 11, 2019

Updated original comment

Member Author

cjnolet commented Jan 11, 2019

I ran @daxiongshu's notebook against branches 0.5, 0.4, and 0.3. The results mismatch on all of them, which means this has been broken since before the refactor.

I believe the sklearn toy datasets should be tested on the C++ side as well. That way, when results don't match, it is clear in which layer the bug was introduced.
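
One wrinkle for tests like this is that cluster IDs are arbitrary, so two correct implementations can number the same clusters differently. Below is a minimal sketch of a comparison that ignores the numbering, assuming noise is labelled -1 as in scikit-learn. The helper name `same_clustering` is hypothetical and not part of cuML or scikit-learn.

```python
def same_clustering(labels_a, labels_b, noise=-1):
    """Return True if both labelings describe the same partition.

    Cluster IDs may be permuted between the two results, but a point
    marked as noise in one must be noise in the other.
    Hypothetical helper, for illustration only.
    """
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # Noise points must be noise in both results.
        if (a == noise) != (b == noise):
            return False
        # Each cluster ID must map to exactly one ID on the other side.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Same partition, different numbering.
print(same_clustering([0, 0, 1, -1], [1, 1, 0, -1]))  # True
# Second point moved to another cluster.
print(same_clustering([0, 0, 1, -1], [0, 1, 1, -1]))  # False
```

Exact equality can still be too strict for DBSCAN, because a border point reachable from two clusters may be assigned to either one depending on processing order.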

Member Author

cjnolet commented Jan 11, 2019

@teju85, have you gotten a chance to look at this or #63 yet? It looks like a fix for this is slated for 0.5. Referencing #83 to reproduce the problem.

cjnolet added the "1 - On Deck" label and removed the "? - Needs Triage" label on Jan 11, 2019
cjnolet added this to Issue-Needs prioritizing in v0.5 Release via automation on Jan 11, 2019
dantegd moved this from Issue-Needs prioritizing to Issue-P0 in v0.5 Release on Jan 11, 2019
cjnolet self-assigned this on Jan 13, 2019
Member

teju85 commented Jan 14, 2019

@cjnolet which of these issues against DBSCAN needs to be prioritized: #54, #63, or #80?

Member

teju85 commented Jan 14, 2019

Also, is there a standalone Python script that could repro this mismatch? (Sorry if you have already posted it somewhere!)

cjnolet moved this from Issue-P0 to Done in v0.5 Release on Jan 24, 2019
Member Author

cjnolet commented Feb 9, 2019

I’m going to go ahead and close this for now since we have discussed how the subtle differences in eps affect the results.
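
The thread does not record which eps difference was discussed. As one illustration of how a subtle difference can matter, here is a tiny pure-Python DBSCAN sketch in which the neighborhood test is either inclusive (`dist <= eps`) or strict (`dist < eps`). The function `dbscan_1d` and its `inclusive` flag are assumptions made for this example and do not describe cuML's or scikit-learn's actual code.

```python
def dbscan_1d(points, eps, min_samples, inclusive=True):
    """Tiny O(n^2) DBSCAN on 1-D points; returns labels, -1 for noise.

    `inclusive` selects `dist <= eps` versus `dist < eps` for the
    neighborhood test. Illustration only, not a library implementation.
    """
    def neighbors(i):
        # A point counts as its own neighbor (distance 0).
        return [j for j, q in enumerate(points)
                if (abs(points[i] - q) <= eps if inclusive
                    else abs(points[i] - q) < eps)]

    labels = [None] * len(points)    # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1           # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise promoted to border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)   # j is a core point: keep expanding
    return labels


points = [0.0, 1.0, 2.0, 3.0]        # consecutive points are exactly eps apart
print(dbscan_1d(points, eps=1.0, min_samples=2, inclusive=True))   # [0, 0, 0, 0]
print(dbscan_1d(points, eps=1.0, min_samples=2, inclusive=False))  # [-1, -1, -1, -1]
```

With points sitting exactly at distance eps, the inclusive test yields one cluster and the strict test yields only noise, so two implementations can disagree on the same data without either having a logic bug.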

cjnolet closed this as completed on Feb 9, 2019

3 participants