DBSCAN isn't using PointSelectionPolicy
#1625
Comments
I would like to work on this.
Hi there @Yugandhartripathi, you are more than welcome to. When you have a working PR, I'll review it and we can get it merged.
Hi @rcurtin, @Yugandhartripathi, I read this issue. The specifications were so detailed that I wanted to fix it and contribute too, so I made PR #1627. I think the existing RandomPointSelection algorithm, which uses boost::dynamic_bitset, is very efficient at finding unvisited points, so I wanted to keep it rather than switch to a different one. Could you review the code?
@KimSangYeon-DGU thanks! I reviewed it. It seems like I should try and make more issues with detailed specifications like this in the future. 👍
Fixed DBSCAN isn't using PointSelectionPolicy issue #1625
With #1628 merged, this can be closed now. Thanks again @KimSangYeon-DGU!
It was pointed out in IRC today that the DBSCAN class, which has a template parameter `PointSelectionPolicy`, isn't actually using that template parameter class. We should fix that. This would be a good first issue for someone looking to contribute. Here are the steps to fix the issue:

1. Read and understand what DBSCAN is and how it works. Wikipedia or the original paper should suffice here, but in either case you should be familiar with DBSCAN.
2. Take a look at the code at the bottom of dbscan_impl.hpp, at line 193:

   That happens after the range search is done, and basically we are building the clusters at this point by using a UnionFind structure. The original intention of the `PointSelectionPolicy` was that instead of selecting points linearly (i.e. index 0, then 1, then 2, like the for loop above does), another strategy could be used. We should aim to replace that code with this:
3. Take a look at `random_point_selection.hpp`. You'll note that its signature does not match what I just proposed above, so it will need to be adapted. Additionally, the class will need to internally hold the list of which points have been selected and which haven't, so that when `Select()` is called it returns a random index that has not yet been selected.
4. Implement `ordered_point_selection.hpp`, which just returns `i` when `Select(i, data)` is called, so this will imitate the functionality from before.
5. Set the default `PointSelectionPolicy` to `OrderedPointSelection` from the previous step, so that it still behaves the same as previous versions of mlpack.
6. Modify `dbscan_main.cpp` to add an option like `PARAM_FLAG("random_selection", ...)` that will control whether `OrderedPointSelection` or `RandomPointSelection` is used.
7. Modify `HISTORY.md` to point out this change.
8. Add a test for DBSCAN using `OrderedPointSelection` to `src/mlpack/tests/dbscan_test.cpp`. Also make sure that `RandomPointSelection` is properly tested in that file.
9. Add a test to ensure that the `random_selection` flag is being properly used to `src/mlpack/tests/main_tests/dbscan_test.cpp`.

Hopefully the clarity in what to do here is helpful. Like I said, I think this would be a nice straightforward task for someone looking to get involved and contribute to the library.
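
For readers who want to see the shape of the change, here is a minimal, self-contained sketch of the two selection policies and a cluster-building loop that consults them. It is not the mlpack code. `Dataset`, `SimpleUnionFind`, and `BuildClusters` are hypothetical stand-ins invented for this example, the merge rule is a simplified core-point check, and the random policy tracks unselected indices with a plain `std::vector` rather than the `boost::dynamic_bitset` mentioned in the comments. Only the `Select(i, data)` call shape and the two policy names come from the issue text.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a column-major matrix: one point per column.
struct Dataset { std::size_t n_cols; };

// Returns the index it is given, reproducing a linear 0, 1, 2, ... scan.
class OrderedPointSelection
{
 public:
  template<typename MatType>
  std::size_t Select(const std::size_t i, const MatType& /* data */)
  {
    return i;
  }
};

// Returns a random index that has not been returned before.
class RandomPointSelection
{
 public:
  explicit RandomPointSelection(unsigned seed = std::random_device{}()) :
      rng(seed) { }

  template<typename MatType>
  std::size_t Select(const std::size_t /* i */, const MatType& data)
  {
    // Build the list of unselected indices on the first call.
    if (!initialized)
    {
      remaining.resize(data.n_cols);
      std::iota(remaining.begin(), remaining.end(), std::size_t{0});
      initialized = true;
    }
    if (remaining.empty())
      throw std::out_of_range("every point has already been selected");

    // Pick a random slot, then remove it in O(1) by swapping with the back.
    std::uniform_int_distribution<std::size_t> dist(0, remaining.size() - 1);
    const std::size_t pos = dist(rng);
    const std::size_t index = remaining[pos];
    remaining[pos] = remaining.back();
    remaining.pop_back();
    return index;
  }

 private:
  std::mt19937 rng;
  std::vector<std::size_t> remaining;
  bool initialized = false;
};

// Hypothetical minimal union-find with path halving.
class SimpleUnionFind
{
 public:
  explicit SimpleUnionFind(const std::size_t n) : parent(n)
  {
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }

  std::size_t Find(std::size_t x)
  {
    while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  }

  void Union(const std::size_t a, const std::size_t b)
  {
    parent[Find(a)] = Find(b);
  }

 private:
  std::vector<std::size_t> parent;
};

// Visits every point once, in the order the policy dictates.
// neighbors[k] must hold the range-search results for point k.
template<typename PointSelectionPolicy, typename MatType>
SimpleUnionFind BuildClusters(
    const MatType& data,
    const std::vector<std::vector<std::size_t>>& neighbors,
    const std::size_t minPoints,
    PointSelectionPolicy policy = PointSelectionPolicy())
{
  SimpleUnionFind uf(data.n_cols);
  for (std::size_t i = 0; i < data.n_cols; ++i)
  {
    const std::size_t index = policy.Select(i, data);
    // Simplified rule (an assumption of this sketch): only points with at
    // least minPoints neighbors merge their neighborhood.
    if (neighbors[index].size() >= minPoints)
      for (const std::size_t j : neighbors[index])
        uf.Union(index, j);
  }
  return uf;
}
```

Calling `BuildClusters<OrderedPointSelection>(data, neighbors, minPoints)` keeps the linear visiting order, while passing `RandomPointSelection` as the template argument changes only the order in which points are visited. Because the loop calls `Select()` exactly once per point, the random policy never runs out of unselected indices.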