Unify dense and sparse import in FIL #4328
rerun tests:
I like the overall direction where it is going. This pull request is in good shape, but there are still some comments.
Most of the comments are technical, but here are the points I'd like to highlight:
- It would be better to combine various compile-time node properties into node_traits<node_t>.
- I like the introduction of a separate conversion type tl2fil_t to hold all conversion-related state. I think functions like tree2fil could be methods of this type as well.
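As a rough illustration of the second point, here is a minimal sketch of a conversion type that owns its state and exposes the per-tree step as a method. Every name in it (toy_tree, toy_converter, add_tree, add_forest, tree_size) is hypothetical and not taken from the PR; the real tl2fil_t holds different state, so treat this only as a picture of the shape being suggested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input: one threshold per node of a single tree.
struct toy_tree {
  std::vector<float> thresholds;
};

// Hypothetical conversion type: all conversion-related state lives here.
struct toy_converter {
  std::vector<std::size_t> roots;  // offset of each tree's first node
  std::vector<float> nodes;        // nodes of all trees, concatenated

  // Per-tree step written as a method, so it reaches the shared state
  // through `this` instead of taking it as extra parameters.
  void add_tree(const toy_tree& tree) {
    roots.push_back(nodes.size());
    nodes.insert(nodes.end(), tree.thresholds.begin(), tree.thresholds.end());
  }

  // Whole-forest step built on top of the per-tree method.
  void add_forest(const std::vector<toy_tree>& forest) {
    for (const toy_tree& t : forest) add_tree(t);
  }

  // Number of nodes stored for tree i.
  std::size_t tree_size(std::size_t i) const {
    assert(i < roots.size());
    const std::size_t end = (i + 1 < roots.size()) ? roots[i + 1] : nodes.size();
    return end - roots[i];
  }
};
```

The design point is only that helper functions stop needing long parameter lists once the state they share is a member of the type they belong to.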
cpp/src/fil/common.cuh
Outdated
@@ -114,6 +114,16 @@ struct sparse_storage : storage_base {
typedef sparse_storage<sparse_node16> sparse_storage16;
typedef sparse_storage<sparse_node8> sparse_storage8;

template <typename fil_node_t>
struct node2storage {
There are several types like this in this pull request. It would be better to unify them into a single node_traits type. E.g.
// for sparse nodes
template <typename node_t>
struct node_traits {
  using storage = sparse_storage<node_t>;
  using forest = sparse_forest<node_t>;
  static const bool IS_DENSE = false;
};
// for dense nodes
template <>
struct node_traits<dense_node> {
  using forest = dense_forest;
  using storage = dense_storage;
  static const bool IS_DENSE = true;
};
It is also possible to include the check() method that checks whether the number of nodes fits into sparse_node8, but that's up to you.
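For readers who want to see the suggestion compile, here is a self-contained sketch of such a traits type, including one possible check() hook. The node, storage and forest types are empty stand-ins, and kAssumedMaxNodes8, make_forest and node_traits_demo are hypothetical names; in particular, the node limit is a made-up number for illustration, not the real capacity of sparse_node8.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Empty stand-ins; the real FIL types carry data and methods.
struct dense_node {};
struct sparse_node16 {};
struct sparse_node8 {};
struct dense_storage {};
struct dense_forest {};
template <typename node_t> struct sparse_storage {};
template <typename node_t> struct sparse_forest {};

// ASSUMPTION: illustrative limit only, not the real sparse_node8 capacity.
static const std::size_t kAssumedMaxNodes8 = 1000;

// Primary template: covers sparse node types.
template <typename node_t>
struct node_traits {
  using storage = sparse_storage<node_t>;
  using forest = sparse_forest<node_t>;
  static const bool IS_DENSE = false;
  // Only the 8-byte node gets a node-count limit in this sketch.
  static bool check(std::size_t num_nodes) {
    return !std::is_same<node_t, sparse_node8>::value ||
           num_nodes <= kAssumedMaxNodes8;
  }
};

// Full specialization for dense nodes.
template <>
struct node_traits<dense_node> {
  using storage = dense_storage;
  using forest = dense_forest;
  static const bool IS_DENSE = true;
  static bool check(std::size_t) { return true; }
};

// Generic code picks the forest type through the traits.
// Returns nullptr when check() rejects the node count.
template <typename node_t>
std::unique_ptr<typename node_traits<node_t>::forest> make_forest(
    std::size_t num_nodes) {
  using forest_type = typename node_traits<node_t>::forest;
  if (!node_traits<node_t>::check(num_nodes)) return nullptr;
  return std::unique_ptr<forest_type>(new forest_type());
}

// Usage example; can be called from any test or main().
inline void node_traits_demo() {
  assert(node_traits<dense_node>::IS_DENSE);
  assert(!node_traits<sparse_node16>::IS_DENSE);
  assert(make_forest<sparse_node16>(5000) != nullptr);
  assert(make_forest<sparse_node8>(kAssumedMaxNodes8 + 1) == nullptr);
}
```

Keeping the properties in node_traits leaves the node structs themselves as plain data, which is the separation discussed in the replies that follow.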
Definitely! I assume putting them all into dense_node and sparse_node* themselves will not be as neat?
Though possible, it won't be as neat, as it would mix the node data structure with unrelated information. A separate node_traits data type looks better.
cpp/src/fil/fil.cu
Outdated
check_params(params, false);
sparse_forest<fil_node_t>* f = new sparse_forest<fil_node_t>(h);
check_params(params, is_dense<fil_node_t>());
auto f = new typename node2forest<fil_node_t>::T(h);
It's better to split this line:
using forest_type = typename node_traits<fil_node_t>::forest;
forest f = new forest_type(h);
I wrote forest_type* f = new forest_type(h); because you chose to make void init(...) non-virtual and not part of the struct forest base class. Did you intend for me to make it virtual?
Feel free to make init() virtual if it would simplify things.
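To make the trade-off in this exchange concrete, here is a small sketch of what a virtual init() would allow: the caller holds only a base-class reference or pointer and still reaches the derived implementation. The toy_* types and init_via_base are hypothetical, and the sketch assumes both forest kinds can share one init signature, which may not hold for the real FIL classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class with init() declared virtual.
struct toy_forest {
  virtual ~toy_forest() = default;
  virtual void init(int num_trees) = 0;
  int num_trees = 0;
  bool dense = false;
};

struct toy_dense_forest : toy_forest {
  void init(int n) override {
    num_trees = n;
    dense = true;
  }
};

template <typename node_t>
struct toy_sparse_forest : toy_forest {
  void init(int n) override {
    num_trees = n;
    dense = false;
  }
};

// Works on the base type only; dispatch selects the derived init().
inline void init_via_base(toy_forest& f, int num_trees) {
  assert(num_trees >= 0);
  f.init(num_trees);
}
```

With a non-virtual init() that is not declared in the base class, the same call needs a pointer of the concrete type, which is why the author kept forest_type* f in the reply above.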
Co-authored-by: Andy Adinets <adinetz@gmail.com>
…nse-sparse-import
…to unify-dense-sparse-import
rerun tests: Test Result (2 failures / +2)
cuml.test.test_nearest_neighbors.test_nearest_neighbors_rbc[10000-4-euclidean]
cuml.test.test_nearest_neighbors.test_nearest_neighbors_rbc[10000-25-euclidean]
--
distance = 'euclidean', n_neighbors = 4, nrows = 10000
    @pytest.mark.parametrize('distance', ["euclidean", "haversine"])
    @pytest.mark.parametrize('n_neighbors', [4, 25])
    @pytest.mark.parametrize('nrows', [unit_param(10000), stress_param(70000)])
    def test_nearest_neighbors_rbc(distance, n_neighbors, nrows):
        X, y = make_blobs(n_samples=nrows,
                          centers=25,
                          shuffle=True,
                          n_features=2,
                          cluster_std=3.0,
                          random_state=42)
        knn_cu = cuKNN(metric=distance, algorithm="rbc")
        knn_cu.fit(X)
        query_rows = int(nrows/2)
        rbc_d, rbc_i = knn_cu.kneighbors(X[:query_rows, :],
                                         n_neighbors=n_neighbors)
        if distance == 'euclidean':
            # Need to use unexpanded euclidean distance
            pw_dists = cuPW(X, metric="l2")
            brute_i = cp.argsort(pw_dists, axis=1)[:query_rows, :n_neighbors]
            brute_d = cp.sort(pw_dists, axis=1)[:query_rows, :n_neighbors]
        else:
            knn_cu_brute = cuKNN(metric=distance, algorithm="brute")
            knn_cu_brute.fit(X)
            brute_d, brute_i = knn_cu_brute.kneighbors(
                X[:query_rows, :], n_neighbors=n_neighbors)
        rbc_i = cp.sort(rbc_i, axis=1)
        brute_i = cp.sort(brute_i, axis=1)
        # TODO: These are failing with 1 or 2 mismatched elements
        # for very small values of k:
        # https://github.com/rapidsai/cuml/issues/4262
>       assert len(brute_d[brute_d != rbc_d]) <= 1
E       assert 139 <= 1
E + where 139 = len(array([0.26635543, 0.14040843, 0.15865709, 0.224747 , 0.22284727,\n 0.31209642, 0.196584 , 0.4371513 , 0.459564... 0.09977522, 0.20649612, 0.20396307, 0.18100378,\n 0.36238492, 0.41610897, 0.14166015, 0.17231807], dtype=float32))
cuml/test/test_nearest_neighbors.py:556: AssertionError
same test error
rerun tests
…nse-sparse-import
C++ test failed:
rerun tests
lots of C++ tests fail with
rerun tests
Beautiful work! Just a couple stray C types and one spot where we can avoid a raw loop. Other than that, it looks perfect.
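The specific loop in question is not shown in this conversation, so the following is only a generic sketch of what "avoiding a raw loop" usually means in C++: replacing an index-based accumulation with a standard algorithm. The function names and the tree_sizes input are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Raw-loop version: manual index and accumulator.
inline std::size_t total_nodes_loop(const std::vector<std::size_t>& tree_sizes) {
  std::size_t total = 0;
  for (std::size_t i = 0; i < tree_sizes.size(); ++i) total += tree_sizes[i];
  return total;
}

// Algorithm version: same result, intent stated by the call itself.
// The initial value is std::size_t{0} so the sum is not narrowed to int.
inline std::size_t total_nodes_algo(const std::vector<std::size_t>& tree_sizes) {
  return std::accumulate(tree_sizes.begin(), tree_sizes.end(), std::size_t{0});
}

// Usage example: both versions agree on the same input.
inline void raw_loop_demo() {
  const std::vector<std::size_t> sizes = {3, 5, 7};
  assert(total_nodes_loop(sizes) == total_nodes_algo(sizes));
}
```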
@gpucibot merge
Codecov Report
@@ Coverage Diff @@
## branch-22.02 #4328 +/- ##
===============================================
Coverage ? 85.72%
===============================================
Files ? 236
Lines ? 19326
Branches ? 0
===============================================
Hits ? 16568
Misses ? 2758
Partials ? 0
@gpucibot merge
The pull request includes changes from #4328. To view only changes pertinent to unifying tests, see https://github.com/levsnv/cuml/pull/3/files?diff=unified&w=1
Authors:
- Levs Dolgovs (https://github.com/levsnv)
Approvers:
- Andy Adinets (https://github.com/canonizer)
- William Hicks (https://github.com/wphicks)
URL: #4417

Authors:
- Levs Dolgovs (https://github.com/levsnv)
Approvers:
- Andy Adinets (https://github.com/canonizer)
- William Hicks (https://github.com/wphicks)
URL: rapidsai#4328