New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allknn is several times slower in 1.0.9 than 1.0.8 #347

Closed
rcurtin opened this Issue Dec 29, 2014 · 1 comment

Comments

Projects
None yet
1 participant
@rcurtin
Member

rcurtin commented Dec 29, 2014

Reported by rcurtin on 25 May 44638310 21:06 UTC
I have isolated the problem to the traversal info commit, r16226. Pre-r16226:

:[ ~/work/mlpack-trunk/build ]: 5
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 4084108 node combinations were visited.
[INFO ] 8206310 node combinations were scored.
[INFO ] 33333021 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ] 
[INFO ] Execution parameters:
[INFO ]   cover_tree: false
[INFO ]   distances_file: d.csv
[INFO ]   help: false
[INFO ]   info: ""
[INFO ]   k: 3
[INFO ]   leaf_size: 20
[INFO ]   naive: false
[INFO ]   neighbors_file: n.csv
[INFO ]   query_file: ""
[INFO ]   random_basis: false
[INFO ]   reference_file: /home/ryan/datasets/covertype.csv
[INFO ]   seed: 0
[INFO ]   single_mode: false
[INFO ]   verbose: true
[INFO ]   version: false
[INFO ] 
[INFO ] Program timers:
[INFO ]   computing_neighbors: 9.526934s
[INFO ]   loading_data: 9.164315s
[INFO ]   saving_data: 2.093499s
[INFO ]   total_time: 30.320291s
[INFO ]   tree_building: 9.419669s

r16226:

:[ ~/work/mlpack-trunk/build ]: 3
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 111957318 node combinations were scored.
[INFO ] 32719779 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ] 
[INFO ] Execution parameters:
[INFO ]   cover_tree: false
[INFO ]   distances_file: d.csv
[INFO ]   help: false
[INFO ]   info: ""
[INFO ]   k: 3
[INFO ]   leaf_size: 20
[INFO ]   naive: false
[INFO ]   neighbors_file: n.csv
[INFO ]   query_file: ""
[INFO ]   random_basis: false
[INFO ]   reference_file: /home/ryan/datasets/covertype.csv
[INFO ]   seed: 0
[INFO ]   single_mode: false
[INFO ]   verbose: true
[INFO ]   version: false
[INFO ] 
[INFO ] Program timers:
[INFO ]   computing_neighbors: 41.315907s
[INFO ]   loading_data: 9.201931s
[INFO ]   saving_data: 2.106239s
[INFO ]   total_time: 62.342061s (1 mins, 2.3secs)
[INFO ]   tree_building: 9.596633s

This is a huge issue given mlpack's focus on dual-tree algorithms and the fact that I advertise mlpack to everyone as "fast nearest neighbor search". Upon fixing this ticket the fix should be incorporated into 1.0.9 to release 1.0.10 (also other easy backports should be done).

@rcurtin rcurtin self-assigned this Dec 29, 2014

@rcurtin rcurtin closed this Dec 29, 2014

@rcurtin

This comment has been minimized.

Show comment
Hide comment
@rcurtin

rcurtin Dec 30, 2014

Member

Commented by rcurtin on 12 Aug 44651485 12:54 UTC
In r17113 I rewrote the CalculateBound() function and allknn is as fast as before (for cover trees, it seems to be significantly faster in some cases).

Member

rcurtin commented Dec 30, 2014

Commented by rcurtin on 12 Aug 44651485 12:54 UTC
In r17113 I rewrote the CalculateBound() function and allknn is as fast as before (for cover trees, it seems to be significantly faster in some cases).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment