allknn is several times slower in 1.0.9 than 1.0.8 #347

Closed
rcurtin opened this Issue Dec 29, 2014 · 1 comment

Projects

None yet

1 participant

@rcurtin
Member
rcurtin commented Dec 29, 2014

Reported by rcurtin on 25 May 44638310 21:06 UTC
I have isolated the problem to the traversal info commit, r16226. Pre-r16226:

:[ ~/work/mlpack-trunk/build ]: 5
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 4084108 node combinations were visited.
[INFO ] 8206310 node combinations were scored.
[INFO ] 33333021 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ] 
[INFO ] Execution parameters:
[INFO ]   cover_tree: false
[INFO ]   distances_file: d.csv
[INFO ]   help: false
[INFO ]   info: ""
[INFO ]   k: 3
[INFO ]   leaf_size: 20
[INFO ]   naive: false
[INFO ]   neighbors_file: n.csv
[INFO ]   query_file: ""
[INFO ]   random_basis: false
[INFO ]   reference_file: /home/ryan/datasets/covertype.csv
[INFO ]   seed: 0
[INFO ]   single_mode: false
[INFO ]   verbose: true
[INFO ]   version: false
[INFO ] 
[INFO ] Program timers:
[INFO ]   computing_neighbors: 9.526934s
[INFO ]   loading_data: 9.164315s
[INFO ]   saving_data: 2.093499s
[INFO ]   total_time: 30.320291s
[INFO ]   tree_building: 9.419669s

r16226:

:[ ~/work/mlpack-trunk/build ]: 3
:[ ryan @ trevelyan ]: $ bin/allknn -r ~/datasets/covertype.csv -k 3 -d d.csv -n n.csv -v
[INFO ] Loading '/home/ryan/datasets/covertype.csv' as CSV data.  Size is 54 x 581012.
[INFO ] Loaded reference data from '/home/ryan/datasets/covertype.csv' (54 x 581012).
[INFO ] Building reference tree...
[INFO ] Trees built.
[INFO ] Computing 3 nearest neighbors...
[INFO ] 111957318 node combinations were scored.
[INFO ] 32719779 base cases were calculated.
[INFO ] Neighbors computed.
[INFO ] Re-mapping indices...
[INFO ] Saving CSV data to 'd.csv'.
[INFO ] Saving CSV data to 'n.csv'.
[INFO ] 
[INFO ] Execution parameters:
[INFO ]   cover_tree: false
[INFO ]   distances_file: d.csv
[INFO ]   help: false
[INFO ]   info: ""
[INFO ]   k: 3
[INFO ]   leaf_size: 20
[INFO ]   naive: false
[INFO ]   neighbors_file: n.csv
[INFO ]   query_file: ""
[INFO ]   random_basis: false
[INFO ]   reference_file: /home/ryan/datasets/covertype.csv
[INFO ]   seed: 0
[INFO ]   single_mode: false
[INFO ]   verbose: true
[INFO ]   version: false
[INFO ] 
[INFO ] Program timers:
[INFO ]   computing_neighbors: 41.315907s
[INFO ]   loading_data: 9.201931s
[INFO ]   saving_data: 2.106239s
[INFO ]   total_time: 62.342061s (1 mins, 2.3secs)
[INFO ]   tree_building: 9.596633s

This is a huge issue given mlpack's focus on dual-tree algorithms and the fact that I advertise mlpack to everyone as "fast nearest neighbor search". Upon fixing this ticket the fix should be incorporated into 1.0.9 to release 1.0.10 (also other easy backports should be done).

@rcurtin rcurtin self-assigned this Dec 29, 2014
@rcurtin rcurtin closed this Dec 29, 2014
@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 12 Aug 44651485 12:54 UTC
In r17113 I rewrote the CalculateBound() function and allknn is as fast as before (for cover trees, it seems to be significantly faster in some cases).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment