Skip to content

Releases: jlmelville/rnndescent

rnndescent 0.1.6

15 May 03:14
Compare
Choose a tag to compare

rnndescent 0.1.6

  • This is a patch release to fix an issue that occurred with the recent release of a new version of 'dqrng' on CRAN. Unfortunately, version 0.1.5 didn't quite work out. A fix for incorrect behavior for some metrics is also included (see below).

Bug fixes and minor improvements

  • Building an index with the "cosine", "jaccard" or "hellinger" metrics could give incorrect results if maximally-dissimilar items were in the nearest neighbors. Thank you to Maciej Beręsewicz for the report (#14).

CRAN release 0.1.5

19 Apr 05:51
Compare
Choose a tag to compare

This is a minor release to change an internal API to support an upcoming release for dqrng. See daqana/dqrng#80. Thank you to Ralf Stubner for the report and a patch.

v0.1.4

18 Mar 14:43
Compare
Choose a tag to compare

rnndescent 0.1.4

New features

  • New parameter: for nnd_knn and rnnd_build: weight_by_degree. If set to TRUE, then the candidate list in nearest neighbor descent is weighted in favor of low-degree items, which should make for a more diverse local join. There is a minor increase in computation but also a minor increase in accuracy.

Bug fixes and minor improvements

  • rnnd_build generated an error when preparing the search graph for some metrics (notably cosine and jaccard).
  • Fix a factor of 2 error in the TS-SS metric. This does not affect the returned neighbors, just the distances. Thank you to reporter
    Henry Linder (#8).
  • New parameter for prepare_search_graph: use_alt_metric. This behaves like the existing use_alt_metric parameters in other functions and may speed up the index preparation step in e.g. rnnd_build.
  • New parameter for rnnd_build and prepare_search_graph: prune_reverse. If set to TRUE the reverse graph will be pruned using the
    pruning_degree_multiplier parameter before any diversification. This can help prevent an excessive amount of time being spent in the diversification step in the case where an item has a large number of neighbors (in the reverse graph this can be as large as the number of items in the dataset). Pruning of the merged graph still occurs, so this is an additional pruning step. This should have little effect on search results, but for backwards compatibility, the default is FALSE.

CRAN release 0.1.3

09 Dec 06:42
Compare
Choose a tag to compare

This is a patch release to fix some ASAN/UBSAN errors that occurred due to the sentinel values not being correctly cast to unsigned integral types when missing values were in the k-nearest neighbor graph (and also one case of insufficient input validation).

CRAN release 0.1.1

28 Nov 04:13
Compare
Choose a tag to compare

This is the first CRAN release. There are no significant changes compared to the last pre-release version, just minor documentation changes.

rnndescent 0.0.16

25 Nov 05:33
Compare
Choose a tag to compare
rnndescent 0.0.16 Pre-release
Pre-release

rnndescent 0.0.16

Breaking changes

  • rnnd_build now always prepares the search graph.
  • The rnnd_prepare function has been removed. The option to not prepare the search graph during index building only made sense if you were only interested in the k-nearest neighbor graph. Now that rnnd_knn exists for that purpose (see below), the logic of index building has been substantially simplified.
  • The nn_to_sparse function has been removed.
  • The merge_knn function has been removed, and merge_knnl has been renamed to merge_knn. If you were running e.g. merge_knn(nn1, nn2), you must now use merge_knn(list(nn1, nn2)). Also the parameter nn_graphs has been renamed graphs.

New features

  • New function: rnnd_knn. Behaves a lot like rnnd_build, but only returns the knn graph with no index built. The index can be very large in size for high dimensional or large datasets, so this function is useful if you only care about the knn graph and won't ever want to query new data.
  • New function: neighbor_overlap. Measures the overlap of two knn graphs via their shared indices. A similar function was used extensively in some vignettes so it may have sufficient utility to be useful to others.
  • New parameter for rnnd_query and graph_knn_query: max_search_fraction. This parameter controls the maximum number of nodes that can be searched during querying. If the number of nodes searched exceeds this fraction of the total number of nodes in the graph, the search will be terminated. This can be used in combination with epsilon to avoid excessive search times.

Bug fixes and minor improvements

  • The sparse spearmanr distance has been fixed.
  • During tree-building with n_threads = 0, progress/interrupt monitoring was not occurring.
  • You can provide a user-defined graph to the init parameter of rnnd_query.
  • rnnd_query: if verbose = TRUE, a summary of the min, max and average number of distance queries will be logged. This can help tune epsilon and max_search_fraction.

rnndescent 0.0.15

19 Nov 18:26
Compare
Choose a tag to compare
rnndescent 0.0.15 Pre-release
Pre-release

rnndescent 0.0.15

Breaking Changes

  • Standalone distance functions have been removed. They hadn't expanded to match all the distances available in the nearest neighbor functions, nor was sparse support added. Doing so would increase the size of this package's API even further. They may show up in another package.
  • The local_scale_nn has been removed, for similar reasons to the removal of the standalone distance functions. It remains in the localscale branch of the github repo.
  • The search graph returned from prepare_search_graph is now transposed. This prevents having to repeatedly transpose inside every call to graph_knn_query if multiple queries are being made. You will need to either regenerate any saved search graphs or transpose them with Matrix::t(search_graph).

New features

  • New functions: rnnd_build, rnnd_query and rnnd_prepare. These functions streamline the process of building a k-nearest neighbor graph, preparing a search graph and querying it.

rnndescent 0.0.14

14 Nov 06:10
Compare
Choose a tag to compare
rnndescent 0.0.14 Pre-release
Pre-release

rnndescent 0.0.14

Breaking Changes

  • The bhamming metric no longer exists as a specialized metric. Instead, if you pass a logical matrix to data, reference or query parameter (depending on the function) and specify metric = "hamming" you will automatically get the binary-specific version of the hamming metric.

  • The hamming and bhamming metrics are now normalized with respect to the number of features, to be consistent with the other binary-style metrics (and PyNNDescent). If you need the old distances, multiply the distance matrix by the number of columns, e.g. do something like:

    res <- brute_force_knn(X, metric = "hamming")
    res$dist <- res$dist * ncol(X)
  • The metric l2sqr has been renamed sqeuclidean to be consistent with PyNNDescent.

New features

  • Metrics? We got 'em! The metric parameter now accepts a much larger number of metrics. See the rdoc for the full list of supported metrics. Currently, most of the metrics from PyNNDescent which don't require extra parameters are supported. The number of specialized binary metrics has also been expanded.
  • New parameter for rpf_knn and rpf_build: max_tree_depth this controls the depth of the tree and was set to 100 internally. This default has been doubled to 200 and can now be user-controlled. If verbose = TRUE and the largest leaf in the forest exceeds the leaf_size parameter, a message warning you about this will be logged and indicates that the maximum tree depth has been exceeded. Increasing max_tree_depth may not be the answer: it's more likely there is something unusual about the distribution of the distances in your dataset and a random initialization might be a better use of your time.

rnndescent 0.0.13

07 Nov 07:34
Compare
Choose a tag to compare
rnndescent 0.0.13 Pre-release
Pre-release

rnndescent 0.0.13

New features

  • Sparse data is now supported. Pass a dgCMatrix to the data, reference or query parameters where you would usually use a dense matrix or data frame. cosine, euclidean, manhattan, hamming and correlation are all available, but alternative versions in the dense case, e.g. cosine-preprocess or the binary-specific bhamming for dense data is not.
  • A new init option for graph_knn_query: you can now pass an RP forest and initialize with that, e.g. from rpf_build, or by setting ret_forest = TRUE on nnd_knn or rpf_knn. You may want to cut down the size of the forest used for initialization with rpf_filter first, though (a single tree may be enough). This will also use the metric data in the forest, so setting metric (or use_alt_metric) in the function itself will be ignored.

Bug fixes and minor improvements

  • If the knn graph you pass to prepare_search_graph or to graph_knn_query contains missing data, this will no longer cause an error (it still might not be the best idea though).

rnndescent 0.0.12

30 Oct 22:23
Compare
Choose a tag to compare
rnndescent 0.0.12 Pre-release
Pre-release

New features

  • New function: rpf_knn. Calculates the approximate k-nearest neighbors using a random partition forest.
  • New function: rpf_build. Builds a random partition forest.
  • New function: rpf_knn_query. Queries a random partition forest (built with rpf_build to find the approximate nearest neighbors for the query points.
  • New function: rpf_filter. Retains only the best "scoring" trees in a forest, where each tree is scored based on how well it reproduces a given knn.
  • New initialization method for nnd_knn: init = "tree". Uses the RP Forest initialization method.
  • New parameter for nnd_knn: ret_forest. Returns the search forest used if init = "tree" so it can be used for future searching or filtering.
  • New parameter for nnd_knn: init_opts. Options that can be passed to the RP forest initialization (same as in rpf_knn).