Add k-NN Faiss filtering documentation #4476

Merged
merged 12 commits into from
Jul 18, 2023

kolchfa-aws
Collaborator

@kolchfa-aws kolchfa-aws commented Jul 3, 2023

Description

Fixes #4350

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws self-assigned this Jul 3, 2023
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@hdhalter hdhalter added the release-notes PR: Include this PR in the automated release notes label Jul 13, 2023
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@@ -11,12 +11,24 @@ has_math: true

To refine k-NN results, you can filter a k-NN search using one of the following methods:

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
Contributor

Suggested change
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned (if present). This approach is supported by the following engines:

Collaborator Author

Reworded.

@@ -11,12 +11,24 @@ has_math: true

To refine k-NN results, you can filter a k-NN search using one of the following methods:

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
Contributor

Suggested change
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Lucene engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)

- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
- [Efficient k-NN filtering](#efficient-k-nn-filtering): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. This approach is supported by the following search engines:
- Lucene search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.4 and later)
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
Contributor

Suggested change
- Faiss search engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)
- Faiss engine with a Hierarchical Navigable Small World (HNSW) algorithm (k-NN plugin versions 2.9 or later)


- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
Contributor

Suggested change
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the intersection of their result sets is taken.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (should/must, etc.) provided in the query.


- [Lucene k-NN filter](#using-a-lucene-k-nn-filter): This approach applies filtering _during_ the k-NN search, as opposed to before or after the k-NN search, which ensures that `k` results are returned. You can only use this method with the Hierarchical Navigable Small World (HNSW) algorithm implemented by the Lucene search engine in k-NN plugin versions 2.4 and later.
- [Scoring script filter](#scoring-script-filter): This approach involves pre-filtering a document set and then running an exact k-NN search on the filtered subset. It does not scale for large filtered subsets.
Contributor

Let's also add that latencies can be high for this query.
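To make the trade-off in the list above concrete, here is a toy, pure-Python sketch. It is not how the k-NN plugin is implemented: it uses exact brute-force distances on made-up data, and the names `DOCS`, `nearest`, `post_filter_search`, and `pre_filter_search` are hypothetical. It only illustrates why filtering after the search can return fewer than `k` results, while filtering before selecting neighbors returns `k` results when enough matching documents exist.

```python
import math

# Toy corpus: (doc_id, 2-D vector, has_parking). All values are made up.
DOCS = [
    (1, (1.0, 1.0), False),
    (2, (1.5, 1.0), False),
    (3, (2.0, 2.0), True),
    (4, (5.0, 5.0), True),
    (5, (6.0, 5.0), True),
]


def nearest(docs, query, k):
    """Exact brute-force k-NN: sort by Euclidean distance and keep k."""
    return sorted(docs, key=lambda d: math.dist(d[1], query))[:k]


def post_filter_search(docs, query, k, keep):
    """Take the k nearest documents first, then drop those failing the filter."""
    return [d[0] for d in nearest(docs, query, k) if keep(d)]


def pre_filter_search(docs, query, k, keep):
    """Apply the filter first, then take the k nearest of what is left."""
    return [d[0] for d in nearest([d for d in docs if keep(d)], query, k)]


def has_parking(doc):
    return doc[2]


query = (1.0, 1.0)
post = post_filter_search(DOCS, query, 3, has_parking)  # only 1 of 3 survives
pre = pre_filter_search(DOCS, query, 3, has_parking)    # 3 results returned
print(post, pre)
```

With a restrictive filter, the post-filtered list shrinks because the nearest unfiltered neighbors mostly fail the filter, which is the behavior the post-filtering bullets warn about.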


Filter | When the filter is applied | Type of search | Supported engines and methods | Where to place the `filter` clause
:--- | :--- | :--- | :--- | :---
Efficient k-NN filtering | During search (a hybrid of pre- and post-filtering) | Approximate | - `lucene` (`hnsw`) <br> - `faiss` (`hnsw`, `ivf`) | Inside the k-NN query clause.
Contributor

Let's remove `ivf` from here.
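As a rough illustration of the "Where to place the `filter` clause" column, the sketch below builds two request bodies as plain Python dictionaries. The helper names are hypothetical, and the `location` field, query vector, and `parking` filter are assumed example values. For efficient k-NN filtering, the `filter` sits inside the k-NN query clause; for a Boolean filter, it sits next to the `knn` clause inside a `bool` query.

```python
import json


def efficient_filter_query(field, vector, k, filter_clause):
    """Efficient k-NN filtering: the filter is nested inside the knn clause."""
    return {
        "size": k,
        "query": {
            "knn": {field: {"vector": vector, "k": k, "filter": filter_clause}}
        },
    }


def boolean_filter_query(field, vector, k, filter_clause):
    """Boolean filter: the filter is a sibling of the knn clause in a bool query."""
    return {
        "size": k,
        "query": {
            "bool": {
                "filter": filter_clause,
                "must": [{"knn": {field: {"vector": vector, "k": k}}}],
            }
        },
    }


# Assumed example values, not taken from the PR.
parking = {"term": {"parking": "true"}}
efficient = efficient_filter_query("location", [5.0, 4.0], 3, parking)
boolean = boolean_filter_query("location", [5.0, 4.0], 3, parking)
print(json.dumps(efficient, indent=2))
```

Either dictionary could be sent as the JSON body of a `_search` request; only the position of the filter differs.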

| 10M | 80 | 100 | Scoring script |Efficient k-NN filtering |
| 1M | 2.5 | 100 |Efficient k-NN filtering | Scoring script |
| 1M | 38 | 100 |Efficient k-NN filtering |Efficient k-NN filtering/scoring script |
| 1M | 80 | 100 | Boolean filter |Efficient k-NN filtering |
Contributor

Suggested change
| 1M | 80 | 100 | Boolean filter |Efficient k-NN filtering |
| 1M | 80 | 100 | Efficient k-NN filtering |Boolean filter |


A scoring script filter first filters the documents and then uses a brute-force exact k-NN search on the results. For example, the following query searches for hotels with a rating between 8 and 10, inclusive, that provide parking and then performs a k-NN search to return the 3 hotels that are closest to the specified `location`:
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
Contributor

Suggested change
You can perform efficient k-NN filtering with the `lucene` or `faiss` search engines.
You can perform efficient k-NN filtering with the `lucene` or `faiss` engines.
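For reference, the scoring script example described in the removed line (hotels rated 8 to 10 inclusive that provide parking, then the 3 closest to `location`) might look roughly like the sketch below. The request body is built as a Python dictionary; the function name is hypothetical, and the `rating` and `parking` field names, the `l2` space type, and the query vector are assumptions for illustration.

```python
import json


def scoring_script_query(vector, k=3):
    """Pre-filter with a bool query, then score the survivors with exact k-NN."""
    return {
        "size": k,
        "query": {
            "script_score": {
                # Step 1: restrict the candidate set with an ordinary filter.
                "query": {
                    "bool": {
                        "filter": {
                            "bool": {
                                "must": [
                                    {"range": {"rating": {"gte": 8, "lte": 10}}},
                                    {"term": {"parking": "true"}},
                                ]
                            }
                        }
                    }
                },
                # Step 2: brute-force k-NN scoring over the filtered documents.
                "script": {
                    "source": "knn_score",
                    "lang": "knn",
                    "params": {
                        "field": "location",
                        "query_value": vector,
                        "space_type": "l2",  # assumed distance function
                    },
                },
            }
        },
    }


body = scoring_script_query([5.0, 4.0])
print(json.dumps(body, indent=2))
```

Because every filtered document is scored exactly, the cost grows with the size of the filtered subset, which matches the scaling caveat in the overview list.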


**Step 3: Search your data with a filter**

Now you can create a k-NN search with filters. <!-- TODO: add details -->
Contributor

The same thing that we wrote for Lucene above is true for filters here too.

Note that there are multiple ways to construct a filter that returns hotels that provide parking, for example:

- A `term` query clause in the `should` clause
- A `wildcard` query clause in the `should` clause
- A `regexp` query clause in the `should` clause
- A `must_not` clause to eliminate hotels with `parking` set to `false`
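The four options above could be sketched as follows. This assumes `parking` is indexed as a keyword field holding `"true"` or `"false"`; the wildcard and regular expression patterns, the `location` field, and the query vector are illustrative choices rather than values from the PR.

```python
# Four equivalent ways (under the assumptions above) to keep hotels with parking.
parking_filters = {
    "term": {"bool": {"should": [{"term": {"parking": "true"}}]}},
    "wildcard": {
        "bool": {"should": [{"wildcard": {"parking": {"value": "t*e"}}}]}
    },
    "regexp": {"bool": {"should": [{"regexp": {"parking": "[a-zA-Z]rue"}}]}},
    "must_not": {"bool": {"must_not": [{"term": {"parking": "false"}}]}},
}


def knn_with_filter(filter_clause, vector, k=3):
    """Place any of the filters inside the knn clause (efficient filtering)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "location": {"vector": vector, "k": k, "filter": filter_clause}
            }
        },
    }


queries = {
    name: knn_with_filter(clause, [5.0, 4.0])
    for name, clause in parking_filters.items()
}
print(sorted(queries))
```

Which variant performs best is not something this sketch can show; it only demonstrates that each clause slots into the same position in the query.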

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Collaborator

@vagimeli vagimeli left a comment

LGTM. Minimal edits

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Collaborator

@natebower natebower left a comment

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

_search-plugins/knn/filter-search-knn.md

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
Collaborator

The sub-bullets here need to be introduced using a colon.

_search-plugins/knn/filter-search-knn.md

- [Boolean filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn) search and then applies a filter to the results. Because of post-filtering, it may return significantly fewer than `k` results for a restrictive filter.
- [Post filtering](#post-filtering): Because it is performed after the k-NN search, this approach may return significantly fewer than `k` results for a restrictive filter.
- [Boolean post filter](#boolean-filter-with-ann-search): This approach runs an [approximate nearest neighbor (ANN)]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) search and then applies a filter to the results. The two query parts are executed independently and then the results are combined based on the query operator (`should`, `must`, and so on) provided in the query.
Collaborator

"Boolean post-filter"? "elements" instead of "parts"? "run" instead of "executed"?

_search-plugins/knn/filter-search-knn.md

### Using a Faiss efficient filter

Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Collaborator

Suggested change
Consider an index that holds information about shirts for a clothing store. You want to find the top-rated shirts that are similar to the one you have but would like to restrict the results by shirt size.
Consider an index containing information about a particular brand of shirt. You want to find the top-rated shirts that are similar to one you already have but would like to restrict the results by shirt size.
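A minimal sketch of the shirt scenario, with request bodies built as Python dictionaries. Everything named here is an assumption for illustration: the `item_vector`, `size`, and `rating` fields, the 3-dimensional vector, the `l2` space type, and the rating threshold. The mapping selects the `faiss` engine with the `hnsw` method, and the search places the filter inside the `knn` clause.

```python
import json


def shirt_index_body(dimension):
    """Index settings and mappings for a Faiss HNSW vector field (hypothetical names)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "item_vector": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                    },
                },
                "size": {"type": "keyword"},
                "rating": {"type": "integer"},
            }
        },
    }


def similar_shirts_query(vector, size, min_rating, k=10):
    """Find shirts near `vector`, restricted by size and a minimum rating."""
    return {
        "size": k,
        "query": {
            "knn": {
                "item_vector": {
                    "vector": vector,
                    "k": k,
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"rating": {"gte": min_rating}}},
                                {"term": {"size": size}},
                            ]
                        }
                    },
                }
            }
        },
    }


index_body = shirt_index_body(3)
search_body = similar_shirts_query([2.0, 4.0, 3.0], "small", 8)
print(json.dumps(search_body, indent=2))
```

The index body would be sent when creating the index and the search body to `_search`; the index name and sample documents are left out because the PR excerpt does not show them.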

@@ -466,4 +507,196 @@ POST /hotels-index/_search
}
}
```
{% include copy-curl.html %}

## Post filtering
Collaborator

This is hyphenated elsewhere. Ensure consistency across docs.


## Post filtering

You can achieve post filtering with a Boolean filter or by providing the `post_filter` parameter.
Collaborator

Same

}
```

### Post filter parameter
Collaborator

"Post-filter"?
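For context on the section under discussion, a request that uses the `post_filter` parameter might be shaped like the sketch below. The helper name is hypothetical, and the `location` field, vector, and filter values are assumed examples. The `knn` query runs unfiltered, and `post_filter` is a top-level sibling of `query` that is applied to the hits afterward.

```python
import json


def post_filter_query(field, vector, k, filter_clause):
    """Run an unfiltered k-NN query, then filter the hits with post_filter."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
        "post_filter": filter_clause,  # applied after the k-NN search completes
    }


# Assumed example filter: hotels rated 8 to 10 that provide parking.
hotel_filter = {
    "bool": {
        "must": [
            {"range": {"rating": {"gte": 8, "lte": 10}}},
            {"term": {"parking": "true"}},
        ]
    }
}
post_body = post_filter_query("location", [5.0, 4.0], 3, hotel_filter)
print(json.dumps(post_body, indent=2))
```

Because the filter only sees the `k` hits already selected, a restrictive filter can leave fewer than `k` results.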

kolchfa-aws and others added 4 commits July 18, 2023 10:25
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws merged commit 6c83dfd into main Jul 18, 2023
4 checks passed
@hdhalter hdhalter mentioned this pull request Sep 25, 2023
29 tasks
@prudhvigodithi prudhvigodithi added release-notes PR: Include this PR in the automated release notes and removed release-notes PR: Include this PR in the automated release notes labels Oct 3, 2023
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
* Add k-NN Faiss filtering documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Move the note

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add faiss and a filter table

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Refactor boolean filtering section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Clarified that Faiss works with hnsw only

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add more Faiss filtering information

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/filter-search-knn.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented one more editorial comment

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
vagimeli added a commit that referenced this pull request Dec 21, 2023
* Add k-NN Faiss filtering documentation

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Move the note

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add faiss and a filter table

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Refactor boolean filtering section

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Clarified that Faiss works with hnsw only

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add more Faiss filtering information

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/filter-search-knn.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Implemented one more editorial comment

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
@Naarcha-AWS Naarcha-AWS deleted the knn-filter-update branch March 28, 2024 23:26
Labels
release-notes PR: Include this PR in the automated release notes v2.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] Faiss Engine Efficient Filtering
6 participants