Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring back sparse_vector mapping #98996

Conversation

carlosdelest
Copy link
Member

@carlosdelest carlosdelest commented Aug 29, 2023

Adds back sparse_vector field type, as a copy of rank_features.

The main goal is to have the sparse_vector field type available so we can switch ELSER queries to use the new type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied and pasted from RankFeaturesFieldMapper with some checks on index version.

public void parse(DocumentParserContext context) throws IOException {

// No support for indexing / searching 7.x sparse_vector field types
if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_0_0)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - checking that we're not operating on indices created before 8.0. Return the same errors that the previous implementation had.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from RankFeaturesFieldMapperTests, with some additions

return Strings.concatStringArrays(super.getParseMinimalWarnings(indexVersion), additionalWarnings);
}

public void testSparseVectorWith7xIndex() throws Exception {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added from the previous version of the test. The main idea is to allow indices to exist with the previous sparse_vector implementation (<8.0), but no new docs can be added to it.

@@ -441,6 +441,10 @@ protected String[] getParseMinimalWarnings() {
return Strings.EMPTY_ARRAY;
}

protected String[] getParseMinimalWarnings(IndexVersion indexVersion) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this method so we can specify warnings that will be included when using mappings that go back to specific index versions.


public Builder(String name) {
super(name);
}

@Override
protected Parameter<?>[] getParameters() {
return new Parameter<?>[] { meta };
return new Parameter<?>[] { positiveScoreImpact, meta };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove positiveScoreImpact we can simplify this feature.

@@ -0,0 +1,21 @@
setup:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a simple set of tests for exercising the mapped type & queries and such. No need to have a single file only for mapping tests.

This file should test:

  • Creating the mapping (in the correct versions)
  • Ensure creation fails in older 8.x versions
  • That queries work
  • That indexing works

@@ -0,0 +1,143 @@
---
"Indexing and searching sparse vectors":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test that checks indexing and performs a simulated text_expansion query.


public void testBoostNotAllowed() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've separated boost checking in two, for deprecated warnings and boosts not allowed.

@@ -521,6 +527,10 @@ public final void testDeprecatedBoost() throws IOException {
assertParseMinimalWarnings();
}

protected IndexVersion boostNotAllowedIndexVersion() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the problem that checking for boosts not allowed caused sparse_vector mappings to error because sparse_vector is not allowed for 8.0.0.

I've created an overrridable method that returns the index version to test boosting for.

I'm happy to hear about other ideas for solving this in a simpler way 👍

@@ -103,8 +103,16 @@ public class TextExpansionQueryIT extends PyTorchModelRestTestCase {
RAW_MODEL_SIZE = Base64.getDecoder().decode(BASE_64_ENCODED_MODEL).length;
}

public void testRankFeaturesTextExpansionQuery() throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've duped the tests on text_expansion queries so we check they can be used both with rank_features and sparse_vector field types.

I could as well have created separate classes for this, LMK if you can think of better ways to handle this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not do this. Thank you for taking initiative here. But, all the ML related changes should be done after sparse_vector is merged. It will be good to separate the changes to make the change log cleaner and reviews easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it - I'll revert this change and the text_expansion_search_sparse_vector.yml test as well.

@@ -0,0 +1,109 @@
# This test uses the simple model defined in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a separate YAML test that performs the same checks for text_expansion queries for sparse_vector field type

@carlosdelest
Copy link
Member Author

@elasticsearchmachine run elasticsearch-ci/part-1

@carlosdelest carlosdelest marked this pull request as ready for review August 31, 2023 12:14
@carlosdelest carlosdelest requested a review from a team August 31, 2023 12:14
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Aug 31, 2023
@benwtrent benwtrent added :Search/Mapping Index mappings, including merging and defining field types >enhancement test-full-bwc Trigger full BWC version matrix tests and removed needs:triage Requires assignment of a team area label labels Aug 31, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Aug 31, 2023
@elasticsearchmachine
Copy link
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Carlos!

docs/changelog/98996.yaml Outdated Show resolved Hide resolved
reason: "sparse_vector field type supported in 7.x"
- do:
allowed_warnings:
- "The [sparse_vector] field type is deprecated and will be removed in 8.0."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep the 8.0 warning here, since the test is 7.x only?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8.0.0 - indicates to me that this test only runs in 7.x as well. I am not sure why both warnings would be returned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have received both warnings on BWC tests, depending on the BWC test being performed:

  • All nodes under test are on 7.x
  • A mixed cluster test like this one. I understand that a node could have been updated to 8.x as part of this test, and thus return this warning.

public static final String CONTENT_TYPE = "sparse_vector";

static final String ERROR_MESSAGE_7X = "[sparse_vector] field type in old 7.x indices is allowed to "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicky question: Should we use the CONTENT_TYPE constant in error messages instead of hardcoding sparse_vector?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied / pasted from the previous sparse_vector error messages. As content type constant is unlikely to change, I personally think this is more readable as well (and in line with how error messages are created as well)

public DocValueFormat docValueFormat(String format, ZoneId timeZone) {
throw new UnsupportedOperationException(ERROR_MESSAGE_7X);
public Query existsQuery(SearchExecutionContext context) {
throw new IllegalArgumentException("[sparse_vector] fields do not support [exists] queries");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we decide to move from UnsupportedOperationException to IllegalArgumentException? Both are fine, just curious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really a change in the exception, as the method is different. Keep in mind that I copied / pasted RankFeaturesFieldMapper, so there are a lot of changes and the diff is misleading.

);
@Override
protected boolean allowsNullValues() {
return false; // TODO should this allow null values?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Remove TODO before merging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied / pasted from RankFeaturesFieldMapper, so I think it makes sense for re-evaluating in the future.

Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
@carlosdelest
Copy link
Member Author

@elasticsearchmachine update branch

static final String ERROR_MESSAGE_8X = "The [sparse_vector] field type is not supported from 8.0 to 8.10 versions.";
static final IndexVersion PREVIOUS_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_0_0;

static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have here V_8_500_001?

// No support for indexing / searching 7.x sparse_vector field types
if (context.indexSettings().getIndexVersionCreated().before(PREVIOUS_SPARSE_VECTOR_INDEX_VERSION)) {
throw new UnsupportedOperationException(ERROR_MESSAGE_7X);
} else if (context.indexSettings().getIndexVersionCreated().before(IndexVersion.V_8_11_0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here instead of 8_11 we should also put a new V_8_500_001 version

if (context.doc().getByKey(key) != null) {
throw new IllegalArgumentException(
"[sparse_vector] fields do not support indexing multiple values for the same "
+ "rank feature ["
Copy link
Contributor

@mayya-sharipova mayya-sharipova Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of "rank feature", I think we can just say "feature"


static final IndexVersion NEW_SPARSE_VECTOR_INDEX_VERSION = IndexVersion.V_8_500_000;

private static SparseVectorFieldType ft(FieldMapper in) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it is never used. Should we remove it?

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlosdelest Thanks Carlos, I left minor comments, but the change overall LGTM!

@carlosdelest carlosdelest added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 5, 2023
@carlosdelest
Copy link
Member Author

@elasticsearchmachine update branch

@carlosdelest
Copy link
Member Author

@elasticsearchmachine run elasticsearch-ci/full-bwc

@elasticsearchmachine elasticsearchmachine merged commit 6bbf6c2 into elastic:main Sep 6, 2023
12 checks passed
@carlosdelest carlosdelest deleted the carlosdelest/sparse_vector_type_comeback branch September 6, 2023 09:27
@saikatsarkar056
Copy link
Contributor

@carlosdelest

This PR has been merged. However, I would like to ask some questions to improve my understanding of this work. 😄

  • Will sparse_vector be searchable in the new version of deployment (8.11.0)?
  • Can sparse_vector contain negative values?
  • When are we planning to deprecate rank_features?

@benwtrent
Copy link
Member

@saikatsarkar056

Will sparse_vector be searchable in the new version of deployment (8.11.0)?

Yes

Can sparse_vector contain negative values?

No

When are we planning to deprecate rank_features?

No

alex-spies pushed a commit to dreamquster/elasticsearch that referenced this pull request Sep 8, 2023
Adds back `sparse_vector` field type, as a copy of `rank_features`. 

The main goal is to have the `sparse_vector` field type available so we
can switch ELSER queries to use the new type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team test-full-bwc Trigger full BWC version matrix tests v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants