Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DocValues Support for Lucene Byte Sized Vector #953

Conversation

naveentatikonda
Copy link
Member

Description

The changes in this PR adds doc values support for fields with byte sized vectors, which helps users to perform script scoring and painless scripting search functionalities on these indices.

Issues Resolved

#812

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@naveentatikonda naveentatikonda added Features Introduces a new unit of functionality that satisfies a requirement v2.9.0 labels Jul 7, 2023
@naveentatikonda naveentatikonda self-assigned this Jul 7, 2023
@naveentatikonda naveentatikonda force-pushed the add_doc_values_support_byte_vector branch from f852135 to 1fcba06 Compare July 7, 2023 00:22
@codecov
Copy link

codecov bot commented Jul 7, 2023

Codecov Report

Merging #953 (0eb0adf) into feature/lucene_byte_vector (1386519) will decrease coverage by 0.01%.
The diff coverage is 92.30%.

@@                       Coverage Diff                        @@
##             feature/lucene_byte_vector     #953      +/-   ##
================================================================
- Coverage                         85.07%   85.07%   -0.01%     
- Complexity                         1129     1141      +12     
================================================================
  Files                               154      154              
  Lines                              4616     4682      +66     
  Branches                            412      422      +10     
================================================================
+ Hits                               3927     3983      +56     
- Misses                              499      507       +8     
- Partials                            190      192       +2     
Impacted Files Coverage Δ
...opensearch/knn/index/mapper/LuceneFieldMapper.java 71.15% <50.00%> (ø)
...nsearch/knn/index/mapper/KNNVectorFieldMapper.java 82.93% <79.16%> (-1.17%) ⬇️
...rg/opensearch/knn/index/query/KNNQueryFactory.java 88.13% <85.00%> (-2.35%) ⬇️
...opensearch/knn/index/KNNVectorDVLeafFieldData.java 75.00% <100.00%> (+2.27%) ⬆️
.../opensearch/knn/index/KNNVectorIndexFieldData.java 100.00% <100.00%> (ø)
...opensearch/knn/index/KNNVectorScriptDocValues.java 84.61% <100.00%> (-5.87%) ⬇️
.../java/org/opensearch/knn/index/VectorDataType.java 100.00% <100.00%> (ø)
...rg/opensearch/knn/index/codec/KNNCodecService.java 100.00% <100.00%> (ø)
...rch/knn/index/mapper/KNNVectorFieldMapperUtil.java 90.47% <100.00%> (+3.63%) ⬆️
...rg/opensearch/knn/index/query/KNNQueryBuilder.java 84.10% <100.00%> (+1.00%) ⬆️
... and 3 more

... and 1 file with indirect coverage changes

Copy link
Member

@martin-gaievski martin-gaievski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add uni tests in scripting for BYTE type. Seems that in this PR we only add FLOAT to existing test cases.

@naveentatikonda
Copy link
Member Author

Can we also add uni tests in scripting for BYTE type. Seems that in this PR we only add FLOAT to existing test cases.

Will raise a separate PR to improve test coverage. Will add these unit tests in that PR

@naveentatikonda naveentatikonda force-pushed the add_doc_values_support_byte_vector branch 2 times, most recently from 081d157 to 4c24661 Compare July 7, 2023 21:42
@naveentatikonda
Copy link
Member Author

Build is failing due to this change in core opensearch-project/OpenSearch#8312

@navneet1v
Copy link
Collaborator

@naveentatikonda can you make build and test github action for this change.

@navneet1v
Copy link
Collaborator

@naveentatikonda Seems like there are some changes in core.

/home/runner/work/k-NN/k-NN/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java:21: error: constructor CodecService in class CodecService cannot be applied to given types;

        super(codecServiceConfig.getMapperService(), codecServiceConfig.getLogger());
        ^
  required: MapperService,IndexSettings,Logger
  found:    MapperService,Logger
  reason: actual and formal argument lists differ in length
/home/runner/work/k-NN/k-NN/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java:21: error: incompatible types: Logger cannot be converted to IndexSettings
        super(codecServiceConfig.getMapperService(), codecServiceConfig.getLogger());
                                                                                 ^
> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
2 errors

You might need to fix it.

@naveentatikonda
Copy link
Member Author

@naveentatikonda can you make build and test github action for this change.

@naveentatikonda can you make build and test github action for this change.

Yes, the build was succeeding for this PR before opensearch core made this change.

@naveentatikonda
Copy link
Member Author

@naveentatikonda Seems like there are some changes in core.

/home/runner/work/k-NN/k-NN/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java:21: error: constructor CodecService in class CodecService cannot be applied to given types;

        super(codecServiceConfig.getMapperService(), codecServiceConfig.getLogger());
        ^
  required: MapperService,IndexSettings,Logger
  found:    MapperService,Logger
  reason: actual and formal argument lists differ in length
/home/runner/work/k-NN/k-NN/src/main/java/org/opensearch/knn/index/codec/KNNCodecService.java:21: error: incompatible types: Logger cannot be converted to IndexSettings
        super(codecServiceConfig.getMapperService(), codecServiceConfig.getLogger());
                                                                                 ^
> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
2 errors

You might need to fix it.

Sure, let me check and raise a separate PR to fix this. I thought Junqiu was working on this to fix the build on Friday.

@navneet1v
Copy link
Collaborator

@naveentatikonda he was working on fixing the 2.x branch.


@Override
public float[] getValue(BytesRef binaryValue) {
float[] vector = new float[binaryValue.length];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a float[] for byte type? Shouldnt it be an int[]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using this method for scripting functions to retrieve the vector from docValues to calculate the score. SO, it doesn't make any difference if we return it as int[] or float[]. If we return it as int[] then again we need to do some method overloading and add methods for the spacetype functions in ScoringUtils to accept int[].

import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

import static org.opensearch.knn.common.KNNConstants.DEFAULT_VECTOR_DATA_TYPE_FIELD;
import static org.opensearch.knn.common.KNNConstants.KNN_METHOD;
import static org.opensearch.knn.common.KNNConstants.VECTOR_DATA_TYPE_FIELD;
import static org.opensearch.knn.index.KNNSettings.KNN_INDEX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do changes in this file related to adding docvalues support for lucene byte sized vectors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, here we are trying to add some extra validation checks wrt knnIndex setting and knn engine. Also, to ingest doc values for byte vectors using script scoring

}

@SneakyThrows
private void ingestL2ByteTestData() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be L2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought of creating different methods for test data for other spacetypes later. So, I created it with L2 as I'm using it for spacetype L2.

@naveentatikonda naveentatikonda force-pushed the add_doc_values_support_byte_vector branch 2 times, most recently from 185a7a0 to 598ab43 Compare July 11, 2023 21:40
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
@naveentatikonda naveentatikonda force-pushed the add_doc_values_support_byte_vector branch from 598ab43 to 0eb0adf Compare July 11, 2023 21:54
@naveentatikonda naveentatikonda merged commit 5b7c44e into opensearch-project:feature/lucene_byte_vector Jul 11, 2023
12 of 36 checks passed
naveentatikonda added a commit to naveentatikonda/k-NN that referenced this pull request Jul 12, 2023
naveentatikonda added a commit to naveentatikonda/k-NN that referenced this pull request Jul 12, 2023
…t#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit bf04854)
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit bf04854)
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit bf04854)
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
naveentatikonda added a commit that referenced this pull request Jul 12, 2023
* Add Indexing Support for Lucene Byte Sized Vector (#937)

* Add Indexing Support for Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add tests for Indexing

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add Querying Support to Lucene Byte Sized Vector (#956)

* Add Querying Support to Lucene Byte Sized Vector

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add CHANGELOG

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Address Review Comments

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Add DocValues Support for Lucene Byte Sized Vector (#953)

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

* Update Release Notes

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>

---------

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
(cherry picked from commit bf04854)
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Features Introduces a new unit of functionality that satisfies a requirement v2.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants