ISSUE-325: Vectors for each king, one float to rule them all #326

Merged: 15 commits, Jun 20, 2024

@DiegoPino (Member) commented May 14, 2024

See #325

This stuff works now.

This requires fixed entries in schema_extra_fields.xml, like:

<!-- ML/vectors -->
<dynamicField name="knn576m_*" type="knn_vector_576" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn576s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn512m_*" type="knn_vector_512" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn512s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn1024m_*" type="knn_vector_1024" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn1024s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn384m_*" type="knn_vector_384" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn384s_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn576m_X3b_und_*" type="knn_vector_576" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn576s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn512m_X3b_und_*" type="knn_vector_512" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn512s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn1024m_X3b_und_*" type="knn_vector_1024" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn1024s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />
<dynamicField name="knn384m_X3b_und_*" type="knn_vector_384" stored="true" indexed="true" multiValued="false"/>
<dynamicField name="knn384s_X3b_und_*" type="pfloat" stored="true" indexed="true" multiValued="false" />

plus the provided field types in schema_extra_types.xml:

<!--
  Dense Vector Field of 384 dimensions suitable for BERT text feature extraction (embeddings) using dot_product as comparison algorithm
  9.0.0
-->
<fieldType name="knn_vector_384" class="solr.DenseVectorField" vectorDimension="384" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 512 dimensions suitable for Apple Vision ML Image FingerPrint (embeddings) using dot_product as comparison algorithm
  9.0.0
-->
<fieldType name="knn_vector_512" class="solr.DenseVectorField" vectorDimension="512" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 576 dimensions suitable for YOLOv8 feature extraction using dot_product as comparison algorithm
  9.0.0
-->
<fieldType name="knn_vector_576" class="solr.DenseVectorField" vectorDimension="576" similarityFunction="dot_product"/>
<!--
  Dense Vector Field of 1024 dimensions suitable for MobileNetV3 feature extraction (embeddings) using dot_product as comparison algorithm
  9.0.0
-->
<fieldType name="knn_vector_1024" class="solr.DenseVectorField" vectorDimension="1024" similarityFunction="dot_product"/>
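All four types use similarityFunction="dot_product", which Solr documents as requiring vectors normalized to unit length (an L2 norm of 1). It also means L1 normalization is not a substitute: an L1-normalized vector is generally not of unit L2 length. A minimal Python sketch, assuming embeddings arrive as plain lists of floats; `l2_normalize`, `l1_normalize` and `l2_length` are hypothetical helpers, not functions from strawberry_runners:

```python
import math
from typing import List


def l2_length(vector: List[float]) -> float:
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 length, as dot_product similarity expects."""
    norm = l2_length(vector)
    if norm == 0.0:
        # An all-zero vector has no direction and cannot be made unit length.
        raise ValueError("cannot L2-normalize an all-zero vector")
    return [x / norm for x in vector]


def l1_normalize(vector: List[float]) -> List[float]:
    """Scale a vector so its absolute values sum to 1 (L1 normalization)."""
    total = sum(abs(x) for x in vector)
    if total == 0.0:
        raise ValueError("cannot L1-normalize an all-zero vector")
    return [x / total for x in vector]


if __name__ == "__main__":
    v = [3.0, 4.0]
    print(l2_length(l2_normalize(v)))  # 1.0 up to float rounding
    print(l2_length(l1_normalize(v)))  # about 0.714: not a unit vector
```

Both the indexed vectors and the query vector need the same L2 normalization for dot_product scores to behave like cosine similarity.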

Supporting code provided by esmero/strawberry_runners#92

@DiegoPino added Future ML Algorithms fed with human labor labels May 14, 2024
@DiegoPino added this to the 1.4.0 milestone May 14, 2024
@DiegoPino self-assigned this May 14, 2024
(and we do that now with our sbrannotations endpoints
…all 0s..

Why, Search API? I need my vectors to stay the same size I created them!

The only choice without patching everything is to alter the Solarium document at the last minute and restore all our 0.00000 values. @alliomeria, Archipelago is becoming complex. Maybe we can discuss moving all ML into a different module after the conferences? It would be months of rewriting.
(Another issue: normalization, L1 vs. L2.)
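Whichever way the Solarium document ends up being patched to restore the dropped 0.00000 values, a cheap last-minute safeguard is to check that every vector still has the dimension its field type declares, since Solr refuses to index a DenseVectorField value whose length differs from `vectorDimension`. A Python sketch, assuming documents are plain dicts and that the dimension is encoded in the field prefix as `knn<dimension>m_` (as in `knn512m_*`); `check_vector_dimensions` and the sample field names are hypothetical:

```python
import re
from typing import Dict, List

# Assumed naming convention: knn<dimension>m_<rest> holds the vector itself.
VECTOR_FIELD = re.compile(r"^knn(\d+)m_")


def check_vector_dimensions(doc: Dict[str, object]) -> List[str]:
    """Return one error message per vector field whose length is wrong."""
    errors = []
    for name, value in doc.items():
        match = VECTOR_FIELD.match(name)
        if not match:
            continue  # not a vector field (e.g. the knn<dim>s_ float fields)
        expected = int(match.group(1))
        # Anything that is not a list is counted as length 0.
        actual = len(value) if isinstance(value, list) else 0
        if actual != expected:
            errors.append(f"{name}: expected {expected} values, got {actual}")
    return errors


if __name__ == "__main__":
    doc = {
        "knn512m_X3b_und_image": [0.0] * 512,  # intact, zeros kept
        "knn384m_X3b_und_text": [0.1] * 380,   # shortened somewhere upstream
        "knn384s_X3b_und_text": 0.5,           # single float, skipped
    }
    for err in check_vector_dimensions(doc):
        print(err)  # prints: knn384m_X3b_und_text: expected 384 values, got 380
```

The check only detects the damage; restoring the zeros still has to happen wherever the document is altered.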
@DiegoPino (Member, Author)

@alliomeria, this still requires a few extra lines of code. It works well, but if I want to search for vectors and also apply filters, I need to be able to decide whether a filter applies BEFORE the vector search (I reduce the set, then run the expensive KNN search) or AFTER it (e.g. if I only want results with a score above 0.7).
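The two options map roughly onto two request shapes. A Python sketch with hypothetical helper and field names: `knn_params` builds a `{!knn f=... topK=...}` main query plus `fq` filters (in recent Solr 9.x releases, filter queries sent with a knn main query narrow the candidates before the top-K search, but the exact behavior depends on the Solr version; see the dense vector search guide linked in the next comment), while `apply_score_cutoff` is the "after" case, dropping returned documents whose score is not above a threshold:

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def knn_params(field: str, vector: List[float], top_k: int,
               filters: Optional[List[str]] = None) -> Dict[str, object]:
    """Build request params for a {!knn} main query with optional fq filters."""
    vec = "[" + ", ".join(repr(float(x)) for x in vector) + "]"
    params: Dict[str, object] = {
        "q": f"{{!knn f={field} topK={top_k}}}{vec}",
        "fl": "id,score",  # ask for the score so it can be cut off client-side
    }
    if filters:
        params["fq"] = list(filters)  # the "before" case: narrows candidates
    return params


def apply_score_cutoff(docs: List[Dict[str, object]],
                       min_score: float) -> List[Dict[str, object]]:
    """The "after" case: keep returned docs scoring strictly above min_score."""
    return [d for d in docs if float(d["score"]) > min_score]


if __name__ == "__main__":
    params = knn_params("knn512m_example", [0.5, 0.25], 10, ["type:image"])
    print(urlencode(params, doseq=True))  # repeated fq for each filter
    hits = [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.64}]
    print(apply_score_cutoff(hits, 0.7))
```

One caveat worth verifying: Lucene scores `dot_product` as (1 + dot) / 2 rather than the raw dot product, so a score cutoff of 0.7 corresponds to a dot product of 0.4.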

@DiegoPino (Member, Author)

This explains it in a bit more depth:
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
This is especially important when using facets. Do we want facets applied before the KNN search (so we always get, e.g., 10 results, as long as enough documents match) or after it (so we may get fewer than 10 even if topK is 10)?
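A toy simulation of the difference, using an exact brute-force search as a stand-in for Solr's approximate HNSW search; the documents, the `even` filter and the helper names are all made up for illustration:

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k(docs: List[Doc], query: List[float], k: int) -> List[Doc]:
    """Exact nearest neighbours by dot product, best first."""
    return sorted(docs, key=lambda d: dot(d["vec"], query), reverse=True)[:k]


def pre_filter(docs: List[Doc], query: List[float], k: int,
               keep: Callable[[Doc], bool]) -> List[Doc]:
    """Filter first, then search: up to k results from the matching docs."""
    return top_k([d for d in docs if keep(d)], query, k)


def post_filter(docs: List[Doc], query: List[float], k: int,
                keep: Callable[[Doc], bool]) -> List[Doc]:
    """Search first, then filter: whatever part of the top k survives."""
    return [d for d in top_k(docs, query, k) if keep(d)]


def is_even(d: Doc) -> bool:
    return bool(d["even"])


if __name__ == "__main__":
    # 30 made-up docs; doc i scores i against the query, half of them match.
    docs = [{"id": i, "vec": [float(i), 0.0], "even": i % 2 == 0}
            for i in range(30)]
    query = [1.0, 0.0]
    print(len(pre_filter(docs, query, 10, is_even)))   # 10
    print(len(post_filter(docs, query, 10, is_even)))  # 5
```

Pre-filtering returns 10 results as long as at least 10 documents match the filter; post-filtering returns only the part of the original top 10 that survives it, here 5.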

@DiegoPino merged commit 302e933 into 1.4.0 on Jun 20, 2024