Create a search pipeline on hot swap, enabling hybrid search #3867

Merged 1 commit on Mar 7, 2024

@bmquinn (Contributor) commented Mar 7, 2024

Summary

Adds a search pipeline during hot swaps that enables hybrid search in OpenSearch (docs for reference).

Specific Changes in this PR

  • Adds Meadow.Search.Index.create_search_pipeline/4, which is now called during hot swaps.
  • Renames Meadow.Search.Index.create_vector_pipeline/4 to Meadow.Search.Index.create_ingest_pipeline/4 to match the OpenSearch naming scheme.
  • Both pipelines share one name: the prefixed work index name plus "-pipeline". This is safe because they are served from different endpoints, /_search/pipeline/[name] and /_ingest/pipeline/[name].
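
The pipeline body itself is not shown in this description. As a rough sketch only, assuming it follows the pattern in the OpenSearch hybrid search docs, a search pipeline with a normalization processor can be created like this. The description text and the normalization and combination techniques below are illustrative assumptions, not necessarily what create_search_pipeline/4 sends:

```
# Sketch only: processor settings are assumptions, not taken from this PR
PUT /_search/pipeline/[PREFIXED WORK INDEX + "-pipeline"]
{
  "description": "Normalizes and combines scores for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean"
        }
      }
    }
  ]
}
```

The normalization processor rescales the scores from each sub-query of a hybrid query before combining them, which is why the pipeline must be named in the search request for hybrid results to be ranked sensibly.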

Version bump required by the PR

See Semantic Versioning 2.0.0 for help discerning which is required.

  • Patch
  • Minor
  • Major

Steps to Test

  • Run Meadow.Data.Indexer.reindex_all() to create the search pipeline
  • Make sure you have some works indexed
  • Get the deployed model ID. The easiest way is to run Meadow.Search.Config.embedding_model_id() in iex (this will likely be a cold start)
  • Use es-proxy to start the OpenSearch dashboard and run a query:
GET /[PREFIXED WORK INDEX]/_search?search_pipeline=[PREFIXED WORK INDEX + "-pipeline"]
{
  "_source": {
    "exclude": [
      "embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text": {
              "query": "submarines and underwater vessels"
            }
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "whales",
              "model_id": "[EMBEDDING MODEL ID]",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
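
If the query fails with a missing-pipeline error, it may help to first confirm that both pipelines exist after the reindex. A minimal check, assuming the naming described under Specific Changes, is to fetch the same name from each endpoint:

```
# Both requests use the same pipeline name but hit different endpoints
GET /_search/pipeline/[PREFIXED WORK INDEX + "-pipeline"]

GET /_ingest/pipeline/[PREFIXED WORK INDEX + "-pipeline"]
```

Each request should return a definition keyed by the pipeline name. A 404 from the first one suggests the hot swap has not created the search pipeline yet.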

Also please let developers know if there are any special instructions to test this in the development environment.

🚀 Deployment Notes

Note - if you check any of these boxes go to the (always open) main <- staging PR and add detailed notes and instructions to help out others who may end up deploying your changes to production

  • Backward compatible API changes
    • Database Schema changes
    • GraphQL API
    • Elasticsearch API
    • Ingest Sheet
    • CSV metadata export/update API
    • Shared Links export API
  • Backwards-incompatible API changes
    • Database Schema changes
    • GraphQL API
    • Elasticsearch API
    • Ingest Sheet
    • CSV metadata export/update API
    • Shared Links export API
  • Requires data migration
  • Requires database triggers disabled during deployment/migration
  • Requires reindex
  • Terraform changes
    • Adds/requires new or changed Terraform variables
  • Pipeline configuration changes (requires mix meadow.pipeline.setup run)
  • Requires new variable added to miscellany
  • Specific deployment synchronization instructions with other apps/APIs
  • Other specific instructions/tasks

Tested/Verified

  • End users/stakeholders

@bmquinn bmquinn requested review from mbklein and kdid March 7, 2024 03:19
@mbklein mbklein merged commit 17acc2c into deploy/staging Mar 7, 2024
4 checks passed
@mbklein mbklein deleted the 4568-hybrid-search branch March 7, 2024 16:46