In [None]:
# Import python packages
import streamlit as st
import pandas as pd

# We can also use Snowpark for our analyses!
from snowflake.snowpark.context import get_active_session
session = get_active_session()


# Building a Multi-Index Search Service 

In this notebook, we are going to look at building a multi-index Cortex Search service. 

**Use Case: JIRA Ticket Similarity** --> We will look at a dataset of Jira tickets from one system and compare them for similarity against tickets from another system. Our JIRA tickets have multiple columns and tags that we want to search over across systems. 

Items we want to search over include:
* Jira Ticket Title: High-level title written by the user 
* Jira Ticket Description: Detailed description of what is happening in the JIRA ticket 
* Jira Ticket Component: Category or tag for what is failing in the ticket 

Beyond that, we have this additional information on our JIRA tickets:
* Created Date: When the ticket was created 
* Priority: Whether the ticket is high, medium, or low priority for resolution 
* Status: Whether the ticket is open, being worked on, or closed 

In [None]:
-- Preview a sample of the ticket data in System A.
SELECT * FROM 
DEMODB.MULTIINDEXSEARCHJIRA.SYSTEM_A
LIMIT 10 ;

## Creating the Cortex Search Service 

Now that we have seen our ticket data, we are going to create the Cortex Search service over this ticket data. 

When working with multi-index Cortex Search, there are two kinds of index we can define: **text indexes** and **vector indexes**. 

**Text Indexes** 

Text indexes are indexes used for fields where exact or fuzzy keyword matches are important (e.g., product codes, names, categories).

In our example, a few text indexes could be:
* Component --> keyword categories 
* Title --> looking for keywords or component names in the title 

**Vector Indexes** 

Vector indexes are indexes used for fields with longer text content where semantic understanding is valuable (e.g., descriptions, reviews).

In our example, a few vector indexes could be:
* Description --> needs a semantic understanding of what the ticket is about 
* Title --> needs a semantic understanding of the user's short summary 

**Other Attributes in Search** 

As with standard Cortex Search, we can also declare attributes that let our end users tune a search to their needs by filtering the results on those columns. 

In this example, a few attributes could be: 
* Created Date --> filter to recent tickets, or to tickets created during a specific error window 
* Priority --> filter to only high-priority tickets when looking for matches 
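
As a rough illustration of how such attribute filters might be assembled, the sketch below builds a Cortex Search filter object in Python. It assumes the `@eq`, `@gte`, `@lte`, and `@and` filter operators and attribute columns named `priority` and `created_at`; the helper name `build_filter` and the values `"High"` and `"2024-01-01"` are hypothetical placeholders, not values taken from the dataset.

```python
import json


def build_filter(priority=None, created_from=None, created_to=None):
    """Combine optional attribute conditions into one filter object.

    Hypothetical helper: returns None when no condition is supplied.
    """
    clauses = []
    if priority is not None:
        clauses.append({"@eq": {"priority": priority}})
    if created_from is not None:
        clauses.append({"@gte": {"created_at": created_from}})
    if created_to is not None:
        clauses.append({"@lte": {"created_at": created_to}})

    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"@and": clauses}


# Placeholder values; adjust to whatever the priority column really contains.
ticket_filter = build_filter(priority="High", created_from="2024-01-01")
print(json.dumps(ticket_filter, indent=2))
```

The resulting object would be passed under a `"filter"` key in the query payload, next to `"columns"` and `"limit"`.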



In [None]:
CREATE OR REPLACE CORTEX SEARCH SERVICE jira_tickets_multi_index
TEXT INDEXES (title, component)
VECTOR INDEXES (
    title (model = 'snowflake-arctic-embed-m-v1.5'),
    description (model = 'snowflake-arctic-embed-m-v1.5')
)
ATTRIBUTES (priority, status, created_at)
WAREHOUSE = 'COMPUTE_WH'
TARGET_LAG = '365 days'
AS 
    SELECT 
        ticket_id, 
        title,
        description,
        priority,
        status,
        created_at,
        component
    FROM SYSTEM_A

## Query Multi-Index on Examples 

In [None]:
-- A look at tickets to compare 
SELECT * 
FROM SYSTEM_B LIMIT 1

### Simple Query 

Here we can see that a multi-index service can be queried like a standard Cortex Search service, without specifying an index. 

In [None]:
SELECT
  value['title']::text as title, value['description']::text as description, value['component']::text as component
FROM TABLE(FLATTEN(PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
      'JIRA_TICKETS_MULTI_INDEX',
      '{
        "query": "tech repair shop",
        "columns": ["title", "component", "description"],
        "limit": 2
      }'
  ))['results']));
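
If the search text comes from Python rather than being typed into a SQL cell, the JSON payload has to be embedded safely in the SQL string. The sketch below is one possible way to do that; `build_search_preview_sql` is a hypothetical helper, and it assumes the payload is placed in a single-quoted Snowflake string literal, where backslashes and single quotes need escaping. The service name is inserted as-is, so it should come from trusted code only.

```python
import json


def build_search_preview_sql(service_name, payload):
    """Return a SELECT that calls SEARCH_PREVIEW with `payload` as a JSON literal."""
    body = json.dumps(payload)
    # Escape for a single-quoted SQL literal: backslashes first, then quotes.
    literal = body.replace("\\", "\\\\").replace("'", "''")
    return (
        "SELECT value['title']::text AS title\n"
        "FROM TABLE(FLATTEN(PARSE_JSON(\n"
        f"  SNOWFLAKE.CORTEX.SEARCH_PREVIEW('{service_name}', '{literal}')\n"
        ")['results']))"
    )


# Example payload with an apostrophe to show the escaping.
payload = {"query": "user's login fails", "columns": ["title"], "limit": 2}
sql = build_search_preview_sql("JIRA_TICKETS_MULTI_INDEX", payload)
print(sql)

# In the notebook, the statement could then be run with the active session:
# session.sql(sql).to_pandas()
```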



### Multi-index query

Here we issue a multi-index query that searches the ticket title against the `title` text and vector indexes, and the ticket description against the `description` vector index. This query does not set a `scoring_config`, so the text and vector components keep equal weighting in the final score.

In [None]:
SELECT
    value['title']::text as title, value['description']::text as description, value['component']::text as component
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'JIRA_TICKETS_MULTI_INDEX',
        '{
          "multi_index_query": {
            "title": [
              {"text": "Authentication failing in staging"}
            ],
            "description": [
              {"text": "Users see 401 errors when signing into the staging environment"}
            ]
          },
          "columns": ["title", "component", "description"],
          "limit": 2
        }'
    )
)['results']));
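
For the ticket-similarity use case, the per-field query texts would typically come from a ticket in the other system rather than being typed by hand. The sketch below shows one way to turn a ticket row into the payload used above. The function name `ticket_to_multi_index_payload` and the sample ticket are hypothetical; the payload keys mirror the query in the previous cell.

```python
import json


def ticket_to_multi_index_payload(ticket, limit=2):
    """Build a multi-index query payload from a ticket dict.

    Fields that are missing or empty are left out of the query.
    """
    multi_index_query = {}
    for field in ("title", "description"):
        text = (ticket.get(field) or "").strip()
        if text:
            multi_index_query[field] = [{"text": text}]
    return {
        "multi_index_query": multi_index_query,
        "columns": ["title", "component", "description"],
        "limit": limit,
    }


# Hypothetical ticket, shaped like a row from the second system.
sample_ticket = {
    "title": "Authentication failing in staging",
    "description": "Users see 401 errors when signing into the staging environment",
}
print(json.dumps(ticket_to_multi_index_payload(sample_ticket), indent=2))
```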

### Weight Modification 

Now we run the same query as above, but with a weight of 4 on `vectors` in the `scoring_config`. This gives vector search scores a 4x weight relative to the text retrieval and reranking scores in the final scoring process. 

In [None]:
SELECT
    value['title']::text as title, value['description']::text as description, value['component']::text as component
FROM TABLE(FLATTEN(PARSE_JSON(
    SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'JIRA_TICKETS_MULTI_INDEX',
        '{
          "multi_index_query": {
            "title": [
              {"text": "Authentication failing in staging"}
            ],
            "description": [
              {"text": "Users see 401 errors when signing into the staging environment"}
            ]
          },
          "scoring_config": {
            "weights": {
              "texts": 1,
              "vectors": 4,
              "reranker": 1
            }
          },
          "columns": ["title", "component", "description"],
          "limit": 2
        }'
    )
)['results']));
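
To get a feel for what a 1 / 4 / 1 weighting means, the sketch below converts the weights into relative shares. This is only arithmetic on the weights themselves: it assumes they act proportionally, and the exact formula the service uses to combine scores is not shown in this notebook. `weight_shares` is a hypothetical helper.

```python
def weight_shares(weights):
    """Return each weight as a fraction of the total weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


shares = weight_shares({"texts": 1, "vectors": 4, "reranker": 1})
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
# texts: 0.167
# vectors: 0.667
# reranker: 0.167
```

Under this proportional reading, vector scores account for two thirds of the total weight, compared with one third when all three weights are equal.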

### Multi-Index Per-Field Boosts 

**Multi-index query with per-field boosts:** Here we query the `title`, `component`, and `description` fields together and apply relative per-field weights: text matches on `title` count twice as much as text matches on `component`, and vector matches on `description` count twice as much as vector matches on `title`. You might use this type of boost to make a keyword hit in the ticket title matter more than a hit on the component tag (for example, if users are expected to search by title more often).

In [None]:
SELECT
  value['title']::text       AS title,
  value['description']::text AS description,
  value['component']::text   AS component,
  value['_score']::float     AS score
FROM TABLE(FLATTEN(PARSE_JSON(
  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
    'JIRA_TICKETS_MULTI_INDEX',
    '{
      "multi_index_query": {
        "title": [
          {"text": "Authentication failing in staging"}
        ],
        "component": [
          {"text": "Data"}
        ],
        "description": [
          {"text": "Users see 401 errors when signing into the staging environment"}
        ]
      },
      "scoring_config": {
        "weights": {
          "texts": 1,
          "vectors": 4,
          "reranker": 1
        },
        "functions": {
          "text_boosts": [
            {"column": "title", "weight": 2},
            {"column": "component", "weight": 1}
          ],
          "vector_boosts": [
            {"column": "description", "weight": 2},
            {"column": "title", "weight": 1}
          ]
        }
      },
      "columns": ["title", "component", "description"],
      "limit": 2
    }'
  )
)['results']));
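
The `scoring_config` above can get verbose when tuned by hand. As a sketch, the boosts could be generated from simple column-to-weight mappings; `boosts` is a hypothetical helper, and the structure it produces simply mirrors the JSON in the previous cell.

```python
import json


def boosts(column_weights):
    """Turn {"column": weight} into the list-of-objects form used by scoring_config."""
    return [{"column": column, "weight": weight}
            for column, weight in column_weights.items()]


scoring_config = {
    "weights": {"texts": 1, "vectors": 4, "reranker": 1},
    "functions": {
        "text_boosts": boosts({"title": 2, "component": 1}),
        "vector_boosts": boosts({"description": 2, "title": 1}),
    },
}
print(json.dumps(scoring_config, indent=2))
```

Keeping the weights in plain dictionaries makes it easier to try several combinations and compare the returned tickets side by side.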

