# Experiment with searches

## Table of contents

<ol style="list-style: none; margin: 20px 0px 0px 0px; padding: 0px">
<li style="margin: 0px 0px 3px 0px;"><b>Step 1:</b> Authenticate with Watson Discovery</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 2:</b> Run some Discovery query language searches</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 3:</b> Format results</li>
</ol>

## Step 1: Authenticate with Watson Discovery

### 1.1 Create a service instance

Create an instance of the IBM Watson Discovery service.  

See: [IBM Watson Discovery in the IBM Cloud catalog](https://cloud.ibm.com/catalog/services/watson-discovery)


### 1.2 Get the API key and URL for your service instance

From the "Manage" page of your Discovery service instance in IBM Cloud, copy the API key and URL into the cell below.

In [None]:
g_discovery_apikey = ""
g_discovery_url = ""
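If you'd rather not paste credentials into the notebook, the following minimal sketch reads them from environment variables instead. The variable names `DISCOVERY_APIKEY` and `DISCOVERY_URL` and the helper name `loadDiscoveryCredentials` are hypothetical. Use whatever names you set in your own environment.

```python
import os

def loadDiscoveryCredentials( apikey_var="DISCOVERY_APIKEY", url_var="DISCOVERY_URL" ):
    # Hypothetical environment variable names; change them to match your setup
    apikey = os.environ.get( apikey_var, "" ).strip()
    url = os.environ.get( url_var, "" ).strip()
    # Fail early with a clear message instead of a confusing authentication error later
    missing = [ name for name, value in ( ( apikey_var, apikey ), ( url_var, url ) ) if not value ]
    if missing:
        raise ValueError( "Missing environment variable(s): " + ", ".join( missing ) )
    return apikey, url

# g_discovery_apikey, g_discovery_url = loadDiscoveryCredentials()
```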

### 1.3 Install `ibm_watson` library

See: [IBM Watson Discovery v2 API](https://cloud.ibm.com/apidocs/discovery-data?code=python)

In [None]:
!pip install ibm_watson

### 1.4 Authenticate

See: [Discovery authentication for IBM Cloud](https://cloud.ibm.com/apidocs/discovery-data?code=python#authentication-cloud)

In [None]:
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator( g_discovery_apikey )
g_discovery = DiscoveryV2( version="2020-08-30", authenticator=authenticator )

g_discovery.set_service_url( g_discovery_url )

## Step 2: Run some Discovery query language searches

### 2.1 Get Discovery project ID

In the [Upload documents to Discovery](https://github.com/spackows/MURAL-API-Samples/blob/main/notebooks/Discovery_02-Upload-documents-to-Discovery.ipynb) notebook, data was uploaded to Watson Discovery for search.

In that notebook, the call to `createDiscoveryProject` returned a project ID. Paste that project ID into the cell below.

In [None]:
g_discovery_proj_id = ""
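If you no longer have the project ID, you can look it up by project name. This minimal sketch assumes the response from the SDK's `list_projects()` method has a `projects` list whose entries carry `project_id` and `name` keys. The helper name `findProjectId` is hypothetical, and the project name is a placeholder for whatever you called your project.

```python
def findProjectId( discovery, project_name ):
    # Assumes list_projects() returns { "projects" : [ { "project_id" : ..., "name" : ... }, ... ] }
    projects = discovery.list_projects().get_result().get( "projects", [] )
    matches = [ p["project_id"] for p in projects if p.get( "name" ) == project_name ]
    # Refuse to guess when the name is missing or ambiguous
    if len( matches ) != 1:
        raise ValueError( "Expected exactly one project named %r, found %d" % ( project_name, len( matches ) ) )
    return matches[0]

# g_discovery_proj_id = findProjectId( g_discovery, "<your project name>" )
```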

### 2.2 Basic, empty search

An empty query matches all documents, although each call returns only one page of results (by default, the first 10).

See: [Discovery query](https://cloud.ibm.com/apidocs/discovery-data?code=python#query)

In [None]:
import json

response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query=""
                            ).get_result()

print( json.dumps( response, indent=2 ) )
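To collect every document rather than just the first page, the following minimal sketch pages through the results with the `count` and `offset` parameters of `query()`. It stops after it collects `matching_results` documents or receives an empty page. The helper name `queryAllDocuments` and the page size of 50 are illustrative choices. The API reference also documents an upper limit on `count` plus `offset` per query, so very large projects might need a different approach.

```python
def queryAllDocuments( discovery, project_id, page_size=50 ):
    # Page through an empty query using the count and offset parameters
    all_results = []
    offset = 0
    while True:
        response = discovery.query( project_id=project_id,
                                    query="",
                                    count=page_size,
                                    offset=offset ).get_result()
        results = response.get( "results", [] )
        all_results.extend( results )
        offset += len( results )
        # Stop on an empty page or once every matching document has been fetched
        if ( not results ) or ( offset >= response.get( "matching_results", 0 ) ):
            break
    return all_results

# all_docs = queryAllDocuments( g_discovery, g_discovery_proj_id )
```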

### 2.3 Empty search with filter

In this case, the filter limits the results to murals in the room named "Sarah's room".

See:
- [Discovery query language](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-query-dql-overview)
- [Filters](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-query-parameters#filter)
- [Query operators](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-query-operators)

In [None]:
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="",
                              filter="room.name::Sarah's room"
                            ).get_result()

print( json.dumps( response, indent=2 ) )
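When filter values come from user input, it can help to build the filter string programmatically. The following minimal sketch wraps the value in double quotes so that spaces and DQL operator characters are treated literally, and escapes backslashes and double quotes inside the value. It also joins clauses with `,`, which the Query operators page describes as AND. The quoting and escaping rules are assumptions based on that page; check them against your data. The helper names `exactMatchFilter` and `allOf` are hypothetical.

```python
def exactMatchFilter( field, value ):
    # Quote the value so spaces and DQL operator characters are treated literally;
    # inside the quotes, escape backslashes and double quotes with a backslash
    escaped = value.replace( "\\", "\\\\" ).replace( '"', '\\"' )
    return field + '::"' + escaped + '"'

def allOf( *filters ):
    # In DQL, a comma between clauses means AND
    return ",".join( filters )

print( exactMatchFilter( "room.name", "Sarah's room" ) )  # prints room.name::"Sarah's room"
```

You could then pass, for example, `filter=allOf( exactMatchFilter( "room.name", "Sarah's room" ), exactMatchFilter( "workspace_name", "<your workspace>" ) )` to `g_discovery.query()`.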

### 2.4 Search with a non-empty query

Searching for a simple string `bear zebra` returns the document for the mural called "Zoo animals".

In [None]:
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="bear zebra"
                            ).get_result()

print( json.dumps( response, indent=2 ) )
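Printing the whole response gets long quickly. The following minimal sketch gives a quicker overview. It assumes that each result carries `document_id` and `result_metadata.confidence`, as in the Discovery v2 query response, plus the `title` field that the mural documents in this notebook use. If a field is absent, the sketch falls back to an empty string or `None`. The helper name `summarizeResponse` is hypothetical.

```python
def summarizeResponse( response ):
    # Return ( document_id, title, confidence ) for each result in a raw query response
    summary = []
    for result in response.get( "results", [] ):
        confidence = result.get( "result_metadata", {} ).get( "confidence" )
        summary.append( ( result.get( "document_id", "" ), result.get( "title", "" ), confidence ) )
    return summary

# print( response.get( "matching_results", 0 ), "matching documents" )
# for document_id, title, confidence in summarizeResponse( response ):
#     print( document_id, title, confidence )
```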

### 2.5 Include passages of emphasized text in results

Use the `passages` query parameter to include passages in query results. Passages are excerpts from the specified fields, with the matched terms wrapped in `<em>` tags.

Passages aren't strictly necessary, but they're useful for seeing the actual text that caused Discovery to select a given document as a match.

See: [Passages](https://cloud.ibm.com/docs/discovery-data?topic=discovery-data-query-parameters#passages)

In [None]:
# See passages of matches in sticky notes
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="bear zebra",
                              passages={ "fields" : [ "sticky_arr.text" ] }
                            ).get_result()

print( json.dumps( response["results"][0]["document_passages"], indent=2 ) )

In [None]:
# See passages of matches in text widgets
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="bear zebra",
                              passages={ "fields" : [ "text_arr.text" ] }
                            ).get_result()

print( json.dumps( response["results"][0]["document_passages"], indent=2 ) )
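Because the matched terms in `passage_text` are wrapped in `<em>` tags, a couple of regular expressions are enough to list which terms matched or to get plain text back. The following sketch shows both. The helper names and the sample string in the usage comment are hypothetical.

```python
import re

def emphasizedTerms( passage_text ):
    # Return the text inside each <em>...</em> pair, in order of appearance
    return re.findall( r"<em>(.*?)</em>", passage_text, re.DOTALL )

def stripEmphasis( passage_text ):
    # Remove the <em> and </em> tags, leaving plain text
    return re.sub( r"</?em>", "", passage_text )

# for passage in response["results"][0]["document_passages"]:
#     print( passage["field"], emphasizedTerms( passage["passage_text"] ) )
```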

### 2.6 Search in specific fields

Searching for `Turkey bear` returns two murals: **Farm animals** and **Zoo animals**.

But using the Discovery query language to search only the `sticky_arr.text` field returns just the "Polar bear" match in the **Zoo animals** mural. "Polar bear" is on a sticky note, while "Turkey" is on a shape widget, so the **Farm animals** mural no longer matches.

In [None]:
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="Turkey bear"
                            ).get_result()

print( json.dumps( response, indent=2 ) )

In [None]:
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="sticky_arr.text:Turkey bear"
                            ).get_result()

print( json.dumps( response, indent=2 ) )
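To make the field scoping explicit for each term, you can attach the field to every term with the `:` (contains) operator and join the clauses with `|` (OR) or `,` (AND), as described on the Query operators page. The following minimal sketch builds such a string. The helper name `fieldQuery` is hypothetical. Terms that contain spaces or DQL operator characters would still need quoting.

```python
def fieldQuery( field, terms, operator="|" ):
    # Scope every term to one field with ":" (contains) and join the clauses
    # with "|" (OR) or "," (AND)
    if operator not in ( "|", "," ):
        raise ValueError( 'operator must be "|" (OR) or "," (AND)' )
    return operator.join( field + ":" + term for term in terms )

print( fieldQuery( "sticky_arr.text", [ "Turkey", "bear" ] ) )  # prints sticky_arr.text:Turkey|sticky_arr.text:bear
```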

## Step 3: Format results

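The raw JSON results from Discovery contain a lot of information. The following function returns a subset: the mural details (workspace, room, title, link, thumbnail, creation time, and creator) and the text of each widget whose passage contains an emphasized match.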

In [None]:
import re

def formatResults( query_results ):
    # Pull a small, readable subset of fields out of each raw Discovery result
    formatted_results = []
    for result in query_results:
        room = result.get( "room", {} )
        formatted_result = { "workspace_name"  : result.get( "workspace_name", "" ),
                             "room_name"       : room.get( "name", "" ) if isinstance( room, dict ) else "",
                             "mural_title"     : result.get( "title", "" ),
                             "mural_link"      : result.get( "link", "" ),
                             "mural_thumbnail" : result.get( "thumbnail", "" ),
                             "created"         : result.get( "created", -1 ),
                             "creator"         : result.get( "creator", "" ),
                             "passages"        : [] }
        for passage in result.get( "document_passages", [] ):
            passage_text = passage.get( "passage_text", "" )
            passage_field = passage.get( "field", "" )
            # Widget text is expected to look like __SPLTB__<widget id>|<text>__SPLTE__;
            # re.DOTALL lets a subpassage span line breaks in the widget text
            subpassages = re.findall( r"__SPLTB__.*?__SPLTE__", passage_text, re.DOTALL )
            for subpassage in subpassages:
                # Keep only widgets that contain a highlighted (<em>) match
                if "<em>" not in subpassage:
                    continue
                subpassage = re.sub( r"^__SPLTB__", "", subpassage )
                subpassage = re.sub( r"__SPLTE__$", "", subpassage )
                parts = subpassage.split( "|", 1 )
                if len( parts ) > 1:
                    widget_id, subpassage = parts
                    # For example, "sticky_arr.text" becomes "sticky"
                    widget_type = re.sub( r"_arr.*$", "", passage_field )
                    subpassage = "[ " + widget_type + " ] " + subpassage
                    shape = getShape( widget_id, passage_field, result )
                    if shape is not None:
                        subpassage = "[ " + shape + " ] " + subpassage
                    formatted_result["passages"].append( subpassage )
        formatted_results.append( formatted_result )
    return formatted_results

def getShape( widget_id, passage_field, result ):
    # Return the "shape" value of the widget with this ID, or None for text
    # widgets or when no matching widget (or shape) is found
    field_parts = passage_field.split( "." )
    if ( len( field_parts ) < 2 ) or ( "text_arr" == field_parts[0] ):
        return None
    result_arr = result.get( field_parts[0] )
    # A field that holds a single widget may be a dict rather than a list
    widgets = result_arr if isinstance( result_arr, list ) else [ result_arr ]
    shape = None
    for widget in widgets:
        if isinstance( widget, dict ) and ( widget.get( "id" ) == widget_id ):
            shape = widget.get( "shape" )
            break
    if shape is not None:
        # Strip everything up to the first "|" and the trailing __SPLTE__ marker
        shape = re.sub( r"^.*?\|", "", shape )
        shape = re.sub( r"__SPLTE__$", "", shape )
    return shape
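`formatResults` expects the text of each widget inside a passage to be wrapped in `__SPLTB__` and `__SPLTE__` markers, with the widget ID before the first `|`. The following sketch isolates that parsing step so you can check it on a single string. The helper name `splitWidgetText`, the widget ID, and the text are made up for illustration.

```python
import re

def splitWidgetText( marked_text ):
    # Split "__SPLTB__<widget id>|<text>__SPLTE__" into ( widget id, text ),
    # mirroring the parsing in formatResults; return None if there is no "|"
    body = re.sub( r"^__SPLTB__", "", marked_text )
    body = re.sub( r"__SPLTE__$", "", body )
    parts = body.split( "|", 1 )
    if len( parts ) < 2:
        return None
    return parts[0], parts[1]

print( splitWidgetText( "__SPLTB__widget-123|Polar <em>bear</em>__SPLTE__" ) )  # prints ('widget-123', 'Polar <em>bear</em>')
```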

In [None]:
response = g_discovery.query( project_id=g_discovery_proj_id, 
                              query="sticky_arr.text:Turkey bear",
                              passages={ "fields" : [ "sticky_arr.text" ] }
                            ).get_result()

print( json.dumps( formatResults( response["results"] ), indent=3 ) )
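If you prefer a more compact view than indented JSON, the following minimal sketch prints one block per mural, using the keys that `formatResults` produces. The helper name `printFormattedResults` and the layout are illustrative choices.

```python
def printFormattedResults( formatted_results ):
    # Print one block per mural: title and room, link, then each matching passage
    for item in formatted_results:
        print( item.get( "mural_title", "" ) + "  (" + item.get( "room_name", "" ) + ")" )
        print( "  " + item.get( "mural_link", "" ) )
        for passage in item.get( "passages", [] ):
            print( "    - " + passage )

# printFormattedResults( formatResults( response["results"] ) )
```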