# Table of contents

<ol style="list-style: none; margin: 20px 0px 0px 0px; padding: 0px">
<li style="margin: 0px 0px 3px 0px;"><b>Step 1:</b> Read MURAL data from project asset</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 2:</b> Create local documents (files in working directory)</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 3:</b> Authenticate with Watson Discovery</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 4:</b> Create a Project in Discovery</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 5:</b> Create a Collection in Discovery</li>
<li style="margin: 0px 0px 3px 0px;"><b>Step 6:</b> Upload documents to Collection</li>
</ol>

## Step 1: Read MURAL data from project asset

### 1.1 Get project asset file details

In the [Read data from MURAL](https://github.com/spackows/MURAL-API-Samples/blob/main/notebooks/Discovery_01-Read-data-from-MURAL.ipynb) notebook, we saved mural data in a JSON file as a project asset.

To import the saved data into this notebook, click on the empty cell below and then perform these steps:
1. Open the data panel by clicking on the **Find and Add Data** icon ( <img style="margin: 0px; padding: 0px; display: inline; height: 25px;" src="https://github.com/spackows/CASCON-2019_NLP-workshops/raw/master/images/find-add-data-icon.png"/> )
2. Under the file named `documents_arr.json` click **Insert to code**
3. Select "Credentials"

Code that defines a credentials dictionary, named something like `metadata_1`, is added to the cell automatically.

Run the cell to define that dictionary object.

### 1.2 Copy file from project assets to notebook working directory

Run the next cell to define a helper routine that copies the JSON file from the project's Cloud Object Storage to the notebook working directory.

Then call the routine in the cell that comes after.

In [None]:
from ibm_botocore.client import Config
import ibm_boto3

def copyToNotebookDir( credentials ):
    cos = ibm_boto3.client(
        service_name='s3',
        ibm_api_key_id=credentials['IBM_API_KEY_ID'],
        ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
        ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
        config=Config(signature_version='oauth'),
        endpoint_url=credentials['ENDPOINT'])
    cos.download_file(Bucket=credentials['BUCKET'],Key=credentials['FILE'],Filename=credentials['FILE'])
    print( "Done: '" + credentials['FILE'] + "'" )

In [None]:
copyToNotebookDir( metadata_1 )
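
If the copy fails, one likely cause is a credentials dictionary that is missing a key the helper reads. The following is a minimal sketch of a pre-check, assuming the dictionary uses exactly the six keys referenced in `copyToNotebookDir`. The names `REQUIRED_KEYS`, `missing_credential_keys`, and `example_credentials` are hypothetical, and the values shown are placeholders, not real credentials.

```python
# Keys read by copyToNotebookDir in the cell above.
REQUIRED_KEYS = [
    "IBM_API_KEY_ID",
    "IAM_SERVICE_ID",
    "IBM_AUTH_ENDPOINT",
    "ENDPOINT",
    "BUCKET",
    "FILE",
]

def missing_credential_keys(credentials):
    """Return the required keys that are absent or empty in the dictionary."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]

# Placeholder values for illustration only; a real dictionary comes from "Insert to code".
example_credentials = {
    "IBM_API_KEY_ID": "<api-key>",
    "BUCKET": "<bucket-name>",
    "FILE": "documents_arr.json",
}

print(missing_credential_keys(example_credentials))
```

In the notebook itself, the check would be called as `missing_credential_keys( metadata_1 )` before the copy; an empty list means every expected key is present.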

### 1.3 Read JSON file

Read the JSON data from the file in the notebook working directory to a structure in memory, `g_documents_arr`.

In [None]:
import json

with open( "documents_arr.json" ) as f:
    g_documents_arr = json.load(f)

print( json.dumps( g_documents_arr, indent=3 ) )
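
Step 2 builds each file name from the `"id"` field, so it can help to confirm that every entry has one before continuing. This is a rough sketch, assuming each entry is a dictionary with a string `"id"`. The helper name `find_invalid_documents` and the sample data, including the `"title"` field, are made up for illustration.

```python
def find_invalid_documents(documents_arr):
    """Return the indexes of entries that lack a usable string "id"."""
    invalid = []
    for index, document in enumerate(documents_arr):
        doc_id = document.get("id") if isinstance(document, dict) else None
        if not isinstance(doc_id, str) or not doc_id.strip():
            invalid.append(index)
    return invalid

# Hypothetical sample: the second entry has no "id", the third has an empty one.
sample_documents = [
    {"id": "mural-1", "title": "Example"},
    {"title": "No id"},
    {"id": ""},
]

print(find_invalid_documents(sample_documents))
```

Calling `find_invalid_documents( g_documents_arr )` in the notebook should return an empty list if every mural can be written to its own file.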

## Step 2: Create local documents (files in working directory)

Run the following cell to create one file in the notebook working directory for each mural.

In [None]:
for document_json in g_documents_arr:
    file_name = document_json["id"] + ".json"
    with open( file_name, "w" ) as f:
        f.write( json.dumps( document_json, indent=3 ) )

!ls
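
Listing the directory confirms that the files exist, but not that their contents are intact. One possible check, sketched below, writes each document and immediately reloads it to compare against the original. The helper `write_and_verify` and the sample document are hypothetical, and the demonstration writes to a temporary directory rather than the notebook working directory.

```python
import json
import os
import tempfile

def write_and_verify(documents_arr, directory):
    """Write one JSON file per document, then reload each file and compare."""
    written = []
    for document in documents_arr:
        path = os.path.join(directory, document["id"] + ".json")
        with open(path, "w") as f:
            json.dump(document, f, indent=3)
        with open(path) as f:
            if json.load(f) != document:
                raise ValueError("Round trip mismatch for " + path)
        written.append(path)
    return written

# Demonstration with made-up data in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = [{"id": "mural-1", "widgets": ["note a", "note b"]}]
    paths = write_and_verify(sample, tmp_dir)
    print([os.path.basename(p) for p in paths])
```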

## Step 3: Authenticate with Watson Discovery

### 3.1 Create a service instance

Create an instance of the IBM Watson Discovery service.  

See: [IBM Watson Discovery in the IBM Cloud catalog](https://cloud.ibm.com/catalog/services/watson-discovery)


### 3.2 Get the API key and URL for your service instance

From the "manage" page of your Discovery service instance in IBM Cloud, copy the API key and URL into the cell below.

In [None]:
g_discovery_apikey = ""
g_discovery_url = ""
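
Pasting the API key into a cell is convenient, but the key is then saved with the notebook. As an alternative, the values could be read from environment variables. The sketch below assumes two variables named `DISCOVERY_APIKEY` and `DISCOVERY_URL`; those names, and the helper `read_discovery_settings`, are illustrative choices and not something the Discovery service defines.

```python
import os

def read_discovery_settings(environ=os.environ):
    """Read the API key and URL from a mapping, failing early if either is empty."""
    api_key = environ.get("DISCOVERY_APIKEY", "")
    url = environ.get("DISCOVERY_URL", "")
    if not api_key or not url:
        raise ValueError("Set DISCOVERY_APIKEY and DISCOVERY_URL before authenticating.")
    return api_key, url

# Demonstration with an in-memory mapping of placeholders instead of the real environment.
demo_environ = {"DISCOVERY_APIKEY": "<api-key>", "DISCOVERY_URL": "<service-url>"}
demo_apikey, demo_url = read_discovery_settings(demo_environ)
print(demo_url)
```

With the variables set, `g_discovery_apikey, g_discovery_url = read_discovery_settings()` would replace the two assignments above.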

### 3.3 Install `ibm_watson` library

See: [IBM Watson Discovery v2 API](https://cloud.ibm.com/apidocs/discovery-data?code=python)

In [None]:
!pip install ibm_watson

### 3.4 Authenticate

See: [Discovery authentication for IBM Cloud](https://cloud.ibm.com/apidocs/discovery-data?code=python#authentication-cloud)

In [None]:
from ibm_watson import DiscoveryV2
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator( g_discovery_apikey )
g_discovery = DiscoveryV2( version="2020-08-30", authenticator=authenticator )

g_discovery.set_service_url( g_discovery_url )

## Step 4: Create a Project in Discovery

In Discovery, you organize your work in "Projects".

In [None]:
def createDiscoveryProject( project_name ):
    # https://cloud.ibm.com/apidocs/discovery-data?code=python#createproject
    response_json = g_discovery.create_project( name=project_name, type="document_retrieval" ).get_result()
    print( json.dumps( response_json, indent=2 ) )
    return response_json["project_id"]

In [None]:
project_name = "MURAL Search Project"
g_project_id = createDiscoveryProject( project_name )
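
Running the cell above more than once creates a new project each time. To reuse an existing project instead, one option is to look it up by name in the result of the SDK's `list_projects` call. The sketch below assumes that result is a dictionary with a `"projects"` list whose entries carry `"project_id"` and `"name"`. The helper `find_project_id` and the sample response are hypothetical.

```python
def find_project_id(projects_response, project_name):
    """Return the ID of the first project with a matching name, or None."""
    for project in projects_response.get("projects", []):
        if project.get("name") == project_name:
            return project.get("project_id")
    return None

# Made-up response, shaped like g_discovery.list_projects().get_result() is assumed to be.
sample_response = {
    "projects": [
        {"project_id": "1234-abcd", "name": "MURAL Search Project", "type": "document_retrieval"}
    ]
}

print(find_project_id(sample_response, "MURAL Search Project"))
print(find_project_id(sample_response, "Some Other Project"))
```

In the notebook, this could be combined with the existing helper as `find_project_id( g_discovery.list_projects().get_result(), project_name ) or createDiscoveryProject( project_name )`.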

## Step 5: Create a Collection in Discovery

In Discovery, you assemble documents to search in "Collections".

In [None]:
def createDiscoveryCollection( project_id, collection_name ):
    # https://cloud.ibm.com/apidocs/discovery-data?code=python#createcollection
    # Use the authenticated client, g_discovery, defined in Step 3.4
    response_json = g_discovery.create_collection( project_id=project_id, name=collection_name, language="en" ).get_result()
    print( json.dumps( response_json, indent=2 ) )
    return response_json["collection_id"]

In [None]:
collection_name = "MURAL Search Collection"
g_collection_id = createDiscoveryCollection( g_project_id, collection_name )

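## Step 6: Upload documents to Collection

Run the next cell to define a helper routine that uploads one local file to the collection. Then run the cell after it to upload the file created for each mural in Step 2.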

In [None]:
def addMuralToDiscovery( file_name, project_id, collection_id ):
    # https://cloud.ibm.com/apidocs/discovery-data?code=python#adddocument
    with open( file_name, "rb" ) as f:
        # Use the authenticated client, g_discovery, defined in Step 3.4
        response_json = g_discovery.add_document( project_id=project_id,
                                                  collection_id=collection_id,
                                                  file=f,
                                                  filename=file_name,
                                                  file_content_type="application/json" ).get_result()
        print( "\n\n" + file_name + ":\n" + json.dumps( response_json, indent=2 ) )
        return response_json["document_id"]

In [None]:
for document_json in g_documents_arr:
    file_name = document_json["id"] + ".json"
    addMuralToDiscovery( file_name, g_project_id, g_collection_id )
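
The loop above stops at the first upload that raises an error, leaving the remaining files unsent. A possible variation, sketched here, keeps going and records which files succeeded and which failed. The helper `upload_all` is hypothetical, and `fake_upload` merely stands in for `addMuralToDiscovery` so the example runs without a Discovery instance.

```python
def upload_all(documents_arr, upload_fn):
    """Call upload_fn(file_name) for each document; return (uploaded, failed)."""
    uploaded, failed = {}, {}
    for document in documents_arr:
        file_name = document["id"] + ".json"
        try:
            uploaded[file_name] = upload_fn(file_name)
        except Exception as error:
            # Record the error and continue so one bad file does not stop the batch.
            failed[file_name] = str(error)
    return uploaded, failed

def fake_upload(file_name):
    """Stand-in uploader: fails for one file name, returns a made-up ID otherwise."""
    if file_name == "bad.json":
        raise RuntimeError("simulated failure")
    return "doc-" + file_name

uploaded, failed = upload_all([{"id": "mural-1"}, {"id": "bad"}], fake_upload)
print(uploaded)
print(failed)
```

In the notebook, the call might look like `upload_all( g_documents_arr, lambda name: addMuralToDiscovery( name, g_project_id, g_collection_id ) )`.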