<a id='notebook_contents'></a>
**Instructions**

The [**Required**](#cell_required) code cell must be run before any experiment.  It only needs to be run once.  **BEFORE** running it, update the `apikey` variable at the top of the cell with the apikey you were given.

Go [**here**](https://github.com/IBM/whcs-sample-opendata-notebooks/blob/master/iml-covid-19/README.md) for instructions on obtaining an apikey.
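
For orientation, the Required cell sends the apikey as HTTP Basic credentials with the fixed user name `apikey`. The following is a minimal sketch of that encoding step only, using the placeholder value `"APIKEY"` rather than a real key:

```python
import base64

# placeholder only; substitute the apikey you were given
apikey = "APIKEY"

# HTTP Basic credentials are "user:password", base64-encoded; here the
# user name is the literal string "apikey" and the password is the key
token = base64.b64encode(("apikey:" + apikey).encode()).decode()
headers = {"Authorization": "Basic " + token}

print(headers["Authorization"])
```

The resulting `headers` dict has the same shape as the one the Required cell passes to every REST call.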

_Notebook Contents:_
* [Required](#cell_required)
* [Experiment 001: Retrieve corpora information](#cell_exp_001)
* [Experiment 002: Discover preferred name for a term in specified corpus](#cell_exp_002)
* [Experiment 003: Retrieve metadata for a CUI or preferred name](#cell_exp_003)
* [Experiment 004: Discover available semantic types in specified corpus](#cell_exp_004)
* [Experiment 005: Discover available attributes in specified corpus](#cell_exp_005)
* [Experiment 006: Discover concepts by attribute in specified corpus](#cell_exp_006)
* [Experiment 007: Discover co-occurring concepts in specified corpus](#cell_exp_007)
* [Experiment 008: Discover documents of interest in specified corpus (by terms)](#cell_exp_008)
* [Experiment 009: Discover documents of interest in specified corpus (by attributes)](#cell_exp_009)
* [Experiment 010: Retrieve metadata for document in specified corpus](#cell_exp_010)
* [Experiment 011: Retrieve content for document in specified corpus](#cell_exp_011)


<a id='cell_exp_001'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 001:** Retrieve corpora information.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
ok, df = get_corpus_names()

# display results
display_df(format_df(df))

<a id='cell_exp_002'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 002:** Discover preferred name for a term in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.
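
To make the note above concrete, the error dataframe is built from key/value rows. The sketch below is a compact stand-in that mirrors the row shape produced by `format_error` in the Required cell; the search term `xyz` is made up for illustration, and errors returned by the REST service itself may carry different keys.

```python
def format_error(level, message, description):
    """Return error details as key/value rows, one dict per row."""
    fields = [("level", level), ("message", message), ("description", description)]
    return [{"key": key, "value": value} for key, value in fields]


# illustrative call; "xyz" is a made-up term
rows = format_error("ERROR", "No concepts found for: xyz", "Re-run with an alternate term.")
for row in rows:
    print(row["key"], "->", row["value"])
```

Passing such rows to `pd.DataFrame` yields a two-column table (`key`, `value`), which is why the result can still be displayed when ok is False.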

Count Explanation:
* _results_count_ is the number of times the concept is mentioned in the query results (based on specified term)


In [None]:
corpus = "covid19"
max_names = 5

term = input("Enter a term: ")
ok, df = get_preferred_name(corpus, term, max_names)

# display results
display_df(format_df(df))

<a id='cell_exp_003'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 003:** Retrieve metadata for a CUI or preferred name in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"

concept = input("Enter the preferred name or CUI for a concept: ")
ok, df = get_concept_meta(corpus, concept)

# display results
display_df(format_df(df))

<a id='cell_exp_004'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 004:** Discover available semantic types in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"

ok, df = get_semantic_types(corpus)

# display results
display_df(format_df(df))

<a id='cell_exp_005'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 005:** Discover available attributes in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"

ok, df = get_attributes(corpus)

# display results
display_df(format_df(df))

<a id='cell_exp_006'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 006:** Discover concepts by attribute in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

Count Explanation:
* _corpus_count_ is the number of times the concept is mentioned in the corpus
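
The word cloud in this experiment is sized by _corpus_count_. As a rough sketch of that step, the frequency mapping can be built from rows shaped like the dataframe's records (for example, what pandas `df.to_dict("records")` returns). The concept names and counts below are invented purely for illustration.

```python
# illustrative rows only; real values come from get_concepts_by_attribute
rows = [
    {"preferred_name": "concept A", "corpus_count": 12},
    {"preferred_name": "concept B", "corpus_count": 5},
]

# map each concept name to its count; larger counts render as larger words
freq = {str(row["preferred_name"]): int(row["corpus_count"]) for row in rows}
print(freq)
```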


In [None]:
corpus = "covid19"
attrib_id = "covid-19"
max_concepts = 20
ok, df = get_concepts_by_attribute(corpus, attrib_id, max_concepts)

if ok:
    # wordcloud
    freq = dict()
    for ind in df.index:
        key = str(df['preferred_name'][ind])
        value = int(df['corpus_count'][ind])
        freq[key] = value
    display_wc_freq(freq)

# display results
display_df(format_df(df))

<a id='cell_exp_007'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 007:** Discover co-occurring disorders, drugs or genes for a concept in specified corpus.

This experiment calls two methods.  The first call retrieves concept metadata for a term.  The second call uses that metadata to retrieve co-occurring concepts (for the specified category).  Both methods have the same return signature.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

Count Explanation:
* _results_count_ is the number of times the concept is mentioned in the query results (based on specified term)
* _corpus_count_ is the number of times the concept is mentioned in the corpus

The _results_count_ will always be less than or equal to the _corpus_count_, because the query results are a subset of the corpus.

In [None]:
corpus = "covid19"
term = "human coronavirus"
cotype = "drugs" # possible values: "disorders", "drugs" or "genes".

# retrieve concept metadata for term
ok, df = get_preferred_name(corpus, term, 1)
if ok:
    concept = dict()
    concept["cui"] = df.iloc[0]['cui']
    concept["preferred_name"] = df.iloc[0]['preferred_name']
    concept["semantic_type"] = df.iloc[0]['semantic_type']

# display results
display_df(format_df(df))

if ok: # can only continue if we previously retrieved a valid concept

    # get co-occurring concepts
    ok, df = get_co_occurring_concepts(concept, cotype)

    if ok:
        # wordcloud
        freq = dict()
        for ind in df.index:
            key = str(df['preferred_name'][ind])
            value = int(df['corpus_count'][ind])
            freq[key] = value
        display_wc_freq(freq)

    # display results
    display_df(format_df(df))

<a id='cell_exp_008'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 008:** Discover documents of interest in specified corpus (by terms).

Returns:
* ok: True|False (when True, returned document count and dataframe data are valid)
* doc_count: total number of matching documents reported by the service (0 when ok is False)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.
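
The search terms are combined into a single boolean expression before the query is sent: each term is wrapped in parentheses and the terms are joined with the chosen operator. A minimal sketch of that step follows; the helper name `build_bool_expression` is hypothetical and is not defined in the Required cell, which performs the same join inline.

```python
def build_bool_expression(bool_expr, terms):
    """Wrap each term in parentheses and join the terms with the operator."""
    return (" " + bool_expr + " ").join("(" + term + ")" for term in terms)


print(build_bool_expression("AND", ["virus", "lung"]))  # (virus) AND (lung)
print(build_bool_expression("OR", ["virus", "lung"]))   # (virus) OR (lung)
```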

In [None]:
corpus = "covid19"
bool_expr = "AND" # "AND" or "OR"
terms = [
    "virus",
    "lung"
]
max_docs = 10

ok, doc_count, df = get_documents_by_terms(corpus, bool_expr, terms, max_docs)

# display results
if ok:
    print("doc_count:", doc_count)
display_df(format_df(df))

<a id='cell_exp_009'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 009:** Discover documents of interest in specified corpus (by attributes).

Returns:
* ok: True|False (when True, returned document count and dataframe data are valid)
* doc_count: total number of matching documents reported by the service (0 when ok is False)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"
bool_expr = "AND" # "AND" or "OR"
attribs = [
    "human_coronavirus"
]
max_docs = 10

ok, doc_count, df = get_documents_by_attributes(corpus, bool_expr, attribs, max_docs)

# display results
if ok:
    print("doc_count:", doc_count)
display_df(format_df(df))

<a id='cell_exp_010'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 010:** Retrieve metadata for document in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"
docid = "7086750"

ok, df = get_document_meta(corpus, docid)

# display results
display_df(format_df(df))

<a id='cell_exp_011'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Experiment 011:** Retrieve content for document in specified corpus.

Returns:
* ok: True|False (when True, returned dataframe data is valid)
* df: populated dataframe

Note: When ok is False, the dataframe will be populated with a formatted error message that can still be displayed.

In [None]:
corpus = "covid19"
docid = "7086750"

ok, df = get_document(corpus, docid)

if ok:
    # wordcloud
    title = df.iloc[0]['value'] 
    display_wc_text(title)

# display results
display_df(format_df(df))

<a id='cell_required'></a>
<hr style="height:2px">

[Back to Notebook Contents](#notebook_contents)

**Required:** Run the following cell to load IML functionality.

In [None]:
# substitute APIKEY with the apikey you were given
apikey = "APIKEY"

# uncomment the following line to pip install wordcloud (if you do not already have it installed in your environment)
# !pip install wordcloud

# Copyright 2020 IBM All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#   
# http://www.apache.org/licenses/LICENSE-2.0   
#   
# Unless required by applicable law or agreed to in writing, software   
# distributed under the License is distributed on an "AS IS" BASIS,   
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   
# See the License for the specific language governing permissions and   
# limitations under the License.

# endpoint config
endpoint = "https://us-south.wh-iml.cloud.ibm.com/wh-iml/api/v1"

# needed imports
import json
import requests
import base64
from datetime import datetime
import pandas as pd

from IPython.display import display
from IPython.display import clear_output
import ipywidgets as widgets

import matplotlib.pyplot as plt
from wordcloud import WordCloud # https://anaconda.org/conda-forge/wordcloud

# prepare auth header
apikey_enc = base64.b64encode(("apikey:"+apikey).encode()).decode()
headers = {"Authorization":"Basic %s" % apikey_enc}    


def format_error(level, message, description):
    out = []
    item = dict()
    item["key"] = "level"
    item["value"] = level
    out.append(item)
    item = dict()
    item["key"] = "message"
    item["value"] = message
    out.append(item)
    item = dict()
    item["key"] = "description"
    item["value"] = description
    out.append(item)
    return out


def format_rest_error(resp):
    # try to parse the JSON error body returned by the service; fall back to
    # the HTTP status when the body is missing or is not valid JSON
    try:
        resp_data = json.loads(resp.content)
    except ValueError:
        resp_data = None

    out = []
    if isinstance(resp_data, dict):
        # keep the same keys and order as before; absent fields become ""
        for key in ["code", "message", "level", "description", "correlationId"]:
            out.append({"key": key, "value": resp_data.get(key, "")})
    else:
        out.append({"key": "code", "value": resp.status_code})
        out.append({"key": "message", "value": resp.reason})
    return out


def on_clear_button_clicked(b):
    clear_output(wait=False)

    
def clear_btn():
    clear_button = widgets.Button(description="Clear")
    display(clear_button)
    clear_button.on_click(on_clear_button_clicked)
    return


def display_df(df):
    display(df)
    clear_btn()
    return


def format_df(df, max_rows=None):
    pd.set_option('display.max_colwidth', None)
    styles = [
        dict(selector="th", props=[("text-align", "left")]),
        dict(selector="td", props=[("text-align", "left")])
    ]
    if max_rows is not None:
        return df.head(max_rows).style.set_table_styles(styles)
    else:
        return df.style.set_table_styles(styles)


def display_wc_text(text):
    wordcloud = WordCloud(width=800, height=400, max_font_size=40).generate(text)
    plt.figure(figsize=(20,10))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    clear_btn()
    return


def display_wc_freq(freq):
    wordcloud = WordCloud(width=800, height=400, max_font_size=40).generate_from_frequencies(freq)
    plt.figure(figsize=(20,10))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    clear_btn()
    return


def get_corpus_names():
    ok = False
    url_parms_template = "/corpora?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<version>", datetime.now().strftime("%Y-%m-%d"))
    
    # make rest call and process response
    resp = requests.get(endpoint + url_parms, headers=headers)
    
    if resp.ok:
        resp_data = json.loads(resp.content)
        out = []
        for corpus in resp_data["corpora"]:
            item = dict()
            item["corpus_name"] = corpus["corpusName"]
            item["ontologies"] = corpus["ontologies"]
            item["descriptive_name"] = corpus["descriptiveName"]
            out.append(item)
        df = pd.DataFrame(out)
        df = df.sort_values("corpus_name", ascending=True)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_preferred_name(corpus, term, limit):
    ok = False
    ontology = "umls"
    url_parms_template = "/corpora/<corpus>/search/typeahead?version=<version>&query=<term>&ontologies=<ontology>&verbose=false&_limit=<limit>&max_hit_count=5000000&no_duplicates=true"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<term>", term).replace("<ontology>", ontology).replace("<limit>", str(limit)).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.get(endpoint + url_parms, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        concepts = resp_data["concepts"]
        if len(concepts) > 0:
            out = []
            for concept in concepts:
                item = dict()
                item["cui"] = concept["cui"]
                if "preferredName" in concept.keys():
                    item["preferred_name"] = concept["preferredName"]
                else:
                    item["preferred_name"] = concept["cui"]
                item["semantic_type"] = concept["semanticType"]
                item["results_count"] = concept["hitCount"]
                out.append(item)
            df = pd.DataFrame(out)
            df = df.sort_values("results_count", ascending=False)
            ok = True
        else:
            out = format_error("ERROR", "No concepts found for: " + term, "Re-run with an alternate term.")
            df = pd.DataFrame(out)
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_concept_meta(corpus, concept):
    ok = False
    url_parms_template = "/corpora/<corpus>/concepts/<concept>?version=<version>&ontology=concepts&tree_layout=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<concept>", concept).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.get(endpoint + url_parms, headers=headers)
    if resp.ok:
        concept = json.loads(resp.content)
        out = []
        if "cui" in concept.keys():
            item = dict()
            item["key"] = "cui"
            item["value"] = concept["cui"]
            out.append(item)
        if "ontology" in concept.keys():
            item = dict()
            item["key"] = "ontology"
            item["value"] = concept["ontology"]
            out.append(item)
        if "preferredName" in concept.keys():
            item = dict()
            item["key"] = "preferred_name"
            item["value"] = concept["preferredName"]
            out.append(item)
        if "semanticTypes" in concept.keys():
            item = dict()
            item["key"] = "semantic_types"
            item["value"] = concept["semanticTypes"]
            out.append(item)
        if "surfaceForms" in concept.keys():
            item = dict()
            item["key"] = "surface_forms"
            item["value"] = concept["surfaceForms"]
            out.append(item)
        if "definition" in concept.keys():
            item = dict()
            item["key"] = "definition"
            item["value"] = concept["definition"]
            out.append(item)
        if "hasParents" in concept.keys():
            item = dict()
            item["key"] = "has_parents"
            item["value"] = concept["hasParents"]
            out.append(item)
        if "hasChildren" in concept.keys():
            item = dict()
            item["key"] = "has_children"
            item["value"] = concept["hasChildren"]
            out.append(item)
        if "hasSiblings" in concept.keys():
            item = dict()
            item["key"] = "has_siblings"
            item["value"] = concept["hasSiblings"]
            out.append(item)
        df = pd.DataFrame(out)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_semantic_types(corpus):
    ok = False
    search = {
        "returns": {
            "types": {
                "ontology": "concepts"
            }
        }
    }
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        sorted_list = sorted(resp_data["types"])    
        out = []
        for semantic_type in sorted_list:
            item = dict()
            item["name"] = semantic_type
            out.append(item)
        df = pd.DataFrame(out)
        df = df.sort_values("name", ascending=True)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_attributes(corpus):
    ok = False
    search = {
        "query": {},
        "returns": {
            "attributes": {}
        }
    }    
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        out = []
        for doc in resp_data["attributes"]:
            item = dict()
            item["attrib_id"] = doc["attributeId"]
            item["display_name"] = doc["displayName"]
            out.append(item)
        df = pd.DataFrame(out)
        df = df.sort_values("attrib_id", ascending=True)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_concepts_by_attribute(corpus, attrib_id, max_concepts=20):
    ok = False
    search = {
      "query": {
        "title": {
          "boost": "1"
        }
      },
      "returns": {
        "typeahead": {
          "ontology": "concepts",
          "query": "",
          "noDuplicates": True,
          "limit": 20,
          "scope": "query"
        }
      }
    }
    
    search['returns']['typeahead']['query'] = attrib_id
    search['returns']['typeahead']['limit'] = max_concepts
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        concepts = resp_data["typeahead"]
        if len(concepts) > 0:
            out = []
            for concept in concepts:
                item = dict()
                item["cui"] = concept["cui"]
                item["preferred_name"] = concept["preferredName"]
                item["semantic_type"] = concept["semanticType"]
                item["corpus_count"] = concept["hitCount"]
                out.append(item)
            df = pd.DataFrame(out)
            df = df.sort_values("corpus_count", ascending=False)
            ok = True
        else:
            out = format_error("ERROR", "No concepts found for: " + attrib_id, "Re-run with an alternate attribute.")
            df = pd.DataFrame(out)
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df


def get_co_occurring_concepts(concept, cotype, limit=100):
    ok = False
    search = {
      "query": {
        "boolExpression": "",
        "concepts": [
          {
            "boolOperand": "",
            "ontology": "concepts",
            "cui": "",
            "rank": 10,
            "semanticType": ""
          }
        ],
        "title": {
          "boost": "1"
        }
      },
      "returns": {
        "concepts": {
          "ontology": "concepts",
          "limit": 100,
          "types": [],
          "section": "*",
          "mode": "popular"
        }
      }
    }
    
    disorder_types = [
        "AcquiredAbnormality",
        "AnatomicalAbnormality",
        "CellOrMolecularDysfunction",
        "CongenitalAbnormality",
        "DiseaseOrSyndrome",
        "ExperimentalModelofDisease",
        "InjuryOrPoisoning",
        "MentalOrBehavioralDysfunction",
        "NeoplasticProcess",
        "PathologicFunction"
    ]
    
    drug_types = [
        "Antibiotic",
        "ClinicalDrug",
        "Hormone",
        "PharmacologicSubstance"
    ]
    
    gene_types = [
        "AminoAcidSequence",
        "CarbohydrateSequence",
        "GeneOrGenome",
        "MolecularSequence",
        "NucleotideSequence"
    ]

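    # note: `corpus` is not a parameter of this function; it is read from the
    # notebook's global scope, so it must be set in the calling cell first
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))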
    
    # prepare search
    search['query']['boolExpression'] = concept['preferred_name']
    search['query']['concepts'][0]['boolOperand'] = concept['preferred_name']
    search['query']['concepts'][0]['cui'] = concept['cui']
    search['query']['concepts'][0]['semanticType'] = concept['semantic_type']
    search['returns']['concepts']['limit'] = limit
    if cotype == "disorders":
        search['returns']['concepts']['types'] = disorder_types
    elif cotype == "drugs":
        search['returns']['concepts']['types'] = drug_types
    elif cotype == "genes":
        search['returns']['concepts']['types'] = gene_types
    else:
        out = format_error("ERROR", "Unsupported co-occurring concept type specified", "Re-run with a valid co-occurring concept type (disorders, drugs, genes).")
        df = pd.DataFrame(out)
        return ok, df # can't continue, so return
        
    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
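    if resp.ok:
        resp_data = json.loads(resp.content)
        concepts = resp_data["concepts"]
        if len(concepts) > 0:
            out = []
            # loop variable is named co_concept so it does not shadow the `concept` parameter
            for co_concept in concepts:
                item = dict()
                item["cui"] = co_concept["cui"]
                item["preferred_name"] = co_concept["preferredName"]
                item["semantic_type"] = co_concept["semanticType"]
                item["results_count"] = co_concept["hitCount"]
                item["corpus_count"] = co_concept["count"]
                out.append(item)
            df = pd.DataFrame(out)
            df = df.sort_values("results_count", ascending=False)
            ok = True
        else:
            # an empty result has no "results_count" column to sort on, so report it instead
            out = format_error("ERROR", "No co-occurring concepts found for: " + concept['preferred_name'], "Re-run with an alternate term or concept type.")
            df = pd.DataFrame(out)
    else:
        df = pd.DataFrame(format_rest_error(resp))
    return ok, df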


def get_documents_by_terms(corpus, bool_expr, terms, limit=10):
    ok = False
    search = {
        "query": {
            "boolExpression": "",
            "concepts" : [],
            "title": {
              "boost": "1"
            }
        },
        "returns": {
            "documents": {
              "limit": "10",
              "offset": "0"
            }
        }
    }
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))
    
    # prepare boolean expression
    terms_formatted = []
    for term in terms:
        terms_formatted.append('(' + term + ')')
    search['query']['boolExpression'] = (' ' + bool_expr + ' ').join(terms_formatted)
                               
    # prepare concepts
    concepts = []
    for term in terms:
        concept = dict()
        concept.update({"ontology":"text"})
        concept.update({"rank":10})
        concept.update({"boolOperand":term})
        concept.update({"text":term})
        concept.update({"proximity":0})
        concepts.append(concept)
    search['query']['concepts'] = concepts
                               
    search['returns']['documents']['limit'] = limit

    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        doc_count = resp_data["totalDocumentCount"]
        out = []
        for doc in resp_data["documents"]:
            item = dict()
            item["doc_id"] = doc["documentId"]
            item["title"] = doc["title"]
            out.append(item)
        df = pd.DataFrame(out)
        ok = True
    else:
        doc_count = 0
        df = pd.DataFrame(format_rest_error(resp))        
    return ok, doc_count, df


def get_documents_by_attributes(corpus, bool_expr, attribs, limit=10):
    ok = False
    search = {
        "query": {
            "boolExpression": "",
            "rankedSearch": True,
            "concepts": [
            ]
        },
        "returns": {
            "documents": {
              "limit": "10",
              "offset": "0"
            }
        }
    }
    concepts = []
    for attrib in attribs:
        item = dict()
        item["boolOperand"] = attrib
        item["ontology"] = "attributes"
        item["cui"] = "*"
        item["rank"] = 10
        item["semanticType"] = attrib
        concepts.append(item)
    
    url_parms_template = "/corpora/<corpus>/search?version=<version>&verbose=false"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<version>", datetime.now().strftime("%Y-%m-%d"))
    search['query']['boolExpression'] = (' ' + bool_expr + ' ').join(attribs)
    search['query']['concepts'] = concepts
    search['returns']['documents']['limit'] = limit

    # make rest call and process response
    resp = requests.post(endpoint + url_parms, json=search, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        doc_count = resp_data["totalDocumentCount"]
        out = []
        for doc in resp_data["documents"]:
            item = dict()
            item["doc_id"] = doc["documentId"]
            item["title"] = doc["title"]
            out.append(item)
        df = pd.DataFrame(out)
        ok = True
    else:
        doc_count = 0
        df = pd.DataFrame(format_rest_error(resp))                
    return ok, doc_count, df


def get_document_meta(corpus, docid):
    ok = False
    url_parms_template = "/corpora/<corpus>/documents/<docid>?version=<version>&verbose=true"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<docid>", docid).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.get(endpoint + url_parms, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        out = []
        meta = resp_data["metadata"]
        for key in meta.keys():
            item = dict()
            item["key"] = key
            item["value"] = meta[key]
            out.append(item)
        df = pd.DataFrame(out)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))                
    return ok, df


def get_document(corpus, docid):
    ok = False
    url_parms_template = "/corpora/<corpus>/documents/<docid>?version=<version>&verbose=true"
    url_parms = url_parms_template.replace("<corpus>", corpus).replace("<docid>", docid).replace("<version>", datetime.now().strftime("%Y-%m-%d"))

    # make rest call and process response
    resp = requests.get(endpoint + url_parms, headers=headers)
    if resp.ok:
        resp_data = json.loads(resp.content)
        out = []
        item = dict()
        item["key"] = "title"
        item["value"] = resp_data["title"]
        out.append(item)
        item = dict()
        item["key"] = "abstract"
        item["value"] = ""
        if "abstract" in resp_data["sections"].keys():
            item["value"] = resp_data["sections"]["abstract"]
        out.append(item)
        df = pd.DataFrame(out)
        ok = True
    else:
        df = pd.DataFrame(format_rest_error(resp))                
    return ok, df


[Back to Notebook Contents](#notebook_contents)