# FHIR Search API Exploration

This notebook evaluates the search features of a handful of FHIR servers by running a series of search queries against each one. Scroll down past the evaluation section to run the notebook and see it in action.

## Summary

Evaluations are listed from least promising to most promising.

### Microsoft FHIR Server on Azure

Pros
- Open source
- Free
- A "straight" FHIR server that adheres to the FHIR spec
- Good error handling and reporting
    - Reports when a search param or operation is not supported

Cons
- Only runs on Azure
- Very little documentation
- Does not support much of the major search functionality
    - Full text search
    - Custom queries
    - Chaining/reverse chaining
    - Including referenced resources


### Vonk

Pros
- A "straight" FHIR server that adheres to the FHIR spec
- Has ~70% of search API spec implemented
- Decent documentation
- Vonk team is fairly responsive to issue reports on Zulip (0.5-1 day turnaround)

Cons
- Not open source
- Not free
- Production releases are very buggy
- Not as quick to turn around fixes and releases as Aidbox
- Ignores search parameters it doesn't understand and returns everything
- Error handling and reporting is confusing or lacking
- Has a limited policy engine for access control
- Doesn't support full text search
- Doesn't support custom queries


### Smile CDR (commercial HAPI)

Pros
- A "straight" FHIR server that adheres to the FHIR spec
- Has 70% of search API spec implemented
- Looks like professional/detailed documentation
- Looks like there are a lot of features - haven't had time to delve into them all
- Built on top of HAPI, an open source FHIR server
- Good error handling and reporting
    - Reports when a search param or operation is not supported
- Supports full text search
- Supports multitenancy and seems to have a lot of features relating to auth

Cons
- Not free
- Not sure if it supports some of the basic search parameters

### Aidbox

Pros
- Has 70% of search API spec implemented
- Supports special search features that make it very flexible
    - Full text search
    - Query by SQL
    - Save SQL queries as endpoints
    - Query by exact path
- Decent documentation
- Team is very responsive to issues and makes releases often
- Good error handling and reporting
    - Reports when a search param or operation is not supported
- Aidbox representation of conformance resources is much easier to digest
  and understand than FHIR conformance resources
- Uses PostgreSQL as its database, which is open source
- Has a GraphQL API - haven't tested it yet

Cons
- Represents conformance resources (e.g. extensions, search parameters) in its own way
    - Until they've developed the tools to easily go between Aidbox and FHIR, we will 
      have to do this ourselves
- Does not seem to support the `:missing` modifier for search

In [1]:
import os
import requests
from click.testing import CliRunner
from pprint import pprint, pformat
import pandas

from requests.auth import HTTPBasicAuth

from kf_model_fhir.config import FHIR_VERSION, SERVER_CONFIG, PROJECT_DIR
from kf_model_fhir.loader import load_resources
from kf_model_fhir.utils import read_json

from helpers import *

print(f'Servers being evaluated: {pformat(list(SERVER_CONFIG.keys()))}')

Servers being evaluated: ['aidbox',
 'aidbox-local',
 'smile-cdr',
 'hapi',
 'azure',
 'vonk-kidsfirst-public']


## Setup Required

Every server being evaluated is publicly hosted, so you don't need to spin up any Docker containers. You just need to clone the `kf-model-fhir` repo and switch to the `search-api-testing` branch.

### 1. Get the Code

```shell
# Get code
git clone git@github.com:kids-first/kf-model-fhir.git
cd kf-model-fhir

# Switch to right branch
git checkout search-api-testing
```

### 2. Setup Virtual Environment

```shell
# Setup virtual env
python3 -m venv venv
source venv/bin/activate

# Install requirements
pip install -e .
```

Now you're ready to run this notebook.

## Important Notes

### * Your Network Might Block Some FHIR Servers
For us at CHOP, this means you have to be on `chopguest` to run this notebook, since `chopnet` blocks the Smile CDR server.

### * Disclaimer - Throw Away Code
Code in this branch is throwaway code, meant only for search API exploration - don't judge :)
There are probably bugs, and things might break if you change certain things.

## Generate the Test Data

In [2]:
# Generate resources
run_cli_cmd('generate', [os.path.join(PROJECT_DIR, 'resources')])

2019-12-02 12:52:23,994 - kf_model_fhir.app - INFO - Created research study SD-00000
2019-12-02 12:52:23,995 - kf_model_fhir.app - INFO - Created research study SD-00001
2019-12-02 12:52:23,997 - kf_model_fhir.app - INFO - Created patient PT-00000
2019-12-02 12:52:23,999 - kf_model_fhir.app - INFO - Created research subject RS-00000
2019-12-02 12:52:24,001 - kf_model_fhir.app - INFO - Created specimen BS-00000-0
2019-12-02 12:52:24,003 - kf_model_fhir.app - INFO - Created observation OB-00000-0
2019-12-02 12:52:24,005 - kf_model_fhir.app - INFO - Created specimen BS-00000-1
2019-12-02 12:52:24,007 - kf_model_fhir.app - INFO - Created observation OB-00000-0
2019-12-02 12:52:24,009 - kf_model_fhir.app - INFO - Created condition CD-00000-0
2019-12-02 12:52:24,013 - kf_model_fhir.app - INFO - Created patient PT-00001
2019-12-02 12:52:24,034 - kf_model_fhir.app - INFO - Created research subject RS-00001
2019-12-02 12:52:24,036 - kf_model_fhir.app - INFO - Created specimen BS-00001-0
2019-12

## Test Data Description

- Data for this notebook is in the `kf-model-fhir/project` folder.
- Conformance resources like StructureDefinitions and SearchParameters are in `kf-model-fhir/project/profiles`.
- Dummy resources generated in the previous step are in `kf-model-fhir/project/resources`.

In [3]:
resources = load_resources(os.path.join(PROJECT_DIR, 'resources'))
df = pandas.DataFrame(
    [
        {'resource_type': r['resource_type'],
         'id': r['content'].get('id'),
         'references': r['content'].get('subject', {}).get('reference'),
        }
        for r in resources
    ]
)
display(df)
counts = df.groupby(['resource_type']).size()
display(counts)

2019-12-02 12:52:24,313 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Condition-CD-00000-0.json
2019-12-02 12:52:24,318 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Condition-CD-00001-0.json
2019-12-02 12:52:24,319 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Condition-CD-00002-0.json
2019-12-02 12:52:24,321 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Condition-CD-00003-0.json
2019-12-02 12:52:24,322 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Condition-CD-00004-0.json
2019-12-02 12:52:24,323 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/projec

2019-12-02 12:52:24,394 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Specimen-BS-00003-0.json
2019-12-02 12:52:24,396 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Specimen-BS-00003-1.json
2019-12-02 12:52:24,397 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Specimen-BS-00004-0.json
2019-12-02 12:52:24,398 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Specimen-BS-00004-1.json
2019-12-02 12:52:24,399 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/resources/Specimen-BS-00005-0.json
2019-12-02 12:52:24,400 - kf_model_fhir.loader - DEBUG - Reading resource file: /Users/singhn4/Projects/kids_first/kf-model-fhir/project/res

| | resource_type | id | references |
|---|---|---|---|
| 0 | Condition | CD-00000-0 | Patient/PT-00000 |
| 1 | Condition | CD-00001-0 | Patient/PT-00001 |
| 2 | Condition | CD-00002-0 | Patient/PT-00002 |
| 3 | Condition | CD-00003-0 | Patient/PT-00003 |
| 4 | Condition | CD-00004-0 | Patient/PT-00004 |
| ... | ... | ... | ... |
| 57 | Specimen | BS-00007-1 | Patient/PT-00007 |
| 58 | Specimen | BS-00008-0 | Patient/PT-00008 |
| 59 | Specimen | BS-00008-1 | Patient/PT-00008 |
| 60 | Specimen | BS-00009-0 | Patient/PT-00009 |


resource_type
Condition          10
Observation        10
Patient            10
ResearchStudy       2
ResearchSubject    10
Specimen           20
dtype: int64

## Load the Test Data

This might take a couple of minutes because the servers are slow: our test data is first deleted from every server and then loaded into every server with POST/PUT requests.

### You might be able to skip this, since the servers probably already have the data loaded

In [4]:
# Publish profiles and resources to server
# load_all_servers(server_names=['smile-cdr'])

## FHIR Search API Spec

https://www.hl7.org/fhir/search.html

## What are SearchParameters?

https://www.hl7.org/fhir/searchparameter.html

You might be wondering how a query like `/Patient?family=Holmes` works, since `family` is a nested attribute in the Patient payload: `name: [ {family: Holmes} ]`. This is where SearchParameters come in!

A SearchParameter is a conformance resource that defines which part of a resource's payload a search parameter's value is read from. For example, the SearchParameter definition for `family` probably has a path like `Patient.name.family` defined in it. Since `family` is part of the base FHIR Patient resource, this SearchParameter is probably already loaded into most FHIR servers that have the base FHIR resources.
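As a rough illustration of the idea above, here is a sketch of how a server could use a SearchParameter's `expression` to pull values out of a resource and match them. The `family_param` dict is an abbreviated, assumed stand-in for the real definition (which also covers Practitioner), `extract_values` only walks simple dotted paths rather than full FHIRPath, and `string_match` approximates FHIR's default string matching (equals or starts with, case-insensitive; real servers also normalize accents). All names here are hypothetical.

```python
# Abbreviated, hypothetical SearchParameter for `family` (not the full official definition)
family_param = {
    "resourceType": "SearchParameter",
    "code": "family",
    "base": ["Patient"],
    "type": "string",
    "expression": "Patient.name.family",
}


def extract_values(resource, expression):
    """Follow a simple dotted path, flattening lists along the way (simplified FHIRPath)."""
    parts = expression.split(".")
    if parts[0] != resource.get("resourceType"):
        return []
    current = [resource]
    for part in parts[1:]:
        next_values = []
        for node in current:
            value = node.get(part) if isinstance(node, dict) else None
            if value is None:
                continue
            # Repeating elements (like Patient.name) are flattened into one list
            if isinstance(value, list):
                next_values.extend(value)
            else:
                next_values.append(value)
        current = next_values
    return current


def string_match(values, query):
    """Default string search: a value matches if it equals or starts with the query (case-insensitive)."""
    q = query.lower()
    return any(str(v).lower().startswith(q) for v in values)


patient = {"resourceType": "Patient", "name": [{"family": "Holmes", "given": ["Sherlock"]}]}
families = extract_values(patient, family_param["expression"])
print(families)                          # ['Holmes']
print(string_match(families, "hol"))     # True
```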

If you really want to see what the SearchParameter looks like, try this:

In [5]:
url = f"{SERVER_CONFIG['smile-cdr']['base_url']}/SearchParameter"
params = {'base': 'Patient', 'code': 'family'}
resp, status_code = get(url, params=params)
print('\nSearchParameter definition for `family` attribute on Patient resource\n')
pprint(resp['entry'][0]['resource'])


## Basic Queries

In [None]:
queries = [
    {
        'desc': 'Get all patients',
        'endpoint': 'Patient',
        'params': {}
    },
    {
        'desc': 'Get all female patients',            
        'endpoint': 'Patient',
        'params': {'gender': 'female'}
    },
    {
        'desc': 'Get all female patients with last name = Holmes',        
        'endpoint': 'Patient',
        'params': {'gender': 'female', 'family': 'Holmes'}
    }
]
execute_queries(queries)

## Using Modifiers

Modifiers are suffixes that start with `:` and get appended onto the name of the search parameter
you're using in a query, e.g. `gender:not=female` or `name:contains=Hol`.

https://www.hl7.org/fhir/search.html#modifiers
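A hypothetical helper like the one below shows how modifier parameters end up in the query string. It is not part of `helpers.py`; it assumes booleans should be sent as lowercase `true`/`false`, as the FHIR spec writes them (e.g. `gender:missing=true`), since `str(True)` would produce `True`.

```python
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, params):
    """Build a FHIR search URL from a dict of (possibly modified) parameter names."""
    normalized = {}
    for name, value in params.items():
        # FHIR booleans are lowercase; str(True) would give "True"
        if isinstance(value, bool):
            value = "true" if value else "false"
        normalized[name] = value
    # Keep ':' unescaped so modifiers stay readable (it is legal in a query string)
    query = urlencode(normalized, safe=":")
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


print(build_search_url("https://example.org/fhir", "Patient", {"gender:missing": True}))
# https://example.org/fhir/Patient?gender:missing=true
print(build_search_url("https://example.org/fhir", "Patient",
                       {"gender": "female", "name:contains": "Hol"}))
# https://example.org/fhir/Patient?gender=female&name:contains=Hol
```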

In [None]:
queries = [
    {
        'desc': 'Get all patients missing the gender attribute',
        'endpoint': 'Patient',
        'params': {'gender:missing': 'true'}
    },
    {
        'desc': 'Get all patients that are NOT female',            
        'endpoint': 'Patient',
        'params': {'gender:not': 'female'}
    },
    {
        'desc': 'Get all female patients with name containing Hol',        
        'endpoint': 'Patient',
        'params': {'gender': 'female', 'name:contains': 'Hol'}
    },
]
execute_queries(queries)

## Comparison Operators

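Prefixes such as `gt`, `lt`, `eq`, `le`, and `ge` express comparisons (>, <, =, <=, >=). They are prepended to the search value, e.g. `value-quantity=gt5`.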

https://www.hl7.org/fhir/search.html#prefix
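A simplified sketch of how prefixed number and quantity values can be parsed and compared. It handles only `eq`, `ne`, `gt`, `lt`, `ge`, and `le`, and treats `eq` as exact equality, whereas the spec defines it relative to the precision of the supplied value. The `[number]|[system]|[code]` form for quantities comes from the spec; the function names are hypothetical.

```python
import operator
import re

# The six simple comparison prefixes mapped to Python operators
# (sa, eb and ap are deliberately left out of this sketch)
PREFIX_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "lt": operator.lt,
    "ge": operator.ge, "le": operator.le,
}


def split_prefix(value):
    """Split e.g. 'gt5' into ('gt', 5.0); a value without a prefix means 'eq'."""
    match = re.fullmatch(r"([a-z]{2})?(-?\d+(?:\.\d+)?)", value)
    if match is None:
        raise ValueError(f"unsupported search value: {value!r}")
    prefix = match.group(1) or "eq"
    if prefix not in PREFIX_OPS:
        raise ValueError(f"unsupported prefix: {prefix!r}")
    return prefix, float(match.group(2))


def parse_quantity(value):
    """Parse '[prefix][number]|[system]|[code]'; system and code are optional."""
    number_part, _, rest = value.partition("|")
    system, _, code = rest.partition("|")
    prefix, number = split_prefix(number_part)
    return prefix, number, system or None, code or None


def matches(observed, search_value):
    """Compare an observed number against a prefixed search value."""
    prefix, number = split_prefix(search_value)
    return PREFIX_OPS[prefix](observed, number)


print(matches(6.2, "gt5"))  # True
print(parse_quantity("gt5|http://unitsofmeasure.org|mmol/L"))
# ('gt', 5.0, 'http://unitsofmeasure.org', 'mmol/L')
```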

In [None]:
queries = [
    {
        'desc': 'Get all glucose in blood observations with value > 5 mmol/l',
        'endpoint': 'Observation',
        'params': {'value-quantity': 'gt5', 'code': '15074-8'}
    }
]
execute_queries(queries)

## Searching Coded Things

A token is essentially a coded value: an attribute with a code that comes from a code system and usually
has associated text as well. If a SearchParameter is of type `token`, then by default a query on it
searches the token's code.

For example, Specimen.bodysite is a `token` type search parameter. That means
you can search for specimens by bodysite code like this: `/Specimen?bodysite=49852007`

If the server supports the `:text` modifier, you can also search for a token by its text like this:
`/Specimen?bodysite:text=<text representation of 49852007>`

https://www.hl7.org/fhir/search.html#token
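A sketch of the token matching rules from the spec: `code` matches any system, `system|code` requires both, `|code` requires a coding with no system, and `system|` matches any code in that system. `parse_token`, `token_matches`, and `token_text_matches` are hypothetical helpers, and the `:text` check is simplified to a case-insensitive starts-with against `CodeableConcept.text` and `Coding.display`.

```python
def parse_token(value):
    """Split a token search value into (system, code).

    'code'        -> (None, code): any system
    'system|code' -> (system, code)
    '|code'       -> ('', code): coding must have no system
    'system|'     -> (system, None): any code in that system
    """
    if "|" not in value:
        return None, value
    system, code = value.split("|", 1)
    return system, (code or None)


def token_matches(codeable_concept, search_value):
    """Check a CodeableConcept dict against a token search value."""
    system, code = parse_token(search_value)
    for coding in codeable_concept.get("coding", []):
        if code is not None and coding.get("code") != code:
            continue
        if system is None:
            return True
        if system == "" and "system" not in coding:
            return True
        if system and coding.get("system") == system:
            return True
    return False


def token_text_matches(codeable_concept, text):
    """Simplified :text modifier: case-insensitive starts-with on text/display."""
    candidates = [codeable_concept.get("text", "")]
    candidates += [c.get("display", "") for c in codeable_concept.get("coding", [])]
    q = text.lower()
    return any(c.lower().startswith(q) for c in candidates if c)


body_site = {
    "coding": [{"system": "http://snomed.info/sct", "code": "49852007"}],
    "text": "Left median cubital vein",
}
print(token_matches(body_site, "http://snomed.info/sct|49852007"))  # True
print(token_text_matches(body_site, "left median"))                 # True
```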

In [None]:
queries = [
    {
        'desc': 'Get all specimens with code text = Left median cubital vein',
        'endpoint': 'Specimen',
        'params': {'bodysite:text': 'Left median cubital vein'}
    },
    {
        'desc': 'Get all specimens with bodysite code = 49852007 (Left median cubital vein)',            
        'endpoint': 'Specimen',
        'params': {'bodysite': '49852007'}
    },
    {
        'desc': 'Get all anemia conditions by code',            
        'endpoint': 'Condition',
        'params': {'code': '271737000'}
    }
]
execute_queries(queries)

## Search Using References

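Chaining follows a reference from the resource being searched to another resource and filters on that resource's search parameters (e.g. `subject:Patient.name`). Reverse chaining (`_has`) selects resources that are referenced by other resources matching some criteria.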

https://www.hl7.org/fhir/search.html#chaining
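The parameter names follow a fixed pattern: a chain is `reference_param:TargetType.target_param`, and each reverse-chain hop is `_has:ReferencingType:reference_param:`, with hops nested by concatenation. The two helpers below are hypothetical and only build the names; they do not query a server.

```python
def chained(reference_param, target_type, target_param):
    """Build a chained parameter name, e.g. 'subject:Patient.name'."""
    return f"{reference_param}:{target_type}.{target_param}"


def reverse_chained(*hops, search_param):
    """Build a (possibly nested) _has parameter name.

    Each hop is (referencing_type, reference_param): the resource type that
    points back at the previous level, and the search parameter it uses to do so.
    """
    prefix = "".join(f"_has:{rtype}:{ref}:" for rtype, ref in hops)
    return prefix + search_param


print(chained("subject", "Patient", "name"))
# subject:Patient.name
print(reverse_chained(("Specimen", "patient"), ("Observation", "specimen"), search_param="code"))
# _has:Specimen:patient:_has:Observation:specimen:code
```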

In [None]:
# Get a sample patient name
from pprint import pprint
c = SERVER_CONFIG['azure']
patient, sc = get(f"{c['base_url']}/Patient/PT-00001")
patient_name = patient['name'][0]['given'][0]

queries = [
    {
        'desc': 'Get all specimens for a patient by patient ID',
        'endpoint': 'Specimen',
        'params': {'subject:Patient': "PT-00001"}
    },
    {
        'desc': 'Get all specimens for a patient using their name (chained search parameters)',
        'endpoint': 'Specimen',
        'params': {'subject:Patient.name': patient_name}
    },
    {
        'desc': 'Get patients with a specimen that has body site denoted by <code> (reverse chained parameters)',
        'endpoint': 'Patient',
        'params': {'_has:Specimen:patient:bodysite': '49852007'}
    },
    {
        # This should return all patients, since every observation is a (15074-8) glucose in blood
        'desc': 'Get all patients that have a Specimen which has an Observation with code = 15074-8 (nested reverse chaining)',
        'endpoint': 'Patient',
        'params': {'_has:Specimen:patient:_has:Observation:specimen:code': '15074-8'}
    },
]

execute_queries(queries)

## Include Referenced Resources

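`_include` = Get the matched resources plus the resources they reference, e.g. a specimen with its patient

`_revinclude` = Get the matched resources plus the resources that reference them, e.g. a patient with all of its specimens

Both return the extra resources as additional entries in the search result Bundle (with `search.mode` = `include`), not nested inside the matched resources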

https://www.hl7.org/fhir/search.html#revinclude
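Because included resources come back as extra Bundle entries, client code usually has to separate them from the primary matches. A minimal sketch, assuming a searchset Bundle already parsed into a dict; `split_bundle` is a hypothetical helper, and treating entries without `search.mode` as matches is an assumption (servers should set it).

```python
def split_bundle(bundle):
    """Separate primary matches from _include/_revinclude results using entry.search.mode."""
    matches, included = [], []
    for entry in bundle.get("entry", []):
        # Assumption: an entry without search.mode is treated as a primary match
        mode = entry.get("search", {}).get("mode", "match")
        if mode == "match":
            matches.append(entry["resource"])
        elif mode == "include":
            included.append(entry["resource"])
        # "outcome" entries (OperationOutcome warnings) are skipped here
    return matches, included


bundle = {
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Specimen", "id": "BS-00001-0",
                      "subject": {"reference": "Patient/PT-00001"}},
         "search": {"mode": "match"}},
        {"resource": {"resourceType": "Patient", "id": "PT-00001"},
         "search": {"mode": "include"}},
    ],
}
matches, included = split_bundle(bundle)
print([r["id"] for r in matches], [r["id"] for r in included])
# ['BS-00001-0'] ['PT-00001']
```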

In [None]:
queries = [
    {
        'desc': 'Get a specimen with its patient',
        'endpoint': 'Specimen',
        'params': {'identifier': "BS-00001-0",
         '_include': 'Specimen:patient'}
    },
    {
        'desc': 'Get an observation and the specimen it is about',
        'endpoint': 'Observation',
        'params': {'identifier': "OB-00001-0",
         '_include': 'Observation:specimen'}
    },
    {
        'desc': 'Get a patient with all of its specimens',
        'endpoint': 'Patient',
        'params': {'identifier': "PT-00001", '_revinclude': 'Specimen:patient'}
    }
]
execute_queries(queries, display_content=True)


## Full Text Search

In [None]:
queries = [
    {
        'desc': 'Full text search - get all specimens w/ code text = Left median cubital vein',
        'endpoint': 'Specimen',
        'params': {'_content': '"Left median cubital vein"'}
    }
]
execute_queries(queries)

## Custom Search Parameter

When you add an extension to a resource, you must create a SearchParameter in order to search for resources
by that extension. For example, say you've created a `race` extension and use it on Patient resources. Now
you want to do searches like this: `/Patient?race=2028-9` or `/Patient?race:text=Asian`. To do that,
you will need to create a SearchParameter for the race extension.
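A minimal sketch of what such a SearchParameter could look like, written as a Python dict. The extension URL, the SearchParameter `id`/`url`, and the assumption that the extension's value is a coded type are all hypothetical; substitute your own. Once defined, it would be uploaded like any other resource (e.g. a PUT to `SearchParameter/patient-race`), and some servers may also need existing data reindexed before it becomes searchable.

```python
import json

# Hypothetical extension URL; replace with the canonical URL of your race extension
RACE_EXTENSION_URL = "http://example.org/fhir/StructureDefinition/race"

race_search_param = {
    "resourceType": "SearchParameter",
    "id": "patient-race",                                      # hypothetical id
    "url": "http://example.org/fhir/SearchParameter/patient-race",
    "name": "race",
    "status": "active",
    "description": "Search Patients by the coded value of the race extension",
    "code": "race",               # the name used in queries: /Patient?race=2028-9
    "base": ["Patient"],
    "type": "token",              # token, so both race=<code> and race:text=<text> can work
    # FHIRPath selecting the extension's value (assumed to be a Coding/CodeableConcept)
    "expression": f"Patient.extension.where(url = '{RACE_EXTENSION_URL}').value",
}

print(json.dumps(race_search_param, indent=2))
```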

The example query below demonstrates a search with a custom search parameter.

In [None]:
queries = [
    {
        'desc': 'Search on extension - get all patients with particular race',
        'endpoint': 'Patient',
        'params': {'race': '2028-9'}
    }
]
execute_queries(queries)

## Custom Search Query

Sometimes the RESTful FHIR search API cannot satisfy your query needs. It would be nice if you could query the underlying database(s) of the FHIR server directly. 

Aidbox is the only server we evaluated that seems to support this through its API (the `$sql` endpoint used below).

In [None]:
# Only Aidbox supports this
from requests.auth import HTTPBasicAuth

c = SERVER_CONFIG['aidbox']
sql_str = (
    """
    SELECT
    p.id AS patient_id,
    s.id AS specimen_id,
    s.resource AS specimen    
    FROM 
    patient AS p
    JOIN specimen AS s
    ON p.id = s.resource->'subject'->>'id';
    """
)
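# Strip a trailing '/fhir' suffix. Note: str.rstrip('/fhir') would strip any
# trailing '/', 'f', 'h', 'i' or 'r' characters, not the '/fhir' suffix
base_url = c['base_url']
if base_url.endswith('/fhir'):
    base_url = base_url[:-len('/fhir')]
url = f"{base_url}/$sql"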
resp = requests.post(
    url,
    auth=HTTPBasicAuth(c['username'], c['password']),
    data=sql_str,
    headers={'Content-Type': 'text/yaml'}
)
pprint(resp.json())

## Exact Path Match

This is only supported by Aidbox. Sometimes you want to search by an attribute in your resource payload,
but you don't have a SearchParameter defined for it and loaded into the server.

With Aidbox, you can simply search by the path to that attribute in the resource payload. For example, I want to search for Specimens by the analyte type extension, but I don't have a SearchParameter defined for it. I can search for Specimens by analyte type like this:

In [None]:
# Only Aidbox supports this
queries = [
    {
        'desc': 'Get all specimens with analyte type = DNA',
        'endpoint': 'Specimen',
        'params': {'.extension.0.valueString': 'DNA'}
    }
]
execute_queries(queries)
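To make the path semantics concrete, the sketch below walks a dot path like `.extension.0.valueString` over local resource dicts. It only illustrates how such a path can be read, assuming numeric segments index into arrays as the query above suggests; it is not Aidbox's implementation. `resolve_dot_path` is a hypothetical helper and the extension URL is made up.

```python
def resolve_dot_path(resource, path):
    """Walk a path like '.extension.0.valueString'; numeric segments index into lists.

    Returns None as soon as any step is missing.
    """
    current = resource
    for segment in path.lstrip(".").split("."):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return None
            current = current[int(segment)]
        elif isinstance(current, dict):
            current = current.get(segment)
        else:
            return None
        if current is None:
            return None
    return current


ANALYTE_URL = "http://example.org/fhir/StructureDefinition/analyte-type"  # hypothetical
specimens = [
    {"resourceType": "Specimen", "id": "BS-00000-0",
     "extension": [{"url": ANALYTE_URL, "valueString": "DNA"}]},
    {"resourceType": "Specimen", "id": "BS-00000-1",
     "extension": [{"url": ANALYTE_URL, "valueString": "RNA"}]},
    {"resourceType": "Specimen", "id": "BS-00001-0"},  # no extension at all
]
dna = [s["id"] for s in specimens
       if resolve_dot_path(s, ".extension.0.valueString") == "DNA"]
print(dna)  # ['BS-00000-0']
```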