## Overview
This notebook loads a collection of synthetic FHIR bundles and value sets and shows some simple queries. Run it first to set up the environment for the other notebooks in the tutorial.

## Setup Tasks
Some setup before the real show begins...

In [None]:
from pyspark.sql import SparkSession

# Enable Hive support for our session so we can save resources as Hive tables
spark = SparkSession.builder \
                    .config('hive.exec.dynamic.partition.mode', 'nonstrict') \
                    .enableHiveSupport() \
                    .getOrCreate()

## Import Synthetic Data
This tutorial uses data generated by Synthea. It is simply a directory of STU3 bundles included with the tutorial; you can see it in the bundles directory.

Let's load the bundles and examine a couple of data types in them.
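
Before loading anything into Spark, it can help to peek at what the bundles contain. The following is a minimal sketch using only the Python standard library, not Bunsen's loader. It assumes the bundles are stored as JSON files (one bundle per file); the helper name `count_resource_types` is hypothetical and introduced only for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def count_resource_types(bundle_dir):
    """Count FHIR resource types across all JSON bundles in a directory."""
    counts = Counter()
    # Assumption: each *.json file in the directory holds one FHIR bundle.
    for path in sorted(Path(bundle_dir).glob('*.json')):
        with open(path, encoding='utf-8') as f:
            bundle = json.load(f)
        # Each bundle entry wraps one resource; resourceType names its type.
        for entry in bundle.get('entry', []):
            resource = entry.get('resource', {})
            counts[resource.get('resourceType', 'unknown')] += 1
    return counts
```

Calling `count_resource_types('bundles')` (adjust the path to wherever the bundles directory lives) returns a `Counter` keyed by resource type, which gives a rough idea of which tables to expect once the data is in Hive.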

## Reading from a Hive database (Persistent in Google Storage Bucket)
Now that we've saved our data to a Hive database, we can easily view and query the tables with Spark SQL:

In [5]:
spark.sql('use tutorial_small')
spark.sql('show tables').toPandas()

|  | database | tableName | isTemporary |
|---|---|---|---|
| 0 | tutorial_small | allergyintolerance | False |
| 1 | tutorial_small | careplan | False |
| 2 | tutorial_small | claim | False |
| 3 | tutorial_small | condition | False |
| 4 | tutorial_small | encounter | False |
| 5 | tutorial_small | immunization | False |
| 6 | tutorial_small | medication | False |
| 7 | tutorial_small | medicationrequest | False |
| 8 | tutorial_small | observation | False |
| 9 | tutorial_small | organization | False |


In [6]:
spark.sql("""
select subject.reference, 
       count(*) cnt
from encounter
where class.code != 'WELLNESS' and
      period.start > '2013'
group by subject.reference
order by cnt desc
limit 10
""").toPandas()

|  | reference | cnt |
|---|---|---|
| 0 | urn:uuid:e206880c-7762-4aee-a3e2-5a8c89512c18 | 53 |
| 1 | urn:uuid:e538491e-cf8e-4a3f-97a5-45811e066f27 | 44 |
| 2 | urn:uuid:dcad3c44-64de-43b6-b24c-989f8f27c71d | 33 |
| 3 | urn:uuid:5804a9d3-3518-4862-a1e4-a61b0f1a4be4 | 31 |
| 4 | urn:uuid:2bf9eab0-fec0-41b2-9f91-3369e38b98f6 | 19 |
| 5 | urn:uuid:90a7ded5-a5ce-43df-b973-7bc7ce7a3011 | 18 |
| 6 | urn:uuid:8f538e46-a1d1-4c75-beb7-e3946124e730 | 16 |
| 7 | urn:uuid:6f58dbea-7532-4090-97a8-79982bab98f5 | 12 |
| 8 | urn:uuid:aa251e83-9a9b-446f-ba2f-87e2da7c4d34 | 8 |
| 9 | urn:uuid:73bbd5a3-00b5-4216-bd5d-601359ca9e42 | 6 |
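
A note on the `period.start > '2013'` filter in the query above. Assuming `period.start` is stored as an ISO 8601 string (worth confirming against the table schema), the comparison is a string comparison rather than a date comparison. It still selects encounters from 2013 onward because ISO 8601 timestamps sort in date order. The sketch below shows the same comparison in plain Python; the sample timestamps are hypothetical.

```python
# Hypothetical period.start values, written as ISO 8601 strings.
starts = [
    '2012-12-31T23:59:59-05:00',
    '2013-01-01T00:00:00-05:00',
    '2014-06-15T09:30:00-05:00',
]

# A longer string that begins with '2013' sorts after '2013' itself,
# so every timestamp from 2013 onward passes the filter.
kept = [s for s in starts if s > '2013']
```

Here `kept` holds the 2013 and 2014 timestamps and drops the 2012 one.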


## Using Valuesets in Queries
Finally, we illustrate how we can easily use FHIR valuesets within Spark SQL. Bunsen provides an *in_valueset* user-defined function that can be invoked directly from SQL, so users can easily work with valuesets without needing complex joins to separate ontology tables.

First, we push some interesting valuesets to the cluster with the *push_valuesets* function shown below. It uses Apache Spark's broadcast variables to make this reference data available on each node. Details are in the function's documentation, but users typically work with valuesets in one of three ways:

* From a FHIR ValueSet resource, as illustrated here
* As a collection of values in a Python structure
* As an is-a relationship in some ontology, like LOINC or SNOMED

Further documentation is available via help(push_valuesets).
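
To make the idea concrete, here is a conceptual sketch in plain Python of the kind of membership test a function like *in_valueset* performs. It is not Bunsen's implementation: it assumes a value set can be treated as a set of (system, code) pairs and that a code column is shaped like a FHIR CodeableConcept with a `coding` list. The names `valuesets`, `in_valueset_sketch`, and `observation_code` are hypothetical; the `ldl` and `hdl` entries mirror the LOINC codes used in this notebook's example.

```python
# Value sets as sets of (system, code) pairs.
valuesets = {
    'ldl': {('http://loinc.org', '18262-6')},
    'hdl': {('http://loinc.org', '2085-9')},
}


def in_valueset_sketch(codeable_concept, name):
    """Return True if any coding in the concept belongs to the named value set."""
    members = valuesets[name]
    return any((coding.get('system'), coding.get('code')) in members
               for coding in codeable_concept.get('coding', []))


# Hypothetical Observation.code value shaped like a FHIR CodeableConcept.
observation_code = {
    'coding': [{'system': 'http://loinc.org', 'code': '2085-9'}],
}
```

With these definitions, `in_valueset_sketch(observation_code, 'hdl')` is true and `in_valueset_sketch(observation_code, 'ldl')` is false. Broadcasting the value sets to every node is what lets Spark run a check like this per row without joining against a separate ontology table.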

This example uses persistent (pre-built) Hive databases stored in Google Storage buckets.
Let's take a look:

In [None]:
from bunsen.stu3.valuesets import push_valuesets, valueset

# Push multiple valuesets for this example, even though we use only one.
push_valuesets(spark, 
               {'ldl'               : [('http://loinc.org', '18262-6')],                
                'hdl'               : [('http://loinc.org', '2085-9')],
                'cholesterol'       : valueset('http://hl7.org/fhir/ValueSet/example-extensional', '20150622')},
               database='tutorial_ontologies'); 

Now that the above valuesets have been broadcast across our processing cluster, we can easily query them with the *in_valueset* user-defined function inline with our SQL:

In [None]:
spark.sql("""
select subject.reference, 
       valueQuantity.value,
       valueQuantity.unit
from tutorial_small.observation
where in_valueset(code, 'cholesterol')
limit 10
""").toPandas()