_Note: This is a SQL-scoped notebook, so it should be attached to a Serverless SQL warehouse for execution._  

# FHIR Workshop 
***

#### Efficiently parsing FHIR JSON Bundles using Databricks SQL Serverless Streaming Tables and the Variant Data Type. 


## Unity Catalog Set Up
***

### Catalog Creation 

This workshop is designed so that multiple users can run every command, including moving files and creating streaming tables or workflows, at the same time in a classroom setting. The notebooks use a single catalog that is typically set up ahead of time, with each participant granted read/write and execute privileges on it. Each user creates their own schema, named after their `current_user()` value.  

The statements below are commented out. The instructor, or a team member with the `CREATE CATALOG` privilege on the Unity Catalog metastore, can uncomment and run them to create the catalog.  

In [0]:
-- CREATE CATALOG IF NOT EXISTS fhir_workshop;
-- GRANT ALL PRIVILEGES ON CATALOG fhir_workshop TO `account users`;
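Once the catalog exists, the grant can be checked before the session starts. The statements below are a minimal sketch using the standard `SHOW GRANTS` command; they assume the catalog name `fhir_workshop` and the `account users` group from the cell above.

```sql
-- List every privilege granted on the workshop catalog
SHOW GRANTS ON CATALOG fhir_workshop;

-- Narrow the output to the group the workshop participants belong to
SHOW GRANTS `account users` ON CATALOG fhir_workshop;
```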

### Set Schema Value Using a SQL Declared Variable 

We can declare variables in SQL with the `DECLARE OR REPLACE VARIABLE` syntax. Here we derive the schema name from the user's `current_user()` value, which is typically an email address: we keep the part before the "@" symbol and replace any periods with underscores, since a period would otherwise be read as the separator between the catalog and schema names.  

Because this value is the same in every notebook, we set it with the `DEFAULT` clause. To assign a different value to the variable, we could use `SET VAR schema_use = <value>`; we'll see this for another variable in a later notebook.  

In [0]:
DECLARE OR REPLACE VARIABLE schema_use STRING DEFAULT REPLACE(SPLIT(current_user(), '@')[0], '.', '_');

In [0]:
SELECT schema_use;
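As a sketch of how the `DEFAULT` expression and `SET VAR` behave, the statements below first evaluate the same expression against a sample address, then override the variable and reset it. The address `jane.doe@example.com` and the schema name `jane_doe_sandbox` are hypothetical. The final reset matters, because every later cell builds object names from `schema_use`.

```sql
-- Apply the DEFAULT expression to a hypothetical user name; returns 'jane_doe'
SELECT REPLACE(SPLIT('jane.doe@example.com', '@')[0], '.', '_') AS derived_schema;

-- Override the variable for this session (hypothetical schema name)
SET VAR schema_use = 'jane_doe_sandbox';
SELECT schema_use;

-- Restore the value computed by the DEFAULT expression before running later cells
SET VAR schema_use = DEFAULT;
SELECT schema_use;
```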

### Create the Schema in the fhir_workshop Catalog
***

In [0]:
USE CATALOG fhir_workshop;

In [0]:
CREATE SCHEMA IF NOT EXISTS IDENTIFIER("fhir_workshop." || schema_use);

When we need to refer to a Unity Catalog object such as a catalog, schema, table, function, or model through a variable in SQL, we must wrap the expression in the `IDENTIFIER` clause. This tells Spark SQL to evaluate the expression first and use the resulting string as the object name before executing the rest of the statement. Since `schema_use` is always set at the start of our code, we'll see `IDENTIFIER` frequently at the top of our SQL-scoped notebooks.  
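To make the difference concrete, here is a short sketch. It assumes `schema_use` has already been declared as above; `patients` is a hypothetical table name used only for illustration.

```sql
-- Without IDENTIFIER, `schema_use` is read as a literal name, so this would
-- look for a schema actually called "schema_use" rather than the variable's value:
-- USE SCHEMA schema_use;

-- Show the string that IDENTIFIER will resolve, e.g. 'fhir_workshop.jane_doe'
SELECT "fhir_workshop." || schema_use AS resolved_schema;

-- The same pattern applies to tables and other objects (hypothetical table):
-- SELECT * FROM IDENTIFIER("fhir_workshop." || schema_use || ".patients");
```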

In [0]:
USE IDENTIFIER("fhir_workshop." || schema_use);

It's always a good idea to display the current catalog and schema at the top of a notebook, especially when those names vary, as they do in a proper CI/CD environment.   

Note that in Unity Catalog, a collection of governed assets inside a catalog is called a schema; for those who were around in the original HDFS/Hive days, "database" still works as a synonym, which is why `current_database()` returns the same value as `current_schema()` below.  

In [0]:
SELECT
  current_catalog()
  ,current_schema()
  ,current_database()
;
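Because "database" is a synonym for "schema", the listing commands have aliases as well. A small sketch, assuming the `fhir_workshop` catalog from above:

```sql
-- Both statements list the schemas in the workshop catalog;
-- SHOW DATABASES is an alias for SHOW SCHEMAS
SHOW SCHEMAS IN fhir_workshop;
SHOW DATABASES IN fhir_workshop;
```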

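### Create a Volume for the Landing Files

A Unity Catalog volume stores non-tabular files, such as the raw FHIR JSON bundles, under a `/Volumes/<catalog>/<schema>/<volume>/` path.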

In [0]:
CREATE VOLUME IF NOT EXISTS landing;

In [0]:
SHOW VOLUMES;
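To inspect the new volume and the files it holds, a sketch like the following could be used. `DESCRIBE VOLUME` reports the volume's metadata, and `LIST` is given a path string; `<your_schema>` is a placeholder to replace with the value of `schema_use`.

```sql
-- Show metadata for the volume created above
DESCRIBE VOLUME landing;

-- List files in the volume; replace <your_schema> with the value of schema_use
-- LIST '/Volumes/fhir_workshop/<your_schema>/landing/' LIMIT 100;
```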

In [0]:
-- LIST '/Volumes/fhir_workshop/synthea/synthetic_files_raw/output/fhir/' LIMIT 1000;

In [0]:
-- CREATE OR REPLACE FUNCTION copy_to_volume(
--   source_volume STRING COMMENT "The source volume path to copy the files from."
--   ,target_volume STRING COMMENT "The target volume path to copy the files to." 
--   ,file_pattern STRING COMMENT "The file pattern to match using * for wild cards."
-- )
-- RETURNS STRING
-- LANGUAGE PYTHON
-- AS $$

--   import fnmatch
--   import os
--   import shutil

--   # Match every file when no pattern is supplied
--   if file_pattern is None:
--     file_pattern = '*'

--   # Walk the source tree recursively (as `find` did) without invoking a shell,
--   # so the pattern cannot be interpreted as shell syntax
--   copied = 0
--   for root, _dirs, names in os.walk(source_volume):
--     for name in fnmatch.filter(names, file_pattern):
--       # Copy each matching file into the target directory (flattened)
--       shutil.copy(os.path.join(root, name), target_volume)
--       copied += 1

--   return f"Copied {copied} files."

-- $$

In [0]:
-- DROP FUNCTION copy_to_volume;

In [0]:
-- SELECT copy_to_volume(
--   "/Volumes/fhir_workshop/synthea/synthetic_files_raw/output/fhir/"
--   ,"/Volumes/fhir_workshop/odl_instructor_1452233/landing/"
--   ,"Aaron697*.json"
-- ) as copy_result;