## Visit Occurrence Table Mapping

This notebook is an attempt at mapping FHIR to OMOP, following this guide: https://build.fhir.org/ig/HL7/cdmh/profiles.html#omop-to-fhir-mappings
<br>Here we map the FHIR Encounter resource to the OMOP visit_occurrence table.

### Load Data Frame from Parquet Catalog File

In [45]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import dayofmonth,month,year,to_date,trunc,split,explode,array

# Create a local Spark session
spark = SparkSession.builder.appName('etl').getOrCreate()

In [46]:
# Read the Parquet catalog file into a DataFrame
df = spark.read.parquet('data/catalog.parquet')

Data Frame schema 

In [47]:
#df.printSchema()

### Encounter Mapping 

Filter by the Encounter resource type

In [48]:
filtered = df.filter(df['resourceType'] == 'Encounter')

In [49]:
#filtered.show(20)

Select the relevant fields

In [51]:
Encounter = filtered.select(['id','subject','type',
                              'location','hospitalization.admitSource',
                              'period','extension'])
#Encounter.printSchema()

Explode the location structure to form the fields "care_site_id" and "discharge_to_concept_id" <br> TODO: Find the correct origin field

In [52]:
#Encounter.withColumn("care_site_id", explode(array("location.location")))\
#    .withColumn("discharge_to_concept_id", explode(array("location.physicalType")))\
#    .show(10)

Extract the start and end date along with the time from the period field.

In [53]:
#splits the date and time
split_start = split(Encounter['period.start'], 'T')
split_end = split(Encounter['period.end'], 'T') 

#assigns each to a column 
vist_date_time = Encounter\
    .withColumn("visit_start_date",split_start.getItem(0))\
    .withColumn("visit_start_datetime",split_start.getItem(1))\
    .withColumn("visit_end_date",split_end.getItem(0))\
    .withColumn("visit_end_datetime",split_end.getItem(1))
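
Note that `getItem(1)` keeps only the time portion of the string, whereas the OMOP CDM defines `visit_start_datetime` and `visit_end_datetime` as full datetimes. As a minimal sketch in plain Python (no Spark involved), the following shows how a single `period.start` value breaks down into a date and a full datetime. The helper name `split_fhir_instant` is hypothetical and not part of this notebook; the sample value is taken from the output table further down.

```python
from datetime import datetime

def split_fhir_instant(value):
    """Return (date, datetime) parsed from an ISO 8601 string such as period.start."""
    parsed = datetime.fromisoformat(value)  # the UTC offset is preserved
    return parsed.date(), parsed

start_date, start_datetime = split_fhir_instant("2016-09-04T02:36:34-12:00")
print(start_date)      # 2016-09-04
print(start_datetime)  # 2016-09-04 02:36:34-12:00
```

In Spark, the analogous built-ins would be `to_date` and `to_timestamp` from `pyspark.sql.functions`. Whether they parse this exact offset format may depend on the Spark version, and that has not been checked here.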

Drop columns no longer needed

In [54]:
dropped = vist_date_time.drop("period")

Rename the columns 

In [55]:
visit_occurnace = dropped\
    .withColumnRenamed("type","preceding_visit_occurence")\
    .withColumnRenamed("id","visit_occurence_id")\
    .withColumnRenamed("admitSource","admitting_source_concept_id")\
    .withColumnRenamed("subject","person_id")\
    .withColumnRenamed("extension","visit_type_concept_id")

#.withColumnRenamed("location.location.id","care_site_id")\    
#.withColumnRenamed("location.location.type","discharge_to_concept_id")\
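
The chain of `withColumnRenamed` calls can also be driven from one mapping, which keeps the FHIR-to-OMOP names in a single place and makes duplicates easy to spot. This is only a sketch: `RENAMES` and `rename_columns` are hypothetical names, and the target column names are copied as spelled in this notebook. The helper relies only on the DataFrame's `withColumnRenamed` method.

```python
from functools import reduce

# Hypothetical mapping: source column name -> target name used in this notebook.
RENAMES = {
    "id": "visit_occurence_id",
    "subject": "person_id",
    "type": "preceding_visit_occurence",
    "admitSource": "admitting_source_concept_id",
    "extension": "visit_type_concept_id",
}

def rename_columns(df, renames):
    """Apply withColumnRenamed once per (old, new) pair and return the result."""
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        renames.items(),
        df,
    )
```

With the DataFrame from the previous cell, this would be called as `visit_occurnace = rename_columns(dropped, RENAMES)`.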

Show the mapped output table

In [56]:
visit_occurnace.show(5) 

+--------------------+--------------------+-------------------------+--------------------+---------------------------+---------------------+----------------+--------------------+--------------+------------------+
|  visit_occurence_id|           person_id|preceding_visit_occurence|            location|admitting_source_concept_id|visit_type_concept_id|visit_start_date|visit_start_datetime|visit_end_date|visit_end_datetime|
+--------------------+--------------------+-------------------------+--------------------+---------------------------+---------------------+----------------+--------------------+--------------+------------------+
|a3d6098b-0af4-4d9...|{urn:uuid:f245361...|     {[{[{http://snome...|                null|                       null|                 null|      2016-09-04|      02:36:34-12:00|    2016-09-04|    02:51:34-12:00|
|c420368d-3336-44a...|{urn:uuid:46c9970...|     {[{[{http://snome...|{[{{Location?iden...|                       null|                 null|      20