# SyntheaToEDS


This notebook shows how to generate synthetic patient data with Synthea and how to translate it into EDS (Entrepôt de Données de Santé) data bundles.




In [1]:
// Let's generate five patients using the Synthea generator.
// The default Synthea modules are used (modules contain the
// generation rules).
import fr.aphp.wind.eds.generator.source.synthea
val syntheaBundle = synthea.generate(5)

Scanned 60 modules and 36 submodules.
Loading submodule modules/breast_cancer/tnm_diagnosis.json
Loading submodule modules/allergies/allergy_incidence.json
Loading submodule modules/dermatitis/moderate_cd_obs.json
Loading submodule modules/dermatitis/severe_cd_obs.json
Loading submodule modules/contraceptives/female_sterilization.json
Loading submodule modules/allergies/outgrow_env_allergies.json
Loading submodule modules/lung_cancer/lung_cancer_probabilities.json
Loading submodule modules/contraceptives/patch_contraceptive.json
Loading submodule modules/allergies/allergy_panel.json
Loading submodule modules/breast_cancer/surgery_therapy_breast.json
Loading submodule modules/allergies/severe_allergic_reaction.json
Loading submodule modules/dermatitis/early_severe_eczema_obs.json
Loading submodule modules/total_joint_replacement/functional_status_assessments.json
Loading submodule modules/contraceptives/ring_contraceptive.json
Loading submodule modules/anemia/anemia_sub.json
Loading sub

In [2]:
// Let's count the patients
syntheaBundle.patients.count()

5

In [3]:
// Let's get an EDS bundle from the Synthea bundle
import fr.aphp.wind.eds.generator.target.eds._
val edsBundle = FromSynthea(syntheaBundle) 
// Let's validate that the tables have schemas compatible with the EDS
// (all the schemas are automatically generated from the table descriptions).
edsBundle.validate(allowMissingFields=true).throwOnErrors()
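
To make the idea of a schema check that tolerates missing fields more concrete, here is a minimal standalone sketch. It is not the generator's implementation: `Schema`, `validateSchema`, the type names, and the interpretation of `allowMissingFields` (expected columns may be absent, but unknown or mistyped columns are reported) are all assumptions made for illustration.

```scala
// Hypothetical, simplified model: a schema maps a column name to a type name.
type Schema = Map[String, String]

// Hypothetical helper: returns a list of human-readable errors (empty when valid).
def validateSchema(actual: Schema, expected: Schema, allowMissingFields: Boolean): Seq[String] = {
  // Columns present in the table but unknown to the reference schema.
  val unexpected = actual.keys.toSeq.sorted
    .filterNot(expected.contains)
    .map(name => s"unexpected column: $name")

  // Columns present in both, but with a different type.
  val mismatched = actual.toSeq.sortBy(_._1).collect {
    case (name, tpe) if expected.get(name).exists(_ != tpe) =>
      s"type mismatch for $name: expected ${expected(name)}, found $tpe"
  }

  // Columns required by the reference schema but absent from the table
  // (assumed to be tolerated when allowMissingFields is true).
  val missing =
    if (allowMissingFields) Seq.empty[String]
    else expected.keys.toSeq.sorted
      .filterNot(actual.contains)
      .map(name => s"missing column: $name")

  unexpected ++ mismatched ++ missing
}

// Illustrative schemas; the type names are placeholders, not the real EDS types.
val expected: Schema = Map(
  "person_id" -> "long",
  "gender_concept_id" -> "long",
  "birth_datetime" -> "timestamp",
  "death_datetime" -> "timestamp"
)
val actual: Schema = expected - "death_datetime"

val lenientErrors = validateSchema(actual, expected, allowMissingFields = true)
val strictErrors = validateSchema(actual, expected, allowMissingFields = false)
println(lenientErrors)
println(strictErrors)
```

Returning a list of errors instead of throwing on the first one lets the caller decide whether to fail, which is the same shape as chaining a separate `throwOnErrors()` step after validation.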

In [4]:
edsBundle.persons.show()

+-------------------+-----------------+-------------------+--------------+
|          person_id|gender_concept_id|     birth_datetime|death_datetime|
+-------------------+-----------------+-------------------+--------------+
|5318687062932589564|             8532|1964-02-08 00:00:00|          null|
|8052569651285488277|             8507|2003-12-03 00:00:00|          null|
|4621425714919720823|             8507|1998-11-03 00:00:00|          null|
|8677407672435441732|             8507|1982-04-04 00:00:00|          null|
|5433578829485951173|             8507|1974-09-27 00:00:00|          null|
+-------------------+-----------------+-------------------+--------------+
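
The `gender_concept_id` values in this table are OMOP standard concept IDs: 8507 is MALE and 8532 is FEMALE. A minimal sketch of how they could be decoded for a quick sanity check follows; `genderConcepts`, `genderLabel`, and the `UNKNOWN` fallback are hypothetical names, not part of the generator.

```scala
// OMOP standard gender concepts.
val genderConcepts: Map[Long, String] = Map(
  8507L -> "MALE",
  8532L -> "FEMALE"
)

// Hypothetical helper: label a concept ID, falling back to a placeholder
// for IDs that are not in the lookup table.
def genderLabel(conceptId: Long): String =
  genderConcepts.getOrElse(conceptId, s"UNKNOWN($conceptId)")

// The five gender_concept_id values from the table above, in row order.
val observed = Seq(8532L, 8507L, 8507L, 8507L, 8507L)

// Count the patients per decoded label.
val countsByGender: Map[String, Int] =
  observed.groupBy(genderLabel).map { case (label, ids) => label -> ids.size }

println(countsByGender)
```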



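
The `person_id` values above are 64-bit integers, whereas Synthea identifies patients by UUID strings; cell 5 of this notebook does the conversion with `math.abs` on the UUID's low 64 bits. The sketch below reproduces that arithmetic without Spark and shows one edge case: `math.abs(Long.MinValue)` overflows and stays negative. `absId` and `maskedId` are hypothetical names, and the bit-mask variant is a swapped-in alternative, not what the generator does; it produces different IDs than `math.abs` for negative inputs.

```scala
import java.util.UUID

// Same arithmetic as the notebook's UDF, as a plain function.
def absId(stringId: String): Long =
  math.abs(UUID.fromString(stringId).getLeastSignificantBits)

// Hypothetical alternative: clear the sign bit, which is non-negative
// for every input, including Long.MinValue.
def maskedId(stringId: String): Long =
  UUID.fromString(stringId).getLeastSignificantBits & Long.MaxValue

// Hand-crafted UUID strings chosen for illustration (not Synthea output).
val ordinary = "00000000-0000-0000-0000-00000000002a" // low 64 bits = 42
val edgeCase = "00000000-0000-0000-8000-000000000000" // low 64 bits = Long.MinValue

println(absId(ordinary))    // 42
println(absId(edgeCase))    // still negative: math.abs overflows here
println(maskedId(edgeCase)) // 0
```

With randomly generated UUIDs the edge case is extremely unlikely, so this matters mainly if negative IDs must be ruled out entirely.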
In [5]:
// This is how the names are converted to observations. Feel free to peek
// in /op/generator/generator-target-eds/src/main/scala/fr/aphp/wind/eds/generator/target/eds/FromSynthea.scala
// for more examples.
import java.util.UUID
import org.apache.spark.sql.functions._
import spark.implicits._

// User-defined function (UDF) that maps a UUID string generated by Synthea
// to an OMOP-compatible 64-bit ID: the absolute value of the UUID's low 64 bits.
val omopId = udf((stringId: String) => {
    math.abs(UUID.fromString(stringId).getLeastSignificantBits)
})

val df = syntheaBundle.patients
Map("first" -> 3042942L, "last" -> 3046810L, "ssn" -> 398093005L)
    .map { case (syntheaColumn, conceptId) =>
        df.select("id", syntheaColumn)
            // Keep only the rows where the current column has a value.
            .where(col(syntheaColumn).isNotNull)
            .withColumn("observation_concept_id", typedLit(conceptId))
            .withColumn("person_id", omopId('id))
            .withColumnRenamed(syntheaColumn, "value_as_string")
            .drop("id")
    }
    .reduce {
        _ union _
    }.show()

+---------------+----------------------+-------------------+
|value_as_string|observation_concept_id|          person_id|
+---------------+----------------------+-------------------+
|    Mechelle851|               3042942|5318687062932589564|
|       Arden380|               3042942|8052569651285488277|
|     Ignacio928|               3042942|4621425714919720823|
|      Lucien408|               3042942|8677407672435441732|
|    Jermaine675|               3042942|5433578829485951173|
|      Durgan499|               3046810|5318687062932589564|
|    Hermiston71|               3046810|8052569651285488277|
|      Duarte203|               3046810|4621425714919720823|
|       Huels583|               3046810|8677407672435441732|
|     Watsica258|               3046810|5433578829485951173|
|    999-69-9983|             398093005|5318687062932589564|
|    999-43-3332|             398093005|8052569651285488277|
|    999-99-1974|             398093005|4621425714919720823|
|    999-94-5337|       