# Synthea Medgraph

***Synthetic Patient Dataset Generation***

Synthea is an open-source healthcare data generator that models the medical history of synthetic patients. Because every patient is synthetic, the resulting dataset is free of PHI and PII.

## Install pyTigerGraph

In [1]:
from IPython.display import clear_output
!pip install -qq watermark

################################
# Packages to install
################################
!pip install -U -qq pyTigerGraph

################################

clear_output()
%reload_ext watermark
%watermark -v -p numpy,pyTigerGraph

Python implementation: CPython
Python version       : 3.9.10
IPython version      : 8.0.1

numpy       : 1.22.2
pyTigerGraph: 0.0.9.9.2



***Imports***

In [2]:
import pyTigerGraph as tg
import json
import pandas as pd

***Helper Functions***

In [3]:
def pprint(data):
    """Pretty-print a JSON-serializable object with two-space indentation."""
    print(json.dumps(data, indent=2))

***Connection parameters***

In [4]:
hostName = "http://localhost"
userName = "tigergraph"
password = "tigergraph"

conn = tg.TigerGraphConnection(host=hostName, username=userName, password=password)
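
The credentials above are hard-coded for a local demo instance. As a minimal sketch, the same three settings could instead be read from environment variables, falling back to the values used in this notebook. The variable names `TG_HOST`, `TG_USERNAME`, and `TG_PASSWORD` and the helper `load_connection_settings` are hypothetical and not part of pyTigerGraph.

```python
import os


def load_connection_settings(env=None):
    """Read connection settings, falling back to the notebook's local defaults."""
    # TG_HOST, TG_USERNAME and TG_PASSWORD are hypothetical variable names.
    env = os.environ if env is None else env
    return {
        "host": env.get("TG_HOST", "http://localhost"),
        "username": env.get("TG_USERNAME", "tigergraph"),
        "password": env.get("TG_PASSWORD", "tigergraph"),
    }


# An empty mapping stands in for an environment with none of the variables set.
settings = load_connection_settings({})
print(settings["host"])
```

The dictionary keys match the keyword arguments passed to `tg.TigerGraphConnection` above, so it could be unpacked as `tg.TigerGraphConnection(**settings)`.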

## Understanding the Schema

![](https://i.ibb.co/CQXYbH9/Screenshot-2021-08-24-003804.png)

At its core, this schema represents all of a patient's interactions with the healthcare system. It also includes demographic information about each patient, including their age, race, gender, and location. The full schema can be found in the [Synthea CSV File Data Dictionary](https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary).

Interactions with the healthcare system are stored as `Encounter` vertices. Each `Encounter` connects to vertices that express its nature, such as a `Procedures` vertex for a procedure with a physician or a `Medication` vertex for a prescription.

**A critical part of this graph is the `SnomedCode` vertex.** This vertex type is tied to almost every encounter-related vertex type and holds the Snomed code that describes it. For those unfamiliar with Snomed codes, they are standardized codes used to describe the nature of a medical encounter. For example:
* Snomed code **31978002** describes a **fracture of the tibia**
* Snomed code **86174004** signifies the use of a **laparoscope** medical device
* Snomed code **25876001** denotes that a procedure is of **emergency** priority

You can read more about Snomed [here](https://confluence.ihtsdotools.org/display/DOCSTART/4.+SNOMED+CT+Basics).

Below is an overview of some key vertex types in the schema and their attributes. Where an attribute name ends in `_code`, such as `allergy_code`, it holds a **Snomed code**. These codes are also expressed as edges that lead to vertices representing each **Snomed code**.


### <center>Encounter Related Vertices

#### <center>Encounter

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID encounter_id|STRING|
|baseEncounterCost|DOUBLE|
|totalClaimCost|DOUBLE|
|payerCoverage|DOUBLE|
|classType|STRING|
|startTime|DATETIME|
|endTime|DATETIME|

#### <center>Notes

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID note_id|STRING|
|chiefComplaint|STRING|
|historyOfPresentIllness|STRING|
|socialHistory|STRING|
|allergies|STRING|
|medications|STRING|
|assessment|STRING|
|plan|STRING|

#### <center>Symptoms

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID symptom_id|STRING|
|symptom|STRING|
|symptomValue|DOUBLE|
|pathology|STRING|

#### <center>Medication

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID medication_id|STRING|
|medication_code|STRING|
|description|STRING|
|startDate|DATETIME|
|endDate|DATETIME|
|baseCost|DOUBLE|
|payerCoverage|DOUBLE|
|dispenses|INT|
|totalCost|DOUBLE|

#### <center>Procedures

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID procedure_id|STRING|
|procedure_code|STRING|
|description|STRING|
|baseCost|DOUBLE|
|dateOfProcedure|DATETIME|

#### <center>Observations

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID observation_id|STRING|
|dateOfObservation|DATETIME|
|observation_code|STRING|
|description|STRING|
|obsValue|STRING|
|units|STRING|

#### <center>Imaging Studies

|Attribute Name|Attribute Type|
|---|---|
|PRIMARY_ID imaging_id|STRING|
|bodySiteCode|STRING|
|bodySiteDescription|STRING|
|modalityCode|STRING|
|modalityDescription|STRING|
|SOPCode|STRING|
|SOPDescription|STRING|
|dateOfImage|DATETIME|

## Loading the Schema

We'll be creating the schema through a series of **GSQL** queries that we'll execute through pyTigerGraph.
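
Every vertex statement in the cells that follow has the same shape: a `STRING` primary ID, an optional list of typed attributes, and the `primary_id_as_attribute` option. As a rough sketch of that pattern, the hypothetical helper below (not part of pyTigerGraph) assembles one such statement from a Python dictionary. It assumes the primary ID is always a `STRING`, which holds for the vertices in this notebook.

```python
def create_vertex_statement(name, primary_id, attributes=None):
    """Build one GSQL CREATE VERTEX statement in the style used in this notebook."""
    columns = [f"PRIMARY_ID {primary_id} STRING"]
    # Attribute order follows the dictionary's insertion order.
    for attribute, gsql_type in (attributes or {}).items():
        columns.append(f"{attribute} {gsql_type}")
    column_list = ", ".join(columns)
    return f'CREATE VERTEX {name}({column_list}) WITH primary_id_as_attribute="true"'


statement = create_vertex_statement("SnomedCode", "snomed_code", {"description": "STRING"})
print(statement)
```

The resulting string could be passed to `conn.gsql(...)` in the same way as the hand-written statements.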

#### `Patient` related Vertices

In [5]:
conn.gsql(
    """
CREATE VERTEX Gender(PRIMARY_ID gender_id STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Race(PRIMARY_ID race_id STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Ethnicity(PRIMARY_ID ethnicity_id STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Patient(PRIMARY_ID patient_id STRING, lastName STRING, firstName STRING, maiden STRING,
                      birthday DATETIME, ssn STRING, license STRING, passport STRING,
                      healthcareExpense DOUBLE, healthcareCoverage DOUBLE, suffix STRING, prefix STRING, 
                      maritalStatus STRING, birthplace STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Allergies(PRIMARY_ID allergy_id STRING, allergy_code STRING, description STRING, 
                        startDate DATETIME, endDate DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Immunizations(PRIMARY_ID immunization_id STRING, immunization_code STRING, description STRING, 
                            dateOfImmunization DATETIME, baseCost DOUBLE) WITH primary_id_as_attribute="true"
"""
)

'Successfully created vertex types: [Gender].\nSuccessfully created vertex types: [Race].\nSuccessfully created vertex types: [Ethnicity].\nSuccessfully created vertex types: [Patient].\nSuccessfully created vertex types: [Allergies].\nSuccessfully created vertex types: [Immunizations].'
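
`conn.gsql` returns its result as one long string, which is easy to misread when many types are created at once. The sketch below shows one possible way to check such a string against the types we expected to create. The helper `created_types` is hypothetical, and the regular expression assumes the message format shown in the output above.

```python
import re


def created_types(gsql_output):
    """Extract type names from 'Successfully created ... types: [Name].' messages."""
    pattern = r"Successfully created (?:vertex|edge) types: \[([^\]]+)\]"
    return re.findall(pattern, gsql_output)


# A shortened sample in the same format as the output above.
sample = (
    "Successfully created vertex types: [Gender].\n"
    "Successfully created vertex types: [Race]."
)
expected = {"Gender", "Race", "Ethnicity"}
missing = expected - set(created_types(sample))
print("Created:", created_types(sample))
print("Missing:", sorted(missing))
```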

#### `Location` related Vertices

In [6]:
conn.gsql(
    """
CREATE VERTEX AddressSynthea(PRIMARY_ID address_id STRING, name STRING, lat DOUBLE, lon DOUBLE) WITH primary_id_as_attribute="true"
CREATE VERTEX CitySynthea(PRIMARY_ID city_id STRING, cityName STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX StateSynthea(PRIMARY_ID state_id STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX CountySynthea(PRIMARY_ID county_id STRING, countyName STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX ZipCodeSynthea(PRIMARY_ID zip_id STRING) WITH primary_id_as_attribute="true"
"""
)

'Successfully created vertex types: [AddressSynthea].\nSuccessfully created vertex types: [CitySynthea].\nSuccessfully created vertex types: [StateSynthea].\nSuccessfully created vertex types: [CountySynthea].\nSuccessfully created vertex types: [ZipCodeSynthea].'

#### `Encounter` related Vertices

In [7]:
conn.gsql(
    """
CREATE VERTEX Encounter(PRIMARY_ID encounter_id STRING, baseEncounterCost DOUBLE, totalClaimCost DOUBLE,
                        payerCoverage DOUBLE, classType STRING, startTime DATETIME, endTime DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Notes(PRIMARY_ID note_id STRING, chiefComplaint STRING, historyOfPresentIllness STRING,
                        socialHistory STRING, allergies STRING, medications STRING, assessment STRING, plan STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Symptoms(PRIMARY_ID symptom_id STRING, symptom STRING, symptomValue DOUBLE, pathology STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Medication(PRIMARY_ID medication_id STRING, medication_code STRING, description STRING,
                        startDate DATETIME, endDate DATETIME, baseCost DOUBLE, payerCoverage DOUBLE, dispenses INT, totalCost DOUBLE) WITH primary_id_as_attribute="true"
CREATE VERTEX Procedures(PRIMARY_ID procedure_id STRING, procedure_code STRING, description STRING, baseCost DOUBLE, dateOfProcedure DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Observations(PRIMARY_ID observation_id STRING, dateOfObservation DATETIME, observation_code STRING,
                        description STRING, obsValue STRING, units STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX ImagingStudies(PRIMARY_ID imaging_id STRING, bodySiteCode STRING, bodySiteDescription STRING,
                        modalityCode STRING, modalityDescription STRING, SOPCode STRING, SOPDescription STRING, dateOfImage DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX SnomedCode(PRIMARY_ID snomed_code STRING, description STRING) WITH primary_id_as_attribute="true"
"""
)

'Successfully created vertex types: [Encounter].\nSuccessfully created vertex types: [Notes].\nSuccessfully created vertex types: [Symptoms].\nSuccessfully created vertex types: [Medication].\nSuccessfully created vertex types: [Procedures].\nSuccessfully created vertex types: [Observations].\nSuccessfully created vertex types: [ImagingStudies].\nSuccessfully created vertex types: [SnomedCode].'

#### Provider and Other Vertices

In [8]:
conn.gsql(
    """
CREATE VERTEX Conditions(PRIMARY_ID condition_id STRING, condition_code STRING, description STRING, startDate DATETIME, endDate DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Organizations(PRIMARY_ID organization_id STRING, name STRING, revenue DOUBLE, utilization INT, phone STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Providers(PRIMARY_ID provider_id STRING, name STRING, utilization INT, speciality STRING) WITH primary_id_as_attribute="true"
CREATE VERTEX Attribute(PRIMARY_ID label STRING, attributeValue INT) WITH primary_id_as_attribute="true"
CREATE VERTEX Device(PRIMARY_ID UDI_code STRING, description STRING, startDate DATETIME, endDate DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Careplans(PRIMARY_ID careplan_id STRING, description STRING, startDate DATETIME, endDate DATETIME) WITH primary_id_as_attribute="true"
CREATE VERTEX Payer(PRIMARY_ID payer_id STRING, name STRING, phone STRING, amountCovered DOUBLE, amountUncovered DOUBLE, revenue DOUBLE,
                    coveredEncounters INT, uncoveredEncounters INT, coveredMedications INT, uncoveredMedications INT,
                    coveredProcedures INT, uncoveredProcedures INT, coveredImmunizations INT, uncoveredImmunizations INT,
                    uniqueCustomers INT, QOLS_Avg DOUBLE, memberMonths INT) WITH primary_id_as_attribute="true"
"""
)

'Successfully created vertex types: [Conditions].\nSuccessfully created vertex types: [Organizations].\nSuccessfully created vertex types: [Providers].\nSuccessfully created vertex types: [Attribute].\nSuccessfully created vertex types: [Device].\nSuccessfully created vertex types: [Careplans].\nSuccessfully created vertex types: [Payer].'

### Create Edges

#### `Patient` related Edges

In [9]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE PATIENT_HAS_ATTRIBUTE(FROM Patient, TO Attribute)
CREATE UNDIRECTED EDGE PATIENT_HAS_SYMPTOM(FROM Patient, TO Symptoms, ageBegin INT, ageEnd INT)
CREATE UNDIRECTED EDGE PATIENT_NOTE(FROM Patient, TO Notes, dateOfNote DATETIME)
CREATE UNDIRECTED EDGE PATIENT_GENDER(FROM Patient, TO Gender)
CREATE UNDIRECTED EDGE PATIENT_ADDRESS(FROM Patient, TO AddressSynthea)
CREATE UNDIRECTED EDGE PATIENT_RACE(FROM Patient, TO Race)
CREATE UNDIRECTED EDGE PATIENT_ETHNICITY(FROM Patient, TO Ethnicity)
CREATE UNDIRECTED EDGE PATIENT_HAS_ALLERGY(FROM Allergies, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_DEVICE(FROM Device, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_MEDICATION(FROM Medication, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_CAREPLAN(FROM Careplans, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_CONDITION(FROM Conditions, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_IMMUNIZATION(FROM Immunizations, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_IMAGING(FROM ImagingStudies, TO Patient)
CREATE UNDIRECTED EDGE PATIENT_HAS_PROCEDURE(FROM Procedures, TO Patient)
"""
)

'Successfully created edge types: [PATIENT_HAS_ATTRIBUTE].\nSuccessfully created edge types: [PATIENT_HAS_SYMPTOM].\nSuccessfully created edge types: [PATIENT_NOTE].\nSuccessfully created edge types: [PATIENT_GENDER].\nSuccessfully created edge types: [PATIENT_ADDRESS].\nSuccessfully created edge types: [PATIENT_RACE].\nSuccessfully created edge types: [PATIENT_ETHNICITY].\nSuccessfully created edge types: [PATIENT_HAS_ALLERGY].\nSuccessfully created edge types: [PATIENT_HAS_DEVICE].\nSuccessfully created edge types: [PATIENT_HAS_MEDICATION].\nSuccessfully created edge types: [PATIENT_HAS_CAREPLAN].\nSuccessfully created edge types: [PATIENT_HAS_CONDITION].\nSuccessfully created edge types: [PATIENT_HAS_IMMUNIZATION].\nSuccessfully created edge types: [PATIENT_HAS_IMAGING].\nSuccessfully created edge types: [PATIENT_HAS_PROCEDURE].'
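
Some of the edges above are declared `FROM Patient` and others `TO Patient`, and the loading jobs later in this notebook supply edge values in that declared order. As a small sketch, the hypothetical helper below pulls the endpoint order out of `CREATE UNDIRECTED EDGE` statements so it can be looked up by edge name. The regular expression only covers the single-pair form used in this notebook.

```python
import re

# Matches the edge name plus its FROM and TO vertex types.
EDGE_PATTERN = re.compile(r"CREATE UNDIRECTED EDGE (\w+)\(FROM (\w+), TO (\w+)")


def edge_endpoints(gsql):
    """Map each edge name to its (FROM, TO) vertex types."""
    return {name: (source, target) for name, source, target in EDGE_PATTERN.findall(gsql)}


statements = """
CREATE UNDIRECTED EDGE PATIENT_NOTE(FROM Patient, TO Notes, dateOfNote DATETIME)
CREATE UNDIRECTED EDGE PATIENT_HAS_ALLERGY(FROM Allergies, TO Patient)
"""
endpoints = edge_endpoints(statements)
print(endpoints["PATIENT_NOTE"])
print(endpoints["PATIENT_HAS_ALLERGY"])
```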

#### `Medication` Edges

In [10]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE MEDICATION_PAYER(FROM Medication, TO Payer)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_MEDICATION(FROM Medication, TO Encounter)
CREATE UNDIRECTED EDGE MEDICATION_REASON_CODE(FROM Medication, TO SnomedCode)
CREATE UNDIRECTED EDGE MEDICATION_CODE(FROM Medication, TO SnomedCode)
"""
)

'Successfully created edge types: [MEDICATION_PAYER].\nSuccessfully created edge types: [ENCOUNTER_FOR_MEDICATION].\nSuccessfully created edge types: [MEDICATION_REASON_CODE].\nSuccessfully created edge types: [MEDICATION_CODE].'

#### `Location` Edges

In [11]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE ADDRESS_CITY_SYNTHEA(FROM AddressSynthea, TO CitySynthea)
CREATE UNDIRECTED EDGE ADDRESS_COUNTY_SYNTHEA(FROM AddressSynthea, TO CountySynthea)
CREATE UNDIRECTED EDGE ADDRESS_ZIPCODE_SYNTHEA(FROM AddressSynthea, TO ZipCodeSynthea)
CREATE UNDIRECTED EDGE STATE_HAS_COUNTY_SYNTHEA(FROM StateSynthea, TO CountySynthea)
CREATE UNDIRECTED EDGE COUNTY_HAS_CITY_SYNTHEA(FROM CountySynthea, TO CitySynthea)
CREATE UNDIRECTED EDGE CITY_HAS_ZIPCODE_SYNTHEA(FROM CitySynthea, TO ZipCodeSynthea)
"""
)

'Successfully created edge types: [ADDRESS_CITY_SYNTHEA].\nSuccessfully created edge types: [ADDRESS_COUNTY_SYNTHEA].\nSuccessfully created edge types: [ADDRESS_ZIPCODE_SYNTHEA].\nSuccessfully created edge types: [STATE_HAS_COUNTY_SYNTHEA].\nSuccessfully created edge types: [COUNTY_HAS_CITY_SYNTHEA].\nSuccessfully created edge types: [CITY_HAS_ZIPCODE_SYNTHEA].'

#### `Encounter` Edges

In [12]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_ALLERGY(FROM Allergies, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_DEVICE(FROM Device, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_PROCEDURE(FROM Procedures, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_CAREPLAN(FROM Careplans, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_CONDITION(FROM Conditions, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_IMMUNIZATION(FROM Immunizations, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_OBSERVATION(FROM Observations, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_IMAGING(FROM ImagingStudies, TO Encounter)
CREATE UNDIRECTED EDGE ENCOUNTER_FOR_PATIENT(FROM Encounter, TO Patient)
CREATE UNDIRECTED EDGE ENCOUNTER_UNDER_ORGANIZATION(FROM Encounter, TO Organizations)
CREATE UNDIRECTED EDGE ENCOUNTER_HAS_PROVIDER(FROM Encounter, TO Providers)
CREATE UNDIRECTED EDGE ENCOUNTER_HAS_PAYER(FROM Encounter, TO Payer)
CREATE UNDIRECTED EDGE ENCOUNTER_CODE(FROM Encounter, TO SnomedCode)
CREATE UNDIRECTED EDGE ENCOUNTER_REASON_CODE(FROM Encounter, TO SnomedCode)
"""
)

'Successfully created edge types: [ENCOUNTER_FOR_ALLERGY].\nSuccessfully created edge types: [ENCOUNTER_FOR_DEVICE].\nSuccessfully created edge types: [ENCOUNTER_FOR_PROCEDURE].\nSuccessfully created edge types: [ENCOUNTER_FOR_CAREPLAN].\nSuccessfully created edge types: [ENCOUNTER_FOR_CONDITION].\nSuccessfully created edge types: [ENCOUNTER_FOR_IMMUNIZATION].\nSuccessfully created edge types: [ENCOUNTER_FOR_OBSERVATION].\nSuccessfully created edge types: [ENCOUNTER_FOR_IMAGING].\nSuccessfully created edge types: [ENCOUNTER_FOR_PATIENT].\nSuccessfully created edge types: [ENCOUNTER_UNDER_ORGANIZATION].\nSuccessfully created edge types: [ENCOUNTER_HAS_PROVIDER].\nSuccessfully created edge types: [ENCOUNTER_HAS_PAYER].\nSuccessfully created edge types: [ENCOUNTER_CODE].\nSuccessfully created edge types: [ENCOUNTER_REASON_CODE].'

#### `Provider` and `Payer` Edges

In [13]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE PROVIDER_HAS_ORGANIZATION(FROM Providers, TO Organizations)
CREATE UNDIRECTED EDGE PROVIDER_GENDER(FROM Providers, TO Gender)
CREATE UNDIRECTED EDGE PROVIDER_ADDRESS(FROM Providers, TO AddressSynthea)
CREATE UNDIRECTED EDGE PAYER_TRANSITION(FROM Payer, TO Patient, startYear DATETIME, endYear DATETIME, ownership STRING)
CREATE UNDIRECTED EDGE PAYER_ADDRESS(FROM Payer, TO AddressSynthea)
"""
)

'Successfully created edge types: [PROVIDER_HAS_ORGANIZATION].\nSuccessfully created edge types: [PROVIDER_GENDER].\nSuccessfully created edge types: [PROVIDER_ADDRESS].\nSuccessfully created edge types: [PAYER_TRANSITION].\nSuccessfully created edge types: [PAYER_ADDRESS].'

#### Additional Edges

In [14]:
conn.gsql(
    """
CREATE UNDIRECTED EDGE ALLERGY_CODE(FROM Allergies, TO SnomedCode)
CREATE UNDIRECTED EDGE DEVICE_CODE(FROM Device, TO SnomedCode)
CREATE UNDIRECTED EDGE PROCEDURE_CODE(FROM Procedures, TO SnomedCode)
CREATE UNDIRECTED EDGE PROCEDURE_REASON_CODE(FROM Procedures, TO SnomedCode)
CREATE UNDIRECTED EDGE CAREPLAN_CODE(FROM Careplans, TO SnomedCode)
CREATE UNDIRECTED EDGE CAREPLAN_REASON_CODE(FROM Careplans, TO SnomedCode)
CREATE UNDIRECTED EDGE CONDITION_CODE(FROM Conditions, TO SnomedCode)
CREATE UNDIRECTED EDGE IMMUNIZATION_CODE(FROM Immunizations, TO SnomedCode)
CREATE UNDIRECTED EDGE OBSERVATION_FOR_PATIENT(FROM Observations, TO Patient)
CREATE UNDIRECTED EDGE OBSERVATION_CODE(FROM Observations, TO SnomedCode)
CREATE UNDIRECTED EDGE ORGANIZATION_ADDRESS(FROM Organizations, TO AddressSynthea)
CREATE UNDIRECTED EDGE IMAGING_CODE(FROM ImagingStudies, TO SnomedCode)
"""
)

'Successfully created edge types: [ALLERGY_CODE].\nSuccessfully created edge types: [DEVICE_CODE].\nSuccessfully created edge types: [PROCEDURE_CODE].\nSuccessfully created edge types: [PROCEDURE_REASON_CODE].\nSuccessfully created edge types: [CAREPLAN_CODE].\nSuccessfully created edge types: [CAREPLAN_REASON_CODE].\nSuccessfully created edge types: [CONDITION_CODE].\nSuccessfully created edge types: [IMMUNIZATION_CODE].\nSuccessfully created edge types: [OBSERVATION_FOR_PATIENT].\nSuccessfully created edge types: [OBSERVATION_CODE].\nSuccessfully created edge types: [ORGANIZATION_ADDRESS].\nSuccessfully created edge types: [IMAGING_CODE].'

## Create Graph

The schema that we just created exists in the **Global** sense. We can have more than one **Graph** per **Solution**, so a **Global** schema allows us to re-use parts or all of that schema across multiple graphs.

We won't need anything that elaborate for this demo. It's just important to know that a **Graph** can contain any number of elements from the **Global** schema, including none, and can also contain elements that are unique to that **Graph's** schema.
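
To illustrate the idea of a graph picking elements from the global schema, here is a rough sketch that builds a `CREATE GRAPH` statement from a subset of global type names. The helper `create_graph_statement` and the graph name `DemoGraph` are hypothetical; the real statement for this notebook lists every vertex and edge type defined so far.

```python
def create_graph_statement(graph_name, selected, global_types):
    """Build a CREATE GRAPH statement from names that exist in the global schema."""
    unknown = [name for name in selected if name not in global_types]
    if unknown:
        raise ValueError(f"Not in the global schema: {unknown}")
    members = ", ".join(selected)
    return f"CREATE GRAPH {graph_name}({members})"


# A small slice of the global schema defined earlier in this notebook.
global_types = {"Patient", "Gender", "Race", "PATIENT_GENDER", "PATIENT_RACE"}
demo = create_graph_statement("DemoGraph", ["Patient", "Gender", "PATIENT_GENDER"], global_types)
print(demo)
```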

In [15]:
conn.gsql(
    """
    CREATE GRAPH MedGraph(Gender, Race, Ethnicity, AddressSynthea, CitySynthea, StateSynthea, CountySynthea, ZipCodeSynthea, 
                 SnomedCode, Patient, Allergies, Device, Medication, Procedures,
                 Careplans, Conditions, Immunizations, Observations, Organizations, Providers, ImagingStudies, Payer, Encounter, 
                 Notes, Symptoms, Attribute, PATIENT_HAS_ATTRIBUTE,
                 PATIENT_HAS_SYMPTOM, PATIENT_NOTE, ADDRESS_CITY_SYNTHEA, ADDRESS_COUNTY_SYNTHEA, ADDRESS_ZIPCODE_SYNTHEA, STATE_HAS_COUNTY_SYNTHEA, 
                 COUNTY_HAS_CITY_SYNTHEA, CITY_HAS_ZIPCODE_SYNTHEA, PATIENT_GENDER,
  PATIENT_ADDRESS, PATIENT_RACE, PATIENT_ETHNICITY, PATIENT_HAS_ALLERGY, ENCOUNTER_FOR_ALLERGY, ALLERGY_CODE,
  PATIENT_HAS_DEVICE, ENCOUNTER_FOR_DEVICE, DEVICE_CODE, PATIENT_HAS_MEDICATION, MEDICATION_PAYER, ENCOUNTER_FOR_MEDICATION,
  MEDICATION_REASON_CODE, MEDICATION_CODE, PROCEDURE_CODE, PROCEDURE_REASON_CODE, PATIENT_HAS_PROCEDURE, ENCOUNTER_FOR_PROCEDURE,
  PATIENT_HAS_CAREPLAN, ENCOUNTER_FOR_CAREPLAN, CAREPLAN_CODE, CAREPLAN_REASON_CODE, PATIENT_HAS_CONDITION, ENCOUNTER_FOR_CONDITION,
  CONDITION_CODE, PATIENT_HAS_IMMUNIZATION, ENCOUNTER_FOR_IMMUNIZATION, IMMUNIZATION_CODE, OBSERVATION_FOR_PATIENT, ENCOUNTER_FOR_OBSERVATION,
  OBSERVATION_CODE, ORGANIZATION_ADDRESS, PROVIDER_HAS_ORGANIZATION, PROVIDER_GENDER, PROVIDER_ADDRESS, PATIENT_HAS_IMAGING, ENCOUNTER_FOR_IMAGING,
  IMAGING_CODE, PAYER_TRANSITION, PAYER_ADDRESS, ENCOUNTER_FOR_PATIENT, ENCOUNTER_UNDER_ORGANIZATION, ENCOUNTER_HAS_PROVIDER, ENCOUNTER_HAS_PAYER,
  ENCOUNTER_CODE, ENCOUNTER_REASON_CODE)
"""
)

'The graph MedGraph is created.'

### Re-connecting to the Graph

Now that the graph is created, we need to update our pyTigerGraph connection to point specifically to our graph. This will allow us to create a **Secret**, then use that **Secret** to get a **Token** which will be used for secure authentication to that specific graph in our **Solution**. 

In [17]:
graphName = "MedGraph"
conn.graphname = graphName
secret = conn.createSecret()
token = conn.getToken(secret, setToken=True)
print("Connection token:", token[0])

Connection token: ht4qn8e70s3v5vd6d76p22d7scn1s11k


## Loading Jobs

A **Loading Job** defines how fields in our input files map to the graph: which fields supply the primary ID and attributes of each vertex, and which supply the source vertex, destination vertex, and attributes of each edge.

To help visualize what's going on here, let's clone our data so we know what it looks like.

In [18]:
!git clone https://github.com/TigerGraph-DevLabs/Synthea-Medgraph.git

Cloning into 'Synthea-Medgraph'...
remote: Enumerating objects: 400, done.
remote: Counting objects: 100% (400/400), done.
remote: Compressing objects: 100% (213/213), done.
remote: Total 400 (delta 170), reused 373 (delta 154), pack-reused 0
Receiving objects: 100% (400/400), 23.39 MiB | 15.36 MiB/s, done.
Resolving deltas: 100% (170/170), done.


In [21]:
!head './Synthea-Medgraph/data/allergies copy.csv.header'
!head -n 3 './Synthea-Medgraph/data/allergies copy.csv'

ID, START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION
1,1958-06-14,,0ce21d8a-a59a-4c85-88c3-3bacdbb895f8,bf367bf3-9359-4530-aeb2-75a1ea849914,417532002,Allergy to fish
2,1933-11-14,,08afe1df-9155-4627-ad60-f75bfd4a5695,1fbe1386-7f90-4f67-91e4-ca99849e99f4,232347008,Dander (animal) allergy
3,1994-09-29,,a7dd6e3e-5e2a-44bd-aa38-1e368ca6980e,9ad5c33a-4d2b-4284-ace5-5582d7480894,300913006,Shellfish allergy


### Anatomy of a Loading Job

Looking at the loading job below, you'll see that we don't refer to the columns of our data by their header names, but rather by their column number. The table below shows how those numbers line up with the header from our allergies file above.

|Column Number|\$0|\$1|\$2|\$3|\$4|\$5|\$6|
|---|---|---|---|---|---|---|---|
|Column Name|ID|START|STOP|PATIENT|ENCOUNTER|CODE|DESCRIPTION|
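
The same lookup can be produced programmatically, which helps with the wider files used later. This is a minimal sketch using only the standard library; the header string is copied from the `head` output above, and `column_positions` is a hypothetical helper.

```python
import csv
import io

# Header line as printed by the `head` command above (note the stray space).
header_line = "ID, START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION"


def column_positions(header):
    """Map each column name to the $n token a loading job would use for it."""
    names = next(csv.reader(io.StringIO(header), skipinitialspace=True))
    return {name.strip(): f"${index}" for index, name in enumerate(names)}


positions = column_positions(header_line)
for name, token in positions.items():
    print(token, name)
```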

In [31]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadAllergies
CREATE LOADING JOB loadAllergies FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Allergies VALUES ($0, $5, $6, $1, $2),
        TO VERTEX SnomedCode VALUES ($5, $6),
        TO EDGE PATIENT_HAS_ALLERGY VALUES ($0, $3),
        TO EDGE ALLERGY_CODE VALUES ($0, $5),
        TO EDGE ENCOUNTER_FOR_ALLERGY VALUES ($0, $4)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSuccessfully dropped jobs on the graph 'MedGraph': [loadAllergies].\nSuccessfully created loading jobs: [loadAllergies]."

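To make the `loadAllergies` mapping concrete, the sketch below applies the same column positions to the first data row shown earlier. It is a plain-Python illustration of which fields end up where, not a reproduction of how TigerGraph executes a loading job, and `map_allergy_row` is a hypothetical helper.

```python
import csv
import io

# First data row from the allergies file preview above.
line = (
    "1,1958-06-14,,0ce21d8a-a59a-4c85-88c3-3bacdbb895f8,"
    "bf367bf3-9359-4530-aeb2-75a1ea849914,417532002,Allergy to fish"
)


def map_allergy_row(cols):
    """Mirror the VALUES clauses of loadAllergies using Python list indexes."""
    vertices = [
        ("Allergies", (cols[0], cols[5], cols[6], cols[1], cols[2])),
        ("SnomedCode", (cols[5], cols[6])),
    ]
    edges = [
        ("PATIENT_HAS_ALLERGY", (cols[0], cols[3])),
        ("ALLERGY_CODE", (cols[0], cols[5])),
        ("ENCOUNTER_FOR_ALLERGY", (cols[0], cols[4])),
    ]
    return vertices, edges


cols = next(csv.reader(io.StringIO(line)))
vertices, edges = map_allergy_row(cols)
for name, values in vertices + edges:
    print(name, values)
```

Note that the empty `STOP` field in this row arrives as an empty string in Python.
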
In [32]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadAttributes
CREATE LOADING JOB loadAttributes FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Attribute values($1, $2),
        TO EDGE PATIENT_HAS_ATTRIBUTE values($0, $1)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSuccessfully dropped jobs on the graph 'MedGraph': [loadAttributes].\nSuccessfully created loading jobs: [loadAttributes]."

In [34]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadCareplans
CREATE LOADING JOB loadCareplans FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Careplans values ($0, $6, $1, $2),
        TO VERTEX SnomedCode values ($5, $6),
        TO VERTEX SnomedCode values ($7, $8),
        TO EDGE PATIENT_HAS_CAREPLAN values ($0, $3),
        TO EDGE ENCOUNTER_FOR_CAREPLAN values ($0, $4),
        TO EDGE CAREPLAN_CODE values ($0, $5),
        TO EDGE CAREPLAN_REASON_CODE values ($0, $7)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadCareplans].\nSuccessfully created loading jobs: [loadCareplans]."

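The `Semantic Check Fails` line in the output above comes from the `DROP JOB` statement, which found no existing job with that name on this run; the final line of the same output shows that the job was still created. As a rough sketch, the hypothetical helper below separates these cases so a first run is not mistaken for a failure. It assumes the message wording shown in the outputs in this notebook.

```python
def summarize_job_output(output):
    """Classify the messages returned when dropping and recreating a loading job."""
    lines = output.splitlines()
    return {
        "created": any(line.startswith("Successfully created loading jobs") for line in lines),
        "dropped": any(line.startswith("Successfully dropped jobs") for line in lines),
        "drop_target_missing": any("could not be found anywhere" in line for line in lines),
    }


# Output in the same format as the loadCareplans cell above.
first_run = (
    "Using graph 'MedGraph'\n"
    "Semantic Check Fails: These jobs could not be found anywhere: [loadCareplans].\n"
    "Successfully created loading jobs: [loadCareplans]."
)
print(summarize_job_output(first_run))
```
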
In [35]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadConditions
CREATE LOADING JOB loadConditions FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Conditions values ($0, $5, $6, $1, $2),
        TO VERTEX SnomedCode values ($5, $6),
        TO EDGE CONDITION_CODE values ($0, $5),
        TO EDGE PATIENT_HAS_CONDITION values ($0, $3),
        TO EDGE ENCOUNTER_FOR_CONDITION values ($0, $4)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadConditions].\nSuccessfully created loading jobs: [loadConditions]."

In [36]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadDevices
CREATE LOADING JOB loadDevices FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Device values ($6, $5, $0, $1),
        TO VERTEX SnomedCode values ($4, $5),
        TO EDGE PATIENT_HAS_DEVICE values ($6, $2),
        TO EDGE DEVICE_CODE values ($6, $4),
        TO EDGE ENCOUNTER_FOR_DEVICE values ($6, $3)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadDevices].\nSuccessfully created loading jobs: [loadDevices]."

In [37]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadEncounters
CREATE LOADING JOB loadEncounters FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Encounter values ($0, $10, $11, $12, $7, $1, $2),
        TO VERTEX SnomedCode values ($8, $9),
        TO VERTEX SnomedCode values ($13, $14),
        TO EDGE ENCOUNTER_FOR_PATIENT values ($0, $3),
        TO EDGE ENCOUNTER_UNDER_ORGANIZATION values ($0, $4),
        TO EDGE ENCOUNTER_HAS_PAYER values ($0, $6),
        TO EDGE ENCOUNTER_HAS_PROVIDER values ($0, $5),
        TO EDGE ENCOUNTER_CODE values ($0, $8),
        TO EDGE ENCOUNTER_REASON_CODE values ($0, $13)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadEncounters].\nSuccessfully created loading jobs: [loadEncounters]."

In [38]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadImaging
CREATE LOADING JOB loadImaging FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX ImagingStudies values ($0, $4, $5, $6, $7, $8, $9, $1),
        TO VERTEX SnomedCode values ($4, $5),
        TO EDGE PATIENT_HAS_IMAGING values ($0, $2),
        TO EDGE ENCOUNTER_FOR_IMAGING values ($0, $3),
        TO EDGE IMAGING_CODE values ($0, $4)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadImaging].\nSuccessfully created loading jobs: [loadImaging]."

In [39]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadImmunizations
CREATE LOADING JOB loadImmunizations FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Immunizations values ($0, $4, $5, $1, $6),
        TO VERTEX SnomedCode values ($4, $5),
        TO EDGE IMMUNIZATION_CODE values ($0, $4),
        TO EDGE PATIENT_HAS_IMMUNIZATION values ($0, $2),
        TO EDGE ENCOUNTER_FOR_IMMUNIZATION values ($0, $3)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadImmunizations].\nSuccessfully created loading jobs: [loadImmunizations]."

In [40]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadLocations
CREATE LOADING JOB loadLocations FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX StateSynthea values ($3),
        TO VERTEX CountySynthea values (gsql_concat($3, $5), $5),
        TO VERTEX CitySynthea values (gsql_concat($3, $2), $2),
        TO EDGE STATE_HAS_COUNTY_SYNTHEA values ($3, gsql_concat($3, $5)),
        TO EDGE COUNTY_HAS_CITY_SYNTHEA values (gsql_concat($3, $5), gsql_concat($3, $2))
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadLocations].\nSuccessfully created loading jobs: [loadLocations]."

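The `loadLocations` job builds county and city IDs with `gsql_concat`, which joins its string arguments into one string. A likely reason, stated here as an assumption, is to keep places with the same name in different states apart. The sketch below imitates that with a Python stand-in; the state and county values are illustrative and not taken from the dataset.

```python
def gsql_concat(*parts):
    """Python stand-in for GSQL's gsql_concat: join the strings with no separator."""
    return "".join(parts)


# Illustrative rows: the same county name appearing in two different states.
rows = [
    ("Massachusetts", "Middlesex County"),
    ("Virginia", "Middlesex County"),
]

# Using the county name alone as the ID would merge both rows into one vertex.
name_only_ids = {county for _, county in rows}
# Prefixing the state, as the loading job does with $3 and $5, keeps them distinct.
composite_ids = {gsql_concat(state, county) for state, county in rows}

print(len(name_only_ids), len(composite_ids))
```
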
In [41]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadMedications
CREATE LOADING JOB loadMedications FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Medication values ($0, $6, $7, $1, $2, $8, $9, $10, $11),
        TO VERTEX SnomedCode values ($6, $7),
        TO VERTEX SnomedCode values ($12, $13),
        TO EDGE MEDICATION_PAYER values ($0, $4),
        TO EDGE PATIENT_HAS_MEDICATION values ($0, $3),
        TO EDGE MEDICATION_REASON_CODE values ($0, $12),
        TO EDGE MEDICATION_CODE values ($0, $6),
        TO EDGE ENCOUNTER_FOR_MEDICATION values ($0, $5)
        USING HEADER="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadMedications].\nSuccessfully created loading jobs: [loadMedications]."

In [42]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadObservations
CREATE LOADING JOB loadObservations FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Observations values($0, $1, $4, $5, $6, $7),
        TO VERTEX SnomedCode values($4, $5),
        TO EDGE OBSERVATION_FOR_PATIENT values($0, $2),
        TO EDGE ENCOUNTER_FOR_OBSERVATION values($0, $3),
        TO EDGE OBSERVATION_CODE values($0, $4)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadObservations].\nSuccessfully created loading jobs: [loadObservations]."

In [44]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadOrganizations
CREATE LOADING JOB loadOrganizations FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Organizations values ($0, $1, $9, $10, $8),
        TO VERTEX AddressSynthea values (gsql_concat($6, $7), $2, $6, $7),
        TO EDGE ADDRESS_CITY_SYNTHEA values (gsql_concat($6, $7), $3),
        TO EDGE ADDRESS_ZIPCODE_SYNTHEA values (gsql_concat($6, $7), $5),
        TO EDGE ADDRESS_COUNTY_SYNTHEA values (gsql_concat($6, $7), $11),
        TO EDGE ORGANIZATION_ADDRESS values ($0, gsql_concat($6, $7))
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadOrganizations].\nSuccessfully created loading jobs: [loadOrganizations]."

In [46]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadPatient
CREATE LOADING JOB loadPatient FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Patient values ($0, $8, $7, $10, $1, $3, $4, $5, $23, $24, $9, $6, $11, $15),
        TO VERTEX Gender values ($14),
        TO VERTEX Race values ($12),
        TO VERTEX Ethnicity values ($13),
        TO VERTEX AddressSynthea values (gsql_concat($21, $22), $16, $21, $22),
        TO EDGE PATIENT_GENDER values ($0, $14),
        TO EDGE PATIENT_RACE values ($0, $12),
        TO EDGE PATIENT_ETHNICITY values ($0, $13),
        TO EDGE PATIENT_ADDRESS values ($0, gsql_concat($21, $22)),
        TO EDGE ADDRESS_COUNTY_SYNTHEA values (gsql_concat($21, $22), $19),
        TO EDGE ADDRESS_ZIPCODE_SYNTHEA values (gsql_concat($21, $22), $20),
        TO EDGE ADDRESS_CITY_SYNTHEA values (gsql_concat($21, $22), $17)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadPatient].\nSuccessfully created loading jobs: [loadPatient]."

In [47]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadPatientNotes
CREATE LOADING JOB loadPatientNotes FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Notes values ($0, $3, $4, $5, $6, $7, $8, $9),
        TO EDGE PATIENT_NOTE values($1, $0, $2)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadPatientNotes].\nSuccessfully created loading jobs: [loadPatientNotes]."

In [48]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadPatientSymptoms
CREATE LOADING JOB loadPatientSymptoms FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Symptoms values($0, $8, $9, $7),
        TO EDGE PATIENT_HAS_SYMPTOM values($1, $0, $5, $6)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadPatientSymptoms].\nSuccessfully created loading jobs: [loadPatientSymptoms]."

In [49]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadPayers
CREATE LOADING JOB loadPayers FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Payer values (gsql_concat($0, $20), $1, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20),
        TO VERTEX AddressSynthea values (gsql_concat($21, $22), $2, $21, $22),
        TO EDGE PAYER_ADDRESS values (gsql_concat($0, $20), gsql_concat($21, $22)),
        TO EDGE ADDRESS_CITY_SYNTHEA values (gsql_concat($21, $22), $3),
        TO EDGE ADDRESS_ZIPCODE_SYNTHEA values (gsql_concat($21, $22), $5),
        TO EDGE ADDRESS_COUNTY_SYNTHEA values (gsql_concat($21, $22), $23)
        USING header = "true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadPayers].\nSuccessfully created loading jobs: [loadPayers]."

In [50]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadPayerTransitions
CREATE LOADING JOB loadPayerTransitions FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1 
        TO EDGE PAYER_TRANSITION values ($3, $0, $1, $2, $4)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadPayerTransitions].\nSuccessfully created loading jobs: [loadPayerTransitions]."

In [51]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadProcedures
CREATE LOADING JOB loadProcedures FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Procedures values ($0, $4, $5, $6, $7),
        TO VERTEX SnomedCode values ($4, $5),
        TO VERTEX SnomedCode values ($7, $8),
        TO EDGE PROCEDURE_CODE values ($0, $4),
        TO EDGE PROCEDURE_REASON_CODE values ($0, $7),
        TO EDGE ENCOUNTER_FOR_PROCEDURE values ($0, $3),
        TO EDGE PATIENT_HAS_PROCEDURE values ($0, $2)
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadProcedures].\nSuccessfully created loading jobs: [loadProcedures]."

In [52]:
conn.gsql("""
USE GRAPH MedGraph
BEGIN
DROP JOB loadProviders
CREATE LOADING JOB loadProviders FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX Providers values ($0, $2, $11, $4),
        TO VERTEX AddressSynthea values (gsql_concat($9, $10), $5, $9, $10),
        TO EDGE PROVIDER_GENDER values ($0, $3),
        TO EDGE ADDRESS_CITY_SYNTHEA values (gsql_concat($9, $10), $6),
        TO EDGE ADDRESS_ZIPCODE_SYNTHEA values (gsql_concat($9, $10), $8),
        TO EDGE ADDRESS_COUNTY_SYNTHEA values (gsql_concat($9, $10), $12),
        TO EDGE PROVIDER_HAS_ORGANIZATION values ($0, $1),
        TO EDGE PROVIDER_ADDRESS values ($0, gsql_concat($9, $10))
        USING header="true", separator=",";
}
END
""")

"Using graph 'MedGraph'\nSemantic Check Fails: These jobs could not be found anywhere: [loadProviders].\nSuccessfully created loading jobs: [loadProviders]."

In [54]:
conn.gsql("""
BEGIN
DROP JOB loadZips
CREATE LOADING JOB loadZips FOR GRAPH MedGraph {
    DEFINE filename f1;
    LOAD f1
        TO VERTEX ZipCodeSynthea values($4),
        TO EDGE CITY_HAS_ZIPCODE_SYNTHEA values (gsql_concat($1, $3), $4)
        USING header="true", separator=",";
}
END
""")

'Semantic Check Fails: These jobs could not be found anywhere: [loadZips].\nSuccessfully created loading jobs: [loadZips].'
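
The `$N` tokens in the loading jobs above are zero-based column positions in the source CSV, so `$0` is the first column. Before creating a job, it can help to print a file's header row next to its positions and check the mapping by eye. The sketch below is an illustration only: `column_positions` is a hypothetical helper, and the sample header is made up rather than taken from any of the Synthea files.

```python
import csv
import io


def column_positions(csv_text):
    """Return {position: column name} for the header row of some CSV text."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return dict(enumerate(header))


# Made-up three-column sample; paste the first lines of a real data file instead.
header_sample = "id,name,city\n1,Ada,Boston\n"
positions = column_positions(header_sample)
for index, name in positions.items():
    print(f"${index} -> {name}")
```

For a file on disk, the same idea applies by reading only its first line and passing it to the helper.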

## Loading Data

Now that the loading jobs have been created, we can begin loading the data. We'll step away from the GSQL-heavy work we've done so far and switch back to Python-oriented code for loading.

First, we store the path to the data file in a variable and pass it to `uploadFile()`.

`uploadFile()` takes three arguments here:
- `filePath` - The path to the data file to load
- `fileTag` - The name of the file variable that the file corresponds to in the loading job. Recall that every loading job above declares `DEFINE filename f1;`, so we pass `f1`.
- `jobName` - The name of the loading job to run
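
Because every upload follows the same pattern, the calls can also be driven from a single list of file path and job name pairs. The following is a minimal sketch, not part of the original notebook: `upload_all` is a hypothetical helper, and it assumes only that the connection object exposes `uploadFile(filePath, fileTag=..., jobName=...)` as used in this notebook. A stand-in connection class is included so the snippet runs without a TigerGraph server.

```python
def upload_all(conn, jobs, file_tag="f1"):
    """Call conn.uploadFile for each (file path, job name) pair, in order.

    Returns a dict mapping each job name to the raw result of its upload.
    """
    results = {}
    for file_path, job_name in jobs:
        results[job_name] = conn.uploadFile(file_path, fileTag=file_tag, jobName=job_name)
    return results


class FakeConnection:
    """Stand-in for a TigerGraph connection; records calls instead of uploading."""

    def __init__(self):
        self.calls = []

    def uploadFile(self, filePath, fileTag, jobName):
        self.calls.append((filePath, fileTag, jobName))
        return [{"sourceFileName": "Online_POST"}]


jobs = [
    ("./Synthea-Medgraph/data/patients copy.csv", "loadPatient"),
    ("./Synthea-Medgraph/data/zipcodes copy.csv", "loadZips"),
]
fake = FakeConnection()
all_results = upload_all(fake, jobs)
print(list(all_results))
```

With a real connection, passing `conn` instead of `fake` would run the same uploads; the cells below keep one upload per cell so that each result can be inspected on its own.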

In [55]:
patients_file = './Synthea-Medgraph/data/patients copy.csv'
results = conn.uploadFile(patients_file, fileTag='f1', jobName='loadPatient')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 496,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Gender",
          "validObject": 496,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Race",
          "validObject": 496,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Ethnicity",
          "validObject": 496,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimary

In [56]:
locations_file = './Synthea-Medgraph/data/demographics copy.csv'
results = conn.uploadFile(locations_file, fileTag='f1', jobName='loadLocations')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 34491,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "CitySynthea",
          "validObject": 34491,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "StateSynthea",
          "validObject": 34491,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "CountySynthea",
          "validObject": 34491,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,


In [57]:
zips_file = './Synthea-Medgraph/data/zipcodes copy.csv'
results = conn.uploadFile(zips_file, fileTag='f1', jobName='loadZips')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 35998,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "ZipCodeSynthea",
          "validObject": 35998,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "CITY_HAS_ZIPCODE_SYNTHEA",
          "validObject": 35998,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "deleteVertex": [],
      "deleteEdge": []
    }
  }
]
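
The `statistics` block is verbose, so it can help to condense it to the counts that matter: the number of valid lines, plus any vertex or edge type with a non-zero error counter. The sketch below assumes the result shape shown in the outputs in this notebook. `summarize_load` is a hypothetical helper, and the sample dictionary is abridged, illustrative data rather than real output.

```python
ERROR_KEYS = (
    "noIdFound",
    "invalidAttribute",
    "invalidVertexType",
    "invalidPrimaryId",
    "invalidSecondaryId",
    "incorrectFixedBinaryLength",
)


def summarize_load(results):
    """Return (valid_lines, problems) for one uploadFile result.

    problems lists (kind, typeName, counter, count) for every non-zero error counter.
    """
    stats = results[0]["statistics"]
    problems = []
    for kind in ("vertex", "edge"):
        for entry in stats.get(kind, []):
            for key in ERROR_KEYS:
                count = entry.get(key, 0)
                if count:
                    problems.append((kind, entry["typeName"], key, count))
    return stats["validLine"], problems


# Illustrative data only; the type names and counts are made up.
stats_sample = [{
    "sourceFileName": "Online_POST",
    "statistics": {
        "validLine": 10,
        "rejectLine": 0,
        "vertex": [{"typeName": "ExampleVertex", "validObject": 8, "invalidAttribute": 2}],
        "edge": [{"typeName": "EXAMPLE_EDGE", "validObject": 10, "noIdFound": 0}],
    },
}]
valid, problems = summarize_load(stats_sample)
print(valid, problems)
```

A non-zero counter is not always an error. A `noIdFound` count, for example, typically means the ID column was empty on those rows, which is expected for optional columns.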


In [58]:
attributes_file = './Synthea-Medgraph/data/Notes tokenized copy.csv'
results = conn.uploadFile(attributes_file, fileTag='f1', jobName='loadAttributes')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 9920,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Attribute",
          "validObject": 9920,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_ATTRIBUTE",
          "validObject": 9920,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "deleteVertex": [],
      "deleteEdge": []
    }
  }
]


In [59]:
symptoms_file = './Synthea-Medgraph/data/normalizedSymptoms copy.csv'
results = conn.uploadFile(symptoms_file, fileTag='f1', jobName='loadPatientSymptoms')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 66939,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Symptoms",
          "validObject": 55419,
          "noIdFound": 0,
          "invalidAttribute": 11520,
          "invalidAttributeLines": [
            "1:symptomValue",
            "2:symptomValue",
            "3:symptomValue",
            "40:symptomValue",
            "41:symptomValue",
            "81:symptomValue",
            "91:symptomValue",
            "92:symptomValue",
            "93:symptomValue",
            "125:symptomValue",
            "174:symptomValue",
            "175:symptomValue",
            "176:symptomValue",
            "177:symptomValue",
            "178:symptomValue",
            "179:symptomValue",
            "180:symptomValue",
            "181:symptomValue",
            "182:symp

In [60]:
allergies_file = './Synthea-Medgraph/data/allergies copy.csv'
results = conn.uploadFile(allergies_file, fileTag='f1', jobName='loadAllergies')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 322,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Allergies",
          "validObject": 322,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 322,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_ALLERGY",
          "validObject": 322,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "inval

In [61]:
careplans_file = './Synthea-Medgraph/data/careplans copy.csv'
results = conn.uploadFile(careplans_file, fileTag='f1', jobName='loadCareplans')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 3949,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "SnomedCode",
          "validObject": 3949,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 3760,
          "noIdFound": 189,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Careplans",
          "validObject": 3949,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
         

In [62]:
conditions_file = './Synthea-Medgraph/data/conditions copy.csv'
results = conn.uploadFile(conditions_file, fileTag='f1', jobName='loadConditions')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 10129,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "SnomedCode",
          "validObject": 10129,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Conditions",
          "validObject": 10129,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_CONDITION",
          "validObject": 10129,
          "noIdFound": 0,
          "invalidAttribute": 0,
     

In [64]:
devices_file = './Synthea-Medgraph/data/devices copy.csv'
results = conn.uploadFile(devices_file, fileTag='f1', jobName='loadDevices')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 23,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "SnomedCode",
          "validObject": 23,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Device",
          "validObject": 23,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_DEVICE",
          "validObject": 23,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertex

In [65]:
encounters_file = './Synthea-Medgraph/data/encounters copy.csv'
results = conn.uploadFile(encounters_file, fileTag='f1', jobName='loadEncounters')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 54647,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Encounter",
          "validObject": 54647,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 54647,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 18276,
          "noIdFound": 36371,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
   

In [66]:
imaging_file = './Synthea-Medgraph/data/imaging_studies copy.csv'
results = conn.uploadFile(imaging_file, fileTag='f1', jobName='loadImaging')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 386,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "ImagingStudies",
          "validObject": 386,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 386,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_IMAGING",
          "validObject": 386,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "

In [67]:
immunizations_file = './Synthea-Medgraph/data/immunizations copy.csv'
results = conn.uploadFile(immunizations_file, fileTag='f1', jobName='loadImmunizations')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 25164,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Immunizations",
          "validObject": 25164,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 25164,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "PATIENT_HAS_IMMUNIZATION",
          "validObject": 25164,
          "noIdFound": 0,
          "invalidAttribute": 0,

In [68]:
medications_file = './Synthea-Medgraph/data/medications copy.csv'
results = conn.uploadFile(medications_file, fileTag='f1', jobName='loadMedications')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 20069,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Medication",
          "validObject": 20069,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 20069,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 13258,
          "noIdFound": 6811,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
   

In [69]:
observations_file = './Synthea-Medgraph/data/observations copy.csv'
results = conn.uploadFile(observations_file, fileTag='f1', jobName='loadObservations')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 406834,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Observations",
          "validObject": 406834,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 406834,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "ENCOUNTER_FOR_OBSERVATION",
          "validObject": 348391,
          "noIdFound": 58443,
          "invalidAttrib

In [70]:
organizations_file = './Synthea-Medgraph/data/organizations copy.csv'
results = conn.uploadFile(organizations_file, fileTag='f1', jobName='loadOrganizations')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 1273,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "AddressSynthea",
          "validObject": 1273,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Organizations",
          "validObject": 1273,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "ADDRESS_CITY_SYNTHEA",
          "validObject": 1273,
          "noIdFound": 0,
          "invalidAttribute": 0,
   

In [71]:
transitions_file = './Synthea-Medgraph/data/payer_transitions copy.csv'
results = conn.uploadFile(transitions_file, fileTag='f1', jobName='loadPayerTransitions')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 1535,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [],
      "edge": [
        {
          "typeName": "PAYER_TRANSITION",
          "validObject": 1535,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "deleteVertex": [],
      "deleteEdge": []
    }
  }
]


In [72]:
payers_file = './Synthea-Medgraph/data/payers copy.csv'
results = conn.uploadFile(payers_file, fileTag='f1', jobName='loadPayers')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 500,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "AddressSynthea",
          "validObject": 500,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Payer",
          "validObject": 494,
          "noIdFound": 0,
          "invalidAttribute": 6,
          "invalidAttributeLines": [
            "40:QOLS_Avg",
            "80:QOLS_Avg",
            "170:QOLS_Avg",
            "200:QOLS_Avg",
            "250:QOLS_Avg",
            "340:QOLS_Avg"
          ],
          "invalidAttributeLinesData": [
            "b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,NO_INSURANCE,,,,00nan,,0,43241.6,

In [73]:
procedures_file = './Synthea-Medgraph/data/procedures copy.csv'
results = conn.uploadFile(procedures_file, fileTag='f1', jobName='loadProcedures')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 58807,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "Procedures",
          "validObject": 58807,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 58807,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "SnomedCode",
          "validObject": 38979,
          "noIdFound": 19828,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
  

In [74]:
providers_file = './Synthea-Medgraph/data/providers copy.csv'
results = conn.uploadFile(providers_file, fileTag='f1', jobName='loadProviders')
print(json.dumps(results, indent=2))

[
  {
    "sourceFileName": "Online_POST",
    "statistics": {
      "validLine": 1436,
      "rejectLine": 0,
      "failedConditionLine": 0,
      "notEnoughToken": 0,
      "invalidJson": 0,
      "oversizeToken": 0,
      "vertex": [
        {
          "typeName": "AddressSynthea",
          "validObject": 1436,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        },
        {
          "typeName": "Providers",
          "validObject": 1436,
          "noIdFound": 0,
          "invalidAttribute": 0,
          "invalidVertexType": 0,
          "invalidPrimaryId": 0,
          "invalidSecondaryId": 0,
          "incorrectFixedBinaryLength": 0
        }
      ],
      "edge": [
        {
          "typeName": "ADDRESS_CITY_SYNTHEA",
          "validObject": 1436,
          "noIdFound": 0,
          "invalidAttribute": 0,
       