# Python HL7 to JSON converter

### MSDS 5023 Information Structures
### Professor Best
### Sarah Grotelueschen
### October 8, 2015

### Final Paper

#### Objectives: Write a program that will enable the transformation of an HL7 file into JSON format using Python. Test the program using 3 HL7 files provided by Professor Best. Recommended approaches include: parsing all of the segments, making sure to keep track of positions; creating an object for each segment, keeping track of the order of the segments, then outputting in JSON format.

##### I hand-typed this code based on original code developed by Mary Van Valkenburg. I added the comments and changed some of the variables as I worked through the code to understand it.

In HL7, order is extremely important, so we need to use a data structure that efficiently supports storing and processing data in order. To the extent possible, we will utilize arrays to convert the data.

There are five special characters in HL7 format:

| Used to separate the field values

^ Indicates a component (subset of a field)

~ Indicates repeat of a component

\ Indicates an escape character

& Indicates the continuation of a field

I attempted to address all, but was unable to complete addressing the ampersand (&).

##### Proposed approach:
I will create a function that simultaneously stores the original HL7 data in arrays and builds a JSON format. To do this, I will work from the outside in, utilizing the enumerate function to identify the location of the objects in the array for building the positional identifier.

##### Steps*:
1. Create an array of the segments (rows) in the HL7 file. 
2. Create an array of the fields (values between the ||) in the HL7 file.
3. Create an array of the components (field values with ^ ) in the HL7 file.
4. Create an array of the subcomponents (field values with ~ ) in the HL7 file.
5. Build the JSON file format using the above arrays, combined with the appropriate positional identifiers and {}

###### * Following creation of each array, utilizing if/else statements, I create the related JSON format, which includes a combination of {}, a positional identifier, and the value from the array.

Below is an example of how the positional identifers will output. After each of the positional identifiers, there will be a colon, followed by the appropriate value for the JSON file.
Example:  {EVN|A08|201507230021|||R.CA.NAH^HALL^NANCY^A^^^}

'EVN' ID = EVN.1.1

'A08' ID = EVN.1.2

'201507230021' = EVN.1.3

' ' = EVN.1.4

' ' = EVN.1.5

'R.CA.NAH' = EVN.1.6.1

'HALL' = EVN.1.6.2

'NANCY' = EVN.1.6.3

'A' = EVN.1.6.4

'^' = EVN.1.6.5

'^' = EVN.1.6.6

Additional steps that could be completed with more time:
(a) further splitting the date into day, month, year
(b) addressing the ampersand (continuation) special character

In [1]:
import simplejson as json

In [2]:
#Save the files for convenient reference later

rec1 = open('/Users/grotel/Desktop/msg1.txt', mode = 'U')
rec2 = open('/Users/grotel/Desktop/msg2.txt', mode = 'U')
rec3 = open('/Users/grotel/Desktop/msg3.txt', mode = 'U')

In [3]:
# Build a function that will create the JSON file

def hl7_to_json(record):
    segment_list = []                           #Build empty array to populate the segments (rows) into
    json_file = '{' + 'record:'                 #JSON format requires curly bracket to start, so I start there
    for row in record:
        segment = row.strip('\n')               #Remove the new row reference
        segment_list.append(segment)            #Output the segment
        for segment in segment_list:       
            seg_baseID = segment[0:3]           #Store the first three letters of the row to be used as an ID later
            fields = []                         #Create an empty array to populate the fields in
            fields = segment.split("|")         #Split the segment into fields by splitting on |
        for ID, field in enumerate(fields):     #Get position info for identifier and the related value
            if(field != '' and field != '^~\&'): #Ignore fields that are blank or filled with the special character
                if '^' in field:                #For fields that contain a carat...
                    components = []             #Create an empty array for the components
                    components = field.split('^')     #Split the fields on ^ to get the components
                    for ID2, component in enumerate(components):    #Get position info for identifier and the related value
                        if (component != ''):           #Ignore fields that are blank
                            if '~' in component:        #For the fields that contain ~                  
                                subcomponents =[]
                                subcomponents = components.split('~')     #Split the component on ~
                                for ID3, subcomponent in enumerate(subcomponents):       #Get position info for identifier
                                    if(subcomponent != ''):                        
                                        json_file = json_file + '{' + seg_baseID + '.' + str(ID +1) + '.' + str(ID2 +1) + '.' + str(ID3 +1) + ':' + subcomponent + '}'   #Add curlies and ID to output                   
                        else:
                            json_file = json_file + '{' + seg_baseID + '.' + str(ID + 1) + '.' + str(ID2 +1) +':' + component + '}'  #Add curlies and ID to output
                else:
                    json_file = json_file + '{' + seg_baseID + '.' + str(ID +1) + ':' + field + '}'   #Add curlies and ID to output    
    
    json_file = json_file + '}'   #Add final curly bracket to end of file    
    json_file2 = json.dumps(json_file, indent=2)    

    print json_file
            

In [4]:
# Call the function for the msg1 record

hl7_to_json(rec1)

{record:{MSH.1:MSH}{MSH.4:ABCDE}{MSH.7:201507230021}{MSH.10:CAGTADM.1.10532994}{MSH.11:P}{MSH.12:2.1}{EVN.1:EVN}{EVN.2:A08}{EVN.3:201507230021}{EVN.6.5:}{EVN.6.6:}{EVN.6.7:}{PID.1:PID}{PID.2:1}{PID.4:V000042610}{PID.5:M88604}{PID.6.4:}{PID.6.5:}{PID.6.6:}{PID.8:19510121}{PID.9:M}{PID.10.1:}{PID.10.2:}{PID.10.3:}{PID.10.4:}{PID.10.5:}{PID.10.6:}{PID.11:W}{PID.12.2:}{PID.12.7:}{PID.12.8:}{PID.14:(000)000-0000}{PID.15:(000)000-0000}{PID.16:ENG}{PID.17:D}{PID.18:OTH}{PID.19:V07016760770}{PID.20:000-00-0000}{NK1.1:NK1}{NK1.2:1}{NK1.3.3:}{NK1.3.4:}{NK1.3.5:}{NK1.3.6:}{NK1.4:OT}{NK1.5.2:}{NK1.5.7:}{NK1.5.8:}{NK1.6:(000)000-0000}{NK1.7:(000)000-0000}{NK1.1:NK1}{NK1.2:2}{NK1.3.3:}{NK1.3.4:}{NK1.3.5:}{NK1.3.6:}{NK1.4:OT}{NK1.5.2:}{NK1.5.7:}{NK1.5.8:}{NK1.6:(000)000-0000}{NK1.7:(000)000-0000}{PV1.1:PV1}{PV1.2:1}{PV1.3:E}{PV1.4.2:}{PV1.4.3:}{PV1.5:EM}{PV1.8.4:}{PV1.8.5:}{PV1.9.4:}{PV1.9.5:}{PV1.9.6:}{PV1.9.7:}{PV1.10.4:}{PV1.10.5:}{PV1.11:ER}{PV1.15:PR}{PV1.16:WI}{PV1.17:N}{PV1.19:ER}{PV1.21:03}{P

In [5]:
# Call the function for the msg2 record

hl7_to_json(rec2)

{record:{MSH.1:MSH}{MSH.4:ABCDE}{MSH.7:201507221833}{MSH.10:CAGTADM.1.10532250}{MSH.11:P}{MSH.12:2.1}{EVN.1:EVN}{EVN.2:A08}{EVN.3:201507221833}{EVN.6.5:}{EVN.6.6:}{EVN.6.7:}{PID.1:PID}{PID.2:1}{PID.4:V000303465}{PID.5:V256537}{PID.6.4:}{PID.6.5:}{PID.6.6:}{PID.6.7:}{PID.8:19961213}{PID.9:F}{PID.10.1:}{PID.10.2:}{PID.10.3:}{PID.10.4:}{PID.10.5:}{PID.10.6:}{PID.11:W}{PID.12.2:}{PID.12.7:}{PID.12.8:}{PID.14:(000)000-0000}{PID.15:(000)000-0000}{PID.16:ENG}{PID.17:S}{PID.18:NON}{PID.19:V07016754636}{PID.20:000-00-0000}{PV1.1:PV1}{PV1.2:1}{PV1.3:E}{PV1.4.2:}{PV1.4.3:}{PV1.5:EM}{PV1.8.4:}{PV1.8.5:}{PV1.9.4:}{PV1.9.5:}{PV1.10.4:}{PV1.10.5:}{PV1.11:OBED}{PV1.15:CR}{PV1.16:WI}{PV1.17:N}{PV1.19:ER}{PV1.21:03}{PV1.37:HOM}{PV1.40.2:}{PV1.41:CONTRACTIONS}{PV1.42:DEP}{PV1.45:201507202309}{PV1.46:201507210150}{ACC.1:ACC}{ACC.2.2:}{ACC.3:10}{GT1.1:GT1}{GT1.2:1}{GT1.4.4:}{GT1.4.5:}{GT1.4.6:}{GT1.4.7:}{GT1.6.2:}{GT1.6.7:}{GT1.6.8:}{GT1.7:(000)000-0000}{GT1.9:19961213}{GT1.10:F}{GT1.12:SA}{GT1.13:000-00-0

In [6]:
# Call the function for the msg3 record

hl7_to_json(rec3)

{record:{MSH.1:MSH}{MSH.4:ABCDE}{MSH.7:201507230002}{MSH.10:CAGTADM.1.10532923}{MSH.11:P}{MSH.12:2.1}{EVN.1:EVN}{EVN.2:A03}{EVN.3:201507230001}{EVN.6.2:}{EVN.6.3:}{EVN.6.4:}{EVN.6.5:}{EVN.6.6:}{EVN.6.7:}{PID.1:PID}{PID.2:1}{PID.4:V000027071}{PID.5:V24686}{PID.6.4:}{PID.6.5:}{PID.6.6:}{PID.8:19820215}{PID.9:M}{PID.10.1:}{PID.10.2:}{PID.10.3:}{PID.10.4:}{PID.10.5:}{PID.10.6:}{PID.11:W}{PID.12.2:}{PID.12.7:}{PID.12.8:}{PID.14:(000)000-0000}{PID.15:(000)000-0000}{PID.16:ENG}{PID.17:S}{PID.18:CHR}{PID.19:V07016746578}{PID.20:000-00-0000}{PV1.1:PV1}{PV1.2:1}{PV1.3:O}{PV1.4.2:}{PV1.4.3:}{PV1.5:EL}{PV1.8.3:}{PV1.8.4:}{PV1.8.6:}{PV1.9.3:}{PV1.9.4:}{PV1.9.6:}{PV1.10.5:}{PV1.10.6:}{PV1.10.7:}{PV1.11:SURG}{PV1.15:CR}{PV1.17:N}{PV1.19:SDC}{PV1.21:01}{PV1.37:HOM}{PV1.40.2:}{PV1.41:URETHERAL STRICTURE DISEASE      CPT:  52276,52341}{PV1.42:DEP}{PV1.45:201507220855}{PV1.46:201507220855}{AL1.1:AL1}{AL1.2:1}{AL1.3:DA}{AL1.5:SV}{AL1.6:RASH,ITCH,THROAT SWELLING}{AL1.7:20150720}{AL1.1:AL1}{AL1.2:2}{AL1.3:D