# Format NSSP Priority List

###  Files:

 
 <u>Input File(s):</u>     
<ul>
    <li> NSSP_Priority_Elements_copy.csv </li> 
</ul>

 <u>Output File(s):</u>

<ul>
    <li> <i> column_guide_with_key.csv  </i> </li>
</ul>

### Description:

We want to take the NSSP_Priority_Elements file and add columns that describe more precisely where in a message each element is found.  Consider the following example:


![markdown_refs_NSSPkey.png](attachment:markdown_refs_NSSPkey.png)

The <b>Message_Date_Time</b> element has a location of MSH.7.1.  With the Python HL7 library, that location can be read directly from a parsed message to get the value of <b>Message_Date_Time</b>.  

Note that any extra padding 0s in the result are just an artifact of the library's output.  Great!  We were able to get <b>Message_Date_Time</b>. 
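To make the MSH.7.1 notation concrete, here is a minimal standard-library sketch of what "segment.field.component" points at. It is not the HL7 library's API: the helper name `get_component` and the sample message are made up for illustration, and the sketch ignores repetitions, subcomponents, and escape sequences.

```python
# Hypothetical two-segment message; HL7 v2 separates segments with a carriage return.
SAMPLE = '\r'.join([
    'MSH|^~\\&|SendApp|SendFac|RecvApp|RecvFac|20180101120000||ADT^A04|1|P|2.5.1',
    'PID|1||12345^^^Hosp^MR',
])

def get_component(message, segment, field, component=1):
    """Return one component of one field from the first matching segment, or None."""
    for line in message.splitlines():
        fields = line.split('|')
        if fields[0] != segment:
            continue
        # MSH-1 is the field separator itself, so MSH fields sit one position
        # earlier in the split list than fields of any other segment.
        index = field - 1 if segment == 'MSH' else field
        if index >= len(fields):
            return None
        components = fields[index].split('^')
        if component > len(components):
            return None
        return components[component - 1]
    return None

print(get_component(SAMPLE, 'MSH', 7, 1))  # Message_Date_Time
print(get_component(SAMPLE, 'PID', 3, 1))
```

The off-by-one handling for MSH is the main design point: because the first `|` is itself counted as MSH-1, a naive split would return the wrong field for every MSH location.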

Now when we look at <b>C_Patient_Age_Years</b>, the location is 'OBX-2,OBX-3,OBX-5'.  Even though it seems to have a similar format: 

ABC (some separator) #

it means something very different.  It is actually the combination of OBX[#].2, OBX[#].3, and OBX[#].5, where # is the index of the OBX segment within the message, which is not known in advance.  

----------------------------

By parsing through these HL7 descriptions, we can create additional columns that can be interpreted in Python the same way that we are able to interpret the text in our heads.
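As a rough illustration of that parsing, the sketch below applies the same cleaning and slicing steps to a single string instead of a whole dataframe column. The function name `split_location` is hypothetical, and the input spellings are assumptions (the raw CSV is not shown here); only the column names `Seg0`, `0_0`, `0_1`, `0_2`, `Seg1`, and `1_0` come from the notebook.

```python
def split_location(raw):
    """Mirror the notebook's cleaning and slicing steps on one location string."""
    # Same cleaning as the dataframe code: drop spaces and newlines,
    # turn '-' into '.' and '$' into ','.
    cleaned = (raw.replace(' ', '')
                  .replace('\n', '')
                  .replace('-', '.')
                  .replace('$', ','))

    # Alternatives are separated by the word 'or'.
    alternatives = cleaned.split('or')
    first = alternatives[0]
    second = alternatives[1] if len(alternatives) > 1 else None

    # First three characters are the segment name; everything after the
    # separator is the index part, which may hold up to three ','-separated pieces.
    parts = first[4:].split(',')
    parts += [None] * (3 - len(parts))

    return {
        'Seg0': first[:3],
        '0_0': parts[0],
        '0_1': parts[1],
        '0_2': parts[2],
        'Seg1': second[:3] if second else None,
        '1_0': second[4:] if second else None,
    }

for location in ['MSH-7.1', 'OBX-2$3$5', 'PID-8 or PID-3']:
    print(location, '->', split_location(location))
```

Working on one string at a time makes it easy to check how an unusual location will be split before running the full dataframe version.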

In [1]:
##########################################################################################################
# Import Libraries
##########################################################################################################

# Import[ant] libraries @('_')@

import pandas as pd
import numpy as np
import os

##########################################################################################################
# Import NSSP_Priority_Elements_copy and format it to our liking
##########################################################################################################

# Change working directory
os.chdir('../data/raw/')

# Import nssp_priority csv file, drop NAs, clean up 
nssp_priority = pd.read_csv('NSSP_Priority_Elements_copy.csv',usecols=[0,1,2,3,4])

# get rid of all rows with NA
nssp_priority.dropna(inplace=True)

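# remove spaces and new lines, replace - with . and $ with ,
# regex=False treats each pattern as a literal string, so '$' is never read as an end-of-string anchor
nssp_priority['HL7'] = nssp_priority['HL7'].str.replace(' ','',regex=False).str.replace('\n','',regex=False).str.replace('-','.',regex=False)
nssp_priority['HL7'] = nssp_priority['HL7'].str.replace('$',',',regex=False)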

##########################################################################################################
# Parse the existing 'HL7' column into subcomponents that will form new columns
##########################################################################################################

# Create empty column names to eventually fill in
nssp_priority['Seg0'] = np.nan
nssp_priority['0_0'] = np.nan
nssp_priority['0_1'] = np.nan
nssp_priority['0_2'] = np.nan

nssp_priority['Seg1'] = np.nan
nssp_priority['1_0'] = np.nan

# Split any HL7 locations on the word 'or'
split_or = nssp_priority['HL7'].str.split('or',expand=True)

# Divide into segment name & number indices
seg_or0 = split_or[0].str[:3]
nums_or0 = split_or[0].str[4:]

seg_or1 = split_or[1].str[:3]
nums_or1 = split_or[1].str[4:]

# Split Further into ',' sections.  Should be a max of 3
nums_or0_split = nums_or0.str.split(',',expand=True)

##########################################################################################################
# Assign new lists to dataframe as new columns
##########################################################################################################

# Begin assigning the parsed pieces to dataframe columns.  Type and Keywords are entered manually (see markdown for more info)
# Plain column assignment replaces the NaN placeholders outright, avoiding dtype warnings when strings go into float columns
nssp_priority['Seg0'] = seg_or0
nssp_priority['0_0'] = nums_or0_split[0]
nssp_priority['0_1'] = nums_or0_split[1]
nssp_priority['0_2'] = nums_or0_split[2]

nssp_priority['Seg1'] = seg_or1
nssp_priority['1_0'] = nums_or1

# Manually enter type and keywords

nssp_priority['Keywords'] = np.nan
nssp_priority['Type'] = np.nan


nssp_priority.loc[:,'Type'] = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0,
         1, 0, 0, 0, 0, 0, 0]

nssp_priority['Keywords'] = [np.nan, np.nan, 'SS003|Facility.Type', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
        '8661-1|Chief.?Complaint', np.nan, np.nan, np.nan, np.nan, 'SS003|Facility.Type', np.nan, np.nan,
        np.nan, np.nan, '21612-7|Patient\\W?Age', 'age.*(years?)|age.*(months?)|age.*(days?)',
        '21612-7|Patient\\W?Age', np.nan, np.nan, np.nan, np.nan, np.nan, '21612-7|Patient\\W?Age',
        np.nan, 'age.*(years?)|age.*(months?)|age.*(days?)', np.nan, np.nan, np.nan, np.nan,
        '8661-1|Chief.?Complaint', np.nan, np.nan, np.nan, '8661-1|Chief.?Complaint', np.nan, np.nan]

##########################################################################################################
# Send to CSV
##########################################################################################################

nssp_priority.to_csv('../processed/column_guide_with_key.csv',index=False)

The output with visible headers looks like this (with more rows):

![markdown_refs_NSSPkey2.png](attachment:markdown_refs_NSSPkey2.png)

#### Note on Keywords & Type

As you may have noticed in the code, the lists that fill the processed file's <b>Keywords</b> and <b>Type</b> columns were created manually. 

For the OBX locations, we need to loop through all OBX[1] -> OBX[n] segments to find the specific element we are looking for.  We use Regular Expressions to do this.  The expression searches a field for key indicators (keywords) related to the desired element.  That is why all OBX rows have a non-null value in the <b>Keywords</b> column.

To indicate that a certain row is an OBX row and therefore requires RegEx, we assign it a <b>Type</b> value of 1.  Otherwise it is 0.  

* For the very specific case where we look for age units, we use a special RegEx search that relies on RegEx groups.  The age-unit rows are the only ones assigned a <b>Type</b> of 2.
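
The sketch below shows one way the <b>Keywords</b> patterns could be applied, under stated assumptions. The two patterns are copied from the lists above. The sample message, the helper names, the case-insensitive matching, and the choice to search the whole segment text are all assumptions for illustration, not necessarily how the downstream code works.

```python
import re

# Patterns taken from the Keywords column.
KEYWORDS_AGE = r'21612-7|Patient\W?Age'                            # a Type 1 row
KEYWORDS_AGE_UNITS = r'age.*(years?)|age.*(months?)|age.*(days?)'  # a Type 2 row

# Hypothetical message with two OBX segments.
SAMPLE = '\r'.join([
    'MSH|^~\\&|App|Fac|||20180101120000||ADT^A04|1|P|2.5.1',
    'OBX|1|CWE|SS003^Facility/Visit Type^PHINQUESTION||261QE0002X^Emergency Care^NUCC||||||F',
    'OBX|2|NM|21612-7^Age - Reported^LN||35|a^year^UCUM|||||F',
])

def obx_segments(message):
    """Yield every OBX segment, since the wanted one can sit at any index."""
    for segment in message.splitlines():
        if segment.startswith('OBX|'):
            yield segment

def find_type1(message, pattern):
    """Type 1: return OBX-5 of the first OBX segment that matches the keywords."""
    for segment in obx_segments(message):
        if re.search(pattern, segment, flags=re.IGNORECASE):
            return segment.split('|')[5]
    return None

def find_type2(message, pattern):
    """Type 2: return the text captured by whichever RegEx group matched."""
    for segment in obx_segments(message):
        match = re.search(pattern, segment, flags=re.IGNORECASE)
        if match:
            return next(g for g in match.groups() if g is not None)
    return None

print(find_type1(SAMPLE, KEYWORDS_AGE))        # age value from OBX-5
print(find_type2(SAMPLE, KEYWORDS_AGE_UNITS))  # captured age unit
```

The difference between the two types is what gets returned: a Type 1 search uses the pattern only to pick the right OBX segment, while a Type 2 search returns the captured group itself.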