# Extract BPMN from Knowledge Graph

This notebook creates a BPMN XML file from the nodes in the Knowledge Graph. It assumes that you have already run `./CPG_to_KG.ipynb`; see that notebook for how to set up Neo4j and the Jupyter environment.

## Disclaimer
Nothing provided here is guaranteed or warrantied to work. It is provided as is. Using this notebook is at the risk of the user. 

The CPG used is published by the American Academy of Pediatrics and is solely their product. Nothing here should be inferred to supersede what it says.

**Nothing here should be used for delivering care.** It is in no way certified by any accredited medical organization or professional. It is provided solely for research purposes.

## Setup

In addition to the instructions from `./CPG_to_KG.ipynb`, you will also need to create a local folder called `./working`. This folder is ignored by git and is where the resulting BPMN XML will be written. 
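
If you would rather create the folder from Python than by hand, a minimal sketch could look like the following. The helper name `ensure_working_dir` is hypothetical and not part of this notebook; only the `./working` path comes from the instructions above.

```python
from pathlib import Path


def ensure_working_dir(path: str = "./working") -> Path:
    """Create the output folder if it does not exist yet and return it."""
    working = Path(path)
    # exist_ok=True makes the call safe to repeat on every run.
    working.mkdir(parents=True, exist_ok=True)
    return working
```

Calling `ensure_working_dir()` once before the final cell should be enough, since the call does nothing when the folder is already there.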


## Import Python Libraries

In [None]:
import os
import uuid

import NEO4J_Graph

## Connect to the Graph DB

In [None]:
NEO4J_URI = os.getenv('CPG_URL')
USERNAME = os.getenv('CPG_USER')
PASSWORD = os.getenv('CPG_PASSWORD')
DATABASE = os.getenv('CPG_DATABASE')

graph = NEO4J_Graph.Graph(NEO4J_URI, USERNAME, PASSWORD, DATABASE)
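
`os.getenv` returns `None` for a variable that is not set, so a missing setting only shows up later, when the connection is attempted. The sketch below checks the four variable names used above before connecting. The helper `missing_env_vars` is an illustrative assumption, and how `NEO4J_Graph.Graph` itself reacts to a `None` argument is not shown in this notebook.

```python
import os

# The four environment variables read in the cell above.
REQUIRED_VARS = ("CPG_URL", "CPG_USER", "CPG_PASSWORD", "CPG_DATABASE")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Set these environment variables first: {', '.join(missing)}")
```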

## Define Helper Functions

In [None]:
def escape(str_xml:str)->str:
    """
    Escape a string so that it can be put in XML without causing issues.  
    """
    str_xml = str_xml.replace("&", "&amp;")
    str_xml = str_xml.replace("<", "&lt;")
    str_xml = str_xml.replace(">", "&gt;")
    str_xml = str_xml.replace("\"", "&quot;")
    str_xml = str_xml.replace("'", "&apos;")
    return str_xml

def feel_express(str_xml:str)->str:
    """
    Convert from JavaScript expression to a FEEL expression. 
    """
    str_xml = str_xml.replace("&&", "and")
    str_xml = str_xml.replace("||", "or")
    str_xml = str_xml.replace(";", "")
    return str_xml

class BpmnData:
    """
    Helper class to capture elements that will need to be referred to multiple times.
    """
    def __init__(self, name:str, question:str):
        self.name = name
        self.question = question
        self.ext_id = f'_{str(uuid.uuid4())}'
        self.int_id = f'_{str(uuid.uuid4())}'
        self.obj_id = f'_{str(uuid.uuid4())}'
        self.obj_ref_id = f'_{str(uuid.uuid4())}'
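
To see what the two string helpers do, here is a small self-contained sketch. `escape_attr` is a hypothetical stand-in for `escape` that delegates to the standard library's `xml.sax.saxutils.escape`, and `feel_express` repeats the replacements from the cell above. The sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape as sax_escape


def escape_attr(text: str) -> str:
    """Escape text for an XML attribute value using the standard library."""
    # sax_escape handles &, < and > itself; the two quote characters are extras.
    return sax_escape(text, {'"': "&quot;", "'": "&apos;"})


def feel_express(str_xml: str) -> str:
    """Same replacements as the notebook helper above."""
    str_xml = str_xml.replace("&&", "and")
    str_xml = str_xml.replace("||", "or")
    str_xml = str_xml.replace(";", "")
    return str_xml


print(escape_attr('Weight < 5 kg & "stable"'))
# prints: Weight &lt; 5 kg &amp; &quot;stable&quot;
print(feel_express("age >= 2 && weight < 10;"))
# prints: age >= 2 and weight < 10
```

Note that `feel_express` is a plain text substitution. JavaScript operators such as `==` or `!` are passed through unchanged, so expressions that use them would likely need extra handling before a FEEL engine accepts them.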

## Write the BPMN XML

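This cell does all the work. It declares the global inputs, adds a Start Event for the root node, builds a FEEL Script Task that stages the patient, and then uses an Exclusive Gateway to branch into one chain of tasks per stage, each finishing in its own End Event.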

In [None]:
# Boilerplate XML stuff
KN_BPMN = '''<?xml version="1.0" encoding="UTF-8"?>
<bpmn2:definitions xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.omg.org/bpmn20"  xmlns:trisobpmn="http://www.trisotech.com/2014/triso/bpmn" xmlns:triso="http://www.trisotech.com/2015/triso/modeling" xmlns:bpmn2="http://www.omg.org/spec/BPMN/20100524/MODEL" exporterVersion="2.0" targetNamespace="http://www.omg.org/bpmn20" xmlns:feel="https://www.omg.org/spec/DMN/20230324/FEEL/">'''

# A definition of a number type for inputs.
number_item_def_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
    <bpmn2:itemDefinition id="{number_item_def_id}" triso:readOnly="false" isCollection="false" triso:basicType="false" structureRef="feel:number" triso:definitionType="http://www.trisotech.com/2015/triso/modeling/ItemDefinitionType" triso:name="Item Definition Number"/>'''

# Start the process
KN_BPMN += '''
    <bpmn2:process id="kn_to_bpmn" name="CPG as BPMN" isExecutable="true" processType="Public">'''

# Load the global inputs and convert them to BpmnData objects
cypher = '''
match (input:GlobalInput) return input
'''
global_inputs, _ = graph.query(cypher)

bpmn_datas = []
for global_input in global_inputs:
    bpmn_datas.append(BpmnData(global_input[0]['name'], global_input[0]['question']))

# Define the global inputs within the XML
KN_BPMN += f'''
    <bpmn2:ioSpecification>'''

for data in bpmn_datas:
    KN_BPMN += f'''
        <bpmn2:dataInput name="{escape(data.question)}" itemSubjectRef="{number_item_def_id}" isCollection="false" id="{data.ext_id}" />'''


input_set_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
        <bpmn2:inputSet id="{input_set_id}">'''
for data in bpmn_datas:
    KN_BPMN += f'''
            <bpmn2:dataInputRefs>{data.ext_id}</bpmn2:dataInputRefs>'''
KN_BPMN += f'''
        </bpmn2:inputSet>'''

KN_BPMN += f'''
        <bpmn2:outputSet/>'''
KN_BPMN += f'''
    </bpmn2:ioSpecification>'''

# Start Event for the root node
cypher = '''
match (root:Header1) return root
'''
root, _ = graph.query(cypher)

root = root[0][0]

start_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
    <bpmn2:startEvent id="{start_id}" name="{escape(root['name'])}"/>'''


# Find all the header 2 nodes that are the stages
cypher = f'''
match (parent)-[]-(node:Header2)
where elementId(parent) = '{root.element_id}'
return node
order by node.order
'''
headers,_ = graph.query(cypher)

# Figure out the combined expression to stage the patient.
calc = ''
for header in headers:
    header = header[0]

    cypher = f'''
    match (parent)-[:FUNCTION]->(func:Function)
    where elementId(parent) = '{header.element_id}'
    return func
    '''
    func,_ = graph.query(cypher)
    func = func[0][0]
    
    if (len(calc) > 0):
        calc += '\n else '
    calc += f'''if ({func['name']}) then "{header['name']}"'''

calc += '\n else "Unknown"'   
calc = feel_express(calc)

# Create Script Task for the staging calculation
calc_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
        <bpmn2:scriptTask id="{calc_id}" name="Stage" triso:unparsed="false" scriptFormat="application/feel">'''

KN_BPMN += f'''
            <bpmn2:ioSpecification>'''

# Map global inputs into the Script Task
for data in bpmn_datas:
    KN_BPMN += f'''
                <bpmn2:dataInput name="{escape(data.name)}" itemSubjectRef="{number_item_def_id}" isCollection="false" id="{data.int_id}" />'''


input_set_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
                <bpmn2:inputSet id="{input_set_id}">'''
for data in bpmn_datas:
    KN_BPMN += f'''
                    <bpmn2:dataInputRefs>{data.int_id}</bpmn2:dataInputRefs>'''
KN_BPMN += f'''
                </bpmn2:inputSet>'''

output = BpmnData('Stage', 'What stage is it?')
output_set_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
                <bpmn2:dataOutput name="Stage" triso:readOnly="false" itemSubjectRef="_triso-default-bpmnItemDefinition-string_id" isCollection="false" triso:hidden="false" id="{output.int_id}"/>
                <bpmn2:outputSet id="{output_set_id}">
                    <bpmn2:dataOutputRefs>{output.int_id}</bpmn2:dataOutputRefs>
                </bpmn2:outputSet>'''
KN_BPMN += f'''
            </bpmn2:ioSpecification>'''

for data in bpmn_datas:
    association_id = f'_{str(uuid.uuid4())}'
    KN_BPMN += f'''
            <bpmn2:dataInputAssociation id="{association_id}">
                <bpmn2:sourceRef>{data.ext_id}</bpmn2:sourceRef>
                <bpmn2:targetRef>{data.int_id}</bpmn2:targetRef>
            </bpmn2:dataInputAssociation>'''

KN_BPMN += f'''
            <bpmn2:dataOutputAssociation id="_{str(uuid.uuid4())}">
                <bpmn2:sourceRef>{output.int_id}</bpmn2:sourceRef>
                <bpmn2:targetRef>{output.ext_id}</bpmn2:targetRef>
            </bpmn2:dataOutputAssociation>
            <bpmn2:script><![CDATA[{calc}]]></bpmn2:script>
        </bpmn2:scriptTask>'''

# Create a Data Object to capture the output from Script Task
KN_BPMN += f'''
        <bpmn2:dataObject id="{output.obj_id}" name="{output.name}" triso:readOnly="false" itemSubjectRef="_triso-default-bpmnItemDefinition-string_id" isCollection="false"/>
        <bpmn2:dataObjectReference id="{output.ext_id}" name="{output.name}" triso:readOnly="false" dataObjectRef="{output.obj_id}"/>'''

# Connect the Start Event to the Script Task
flow_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
    <bpmn2:sequenceFlow id="{flow_id}" sourceRef="{start_id}"  targetRef="{calc_id}"/>'''

# Create an Exclusive Gateway to handle the results of the Script Task
gateway_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
    <bpmn2:exclusiveGateway id="{gateway_id}" gatewayDirection="Diverging"/>'''

# Connect the Script Task with the Gateway
flow_id = f'_{str(uuid.uuid4())}'
KN_BPMN += f'''
    <bpmn2:sequenceFlow id="{flow_id}" sourceRef="{calc_id}"  targetRef="{gateway_id}"/>'''

# Create a path per header
for header in headers:
    header = header[0]

    # Find all the outputs for this section
    cypher = f'''
    match (parent)-[:OUTPUT]->(out:Output)
    where elementId(parent) = '{header.element_id}'
    return out
    order by out.order
    '''
    outs,_ = graph.query(cypher)
    
    # Create a task for the first output
    out = outs[0][0]
    header_id = f'_{str(uuid.uuid4())}'
    KN_BPMN += f'''
        <bpmn2:task id="{header_id}" name="{escape(out['output'])}">
          <bpmn2:documentation id="{f'_{str(uuid.uuid4())}'}"><![CDATA[{out['output']}]]></bpmn2:documentation>
        </bpmn2:task>'''
    
    # Connect the first output to the Gateway with the right expression
    stage_id = f'_{str(uuid.uuid4())}'
    KN_BPMN += f'''
        <bpmn2:sequenceFlow id="{stage_id}"  name="{escape(header['name'])}" sourceRef="{gateway_id}"  targetRef="{header_id}">
            <bpmn2:conditionExpression language="https://www.omg.org/spec/DMN/20230324/FEEL/" triso:unparsed="false" xsi:type="bpmn2:tFormalExpression"><![CDATA[{output.name} = "{header['name']}"]]></bpmn2:conditionExpression>
        </bpmn2:sequenceFlow>'''
    
    # Add any other outputs after the first one.
    # range(1, len(outs)) is empty when there is only one output, so no extra length check is needed.
    for i in range(1, len(outs)):
        out = outs[i][0]
        last_header_id = header_id
        header_id = f'_{str(uuid.uuid4())}'
        KN_BPMN += f'''
        <bpmn2:task id="{header_id}" name="{escape(out['output'])}">
          <bpmn2:documentation id="{f'_{str(uuid.uuid4())}'}"><![CDATA[{out['output']}]]></bpmn2:documentation>
        </bpmn2:task>'''

        stage_id = f'_{str(uuid.uuid4())}'
        KN_BPMN += f'''
        <bpmn2:sequenceFlow id="{stage_id}" sourceRef="{last_header_id}"  targetRef="{header_id}"/>'''
            
    # End each stage's output list
    end_id = f'_{str(uuid.uuid4())}'
    flow_id = f'_{str(uuid.uuid4())}'
    KN_BPMN += f'''
        <bpmn2:endEvent id="{end_id}"/>
        <bpmn2:sequenceFlow id="{flow_id}" sourceRef="{header_id}"  targetRef="{end_id}"/>'''

# Close the xml tags
KN_BPMN += '''
  </bpmn2:process>
</bpmn2:definitions>'''
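
The staging calculation in the cell above chains one `if ... then ...` clause per stage and falls back to `"Unknown"`. The sketch below isolates that step so it can be tried without a database. The function name `build_stage_expression` and the sample conditions are illustrative assumptions; in the notebook the conditions come from the `Function` nodes and the names from the `Header2` nodes.

```python
def build_stage_expression(stages):
    """Chain (condition, stage_name) pairs into one FEEL if/else expression."""
    # One clause per stage, in the order the stages were returned.
    parts = [f'if ({condition}) then "{name}"' for condition, name in stages]
    # Fallback used when none of the stage conditions holds.
    parts.append('"Unknown"')
    return "\n else ".join(parts)


# Made-up conditions, only to show the shape of the result.
print(build_stage_expression([("a > 1", "Stage 1"), ("b < 2", "Stage 2")]))
```

For a non-empty list this produces the same text as the loop in the cell above, before `feel_express` is applied. For an empty list it returns just `"Unknown"`, whereas the loop would produce a dangling `else`.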

## Write BPMN XML to File

This cell outputs the XML to a file in the `./working` directory. 

In [None]:
# Write as UTF-8 to match the encoding declared in the XML header
with open('./working/cpg_as_bpm.bpmn', 'w', encoding='utf-8') as file:
    file.write(KN_BPMN)
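
Because the XML is assembled by string concatenation, it can be worth confirming that the result parses before opening it in a BPMN tool. The following is a minimal sketch using the standard library's `xml.etree.ElementTree`. The helper `count_bpmn_elements` is hypothetical, and the check covers only well-formedness, not validity against the BPMN schema.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the bpmn2 prefix in the generated file.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def count_bpmn_elements(xml_text: str, local_name: str) -> int:
    """Parse the XML (raising ET.ParseError if malformed) and count elements."""
    # Encode first so the parser sees bytes that match the UTF-8 declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return sum(1 for _ in root.iter(f"{{{BPMN_NS}}}{local_name}"))
```

For example, `count_bpmn_elements(KN_BPMN, "endEvent")` should equal the number of stages, since the cell above adds one End Event per stage.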

**Disclaimer:** Nothing provided here is guaranteed or warrantied to work. It is provided as is and has not been tested extensively. Using this notebook is at the risk of the user. Further, it is provided for research only and should not be used for treatment. 

Copyright &copy; 2024 Sam Schifman