# Exploring NC3Rs Experiment Design Assistant File Format from a Template

Here I go through the details of an NC3Rs Experiment Design Assistant (EDA) `.eda` file, in this case one exported from a template.

This is their website: https://eda.nc3rs.org.uk/

In [1]:
import json

# Load the model JSON; the context manager closes the file afterwards
with open('model', 'r') as f:
    j = json.load(f)

print("Model JSON Contains primary keys:\n")
for _k in j.keys():
    print(f"  {str(type(j[_k])):15s} {_k} ")

Model JSON Contains primary keys:

  <class 'str'>   resourceId 
  <class 'dict'>  properties 
  <class 'dict'>  propertyTypes 
  <class 'dict'>  stencil 
  <class 'list'>  childShapes 
  <class 'dict'>  bounds 
  <class 'dict'>  stencilset 
  <class 'list'>  ssextensions 


Now I want to unpack each of these. Several are dictionaries, but all of them are quite small except `childShapes`, which is a list.


| Name | Description | Example |
| --- | --- | --- |
| resourceId | UID | 'oryx-canvas123' |
| properties | Properties | {'title': ""} |
| propertyTypes | Property data types | {'title': 'string'} |
| stencil | Label | {'id': 'Diagram'} |
| childShapes | Details of diagram contents | [more below] |
| bounds | Diagram canvas bounding box | {'lowerRight': {'x': 1589, 'y': 1050 }, 'upperLeft': {'x': 0, 'y': 0 }} |
| stencilset | Link to a JSON file and namespace | {'url': '/eda/assets/eda/eda-750239aadc9892bd73639e9a7c59cffb.json', 'namespace': 'eda#'} |
| ssextensions | Empty list in the template I downloaded | ? |
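The example column above can be regenerated rather than copied by hand. Below is a minimal sketch that prints every top-level value except the large `childShapes` list as compact JSON. The helper name `summarize_top_level` is hypothetical, and the `sample` dict is assembled from the table's example column purely for illustration; it is not a real export. With the notebook's data you would call `summarize_top_level(j)`.

```python
import json


def summarize_top_level(model, skip=("childShapes",)):
    """Return {key: compact JSON string} for every top-level key not in `skip`."""
    return {
        key: json.dumps(value)
        for key, value in model.items()
        if key not in skip
    }


# Illustrative sample built from the table's example column (NOT a real export).
sample = {
    "resourceId": "oryx-canvas123",
    "properties": {"title": ""},
    "propertyTypes": {"title": "string"},
    "stencil": {"id": "Diagram"},
    "childShapes": [],
    "bounds": {"lowerRight": {"x": 1589, "y": 1050}, "upperLeft": {"x": 0, "y": 0}},
    "ssextensions": [],
}

summary = summarize_top_level(sample)
for key, text in summary.items():
    print(f"{key:15s} {text}")
```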


Now that this level is mostly characterized, I'll dig into `j['childShapes']`. 

This is where the bulk of the EDA node and edge information (labels, properties, locations, etc.) is stored.
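Before looking at key structure, a quick tally of stencil ids gives a rough picture of what `childShapes` holds. This is a sketch only: `stencil_counts` is a hypothetical helper, and `ExampleNode`/`ExampleEdge` are placeholder ids, not real EDA stencil names. On the real data it would be called as `stencil_counts(j['childShapes'])`; note that it only looks at the first level.

```python
from collections import Counter


def stencil_counts(shapes):
    """Count how often each stencil id appears among the given shapes (one level only)."""
    return Counter(shape["stencil"]["id"] for shape in shapes)


# Hypothetical shapes; only the `stencil` key is filled in.
shapes = [
    {"stencil": {"id": "ExampleNode"}},
    {"stencil": {"id": "ExampleNode"}},
    {"stencil": {"id": "ExampleEdge"}},
]
counts = stencil_counts(shapes)
print(counts.most_common())  # [('ExampleNode', 2), ('ExampleEdge', 1)]
```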

In [2]:
dl = []
for _i in j['childShapes']:
    _l = list(_i.keys())
    # Keep each distinct (ordered) key list only once
    if _l not in dl:
        dl.append(_l)

for count, _dl in enumerate(dl, start=1):
    print(f"Unique childShape {count}:")
    for _i in _dl:
        print(f" - {_i}")
    print()

Unique childShape 1:
 - resourceId
 - properties
 - propertyTypes
 - stencil
 - childShapes
 - outgoing
 - incoming
 - bounds
 - dockers

Unique childShape 2:
 - resourceId
 - properties
 - propertyTypes
 - stencil
 - childShapes
 - outgoing
 - incoming
 - bounds
 - dockers
 - target



Since there are only two unique key sets, we can quickly show that the only key that differs between them is `target`, which likely marks the `childShapes` entries that represent edges.

In [3]:
def diff(l1, l2):
    """Return the items present in exactly one of the two lists (symmetric difference)."""
    return sorted(set(l1) ^ set(l2))

print("Difference between two childShapes:")
for _d in diff(dl[0],dl[1]):
    print(f"  - {_d}")

Difference between two childShapes:
  - target


I'm guessing these represent nodes and edges as follows:

    Node <class 'dict'> 
         dict_keys([
            'resourceId', 
            'properties', 
            'propertyTypes', 
            'stencil', 
            'childShapes', 
            'outgoing', 
            'incoming', 
            'bounds', 
            'dockers']) 

    Edge <class 'dict'> 
         dict_keys([
            'resourceId', 
            'properties', 
            'propertyTypes', 
            'stencil', 
            'childShapes', 
            'outgoing', 
            'incoming', 
            'bounds', 
            'dockers', 
            'target']) 

This is helpful because now I can see how to parse each of these.

Again, the only difference in keys here is the `target` key. 

I explored this particular experimental diagram in IPython and found that `target` contains a single resourceId value, which appears to point to an entry of the other type, apparently a `Node`.
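That observation can be checked across the whole file instead of by hand. A minimal sketch, assuming (as this notebook does) that edges are exactly the shapes carrying a `target` key: it reports any edge whose target id is missing or resolves to another edge. `check_targets` is a hypothetical helper, and it only searches the top level of `childShapes`, so a target living inside a nested shape would be flagged too. On the real data: `check_targets(j['childShapes'])`.

```python
def check_targets(shapes):
    """Return resourceIds of edges whose `target` is missing or points at another edge.

    Assumes edges are exactly the shapes with a `target` key; only the
    top level of `shapes` is searched.
    """
    by_id = {shape["resourceId"]: shape for shape in shapes}
    bad = []
    for shape in shapes:
        if "target" not in shape:
            continue  # a node, nothing to check
        target = by_id.get(shape["target"]["resourceId"])
        if target is None or "target" in target:
            bad.append(shape["resourceId"])
    return bad


# Hypothetical data: e1 points at node n1, e2 points at an id that does not exist.
shapes = [
    {"resourceId": "n1"},
    {"resourceId": "e1", "target": {"resourceId": "n1"}},
    {"resourceId": "e2", "target": {"resourceId": "missing"}},
]
print(check_targets(shapes))  # ['e2']
```

An empty list on the real file would support the guess that every `target` points at a node.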

Here are the resulting notes:

| Name | Description | Example |
|---|---|---|
| resourceId | UID for each node or edge | 'oryx_7D22B9C7-F193-40BA-B31A-6D9F1A423D93' |
| properties | Properties for each node or edge | {'name': 'then'} |
| propertyTypes | Key-value pairs giving the data type of each property | |
| stencil | Label for each node or edge | |
| childShapes | Recursive: the next level down, with its own unique resourceIds; mostly empty, but captures another layer of nested shapes | |
| outgoing | List of `{'resourceId': ...}` dictionaries, presumably outgoing edges | [{'resourceId': 'oryx_6B6458F7-0B89-4EC6-84DC-F398B6DF3716'}] |
| incoming | List of `{'resourceId': ...}` dictionaries, presumably incoming edges | [{'resourceId': 'oryx_6894C16E-1267-478F-B621-B6CA322390F9'}] |
| bounds | Dictionary of coordinates, apparently a bounding box (?) | {'lowerRight': {'x': 270, 'y': 575}, 'upperLeft': {'x': 150, 'y': 525}} |
| dockers | List of coordinate pairs, possibly edge endpoints (?) | [{'x': 73, 'y': 35}, {'x': -0.9, 'y': 8.2}] |
| target | Single-entry dictionary holding a resourceId; present only on edges (?) | {'resourceId': 'oryx_EC08DE2B-AB9A-4D0D-B9C6-4C1325D08C76'} |
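If `bounds` really is a bounding box, it can be reduced to a single anchor point plus a size, which is the kind of input a layout tool that positions each node by one point would need. This is a sketch under two unconfirmed assumptions: that `upperLeft`/`lowerRight` are canvas coordinates (nested shapes might instead be relative to their parent), and that a center point is the right anchor for Arrows.app. `bbox_geometry` is a hypothetical helper; the example uses the `bounds` value from the table.

```python
def bbox_geometry(bounds):
    """Return (center_x, center_y, width, height) for an EDA `bounds` dict.

    Assumes upperLeft/lowerRight are absolute canvas coordinates; this has
    not been confirmed for nested shapes.
    """
    ul, lr = bounds["upperLeft"], bounds["lowerRight"]
    width = lr["x"] - ul["x"]
    height = lr["y"] - ul["y"]
    return (ul["x"] + width / 2, ul["y"] + height / 2, width, height)


# Example `bounds` value taken from the table above.
bounds = {"lowerRight": {"x": 270, "y": 575}, "upperLeft": {"x": 150, "y": 525}}
print(bbox_geometry(bounds))  # (210.0, 550.0, 120, 50)
```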

Based on this information, we should be able to read the `.eda` file and map the relevant information onto a Neo4j schema. To do this, we need to know what is required to fully represent this information in Neo4j.

I am also curious how to translate the coordinates accurately, so that we could transfer the diagram to Arrows.app and from there into a Cypher query. Arrows.app is probably the best way to visually verify that we're capturing everything correctly; Bloom or Neo4j Browser would be the next step after that. For now, I will focus on the basic graph requirements:

| Neo4j | EDA key | Has `target` key? |
|---|---|---|
| UID | `resourceId` | N/A |
| Node label | `stencil['id']` | False |
| Node properties | `properties` | False |
| Node properties | `propertyTypes` | False |
| Relationship type | `stencil['id']` | True |
| Relationship properties | `properties` | True |
| Relationship properties | `propertyTypes` | True |
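The mapping in this table can be written as a small function. The sketch below is one possible reading of it, not a settled schema: `to_neo4j_record` and `node_merge_query` are hypothetical helpers, only scalar property values are kept (Neo4j properties cannot hold nested maps; lists are simply dropped here for brevity), and the example shape uses placeholder names. Because Cypher labels cannot be passed as parameters, the label is backtick-quoted while the values are passed as parameters.

```python
def to_neo4j_record(shape):
    """Map one childShapes entry onto the columns of the table above.

    Hypothetical helper: shapes with a `target` key become relationships,
    everything else becomes a node. Only scalar property values are kept.
    """
    scalars = (str, int, float, bool)
    return {
        "uid": shape["resourceId"],
        "kind": "relationship" if "target" in shape else "node",
        "label": shape["stencil"]["id"],
        "properties": {k: v for k, v in shape["properties"].items()
                       if isinstance(v, scalars)},
        "property_types": dict(shape["propertyTypes"]),
    }


def node_merge_query(record):
    """Build a parameterised Cypher MERGE for a node record.

    The label is backtick-quoted (embedded backticks doubled); uid and
    properties travel as parameters.
    """
    label = record["label"].replace("`", "``")
    query = f"MERGE (n:`{label}` {{uid: $uid}}) SET n += $props"
    return query, {"uid": record["uid"], "props": record["properties"]}


# Hypothetical node shape; the label and property names are placeholders.
shape = {
    "resourceId": "oryx_EXAMPLE",
    "stencil": {"id": "ExampleNode"},
    "properties": {"name": "example", "nested": {"ignored": True}},
    "propertyTypes": {"name": "string"},
}
record = to_neo4j_record(shape)
query, params = node_merge_query(record)
print(query)   # MERGE (n:`ExampleNode` {uid: $uid}) SET n += $props
print(params)  # {'uid': 'oryx_EXAMPLE', 'props': {'name': 'example'}}
```

The resulting `(query, params)` pair is shaped for the Neo4j Python driver's `session.run(query, params)`, though that step is not shown or tested here.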


In [21]:
for _opt in ['resourceId','outgoing','incoming']:
    print(f"{_opt}: \n\t{j['childShapes'][1][_opt]}")

resourceId: 
	oryx_7D22B9C7-F193-40BA-B31A-6D9F1A423D93
outgoing: 
	[{'resourceId': 'oryx_6B6458F7-0B89-4EC6-84DC-F398B6DF3716'}]
incoming: 
	[{'resourceId': 'oryx_6894C16E-1267-478F-B621-B6CA322390F9'}]


If I follow the lone entry in `j['childShapes'][1]['outgoing']`, I find that it points to an edge named `then`, as the next cell shows:

In [12]:
# ogID = 'oryx_6B6458F7-0B89-4EC6-84DC-F398B6DF3716'
ogID = j['childShapes'][1]['outgoing'][0]['resourceId']
for _j in j['childShapes']:
    if _j['resourceId'] == ogID:
        print(_j['properties'])

{'name': 'then'}


Likewise, following the lone entry in `j['childShapes'][1]['incoming']` leads to an edge again, suggesting that `j['childShapes'][1]` is itself a node.

In [14]:
ogID = j['childShapes'][1]['incoming'][0]['resourceId']
for _j in j['childShapes']:
    if _j['resourceId'] == ogID:
        print(_j['properties'])

{'name': 'subjected to'}
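Both lookups above scan the whole list for one id. Building a dictionary keyed by `resourceId` once makes each lookup a single access; this assumes resourceIds are unique, as their use as UIDs suggests. `index_by_id` is a hypothetical helper and the shapes below are made up to mirror the two lookups. On the real data: `by_id = index_by_id(j['childShapes'])`, then `by_id[j['childShapes'][1]['outgoing'][0]['resourceId']]['properties']`.

```python
def index_by_id(shapes):
    """Map resourceId -> shape so lookups no longer scan the whole list."""
    return {shape["resourceId"]: shape for shape in shapes}


# Hypothetical shapes mirroring the two lookups above.
shapes = [
    {"resourceId": "edge_out", "properties": {"name": "then"}},
    {"resourceId": "edge_in", "properties": {"name": "subjected to"}},
]
by_id = index_by_id(shapes)
print(by_id["edge_out"]["properties"])  # {'name': 'then'}
```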


In [24]:
nodes = []
edges = []
for _j in j['childShapes']:
    if 'target' in _j.keys():
        edges.append(_j)
    else:
        nodes.append(_j)

In [31]:
len(nodes)

23

Now that I have all the nodes, I want to check on the seemingly extraneous nested `childShapes`. All of them have the same stencil id: `Variable_category`. I wonder what this is and why it is not treated as a normal node, relationship, or property.

In [37]:
for _n in nodes:
    if len(_n['childShapes']) > 0:
        for _i in _n['childShapes']:
            print(_i['stencil'])
        print()

{'id': 'Variable_category'}

{'id': 'Variable_category'}

{'id': 'Variable_category'}



For this template, at least, no edges contain `childShapes` (the cell below prints nothing). This makes sense, so I'm going to assume it holds in general, but it would be great to confirm!

In [39]:
for _e in edges:
    if len(_e['childShapes']) > 0:
        for _i in _e['childShapes']:
            print(_i['stencil'])
        print()
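The two checks above only look one level down, while the notes describe `childShapes` as recursive. A generator that walks every level would reveal whether the `Variable_category` shapes nest anything further. This is a sketch: `walk_shapes` is a hypothetical helper and the tree is made up (`ExampleNode` and `ExampleEdge` are placeholders). On the real data: `walk_shapes(j['childShapes'])`.

```python
def walk_shapes(shapes, depth=0):
    """Yield (depth, shape) for every shape, descending into nested childShapes."""
    for shape in shapes:
        yield depth, shape
        yield from walk_shapes(shape.get("childShapes", []), depth + 1)


# Hypothetical tree: one node that nests a Variable_category shape.
shapes = [
    {"stencil": {"id": "ExampleNode"},
     "childShapes": [{"stencil": {"id": "Variable_category"}, "childShapes": []}]},
    {"stencil": {"id": "ExampleEdge"}, "childShapes": []},
]
for depth, shape in walk_shapes(shapes):
    # Indent by depth so the nesting is visible
    print("  " * depth + shape["stencil"]["id"])
```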

In [71]:
class Node:
    """Wraps one node entry from j['childShapes']."""
    def __init__(self, node):
        self.uid = node['resourceId']
        self.prop = node['properties']
        self.proptype = node['propertyTypes']
        self.label = node['stencil']['id']
        self.edge_out = node['outgoing']
        self.edge_in = node['incoming']
        self.bbox = node['bounds']
        self.dock = node['dockers']


class Edge(Node):
    """Wraps one edge entry: the same keys as a node, plus `target`."""
    def __init__(self, edge):
        super().__init__(edge)
        self.target = edge['target']['resourceId']

In [72]:
n = Node(nodes[1])

In [73]:
e = Edge(edges[1])

In [74]:
e.target

'oryx_EC08DE2B-AB9A-4D0D-B9C6-4C1325D08C76'

Now that I have all of these, I should be able to start connecting the dots, so to speak.
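One way to start connecting the dots is to turn each edge into a (source, name, target) triple. The sketch below rests on two assumptions drawn from the cells above, not on any EDA documentation: edges are the shapes with a `target` key, and an edge's source is the node whose `outgoing` list references it (as `j['childShapes'][1]` did for the `then` edge). `edge_triples` is a hypothetical helper and the diagram is made up.

```python
def edge_triples(shapes):
    """Return (source_uid, edge_name, target_uid) for every edge.

    Assumes edges are the shapes with a `target` key and that an edge's
    source is the node whose `outgoing` list references it.
    """
    # Map each edge id to the node that lists it as outgoing
    sources = {}
    for shape in shapes:
        if "target" in shape:
            continue
        for ref in shape.get("outgoing", []):
            sources[ref["resourceId"]] = shape["resourceId"]

    triples = []
    for shape in shapes:
        if "target" in shape:
            triples.append((
                sources.get(shape["resourceId"]),  # None if no node points here
                shape["properties"].get("name"),
                shape["target"]["resourceId"],
            ))
    return triples


# Hypothetical two-node, one-edge diagram.
shapes = [
    {"resourceId": "n1", "outgoing": [{"resourceId": "e1"}]},
    {"resourceId": "e1", "properties": {"name": "then"}, "target": {"resourceId": "n2"}},
    {"resourceId": "n2", "outgoing": []},
]
print(edge_triples(shapes))  # [('n1', 'then', 'n2')]
```

A `None` source in the real output would flag an edge this assumption does not explain.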