# Step 16 Combine Knowledge graphs

![](images/method.png)

|**[Overview](#Overview)**|**[Installation](#Installation)**|**[Prior-steps](#Prior-steps)**|**[How-to-use](#How-to-use)**|**[Next-steps](#Next-steps)**|**[Postscript](#Postscript)**|**[Acknowledgements](#Acknowledgments)**|

# Overview

We now have several knowledge graphs (KGs), each representing the business domain or portfolio differently:

- Step 13: a KG based on topic modelling of the whole library

- Step 14: a comparator KG based on best practice, in this case a key paper

- Step 15: a KG based on keywords from the whole library

We do the following:
1. combine the KGs
2. rationalise labels
3. look for patterns across nodes
4. boost patterns that make sense to stakeholders through relabelling
5. find slices through the KG that will make sense to particular stakeholders
6. add a flag for each slice so that the slice can be quickly shown
7. query the graph to output project artefacts for the dominant slice


# Installation

Check that the installation has been completed, as per the [README](https://github.com/lawrencerowland/Data-Model-for-Project-Frameworks/blob/master/Project-frameworks-by-using-NLP-with-Python-libraries/README.md).

# Prior-steps
One or more KGs can feed into this step.
The minimum is one KG, which requires one of the following:
## Step 14 for a KG from an expert or stakeholder discussion, whiteboard session, or paper

*or*

## Steps 6, 7 and 8 for a KG from keywords
First, find keywords.

<img src="images/Keywords-for-whole-library.png" width="45%">

Then select useful keyword relationships

<img src="images/graph-schema-2.png" width="30%">

<img src="images/knowledge-graph-2.png" width="30%"> 

Add back in lower-scoring keywords 

<img src="images/Keyword-graph-3.png" width="60%">


*or*

## Steps 5, 8, 9, 12 and 13 for a KG from topic models
The LDA topic-modelling algorithm is applied to the document library, via a Continuous Bag of Words transformation and a TF-IDF transformation.

# How-to-use

## Combine graphs

Open one of the three graphs in Neo4j.

Then add the other two.

These can be added either by:
- GraphML import
- pasting in the Cypher code

Which method applies depends on how the models were saved.
Examples of both types of saving are in the previous steps.
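
As a rough illustration of what pulling the graphs together means, the sketch below pools three small in-memory graphs. It is not the Neo4j import itself. The dictionary layout, the node names, the relationship types and the helper names `combine_graphs` and `shared_names` are all hypothetical. Each node is tagged with its source graph, so same-named nodes stay separate until the labelling step, and the sketch reports which names occur in more than one source.

```python
from collections import defaultdict

def combine_graphs(graphs):
    """Pool nodes and edges from several graphs, tagging each node with its source."""
    nodes = {}   # (source, name) -> label
    edges = []   # ((source, name), relationship type, (source, name))
    for source, graph in graphs.items():
        for name, label in graph["nodes"].items():
            nodes[(source, name)] = label
        for start, rel_type, end in graph["edges"]:
            edges.append(((source, start), rel_type, (source, end)))
    return nodes, edges

def shared_names(nodes):
    """Names found in more than one source graph: candidates for merging later."""
    sources_by_name = defaultdict(set)
    for source, name in nodes:
        sources_by_name[name].add(source)
    return sorted(name for name, sources in sources_by_name.items() if len(sources) > 1)

# Hypothetical miniature versions of the three KGs (illustrative data only)
graphs = {
    "topics": {"nodes": {"waste": "Topic", "site": "Topic"},
               "edges": [("waste", "RELATES_TO", "site")]},
    "paper": {"nodes": {"waste": "Requirement", "safety case": "Requirement"},
              "edges": [("safety case", "CONSTRAINS", "waste")]},
    "keywords": {"nodes": {"site": "Keyword", "supply chain": "Keyword"},
                 "edges": [("supply chain", "SERVES", "site")]},
}

nodes, edges = combine_graphs(graphs)
print(len(nodes), "nodes and", len(edges), "relationships")
print("In more than one graph:", shared_names(nodes))
```

The names reported as shared are a natural starting list for the duplicate concepts to combine when rationalising the graph.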

***the three graphs pulled in together***

![3-graphs-together-1.png](images/3-graphs-together-1.png)

We can query the model for the higher-level schema that this graph reflects, i.e. the implied relationships between the various node labels.
To do this, run `CALL db.schema()` (in Neo4j 4.0 and later the equivalent procedure is `CALL db.schema.visualization()`).
The result looks like this.

![](images/Combined_graph_schema.png)
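
To make clear what a schema of this kind contains, here is a minimal sketch that derives one from an in-memory graph: the distinct combinations of start label, relationship type and end label. The function `graph_schema`, the node names, the labels and the relationship types are assumptions for illustration. In Neo4j the schema procedure does this work.

```python
def graph_schema(node_labels, edges):
    """Return the distinct (start label, relationship type, end label) triples."""
    return sorted({(node_labels[start], rel_type, node_labels[end])
                   for start, rel_type, end in edges})

# Hypothetical nodes and relationships (illustrative data only)
node_labels = {
    "Clarify waste routes": "Project_service",
    "Characterise waste": "Project_service",
    "Safe storage": "Success_factor",
    "Waste store": "Site",
}
edges = [
    ("Safe storage", "DEPENDS_ON", "Clarify waste routes"),
    ("Safe storage", "DEPENDS_ON", "Characterise waste"),
    ("Clarify waste routes", "APPLIES_TO", "Waste store"),
]

schema = graph_schema(node_labels, edges)
for triple in schema:
    print(triple)
```

Three relationships between four nodes collapse into two schema-level relationships, which is why the schema is easier to discuss with a team than the full graph.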

## Label the graph

Rationalise the graph, as a team where possible:

 - Look for useful clusters
 - combine duplicate concepts
 - relabel concepts (e.g. in this case, topics is no longer a useful label)
 - lose direct relationships where there already is an indirect relationship
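
The last point in the list can be sketched in code. The example below drops a direct relationship whenever the same two nodes are still connected by a longer path. It assumes the relationships contain no cycles. The function `drop_redundant_edges` and the node names are hypothetical. In practice this judgement was made by the team looking at the graph, not by an automatic rule.

```python
def drop_redundant_edges(edges):
    """Remove a direct edge a->c when c is still reachable from a by a longer path.

    Assumes the edges form a directed graph without cycles.
    """
    successors = {}
    for start, end in edges:
        successors.setdefault(start, set()).add(end)

    def reachable(start, target, skip_edge):
        # Depth-first search that is not allowed to use the direct edge itself
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return [edge for edge in edges if not reachable(edge[0], edge[1], skip_edge=edge)]

# Hypothetical relationships (illustrative data only)
edges = [
    ("Regulatory consent", "Safe storage"),
    ("Safe storage", "Clarify waste routes"),
    ("Regulatory consent", "Clarify waste routes"),  # already implied by the two above
]
simplified = drop_redundant_edges(edges)
print(simplified)
```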
 
 
In this case, we have lost the Topics group, whilst retaining most of the keywords. The structure of Success criteria, Project services and Site proved more powerful. We have also retained the key division between two topics, so that our project services have shaken out either into Supply-chain/Management type services, or into more Technically and Operationally focussed project services.
A rough sequence for implementing the Success criteria helped in turn to order the Project services, although some of the dependencies are not strong ones, but were introduced to allow for easy comprehension.

For each label we settle on, we look at the relationships within the group, and at the relationships with each other group, one at a time. Where useful sub-clusters are identified, we either create a new label or add a property to the existing label. For example, the Requirements label is given a type property, which takes one of the following values: attribute or artefact.

One useful method is to pull up only one or two label groups, and to arrange them in a logical order within each group, so that cross-relationships can be seen. For large groups, it can be useful to use all four sides of the screen to arrange different nodes.

### Determine useful views (subgraphs)

Views can be generated for specific purposes and stakeholders. 
 
### OUTPUT 1: Work-breakdown view
A project-management view focusses on the tasks needed. Each node was reviewed and given a boolean property recording whether the node was (or implied) a task. For those which are tasks, another property was added to describe the task. When a Cypher query is run to find only these tasks, the result becomes a work breakdown structure, which shows class relationships and dependencies between tasks.
- The view can be grouped by Success factor.
- Or it can be grouped by task, showing predecessors or successors.

This can be used as a first draft of the Work Breakdown Structure in creating the project plan. 
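
A minimal sketch of this view, assuming hypothetical property names (`is_task`, `task`, `success_factor`) and made-up nodes: it keeps only the nodes flagged as tasks, orders them so that predecessors come first, and groups them by Success factor. It uses `graphlib` from the Python standard library (Python 3.9 or later) in place of the Cypher query run against Neo4j.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes with the boolean task flag and a task description
nodes = {
    "Characterise waste": {"is_task": True, "task": "Survey and sample waste",
                           "success_factor": "Waste routes"},
    "Clarify waste routes": {"is_task": True, "task": "Agree disposal routes",
                             "success_factor": "Waste routes"},
    "Engage supply chain": {"is_task": True, "task": "Run supplier day",
                            "success_factor": "Capable suppliers"},
    "Waste store": {"is_task": False},
}
# Hypothetical dependencies: task -> set of tasks that must come first
depends_on = {
    "Clarify waste routes": {"Characterise waste"},
    "Engage supply chain": {"Clarify waste routes"},
}

def work_breakdown(nodes, depends_on):
    """Group task nodes by success factor, listing predecessors before successors."""
    tasks = {name for name, props in nodes.items() if props.get("is_task")}
    graph = {task: depends_on.get(task, set()) & tasks for task in sorted(tasks)}
    ordered = TopologicalSorter(graph).static_order()
    wbs = {}
    for task in ordered:
        factor = nodes[task]["success_factor"]
        wbs.setdefault(factor, []).append((task, nodes[task]["task"]))
    return wbs

wbs = work_breakdown(nodes, depends_on)
for factor, tasks in wbs.items():
    print(factor)
    for name, description in tasks:
        print("  ", name, "-", description)
```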
 
 **Work-breakdown of tasks, here ordered by Success factors**

![](images/combined_graph_WBS_linear_view.png)

### OUTPUT 2: Document view
In just the same way, we identify which nodes are also associated with a project document. We add a boolean property flagging this, and we also add a node property holding the document description.
The image below shows all the documents. It is the result of running a query for all nodes where the boolean property is 1.
The way the documents are spread out below shows how the user interacts with query results, moving nodes around and looking for project-like patterns.
In this particular case, the user has identified a meaningful pattern by pulling apart the documents into four clusters:
1. high level definitions
2. site specific
3. design and system descriptions
4. stakeholder and controls documents

If desired, new labels can be added to capture meaningful clusters like this. We have not done this, as there are already enough labels to make sense of the project.

This can be used:
- as the basis for a Document Management plan, and for establishing Document Configuration Mgt
- as Outputs attached to related tasks in the project plan
- to contribute to the Stakeholder Mgt plan, in terms of who sees what
- to confirm the natural order of development, in terms of what should be seen and reviewed first

![](images/project_data_items-4way-view.png)

### OUTPUT 3: Project feature view (data model)
This exercise has highlighted a number of features relevant to successful DeCom projects within an ONR environment.

*What* What attributes should be recorded and tracked for each work-package?

*Where* They will be captured in project records. These records may be in:
- a spreadsheet
- a project folder
- a project database
- an Enterprise Project Management (EPM) tool
- a cloud EPM tool

Some of these features will appear in project reports and dashboards.

*Why* These features are key elements of the data model for the project, representing the special characteristics of projects in this business domain.

If unnecessary features are captured, they will not be used to guide the project, and will be a waste of money and a distraction. 

*How* 

1. These are taken from the graph database by running a query asking for every requirement or site-element that is directly related to any of the Project services.

2. This produces Column 1, from which we define the relevant work-package characteristic we should know throughout the project (see Column 2)
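
The query in step 1 can be sketched as follows, assuming hypothetical label names (`Project_service`, `Requirement`, `Site`) and made-up nodes. "Directly related" is read here as one relationship in either direction. The helper `features_for_services` is illustrative and is not the query that was run against the graph database.

```python
def features_for_services(labels, edges,
                          service_label="Project_service",
                          feature_labels=("Requirement", "Site")):
    """Return requirement or site nodes one relationship away from any project service."""
    features = set()
    for start, end in edges:
        # Check the relationship in both directions
        for service, other in ((start, end), (end, start)):
            if labels[service] == service_label and labels[other] in feature_labels:
                features.add(other)
    return sorted(features)

# Hypothetical graph content (illustrative data only)
labels = {
    "Clarify waste routes": "Project_service",
    "Engage supply chain": "Project_service",
    "Waste acceptance criteria": "Requirement",
    "Waste store": "Site",
    "Safe storage": "Success_factor",
}
edges = [
    ("Waste acceptance criteria", "Clarify waste routes"),
    ("Clarify waste routes", "Waste store"),
    ("Safe storage", "Clarify waste routes"),
    ("Safe storage", "Waste acceptance criteria"),
]

column_1 = features_for_services(labels, edges)
print(column_1)
```

Each entry in the result is a candidate for Column 1. The matching work-package characteristic still has to be written by hand.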

*With what else?* These features are often added to features from standard project data models, i.e. scope, schedule, cost etc. These will normally come from the project framework that has been chosen and the Enterprise Project Management system which is being used.

*How to use* 
Some or all of the below:
1. Refine the Project Data model, and show and agree it as a schema.
2. Confirm how Project Data will be managed in accordance with this model.
3. Set up or update the database which will be used for the project, so that it has this schema:
    - this is likely to be a relational database
    - it might be an EPM or cloud EPM system that can be tailored to switch on the right data fields
    - it might just be a spreadsheet
4. Update which fields will be controlled and/or reported on, as a subset of the data model properties.

### OUTPUT 4: Strategy view
 
 The strategy view is determined by the following:
 - the documents and libraries used as input
 - the experts consulted to craft any other input knowledge graphs
 - the stakeholders with whom one works through this process.
 
 The strategy view can be created with different hierarchies to reflect the perspective of the stakeholder consuming that strategy view. For example, we have chosen to show the strategy in the following hierarchy:
 1. requirements
 2. Success-factors
 3. project-services
 4. site
 
 Other structures could have, for example:
 1. site elements at the highest level (waste, facility etc.)
 2. success factors at the highest level

 Each of these would be more helpful for a different audience.
 
The approach above does the thinking via the clustering and labelling process. An alternative does not start from one label or another. Instead, it starts from the node that reflects the most important concept to a particular user, and uses that as an organising concept. From this node, one can run a query that asks for all nodes within a certain number of hops. All nodes reached on the first hop can then form the highest level of a Strategy Table of Contents (TOC).
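
The hop-based idea can be sketched as a breadth-first search. The adjacency list below is hypothetical, apart from the node name 'Clarify Waste routes', and the function `nodes_within_hops` is illustrative. In Neo4j the same result would come from a query on the graph itself.

```python
from collections import deque

def nodes_within_hops(neighbours, start, max_hops):
    """Breadth-first search returning {node: hop count} for nodes within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops

# Hypothetical neighbours, ignoring relationship direction (illustrative data only)
neighbours = {
    "Clarify Waste routes": ["Characterise waste", "Waste store"],
    "Characterise waste": ["Clarify Waste routes", "Sampling plan"],
    "Waste store": ["Clarify Waste routes"],
    "Sampling plan": ["Characterise waste", "Laboratory"],
    "Laboratory": ["Sampling plan"],
}

hops = nodes_within_hops(neighbours, "Clarify Waste routes", max_hops=2)
top_level = sorted(node for node, distance in hops.items() if distance == 1)
print("Top level of the TOC:", top_level)
print("Second level:", sorted(node for node, distance in hops.items() if distance == 2))
```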
 
***Example showing a TOC based on 2 hops from node 'Clarify Waste routes'***

![](images/2-hop-view-partial-for-waste-routes.png)

***Using the Strategy TOC in the project***

The TOC can be exported as:
- a tree graph (using rawgraphs.io or similar)
- a CSV file

Accordingly it can be used as one or all of the following:
1. a TOC in a text or Word document
2. a diagram showing the hierarchy
3. a folder structure, by using text2folders or similar
4. a table with links to each place /module in the hierarchy
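
As one possible route to the CSV export, the sketch below flattens a TOC held as a nested dictionary into rows of level, title and path. The tree content, the column names and the helpers `toc_paths` and `toc_to_csv` are assumptions for illustration.

```python
import csv
import io

def toc_paths(tree, prefix=()):
    """Yield one tuple per TOC entry, giving the path from the top level down."""
    for title, children in tree.items():
        path = prefix + (title,)
        yield path
        yield from toc_paths(children, path)

def toc_to_csv(tree):
    """Return the TOC as CSV text with level, title and slash-separated path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["level", "title", "path"])
    for path in toc_paths(tree):
        writer.writerow([len(path), path[-1], "/".join(path)])
    return buffer.getvalue()

# Hypothetical TOC (illustrative data only)
toc = {
    "Clarify Waste routes": {
        "Characterise waste": {"Sampling plan": {}},
        "Waste store": {},
    }
}

csv_text = toc_to_csv(toc)
print(csv_text)
```

The `path` column doubles as a list of folder paths, should a folder structure be wanted instead of a document.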

***A Strength of the knowledge graph for Strategy views***

All views are retained in one overall knowledge graph. 
Let us say that we create:
1. one Strategy TOC for finance with Financial reqts at top level
2. another Strategy TOC for Ops Directors starting from site
3. A third Strategy TOC starting from Success Factors for the project team

Each of these users can be given their own TOC, and only ever needs to see the project in this way.
Meanwhile, we retain the full knowledge graph in the graph database (we maintain it in Neo4j), and use it to populate project and work-package details in more depth.

As we maintain it, we can at any time print out an updated Strategy view or TOC that reflects an update of the original view requested by each user. This can be done by adding a boolean property for each view. Each node and relationship shows 1 or 0 depending on whether it is represented in that particular view.
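
A minimal sketch of the flag-per-view idea, using Python `True`/`False` in place of 1/0. The property names (`in_finance_view`, `in_ops_view`), the node names and the helpers `flag_view` and `extract_view` are hypothetical.

```python
def flag_view(graph, view, node_names, edge_keys):
    """Set a boolean property on every node and relationship for one view."""
    for name, props in graph["nodes"].items():
        props[view] = name in node_names
    for key, props in graph["edges"].items():
        props[view] = key in edge_keys

def extract_view(graph, view):
    """Return the nodes and relationships flagged for the given view."""
    nodes = sorted(name for name, props in graph["nodes"].items() if props.get(view))
    edges = sorted(key for key, props in graph["edges"].items() if props.get(view))
    return nodes, edges

# Hypothetical full knowledge graph (illustrative data only)
graph = {
    "nodes": {"Funding approval": {}, "Clarify waste routes": {}, "Waste store": {}},
    "edges": {
        ("Funding approval", "Clarify waste routes"): {},
        ("Clarify waste routes", "Waste store"): {},
    },
}

flag_view(graph, "in_finance_view",
          {"Funding approval", "Clarify waste routes"},
          {("Funding approval", "Clarify waste routes")})
flag_view(graph, "in_ops_view",
          {"Clarify waste routes", "Waste store"},
          {("Clarify waste routes", "Waste store")})

finance_nodes, finance_edges = extract_view(graph, "in_finance_view")
print(finance_nodes, finance_edges)
```

Because the flags sit on the full graph, each view can be regenerated after the graph is updated without rebuilding it by hand.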

### Getting to the strategy views
In the course of arriving at suitable views, it can be helpful to pull the nodes around into different clusters or levels, to see how well this appeals to the user, and to see how many of the relationships effectively disappear once nodes are simplified onto the same level. This is in the spirit of Herbert Simon's paper "The Architecture of Complexity".

For example, this was an interim step.
It is too complex for a stakeholder as it is, but is a useful waymark.

![](images/strategic-view.png)

 
 To make the strategy view easy to generate:
 - A series of `OPTIONAL MATCH` queries were run.
 - These generate some null results. The null results are not a problem, but we used them as a prompt to xxx
 - We viewed the schema with `CALL apoc.meta.graph`.
 - We simplified the instances which were unnecessarily complicating the schema, using `apoc.refactor.invert(r)` (i.e. where the existing direction was not giving helpful information).
 - One or two nodes were not linked to many other nodes, and so did not appear in a simple `OPTIONAL MATCH` which slices through the graph one way. We added appropriate relationships so that the `OPTIONAL MATCH` picked them up.
 - Where there were nulls, in this case where some Project services did not link to a Success factor, we added an "Other success factor" node, for ease of reading.
 
```cypher
MATCH (p:project_task)
OPTIONAL MATCH (k:Key_factor)-->(p:project_task)
OPTIONAL MATCH (k:Key_factor)-->(p:project_task)-->(s:site)
OPTIONAL MATCH (r:requirement)-->(k:Key_factor)-->(p:project_task)
OPTIONAL MATCH (r:requirement)-->(k:Key_factor)-->(p:project_task)-->(s:site)
RETURN r.name,k.name,p.name,s.name ORDER BY r.name, k.name
```
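
The effect of the optional matches and of the "Other success factor" node can be illustrated outside Neo4j. The sketch below is not a line-by-line translation of the query. It builds left-join style rows in which a missing requirement or site is kept as `None`, after first linking any task without a success factor to a placeholder. All data and both function names are hypothetical.

```python
def add_placeholder_factor(tasks, factor_to_task, placeholder="Other success factor"):
    """Link every task that has no success factor to a placeholder factor."""
    linked = {task for _, task in factor_to_task}
    return factor_to_task + [(placeholder, task) for task in tasks if task not in linked]

def strategy_rows(tasks, factor_to_task, requirement_to_factor, task_to_site):
    """Build (requirement, factor, task, site) rows, keeping None where a link is missing."""
    rows = []
    for task in tasks:
        factors = [f for f, t in factor_to_task if t == task] or [None]
        sites = [s for t, s in task_to_site if t == task] or [None]
        for factor in factors:
            requirements = [r for r, f in requirement_to_factor if f == factor] or [None]
            for requirement in requirements:
                for site in sites:
                    rows.append((requirement, factor, task, site))
    # Order by requirement then factor, with missing values last
    return sorted(rows, key=lambda row: tuple((v is None, v or "") for v in row[:2]))

# Hypothetical data (illustrative only)
tasks = ["Characterise waste", "Engage supply chain"]
factor_to_task = [("Waste routes", "Characterise waste")]
requirement_to_factor = [("Regulatory consent", "Waste routes")]
task_to_site = [("Characterise waste", "Waste store")]

links = add_placeholder_factor(tasks, factor_to_task)
rows = strategy_rows(tasks, links, requirement_to_factor, task_to_site)
for row in rows:
    print(row)
```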

# Next-steps
Implement the Outputs in project controls.