# A tutorial on understanding *and building* knowledge graphs/networks

##### and an uncommonly illustrative example

### <span style="background-color: #FFDD4A">The Convergence Hub for the Exploration of Space Science (CHESS)</span>
###### <span style="background-color: #FFDD4A">towards a</span>
### <span style="background-color: #FFDD4A">SPACE WEATHER OPEN KNOWLEDGE NETWORK!</span>
      

To Do:
1. Enrich the CHESS example with the lessons I learned from Phase I interactions to create an ontology
2. Complete the simple steps with links to the technologies
3. Make it uncommonly useful

The purpose of this script is to be an interactive guide for *linking* -- both linking data through [knowledge networks](https://www.nitrd.gov/news/Open-Knowledge-Network-Workshop-Report-2018.aspx) and linking communities (through methodology transfer). This Python Notebook serves as an illustration of a new mode of linking through technologies. The objective is to **foster creation of the collaborative infrastructure required to make progress on our grand space science challenges**. 

This is accomplished by examining a powerful use case: space weather impacts on the electric power grid, or geomagnetically induced currents (GIC). We describe progress and future directions of the National Science Foundation [Convergence Accelerator](https://www.nsf.gov/od/oia/convergence-accelerator/index.jsp) project: [The Convergence Hub for the Exploration of Space Science (CHESS)](https://www.nsf.gov/od/oia/convergence-accelerator/Award%20Listings/Track%20A%20Abstracts/A-7152-McGranaghan-ASTRA.pdf).  

This notebook has been written both for space weather domain experts and for those unfamiliar with space weather terminology and data. To achieve this, it provides rich descriptions and metadata so that anyone can use it, and so that it facilitates interaction between scientists and data scientists. Indeed, it is a guide for anyone who sees the value in structuring data in a more intelligent and discoverable way. 

We encourage all contributions to this work.



### Rules of usage


1. Document all development
    - All code and progress in this script must be *usable* to anyone who picks it up
2. Crystallize knowledge after each development 
    - Attempt to define lessons learned and to share those in a way that will allow others to pick up where you left off
3. *Extend this list as new rules emerge*

### The 'What' and 'How' 

We begin with an exploration of just how powerful open knowledge networks (OKNs) can be and how ubiquitous the need for them is across society. 


OKNs provide the **ability to link disparate information by traversing links across the network** and by deducing linkages among entities. 
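As a minimal illustration of traversal and deduction, the Python sketch below stores a few hypothetical links between entities from different communities and uses breadth-first search to find a chain connecting two of them. The entity names are invented for illustration, and breadth-first search is a simple stand-in for the richer reasoning a real OKN would support.

```python
from collections import deque

# Hypothetical links between entities from different communities (illustrative only).
links = {
    "coronal mass ejection": ["geomagnetic storm"],
    "geomagnetic storm": ["geomagnetically induced current"],
    "geomagnetically induced current": ["transformer heating"],
    "transformer heating": [],
}

def find_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of links from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = find_path(links, "coronal mass ejection", "transformer heating")
# No single stored link joins the two ends; the linkage is deduced by traversal.
print(" -> ".join(path))
```

The two end points are never linked directly; the connection only appears when the network is walked, which is the kind of linkage an OKN is meant to surface.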

With an OKN, researchers could develop: 
- More robust and efficient approaches to answering questions
- More expressive frameworks to capture knowledge
- More natural interfaces to access that knowledge

[OPEN KNOWLEDGE NETWORK: SUMMARY OF THE BIG DATA INTERAGENCY WORKING GROUP (IWG) WORKSHOP OCTOBER 4–5, 2017](https://www.nitrd.gov/nitrdgroups/index.php?title=Open_Knowledge_Network#Presentations)

Below we first introduce and motivate the concept of an OKN with specific examples. Then, we detail the steps to create an OKN. 

#### Linking Data and The OKN

![Metaphor of trees](figures/Trees-and-classification.png)

Most of our knowledge is structured in a tree format.



###### Example: NASA SMD Structuring of Knowledge


<img src="figures/NASA_SMD_code600_tree.png" alt="drawing" width="500"/>

... and further


<img src="figures/NASA_SMD_code670_tree.png" alt="drawing" width="500"/>


###### Evolution of Knowledge Structuring

![Weaver](figures/Weaver_PeriodsOfComplexity.png)

1-2 items connected --> Realization that more than a few elements exist, but disconnected/disorganized --> Understanding that those many elements are complexly connected

(explore further: [Manuel Lima presentation on the power of networks](https://www.brainpickings.org/2012/01/16/manuel-lima-the-power-of-networks/))

Linking and networks are the responses to an increasingly complex world.


No longer can Heliophysics be kept separate from, e.g., Earth Science.

![NASA Science Mission Directorates](figures/NASA_SMDs.png)
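The difference between the two structures can be sketched in a few lines of Python. The categories below are hypothetical and simplified (they are not NASA's actual organization): in the tree, a topic such as geomagnetically induced currents can sit under only one parent, while in the network it can link to every discipline it touches.

```python
# A tree: every node has exactly one parent (hypothetical, simplified hierarchy).
tree_parent = {
    "Heliophysics": "Science",
    "Earth Science": "Science",
    "Geomagnetically induced currents": "Heliophysics",  # only one parent slot available
}

# A network: a node may link to any number of other nodes.
network_links = {
    "Geomagnetically induced currents": {"Heliophysics", "Earth Science", "Power engineering"},
}

def tree_ancestors(node, parent):
    """Walk up the single-parent chain of a tree, nearest ancestor first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

print(tree_ancestors("Geomagnetically induced currents", tree_parent))  # ['Heliophysics', 'Science']
print(sorted(network_links["Geomagnetically induced currents"]))
```

In the tree, the connection to Earth Science is invisible from the topic itself; in the network it is a direct link.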



###### Wikipedia

Wikipedia is a powerful example.

[Wikiverse - 3D visualization of Wikipedia](https://www.wikiverse.io/)

![Wikiverse still frame](figures/WikiVerse.png)


###### What is a knowledge network? 

A Knowledge Network is a graph-based way to structure data that allows reasoning about entities, attributes, and relationships ([Kejriwal, 2019](https://link.springer.com/book/10.1007%2F978-3-030-12375-8)).


Every time you use Google search, you are accessing one of the world's largest and most successful knowledge networks.

![Google KN demonstration](figures/CHESS_KNintro.png)


Imagine what Earth and space science would look like if we could discover information this way.


###### What must data look like to create structures like this? 

Formatting the data:
![triple representation](figures/triple_representation.png)
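In code, the triple representation is just a list of (subject, predicate, object) tuples. This sketch uses hypothetical entity and property names and plain Python rather than an RDF library, to show that a lookup in either direction is simply a filter over triples.

```python
# Each fact is one (subject, predicate, object) triple; all names are hypothetical.
triples = [
    ("CME_example", "type", "CoronalMassEjection"),
    ("CME_example", "causes", "Storm_example"),
    ("Storm_example", "type", "GeomagneticStorm"),
    ("Storm_example", "observedBy", "Magnetometer_example"),
]

def objects(subject, predicate, facts):
    """Follow a link forwards: which objects does `subject` point to through `predicate`?"""
    return [o for s, p, o in facts if s == subject and p == predicate]

def subjects(predicate, obj, facts):
    """Follow a link backwards: which subjects point to `obj` through `predicate`?"""
    return [s for s, p, o in facts if p == predicate and o == obj]

print(objects("CME_example", "causes", triples))               # ['Storm_example']
print(subjects("observedBy", "Magnetometer_example", triples))  # ['Storm_example']
```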

How can Wikipedia's content be structured this way? Through a knowledge network: the [DBpedia](https://wiki.dbpedia.org/about) project extracts it into one.



#### Practical steps to building a knowledge network: 


We have provided a clarion call for OKNs, yet the steps to create one are not clear. 

Below we lay out those steps and, importantly, the technologies that are needed and available now to achieve each step. 

The key challenge is to determine how to semantically structure information so that it is interoperable (simply put: using the same terminology and mapping between terminologies).  

###### 1. Define a use case and determine the *notion* that you want to model with the knowledge network (e.g., 'solar event to power grid impact')

Tools for step one: 
- The ['user' interview](https://www.designkit.org/methods/2): determine what is most important and distill what purpose the knowledge network will serve as well as *who* it will serve


###### 2. Understand the *competency questions* (What questions must the network be able to answer? What questions will users ask of the network?)

Tools for step two: 
- Competency questions design and iteration: [Competency Questions (CQs)](https://medium.com/@tishchungoora/ontology-competency-questions-3d213eb08d33) are natural language questions outlining and constraining the scope of knowledge represented by an ontology
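One way to make competency questions testable is to write each one as a query over the graph and check that it returns answers. The sketch below is a toy, assuming hypothetical triples and property names; it imitates a SPARQL basic graph pattern, where terms starting with `?` are variables joined across patterns. It is not a SPARQL engine.

```python
# Toy triples (hypothetical names) standing in for a knowledge graph.
graph = [
    ("gic_obs_1", "observedAt", "time_0"),
    ("gic_obs_1", "measuredAt", "node_A"),
    ("kp_1", "observedAt", "time_0"),
    ("kp_1", "type", "KpIndex"),
]

def match(pattern, facts):
    """Match one triple pattern; terms starting with '?' are variables."""
    results = []
    for fact in facts:
        binding = {}
        for term, value in zip(pattern, fact):
            if term.startswith("?"):
                if binding.get(term, value) != value:  # same variable twice must agree
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results

def query(patterns, facts):
    """Join several patterns on shared variables (a tiny basic graph pattern)."""
    solutions = [{}]
    for pattern in patterns:
        new_solutions = []
        for sol in solutions:
            bound = tuple(sol.get(t, t) for t in pattern)  # substitute known variables
            for b in match(bound, facts):
                new_solutions.append({**sol, **b})
        solutions = new_solutions
    return solutions

# CQ (paraphrased): which geomagnetic index was recorded at the time of a given GIC observation?
cq = [("gic_obs_1", "observedAt", "?t"), ("?index", "observedAt", "?t"), ("?index", "type", "KpIndex")]
answers = query(cq, graph)
print(answers)  # [{'?t': 'time_0', '?index': 'kp_1'}]
```

A CQ that returns no solutions signals either missing data or a gap in the ontology, which feeds directly into the refinement step.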


###### 3. Diagram the ENTITIES and RELATIONSHIPS


Tools for step three: 
- An online whiteboarding tool, like [Miro](https://miro.com/) or [Visio](https://en.wikipedia.org/wiki/Microsoft_Visio), to map out the entities and relations. The full development team AND users must be involved in this iterative process.
- A digital ontology editor such as [Protégé](https://protege.stanford.edu/), a free, open-source ontology editor and framework for building intelligent systems, to draw up the finalized whiteboarded ontological patterns. This will help you move your whiteboarded model to a machine-readable format, such as...
- [Resource Description Framework (RDF)](https://en.wikipedia.org/wiki/Resource_Description_Framework) - a framework for modeling information, including metadata, as subject-predicate-object triples
- [Web Ontology Language (OWL)](https://www.w3.org/OWL/) - a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things
- Once the ontological model is in a machine-readable format, the next step is to add data to the knowledge graph, e.g., with [GraphDB](https://graphdb.ontotext.com/), a semantic graph database (RDF triplestore) that is very useful for visualization and exploration
- Next, use a query language such as [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) to search over the triplestore
- Finally, document the ontology with, e.g., [WIzard for DOCumenting Ontologies (WIDOCO)](https://github.com/dgarijo/Widoco) - tool to publish and create an enriched and customized documentation of your ontology
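To see what "machine-readable" means concretely, the sketch below writes two statements in N-Triples, one of the W3C's plain-text RDF serializations. In practice a tool such as Protégé or an RDF library would produce this output; the hand-written serializer here only handles IRIs and plain string literals, and the `example.org` namespace and resource names are hypothetical.

```python
EX = "http://example.org/chess#"  # hypothetical namespace, not the CHESS ontology's IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(text):
    """Quote a plain string literal, escaping backslash, quote, newline and carriage return."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples; an object is an IRI unless given as ('lit', text)."""
    lines = []
    for s, p, o in triples:
        obj = literal(o[1]) if isinstance(o, tuple) else iri(o)
        lines.append(f"{iri(s)} {iri(p)} {obj} .")
    return "\n".join(lines)

doc = to_ntriples([
    (EX + "Magnetometer1", RDF_TYPE, EX + "Magnetometer"),
    (EX + "Magnetometer1", RDFS_LABEL, ("lit", "Example magnetometer")),
])
print(doc)
```

A file in this form is the kind of input a triplestore can load and that SPARQL queries then run against.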



###### 4. Enrich with ontology/terms used in the domain


Tools for step four:
- Ontology repositories such as the Earth Science Information Partners' [Community Ontology Repository](http://cor.esipfed.org/) (find out what has already been done and attempt to harmonize/ontology match)
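A common first pass at ontology matching is to compare term labels lexically and hand the candidate pairs to domain experts. The sketch below assumes two hypothetical term lists and uses Python's standard `difflib.SequenceMatcher` with an arbitrary 0.8 similarity threshold; real harmonization also weighs definitions, structure, and expert judgment.

```python
import difflib
import re

# Hypothetical term lists; real ones would come from the ontologies being aligned.
chess_terms = ["MagnetometerStation", "GeomagneticIndex", "SolarWindSpeed"]
external_terms = ["Magnetometer Station", "Geomagnetic Index", "Solar Wind Velocity", "Sensor"]

def normalize(label):
    """Split CamelCase, lowercase, and drop punctuation so labels compare fairly."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return " ".join(re.findall(r"[a-z0-9]+", spaced.lower()))

def candidate_matches(source, target, threshold=0.8):
    """Propose (source, target, score) pairs whose normalized labels are similar."""
    pairs = []
    for s in source:
        for t in target:
            score = difflib.SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score >= threshold:
                pairs.append((s, t, round(score, 2)))
    return pairs

for pair in candidate_matches(chess_terms, external_terms):
    print(pair)
```

Here `SolarWindSpeed` and `Solar Wind Velocity` fall below the threshold even though they may describe the same quantity, a reminder that lexical similarity only proposes candidates.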


###### 5. Refine

*This step is never complete.* Like all meaningful things, the process is circular, feeding back on itself and adapting. 

### An example: [The Convergence Hub for the Exploration of Space Science (CHESS) Project](https://www.nsf.gov/od/oia/convergence-accelerator/Award%20Listings/Track%20A%20Abstracts/A-7152-McGranaghan-ASTRA.pdf)



#### Why an OKN?

Three barriers hold back a space weather understanding commensurate with society's needs:
1. The lack of a cohesive community, owing to the wide variety of subject matter experts required;
2. The lack of effective data sharing, coordination, and analysis (e.g., data science) to leverage existing resources and knowledge efficiently; and
3. The diversity of physically dominant processes in each section of the space weather environment, making it difficult to relate various models and observations.


We will address the root cause of these barriers, namely that large volumes of data can often be found in communities that are disconnected from each other, yet study the effects of essentially the same phenomena. We will focus on one of the most severe areas threatened and impacted by space weather - the electric power grid.


We will create a cohesive, inclusive, cross-cutting, digitally-empowered community for a flourishing electric power grid by developing next-generation data technologies that can be used by each layer in the **‘Geomagnetic Disturbance (GMD) Information Flow Pipeline’** for space weather impact on the power grid.

![GMD information flow](figures/CHESS_GMD_InformationFlow_v2.png)

#### The Prototype solution

##### **<span style="background-color: #FC3D21">Step One</span>**

An **[Open Knowledge Network (OKN)](https://www.nitrd.gov/news/Open-Knowledge-Network-Workshop-Report-2018.aspx)** that semantically links concepts and data that span the relevant disciplines of space weather and the areas that are impacted. 

Our OKN will provide resources to each component in the GMD Information Flow Pipeline, allowing the relevant data, previously dispersed and disconnected, to be readily searched and used (we define **usable** as accessible, labeled, structured, and organized).

![the technology](figures/CHESS_Dashboard_MoneyChart.png)

##### **<span style="background-color: #FC3D21">Step Two</span>**

What are competency questions? Competency Questions (CQs) are natural language questions outlining and constraining the scope of knowledge represented by an ontology.


A first cut at competency questions for the CHESS knowledge graph (*and the realization of each query*):
1. What information is available for me to look at (and what does it mean)?
    - *Table with links to available data and model sources for solar wind, geomagnetic activity indices, and nearby ground-based magnetometer information for the requesting entity (e.g., utility)*
2. What are the relevant geomagnetic conditions on the Earth for a given time?
    - *Plot (time series centered at requested timestamp) showing the global and regional geomagnetic indices, e.g., the Kp index and local time-dependent SuperMAG indices*
    - *Plot (time series centered at requested timestamp) showing local magnetic field conditions, e.g., magnetometer readings from the nearest three magnetometers or from magnetometers covering the utility’s footprint*
3. What were the solar wind conditions at the time of a given GIC observation? 
    - *Plot (time series centered at requested timestamp) showing the solar wind variables*
4. What were the geomagnetic activity indicator levels at the time of a given GIC observation? 
    - *Plot (time series of geomagnetic indices and corresponding time series of GIC data centered at requested timestamp with time of GIC observation identified)*
5. What is the attribution of the GIC observations?
    - *Plot (time series of GIC levels up to current time) with shaded regions showing results of a classification algorithm that attributes the data point to: 1) space weather; 2) utility action (e.g., element taken offline); 3) unknown; 4) other (e.g., animal)*
6. Is this GIC node at risk of exceeding a threshold GIC level in the next X time?
    - *Plot (time series of GIC levels up to current time and predicted levels for next Y times) with specified threshold identified* 
7. (future) What is the risk of space weather impacts to me in the next 30 minutes, 1 hour, 12 hours, etc., and can I set an alert?
    - *Plot (time series centered at requested timestamp) showing the GIC prediction with markers for the requested prediction times and an indication of threat level based on specified threshold (low, medium, high)* 
8. (future) Do I need to shut down a vulnerable transformer? What impact will that cascading current have on the rest of my grid?
    - *Plot (time series centered at requested timestamp) showing the transformer power flow calculations and an indication of threat level based on specified threshold (low, medium, high)*
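Several of these questions share two operations: selecting a time series window centered on a requested timestamp, and comparing GIC values against a user-specified threshold. The sketch below shows both on synthetic, made-up values; the function names, the 1-minute cadence, and the 50% cut-off separating "low" from "medium" are illustrative assumptions, not CHESS definitions.

```python
from datetime import datetime, timedelta

def window(series, center, half_width):
    """Return the (time, value) samples within +/- half_width of `center` (CQ 2 style)."""
    return [(t, v) for t, v in series if abs(t - center) <= half_width]

def threat_level(value, threshold):
    """Label a GIC value low/medium/high relative to a threshold (CQ 7 style).

    Treating anything above half the threshold as 'medium' is an arbitrary illustrative choice.
    """
    if value > threshold:
        return "high"
    if value > 0.5 * threshold:
        return "medium"
    return "low"

def at_risk(predicted, threshold):
    """CQ 6 style check: does any predicted (time, value) sample exceed the threshold?"""
    return any(v > threshold for _, v in predicted)

# Synthetic, made-up GIC samples (amperes) at a 1-minute cadence; not real data.
t0 = datetime(2000, 1, 1, 12, 0)
series = [(t0 + timedelta(minutes=m), v) for m, v in enumerate([2.0, 3.5, 9.0, 14.0, 6.0])]
# Synthetic stand-in for a forecast of the next two minutes.
forecast = [(t0 + timedelta(minutes=5 + m), v) for m, v in enumerate([8.0, 11.5])]

nearby = window(series, t0 + timedelta(minutes=2), timedelta(minutes=1))
print([v for _, v in nearby])                      # [3.5, 9.0, 14.0]
print([threat_level(v, 10.0) for _, v in nearby])  # ['low', 'medium', 'high']
print(at_risk(forecast, 10.0))                     # True
```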

##### **<span style="background-color: #FC3D21">Step Three</span>**

We first spent six months, beginning with an in-person, two-day intensive workshop, brainstorming, identifying, and refining the ENTITIES and RELATIONSHIPS with ontological engineers and semantic technology experts. We designed a flow for how these interactions should be run:
![]()

These interactions between domain experts and ontology/data science engineers are inextricable from the process of producing a knowledge graph (KG), so we detail them here to guide your own efforts...

<span style="background-color: #FC3D21">Write about interactions here</span>


A critical point is that these interactions must be regular and sustained. 

The result was an ontology, which we detail below. 

**We designed the ontology using a ['modular' approach](https://ceur-ws.org/Vol-2459/paper4.pdf).**

Read the [full paper on the development of the CHESS ontology](https://arxiv.org/abs/2009.12285).


1. Data Transformation Pattern: allows for the description of data-driven workflows
2. Simulation Activity Module: instance of the Data Transformation Pattern for algorithms and simulations
3. Full CHESS ontology


**Core concept: Data Transformation Pattern**

![data transformation pattern](figures/CHESS_ontology1.png)

- The data participate in a transformation via roles
- The data class does not hold the actual data, but points to its location (e.g., a Uniform Resource Identifier, URI)

<span style="background-color: #FFFF00">Yellow boxes</span> denote classes. <span style="background-color: #427AA1">Blue boxes</span> with dashed outline denote external patterns or modules. <span style="background-color: #B66BA3">Purple boxes</span> denote controlled vocabularies. Solid arrows denote object or data properties, while open arrows denote the subclass relationship.
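The two bullet points above can be approximated in Python with a pair of dataclasses: a transformation records which data play which role, and each data item is only a reference to a location. This is a hypothetical, object-oriented sketch of the pattern's shape; the class names, role names, and `example.org` URIs are assumptions and do not reproduce the CHESS ontology's actual terms.

```python
from dataclasses import dataclass, field

@dataclass
class DataReference:
    """Stands in for a dataset by pointing to where it lives (a URI) rather than holding it."""
    uri: str

@dataclass
class DataTransformation:
    """A transformation in which data participate through named roles such as 'input' or 'output'."""
    name: str
    participants: dict = field(default_factory=dict)  # role name -> list of DataReference

    def add(self, role: str, data: DataReference) -> None:
        self.participants.setdefault(role, []).append(data)

    def uris(self, role: str) -> list:
        return [d.uri for d in self.participants.get(role, [])]

# Hypothetical activity and locations, for illustration only.
step = DataTransformation("compute regional index")
step.add("input", DataReference("https://example.org/data/magnetometer-a.csv"))
step.add("input", DataReference("https://example.org/data/magnetometer-b.csv"))
step.add("output", DataReference("https://example.org/data/regional-index.csv"))

print(step.uris("input"))
```

Keeping only references means the knowledge graph describes workflows without copying the underlying datasets.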



**Specializing the pattern: Simulation Activity Module**

![simulation activity module](figures/CHESS_ontology2.png)

- Created from the Data Transformation Pattern, specialized for activities that implement an algorithm
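Specialization pays off through inference: anything typed as a Simulation Activity can also be treated as a Data Transformation without stating it twice. The sketch below applies two rules equivalent to the RDFS entailment rules rdfs9 and rdfs11 (type propagation through `subClassOf`, and transitivity of `subClassOf`) to a few triples. The `EnsembleSimulation` class and the `run_1` instance are hypothetical, and the plain-string names are not the ontology's IRIs.

```python
SUBCLASS = "subClassOf"
TYPE = "type"

# Hypothetical schema and instance triples (plain strings, not the ontology's IRIs).
ontology_triples = {
    ("SimulationActivity", SUBCLASS, "DataTransformation"),
    ("EnsembleSimulation", SUBCLASS, "SimulationActivity"),  # hypothetical subclass
    ("run_1", TYPE, "EnsembleSimulation"),                  # hypothetical instance
}

def infer(facts):
    """Apply two RDFS-style rules until no new triples appear (a fixed point):
    (a type b) + (b subClassOf c)       => (a type c)        [like rdfs9]
    (a subClassOf b) + (b subClassOf c) => (a subClassOf c)  [like rdfs11]
    """
    facts = set(facts)
    while True:
        derived = {
            (a, p, c)
            for (a, p, b) in facts if p in (TYPE, SUBCLASS)
            for (b2, p2, c) in facts if p2 == SUBCLASS and b2 == b
        }
        if derived <= facts:
            return facts
        facts |= derived

inferred = infer(ontology_triples)
print(("run_1", TYPE, "DataTransformation") in inferred)  # True
```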

**Assembling the pieces for Space Weather: the full modular ontology**

![full space weather ontology](figures/CHESS_ontology3.png)

- The assembled ontology ties together Activities, such as simulations or interpretations; Agents, such as power grid operators; SolarEvents, such as coronal mass ejections; and Responses, such as simulate, monitor, or update
- New activities are added to the Simulation Activity: Monitor and Interpretation

##### **<span style="background-color: #FC3D21">Step Four</span>**

We are working to harmonize the CHESS ENTITIES and RELATIONSHIPS with those used by metadata standards in the Heliophysics and sensor/observation communities. 

There are two major standards/ontologies to link to: 
1. [The Space Physics Archive Search and Extract (SPASE) Data Model](https://spase-group.org/)
2. [The Semantic Sensor Network Ontology](https://www.w3.org/TR/vocab-ssn/)
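Harmonization can be sketched as an alignment table from local terms to standard ones, used to rewrite records and to flag terms that still lack a mapping. The SOSA namespace and the terms `sosa:Observation`, `sosa:madeBySensor`, `sosa:hasSimpleResult`, and `sosa:resultTime` come from the W3C SSN/SOSA recommendation; the local field names, the record, and the alignment itself are hypothetical, and SPASE is not shown. For brevity the timestamp is kept as a plain string rather than a typed `xsd:dateTime` literal.

```python
SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical local-term -> SOSA alignment (a judgment the domain experts must confirm).
alignment = {
    "MagnetometerReading": SOSA + "Observation",
    "recordedBy": SOSA + "madeBySensor",
    "value": SOSA + "hasSimpleResult",
    "timestamp": SOSA + "resultTime",
}

def to_standard(subject, record, mapping):
    """Rewrite a local record as triples using standard terms; return (triples, unmapped terms)."""
    triples, unmapped = [], []
    local_type = record.get("type")
    if local_type in mapping:
        triples.append((subject, RDF_TYPE, mapping[local_type]))
    elif local_type is not None:
        unmapped.append("type:" + local_type)
    for key, value in record.items():
        if key == "type":
            continue
        if key in mapping:
            triples.append((subject, mapping[key], value))
        else:
            unmapped.append(key)
    return triples, unmapped

record = {  # synthetic example record
    "type": "MagnetometerReading",
    "recordedBy": "station_X",
    "value": 5.2,
    "timestamp": "2000-01-01T12:00:00Z",
    "qualityFlag": "provisional",
}
triples, gaps = to_standard("reading_1", record, alignment)
for t in triples:
    print(t)
print("needs harmonization:", gaps)  # ['qualityFlag']
```

The list of unmapped terms is a concrete to-do list for the harmonization work.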

##### **<span style="background-color: #FC3D21">Step Five</span>**

Next steps are to further develop the modules left as stubs, namely Observation Data (which has obvious connections to [SSN/SOSA](https://www.w3.org/TR/vocab-ssn/)) and the Monitoring activity. We will also be exploring deployment and integration of real-world data that the CHESS team has curated.

### How do you benefit from this work and contribute to it? 


###### Calls to action

    
    


1. Play around with this script -- i.e., ***ignite*** domain integration and new connections
    - **Spend 15 minutes with this script to explore actually building a knowledge network**


2. Learn more...
    - Best resources for knowledge capture, structuring, graphs/networks
        - [Domain-Specific Knowledge Graph Construction](https://link.springer.com/book/10.1007%2F978-3-030-12375-8)
        - [OPEN KNOWLEDGE NETWORK: SUMMARY OF THE BIG DATA INTERAGENCY WORKING GROUP (IWG) WORKSHOP OCTOBER 4–5, 2017](https://www.nitrd.gov/nitrdgroups/index.php?title=Open_Knowledge_Network#Presentations)
    