<a href="https://colab.research.google.com/github/i40-Tools/I40KG-Embeddings/blob/master/Similarity_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# I40 standards landscape similarity analysis using embeddings

## Overview

In this notebook, we analyze the similarity between Industry 4.0 standards.
To do so, we use embeddings created from the Industry 4.0 Standards Knowledge Graph (I40KG) developed by [Grangel-González et al.](https://www.researchgate.net/publication/318208930_The_Industry_40_Standards_Landscape_from_a_Semantic_Integration_Perspective)

The embeddings are located here: [I40 Embeddings](https://github.com/i40-Tools/I40KG-Embeddings/tree/master/logs_sto)

## Initial Configurations
First, let's clone the repository that contains the knowledge graph and its embeddings, and then import the libraries required for the similarity analysis.

In [0]:
!git clone https://github.com/i40-Tools/I40KG-Embeddings.git

Cloning into 'I40KG-Embeddings'...
remote: Enumerating objects: 109, done.
remote: Counting objects: 100% (109/109), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 109 (delta 52), reused 37 (delta 9), pack-reused 0
Receiving objects: 100% (109/109), 22.61 MiB | 17.69 MiB/s, done.
Resolving deltas: 100% (52/52), done.


In [0]:
%ls

[0m[01;34mI40KG-Embeddings[0m/  [01;34msample_data[0m/


In [0]:
import scipy
from scipy import spatial
import numpy as np
import math
import json
from scipy.spatial.distance import cdist

### Define function to print the result in tabular format 

For Framework

In [0]:
#function to print result in table for similarity of frameworks
def print_result_framework(result):
    print ("{:<8}                                            {:<15}                         {:<10}".format('Framework A','Framework B','Score'))
    print ("----------------------------------------------------------------------------------------------------------------------")
    for key,value in result.items():
        val = str(value)
        val = val.strip("{}")
        val = val.strip("''")
        val = val.replace("':","     ")
        print("{:<8}       {:<15}".format(key +"      ", val))



In [0]:
#function to print result in table for similarity of standards, similarity of standards for same framework and different framework
def print_result_standard(result):
    print ("{:<8}                                            {:<15}                         {:<10}".format('Standard A','Standard B','Score'))
    print ("----------------------------------------------------------------------------------------------------------------------")
    for key,value in result.items():
        val = str(value)
        val = val.strip("{}")
        val = val.strip("''")
        val = val.replace("':","     ")
        print("{:<8}       {:<15}".format(key +"      ", val))



In [0]:
#function to print result in table for similarity for layers
def print_layers(result,framework1,framework2):
    print ("{:<8}                     {:<15}                         {:<10}".format(framework1,framework2,'Score'))
    print ("----------------------------------------------------------------------------------------------------------------------")
    for key,value in result.items():
        val = str(value)
        val = val.strip("{}")
        val = val.strip("''")
        val = val.replace("':","     ")
        print("{:<8}       {:<15}".format(key +"      ", val))


### Define similarity function
We use the cosine measure to compare two embeddings. The function below computes the cosine similarity, where values closer to 1 indicate more similar embeddings.

In [0]:
#function to calculate cosine similarity (i.e., 1 - cosine distance)
def cosine_similarity(vec1,vec2):
    sum11, sum12, sum22 = 0, 0, 0
    for i in range(len(vec1)):
        x = vec1[i]; y = vec2[i]
        sum11 += x*x
        sum22 += y*y
        sum12 += x*y
    return sum12/math.sqrt(sum11*sum22)
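Note that the later cells call `scipy.spatial.distance.cosine`, which returns the cosine *distance* (one minus the cosine similarity), so lower scores there mean more similar embeddings. The following is a minimal sketch of that relationship using only the standard library; `cosine_distance` is a hypothetical helper name introduced for illustration and is not part of the notebook.

```python
import math

def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm1 = math.sqrt(sum(x * x for x in vec1))
    norm2 = math.sqrt(sum(y * y for y in vec2))
    return dot / (norm1 * norm2)

def cosine_distance(vec1, vec2):
    """Hypothetical helper: 1 - cosine similarity, the quantity SciPy's cosine() reports."""
    return 1.0 - cosine_similarity(vec1, vec2)

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]
print(cosine_similarity(a, c), cosine_distance(a, c))  # same direction
print(cosine_similarity(a, b), cosine_distance(a, b))  # orthogonal vectors
```

Vectors pointing in the same direction have similarity 1 and distance 0, while orthogonal vectors have similarity 0 and distance 1.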

## Similarity among I40 Standardization Frameworks
In this section, we analyze the similarity among standardization frameworks.

In [0]:
!pip3 install rdflib

Collecting rdflib
  Downloading https://files.pythonhosted.org/packages/3c/fe/630bacb652680f6d481b9febbb3e2c3869194a1a5fc3401a4a41195a2f8f/rdflib-4.2.2-py3-none-any.whl (344kB)
Collecting isodate (from rdflib)
  Downloading https://files.pythonhosted.org/packages/9b/9f/b36f7774ff5ea8e428fdcfc4bb332c39ee5b9362ddd3d40d9516a55221b2/isodate-0.6.0-py2.py3-none-any.whl (45kB)
Installing collected packages: isodate, rdflib
Successfully installed isodate-0.6.0 rdflib-4.2.2


In [0]:
import json
from rdflib import Graph
import pprint

### Similarity among Frameworks
In this section we analyze the similarity of I4.0 frameworks at the high level, i.e., StandardizationFramework.


In [0]:
g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
    
len(g) # number of triples loaded into the graph
 #check printing of the graph    
'''for stmt in g:
    pprint.pprint(stmt)'''

#query to get the framework/standard from the sto.nt file
#to get standards we have to replace sto:StandardizationFramework by sto:Standard in the query
#we can get standards of the frameworks as well by just changing the query

qres = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select ?s where {
            ?s rdf:type sto:StandardizationFramework .
            } limit 1000""")
    
       
#to get the corresponding embeddings of the frameworks/standards from the json file 
with open("/content/I40KG-Embeddings/logs_sto/entities_to_embeddings.json",'rb') as f:
    array = json.load(f)
new_dict = {}
for row in qres:
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict[tem] = array[key] 
            print(new_dict)

#to put the frameworks/standards with their corresponding embeddings in a file            
with open('/content/I40KG-Embeddings/output_framework.json','w') as f:
    # this would place the entire output on one line
    # use json.dump(new_dict, f, indent=4) to "pretty-print" with four spaces per indent
    json.dump(new_dict, f)
    
    
#to read the file containing standards/frameworks along with their embeddings   
with open('/content/I40KG-Embeddings/output_framework.json', 'r') as f:
    array = json.load(f)
    
#compare each standard/framework with all the other standards/frameworks to find cosine similarity
result = {}
for key,value in array.items():
    temp,tempDict= 0,{}
    for keyC,valueC in array.items():
        if keyC == key:
            continue
        temp = scipy.spatial.distance.cosine(value,valueC) #cosine distance (1 - cosine similarity) between the two embeddings
        tempDict[keyC] = temp
        val1 = min(tempDict, key=tempDict.get)
    res = {}
    res[val1] = tempDict[val1]
    #print (res)
    result[key]= res
    #result[key]= tempDict
        
#print the standards/frameworks along with their similar standards/frameworks and their similarity distance
print_result_framework(result)

{'https://w3id.org/i40/rami#Header': [-0.05846023932099342, -0.19150541722774506, -0.05162709206342697, -0.1591479331254959, 0.05834261327981949, 0.07690666615962982, -0.11871955543756485, -0.02744811587035656, -0.1790369600057602, 0.018498294055461884, -0.21848168969154358, -0.21447916328907013, 0.10905816406011581, 0.14641466736793518, 0.15883386135101318, 0.19062480330467224, 0.15035036206245422, 0.08137045055627823, 0.23082101345062256, 0.08103805780410767, -0.07569720596075058, -0.13604457676410675, 0.07563043385744095, -0.06335669755935669, -0.047055527567863464, -0.0789257064461708, -0.15503999590873718, -0.032964207231998444, 0.19128111004829407, -0.08474723249673843, -0.04816175997257233, -0.1986766904592514, -0.24972543120384216, 0.1286625862121582, 0.012997523881494999, -0.04585658758878708, -0.23919402062892914, 0.055835600942373276, 0.0863007977604866, 0.19961115717887878, 0.11098377406597137, -0.1371253877878189, -0.17279928922653198, 0.2157992124557495, -0.06607788056135
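The nearest-neighbour loop in the cell above is repeated almost verbatim in the later cells. As a sketch only, it could be factored into one function; `nearest_neighbors` and the toy keys below are hypothetical names, and the distance is computed with the standard library rather than SciPy so that the example is self-contained.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(embeddings):
    """Map each key to {closest other key: distance}, the shape the print helpers expect."""
    result = {}
    for key, vec in embeddings.items():
        distances = {
            other: cosine_distance(vec, other_vec)
            for other, other_vec in embeddings.items()
            if other != key
        }
        if distances:  # skip entries that have nothing to compare against
            best = min(distances, key=distances.get)
            result[key] = {best: distances[best]}
    return result

# Toy embeddings, not taken from the I40KG
toy = {
    "ex:A": [1.0, 0.0],
    "ex:B": [0.9, 0.1],
    "ex:C": [0.0, 1.0],
}
print(nearest_neighbors(toy))
```

Computing the minimum once per key, after all distances are known, also avoids an undefined `val1` when the dictionary holds a single entry.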

### Similarity among Standards

In this section we analyze the similarity of I4.0 standards at the high level, i.e., Standard.
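The framework and standard queries in this notebook differ only in the class name after `rdf:type`. A minimal sketch of a helper that builds both variants is shown here; `build_type_query` is a hypothetical name, and the returned string is meant to be passed to `g.query(...)` exactly as the hand-written queries are.

```python
PREFIXES = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sto: <https://w3id.org/i40/sto#>
"""

def build_type_query(sto_class, limit=1000):
    """Return a SPARQL query selecting all subjects typed as sto:<sto_class>."""
    return (
        PREFIXES
        + "SELECT ?s WHERE {\n"
        + f"    ?s rdf:type sto:{sto_class} .\n"
        + f"}} LIMIT {limit}"
    )

framework_query = build_type_query("StandardizationFramework")
standard_query = build_type_query("Standard")
print(standard_query)
```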

In [0]:
g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
    
len(g) # number of triples loaded into the graph
 #check printing of the graph    
'''for stmt in g:
    pprint.pprint(stmt)'''

#query to get the framework/standard from the sto.nt file
#to get standards we have to replace sto:StandardizationFramework by sto:Standard in the query
#we can get standards of the frameworks as well by just changing the query

qres = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select ?s where {
            ?s rdf:type sto:Standard .
            } limit 1000""")
    
       
#to get the corresponding embeddings of the frameworks/standards from the json file 
with open("/content/I40KG-Embeddings/logs_sto/entities_to_embeddings.json",'rb') as f:
    array = json.load(f)
new_dict = {}
for row in qres:
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict[tem] = array[key] 
    print(new_dict)

#to put the frameworks/standards with their corresponding embeddings in a file            
with open('/content/I40KG-Embeddings/output_standard.json','w') as f:
    # this would place the entire output on one line
    # use json.dump(new_dict, f, indent=4) to "pretty-print" with four spaces per indent
    json.dump(new_dict, f)
    
    
#to read the file containing standards/frameworks along with their embeddings   
with open('/content/I40KG-Embeddings/output_standard.json', 'r') as f:
    array = json.load(f)
    
#compare each standard/framework with all the other standards/frameworks to find cosine similarity
result = {}
for key,value in array.items():
    temp,tempDict= 0,{}
    for keyC,valueC in array.items():
        if keyC == key:
            continue
        temp = scipy.spatial.distance.cosine(value,valueC) #cosine distance (1 - cosine similarity) between the two embeddings
        tempDict[keyC] = temp
        val1 = min(tempDict, key=tempDict.get)
    res = {}
    res[val1] = tempDict[val1]
    #print (res)
    result[key]= res
    #result[key]= tempDict
        
#print the standards/frameworks along with their similar standards/frameworks and their similarity distance
print_result_standard(result)

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



Standard A                                            Standard B                              Score     
----------------------------------------------------------------------------------------------------------------------
https://w3id.org/i40/sto#ISO_ASTM_52915             https://w3id.org/i40/sto#IEC_14443      0.6035629960902923
https://w3id.org/i40/sto#RFC_7540             https://w3id.org/i40/sto#W3C_DCAT      0.6660452590449815
https://w3id.org/i40/sto#IEC_60204_2000             https://w3id.org/i40/sto#IEC_60839_P7_6      0.6059189018183244
https://w3id.org/i40/sto#IEC_60382_1991             https://w3id.org/i40/sto#IEC_60255_2013      0.602626974021053
https://w3id.org/i40/sto#IEC_61690_P1_2000             https://w3id.org/i40/sto#IEC_62264      0.557831569925525
https://w3id.org/i40/sto#ISO_11898-1             https://w3id.org/i40/sto#ISO_14739-1      0.6596964366536424
https://w3id.org/i40/sto#ISO_16739             https://w3id.org/i40/sto#TS-0009      0.6818665886633948
htt
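The "IOPub data rate exceeded" message above is most likely triggered by printing the full embedding dictionary on every loop iteration. One way to keep the output small, sketched here with a hypothetical `summarize_embeddings` helper and toy data, is to print one short line per entity instead of every vector.

```python
def summarize_embeddings(embeddings, max_rows=5):
    """Return short summary lines instead of dumping every vector."""
    lines = [f"{len(embeddings)} entities selected"]
    for uri, vector in list(embeddings.items())[:max_rows]:
        preview = ", ".join(f"{x:.3f}" for x in vector[:3])
        lines.append(f"{uri}: dim={len(vector)}, first values=[{preview}, ...]")
    if len(embeddings) > max_rows:
        lines.append(f"... and {len(embeddings) - max_rows} more")
    return lines

# Toy data standing in for the selected embeddings
toy = {f"ex:Standard_{i}": [0.1 * i] * 50 for i in range(8)}
summary = summarize_embeddings(toy)
print("\n".join(summary))
```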

#### Results
The following table shows the frameworks with a high degree of similarity.

| Framework A | Framework B | Score |
| ----------------------|:---------------------:| ----------:|
| RAMI 4.0 |  IIRA  | 0.75 |


In [0]:
# TODO implement a function to print a table like the one above
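One possible way to address this TODO is sketched below. It assumes the `result` dictionary has the shape built in the earlier cells, `{key: {nearest_key: distance}}`; the function names and the example URIs are hypothetical.

```python
def short_name(uri):
    """Keep only the fragment after '#', e.g. 'IEC_62264'."""
    return uri.rsplit("#", 1)[-1]

def to_markdown_table(result, col_a="Framework A", col_b="Framework B"):
    """Render {key: {nearest_key: distance}} as a Markdown table string."""
    rows = [
        f"| {col_a} | {col_b} | Score |",
        "| --- | :---: | ---: |",
    ]
    for key, nearest in result.items():
        for other, score in nearest.items():
            rows.append(f"| {short_name(key)} | {short_name(other)} | {score:.2f} |")
    return "\n".join(rows)

# Made-up example entry, not a result from the I40KG
example = {"https://example.org/ns#A": {"https://example.org/ns#B": 0.25}}
print(to_markdown_table(example))
```

In a notebook, the returned string could be rendered with `IPython.display.Markdown` instead of `print`.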

### Similarity among different layers of the Standards
In this section we analyze the similarity among layers of different standards, e.g., RAMI 4.0 -> Layer A vs IIRA -> Layer X

In [0]:

g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
    
len(g) # number of triples loaded into the graph
    
'''for stmt in g:
    pprint.pprint(stmt)'''
    
qres1 = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select DISTINCT ?c where {
            ?s sto:hasClassification ?c .
            ?c sto:isDescribedin <https://w3id.org/i40/sto#RAMI> .
            } limit 1000""")

qres2 = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
     select DISTINCT ?c where {
            ?s sto:hasClassification ?c .
            ?c sto:isDescribedin <https://w3id.org/i40/sto#ISA95> .
            } limit 1000""") 


'''
with open("framework_entity.nt", "w") as fd:
    for row in qres:
        fd.write("%s" % row + "\n")
        #print("%s" % row)'''
        

with open("/content/I40KG-Embeddings/logs_sto/entities_to_embeddings.json",'rb') as f:
    array = json.load(f)
new_dict1 = {}
new_dict2 = {}
for row in qres1:
#for line in oldfile:
    #line1 = line.strip("\n")
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict1[tem] = array[key] 
    print(new_dict1)
for row in qres2:
#for line in oldfile:
    #line1 = line.strip("\n")
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict2[tem] = array[key] 
    print(new_dict2)

    
framework1 = 'https://w3id.org/i40/sto#RAMI'    
framework2 = 'https://w3id.org/i40/sto#ISA95'
    
result = {}
for key,value in new_dict1.items():
    temp,tempDict= 0,{}
    for keyC,valueC in new_dict2.items():
        if keyC == key:
            continue
        temp = scipy.spatial.distance.cosine(value,valueC)
        tempDict[keyC] = temp
        val1 = min(tempDict, key=tempDict.get)
    res = {}
    res[val1] = tempDict[val1]
    #print (res)
    result[key]= res 
print_layers(result,framework1,framework2)

#### Results
The following table shows the similarity among different layers of two frameworks.

| RAMI4.0 | IIRA | Score |
| --------------|:-------:| ----------:|
| Layer A |  Layer X  | 0.85 |
| Layer B |  Layer Y  | 0.75 |
| Layer C |  Layer Z  | 0.85 |


In [0]:
# TODO print a table like the above

## Similarity among Standards of the same Framework
In this section we show the analysis of similarity among standards belonging to the same framework.

In [0]:
import json
from rdflib import Graph
import pprint



g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
    
len(g) # number of triples loaded into the graph
    
'''for stmt in g:
    pprint.pprint(stmt)'''
    
qres = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select ?s where {
            ?s rdf:type sto:Standard .
            ?s sto:hasClassification ?c .
            ?c sto:isDescribedin <https://w3id.org/i40/sto#RAMI> .
            } limit 1000""")

'''    
with open("framework_entity.nt", "w") as fd:
    for row in qres:
        fd.write("%s" % row + "\n")
        #print("%s" % row)'''
        
#oldfile = open("framework_entity.nt", "r")
with open("/content/I40KG-Embeddings/logs_sto/entities_to_embeddings.json",'rb') as f:
    array = json.load(f)
new_dict = {}
for row in qres:
#for line in oldfile:
    #line1 = line.strip("\n")
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict[tem] = array[key] 
    print(new_dict)

with open('/content/I40KG-Embeddings/output_standard_same_framework.json','w') as f:
    # this would place the entire output on one line
    # use json.dump(new_dict, f, indent=4) to "pretty-print" with four spaces per indent
    json.dump(new_dict, f) 

    
#to read the file containing standards/frameworks along with their embeddings   
with open('/content/I40KG-Embeddings/output_standard_same_framework.json', 'r') as f:
    array = json.load(f)
    
#compare each standard/framework with all the other standards/frameworks to find cosine similarity
result = {}
for key,value in array.items():
    temp,tempDict= 0,{}
    for keyC,valueC in array.items():
        if keyC == key:
            continue
        temp = scipy.spatial.distance.cosine(value,valueC) #cosine distance (1 - cosine similarity) between the two embeddings
        tempDict[keyC] = temp
        val1 = min(tempDict, key=tempDict.get)
    res = {}
    res[val1] = tempDict[val1]
    #print (res)
    result[key]= res
    #result[key]= tempDict
        
#print the standards/frameworks along with their similar standards/frameworks and their similarity distance
print_result_standard(result)

{'https://w3id.org/i40/sto#IEC_61131': [0.04427987337112427, 0.02123975194990635, -0.009770496748387814, 0.21871952712535858, 0.2818382978439331, -0.24576450884342194, -0.10435811430215836, 0.02727627009153366, 0.11663924902677536, -0.074756920337677, -0.04958094656467438, 0.020205926150083542, 0.005230627488344908, 0.12727558612823486, 0.24037425220012665, 0.00566554581746459, -0.005580421537160873, 0.0003657075867522508, -0.22634616494178772, 0.2681357264518738, -0.10449211299419403, -0.11835897713899612, 0.1994495689868927, 0.13494756817817688, -0.21640567481517792, 0.07247631996870041, 0.06669963151216507, -0.07744915038347244, 0.021403685212135315, 0.11026694625616074, -0.06920012086629868, -0.025213362649083138, 0.17295943200588226, 0.2143331617116928, -0.07114788889884949, 0.22453629970550537, 0.11279267817735672, 0.0857628658413887, -0.06591077893972397, 0.0675312802195549, 0.05349031835794449, -0.15653164684772491, -0.2823489308357239, -0.06057249754667282, 0.00595594244077801

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



Standard A                                            Standard B                              Score     
----------------------------------------------------------------------------------------------------------------------
https://w3id.org/i40/sto#IEC_61131             https://w3id.org/i40/sto#ISO_18828-2      0.6189733532264716
https://w3id.org/i40/sto#IEC_61987_X             https://w3id.org/i40/sto#IEC_62890      0.704515284474483
https://w3id.org/i40/sto#ISO_18828-2             https://w3id.org/i40/sto#ISO_18629      0.5380167624781809
https://w3id.org/i40/sto#IEC_61512             https://w3id.org/i40/sto#ISO_8062-4      0.7165690659098742
https://w3id.org/i40/sto#IEC_62541             https://w3id.org/i40/sto#RFC_2616      0.7231069578353885
https://w3id.org/i40/sto#ISO_16739             https://w3id.org/i40/sto#IEC_61987_X      0.7554732872480179
https://w3id.org/i40/sto#IEC_62890             https://w3id.org/i40/sto#IEC_81714      0.6713253708034757
https://w3id.org/i40/sto#eC
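The cells that collect embeddings scan the whole embedding dictionary once per query row. Since the embeddings are already keyed by URI, a direct lookup gives the same selection. The sketch below uses a hypothetical `select_embeddings` helper with toy data; with rdflib, the caller would pass the first column of each result row (for example `row[0]`) as the URI.

```python
def select_embeddings(uris, embeddings):
    """Pick the embeddings for the given URIs with direct dictionary lookups."""
    selected = {}
    missing = []  # URIs returned by the query that have no embedding
    for uri in uris:
        key = str(uri)
        if key in embeddings:
            selected[key] = embeddings[key]
        else:
            missing.append(key)
    return selected, missing

# Toy data, not taken from entities_to_embeddings.json
all_embeddings = {"ex:A": [0.1, 0.2], "ex:B": [0.3, 0.4]}
selected, missing = select_embeddings(["ex:A", "ex:Z"], all_embeddings)
print(selected, missing)
```

Returning the missing URIs separately makes it visible when a queried entity has no embedding, which the nested loop silently ignores.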

## Similarity among Standards of different Frameworks
In this section we show the analysis of similarity among standards belonging to different frameworks.

In [0]:
import json
from rdflib import Graph
import pprint



g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
    
len(g) # number of triples loaded into the graph
    
'''for stmt in g:
    pprint.pprint(stmt)'''
    
qres1 = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select ?s where {
            ?s rdf:type sto:Standard .
            ?s sto:hasClassification ?c .
            ?c sto:isDescribedin <https://w3id.org/i40/sto#RAMI> .
            } limit 10""")

qres2 = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>
    
    select ?s where {
            ?s rdf:type sto:Standard .
            ?s sto:hasClassification ?c .
            ?c sto:isDescribedin <https://w3id.org/i40/sto#ISA95> .
            } limit 10""") 

'''    
with open("framework_entity.nt", "w") as fd:
    for row in qres1:
        fd.write("%s" % row + "\n")
        #print("%s" % row)
    for row in qres2:
        fd.write("%s" % row + "\n")
        #print("%s" % row)'''
        
#oldfile = open("framework_entity.nt", "r")
with open("/content/I40KG-Embeddings/logs_sto/entities_to_embeddings.json",'rb') as f:
    array = json.load(f)
new_dict = {}
for row in qres1:
#for line in oldfile:
    #line1 = line.strip("\n")
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict[tem] = array[key] 
    print(new_dict)
for row in qres2:
#for line in oldfile:
    #line1 = line.strip("\n")
    for key,value in array.items():
        if key == "%s" % row:
            tem = key
            new_dict[tem] = array[key] 
    print(new_dict)



with open('/content/I40KG-Embeddings/output_standard_different_framework.json','w') as f:
    # this would place the entire output on one line
    # use json.dump(new_dict, f, indent=4) to "pretty-print" with four spaces per indent
    json.dump(new_dict, f)
    
    
#to read the file containing standards/frameworks along with their embeddings   
with open('/content/I40KG-Embeddings/output_standard_different_framework.json', 'r') as f:
    array = json.load(f)
    
#compare each standard/framework with all the other standards/frameworks to find cosine similarity
result = {}
for key,value in array.items():
    temp,tempDict= 0,{}
    for keyC,valueC in array.items():
        if keyC == key:
            continue
        temp = scipy.spatial.distance.cosine(value,valueC) #cosine distance (1 - cosine similarity) between the two embeddings
        tempDict[keyC] = temp
        val1 = min(tempDict, key=tempDict.get)
    res = {}
    res[val1] = tempDict[val1]
    #print (res)
    result[key]= res
    #result[key]= tempDict
        
#print the standards/frameworks along with their similar standards/frameworks and their similarity distance
print_result_standard(result)


{'https://w3id.org/i40/sto#ISO_8062-4': [-0.1598663181066513, -0.18560346961021423, -0.05191608518362045, 0.013133599422872066, 0.04882574453949928, 0.1600474715232849, -0.01081899181008339, -0.009516522288322449, -0.11761683225631714, -0.1884804666042328, 0.23876811563968658, -0.17497804760932922, -0.13787971436977386, -0.09080846607685089, -0.05480845272541046, 0.01788640022277832, 0.05146705359220505, 0.2126259058713913, 0.10729824751615524, 0.17797525227069855, -0.01492366660386324, -0.015929723158478737, 0.11509052664041519, 0.07533158361911774, -0.17348097264766693, -0.027327628806233406, 0.24293673038482666, 0.1250971406698227, 0.03407265990972519, 0.0674484372138977, 0.2046169936656952, 0.1567152440547943, 0.17562197148799896, 0.20931348204612732, -0.1514303982257843, -0.0972663164138794, 0.13717345893383026, 0.013828077353537083, -0.21656259894371033, -0.04058666154742241, -0.22095747292041779, 0.2349260151386261, -0.16182586550712585, 0.20125870406627655, 0.0673457533121109, 
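Because the cell above merges the standards of both frameworks into a single dictionary, the closest match of a standard may come from its own framework. The following sketch restricts candidates to the other framework by tracking a framework label per standard; `nearest_in_other_framework`, `framework_of`, and the toy vectors are assumptions for illustration, not part of the notebook.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_in_other_framework(embeddings, framework_of):
    """For each standard, find the closest standard that has a different framework label."""
    result = {}
    for key, vec in embeddings.items():
        distances = {
            other: cosine_distance(vec, other_vec)
            for other, other_vec in embeddings.items()
            if framework_of[other] != framework_of[key]
        }
        if distances:
            best = min(distances, key=distances.get)
            result[key] = {best: distances[best]}
    return result

# Toy data: S1 and S2 are nearly identical but share a framework label
embeddings = {"ex:S1": [1.0, 0.0], "ex:S2": [0.99, 0.01], "ex:S3": [0.5, 0.5]}
framework_of = {"ex:S1": "RAMI", "ex:S2": "RAMI", "ex:S3": "ISA95"}
cross = nearest_in_other_framework(embeddings, framework_of)
print(cross)
```

In the notebook, the labels could be filled in while iterating over `qres1` and `qres2`, since each query already targets one framework.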

#### Results
The following table shows the similarity among standards of two different frameworks.

| RAMI4.0 | IIRA | Score |
| --------------|:-------:| ----------:|
| Standard A |  Standard X  | 0.85 |
| Standard B |  Standard Y  | 0.75 |
| Standard C |  Standard Z  | 0.85 |
