<a href="https://colab.research.google.com/github/i40-Tools/I40KG-Embeddings/blob/master/Community-Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# I40 standards landscape similarity analysis using embeddings

## Overview

In this notebook, we analyze the similarity between Industry 4.0 standards.
To do so, we create embeddings of the Industry 4.0 Standards Knowledge Graph (I40KG) developed by [Grangel-Gonzales et al.](https://www.researchgate.net/publication/318208930_The_Industry_40_Standards_Landscape_from_a_Semantic_Integration_Perspective).

The embeddings are located here: [I40 Embeddings](https://github.com/i40-Tools/I40KG-Embeddings/tree/master/logs_sto)
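The cells below read the embeddings from `embeddings/TransE/entities_to_embeddings.json`, which, judging from how the notebook indexes it, is a JSON object mapping each entity IRI to a list of floats. A minimal sketch, assuming that layout, of turning such a mapping into a label list plus an aligned NumPy matrix, which is the form most vectorized similarity code expects. `to_matrix` and the toy vectors are hypothetical and only illustrate the data structure:

```python
import numpy as np

# Hypothetical helper (not part of the repository): stack an IRI -> vector
# mapping into an (n, d) matrix plus the list of IRIs, one per row.
def to_matrix(mapping, keys=None):
    # Keep the requested keys that actually have a vector, in the given order;
    # fall back to the mapping's own order when no keys are given.
    labels = [k for k in (keys if keys is not None else mapping) if k in mapping]
    matrix = np.array([mapping[k] for k in labels], dtype=float)
    return labels, matrix

# Toy stand-in for json.load(...) of the real file; the vectors are made up.
toy = {"std:A": [0.1, 0.2, 0.3], "std:B": [0.0, 0.5, 0.1], "std:C": [0.2, 0.1, 0.0]}
labels, X = to_matrix(toy, keys=["std:B", "std:C", "std:missing"])
print(labels, X.shape)  # ['std:B', 'std:C'] (2, 3)
```

Keys without a vector (here `std:missing`) are silently dropped, which mirrors how the notebook keeps only the queried standards that actually have an embedding.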

## Initial Configurations
First, we clone the repository that contains the knowledge graph and its embeddings, and then import the libraries required for the similarity analysis.

In [0]:
!git clone https://github.com/i40-Tools/I40KG-Embeddings.git

Cloning into 'I40KG-Embeddings'...
remote: Enumerating objects: 122, done.
remote: Counting objects: 100% (122/122), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 122 (delta 60), reused 37 (delta 9), pack-reused 0
Receiving objects: 100% (122/122), 22.65 MiB | 23.67 MiB/s, done.
Resolving deltas: 100% (60/60), done.


In [0]:
%ls

[0m[01;34mI40KG-Embeddings[0m/  [01;34msample_data[0m/


In [0]:
import scipy
from scipy import spatial
import numpy as np
import math
import json
from scipy.spatial.distance import cdist

### Define functions to print the results in tabular format

In [0]:
#function to print result in table for similarity of frameworks
def print_result_framework(result):
    print("{:<8}                                            {:<15}                         {:<10}".format('Framework A','Framework B','Score'))
    print("----------------------------------------------------------------------------------------------------------------------")
    for key, value in result.items():
        # value is a {most_similar_framework: score} dict; print one row per pair
        # (formatting the items directly avoids parsing the dict's string repr)
        for other, score in value.items():
            print("{}             {}      {}".format(key, other, score))



In [0]:
#function to print result in table for similarity of standards, similarity of standards for same framework and different framework
def print_result_standard(result):
    print("{:<8}                                            {:<15}                         {:<10}".format('Standard A','Standard B','Score'))
    print("----------------------------------------------------------------------------------------------------------------------")
    for key, value in result.items():
        # value is a {most_similar_standard: score} dict; print one row per pair
        # (formatting the items directly avoids parsing the dict's string repr)
        for other, score in value.items():
            print("{}             {}      {}".format(key, other, score))



In [0]:
#function to print result in table for similarity for layers
def print_layers(result,framework1,framework2):
    print("{:<8}                     {:<15}                         {:<10}".format(framework1,framework2,'Score'))
    print("----------------------------------------------------------------------------------------------------------------------")
    for key, value in result.items():
        # value is a {name: score} dict; print one row per entry
        for other, score in value.items():
            print("{}             {}      {}".format(key, other, score))
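The three printing functions above differ only in their header labels. As a design alternative, a single helper could take the headers as parameters and size the columns to the data, which also keeps long IRIs aligned with the header row. A sketch, where `print_pairs` is a hypothetical name and the input shape `{name: {other_name: score}}` is assumed from how `result` is built later in the notebook:

```python
# Hypothetical consolidated helper (not in the original notebook): prints any
# {name: {other_name: score}} result as aligned columns and returns the lines.
def print_pairs(result, header_a, header_b, header_score="Score"):
    rows = [(a, b, str(s)) for a, inner in result.items() for b, s in inner.items()]
    # Column widths fit the longest entry (or header) in each column
    width_a = max([len(header_a)] + [len(a) for a, _, _ in rows])
    width_b = max([len(header_b)] + [len(b) for _, b, _ in rows])
    lines = [f"{header_a:<{width_a}}  {header_b:<{width_b}}  {header_score}",
             "-" * (width_a + width_b + 4 + len(header_score))]
    lines += [f"{a:<{width_a}}  {b:<{width_b}}  {s}" for a, b, s in rows]
    for line in lines:
        print(line)
    return lines

# Usage with made-up names and scores
lines = print_pairs({"std:A": {"std:B": 0.25}, "std:B": {"std:A": 0.25}},
                    "Standard A", "Standard B")
```

Returning the lines as well as printing them makes the table easy to write to a file or check in a test.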


### Define similarity function
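We use cosine similarity to measure how close two embeddings are. The analysis below reports scipy's cosine distance (`scipy.spatial.distance.cosine`), which equals 1 minus the cosine similarity, so a lower score means more similar embeddings.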

In [0]:
#function to calculate cosine similarity (not distance): dot(u, v) / (|u| * |v|)
def cosine_similarity(vec1, vec2):
    vec1 = np.asarray(vec1, dtype=float)
    vec2 = np.asarray(vec2, dtype=float)
    return float(np.dot(vec1, vec2) / math.sqrt(np.dot(vec1, vec1) * np.dot(vec2, vec2)))
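The later analysis calls `scipy.spatial.distance.cosine`, which returns a distance rather than the similarity computed above. A small NumPy-only sketch of that relationship, using hypothetical helper names `cos_similarity` and `cos_distance`: the distance is about 0 for vectors pointing the same way, 1 for orthogonal vectors and about 2 for opposite ones.

```python
import numpy as np

# Hypothetical helpers for illustration; cos_distance follows the definition
# used by scipy.spatial.distance.cosine, i.e. 1 - cosine similarity.
def cos_similarity(u, v):
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cos_distance(u, v):
    return 1.0 - cos_similarity(u, v)

parallel = cos_distance([1, 2, 3], [2, 4, 6])   # same direction: approximately 0
orthogonal = cos_distance([1, 0], [0, 1])       # right angle: exactly 1
opposite = cos_distance([1, 1], [-1, -1])       # opposite direction: approximately 2
print(parallel, orthogonal, opposite)
```

Because cosine similarity ignores vector length, `[1, 2, 3]` and `[2, 4, 6]` count as identical; only the direction of an embedding matters.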

## Similarity among Standards of the same Framework
In this section, we analyze the similarity among standards belonging to the same framework. For each standard, we report the other standard whose embedding has the smallest cosine distance.
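The cell below pairs every standard with the other standard at the smallest cosine distance, using two nested Python loops. A vectorized sketch of the same nearest-neighbour search, assuming the embeddings are already stacked into an (n, d) NumPy array `X` with one label per row: normalizing the rows turns one matrix product into all pairwise cosine similarities (`cdist(X, X, 'cosine')`, imported at the top of the notebook, would give the same distance matrix). The optional `groups` argument is a hypothetical label-to-framework mapping, which this notebook does not build, showing how the search could be limited to standards of the same framework. `nearest_neighbours` is an illustrative name, not part of the repository.

```python
import numpy as np

# Hypothetical helper (not from the repository): for each row of X, find the
# closest other row by cosine distance. Returns {label: {closest_label: distance}}.
def nearest_neighbours(labels, X, groups=None):
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    dist = 1.0 - unit @ unit.T                           # pairwise cosine distances
    np.fill_diagonal(dist, np.inf)                       # never pair a row with itself
    if groups is not None:
        # groups (hypothetical) maps label -> framework; block cross-group pairs
        mask = np.array([[groups[a] != groups[b] for b in labels] for a in labels])
        dist[mask] = np.inf
    result = {}
    for i, label in enumerate(labels):
        j = int(np.argmin(dist[i]))
        if np.isfinite(dist[i, j]):                      # skip labels with no partner
            result[label] = {labels[j]: float(dist[i, j])}
    return result

# Toy data: made-up 2-D vectors and framework assignments
labels = ["std:A", "std:B", "std:C", "std:D"]
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = {"std:A": "F1", "std:B": "F2", "std:C": "F1", "std:D": "F2"}
print(nearest_neighbours(labels, X))          # A<->B and C<->D are closest overall
print(nearest_neighbours(labels, X, groups))  # within frameworks: A<->C, B<->D
```

The returned dictionary has the same `{standard: {closest standard: distance}}` shape as `result` in the cell below, so it could be passed to `print_result_standard` directly. The dense n x n matrix is fine for a few hundred standards; much larger graphs would call for a chunked or approximate search.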

In [0]:
import json

import scipy.spatial.distance
from rdflib import Graph

# Load the Standards Ontology (STO) knowledge graph shipped with the repository
g = Graph()
g.parse("/content/I40KG-Embeddings/sto/sto.nt", format="nt")
print(len(g), "triples loaded")

# Select every resource typed as sto:Standard (no framework filter is applied here)
qres = g.query(
    """PREFIX owl: <http://www.w3.org/2002/07/owl#>
       PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
       PREFIX sto: <https://w3id.org/i40/sto#>

    select ?s where {
            ?s rdf:type sto:Standard .
    } limit 1000""")

# Load the TransE entity embeddings: a JSON object mapping entity IRI -> vector
with open("/content/I40KG-Embeddings/embeddings/TransE/entities_to_embeddings.json", 'rb') as f:
    array = json.load(f)

# Keep only the embeddings of the standards returned by the query. A direct
# dictionary lookup replaces scanning every key for every row, and the dict is
# no longer printed inside the loop (that flooded the notebook output).
new_dict = {}
for row in qres:
    iri = str(row[0])
    if iri in array:
        new_dict[iri] = array[iri]
print(len(new_dict), "standards with an embedding")

# Save the selected embeddings (json.dump(new_dict, f, indent=4) would pretty-print them)
with open('/content/I40KG-Embeddings/output_standard_same_framework.json', 'w') as f:
    json.dump(new_dict, f)

# Read the file containing the standards along with their embeddings
with open('/content/I40KG-Embeddings/output_standard_same_framework.json', 'r') as f:
    array = json.load(f)

# For each standard, find the other standard with the smallest cosine distance
# (scipy's cosine distance = 1 - cosine similarity, so lower means more similar)
result = {}
for key, value in array.items():
    tempDict = {}
    for keyC, valueC in array.items():
        if keyC == key:
            continue
        tempDict[keyC] = scipy.spatial.distance.cosine(value, valueC)
    if not tempDict:
        continue  # a lone standard has no other standard to compare against
    val1 = min(tempDict, key=tempDict.get)  # closest standard
    result[key] = {val1: tempDict[val1]}

# Print each standard with its most similar standard and their cosine distance
print_result_standard(result)



Standard A                                            Standard B                              Score     
----------------------------------------------------------------------------------------------------------------------
https://w3id.org/i40/sto#IEC_61131             https://w3id.org/i40/sto#ISO_18828-2      0.6189733532264716
https://w3id.org/i40/sto#IEC_61987_X             https://w3id.org/i40/sto#IEC_62890      0.704515284474483
https://w3id.org/i40/sto#ISO_18828-2             https://w3id.org/i40/sto#ISO_18629      0.5380167624781809
https://w3id.org/i40/sto#IEC_61512             https://w3id.org/i40/sto#ISO_8062-4      0.7165690659098742
https://w3id.org/i40/sto#IEC_62541             https://w3id.org/i40/sto#RFC_2616      0.7231069578353885
https://w3id.org/i40/sto#ISO_16739             https://w3id.org/i40/sto#IEC_61987_X      0.7554732872480179
https://w3id.org/i40/sto#IEC_62890             https://w3id.org/i40/sto#IEC_81714      0.6713253708034757
https://w3id.org/i40/sto#eC