<a href="https://colab.research.google.com/github/collarad/PyKEEN/blob/master/notebooks/visualization/Embedding_Projector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exporting Vectors and Metadata to TSV
This notebook exports embeddings created with PyKEEN to the TSV files expected by the [Embedding Projector](https://projector.tensorflow.org/) tool.
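
As a rough illustration of the target format, the sketch below builds the two files' contents in memory: one tab-separated vector per line, and one label per line in the same order (a single-column metadata file carries no header row). The helper `to_projector_tsv` and the toy labels and vectors are hypothetical and not part of the I40KG data.

```python
def to_projector_tsv(labels, vectors):
    """Return (vectors_tsv, metadata_tsv) strings for equally long inputs."""
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    # One vector per line, components separated by tabs
    vec_lines = ["\t".join(str(x) for x in vec) for vec in vectors]
    # One label per line, in the same order as the vectors
    return "\n".join(vec_lines) + "\n", "\n".join(labels) + "\n"


# Hypothetical toy data, only to show the layout
toy_labels = ["sto#Standard_A", "sto#Standard_B"]
toy_vectors = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]

vecs_tsv, meta_tsv = to_projector_tsv(toy_labels, toy_vectors)
print(vecs_tsv)
print(meta_tsv)
```

Line *i* of the metadata has to describe line *i* of the vectors, which is why the helper refuses inputs of different lengths.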

# Initial Configuration

## Cloning the Repository
Clone the Git repository that contains the PyKEEN embeddings. This example uses the embeddings created for the Industry 4.0 Standards Landscape Knowledge Graph.

In [0]:
!git clone https://github.com/i40-Tools/I40KG-Embeddings.git

fatal: destination path 'I40KG-Embeddings' already exists and is not an empty directory.


Install and import the [rdflib](https://github.com/RDFLib/rdflib) library. It is used to run SPARQL queries that limit which embeddings are displayed.

In [0]:
!pip install rdflib



In [0]:
import json
from rdflib import Graph
import pprint

In [0]:
embeddings_path = "/content/I40KG-Embeddings/embeddings/sto/sto-enriched.nt"

In [0]:
g = Graph()
# Load the enriched STO graph from the N-Triples file
g.parse(embeddings_path, format="nt")
len(g)  # number of triples in the graph

# Query the frameworks/standards declared in the enriched STO graph
qres = g.query("""
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX sto: <https://w3id.org/i40/sto#>
        
        select ?s where {
           ?s rdf:type sto:Standard .
         } limit 1000 
        """)

labels = []
tokens = []

# Look up the embedding of each framework/standard in the JSON file
with open("/content/I40KG-Embeddings/embeddings/training_set_relatedTo/TransH/entities_to_embeddings.json", "r", encoding="utf-8") as f:
    array = json.load(f)
for row in qres:
    uri = str(row.s)  # URI bound to ?s in the query
    if uri in array:  # direct dictionary lookup instead of scanning every key
        labels.append(uri.replace('https://w3id.org/i40/', ''))
        tokens.append(array[uri])


In [0]:
print(len(labels))
print(len(tokens))

249
249


In [0]:
# Write one vector per line to vecs.tsv and the matching label to meta.tsv
with open('vecs.tsv', 'w', encoding='utf-8') as out_v, \
     open('meta.tsv', 'w', encoding='utf-8') as out_m:
  # Iterate over every standard, including the first one (index 0)
  for label, vector in zip(labels, tokens):
    out_m.write(label + "\n")
    out_v.write('\t'.join(str(x) for x in vector) + "\n")

In [0]:
try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download('vecs.tsv')
  files.download('meta.tsv')
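
Before uploading the files to the Embedding Projector, it can help to confirm that both files have the same number of rows and that every vector has the same dimension. The following is a minimal sketch of such a check. The function `check_projector_files` is a hypothetical helper, and the demo runs on throwaway files with made-up values rather than on the exported `vecs.tsv` and `meta.tsv`.

```python
import os
import tempfile


def check_projector_files(vecs_path, meta_path):
    """Return (row_count, dimension) or raise ValueError on inconsistent files."""
    with open(vecs_path, encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    with open(meta_path, encoding="utf-8") as f:
        labels = [line.rstrip("\n") for line in f if line.strip()]

    if len(rows) != len(labels):
        raise ValueError(f"{len(rows)} vectors but {len(labels)} labels")
    dims = {len(row) for row in rows}
    if len(dims) > 1:
        raise ValueError(f"vectors have differing dimensions: {sorted(dims)}")
    for row in rows:
        for value in row:
            float(value)  # raises ValueError if a component is not numeric
    return len(rows), (dims.pop() if dims else 0)


# Demo on throwaway files with made-up values
with tempfile.TemporaryDirectory() as tmp:
    demo_vecs = os.path.join(tmp, "vecs.tsv")
    demo_meta = os.path.join(tmp, "meta.tsv")
    with open(demo_vecs, "w", encoding="utf-8") as f:
        f.write("0.1\t0.2\n0.3\t0.4\n")
    with open(demo_meta, "w", encoding="utf-8") as f:
        f.write("sto#A\nsto#B\n")
    demo_shape = check_projector_files(demo_vecs, demo_meta)

print(demo_shape)
```

In the notebook, the same helper could be called as `check_projector_files('vecs.tsv', 'meta.tsv')` after the export cell has run.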