<a href="https://colab.research.google.com/github/collarad/PyKEEN/blob/master/notebooks/visualization/Embedding_Projector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exporting Vectors and Metadata to TSV
In this notebook we export embeddings created with PyKEEN to the TSV files expected by the [Embedding Projector](https://projector.tensorflow.org/) tool.

# Initial Configuration

## Cloning the Repository
Clone the Git repository containing the PyKEEN embeddings. In this example we use the embeddings created for the Industry 4.0 Standards Landscape Knowledge Graph.

In [0]:
!git clone https://github.com/i40-Tools/I40KG-Embeddings.git

## Installing Libraries

Install and import the [rdflib](https://github.com/RDFLib/rdflib) library, which we use to run SPARQL queries that limit the set of embeddings we want to display.

In [0]:
!pip install rdflib

In [0]:
import json
from rdflib import Graph
import pprint

## Defining Main Variables
We use two main variables in the rest of the notebook:

1.   **kg_path**: the path of the knowledge graph, i.e., ***file-name.nt***. In Colab, prefix the path with the /content/ folder so that the file is found inside the cloned repository.
2.   **emb_path**: the path of the embeddings produced by the PyKEEN training phase, i.e., ***file-name.json***.
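
The export code in this notebook reads the embeddings file as a mapping from each entity URI to a list of numbers. A minimal sketch for checking that structure before exporting, using hypothetical toy content in place of the real `entities_to_embeddings.json` (the helper `summarize_embeddings` and the example URIs are also hypothetical):

```python
import json

# Hypothetical toy content mimicking entities_to_embeddings.json:
# entity URI -> embedding vector (real files hold many more entities and dimensions)
sample_json = """
{
  "https://w3id.org/i40/sto#ExampleStandardA": [0.12, -0.40, 0.33],
  "https://w3id.org/i40/sto#ExampleStandardB": [0.05, 0.27, -0.18]
}
"""

def summarize_embeddings(entity_to_embedding):
    """Return (number of entities, embedding dimension) after basic checks."""
    if not entity_to_embedding:
        raise ValueError("no embeddings found")
    dims = set()
    for entity, vector in entity_to_embedding.items():
        if not all(isinstance(x, (int, float)) for x in vector):
            raise TypeError(f"embedding of {entity} contains non-numeric values")
        dims.add(len(vector))
    # Embedding Projector needs every row of vecs.tsv to have the same length
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return len(entity_to_embedding), dims.pop()

toy_embeddings = json.loads(sample_json)
print(summarize_embeddings(toy_embeddings))  # (2, 3)
```

With the real file, the same call would be `summarize_embeddings(json.load(open(emb_path, encoding='utf-8')))`.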



In [0]:
kg_path = "/content/I40KG-Embeddings/embeddings/sto/sto-enriched.nt"
emb_path = "/content/I40KG-Embeddings/embeddings/training_set_relatedTo/TransH/entities_to_embeddings.json"

# Exporting Vecs and Meta TSV Files

## Creating labels and tokens
First, we run a SPARQL query to select the entities whose embeddings we want to visualize in Embedding Projector. At the same time, we build the labels (entity names) and tokens (embedding vectors) that will be written to the TSV files.
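
The core of this step is a join between the URIs returned by the query and the embedding mapping. A minimal stdlib sketch of that join using a dictionary lookup, which also reports queried entities that have no embedding; `select_embeddings` is a hypothetical helper and the URIs and vectors below are toy data, while the namespace is the one this notebook strips from labels:

```python
# Namespace removed from labels, as in this notebook's cleanup step
NAMESPACE = "https://w3id.org/i40/"

def select_embeddings(entity_uris, entity_to_embedding, namespace=NAMESPACE):
    """Return (labels, tokens, missing) for the given entity URIs."""
    labels, tokens, missing = [], [], []
    for uri in entity_uris:
        vector = entity_to_embedding.get(uri)  # O(1) lookup instead of scanning every key
        if vector is None:
            missing.append(uri)  # queried entity without a trained embedding
            continue
        labels.append(uri.replace(namespace, ""))
        tokens.append(vector)
    return labels, tokens, missing

# Toy data standing in for the SPARQL results and the PyKEEN JSON file
uris = ["https://w3id.org/i40/sto#A", "https://w3id.org/i40/sto#B"]
embeddings = {"https://w3id.org/i40/sto#A": [0.1, 0.2]}
print(select_embeddings(uris, embeddings))
# (['sto#A'], [[0.1, 0.2]], ['https://w3id.org/i40/sto#B'])
```

In the notebook, the URI list would come from `[str(row[0]) for row in qres]`.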

In [0]:
g = Graph()
g.parse(kg_path, format="nt")
print(len(g))  # number of triples loaded from the knowledge graph

# Query to get all the standards from the sto-enriched.nt file
qres = g.query("""
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX sto: <https://w3id.org/i40/sto#>
        
        select ?s where {
           ?s rdf:type sto:Standard .
         } limit 1000 
        """)

labels = []
tokens = []

# Load the PyKEEN output: a mapping from entity URI to its embedding vector
with open(emb_path, 'r', encoding='utf-8') as f:
    entity_to_embedding = json.load(f)

# Keep only the embeddings of the frameworks/standards returned by the query
for row in qres:
    key = str(row[0])  # URI bound to ?s
    if key in entity_to_embedding:
        labels.append(key.replace('https://w3id.org/i40/', ''))  # strip the namespace for better readability in Embedding Projector
        tokens.append(entity_to_embedding[key])


In [0]:
print(len(labels))
print(len(tokens))

## Saving vecs and meta TSV files
Now we write the labels and tokens to the TSV files: `vecs.tsv` holds one embedding per line and `meta.tsv` holds the matching label on the same line number.
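
The export cell writes single-column metadata, which Embedding Projector reads without a header row. When the metadata has more than one column, the first row is treated as column headers. A minimal stdlib sketch that builds both files as strings and adds an optional extra column; `to_projector_tsv`, the toy labels, and the hypothetical `prefix` column are illustrations, not part of the original notebook:

```python
import io

def to_projector_tsv(labels, vectors, extra_columns=None):
    """Return (vecs_text, meta_text) in Embedding Projector's TSV layout.

    extra_columns: optional dict of column name -> list of values, one per label.
    """
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    vecs, meta = io.StringIO(), io.StringIO()
    for vector in vectors:
        vecs.write("\t".join(str(x) for x in vector) + "\n")
    if extra_columns:
        # Multi-column metadata: the first row holds the column names
        meta.write("\t".join(["label", *extra_columns]) + "\n")
        for i, label in enumerate(labels):
            row = [label] + [str(values[i]) for values in extra_columns.values()]
            meta.write("\t".join(row) + "\n")
    else:
        # Single-column metadata: no header row
        for label in labels:
            meta.write(label + "\n")
    return vecs.getvalue(), meta.getvalue()

toy_labels = ["sto#A", "sto#B"]
toy_vectors = [[0.1, 0.2], [0.3, 0.4]]
prefixes = [label.split("#")[0] for label in toy_labels]  # hypothetical extra column
vecs_text, meta_text = to_projector_tsv(toy_labels, toy_vectors, {"prefix": prefixes})
print(meta_text)
```

An extra column like this can be used in Embedding Projector to color or filter points, while the single-column form stays closest to what the export cell produces.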

In [0]:
import io

out_v = io.open('vecs.tsv', 'w', encoding='utf-8')
out_m = io.open('meta.tsv', 'w', encoding='utf-8')

standards_size = len(labels)

for standard_num in range(1, standards_size):
  out_m.write(labels[standard_num] + "\n")
  out_v.write('\t'.join([str(x) for x in tokens[standard_num]]) + "\n")
out_v.close()
out_m.close()

## Downloading Files
If you are working in Colab, run the following cell to download the vecs and meta TSV files.

In [0]:
try:
  from google.colab import files
except ImportError:
  pass
else:
  files.download('vecs.tsv')
  files.download('meta.tsv')

# Uploading Files to Embedding Projector
Now you are ready to visualize the embeddings using [Embedding Projector](https://projector.tensorflow.org/). We have created a video showing how to upload the resulting data from this notebook into Embedding Projector: [video](https://youtu.be/cNhX2XtiWQM).
