# Nanopub playground

Take the data from LanceDB and build dataframes to load into KuzuDB. Then run a few simple visualizations to inspect the relations.

Next steps:
- https://docs.kuzudb.com/extensions/vector/
- https://docs.kuzudb.com/extensions/full-text-search/


In [1]:
import kuzu
from yfiles_jupyter_graphs_for_kuzu import KuzuGraphWidget
import lancedb
import pandas as pd
import hashlib
from sentence_transformers import SentenceTransformer


In [2]:
db = lancedb.connect("../lancedb")

# to_pandas() already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
nanopubs_df = db.open_table("nanopubs").to_pandas()


In [3]:
print(nanopubs_df.columns.values)

['subject' 'predicate' 'object' 'filename' 'nodename' 'index']


## Update nanopubs

Similar to what we do with entities. However, since these are triples, we visualize them differently.

In [4]:
nanopubs_df['hashid'] = (nanopubs_df['subject'] + nanopubs_df['predicate'] + nanopubs_df['object']).apply(
    lambda x: hashlib.md5(str(x).encode()).hexdigest())
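
The cell above concatenates the three fields with no separator, so two different triples can yield the same input string (for example `"ab" + "c"` and `"a" + "bc"`) and therefore the same `hashid`. A minimal sketch of one way to avoid this, assuming all three fields are plain strings; the helper names `naive_hash` and `triple_hash` and the choice of the ASCII unit separator are illustrative assumptions, not part of the notebook:

```python
import hashlib


def naive_hash(subject: str, predicate: str, obj: str) -> str:
    """Hash of the plain concatenation, mirroring the cell above."""
    return hashlib.md5((subject + predicate + obj).encode("utf-8")).hexdigest()


def triple_hash(subject: str, predicate: str, obj: str, sep: str = "\x1f") -> str:
    """Hash of the fields joined by a separator (hypothetical helper).

    The separator is assumed never to occur inside the fields themselves.
    """
    joined = sep.join((subject, predicate, obj))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Two distinct triples whose plain concatenation is identical ("abcd").
t1 = ("ab", "c", "d")
t2 = ("a", "bc", "d")

collides_naive = naive_hash(*t1) == naive_hash(*t2)
collides_separated = triple_hash(*t1) == triple_hash(*t2)
```

If this approach were adopted in the notebook, something like `nanopubs_df.apply(lambda r: triple_hash(r['subject'], r['predicate'], r['object']), axis=1)` could replace the concatenation, again assuming none of the three columns contains missing values.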


In [5]:
nanopubs_df['composite_id'] = nanopubs_df['filename'] + '_' + nanopubs_df['nodename'] + '_' + nanopubs_df['index'].astype(str)


### Nanopub viz

Since the nanopubs are a representation of triples, we can build the nodes and relations
directly:

In [6]:
nanonodes = pd.DataFrame(pd.concat([nanopubs_df['subject'], nanopubs_df['object']]).unique(), columns=['node'])


In [7]:
nanorels = pd.DataFrame({
    'from': nanopubs_df['subject'],
    'to': nanopubs_df['object']
})
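
To make the node and relation derivation concrete, here is a pandas-free sketch on a few toy triples. The data and the helper `nodes_and_edges` are hypothetical. Unlike `nanorels` above, this version also carries the predicate along on each edge, which is one possible variation rather than what the notebook does:

```python
# Toy triples standing in for rows of nanopubs_df (hypothetical data).
triples = [
    ("paperA", "cites", "paperB"),
    ("paperB", "supports", "claimC"),
    ("paperA", "supports", "claimC"),
]


def nodes_and_edges(triples):
    """Derive unique nodes and directed edges from (subject, predicate, object) tuples."""
    subjects = [s for s, _, _ in triples]
    objects = [o for _, _, o in triples]
    # All subjects first, then all objects, de-duplicated in first-seen order
    # (the same ordering as concatenating the two columns and calling unique()).
    nodes = list(dict.fromkeys(subjects + objects))
    # One edge per triple; the predicate is kept as an edge attribute.
    edges = [{"from": s, "to": o, "predicate": p} for s, p, o in triples]
    return nodes, edges


nodes, edges = nodes_and_edges(triples)
```

Keeping the predicate in the graph would also require a matching property on the relationship table, so this is a sketch of the idea, not a drop-in replacement for the cells here.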


In [8]:
dbnp = kuzu.Database()
connnp = kuzu.Connection(dbnp)
# connnp.execute("INSTALL vector; LOAD vector;")

In [9]:
# TODO: add description and desc_embedding columns
connnp.execute("CREATE NODE TABLE Claim(node STRING PRIMARY KEY)")

<kuzu.query_result.QueryResult at 0x7e052c0f4980>

In [10]:
connnp.execute("COPY Claim FROM nanonodes (ignore_errors=true)")

<kuzu.query_result.QueryResult at 0x7e052c051bd0>

In [11]:
connnp.execute("CREATE REL TABLE IF NOT EXISTS rels( FROM Claim TO Claim)")

<kuzu.query_result.QueryResult at 0x7e052c051a90>

In [12]:
res = connnp.execute("COPY rels FROM nanorels")


In [13]:
gnp = KuzuGraphWidget(connnp)

In [14]:
gnp.show_cypher("MATCH (a)-[b]->(c) RETURN *")


GraphWidget(layout=Layout(height='800px', width='100%'))