# Nanopub playground

Take the data from LanceDB and build dataframes to load into KuzuDB, then do a few simple visualizations to see the relations.

Next steps:
- https://docs.kuzudb.com/extensions/vector/
- https://docs.kuzudb.com/extensions/full-text-search/


In [22]:
import kuzu
from yfiles_jupyter_graphs_for_kuzu import KuzuGraphWidget
import lancedb
import pandas as pd
import hashlib
from sentence_transformers import SentenceTransformer


In [23]:
db = lancedb.connect("../lancedb")

nanopubs_df = db.open_table("nanopubs").to_pandas()  # to_pandas() already returns a DataFrame


In [24]:
print(nanopubs_df.columns.values)

['subject' 'predicate' 'object' 'filename' 'nodename' 'index']


## Update nanopubs

This is similar to what we do with entities. However, since these are triples, we visualize them differently.

In [25]:
nanopubs_df['hashid'] = (nanopubs_df['subject'] + nanopubs_df['predicate'] + nanopubs_df['object']).apply(
    lambda x: hashlib.md5(str(x).encode()).hexdigest())
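
One caveat with the cell above: joining subject, predicate, and object with no separator means two different triples can produce the same string, and therefore the same hash. The sketch below illustrates this and one possible fix. The helper `triple_hash` and the choice of the unit-separator character are assumptions for illustration, not part of the original notebook.

```python
import hashlib


def triple_hash(subject: str, predicate: str, obj: str, sep: str = "\x1f") -> str:
    """MD5 hex digest of a triple, with a separator between the parts (hypothetical helper)."""
    joined = sep.join([subject, predicate, obj])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Plain concatenation cannot tell these two triples apart: both become "abcd".
plain_a = hashlib.md5(("ab" + "c" + "d").encode()).hexdigest()
plain_b = hashlib.md5(("a" + "bc" + "d").encode()).hexdigest()
collides_without_separator = plain_a == plain_b

# With a separator the two triples hash to different values.
distinct_with_separator = triple_hash("ab", "c", "d") != triple_hash("a", "bc", "d")
```

In the notebook this could be applied row-wise, for example with `nanopubs_df.apply(lambda r: triple_hash(r['subject'], r['predicate'], r['object']), axis=1)`. Doing so would change the `hashid` values from those displayed further down.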


In [26]:
nanopubs_df['composite_id'] = nanopubs_df['filename'] + '_' + nanopubs_df['nodename'] + '_' + nanopubs_df['index'].astype(str)
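
Note that `composite_id` is not guaranteed to be unique: in the displayed dataframe, rows 6 and 7 both come from `hypothesis` index 0 and share the same ID. A minimal check, sketched on three rows copied from that output (the names `sample`, `shared`, and `n_shared` are illustrative only):

```python
import pandas as pd

# Three rows taken from the nanopubs output: two hypothesis triples and one Cardinal triple.
sample = pd.DataFrame({
    "filename": ["2202.05901v2.pdf"] * 3,
    "nodename": ["hypothesis", "hypothesis", "Cardinal"],
    "index": [0, 0, 0],
})
sample["composite_id"] = (
    sample["filename"] + "_" + sample["nodename"] + "_" + sample["index"].astype(str)
)

# keep=False flags every member of a duplicated group, not only the later occurrences.
shared = sample[sample["composite_id"].duplicated(keep=False)]
n_shared = len(shared)
```

If a unique key is needed later, `hashid` or a combination of `composite_id` and `hashid` may be a better candidate than `composite_id` alone.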


In [27]:
nanopubs_df


| | subject | predicate | object | filename | nodename | index | hashid | composite_id |
|---|---|---|---|---|---|---|---|---|
| 0 | convolutional neural network's (CNN) | Predict | embedded magnetic flux rope's orientation, imp... | 2202.05901v2.pdf | Cardinal | 0 | 9208d8c8fa21e1fd60621fe730fdb38f | 2202.05901v2.pdf_Cardinal_0 |
| 1 | neural networks | DemonstratedAnIncreaseInPredictionAccuracyTo | 95% | 2202.05901v2.pdf | Cardinal | 1 | 9f27718c4254ab6e2ada59d32716e6f7 | 2202.05901v2.pdf_Cardinal_1 |
| 2 | accuracy of the neural network in predicting t... | IllustratesShortcomingsIn | current physics-based models | 2202.05901v2.pdf | Cardinal | 2 | 45570e79fd8ce443048d7a7a66fed223 | 2202.05901v2.pdf_Cardinal_2 |
| 3 | convolutional neural network | Predict | flux rope orientation, impact parameter, and c... | 2202.05901v2.pdf | Supporting | 0 | a9e79cf14a4ad80e0040fad2dc760335 | 2202.05901v2.pdf_Supporting_0 |
| 4 | The neural networks | AreTrainedWith | full and partial duration flux ropes | 2202.05901v2.pdf | Supporting | 2 | efa12b20dda9951aabd80954a16ca03d | 2202.05901v2.pdf_Supporting_2 |
| 5 | CNNs | Predict | errors generally under specified thresholds in... | 2202.05901v2.pdf | Supporting | 4 | de6180cf9b4c66dbca767f3046e9f247 | 2202.05901v2.pdf_Supporting_4 |
| 6 | A neural network | Predict | a flux rope’s orientation | 2202.05901v2.pdf | hypothesis | 0 | caca6b97dc5bd8acaf71a30e640239d9 | 2202.05901v2.pdf_hypothesis_0 |
| 7 | a flux rope’s orientation | IsIdentified | after an ICME | 2202.05901v2.pdf | hypothesis | 0 | d1cfa8a8db18b727ec4e3520609c4b56 | 2202.05901v2.pdf_hypothesis_0 |
| 8 | Recent in situ observations and increased data... | HaveSparkedInterestIn | machine learning applications within the space... | 2202.05901v2.pdf | supportingArguments | 1 | 7f2834f720cf225f61ca82ccadf6f249 | 2202.05901v2.pdf_supportingArguments_1 |
| 9 | Previous studies | HaveSparkedInterestIn | flux rope identification and forecasting using... | 2202.05901v2.pdf | supportingArguments | 2 | c5f176f2ac9d19f5572e7247e243017c | 2202.05901v2.pdf_supportingArguments_2 |


### Nanopub viz

Since the nanopubs are triples, we can build the nodes and relations directly from them:

In [28]:
nanonodes = pd.DataFrame(pd.concat([nanopubs_df['subject'], nanopubs_df['object']]).unique(), columns=['node'])


In [29]:
nanorels = pd.DataFrame({
    'from': nanopubs_df['subject'],
    'to': nanopubs_df['object']
})
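
The `nanorels` frame above keeps only the two endpoints, so the predicate of each triple is dropped and the edges in the graph carry no label. Below is a minimal sketch of one way to carry the predicate along, using two rows copied from the nanopubs output. The names `triples`, `rels_with_predicate`, `CREATE_RELS`, and the `predicate` property are hypothetical. The DDL string is only defined here, not executed, and it assumes Kuzu treats the first two DataFrame columns as the FROM and TO keys, with property columns following in order.

```python
import pandas as pd

# Two triples copied from the nanopubs output; they form a short chain.
triples = pd.DataFrame({
    "subject": ["A neural network", "a flux rope’s orientation"],
    "predicate": ["Predict", "IsIdentified"],
    "object": ["a flux rope’s orientation", "after an ICME"],
})

# Put the endpoint columns first, then the predicate as an edge property.
rels_with_predicate = triples.rename(
    columns={"subject": "from", "object": "to"}
)[["from", "to", "predicate"]]

# Assumed DDL for a rel table with one STRING property (not run in this sketch).
CREATE_RELS = (
    "CREATE REL TABLE IF NOT EXISTS rels(FROM Claim TO Claim, predicate STRING)"
)
```

With a rel table defined this way, a `COPY rels FROM rels_with_predicate` would be the counterpart of the `COPY rels FROM nanorels` call used in this notebook, and the predicate could then be returned in Cypher queries as a property of the edge.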


In [30]:
dbnp = kuzu.Database()
connnp = kuzu.Connection(dbnp)
# connnp.execute("INSTALL vector; LOAD vector;")

In [31]:
connnp.execute("CREATE NODE TABLE Claim(node STRING PRIMARY KEY)")  # TODO: add description and desc_embedding columns

<kuzu.query_result.QueryResult at 0x7e052c08f830>

In [32]:
connnp.execute("COPY Claim FROM nanonodes (ignore_errors=true)")

<kuzu.query_result.QueryResult at 0x7e052c08fdd0>

In [33]:
connnp.execute("CREATE REL TABLE IF NOT EXISTS rels( FROM Claim TO Claim)")

<kuzu.query_result.QueryResult at 0x7e052c08f590>

In [34]:
res = connnp.execute("COPY rels FROM nanorels")


In [35]:
gnp = KuzuGraphWidget(connnp)

In [36]:
gnp.show_cypher("MATCH (a)-[b]->(c) RETURN *")


GraphWidget(layout=Layout(height='800px', width='100%'))