Thanks for putting the notebooks together! My main feedback points, summarized:
- Add a summary / conclusion (plus findings and learnings) for each notebook to the README.
- Use the same PDF documents for each example.
- Generate the same machine-processable (JSON) output from each example (plus the human baseline) for comparison and further analysis.
- Use dotenv in the notebooks to load environment variables from an `.env` file, so we don't have to remember to add and remove credentials.
- Better to use JSON than CSV, actually, because then we can handle multiple properties per entity and align it with the structure we get from Diffbot (nodes / relationships).
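A minimal sketch of the dotenv point above. It prefers the `python-dotenv` package if installed; the fallback parser is purely illustrative (no quoting or escaping support), and the variable names are assumptions:

```python
import os

def load_env_file(path=".env"):
    """Load KEY=VALUE pairs from a .env file into os.environ.

    Uses python-dotenv when available; otherwise falls back to a
    minimal line parser (illustrative only).
    """
    try:
        from dotenv import load_dotenv  # pip install python-dotenv
        load_dotenv(path)
    except ImportError:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    os.environ.setdefault(key.strip(), value.strip())

# In a notebook cell, credentials would then come from .env, not the source:
# load_env_file()
# uri = os.environ["NEO4J_URI"]  # hypothetical variable name
```

This way the notebook can be committed without ever containing credentials.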
Just comments, no action needed here:
For the triple-based ones (like REBEL / LlamaIndex), our challenge is that we can't use the results out of the box. We would either have to:
- modify them to output property-graph nodes and relationships, or
- post-process the triples: aggregate all entity-attribute triples into node properties and keep only the triples that represent semantic relationships, or
- do this during insertion of the data into the graph, aggregating as we insert: e.g. initially create/merge the nodes with their ID, subsequently merge on ID and add properties, and for the relationships find the start and end node by label and ID and create the relationship.
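The post-processing option could look roughly like this. A sketch only: the split between attribute triples and relationship triples via a known set of attribute predicates is an assumption, and all names are made up:

```python
def aggregate_triples(triples, attribute_predicates):
    """Split (subject, predicate, object) triples into property-graph
    nodes and relationships.

    Triples whose predicate is in `attribute_predicates` are folded
    into node properties; the rest are kept as semantic relationships.
    """
    nodes = {}          # id -> {"id": ..., "properties": {...}}
    relationships = []  # {"start": ..., "type": ..., "end": ...}
    for subj, pred, obj in triples:
        nodes.setdefault(subj, {"id": subj, "properties": {}})
        if pred in attribute_predicates:
            # entity-attribute triple -> node property
            nodes[subj]["properties"][pred] = obj
        else:
            # semantic relationship -> keep as an edge, ensure both endpoints exist
            nodes.setdefault(obj, {"id": obj, "properties": {}})
            relationships.append({"start": subj, "type": pred, "end": obj})
    return list(nodes.values()), relationships

# Hypothetical input triples:
triples = [
    ("Alice", "WORKS_FOR", "Acme"),
    ("Alice", "birth_date", "1990-01-01"),
    ("Acme", "founded", "1999"),
]
nodes, rels = aggregate_triples(triples, {"birth_date", "founded"})
```

The resulting `nodes` / `relationships` lists would also line up with the Diffbot-style JSON structure suggested above.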
For the REBEL one, in `def create_triplets(tx, triplet):` — if we want to look at this approach in the future, we should see whether we can carry the entity type over, so that besides the generic `:Node` label we can also add a label for the type, like `:Person` or `:Organization`, and then also do the attribute aggregation there.
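A sketch of what carrying the type over could look like. The type field and helper names are assumptions; and since Cypher labels cannot be query parameters, the label is string-interpolated only after checking it against a whitelist:

```python
# Assumed set of entity types we expect from the extractor:
ALLOWED_LABELS = {"Person", "Organization", "Location"}

def build_node_merge(entity_id, entity_type, properties):
    """Build a Cypher MERGE that adds a type label (e.g. :Person)
    alongside the generic :Node label, plus aggregated properties.

    Labels can't be passed as Cypher parameters, so the label is
    interpolated into the query only if it is whitelisted.
    """
    label = entity_type if entity_type in ALLOWED_LABELS else None
    labels = ":Node" + (f":{label}" if label else "")
    query = f"MERGE (n{labels} {{id: $id}}) SET n += $props"
    return query, {"id": entity_id, "props": properties}

# Inside create_triplets(tx, triplet) this might then be used as
# (hypothetical triplet fields):
# query, params = build_node_merge(triplet["head"], triplet["head_type"], {...})
# tx.run(query, **params)
```

`SET n += $props` is also where the attribute aggregation would happen on repeated merges of the same node.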