Backend - Checking on multiple LLMs for entity and relations extraction and populating KG #16

Closed
aashipandya opened this issue Jan 30, 2024 · 1 comment

@aashipandya

No description provided.

@jexp

jexp commented Feb 1, 2024

Comments from our discussion:

Thanks for putting the notebooks together. My main feedback points, summarized:

  • add a summary / conclusion for each notebook to the readme (including findings and learnings)
  • use the same PDF documents for each example
  • generate the same machine-processable (JSON) output from each notebook (plus the human baseline) for comparison and further analysis
  • use dotenv in the notebooks to load environment variables from a .env file, so we don't have to add and remove credentials by hand

JSON is actually a better choice than CSV, because it lets us handle multiple properties for each entity and align the output with the structure we get from diffbot (nodes / relationships).
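
A minimal sketch of what such a shared JSON output could look like, assuming a top-level split into `nodes` and `relationships`. The helper `to_graph_json`, the key names (`source`, `id`, `label`, `properties`, `start`, `type`, `end`) and the sample data are all illustrative assumptions, not the actual diffbot schema or an agreed format:

```python
import json


def to_graph_json(source, nodes, relationships):
    """Serialize one extraction run into a comparable JSON document.

    All keys used here are hypothetical; adjust them to the agreed schema.
    """
    doc = {
        "source": source,  # e.g. "human-baseline", "diffbot", "rebel"
        # Sort so that two runs with the same content give identical output.
        "nodes": sorted(nodes, key=lambda n: n["id"]),
        "relationships": sorted(
            relationships, key=lambda r: (r["start"], r["type"], r["end"])
        ),
    }
    return json.dumps(doc, indent=2, sort_keys=True, ensure_ascii=False)


# Made-up sample data, only to show the shape of the output.
example = to_graph_json(
    "human-baseline",
    nodes=[
        {"id": "alice", "label": "Person",
         "properties": {"name": "Alice", "role": "Engineer"}},
        {"id": "acme", "label": "Organization",
         "properties": {"name": "Acme"}},
    ],
    relationships=[
        {"start": "alice", "type": "WORKS_AT", "end": "acme", "properties": {}},
    ],
)
print(example)
```

Sorting the nodes and relationships keeps the files diff-friendly, which should make it easier to compare each extractor against the human baseline.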

Just comments, no action needed here:

For the triple-based ones (like rebel / llama-index), the challenge is that we can't use the results out of the box. We would have to do one of the following:

  • modify them to output property graph nodes and relationships
  • or post-process the triples, aggregating all entity-attribute triples into properties and keeping only the triples that represent semantic relationships
  • or do this while inserting the data into the graph: first create/merge the nodes by their ID, then merge on ID and add the properties; for the relationships, find the start and end nodes by label and ID and create the relationship
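
The post-processing option could be sketched roughly as follows. The function `triples_to_property_graph` and its heuristic are assumptions made for illustration: an object that also occurs as a subject is treated as an entity (so the triple stays a relationship), and every other object is treated as an attribute value. The notebooks may need a different rule, for example an explicit list of attribute predicates.

```python
from collections import defaultdict


def triples_to_property_graph(triples):
    """Split (subject, predicate, object) triples into nodes and relationships.

    Heuristic (an assumption, not taken from the notebooks): an object that
    also occurs as a subject is an entity, so the triple is kept as a
    relationship; any other object is folded into the subject's properties.
    """
    entities = {subject for subject, _, _ in triples}
    properties = defaultdict(dict)
    relationships = []

    for subject, predicate, obj in triples:
        if obj in entities:
            relationships.append(
                {"start": subject, "type": predicate, "end": obj}
            )
        else:
            # Collect repeated attributes in a list instead of overwriting.
            properties[subject].setdefault(predicate, []).append(obj)

    nodes = [
        {"id": entity, "properties": dict(properties[entity])}
        for entity in sorted(entities)
    ]
    return nodes, relationships


# Made-up sample triples.
triples = [
    ("alice", "works_at", "acme"),
    ("alice", "age", "34"),
    ("acme", "industry", "software"),
    ("acme", "industry", "consulting"),
]
nodes, relationships = triples_to_property_graph(triples)
```

One known weakness of this heuristic: an entity that only ever appears as an object is misread as an attribute value, so a real implementation would likely need entity information from the extractor itself.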

For the Rebel one, in def create_triplets(tx, triplet): if we want to look at this approach in the future, we should see if we can carry the entity type over, so that in addition to the generic :Node label we can also use a label for the type, like :Person or :Organization, and then also do the attribute aggregation there.

@rakshita-arora rakshita-arora removed their assignment Feb 19, 2024