
Functions to parse YAML output to KGX-compliant CSV #349

Merged: 21 commits, merged Mar 22, 2024


serenalotreck (Contributor)

Added two functions to src/ontogpt/io/csv_wrapper.py that convert the YAML output to KGX-compliant CSVs, as per #343. I left the original two functions untouched, as I wasn't sure how to integrate my changes into them.

This code relies on the assumptions that:

  1. The user has followed the schema's naming conventions for the camel-case and underscored versions of entity and relation types. That is, there is a class called EntityContainingDocument whose slots are the pluralized, underscored versions of the camel-case names of all entities and relations defined elsewhere in the classes section.
  2. OntoGPT's extracted_object always stores entities as lists of strings and relations as lists of dictionaries.
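
A minimal sketch of how these two assumptions translate into code. The helper names (`class_to_slot_name`, `split_extracted_object`), the naive "append s" pluralization, and the `subject`/`object` keys on relation dicts are illustrative assumptions, not the actual code in csv_wrapper.py.

```python
import re
from typing import Any


def class_to_slot_name(class_name: str) -> str:
    """Turn a camel-case class name into a pluralized, underscored slot name.

    Assumption: pluralization is a plain trailing "s", so irregular plurals
    and acronym-heavy names (e.g. "GOTerm") are not handled correctly.
    """
    # Insert "_" before every uppercase letter except the first character.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()
    return snake + "s"


def split_extracted_object(
    extracted_object: dict[str, Any],
    entity_classes: list[str],
    relation_classes: list[str],
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Split an extracted_object into node rows and edge rows.

    Assumes entity slots hold lists of ID strings and relation slots hold
    lists of dicts with "subject" and "object" keys (hypothetical layout).
    """
    nodes: list[dict[str, str]] = []
    edges: list[dict[str, str]] = []
    for cls in entity_classes:
        for entity_id in extracted_object.get(class_to_slot_name(cls), []):
            nodes.append({"id": entity_id, "category": cls})
    for cls in relation_classes:
        for rel in extracted_object.get(class_to_slot_name(cls), []):
            edges.append(
                {
                    "subject": rel["subject"],
                    # Fall back to the relation class name if no predicate is given.
                    "predicate": rel.get("predicate", cls),
                    "object": rel["object"],
                }
            )
    return nodes, edges


# Toy example with made-up IDs.
extracted = {
    "genes": ["GENE:1"],
    "diseases": ["DISEASE:1"],
    "gene_to_disease_relationships": [{"subject": "GENE:1", "object": "DISEASE:1"}],
}
nodes, edges = split_extracted_object(
    extracted, ["Gene", "Disease"], ["GeneToDiseaseRelationship"]
)
```

If either assumption breaks (say, a slot name that is not a simple plural, or relations stored as strings), the lookup silently finds nothing, which is why the naming convention matters.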

caufieldjh self-requested a review on March 20, 2024, 22:45

caufieldjh (Member) left a comment


Big thanks, @serenalotreck!
I'll finish integrating this and add the options to the CLI.

caufieldjh (Member)

One detail: I'm going to change it so that the root class in the schema doesn't have to be EntityContainingDocument. By default it will use the schema's root class, or it will accept an argument specifying a different class.
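
A sketch of the root-class lookup described above, assuming the schema has already been loaded into a plain dict (e.g. with `yaml.safe_load`) and that the root class carries LinkML's `tree_root: true` flag. `select_root_class` and its `root_class` parameter are hypothetical names, not the option that was merged into the CLI.

```python
from typing import Any, Optional


def select_root_class(schema: dict[str, Any], root_class: Optional[str] = None) -> str:
    """Return the class whose slots hold the extracted entities and relations."""
    classes = schema.get("classes") or {}
    if root_class is not None:
        # An explicitly requested class wins, but it must exist in the schema.
        if root_class not in classes:
            raise ValueError(f"class {root_class!r} not found in schema")
        return root_class
    # Otherwise look for the single class flagged with tree_root: true.
    roots = [name for name, spec in classes.items() if (spec or {}).get("tree_root")]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one tree_root class, found {len(roots)}")
    return roots[0]


schema = {
    "classes": {
        "EntityContainingDocument": {"tree_root": True},
        "Gene": {"is_a": "NamedEntity"},
        "Disease": None,  # a class with an empty body loads as None from YAML
    }
}
```

Raising on zero or multiple `tree_root` classes, rather than guessing, keeps a malformed schema from producing empty node and edge files.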

caufieldjh (Member)

OK, basic functionality is all there, though it's a bit rough.
The parse_yaml_predictions function takes a path to YAML output and the corresponding schema, and yields dataframes.
Specifying -O kgx with ontogpt extract does the same, but writes node and edge files (nodes.tsv and edges.tsv) to the working directory.
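
A stdlib-only sketch of what writing the node and edge files could look like, given node and edge rows as dicts. The column sets (id, category, name for nodes; subject, predicate, object for edges) are an assumed minimal KGX TSV layout, and `write_kgx_tsv` is a hypothetical helper. The merged code works with dataframes, where `DataFrame.to_csv(path, sep="\t", index=False)` would do the same job.

```python
import csv
from pathlib import Path
from typing import Iterable

# Assumed minimal KGX TSV columns; real KGX files may carry more.
NODE_COLUMNS = ["id", "category", "name"]
EDGE_COLUMNS = ["subject", "predicate", "object"]


def write_kgx_tsv(
    nodes: Iterable[dict[str, str]],
    edges: Iterable[dict[str, str]],
    out_dir: str = ".",
) -> tuple[Path, Path]:
    """Write nodes.tsv and edges.tsv into out_dir and return their paths."""
    out = Path(out_dir)
    # The same entity can be extracted many times; keep the first row per id.
    seen: set[str] = set()
    unique_nodes = []
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            unique_nodes.append(node)
    targets = (
        ("nodes.tsv", NODE_COLUMNS, unique_nodes),
        ("edges.tsv", EDGE_COLUMNS, list(edges)),
    )
    for filename, columns, rows in targets:
        with open(out / filename, "w", newline="") as handle:
            # Missing columns become empty cells; unknown keys are dropped.
            writer = csv.DictWriter(
                handle, fieldnames=columns, delimiter="\t",
                restval="", extrasaction="ignore",
            )
            writer.writeheader()
            writer.writerows(rows)
    return out / "nodes.tsv", out / "edges.tsv"
```

Deduplicating nodes by id matters because every document that mentions an entity contributes another copy of it.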

caufieldjh merged commit 29a3b5f into monarch-initiative:main on Mar 22, 2024
2 checks passed