Add a generate-extract command for parsing the results of generated text #158
Comments
cmungall added a commit that referenced this issue on Jul 31, 2023
This PR does two things:

- Add a combined generate-extract command, fixes #158
- Adds cell type templates, fixes #159

## Generate-Extract

`ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"`

This does two things:

1. asks GPT to generate a summary of the cell type
2. parses/extracts knowledge from that summary

This resuscitates the original HALO idea. We could in principle **directly generate an entire knowledgebase in structured form from the latent GPT KB**

Example output:

```yaml
extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
```

## Cell Type Templates

This PR also demonstrates using subclasses for more refined subtypes.

Compare the two:

1. `ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`
2. `ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`

The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields.

Example output:

```yaml
extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
```
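
For anyone consuming this output downstream, here is a minimal Python sketch of one way to map the IDs in `extracted_object` to the labels listed under `named_entities`. It assumes the YAML has already been parsed into a dict. The function name `resolve_labels` is hypothetical and not part of OntoGPT. Treating the `AUTO:` prefix as the marker for terms that were not matched to an ontology ID is an assumption based on the example output above.

```python
from urllib.parse import unquote

# A subset of the first example output above, as it might look after YAML parsing.
result = {
    "extracted_object": {
        "cell_type": "Acinar cell of a salivary gland",
        "parents": ["CL:0000066"],
        "diseases": ["AUTO:Sj%C3%B6gren%27s%20syndrome", "MONDO:0021357"],
    },
    "named_entities": [
        {"id": "CL:0000066", "label": "Epithelial cell"},
        {"id": "AUTO:Sj%C3%B6gren%27s%20syndrome", "label": "Sjögren's syndrome"},
        {"id": "MONDO:0021357", "label": "Salivary gland tumors"},
    ],
}


def resolve_labels(result, slot):
    """Return (id, label, grounded) tuples for one list-valued slot.

    Hypothetical helper; 'grounded' is False for IDs starting with 'AUTO:'
    (an assumption about what that prefix means).
    """
    labels = {e["id"]: e["label"] for e in result.get("named_entities", [])}
    rows = []
    for curie in result["extracted_object"].get(slot, []):
        grounded = not curie.startswith("AUTO:")
        # If no label is listed, fall back to URL-decoding the part after the prefix.
        fallback = unquote(curie.split(":", 1)[1])
        rows.append((curie, labels.get(curie, fallback), grounded))
    return rows


for curie, label, grounded in resolve_labels(result, "diseases"):
    print(f"{curie}\t{label}\t{'grounded' if grounded else 'ungrounded'}")
```

Separating grounded from ungrounded IDs this way makes values such as `AUTO:Not%20mentioned` easy to filter out before loading anything into a knowledge base.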
ruchira pushed a commit to ruchira/OntoLLM that referenced this issue on Aug 10, 2023
…-initiative#159

This PR does two things:

- Add a combined generate-extract command, fixes monarch-initiative#158
- Adds cell type templates, fixes monarch-initiative#159
Use case: generate a description of a concept (e.g. a cell type) entirely from an LLM's "latent knowledge base", then extract structured knowledge from it, bypassing the need for a PubMed search that may be incomplete.
This could also serve as a validation procedure for generated text: compare the extracted knowledge with what is in the KB. Any difference is either a hallucination or a KB gap.
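
The comparison against a KB could be sketched as a per-slot set difference. This is only an illustration under stated assumptions: `compare_with_kb` is a hypothetical helper, not an OntoGPT function, and the `kb_record` below is a toy stand-in written for this example rather than data from a real knowledge base. The sketch only surfaces the differences; deciding whether each one is a hallucination or a KB gap still needs review.

```python
def compare_with_kb(extracted, kb, slots):
    """Per-slot set difference between extracted values and KB values.

    Hypothetical helper: both arguments are dicts mapping slot names
    to lists of IDs.
    """
    report = {}
    for slot in slots:
        got = set(extracted.get(slot, []))
        known = set(kb.get(slot, []))
        report[slot] = {
            "agreed": sorted(got & known),
            # Extracted but absent from the KB: a hallucination or a KB gap.
            "only_extracted": sorted(got - known),
            # Present in the KB but not recovered from the generated text.
            "only_kb": sorted(known - got),
        }
    return report


# Extracted values in the style of generate-extract output.
extracted = {
    "parents": ["CL:0000066"],
    "subtypes": ["CL:0000313", "CL:0000319"],
}
# Toy KB record, invented purely for illustration.
kb_record = {
    "parents": ["CL:0000066"],
    "subtypes": ["CL:0000313"],
}

report = compare_with_kb(extracted, kb_record, ["parents", "subtypes"])
for slot, diff in report.items():
    print(slot, diff)
```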