Adding generate-extract command, 158. Add cell type templates #159 #162

Merged
merged 1 commit into from
Jul 31, 2023

@cmungall cmungall commented Jul 31, 2023

This PR does two things:

Generate-Extract

ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"

This command does two things:

  1. asks GPT to generate a summary of the cell type
  2. parses/extracts knowledge from that generated summary
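
The two steps above can be sketched as a small pipeline. This is only an illustration of the generate-then-extract idea, not the OntoGPT implementation: `generate_extract`, `fake_generate`, and `fake_extract` are hypothetical names, and the stand-in functions replace the real GPT call and the real template-driven extraction.

```python
from typing import Callable, Dict, List


def generate_extract(
    term: str,
    generate: Callable[[str], str],
    extract: Callable[[str], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Hypothetical two-step pipeline: generate a summary, then extract from it."""
    # Step 1: ask the model for free text about the term.
    summary = generate(f"Describe the cell type: {term}")
    # Step 2: run structured extraction over the generated text.
    return extract(summary)


def fake_generate(prompt: str) -> str:
    # Stand-in for a language model call (assumption, no network access).
    return "Acinar cells of the salivary gland are epithelial cells."


def fake_extract(text: str) -> Dict[str, List[str]]:
    # Stand-in for template-driven extraction (assumption, keyword match only).
    parents = ["epithelial cell"] if "epithelial cell" in text.lower() else []
    return {"parents": parents}


result = generate_extract("Acinar Cell Of Salivary Gland", fake_generate, fake_extract)
print(result)
```

Passing the two steps in as callables keeps the sketch independent of any particular model or template.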

This resuscitates the original HALO idea. In principle, we could directly generate an entire knowledge base in structured form from the latent GPT KB.

Example output:

extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
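
When consuming output like this, it can help to separate identifiers with an ontology prefix from the `AUTO:` ones. The sketch below assumes, as the output suggests, that `AUTO:` marks a term that was not matched to an ontology identifier and that its local part is the percent-encoded label. The `split_grounded` helper is hypothetical, and the entity list is copied by hand from the example rather than parsed from YAML.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote

# Entities copied from the example output above.
named_entities = [
    {"id": "CL:0000066", "label": "Epithelial cell"},
    {"id": "CL:0000313", "label": "Serous cells"},
    {"id": "CL:0000319", "label": "Mucous cells"},
    {"id": "UBERON:0001044", "label": "Salivary gland"},
    {"id": "UBERON:0009842", "label": "Acinus"},
    {"id": "AUTO:Sj%C3%B6gren%27s%20syndrome", "label": "Sjögren's syndrome"},
    {"id": "MONDO:0021357", "label": "Salivary gland tumors"},
]


def split_grounded(entities: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (ids with an ontology prefix, decoded labels of AUTO: ids)."""
    grounded, ungrounded = [], []
    for entity in entities:
        prefix, _, local = entity["id"].partition(":")
        if prefix == "AUTO":
            # Assumption: the local part is a percent-encoded label.
            ungrounded.append(unquote(local))
        else:
            grounded.append(entity["id"])
    return grounded, ungrounded


grounded, ungrounded = split_grounded(named_entities)
print(len(grounded), ungrounded)
```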

Cell Type Templates

This PR also demonstrates using template subclasses for more refined cell subtypes.

Compare the two:

  1. ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"
  2. ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"

The first uses the generic base class. The second uses a subclass designed for interneurons, which has an extra slot for projection fields (projects_to_or_from in the example output).
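
The base-class/subclass relationship can be pictured with plain Python dataclasses. This is an analogy only: the actual templates are defined in the project's schema, not in code like this. `InterneuronDocument` and `projects_to_or_from` come from the command and output in this PR, while the base class name `CellTypeDocument` and the exact set of base slots are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CellTypeDocument:
    """Hypothetical generic base template (name and slot list are assumptions)."""
    cell_type: str = ""
    parents: List[str] = field(default_factory=list)
    subtypes: List[str] = field(default_factory=list)
    localizations: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)


@dataclass
class InterneuronDocument(CellTypeDocument):
    """Subclass that inherits every base slot and adds one for projections."""
    projects_to_or_from: List[str] = field(default_factory=list)


# Slots present on the subclass but not on the base class.
base_names = {f.name for f in fields(CellTypeDocument)}
extra_slots = [f.name for f in fields(InterneuronDocument) if f.name not in base_names]
print(extra_slots)
```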

Example output:

extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary
    Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
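
Note that slots the generated text did not cover come back as "Not mentioned", either as a plain value (`range`) or as an `AUTO:Not%20mentioned` entity (`subtypes`, `genes`). A consumer may want to drop these before loading the result anywhere. The following is a minimal post-processing sketch, not part of this PR; `is_placeholder` and `drop_placeholders` are hypothetical helpers, and the dictionary is copied by hand from the example output.

```python
from typing import Dict, List, Union
from urllib.parse import unquote

Slot = Union[str, List[str]]

# Extracted object copied from the example output above.
extracted: Dict[str, Slot] = {
    "cell_type": "L2/3 Intratelencephalic Projecting Glutamatergic Neuron "
                 "of the Primary Motor Cortex",
    "range": "Not mentioned",
    "parents": ["AUTO:excitatory%20neuron"],
    "subtypes": ["AUTO:Not%20mentioned"],
    "localizations": ["UBERON:0000956", "UBERON:0001384"],
    "genes": ["AUTO:Not%20mentioned"],
    "diseases": ["MONDO:0005180", "MONDO:0020128"],
    "projects_to_or_from": ["UBERON:0001893"],
}


def is_placeholder(value: str) -> bool:
    """True for 'Not mentioned', with or without the AUTO: prefix."""
    local = value.split(":", 1)[1] if value.startswith("AUTO:") else value
    return unquote(local).strip().lower() == "not mentioned"


def drop_placeholders(obj: Dict[str, Slot]) -> Dict[str, Slot]:
    """Remove placeholder values; omit slots that end up empty."""
    cleaned: Dict[str, Slot] = {}
    for slot, value in obj.items():
        if isinstance(value, list):
            kept = [v for v in value if not is_placeholder(v)]
            if kept:
                cleaned[slot] = kept
        elif not is_placeholder(value):
            cleaned[slot] = value
    return cleaned


cleaned = drop_placeholders(extracted)
print(sorted(cleaned))
```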

This PR does two things:

- Adds a combined generate-extract command, fixes #158
- Adds cell type templates, fixes #159
@cmungall cmungall requested a review from caufieldjh July 31, 2023 21:32
@cmungall cmungall merged commit 90d3eaa into main Jul 31, 2023
2 checks passed
Successfully merging this pull request may close these issues.

- Add a cell type template
- Add a generate-extract command for parsing the results of generated text