Add a generate-extract command for parsing the results of generated text #158

Closed
cmungall opened this issue Jul 31, 2023 · 0 comments · Fixed by #162

Use case: generate a description of a concept (e.g. a cell type) entirely from an LLM's "latent knowledge base", and then extract structured knowledge from it, thus bypassing the need for an incomplete PubMed search.

Could also be used as a kind of validation procedure on generated text - compare extracted knowledge with what is in KB. The difference is either hallucination or KB gaps.
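
A minimal sketch of that comparison, assuming both the extracted knowledge and the KB can be reduced to sets of identifiers. The function name `compare_with_kb` and the contents of the `kb` set are hypothetical and exist only for illustration:

```python
def compare_with_kb(extracted: set[str], kb: set[str]) -> dict[str, set[str]]:
    """Split extracted assertions by whether the KB already holds them."""
    return {
        # Found in both the generated text and the KB.
        "confirmed": extracted & kb,
        # Found only in the generated text: hallucination or a KB gap.
        "unverified": extracted - kb,
        # Found only in the KB: the generated text did not mention it.
        "not_generated": kb - extracted,
    }


# Hypothetical inputs; the kb set is made up for this example.
extracted = {"CL:0000066", "CL:0000313", "CL:0000319"}
kb = {"CL:0000066", "CL:0000313"}

report = compare_with_kb(extracted, kb)
print(sorted(report["unverified"]))
```

The set difference alone cannot say whether an unverified entry is a hallucination or a KB gap; that still needs review.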

cmungall added a commit that referenced this issue Jul 31, 2023
This PR does two things:

- Add a combined generate-extract command, fixes #158
- Adds cell type templates, fixes #159

## Generate-Extract

`ontogpt generate-extract -m gpt-4 -t cell_type "Acinar Cell Of Salivary Gland"`

This does two things:

1. asks GPT to generate a summary of the cell type
2. parses/extracts knowledge from that summary
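
The two steps can be sketched as a simple composition. This is not the OntoGPT implementation: `generate_extract`, `generate`, and `extract` are hypothetical names, and the stubs below stand in for the LLM call and the template-driven extraction:

```python
from typing import Callable


def generate_extract(
    concept: str,
    generate: Callable[[str], str],
    extract: Callable[[str], dict],
) -> dict:
    """Generate free text about a concept, then extract structure from it."""
    text = generate(concept)  # step 1: LLM writes a summary
    return extract(text)      # step 2: the summary is parsed into slots


# Stub callables so the sketch runs without an LLM (assumptions only).
def fake_generate(concept: str) -> str:
    return f"{concept} is a kind of epithelial cell."


def fake_extract(text: str) -> dict:
    subject, _, parent = text.rstrip(".").partition(" is a kind of ")
    return {"cell_type": subject, "parents": [parent]}


result = generate_extract("Acinar cell", fake_generate, fake_extract)
print(result)
```

Passing the two steps in as callables keeps the generation model and the extraction template independent of each other.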

This resuscitates the original HALO idea. We could in principle
**directly generate an entire knowledgebase in structured form from the
latent GPT KB**.

Example output:

```yaml
extracted_object:
  cell_type: Acinar cell of a salivary gland
  parents:
    - CL:0000066
  subtypes:
    - CL:0000313
    - CL:0000319
  localizations:
    - UBERON:0001044
    - UBERON:0009842
  diseases:
    - AUTO:Sj%C3%B6gren%27s%20syndrome
    - MONDO:0021357
named_entities:
  - id: CL:0000066
    label: Epithelial cell
  - id: CL:0000313
    label: Serous cells
  - id: CL:0000319
    label: Mucous cells
  - id: UBERON:0001044
    label: Salivary gland
  - id: UBERON:0009842
    label: Acinus
  - id: AUTO:Sj%C3%B6gren%27s%20syndrome
    label: Sjögren's syndrome
  - id: MONDO:0021357
    label: Salivary gland tumors
```
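
In the output above, most identifiers are ontology CURIEs (`CL:`, `UBERON:`, `MONDO:`), while `AUTO:` entries appear to be terms that were not matched to an ontology and carry a URL-encoded label instead. A small sketch for separating the two, assuming that reading of the `AUTO:` prefix (`split_grounded` is a hypothetical helper):

```python
from urllib.parse import unquote


def split_grounded(ids: list[str]) -> tuple[list[str], list[str]]:
    """Separate ontology CURIEs from AUTO: entries, decoding the latter."""
    grounded, ungrounded = [], []
    for curie in ids:
        prefix, _, local = curie.partition(":")
        if prefix == "AUTO":
            # Assumption: AUTO: marks an ungrounded term; decode its label.
            ungrounded.append(unquote(local))
        else:
            grounded.append(curie)
    return grounded, ungrounded


# The diseases list from the example output above.
diseases = ["AUTO:Sj%C3%B6gren%27s%20syndrome", "MONDO:0021357"]
grounded, ungrounded = split_grounded(diseases)
print(grounded, ungrounded)
```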

## Cell Type Templates

This PR also demonstrates using subclasses for more refined subtypes.

Compare the two:

1. `ontogpt generate-extract -m gpt-4 -t cell_type "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`
2. `ontogpt generate-extract -m gpt-4 -t cell_type.InterneuronDocument "L2/3 Intratelencephalic Projecting Glutamatergic Neuron Of The Primary Motor Cortex"`

The first uses the generic base class. The second uses a subclass
designed for interneurons, which adds a slot for projection fields.

Example output:

```yaml
extracted_object:
  cell_type: L2/3 Intratelencephalic Projecting Glutamatergic Neuron of the Primary
    Motor Cortex
  range: Not mentioned
  parents:
    - AUTO:excitatory%20neuron
  subtypes:
    - AUTO:Not%20mentioned
  localizations:
    - UBERON:0000956
    - UBERON:0001384
  genes:
    - AUTO:Not%20mentioned
  diseases:
    - MONDO:0005180
    - MONDO:0020128
  projects_to_or_from:
    - UBERON:0001893
named_entities:
  - id: UBERON:0001893
    label: telencephalon
  - id: AUTO:excitatory%20neuron
    label: excitatory neuron
  - id: AUTO:Not%20mentioned
    label: Not mentioned
  - id: UBERON:0000956
    label: cerebral cortex
  - id: UBERON:0001384
    label: primary motor cortex
  - id: MONDO:0005180
    label: Parkinson's disease
  - id: MONDO:0020128
    label: motor neuron disease
```
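
The base class and subclass relationship described above can be pictured with plain Python dataclasses. This is only an analogy, not the actual template definition: `CellTypeDocument` is a hypothetical name for the base class, the base slots are taken from the first example output, and `projects_to_or_from` is assumed to be the extra slot because it appears only in the second output:

```python
from dataclasses import dataclass, field, fields


@dataclass
class CellTypeDocument:  # hypothetical name for the generic base class
    cell_type: str
    parents: list[str] = field(default_factory=list)
    subtypes: list[str] = field(default_factory=list)
    localizations: list[str] = field(default_factory=list)
    diseases: list[str] = field(default_factory=list)


@dataclass
class InterneuronDocument(CellTypeDocument):
    # Assumed extra slot, inherited slots stay unchanged.
    projects_to_or_from: list[str] = field(default_factory=list)


base_slots = {f.name for f in fields(CellTypeDocument)}
sub_slots = {f.name for f in fields(InterneuronDocument)}
extra_slots = sub_slots - base_slots
print(extra_slots)
```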
ruchira pushed a commit to ruchira/OntoLLM that referenced this issue Aug 10, 2023
…-initiative#159
