# GitHub-assisted workflow

This walkthrough uses an index of an ontology's GitHub issue tracker as background knowledge
when adding new ontology terms.

In [6]:
!curategpt ontology index -m openai: -c terms_obi sqlite:obo:obi

__TODO__: pin this to a specific version of OBI. If OBI later adds the term described here, the demo will no longer make sense.

## Check the indexing worked

We'll query for the top 3 terms most similar to "magnetoencephalography":

In [11]:
!curategpt search -l 3 -c terms_obi magnetoencephalography

## 1 DISTANCE: 0.1318223476409912
id: Electroencephalography
label: electroencephalography
definition: An extracellular electrophysiology assay where electrodes are mounted
  outside the brain (either on the surface of the scalp on onto the brain surface
  itself during surgery) to measure the electrical field over the external surface.
relationships:
- predicate: rdfs:subClassOf
  target: ExtracellularElectrophysiologyRecordingAssay

## 2 DISTANCE: 0.2989899218082428
id: MagneticResonanceImagingAssay
label: magnetic resonance imaging assay
definition: An imaging assay in which nuclear magnetic resonance is used to produce
  information about the interior structure and composition of an input material entity.
relationships:
- predicate: rdfs:subClassOf
  target: ImagingAssay

## 3 DISTANCE: 0.3554028570652008
id: Immunoelectrophoresis
label: immunoelectrophoresis
definition: An electrophoresis that separates and characterize proteins based on reaction
  with antibodies.
relationships:


Note that the results intentionally omit the original ontology IDs. Instead, CamelCase versions of the labels
are used as IDs (with additional uniquification). We'll return to this.
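
To make the ID convention concrete, here is a minimal Python sketch of how a label could be turned into a CamelCase ID and made unique. This is not CurateGPT's actual implementation: the helper names `camel_case_id` and `uniquify` are hypothetical, and the numeric-suffix scheme is an assumption.

```python
import re


def camel_case_id(label):
    """Turn a label into a CamelCase identifier (hypothetical helper)."""
    # Split on anything that is not a letter or digit, then capitalise each word.
    words = re.findall(r"[A-Za-z0-9]+", label)
    return "".join(w[:1].upper() + w[1:] for w in words)


def uniquify(candidate, seen):
    """Append a numeric suffix if the identifier is already taken (assumed scheme)."""
    unique, n = candidate, 1
    while unique in seen:
        n += 1
        unique = f"{candidate}{n}"
    seen.add(unique)
    return unique


seen = set()
labels = [
    "electroencephalography",
    "magnetic resonance imaging assay",
    "electroencephalography",  # duplicate label, to exercise uniquify
]
ids = [uniquify(camel_case_id(label), seen) for label in labels]
print(ids)
```

The first two IDs match those shown in the search results above. The suffix on the duplicate only illustrates one possible uniquification strategy.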

## Index the issue tracker

In [3]:
!curategpt index -c gh_obi obi-issues/*.json 
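
The notebook does not show how the `obi-issues/*.json` files were produced. As a rough sketch, assuming the issues were first exported as a single JSON list (for example with the GitHub CLI) and that one file per issue is wanted, a split could look like the following. The helper `split_issues` and the `obi-issues-demo` directory are hypothetical, and the example record only reuses field names visible in the search output in this notebook.

```python
import json
from pathlib import Path


def split_issues(issues, out_dir):
    """Write each issue dict to <out_dir>/<number>.json and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for issue in issues:
        path = out / f"{issue['number']}.json"
        path.write_text(json.dumps(issue, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


# Stand-in for an exported issue list (assumption: the real export is a JSON array).
example_issues = [
    {
        "number": 1000,
        "state": "OPEN",
        "title": "NTR: magnetoencephalography",
        "closedAt": None,
        "comments": [],
    }
]

paths = split_issues(example_issues, "obi-issues-demo")
print([p.name for p in paths])
```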

In [4]:
!curategpt search -l 2 -c gh_obi magnetoencephalography | perl -npe 's@login:.*@@'

## 0 DISTANCE: 0.4603792726993561
number: 1000
state: OPEN
title: 'NTR: magnetoencephalography'

## 0 DISTANCE: 1.0009130239486694
body: "NTR: magnetoencephalography\r\ndefinition: a functional neuroimaging technique\
  \ for mapping brain activity by recording magnetic fields produced by electrical\
  \ currents occurring naturally in the brain, using very sensitive magnetometers.\
  \ [Source: https://en.wikipedia.org/wiki/Magnetoencephalography]\r\nParent: extracellular\
  \ electrophysiology recording assay OBI:0000454\r\nsynonyms: MEG\r\n\r\nThe term\
  \ does exist in Ontology for MIRNA Target (OMIT:0016015), though that seems an inappropriate\
  \ place. "
closedAt: null
comments:
- id: MDEyOklzc3VlQ29tbWVudDQ2OTM0NDg3NA==
  author:
    
  authorAssociation: CONTRIBUTOR
  body: "Discussed on the OBI call 2019-03-04.\r\n\r\nWhile looking at this issue,\
    \ we noticed that 'extracellular electrophysiology recording assay' OBI:0000454`has_specified_output\
    \ some 'mass measu

## Create a term using GitHub issues as background knowledge

In [9]:
!curategpt create -c terms_obi --docstore-collection gh_obi -m gpt-4 "magnetoencephalography"

id: Magnetoencephalography
label: magnetoencephalography
definition: A functional neuroimaging technique for mapping brain activity by recording
  magnetic fields produced by electrical currents occurring naturally in the brain,
  using very sensitive magnetometers.
relationships:
- predicate: rdfs:subClassOf
  target: ExtracellularElectrophysiologyRecordingAssay



### Comparison without GitHub issues

In [10]:
!curategpt create -c terms_obi -m gpt-4 "magnetoencephalography"

id: Magnetoencephalography
label: magnetoencephalography
definition: A technique for mapping brain activity by recording magnetic fields produced
  by electrical currents occurring naturally in the brain, using very sensitive magnetometers.
relationships:
- predicate: rdfs:subClassOf
  target: BrainMapping



As can be seen, the two definitions are quite close. This is not surprising: the definition proposed in the GitHub issue is sourced from Wikipedia, which GPT-4 has already ingested. However, when the GitHub store is supplied as an additional document source, the opening wording (_A functional neuroimaging technique_) is taken directly from the modified form proposed in the GitHub issue.

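The influence of the issue tracker is more obvious in the suggested parent. With the GitHub store, the model proposes `ExtracellularElectrophysiologyRecordingAssay`, an existing term and the parent requested in the issue. Without it, the model proposes `BrainMapping`.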
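
To see exactly which words the GitHub issue contributed, the two generated definitions can be compared word by word. The following is a small sketch using Python's standard `difflib`. The two strings are copied from the outputs above, and the variable names are illustrative.

```python
import difflib

with_issues = (
    "A functional neuroimaging technique for mapping brain activity by recording "
    "magnetic fields produced by electrical currents occurring naturally in the brain, "
    "using very sensitive magnetometers."
)
without_issues = (
    "A technique for mapping brain activity by recording magnetic fields produced "
    "by electrical currents occurring naturally in the brain, using very sensitive magnetometers."
)

# Compare as word sequences so the diff is reported in whole words.
a, b = without_issues.split(), with_issues.split()
matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)

# Collect the word runs that appear only in the definition generated with issues.
added = [
    " ".join(b[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag in ("insert", "replace")
]
print(added)
```

The only difference reported is the phrase that also appears in the definition proposed in the issue.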