# Create an Index over JSON files

CurateGPT depends on *indexes* for many operations. These are used to provide context for LLM queries.

CurateGPT contains *wrappers* for many common APIs and data formats, but if a wrapper doesn't
exist you can always convert your data to JSON files and load those.
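
As a rough illustration of the "convert to JSON files" step, the sketch below writes a list of in-memory records to one JSON file per record. The function name `write_records_as_json`, the `id` field, and the example records are all hypothetical; they are not part of CurateGPT, and your own data will need its own mapping.

```python
import json
import tempfile
from pathlib import Path


def write_records_as_json(records, out_dir, id_field="id"):
    """Write each record (a dict) to its own JSON file, named after its ID."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        # Replace path separators so the ID is safe to use as a file name.
        safe_name = str(record[id_field]).replace("/", "_")
        target = out_path / f"{safe_name}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written


# Made-up example records; real data would come from your own source.
example_records = [
    {"id": "patient-1", "sex": "FEMALE", "phenotypes": ["HP:0000662"]},
    {"id": "patient-2", "sex": "MALE", "phenotypes": ["HP:0007737"]},
]
example_dir = tempfile.mkdtemp()
example_files = write_records_as_json(example_records, example_dir)
print([f.name for f in example_files])
```

One file per record matches the layout of the directory used in the rest of this notebook, where each JSON file holds a single object.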

Assume we have a directory of JSON files:

In [1]:
!ls ../../data/phenopackets | head

Abdul_Wahab-2016-GCDH-Patient_5.json
Ajmal-2013-BBS1-IV-5_family_A.json
Al-Dosari-2010-TFAP2A-10-year-old_girl.json
Al-Hashmi-2018-SNX14-IV-1.json
Al-Qattan-2018-SCARF2-proband.json
Al-Semari-2013-FGD1-II-1.json
AlSubhi-2016-ALG9-IV_5.json
Alazami-2016-ATP6V1E1-Family_5_-_IV_2.json
Ali-2017-MYH3-proband.json
Alzahrani-2018-EPG5-18-month_son.json


Let's index these:

In [2]:
!curategpt index -c phenopackets_384 -m openai: --object-type Phenopacket \
    --description "Phenopackets from https://zenodo.org/record/3905420" ../../data/phenopackets/*.json
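
Before indexing a large directory it can help to confirm that every file parses as JSON. The notebook does not say how the `index` command handles malformed files, so the check below is only a precaution, and `find_invalid_json` is a hypothetical helper, not a CurateGPT function. The demo runs against a temporary directory rather than the real data.

```python
import json
import tempfile
from pathlib import Path


def find_invalid_json(directory):
    """Return (file name, error message) for every *.json file that fails to parse."""
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            problems.append((path.name, str(err)))
    return problems


# Demo on a throwaway directory containing one valid and one truncated file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "good.json").write_text('{"id": "P1"}', encoding="utf-8")
(demo_dir / "bad.json").write_text('{"id": ', encoding="utf-8")
invalid = find_invalid_json(demo_dir)
print([name for name, _ in invalid])
```

To run the same check on the data used here, you could pass `"../../data/phenopackets"` as the directory.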



## Search the index

In [3]:
!curategpt search -c phenopackets_384 "renal phenotypes"

## 1 DISTANCE: 0.37623695681691316
id: PMID:27886254-Zhong-2016-PRPF3-020001-II:4
subject:
  id: 020001-II:4
  ageAtCollection:
    age: adult
  sex: MALE
  taxonomy:
    id: NCBITaxon:9606
    label: Homo sapiens
phenotypicFeatures:
- type:
    id: HP:0000662
    label: Nyctalopia
  evidence:
  - evidenceCode:
      id: ECO:0000033
      label: author statement supported by traceable reference
    reference:
      id: PMID:27886254
      description: Two novel mutations in PRPF3 causing autosomal dominant retinitis
        pigmentosa
- type:
    id: HP:0007737
    label: Bone spicule pigmentation of the retina
  evidence:
  - evidenceCode:
      id: ECO:0000033
      label: author statement supported by traceable reference
    reference:
      id: PMID:27886254
      description: Two novel mutations in PRPF3 causing autosomal dominant retinitis
        pigmentosa
- type:
    id: HP:0001133
    label: Constriction of peripheral visual field
  evidenc
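
Each hit is reported with a `DISTANCE` value. The notebook does not state which metric or embedding settings sit behind that number, so the following is only a toy sketch of how distance-based ranking works in general, assuming cosine distance and hand-written two-dimensional vectors. It is not CurateGPT's implementation.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, doc_vecs):
    """Sort document IDs from the smallest distance to the largest."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy vectors standing in for embeddings (an assumption for illustration only).
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
ranking = rank_by_distance([1.0, 0.0], toy_docs)
for doc_id, distance in ranking:
    print(doc_id, round(distance, 4))
```

Under this assumption a smaller distance means a closer match, which is why the ranking is sorted in ascending order.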

In [4]:
!curategpt ask -c phenopackets_384 "what genes are associated with renal phenotypes?"

# Response:

The genes associated with renal phenotypes are KMT2A [1](#ref-1) and UMOD [2](#ref-2).


# Raw:

The genes associated with renal phenotypes are KMT2A [1] and UMOD [2].

# References:


## 1

```yaml
id: PMID:25186178-Zemojtel-2014-KMT2A-P1
subject:
  id: P1
  ageAtCollection:
    age: P3Y
  sex: FEMALE
  taxonomy:
    id: NCBITaxon:9606
    label: Homo sapiens
phenotypicFeatures:
- type:
    id: HP:0003508
    label: Proportionate short stature
  evidence:
  - evidenceCode:
      id: ECO:0000033
      label: author statement supported by traceable reference
    reference:
      id: PMID:25186178
      description: Effective diagnosis of genetic disease by computational phenotype
        analysis of the disease-associated genome
- type:
    id: HP:0007441
    label: Hyperpigmented/hypopigmented macules
  evidence:
  - evidenceCode:
      id: ECO:0000033
      label: author statement supported by traceable reference
    reference:
