# Finding Model Search Repository

## Setup

`SearchRepository` stores finding models in JSON format in a `defs` directory under a root directory, along with a LanceDB index (`index.lancedb`) in the root directory.

One option is to clone the [Open Imaging Finding Models repository](https://github.com/openimagingdata/findingmodels) and use its root directory for local finding model work.

In [1]:
import findingmodel as fm
from findingmodel.search_repository import SearchRepository

Here, we're going to use the `data` directory as the repository root; there are already a number of files there:
```text
$ tree
.
├── defs
│   ├── abdominal_aortic_aneurysm.fm.json
│   ├── aortic_dissection.fm.json
│   ├── breast_density.fm.json
│   ├── breast_malignancy_risk.fm.json
│   ├── pulmonary_embolism.fm.json
│   └── ventricular_diameters.fm.json
└── index.lancedb
```

> `index.lancedb` holds the [LanceDB](https://lancedb.github.io/lancedb) database that underlies the repository index.
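
Before pointing a repository at a directory, it can help to confirm that the layout matches what is described above. The following is a minimal sketch using only the standard library; `list_definition_files` is a hypothetical helper written for this example, not part of `findingmodel`, and it builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_definition_files(root: Path) -> list[str]:
    """Return the sorted names of the *.fm.json files under root/defs.

    Hypothetical helper: it only inspects the directory layout and does
    not load or validate the finding models themselves.
    """
    defs_dir = root / "defs"
    if not defs_dir.is_dir():
        raise FileNotFoundError(f"No 'defs' directory under {root}")
    return sorted(path.name for path in defs_dir.glob("*.fm.json"))


# Build a throwaway repository root with two empty definition files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "defs").mkdir()
    for name in ("pulmonary_embolism", "aortic_dissection"):
        (root / "defs" / f"{name}.fm.json").write_text("{}")
    found = list_definition_files(root)

print(found)
```

Pointing the same helper at `Path("data")` should list the six definition files shown in the tree above.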

In [2]:
repo = SearchRepository("data")

## Basic Retrieval

In [3]:
repo.model_ids

['OIFM_MSFT_134126',
 'OIFM_MSFT_156954',
 'OIFM_MSFT_356221',
 'OIFM_MSFT_367670',
 'OIFM_MSFT_573630',
 'OIFM_MSFT_932618']

`list_models()` returns an `Iterator[FindingModelFull]` after loading each finding model.

In [4]:
list(repo.list_models())

[FindingModelFull(oifm_id='OIFM_MSFT_134126', name='abdominal aortic aneurysm', description='An abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.', synonyms=['AAA'], tags=None, attributes=[ChoiceAttributeIded(oifma_id='OIFMA_MSFT_898601', name='presence', description='Presence or absence of abdominal aortic aneurysm', type=<AttributeType.CHOICE: 'choice'>, values=[ChoiceValueIded(value_code='OIFMA_MSFT_898601.0', name='absent', description='Abdominal aortic aneurysm is absent', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.1', name='present', description='Abdominal aortic aneurysm is present', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.2', name='indeterminate', description='Presence of abdominal aortic aneurysm cannot be determined', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.3',

Get a specific finding model with `get_model()`, which takes either a name or an OIFM ID as its parameter.

In [5]:
repo.get_model("abdominal aortic aneurysm")

FindingModelFull(oifm_id='OIFM_MSFT_134126', name='abdominal aortic aneurysm', description='An abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.', synonyms=['AAA'], tags=None, attributes=[ChoiceAttributeIded(oifma_id='OIFMA_MSFT_898601', name='presence', description='Presence or absence of abdominal aortic aneurysm', type=<AttributeType.CHOICE: 'choice'>, values=[ChoiceValueIded(value_code='OIFMA_MSFT_898601.0', name='absent', description='Abdominal aortic aneurysm is absent', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.1', name='present', description='Abdominal aortic aneurysm is present', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.2', name='indeterminate', description='Presence of abdominal aortic aneurysm cannot be determined', index_codes=None), ChoiceValueIded(value_code='OIFMA_MSFT_898601.3', 
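
Since `get_model()` accepts either a name or an OIFM ID, a lookup has to tell the two apart somehow. The sketch below shows one way this could be done. It is not the library's implementation: the ID pattern is inferred from the IDs shown in this notebook, the case-insensitive name match is an assumption, and `resolve_model_id` and `MODELS_BY_ID` are hypothetical names.

```python
import re
from typing import Optional

# Assumed ID shape, inferred from the IDs above: OIFM_<3-4 letters>_<6 digits>.
OIFM_ID_PATTERN = re.compile(r"^OIFM_[A-Z]{3,4}_[0-9]{6}$")

# Tiny stand-in for the repository index (ID -> model name).
MODELS_BY_ID = {
    "OIFM_MSFT_134126": "abdominal aortic aneurysm",
    "OIFM_MSFT_932618": "pulmonary embolism",
}


def resolve_model_id(name_or_id: str) -> Optional[str]:
    """Map a name or an OIFM ID to a known model ID, or None if not found."""
    if OIFM_ID_PATTERN.match(name_or_id):
        # Looks like an ID: accept it only if the index knows it.
        return name_or_id if name_or_id in MODELS_BY_ID else None
    # Otherwise treat it as a name (case-insensitive match is an assumption).
    wanted = name_or_id.casefold()
    for model_id, name in MODELS_BY_ID.items():
        if name.casefold() == wanted:
            return model_id
    return None


print(resolve_model_id("OIFM_MSFT_134126"))
print(resolve_model_id("Pulmonary Embolism"))
print(resolve_model_id("no such finding"))
```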

In [6]:
list(repo._models_path.iterdir())

[PosixPath('data/defs/ventricular_diameters.fm.json'),
 PosixPath('data/defs/breast_malignancy_risk.fm.json'),
 PosixPath('data/defs/pulmonary_embolism.fm.json'),
 PosixPath('data/defs/abdominal_aortic_aneurysm.fm.json'),
 PosixPath('data/defs/breast_density.fm.json'),
 PosixPath('data/defs/aortic_dissection.fm.json')]

## Save to Repository

We can save a new model to the repository. We can start with an ID-less `FindingModelBase`; when we save it, we will get back a `FindingModelFull` with IDs. Note that you need to provide a 3- or 4-letter source code (e.g., "MGB", "MSFT") to save an ID-less model.

In [7]:
new_model = fm.FindingModelBase(
    name="Test Model",
    description="A simple test finding model.",
    synonyms=["Test Synonym"],
    tags=["tag1", "tag2"],
    attributes=[
        fm.finding_model.ChoiceAttribute(
            name="Severity",
            values=[fm.finding_model.ChoiceValue(name="Mild"), fm.finding_model.ChoiceValue(name="Severe")],
            description="How severe is the finding?",
            required=True,
            max_selected=1,
        ),
        fm.finding_model.NumericAttribute(
            name="Size",
            description="Size of the finding.",
            minimum=1,
            maximum=10,
            unit="cm",
            required=False,
        ),
    ],
)

In [8]:
saved_model = repo.save_model(new_model, source="TEST")

In [9]:
print(saved_model.oifm_id)

OIFM_TEST_376162
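
The saved ID combines the `OIFM` prefix, the source code, and a six-digit number. As a rough illustration of how such an ID could be produced, here is a sketch that validates the 3- or 4-letter source code and draws random digits until it finds an unused ID. The random-draw strategy and the name `generate_oifm_id` are assumptions made for this example; the notebook does not show how `save_model()` picks the number.

```python
import random
import re

# The source code must be 3 or 4 letters, as noted above.
SOURCE_PATTERN = re.compile(r"^[A-Z]{3,4}$")


def generate_oifm_id(source: str, existing_ids: set[str], rng: random.Random) -> str:
    """Return an unused ID of the form OIFM_<SOURCE>_<6 digits> (hypothetical)."""
    source = source.strip().upper()
    if not SOURCE_PATTERN.match(source):
        raise ValueError(f"Source must be 3 or 4 letters, got {source!r}")
    while True:
        # Assumed strategy: draw random digits and retry on collision.
        candidate = f"OIFM_{source}_{rng.randrange(10**6):06d}"
        if candidate not in existing_ids:
            return candidate


existing = {"OIFM_MSFT_134126", "OIFM_MSFT_156954"}
new_id = generate_oifm_id("test", existing, random.Random())
print(new_id)
```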


In [10]:
print("\n".join(repo.model_names))

abdominal aortic aneurysm
aortic dissection
Breast density
Mammographic malignancy assessment
pulmonary embolism
Test Model
Ventricular diameters


In [11]:
repo._table.count_rows()

7

In [12]:
# Get all the files in the repository's definitions directory
def model_files_dir() -> list[str]:
    return [str(file) for file in repo._models_path.iterdir() if file.is_file() and file.suffix == ".json"]


print("\n".join(model_files_dir()))

data/defs/ventricular_diameters.fm.json
data/defs/breast_malignancy_risk.fm.json
data/defs/pulmonary_embolism.fm.json
data/defs/abdominal_aortic_aneurysm.fm.json
data/defs/breast_density.fm.json
data/defs/test_model.fm.json
data/defs/aortic_dissection.fm.json


## Delete from Repository

Remove a model from the repo (including deleting the model file in the `defs` directory) using `remove_model()`.

In [13]:
repo.remove_model(saved_model.oifm_id)

In [14]:
print("\n".join(repo.model_names))

abdominal aortic aneurysm
aortic dissection
Breast density
Mammographic malignancy assessment
pulmonary embolism
Ventricular diameters


In [15]:
print("\n".join(model_files_dir()))

data/defs/ventricular_diameters.fm.json
data/defs/breast_malignancy_risk.fm.json
data/defs/pulmonary_embolism.fm.json
data/defs/abdominal_aortic_aneurysm.fm.json
data/defs/breast_density.fm.json
data/defs/aortic_dissection.fm.json


## Check for Duplicate IDs

We can also check a `FindingModelFull` object for IDs (of either the model or its attributes) that are already used in the database. Any that are found are returned in a dictionary that maps each offending ID to the index entry of the model that already contains it.

In [16]:
saved_model.oifm_id = "OIFM_MSFT_134126"
errors = repo.check_model_for_duplicate_ids(saved_model)
print(errors)

{'OIFM_MSFT_134126': SearchIndexEntry(file='abdominal_aortic_aneurysm.fm.json', id='OIFM_MSFT_134126', name='abdominal aortic aneurysm', slug_name='abdominal_aortic_aneurysm', description='An abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.', synonyms=None, tags=None, index_text='abdominal aortic aneurysm\nAn abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.\nAttributes: presence; change from prior', attribute_names=['presence', 'change from prior'], attribute_ids=['OIFMA_MSFT_898601', 'OIFMA_MSFT_783072'], vector=FixedSizeList(dim=3072))}
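
The shape of that result, a dictionary from each offending ID to the entry that already owns it, can be sketched without the repository. The code below is a simplified illustration, not the library's implementation: `IndexEntry` is a hypothetical stand-in carrying only a few of the fields shown above, and `OIFMA_TEST_000001` is a made-up attribute ID.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical, simplified stand-in for the index entries shown above."""

    id: str
    name: str
    attribute_ids: list[str] = field(default_factory=list)


def find_duplicate_ids(candidate_ids: list[str], index: list[IndexEntry]) -> dict[str, IndexEntry]:
    """Map each candidate ID that is already in use to the entry that owns it."""
    owners: dict[str, IndexEntry] = {}
    for entry in index:
        owners[entry.id] = entry
        for attribute_id in entry.attribute_ids:
            owners[attribute_id] = entry
    return {cid: owners[cid] for cid in candidate_ids if cid in owners}


index = [
    IndexEntry(
        id="OIFM_MSFT_134126",
        name="abdominal aortic aneurysm",
        attribute_ids=["OIFMA_MSFT_898601", "OIFMA_MSFT_783072"],
    )
]

# One model ID that collides and one made-up attribute ID that does not.
duplicates = find_duplicate_ids(["OIFM_MSFT_134126", "OIFMA_TEST_000001"], index)
for duplicate_id, owner in duplicates.items():
    print(f"{duplicate_id} is already used by '{owner.name}'")
```

An empty dictionary means none of the candidate IDs are in use.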


You can also check a model ID or attribute ID directly with `check_existing_id()`, which returns a list of the index entries that already use that ID, or an empty list if the ID is unused.

In [17]:
error = repo.check_existing_id("OIFM_MSFT_134126")
print(error)
error = repo.check_existing_id("OIFM_TEST_701203")
print(error)

[SearchIndexEntry(file='abdominal_aortic_aneurysm.fm.json', id='OIFM_MSFT_134126', name='abdominal aortic aneurysm', slug_name='abdominal_aortic_aneurysm', description='An abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.', synonyms=None, tags=None, index_text='abdominal aortic aneurysm\nAn abdominal aortic aneurysm (AAA) is a localized dilation of the abdominal aorta, typically defined as a diameter greater than 3 cm, which can lead to rupture and significant morbidity or mortality.\nAttributes: presence; change from prior', attribute_names=['presence', 'change from prior'], attribute_ids=['OIFMA_MSFT_898601', 'OIFMA_MSFT_783072'], vector=FixedSizeList(dim=3072))]
[]


## Search

In [18]:
SEARCH_TERMS = ["heart", "breast", "abdomen", "lung"]
for term in SEARCH_TERMS:
    print(f"Searching for '{term}'")
    for summary in repo.search_summary(term):
        print(f"  {summary.name} - {summary.id} - {summary.score:.3f}")


Searching for 'heart'
  Ventricular diameters - OIFM_MSFT_367670 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
Searching for 'breast'
  Breast density - OIFM_MSFT_356221 - 0.016
  Mammographic malignancy assessment - OIFM_MSFT_156954 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
Searching for 'abdomen'
  abdominal aortic aneurysm - OIFM_MSFT_134126 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
Searching for 'lung'
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
  abdominal aortic aneurysm - OIFM_MSFT_134126 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016


In [19]:
for term in SEARCH_TERMS:
    print(f"Searching for '{term}'")
    for model, score in repo.search_models(term):
        print(f"  {model.name} - {model.oifm_id} - {score:.3f}")

Searching for 'heart'
  Ventricular diameters - OIFM_MSFT_367670 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
Searching for 'breast'
  Breast density - OIFM_MSFT_356221 - 0.016
  Mammographic malignancy assessment - OIFM_MSFT_156954 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
Searching for 'abdomen'
  abdominal aortic aneurysm - OIFM_MSFT_134126 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
Searching for 'lung'
  pulmonary embolism - OIFM_MSFT_932618 - 0.016
  abdominal aortic aneurysm - OIFM_MSFT_134126 - 0.016
  aortic dissection - OIFM_MSFT_573630 - 0.016
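
Every score above prints as `0.016`, so the displayed value does not distinguish a strong match from a weak one. The notebook does not say how scores are computed. One scheme that would produce this pattern is reciprocal rank fusion (RRF) with the commonly used constant `k = 60`, where a result's score depends only on its rank. This is an assumption for illustration, not a statement about how `SearchRepository` ranks results; `rrf_score` is a hypothetical helper.

```python
def rrf_score(ranks: list[int], k: int = 60) -> float:
    """Reciprocal rank fusion: sum of 1 / (k + rank) over each ranked list.

    `ranks` holds the 1-based rank of one result in each list being fused.
    """
    return sum(1.0 / (k + rank) for rank in ranks)


# A result ranked 1st, 2nd, or 3rd in a single list:
for rank in (1, 2, 3):
    print(f"rank {rank}: {rrf_score([rank]):.3f}")
# rank 1: 0.016
# rank 2: 0.016
# rank 3: 0.016
```

If scoring does work this way, the order of the results carries more information than the score printed to three decimal places.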
