# Dermatology dataset

See `README.md` for installation and usage instructions.

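This notebook pre-processes the dermatology data provided by [1, 2]. It pads the selectors, converts them into rankings, aggregates the rankings into plausibilities with inverse rank normalization (IRN), and saves the result to `data/dermatology_data.pkl`.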

```
[1] Stutz, D., Roy, A.G., Matejovicova, T., Strachan, P., Cemgil, A.T.,
    & Doucet, A. (2023).
    Conformal prediction under ambiguous ground truth. ArXiv, abs/2307.09302.
[2] Stutz, D., Cemgil, A.T., Roy, A.G., Matejovicova, T., Barsbey, M.,
    Strachan, P., Schaekermann, M., Freyberg, J.V., Rikhye, R.V., Freeman, B.,
    Matos, J.P., Telang, U., Webster, D.R., Liu, Y., Corrado, G.S., Matias, Y.,
    Kohli, P., Liu, Y., Doucet, A., & Karthikesalingam, A. (2023).
    Evaluating AI systems under uncertain ground truth: a case study in
    dermatology. ArXiv, abs/2307.02191.
```

## Imports

In [None]:
import numpy as np
import os
import json
import pickle

In [None]:
# Helper modules shipped with this repository (see `README.md`).
import formats
import selectors_utils
import irn

## Data

In [None]:
# Load the raw selector annotations.
with open('data/dermatology_selectors.json', 'r') as f:
    selectors = json.load(f)

In [None]:
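# Pad the selectors (padding size 10).
padded_selectors = selectors_utils.pad_selectors(selectors, 10)
# Convert the padded selectors into rankings and ranking groups.
rankings, groups = formats.convert_selectors_to_rankings(padded_selectors, 419)
# Aggregate the rankings into plausibilities with inverse rank normalization (IRN).
plausibilities = irn.aggregate_irn(rankings, groups)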
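To illustrate the idea behind IRN, here is a minimal standalone sketch; it is not the repository's `irn.aggregate_irn`, which operates on the array outputs of `formats.convert_selectors_to_rankings`. It assumes a hypothetical toy input format: each reader gives an ordered list of groups of class indices, the classes in group `r` (1-based) all receive weight `1/r` (an assumed tie rule), each reader's weights are normalized to sum to one, and the normalized vectors are averaged over readers who ranked at least one class.

```python
import numpy as np


def reader_irn(ranked_groups, num_classes):
    """Hypothetical IRN weights for one reader.

    ranked_groups[r - 1] holds the class indices tied at rank r.
    Assumption: every class in a tied group gets weight 1 / r.
    """
    weights = np.zeros(num_classes)
    for rank, group in enumerate(ranked_groups, start=1):
        for c in group:
            weights[c] = 1.0 / rank
    # Normalize so this reader's weights sum to one.
    return weights / weights.sum()


def aggregate_irn_sketch(readers, num_classes):
    """Average the normalized per-reader weights (hypothetical helper)."""
    # Skip readers who did not rank any class.
    per_reader = [reader_irn(r, num_classes) for r in readers if any(r)]
    if not per_reader:
        return np.zeros(num_classes)
    return np.mean(per_reader, axis=0)


# Toy example with 4 classes and 3 readers (made-up values).
readers = [
    [[2], [0]],  # class 2 at rank 1, class 0 at rank 2
    [[0, 1]],    # classes 0 and 1 tied at rank 1
    [],          # no selection; ignored
]
plaus = aggregate_irn_sketch(readers, num_classes=4)
print(plaus)  # approx. [0.4167 0.25 0.3333 0.]
```

Averaging normalized per-reader vectors keeps the result a probability distribution over classes, so it can be read directly as per-class plausibilities.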

In [None]:
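# Bundle the aggregated plausibilities with the raw selectors and the
# intermediate rankings and groups, then save everything as one pickle.
data = {
    'test_irn': plausibilities,
    'test_selectors': selectors,
    'test_rankings': rankings,
    'test_groups': groups,
}
with open('data/dermatology_data.pkl', 'wb') as f:
    pickle.dump(data, f)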
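Downstream code can read the saved file back with `pickle.load`. Below is a minimal sketch using a hypothetical helper, `load_dermatology_data`, that also checks for the four keys written above. So that it runs without the real data, the example round-trips a dict of toy placeholder values through a temporary directory. Because unpickling can execute arbitrary code, only load pickle files you trust.

```python
import os
import pickle
import tempfile

# Keys written by the notebook cell above.
EXPECTED_KEYS = {'test_irn', 'test_selectors', 'test_rankings', 'test_groups'}


def load_dermatology_data(path):
    """Hypothetical helper: load the pickled dict and check its keys."""
    with open(path, 'rb') as f:
        data = pickle.load(f)
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise KeyError(f'missing keys: {sorted(missing)}')
    return data


# Self-contained round trip with placeholder values (not real data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'dermatology_data.pkl')
    toy = {key: None for key in EXPECTED_KEYS}
    with open(path, 'wb') as f:
        pickle.dump(toy, f)
    loaded = load_dermatology_data(path)

print(sorted(loaded))
```

On the real output, the helper would be called as `load_dermatology_data('data/dermatology_data.pkl')`.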