# eligibility_criteria_parser

> Repository with experiments on the usability of prompt learning for parsing eligibility criteria in clinical trials

## Install

To install the module, run the following commands:

```sh
bash$ git clone https://github.com/megaduks/criteria_parser.git

bash$ cd criteria_parser

bash$ pip install -r requirements.txt

bash$ pip install -e '.[dev]'
```

Next, run `dvc` to download the data:

```bash
bash$ dvc pull
```

## How to use

Here is how to load the Chia dataset for easy processing:

In [None]:
from eligibility_criteria_parser.core import *

df = load_eligibility_criteria()

In [None]:
df.head()

| | ct_no | criteria | mode | drugs | persons | procedures | conditions | devices | visits | scopes | observations | measurements |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NCT03124329 | Male and female individuals between ages of 18... | inclusion | | [ages] | | [gingival recession defects, recession defects] | | | | [cervical restorations extending to the CEJ] | [recession, keratinized gingiva, Miller] |
| 1 | NCT02796378 | Elevated blood-cholesterol | inclusion | | | | | | | | | [blood-cholesterol] |
| 2 | NCT03216967 | Adult patients Kidney transplant recipients Pa... | inclusion | [calcineurin inhibitor, mycophenolic acid] | [Adult] | | | | | | | [Viremia, pregnancy test, blood ß-HCG dosage] |
| 3 | NCT02200978 | Patients less than 16 years old with newly dia... | inclusion | | [old] | | [acute promyelocytic leukemia] | | | | | [PML-RARa] |
| 4 | NCT01314898 | Male and/or female healthy volunteers, age 18 ... | inclusion | | [Male, female, age, Females] | | [healthy, childbearing potential] | | | | | [Body Mass Index (BMI), total body weight] |


In [None]:
df.columns

Index(['ct_no', 'criteria', 'mode', 'drugs', 'persons', 'procedures',
       'conditions', 'devices', 'visits', 'scopes', 'observations',
       'measurements'],
      dtype='object')

In [None]:
df.shape

(2000, 12)

The `entity_coverage` function computes the entity coverage metric:

In [None]:
from nbdev import show_doc

show_doc(entity_coverage)

---


### entity_coverage

>      entity_coverage (ents_true:List[str], ents_pred:List[str], mode:str,
>                       threshold:float=0.0)

Compute the compound metric of entity coverage in eligibility criteria

Args:

    ents_true: entities from Chia annotations

    ents_pred: predicted entities

    mode: which variant of the Jaccard coefficient to use (the examples below use "strict", "relaxed", "left" and "right")

    threshold: matches whose Jaccard coefficient does not exceed the threshold are scored as zero

For each annotated entity in a criterion, find the predicted entity that maximizes the Jaccard score.
Return the average Jaccard score over the matched entities and the fraction of annotated entities
for which a match was found.

In [None]:
ents_true = ['adult', 'no alcohol substance abuse', 'cardiovascular disease', 'elevated cholesterol']
ents_pred = ['adult man or woman', 'no alcohol usage during last year', 'high blood pressure']

In [None]:
entity_coverage(ents_true=ents_true, ents_pred=ents_pred, mode="strict")

(0.25, 0.5)

In [None]:
entity_coverage(ents_true=ents_true, ents_pred=ents_pred, mode="relaxed")

(0.75, 0.5)

In [None]:
entity_coverage(ents_true=ents_true, ents_pred=ents_pred, mode="left")

(0.75, 0.5)

In [None]:
entity_coverage(ents_true=ents_true, ents_pred=ents_pred, mode="right")

(0.29166666666666663, 0.5)
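
The following is a minimal sketch of how such a metric could be computed. It is not the code from `core.py`. The names `jaccard_variant` and `entity_coverage_sketch` are hypothetical, and the per-mode denominators are an assumption: entities are split into lowercased whitespace tokens, and each mode divides the token overlap by a different set size. These definitions were chosen because they agree with the four example outputs above.

```python
from typing import List, Tuple


def jaccard_variant(true_ent: str, pred_ent: str, mode: str) -> float:
    """Token-overlap score between an annotated and a predicted entity.

    Assumption: entities are compared as sets of lowercased whitespace tokens.
    """
    a = set(true_ent.lower().split())
    b = set(pred_ent.lower().split())
    if not a or not b:
        return 0.0
    denominators = {
        "strict": len(a | b),            # classic Jaccard: overlap / union
        "relaxed": min(len(a), len(b)),  # overlap relative to the shorter entity
        "left": len(a),                  # overlap relative to the annotated entity
        "right": len(b),                 # overlap relative to the predicted entity
    }
    if mode not in denominators:
        raise ValueError(f"unknown mode: {mode!r}")
    return len(a & b) / denominators[mode]


def entity_coverage_sketch(
    ents_true: List[str],
    ents_pred: List[str],
    mode: str,
    threshold: float = 0.0,
) -> Tuple[float, float]:
    """Return (mean best score over matched entities, fraction of matched entities)."""
    if not ents_true:
        return 0.0, 0.0
    # Best-scoring prediction for every annotated entity.
    best = [
        max((jaccard_variant(t, p, mode) for p in ents_pred), default=0.0)
        for t in ents_true
    ]
    # Only scores strictly above the threshold count as matches.
    matched = [score for score in best if score > threshold]
    mean_score = sum(matched) / len(matched) if matched else 0.0
    return mean_score, len(matched) / len(ents_true)


ents_true = ['adult', 'no alcohol substance abuse', 'cardiovascular disease', 'elevated cholesterol']
ents_pred = ['adult man or woman', 'no alcohol usage during last year', 'high blood pressure']

for mode in ("strict", "relaxed", "left", "right"):
    print(mode, entity_coverage_sketch(ents_true, ents_pred, mode=mode))
```

In this sketch, raising `threshold` removes weak matches from both returned values. For example, with `mode="strict"` and `threshold=0.3`, neither of the two 0.25 matches counts, so the result is `(0.0, 0.0)`.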
