#### LLM Model Comparison

A small comparison of out-of-the-box (OOB) large language models, sourced from the Hugging Face Hub, on a simple task: free-text anatomic classification of sentences from the findings section of radiology reports.
Performance is measured as the model's classification accuracy across the following categories:
```
LUNG/PLEURA/LARGE AIRWAYS
VESSELS
HEART
MEDIASTINUM AND HILA
CHEST WALL AND LOWER NECK
LIVER
BILE DUCTS
GALLBLADDER
PANCREAS
SPLEEN
ADRENAL GLANDS
KIDNEYS AND URETERS
BLADDER
REPRODUCTIVE ORGANS
BOWEL
VESSELS
PERITONEUM/RETROPERITONEUM/LYMPH NODES
BONE AND SOFT TISSUE
MISCELLANEOUS
```

The `MISCELLANEOUS` category usually covers verbiage about comparison with previous studies, the type of CXR study performed, comments about the patient's clinical history, etc.

Classification is carried out on a subset of CXR reports sourced from the MIMIC-CXR database.
MIMIC-CXR comprises ~220k CXR studies from patients seen at BIDMC between 2011 and 2016.
This analysis uses the reports of 50 patients (a little over 100 reports in total).
Ground-truth labels were assigned by hand at the sentence level.
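Classification accuracy here is the fraction of hand-labeled sentences whose predicted category matches the annotation. Because many of the abdominal categories will rarely, if ever, appear in chest X-ray reports, a per-category breakdown (effectively recall per gold label) is also worth reporting alongside the overall number. A minimal sketch, assuming predictions and gold labels are parallel lists of category strings; `accuracy_report` is a hypothetical helper, not part of the original notebook:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accuracy_report(gold: List[str], pred: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and per-category accuracy (recall per gold label)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no examples to score")

    correct = defaultdict(int)  # correct predictions per gold category
    total = defaultdict(int)    # number of examples per gold category
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += int(g == p)

    overall = sum(correct.values()) / len(gold)
    per_category = {cat: correct[cat] / total[cat] for cat in total}
    return overall, per_category


# Illustrative labels, not taken from the dataset.
overall, per_cat = accuracy_report(
    ["HEART", "HEART", "LUNG/PLEURA/LARGE AIRWAYS", "BONE AND SOFT TISSUE"],
    ["HEART", "VESSELS", "LUNG/PLEURA/LARGE AIRWAYS", "BONE AND SOFT TISSUE"],
)
print(overall)  # 0.75
print(per_cat)  # {'HEART': 0.5, 'LUNG/PLEURA/LARGE AIRWAYS': 1.0, 'BONE AND SOFT TISSUE': 1.0}
```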

#### Setup and Imports

In [1]:
from transformers import pipeline

In [2]:
# Create list of dictionaries for input/output examples

with open("../../data/eval_dataset_annotated.csv") as f:
    lines = f.readlines()

lines = [line.strip().replace("\n", "") for line in lines]
column_names = lines[0].split(",")
datadict = []
for line in lines[1:]:
    line = line.split(",")
    datadict.append({column_names[i]: line[i] for i in range(len(column_names))})

datadict[:3]

[{'filename': '/home/khans24/charit/anatomy_ner/mimic_cxr_reports/p10/p10394761/s53097934.txt',
  'patient_id': 'p10394761',
  'finding': 'PA and lateral chest views were obtained with patient in upright  position',
  'anatomic_classification': 'MISCELLANEOUS',
  'possible_secondary': ''},
 {'filename': '/home/khans24/charit/anatomy_ner/mimic_cxr_reports/p10/p10394761/s53097934.txt',
  'patient_id': 'p10394761',
  'finding': 'Analysis is performed in direct comparison with the next preceding  similar study of ___',
  'anatomic_classification': 'MISCELLANEOUS',
  'possible_secondary': ''},
 {'filename': '/home/khans24/charit/anatomy_ner/mimic_cxr_reports/p10/p10394761/s53097934.txt',
  'patient_id': 'p10394761',
  'finding': 'There is mild cardiac enlargement',
  'anatomic_classification': 'HEART',
  'possible_secondary': ''}]
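The cell above splits every line on commas, so a finding that itself contains a comma (and is therefore double-quoted in the CSV) gets cut short, and the rest of its text spills into the label columns. The sample prompt printed further down shows a finding that begins with a stray `"`, which is consistent with this. A minimal sketch of a more robust loader using the standard-library `csv` module, assuming the same header row as above; `load_annotations` is a hypothetical helper name:

```python
import csv
import io
from typing import Dict, List, TextIO


def load_annotations(f: TextIO) -> List[Dict[str, str]]:
    """Parse the annotated CSV into one dict per sentence.

    csv.DictReader honours double-quoted fields, so a finding such as
    "No effusion, no pneumothorax" stays in a single column.
    """
    return list(csv.DictReader(f))


# With the notebook's file, open it with newline="" as the csv docs recommend:
# with open("../../data/eval_dataset_annotated.csv", newline="") as f:
#     datadict = load_annotations(f)

# In-memory check with an illustrative row (not taken from the dataset).
sample = io.StringIO(
    "filename,patient_id,finding,anatomic_classification,possible_secondary\n"
    'a.txt,p1,"No effusion, no pneumothorax",LUNG/PLEURA/LARGE AIRWAYS,\n'
)
rows = load_annotations(sample)
print(rows[0]["finding"])                  # No effusion, no pneumothorax
print(rows[0]["anatomic_classification"])  # LUNG/PLEURA/LARGE AIRWAYS
```

Re-running the filter and the model cells on data loaded this way may change the example count reported below.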

In [3]:
final_dataset = [x for x in datadict if x["anatomic_classification"] != "MISCELLANEOUS"]
print(f"There are {len(final_dataset)} classification examples in the final dataset.")

There are 440 classification examples in the final dataset.
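Before prompting any model, it can help to check that every remaining label is one of the expected categories and to see how the examples are distributed; a label outside the allowed set usually points to a parsing or annotation problem. A minimal sketch, assuming rows shaped like the dictionaries above; `label_summary` is a hypothetical helper, and `allowed` would be the set of category names listed at the top of this notebook:

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def label_summary(rows: Iterable[Dict[str, str]], allowed: Set[str]) -> Tuple[Counter, List[str]]:
    """Count labels and collect any that are not in the allowed category set."""
    counts = Counter(row["anatomic_classification"] for row in rows)
    unexpected = sorted(label for label in counts if label not in allowed)
    return counts, unexpected


counts, unexpected = label_summary(
    [
        {"anatomic_classification": "HEART"},
        {"anatomic_classification": "HEART"},
        {"anatomic_classification": ' no pneumothorax"'},  # illustrative mis-parsed value
    ],
    allowed={"HEART", "LUNG/PLEURA/LARGE AIRWAYS"},
)
print(counts.most_common())  # [('HEART', 2), (' no pneumothorax"', 1)]
print(unexpected)            # [' no pneumothorax"']
```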


#### Testing Different Large Language Models

The following open-source models are available on the Hugging Face Hub and make a reasonable starting set:
- Nous Hermes
- MedAlpaca: an open-source LLaMA model fine-tuned on Anki flashcards and other medical-student study resources
- Meditron 7B, 70B
- Mistral 7B
- BioMedLM
- BioGPT
- BioGPT-Large
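To make the list above concrete, a sketch that maps each candidate to a Hub repository ID, which is what `from_pretrained` and `pipeline` take as the model argument. Apart from `medalpaca/medalpaca-7b`, which this notebook already uses, these IDs are assumptions and should be checked on the Hub before downloading; Nous Hermes in particular is published in several variants, and the 7B Llama 2 one is an arbitrary pick:

```python
# Assumed Hugging Face Hub repository IDs for the candidate models; verify each
# on https://huggingface.co before use.
CANDIDATE_MODELS = {
    "Nous Hermes": "NousResearch/Nous-Hermes-llama-2-7b",  # one of several variants
    "MedAlpaca": "medalpaca/medalpaca-7b",
    "Meditron 7B": "epfl-llm/meditron-7b",
    "Meditron 70B": "epfl-llm/meditron-70b",
    "Mistral 7B": "mistralai/Mistral-7B-v0.1",
    "BioMedLM": "stanford-crfm/BioMedLM",
    "BioGPT": "microsoft/biogpt",
    "BioGPT-Large": "microsoft/BioGPT-Large",
}

for name, repo_id in CANDIDATE_MODELS.items():
    print(f"{name:<14}{repo_id}")
```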

In [4]:
def alpaca_prompt_constructor(input_string):
    return (
        "### Instruction: Select from one of the following categories for the most relevant anatomy involved for the prompt sentence. The categories are (separated by comma): "
        "LUNG/PLEURA/LARGE AIRWAYS, VESSELS, HEART, MEDIASTINUM AND HILA, CHEST WALL AND LOWER NECK, LIVER, BILE DUCTS, GALLBLADDER, PANCREAS, "
        "SPLEEN, ADRENAL GLANDS, KIDNEYS AND URETERS, BLADDER, REPRODUCTIVE ORGANS, BOWEL, VESSELS, PERITONEUM/RETROPERITONEUM/LYMPH NODES, BONE AND SOFT TISSUE. "
        "if none of the above categories are relevant, output MISCELLANEOUS. "
        "output the above choice and nothing more. \n\n"
        f"### Input: {input_string} \n\n"
        "### Response: "
    )

print(alpaca_prompt_constructor(final_dataset[10]["finding"]))  # Test a sample

### Instruction: Select from one of the following categories for the most relevant anatomy involved for the prompt sentence. The categories are (separated by comma): LUNG/PLEURA/LARGE AIRWAYS, VESSELS, HEART, MEDIASTINUM AND HILA, CHEST WALL AND LOWER NECK, LIVER, BILE DUCTS, GALLBLADDER, PANCREAS, SPLEEN, ADRENAL GLANDS, KIDNEYS AND URETERS, BLADDER, REPRODUCTIVE ORGANS, BOWEL, VESSELS, PERITONEUM/RETROPERITONEUM/LYMPH NODES, BONE AND SOFT TISSUE. if none of the above categories are relevant, output MISCELLANEOUS. output the above choice and nothing more. 

### Input: "As there was no evidence of pleural effusion or other signs of  parenchymal infiltrates 

### Response: 
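The prompt asks the model to output a category "and nothing more", but generative models do not always comply: they may add punctuation, change case, or keep generating past the answer. A minimal sketch for mapping a raw reply back onto a category, assuming the prompt text has already been stripped from the generation; `parse_category` is a hypothetical helper, and falling back to `MISCELLANEOUS` when no category is found is a design choice, not something the notebook specifies:

```python
import re

# Category names from the prompt (VESSELS appears twice there; once is enough here).
CATEGORIES = [
    "LUNG/PLEURA/LARGE AIRWAYS", "VESSELS", "HEART", "MEDIASTINUM AND HILA",
    "CHEST WALL AND LOWER NECK", "LIVER", "BILE DUCTS", "GALLBLADDER", "PANCREAS",
    "SPLEEN", "ADRENAL GLANDS", "KIDNEYS AND URETERS", "BLADDER",
    "REPRODUCTIVE ORGANS", "BOWEL", "PERITONEUM/RETROPERITONEUM/LYMPH NODES",
    "BONE AND SOFT TISSUE", "MISCELLANEOUS",
]

# The lookarounds require a non-letter (or the string edge) on both sides, so
# "BLADDER" does not match inside "GALLBLADDER" and "LIVER" not inside "DELIVERED".
PATTERNS = [
    (cat, re.compile(r"(?<![A-Z])" + re.escape(cat) + r"(?![A-Z])")) for cat in CATEGORIES
]


def parse_category(generated: str, fallback: str = "MISCELLANEOUS") -> str:
    """Return the category mentioned earliest in the reply, or the fallback."""
    text = generated.upper()
    best, best_pos = fallback, None
    for cat, pattern in PATTERNS:
        match = pattern.search(text)
        if match and (best_pos is None or match.start() < best_pos):
            best, best_pos = cat, match.start()
    return best


print(parse_category(" Heart.\n\n### Input: next sentence"))  # HEART
print(parse_category("Gallbladder"))                            # GALLBLADDER
print(parse_category("I cannot tell."))                         # MISCELLANEOUS
```

Taking the earliest match favours the model's first answer when it rambles on; a stricter alternative is to accept only replies that exactly equal a category name and score everything else as wrong.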


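#### Testing llama.cpp

Without GPU access, the llama.cpp library (via its `llama-cpp-python` bindings) is currently the most practical option for running LLMs on CPU without substantial custom engineering. Loading MedAlpaca through `transformers` with 8-bit quantization, for example, fails on a CPU-only machine: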

In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "medalpaca/medalpaca-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 8-bit quantization (via bitsandbytes) requires a GPU, so on a CPU-only
# machine this call raises the error below.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)

RuntimeError: No GPU found. A GPU is needed for quantization.
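For CPU-only inference, a minimal sketch with the `llama-cpp-python` bindings, assuming a quantized GGUF conversion of the model is available locally (the file name below is hypothetical, and not every listed model necessarily has such a conversion). `Llama(model_path=..., n_ctx=...)` loads the model, and calling the object returns an OpenAI-style completion dict. To keep the sketch runnable without a model file, the import is deferred and `complete` accepts any callable with the same call convention; `load_llm` and `complete` are hypothetical helpers:

```python
from typing import Any, Callable, Dict


def load_llm(model_path: str, n_ctx: int = 2048) -> Any:
    """Load a GGUF model for CPU inference with llama-cpp-python."""
    # Imported here so the rest of the cell works even if the package is absent
    # (install with: pip install llama-cpp-python).
    from llama_cpp import Llama

    return Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)


def complete(llm: Callable[..., Dict[str, Any]], prompt: str, max_tokens: int = 32) -> str:
    """Temperature-0 completion that stops before the model opens a new '###' section."""
    out = llm(prompt, max_tokens=max_tokens, temperature=0.0, stop=["###"])
    return out["choices"][0]["text"].strip()


# Hypothetical usage; the GGUF file must exist locally, and the prompt helper and
# dataset come from the earlier cells:
# llm = load_llm("models/medalpaca-7b.Q4_K_M.gguf")
# print(complete(llm, alpaca_prompt_constructor(final_dataset[10]["finding"])))
```

`max_tokens` is set to 32 so that the longest category name, `PERITONEUM/RETROPERITONEUM/LYMPH NODES`, is unlikely to be truncated.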