This repository contains the SpAMCQA (Spondyloarthritis Multiple-Choice Question Answering) benchmark and the inference results from the paper:
Beyond Generalist LLMs: Building and Validating Domain-Specific Models with the SpAMCQA Benchmark
- SpAMCQA_Dataset_and_Inference_Logs.xlsx: Contains all 222 expert-validated questions used in our study, together with the inference outputs of the baseline model (GPT-4) and our specialized model (SpAD-LLM).
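
As a rough illustration of how the workbook listed above might be used, the sketch below reads one worksheet and scores a model's answers against the reference answers. The workbook's sheet layout and column names are not documented here, so `answer_col` and `prediction_col` are placeholders that must be replaced with the real headers, and the helper names `load_rows` and `accuracy` are hypothetical. Reading `.xlsx` files this way assumes `pandas` and `openpyxl` are installed.

```python
from typing import Any, Iterable, Mapping


def load_rows(path: str, sheet_name: Any = 0) -> list:
    """Read one worksheet into a list of dicts, one dict per row.

    Assumption: the first row of the sheet holds the column headers.
    """
    # Imported lazily so the scoring helper below works without pandas.
    import pandas as pd

    return pd.read_excel(path, sheet_name=sheet_name).to_dict(orient="records")


def accuracy(
    rows: Iterable[Mapping[str, Any]],
    answer_col: str,
    prediction_col: str,
) -> float:
    """Return the fraction of rows whose prediction matches the reference answer.

    Both values are compared as stripped, upper-cased strings, so "a " and "A"
    count as the same choice. Column names are supplied by the caller because
    the real headers in the workbook are not assumed here.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    correct = sum(
        str(row[answer_col]).strip().upper()
        == str(row[prediction_col]).strip().upper()
        for row in rows
    )
    return correct / len(rows)
```

For example, `accuracy(load_rows("SpAMCQA_Dataset_and_Inference_Logs.xlsx"), "answer", "prediction")` would give a score between 0 and 1, provided the two column names are swapped for the headers actually used in the file. Exact string matching is a deliberate simplification; if the logged outputs contain free text rather than a bare option letter, an extraction step would be needed first.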
The SpAD-LLM model weights are not currently available for public download because of clinical data privacy regulations. For research collaboration requests, please contact the corresponding authors listed in the paper.
This dataset is intended for research purposes only.