MedMNIST v2 is a collection of biomedical image datasets. It includes eight 2D datasets posed as multi-class classification tasks. The authors of the collection report baseline results for ResNets and for the AutoML solutions auto-sklearn, AutoKeras, and Google AutoML Vision.
By fine-tuning a pre-trained Vision Transformer (ViT) separately for each task, we outperform almost all of those baselines. The notebook contains the full code to run the Vision Transformer experiment; all other numbers are taken from the authors of MedMNIST v2.
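Fine-tuning one ViT per task mainly requires two adaptations: the small MedMNIST images must be turned into the patch sequence the pre-trained model expects, and the pre-trained classifier head must be replaced by one sized for the task. The sketch below illustrates both with NumPy. It is not the notebook's code. It assumes a ViT-B/16 backbone (224×224 input, 16×16 patches, hidden size 768), 28×28 inputs, and nearest-neighbour upsampling; it follows the zero-initialized head used for fine-tuning in the original ViT paper. The names `NUM_CLASSES`, `to_vit_patches`, and `new_head` are hypothetical.

```python
import numpy as np

# Class counts of the eight multi-class 2D MedMNIST v2 datasets
NUM_CLASSES = {
    "pathmnist": 9, "dermamnist": 7, "octmnist": 4, "bloodmnist": 8,
    "tissuemnist": 8, "organamnist": 11, "organcmnist": 11, "organsmnist": 11,
}

def to_vit_patches(img, target=224, patch=16):
    """Turn a 28x28 (grayscale) or 28x28x3 (RGB) image into flattened ViT patches.

    Assumption: nearest-neighbour upsampling to 224x224 and a patch-16 ViT;
    the actual notebook may resize and normalize differently.
    """
    img = np.asarray(img, dtype=np.float32)
    if img.ndim == 2:                                   # grayscale -> 3 identical channels
        img = np.repeat(img[:, :, None], 3, axis=2)
    scale = target // img.shape[0]                      # 224 // 28 = 8
    img = img.repeat(scale, axis=0).repeat(scale, axis=1)  # nearest-neighbour upsample
    n = target // patch                                 # 14 patches per side
    # (H, W, C) -> (n, patch, n, patch, C) -> (n, n, patch, patch, C)
    patches = img.reshape(n, patch, n, patch, 3).transpose(0, 2, 1, 3, 4)
    return patches.reshape(n * n, patch * patch * 3)    # row-major patch order: (196, 768)

def new_head(task, hidden=768):
    """Zero-initialized hidden x K linear head replacing the pre-trained classifier."""
    k = NUM_CLASSES[task]
    return np.zeros((hidden, k), dtype=np.float32), np.zeros(k, dtype=np.float32)

if __name__ == "__main__":
    x = np.arange(28 * 28).reshape(28, 28)              # dummy grayscale image
    p = to_vit_patches(x)
    W, b = new_head("pathmnist")
    print(p.shape, W.shape, b.shape)
```

Each task gets its own copy of the pre-trained backbone and a freshly sized head, so datasets with different label sets (for example 4 classes for OCTMNIST and 11 for the Organ datasets) are trained independently.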