Model improvements & fixes
- Added evaluation support for Detectron2_Detection
- Fixed dataset issue to reflect the correct on-device accuracy for BERT
- Fixed accuracy discrepancy between PyTorch and on-device for HuggingFace WavLM Base Plus
- Fixed input type to avoid mismatch for VideoMAE
- Changed the RTMDet model to allow w8a16 quantization
- Added all context lengths to Qwen 2.5 7B VL metadata
- PiperTTS variants now correctly specify the language in their descriptions
- Running demo.py on-device now works with custom input shapes
- Added the MMMU multimodal eval dataset and evaluator for VLMs
- Added the ability to run a curated 100-prompt evaluation across LLMs/VLMs in evaluate.py
- Added performance numbers for Samsung Galaxy S26 across all models
- XR 2 Gen 2 published perf numbers are now measured using Samsung Galaxy S22 instead of QCS8450 (Proxy), as Proxy devices will soon be deprecated in workbench