Open
Labels
enhancement (New feature or request)
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
It would be very helpful to be able to query the expected image size and patch size for vision models.
This information is already available internally; it is just not exposed through a convenience function the way the audio bitrate is via mtmd_get_audio_bitrate.
I propose adding 2 new functions:
// get vision image size in pixels, for example 1024
// return -1 if vision is not supported
MTMD_API int mtmd_get_vision_image_size(mtmd_context * ctx);
// get vision patch size, for example 14
// return -1 if vision is not supported
MTMD_API int mtmd_get_vision_patch_size(mtmd_context * ctx);

Motivation
This will make it easier to do any image preprocessing before calling into the projector/model.
Possible Implementation
int mtmd_get_vision_image_size(mtmd_context * ctx) {
    if (!ctx->ctx_v) {
        return -1;
    }
    return clip_get_image_size(ctx->ctx_v);
}

int mtmd_get_vision_patch_size(mtmd_context * ctx) {
    if (!ctx->ctx_v) {
        return -1;
    }
    return clip_get_patch_size(ctx->ctx_v);
}