Set-of-Mark Prompting #898

@slavakurilyak

Is your feature request related to a problem? Please describe.

Set-of-Mark (SoM) prompting is a visual prompting method intended to improve the visual grounding of large multimodal models (LMMs) such as GPT-4V. It partitions an image into regions at different levels of granularity and overlays each region with a mark, such as an alphanumeric label, a mask, or a box. The model can then refer to regions by their marks, which helps GPT-4V answer questions that require visual grounding. The authors report significant improvements on tasks such as referring expression comprehension and segmentation.
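To make the idea concrete, here is a minimal sketch of the "partition, then mark" step. It is not the authors' implementation: a plain grid stands in for a real segmentation model, nothing is drawn on an actual image, and the names `Region`, `partition_grid`, and `build_prompt` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    mark: int    # numeric label that would be overlaid on the image, starting at 1
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

    @property
    def center(self) -> Tuple[int, int]:
        # Where the mark would be drawn (and where a click could later land).
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def partition_grid(width: int, height: int, rows: int, cols: int) -> List[Region]:
    """Split a width x height image into rows x cols regions, marked 1..rows*cols.

    Assumption: a uniform grid replaces the segmentation model a real
    Set-of-Mark pipeline would use to find regions.
    """
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append(Region(
                mark=r * cols + c + 1,
                left=c * width // cols,
                top=r * height // rows,
                right=(c + 1) * width // cols,
                bottom=(r + 1) * height // rows,
            ))
    return regions


def build_prompt(question: str, regions: List[Region]) -> str:
    """Compose a text prompt that tells the model which marks exist."""
    marks = ", ".join(str(region.mark) for region in regions)
    return (
        f"The image is annotated with numeric marks: {marks}. "
        f"{question} Answer with the mark number."
    )


# Two levels of granularity over the same 800x600 image.
coarse = partition_grid(800, 600, rows=2, cols=2)
fine = partition_grid(800, 600, rows=4, cols=4)
prompt = build_prompt("Which region contains the search box?", coarse)
```

Changing `rows` and `cols` is the stand-in here for the different granularity levels; a real pipeline would get coarser or finer regions from the segmenter instead.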

Describe the solution you'd like

I propose integrating Set-of-Mark prompting with the OS mode in Open Interpreter. This would let Open Interpreter handle multimodal tasks more effectively and make its existing capabilities more versatile.
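As a rough illustration of what the integration could involve, the sketch below maps a model reply that names a mark back to screen coordinates. It assumes the screenshot has already been marked and that the mark centers are known. It does not use any Open Interpreter API: `MARK_CENTERS` and `resolve_mark` are hypothetical names, and the actual click is left out.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical lookup: mark number -> (x, y) center of the marked screen region.
MARK_CENTERS: Dict[int, Tuple[int, int]] = {
    1: (120, 40),
    2: (480, 40),
    3: (300, 520),
}


def resolve_mark(
    reply: str, centers: Dict[int, Tuple[int, int]]
) -> Optional[Tuple[int, int]]:
    """Return the coordinates of the first known mark number mentioned in the reply.

    Numbers that do not correspond to a mark are skipped; None means the
    reply named no known mark.
    """
    for token in re.findall(r"\d+", reply):
        mark = int(token)
        if mark in centers:
            return centers[mark]
    return None


target = resolve_mark("Click the element labeled 3 to open settings.", MARK_CENTERS)
```

Returning `None` rather than guessing a position seems the safer choice when the output drives mouse input, since a wrong click on the desktop is harder to undo than a retry.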

Describe alternatives you've considered

No response

Additional context
