Research issue: gather examples of multi-modal API calls from different LLMs #557

Open
@simonw

To aid in the design for both of these:

I'm going to gather a bunch of examples of how different LLMs accept multi-modal inputs. I'm particularly interested in the following:

  • What kind of files do they accept?
  • Do they accept file uploads, base64 inline files, URL references, or some combination of these?
  • How are these interspersed with text prompts? This will help inform the database schema design in Design new LLM database schema #556
  • If a text prompt is included, does it go before or after the files?
  • How many files can be attached at once?
  • Is extra information such as the MIME type needed? If so, this helps inform the CLI design: can I do --file filename.ext, or do I need some other mechanism that provides the type as well?
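
To make the MIME type question concrete, here is a minimal sketch of what a bare --file filename.ext option would have to do if an API turns out to need both the type and inline base64 data. The function name describe_attachment and the shape of the returned dictionary are hypothetical, invented for illustration; they are not taken from any provider's API or from this project's code.

```python
import base64
import mimetypes
from pathlib import Path


def describe_attachment(path, mimetype=None):
    """Build a provider-neutral description of one file attachment.

    Hypothetical helper: the dictionary keys are illustrative only.
    """
    path = Path(path)
    # Prefer an explicitly supplied type; otherwise guess from the extension.
    guessed, _encoding = mimetypes.guess_type(path.name)
    resolved = mimetype or guessed
    if resolved is None:
        # The extension alone was not enough, so the caller must say what it is.
        raise ValueError(
            f"Cannot infer a MIME type for {path.name}; pass one explicitly"
        )
    # Inline the file contents as base64 text.
    data = base64.b64encode(path.read_bytes()).decode("ascii")
    return {"mimetype": resolved, "base64": data}
```

If guessing from the extension works for the common image, audio and PDF cases, --file filename.ext may be enough on its own, with an explicit type only needed as a fallback for unrecognized extensions or data with no filename.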
