Generalize multimodality #1741
Conversation
This is miracle-tier.

This is looking good so far.

I think it's ready now, sorry for the PR size.
Sorry for taking so long to review this. This is one of those foundational PRs that are of major importance but don't receive immediate attention from users. Probably because of your modest title ;) I have tested all pipelines and everything just worked:
I have made some minor changes:
Thanks a lot for yet another brilliant PR!
This is a POC for #1687. I based it on PR #1664, so it has a few extra commits (I left the llava extension here for reference, but it's going to be deprecated by multimodality).
Short description
The main gist of it is to provide a unified framework for multimodal pipelines - those which still use an LLM with only weights/biases/added tokens, not LLMs with added layers for multimodality. This is achieved by adding a new extension, called `multimodality`. The hooks into text-generation-webui are the same as in llava, but I no longer use `custom_generate_chat_prompt`; instead I provide a new extension hook, `tokenized_length`, to be used instead of `len(encode(prompt))`, as tokenizer extensions can modify the number of tokens.

Working principle
The multimodality extension does most of the stuff required for any image input, with the concrete pipeline selected via the `--multimodal-pipeline` parameter. The model-specific parts are handled by the pipelines themselves, described below.
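To make the `tokenized_length` hook from the short description concrete, here is a minimal sketch. Only the hook's name and purpose come from this PR; the placeholder token, the expansion factor, and the `encode` stand-in are illustrative assumptions, not the actual webui API:

```python
IMAGE_PLACEHOLDER = "<image>"
EMBEDS_PER_IMAGE = 256  # hypothetical: each image expands to 256 embeddings

def encode(text: str) -> list:
    # Stand-in for the model tokenizer; real code would use the loaded model.
    return text.split()

def tokenized_length(prompt: str) -> int:
    """Token count after multimodal processing; replaces len(encode(prompt)),
    which undercounts because image placeholders expand into many embeddings."""
    num_images = prompt.count(IMAGE_PLACEHOLDER)
    text_only = prompt.replace(IMAGE_PLACEHOLDER, " ")
    return len(encode(text_only)) + num_images * EMBEDS_PER_IMAGE
```

This is why the hook exists: a truncation check based on `len(encode(prompt))` would see one placeholder token where the pipeline will actually inject hundreds of embeddings.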
Pipelines
All of the pipelines should subclass the `AbstractMultimodalPipeline` class. The idea is to allow new pipelines to be added in the same way as user extensions: git clone into `extensions/multimodal/pipelines`.

For the POC I'm providing 2 built-in pipelines:

- `llava-13b` - for LLaVA v0 13B, for example `wojtab/llava-13b-v0-4bit-128g`
- `llava-7b` - for LLaVA v0 7B, for example `wojtab/llava-7b-v0-4bit-128g`

And 1 pipeline outside the repository:

- `minigpt4-13b` - for MiniGPT-4 13B; to run it, place the pipeline in `extensions/multimodal/pipelines` and use it with `anon8231489123/vicuna-13b-GPTQ-4bit-128g`
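The subclassing idea above can be sketched as follows. This is a toy stand-in, not the real base class: the method names and signatures here are assumptions for illustration, and a real pipeline would run a vision encoder rather than return placeholder strings:

```python
from abc import ABC, abstractmethod
from typing import List

class AbstractMultimodalPipeline(ABC):
    """Toy stand-in for the extension's abstract pipeline base class."""

    @staticmethod
    @abstractmethod
    def name() -> str:
        """Unique pipeline name, e.g. 'llava-13b'."""

    @abstractmethod
    def embed_images(self, images: List[bytes]) -> List[str]:
        """Turn raw input images into embeddings for the LLM."""

class ExampleLlava13bPipeline(AbstractMultimodalPipeline):
    @staticmethod
    def name() -> str:
        return "llava-13b"

    def embed_images(self, images: List[bytes]) -> List[str]:
        # A real pipeline would run a vision tower + projection here;
        # placeholder strings keep the sketch self-contained.
        return [f"<embedding of {len(img)}-byte image>" for img in images]
```

Because pipelines are just subclasses dropped into `extensions/multimodal/pipelines`, adding support for a new model never requires touching the core extension.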
Pipeline modules
All of the pipeline modules should have a `pipelines.py` file with the following fields:

- `available_pipelines: List[str]` - the list of pipelines provided by this module, shown to the user
- `def get_pipeline(name: str, params: dict) -> Optional[AbstractMultimodalPipeline]:` - a function to get a concrete pipeline by name; if `name` doesn't match any, it should return `None`. `params` is the user settings for the multimodal extension
- `def get_pipeline_from_model_name(model_name: str, params: dict) -> Optional[AbstractMultimodalPipeline]:` - a function to get a pipeline from `model_name`; it should be eager to return `None` unless the determination can be made clearly (for example: MiniGPT-4 is based on vicuna, so it should never return its pipeline from a model name alone, but llava can, as it has its own specific LLM finetune)

A pipeline module should lazy-import the pipelines only when necessary, and it should keep its imports to a minimum.
Example
Looking for feedback