- 2024-05-07: Major updates:
  - Added support for the Moondream2 model.
  - Added support for reading the question from a field on the sample.
  - Added support for storing the answer in a field on the sample.
  - Added support for applying the model to all samples in the current view (one at a time).
  - Added support for delegated execution.
  - Added support for Python operator execution.
- 2024-05-03: @harpreetsahota204 added support for the Idefics2-8b model from Replicate.
- 2023-10-24: Added support for the Llava-13b and Fuyu-8b models from Replicate.
This Python plugin lets you answer visual questions about images in your dataset!
This version of the plugin supports the following models:
- Fuyu-8b from Adept AI (via Replicate)
- Llava-13b (via Replicate)
- ViLT (the default Vision-and-Language Transformer used by the Hugging Face Transformers visual question answering pipeline)
- BLIPv2 (via Replicate)
- Idefics2-8b from Hugging Face (via Replicate)
- Moondream2 via Hugging Face Transformers and via Replicate
Feel free to fork this plugin and add support for other models!
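To make the ViLT entry above concrete, here is a minimal sketch of answering one question with the Hugging Face Transformers visual question answering pipeline. It is not the plugin's own code: the helper name `answer_question` and its `vqa_pipeline` parameter are hypothetical, and running it for real assumes `transformers`, a backend such as PyTorch, and Pillow are installed.

```python
from typing import Any, Callable, Optional


def answer_question(
    image_path: str,
    question: str,
    vqa_pipeline: Optional[Callable[..., Any]] = None,
) -> str:
    """Return the top-scoring answer for one image/question pair.

    `vqa_pipeline` is a hypothetical injection point so the helper can be
    exercised without downloading a model.
    """
    if vqa_pipeline is None:
        # Imported lazily so this module loads without transformers installed.
        from transformers import pipeline

        # With no model argument, the task's default checkpoint is used.
        vqa_pipeline = pipeline("visual-question-answering")

    # The pipeline returns a list of {"score": ..., "answer": ...} dicts,
    # sorted with the best candidate first.
    predictions = vqa_pipeline(image=image_path, question=question, top_k=1)
    return predictions[0]["answer"]
```

Calling `answer_question("/path/to/image.jpg", "What is in the image?")` downloads the default checkpoint on first use, so expect a delay the first time.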
- If you plan to use ViLT or Moondream2 through Hugging Face, install the Hugging Face transformers library:
pip install transformers
- If you plan to use the models hosted on Replicate, install the Replicate library:
pip install replicate
Then add your Replicate API token to your environment:
export REPLICATE_API_TOKEN=<your-api-token>
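A missing token is an easy thing to overlook before a long run over many samples. The sketch below checks for it up front; the helper name `replicate_token_is_set` is hypothetical and not part of the plugin or the Replicate library. It only inspects the `REPLICATE_API_TOKEN` environment variable and does not verify that the token is valid.

```python
import os
from typing import Mapping, Optional


def replicate_token_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if REPLICATE_API_TOKEN is present and non-blank.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out.
    """
    env = os.environ if env is None else env
    return bool(env.get("REPLICATE_API_TOKEN", "").strip())


if __name__ == "__main__":
    if replicate_token_is_set():
        print("REPLICATE_API_TOKEN is set.")
    else:
        print("REPLICATE_API_TOKEN is missing; Replicate models will fail.")
```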
fiftyone plugins download https://github.com/jacobmarks/vqa-plugin
- answer_visual_question: applies the selected visual question answering model to the selected sample in your dataset and outputs the answer.
The recommended way to use this plugin interactively is in the FiftyOne App with exactly one sample selected.
To loop over the samples in a dataset or view, use Python operator execution instead:
import fiftyone as fo
import fiftyone.operators as foo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("quickstart", max_samples=5)
# Access the operator via its URI (plugin name + operator name)
vqa = foo.get_operator("@jacobmarks/vqa/answer_visual_question")

# Apply the operator to the dataset
vqa(
    dataset,
    model_name="llava",
    question="Describe the image",
    answer_field="llava_answer",
)

# Print the answers
print(dataset.values("llava_answer"))
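A bare list of answers is hard to read without knowing which image each one belongs to. As a small sketch, the hypothetical helper `answers_by_filepath` below pairs each answer with its file path and drops samples that have no stored answer. It assumes the two lists come from something like `filepaths, answers = dataset.values(["filepath", "llava_answer"])`, where unset fields come back as `None`.

```python
from typing import Dict, Optional, Sequence


def answers_by_filepath(
    filepaths: Sequence[str],
    answers: Sequence[Optional[str]],
    skip_missing: bool = True,
) -> Dict[str, Optional[str]]:
    """Map each sample's file path to its stored answer.

    Samples whose answer is None are dropped unless skip_missing is False.
    """
    if len(filepaths) != len(answers):
        raise ValueError("filepaths and answers must have the same length")

    result: Dict[str, Optional[str]] = {}
    for path, answer in zip(filepaths, answers):
        if answer is None and skip_missing:
            continue
        result[path] = answer
    return result
```

With the result in hand, `for path, answer in answers_by_filepath(filepaths, answers).items(): print(path, answer)` prints one line per answered sample.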