Visual Question Answering Plugin

Updates

2024-05-07: Major updates:
- Added support for the Moondream2 model.
- Added support for reading the question from a field on the sample.
- Added support for storing the answer in a field on the sample.
- Added support for applying to all samples in the current view (one at a time).
- Added support for delegated execution.
- Added support for Python operator execution.
2024-05-03: @harpreetsahota204 added support for the Idefics2-8b model from Replicate.
2023-10-24: Added support for Llava-13b and Fuyu-8b models from Replicate.

Plugin Overview

This Python plugin lets you ask questions about the images in your dataset and get answers from a visual question answering (VQA) model.

Supported Models

This version of the plugin supports the following models:

- Fuyu-8b from Adept AI (via Replicate)
- Llava-13b (via Replicate)
- ViLT (Vision-and-Language Transformer), the default model for the Hugging Face transformers `visual-question-answering` pipeline
- BLIPv2 (via Replicate)
- Idefics2-8b from Hugging Face (via Replicate)
- Moondream2 (via Hugging Face transformers or via Replicate)

Feel free to fork this plugin and add support for other models!

Watch On Youtube

Installation

Pre-requisites

If you plan to run ViLT or Moondream2 through Hugging Face transformers, install the transformers library:

```shell
pip install transformers
```

If you plan to use any of the Replicate-hosted models, install the Replicate Python client:

```shell
pip install replicate
```

Then add your Replicate API token to your environment:

```shell
export REPLICATE_API_TOKEN=<your-api-token>
```
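Which of these pre-requisites you need depends on the model you pick. As a minimal sketch (a hypothetical helper, not part of the plugin), the check below maps each model in the Supported Models list to the backend it runs through and reports which of the setup steps in this section are still missing. The model labels are the display names used in this README, not necessarily the `model_name` values the operator accepts, and packages the models may need beyond those listed here are not checked.

```python
import importlib.util
import os
from typing import Callable, List, Mapping

# Backends per model, following the "Supported Models" list in this README.
# Labels are display names, NOT necessarily the operator's model_name values.
TRANSFORMERS_MODELS = {"ViLT", "Moondream2"}
REPLICATE_MODELS = {"Fuyu-8b", "Llava-13b", "BLIPv2", "Idefics2-8b", "Moondream2"}


def is_installed(package: str) -> bool:
    """True if `package` can be imported in the current environment."""
    return importlib.util.find_spec(package) is not None


def missing_prerequisites(
    model: str,
    backend: str,
    installed: Callable[[str], bool] = is_installed,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Hypothetical helper: list the setup steps still needed for `model`.

    Only checks the steps described in this section (transformers,
    replicate, REPLICATE_API_TOKEN); other dependencies are not checked.
    """
    if backend == "transformers":
        if model not in TRANSFORMERS_MODELS:
            raise ValueError(f"{model} is not listed as a transformers model")
        return [] if installed("transformers") else ["pip install transformers"]
    if backend == "replicate":
        if model not in REPLICATE_MODELS:
            raise ValueError(f"{model} is not listed as a Replicate model")
        missing = []
        if not installed("replicate"):
            missing.append("pip install replicate")
        if not env.get("REPLICATE_API_TOKEN"):
            missing.append("export REPLICATE_API_TOKEN=<your-api-token>")
        return missing
    raise ValueError(f"Unknown backend: {backend!r}")


if __name__ == "__main__":
    # The result depends on what is installed and set in your environment.
    print(missing_prerequisites("Llava-13b", "replicate"))
```

Injecting `installed` and `env` as parameters keeps the check easy to exercise without touching the real environment.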

Install the plugin

```shell
fiftyone plugins download https://github.com/jacobmarks/vqa-plugin
```

Operators

`answer_visual_question`

Applies the chosen visual question answering model to the selected sample (or to every sample in the current view) and returns the answer, which can optionally be stored in a field on each sample.

Usage

When working interactively, the recommended way to use this plugin is from the FiftyOne App with exactly one sample selected.

Python Operator Execution

To run the operator over every sample in a dataset or view, use Python operator execution:

```python
import fiftyone as fo
import fiftyone.operators as foo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart", max_samples=5)

# Access the operator via its URI (plugin name + operator name)
vqa = foo.get_operator("@jacobmarks/vqa/answer_visual_question")

# Apply the operator to the dataset.
# Llava-13b runs on Replicate, so REPLICATE_API_TOKEN must be set.
vqa(
    dataset,
    model_name="llava",
    question="Describe the image",
    answer_field="llava_answer",
)

# Print the answers
print(dataset.values("llava_answer"))
```
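The Updates list describes per-sample behavior: the question can be fixed or read from a field on each sample, the answer is stored in a field, and the samples in a view are processed one at a time. The plain-Python sketch below illustrates those semantics only; it is not the plugin's implementation. Samples are simplified to dicts with a `"filepath"` key, and `answer_samples_one_at_a_time`, `answer_fn`, and `question_field` are hypothetical names, not the operator's actual parameters.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model interface: (image_path, question) -> answer string.
AnswerFn = Callable[[str, str], str]


def answer_samples_one_at_a_time(
    samples: List[Dict],
    answer_fn: AnswerFn,
    answer_field: str,
    question: Optional[str] = None,
    question_field: Optional[str] = None,
) -> None:
    """Conceptual sketch: answer each sample in order and store the result.

    Pass either one `question` shared by all samples, or a `question_field`
    naming the key that holds each sample's own question.
    """
    if (question is None) == (question_field is None):
        raise ValueError("Pass exactly one of question or question_field")
    for sample in samples:  # one model call per sample, sequentially
        q = question if question is not None else sample[question_field]
        sample[answer_field] = answer_fn(sample["filepath"], q)


if __name__ == "__main__":
    # Stub standing in for a real VQA model call.
    def stub_model(path: str, q: str) -> str:
        return f"stub answer to {q!r} for {path}"

    samples = [
        {"filepath": "/data/img1.jpg", "question": "What animal is this?"},
        {"filepath": "/data/img2.jpg", "question": "How many people are there?"},
    ]
    answer_samples_one_at_a_time(
        samples, stub_model, answer_field="answer", question_field="question"
    )
    for s in samples:
        print(s["answer"])
```

In this sketch each sample costs one model call, so runtime grows linearly with the number of samples processed.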
