VLIS: Unimodal Language Models Guide Multimodal Language Generation

This repository contains the code for our EMNLP 2023 paper: VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung, Youngjae Yu

Paper & Presentation slides (tbu)

VLIS per backbone models

VLIS is a decoding-time method agnostic of the backbone model. We support inference UI code for three difference recent vision-language backbones: BLIP-2, LLAVA, and Lynx

Preparation

For python dependencies, we provide two options:

A monolithic conda environment for all models (conda env create -f env.yaml).
Per-model requirements.txt file.

Other than that, please look into per-model instructions (e.g. code/blip2/PREPARATION.md) for preparation.

Usage

Use the provided UI script to test VLIS per backbone.

For example, when using VLIS with LLaVA:

code code/llava
python ui.py --lm_name 'lmsys/vicuna-7b-v1.5'

The Gradio UI supports widely used text generation hyperparameters, including temperature, num_beams, top_p, max_length.

It would be straightforward to convert the given UI code into an offline script for data evaluation. If you want inference script for a particular dataset, please feel free to open an issue.

Landmark and Character benchmarks

We assess visual specificity of visual-language models on named entity in appendix A of our paper. Here, we provide the data and evaluation code to replicate our experiments.

Preparation

Use code/data/character_urls.json and code/data/landmark_urls.json to download the images, respectively.

Evaluation

The model generated output files should be structured as follows:

OUTPUT_FILE

{
  DATA_ID: {
    MODELNAME: OUTPUT
  }
}

We provide example output files in code/data/example_landmark.json. Then, run the evaluation script to get the statistics.

python code/data/landmark_eval.py --path OUTPUT_FILE

Contact

Jiwan Chung: jiwan.chung.research@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commits
assets		assets
code		code
.gitignore		.gitignore
README.md		README.md
env.yaml		env.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

assets

assets

code

code

.gitignore

.gitignore

README.md

README.md

env.yaml

env.yaml

Repository files navigation

VLIS: Unimodal Language Models Guide Multimodal Language Generation

VLIS per backbone models

Preparation

Usage

Landmark and Character benchmarks

Preparation

Evaluation

Contact

About

Releases

Packages

Languages

JiwanChung/vlis

Folders and files

Latest commit

History

Repository files navigation

VLIS: Unimodal Language Models Guide Multimodal Language Generation

VLIS per backbone models

Preparation

Usage

Landmark and Character benchmarks

Preparation

Evaluation

Contact

About

Resources

Stars

Watchers

Forks

Languages