Add OpenFlamingo #2237

Merged (44 commits, Mar 4, 2024)

Commits:
915f7cc add openflamingo (michiyasunaga, Jan 14, 2024)
367aa30 fix ver (michiyasunaga, Jan 14, 2024)
799f41f fix ver (michiyasunaga, Jan 14, 2024)
0b7bcb4 fix ver (michiyasunaga, Jan 14, 2024)
70a8e34 fix ver (michiyasunaga, Jan 14, 2024)
1eef0a5 fix ver (michiyasunaga, Jan 14, 2024)
abe19e3 fix ver (michiyasunaga, Jan 14, 2024)
b221cbf fix ver (michiyasunaga, Jan 14, 2024)
63011e7 fix ver (michiyasunaga, Jan 14, 2024)
7729aca add openflamingo (michiyasunaga, Jan 14, 2024)
1d23a9a add openflamingo (michiyasunaga, Jan 14, 2024)
d27893a add openflamingo (michiyasunaga, Jan 14, 2024)
c9fd286 add openflamingo (michiyasunaga, Jan 14, 2024)
5367f05 add openflamingo (michiyasunaga, Jan 14, 2024)
0afa051 add openflamingo (michiyasunaga, Jan 14, 2024)
014aada add openflamingo (michiyasunaga, Jan 14, 2024)
1bee70c add openflamingo (michiyasunaga, Jan 14, 2024)
3613907 fix GHA build - define openflamingo dependencies (teetone, Jan 16, 2024)
1affed4 Merge branch 'main' of https://github.com/stanford-crfm/benchmarking … (teetone, Jan 16, 2024)
9c1ad0a address code review (michiyasunaga, Jan 20, 2024)
1ca8408 fix transformers version (michiyasunaga, Feb 8, 2024)
292456e Merge branch 'main' into michi_openflamingo (michiyasunaga, Feb 8, 2024)
12d535e merge main (JosselinSomervilleRoberts, Feb 22, 2024)
84cc573 Add some parameters to the model deployment (JosselinSomervilleRoberts, Feb 22, 2024)
3756295 Fixing einops dependency conflict (JosselinSomervilleRoberts, Feb 22, 2024)
9dc70dc Remove duplicated crfm-helm['image'] dependency (JosselinSomervilleRoberts, Feb 22, 2024)
6f531ac Merge branch 'main' of https://github.com/stanford-crfm/benchmarking … (teetone, Feb 26, 2024)
8c44df7 more logging for model init (teetone, Feb 27, 2024)
1fd7961 fix token init in openflamingo (teetone, Feb 27, 2024)
472bacb fix token init in openflamingo (teetone, Feb 27, 2024)
5a64bb3 Merge branch 'main' of https://github.com/stanford-crfm/benchmarking … (teetone, Feb 28, 2024)
65e2950 resolve (teetone, Feb 28, 2024)
0ea9ced fix tokenizer (teetone, Feb 28, 2024)
6128ad2 update conf (teetone, Feb 28, 2024)
163b13c Merge branch 'main' of https://github.com/stanford-crfm/benchmarking … (teetone, Feb 28, 2024)
e51e550 Merge branch 'main' of https://github.com/stanford-crfm/benchmarking … (teetone, Feb 28, 2024)
865b33a disable temporarily (teetone, Feb 28, 2024)
a4f0586 resolve merge conflicts (teetone, Mar 4, 2024)
8f0e763 undo (teetone, Mar 4, 2024)
e2404a9 fix paths (teetone, Mar 4, 2024)
83eaefa get in-context learning examples to work (teetone, Mar 4, 2024)
1404de4 fix decoding (teetone, Mar 4, 2024)
ac19049 fix sequence construction (teetone, Mar 4, 2024)
8c6cdcb include num_completions in cache key (teetone, Mar 4, 2024)
7 changes: 5 additions & 2 deletions requirements.txt
@@ -36,6 +36,8 @@ dacite==1.6.0
datasets==2.5.2
dill==0.3.5.1
distlib==0.3.6
einops==0.7.0
einops-exts==0.0.4
emoji==2.1.0
et-xmlfile==1.1.0
exceptiongroup==1.1.0
@@ -89,6 +91,7 @@ nodeenv==1.7.0
numba==0.56.4
numpy==1.23.3
openai==0.27.8
open-clip-torch==2.24.0
opencv-python==4.8.1.78
openpyxl==3.0.10
outcome==1.2.0
@@ -168,7 +171,7 @@ torchvision==0.13.1 ; sys_platform == "darwin"
torch==1.12.1+cu113 ; sys_platform == "linux"
torchvision==0.13.1+cu113 ; sys_platform == "linux"
tqdm==4.64.1
-transformers==4.36.0
+transformers==4.32.0
Contributor: Why did you revert this? This is going to break Llava.

Member Author: Thanks for the review! This was a bit tricky. OpenFlamingo seems to require 4.32.0; I tried 4.36.0 but encountered errors like `ImportError: cannot import name '_expand_mask' from 'transformers.models.bloom.modeling_bloom'` (similar to salesforce/LAVIS#571).
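For reference, the incompatibility can be checked directly. The following is a minimal probe, assuming the failure mode is exactly the missing `_expand_mask` import reported above:

```python
# Probe the import that OpenFlamingo's model code depends on: `_expand_mask`
# exists in transformers 4.32 but was removed in the 4.36 attention refactor.
try:
    from transformers.models.bloom.modeling_bloom import _expand_mask  # noqa: F401
    print("OK: this transformers version still exposes _expand_mask")
except ImportError as err:
    print(f"Incompatible transformers version: {err}")
```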

trio==0.22.0
trio-websocket==0.9.2
typer==0.4.2
@@ -196,4 +199,4 @@ zipp==3.11.0
zope.event==4.5.0
zope.interface==5.4.0
zstandard==0.18.0
fairlearn==0.9.0
13 changes: 10 additions & 3 deletions setup.cfg
@@ -52,7 +52,7 @@ install_requires=
scikit-learn~=1.1.2

# Models and Metrics Extras
-transformers~=4.36.0 # For anthropic_client, vision_language.huggingface_vlm_client, huggingface_client, huggingface_tokenizer, test_openai_token_cost_estimator, model_summac (via summarization_metrics)
+transformers>=4.28.0 # For anthropic_client, vision_language.huggingface_vlm_client, huggingface_client, huggingface_tokenizer, test_openai_token_cost_estimator, model_summac (via summarization_metrics)
Contributor: Same, we need transformers>=4.36.0 here.

# TODO: Upgrade torch - we need > 2.0.0 for newer versions of transformers
torch>=1.12.1,<3.0.0 # For huggingface_client, yalm_tokenizer, model_summac (via summarization_metrics)
torchvision>=0.13.1,<3.0.0 # For huggingface_client, yalm_tokenizer, model_summac (via summarization_metrics)
@@ -135,7 +135,13 @@ models =
crfm-helm[yandex]

vlm =
-torch~=2.1.2 # For IDEFICS
+# For OpenFlamingo
+einops~=0.7.0
+einops-exts~=0.0.4
+open-clip-torch~=2.24.0
+
+# For IDEFICS
+torch~=2.1.2

heim =
# HEIM scenarios
@@ -223,6 +229,7 @@ exclude =
venv/*
src/helm/proxy/clients/image_generation/dalle_mini/*
src/helm/proxy/clients/image_generation/mindalle/*
src/helm/proxy/clients/vision_language/open_flamingo/*

# Ignore completely:
# E203 - White space before ':', (conflicts with black)
@@ -240,7 +247,7 @@ check_untyped_defs = True
disable_error_code = annotation-unchecked
# TODO: Change disallow_untyped_defs to True
disallow_untyped_defs = False
-exclude = dalle_mini|mindalle
+exclude = dalle_mini|mindalle|open_flamingo

[tool:pytest]
addopts =
5 changes: 4 additions & 1 deletion src/helm/benchmark/model_metadata_registry.py
@@ -56,10 +56,13 @@
IDEFICS_INSTRUCT_MODEL_TAG: str = "IDEFICS_INSTRUCT_MODEL_TAG"
# Llava should use a special prompt format (see `LlavaRunExpander`)
LLAVA_MODEL_TAG: str = "LLAVA_MODEL_TAG"

# OpenFlamingo
OPEN_FLAMINGO_MODEL_TAG: str = "OPEN_FLAMINGO_MODEL_TAG"

# Frozen is set to false as the model_deployment_registry.py file
# might populate the deployment_names field.


@dataclass(frozen=False)
class ModelMetadata:
    name: str
24 changes: 24 additions & 0 deletions src/helm/benchmark/run_expander.py
@@ -420,6 +420,30 @@ def expand(self, run_spec: RunSpec) -> List[RunSpec]:
]


class OpenFlamingoRunExpander(RunExpander):
    """
    Custom prompt for OpenFlamingo models.
    See https://huggingface.co/openflamingo/OpenFlamingo-9B-vitl-mpt7b for more information.
    """

    name = "open_flamingo"

    def expand(self, run_spec: RunSpec) -> List[RunSpec]:
        return [
            replace(
                run_spec,
                name=run_spec.name,
                adapter_spec=replace(
                    run_spec.adapter_spec,
                    input_prefix="",
                    input_suffix="",
                    output_prefix="",
                    output_suffix="",
                ),
            ),
        ]


class FormatPromptRunExpander(RunExpander):
    """Adds a prefix and suffix to the prompt."""

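The expander above blanks every prefix and suffix because OpenFlamingo expects interleaved image/text sequences delimited by its own special tokens, so prompt assembly is left to the client. A sketch of the target format, following the example on the model card linked in the docstring (the actual assembly happens in the OpenFlamingoClient, which this diff references but does not show):

```python
# Few-shot prompt in the OpenFlamingo style: each image is marked with <image>
# and each completed in-context example ends with <|endofchunk|>.
prompt = (
    "<image>An image of two cats.<|endofchunk|>"
    "<image>An image of a bathroom sink.<|endofchunk|>"
    "<image>An image of"
)
```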
6 changes: 6 additions & 0 deletions src/helm/benchmark/run_specs.py
@@ -32,6 +32,7 @@
GoogleRunExpander,
IDEFICSInstructRunExpander,
LlavaRunExpander,
OpenFlamingoRunExpander,
StopRunExpander,
ChatMLRunExpander,
IncreaseTemperatureRunExpander,
@@ -66,6 +67,7 @@
GOOGLE_GEMINI_MODEL_TAG,
IDEFICS_INSTRUCT_MODEL_TAG,
LLAVA_MODEL_TAG,
OPEN_FLAMINGO_MODEL_TAG,
NO_NEWLINES_TAG,
NLG_PREFIX_TAG,
CHATML_MODEL_TAG,
@@ -3093,6 +3095,10 @@ def alter_run_spec(run_spec: RunSpec) -> RunSpec:
    if LLAVA_MODEL_TAG in model.tags:
        run_spec = singleton(LlavaRunExpander().expand(run_spec))

    # OpenFlamingo
    if OPEN_FLAMINGO_MODEL_TAG in model.tags:
        run_spec = singleton(OpenFlamingoRunExpander().expand(run_spec))

    # For multiple choice
    if BUGGY_TEMP_0_TAG in model.tags and run_spec.adapter_spec.temperature == 0:
        increase_temperature_expander = IncreaseTemperatureRunExpander(value=1e-4)
8 changes: 8 additions & 0 deletions src/helm/config/model_deployments.yaml
@@ -464,6 +464,14 @@ model_deployments:
    max_sequence_length: 2048
    client_spec:
      class_name: "helm.proxy.clients.vision_language.huggingface_vlm_client.HuggingFaceVLMClient"

  ## OpenFlamingo
  - name: openflamingo/OpenFlamingo-9B-vitl-mpt7b
    model_name: openflamingo/OpenFlamingo-9B-vitl-mpt7b
    tokenizer_name: anas-awadalla/mpt-7b
    max_sequence_length: 2048
    client_spec:
      class_name: "helm.proxy.clients.vision_language.open_flamingo_client.OpenFlamingoClient"

  ## Mistral AI
  - name: huggingface/bakLlava-v1-hf
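The `client_spec.class_name` in the entry above is a fully qualified class path that HELM resolves at runtime. As a rough illustration of the mechanism (the helper below is a sketch, not HELM's actual factory, which also wires in caching and tokenizer configuration):

```python
# Illustrative dynamic resolution of a class_name string from the YAML above.
from importlib import import_module

def resolve_class(class_name: str) -> type:
    module_path, cls_name = class_name.rsplit(".", 1)
    return getattr(import_module(module_path), cls_name)

client_cls = resolve_class(
    "helm.proxy.clients.vision_language.open_flamingo_client.OpenFlamingoClient"
)
```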
11 changes: 10 additions & 1 deletion src/helm/config/model_metadata.yaml
@@ -1092,7 +1092,16 @@
    num_parameters: 13000000000
    release_date: 2023-10-05
    tags: [VISION_LANGUAGE_MODEL_TAG, LLAVA_MODEL_TAG]

  - name: openflamingo/OpenFlamingo-9B-vitl-mpt7b
    display_name: OpenFlamingo (9B)
    description: OpenFlamingo is an open source implementation of DeepMind's Flamingo models. This 9B-parameter model uses a CLIP ViT-L/14 vision encoder and MPT-7B language model. ([paper](https://arxiv.org/abs/2308.01390))
    creator_organization_name: OpenFlamingo
    access: open
    num_parameters: 9000000000
    release_date: 2023-08-02
    tags: [VISION_LANGUAGE_MODEL_TAG, OPEN_FLAMINGO_MODEL_TAG]


  # 01.AI
6 changes: 6 additions & 0 deletions src/helm/config/tokenizer_configs.yaml
@@ -165,6 +165,12 @@ tokenizer_configs:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
    end_of_text_token: "</s>"
    prefix_token: "<s>"

  - name: anas-awadalla/mpt-7b
    tokenizer_spec:
      class_name: "helm.proxy.tokenizers.huggingface_tokenizer.HuggingFaceTokenizer"
    end_of_text_token: "<|endoftext|>"
    prefix_token: ""

Collaborator (on anas-awadalla/mpt-7b): Is this different from the pre-existing mpt tokenizer (which is the same as EleutherAI/gpt-neox-20b)?

  # Huggingface
  - name: huggingface/gpt2
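Worth noting alongside the reviewer's question above: the OpenFlamingo factory (factory.py below) adds Flamingo-specific special tokens on top of this tokenizer. A minimal sketch, assuming the anas-awadalla/mpt-7b checkpoint is reachable on the Hugging Face Hub:

```python
# Mirrors what create_model_and_transforms (below) does at load time:
# register <|endofchunk|> and <image> as additional special tokens and
# recover the token ids that the Flamingo model is constructed with.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("anas-awadalla/mpt-7b", trust_remote_code=True)
tok.add_special_tokens({"additional_special_tokens": ["<|endofchunk|>", "<image>"]})
eoc_token_id = tok.encode("<|endofchunk|>")[-1]
media_token_id = tok.encode("<image>")[-1]
```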
2 changes: 2 additions & 0 deletions src/helm/proxy/clients/vision_language/open_flamingo/__init__.py
@@ -0,0 +1,2 @@
from .src.flamingo import Flamingo
from .src.factory import create_model_and_transforms

Empty file.
143 changes: 143 additions & 0 deletions src/helm/proxy/clients/vision_language/open_flamingo/src/factory.py
@@ -0,0 +1,143 @@
from typing import Optional

from transformers import AutoModelForCausalLM, AutoTokenizer

from helm.common.general import handle_module_not_found_error
from .flamingo import Flamingo
from .flamingo_lm import FlamingoLMMixin
from .utils import extend_instance


def create_model_and_transforms(
    clip_vision_encoder_path: str,
    clip_vision_encoder_pretrained: str,
    lang_encoder_path: str,
    tokenizer_path: str,
    cross_attn_every_n_layers: int = 1,
    use_local_files: bool = False,
    decoder_layers_attr_name: Optional[str] = None,
    freeze_lm_embeddings: bool = False,
    cache_dir: Optional[str] = None,
    **flamingo_kwargs,
):
    """
    Initialize a Flamingo model from a pretrained vision encoder and language encoder.
    Appends special tokens to the tokenizer and freezes backbones.

    Args:
        clip_vision_encoder_path (str): path to pretrained clip model (e.g. "ViT-B-32")
        clip_vision_encoder_pretrained (str): name of pretraining dataset for clip model (e.g. "laion2b_s32b_b79k")
        lang_encoder_path (str): path to pretrained language encoder
        tokenizer_path (str): path to pretrained tokenizer
        cross_attn_every_n_layers (int, optional): determines how often to add a cross-attention layer. Defaults to 1.
        use_local_files (bool, optional): whether to use local files. Defaults to False.
        decoder_layers_attr_name (str, optional): name of the decoder layers attribute. Defaults to None.
        freeze_lm_embeddings (bool, optional): whether to freeze LM input embeddings when configuring Perceiver.
        cache_dir (str, optional): path to cache directory for downloading OpenClip/HF weights.
    Returns:
        Flamingo: Flamingo model from pretrained vision and language encoders
        Image processor: Pipeline to preprocess input images
        Tokenizer: A tokenizer for the language model
    """
    try:
        import open_clip
    except ModuleNotFoundError as e:
        handle_module_not_found_error(e, ["vlm"])

    vision_encoder, _, image_processor = open_clip.create_model_and_transforms(
        clip_vision_encoder_path,
        pretrained=clip_vision_encoder_pretrained,
        cache_dir=cache_dir,
    )
    # set the vision encoder to output the visual features
    vision_encoder.visual.output_tokens = True

    text_tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_path,
        local_files_only=use_local_files,
        trust_remote_code=True,
        cache_dir=cache_dir,
    )
    # add Flamingo special tokens to the tokenizer
    text_tokenizer.add_special_tokens({"additional_special_tokens": ["<|endofchunk|>", "<image>"]})
    if text_tokenizer.pad_token is None:
        # Issue: GPT models don't have a pad token, which we use to
        # modify labels for the loss.
        text_tokenizer.add_special_tokens({"pad_token": "<PAD>"})

    lang_encoder = AutoModelForCausalLM.from_pretrained(
        lang_encoder_path,
        local_files_only=use_local_files,
        trust_remote_code=True,
        cache_dir=cache_dir,
    )

    # hacks for MPT-1B, which doesn't have a get_input_embeddings method
    if "mpt-1b-redpajama-200b" in lang_encoder_path:

        class EmbeddingFnMixin:
            def get_input_embeddings(self):
                return self.transformer.wte

            def set_input_embeddings(self, new_embeddings):
                self.transformer.wte = new_embeddings

        extend_instance(lang_encoder, EmbeddingFnMixin)

    # convert LM to FlamingoLM
    extend_instance(lang_encoder, FlamingoLMMixin)

    if decoder_layers_attr_name is None:
        decoder_layers_attr_name = _infer_decoder_layers_attr_name(lang_encoder)
    lang_encoder.set_decoder_layers_attr_name(decoder_layers_attr_name)
    lang_encoder.resize_token_embeddings(len(text_tokenizer))

    model = Flamingo(
        vision_encoder,
        lang_encoder,
        text_tokenizer.encode("<|endofchunk|>")[-1],
        text_tokenizer.encode("<image>")[-1],
        vis_dim=open_clip.get_model_config(clip_vision_encoder_path)["vision_cfg"]["width"],
        cross_attn_every_n_layers=cross_attn_every_n_layers,
        **flamingo_kwargs,
    )

    # Freeze all parameters
    model.requires_grad_(False)
    assert sum(p.numel() for p in model.parameters() if p.requires_grad) == 0

    # Unfreeze perceiver, gated_cross_attn_layers, and LM input embeddings
    model.perceiver.requires_grad_(True)
    model.lang_encoder.gated_cross_attn_layers.requires_grad_(True)
    if not freeze_lm_embeddings:
        model.lang_encoder.get_input_embeddings().requires_grad_(True)
        # TODO: investigate also training the output embeddings when untied

    print(
        f"Flamingo model initialized with {sum(p.numel() for p in model.parameters() if p.requires_grad)} trainable parameters"
    )

    return model, image_processor, text_tokenizer


def _infer_decoder_layers_attr_name(model):
    for k in __KNOWN_DECODER_LAYERS_ATTR_NAMES:
        if k.lower() in model.__class__.__name__.lower():
            return __KNOWN_DECODER_LAYERS_ATTR_NAMES[k]

    raise ValueError(
        "We require the attribute name for the nn.ModuleList in the decoder storing the transformer block layers. "
        "Please supply this string manually."
    )


__KNOWN_DECODER_LAYERS_ATTR_NAMES = {
    "opt": "model.decoder.layers",
    "gptj": "transformer.h",
    "gpt-j": "transformer.h",
    "pythia": "gpt_neox.layers",
    "llama": "model.layers",
    "gptneoxforcausallm": "gpt_neox.layers",
    "mpt": "transformer.blocks",
    "mosaicgpt": "transformer.blocks",
}
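For orientation, a hedged usage sketch of `create_model_and_transforms`: the argument values below follow the public OpenFlamingo-9B-vitl-mpt7b model card rather than this diff, and the released checkpoint weights would still need to be loaded into the returned model before generation.

```python
# Build the OpenFlamingo-9B architecture (values per the public model card;
# treat them as assumptions, since the HELM client code is not shown here).
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="anas-awadalla/mpt-7b",
    tokenizer_path="anas-awadalla/mpt-7b",
    cross_attn_every_n_layers=4,
)
```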