Commit
init vision_text_dual_encoder
patil-suraj committed Sep 10, 2021
1 parent 1c191ef commit c131df7
Showing 14 changed files with 2,556 additions and 0 deletions.
147 changes: 147 additions & 0 deletions docs/source/model_doc/vision_text_dual_encoder.rst
@@ -0,0 +1,147 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

VisionTextDualEncoder
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The VisionTextDualEncoder model was proposed in `<INSERT PAPER NAME HERE> <<INSERT PAPER LINK HERE>>`__ by <INSERT
AUTHORS HERE>. <INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

*<INSERT PAPER ABSTRACT HERE>*

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by `<INSERT YOUR HF USERNAME HERE> <https://huggingface.co/<INSERT YOUR HF USERNAME
HERE>>`__. The original code can be found `here <<INSERT LINK TO GITHUB REPO HERE>>`__.

VisionTextDualEncoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderConfig
:members:


VisionTextDualEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary


VisionTextDualEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderTokenizerFast
:members:


VisionTextDualEncoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderModel
:members: forward


VisionTextDualEncoderForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForCausalLM
:members: forward


VisionTextDualEncoderForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForMaskedLM
:members: forward


VisionTextDualEncoderForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForSequenceClassification
:members: forward


VisionTextDualEncoderForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForMultipleChoice
:members: forward


VisionTextDualEncoderForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForTokenClassification
:members: forward


VisionTextDualEncoderForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.VisionTextDualEncoderForQuestionAnswering
:members: forward


FlaxVisionTextDualEncoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderModel
:members: __call__


FlaxVisionTextDualEncoderForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForMaskedLM
:members: __call__


FlaxVisionTextDualEncoderForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForCausalLM
:members: __call__


FlaxVisionTextDualEncoderForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForSequenceClassification
:members: __call__


FlaxVisionTextDualEncoderForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForMultipleChoice
:members: __call__


FlaxVisionTextDualEncoderForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForTokenClassification
:members: __call__


FlaxVisionTextDualEncoderForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxVisionTextDualEncoderForQuestionAnswering
:members: __call__
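
For orientation (not part of the commit): a minimal sketch of how a dual encoder like this is typically used for image-text similarity, in the spirit of CLIP. The checkpoint name and the output fields below are illustrative assumptions; this initial commit still carries the cookiecutter task heads, which were trimmed in later work.

# Hypothetical usage sketch; the checkpoint name and output attributes are
# assumptions, not taken from this commit.
import torch
from transformers import VisionTextDualEncoderModel

model = VisionTextDualEncoderModel.from_pretrained("org/dual-encoder-checkpoint")  # hypothetical

# Stand-ins for one processed image and two candidate captions; in practice
# these come from a feature extractor and a tokenizer.
pixel_values = torch.randn(1, 3, 224, 224)
input_ids = torch.randint(0, 1000, (2, 16))

outputs = model(input_ids=input_ids, pixel_values=pixel_values)
# After projecting both modalities into a shared space, logits_per_image[i, j]
# scores the similarity of image i against caption j.
probs = outputs.logits_per_image.softmax(dim=-1)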
68 changes: 68 additions & 0 deletions src/transformers/__init__.py
@@ -269,6 +269,11 @@
"TransfoXLCorpus",
"TransfoXLTokenizer",
],
"models.vision_text_dual_encoder": [
"VISION_TEXT_DUAL_ENCODER_PRETRAINED_CONFIG_ARCHIVE_MAP",
"VisionTextDualEncoderConfig",
"VisionTextDualEncoderTokenizer",
],
"models.visual_bert": ["VISUAL_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "VisualBertConfig"],
"models.vit": ["VIT_PRETRAINED_CONFIG_ARCHIVE_MAP", "ViTConfig"],
"models.wav2vec2": [
@@ -365,6 +370,7 @@
# tokenizers-backed objects
if is_tokenizers_available():
# Fast tokenizers
_import_structure["models.vision_text_dual_encoder"].append("VisionTextDualEncoderTokenizerFast")
_import_structure["models.roformer"].append("RoFormerTokenizerFast")
_import_structure["models.clip"].append("CLIPTokenizerFast")
_import_structure["models.convbert"].append("ConvBertTokenizerFast")
@@ -521,6 +527,7 @@
_import_structure["modeling_utils"] = ["Conv1D", "PreTrainedModel", "apply_chunking_to_forward", "prune_layer"]

# PyTorch models structure

_import_structure["models.albert"].extend(
[
"ALBERT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1148,6 +1155,21 @@
"load_tf_weights_in_transfo_xl",
]
)
_import_structure["models.vision_text_dual_encoder"].extend(
[
"VISION_TEXT_DUAL_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST",
"VisionTextDualEncoderForCausalLM",
"VisionTextDualEncoderForMaskedLM",
"VisionTextDualEncoderForMultipleChoice",
"VisionTextDualEncoderForQuestionAnswering",
"VisionTextDualEncoderForSequenceClassification",
"VisionTextDualEncoderForTokenClassification",
"VisionTextDualEncoderLayer",
"VisionTextDualEncoderModel",
"VisionTextDualEncoderPreTrainedModel",
"load_tf_weights_in_vision_text_dual_encoder",
]
)
_import_structure["models.visual_bert"].extend(
[
"VISUAL_BERT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -1723,6 +1745,7 @@
)

# Flax models structure

_import_structure["models.bart"].extend(
[
"FlaxBartForConditionalGeneration",
@@ -1824,6 +1847,19 @@
]
)
_import_structure["models.t5"].extend(["FlaxT5ForConditionalGeneration", "FlaxT5Model", "FlaxT5PreTrainedModel"])
_import_structure["models.vision_text_dual_encoder"].extend(
[
"FlaxVisionTextDualEncoderForCausalLM",
"FlaxVisionTextDualEncoderForMaskedLM",
"FlaxVisionTextDualEncoderForMultipleChoice",
"FlaxVisionTextDualEncoderForQuestionAnswering",
"FlaxVisionTextDualEncoderForSequenceClassification",
"FlaxVisionTextDualEncoderForTokenClassification",
"FlaxVisionTextDualEncoderLayer",
"FlaxVisionTextDualEncoderModel",
"FlaxVisionTextDualEncoderPreTrainedModel",
]
)
_import_structure["models.vit"].extend(["FlaxViTForImageClassification", "FlaxViTModel", "FlaxViTPreTrainedModel"])
_import_structure["models.wav2vec2"].extend(
["FlaxWav2Vec2ForCTC", "FlaxWav2Vec2ForPreTraining", "FlaxWav2Vec2Model", "FlaxWav2Vec2PreTrainedModel"]
@@ -2049,6 +2085,11 @@
TransfoXLCorpus,
TransfoXLTokenizer,
)
from .models.vision_text_dual_encoder import (
VISION_TEXT_DUAL_ENCODER_PRETRAINED_CONFIG_ARCHIVE_MAP,
VisionTextDualEncoderConfig,
VisionTextDualEncoderTokenizer,
)
from .models.visual_bert import VISUAL_BERT_PRETRAINED_CONFIG_ARCHIVE_MAP, VisualBertConfig
from .models.vit import VIT_PRETRAINED_CONFIG_ARCHIVE_MAP, ViTConfig
from .models.wav2vec2 import (
@@ -2177,6 +2218,7 @@
from .models.splinter import SplinterTokenizerFast
from .models.squeezebert import SqueezeBertTokenizerFast
from .models.t5 import T5TokenizerFast
from .models.vision_text_dual_encoder import VisionTextDualEncoderTokenizerFast
from .models.xlm_roberta import XLMRobertaTokenizerFast
from .models.xlnet import XLNetTokenizerFast
from .tokenization_utils_fast import PreTrainedTokenizerFast
@@ -2223,6 +2265,7 @@
from .utils.dummy_timm_objects import *

if is_torch_available():

# Benchmarks
from .benchmark.benchmark import PyTorchBenchmark
from .benchmark.benchmark_args import PyTorchBenchmarkArguments
@@ -2787,6 +2830,19 @@
TransfoXLPreTrainedModel,
load_tf_weights_in_transfo_xl,
)
from .models.vision_text_dual_encoder import (
VISION_TEXT_DUAL_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,
VisionTextDualEncoderForCausalLM,
VisionTextDualEncoderForMaskedLM,
VisionTextDualEncoderForMultipleChoice,
VisionTextDualEncoderForQuestionAnswering,
VisionTextDualEncoderForSequenceClassification,
VisionTextDualEncoderForTokenClassification,
VisionTextDualEncoderLayer,
VisionTextDualEncoderModel,
VisionTextDualEncoderPreTrainedModel,
load_tf_weights_in_vision_text_dual_encoder,
)
from .models.visual_bert import ( # load_tf_weights_in_visual_bert,
VISUAL_BERT_PRETRAINED_MODEL_ARCHIVE_LIST,
VisualBertForMultipleChoice,
@@ -3228,6 +3284,7 @@
from .utils.dummy_tf_objects import *

if is_flax_available():

from .generation_flax_logits_process import (
FlaxForcedBOSTokenLogitsProcessor,
FlaxForcedEOSTokenLogitsProcessor,
@@ -3351,6 +3408,17 @@
FlaxRobertaPreTrainedModel,
)
from .models.t5 import FlaxT5ForConditionalGeneration, FlaxT5Model, FlaxT5PreTrainedModel
from .models.vision_text_dual_encoder import (
FlaxVisionTextDualEncoderForCausalLM,
FlaxVisionTextDualEncoderForMaskedLM,
FlaxVisionTextDualEncoderForMultipleChoice,
FlaxVisionTextDualEncoderForQuestionAnswering,
FlaxVisionTextDualEncoderForSequenceClassification,
FlaxVisionTextDualEncoderForTokenClassification,
FlaxVisionTextDualEncoderLayer,
FlaxVisionTextDualEncoderModel,
FlaxVisionTextDualEncoderPreTrainedModel,
)
from .models.vit import FlaxViTForImageClassification, FlaxViTModel, FlaxViTPreTrainedModel
from .models.wav2vec2 import (
FlaxWav2Vec2ForCTC,
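
The _import_structure entries above feed transformers' lazy-import machinery: the root __init__.py declares which public names live in which submodule, and a lazy module object defers the actual (torch- or flax-heavy) import until an attribute is first accessed. A simplified sketch of that mechanism, modeled on the library's _LazyModule (illustrative, not the code at this commit):

# Simplified sketch of the lazy-import pattern that _import_structure feeds;
# modeled on transformers' _LazyModule, not a copy of it.
import importlib
from types import ModuleType


class LazyModule(ModuleType):
    """Resolve public names to submodule imports on first attribute access."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Invert the mapping: exported name -> submodule that defines it.
        self._name_to_module = {
            attr: mod for mod, attrs in import_structure.items() for attr in attrs
        }
        self.__all__ = list(self._name_to_module)

    def __getattr__(self, name):
        if name not in self._name_to_module:
            raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")
        submodule = importlib.import_module("." + self._name_to_module[name], self.__name__)
        value = getattr(submodule, name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value

The library installs an instance of this as sys.modules[__name__] at the bottom of __init__.py, which is why appending names to _import_structure is all a new model needs for its public classes to resolve.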
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -85,6 +85,7 @@
t5,
tapas,
transfo_xl,
vision_text_dual_encoder,
visual_bert,
vit,
wav2vec2,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -26,6 +26,7 @@
CONFIG_MAPPING_NAMES = OrderedDict(
[
# Add configs here
("vision_text_dual_encoder", "VisionTextDualEncoderConfig"),
("gptj", "GPTJConfig"),
("layoutlmv2", "LayoutLMv2Config"),
("beit", "BeitConfig"),
@@ -99,6 +100,7 @@
CONFIG_ARCHIVE_MAP_MAPPING_NAMES = OrderedDict(
[
# Add archive maps here
("vision_text_dual_encoder", "VISION_TEXT_DUAL_ENCODER_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("gptj", "GPTJ_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("layoutlmv2", "LAYOUTLMV2_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("beit", "BEIT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -163,6 +165,7 @@
MODEL_NAMES_MAPPING = OrderedDict(
[
# Add full (and cased) model names here
("vision_text_dual_encoder", "VisionTextDualEncoder"),
("gptj", "GPT-J"),
("beit", "BeiT"),
("rembert", "RemBERT"),
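
These three one-line registrations are what let the auto classes discover the new model. Conceptually, AutoConfig.from_pretrained reads the model_type string out of a checkpoint's config.json and looks it up in CONFIG_MAPPING_NAMES before lazily importing the class. A self-contained sketch of that lookup (the helper function is illustrative, not the library's API):

# Conceptual sketch of how the registry entries above are consumed.
from collections import OrderedDict

# Trimmed-down mirror of the mapping being extended in this diff.
CONFIG_MAPPING_NAMES = OrderedDict(
    [
        ("vision_text_dual_encoder", "VisionTextDualEncoderConfig"),
        ("gptj", "GPTJConfig"),
        # ... every other registered model_type ...
    ]
)


def config_class_name(model_type: str) -> str:
    """Resolve a config.json model_type to its config class name."""
    if model_type not in CONFIG_MAPPING_NAMES:
        raise ValueError(f"Unrecognized model type: {model_type!r}")
    return CONFIG_MAPPING_NAMES[model_type]


assert config_class_name("vision_text_dual_encoder") == "VisionTextDualEncoderConfig"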
8 changes: 8 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -28,6 +28,7 @@
MODEL_MAPPING_NAMES = OrderedDict(
[
# Base model mapping
("vision_text_dual_encoder", "VisionTextDualEncoderModel"),
("gptj", "GPTJModel"),
("layoutlmv2", "LayoutLMv2Model"),
("beit", "BeitModel"),
@@ -136,6 +137,7 @@
MODEL_WITH_LM_HEAD_MAPPING_NAMES = OrderedDict(
[
# Model with LM heads mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForMaskedLM"),
("gptj", "GPTJForCausalLM"),
("rembert", "RemBertForMaskedLM"),
("roformer", "RoFormerForMaskedLM"),
@@ -185,6 +187,7 @@
MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = OrderedDict(
[
# Model for Causal LM mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForCausalLM"),
("gptj", "GPTJForCausalLM"),
("rembert", "RemBertForCausalLM"),
("roformer", "RoFormerForCausalLM"),
@@ -228,6 +231,7 @@
MODEL_FOR_MASKED_LM_MAPPING_NAMES = OrderedDict(
[
# Model for Masked LM mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForMaskedLM"),
("rembert", "RemBertForMaskedLM"),
("roformer", "RoFormerForMaskedLM"),
("big_bird", "BigBirdForMaskedLM"),
@@ -290,6 +294,7 @@
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
# Model for Sequence Classification mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForSequenceClassification"),
("gptj", "GPTJForSequenceClassification"),
("layoutlmv2", "LayoutLMv2ForSequenceClassification"),
("rembert", "RemBertForSequenceClassification"),
@@ -334,6 +339,7 @@
MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
[
# Model for Question Answering mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForQuestionAnswering"),
("layoutlmv2", "LayoutLMv2ForQuestionAnswering"),
("rembert", "RemBertForQuestionAnswering"),
("canine", "CanineForQuestionAnswering"),
@@ -379,6 +385,7 @@
MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES = OrderedDict(
[
# Model for Token Classification mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForTokenClassification"),
("layoutlmv2", "LayoutLMv2ForTokenClassification"),
("rembert", "RemBertForTokenClassification"),
("canine", "CanineForTokenClassification"),
@@ -412,6 +419,7 @@
MODEL_FOR_MULTIPLE_CHOICE_MAPPING_NAMES = OrderedDict(
[
# Model for Multiple Choice mapping
("vision_text_dual_encoder", "VisionTextDualEncoderForMultipleChoice"),
("rembert", "RemBertForMultipleChoice"),
("canine", "CanineForMultipleChoice"),
("roformer", "RoFormerForMultipleChoice"),
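
With the base-model entry in place, a checkpoint whose config.json declares "model_type": "vision_text_dual_encoder" can be loaded through the auto classes. A sketch under that assumption (the path is hypothetical, and the task-head mappings added here came from the cookiecutter template and were pruned in later work):

# Hypothetical dispatch through the auto classes; the checkpoint path is an
# illustrative assumption.
from transformers import AutoConfig, AutoModel

# A local directory whose config.json contains
# "model_type": "vision_text_dual_encoder".
config = AutoConfig.from_pretrained("./dual-encoder-checkpoint")
model = AutoModel.from_config(config)  # resolves to VisionTextDualEncoderModel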
(diffs for the remaining changed files did not load and are not shown)
