
FLAVA code #1219

Open
sameeravithana opened this issue Mar 25, 2022 · 4 comments

Comments

@sameeravithana

The original FLAVA paper [1] cites MMF for the implementation. We want to check whether the FLAVA implementation is accessible in this codebase.

[1] Singh, Amanpreet, et al. "FLAVA: A Foundational Language And Vision Alignment Model." arXiv preprint arXiv:2112.04482 (2021).

@apsdehal
Contributor

Hi,

The FLAVA codebase is on track to be released via the torchmultimodal library. I will reply to this issue by the end of this week with further instructions.

@PeterDykas

Why are there going to be two different repositories for multimodal models? What will the difference be between TorchMultimodal and MMF?

@kartikayk

Thanks for the question! We will have more detailed communication around this, but a quick note here. MMF currently supports text + image understanding tasks, with some initial support for video understanding models added recently. We have received feedback from the community that MMF is slowly becoming over-engineered and that the layers of inheritance make it hard to use components outside of MMF. It's also getting harder to add support for new tasks (e.g., generation), support recent trends like model scaling, and extend to new modalities (audio, for example).

As we rethink the multimodal ecosystem in PyTorch, we will look to evolve MMF into a library for text + image understanding (refactor the models into PyTorch components, deprecate the trainers and config system, etc.) and provide more general support for combining modalities and tasks through TorchMultimodal. Our goal is to provide a collection of examples in TorchMultimodal that bring together components and infrastructure from all over the ecosystem, including MMF, for training multitask multimodal models at scale. As such, TorchMultimodal is designed with extensibility and composability in mind, which makes it easy to add new modalities (and tasks) or reuse components in other frameworks. The first example of this is the official release of FLAVA in TorchMultimodal. We don't plan on adding this to MMF.
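
To make the composability point concrete, usage might eventually look something like the sketch below (the import path, constructor, and call signature are illustrative assumptions, not a finalized API):

```python
# Minimal sketch; the names below are assumptions, not a confirmed API.
import torch

# Hypothetical import path for a FLAVA model with a classification head.
from torchmultimodal.models.flava.model import flava_model_for_classification

# Build FLAVA with a classification head, e.g. for a binary task.
model = flava_model_for_classification(num_classes=2)
model.eval()

# Dummy multimodal inputs: a batch of token ids and a 224x224 RGB image.
text = torch.randint(0, 30522, (1, 77))   # (batch, seq_len) token ids
image = torch.randn(1, 3, 224, 224)       # (batch, channels, H, W)
labels = torch.tensor([1])

with torch.no_grad():
    # Hypothetical forward signature returning logits (and loss when labels are given).
    output = model(text=text, image=image, labels=labels)
print(output.logits.shape)  # expected: (1, num_classes)
```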

As I mentioned, we will share a more detailed communication around this soon!

@PeterDykas

Thanks for the reply, that makes sense. Looking forward to the FLAVA implementation in TorchMultimodal.
