
fix flava link #1302

Merged (1 commit, Feb 24, 2023)
4 changes: 2 additions & 2 deletions _posts/2022-11-17-introducing-torchmultimodal.md
@@ -9,7 +9,7 @@ We are announcing TorchMultimodal Beta, a PyTorch domain library for training So

## Why TorchMultimodal?

-Interest is rising around AI models that understand multiple input types (text, images, videos and audio signals), and optionally use this understanding to generate different forms of outputs (sentences, pictures, videos). Recent work from FAIR such as [FLAVA](https://arxiv.org/abs/2112.04482), [Omnivore](https://arxiv.org/pdf/2201.08377.pdf) and [data2vec](https://arxiv.org/abs/2202.03555) have shown that [multimodal models for understanding](https://ai.facebook.com/blog/advances-in-multimodal-understanding-research-at-meta-ai/) are competitive with unimodal counterparts, and in some cases are establishing the new state-of-the art. Generative models such as [Make-a-video](https://ai.facebook.com/blog/generative-ai-text-to-video/) and [Make-a-scene](https://ai.facebook.com/blog/greater-creative-control-for-ai-image-generation/) are redefining what modern AI systems can do.
+Interest is rising around AI models that understand multiple input types (text, images, videos and audio signals), and optionally use this understanding to generate different forms of outputs (sentences, pictures, videos). Recent work from FAIR such as [FLAVA](https://github.com/facebookresearch/multimodal/tree/main/examples/flava), [Omnivore](https://arxiv.org/pdf/2201.08377.pdf) and [data2vec](https://arxiv.org/abs/2202.03555) have shown that [multimodal models for understanding](https://ai.facebook.com/blog/advances-in-multimodal-understanding-research-at-meta-ai/) are competitive with unimodal counterparts, and in some cases are establishing the new state-of-the art. Generative models such as [Make-a-video](https://ai.facebook.com/blog/generative-ai-text-to-video/) and [Make-a-scene](https://ai.facebook.com/blog/greater-creative-control-for-ai-image-generation/) are redefining what modern AI systems can do.

As interest in multimodal AI has grown, researchers are looking for tools and libraries to quickly experiment with ideas, and build on top of the latest research in the field. While the PyTorch ecosystem has a rich repository of libraries and frameworks, it’s not always obvious how components from these interoperate with each other, or how they can be stitched together to build SoTA multimodal models.

@@ -39,7 +39,7 @@ TorchMultimodal is a PyTorch domain library for training multi-task multimodal m

- **[Examples](https://github.com/facebookresearch/multimodal/tree/main/examples)**. A collection of examples that show how to combine these building blocks with components and common infrastructure (Lightning, TorchMetrics) from across the PyTorch ecosystem to replicate state-of-the-art models published in the literature. We currently provide five examples, including:

-- [FLAVA](https://arxiv.org/abs/2112.04482) \[[paper](https://arxiv.org/abs/2112.04482)\]. Official code for the paper accepted at CVPR, including a tutorial on finetuning FLAVA.
+- [FLAVA](https://github.com/facebookresearch/multimodal/tree/main/examples/flava) \[[paper](https://github.com/facebookresearch/multimodal/tree/main/examples/flava)\]. Official code for the paper accepted at CVPR, including a tutorial on finetuning FLAVA.

- [MDETR](https://github.com/facebookresearch/multimodal/tree/main/examples/mdetr) \[[paper](https://arxiv.org/abs/2104.12763)\]. Collaboration with authors from NYU to provide an example which alleviates interoperability pain points in the PyTorch ecosystem, including a [notebook](https://github.com/facebookresearch/multimodal/blob/main/examples/mdetr/MDETRTutorial.ipynb) on using MDETR for phrase grounding and visual question answering.
