# MMF Colab Demo

This notebook gives step-by-step instructions for using MMF to build new models, with the Hateful Memes (HM) dataset as the running example.

Follow these links to learn more about MMF:
- [MMF Blog Post]()
- [GitHub repo](https://github.com/facebookresearch/mmf)
- [Website](https://mmf.sh) and [Documentation](https://mmf.rtfd.io)

In general, the notebook demonstrates how to:

1. [Download MMF](#scrollTo=l7Eo9ZqTDW3I)
2. [Download the HM dataset](#scrollTo=nYyXt9dzEBEU&line=12&uniqifier=1)
3. [Test pretrained models on HM](#scrollTo=nYyXt9dzEBEU&line=12&uniqifier=1)
4. [Submit a prediction](#scrollTo=uhKvYHtWHlyr&line=3&uniqifier=1)
5. [Train existing model on HM](#scrollTo=) 
6. [Build your model](#scrollTo=)
7. [Train your model on HM](#scrollTo=) 

## Download MMF

In this section, we will download the MMF package and required dependencies.

### Prerequisites 
Please enable GPU in this notebook: Runtime > Change runtime type > Hardware Accelerator > Set to GPU
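
To confirm that the runtime change took effect, a quick check such as the following can help. This is a minimal sketch, not part of MMF: the helper name `describe_accelerator` is made up for illustration, and it assumes PyTorch is importable (it is preinstalled on Colab).

```python
def describe_accelerator():
    """Return a short message saying whether a CUDA GPU is visible to PyTorch."""
    try:
        import torch  # preinstalled on Colab; may be missing elsewhere
    except ImportError:
        return "PyTorch is not installed, so GPU availability cannot be checked."
    if torch.cuda.is_available():
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; enable one under Runtime > Change runtime type."


print(describe_accelerator())
```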

First, we will install the MMF package and its required dependencies.

Install from source [Recommended]

https://mmf.sh/docs/challenges/hateful_memes_challenge/#predicting-for-phase-1

In [None]:
!git clone https://github.com/facebookresearch/mmf.git

In [None]:
%cd /content/mmf/
!pip install --editable .

## Download dataset

We will now download the Hateful Memes dataset. Downloading it requires two things: (i) the URL of the zip file and (ii) the password for the zip file. To get both, follow these steps:

1. Go to the [DrivenData challenge page](https://www.drivendata.org/competitions/64/hateful-memes/)
2. Register, read and acknowledge the agreements for data access.
3. Go to the [data page](https://www.drivendata.org/competitions/64/hateful-memes/data), right click on the "Hateful Memes challenge dataset" link and "Copy Link Address" as shown in the image. This will copy the URL for the zip file to your clipboard which you will use in the next step.
![data](https://i.imgur.com/JQx2hPm.png)
4. Also, note the password provided in the description.
5. Run the next code block with the URL and the zip file's password filled in (or uncomment the `getpass` lines to be prompted for them instead).

The code blocks after that will download the dataset and convert it into the format MMF requires.
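
The copied link is time-limited, so a stale URL is a common reason for the download to fail. The sketch below checks how long a link remains valid. It assumes the link carries an `Expires` query parameter holding a Unix timestamp, as S3-style signed URLs do; the helper name and the example URL are hypothetical.

```python
import time
from urllib.parse import parse_qs, urlparse


def seconds_until_expiry(signed_url, now=None):
    """Return seconds left before the link expires, or None if no Expires field."""
    query = parse_qs(urlparse(signed_url).query)
    if "Expires" not in query:
        return None
    now = time.time() if now is None else now
    return int(query["Expires"][0]) - int(now)


# Hypothetical URL for illustration only; substitute the link you copied.
example_url = "https://example.com/data.zip?Signature=abc&Expires=2000000000"
remaining = seconds_until_expiry(example_url, now=1999999000)
print(f"Link valid for another {remaining} seconds")
```

A negative result means the link has expired and a fresh one needs to be copied from the data page.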

Hateful Memes: Phase 1

https://www.drivendata.org/competitions/64/hateful-memes/data/

url: https://drivendata-competition-fb-hateful-memes-data.s3.amazonaws.com/XjiOc5ycDBRRNwbhRlgH.zip?AWSAccessKeyId=AKIARVBOBDCY4MWEDJKS&Signature=FpmkioFlEFPvW%2FMtmwfZIgJ%2BGCE%3D&Expires=1618941090

password: EWryfbZyNviilcDF

In [None]:
# from getpass import getpass, getuser
# url = getpass("Enter the Hateful Memes data URL:")
# password = getpass("Enter ZIP file's Password:")

url='https://drivendata-competition-fb-hateful-memes-data.s3.amazonaws.com/XjiOc5ycDBRRNwbhRlgH.zip?AWSAccessKeyId=AKIARVBOBDCY4MWEDJKS&Signature=FpmkioFlEFPvW%2FMtmwfZIgJ%2BGCE%3D&Expires=1618941090'
password='EWryfbZyNviilcDF'

The next cell downloads the zip file to `/content/hm.zip`.

In [None]:
!curl -o /content/hm.zip "$url" -H 'Referer: https://www.drivendata.org/competitions/64/hateful-memes/data/' --compressed

The next command will convert the zip file into the format MMF requires.

In [None]:
!mmf_convert_hm --zip_file=/content/hm.zip --password=$password --bypass_checksum=1

## Test pretrained model

We will now use MMF to load an existing model, MMBT, and test it on arbitrary images from the internet. Fill in the image URL and the text contained in the image to see whether the model classifies the meme as hateful.

In [None]:
import matplotlib.pyplot as plt
import requests
from PIL import Image
from mmf.models.mmbt import MMBT

model = MMBT.from_pretrained("mmbt.hateful_memes.images")

In [None]:
image_url = "https://i.imgur.com/tEcsk5q.jpg" #@param {type:"string"}
text = "look how many people love you" #@param {type: "string"}
output = model.classify(image_url, text)
plt.imshow(Image.open(requests.get(image_url, stream=True).raw))
plt.axis("off")
plt.show()
hateful = "Yes" if output["label"] == 1 else "No"
print("Hateful as per the model?", hateful)
print(f"Model's confidence: {output['confidence'] * 100:.3f}%")
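
For intuition about the `label` and `confidence` fields, the sketch below shows one common way to turn two raw class scores into a prediction: a softmax followed by an argmax. Whether `classify` computes its output exactly this way is an assumption, since the notebook does not say; the function name and the logits are made up for illustration.

```python
import math


def scores_to_prediction(logits):
    """Turn raw class scores into (label, confidence) via softmax and argmax."""
    peak = max(logits)
    # Subtract the largest score before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical logits for the classes [not hateful, hateful].
label, confidence = scores_to_prediction([2.0, 0.0])
print("Hateful as per the model?", "Yes" if label == 1 else "No")
print(f"Model's confidence: {confidence * 100:.3f}%")
```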

## Submit a prediction

Now, we will use a pretrained model from MMF to generate predictions to submit to DrivenData. Run the command in the next block; when it finishes, it prints the path to the generated CSV file. Download that file and upload it to [DrivenData's submission page](https://www.drivendata.org/competitions/64/hateful-memes/submissions/).

In [None]:
!mmf_predict config=projects/hateful_memes/configs/mmbt/defaults.yaml \
  model=mmbt \
  dataset=hateful_memes \
  run_type=test \
  checkpoint.resume_pretrained=False \
  dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_seen.jsonl \
  dataset_config.hateful_memes.annotations.test[0]=hateful_memes/defaults/annotations/test_seen.jsonl

## Train an existing model

We will use MMF to train an existing baseline from MMF's model zoo on the Hateful Memes dataset. Run the next code cell to start training the MMBT-Grid model on the dataset. Using command-line overrides, you can adjust the batch size, the maximum number of updates, and the logging and evaluation intervals, among other things. Read more about MMF's configuration system at https://github.com/facebookresearch/mmf/tree/master/projects/hateful_memes#reproducing-baselines

In [None]:
!mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
  model=mmbt \
  dataset=hateful_memes \
  training.log_interval=50 \
  training.max_updates=3000 \
  training.batch_size=16 \
  training.evaluation_interval=500
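
To make the override mechanism concrete, here is a toy re-implementation of how `key.subkey=value` arguments can be folded into a nested configuration. It is a sketch for intuition only and not MMF's actual code path; the function name and the default values are hypothetical.

```python
import copy


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` with `a.b.c=value` style overrides applied."""
    config = copy.deepcopy(defaults)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        # Keep integers as integers; leave everything else as a string.
        node[leaf] = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value
    return config


# Hypothetical default values, for illustration only.
defaults = {"training": {"batch_size": 64, "max_updates": 22000}}
overrides = ["training.batch_size=16", "training.max_updates=3000", "model=mmbt"]
print(apply_overrides(defaults, overrides))
```

Each override replaces only the key it names, so defaults that are not mentioned on the command line stay in effect.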

## Build your own model

Using MMF's encoders, modules and utilities, we can easily build a custom model. In this example, we build a fusion model that fuses ResNet pooled grid features with fastText embedding vectors to classify a meme as hateful or not hateful.

Steps involved in building the model are:

1. Create a new processor to get fastText sentence embeddings. (Read more on processors [here]())
2. Create a new model using encoders from MMF.
3. Move hardcoded values from the model into the configuration.

In [None]:
import torch 

# We will inherit the FastText Processor already present in MMF
from mmf.datasets.processors import FastTextProcessor
# registry is needed to register processor and model to be MMF discoverable
from mmf.common.registry import registry

# Register the processor so that MMF can discover it
@registry.register_processor("fasttext_sentence_vector")
class FastTextSentenceVectorProcessor(FastTextProcessor):
    # Override the call method
    def __call__(self, item):
        # This function is present in FastTextProcessor class and loads
        # fasttext bin
        self._load_fasttext_model(self.model_file)
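        if "text" in item:
            text = item["text"]
        elif "tokens" in item:
            text = " ".join(item["tokens"])
        else:
            # Fail early instead of hitting an undefined `text` below
            raise KeyError("item must contain either 'text' or 'tokens'")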

        # Get a sentence vector for sentence and convert it to torch tensor
        sentence_vector = torch.tensor(
            self.model.get_sentence_vector(text),
            dtype=torch.float
        )

        # Return back a dict
        return {
            "text": sentence_vector
        }
    
    # The dataset builder expects this method to exist; sentence vectors
    # have no vocabulary, so return None
    def get_vocab_size(self):
        return None
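
As a rough picture of what a sentence vector is, the toy function below averages the unit-normalized vectors of the words in a sentence. This is a simplified assumption about what fastText's `get_sentence_vector` does; the real implementation also handles subwords and an end-of-sentence token. The function name and the 2-dimensional word table are made up for illustration.

```python
import math


def sentence_vector(sentence, word_vectors):
    """Average the unit-normalized vectors of the known words in `sentence`."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    count = 0
    for word in sentence.split():
        vec = word_vectors.get(word)
        if vec is None:
            continue  # the toy table has no subword fallback for unknown words
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            total = [t + v / norm for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


# Hypothetical 2-dimensional word vectors, for illustration only.
toy_vectors = {"love": [3.0, 4.0], "you": [0.0, 2.0]}
print(sentence_vector("love you", toy_vectors))
```

The result has the same dimension as a single word vector regardless of sentence length, which is why the processor above can return one fixed-size tensor per meme text.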

In [None]:
import torch

# registry is needed to register our new model so that MMF can discover it
from mmf.common.registry import registry
# All models using MMF need to inherit from BaseModel
from mmf.models.base_model import BaseModel
# ProjectionEmbedding will act as proxy encoder for FastText Sentence Vector
from mmf.modules.embeddings import ProjectionEmbedding
# Builder methods for image encoder and classifier
from mmf.utils.build import build_classifier_layer, build_image_encoder

# Register the model for MMF, "concat_vl" key would be used to find the model
@registry.register_model("concat_vl")
class LanguageAndVisionConcat(BaseModel):
    # All models in MMF get first argument as config which contains all
    # of the information you stored in this model's config (hyperparameters)
    def __init__(self, config, *args, **kwargs):
        # This is not needed in most cases, as it just calls the parent's init
        # with the same parameters. We have kept it to explain how config is
        # initialized
        super().__init__(config, *args, **kwargs)
    
    # This classmethod tells MMF where to look for default config of this model
    @classmethod
    def config_path(cls):
        # Absolute path here; it could also be relative to the user dir root
        return "/content/hm_example_mmf/configs/models/concat_vl.yaml"
    
    # Each model needs to define a build method where the model's modules
    # are actually built and assigned to the model
    def build(self):
        """
        Config's image_encoder attribute will be used to build an MMF image
        encoder. This config in yaml will look like:

        # "type" parameter specifies the type of encoder we are using here. 
        # In this particular case, we are using resnet152
        type: resnet152
      
        # Parameters are passed to underlying encoder class by 
        # build_image_encoder
        params:
          # Specifies whether to use a pretrained version
          pretrained: true 
          # Pooling type, use max to use AdaptiveMaxPool2D
          pool_type: avg 
      
          # Number of output features from the encoder; -1 for original,
          # otherwise supports values from 1 to 9
          num_output_features: 1 
        """
        self.vision_module = build_image_encoder(self.config.image_encoder)

        """
        For the classifier, the configuration would look like:
        # Specifies the type of the classifier, in this case mlp
        type: mlp
        # Parameter to the classifier passed through build_classifier_layer
        params:
          # Dimension of the tensor coming into the classifier
          in_dim: 512
          # Dimension of the tensor going out of the classifier
          out_dim: 2
          # Number of MLP layers in the classifier
          num_layers: 0
        """
        self.classifier = build_classifier_layer(self.config.classifier)
        
        # ProjectionEmbedding takes in params directly as it is a module.
        # So, pass in kwargs, which are in_dim, out_dim and module,
        # whose value would be "linear" as we want a linear layer
        self.language_module = ProjectionEmbedding(
            **self.config.text_encoder.params
        )
        # Dropout value will come from config now
        self.dropout = torch.nn.Dropout(self.config.dropout)
        # Same as Projection Embedding, fusion's layer params (which are param 
        # for linear layer) will come from config now
        self.fusion = torch.nn.Linear(**self.config.fusion.params)
        self.relu = torch.nn.ReLU()

    # Each model in MMF gets a dict called sample_list which contains
    # all of the necessary information returned from the dataset
    def forward(self, sample_list):
        # Text input features will be in "text" key
        text = sample_list["text"]
        # Similarly, image input will be in "image" key
        image = sample_list["image"]

        text_features = self.relu(self.language_module(text))
        image_features = self.relu(self.vision_module(image))

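        # Flatten image features to (batch, features); unlike squeeze(),
        # this keeps the batch dimension when the batch size is 1
        image_features = torch.flatten(image_features, start_dim=1)
        # Concatenate the features returned from two modality encoders
        combined = torch.cat([text_features, image_features], dim=1)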

        # Pass through the fusion layer, relu and dropout
        fused = self.dropout(self.relu(self.fusion(combined)))

        # Pass final tensor from classifier to get scores
        logits = self.classifier(fused)

        # For loss calculations (automatically done by MMF based on loss defined
        # in the config), we need to return a dict with "scores" key as logits
        output = {"scores": logits}

        # MMF will automatically calculate loss
        return output
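
Because the layer sizes now come from configuration, they have to chain together: the fusion layer's input must match the concatenated text and image features, and the classifier's input must match the fusion layer's output. The sketch below mirrors the config structure the model reads as a plain Python dict and checks that consistency. All numeric values are illustrative assumptions (the real ones live in `concat_vl.yaml`), and `check_dimensions` is a hypothetical helper, not an MMF function.

```python
# Illustrative values only; the real ones live in concat_vl.yaml.
concat_vl_config = {
    "image_encoder": {
        "type": "resnet152",
        "params": {"pretrained": True, "pool_type": "avg", "num_output_features": 1},
    },
    "text_encoder": {"params": {"module": "linear", "in_dim": 300, "out_dim": 300}},
    "fusion": {"params": {"in_features": 2348, "out_features": 512}},
    "dropout": 0.1,
    "classifier": {"type": "mlp", "params": {"in_dim": 512, "out_dim": 2, "num_layers": 0}},
}

RESNET152_FEATURE_DIM = 2048  # channel width of ResNet-152's final pooled features


def check_dimensions(config, image_dim=RESNET152_FEATURE_DIM):
    """Raise ValueError if the layer sizes in `config` do not chain together."""
    text_out = config["text_encoder"]["params"]["out_dim"]
    fusion = config["fusion"]["params"]
    classifier_in = config["classifier"]["params"]["in_dim"]
    if fusion["in_features"] != text_out + image_dim:
        raise ValueError("fusion.in_features must equal text out_dim + image feature dim")
    if classifier_in != fusion["out_features"]:
        raise ValueError("classifier.in_dim must equal fusion.out_features")
    return True


print(check_dimensions(concat_vl_config))
```

A mismatch here would otherwise surface only at the first forward pass as a shape error inside `torch.cat` or a linear layer.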

Now, we will install the example repo that we have already created on top of MMF, which contains the code in this colab. We do this so that we don't have to build the configs again from scratch.

In [None]:
!git clone https://github.com/apsdehal/hm_example_mmf /content/hm_example_mmf

## Train your model

In this step, we will train the model we just built. Overrides for the configuration parameters are written in dot-list form and can be passed to `run` as either a dict or a list.

In [None]:
from mmf_cli.run import run
opts = [
    "config='/content/hm_example_mmf/configs/experiments/defaults.yaml'", 
    "model=concat_vl", 
    "dataset=hateful_memes", 
    "training.num_workers=0"
]
run(opts=opts)

## Using your module

Since we have cloned the repo that contains the example we built in this colab notebook, we can also use it to run the training from the command line, either by using the `env.user_dir` option or by setting the environment variable `MMF_USER_DIR`. The next code cell uses the environment variable.

In [None]:
!MMF_USER_DIR="/content/hm_example_mmf" mmf_run \
  config="configs/experiments/defaults.yaml" \
  model=concat_vl \
  dataset=hateful_memes \
  training.num_workers=0
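
The `env.user_dir` route mentioned above can be sketched as follows. The snippet only builds and prints the command rather than running it. That `env.user_dir` resolves the relative config path the same way `MMF_USER_DIR` does is an assumption based on the description above, and `USER_DIR` is just a local variable name.

```python
import shlex

USER_DIR = "/content/hm_example_mmf"  # where the example repo was cloned above

opts = [
    "config=configs/experiments/defaults.yaml",
    "model=concat_vl",
    "dataset=hateful_memes",
    "training.num_workers=0",
    f"env.user_dir={USER_DIR}",  # alternative to exporting MMF_USER_DIR
]

# Print the equivalent shell command; prefix it with "!" to run it in a Colab cell.
command = shlex.join(["mmf_run", *opts])
print(command)
```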

## Conclusion and Further Steps

In this colab notebook, we learned how to use MMF to train existing models from MMF's model zoo and to generate predictions with them. We also learned how to build custom models easily using the modules and utilities that MMF provides.

If you have any issues, feedback or comments, please reach out to us at mmf@fb.com or open an issue on [GitHub](https://github.com/facebookresearch/mmf/issues/new/choose). We also accept PRs if you want to add your model to MMF, and we are always open to community contributions.

At Facebook AI, we’ll continuously improve and expand on the multimodal capabilities available through MMF, and we welcome contributions from the community as well to build this resource. We hope MMF will be the framework of choice and be a catalyst for research in this area by providing a powerful, versatile platform for multimodal research. 