<a href="https://colab.research.google.com/github/miayuehan/hm_detection/blob/main/MMF_unimodal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# **Install MMF**
Please enable GPU in this notebook: Runtime > Change runtime type > Hardware Accelerator > Set to GPU
(more details in [MMF Colab Demo](https://colab.research.google.com/github/facebookresearch/mmf/blob/notebooks/notebooks/mmf_hm_example.ipynb#scrollTo=1nwebqtdWOfZ))
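
Before installing anything, it can help to confirm that the runtime actually exposes a GPU. The cell below is a minimal sketch that only relies on the `nvidia-smi` tool being present on GPU runtimes (an assumption about the environment); the helper name `gpu_runtime_available` is made up for this notebook.

```python
import shutil
import subprocess


def gpu_runtime_available() -> bool:
    """Return True if nvidia-smi is installed and exits successfully."""
    # On a CPU-only runtime the binary is usually missing altogether.
    if shutil.which("nvidia-smi") is None:
        return False
    # A non-zero exit code means the driver could not reach a GPU.
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.returncode == 0


print("GPU runtime detected:", gpu_runtime_available())
```

If this prints `False`, change the runtime type as described above and rerun the notebook from the top.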

In [None]:
pwd

In [None]:
!pip uninstall -y mmf

In [None]:
!pip install git+https://github.com/luckyyc12/mmf.git

In [None]:
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

# **Dataset**

### **Convert dataset zip file into required MMF format**

In [None]:
!mmf_convert_hm --zip_file /content/drive/MyDrive/CS7643_final_project/hateful_memes.zip --password ' ' --bypass_checksum=1

### **Visualize**

The code block below displays a few samples from the dataset. You can adjust the number of samples, the number of rows, and the image size, among other options.

**Note 1**: The Colab version of matplotlib can cause issues with some images, so we install a different version and restart the runtime to load it.

**Note 2**: Some of the images in the hateful memes dataset are sensitive and may not be suitable for all audiences. Please run the next code responsibly keeping these conditions in mind.

In [None]:
!pip install --upgrade matplotlib==3.3.4

In [None]:
!pip install --upgrade Pillow

In [None]:
from mmf.common.registry import registry
from mmf.models.mmbt import MMBT
from mmf.utils.build import build_dataset
from mmf.utils.env import setup_imports

setup_imports()
dataset = build_dataset("hateful_memes")

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20, 20)
dataset.visualize(num_samples=8, size=(512, 512), nrow=4)

# **Test pretrained model**
We will now use MMF to load a pretrained MMBT model and test it on arbitrary images from the internet. Fill in the image URL and the text the image contains to see whether the model considers it hateful.

In [None]:
import matplotlib.pyplot as plt
import requests

from PIL import Image

from mmf.models.mmbt import MMBT

model = MMBT.from_pretrained("mmbt.hateful_memes.images")

In [None]:
image_url = "https://i.imgur.com/tEcsk5q.jpg" #@param {type:"string"}
text = "look how many people love you" #@param {type: "string"}
output = model.classify(image_url, text)
plt.imshow(Image.open(requests.get(image_url, stream=True).raw))
plt.axis("off")
plt.show()
hateful = "Yes" if output["label"] == 1 else "No"
print("Hateful as per the model?", hateful)
print(f"Model's confidence: {output['confidence'] * 100:.3f}%")
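
To make the `label` and `confidence` fields easier to interpret, here is a minimal pure-Python sketch of how such values could be derived from two class logits. It assumes the confidence is the softmax probability of the most likely class; the logits are invented, and `label_and_confidence` is a hypothetical helper, not part of MMF.

```python
import math


def label_and_confidence(logits):
    """Apply a softmax to the class logits and pick the most probable class."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Hypothetical logits: index 0 = not hateful, index 1 = hateful.
label, confidence = label_and_confidence([-1.2, 0.4])
hateful = "Yes" if label == 1 else "No"
print("Hateful as per the model?", hateful)
print(f"Model's confidence: {confidence * 100:.3f}%")
```

With two classes the confidence never drops below 50%, so values close to 50% indicate that the model is undecided.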

# **Submit a prediction**

Now, we will use a pretrained model from MMF to submit a prediction to DrivenData. Run the command in the next block; at the end it prints the path to the generated CSV file. Download that file and upload it to DrivenData's submission page.

In [None]:
!mmf_predict config=projects/hateful_memes/configs/mmbt/defaults.yaml model=mmbt dataset=hateful_memes run_type=test checkpoint.resume_zoo=mmbt.hateful_memes.images training.batch_size=16
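
Before uploading, it may be worth sanity-checking the generated file. The sketch below assumes the submission has the columns `id`, `proba` and `label`, with `proba` in [0, 1] and `label` equal to 0 or 1; verify this against the competition's submission format. `check_submission` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io

# Assumed column layout of the submission file.
EXPECTED_COLUMNS = ["id", "proba", "label"]


def check_submission(csv_file):
    """Validate header and value ranges; return the number of data rows."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    count = 0
    for row in reader:
        proba = float(row["proba"])
        if not 0.0 <= proba <= 1.0:
            raise ValueError(f"proba out of range for id {row['id']}")
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"label must be 0 or 1 for id {row['id']}")
        count += 1
    return count


# Invented sample data standing in for the real file.
sample = io.StringIO("id,proba,label\n16395,0.93,1\n37405,0.12,0\n")
print(check_submission(sample), "rows look well-formed")
```

For the real file, open the path printed by `mmf_predict` with `open(path, newline="")` and pass the file object to `check_submission`.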

# **Train an existing model**

We will use MMF to train existing baselines from MMF's model zoo on the Hateful Memes dataset. The next two code cells train the `mmf_bert` and `mmbt` (MMBT-Grid) models on the dataset. You can adjust the batch size, the maximum number of updates, and the log and evaluation intervals, among other things, by using command line overrides. Read more about MMF's configuration system at https://mmf.readthedocs.io/en/latest/notes/configuration.html.

In [None]:
!mmf_run config=projects/hateful_memes/configs/mmf_bert/defaults.yaml \
  model=mmf_bert \
  dataset=hateful_memes \
  training.log_interval=50 \
  training.max_updates=300 \
  training.batch_size=16 \
  training.evaluation_interval=500 \
  trainer.params.gpus=100

In [None]:
!mmf_run config=projects/hateful_memes/configs/mmbt/defaults.yaml \
  model=mmbt \
  dataset=hateful_memes \
  training.log_interval=50 \
  training.max_updates=3000 \
  training.batch_size=16 \
  training.evaluation_interval=500 \
  trainer.params.gpus=100
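
The interplay between the overrides in the two commands above is easy to misjudge, so here is a small arithmetic sketch. It assumes the Hateful Memes training split has 8,500 examples and only counts evaluations triggered by `training.evaluation_interval`; `summarize_schedule` is a hypothetical helper, not an MMF function.

```python
def summarize_schedule(max_updates, batch_size, evaluation_interval,
                       log_interval, num_train_examples=8500):
    """Estimate how often a run logs and evaluates, and how many epochs it covers."""
    return {
        # One log line every `log_interval` updates.
        "log_lines": max_updates // log_interval,
        # Evaluations triggered by the interval alone.
        "evaluations": max_updates // evaluation_interval,
        # Each update consumes one batch of training examples.
        "approx_epochs": round(max_updates * batch_size / num_train_examples, 2),
    }


# Values taken from the two mmf_run commands above.
print(summarize_schedule(300, 16, 500, 50))
print(summarize_schedule(3000, 16, 500, 50))
```

Under these assumptions, the first command stops before the first interval-triggered evaluation is reached (300 updates against an interval of 500), while the second evaluates six times over roughly 5.6 epochs.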

# **Build your own model**

Using MMF's encoders, modules and utilities, we can easily build a custom model. In this example, we are building a fusion model which fuses ResNet pooled grid features with fasttext embedding vectors to classify a meme as hateful or not hateful.

Steps involved in building the model are:

1. Create a new processor to get fasttext sentence embeddings. (Read more on processors in the MMF documentation.)
2. Create a new model using encoders from MMF.
3. Move hardcoded values from the model to the configuration.

In [None]:
import torch 

# We will inherit the FastText Processor already present in MMF
from mmf.datasets.processors import FastTextProcessor
# registry is needed to register processor and model to be MMF discoverable
from mmf.common.registry import registry


# Register the processor so that MMF can discover it
@registry.register_processor("fasttext_sentence_vector")
class FastTextSentenceVectorProcessor(FastTextProcessor):
    # Override the call method
    def __call__(self, item):
        # This function is present in FastTextProcessor class and loads
        # fasttext bin
        self._load_fasttext_model(self.model_file)
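        if "text" in item:
            text = item["text"]
        elif "tokens" in item:
            text = " ".join(item["tokens"])
        else:
            # Fail early instead of hitting an unbound `text` below
            raise KeyError("item must contain either 'text' or 'tokens'")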

        # Get a sentence vector for sentence and convert it to torch tensor
        sentence_vector = torch.tensor(
            self.model.get_sentence_vector(text),
            dtype=torch.float
        )

        # Return back a dict
        return {
            "text": sentence_vector
        }
    
    # Make dataset builder happy; there is no real vocabulary, so return None
    def get_vocab_size(self):
        return None
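
To give an intuition for what a sentence vector is, the toy sketch below averages L2-normalized word vectors, which is roughly how fastText builds sentence vectors for unsupervised models. The three-dimensional embeddings and the `sentence_vector` helper are invented for illustration; the real model also uses subword information, which this sketch ignores.

```python
import math

# Hypothetical 3-d word embeddings, for illustration only.
TOY_VECTORS = {
    "look": [1.0, 0.0, 0.0],
    "how": [0.0, 2.0, 0.0],
    "many": [0.0, 0.0, 0.0],  # zero norm, so it is skipped below
}


def sentence_vector(text, vectors, dim=3):
    """Average the L2-normalized vectors of all known words with non-zero norm."""
    total = [0.0] * dim
    count = 0
    for word in text.split():
        vec = vectors.get(word)
        if vec is None:
            continue
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            continue
        total = [t + v / norm for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        return total
    return [t / count for t in total]


print(sentence_vector("look how many", TOY_VECTORS))  # [0.5, 0.5, 0.0]
```

Because each word vector is normalized before averaging, long and short vectors contribute equally to the sentence representation.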

In [None]:
import torch

# registry is needed to register our new model so that MMF can discover it
from mmf.common.registry import registry
# All models using MMF need to inherit from BaseModel
from mmf.models.base_model import BaseModel
# ProjectionEmbedding will act as proxy encoder for FastText Sentence Vector
from mmf.modules.embeddings import ProjectionEmbedding
# Builder methods for image encoder and classifier
from mmf.utils.build import build_classifier_layer, build_image_encoder

# Register the model for MMF, "concat_vl" key would be used to find the model
@registry.register_model("concat_vl")
class LanguageAndVisionConcat(BaseModel):
    # All models in MMF get first argument as config which contains all
    # of the information you stored in this model's config (hyperparameters)
    def __init__(self, config, *args, **kwargs):
        # This is not needed in most cases as it just calls the parent's init
        # with the same parameters. We keep it to show how config is passed
        # into the model
        super().__init__(config, *args, **kwargs)
    
    # This classmethod tells MMF where to look for default config of this model
    @classmethod
    def config_path(cls):
        # Absolute path to the config inside the cloned example repo
        return "/content/hm_example_mmf/configs/models/concat_vl.yaml"
    
    # Each model needs to define a build method where the model's modules
    # are actually built and assigned to the model
    def build(self):
        """
        Config's image_encoder attribute will be used to build an MMF image
        encoder. This config in yaml will look like:

        # "type" parameter specifies the type of encoder we are using here. 
        # In this particular case, we are using resnet152
        type: resnet152
      
        # Parameters are passed to underlying encoder class by 
        # build_image_encoder
        params:
          # Specifies whether to use a pretrained version
          pretrained: true 
          # Pooling type, use max to use AdaptiveMaxPool2D
          pool_type: avg 
      
          # Number of output features from the encoder, -1 for original
          # otherwise, supports between 1 to 9
          num_output_features: 1 
        """
        self.vision_module = build_image_encoder(self.config.image_encoder)

        """
        For the classifier, the configuration would look like:
        # Specifies the type of the classifier, in this case mlp
        type: mlp
        # Parameter to the classifier passed through build_classifier_layer
        params:
          # Dimension of the tensor coming into the classifier
          in_dim: 512
          # Dimension of the tensor going out of the classifier
          out_dim: 2
          # Number of MLP layers in the classifier
          num_layers: 0
        """
        self.classifier = build_classifier_layer(self.config.classifier)
        
        # ProjectionEmbedding takes its params directly because it is a module.
        # So, pass in kwargs, which are in_dim, out_dim and module,
        # whose value would be "linear" as we want a linear layer
        self.language_module = ProjectionEmbedding(
            **self.config.text_encoder.params
        )
        # Dropout value will come from config now
        self.dropout = torch.nn.Dropout(self.config.dropout)
        # Same as Projection Embedding, fusion's layer params (which are param 
        # for linear layer) will come from config now
        self.fusion = torch.nn.Linear(**self.config.fusion.params)
        self.relu = torch.nn.ReLU()

    # Each model in MMF gets a dict called sample_list which contains
    # all of the necessary information returned from the dataset
    def forward(self, sample_list):
        # Text input features will be in "text" key
        text = sample_list["text"]
        # Similarly, image input will be in "image" key
        image = sample_list["image"]

        text_features = self.relu(self.language_module(text))
        image_features = self.relu(self.vision_module(image))

        # Concatenate the features returned from two modality encoders
        combined = torch.cat([text_features, image_features.squeeze(dim=1)], dim=1)

        # Pass through the fusion layer, relu and dropout
        fused = self.dropout(self.relu(self.fusion(combined)))

        # Pass final tensor from classifier to get scores
        logits = self.classifier(fused)

        # For loss calculations (automatically done by MMF based on loss defined
        # in the config), we need to return a dict with "scores" key as logits
        output = {"scores": logits}

        # MMF will automatically calculate loss
        return output
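
Because the layer sizes now come from the configuration, a mismatch between them only surfaces at runtime. The sketch below mirrors the config keys that `build()` reads as a plain Python dict and checks that the dimensions line up. All values are hypothetical (the real ones live in `concat_vl.yaml`), the 300-dimensional text input assumes standard fastText vectors, and `find_dim_mismatches` is a helper invented for this example.

```python
# Hypothetical values for illustration; the real ones live in concat_vl.yaml.
HYPOTHETICAL_CONFIG = {
    "image_encoder": {
        "type": "resnet152",
        "params": {"pretrained": True, "pool_type": "avg", "num_output_features": 1},
    },
    "text_encoder": {"params": {"module": "linear", "in_dim": 300, "out_dim": 300}},
    "fusion": {"params": {"in_features": 2348, "out_features": 512}},
    "dropout": 0.1,
    "classifier": {"type": "mlp", "params": {"in_dim": 512, "out_dim": 2, "num_layers": 0}},
}

# Channel count of ResNet-152's final pooled features.
RESNET152_FEATURE_DIM = 2048


def find_dim_mismatches(config, image_dim=RESNET152_FEATURE_DIM):
    """Return a list of human-readable dimension problems (empty if none)."""
    text_dim = config["text_encoder"]["params"]["out_dim"]
    fusion = config["fusion"]["params"]
    classifier_in = config["classifier"]["params"]["in_dim"]
    problems = []
    # torch.cat along dim=1 adds the text and image feature sizes.
    if fusion["in_features"] != text_dim + image_dim:
        problems.append(
            f"fusion.in_features={fusion['in_features']} but concatenated "
            f"features have size {text_dim + image_dim}"
        )
    # The classifier consumes the output of the fusion layer.
    if classifier_in != fusion["out_features"]:
        problems.append(
            f"classifier.in_dim={classifier_in} but fusion outputs "
            f"{fusion['out_features']} features"
        )
    return problems


print(find_dim_mismatches(HYPOTHETICAL_CONFIG) or "dimensions are consistent")
```

Running a check like this before training can save a failed run when you experiment with different projection or fusion sizes.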

Now, we will clone the example repo that we have already created on top of MMF, which contains the code in this Colab. We do this so that we don't have to build the configs again from scratch.

In [None]:
!git clone https://github.com/apsdehal/hm_example_mmf /content/hm_example_mmf

# **Train your model**

In this step, we will train the model we just built. To override configuration parameters, pass a dot list to `run` as either a dict or a list.

In [None]:
import sys
from mmf_cli.run import run
from mmf.common.registry import registry

registry.mapping["state"] = {}
opts = [
    "config='/content/hm_example_mmf/configs/experiments/defaults.yaml'", 
    "model=concat_vl", 
    "dataset=hateful_memes", 
    "training.num_workers=0",
]
run(opts=opts)
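
To illustrate what the dotted keys in `opts` mean, the sketch below expands a dot list into a nested dictionary. MMF builds its configuration with OmegaConf, so this pure-Python `dotlist_to_nested` helper is only an illustration of the idea and not MMF's implementation; it also keeps every value as a string.

```python
def dotlist_to_nested(opts):
    """Expand ["a.b=1", "c=x"] into {"a": {"b": "1"}, "c": "x"}."""
    nested = {}
    for opt in opts:
        key, _, value = opt.partition("=")
        *parents, leaf = key.split(".")
        node = nested
        # Walk (and create) one dictionary level per dotted component.
        for part in parents:
            node = node.setdefault(part, {})
        # Drop surrounding quotes such as config='...'.
        node[leaf] = value.strip("'\"")
    return nested


example_opts = [
    "model=concat_vl",
    "dataset=hateful_memes",
    "training.num_workers=0",
]
print(dotlist_to_nested(example_opts))
```

Each dotted key therefore addresses one leaf of the YAML configuration, which is why `training.num_workers=0` changes only that single setting.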

# **Using your module**
Since we have cloned the repo that contains the example built in this Colab notebook, we can also run the training from the command line, either with the `env.user_dir` option or by setting the `MMF_USER_DIR` environment variable. The next code cell uses the environment variable.

In [None]:
!MMF_USER_DIR="/content/hm_example_mmf" mmf_run \
  config="configs/experiments/defaults.yaml" \
  model=concat_vl \
  dataset=hateful_memes \
  training.num_workers=0

# **BERT + Classifier**
- BERT only
- MLP classifier with a configurable number of layers and hidden units

In [None]:
!git clone https://github.com/luckyyc12/mmf.git

In [None]:
!mmf_run config=projects/hateful_memes/configs/unimodal/with_features.yaml \
  model=unimodal_image \
  dataset=hateful_memes \
  training.log_interval=100 \
  training.max_updates=1500 \
  training.batch_size=64 \
  training.evaluation_interval=500 \
  trainer.params.gpus=200

In [None]:
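!mmf_run config=projects/hateful_memes/configs/unimodal/bert.yaml \
  model=unimodal_text \
  dataset=hateful_memes \
  training.log_interval=1000 \
  training.evaluation_interval=5000 \
  trainer.params.gpus=100

# Disabled overrides. Keep them out of the command itself: a "#" in the
# middle of a continued command comments out every override after it.
# training.max_updates=1500
# training.batch_size=64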

In [None]:
!mmf_run config=projects/hateful_memes/configs/vilbert/from_cc.yaml \
  model=vilbert \
  dataset=hateful_memes \
  training.log_interval=1000 \
  training.max_updates=10000 \
  training.evaluation_interval=2500 \
  trainer.params.gpus=100

# Disabled override: training.batch_size=64

In [None]:
!mmf_run config=projects/hateful_memes/configs/unimodal/image.yaml \
  model=unimodal_image \
  dataset=hateful_memes \
  training.log_interval=1000 \
  training.max_updates=10000 \
  training.evaluation_interval=2500 \
  trainer.params.gpus=100

# Disabled override: training.batch_size=64

In [None]:
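!mmf_run config=projects/hateful_memes/configs/visual_bert/from_coco.yaml \
  model=visual_bert \
  dataset=hateful_memes \
  training.log_interval=2000 \
  training.max_updates=10000 \
  training.evaluation_interval=2000 \
  trainer.params.gpus=100

# Disabled override. Keep it out of the command itself: a "#" in the
# middle of a continued command comments out every override after it.
# training.batch_size=64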