i-chenene/FiLM-VQA


FiLM — Feature-wise Linear Modulation

Implementation of Perez et al. (2017) with an interactive Streamlit application covering three use cases: visual question answering on Sort-of-CLEVR and CLEVR, and artistic style transfer via Conditional Instance Normalisation.

Built with Python · PyTorch · Streamlit


What is FiLM?

A common challenge in deep learning is conditioning — adapting a network's behavior based on external information (a question, a style, a class label).

FiLM addresses this with a simple, general idea: instead of concatenating the context to the inputs, the network maps the conditioning input to per-channel scale and shift parameters γ and β that directly modulate the CNN feature maps:

$$\text{FiLM}(F_{i,c}) = \gamma_{i,c} \cdot F_{i,c} + \beta_{i,c}$$

  • γ amplifies, reduces or suppresses a feature map
  • β shifts activations up or down
  • Both are produced by a lightweight network (the FiLM generator) from the conditioning input (e.g. the question)

In practice we use $\gamma = 1 + \Delta\gamma$ so modulation starts close to the identity, which stabilises gradients early in training.
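
The modulation above can be sketched as a small PyTorch module. This is a minimal illustration, not the repo's exact implementation; `delta_gamma` and `beta` are assumed to come from a FiLM generator conditioned on e.g. the question.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: gamma * F + beta, per channel.

    Minimal sketch. Uses the gamma = 1 + delta_gamma parameterisation
    so that delta_gamma = 0, beta = 0 is the identity.
    """
    def forward(self, feats, delta_gamma, beta):
        # feats: (N, C, H, W); delta_gamma, beta: (N, C)
        gamma = 1.0 + delta_gamma                  # start near the identity
        gamma = gamma.view(*gamma.shape, 1, 1)     # broadcast over H, W
        beta = beta.view(*beta.shape, 1, 1)
        return gamma * feats + beta

feats = torch.randn(2, 64, 8, 8)
zeros = torch.zeros(2, 64)
out = FiLM()(feats, zeros, zeros)                  # identity modulation
assert torch.equal(out, feats)
```

With zero parameters the layer is a no-op, which is exactly the "starts close to identity" property the parameterisation buys.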


Quickstart

pip install -r requirements.txt
python -m streamlit run app.py

Everything runs from the interface — no further terminal interaction needed.


Data & Weights

Data and model weights are not included in the repo (too large). They are hosted on Google Drive and can be downloaded directly from the app.


Dataset         Size     How to get it
Sort-of-CLEVR   ~200 MB  Button in the app
Style Transfer  ~400 MB  Button in the app
CLEVR VQA       ~18 GB   Manual (see below)

App Pages

Sort-of-CLEVR

2D Kaggle dataset of colored shapes with 11 answer classes. The question is encoded in 10 dimensions and passed to the FiLM generator, which modulates the CNN feature maps. You can train from scratch, load a pretrained model, and test visually on generated scenes.
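
The question-to-modulation path can be sketched as a small generator network. Layer sizes and the two-FiLM-layer layout here are illustrative assumptions, not the repo's actual architecture; only the 10-dimensional question encoding is taken from the description above.

```python
import torch
import torch.nn as nn

class FiLMGenerator(nn.Module):
    """Hypothetical FiLM generator sketch: maps a 10-dim question
    encoding to (delta_gamma, beta) for each modulated CNN layer."""
    def __init__(self, question_dim=10, n_channels=64, n_film_layers=2):
        super().__init__()
        self.n_channels = n_channels
        self.n_film_layers = n_film_layers
        self.net = nn.Sequential(
            nn.Linear(question_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2 * n_channels * n_film_layers),
        )

    def forward(self, question):
        # question: (N, 10) -> params: (N, n_film_layers, 2, n_channels)
        params = self.net(question)
        params = params.view(-1, self.n_film_layers, 2, self.n_channels)
        delta_gamma, beta = params[:, :, 0], params[:, :, 1]
        return delta_gamma, beta

q = torch.randn(4, 10)                  # batch of encoded questions
dg, b = FiLMGenerator()(q)
print(dg.shape, b.shape)                # (4, 2, 64) each
```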

CLEVR VQA

Full implementation of the paper's architecture. The dataset is ~18 GB so interactive training is not available — the app displays the learning curves from our own run (~40k iterations).

To reproduce:

# Preprocess questions
python -m clevr.scripts.preprocess_questions \
  --input_questions_json CLEVR_v1.0/questions/CLEVR_train_questions.json \
  --output_h5_file clevr/data/train_questions.h5 \
  --output_vocab_json clevr/data/vocab.json

# Extract ResNet101 features
python -m clevr.scripts.extract_features \
  --data-dir clevr/data/clevr --split train

# Train
python -m clevr.scripts.train_model --model_type FiLM \
  --checkpoint_path clevr/data/film_checkpoint.pth \
  --batch_size 64 --num_iterations 100000 --loader_num_workers 0

Style Transfer

Implementation of Ghiasi et al. (2017): the same FiLM conditioning idea applied to artistic style via Conditional Instance Normalisation. Six styles are available for interactive inference in the app.
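
Conditional Instance Normalisation is itself a FiLM-style layer: normalise the feature maps, then scale and shift them with per-style parameters. A minimal sketch, assuming one learned (γ, β) pair per style; the repo's version may differ in detail.

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """Sketch of Conditional Instance Normalisation: each style id
    selects its own (gamma, beta) applied after instance norm."""
    def __init__(self, n_styles, n_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(n_channels, affine=False)
        self.gamma = nn.Embedding(n_styles, n_channels)
        self.beta = nn.Embedding(n_styles, n_channels)
        nn.init.ones_(self.gamma.weight)    # identity scale at init
        nn.init.zeros_(self.beta.weight)    # zero shift at init

    def forward(self, x, style_id):
        # x: (N, C, H, W); style_id: (N,) integer style index
        g = self.gamma(style_id).unsqueeze(-1).unsqueeze(-1)
        b = self.beta(style_id).unsqueeze(-1).unsqueeze(-1)
        return g * self.norm(x) + b

cin = ConditionalInstanceNorm(n_styles=6, n_channels=32)
x = torch.randn(2, 32, 16, 16)
y = cin(x, torch.tensor([0, 3]))        # two images, two styles
```

Switching `style_id` swaps in a different (γ, β) pair, which is what lets a single network render the six styles.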


Results

Dataset                         Validation Accuracy
Sort-of-CLEVR (10 epochs)       ~94%
CLEVR VQA (our run, 40k iters)  ~51%

The gap relative to the paper's reported CLEVR VQA accuracy is mainly due to a reduced hidden_dim (256 vs. 4096) and the limited number of training iterations.


Repository Structure

FiLMProjet/
├── app.py
├── pages/
│   ├── 0_Présentation.py
│   ├── 1_Sort_of_CLEVR.py
│   ├── 2_CLEVR_VQA.py
│   └── 3_Style_Transfer.py
├── sortofclevr/          # dataset, model, training
├── style_transfer/       # dataset, model, training
├── clevr/
│   ├── core/             # data, embedding, preprocess, utils
│   ├── models/           # film_net, film_gen, baselines, layers
│   ├── scripts/          # train, preprocess, extract features
│   └── data/             # vocab, h5 questions, result logs
├── assets/
└── requirements.txt

References

  • Perez, E., Strub, F., de Vries, H., Dumoulin, V., Courville, A. (2017). FiLM: Visual Reasoning with a General Conditioning Layer. arXiv:1709.07871.
  • Ghiasi, G., Lee, H., Kudlur, M., Dumoulin, V., Shlens, J. (2017). Exploring the Structure of a Real-Time, Arbitrary Neural Artistic Stylization Network. arXiv:1705.06830.

Authors

  • Iliès Chenene
  • Valentin Porlier
