Source Code for Non-i.i.d. Image Models
This package provides the Matlab source code for training and extracting Fisher vectors of Latent Bag-of-Words (LatBoW) and Latent Mixture-of-Gaussians (LatMoG) models, as explained in
- Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid, “Approximate Fisher Kernels of non-iid Image Models for Image Categorization”, accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2015,
- Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid, “Image categorization using Fisher kernels of non-iid image models”, in IEEE Conference on Computer Vision & Pattern Recognition (CVPR), Providence, USA, June 2012.
[genm,eprm] = fv_fisher_latentgmm_variationalestimate(p,fvbase,N,D,K,)
to train a generative model
genm using variational expectation-maximization (EM) procedure. This function also returns the posteriors
eprm for the training examples.
[eprm] = fv_fisher_latentgmm_estep(fvbase,N,D,K,,genm,p)
to estimate posteriors on test images.
[desc] = fv_fisher_latentgmm_grads(N,D,K,gradopt,genm,eprm)
to extract LatMoG Fisher vectors. Use
grapopt=alpha for the LatBoW model and
grapopt=all for the LatMoG model to get all Fisher vector components. In our experiments, we then apply per-dimension whitening and L2 normalization to the resulting descriptors, as explained in the paper.
In order to utilize the aforementioned functions, the following per-image statistics over local descriptors should to be provided in the
|E_x||(N D K)||E_x(j,:,k)=sum_i( p(k|x_i) * x_i ) / sum_i( p(k|x_i) ), over x_i \in image_j|
|E_x2||(N D K)||E_x2(j,:,k)=sum_i( p(k|x_i) * x_i^2 ) / sum_i( p(k|x_i) )|
|counts||(N K)||counts(j,k)=sum_i( p(k|x_i) ), over x_i \in image_j|
x_ii-th local descriptor in an image.
N: number of training images
D: local descriptor dimensionality
K: vocabulary size
E_x2fields are not needed when training LatBoW models.
A couple of other options should be provided in the
p struct. See
example.m for default values and their explanations.
A final note: When using spatial grids, we train per-cell models independently.