## Example usage of the CBP_KRP algorithm

**Original publication:** López-Sánchez, Daniel, Angélica González Arrieta, and Juan M. Corchado. "Compact bilinear pooling via kernelized random projection for fine-grained image categorization on low computational power devices." Neurocomputing 398 (2020): 411-421.

The main motivation of the CBP_KRP method is to achieve fast inference times on low computational power devices. This repo provides two implementations of CBP_KRP. The first is a torch-based implementation designed to run efficiently on GPU devices. The second is a scipy/numpy-based implementation optimized for fast inference on low computational power devices. The workflow proposed in the paper involves training/fine-tuning your models end-to-end using the torch implementation, and then deploying them to low computational power devices using the scipy/numpy-based implementation, which takes advantage of several computational tricks such as the use of sparse matrix multiplication routines.
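As a rough illustration of why sparse routines can help, the sketch below applies a random projection whose rows contain only `s` non-zero entries, once as a dense numpy product and once as a `scipy.sparse` CSR product. This is not the repository's implementation: the helper name `make_sparse_projection`, the ±1 entries, and the sizes are assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

def make_sparse_projection(k, d, s, rng):
    """Hypothetical helper: (k, d) matrix with exactly s non-zero (+1/-1) entries per row."""
    rows = np.repeat(np.arange(k), s)
    # Draw s distinct column positions for each of the k rows.
    cols = np.concatenate([rng.choice(d, size=s, replace=False) for _ in range(k)])
    vals = rng.choice([-1.0, 1.0], size=k * s)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(k, d))

rng = np.random.default_rng(0)
d, k, s = 1024, 512, 100          # assumed sizes, mirroring CHANNELS, K and S
W_sparse = make_sparse_projection(k, d, s, rng)
W_dense = W_sparse.toarray()

x = rng.random((d, 200))          # 200 local feature vectors of dimension d

y_dense = W_dense @ x             # multiplies all k * d entries, zeros included
y_sparse = W_sparse @ x           # only visits the k * s stored entries

print(np.allclose(y_dense, y_sparse))
print(W_sparse.nnz, "non-zeros out of", k * d)
```

Both products give the same result; the sparse version simply skips the zero entries, which matters most when few cores are available to parallelize the dense product.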

Some of the tricks described in the paper have little effect on powerful desktop CPUs, where parallelization can play a much larger role than the computational tricks of CBP_KRP. To simulate CPU inference on a low computational power device, one can set some environment variables to limit the number of threads used by linear algebra libraries.

In [84]:
import os

# To simulate CPU inference on a low computational power device, we can set these
# env variables to limit the number of threads used by numpy.
# They must be set before numpy is imported, since the BLAS backend reads them at load time.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
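Which variable actually takes effect depends on the BLAS backend that numpy was built against. A minimal sketch that sets several of them at once is shown below; the helper name `limit_blas_threads` is hypothetical, and `OMP_NUM_THREADS` is an addition that does not appear in the original cell.

```python
import os

# Variables commonly read by BLAS/OpenMP backends when they are loaded.
THREAD_ENV_VARS = ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS")

def limit_blas_threads(n_threads=1):
    """Hypothetical helper: set thread limits; call before importing numpy/scipy/torch."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    # Return the resulting settings so they can be logged or checked.
    return {name: os.environ[name] for name in THREAD_ENV_VARS}

settings = limit_blas_threads(1)
print(settings)
```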

In [85]:
import torch
import numpy as np
from CBP_KRP import CBP_KRP, CBP_KRP_cpu_inference

Create an example batch with random data:

In [92]:
BATCH_SIZE = 32
CHANNELS = 1024
FEATURE_MAP_HEIGHT = 50
FEATURE_MAP_WIDTH = 50

example_batch = torch.Tensor(np.random.rand(BATCH_SIZE, CHANNELS, FEATURE_MAP_HEIGHT, FEATURE_MAP_WIDTH))
example_batch_numpy = example_batch.detach().numpy()

Create a CBP_KRP module for GPU inference, and run a forward pass on the example batch. See the original paper for a description of the hyper-parameters of the algorithm. 

We keep the resulting descriptors for comparison with the cpu inference implementation:

In [93]:
S = 100 # Sparsity level of the random vectors
K = 512 # Number of features of the output descriptor
P = 5000 # Total number of unique random vectors used by CBP_KRP
T = 2 # Number of vectors to sum for the CLT

INPUT_DIM = CHANNELS  # assumed: the input dimension is the number of channels of the feature maps

cbp_krp = CBP_KRP(input_dim=INPUT_DIM, k=K, s=S, p=P, t=T)

compact_bilinear_pooling_descriptors_torch = cbp_krp.forward(example_batch)

Create a CBP_KRP module for CPU inference on low computational power devices. For initialization, we use the random vectors and indexes created by the GPU instance of CBP_KRP, so the results are the same. Note that this CPU implementation uses several tricks for efficient inference on low computational power devices, such as the use of sparse matrix multiplication routines.

Run a forward pass on the example batch and keep the resulting descriptors for comparison with the GPU inference implementation.

In [94]:
cbp_krp_cpu = CBP_KRP_cpu_inference(random_vectors=cbp_krp.random_vectors.detach().numpy(),
                                    random_indexes=cbp_krp.random_indexes,
                                    s=cbp_krp.s, use_sparse_matrix_multiplication=True)

compact_bilinear_pooling_descriptors_cpu = cbp_krp_cpu.forward(example_batch_numpy)

Up to numerical precision errors, the results of the torch and numpy/scipy implementations should be the same:

In [95]:
print("Some GPU results: ", compact_bilinear_pooling_descriptors_torch[0, :4])
print("Some CPU results: ", compact_bilinear_pooling_descriptors_cpu[0, :4])

Some GPU results:  tensor([-56082.5078,  11413.1699,  -9906.0254, -14039.8086])
Some CPU results:  [-56082.47680664  11413.16070557  -9906.02264404 -14039.81323242]
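Rather than comparing printed values by eye, the agreement can be checked numerically. The sketch below uses only the four values printed above; the tolerance `rtol=1e-5` is an assumption that reflects single-precision arithmetic on the torch side.

```python
import numpy as np

# Values copied from the printed output above.
gpu = np.array([-56082.5078, 11413.1699, -9906.0254, -14039.8086])
cpu = np.array([-56082.47680664, 11413.16070557, -9906.02264404, -14039.81323242])

# Element-wise relative error between the two implementations.
rel_err = np.abs(gpu - cpu) / np.abs(cpu)
print("max relative error:", rel_err.max())
print("close:", np.allclose(gpu, cpu, rtol=1e-5))
```

In the notebook, the same check could be applied to the full descriptors by passing `compact_bilinear_pooling_descriptors_torch.detach().numpy()` and `compact_bilinear_pooling_descriptors_cpu` to `np.allclose`.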


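On a device with low computational power (e.g., few cores), the use of sparse matrix multiplication routines can make a big difference. In the single-threaded runs below, the forward pass takes about 3.19 s with sparse routines and 8.28 s without them, roughly 2.6 times faster: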

In [96]:
cbp_krp_cpu = CBP_KRP_cpu_inference(random_vectors=cbp_krp.random_vectors.detach().numpy(),
                                    random_indexes=cbp_krp.random_indexes,
                                    s=cbp_krp.s, use_sparse_matrix_multiplication=True)

%timeit compact_bilinear_pooling_descriptors_cpu = cbp_krp_cpu.forward(example_batch_numpy)

3.19 s ± 61.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [97]:
cbp_krp_cpu = CBP_KRP_cpu_inference(random_vectors=cbp_krp.random_vectors.detach().numpy(),
                                    random_indexes=cbp_krp.random_indexes,
                                    s=cbp_krp.s, use_sparse_matrix_multiplication=False)

%timeit compact_bilinear_pooling_descriptors_cpu = cbp_krp_cpu.forward(example_batch_numpy)

8.28 s ± 130 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
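`%timeit` is an IPython magic and is not available in a plain Python script. A minimal sketch of an equivalent measurement with the standard `timeit` module follows; the helper `time_forward` and the stand-in workload `dummy_forward` are hypothetical, and in practice one would pass `cbp_krp_cpu.forward` and `example_batch_numpy` instead.

```python
import statistics
import timeit

def time_forward(forward, batch, repeat=7, number=1):
    """Hypothetical helper: return (mean, std) seconds per call, similar to %timeit."""
    totals = timeit.repeat(lambda: forward(batch), repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)

# Stand-in workload so the sketch runs on its own.
def dummy_forward(batch):
    return sum(v * v for v in batch)

mean_s, std_s = time_forward(dummy_forward, list(range(10_000)))
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop")
```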
