# Geodesic convolutional neural networks on Riemannian manifolds
by Jonathan Masci, Davide Boscaini, Michael M. Bronstein (USI, Lugano, Switzerland) and Pierre Vandergheynst (EPFL, Lausanne, Switzerland)
<a href="https://arxiv.org/abs/1501.06297">DOI: arXiv:1501.06297</a>

Author: Ian Wu

## Abstract

The geodesic convolutional neural network (GCNN) is a generalization of the convolutional neural network (CNN) paradigm to non-Euclidean manifolds. According to the paper, it achieves state-of-the-art performance on shape description, retrieval, and correspondence problems.

## I. Introduction
Shape descriptors are essential for shape analysis. There are two main types of feature descriptors:

### Local feature descriptor
A local descriptor is assigned to each point on the shape and represents the local structure around that point.
It is used in higher-level tasks such as establishing correspondence between shapes, shape retrieval, or segmentation.

### Global feature descriptor
It is produced by aggregating local descriptors to describe the whole shape (e.g., a bag-of-features representation, analogous to bag-of-words in NLP).


#### Intrinsic VS Extrinsic:
<figure>
<img src = 'img/Intrinsic_vs_Extrinsic.png' style="width:50%">
<figcaption align = "right"><b>Fig. 1 Intrinsic vs Extrinsic </b></figcaption>
</figure>


Earlier works focus on **extrinsic structures**, which are invariant under Euclidean transformations.
#### Examples:
[1] Spin Image Descriptors: A surface representation technique. It encodes global properties of any surface in an object-oriented coordinate system rather than in a viewer-oriented coordinate system.<br>
[2] Integral Volume Descriptor: It uses density as the integrand and mass as the integral. These features are used to compare objects represented as closed planar contours.


Newer works focus on **intrinsic structures** (such as geodesic distances) that are invariant under isometric deformations. <br>
#### Examples:
[3] SIFT: It uses keypoints to locate local features. These keypoints are scale and rotation invariant and can be used for image matching, object detection, and scene detection.<br>
[4] Heat Kernel Signature: HKS is based on the concept of heat diffusion over a surface. Given an initial heat distribution over the surface, the heat kernel $h_{t}(x,y)$ gives the amount of heat transferred from x to y after time t. The heat kernel is invariant under isometric transformations and stable under small perturbations of the isometry.<br>
[5] Wave Kernel Signature: Similar to HKS, WKS uses the Schrödinger wave equation instead of the heat equation to compute the similarity between shapes.<br>
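To make the HKS description more concrete, here is a minimal sketch of the standard spectral formula $HKS(x, t) = \sum_i e^{-\lambda_i t} \phi_i(x)^2$, where $(\lambda_i, \phi_i)$ are eigenpairs of a Laplacian. It assumes a plain graph Laplacian on a toy 6-vertex cycle as a stand-in for a mesh Laplace-Beltrami operator; the function name `heat_kernel_signature` is hypothetical and not taken from the paper's code.

```python
import numpy as np

def heat_kernel_signature(laplacian, times):
    """Return an (n_points, n_times) array of HKS values.

    HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)**2, where (lambda_i, phi_i)
    are the eigenpairs of a symmetric Laplacian matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(laplacian)   # columns are orthonormal eigenvectors
    decay = np.exp(-np.outer(eigvals, times))      # shape (n_eig, n_times)
    return (eigvecs ** 2) @ decay                  # shape (n_points, n_times)

# Toy domain (assumption): graph Laplacian of a 6-vertex cycle.
n = 6
adjacency = np.zeros((n, n))
for i in range(n):
    adjacency[i, (i + 1) % n] = 1.0
    adjacency[(i + 1) % n, i] = 1.0
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

hks = heat_kernel_signature(laplacian, times=np.array([0.0, 0.5, 2.0]))
```

Because every vertex of a cycle looks the same, all rows of `hks` coincide, which is a small illustration of the signature depending only on the intrinsic structure of the domain.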

### Deep Learning Methods
Learning methods are becoming increasingly popular in 3D shape analysis. They are used extensively in problems such as shape correspondence, similarity description, and retrieval. Intrinsic versions of CNNs that would allow dealing with shape deformations are difficult to formulate because of the lack of shift invariance on Riemannian manifolds.

#### Examples:
[6] Spectral Networks And Deep Locally Connected Networks on Graphs: Efficient deep NN architecture that uses two constructions, one based upon a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian.

This paper proposes the **Geodesic Convolutional Neural Network**, combining the benefits of **intrinsic structures** and **deep learning**. It is an extension of the CNN paradigm to non-Euclidean manifolds, based on a local geodesic system of coordinates in which patches are analogous to ‘patches’ in images.


## II. Background and Model

The **Geodesic Convolutional Neural Network** uses a local system of geodesic polar coordinates at a point x, built from:
1. $\rho$-level sets of the geodesic distance function $d_{X} (x, \xi )$, truncated at $\rho_{0}$
2. points along the geodesic $\Gamma_{\theta}(x)$ emanating from x in direction $\theta$

Local chart: bijective map <br>

$\Omega (x) : B_{\rho_{0}} (x) \rightarrow [0, \rho_{0}] \times [0, 2\pi)$

from manifold to local coordinates
($\rho$, $\theta$) around x

**Patch operator** applied to $f \in L^2 (X)$ <br>
$(D(x)f)(\rho , \theta) = (f \circ \Omega^{-1}(x))(\rho , \theta)$
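
As a rough illustration of what the patch operator produces, the sketch below extracts a polar patch around one point of a flat 2D point cloud. It rests on several assumptions: the domain is planar, so Euclidean distance stands in for geodesic distance; values are averaged inside hard $(\rho, \theta)$ bins rather than interpolated; and the name `polar_patch` is hypothetical. The bin counts mirror the `nrings=5, nrays=16` used in the implementation section.

```python
import numpy as np

def polar_patch(points, f, center_idx, rho0, nrings=5, nrays=16):
    """Average f over polar bins around points[center_idx].

    Returns an (nrings, nrays) array; bins without samples are left at 0.
    """
    offsets = points - points[center_idx]
    rho = np.linalg.norm(offsets, axis=1)
    theta = np.mod(np.arctan2(offsets[:, 1], offsets[:, 0]), 2 * np.pi)

    # keep neighbours inside the truncated ball B_rho0(x), excluding x itself
    inside = (rho > 0) & (rho <= rho0)
    ring = np.minimum((rho[inside] / rho0 * nrings).astype(int), nrings - 1)
    ray = np.minimum((theta[inside] / (2 * np.pi) * nrays).astype(int), nrays - 1)

    sums = np.zeros((nrings, nrays))
    counts = np.zeros((nrings, nrays))
    np.add.at(sums, (ring, ray), f[inside])
    np.add.at(counts, (ring, ray), 1.0)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(2000, 2))
points[0] = 0.0                       # patch centre at the origin
f = np.ones(len(points))              # constant function on the domain
patch = polar_patch(points, f, center_idx=0, rho0=0.5)
```

On a curved surface the Euclidean distance and angle would have to be replaced by geodesic distance and the direction of the geodesic leaving x, which is the harder part of the construction.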
                
<figure>
<img src = 'img/geodesic_coor.png' width="300">
<figcaption align = "center"><b>Fig. 2 local system examples</b></figcaption>
</figure>

<figure>
<img src = 'img/patch_operator.png' width="600">
<figcaption align = "center"><b>Fig. 3 Patch Operator Construction</b></figcaption>
</figure> 


## Geodesic Convolution 
Geodesic convolution (GC) applies a filter $a$ to the patches extracted from $f \in L^2 (X)$ in local geodesic polar coordinates:

$(f \star a)(x) = \sum_{\theta , \rho} (D(x)f)(\rho , \theta) \, a(\theta , \rho)$
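
Once a patch has been discretized, the sum above is an element-wise product followed by a sum over all bins. The sketch below assumes the patch and the filter are both stored as `(nrings, nrays)` arrays; `geodesic_conv` is a hypothetical helper name, not a function from the paper's code.

```python
import numpy as np

def geodesic_conv(patch, filt):
    """Correlate one (nrings, nrays) patch with a filter of the same shape."""
    return float(np.sum(patch * filt))

nrings, nrays = 5, 16
rng = np.random.default_rng(0)
patch = rng.normal(size=(nrings, nrays))   # stand-in for (D(x)f) at one point x
filt = rng.normal(size=(nrings, nrays))    # stand-in for learned filter weights
response = geodesic_conv(patch, filt)
```

Applying this at every point x of the shape gives one output value per point, which is how the operation plays the role of a convolution layer.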



<figure>
<img src = 'img/geodesic_convolution.png' width="600">
<figcaption align = "center"><b>Fig. 4 Geodesic Convolution</b></figcaption>
</figure> 


### Geodesic Convolution Layer

<figure>
<img src = 'img/convolution_layer.png' width="600">
<figcaption align = "center"><b>Fig. 4b Geodesic Convolution Layer</b></figcaption>
</figure>

#### Add Max Pooling Layer (remove rotation ambiguity)

<figure>
<img src = 'img/max_pooling.png' width="600">
<figcaption align = "center"><b>Fig. 4c Max Pooling Layer</b></figcaption>
</figure>
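
The origin of the angular coordinate in a geodesic polar patch is arbitrary, so the same patch can appear rotated by any number of angular bins. A minimal sketch of angular max pooling follows, assuming `(nrings, nrays)` arrays and a hypothetical helper `angular_max_pool`: the filter is applied at every rotation and the largest response is kept.

```python
import numpy as np

def angular_max_pool(patch, filt):
    """Maximum response over all nrays rotations of the filter."""
    nrays = filt.shape[1]
    responses = [np.sum(patch * np.roll(filt, shift, axis=1)) for shift in range(nrays)]
    return float(np.max(responses))

rng = np.random.default_rng(1)
patch = rng.normal(size=(5, 16))
filt = rng.normal(size=(5, 16))

pooled = angular_max_pool(patch, filt)
# Choosing a different angular origin rotates the patch by some number of bins.
pooled_rotated = angular_max_pool(np.roll(patch, 3, axis=1), filt)
```

Both calls return the same value up to floating-point rounding, because rotating the patch only permutes the set of responses the maximum is taken over.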

## Geodesic Convolution Neural Network Architecture
<figure>
<img src = 'img/GCNN.png' width="600">
<figcaption align = "center"><b>Fig. 4d Geodesic Convolutional Neural Network Architecture</b></figcaption>
</figure>


## III. Implementation 

In [13]:
import sys
import os
import numpy as np
import scipy.io
import time

import theano
import theano.tensor as T
import theano.sparse as Tsp

import lasagne as L
import lasagne.layers as LL
import lasagne.objectives as LO
from lasagne.layers.normalization import batch_norm

sys.path.append('..')
from icnn import utils_lasagne, dataset, snapshotter

import geomstats


In [14]:
# Network definition
nin = 544
nclasses = 6890
l2_weight = 1e-5

def get_model(inp, patch_op):
    icnn = LL.DenseLayer(inp, 16)
    icnn = batch_norm(utils_lasagne.GCNNLayer([icnn, patch_op], 16, nrings=5, nrays=16))
    icnn = batch_norm(utils_lasagne.GCNNLayer([icnn, patch_op], 32, nrings=5, nrays=16))
    icnn = batch_norm(utils_lasagne.GCNNLayer([icnn, patch_op], 64, nrings=5, nrays=16))
    ffn = batch_norm(LL.DenseLayer(icnn, 512))
    # classifier on top of the 512-unit dense layer
    ffn = LL.DenseLayer(ffn, nclasses, nonlinearity=utils_lasagne.log_softmax)

    return ffn

inp = LL.InputLayer(shape=(None, nin))
patch_op = LL.InputLayer(input_var=Tsp.csc_fmatrix('patch_op'), shape=(None, None))

ffn = get_model(inp, patch_op)

# L.layers.get_output -> theano variable representing network
output = LL.get_output(ffn)
pred = LL.get_output(ffn, deterministic=True)  # in case we use dropout

# target theano variable indicating the index a vertex should be mapped to w.r.t. the latent space
target = T.ivector('idxs')

# to work with logit predictions, better behaved numerically
cla = utils_lasagne.categorical_crossentropy_logdomain(output, target, nclasses).mean()
acc = LO.categorical_accuracy(pred, target).mean()

# a bit of regularization is commonly used
regL2 = L.regularization.regularize_network_params(ffn, L.regularization.l2)


cost = cla + l2_weight * regL2
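
The comment about log-domain predictions being better behaved numerically can be illustrated with a small NumPy sketch. This is not the `utils_lasagne` implementation; `log_softmax` and `nll_from_log_probs` are hypothetical stand-ins that show why computing the loss from log-probabilities avoids taking the logarithm of a probability that has underflowed to zero.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_from_log_probs(log_probs, targets):
    """Negative log-likelihood of the target class for each row."""
    return -log_probs[np.arange(len(targets)), targets]

# Extreme logits: a naive softmax followed by log would produce log(0) here.
logits = np.array([[1000.0, 0.0, -1000.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = nll_from_log_probs(log_softmax(logits), targets).mean()
```

The first row contributes a loss of 0 and the second a loss of log 3, so the mean stays finite even though some class probabilities are numerically zero.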



In [15]:
# Update rule definition

params = LL.get_all_params(ffn, trainable=True)
grads = T.grad(cost, params)
# computes the L2 norm of the gradient to better inspect training
grads_norm = T.nlinalg.norm(T.concatenate([g.flatten() for g in grads]), 2)

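# Adam turned out to be a very good choice for correspondence
updates = L.updates.adam(grads, params, learning_rate=0.001)
# compile the functions that the training and testing cells call through `funcs`
funcs = dict()
funcs['train'] = theano.function([inp.input_var, patch_op.input_var, target],
                                 [cost, cla, l2_weight * regL2, grads_norm, acc],
                                 updates=updates, on_unused_input='warn')
funcs['acc_loss'] = theano.function([inp.input_var, patch_op.input_var, target],
                                    [acc, cost], on_unused_input='warn')
funcs['predict'] = theano.function([inp.input_var, patch_op.input_var],
                                   [pred], on_unused_input='warn')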
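
For readers unfamiliar with Adam, the sketch below shows a single update in the standard textbook form with bias-corrected moment estimates. It is an illustration under assumptions, not Lasagne's code: `adam_step` is a hypothetical helper and the hyperparameter defaults are the commonly used ones.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (assumption): f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w1, m, v = adam_step(w, grad=w, m=m, v=v, t=1)
```

On the first step every coordinate moves by roughly `lr` in the direction opposite to its gradient sign, regardless of the gradient's magnitude, which is part of why the method is not very sensitive to gradient scaling.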

## IV. Demonstration and Analysis

### Dataset
The dataset used is MPI FAUST [7], a collection of 3D scans of human subjects in different poses. Each FAUST shape contains about 6.8K points. All shapes were scaled to unit geodesic diameter.

It contains 100 scans: 10 subjects, each in 10 poses. <br>
<figure>
<img src = 'img/train_images.png' width="800">
<figcaption align = "center"><b>Fig. 5 The 10 poses in the training dataset</b></figcaption>
</figure>
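
Scaling a shape to unit geodesic diameter can be sketched as follows. The example is an approximation under stated assumptions: geodesic distances are replaced by shortest paths along mesh edges (which overestimates true surface distances), the mesh is assumed to be connected, and the helper names `graph_diameter` and `scale_to_unit_diameter` are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import shortest_path

def graph_diameter(vertices, faces):
    """Largest shortest-path distance along mesh edges (assumes a connected mesh)."""
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    edges = np.unique(np.sort(edges, axis=1), axis=0)   # one entry per undirected edge
    lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    n = len(vertices)
    graph = coo_matrix((lengths, (edges[:, 0], edges[:, 1])), shape=(n, n)).tocsr()
    dist = shortest_path(graph, method='D', directed=False)   # Dijkstra, undirected
    return dist.max()

def scale_to_unit_diameter(vertices, faces):
    """Rescale vertex coordinates so the edge-path diameter becomes 1."""
    return vertices / graph_diameter(vertices, faces)

# Toy mesh (assumption): a unit square split into two triangles.
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2], [0, 2, 3]])
scaled = scale_to_unit_diameter(vertices, faces)
```

A more faithful geodesic distance (for example from fast marching or the heat method) could be substituted for the edge-path distance without changing the rest of the sketch.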


In [16]:
# Dataset
base_path = '/FAUST_registrations/data/diam=200/'

ds = dataset.ClassificationDatasetPatchesMinimal(
    'FAUST_registrations_train.txt', 'FAUST_registrations_test.txt',
    os.path.join(base_path, 'descs', 'shot'),
    os.path.join(base_path, 'patch_aniso', 'alpha=100_nangles=016_ntvals=005_tmin=6.000_tmax=24.000_thresh=99.900_norm=L1'), 
    None, 
    os.path.join(base_path, 'labels'),
    epoch_size=50)


FileNotFoundError: [Errno 2] No such file or directory: 'FAUST_registrations_train.txt'

### Training

In [17]:
n_epochs = 50
eval_freq = 1

start_time = time.time()
best_trn = 1e5
best_tst = 1e5

kvs = snapshotter.Snapshotter('demo_training.snap')

for it_count in range(n_epochs):
    tic = time.time()
    b_l, b_c, b_r, b_g, b_a = [], [], [], [], []
    for x_ in ds.train_iter():
        tmp = funcs['train'](*x_)

        # do some book keeping (store stuff for training curves etc)
        b_l.append(tmp[0])
        b_c.append(tmp[1])
        b_r.append(tmp[2])
        b_g.append(tmp[3])
        b_a.append(tmp[4])
    epoch_cost = np.asarray([np.mean(b_l), np.mean(b_c), np.mean(b_r), np.mean(b_g), np.mean(b_a)])
    print(('[Epoch %03i][trn] cost %9.6f (cla %6.4f, reg %6.4f), |grad| = %.06f, acc = %7.5f %% (%.2fsec)') %
                 (it_count, epoch_cost[0], epoch_cost[1], epoch_cost[2], epoch_cost[3], epoch_cost[4] * 100, 
                  time.time() - tic))

    if np.isnan(epoch_cost[0]):
        print("NaN in the loss function...let's stop here")
        break

    if (it_count % eval_freq) == 0:
        v_c, v_a = [], []
        for x_ in ds.test_iter():
            tmp = funcs['acc_loss'](*x_)
            v_a.append(tmp[0])
            v_c.append(tmp[1])
        test_cost = [np.mean(v_c), np.mean(v_a)]
        print(('           [tst] cost %9.6f, acc = %7.5f %%') % (test_cost[0], test_cost[1] * 100))

        if epoch_cost[0] < best_trn:
            kvs.store('best_train_params', [it_count, LL.get_all_param_values(ffn)])
            best_trn = epoch_cost[0]
        if test_cost[0] < best_tst:
            kvs.store('best_test_params', [it_count, LL.get_all_param_values(ffn)])
            best_tst = test_cost[0]
print("...done training %f" % (time.time() - start_time))

FileNotFoundError: [Errno 2] Unable to open file (unable to open file: name = 'demo_training.snap', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

### Testing

In [None]:
rewrite = True

out_path = '/outputs'
print("Saving output to: %s" % out_path)

if not os.path.isdir(out_path) or rewrite:
    try:
        os.makedirs(out_path)
    except OSError:
        # e.g. the directory already exists
        pass

    a = []  # per-shape accuracy
    for i, d in enumerate(ds.test_iter()):
        fname = os.path.join(out_path, "%s" % ds.test_fnames[i])
        print(fname, end='')
        tmp = funcs['predict'](d[0], d[1])[0]
        a.append(np.mean(np.argmax(tmp, axis=1).flatten() == d[2].flatten()))
        scipy.io.savemat(fname, {'desc': tmp})
        print(", Acc: %7.5f %%" % (a[-1] * 100.0))
    print("\nAverage accuracy across all shapes: %7.5f %%" % (np.mean(a) * 100.0))
else:
    print("Model predictions already produced.")

### Outcome

I was not able to reproduce the results of the paper. The dataset that was used requires a lot more preprocessing. I tried incorporating code from the author [9], but without success. The analysis below is based on the results reported in the paper.



### Analysis

According to the paper, GCNN is more robust than previous approaches (including random forests, functional maps, and blended maps) and achieves a higher percentage of correct correspondences.
<figure>
<img src = 'img/geodesic_radius.png' width="800">
<figcaption align = "center"><b>Fig. 6 Performance of shape correspondence using Princeton Benchmark</b></figcaption>
</figure>
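
Curves of this kind plot, for each geodesic radius, the fraction of points whose predicted match lies within that radius of the ground-truth match. A minimal sketch of the computation follows; the error values are made up for illustration and `correspondence_curve` is a hypothetical helper name.

```python
import numpy as np

def correspondence_curve(geodesic_errors, thresholds):
    """Fraction of points whose geodesic error is at most each threshold."""
    errors = np.asarray(geodesic_errors)[:, None]
    return (errors <= np.asarray(thresholds)[None, :]).mean(axis=0)

# Made-up per-point errors, as fractions of the geodesic diameter.
errors = np.array([0.0, 0.0, 0.01, 0.03, 0.12])
thresholds = np.array([0.0, 0.02, 0.05, 0.2])
curve = correspondence_curve(errors, thresholds)
```

The value at threshold 0 is the fraction of exactly correct matches, and the curve is non-decreasing and reaches 1 once the threshold exceeds the largest error.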


Figure 7 shows that GCNN matches corresponding body parts better than the random forest.

<figure>
<img src = 'img/gcc_vs_rf.png' width="800">
<figcaption align = "center"><b>Fig. 7 GCNN (bottom) vs random forest(top)</b></figcaption>
</figure>



<figure>
<img src = 'img/precision.png' width="800">
<figcaption align = "center"><b>Fig. 8 Performance (in terms of Precision-Recall) of shape retrieval on the FAUST</b></figcaption>
</figure>
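
For a single retrieval query, a precision-recall curve can be computed from the ranked list of results as sketched below. The relevance flags are made up for illustration and `precision_recall` is a hypothetical helper; the paper's evaluation protocol may differ in details such as how queries are averaged.

```python
import numpy as np

def precision_recall(relevant_ranked):
    """Precision and recall after each position of a ranked result list.

    relevant_ranked holds 1 where the retrieved shape is relevant, else 0.
    """
    rel = np.asarray(relevant_ranked, dtype=float)
    hits = np.cumsum(rel)                          # relevant items seen so far
    precision = hits / np.arange(1, len(rel) + 1)
    recall = hits / rel.sum()
    return precision, recall

ranked = [1, 0, 1, 1, 0]    # made-up relevance of the top-5 retrieved shapes
precision, recall = precision_recall(ranked)
```

A descriptor that ranks all relevant shapes first keeps precision at 1 until recall reaches 1, so curves closer to the top-right corner indicate better retrieval.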

In conclusion, GCNN is a flexible and effective solution for shape analysis. It can be used to solve both simple and complex problems by adjusting the number of convolution layers.

## References
[1] A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. PAMI, 21(5):433–449, 1999. <br>
[2] S. Manay et al. Integral invariants for shape matching. PAMI, 28(10):1602–1618, 2006.<br>
[3] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.<br>
[4] J. Sun, M. Ovsjanikov, and L. J. Guibas. A concise and provably informative multi-scale signature based on heat diffusion. CGF, 28(5):1383–1392, 2009.<br>
[5] M. Aubry, U. Schlickewei, and D. Cremers. The wave kernel signature: A quantum mechanical approach to shape analysis. In Proc. ICCV, 2011.<br>
[6] J. Bruna et al. Spectral networks and locally connected networks on graphs. In Proc. ICLR, 2014.<br>
[7] F. Bogo et al. FAUST: Dataset and evaluation for 3D mesh registration. In Proc. CVPR, 2014.<br>
[8] N. Miolane et al. Geomstats: A Python package for Riemannian geometry in machine learning. JMLR, 21(223), 2020.<br>
[9] https://github.com/jonathanmasci/EG16_tutorial