# **Convolutional neural network architecture for geometric matching**

**Authors: Ignacio Rocco (DI ENS, INRIA), Relja Arandjelović (DI ENS, INRIA), Josef Sivic (DI ENS, INRIA, CIIRC)**

**Official Github**: https://github.com/ignacio-rocco/cnngeometric_pytorch

---

**Edited By Su Hyung Choi (Key Summary & Code Practice)**

If you find any issues with this write-up, please open a PR on the repository below.

**[Github: @JonyChoi - Computer Vision Paper Reviews]** https://github.com/jonychoi/Computer-Vision-Paper-Reviews

Edited Jan 10 2022

---

### **Abstract**

<table>
    <thead>
        <tr>
            <th>
                Abstract
            </th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>
                <i>
                    We address the problem of determining correspondences
                    between two images in agreement with a geometric model
                    such as an affine or thin-plate spline transformation, and
                    estimating its parameters. The contributions of this work
                    are three-fold. First, we propose a convolutional neural network architecture for geometric matching. The architecture
                    is based on three main components that mimic the standard
                    steps of feature extraction, matching and simultaneous inlier detection and model parameter estimation, while being
                    trainable end-to-end. Second, we demonstrate that the network parameters can be trained from synthetically generated imagery without the need for manual annotation and
                    that our matching layer significantly increases generalization capabilities to never seen before images. Finally, we
                    show that the same model can perform both instance-level
                    and category-level matching giving state-of-the-art results
                    on the challenging Proposal Flow dataset.
                </i>
            </td>
        </tr>
    </tbody>
</table>
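
To make "estimating its parameters" and "trained from synthetically generated imagery" concrete, here is a minimal NumPy sketch, not the authors' code. It samples a random affine transformation, applies it to a set of points, and recovers the six parameters by least squares. The point is that a synthetically generated pair comes with its ground-truth transformation for free, so no manual annotation is needed. The function names (`random_affine`, `apply_affine`, `fit_affine`) and the sampling ranges are assumptions made for illustration; the paper warps images rather than point sets and regresses the parameters with a CNN.

```python
import numpy as np

def random_affine(rng, max_shift=0.2, max_deform=0.3):
    """Sample a 2x3 affine matrix near the identity (ranges are illustrative)."""
    A = np.eye(2) + rng.uniform(-max_deform, max_deform, size=(2, 2))
    t = rng.uniform(-max_shift, max_shift, size=(2, 1))
    return np.hstack([A, t])

def apply_affine(theta, pts):
    """Map (N, 2) points through a 2x3 affine matrix [A | t]."""
    return pts @ theta[:, :2].T + theta[:, 2]

def fit_affine(src, dst):
    """Least-squares estimate of the 2x3 affine matrix mapping src to dst."""
    X = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) homogeneous coordinates
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution of X @ sol = dst
    return sol.T

rng = np.random.default_rng(0)
theta_gt = random_affine(rng)              # ground truth, known by construction
src = rng.uniform(-1, 1, size=(20, 2))     # synthetic "source" points
dst = apply_affine(theta_gt, src)          # synthetic "target" points
theta_est = fit_affine(src, dst)           # estimated parameters
print(np.abs(theta_est - theta_gt).max())  # residual should be near machine precision
```

In the paper's setting the closed-form fit is replaced by a regression network, but the supervision signal is the same: the sampled `theta_gt`.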

### **Introduction**


<table>
    <thead>
        <tr>
            <th>
                Introduction
            </th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>
                <p>
                    Estimating correspondences between images is one of
                    the fundamental problems in computer vision [19, 25] with
                    applications ranging from large-scale 3D reconstruction [3]
                    to image manipulation [21] and semantic segmentation
                    [42]. Traditionally, correspondences consistent with a geometric model such as epipolar geometry or planar affine
                    transformation, are computed by detecting and matching
                    local features (such as SIFT [38] or HOG [12, 22]), followed by pruning incorrect matches using local geometric
                    constraints [43, 47] and robust estimation of a global geometric transformation using algorithms such as RANSAC
                    [18] or Hough transform [32, 34, 38]. This approach works
                    well in many cases but fails in situations that exhibit (i) large
                    changes of depicted appearance due to e.g. intra-class variation [22], or (ii) large changes of scene layout or non-rigid deformations that require complex geometric models with
                    many parameters which are hard to estimate in a manner
                    robust to outliers.
                </p>
                <table>
                    <tbody>
                        <tr>
                            <td>
                                <img src="./imgs/figure1.png" width="350" />
                            </td>
                            <td>
                                Figure 1: Our trained geometry estimation network automatically
                                aligns two images with substantial appearance differences. It is
                                able to estimate large deformable transformations robustly in the
                                presence of clutter.
                            </td>
                        </tr>
                    </tbody>
                </table>
                <p>
                    In this work we build on the traditional approach and
                    develop a convolutional neural network (CNN) architecture
                    that mimics the standard matching process. First, we replace the standard local features with powerful trainable
                    convolutional neural network features [31, 46], which allows us to handle large changes of appearance between
                    the matched images. Second, we develop trainable matching and transformation estimation layers that can cope with
                    noisy and incorrect matches in a robust way, mimicking the
                    good practices in feature matching such as the second nearest neighbor test [38], neighborhood consensus [43, 47] and
                    Hough transform-like estimation [32, 34, 38].
                </p>
                <p>
                    The outcome is a convolutional neural network architecture trainable for the end task of geometric matching,
                    which can handle large appearance changes, and is therefore
                    suitable for both instance-level and category-level matching
                    problems.
                </p>
            </td>
        </tr>
    </tbody>
</table>
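
The traditional pipeline described above (descriptor matching, pruning with the second nearest neighbor test, then robust estimation with RANSAC) can be sketched in a few lines of NumPy. This is an illustrative sketch of the classical baseline under simplifying assumptions, not the paper's network: the descriptors are random vectors standing in for SIFT-like features, the geometric model is affine, and the names `ratio_test_matches`, `ransac_affine`, the ratio `0.8`, and the tolerance `0.05` are all hypothetical choices.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs that pass the second nearest neighbor test."""
    # Pairwise Euclidean distances between descriptors, shape (Na, Nb).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    # Keep a match only if it is clearly better than the runner-up.
    keep = d[rows, best] < ratio * d[rows, second]
    return np.stack([rows[keep], best[keep]], axis=1)

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping src points to dst points."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return sol.T

def ransac_affine(src, dst, n_iters=200, tol=0.05, seed=0):
    """Fit an affine model robustly by sampling minimal sets of 3 matches."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)
        theta = fit_affine(src[idx], dst[idx])
        pred = src @ theta[:, :2].T + theta[:, 2]
        inliers = np.linalg.norm(pred - dst, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic data: 30 keypoints related by a known affine map, 10 of them corrupted.
rng = np.random.default_rng(0)
theta_gt = np.array([[1.1, 0.2, 0.05], [-0.1, 0.9, -0.10]])
kp_a = rng.uniform(-1, 1, size=(30, 2))
desc_a = rng.normal(size=(30, 16))
kp_b = kp_a @ theta_gt[:, :2].T + theta_gt[:, 2]
kp_b[:10] += rng.uniform(1.0, 2.0, size=(10, 2))   # gross geometric outliers
desc_b = desc_a + 0.01 * rng.normal(size=desc_a.shape)

matches = ratio_test_matches(desc_a, desc_b)
src, dst = kp_a[matches[:, 0]], kp_b[matches[:, 1]]
theta_est, inliers = ransac_affine(src, dst)
print(inliers.sum(), np.abs(theta_est - theta_gt).max())
```

Each stage here is hand-designed and non-differentiable, and the minimal-sample strategy scales poorly as the number of model parameters grows. These are the two limitations the introduction cites as motivation for a trainable, end-to-end alternative.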
