- Propose a novel group consistency loss for unsupervised part segmentation. Use an inconsistent dataset to train a shape refiner.
- Ncut(A, B) = Cut(A, B) / Assoc(A) + Cut(A, B) / Assoc(B). Cut, Assoc(A), and Assoc(B) are summations of edge weights.
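A minimal numpy sketch of evaluating this objective on a toy weight matrix (the function name and graph are illustrative, not from any specific paper):

```python
import numpy as np

def ncut(W, in_A):
    """Normalized-cut value of a binary partition of a weighted graph.

    W    : (n, n) symmetric matrix of edge weights.
    in_A : (n,) boolean mask, True for nodes in set A.
    """
    cut = W[np.ix_(in_A, ~in_A)].sum()   # weight crossing the partition
    assoc_A = W[in_A, :].sum()           # total weight touching A
    assoc_B = W[~in_A, :].sum()          # total weight touching B
    return cut / assoc_A + cut / assoc_B

# Toy graph: a dense 3-clique weakly connected to one extra node.
W = np.array([[0.0, 1.0, 1.0, 0.1],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.0]])
print(ncut(W, np.array([True, True, True, False])))  # small value = good cut
```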
- Multi-scale + reverse connection
- 2D proposals + PointNet
- Instance coloring.
- y = phi(x) + (u, v)
- Minimize ||y - cy|| (pull each embedding toward its instance centroid). Attractive force only.
- Direct: minimize the color error of the projection.
- Indirect: minimize the geometry error of the projection.
- A feature point is often represented by an oriented planar texture patch.
- Surf
- Fast
- (Extended) Kalman filter
- Particle filter
- Graph-based
- Prediction step (motion model): bel'(xt) = ∫ p(xt | ut, xt-1) bel(xt-1) dxt-1
- Correction step (sensor/observation model): bel(xt) = η p(zt | xt) bel'(xt), where η is a normalizer.
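A minimal histogram (discrete-state) version of these two steps, with toy motion and sensor models:

```python
import numpy as np

def bayes_filter_step(belief, motion, likelihood):
    """One predict/correct cycle of a discrete Bayes filter.

    belief     : (n,) prior belief over n discrete states.
    motion     : (n, n) matrix, motion[j, i] = p(x_t = j | u_t, x_{t-1} = i).
    likelihood : (n,) sensor model p(z_t | x_t) at the current measurement.
    """
    predicted = motion @ belief         # prediction: sum over x_{t-1}
    corrected = likelihood * predicted  # correction: weight by p(z_t | x_t)
    return corrected / corrected.sum()  # eta: renormalize

# Toy example: 3 states, the control shifts right with probability 0.8,
# and the sensor most likely fires in state 1.
belief = np.array([1.0, 0.0, 0.0])
motion = np.array([[0.2, 0.0, 0.0],
                   [0.8, 0.2, 0.0],
                   [0.0, 0.8, 1.0]])
likelihood = np.array([0.1, 0.8, 0.1])
print(bayes_filter_step(belief, motion, likelihood))
```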
- A Bayes filter.
- Optimal for linear models with Gaussian distributions.
- Prediction step: xt (state) = At * xt-1 + Bt * ut (control) + epsilon.
- Correction step: zt (predicted observation) = Ct * xt + delta. (See the sketch after this list.)
- Noise smoothing (improves noisy measurements) + state estimation (for state feedback) + recursive (computes the next estimate using only the most recent measurement).
- Marginal and conditional of Gaussian are still Gaussian.
- Extended: local linearization (at the current best estimate) of non-linear functions. (The matrix inversion is the bottleneck.)
- Unscented: sampling techniques to find an approximate Gaussian distribution.
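A minimal linear Kalman filter sketch in the At/Bt/Ct notation above (R and Q denote the motion and measurement noise covariances, following the usual convention):

```python
import numpy as np

def kalman_step(mu, Sigma, u, z, A, B, C, R, Q):
    """One predict/correct cycle of a linear Kalman filter.

    mu, Sigma : previous state mean and covariance.
    u, z      : control and measurement at time t.
    A, B, C   : motion and observation matrices.
    R, Q      : motion and measurement noise covariances.
    """
    # Prediction step (motion model).
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R
    # Correction step (observation model); the matrix inversion here is
    # the expensive part.
    K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + Q)
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new

# Toy constant-velocity model: state = [position, velocity], observe position.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.zeros((2, 1))
C = np.array([[1.0, 0.0]])
R, Q = 0.01 * np.eye(2), np.array([[0.25]])
mu, Sigma = np.zeros(2), np.eye(2)
mu, Sigma = kalman_step(mu, Sigma, np.zeros(1), np.array([1.2]), A, B, C, R, Q)
print(mu)
```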
- Discrete maps into cells (occupied or free space).
- Non-parametric model.
- Assumptions: Cells are binary, static, and independent; Poses are known.
- Binary Bayes filter (for a static state). Correction step only.
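A common implementation keeps each cell's occupancy in log-odds form; a minimal sketch with a toy inverse sensor model:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def update_cell(l, p_occ_given_z, l0=0.0):
    """Binary Bayes filter (correction only) for one static cell, log-odds form.

    l             : current log-odds of occupancy.
    p_occ_given_z : inverse sensor model p(occupied | z_t) for this cell.
    l0            : prior log-odds (0.0 encodes a 0.5 prior).
    """
    return l + logit(p_occ_given_z) - l0

# One cell observed as "hit" three times, then "miss" once.
l = 0.0
for p in (0.7, 0.7, 0.7, 0.3):
    l = update_cell(l, p)
print(1.0 / (1.0 + np.exp(-l)))   # back to an occupancy probability, ~0.85
```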
- Levenberg-Marquardt (LM) algorithm.
- Used in a small volume (a room) over a long period.
- First real-time monocular SLAM system.
- Probabilistic filtering of a joint state consisting of camera and scene feature position estimates.
- Separate tracking and mapping into two parallel threads.
- Mapping is based on keyframes processed using batch techniques (bundle adjustment).
- The map is densely initialized from a stereo pair (5-point algorithm).
- New points are initialized by epipolar search.
- A large number of points are mapped.
- Bundle adjustment + robust n-point pose estimation.
- PTAM + dense reconstruction
- Projective TSDF (easy to parallelize, but exactly correct only at the surface).
- Moving average for surface (TSDF) update.
- Surface measurement (V, N) -> Projective TSDF -> V, N -> pose (frame to model).
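A minimal per-voxel sketch of the moving-average TSDF update (a KinectFusion-style weighted running average; the weight cap is a common detail assumed here):

```python
import numpy as np

def fuse_tsdf(D, W, d_new, w_new=1.0, w_max=100.0):
    """Weighted running-average update of a TSDF volume.

    D     : current truncated signed distances (one entry per voxel).
    W     : accumulated weights, same shape as D.
    d_new : truncated signed distances computed from the live depth frame.
    """
    D = (W * D + w_new * d_new) / (W + w_new)
    W = np.minimum(W + w_new, w_max)   # cap so the model can still adapt
    return D, W

D, W = np.zeros(4), np.zeros(4)
for d in (np.array([0.10, -0.20, 0.05, 0.30]),
          np.array([0.12, -0.18, 0.04, 0.28])):
    D, W = fuse_tsdf(D, W, d)
print(D, W)   # averaged distances, weights = 2
```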
- Coarse 6D warp-field to model the dynamic motion.
- Estimation of the volumetric model-to-frame warp field -> fusion of the live frame depth map into the canonical space -> adaptation of the warp field to capture new geometry.
- Estimate pose and landmark locations (represented in the state space).
- Assumption: known correspondences.
- Object-based map representation. Use Mask R-CNN to predict object-level TSDF for initialization.
- Predict foreground probability for rendering.
- Pose-graph of keyframes with semi-dense depth maps.
- Filtering over a large number of pixelwise small-baseline stereo comparisons.
- Tracking with Sim(3) (detecting scale drift explicitly).
- Initialized with a random depth map and large variance.
- Re-weighted Gauss-Newton optimization.
- Direct + Sparse
- Points are well-distributed. Divide the image into 32x32 blocks and select one pixel inside each block with large gradient.
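A minimal sketch of this selection, reading "32x32" as blocks of 32x32 pixels and using a simple finite-difference gradient:

```python
import numpy as np

def select_points(img, block=32, grad_thresh=20.0):
    """Pick at most one high-gradient pixel from each block x block region."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                      # per-pixel gradient magnitude
    points = []
    h, w = img.shape
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            patch = mag[y0:y0 + block, x0:x0 + block]
            dy, dx = np.unravel_index(patch.argmax(), patch.shape)
            if patch[dy, dx] > grad_thresh:     # skip textureless blocks
                points.append((y0 + dy, x0 + dx))
    return points

img = (np.random.rand(128, 128) * 255).astype(np.uint8)
print(len(select_points(img)))                  # at most (128/32)**2 = 16 points
```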
- Incrementally construct cost volume and minimize energy for dense mapping.
- Dense tracking.
- Use a sparse code to represent depth.
- Linear depth decoder (no ReLU). The Jacobian of the decoder w.r.t. the code can be computed.
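Because the decoder is linear in the code, its Jacobian can be recovered by probing with basis vectors; a minimal sketch with a toy linear decoder (names illustrative):

```python
import numpy as np

def decoder_jacobian(decode, code_dim):
    """Jacobian of a decoder that is linear in its code.

    For linear D, D(c) = D(0) + J c, so column j of J is D(e_j) - D(0).
    """
    d0 = decode(np.zeros(code_dim)).ravel()
    J = np.stack([decode(np.eye(code_dim)[j]).ravel() - d0
                  for j in range(code_dim)], axis=1)
    return d0, J

# Toy linear decoder: depth map = basis @ code + mean.
basis = np.random.randn(64 * 64, 8)
mean = np.random.rand(64 * 64)
decode = lambda c: (basis @ c + mean).reshape(64, 64)
d0, J = decoder_jacobian(decode, 8)
print(np.allclose(J, basis))   # True: probing recovers the basis exactly
```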
- SGM (semi-global matching) / loopy BP: generalize the fact that there is an exact solution (dynamic programming) for a chain.
- Graph cuts: generalize the fact that there is an exact solution if d takes only two values.
- TRW-S
- Uniform coverage by finding top-k local maxima from each image block (32 x 32 blocks).
- Matching -> expansion -> filtering -> polygonal surface reconstruction.
- Dominant axes + hypothesis planes + optimization
- Detector + orientation + descriptor
- Integrate geometry constraints (based on GT surface normals).
- Input a list of all possible matching pairs (coordinates only) and predict if each pair is valid or not.
- V(k) = sum_i w_i (x_i - c_k), where K is the number of clusters.
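A minimal hard-assignment VLAD sketch matching this formula (w_i becomes a 0/1 nearest-center assignment; normalization details vary by paper):

```python
import numpy as np

def vlad(X, centers):
    """VLAD signature: sum of residuals x_i - c_k per nearest cluster center.

    X : (n, d) local descriptors; centers : (K, d) visual words.
    """
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)                 # nearest center per descriptor
    K, d = centers.shape
    V = np.zeros((K, d))
    for k in range(K):
        V[k] = (X[assign == k] - centers[k]).sum(axis=0)
    V /= np.linalg.norm(V) + 1e-12                # global L2 normalization
    return V.ravel()                              # (K * d,) image signature

X, centers = np.random.randn(100, 8), np.random.randn(4, 8)
print(vlad(X, centers).shape)   # (32,)
```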
- Regional VLAD/ASMK
- Inner product layer at the end.
- Classify possible disparities.
- Find the most similar patch forward and backward.
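A minimal mutual (forward-backward) consistency check, assuming nearest-neighbor match indices have already been computed:

```python
import numpy as np

def mutual_matches(fwd, bwd):
    """Keep match i -> fwd[i] only if it is mutual: bwd[fwd[i]] == i.

    fwd : (n,) index of the most similar patch in the other image.
    bwd : (m,) index of the most similar patch going back.
    """
    i = np.arange(len(fwd))
    keep = bwd[fwd] == i
    return i[keep], fwd[keep]

fwd = np.array([2, 0, 1])
bwd = np.array([1, 2, 2])
print(mutual_matches(fwd, bwd))   # only the mutually consistent pair survives
```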
- Cost volume based on plane-sweep and photometric difference. Volume aggregation for multiple views.
- Use variance to construct the cost volume. Use a 3D CNN for regularization. (See the sketch below.)
- Depth refinement based on Deep image matting.
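A minimal sketch of the variance-based cost volume, assuming the source-view features have already been warped onto the reference camera's depth planes (plane sweep):

```python
import numpy as np

def variance_cost_volume(warped):
    """Build a cost volume from multi-view features by per-voxel variance.

    warped : (n_views, n_depths, C, H, W) feature maps, each view already
             warped to the reference view's fronto-parallel depth hypotheses.
    Returns a (n_depths, C, H, W) volume; low variance = photo-consistent.
    """
    mean = warped.mean(axis=0, keepdims=True)
    return ((warped - mean) ** 2).mean(axis=0)

warped = np.random.randn(3, 16, 8, 32, 32)   # 3 views, 16 depth hypotheses
print(variance_cost_volume(warped).shape)     # (16, 8, 32, 32)
```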
- Use GRU to process the cost volume sequentially. Concatenate all the outputs to get the regularized cost volume.
- Point pair features
- PPF-FoldNet: rotation invariant
- REINFORCE algorithm based on rendered images.
- Junctions + post-processing
- Single image -> 360 degree panorama.
- SUNCG
- TSDF input.
- Layered Scene Decomposition via the Occlusion-CRF
- Layer-structured 3D Scene Inference via View Synthesis
- Image -> voxel -> projection (supervision)
Pixels, Voxels, and Views: A Study of Shape Representations for Single View 3D Object Shape Prediction
- Multi-surface generalizes better than voxel-based representations. It also looks better (higher resolution). It can also capture some thin structures, though its post-processing step (surface reconstruction) might discard them.
- Viewer-centered generalizes better than object-centered. It has good shape prediction but poor pose prediction. Object-centered tends to memorize the observed meshes, and its learned features can be used for object recognition.
- The model trained to predict shape and pose can be finetuned for object recognition. Maybe it will generalize better.
- Predict per-pixel dominant directions (frames), which could be used for other applications.
- Deformable mesh model (from Neural 3D Mesh Renderer).
- Use rendered views (for both observed and unseen views) to add discriminative loss.
- Internal pressure loss to encourage larger volume.
- Generate a 2D mesh based on Canny edges.
- Estimate plane parameters for each face.
- CNN + DenseCRF
- Bootstrap net + iterative net + refinement net
- Flow-based image warping for iterative refinement.
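A minimal flow-based warping sketch using bilinear sampling (scipy's map_coordinates; names illustrative):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_flow(img, flow):
    """Warp img by sampling at (y + dy, x + dx) with bilinear interpolation.

    img  : (H, W) image; flow : (2, H, W) per-pixel (dy, dx) displacements.
    """
    H, W = img.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([yy + flow[0], xx + flow[1]])
    return map_coordinates(img, coords, order=1, mode='nearest')

img = np.random.rand(64, 64)
flow = np.zeros((2, 64, 64))
flow[1] = 1.5                            # shift everything 1.5 px along x
print(warp_with_flow(img, flow).shape)   # (64, 64)
```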
- Similarity loss between internal features.
- Paired images of the same scene with different modalities.
- Proxy 3D tasks: object-centric camera pose estimation and wide-baseline feature matching.
- Domain confusion.
- Instance-based: find similar instances in the source domain.
- Mapping-based: map instances from the two domains into a new data space with better similarity.
- Network-based: reuse part of a network pre-trained in the source domain.
- Adversarial-based: use domain confusion to penalize the network if it predicts the domain correctly.
- Predict relative location between patches.
- Train a learnet to predict the model parameters.
- Assess the model on another exemplar to predict whether the new exemplar is of the same class as the one used by the learnet.
Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition
- Similar to DeepDream.
- Simplifying the input images.
- Visualizing the receptive fields of units and their activation patterns.
- Identifying the semantics of internal units.
- AutoEncoder. The latent code is divided into segments.
- Only one attribute changes in each mini-batch.
- Project nearby points onto the tangent plane.
- Divide point cloud into voxels and process points inside each voxel using a PointNet.
- Apply extension operators to convert point cloud to volumetric representations (using basis functions).
- Process the volumetric representation and sample back to point cloud.
- K-nearest neighbors. Lift each neighbor into a new feature, and concatenate the lifted features with the current one.
- Learn a KxK transformation matrix to permute, and a standard convolution to process.
- Divide the point cloud into slices and use recurrent network to process slices sequentially.
- Non-local modules.
- Rendering + RL
- AutoDecoder which takes a 3D point and a shape code as input and predicts the SDF for the 3D point.
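A minimal numpy sketch of such an auto-decoder forward pass (layer sizes and the tanh output are illustrative; the latent codes are optimized jointly with the network, with no encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_sdf(params, xyz, code):
    """Auto-decoder forward pass: SDF value for 3D points + a latent shape code.

    xyz : (n, 3) query points; code : (d,) latent vector, one per shape.
    """
    x = np.concatenate([xyz, np.tile(code, (len(xyz), 1))], axis=1)
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)     # ReLU hidden layers
    W, b = params[-1]
    return np.tanh(x @ W + b)              # SDF squashed to [-1, 1]

d_code, hidden = 64, 128
sizes = [3 + d_code, hidden, hidden, 1]
params = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]
xyz = rng.normal(size=(5, 3))
code = rng.normal(size=d_code)
print(mlp_sdf(params, xyz, code).shape)    # (5, 1): one SDF value per point
```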
- Use structured implicit shape representation (100x7 parameters) to represent the 3D surface.
- Classify sampled points as inside or outside based on the predicted parameters, which defines the loss.
- Learn the weight for each neighbor point (similarity).
- Compute the weighted summation of features (non-local module).
- Predict 100 vertices (set generation), read features from voxel grids, and use graph neural network to predict edges.
- Find face candidates in the dual graph, and use graph network to predict face existence.
- Generate training data using mesh simplification (https://github.com/kmammou/v-hacd).
- Recursive network (merging two parts into one node).
- Train an autoencoder and map the context feature to the latent representation for decoding.
- Voxel -> boxes
- Consistency loss + coverage loss. REINFORCE algorithm to allow an arbitrary number of primitives.
- Predict weights between every pair of views, which are used for estimating the absolute pose.
- Repeat the process iteratively.
End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation
- Message passing