
Pointcloud from Disparity Map #75

Open
dkczk opened this issue Mar 2, 2023 · 2 comments

dkczk commented Mar 2, 2023

I have rectified stereo image pairs and computed a disparity map with RAFT. Afterwards I wanted to convert that map to a 3D point cloud using the reprojectImageTo3D() function from OpenCV's Python library. I read the disparity map with OpenCV's imread(file_path, IMREAD_GRAYSCALE) and passed it together with the Q matrix to the reprojection function. Up to this step everything works fine, but when assigning the colors to the point cloud with cvtColor(imgL, cv2.COLOR_BGR2RGB) I get an error:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 2054 but corresponding boolean dimension is 2080

It seems that the dimensions of the left stereo image and the disparity map don't match, which causes the error. So I checked your bicycle example and noticed that the image sizes differ there as well. I guess I'm missing a step between reading the disparity map and converting it to a 3D point cloud. What am I doing wrong?

Further information:
Disparity Map Processing Options

    parser.add_argument('--restore_ckpt', help="restore checkpoint", default='path/to/models/raftstereo-middlebury.pth')
    parser.add_argument('--save_numpy', action='store_true', help='save output as numpy arrays')
    parser.add_argument('-l', '--left_imgs', help="path to all first (left) frames", default="path/to/left.jpg")
    parser.add_argument('-r', '--right_imgs', help="path to all second (right) frames", default="path/to/right.jpg")
    parser.add_argument('--output_directory', help="directory to save output", default="path/to/output")
    parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')
    parser.add_argument('--valid_iters', type=int, default=32, help='number of flow-field updates during forward pass')

    # Architecture choices
    parser.add_argument('--hidden_dims', nargs='+', type=int, default=[128]*3, help="hidden state and context dimensions")
    parser.add_argument('--corr_implementation', choices=["reg", "alt", "reg_cuda", "alt_cuda"], default="reg", help="correlation volume implementation")
    parser.add_argument('--shared_backbone', action='store_true', help="use a single backbone for the context and feature encoders")
    parser.add_argument('--corr_levels', type=int, default=4, help="number of levels in the correlation pyramid")
    parser.add_argument('--corr_radius', type=int, default=4, help="width of the correlation pyramid")
    parser.add_argument('--n_downsample', type=int, default=2, help="resolution of the disparity field (1/2^K)")
    parser.add_argument('--slow_fast_gru', action='store_true', help="iterate the low-res GRUs more frequently")
    parser.add_argument('--n_gru_layers', type=int, default=3, help="number of hidden GRU levels")

Stereo Image Information
Width: 2456
Height: 2054
Depth: 24 Bit
dpi: 96

Disparity Map Information
Width: 2464
Height: 2080
Depth: 32 Bit

Script

import numpy as np
import cv2


# Left rectified image (BGR) and the disparity map saved by RAFT-Stereo
imgL = cv2.imread('path/to/left.jpg')
disp = cv2.imread('path/to/disp.png', cv2.IMREAD_GRAYSCALE)

h, w = imgL.shape[:2]
f = 0.8 * w  # guessed focal length
# Reprojection matrix, same form as in OpenCV's stereo_match.py sample
Q = np.float32([[1, 0, 0, -0.5 * w],
                [0, -1, 0, 0.5 * h],
                [0, 0, 0, -f],
                [0, 0, 1, 0]])
points = cv2.reprojectImageTo3D(disp, Q)
colors = cv2.cvtColor(imgL, cv2.COLOR_BGR2RGB)

Thanks in advance for your help.

lahavlipson (Collaborator) commented
Yes, the dimensions of the disparity output do not always exactly match the dimensions of the input image. This is because the input image is padded so that its height and width are divisible by 32.
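The sizes reported above are consistent with this: rounding each image dimension up to the next multiple of 32 gives exactly the disparity-map size. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the repo):

```python
def pad_to_multiple(n, divis_by=32):
    # Smallest multiple of divis_by that is >= n (unchanged if already divisible)
    return ((n + divis_by - 1) // divis_by) * divis_by

# Sizes from the issue: image is 2456 x 2054, disparity map is 2464 x 2080
print(pad_to_multiple(2054))  # height: 2054 -> 2080
print(pad_to_multiple(2456))  # width:  2456 -> 2464
```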

To fix this, you need to unpad the disparity map:

import sys
sys.path.append("RAFT-Stereo/core")
from utils.utils import InputPadder

# image: the original (unpadded) left input, as an H x W x 3 array
padder = InputPadder(image.transpose(2, 0, 1).shape, divis_by=32)
disp = padder.unpad(disp[None, None])[0, 0]

I've created a Google Colab which produces a point cloud from the middlebury predictions. Hopefully this will be helpful: https://colab.research.google.com/drive/1G8WJCQt9y55qxQH6QV6PpPvWEbd393g2?usp=sharing
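For completeness: once the disparity has been unpadded to the image size, the boolean mask has the same shape as the color array, which is exactly what the IndexError above was complaining about, and export works as in OpenCV's stereo_match.py sample. A minimal sketch of the PLY export (write_ply and the paths are illustrative names, not from this repo):

```python
import numpy as np

def write_ply(path, points, colors):
    # points: N x 3 float XYZ, colors: N x 3 uint8 RGB -> ASCII PLY file
    points = np.asarray(points).reshape(-1, 3)
    colors = np.asarray(colors).reshape(-1, 3)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for (x, y, z), (r, g, b) in zip(points, colors):
            f.write(f"{x} {y} {z} {r} {g} {b}\n")

# With disp unpadded to the image size, mask, points and colors all agree:
# mask = disp > disp.min()
# write_ply("out.ply", points[mask], colors[mask])
```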

lahavlipson commented Mar 18, 2023

This issue has now been fixed in demo.py
