
How to acquire synchronized frames from depth and main(1280x720) RGB cameras? #64

Closed
maurosyl opened this issue Oct 23, 2018 · 13 comments

@maurosyl

Depth and RGB frames on the HoloLens are not aligned. I'm trying to get information about the depth of the scene the HoloLens user is seeing, but to do so I must establish a correspondence between the coordinates of the two kinds of frames, probably with some stereo calibration algorithm or something similar. The problem is that every approach I can think of requires the RGB frames to be acquired "at the same time" as the depth frames I can easily get from the Recorder tool. Any idea how to do that?

@Huangying-Zhan

Hi @mauronano, poses of the sensors at different timestamps are available in research mode. For depth maps, there is also an unprojection model for the depth sensor. Using the unprojection model, you can get 3D coordinates of the points in the depth map. Knowing the relative pose between the depth sensor and the RGB camera, you should be able to transform the 3D points into the RGB camera coordinate system and project them into the RGB camera view, which means you can create depth maps for the RGB camera view. In this case, the RGB frame and the depth frame do not need to be acquired at the same time. However, if the time gap between them is too large, other issues appear, such as occlusion.
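A minimal NumPy sketch of that pipeline, assuming you already have the depth sensor's unprojection map from research mode, the camera-to-world pose of each sensor at its own timestamp, and the RGB intrinsics (all variable names below are placeholders, not the actual API):

```python
import numpy as np

def depth_to_rgb_pixels(depth, unproject_map, T_world_from_depth, T_world_from_rgb, K_rgb):
    """Map a depth frame into the RGB camera's image plane.

    depth:              (H, W) depth values in metres
    unproject_map:      (H, W, 2) per-pixel unit-plane directions from the
                        research-mode unprojection model (assumed layout)
    T_world_from_depth: (4, 4) depth-camera-to-world pose at the depth timestamp
    T_world_from_rgb:   (4, 4) RGB-camera-to-world pose at the RGB timestamp
    K_rgb:              (3, 3) RGB camera intrinsics
    """
    # 1. Unproject: 3D points in the depth camera frame.
    xy = unproject_map.reshape(-1, 2)
    z = depth.reshape(-1)
    pts_depth = np.column_stack([xy[:, 0] * z, xy[:, 1] * z, z, np.ones_like(z)])

    # 2. Relative pose depth -> RGB (from the two absolute poses), then transform.
    T_rgb_from_depth = np.linalg.inv(T_world_from_rgb) @ T_world_from_depth
    pts_rgb = (T_rgb_from_depth @ pts_depth.T)[:3]           # (3, N)

    # 3. Project into the RGB image, keeping only points in front of the camera.
    valid = pts_rgb[2] > 0
    uvw = K_rgb @ pts_rgb[:, valid]
    uv = uvw[:2] / uvw[2]
    return uv, pts_rgb[2, valid]    # pixel coordinates and depth in the RGB view
```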

@FracturedShader

@mauronano, what @Huangying-Zhan said is exactly right. In my case I have an additional set of steps where I keep the latest frame from each of the streams, but only send the data from the last frame of each if the wearer hasn't moved very much in the last second. This is mainly to prevent blurry images, but it also makes debugging a little easier. While the CameraStreamCoordinateMapper and CameraStreamCorrelation samples seem to indicate that you can directly convert from one image to the other, I found this simply not to be true, likely because those two samples use a stationary Kinect, which behaves differently. In my case I ended up doing: ColorProjectionMatrix * ColorViewMatrix * DepthSpaceToColorSpace * InverseDepthViewMatrix * DepthCameraPoint. It's a whole ordeal, but all in all a pretty standard 2D->3D->3D->2D graphical space conversion pipeline. To get the DepthSpaceToColorSpace matrix you can use the TryGetTransformTo method with the two SpatialCoordinateSystems, like they do in the MediaFrameReaderContext::FrameArrived function. Since your previous question shows that you are using Unity, be sure to convert the matrices before doing anything with them. The Mixed Reality Toolkit has a preview sample that shows how to do that.
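For reference, that chain in NumPy with column vectors (the identity matrices below are placeholders; the real values come from the stream metadata and from TryGetTransformTo, and depending on whether you treat the stored matrices as row- or column-major you may need to transpose them first):

```python
import numpy as np

# Placeholder 4x4 matrices; in practice these come from the frame metadata
# (CameraProjectionTransform / CameraViewTransform) and from
# SpatialCoordinateSystem.TryGetTransformTo between the two streams.
color_projection     = np.eye(4)
color_view           = np.eye(4)
depth_space_to_color = np.eye(4)
inv_depth_view       = np.eye(4)

# Homogeneous 3D point in depth camera space (from the depth unprojection).
depth_camera_point = np.array([0.1, -0.05, 0.8, 1.0])

clip = color_projection @ color_view @ depth_space_to_color @ inv_depth_view @ depth_camera_point
ndc = clip[:2] / clip[3]   # perspective divide -> roughly [-1, 1] image coordinates
```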

@FracturedShader

I would like to add that the projection returned by the streams for the color image is very wrong. I ended up needing to fake it. Once you get it dialed in, though, you can capture things as small as individual wires pretty well.

[Image: thin wire]
[Image: junction box]

Folds in fabric come out pretty well too. Keep in mind, the data is rough, and full of holes.

[Image: shopping bag]

@maurosyl
Author

@Huangying-Zhan, @FracturedShader, thank you both for your answers. Just to make sure I understand what you suggest: after I manage to get the 3D coordinates of the unprojected depth pixels (which you refer to as "DepthCameraPoint"), I use the CameraViewTransform matrix provided by the Recorder tool to map them to the depth camera space. Regarding DepthSpaceToColorSpace, I'm not sure about the meaning of the "FrameToOrigin" matrix produced at the place in the code you pointed at; grasping that would probably help me understand the math behind this transformation. As for the "ColorViewMatrix", which stores the camera extrinsics, should I edit the Recorder project to provide that too? At the moment it returns only the parameters of the low-resolution RGB cameras and not the 1280x720 one.

@FracturedShader

@mauronano, unfortunately I ended up writing my own custom application. I haven't actually tried to use the Recorder project directly. As a result I don't know much about how it specifically works. You may just have to poke around and make modifications as you see fit to get the data you need.

@Huangying-Zhan

@mauronano, suppose you have 3D points in depth camera space; to get 3D points in color camera space, you need the relative pose between the depth camera and the color camera. The recorder app includes an example of getting the absolute pose for each sensor, which you can check here. From the absolute poses you can then compute the relative pose.
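As a small sketch of that last step (placeholder names, column-vector convention): if T_world_from_depth and T_world_from_color are the absolute camera-to-world poses logged for the two frames, the relative pose is just their composition:

```python
import numpy as np

# Absolute camera-to-world poses of the two sensors at their respective
# timestamps (placeholders; in practice read from the recorder output).
T_world_from_depth = np.eye(4)
T_world_from_color = np.eye(4)

# Relative pose taking points from depth camera space into color camera space.
T_color_from_depth = np.linalg.inv(T_world_from_color) @ T_world_from_depth
```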

@maurosyl
Author

I want to thank you both very much for your answers; I'm only just getting into computer vision and the material I find online is still pretty obscure to me. I have one last question: can I compute the alignment only once, offline, and then use it to map every depth frame onto my RGB frames, or is it something I need to do continuously?

@Huangying-Zhan

@mauronano, the depth frames and RGB frames are generally taken at different timestamps, so if you have many RGB-D pairs, the relative poses (between RGB and D) for these pairs are not constant and you can't use a single transformation for all the alignments. Therefore, you need to do this for each RGB-D pair. If the RGB-D pairs were always taken at the same timestamp, a single transformation would be enough for all the alignments; unfortunately, that is not the case here.
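In code terms, that just means recomputing the relative pose inside the loop over pairs rather than once up front; a sketch (hypothetical names, same convention as above):

```python
import numpy as np

def relative_poses(pairs):
    """pairs: iterable of (T_world_from_depth, T_world_from_color), one per RGB-D pair."""
    rels = []
    for T_wd, T_wc in pairs:
        # The head moves between the two capture times, so this differs per pair.
        rels.append(np.linalg.inv(T_wc) @ T_wd)
    return rels
```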

@pranaabdhawan

@mauronano, were you able to convert between the two coordinate systems? I tried the mapping by going from 2D depth -> world -> 2D RGB, using the matrices provided by the CSV file in the recorder app. I am following this to project into the RGB space: https://docs.microsoft.com/en-us/windows/mixed-reality/locatable-camera. I can see the point clouds for the two views, but they have some offset in each coordinate. Maybe the multiplication by 0.5 etc. as given in the shader code example (above link) should not be implemented exactly as written. Do you know the correct approach? Thanks!
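For what it's worth, the 0.5 scaling in that document is just the usual conversion from projection-space coordinates in [-1, 1] (after the divide by w) to pixel coordinates, including the vertical flip; a sketch:

```python
def projected_to_pixel(x_ndc, y_ndc, width=1280, height=720):
    """Convert projection-space coordinates in [-1, 1] to pixel coordinates,
    flipping y so that row 0 is the top of the image."""
    u = (x_ndc * 0.5 + 0.5) * width
    v = (1.0 - (y_ndc * 0.5 + 0.5)) * height
    return u, v
```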

@cyberj0g

For anyone who comes here from Google, see my implementation of such a mapping. It might not be the best code, but it is fast and accurate enough for my purpose.

@LisaVelten

Hi everyone,

I am working on the same problem. I cannot get my depth images and HD images aligned; the following picture visualizes the problem. I try to align a calibration pattern: first I filter the 3D depth points that belong to the calibration pattern, then I project these points into the HD image.

[Image: CalibrationPattern_ViewPointLeft]

To figure out the problem, I investigated the meaning of the transformation matrices in some detail. In the following I outline my understanding of the transformation matrices and then describe how I try to align my images. I would appreciate your help very much!

1. CameraCoordinateSystem (MFSampleExtension_Spatial_CameraCoordinateSystem)
In the HoloLensForCV example this coordinate system is used to obtain the "FrameToOrigin" transformation. FrameToOrigin is obtained by transforming the CameraCoordinateSystem to the OriginFrameOfReference (lines 140-142 in MediaFrameReaderContext.cpp).

I still do not know exactly what this transformation describes. What is meant by "frame"?

Through experimenting I found out that the translation vector changes when I move. In fact, the changes make sense: if I move forward, the z-component becomes smaller. This agrees with the coordinate system in the image below; the z-axis points in the opposite direction of the image plane.

[Image: coordinatesystems]

The same applies to moving left or right: moving right makes the x component increase. The y component stays roughly constant, which makes sense as I am not moving up or down.
What I am really uncertain about is the rotational part of the transformation matrix. The rotational part is almost an identity matrix. The rotation of my head seems to be contained in the CameraViewTransform, which I describe in the second point.

As far as I understand, the FrameToOrigin Matrix looks as follows:
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
x, y, z, 1]

For me the "FrameToOrigin" seems to describe the relation between a fixed point on the HoloLens to the Origin (The Origin is defined each time the app is started, this helps to map each frame to a frame of reference). In the Image above the Origin is probably the "App-specific Coordinate System".

2. CameraViewTransform (MFSampleExtension_Spatial_CameraViewTransform)
The CameraViewTransform is saved directly with each frame (in contrast to FrameToOrigin, no additional transformation is necessary).

The rotation of the head seems to be saved within the rotational part of this matrix. I tested this by rotating my head around the y-axis. If I turn about 180° to the right around my y-axis, the rotational part looks as follows:
[0, 0, 1,
0, 1, 0,
-1, 0, 0]
This corresponds to a rotation around the y-axis, which is what we expect.

The translational part seems to stay roughly constant. This would make sense if it described the translation between the fixed point on the HoloLens and the respective camera (HD or depth). However, I would expect the translational part to stay exactly equal, and it does not; it is only approximately equal.

If I do not turn my head (rotational part is an Identity Matrix) the CameraViewTransform looks as follows:

CameraViewTransform for HD Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00631712, -0.184793, 0.145006, 1]

CameraViewTransform for Depth Camera
[1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0.00798517, -0.184793, 0.0537722, 1]

So the CameraViewTransform seems to capture the rotation of the user's head. What is captured by the translational part? If it is the distance between a fixed point on the HoloLens and the respective camera, why is it not always exactly equal?
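If it helps to sanity-check those numbers, here is a small sketch (my own interpretation, not an authoritative answer) that composes the two view transforms printed above; since both rotations are identity, the relative depth-to-HD transform reduces to the difference of the two translation vectors, i.e. a fixed offset of roughly 9 cm, mostly along z:

```python
import numpy as np

# CameraViewTransform values from the post (row-vector layout), transposed
# into the column-vector convention used below.
view_hd = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0.00631712, -0.184793, 0.145006, 1],
]).T
view_depth = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0.00798517, -0.184793, 0.0537722, 1],
]).T

# Transform taking points from depth view space into HD view space.
T_hd_from_depth = view_hd @ np.linalg.inv(view_depth)
print(T_hd_from_depth[:3, 3])   # ~[-0.0017, 0.0, 0.0912]
```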

3. CameraProjectionTransform (MFSampleExtension_Spatial_CameraProjectionTransform)
This transformation is described on the following github page:
https://github.com/MicrosoftDocs/mixed-reality/blob/5b32451f0fff3dc20048db49277752643118b347/mixed-reality-docs/locatable-camera.md

However, what is still unclear to me is the meaning of the terms A and B.

My aim is to map between the depth camera and the HD camera of the HoloLens. To do this I do the following:

  1. I record images with the Recorder Tool of the HoloLensForCV sample.
  2. I take a depth image and look for the corresponding HD image by checking the timestamps.
  3. I use the unprojection mapping to find the 3D points in the CameraView space of the depth camera.
  4. I transform the 3D points from the depth camera view to the HD camera view and project them onto the image plane. I use the following transformations (a NumPy sketch of this chain follows below):
    Pixel Coordinates = [3D depth point, 1] * inv(CameraViewTransform_Depth) * FrameToOrigin_Depth * inv(FrameToOrigin_HD) * CameraViewTransform_HD * CameraProjectionTransform_HD
These pixel coordinates are in the range -1 to 1 and need to be adjusted to the image size of 1280x720. This is done as follows:
 x_rgb = 1280 * (PixelCoordinates.x + 1) / 2;
 y_rgb = 720 * (1 - ((PixelCoordinates.y + 1) / 2));
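Here is a NumPy sketch of step 4 and the pixel conversion above, keeping the row-vector layout as printed by the recorder (the divide by the w component is my assumption of how the projected coordinates end up in [-1, 1]):

```python
import numpy as np

def depth_point_to_hd_pixel(p_depth_cam,
                            view_depth, frame_to_origin_depth,
                            view_hd, frame_to_origin_hd,
                            projection_hd,
                            width=1280, height=720):
    """p_depth_cam: 3D point in the depth CameraView space (from the unprojection).
    All matrices are 4x4 in the row-vector layout written by the recorder."""
    p = np.array([*p_depth_cam, 1.0])

    # Depth camera view -> depth frame -> origin -> HD frame -> HD camera view -> projection.
    clip = (p @ np.linalg.inv(view_depth) @ frame_to_origin_depth
              @ np.linalg.inv(frame_to_origin_hd) @ view_hd @ projection_hd)

    # Perspective divide to get coordinates in roughly [-1, 1].
    x_ndc, y_ndc = clip[0] / clip[3], clip[1] / clip[3]

    # Scale to the 1280x720 HD image, flipping y so that row 0 is at the top.
    u = width * (x_ndc + 1.0) / 2.0
    v = height * (1.0 - (y_ndc + 1.0) / 2.0)
    return u, v
```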

Result: when I transform my detections from the depth camera to the HD camera image, the objects (in this case the calibration pattern) are not 100% aligned. So I am trying to figure out where the misalignment comes from. Am I misunderstanding the transformation matrices, or has anyone experienced similar problems?

The problems might occur if the spatial mapping of the HoloLens is not working 100% correctly, which can happen if the HoloLens cannot find enough features to map the room. Thus, I tested my setup in different rooms, especially smaller rooms with more clutter in the background (so that the HoloLens can find more features). However, the problem still occurs. As outlined above, the rough appearance of the transformations seems to be correct, and I have no idea how to test the transformation matrices further to pin down the problem.

I would appreciate your help very much! Thanks a lot in advance!
Lisa

@cxnvcarol

Hello. I haven't tried it myself, but it looks like you're very close to the solution. Could the error come from which matrix you are using for the reprojection? (Each camera has its own CameraViewTransform and FrameToOriginTransform, even if they're very close to each other.)

Also, if it helps, I personally have found it useful to check the ARUco sample in the HoloLensForCV project in order to understand how to use these matrices. In the attachment I've extracted the important lines. In this case they already have the correspondence and they triangulate the 3D point back to world coordinates. Let us know if you find the solution; I'd appreciate it.
2DPairTo3DPipelineHololens_arucoSample.pdf

@LisaVelten

I thought it might be better to open a new case (#119), as this one is already closed. I found a solution for the correct alignment, which I noted down in the comments there. The exact content of the transformation matrices is still unclear to me, so case #119 is still open.
