depth_from_video_in_the_wild: not able to reproduce the result #46
@gariel-google Dear author, thanks for sharing the source code of the paper.
I was trying to reproduce the result of the paper using your code. However, with your default settings (batch_size=4, learning_rate=0.0002, etc.) and training from scratch, the result I got is quite far from what you stated in the paper (Abs Rel of 0.147 for the best checkpoint, at around the 370k-th step, vs. 0.128 in the paper). For your information, I am using the evaluation code from sfmlearner <https://github.com/tinghuiz/SfMLearner/tree/master/kitti_eval>, as struct2depth does.
Therefore, may I know what settings were used to obtain the paper's results? Or is there any critical part missing in the currently released code (maybe a pretrained checkpoint, for example)?
Thank you in advance.

Comments
Hi,
I am assuming you're training on KITTI? Did you create a "possibly mobile" mask for each image? Did you use a segmentation network to do that? Which one?
Hi, sorry for the missing information in my previous comment. Yes, I am using KITTI, eigen split, using the data generation code from vid2depth <https://github.com/tensorflow/models/tree/master/research/vid2depth/dataset>, as struct2depth does. Yes, I have created a "possibly mobile" mask for each image. I am using the same masks as struct2depth (each object has a different object ID and the objects are tracked across the three consecutive frames). I am using Mask-RCNN to obtain the masks. Also, I have turned on boxify=True, so the masks become bounding boxes, as I understand. For your information, I also attached the TensorBoard image of the variable seg_stack from line 172 of model.py <https://github.com/google-research/google-research/blob/master/depth_from_video_in_the_wild/model.py>:
[image: Screenshot 2019-08-18 at 9 24 02 AM] <https://user-images.githubusercontent.com/18667188/63219007-1355ea00-c19b-11e9-88ef-e1da17bf8943.png>
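(As I understand the boxify flag, a minimal numpy sketch of its effect; the function here is illustrative, not the repo's implementation:)

```python
import numpy as np

def boxify_mask(mask):
    """Replace each object's segment with its tight bounding box.

    mask: int array (H, W); 0 is background, positive values are object IDs.
    Overlapping boxes simply overwrite each other in this sketch.
    """
    out = np.zeros_like(mask)
    for obj_id in np.unique(mask):
        if obj_id == 0:
            continue
        ys, xs = np.where(mask == obj_id)
        out[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = obj_id
    return out
```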
Thanks for the information, that helps a lot. On KITTI we trained with a batch size of 16 and a learning rate of 1e-4. All other parameters are at their default values. In addition, we initialized from an ImageNet checkpoint for ResNet 18. The current release does not yet support initialization from a checkpoint, but it should be easy to set up. Fig. 5 in the paper indicates when you should expect convergence. Lastly, are you using https://github.com/google-research/google-research/blob/master/depth_from_video_in_the_wild/model.py#L388 to infer depth?
We are planning to release pretrained checkpoints, more code and more documentation before ICCV. We will do our best to do it sooner rather than later.
Thanks for the reply. Yes, for depth inference, I am using the link <https://github.com/google-research/google-research/blob/master/depth_from_video_in_the_wild/model.py#L388> you mentioned in your previous comment. In Figure 5 of the paper, for *Evaluated on KITTI*, the training converges at around 1 million training images. So if we assume that batch_size=4, learning_rate=0.0002 has similar convergence, and given that my training reached Abs_Rel=0.147 for the best checkpoint at around the 370k-th step (370k steps x batch size 4 = 1.48 million images), can I conclude that the pretrained ImageNet checkpoint has a huge impact (0.147 vs 0.128) on the result? Looking forward to your release and thanks for the efforts.
I feel that it's too early to draw a conclusion; we need to investigate this more. If you're ready to do it together, that would be great. We are planning to release a pretrained KITTI checkpoint in the next few days, and a first step could be to establish that we agree on how the eval metrics are calculated. We can both run evals on the same checkpoint and exchange results. Based on that, we'll see how to proceed. How does that sound?
Hi, it sounds great to me. Let's do it together.
In the paper, it is said that some videos were selected from 3079 YouTube8M videos labeled 'Quadcopter'; will their IDs be made public soon? I also realize that it takes much time to process so many videos into three-frame sequences and to generate the masks and alignment, so will the pretrained YouTube8M checkpoint also be released soon? And I noticed that the current release does not yet support initialization from the ResNet-18 checkpoint pretrained on ImageNet; I'm trying to write code to implement that, since struct2depth has a similar code organization...
Yes, we will release the IDs soon, and also pretrained checkpoints.
Regarding the initialization mechanism, we're trying to release it soon,
but it might take a bit longer. We will update as soon as possible.
Hi @gariel-google, I am also evaluating the egomotion prediction, using inference_egomotion <https://github.com/google-research/google-research/blob/master/depth_from_video_in_the_wild/model.py#L423> to obtain the egomotion and sfmlearner <https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_pose.py> to compute the 5-point/3-point ATE. Since I am evaluating my own trained model (the one with Abs_Rel=0.147, trained on the eigen split training set), and the eigen split training set has frames overlapping with odometry sequences 09 and 10, the ATE should be reasonably good. However, the result I got is quite bad compared to what you stated in the paper. (I apologize if I am evaluating the egomotion prediction wrongly.)

|         | Seq. 09 | Seq. 10 |
|---------|:-------:|:-------:|
| 5-point | 0.0296  | 0.0245  |
| 3-point | 0.0212  | 0.0180  |

For your information, I also attached the plotted trajectories.
[image: seq09] <https://user-images.githubusercontent.com/18667188/63317302-e0e7f080-c344-11e9-9916-90d3327fd694.png>
[image: seq10] <https://user-images.githubusercontent.com/18667188/63317304-e3e2e100-c344-11e9-9780-5357bfd24a6b.png>
Maybe we could exchange the odometry evaluation results as well?
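For reference, a minimal numpy sketch of the snippet ATE as I understand it from the linked kitti_eval code (origin alignment plus a least-squares scale, since the predictions are scale-less):

```python
import numpy as np

def compute_ate(gt_xyz, pred_xyz):
    # Align both snippets at their first frame.
    gt = gt_xyz - gt_xyz[0]
    pred = pred_xyz - pred_xyz[0]
    # Least-squares scale factor for the scale-less predictions.
    scale = np.sum(gt * pred) / np.sum(pred ** 2)
    alignment_error = pred * scale - gt
    return np.sqrt(np.sum(alignment_error ** 2)) / len(gt)
```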
We will release the KITTI-trained checkpoints; that should be enough for comparing odometry evals, correct?
It will take at least a few days though - please bear with me ;-)
Yes, exchanging results should be enough for comparing odometry evals. Thanks again for the efforts.
We just released some checkpoints (links in the README file) with the respective metrics. Note that there is a slight change in the code (in depth_prediction_net). Would you be ready to try them and see what metrics you obtain? YouTube8M IDs coming soon.
Thanks for the release. Yes, I am ready to try! However, I noticed that only the data files are released, and as I understand it, to restore a model in TensorFlow we need 3 files (correct me if I am wrong) -- index, data, and meta. Therefore, could you release the complete checkpoints?
Sorry about that. I replaced the links; they now link to zip files that contain all the checkpoint components. Could you check them out? Thanks!
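For reference, a TF1 checkpoint is addressed by its prefix rather than by any single file; a minimal restore sketch (paths illustrative):

```python
import tensorflow.compat.v1 as tf

# After unzipping, the folder should contain three files sharing one prefix:
#   model-1000977.index, model-1000977.meta, model-1000977.data-00000-of-00001
ckpt_prefix = 'checkpoints/model-1000977'  # illustrative path

saver = tf.train.import_meta_graph(ckpt_prefix + '.meta')
with tf.Session() as sess:
    saver.restore(sess, ckpt_prefix)  # pass the prefix, not a single file name
```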
YouTube8M IDs are out (see the README file).
Hi @gariel-google, thanks for the work. The new zip files work for me. I have tested the checkpoint trained on KITTI. The following is what I have:

The depth result differs when running inference with different batch_size:

|               | abs_rel | sq_rel | rms    | log_rms | a1     | a2     | a3     |
|---------------|:-------:|:------:|:------:|:-------:|:------:|:------:|:------:|
| batch_size=1  | 0.1262  | 0.9462 | 5.2214 | 0.2086  | 0.8470 | 0.9475 | 0.9774 |
| batch_size=16 | 0.1305  | 1.0186 | 5.3237 | 0.2136  | 0.8389 | 0.9430 | 0.9751 |

With batch_size=1 we get the same result, so we should be computing the same evaluation metrics. However, the depth output is not consistent when batch_size changes; is it the same case for you? Where does the variation come from?

Odometry result (ATE) when running inference with batch_size=1:

|         | seq_09 | seq_10 |
|---------|:------:|:------:|
| 5-point | 0.0231 | 0.0195 |
| 3-point | 0.0170 | 0.0149 |

The plotted trajectories:
seq_09
[image: dfvauthorseq09] <https://user-images.githubusercontent.com/18667188/63561747-a02ee800-c58d-11e9-8462-285ec61e84e7.png>
seq_10
[image: dfvauthorseq10] <https://user-images.githubusercontent.com/18667188/63561754-aa50e680-c58d-11e9-9f53-762a795cba98.png>
The odometry result looks quite bad to me. Do you have the same result? Since the eigen split training set overlaps with odometry sequences 09 and 10, shouldn't the ATE be better than what you stated in the paper (0.0231 vs 0.012 and 0.0195 vs 0.010)?
Thanks much @liyingliu for testing the checkpoints so quickly. I am happy that we are getting the exact same result on depth prediction.
Regarding the batch size: we tested at 1. If batch normalization is replaced everywhere by randomized layer normalization, the inference results do not depend on the batch size, as they should not. Due to an oversight, when we were obtaining the results for the paper, we left a few batch-normalization layers in place. We have since fixed that, but to remain compatible with the checkpoints used for the paper, we needed to leave the batch norms there, hence the dependence on the batch size.
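To illustrate with a toy numpy sketch (not the model's actual code): batch normalization in training mode normalizes each channel by statistics of the whole batch, so a given image's output depends on its batch-mates:

```python
import numpy as np

def batch_norm_train_mode(x, eps=1e-3):
    # Normalize over batch and spatial axes, per channel, as conv batch norm does.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.RandomState(0).randn(16, 4, 4, 8)  # a batch of 16 toy feature maps
out_in_batch_of_16 = batch_norm_train_mode(x)[0]
out_in_batch_of_1 = batch_norm_train_mode(x[:1])[0]
print(np.abs(out_in_batch_of_16 - out_in_batch_of_1).max())  # clearly nonzero
```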
Regarding odometry, there may be a few explanations for that:
1. We used a more mature checkpoint for odometry, which seemed to converge more slowly than depth prediction.
2. We used inference-time correction for the intrinsics (even though the result you're showing seems worse than even our uncorrected one).
3. There might be a difference in the way we stack the rotations and translations together to obtain a trajectory - this is a bit tricky (see the sketch below).
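To make point 3 concrete, a sketch of one common stacking convention (the direction of each relative transform and the multiplication order are exactly the details that can differ between implementations):

```python
import numpy as np

def accumulate_trajectory(rotations, translations):
    """Compose per-pair relative motions into global (x, y, z) positions.

    rotations: sequence of 3x3 relative rotation matrices.
    translations: sequence of 3-vectors, expressed in the source frame.
    """
    pose = np.eye(4)
    positions = [pose[:3, 3].copy()]
    for rot, trans in zip(rotations, translations):
        rel = np.eye(4)
        rel[:3, :3] = rot
        rel[:3, 3] = trans
        pose = pose @ rel  # right-multiply: apply each motion in the current frame
        positions.append(pose[:3, 3].copy())
    return np.array(positions)
```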
We can debug odometry together: we can release the checkpoints used for odometry and the respective inferred trajectories.
At this point I would like to ask what you would like to prioritize - getting the code and checkpoint for ImageNet initialization, so that you can reproduce the training, or getting the odometry evaluation right? Please let me know which one you prefer and I'll start there. It will take at least a few days, in either case.
Thank you again for your help in debugging this.
Hi @gariel-google, sorry for the late reply and thanks for your explanations.
Hi @gariel-google, thanks for your checkpoint. But when I test intrinsics inference with your KITTI checkpoint, the intrinsic matrix is not right.
The input is one pair of KITTI images like this:
[image: kitti2] <https://user-images.githubusercontent.com/6870525/63674721-b81a9c00-c819-11e9-90b4-17f32a181d82.png>
We use the top two images to infer the intrinsic matrix:

[[119.80293    0.       702.5139  ]
 [  0.        74.126114 -29.449604]
 [  0.         0.         1.      ]]

But the ground-truth intrinsic matrix is:

[[241.67446312   0.         204.16801031]
 [  0.         246.28486827  59.000832  ]
 [  0.           0.           1.        ]]

Why is the intrinsic matrix not right?
@liyingliu Cool, so I am aiming to release the code for initializing from an ImageNet checkpoint, and the respective checkpoint, this week.
@buaafish Let's try to debug it together; let me start by asking you some questions:
1. Do we have enough evidence to rule out an incorrect normalization of the images (0-1 vs 0-255), or some other error in running the inference? For example, were you able to reproduce the depth metrics? Were you able to obtain reasonable trajectories on the KITTI odometry set?
2. How did you calculate the intrinsics at inference time? Did you run the inference twice, swapping the order of the images, and take the average, like here <https://github.com/google-research/google-research/blob/master/depth_from_video_in_the_wild/model.py#L258>? (A sketch follows below.)
3. As we write in the paper, there are two settings in which we learn the intrinsics: one with the constraint that the intrinsics are the same throughout the dataset (as in EuRoC), and the other where we predict the intrinsics independently from each pair of images. In the second case, which is the one you are referring to, Eq. 3 and Fig. 9 show that the intrinsics are only correct up to the accuracy imposed by the rotations, and when there are no rotations, the error can be large. Have you tried to create a plot similar to Fig. 9, or is the example you're showing the only one you ran? Does that example have rotations?
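For question 2, a minimal sketch of the symmetrized estimate (assuming, as in the test code further down, that inference_egomotion returns the predicted intrinsics as its third output):

```python
def estimate_intrinsics(test_model, img1, img2, sess):
    # Run the egomotion/intrinsics inference in both frame orders and average;
    # assumes inference_egomotion returns the predicted intrinsics at index 2.
    ret_fwd = test_model.inference_egomotion(img1, img2, sess)
    ret_bwd = test_model.inference_egomotion(img2, img1, sess)
    return 0.5 * (ret_fwd[2] + ret_bwd[2])
```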
My inference code is like this (the full test code is in my next comment): I modified your code, and then I read RGB images to feed to image1 and image2. The intrinsic matrix is not right either.
@gariel-google Test code like this:

```python
# (Imports and FLAGS definitions as in depth_from_video_in_the_wild's train.py.)

def main(_):
  seed = FLAGS.seed
  tf.set_random_seed(seed)
  np.random.seed(seed)
  random.seed(seed)
  if not gfile.Exists(FLAGS.checkpoint_dir):
    gfile.MakeDirs(FLAGS.checkpoint_dir)
  test_model = model.Model(
      boxify=FLAGS.boxify,
      data_dir=FLAGS.data_dir,
      file_extension=FLAGS.file_extension,
      is_training=False,
      foreground_dilation=FLAGS.foreground_dilation,
      learn_intrinsics=FLAGS.learn_intrinsics,
      learning_rate=FLAGS.learning_rate,
      reconstr_weight=FLAGS.reconstr_weight,
      smooth_weight=FLAGS.smooth_weight,
      ssim_weight=FLAGS.ssim_weight,
      translation_consistency_weight=FLAGS.translation_consistency_weight,
      rotation_consistency_weight=FLAGS.rotation_consistency_weight,
      batch_size=FLAGS.batch_size,
      img_height=FLAGS.img_height,
      img_width=FLAGS.img_width,
      weight_reg=FLAGS.weight_reg,
      depth_consistency_loss_weight=FLAGS.depth_consistency_loss_weight,
      queue_size=FLAGS.queue_size,
      input_file=FLAGS.input_file)
  _test(test_model, FLAGS.checkpoint_dir)


def readImages(path, subdir, name):
  # Each input file holds two 416-pixel-wide frames side by side.
  filename = name + ".png"
  filepath = os.path.join(path, subdir, filename)
  im = Image.open(filepath)
  im_array = np.array(im)
  img1 = im_array[:, 0:416, :]
  img2 = im_array[:, 416:832, :]
  return img1[np.newaxis, :, :, :], img2[np.newaxis, :, :, :]


def readMat(path, subdir, name):
  # Reads the 3x3 ground-truth intrinsic matrix from the *_cam.txt file.
  filename = name + "_cam.txt"
  filepath = os.path.join(path, subdir, filename)
  data_temp = []
  with open(filepath) as fdata:
    line = fdata.readline()
    data_temp.append([float(i) for i in line.split(',')])
  return np.array(data_temp).reshape((3, 3))


def readFileList(list_data_dir):
  with gfile.Open(list_data_dir) as f:
    frames = f.readlines()
  frames = [k.rstrip() for k in frames]
  subfolders = [x.split(' ')[0] for x in frames]
  frame_ids = [x.split(' ')[1] for x in frames]
  return subfolders, frame_ids


def _test(test_model, checkpoint_dir):
  checkpointpath = "./pretrained/cityscapes_kitti_learned_intrinsics/"
  saver = tf.train.import_meta_graph(checkpointpath + 'model-1000977.meta')
  checkpoint = checkpointpath + "model-1000977"
  with tf.device('/cpu:0'):
    with tf.Session() as sess:
      sess.run(tf.local_variables_initializer())
      sess.run(tf.global_variables_initializer())
      logging.info('Loading checkpoint...')
      saver.restore(sess, checkpoint)
      print(reader.IMAGENET_MEAN)
      print(reader.IMAGENET_SD)
      logging.info('Reading data...')
      path = "./kitti/format_data"
      list_data_dir = "test.txt"
      subfolders, frame_ids = readFileList(list_data_dir)
      for (subdir, name) in zip(subfolders, frame_ids):
        img1, img2 = readImages(path, subdir, name)
        logging.info('Start testing...')
        ret = test_model.inference_egomotion(img1, img2, sess)
        print(ret[2])  # the predicted intrinsic matrix
        mat = readMat(path, subdir, name)
        print(mat)
      logging.info('End testing...')


if __name__ == '__main__':
  app.run(main)
```

@liyingliu We just added the code for initialization from ImageNet, as well as some corrections to the hyperparameters for training. Unfortunately I was unable to obtain clearance to release the specific ImageNet checkpoint itself yet - sorry about that, things sometimes get more bureaucratic than expected.
@buaafish Thanks for sharing your code; it's not easy for me, though, to spot a bug if there is one. Is there a chance you have an answer for me on whether you were able to reproduce the depth inference metrics and/or whether the trajectories look reasonable? The intrinsic matrix is so far off that I still suspect there is some sort of crude error somewhere.
My next steps are to release the checkpoints we used for calculating odometry, with learned and given intrinsics, as well as the respective odometry trajectories. Then I can try to add a small piece of code for generating Fig. 9 in the paper for the intrinsics, which should hopefully resolve the intrinsics issue.
Thank you all for your help debugging this; our goal is that everyone will be able to reproduce our results.
@gariel-google Understood, and thanks! Looking forward to exchanging the odometry results.
@liyingliu we just released the odometry results, and code for generating trajectories from checkpoints.
@buaafish intrinsics is coming next.
@gariel-google did you get clearance to release the specific ImageNet checkpoint? I want to try with that.
Not learning.
@cognitiveRobot If your question is how to restore and train from the checkpoint that the author provided, then you could try to add a file named "checkpoint" in your checkpoint folder (the folder containing the .index, .meta and .data-xxxx files). The content of the "checkpoint" file can be the following:
model_checkpoint_path: "path_to_kitti_learned_intrinsics/model-248900"
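With that file in place, TensorFlow can resolve the checkpoint prefix automatically; a quick sketch:

```python
import tensorflow.compat.v1 as tf

# Reads the 'checkpoint' file and returns the newest prefix, e.g.
# 'path_to_kitti_learned_intrinsics/model-248900', which can then be
# passed to tf.train.Saver().restore(sess, ckpt).
ckpt = tf.train.latest_checkpoint('path_to_kitti_learned_intrinsics')
print(ckpt)
```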
@cognitiveRobot the pretrained ResNet-18 checkpoint that we used was taken from here. We cannot release it here because it was taken from somewhere else. Sorry about that...
Sorry, here's the link to where we took the pretrained checkpoint from:
https://pytorch.org/docs/stable/torchvision/models.html
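That is, the weights come from torchvision's model zoo; a sketch of fetching them (converting the PyTorch state dict into TF variables requires a name-mapping step that is not shown here):

```python
import torch
import torchvision

# Downloads the ImageNet-pretrained ResNet-18 weights.
resnet18 = torchvision.models.resnet18(pretrained=True)
torch.save(resnet18.state_dict(), 'resnet18_imagenet.pth')
```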
@gariel-google, thanks. I will test. :)
@liyingliu Sorry to bother you, but I have some questions I'd like to ask you. I tried to infer the depth map using the existing checkpoint cityscape_kitti, but the depth value I read directly from the '.npy' file was far from the real depth, whether for my own images or for images from cityscapes. Did I do something wrong? Or are further operations required to obtain true depth values? Thank you very much.
I used 'inference.py' from https://github.com/tensorflow/models/blob/master/research/struct2depth/inference.py; img_width and img_height are the defaults (416, 128).
Hi, there is an unknown scale factor between the depth predicted by the network and the real depth. You need to multiply the predicted depth by this scale factor to obtain the true depth. You can use the median of the ground truth divided by the median of your prediction as the scale.
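A minimal sketch of that median scaling:

```python
import numpy as np

def scale_to_gt(pred_depth, gt_depth):
    # The network's depth is defined only up to scale; matching medians
    # recovers metric depth (the standard convention in this line of work).
    return pred_depth * (np.median(gt_depth) / np.median(pred_depth))
```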
The predicted depth is up to an unknown scale factor. When evaluating, this factor is found by matching the medians of the predicted and ground-truth depth (this is standard in this line of publications). Are you observing strong discrepancies even beyond that global factor?
@liyingliu @gariel-google
@gariel-google
@StephenStorm by "observing strong discrepancies even beyond that global factor" I mean: if you multiply the predicted depth by a factor such that its median matches the median ground-truth depth, do you still see significant discrepancies? @player1321 In the KITTI format, the first 3 columns are the (x, y, z) position of the car, if I'm not mistaken. This code generates the inferred (x, y, z)-s of the trajectory.
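A hedged sketch of extracting the positions: note that in the standard KITTI odometry ground-truth files, each line is a flattened 3x4 [R | t] matrix, so the translation sits at elements 3, 7 and 11, while simpler trajectory dumps may store x, y, z directly as the first three columns:

```python
import numpy as np

def load_kitti_gt_positions(pose_file):
    # Each line: 12 floats, a row-major 3x4 [R | t]; t is the fourth column.
    poses = np.loadtxt(pose_file).reshape(-1, 3, 4)
    return poses[:, :, 3]  # (N, 3) array of x, y, z
```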
@gariel-google Thanks a lot for your patient guidance.
@player1321 I looked up the checkpoint we used for KITTI odometry (with given intrinsics), and its depth prediction metric is 0.1321, which is indeed worse than the KITTI-only depth error that we report in the paper for given intrinsics (0.129). Is that your concern? We did observe that odometry results tend to improve the longer we train, whereas depth results tend to become slightly worse and noisier beyond some point. We did not try to evaluate the cityscapes + KITTI checkpoints for odometry, and I don't know how they would perform. Would you like to share your numbers on both evaluations?
@gariel-google Thanks for your reply. And here are the numbers:
It seems that the definition of the ATE is not the same as yours. Would you share some evaluation tools? Or could you recommend any reference for the definitions?
@gariel-google Thanks for sharing the code and helping us reproduce the results. I'm able to reproduce figures similar to the paper's using the odometry checkpoints, but the scale seems to be wrong. Is the egomotion network supposed to output positions at real-world scale immediately, or is it assumed that we perform a scaling as postprocessing? If the latter, which type of scaling is used in the paper? EDIT: From the looks of it, I think the scale-7dof scaling technique is used (see https://github.com/Huangying-Zhan/kitti-odom-eval).
Also, I realized that the given-intrinsics weights link for KITTI odometry is wrong: it references the cityscapes model from the depth table right above it: https://www.googleapis.com/download/storage/v1/b/gresearch/o/depth_from_video_in_the_wild%2Fcheckpoints%2Fcityscapes_learned_intrinsics.zip?generation=1566493765410932&alt=media
@player1321 The definition of ATE we used follows Zhou et al. Our ATE eval is based on theirs, which is given here: https://github.com/tinghuiz/SfMLearner/tree/master/kitti_eval. The numbers are typically in the 10^-2 range. Yours are in meters and are large-ish, so indeed it's probably not the same definition. Regarding the translation error, we didn't check it for the cityscapes + KITTI checkpoint, and while your numbers are different, it seems that they are not far from ours, assuming you tested checkpoints with learned intrinsics (right?). @frobinet The odometry predictions are scale-less, just like the depth predictions. We normalized the entire trajectory by its length. That is, we scaled the predicted trajectory uniformly until its total length was identical to the GT length.
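A minimal sketch of that length normalization (distinct from the per-snippet least-squares scale used inside the ATE eval):

```python
import numpy as np

def normalize_trajectory_length(pred_xyz, gt_xyz):
    # Uniformly scale the predicted trajectory so its total path length
    # matches the ground truth's.
    pred_length = np.linalg.norm(np.diff(pred_xyz, axis=0), axis=1).sum()
    gt_length = np.linalg.norm(np.diff(gt_xyz, axis=0), axis=1).sum()
    return pred_xyz * (gt_length / pred_length)
```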
@frobinet I'll have a look at the model links and get back to you, thanks for pointing this out.
@gariel-google Thanks for helping with this! Any news about releasing the weights for the given-intrinsics odometry model?
@gariel-google Thanks for your guidance! It's very helpful!
@gariel-google Thanks for sharing the code and helping us. Can you provide trajectories with poses and/or code to reproduce the same values?
Sorry for the delayed response. @NHirose we released the trajectories here - please see the table below the title, under the "trajectory" links. @player1321 We didn't quantitatively evaluate the prediction of residual motion. Qualitatively it looks good in most cases - I know it sounds hand-wavy, but unfortunately there is no number I can quote to support this quantitatively.
@gariel-google Thank you for your reply. However, your released trajectory file only includes the XYZ positions; I additionally need the roll, pitch and yaw angles to reproduce the values in your paper. Alternatively, could you provide the evaluation file used to produce the egomotion values in your paper? That would help me find the differences!
Hi, I'm getting an error when loading the EuRoC MAV checkpoint [depth_from_video_in_the_wild_euroc_ckpt_MachineHallAll] for training:

Key MotionFieldNet/compute_loss/MotionFieldNet_2/Conv1/Relu/MotionBottleneck/weights not found in checkpoint
[[node save/RestoreV2 (defined at /depth_from_video_in_the_wild/model.py:117) ]]

When using the same code with a checkpoint saved after training from scratch, there are no errors. @gariel-google
Did you run the code as is or with modifications? I am asking because a few
months ago, when I uploaded the checkpoint, I did verify that it loads, so
I am trying to track down the reason for the change in behavior.
No modifications. The cityscapes and KITTI snapshots load normally.
Thank you
@gariel-google @adizhol I am facing the same issue: the cityscapes and KITTI checkpoints work well with model.py, but the EuRoC checkpoint does not.
Are there any updates on this? Thanks
Thanks for pointing this out. This seems to be a bug on our side, then. I will look into it, but it may take some time till I can get to it and debug. Sorry about that :-)
After training on custom data, I'm getting different depth when training and when doing inference (on the same images).
Update: the error/warning is gone, but the problem still exists.
Update: also, during inference you're inferring on a flipped image and taking the minimum with the non-flipped image.
Hey there, I'm still facing this:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error: Key MotionFieldNet/CameraIntrinsics/foci/biases not found in checkpoint

Currently, I'm using the latest code version.
@VladimirYugay @adizhol @mathmax12 Run tf.reset_default_graph() before restoring the checkpoint to cope with the error above.
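In context (a sketch; the checkpoint path is illustrative):

```python
import tensorflow.compat.v1 as tf

tf.reset_default_graph()  # drop any graph built earlier in the process
saver = tf.train.import_meta_graph('euroc_ckpt/model.meta')  # illustrative path
with tf.Session() as sess:
    saver.restore(sess, 'euroc_ckpt/model')
```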
I have downloaded the checkpoints provided by the author, extracted them and put them in the folder, and, as you said, also added a "checkpoint" file. In run.sh I wrote as follows: Is my path wrong? Can you help me? Thank you.