Hi @jerrybai1995,

I am coming here after seeing your oral at NeurIPS and talking with you at the poster session.
I was looking at your model architecture for segmentation and noticed that you downsample the full-resolution image 4x before feeding it into the implicit layer.
Would it be too time-consuming to train on the full-resolution image? Did you try it anyway?
Thanks for stopping by our poster! These two downsamplings are just initial processing; for smaller images we don't downsample at all. For example, in the 32x32 CIFAR experiments we passed the images in at their original resolution, without any such downsampling: https://github.com/locuslab/mdeq/blob/master/experiments/cifar/cls_mdeq_LARGE.yaml#L17
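For intuition, the input stem amounts to something like the minimal PyTorch sketch below: two stride-2 convolutions, each halving the spatial resolution for a 4x reduction in total. The channel counts here are placeholders rather than the repo's actual values (see the linked config for those):

```python
import torch
import torch.nn as nn

# Minimal sketch of the input stem: two stride-2 convolutions,
# each halving the spatial resolution (4x total downsampling).
# Channel counts are illustrative placeholders, not the repo's values.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 1024, 2048)  # a Cityscapes-sized input
print(stem(x).shape)               # torch.Size([1, 64, 256, 512])
```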
For extremely large images (e.g., 2000x1000), we do perform these two strided convolutions for efficiency (as do almost all papers that use Cityscapes segmentation :P). To answer your question: I did try training at full resolution, and it gives slightly better results, but the model becomes incredibly slow because:
1. we need to process larger feature maps, so each f(x; z) evaluation is a lot more expensive; and
2. it takes more iterations to converge to the fixed point, because the fixed point is of much higher dimensionality.
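To make the cost argument concrete, here is a hypothetical sketch of a plain forward iteration (MDEQ itself uses a Broyden-style solver rather than naive iteration, but the scaling argument is the same): each step costs one full evaluation of f, which grows with the number of pixels, and a higher-dimensional fixed point typically needs more steps to reach the same tolerance.

```python
import torch

def naive_fixed_point(f, x, z0, max_iter=50, tol=1e-4):
    """Iterate z_{k+1} = f(x, z_k) until the relative residual is small.

    Each step costs one full evaluation of f (roughly proportional to
    the number of pixels), so larger inputs hurt twice: per-step cost
    goes up, and the number of steps needed typically goes up as well.
    """
    z = z0
    for k in range(max_iter):
        z_next = f(x, z)
        if (z_next - z).norm() / (z.norm() + 1e-8) < tol:
            return z_next, k + 1
        z = z_next
    return z, max_iter

# Toy usage with a contractive map (fixed point is 2 * x):
f = lambda x, z: 0.5 * z + x
x = torch.randn(1, 64, 256, 512) * 0.01
z_star, n_steps = naive_fixed_point(f, x, torch.zeros_like(x))
```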