Cityscapes experiment #2
Open
maria8899 opened this issue Feb 7, 2019 · 24 comments
@maria8899

maria8899 commented Feb 7, 2019

Hi,
Thanks for open sourcing the code, this is great!
Could you share your json parameter file for cityscapes?
Also, I think the file depth_mean.npy is missing, which is needed to run it.

Thanks.

@ozansener
Collaborator

We will update the code in a few days with depth_mean.npy and config files. But in the meantime, here are the config files if you do not want to wait:

  • Parameter Set w/ Approximation:
    optimizer=Adam|batch_size=8|lr=0.0005|dataset=cityscapes|normalization_type=none|algorithm=mgda|use_approximation=True

  • Parameter Set w/o Approximation:
    optimizer=Adam|batch_size=8|lr=0.0001|dataset=cityscapes|normalization_type=none|algorithm=mgda|use_approximation=False

depth_mean.npy is the average depth map of the training set. We use it to make the input zero-mean.
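
For reference, a rough sketch of how the first parameter set might look as the JSON parameter file; the key names mirror the pipe-separated string above, while the tasks/scales entries and the exact schema are assumptions rather than the released sample config:

import json

# Hypothetical rendering of "Parameter Set w/ Approximation" as a config dict.
# The tasks/scales keys and their values are assumed; adjust to the repo's
# actual sample config schema.
params = {
    "optimizer": "Adam",
    "batch_size": 8,
    "lr": 0.0005,
    "dataset": "cityscapes",
    "normalization_type": "none",
    "algorithm": "mgda",
    "use_approximation": True,
    "tasks": ["S", "I", "D"],                  # assumed: semantic, instance, depth
    "scales": {"S": 1.0, "I": 1.0, "D": 1.0},  # assumed uniform scales
}

with open("cityscapes_mgda_approx.json", "w") as f:
    json.dump(params, f, indent=2)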

@maria8899
Author

Thanks. Will the updated code run with pytorch 1.0? I am running into a few problems since some features are deprecated (e.g. volatile variables, .data[0], etc.)
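
(For reference, the two deprecated patterns map onto PyTorch 1.0 as follows; a minimal sketch, not the repo's code:)

import torch

x = torch.randn(2, 3)
with torch.no_grad():          # replaces wrapping tensors with Variable(..., volatile=True)
    y = (x * 2).sum()
print(y.item())                # replaces y.data[0]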

@maria8899
Author

maria8899 commented Feb 8, 2019

I am also having an error with the FW step:
sol, min_norm = MinNormSolver.find_min_norm_element([grads[t] for t in tasks])

----> 1 sol, min_norm = MinNormSolver.find_min_norm_element([grads[t] for t in tasks])

~/MultiObjectiveOptimization/min_norm_solvers.py in find_min_norm_element(vecs)
     99         # Solution lying at the combination of two points
    100         dps = {}
--> 101         init_sol, dps = MinNormSolver._min_norm_2d(vecs, dps)
    102 
    103         n=len(vecs)

~/MultiObjectiveOptimization/min_norm_solvers.py in _min_norm_2d(vecs, dps)
     42                     dps[(i, j)] = 0.0
     43                     for k in range(len(vecs[i])):
---> 44                         dps[(i,j)] += torch.dot(vecs[i][k], vecs[j][k]).data[0]
     45                     dps[(j, i)] = dps[(i, j)]
     46                 if (i,i) not in dps:

RuntimeError: dot: Expected 1-D argument self, but got 4-D

What exactly should grads and [grads[t] for t in tasks] contain?

Edit: the solution is to replace that line with
torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()
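
Applied to the loop shown in the traceback above, the PyTorch 1.x-compatible version looks roughly like this (a sketch based only on the excerpt, not the full file):

# inside MinNormSolver._min_norm_2d, per the excerpt above
dps[(i, j)] = 0.0
for k in range(len(vecs[i])):
    # flatten each gradient tensor to 1-D so torch.dot accepts it,
    # and use .item() instead of the removed .data[0]
    dps[(i, j)] += torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()
dps[(j, i)] = dps[(i, j)]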

@SimonVandenhende

I can confirm that those changes worked for me to get the code running with pytorch 1.0.
I was able to reproduce the results for the single-task models, but so far no luck with the MGDA method.
Did you have to include any other changes, @maria8899?

@ozansener
Collaborator

@r0456230 Can you tell me what exactly you are trying to reproduce? The config files I put as a comment should give the exact results of MGDA w/ and w/o approximation.

Please note that we report the disparity metric in the paper and compute the depth metric in the code. The depth map is later converted into disparity separately as post-processing. If the issue is depth, this should explain it.

mIoU should be exactly the same as what is reported in the code and the paper. We used the parameters I posted as a comment.
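
For reference, such a depth-to-disparity conversion typically follows the standard stereo relation disparity = focal * baseline / depth; a hedged sketch, with focal and baseline taken from the Cityscapes camera calibration files (the exact post-processing and invalid-pixel handling used for the paper may differ):

import numpy as np

def depth_to_disparity(depth, focal, baseline, eps=1e-6):
    # disparity = focal * baseline / depth; zero/invalid depths stay zero
    disparity = np.zeros_like(depth, dtype=np.float64)
    valid = depth > eps
    disparity[valid] = focal * baseline / depth[valid]
    return disparity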

@ozansener
Collaborator

@maria8899 Although we are planning to support PyTorch 1.0, I am not sure when it will be. I will also update the README with the exact versions of each Python module we used. The PyTorch version was 0.3.1.

@SimonVandenhende

@ozansener I was able to reproduce the results from the paper for the single-task models using your code (depth, instance segmentation and semantic segmentation on Cityscapes).
However, when I run the code with the parameters posted above, after 50 epochs the model seems to be far from the results obtained in the paper.

@maria8899
Author

maria8899 commented Feb 13, 2019

I think I have managed to make it work with PyTorch 1.0, but I still need to check the results and train it fully.
@r0456230 I haven't made many other changes; the main problem was in the FW step. Have you set up the scales/tasks correctly in the json file?

@YoungGer

YoungGer commented Feb 18, 2019

@maria8899 @r0456230 could you please tell me how to solve the missing depth_mean.npy problem?

I tried the code below, but I'm not sure if it's correct

depth_mean = np.mean(depth[depth != 0])
depth[depth!=0] = (depth[depth!=0] - depth_mean) / self.DEPTH_STD
#depth[depth!=0] = (depth[depth!=0] - self.DEPTH_MEAN[depth!=0]) / self.DEPTH_STD

@maria8899
Author

maria8899 commented Feb 18, 2019

@YoungGer you need to compute a mean image (per pixel) using all training images (or just a few to get an approximation) of the Cityscapes disparity dataset.
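
A minimal sketch of that computation, assuming disparity_paths lists the training-set disparity images already at the resolution used for training (the actual preprocessing in the repo may differ):

import numpy as np
from PIL import Image

def compute_depth_mean(disparity_paths, out_path="depth_mean.npy"):
    acc = None
    for p in disparity_paths:
        d = np.asarray(Image.open(p), dtype=np.float64)
        acc = d if acc is None else acc + d
    mean = acc / len(disparity_paths)   # per-pixel mean image
    np.save(out_path, mean)
    return mean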

@YoungGer

@YoungGer you need to compute a mean image (per pixel) using all training images (or just a few to get an approximation) of the Cityscapes disparity dataset.

I know, thank you for your help.

@JulienSiems

@YoungGer Have you noticed that the find_min_norm_element method actually uses projected gradient descent? Only find_min_norm_element_FW is the Frank-Wolfe algorithm discussed in the paper. The two are only guaranteed to be equivalent when the number of tasks is 2.
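
For the two-task case the min-norm point on the simplex has a closed form, which is why the two solvers can only be expected to agree there; a minimal sketch (not the repo's code):

import torch

def min_norm_2task(g1, g2):
    # minimize || a*g1 + (1-a)*g2 ||^2 over a in [0, 1]
    g1, g2 = g1.reshape(-1), g2.reshape(-1)
    a = torch.dot(g2 - g1, g2) / torch.dot(g1 - g2, g1 - g2)
    a = a.clamp(0.0, 1.0)
    return a.item(), 1.0 - a.item()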

@kilianyp

kilianyp commented Mar 29, 2019

EDIT: Obviously I realised right after sending that question 2 is because of the optimization. Question 1 still remains.

Hi @ozansener ,
thanks for publishing your code!

I have two questions after reading this answer by @maria8899

Edit: the solution is to replace that line with
torch.dot(vecs[i][k].view(-1), vecs[j][k].view(-1)).item()

In your code the z variable returned by the 'backbone' of the network is passed to each task.
Its gradient is then used in the find_min_norm algorithm.

First of all, as maria noted, the gradient is a 4-D variable that needs to be reshaped to 1-D in torch 1.0.1.
I compared the behavior to torch 0.3.1 and it does lead to the same result, but it raised some questions, which may well come from my incomplete understanding of your paper.

  1. The gradient still has the batch dimension. Why do you calculate the min-norm point between all samples as one big vector instead of, for example, averaging or summing over the batch dimension? Isn't this what effectively happens after the reshaping? This is just intuitively speaking, comparing it with stochastic gradient descent.
  2. Why is there a batch dimension anyway? From the paper it is not quite clear to me what should be fed into the FrankWolfeSolver, but shouldn't it be the gradient of some parameters instead of an output variable? Or does that not matter and lead to the same result?

Thanks a lot!

@ozansener
Collaborator

ozansener commented May 1, 2019

@kilsenp First let me answer question 2.

  • 2: You are right: if you apply MGDA directly, it should be gradients with respect to the parameters. However, one of the main contributions of the paper is showing that you can instead feed gradients with respect to the representations. This is basically Section 3.3 of the paper, and what we compute in the code is $\nabla_Z$ (see the sketch after this list).

  • 1: No, you need the batch dimension since the forward pass of the network is different for each image. You can read Section 3.3 of the paper in detail to understand what's going on.
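
A minimal, self-contained sketch of that idea (gradients taken with respect to the shared representation z rather than the shared parameters); the tiny encoder/heads below are placeholders for illustration, not the repo's architecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(3, 8, 3, padding=1)
heads = {"seg": nn.Conv2d(8, 5, 1), "depth": nn.Conv2d(8, 1, 1)}
losses = {"seg": lambda out, y: F.cross_entropy(out, y),
          "depth": lambda out, y: F.l1_loss(out, y)}
images = torch.randn(4, 3, 32, 64)
targets = {"seg": torch.randint(0, 5, (4, 32, 64)),
           "depth": torch.randn(4, 1, 32, 64)}

z = encoder(images)                                  # shared representation [B, C, H, W]
grads = {}
for t, head in heads.items():
    z_t = z.detach().clone().requires_grad_(True)    # treat z as the leaf for task t
    losses[t](head(z_t), targets[t]).backward()
    grads[t] = [z_t.grad.detach().clone()]           # nabla_z of task t's loss, batch dim kept
# grads[t] is what gets flattened and passed to the min-norm solver to obtain
# the task weights; the weighted sum of losses is then backpropagated through
# the encoder.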

@liyangliu

Hi, @maria8899, @kilsenp,
Have you reproduced the results on MultiMNIST or CityScapes? Thanks.

@youandeme

Hi @liyangliu,
Have you reproduced the results on MultiMNIST? I have tried but only got results similar to grid search. Would you mind telling me the params you chose? Thanks.

@liyangliu

@youandeme, I haven't reproduced the results on MultiMNIST. I used the same hyper-parameters mentioned by the author in #9, but cannot surpass the uniform scaling baseline. Also, I noticed that in the "Gradient Surgery" paper (supplementary materials), other researchers report different results on MultiMNIST from this MOO paper. So I suspect that others also have difficulty reproducing the results on MultiMNIST following this MOO paper.

@ozansener
Collaborator

@liyangliu @youandeme How are you evaluating MultiMNIST? We did not release any test set; actually, there is no fixed test set. The code generates a random test set every time you run it. For all models, you simply use the hyper-params I posted. Then you save every epoch's result and choose the epoch with the best val accuracy. Then you run the MultiMNIST test, which will generate a random test set, and evaluate on it. If you call the MultiMNIST loader with the test param, it should do the trick. If you evaluate this way, the results will not match exactly since the test set is randomly generated, but the order of the methods is preserved.

@liyangliu

liyangliu commented May 4, 2020

Hi @ozansener, as you mentioned, the order of the different methods (MGDA-UB vs. uniform scaling) stays the same whichever test set I use. But on the validation set, I cannot see any superiority of MGDA-UB over uniform scaling. Also, on CityScapes I cannot reproduce the results reported in the paper. Actually, I find that the single-task baseline is better than the reported one (10.28 vs. 11.34 and 64.04 vs. 60.68 on the instance and semantic segmentation tasks, respectively). I obtained these numbers with your provided code, so maybe I made some mistakes?

@ozansener
Collaborator

@liyangliu For MultiMNIST, I think there are issues since we did not release a test set. Everyone reports slightly different numbers. In hindsight, we should have released the test set, but we did not even save it. So I would say please report whatever numbers you obtained for MultiMNIST. For Cityscapes, though, it is strange, as many people have reproduced the numbers. Please send me an e-mail about CityScapes so we can discuss.

@liyangliu

liyangliu commented May 5, 2020

Thanks, @ozansener. On CityScapes I re-ran your code with the instance & semantic segmentation tasks and got the following results for MGDA-UB and SINGLE task, respectively:

method           instance  semantic
MGDA-UB          15.88     64.53
SINGLE           10.28     64.04
MGDA-UB (paper)  10.25     66.63
SINGLE (paper)   11.34     60.08

It seems that the performance of instance segmentation is a bit strange.

@ozansener
Collaborator

@liyangliu The instance segmentation one looks strange. Are you using the hyper-params I posted for both single-task and multi-task? Also, are the tasks uniformly scaled, or are you doing any search? Let me know the setup.

@liyangliu

liyangliu commented May 8, 2020

@ozansener, I use exactly the hyper-params you posted for single- and multi-task training. I use 1/0 and 0/1 scales for single-task training (instance and semantic segmentation) and didn't do any grid search.

@AwesomeLemon

Sorry for an off-topic question, but I have trouble even running the training on CityScapes: for a 256x512 input I get a 32x64 output, while the target is 256x512. The smaller output makes sense to me because the earlier layers are not dilated and there is max-pooling.
So could someone please clarify whether the target should indeed have the same dimensions as the input, and if so, where the spatial upsampling is supposed to happen?
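
In case it helps, one common way to reconcile the shapes is to upsample the prediction to the label resolution before the loss; whether the original code is meant to upsample the output or downsample the target is an assumption here:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 19, 32, 64)                      # coarse network output
labels = torch.randint(0, 19, (2, 256, 512))             # full-resolution targets
logits_up = F.interpolate(logits, size=labels.shape[-2:],
                          mode="bilinear", align_corners=False)
loss = F.cross_entropy(logits_up, labels)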
