Question about the NN structure of Decima and its relation with the problem size #17

kzhang28 · 2020-08-11T01:57:34Z

Questions:

1. About the input size and the number of softmax entries used to select a stage/node: do you assume a maximum number of concurrently runnable nodes/stages in the system? From the paper (Figure 6), it seems that this value (n) needs to be predefined and that the softmax must have the same number of input/output entries. Is that correct?
2. It seems the softmax for selecting the maximum parallelism also needs a fixed number of input/output entries. You state in the paper, "Since the number of possible limits can be as large as the number of executors," so if we want to apply your solution to a new, larger cluster, the number of softmax entries should increase in proportion to the new cluster size. Is my understanding correct?

Answers from Hongzi:

1. No, we don't restrict the total number of nodes. Note that the softmax operation is scale-free: its input can have arbitrary size (it is just exponentials normalized by their sum). Check out the softmax function in TensorFlow or PyTorch and how it applies to the input vector.
2. I think your understanding is correct. We were being lazy and used one output node per parallelism limit, so you need n nodes if you have n executors. A more scalable way is to output the parallelism limit directly as a number. You can express such a continuous number (rounded afterwards) with a Gaussian distribution: the neural network outputs the mean, and you sample from the Gaussian (similar to how you sample from the softmax output).
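
To make the two answers above concrete, here is a minimal sketch in plain Python. It is not taken from the Decima code base. The function names `softmax` and `sample_parallelism_limit`, the fixed standard deviation, and the clipping to `[1, num_executors]` are all assumptions made for illustration; the answer only says that the network outputs the mean and that the sample is rounded afterwards.

```python
import math
import random


def softmax(scores):
    """Normalize a list of scores of any length into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_parallelism_limit(mean, std, num_executors, rng=random):
    """Sample a continuous limit from a Gaussian, round it, and clip it.

    Assumption: std is a fixed hyperparameter and the result is clipped
    to [1, num_executors]; neither detail comes from the thread.
    """
    raw = rng.gauss(mean, std)
    return min(max(int(round(raw)), 1), num_executors)


# The same softmax handles 3 runnable nodes or 7 without any change.
probs_small = softmax([0.2, 1.5, -0.3])
probs_large = softmax([0.1] * 7)

# One scalar mean replaces n output nodes, so cluster size does not
# change the network's output shape.
rng = random.Random(0)
limit = sample_parallelism_limit(mean=12.4, std=2.0, num_executors=50, rng=rng)
```

With this formulation, moving to a larger cluster changes only the `num_executors` bound rather than the number of output entries.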

Thanks!

kzhang28 · 2020-08-11T04:02:14Z

@hongzimao
After going through your code, I think I understand your explanation for question 1. I just want to confirm my understanding:
The softmax (surrounded by the green box in the following figure) may have a varying number of input entries for different scheduling events. This is because the last fully connected layer of the actor network has output size 1, so the output shape of the actor network after the reshape is [batchSize, numberOfNodes], where numberOfNodes varies.

hongzimao · 2020-08-11T15:52:52Z

Yes, your understanding is correct.
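
The shape argument confirmed here can be sketched as follows. This is a simplified illustration, not the actual actor network: a single shared linear layer with output size 1 stands in for the full network, the batch dimension is omitted, and `score_node`, `node_distribution`, `random_nodes`, and `feature_dim` are hypothetical names.

```python
import math
import random


def score_node(features, weights, bias):
    """Shared final layer with output size 1: one scalar score per node."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def node_distribution(node_features, weights, bias):
    """Map [number_of_nodes, feature_dim] features to one probability per node."""
    scores = [score_node(f, weights, bias) for f in node_features]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
feature_dim = 4  # assumed per-node feature size, for illustration only
weights = [rng.uniform(-1, 1) for _ in range(feature_dim)]
bias = 0.0


def random_nodes(n):
    """Build n made-up node feature vectors."""
    return [[rng.uniform(-1, 1) for _ in range(feature_dim)] for _ in range(n)]


# Two scheduling events with different numbers of runnable nodes reuse
# exactly the same weights; only the softmax length changes.
probs_event_a = node_distribution(random_nodes(3), weights, bias)
probs_event_b = node_distribution(random_nodes(8), weights, bias)

# Sample the node to schedule from the resulting distribution.
chosen = rng.choices(range(len(probs_event_b)), weights=probs_event_b, k=1)[0]
```

Because the weights are applied per node, the parameter count depends only on `feature_dim`, not on how many nodes are runnable at a given event.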
