
Question about the NN structure of Decima and its relation with the problem size #17

Open
kzhang28 opened this issue Aug 11, 2020 · 2 comments

Comments

@kzhang28

Questions:

  1. About the input size and the number of softmax entries for selecting a stage/node: do you assume a maximum number of concurrently runnable nodes/stages in the system? From the paper (Figure 6), it seems that this value (n) needs to be predefined and that the softmax must have the same number of input/output entries. Is that correct?

  2. The softmax for selecting the maximum parallelism also seems to require fixing the number of input/output entries beforehand. The paper states, "Since the number of possible limits can be as large as the number of executors," so if we want to apply your solution to a new, larger cluster, the number of softmax entries would have to grow in proportion to the new cluster size. Is my understanding correct?


Answers from Hongzi:

  1. No, we don't restrict the total number of nodes. Note that the softmax operation is scale-free: its input can have arbitrary size (it is just exponentials normalized by their sum). Check out the softmax function in TensorFlow or PyTorch and how it applies to the input vector.

  2. I think your understanding is correct. We were being lazy and used one output node per parallelism limit, so you need n nodes if you have n executors. A more scalable way is to output the parallelism limit directly as a number. You can express such a continuous number (rounded afterwards) with a Gaussian distribution: the neural network outputs the mean, and you sample from the Gaussian (similar to how you sample from the softmax output).
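
The Gaussian alternative in answer 2 could look roughly like the sketch below. This is not code from the Decima repository: the function name `sample_parallelism_limit`, the fixed standard deviation, and the clipping to `[1, num_executors]` are all assumptions made for illustration. The point is only that a single scalar output (the mean) can serve clusters of any size.

```python
import random


def sample_parallelism_limit(mean, std, num_executors, rng=random):
    """Sample a parallelism limit from a Gaussian and make it a valid integer.

    mean          -- hypothetical scalar produced by the policy network
    std           -- assumed fixed exploration noise (not taken from the source)
    num_executors -- cluster size, used only to clip the sampled value
    """
    raw = rng.gauss(mean, std)   # continuous sample
    limit = round(raw)           # "round it afterwards"
    # Clipping is an added assumption so the result is always a usable limit.
    return max(1, min(num_executors, limit))


rng = random.Random(0)
# The same scalar output is reused for a 50-executor and a 500-executor cluster;
# no output layer has to be resized.
small_cluster = [sample_parallelism_limit(12.0, 2.0, 50, rng) for _ in range(5)]
large_cluster = [sample_parallelism_limit(12.0, 2.0, 500, rng) for _ in range(5)]
print(small_cluster, large_cluster)
```

In training, the log-probability of the sampled value under the Gaussian would play the role that the log of the softmax entry plays for a discrete action.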

Thanks!

@kzhang28
Author

@hongzimao
After going through your code, I think I understand your explanation for question 1. I just want to confirm my understanding with you:
The softmax (surrounded by the green box in the following figure) may have a varying number of input entries across scheduling events. This is because the last fully-connected layer of the actor network has output size 1, so after the reshape the actor network's output has shape [batchSize, numberOfNodes], where numberOfNodes varies.
image

@hongzimao
Owner

Yes, your understanding is correct.
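
The behaviour confirmed in this thread can be illustrated with a minimal, dependency-free sketch. It is a simplification, not the Decima implementation: the single linear layer, the names `score_node` and `node_probabilities`, and the toy embeddings are assumptions, and the batch dimension is omitted. What it shows is that a shared layer with output size 1 yields one score per node, so the softmax input length follows the number of nodes while the parameters stay fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of any length."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_node(embedding, weights, bias):
    """Shared layer with output size 1: maps one node embedding to one scalar."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


def node_probabilities(node_embeddings, weights, bias):
    """Score every node with the same parameters, then normalize across nodes."""
    scores = [score_node(e, weights, bias) for e in node_embeddings]
    return softmax(scores)


# One fixed set of (hypothetical) parameters, independent of the node count.
weights = [0.5, -0.25, 1.0]
bias = 0.1

three_nodes = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [2.0, 2.0, 1.0]]
five_nodes = three_nodes + [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0]]

probs_3 = node_probabilities(three_nodes, weights, bias)
probs_5 = node_probabilities(five_nodes, weights, bias)
print(len(probs_3), len(probs_5))
```

Both calls reuse the same `weights` and `bias`; only the length of the score list, and therefore of the probability vector, changes between scheduling events.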
