Is there any way for us to contribute to the automated training? #234
Over 80k games have been generated from the current best network, from which we could have trained a few networks, if not many.

I think all of us understand that gcp is the leader of this project and that his efforts are entirely voluntary, but I'm afraid we might lose some contributors if things slow down. In particular, many in the community are eager to see the results of the recently suggested training methods, as frequently as possible.

Clearly, gcp can be busy for a couple of days, things happen, and he might not be able to train a new network every 25k games. So I feel it is really time for the training process to be automated (along with the distributed testing some people are already working on). Unfortunately, the training is performed on the server side, and it is hard for the community to settle this issue. Is there any method for us to help automate/pipeline the process?

Comments
Out of basic security concerns, any contributed network has to be certified and trusted by the community. Right now, having a single source of trained networks is how we ensure security. The potential attacks on, or damage to, the process include:
For the first and second issues, a verification step might be introduced to prevent them (I'm not sure about the second), but I don't see any technical way to prevent the last attack. How can we trust anyone who claims to follow all the rules the project requires?
Maybe contributing money to buy more compute capacity from the cloud is more realistic.
@earthengine From a theoretical standpoint we DO have techniques to verify the honesty of a network, via an interactive verification protocol, which is somewhat complicated. But the cost is very high (I think it would cost up to about 100x the compute time).
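The real protocol is too involved for a comment, but a much simpler spot-check scheme gives the flavor: the trainer publishes a hash of every checkpoint, and a verifier recomputes randomly sampled training steps. Everything below is a hypothetical sketch, not the protocol referred to above:

```python
import hashlib

def checkpoint_hash(weights: bytes) -> str:
    # Commitment to a serialized training checkpoint.
    return hashlib.sha256(weights).hexdigest()

def spot_check(commitments, reveal, recompute_step, step):
    """Verify one randomly chosen training step.

    commitments:    list of claimed checkpoint hashes, one per step
    reveal:         function step -> serialized weights the trainer hands over
    recompute_step: deterministic function weights -> next weights
                    (the trainer must also reveal data order and RNG seed)
    """
    weights = reveal(step)
    if checkpoint_hash(weights) != commitments[step]:
        return False  # revealed weights don't match the commitment
    next_weights = recompute_step(weights)
    return checkpoint_hash(next_weights) == commitments[step + 1]
```

Checking every step would roughly double the work; checking a random sample trades cost for confidence, which is where estimates like "up to 100x" come from once you want high assurance.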
I can think of a couple of solutions here:
I think that with the suggested methods (especially using fewer training steps), training itself is manageable on a single machine, so I originally posted this issue with option 2 in mind, but if other solutions can work, that would be fine too.
@isty2e Great! To do this we should only specify and verify
All the code I use for training is, and always has been, in the source repo, and I've been uploading gigabytes of data in #167 precisely so that others can run the training. That is exactly how the 22373747 network was found! So yes, you can obviously do this, and other people have already successfully contributed this way.
Verifying a trained network is very easy: if it's a good gain over the previous one, it takes anyone with a fast machine a few hours to confirm it with autogtp. (If it's a minor gain, it may take half a day.)
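To put numbers on "confirming a gain": a fixed-games match gives a win rate, which maps directly to an Elo difference. A minimal sketch with illustrative figures only:

```python
import math

def implied_elo(wins: int, losses: int) -> float:
    # Elo difference implied by a match score.
    p = wins / (wins + losses)
    return -400.0 * math.log10(1.0 / p - 1.0)

wins, games = 230, 400  # illustrative numbers only
print(f"win rate {wins / games:.1%}, about {implied_elo(wins, games - wins):+.0f} Elo")
# A gating rule might promote only if the win rate clears some
# threshold such as 55% (an assumption here, not the project's exact rule).
```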
@gcp Is there any plan to automate everything on the server side? I still think that is the cleanest solution.
@gcp I wanted to try the training myself too. I know the script/data are already there, but it appears you've changed some parameters in recent training, e.g. Are those updated in the script too? I see no recent commits changing these parameters. If not, could you point me to where? I can only find
It's being automated on my side (the server doesn't have a GPU or anything). Right now it's a few scripts that I launch and whose output I check (which I can do, e.g., from my phone).
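Roughly, the driver amounts to something like this (all script names and the schedule here are made-up placeholders, not the actual setup):

```python
import subprocess
import time

# Hypothetical pipeline; every name below is an illustrative stand-in.
PIPELINE = [
    ["python", "parse.py", "train.txt"],              # build training data
    ["python", "train.py", "--steps", "8000"],        # run the training
    ["python", "net_to_model.py", "latest.weights"],  # export the network
]

while True:
    for cmd in PIPELINE:
        if subprocess.run(cmd).returncode != 0:
            print("pipeline step failed:", " ".join(cmd))
            break
    else:
        print("new candidate network ready for evaluation")
    time.sleep(6 * 3600)  # check back every six hours, say
```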
The discussion about how to set the learning rate is in the #78 thread. You should read the AGZ paper and understand how the learning rate corresponds to the batch size. (You won't be able to use the AGZ batch size on most common GPUs.) I have no idea where you got the stuff about "training steps".
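Concretely, the usual heuristic is the linear scaling rule: shrink the batch by some factor and shrink the learning rate by the same factor. A quick sketch with made-up numbers:

```python
def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    # Linear scaling rule: step size per sample stays constant.
    return base_lr * batch / base_batch

# Illustrative numbers (not AGZ's published schedule): if a reference
# setup used lr = 0.05 at batch 2048, a single-GPU batch of 256 gives
print(scaled_lr(0.05, 2048, 256))  # 0.00625
```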
@gcp I'm talking about these:
But are they even the same thing...?
If you start the training, it will run and print, e.g.:
So it's just a question of how long you let it run.
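Conceptually the loop is just this (a standalone sketch, not the actual tfprocess.py code):

```python
import random

def train_one_batch() -> float:
    # Placeholder for one optimizer step so the sketch runs standalone.
    return random.random()

step = 0
while True:  # no stopping condition: you interrupt when satisfied
    loss = train_one_batch()
    step += 1
    if step % 100 == 0:
        print(f"step {step}, training loss {loss:.4f}")
```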
@gcp Aha, no wonder there's no exit condition in the script. Thanks for clarifying. I'm pretty new to Deep Learning and TensorFlow, so forgive the dumb questions. Just to be clear: do you now just stop the training when the step count reaches 1000? Not at the 10000 (10k) others mentioned earlier in #78?
@bood The number of training steps is scaled according to batch size, so it will presumably be 1000 * 2048 / 256 = 8000.
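That is, the scaling keeps the total number of samples seen constant:

```python
# Keep steps * batch_size (total samples seen) constant across batch sizes.
steps_ref, batch_ref, batch_new = 1000, 2048, 256
print(steps_ref * batch_ref // batch_new)  # 8000
```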
Networks are now trained on a regular, if not daily, basis, and evaluation is distributed. Closing this issue.