Is there any way for us to contribute to the automated training? #234

Closed
isty2e opened this issue Dec 3, 2017 · 16 comments

@isty2e

isty2e commented Dec 3, 2017

Over 80k games have been generated from the current best network, from which we could have trained a few networks, if not many.

I think all of us understand that gcp is the leader of this project and his efforts are entirely voluntary, but I'm afraid we might lose some contributors if things slow down. In particular, many in the community are eager to see results from the recently suggested training methods, as frequently as possible.

Clearly, gcp can be busy for a couple of days, things might happen, or he might not be able to train a new network every 25k games. So I feel it is really time for the training process to be automated (along with distributed testing, which some people are already working on). Unfortunately, the training is performed on the server side, so it is hard for the community to resolve this on our own. Is there any way for us to help automate/pipeline the process?

@earthengine
Contributor

earthengine commented Dec 3, 2017

As a basic security concern, any contributed network has to be certified and trusted by the community. Right now, having a single source of trained networks is how we ensure security.

Potential attacks on, or damage to, the process include:

  • Broken networks - the uploaded network might be broken or incompatible

  • Weak networks - the "best" network is actually weaker than existing networks (it is true that the current process can produce a weaker "best" network, but here "weaker" means it was actually trained on fewer games)

  • Dishonest networks - the network is polluted with human games (which defeats the whole purpose of the project) or other irrelevant sources, or trained with additional constraints that encode domain knowledge beyond the AlphaGo Zero paper

For the first and second issues, a verification step might be introduced to prevent them (I'm not sure about the second), but I don't see any technical way to prevent the last attack. How can we trust anyone who claims to have followed all the rules the project requires?

@l1t1

l1t1 commented Dec 3, 2017

Maybe contributing money to buy more compute capacity from the cloud would be more realistic.

@lwins-lights

lwins-lights commented Dec 3, 2017

@earthengine From a theoretical standpoint we DO have techniques to verify the honesty of a network, via an interactive verification protocol, which is somewhat complicated. But the cost is very high (I estimate it would take up to about 100x the time).

@isty2e
Author

isty2e commented Dec 3, 2017

I can think of a couple of solutions here:

  1. A network is uploaded together with the detailed training method and the training set, so that others can reproduce it.
  2. gcp shares the code he's using for training, along with some server-side scripts or APIs or whatever, so that the community can work on it.

I think that with the suggested methods (especially using fewer training steps) training itself is manageable on a single machine, so I originally posted this issue with option 2 in mind, but if other solutions can work, that would be okay too.

@lwins-lights

lwins-lights commented Dec 3, 2017

@isty2e Great! To do this we only need to specify and verify:
a) the training set used
b) the random bits used, or alternatively the random seed
Also, a similar approach could be applied to address the third issue raised by @earthengine.
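
A minimal sketch of what pinning point (b) could look like in practice, assuming the project's TensorFlow 1.x training scripts; the seed value and the idea of publishing it alongside the network and training set are illustrative assumptions, not existing project policy:

    # Illustrative only: fix every source of randomness so a submitted network
    # could in principle be re-trained bit-for-bit by an independent verifier.
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 20171203            # arbitrary value, published with the training set
    random.seed(SEED)          # Python-level shuffling of training chunks
    np.random.seed(SEED)       # NumPy sampling (e.g. which positions are drawn)
    tf.set_random_seed(SEED)   # graph-level seed (TensorFlow 1.x API)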

@gcp
Member

gcp commented Dec 3, 2017

All the code I use for training is, and has always been, in the source repo, and I've been uploading gigabytes of data in #167 precisely so that others can run the training, which is exactly how the 22373747 network was found!

So yes, obviously you can do this, and other people have already successfully contributed in this way.

@gcp
Member

gcp commented Dec 3, 2017

Verifying a trained network is very easy: if it's a clear gain over the previous one, it takes anyone with a fast machine only a few hours to confirm it with autogtp. (If it's a minor gain, it may take half a day.)
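
For a rough sense of why a clear gain confirms quickly while a minor one needs more games, here is a hedged back-of-the-envelope sketch; the win rates and game counts are made up for illustration, and this is not how autogtp itself decides:

    import math

    def lower_bound(wins, games, z=1.96):
        # 95% lower confidence bound on the true win rate (normal approximation)
        p = wins / games
        return p - z * math.sqrt(p * (1 - p) / games)

    # A clear gain (58% over 400 games) already sits safely above 50%...
    print(lower_bound(232, 400))   # ~0.532
    # ...while a minor gain (53% over 400 games) is still inconclusive.
    print(lower_bound(212, 400))   # ~0.481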

@isty2e
Author

isty2e commented Dec 4, 2017

@gcp Is there any plan to automate everything on the server side? I still think that is the clearest solution.

@bood
Collaborator

bood commented Dec 4, 2017

@gcp I wanted to try the training myself too. I know the script/data are already there, but it appears you've changed some parameters in recent training, e.g. training steps, learning rate, etc.

Are they updated in the script too? I see no recent commits changing these parameters. If not, could you point me to where the training steps are configured?

I can only find the learning rate here: https://github.com/gcp/leela-zero/blob/master/training/tf/tfprocess.py#L80. I don't know how to change the training steps, though.

@gcp
Member

gcp commented Dec 4, 2017

@gcp Is there any plan to automate everything on the server side? I still think that is the clearest solution.

It's being automated on my side (the server doesn't have a GPU or anything). Right now it's a few scripts that I launch and whose output I check (which I can do, e.g., from my phone).

@gcp
Member

gcp commented Dec 4, 2017

it appears you've changed some parameters in recent training e.g. training steps, learning rate etc.

The discussion about how to set the learning rate is in the #78 thread. You should read the AGZ paper and understand how the learning rate corresponds to the batch size. (You won't be able to use the AGZ batch size on most common GPUs.)

I have no idea where you get the stuff about "training steps".
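
A minimal sketch of the linear-scaling rule implied by the learning-rate/batch-size relationship above; the reference learning rate here is a made-up stand-in, not a value actually used by the project:

    # Assumption: learning rate scales roughly linearly with batch size, so a
    # smaller batch on a consumer GPU should use a proportionally smaller rate.
    reference_batch = 2048      # batch size used in the AGZ paper
    reference_lr = 0.05         # illustrative learning rate at that batch size
    local_batch = 256           # what fits on a typical single GPU (assumption)

    local_lr = reference_lr * local_batch / reference_batch
    print(local_lr)             # 0.00625, i.e. 1/8 of the reference rate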

@bood
Collaborator

bood commented Dec 4, 2017

@gcp I'm talking about these:

running 1000 training steps (scaled for minibatch size) and evaluating immediately

I restarted the training from 92c658d weights again with all of the previous training data and trained it again 10k steps

But are they even the same thing...?

@gcp
Member

gcp commented Dec 4, 2017

If you start the training it will run and print e.g.

step 24200, policy loss=4.98316 mse=0.0975871 (644.309 pos/s)
step 24300, policy loss=4.98177 mse=0.0966891 (634.301 pos/s)

So it's just a question of how long you let it run.
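
In other words, the step count is just an external decision about when to stop. A toy sketch of that idea follows; train_one_batch is a hypothetical stand-in, not a function from tfprocess.py:

    import random

    def train_one_batch():
        # Hypothetical stand-in for one optimizer step; returns (policy_loss, mse).
        return 5.0 - random.random() * 0.1, 0.10 - random.random() * 0.01

    TOTAL_STEPS = 8000   # whatever budget you choose; nothing else enforces a limit
    for step in range(1, TOTAL_STEPS + 1):
        policy_loss, mse = train_one_batch()
        if step % 100 == 0:
            print("step %d, policy loss=%.5f mse=%.7f" % (step, policy_loss, mse))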

@bood
Collaborator

bood commented Dec 4, 2017

@gcp Aha, no wonder there's no exit condition in tfprocess.py.

Thanks for clarifying. I'm pretty new to deep learning and TensorFlow, so forgive the dumb questions.

Just to be clear, you now just stop the training when the step count reaches 1000? Not the 10,000 (10k) that others mentioned earlier in #78?

@isty2e
Author

isty2e commented Dec 4, 2017

@bood The number of training steps is scaled according to batch size, so presumably it will be 1000*2048/256 = 8000.
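
A one-line check of that scaling, using the numbers from the thread (1000 steps at a reference batch of 2048, run locally at batch 256):

    paper_steps, paper_batch, local_batch = 1000, 2048, 256
    scaled_steps = paper_steps * paper_batch // local_batch
    print(scaled_steps)   # 8000 -> same total number of positions seen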

@isty2e
Author

isty2e commented Dec 20, 2017

Now networks are trained on a regular if not daily basis, and evaluation is distributed. Closing this issue.

@isty2e isty2e closed this as completed Dec 20, 2017