Binary file format #1740

Closed

Conversation

@OmnipotentEntity (Contributor) commented Aug 17, 2018

This is a first attempt at defining a binary file format.

I fully expect something to be wrong. In fact, I believe there is already a problem with the value net output (the policy net seems fine, at least), though I cannot locate my mistake. If someone who is more comfortable with the network format could chime in, that would be helpful.

As mentioned in #1733, this is still not enough on its own, because neither the training pipeline nor the server supports this file format yet. But it should be enough to get started on those.

The latest 20b network is only 48,042,715 bytes (46 MB) uncompressed. It compresses a bit further with gzip, down to 43 MB. That is under half the size of the gzip-compressed file on the server, so this is a big gain in that area at the very least.

src/Network.cpp Outdated
```cpp
if (!block) { // Very first has a different shape because it's the input layer
    count = filters * 162;
}
std::vector<float> weights = process(count);
```
Member

Suggest putting

```cpp
auto weights = process(count);
```

and something similar for the rest of the `process()` calls.

src/Network.cpp Outdated
```cpp
        return {0, 0};
    }
    std::copy(cbegin(weights), cend(weights), begin(m_ip2_val_b));
}
```
Member

This looks like a lot of repetitive code. How about moving the code body into the process() calls and handling the return {0, 0}; case as an exception throw?

Contributor Author

Good point. I didn't see a lot of exceptions in the code, so I thought there was some sort of unwritten rule against them. But I suppose I was being overcautious.
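
A minimal sketch of the exception-based refactor discussed above (the helper name, signature, and raw-float read are assumptions for illustration, not the actual patch):

```cpp
#include <cstddef>
#include <istream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read `count` raw floats from the weight stream,
// throwing on failure instead of returning a {0, 0} sentinel to the caller.
std::vector<float> process(std::istream& in, std::size_t count) {
    std::vector<float> weights(count);
    if (!in.read(reinterpret_cast<char*>(weights.data()),
                 weights.size() * sizeof(float))) {
        throw std::runtime_error("truncated or unreadable weight file");
    }
    return weights;
}
```

Each call site then collapses to a single `auto weights = process(in, count);`, with one try/catch around the whole loader instead of a {0, 0} check after every read.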

@ihavnoid (Member)

I like the implementation, but I suggest using auto whenever possible, for consistency with the rest of the leela-zero code.

@Ttl (Member) commented Aug 17, 2018

The output with #1736 is 52 MB compressed. If the space saving from binary is just 9 MB, I'm not sure it's worth the trouble.

@ihavnoid (Member)

@Ttl I guess we have to see the size of the compressed binary format. Plus, this is without losing any precision, and this code actually includes half-precision binary formats too. My hope is that half-precision weights will further save space while not losing much strength, but that will need some experimenting.

One question I had was whether it is okay to lose precision on the final fully connected layers (when half precision is used to store weights). The previous half-precision implementations used half precision only within the residual tower, so this isn't totally equivalent. I would like to make sure that half-precision binary weights are okay, or at least keep the final few layers in single precision even when the half-precision format is used.
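
A quick way to see why large head weights are the worry here: IEEE half precision carries an 11-bit significand, so the spacing between representable values grows with magnitude. A small self-contained sketch (generic, not from the PR):

```cpp
#include <cmath>
#include <cstdio>

// Spacing between adjacent fp16 values near magnitude v (normal range).
// IEEE half stores a 10-bit fraction, giving 11 significant bits.
float fp16_spacing(float v) {
    int e = static_cast<int>(std::floor(std::log2(std::fabs(v))));
    return std::ldexp(1.0f, e - 10);
}

int main() {
    // Near 1.0 the step is ~0.001; near 70 it is already 0.0625;
    // near 4096 (the "thousands" range) it is 4.0, and fp16 tops out at 65504.
    for (float v : {1.0f, 70.0f, 4096.0f}) {
        std::printf("fp16 step near %g: %g\n", v, fp16_spacing(v));
    }
    return 0;
}
```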

@ihavnoid (Member)

Okay, perhaps I misunderstood - the 48 MB number is for the fp16 type. So yes, we probably aren't saving a lot of disk space. But we will probably need to see the strength difference anyway.

@OmnipotentEntity (Contributor, Author) commented Aug 17, 2018

@Ttl I could also attempt some form of variable-length encoding to save more space. The saving seems smaller than it ought to be because we're already not storing much entropy. Though I'd argue that 9 MB out of 52 MB is still significant, as it's around 16%.

@OmnipotentEntity (Contributor, Author)

This should be everything outside of the training code, which I will be the first to admit I do not fully understand and do not really have a good method of testing. May I get a review? Is the file format sane and fully featured? Does the code make sense and is it readable? Is this something that the project is interested in?

@ihavnoid (Member)

My opinion is that we should eventually migrate to some form of binary format. I believe the binary format should be at least a little faster than the text format (since we don't need any parsing at all) if done properly. The current text files aren't that useful for debugging anyway (large files with very long lines - too hard to do anything with in a text editor), but this may be just me.

The reason I say "eventually" is because changing file formats can cause quite a bit of trouble - we have to force everybody to move beyond a certain version (which we sort of do when forcing a minimum autogtp/leelaz version for training) but I am not sure what else we are breaking...

@OmnipotentEntity (Contributor, Author) commented Aug 19, 2018

Did a little bit of performance tuning; these are the times it takes to load different versions of the same weights:

| Type (20x256) | Seconds  |
|---------------|----------|
| v3            | 0.731481 |
| v3.gz         | 0.652128 |
| v1            | 1.35178  |
| v1.gz         | 1.13941  |

Note: this computer has a very fast hard drive; if you have a slower hard drive, the difference will be more pronounced.

About not being sure what else we're breaking: I wish I could be certain that this worked 100% properly. Does anyone have some computer time they want to lend? I'd run matches like I used to, but I burned out one of my cards from overuse, and I haven't been able to replace it.
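
For anyone who wants to reproduce measurements like the table above, here is a generic way to time the raw file read (a sketch using std::chrono; the PR's own commented-out timing code, mentioned below, used <ctime> and may have differed):

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

// Times a raw binary read of a weight file; the file name is illustrative.
int main() {
    const auto start = std::chrono::steady_clock::now();

    std::ifstream in("weights.bin", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    std::printf("read %zu bytes in %f s\n", blob.size(), elapsed.count());
    return 0;
}
```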

```diff
@@ -30,6 +30,7 @@
 #include <boost/utility.hpp>
 #include <boost/format.hpp>
 #include <boost/spirit/home/x3.hpp>
+// #include <ctime>
```
Member

Some debug code has been left in

Contributor Author

I commented out the timing code so that other people could replicate my findings. I'll remove it entirely if need be.

@ihavnoid (Member)

@OmnipotentEntity - maybe not 'breaking', but at least this will make things a bit more complicated. I think we should keep the 32-bit format for training, and binary-convert only the distribution versions of the net to 16-bit. This adds a bit more complexity.

Other than that, my only remaining concern is that we also reduced precision for the final 'dual head' part of the net. Our current half-precision implementation applies half precision to the residual tower only; the 'dual head' is still single precision. I see that the final dual-head parameters have quite big values (~70), and on some older nets I recall seeing values in the range of thousands. My suggestion is to be a bit more conservative (after all, the 'dual head' is only a very small part of the weights) unless we have proof that fp16 is okay even on the final heads.

@OmnipotentEntity (Contributor, Author) commented Aug 20, 2018

A kind person allowed me to use his computer to set up a match. The result after 120 games is exactly 60-60, so there doesn't seem to be much of a problem with the loss of precision so far.

```
leelaz-next-3 v leelaz-next-1 (120/400 games)
board size: 19   komi: 7.5
                wins              black         white       avg cpu
leelaz-next-3     60 50.00%       24 40.00%     36 60.00%    337.32
leelaz-next-1     60 50.00%       24 40.00%     36 60.00%    327.70
                                  48 40.00%     72 60.00%
```

@gcp (Member) commented Aug 20, 2018

Quick thoughts:

a) We're not sure what the best float format is; see the "half" vs. "bfloat16" discussion. Ironically, the text format can deal with both. It would be nice to keep some flexibility there if we adopt a binary one (a header sketch follows this list).

b) Without working training code we can't deploy such nets.

c) If we are revising the network format, we should add the beta and gamma weights to the batchnorm layers. We currently encode beta into the convolution bias of the preceding layer, but we can't encode gamma in a way that lets us reconstruct it afterwards during training. This is a long-standing design mistake in Leela Zero (the folding arithmetic is sketched below).
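
On (a), a header with explicit storage-type fields would keep the format open to fp32, IEEE half, or bfloat16, and could even differ per section so the dual head stays single precision as suggested earlier in the thread. A hypothetical sketch (all field names and values are assumptions, not this PR's actual layout):

```cpp
#include <cstdint>

// Hypothetical header for a flexible binary weight format.
enum class FloatFormat : std::uint8_t { Fp32 = 0, Fp16 = 1, Bf16 = 2 };

struct WeightsHeader {
    std::uint32_t magic;        // file identification
    std::uint8_t  version;      // format revision
    FloatFormat   tower_format; // storage type for the residual tower
    FloatFormat   head_format;  // storage type for the policy/value heads
    std::uint16_t blocks;       // residual blocks, e.g. 20
    std::uint16_t filters;      // filters per block, e.g. 256
};
```

On (c), the arithmetic that loses gamma, in standard batchnorm terms (a generic sketch, not Leela Zero's exact layout): batchnorm computes y = gamma * (x - mean) / sqrt(var + eps) + beta, and folding it into the preceding convolution leaves only combined quantities behind:

```cpp
#include <cmath>

struct BatchNorm { float gamma, beta, mean, var, eps; };

// Standard inference-time folding of batchnorm into a convolution's weight
// scale and bias (one channel shown). Once only the folded values are
// stored, gamma and beta can no longer be separated back out for training.
void fold(float& w, float& b, const BatchNorm& bn) {
    const float scale = bn.gamma / std::sqrt(bn.var + bn.eps);
    w *= scale;                            // gamma absorbed into the weights
    b = (b - bn.mean) * scale + bn.beta;   // beta absorbed into the bias
}
```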

@gcp (Member) commented Aug 20, 2018

I'm closing this to make clear it's not going to get pulled in its current form. The gain is, all things considered, small (except in loading performance for large networks!), there is a lot of work left to make it usable, and the design needs more fleshing out.

@gcp closed this Aug 20, 2018
@gcp (Member) commented Aug 20, 2018

> May I get a review? Is the file format sane and fully featured? Does the code make sense and is it readable? Is this something that the project is interested in?

I hope I sufficiently addressed this. We can continue in an issue if you want.

Improving the network format is something we should do, but it has a large impact as it needs changes all over, so I'd prefer to do it right or not at all.

@gcp (Member) commented Aug 20, 2018

Whether process_bn_var should be done at writing time is another thing to consider.
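
For readers outside the codebase: process_bn_var is a load-time pass in Network.cpp over the stored batchnorm variances. Schematically, the trade-off looks like this (the transform body below is an assumption for illustration; consult the actual source):

```cpp
#include <cmath>
#include <vector>

// Schematic only: a load-time pass over stored batchnorm variances, assumed
// here to fold in epsilon and precompute the reciprocal square root. Doing
// this at write time would save the pass on every load, but the file would
// no longer hold raw variances that the training pipeline could consume.
void process_bn_var(std::vector<float>& variances, float epsilon = 1e-5f) {
    for (auto& v : variances) {
        v = 1.0f / std::sqrt(v + epsilon);
    }
}
```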

@OmnipotentEntity deleted the binary-file-format branch July 6, 2019 19:33