Roadmaps for Binary File Format implementation #1751
Re: the 16-bit floating-point format. Why not examine all of the weights and use the minimum number of exponent bits needed to cover the range? The rest can then simply be significand. Finally, store the number of exponent bits in a nibble in the header.
Sounds like a good approach, though you will want to make sure the significand has at least, say, 7 bits of precision. If you can't fit that alongside the exponent, you may have to round small values to zero and clamp big ones.
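A minimal NumPy sketch of the idea, under stated assumptions: 16-bit words with one sign bit, at least one nonzero weight, and the exponent-bit count plus minimum exponent kept for the header (returned as values here). All names are illustrative, not an actual implementation.

```python
import numpy as np

def pack_weights(w, total_bits=16):
    """Pack floats into 16-bit words: 1 sign bit, just enough exponent
    bits to cover the observed range, the rest significand.
    Assumes at least one nonzero weight."""
    w = np.asarray(w, dtype=np.float32)
    mant, exp = np.frexp(w)                      # w = mant * 2**exp, 0.5 <= |mant| < 1
    nz = w != 0
    exp_min, exp_max = int(exp[nz].min()), int(exp[nz].max())
    exp_bits = max(1, int(np.ceil(np.log2(exp_max - exp_min + 1))))
    sig_bits = total_bits - 1 - exp_bits
    sign = (w < 0).astype(np.int32)
    biased = np.clip(exp - exp_min, 0, (1 << exp_bits) - 1)
    sig = np.round(np.abs(mant) * (1 << sig_bits)).astype(np.int32)
    sig = np.minimum(sig, (1 << sig_bits) - 1)   # clamp rounding overflow near |mant| = 1
    packed = ((sign << (total_bits - 1)) | (biased << sig_bits) | sig).astype(np.uint16)
    return packed, exp_bits, exp_min             # exp_bits fits in a header nibble

def unpack_weights(packed, exp_bits, exp_min, total_bits=16):
    """Invert pack_weights: split sign/exponent/significand fields."""
    sig_bits = total_bits - 1 - exp_bits
    p = packed.astype(np.int32)
    sign = np.where((p >> (total_bits - 1)) & 1, -1.0, 1.0).astype(np.float32)
    exp = ((p >> sig_bits) & ((1 << exp_bits) - 1)) + exp_min
    mant = (p & ((1 << sig_bits) - 1)).astype(np.float32) / np.float32(1 << sig_bits)
    return sign * np.ldexp(mant, exp)
```

Note that small out-of-range values get clamped toward the minimum exponent (effectively flushed), matching the round-to-zero/clamp concern above.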
How about using HDF5 as a container format instead of rolling our own standard? It would be slightly less efficient than pure arrays just serialised and dumped (+ some minimal metadata), but there is a lot of infrastructure around HDF5 already, and it is very extensible.
No, the HDF5 libs are not a reasonable dependency to add to Leela; they offer almost nothing of value here. Leela Chess uses protobuf in some places, but I'm not too fond of that either, for similar reasons. Note that neither HDF5 nor protobuf would have helped with any of the points mentioned. (Aside from "extensible", where I'm not even sure what it means in this context. You're not going to extend anything if the processing code in the engine doesn't support it in the first place.)
I previously created a JSON-based binary format that I used in a game I published. This might be overkill, but it is an option. By extensible, I meant that the file format can be extended to handle other situations (without having to change the file-parsing code significantly, and without harming backwards compatibility), such as the ELF weight stuff the project has had to adapt to in the past.
Files are so large that one could conceivably put the decoding code in first, then just read that in, eval it, and use the function to "decompress" the data format. "What could possibly go wrong" 😄 If we used separate compress/decompress scripts after downloading to produce a current-format weight file (slightly slow, but it would introduce no changes to the rest of the pipeline), we could even make a competition out of it: who'll make the smallest network files with reasonable packing/unpacking speed? (Just to clarify: the latter is an actual idea; the first paragraph is just an attempt at humor.)
I considered that (just wrapping the existing format with encoder/decoder scripts) because it's much more manageable than updating all the code that deals with networks. But you can't solve the missing batchnorm layer that way, unfortunately.
We do not need to reinvent the wheel; have a look at https://github.com/LLNL/zfp
Looks interesting, but I doubt our weights data has 2D spatial correlation, so it's not clear how well it would work.
That looks really cool. Reading the paper and following the literature a bit, I also found this development: In the paper they explain that they try to exploit smoothness in the data (which we don't really have, at least not obviously), but in case none is found:
That's a little bit unspecific, but it is explained later in the arXiv article (paraphrasing here):
So basically, they also truncate, but after normalising first. In the last step they still try to profit somewhat from correlations in the data, despite this being the section about unpredictable data.
Following up on the JSON idea, because based on what I said before it might not be entirely clear what I meant, here's a text format version of what the current save file might look like:
It would then be serialized to binary (with a proper header) on save, and deserialized from binary on restore. Any missing field can have a default value assigned by the engine, so if the format of the nets changes, for instance by preprocessing the batchnorm, it's possible to simply add a field indicating that in the "meta" area of the save file. This way, decisions on this and perhaps other things aren't blocked on the save-file format.
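To make that concrete, here is a hedged Python sketch of such a logical structure. All field and layer names (and the `LZW1` header tag) are made up for illustration, and stdlib `json` stands in for a binary encoding such as CBOR or MsgPack:

```python
import json

# Hypothetical structure, for illustration only; not Leela's actual fields.
net = {
    "meta": {
        "format_version": 2,
        "batchnorm_folded": True,   # the kind of flag discussed above
    },
    "layers": [
        {"type": "conv", "channels": 4, "weights": [0.1, -0.2, 0.3]},
        {"type": "dense", "weights": [0.5, 0.25]},
    ],
}

# Serialize to binary with a (made-up) header; json is a stand-in encoder.
blob = b"LZW1" + json.dumps(net).encode("utf-8")

# On restore: strip the header, deserialize, default any missing field.
restored = json.loads(blob[4:].decode("utf-8"))
folded = restored["meta"].get("batchnorm_folded", False)   # engine default: False
```

The `.get(..., default)` call is the point: an older file without the flag still loads, and the engine just assumes the default behaviour.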
Hi Leela people, if you don't mind the self-promotion, let me tell you about joedb. I use it in my generic AlphaZero system to store the training sets as well as the network weights. Joedb can manipulate vectors of numbers efficiently and conveniently, and it has convenient features for automatic schema upgrade, if you ever need to add more fields or tables, or even make any other arbitrary change. At the moment it is C++ only, though, so it is not a good choice if you have to manipulate data in Python. But if your program is C++, joedb is a really convenient and efficient way to store binary data. Storage is efficient (that is to say, storing a large vector of floats takes as little disk space as a raw dump of the vector, plus a few more bytes for the file format), and C++ code to conveniently manipulate the data is automatically generated. I considered adding half-precision support, but decided to wait for C++ to support half; for the moment, half-precision numbers can be stored as 16-bit integers. I'd be glad to answer questions if you wish to know more.
Hi @Remi-Coulom! Pleasure to see you chiming in. :) Thanks for the info on joedb. Hopefully the more technical types will take a look or give it a try. @gcp is traveling this week.
The advantages don't seem that relevant to us (see the first post), and the limitations are showstoppers (we need Python support, flexible binary/float representation, etc.). To be honest, I'd take something like the JSON over it, if only because that's easily manipulated and well supported by literally everything. But remember that one of the main reasons to take a binary format is parsing and startup time. There are fast parsers for JSON, for sure, but it still means parsing text again... Not sure how good JSONB support is.
There are, of course, a few competing "standards" for binary encoded JSON-like data, and there's also the option of "rolling our own." Both are fairly easy to do, because JSON is by design easy to parse and represent in memory. Here are some of the standards with a feature list.
Of these, MsgPack and CBOR seem to be the best suited.
I was playing around with binary files. For these experiments I took the network 33986b7f9456660c0877b1fc9b310fc2d4e9ba6aa9cee5e5d242bd7b2fb1b166 with size
First thing I tried was to just dump the 4-byte floats into a binary file in IEEE 754 notation, without separators. I had to add a counter at the beginning of each layer to tell how many floats there are.
After this I applied something I have used in the past: analyzing the weight values in a layer, we can find one factor per layer that maps the actual floating-point values onto the range 0 to 65535. Then we can round each number to an integer (what I actually did was take the integer part) and save it in a 16-bit value. Of course, at the beginning of the layer, together with the weight count, we have to save one floating-point value: this factor.
I guess to know if the errors are really low we need to run validation between the two networks, original and compressed.
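A minimal sketch of that per-layer scheme, with one assumption added: an offset is stored alongside the factor so that negative weights fit the unsigned 0–65535 range (the comment above mentions only a single factor). Names are illustrative.

```python
import numpy as np

def quantize_layer(weights):
    """Map one layer's floats onto 0..65535 with a single scale factor
    (plus an offset, an assumption added here to handle negative weights)."""
    w = np.asarray(weights, dtype=np.float32)
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 65535.0 if hi > lo else 1.0   # avoid divide-by-zero
    q = np.round((w - lo) / scale).astype(np.uint16)
    return q, np.float32(scale), np.float32(lo)       # per-layer header values

def dequantize_layer(q, scale, lo):
    """Recover approximate floats from the 16-bit values."""
    return q.astype(np.float32) * scale + lo
```

The worst-case reconstruction error is about half a quantization step (scale / 2), which is the kind of error a validation run between the original and compressed networks would need to confirm is tolerable.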
I think this is what we would do to quantize the weights if we are to use INT16 inference.
Are there any objections to the CBOR format with a custom FP type that carries a user-defined precision, and with the logical structure described upthread? If not, I'll slowly start working on this.
To reduce startup time, perhaps just load the network in a separate thread? If the GTP connection doesn't block until the network is needed, users would probably perceive this as an improvement in startup time.
Speaking of which, if the client didn't block until the network is needed, that would allow time-left settings to be processed before the genmove arrives. An issue I sometimes have with slow startup: if a player is using a time control like Fischer or byo-yomi with no main time and a short time period, it's possible for the bot to time out on the first move of the game, since the game clock starts and time_left is sent to the bot before the network has been loaded (the bot is started when the first move is needed). If startup takes 10 seconds, time_left does reach the bot, but by then the real clock is 10 seconds shorter, so it might think too long. If network loading were in a separate thread and commands like time_left could be processed immediately while the network is loading, that would make the bot's understanding of time left accurate, right? Could be a real improvement for bot operators. ;)
The GUI should ping the bot (with sequenced GTP commands) to see if it is finished starting up, IMHO.
That's true, but I do really think you'll run into other problems unless you block until startup finishes, like the bot forfeiting on time in very fast time controls.
But without a network loaded, you won't be able to interact much :-/ Maybe it's useful if you're going to send a sequence of play commands, I guess.
The typical sequence of GTP commands in GridMaster before starting a game is something like protocol_version; then, after some time for the user to select the first move (or start the clock), we get play ..., and then finally the one where we need the network: genmove w. So we have on the order of 10 GTP commands that could be answered without the network being (fully) loaded. I would already be happy if only the first four (protocol_version, list_commands, name, version) were answered instantaneously (GridMaster uses those to determine that the engine is alive and understands GTP; this is also why, on most older devices, engine installs need the slow-timeouts setting), but it's only a minor inconvenience. The compressed networks are a big win.
That makes sense, yes. Doing the load in a separate thread and blocking on its completion just before we use it probably isn't too complicated, but it's going to get messy, mostly due to the debug information that is printed while the network is loaded and benchmarked.
I don't think it needs to get messy. Most of the initial GTP commands don't take noticeable time, so the program could just wait for a command that requires the network, or for the GTP input stream to go idle (and by then, if you really want to, you could still do it in a mode that blocks the main thread). Getting network loading and OpenCL tuning out of the initial startup phase would be a big win, IMO.
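The arrangement discussed above can be sketched like this (the GTP command names are real; the Engine class and its internals are made up for illustration): loading runs in a background thread, and only a command that actually needs the network blocks on it.

```python
import threading
import time

class Engine:
    """Illustrative sketch: answer cheap GTP commands immediately,
    block only when a network-dependent command arrives."""
    def __init__(self):
        self._net = None
        self._loaded = threading.Event()
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        time.sleep(0.1)       # stand-in for slow weight loading / OpenCL tuning
        self._net = object()  # stand-in for the loaded network
        self._loaded.set()

    def handle(self, cmd):
        if cmd in ("protocol_version", "list_commands", "name",
                   "version", "time_left"):
            return "= ok"     # no network needed; reply instantly
        if cmd == "genmove":
            self._loaded.wait()   # block here, and only here
            return "= d4"
        return "? unknown command"
```

With this shape, time_left is processed the moment it arrives, so the clock-skew problem described upthread goes away without changing the blocking behaviour of genmove.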
Long-overdue update. I got stalled looking for a good, modern, fully featured JSON library to bring into the project. So I took some time and am making a contribution to nlohmann/json to add CBOR binary field support to the library. You can follow that here: nlohmann/json#1662. Once that is complete, I'll begin working on this a bit more earnestly.
As mentioned in #1740, there are a number of things that @gcp wants to hit when rethinking how network weights are done and implementing a binary format.
process_bn_var might be better done at writing time.