
Question about distributed learning #1

Open
aTan-aka-Xellos opened this issue Oct 25, 2017 · 117 comments

@aTan-aka-Xellos

Hello!
Since teaching the network requires a huge amount of computing resources, would it be possible to create some distributed system, where everyone who is willing can join and contribute their machine's resources?
According to their paper, 40 days were needed to surpass the previous version, but reaching the AlphaGo Lee level took only 3 days. So it seems less than 150 years is needed :D
If 10-100 people contribute machine resources, it seems possible to make some progress. Donations might also be an option, to rent a powerful GPU-based server (this should be even simpler, since there is no need to build any distributed system).
Sorry if the question is silly :)

Could you estimate whether it is possible to achieve any tangible progress by training this network on a GTX 1080? (By tangible I mean reaching at least low dan level within a month, for example.)

Thank you!

@gcp
Member

gcp commented Oct 25, 2017

Yes, we are working on the server portion of the distributed system. It is probably possible to get very decent performance with 10-100 people in a few months, or less. The idea is also to start with an even smaller network to see if the system works correctly, and where we end up strength-wise. I will post more info here and on the computer-go mailinglist when we have something ready to test.

Aside from the server you need a small self-play GTP client (I have this ready and will upload it to a separate GitHub repo soon) and a script to fetch the network weights from the server and upload the games back. The code in this repo also needs 2 minimal tweaks to include more randomness in the games.
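To make the shape of that client concrete, here is a minimal sketch (not the announced tool, which wasn't public yet; the endpoint URLs, file names, and the `selfplay-client` command are placeholders) of a fetch/play/upload loop using libcurl:

```cpp
#include <curl/curl.h>
#include <cstdio>
#include <cstdlib>

// Placeholder endpoints -- the real server wasn't public at this point.
static const char* kWeightsUrl = "https://example.org/best-network";
static const char* kSubmitUrl  = "https://example.org/submit-game";

static size_t to_file(void* ptr, size_t size, size_t nmemb, void* fp) {
    return fwrite(ptr, size, nmemb, static_cast<FILE*>(fp));
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    // 1) Fetch the current best network weights.
    FILE* weights = fopen("weights.txt", "wb");
    curl_easy_setopt(curl, CURLOPT_URL, kWeightsUrl);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, to_file);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, weights);
    curl_easy_perform(curl);
    fclose(weights);

    // 2) Let the self-play GTP client produce a game with those weights
    //    (hypothetical command; the real tool's invocation will differ).
    std::system("./selfplay-client --weights weights.txt --out game.sgf");

    // 3) Upload the finished game as a multipart POST.
    curl_mime* form = curl_mime_init(curl);
    curl_mimepart* part = curl_mime_addpart(form);
    curl_mime_name(part, "sgf");
    curl_mime_filedata(part, "game.sgf");
    curl_easy_setopt(curl, CURLOPT_URL, kSubmitUrl);
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, form);
    curl_easy_perform(curl);

    curl_mime_free(form);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```

A real client would loop forever, verify the downloaded weights against a hash, and retry failed uploads; this only shows the three steps in order.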

Low dan level should be achievable, I think. You only need a small network for that (~6 residual layers or so). The supervised network that is available as an example probably already reaches ~1 dan level on a GTX 1080, and it was only trained for 2 days on a single GPU. (Of course, I cheated by not bootstrapping from self-play)

Despite the bad news that replicating the final results will literally take ages, the good news is that initial progress is very quick.

@aTan-aka-Xellos
Author

@gcp thank you for the fast answer!
Could you also clarify: do you plan to provide a compiled build, since not everyone is familiar with C++ :) Or did I miss a link somewhere?

@gcp
Member

gcp commented Oct 25, 2017

Yes, sure. I wonder if there's some way to do it automatically on GitHub.

@timuzhti

You can distribute binaries using GitHub Releases. Not exactly automatic, but I think that's the easiest way. I wonder how many people will be willing to donate compute time compared to something like Fishtest.

@gafferongames

Why distribute the training when Google provides access to TPU-based processors through their cloud? You have access to the same hardware that Google used to train AlphaGo... just start a Kickstarter or a Patreon to cover the cost of training...

@apetresc
Contributor

apetresc commented Oct 25, 2017

@gafferongames Cloud TPUs are still in "preview" mode, i.e. they reach out to you if you're granted access, but as far as I know nobody who applied through the public sign-up sheet has actually gotten in yet. Nor have they given any details about Cloud TPU pricing, beyond the 1,000 free ones they're going to donate to certain academic groups.

Unless I've missed some announcement somewhere?

@gcp
Member

gcp commented Oct 25, 2017

Various reasons:

  1. The costs ramp up rather quickly. DeepMind's CEO estimated that they used about $25 million in computing resources. I think people are more comfortable lending their machine than they are lending their money. In any case I have zero interest in asking for people's money.
  2. I can't even get a GCP account. It's not available to European individuals (only businesses), or my account is wedged in an indeterminate state and it's impossible to get help or get it fixed. (I understand such experiences with Google Customer Support are not rare, and a reason not to depend on Google infrastructure) Amazon EC2 GPU instances aren't very cost effective and relatively slow.
  3. My understanding is that using the TPUs requires using TensorFlow. I'd rather have a program that can run with minimal setup on anyone's machine. Anyone is free to write a TensorFlow version of this, run thousands of instances on the Google Cloud at their own cost, and submit the data. I am actually hoping people will write optimized versions of this for specific hardware, which is why I specified the data formats.

@gcp
Member

gcp commented Oct 26, 2017

Small update here: the missing randomization parts are implemented. I managed to build a Windows binary (probably required if we're hoping many people join!).

Still needed: Finishing up the self-play tool, and then beta-testing the server setup.

I am testing different training configurations with a human database (supervised learning) to get some idea for the parameters (learning rate, decay, value vs policy weighting), the performance on normal GPUs, and to be able to optimize the search parameters (UCT coeff etc) in leela-zero.
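For reference, the "value vs policy weighting" refers to the relative weight of the two terms in the AlphaGo Zero loss function, which the paper gives as

l = (z − v)² − πᵀ log p + c‖θ‖²

where (p, v) are the network's policy and value outputs, π are the search probabilities, z is the game outcome, and c is the L2 regularization weight.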

@aTan-aka-Xellos
Author

Could https://github.com/yenw/computer-go-dataset be useful as an additional database? It contains thousands of FineArt/DeepZen/CGI games against pros and top amateurs.

@gcp
Member

gcp commented Oct 26, 2017

Only for calibrating with a supervised learning dataset. I already have a database of comparable size, but thanks for the link. (AlphaGo Zero's paper used a smaller dataset of only KGS games, so the size shouldn't be an issue)

@xelxebar

I'm jumping in the middle of a conversation here, but would it be feasible to write a BOINC application to do the distribution for us?

BOINC is essentially a portal of distributed computation projects and makes it easy to contribute computer cycles to whatever you find personally interesting. It houses projects like SETI@Home, CERN's physics stuff, prime number searches, etc. I think a lot of non-Go players would even be interested in contributing.

@FlorinAndrei

FlorinAndrei commented Oct 30, 2017

Once distributed training is ready to go, I'll donate some time on my Titan X.

What might help would be binary releases once in a while: Win64, whatever's the current latest Ubuntu LTS, etc. Static linking whenever possible would simplify things even further. This would attract a wider base of users.

@Ka-zam

Ka-zam commented Nov 1, 2017

I'll donate time on my GTX1080 for this.

gcp referenced this issue Nov 2, 2017
At the start of the learning, the network can't score games correctly so
it certainly can't make reasonable resignation decisions. Resigning can
only be enabled if training has progressed somewhat.
@yoonws

yoonws commented Nov 3, 2017

I will also donate time on my GTX 1080. I am a student of Go in Korea, and my dream is to become a professional Go player.
I really like board games and Leela.

I am purely curious about the distributed learning system. If distributed learning is possible, there are many students of Go in Korea who would want to participate.

Is it possible to train the network on separate computers without a real-time network connection?

For example, if we run the learning algorithm on a single computer, it may build a small network.
If we adjust those networks, it may reduce the biased error that comes from the choice of training set.

How would that be possible?

@gcp
Member

gcp commented Nov 3, 2017

Is it possible to train the network on separate computers without a real-time network connection?

In theory, yes, as long as you can download a new network periodically and upload the data periodically. Depending on the delay in updating, you'll have a slower learning-feedback loop, though, so it's not ideal. And the client would have to be written to deal with this, which certainly isn't going to be true for the first versions.

If we adjust those networks

What would "adjust" even mean here?

@yoonws

yoonws commented Nov 3, 2017

What I meant was merging.
I thought we would divide the total moves for distributed learning.
For example, moves 1-20, moves 10-20.

The complexity of Go comes from the uncertainty of long move sequences. We can't predict a large number of moves ahead, even though we have learned a lot.

But with a well-trained value network, the story is different. We know who is doing well and who is doing badly without playing out the whole game.

We can decide who will win at any time with a well-trained value network.
That means we can divide the total number of moves for distributed learning.

If we look at local move ranges (1-20, 10-20), it becomes an easy game.

That is why I thought we were doing an adjustment. I thought we would merge small networks.

I am not familiar with programming. Sorry for the amateur-like reply.

@gcp
Member

gcp commented Nov 3, 2017

Sorry, but I completely fail to understand what you are proposing.

The whole point is to learn an accurate value network. If we already had one then most of the effort wouldn't be necessary. If the score is very much to one side, then the program can resign and terminate the game early (but there are caveats here - see the original paper which addresses them well).

Even so I don't understand what you would want to achieve by only looking at 10 moves from a total game, or how this would help distribution for clients that aren't connected to the internet, or anything really.

@gcp
Member

gcp commented Nov 3, 2017

There is no "merging" of networks. Every client plays with the full, best-so-far network. The only thing that is "merged" is the training data, which consists of the network inputs, the search outcome and the game outcome sampled over many games. (The latter is described in the README "Training data format")
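A minimal sketch of what that amounts to in code (the struct fields summarize the README's description; the exact plane counts and encodings are in the README's "Training data format" section):

```cpp
#include <array>
#include <vector>

// One training example: the position as the network sees it, the move
// distribution produced by the search, and the final game result.
struct TrainingExample {
    std::vector<float> input_planes;  // network inputs for one position
    std::array<float, 362> policy;    // search visit distribution (361 moves + pass)
    float outcome;                    // +1 if the player to move won, -1 otherwise
};

// "Merging" training data from many clients is plain concatenation;
// no network weights are ever combined.
std::vector<TrainingExample> merge(
        const std::vector<std::vector<TrainingExample>>& per_client) {
    std::vector<TrainingExample> pool;
    for (const auto& examples : per_client) {
        pool.insert(pool.end(), examples.begin(), examples.end());
    }
    return pool;
}
```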

@wensdong

wensdong commented Nov 5, 2017

DeepMind claims that they trained for only 3 days on 4 TPUs to reach a level comparable to the old AlphaGo. Nvidia claims their Volta GPUs probably have better performance than the TPU. Can anyone explain the discrepancy, i.e. why 1700 years? That is the difference between a normal PC and a quantum computer.

@ssj-gz

ssj-gz commented Nov 5, 2017

@wensdong As far as I can tell, this is something that has been misreported by the media: when playing a game (e.g. against AlphaGo Lee), AlphaGo Zero does indeed use a single machine with 4 TPUs, but when generating the self-play games for training, it is practically certain that they use many more machines: generating 29 million self-play games on a single machine would be pretty tough :) As far as I'm aware, there's no data on the hardware used for self-play.

Edit:

@gcp I was wondering - do you intend to release the raw self-play games that you collect, or just the generated weights?

@gcp
Member

gcp commented Nov 5, 2017

The idea is that all data stays open and public domain.

@gcp
Member

gcp commented Nov 5, 2017

@wensdong See the post to computer-go that is linked from the README, where this is all already explained. It is simply not possible to generate the required data with only that hardware, and nowhere did DeepMind claim that. They only talked about the playing machine (used when the end result is already available!) and the learning machine (64 GPUs), which is a small part of the required learning pipeline (it doesn't generate the actual games!).

@leehiufung911

Is the server done yet? If/when it's done, I'd gladly let this train on my GTX 1080 while I'm sleeping.

@gcp
Member

gcp commented Nov 7, 2017

Updates:

  1. I managed to get good convergence (both on policy and value) on a small supervised network, and it's uploaded to the location linked in the README. Combined with a good GPU, it makes Leela Zero already play quite reasonably. I will use it to tune the search parameters (required for the learning procedure), and its strength will be a target for the reinforcement learning (self-learning) to try to surpass. Having a good network also made it possible to find and fix some bugs in corner cases.

  2. I've rewritten my learning code in TensorFlow and open-sourced it. With what's in the repo you can easily (if you have a working TensorFlow install...) train a new supervised network, and adapting it for the reinforcement learning with the server is trivial. The only thing missing is the automatic generation of symmetries to get more learning data (and maybe some optimization). Rewriting the code also flushed out a few bugs and missing details in the format specification.

  3. Amazon has recently added NVIDIA Volta V100s to AWS, under the p3.* instances. Unlike TPUs and the Google Cloud, they are available now, even to me, and they perform well for reasonable prices. I still don't plan to spend a lot of money renting thousands of instances, but they could come in very handy if we risk getting bottlenecked on the learning instead of the self-play.

With (2) now finished, it's time to do the final tweaks and start putting autogtp and leela-zero into a package that interacts with the server.

@gcp gcp added the question label Nov 8, 2017
@gcp
Member

gcp commented Nov 10, 2017

Updates:

  1. There's a feature-complete autogtp in the repository, but it'll need some small optimizations before being deployed more widely. Right now it downloads the network before every game; the server needs an extra endpoint so autogtp can check whether there's been an update without actually downloading it. Right now I'm using it for testing and verifying that we can get SGF and training data out of the db correctly.

  2. Tuning for the search parameters is ongoing.

  3. Given that the first 25k or so games are played by a random neural network, there's no point in actually using the GPU to play them out. I will probably hack up a Leela that just plays randomly and seed those myself, and start the distributed effort from epoch 2. (The thing the network learns from the first batch of games is who has won in a final position, i.e. how to count.)

  4. It's time to start estimating what size of network is feasible. I'm thinking of doing the first run with 6 residual layers of 128 filters, if it looks like that should progress at a fair speed (it's about 1/13th of the small Zero network, and 1/26th of the big one). This is about the size regular Leela uses when running on a CPU, and a fairly sweet spot for diminishing returns with supervised learning, so it would be interesting to see where Zero ends up. For the playouts, something like 1000? I'm not sure how the DeepMind guys selected 1600. Make this too big and it's slow; make it too small and the search doesn't have time to "discover" things the neural network doesn't know. It should at least be able to look at all the root moves, and explore the best ones for a few extra ply.
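(For rough scale on that last point: a 19×19 board has 361 legal first moves, so even if the search visited every root move once, 1000 playouts would leave only 1000 − 361 ≈ 639 visits to extend the most promising candidates a few extra ply. In practice UCT concentrates visits on a few moves much earlier, but this shows why the playout count can't be made arbitrarily small.)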

@marcocalignano
Member

I was looking at the code and I have a question:
why do you need two processes to play against each other? One process can receive the command 'genmove black' and after that 'genmove white'.
Do you have a problem with the evaluation log for both players being in one game?

@gcp
Member

gcp commented Nov 10, 2017

Good question! I need a controller program anyway to handle the HTTP networking, unzipping, etc., and I already had some code to autoplay 2 GTP programs, which I adapted. It is true that because the distributed program uses the same network for both players, it could play against itself in a single process. It's probably pretty easy to adapt autogtp to ditch the second process.
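A minimal POSIX sketch of that single-process idea (not the real autogtp, which as described above drives two engine processes; the `./leelaz --gtp` invocation and the move cap are illustrative):

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <string>

// Spawn one GTP engine and alternate "genmove b" / "genmove w" so the
// same network plays both sides of the game.
int main() {
    int to_engine[2], from_engine[2];
    if (pipe(to_engine) || pipe(from_engine)) return 1;

    if (fork() == 0) {                    // child: become the engine
        dup2(to_engine[0], STDIN_FILENO);
        dup2(from_engine[1], STDOUT_FILENO);
        execlp("./leelaz", "leelaz", "--gtp", (char*)nullptr);
        _exit(1);                         // exec failed
    }
    close(to_engine[0]);
    close(from_engine[1]);
    FILE* in  = fdopen(to_engine[1], "w");
    FILE* out = fdopen(from_engine[0], "r");

    const char* color[2] = {"b", "w"};
    char line[512];
    for (int move = 0; move < 722; move++) {   // crude safety cap
        std::fprintf(in, "genmove %s\n", color[move % 2]);
        std::fflush(in);
        std::string vertex;
        // A GTP success response is "= <vertex>" followed by a blank line.
        while (std::fgets(line, sizeof line, out)) {
            if (line[0] == '=') vertex = line + 2;
            if (line[0] == '\n') break;
        }
        if (vertex.find("resign") != std::string::npos) break;
        // A real controller would also stop after two consecutive passes
        // and record the moves as SGF/training data.
    }
    std::fprintf(in, "quit\n");
    std::fflush(in);
    wait(nullptr);
    return 0;
}
```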

@gcp
Member

gcp commented Nov 19, 2017

@marcocalignano I will test it tomorrow, I think by then I will have some 85k networks to test with it too.

@marcocalignano
Member

But should I go on with the multi-games option?
I'll also try to rebase on the 'next' branch from your repository.

@gcp
Member

gcp commented Nov 19, 2017

Yes, this looks good. Specifying the GPU id explicitly works, but it requires the user to understand how to launch Leela on the console and find the right numbers (no problem for my use, of course). I think for using this technique in autogtp it should take a simple count of the number of GPUs to use, and then just use --gpu 0, --gpu 1, etc. As said, I will change leela-zero so it numbers the GPUs in order of preference.

@marcocalignano
Member

OK, I am ready to test multi-games mode. What do I have to pull to get the latest client?

@marcocalignano
Member

Game data 852c24aaf81ef708dc773157e298c0a296d6c9e4ab9ba8d9ceddaeb0686ded50 stored in database

Game data 661291685827292130b0d566cd76cb47c65a593fd4c8469bee3e495de565cf06 stored in database

Can you check whether these two games are OK?

@marcocalignano
Member

@gcp did you check the previous games?

@gcp
Member

gcp commented Nov 20, 2017

They arrived fine in the db, i.e.:

"(;GM[1]FF[4]RU[Chinese]DT[2017-11-19]SZ[19]KM[7.5]PB[Leela Zero 0.3]PW[Human]RE[B+11.5]\n\n;B[oa];W[md];B[ch];W[dk];B[mf];W[nh];B[sc];W[mg];B[di];W[jf]\n;B[le];W[mi];B[ob];W[qo];B[gb];W[fc];B[lb];W[eq];B[sh];W[ne]\n;B[rm];W[bg];B[ao];W[pr];B[hs];W[el];B[ek];W[pj];B[gn];W[ab]\n;B[ib];W[pq];B[jh];W[bd];B[fn];W[ka];B[la];W[fg];B[hh];W[ba]\n;B[ci];W[dl];B[qi];W[ca];B[pl];W[nl];B[eb];W[ai];B[lj];W[hp]\n;B[rn];W[em];B[pf];W[lh];B[jp];W[fh];B[ii];W[bf];B[dc];W[db]\n;B[jm];W[hj];B[dd];W[of];B[ll];W[gg];B[lf];W[he];B[on];W[om]\n;B[nf];W[fk];B[lr];W[mb];B[oq];W[ed];B[fe];W[en];B[ki];W[ld]\n;B[fp];W[fo];B[dg];W[ee];B[ef];W[aa];B[qg];W[il];B[eh];W[af]\n;B[lm];W[cj];B[gr];W[gi];B[bj];W[me];B[im];W[kj];B[cc];W[bc]\n;B[sr];W[lq];B[jj];W[po];B[nc];W[ic];B[qs];W[ck];B[ss];W[as]\n;B[ep];W[sg];B[ec];W[kk];B[fi];W[hm];B[pm];W[io];B[gc];W[cs]\n;B[bi];W[gd];B[se];W[gm];B[fs];W[fj];B[lg];W[hd];B[kq];W[mr]\n;B[ke];W[ge];B[ha];W[dm];B[jl];W[jg];B[cf];W[gj];B[oi];W[oh]\n;B[nd];W[ql];B[ds];W[rl];B[je];W[fd];B[rg];W[bm];B[rc];W[cg]\n;B[sd];W[eg];B[sq];W[hc];B[pn];W[df];B[hi];W[ni];B[ml];W[ik]\n;B[nk];W[cd];B[ri];W[hf];B[gh];W[fb];B[bk];W[if];B[kc];W[co]\n;B[fl];W[re];B[og];W[lc];B[qh];W[jo];B[dr];W[gp];B[hg];W[go]\n;B[kg];W[lk];B[nn];W[ei];B[id];W[dh];B[gl];W[eh];B[ej];W[dq]\n;B[kh];W[jk];B[oe];W[hq];B[oo];W[am];B[mo];W[dj];B[cm];W[dg]\n;B[cl];W[kb];B[li];W[tt];B[cp];W[ph];B[mc];W[ra];B[kp];W[rf]\n;B[qp];W[jc];B[oj];W[eo];B[ji];W[ps];B[cn];W[jb];B[ip];W[mm]\n;B[sf];W[mj];B[gk];W[od];B[ej];W[cq];B[kr];W[dn];B[ag];W[pd]\n;B[nm];W[qk];B[kd];W[ek];B[mq];W[lc];B[qf];W[ea];B[js];W[md]\n;B[ir];W[ma];B[rj];W[hn];B[br];W[gf];B[ld];W[me];B[ja];W[fm]\n;B[bn];W[lp];B[pk];W[kf];B[ia];W[hk];B[op];W[mh];B[mk];W[hl]\n;B[bq];W[mp];B[ls];W[qd];B[gk];W[rd];B[is];W[bl];B[pc];W[es]\n;B[qa];W[de];B[ne];W[jn];B[km];W[kn];B[bp];W[md];B[gq];W[er]\n;B[lb];W[al];B[rr];W[la];B[pe];W[pg];B[fr];W[lo];B[ns];W[lc]\n;B[mn];W[rq];B[qm];W[ro];B[nq];W[fa];B[ol];W[ce];B[np];W[ho]\n;B[bs];W[ij];B[ac];W[dp];B[mm];W[jr];B[ng];W[fq];B[fp];W[aj]\n;B[pa];W[rk];B[qr];W[do];B[an];W[ej];B[pi];W[cr];B[iq];W[ln]\n;B[ak];W[cf];B[ms];W[bm];B[hr];W[qe];B[jq];W[sp];B[so];W[hb]\n;B[qj];W[in];B[bh];W[fl];B[si];W[fn];B[of];W[gl];B[oc];W[ap]\n;B[am];W[fi];B[nb];W[ig];B[ad];W[ga];B[om];W[gc];B[ko];W[ja]\n;B[ib];W[gb];B[ks];W[sa];B[nr];W[gn];B[nj];W[ph];B[bl];W[mj]\n;B[qc];W[rd];B[pg];W[sj];B[oh];W[qd];B[aq];W[rf];B[ph];W[bb]\n;B[mi];W[cb];B[lh];W[pd];B[cc];W[ie];B[rp];W[ha];B[sk];W[pp]\n;B[ni];W[jd];B[jr];W[gk];B[ap];W[ff];B[kl];W[da];B[mj];W[lb]\n;B[qe];W[mh];B[qq];W[ef];B[rs];W[ia];B[ae];W[ib];B[nh];W[ah]\n;B[ds];W[dc];B[mr];W[cc];B[qn];W[be];B[sl];W[ae];B[qk];W[dd]\n;B[sj];W[ad];B[rk];W[pb];B[ql];W[ec];B[tt];W[ac];B[rl];W[rb]\n;B[no];W[os];B[tt];W[dr];B[nl];W[id];B[mg];W[ag];B[rh];W[ep]\n;B[or];W[qo];B[bo];W[pr];B[pp];W[os];B[mh];W[ih];B[tt];W[po]\n;B[me];W[qb];B[tt];W[sn];B[al];W[fp];B[od];W[tt];B[tt])

@marcocalignano
Member

OK, I did them with the multi-games version, with all the improvement merges that came in ;) I have to keep rebasing, but I'd like to do a pull request, maybe on the 'next' branch.

@kimitsu

kimitsu commented Nov 22, 2017

Question: can measures be implemented to prevent simple attacks on Leela, like checking some statistical characteristics of the input data per user? My fear is that, later on or even right now, someone could spam the training data with noise.

@ghost

ghost commented Nov 22, 2017

Currently, since Leela Zero uses two threads, the results aren't deterministic, so confirming results isn't really possible.

@OmnipotentEntity
Contributor

You can give each thread its own random engine and deterministically split work, so it's definitely possible. However, you'd need to implement your own distribution functions in order to make it replicable across platforms.

It would also require server support, because the server would need to select a random engine initialization sequence for each thread and each run.

I have quite a bit of experience in making reproducible pseudorandom results across platforms with multithreaded execution, and if you need I can take a look at the code and see what I can do.
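A minimal sketch of the per-thread-engine idea (the seed-mixing constant, thread count, and number of draws are illustrative):

```cpp
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Each worker derives its own generator from (run_seed, thread_id), so
// the same seed reproduces every thread's stream regardless of how the
// OS schedules the threads.
int main() {
    const std::uint64_t run_seed = 0xC0FFEEULL;   // e.g. chosen by the server
    const int num_threads = 2;

    std::vector<std::vector<std::uint64_t>> draws(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; t++) {
        workers.emplace_back([&draws, run_seed, t] {
            std::mt19937_64 engine(run_seed ^ (0x9E3779B97F4A7C15ULL * (t + 1)));
            for (int i = 0; i < 4; i++) {
                draws[t].push_back(engine());     // deterministic per thread
            }
        });
    }
    for (auto& w : workers) w.join();
    // Caveat matching the comment above: std::mt19937_64 is bit-exact
    // across platforms, but the standard *distributions* are not, so
    // replicable runs need hand-rolled distribution functions.
    return 0;
}
```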

@gcp
Member

gcp commented Nov 22, 2017

The problem isn't the random number generators, the problem is that the threads are used to feed hardware that is used by other processes.

And the whole setup of fast parallel game tree searchers generally means that "deterministically split work" makes the algorithm run much slower than optimal, because the whole point is to split at the most opportune moment. (Work stealing algorithms tend to rule the roost.)

Parallel game tree search is a 35+ year old research topic; the odds of making a breakthrough in a few afternoons aren't that good.

@OmnipotentEntity
Contributor

I'm not certain that work stealing algorithms necessarily preclude reproducibility. This might just be something that no one's tried, rather than a breakthrough.

You're right that the approach in my previous post is completely wrong. (Because this is a tree search, whereas my previous work had been on a 2d plane.)

However, we can shift the problem like this (assuming we need at most x bits of random numbers per node visit): we don't know which nodes we will visit, so we need a way of numbering nodes; if we come up with one, we can simply bin a bunch of random numbers and pull from the proper area of the array. There's a rub, though: coming up with a method of numbering nodes such that we don't need to bin gigabytes of RNG, given how explosively huge the search space is.

If it is possible to order nodes by their desired visit order deterministically (i.e., like an A* search), then the scheme will work. Otherwise a small and simple hashing scheme might be doable (think 16-24 bits), but it would probably ultimately screw up the randomness and likely lead to worse training data due to hash collisions reusing RNG data (the birthday paradox says about 1% collisions at 16 bits, i.e. about 10 per move, and 6e-3% at 24 bits, i.e. about 10 per game, but this requires generating 64x kilobits and 16x megabits of RNG engine output per move, respectively).

Another possible scheme is some sort of (cryptographic?) hash function that takes a naive numbering scheme (such as breadth-first numbering, which would require GMP and can be expensive) and a single random number from the engine, and uses the hash as the RNG bits (if x is more than the hash length, it's possible to generate more hashes by appending sequential bytes to the end of the message). But again, there are some question marks around just how good the random numbers you get would be (though this is probably fine).
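As a minimal sketch of that last idea (splitmix64 stands in for the hash; node_id is assumed to come from some deterministic numbering, e.g. a Zobrist-style hash of the path from the root):

```cpp
#include <cstdint>

// Cheap stateless mixer; a cryptographic hash would trade speed for
// stronger statistical guarantees, as discussed above.
static std::uint64_t splitmix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Returns the i-th 64-bit random word for a given node, reproducibly:
// the same (run_seed, node_id, i) always yields the same bits, with no
// pre-binned table of RNG output.
std::uint64_t node_random(std::uint64_t run_seed, std::uint64_t node_id,
                          std::uint64_t i) {
    return splitmix64(run_seed ^ splitmix64(node_id ^ splitmix64(i)));
}
```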

Honestly, I should actually read the paper so I know what the hell I'm talking about. So if I'm that far off base, just tell me to RtFP. And I totally understand if all this complexity isn't really warranted, as the probability of bad actors is rather low.

@gcp
Member

gcp commented Nov 22, 2017

I don't want to ruin your fun in exploring that topic, but as for the problem at hand, it's just easier to run 2 instances with 1 thread, if we worry about bad actors.

The multi-threading is important for playing real games with a clock - where running a 2nd instance does nothing useful - but for just generating the data it is not needed. After removing multi-threading, I suspect making the engine reproducible is not that difficult, although the floating point accuracy of GPUs (how stable are those?) may be an issue.

Requiring independently verified results cuts the throughput in half, but that seems to be how BOINC works. I think it may be required if a larger, longer-term run, on the order of the full AlphaGo Zero network size, is ever done.

@OmnipotentEntity
Contributor

OmnipotentEntity commented Nov 22, 2017

It's not just GPUs. CPUs also have floating point reproducibility issues. For instance, the RSQRTPS instruction is only required to give an estimate, and on Haswell CPUs (at least) the lower 12 bits of the mantissa are always all zeros:

rsqrt(1.162357688 (3F94C823)) = precise 0.927534580 (3F6D72E8) | rsqrtps 0.927490234 (3F6D7000)
rsqrt(1.745346546 (3FDF6784)) = precise 0.756936014 (3F41C68F) | rsqrtps 0.756835938 (3F41C000)
rsqrt(1.880780697 (3FF0BD6C)) = precise 0.729173601 (3F3AAB1F) | rsqrtps 0.729248047 (3F3AB000)
rsqrt(1.839437246 (3FEB72AE)) = precise 0.737322569 (3F3CC12C) | rsqrtps 0.737304688 (3F3CC000)
rsqrt(1.323228836 (3FA95F90)) = precise 0.869325697 (3F5E8C21) | rsqrtps 0.869506836 (3F5E9800)
rsqrt(1.288511395 (3FA4EDF1)) = precise 0.880959332 (3F61868D) | rsqrtps 0.880981445 (3F618800)
rsqrt(1.943412185 (3FF8C1BB)) = precise 0.717327595 (3F37A2C8) | rsqrtps 0.717285156 (3F37A000)
rsqrt(1.910214782 (3FF481EB)) = precise 0.723533928 (3F393985) | rsqrtps 0.723510742 (3F393800)
rsqrt(1.700642347 (3FD9AEA6)) = precise 0.766820133 (3F444E53) | rsqrtps 0.766845703 (3F445000)
rsqrt(1.856116772 (3FED953C)) = precise 0.734002173 (3F3BE791) | rsqrtps 0.734008789 (3F3BE800)
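For anyone wanting to check their own hardware, a minimal sketch of how such a table can be generated (the inputs here are random, so the exact rows above won't be reproduced):

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstdio>
#include <cstring>
#include <random>

// Prints a float's raw bit pattern, as in the table above.
static unsigned bits(float f) {
    unsigned u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

int main() {
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(1.0f, 2.0f);
    for (int i = 0; i < 10; i++) {
        float x = dist(gen);
        float precise = 1.0f / std::sqrt(x);                      // full-precision result
        float est = _mm_cvtss_f32(_mm_rsqrt_ps(_mm_set1_ps(x)));  // RSQRTPS estimate
        std::printf("rsqrt(%.9f (%08X)) = precise %.9f (%08X) | rsqrtps %.9f (%08X)\n",
                    x, bits(x), precise, bits(precise), est, bits(est));
    }
    return 0;
}
```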

@hot-bit-git

Hi, do you still consider launching a BOINC project for Leela?
