Raw training data #167

Open
gcp opened this issue Nov 24, 2017 · 194 comments

Comments

@gcp
Member

gcp commented Nov 24, 2017

http://leela.online-go.com/training/

License: Public Domain
Hosting sponsored by OGS.

@zhu-jz

zhu-jz commented Nov 24, 2017

thank you

@Ttl
Member

Ttl commented Dec 4, 2017

I have serious problems downloading the files. I have tried to download 92c658d7.zip several times, and each time the download is interrupted after a few minutes. I tried both Firefox and Chrome, but neither seems able to download it. Neither browser is able to resume the download, so I must restart from the beginning after each failure. The same issue happened earlier with the other files, but I was able to download them after retrying a few times.

@Marcin1960

Perhaps FTP would be better? There are nice freeware clients that will continue broken downloads and have various options.

@gcp
Member Author

gcp commented Dec 4, 2017

FTP is a far, far worse option as it's much less reliable through firewalls. HTTP supports resuming just as well. There's a reason FTP is not used anywhere any more.

Unfortunately I can't really do much here. They are on Google Cloud storage buckets. It's possible downloading them with gsutil allows resuming to work, but I'm not sure. I can try splitting up the data in smaller chunks, maybe.

@Prillan

Prillan commented Dec 4, 2017

You can use the -c option with wget to continue a download.
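
For reference, resuming over HTTP (which is what `wget -c` relies on) comes down to sending a `Range` header that starts at the size of the partial file. A minimal Python sketch, assuming the server honours range requests; the function names are hypothetical and this is not part of the project's tooling:

```python
import os
import shutil
import urllib.request


def build_resume_request(url, dest):
    """Return (request, bytes_already_on_disk) for continuing a download."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    # Ask only for the bytes we do not have yet.
    headers = {"Range": f"bytes={have}-"} if have else {}
    return urllib.request.Request(url, headers=headers), have


def resume_download(url, dest):
    """Continue a partial download, or start over if ranges are ignored."""
    request, have = build_resume_request(url, dest)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the Range header;
        # a plain 200 means it is sending the whole file again.
        mode = "ab" if have and response.status == 206 else "wb"
        with open(dest, mode) as out:
            shutil.copyfileobj(response, out)
```

Calling `resume_download(url, "92c658d7.zip")` again after an interruption would pick up where the partial file ends, provided the server replies with 206.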

@Marcin1960

Well, one can try Firefox Add-ons.

@gcp
Member Author

gcp commented Dec 4, 2017

Firefox (and Chrome and every browser) can resume HTTP downloads themselves. There must be something with Google's Cloud Platform that breaks HTTP resumes if it doesn't work.

@Marcin1960

https://cloud.google.com/compute/docs/instances/transfer-files

On Windows workstations, use the WinSCP client to manage files on your instances through a graphical file browser interface.

Use the gcloud command-line tool. The gcloud compute scp command provides SCP file transfer, creating a SSH key pair for you the first time you connect. Your private key is stored on your local device and its corresponding public key is copied to project or instance metadata.

Open an SFTP connection in your native file browser. On many Linux and macOS systems, the native file browser can connect to the home directory on your instance.

Use the SCP command-line tool. This command works similarly to the gcloud compute scp, but requires you to manually manage your SSH keys.

@marcocalignano
Member

I have a small question. I am running "dump_supervised all.sgf train.txt" on all.sgf, which has 900K games.
Is it normal that it has been running for more than 24 hours, has written more than 12 GB of data, and goes over the games over and over again? (271169740 positions, Writing chunk 16550.)

@gcp
Member Author

gcp commented Dec 26, 2017

It passes over the SGF multiple times, each time writing out one out of every x positions in a random rotation, so yes.
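
A rough sketch of that kind of multi-pass subsampling, to show why the output keeps growing: every pass keeps a different 1-in-x slice of the positions. This is not the actual `dump_supervised` code; the function name is hypothetical, and reading "random rotation" as a random number of 90-degree board turns is an assumption.

```python
import random


def multipass_sample(num_positions, every_x, passes, rng=None):
    """Yield (position_index, quarter_turns) pairs over several passes.

    Each pass starts at a random offset and keeps one position out of
    every `every_x`, paired with a random number of 90-degree turns.
    """
    rng = rng or random.Random()
    for _ in range(passes):
        offset = rng.randrange(every_x)
        for index in range(offset, num_positions, every_x):
            yield index, rng.randrange(4)


if __name__ == "__main__":
    samples = list(multipass_sample(num_positions=20, every_x=4, passes=3))
    print(len(samples), "samples written from 20 positions")
```

With enough passes, the number of samples written can approach or exceed the number of positions in the input, which matches the long run time described above.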

@isty2e

isty2e commented Dec 31, 2017

Over several attempts, I have consistently failed to download these files. The server doesn't allow me to resume, and the connection is really slow and unstable. Would it be a bad idea to create BitTorrent seeds for these?

@killerducky
Contributor

Have you tried "wget -c"? (I haven't tried it because I'm able to download the files in one shot.)

@zediir
Contributor

zediir commented Jan 1, 2018

I'm going to download them and create a torrent as I have around 200Mbps to ogs.

@isty2e

isty2e commented Jan 1, 2018

@killerducky The download is currently in progress, but it will take at least a week per file, so it will be some time before I have a clear answer.

@zediir
Contributor

zediir commented Jan 1, 2018

Here's the magnet link


@isty2e

isty2e commented Jan 1, 2018

@zediir Thank you for sharing. The BitTorrent client is trying to download it; hopefully it will be faster.

@zediir
Contributor

zediir commented Jan 1, 2018

@isty2e I'm also uploading it to my seedbox so that might make downloading faster. It'll take a while though as I'll have to upload 156GB.

@zediir
Contributor

zediir commented Jan 3, 2018

@isty2e Managed to get the seedbox working. Your download speed should be somewhat faster now.

@isty2e

isty2e commented Jan 3, 2018

@zediir Indeed it is! It was like 35KB/s before and it is much faster now.

@zediir
Contributor

zediir commented Jan 4, 2018

I'm going to keep uploading the training data to the seedbox, and I'll post an updated magnet link to a new torrent (by editing this post) whenever @gcp uploads a new batch of training data.

@roy7
Collaborator

roy7 commented Jan 4, 2018

@isty2e Is your download speed issue that you're in China? If @zediir doesn't already, maybe he or someone else out there would run a seedbox located in China. That might give the best result.

@zediir
Contributor

zediir commented Jan 4, 2018

Unfortunately, seedbox.io, which I use, only offers servers in the Netherlands, Romania and France. Though @isty2e seems to be downloading much faster (about 2MB/s I think) now that they are downloading from the seedbox.

@MartinVingerhoets

@zediir I downloaded it at 12MB/s (from the Netherlands)

@zediir
Contributor

zediir commented Jan 4, 2018

Ah that was you :). 12.5MB/s (100Mbps) is the limit for that particular seedbox.

@roy7
Collaborator

roy7 commented Jan 4, 2018

Perhaps @isty2e sets up his own Chinese seedbox to mirror off yours and then downloads from that one once it's all there. :) (Google tells me there are many Chinese seedbox companies.)

@isty2e

isty2e commented Jan 5, 2018

Well, I am not in China, and my network connection and bandwidth are more than fine. Strangely, they are exceptionally bad for OGS. Anyway, thanks to @zediir, the training data are all downloaded.

@bjiyxo

bjiyxo commented Apr 11, 2019

@gcp Will you upload training data this week?

@Ttl
Member

Ttl commented Apr 30, 2019

leela.online-go.com hasn't worked for a day. Gives error 522 "Connection timed out" for me.

@anoek
Contributor

anoek commented Apr 30, 2019

@Ttl: There seems to be an issue going on at the provider; I've opened a support ticket to get it resolved.

@gcp
Member Author

gcp commented May 2, 2019

Any progress? The old networks are also offline as they are hosted on the storage server, and that also means I can't clean up those on the live server...

@anoek
Contributor

anoek commented May 2, 2019 via email

@gcp
Member Author

gcp commented May 3, 2019

I started re-uploading old networks from my local backup. This is about 108G and it will take me about 2 days.

Re-uploading the >1TB of training data... well, you do the math. I will start uploading the new data and hope the old server eventually pops back up long enough so we can transfer from there.

I think we also had the 9x9 networks, the SGF files for matches and self-play on there (just mentioning this for my own reference).

Thanks for the help @anoek. The new server seems to be a lot more responsive too, so yay.
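
The math in question, as a rough sketch: it assumes the training data would upload at the same rate as the 108G-in-2-days figure above and treats 1 TB as 1000 GB. Both are assumptions, and the function name is hypothetical.

```python
GB_PER_TB = 1000  # decimal units (assumption)


def upload_days(size_gb, reference_gb=108, reference_days=2):
    """Estimate upload time by scaling from a known transfer."""
    rate_gb_per_day = reference_gb / reference_days
    return size_gb / rate_gb_per_day


if __name__ == "__main__":
    print(f"{upload_days(1 * GB_PER_TB):.1f} days for 1 TB")
```

At 54 GB per day, 1 TB works out to roughly 18.5 days, and the actual data set is larger than that.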

@anoek
Contributor

anoek commented May 3, 2019

Yeah I noticed it was a lot faster as well, so at least that's a win.

Terribly sorry about the problems, all. I've never had a service provider just drop a server on the floor and not even respond to any tickets before.

I'll let you know if they do manage to get it back up before you finish uploading everything so we can finish with a server to server transfer.

@gcp
Member Author

gcp commented May 5, 2019

Everything should be back up except for older training data.

@bubblesld

The file train_c9fb22c7.zip uploaded on May 3 is 7.4 GB.
The file with the same name is 4.6 GB on May 5.
Is there something wrong?

Are we going to get new training data this week?

@AncalagonX
Contributor

AncalagonX commented May 14, 2019

When someone has a chance, could we get new Tensorboard logs posted? Right now there aren't any LZ Tensorboard logs available for download.

@gcp
Member Author

gcp commented May 14, 2019

> Is there something wrong?

I'll redump and reupload this but that will take a while. Possibly the second dump was missing the part of the data that went out of the training window.

> Are we going to get new training data this week?

Uploading should finish in about 10 minutes.

@bubblesld

> > Is there something wrong?
>
> I'll redump and reupload this but that will take a while. Possibly the second dump was missing the part of the data that went out of the training window.

I have most of the old files. I can upload them if necessary.

> > Are we going to get new training data this week?
>
> Uploading should finish in about 10 minutes.

Thanks a lot.

@gcp
Member Author

gcp commented May 14, 2019

> When someone has a chance, could we get new Tensorboard logs posted? Right now there aren't any LZ Tensorboard logs available for download.

Up in https://sjeng.org/zero

@gcp
Member Author

gcp commented May 28, 2019

> the file train_c9fb22c7.zip uploaded on May 3 is of size 7.4gb.
> the same file name is of size 4.6gb on May 5.
> Is there something wrong?

This file was re-uploaded now.

@bubblesld

Can we also get an update of ac9bcd63? Judging from the file size, the one uploaded on August 1 does not seem to contain all the self-play games.

@gcp
Member Author

gcp commented Aug 20, 2019

Ok, will check.

@l1t1

l1t1 commented Aug 20, 2019

train_a4f5d99a.zip 1433662354 2019-Aug-19 09:06
train_1d9360fd.zip 6657558763 2019-Aug-19 09:04
train_657ae4dd.zip 658146911 2019-Aug-19 09:00
train_fe85a8e4.zip 5849475618 2019-Aug-19 08:59
train_16d9854c.zip 8094154573 2019-Aug-01 07:46
train_ac9bcd63.zip 7188541271 2019-Aug-01 07:35
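
To compare sizes across uploads, lines in this format can be parsed mechanically. A small sketch, assuming the whitespace-separated `name size date time` layout shown above and an English locale for month names; the helper names are hypothetical:

```python
from datetime import datetime
from typing import NamedTuple


class ListingEntry(NamedTuple):
    name: str
    size_bytes: int
    modified: datetime


def parse_listing_line(line):
    """Parse one 'name size date time' line from the directory listing."""
    name, size, date, time = line.split()
    modified = datetime.strptime(f"{date} {time}", "%Y-%b-%d %H:%M")
    return ListingEntry(name, int(size), modified)


if __name__ == "__main__":
    entry = parse_listing_line("train_ac9bcd63.zip 7188541271 2019-Aug-01 07:35")
    print(f"{entry.name}: {entry.size_bytes / 1e9:.2f} GB")
```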

@gcp
Member Author

gcp commented Aug 23, 2019

Redumped and re-uploaded ac9bcd63.

@bubblesld

> Redumped and re-uploaded ac9bcd63.

many thanks

@bubblesld

It looks like the self-play games from v241 (466fa23a) are missing in the update today?

@22nsuk

22nsuk commented Dec 2, 2019

@gcp When will it be updated? It's been a while since it was last updated.

@gcp
Member Author

gcp commented Apr 20, 2020

@anoek The server seems to have been down for a few days, can you investigate?

@anoek
Contributor

anoek commented Apr 20, 2020

@gcp Fixed. For some reason the VM was powered off; I'll investigate further. I guess I should have had a heartbeat check on that server.
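
A heartbeat check of the kind mentioned above could be as small as the following sketch. It only illustrates the idea; the function name and the `opener` parameter (there so the check can be exercised without a network) are hypothetical, and nothing here reflects how the server is actually monitored.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if `url` answers with a non-error HTTP status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Run periodically from a machine other than the one being checked, a `False` result could trigger an email or similar alert.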

@gcp
Member Author

gcp commented Apr 20, 2020

Thanks! Will try to upload missing data overnight.

@gcp
Member Author

gcp commented Mar 3, 2021

I just finished re-uploading all the files that were lost in one of the server moves/outages. So all the training data is complete now. It will stay up as long as @anoek keeps hosting it.

@VivianoRiccardo

Do these training data also include the output of the value function for each board position?

@Vandertic

> Do these training data show also the output of the Value function of a board position?

I don't think so. It should be just the input position (8 moves), the color of the current player, the outcome of the game, and the visit distribution.
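
As an in-memory illustration of those four fields (not the on-disk file format, which is not described in this thread), one record could be modelled like this. The class name, field names, encodings, and the 19x19 board size are all assumptions:

```python
from dataclasses import dataclass
from typing import List

BOARD_POINTS = 19 * 19          # 19x19 board (assumption)
NUM_MOVES = BOARD_POINTS + 1    # every intersection plus pass (assumption)


@dataclass
class TrainingPosition:
    history: List[List[int]]  # 8 most recent boards, one flat list each
    to_move: int              # 0 = black, 1 = white (hypothetical encoding)
    outcome: int              # +1 if the player to move won, -1 otherwise
    visits: List[float]       # search visit distribution over moves

    def validate(self):
        """Raise ValueError if the record does not have the expected shape."""
        if len(self.history) != 8:
            raise ValueError("expected 8 history boards")
        if any(len(board) != BOARD_POINTS for board in self.history):
            raise ValueError("each board needs one entry per intersection")
        if self.to_move not in (0, 1) or self.outcome not in (-1, 1):
            raise ValueError("bad to_move or outcome")
        if len(self.visits) != NUM_MOVES:
            raise ValueError("expected one visit entry per move")
```

Note that there is no field for a per-position value estimate; the only value-like signal is the final game outcome.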
