
Training on TPU #89

Open
Dhanachandra opened this issue Jan 12, 2022 · 1 comment

@Dhanachandra

How can I train GPT2-xl on a TPU? Which TPU can be used for training, and how much RAM would be needed?

@Noah-Huppert

I'm not 100% sure, because I ditched my TPU efforts before I got training working (TPUs ended up being way too expensive, and during my dev work I was on a VM that was far too small, so training kept failing due to OOM errors on the VM). But I think that if you put the following code before the tf.Session() is created in train.py, it will connect to a TPU:

import tensorflow as tf

# Resolve the TPU node by name, connect the TF runtime to it, and
# initialize the TPU system before any session is created.
tpu_resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="<TPU NODE NAME HERE>")
tf.config.experimental_connect_to_cluster(tpu_resolver)
tf.tpu.experimental.initialize_tpu_system(tpu_resolver)
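
I haven't verified this end to end, but in a TF 1.x script like train.py you'd then usually also hand the resolver's master address to the session so ops actually run on the TPU worker; a minimal sketch of that step (the smoke-test op is only illustrative):

# Continuing from the snippet above: point the session at the TPU worker
# instead of the default local target.
with tf.Session(tpu_resolver.master()) as sess:
    # train.py's existing training loop would run here; this op is just a
    # connectivity smoke test.
    print(sess.run(tf.constant("TPU session reachable")))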
