How to pretrain a CTRL model from scratch? #29

Closed
xurongqiang opened this issue Sep 26, 2019 · 5 comments

Comments

@xurongqiang

We want to pretrain a CTRL model from scratch. Could you provide some implementation details?
What is the format of the training samples, and can the whole training process be run with the training.py script?

@keskarnitish
Contributor

The training_utils folder should be a great starting point here.
Collect your data (Wikipedia/Reddit/News/etc.), convert to TFRecords with the appropriate control codes, transfer the data to GCS if using TPUs, and then train. We used Adagrad with a linear warmup and no learning rate decay. With TPUs, you can spawn a cloud TPU pod and the Estimator will take care of all data parallelism.
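
The schedule described above (linear warmup, then no decay) can be sketched in plain Python. This is only an illustration of the shape of the schedule, not the repository's code: the function name, `base_lr`, and `warmup_steps` are assumptions, and the values used are placeholders rather than the settings used for CTRL.

```python
def warmup_then_constant(step, base_lr, warmup_steps):
    """Ramp the learning rate linearly from 0 to base_lr, then hold it flat."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


# Placeholder values, chosen only to show the shape of the schedule.
for step in (0, 500, 1000, 5000):
    print(step, warmup_then_constant(step, base_lr=0.05, warmup_steps=1000))
```

The value returned at each step would then be passed to whatever Adagrad implementation you train with.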

@xurongqiang
Author

How are these two files (codes and control_codes.txt) generated?

@keskarnitish
Contributor

The codes file has nothing to do with the control codes; it contains the BPE codes you get from fastBPE (see https://github.com/glample/fastBPE#learn-codes).
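
To make "BPE codes" concrete, here is a toy pure-Python sketch of how merge rules are learned from a corpus. It is not fastBPE and does not reproduce its file format or tie-breaking; `learn_bpe` and the `</w>` end-of-word marker are illustrative assumptions. For real training, use fastBPE as linked above.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the lexicographically largest.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=3)
for left, right in merges:
    print(left, right)
```

The ordered list of merges is what a codes file stores; applying the same merges in the same order to new text yields the subword tokens.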

For control_codes.txt, first collect your data and decide on the list of control codes you want (this is the first column). Then convert each file to TFRecords with its corresponding control code, and work out each code's percentage of the data afterwards (if this is relevant to you; for training it isn't).
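
A rough sketch of those two steps, under stated assumptions: each training example is a fixed-length window of token IDs with the control code's ID prepended, and each code's share is its fraction of all examples. The function names, the ID values, and the exact window layout are hypothetical, and writing the examples out as TFRecords is omitted; check the training_utils folder for the format the repository actually uses.

```python
def make_examples(token_ids, control_code_id, seq_length):
    """Cut a token stream into windows, each prefixed with the control code."""
    examples = []
    # Every window needs seq_length tokens plus one more for the shifted target.
    for start in range(0, len(token_ids) - seq_length, seq_length):
        window = token_ids[start:start + seq_length + 1]
        inputs = [control_code_id] + window[:-1]
        targets = window  # targets[i] is the token that follows inputs[i]
        examples.append((inputs, targets))
    return examples


def data_shares(example_counts):
    """Return each control code's fraction of the total number of examples."""
    total = sum(example_counts.values())
    return {code: count / total for code, count in example_counts.items()}


# Hypothetical token IDs and control code IDs, for illustration only.
corpora = {"Wikipedia": list(range(100, 107)), "News": list(range(200, 204))}
code_ids = {"Wikipedia": 1, "News": 2}

counts = {}
for code, tokens in corpora.items():
    examples = make_examples(tokens, code_ids[code], seq_length=3)
    counts[code] = len(examples)

print(data_shares(counts))
```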

@leejason

leejason commented Oct 1, 2019

Is it feasible to train a CTRL model from scratch on Colab with the free TPU? If not, how many TPUs would be required, and at what cost?

@keskarnitish
Contributor

keskarnitish commented Oct 2, 2019

On a large amount of data, I don't think that would work. We trained on 256 cores of a Cloud TPU v3 Pod. You should be able to train on somewhat smaller slices, with a commensurate increase in training time. Regarding pricing, I think the best resource would be Google's official pricing sheet.
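
As a back-of-the-envelope reading of "commensurate increase", the sketch below assumes ideal linear scaling, so that halving the cores doubles the time. Real scaling is rarely this clean, the function name is hypothetical, and the baseline time is a unit placeholder rather than a figure from this thread.

```python
def estimated_training_time(baseline_time, baseline_cores, cores):
    """Scale training time inversely with core count (ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return baseline_time * baseline_cores / cores


# With the 256-core run taken as 1 unit of time, estimate smaller slices.
for cores in (256, 128, 32):
    print(cores, estimated_training_time(1.0, baseline_cores=256, cores=cores))
```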
