a question, thank you for your reply #15
Hi!
Our code base also supports tokenization for any Hugging Face dataset without any additional effort on your part, e.g., if you want to try FineWeb, SlimPajama, etc.
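For readers who want to see what tokenizing a Hugging Face dataset can look like in general, here is a minimal sketch. It is not this repository's pipeline: the function names `pack_tokens`, `stream_pile_texts`, and `demo` are hypothetical, and the dataset id `monology/pile-uncopyrighted`, the `text` field, and the `EleutherAI/gpt-neox-20b` tokenizer are assumptions chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List


def pack_tokens(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    seq_len: int,
    eos_id: int,
) -> Iterator[List[int]]:
    """Concatenate tokenized documents and yield fixed-length chunks."""
    buffer: List[int] = []
    for text in texts:
        buffer.extend(encode(text))
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # A trailing partial chunk is dropped in this sketch.


def stream_pile_texts(limit: int) -> Iterator[str]:
    """Stream raw text from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the module loads without it

    # Assumed dataset id and field name; adjust to the dataset you actually use.
    ds = load_dataset("monology/pile-uncopyrighted", split="train", streaming=True)
    for i, row in enumerate(ds):
        if i >= limit:
            break
        yield row["text"]


def demo() -> None:
    """Hypothetical end-to-end usage; the tokenizer choice is an assumption."""
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    chunks = pack_tokens(
        stream_pile_texts(limit=100),
        encode=lambda t: tok(t)["input_ids"],
        seq_len=2048,
        eos_id=tok.eos_token_id,
    )
    print(len(next(chunks)))
```

Streaming avoids downloading the full dataset before tokenizing, and packing documents into fixed-length sequences separated by an end-of-sequence token is a common choice for pretraining from scratch. The repository's own scripts may do this differently.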
So thank you for your timely reply!
I am currently trying to train a model from scratch using the Pile dataset. I would like to add that it was necessary to run
Hi, thank you for your nice work. I have a question about training. If I want to train your model on the pile-uncopyrighted dataset (just uncopyrighted pile), how should I prepare or pre-process the dataset?