Hi,
Thanks for releasing your work.
I'm currently trying to run your data/process.py code on my own crawled videos.
Everything works well except text_iterator().
I think this is because I couldn't create the "txt.jsonl.zst" file, which is used as random_text for the pretraining batches.
Is there any reference code or sample data I could use to create "text.jsonl.zst" on my own?
If that isn't possible, could you explain the role of "random_text" in the pretraining step, so that I can better understand your work?
(I couldn't work out how the "random text" fits in with the MERLOT-Reserve pre-training objectives.)
Thank you,
Haena