Indeed it is: the sentence id is overflowing. We currently have no config option for changing the datatype of the sentence id, so the quickest solution is to find the document with >= 312748 sentences and split it into smaller documents (based on your log, it is document number 5179808 in this specific input file).
Note that the issue here is not the number of documents, but rather that one individual document contains too many sentences.
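As a rough illustration of the splitting workaround, here is a minimal Python sketch. It assumes you can load the oversized document as a list of sentence strings; the function name `split_document` and the cap `MAX_SENTENCES` are hypothetical and not part of the tool, so pick a cap comfortably below the sentence count that triggered the overflow and adapt the input/output handling to your own file format.

```python
from typing import Iterator, List

# Hypothetical cap, chosen only for illustration; it is not a value taken from the tool.
MAX_SENTENCES = 100_000


def split_document(sentences: List[str], max_sentences: int = MAX_SENTENCES) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `max_sentences` sentences each."""
    if max_sentences <= 0:
        raise ValueError("max_sentences must be positive")
    for start in range(0, len(sentences), max_sentences):
        yield sentences[start:start + max_sentences]


if __name__ == "__main__":
    # Toy stand-in for one very long document.
    doc = [f"Sentence {i}." for i in range(7)]
    for part in split_document(doc, max_sentences=3):
        print(len(part), part[0])
```

Each chunk can then be written back out as its own document, so that no single document exceeds the sentence count the sentence id can represent.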
I'm processing a large dataset (30BN tokens) -- is this a numerical overflow issue?
Is there a quick fix? Thank you!