Hi, @rom1504
I know there are three versions of the parquet files as below.
| Version | Parquet file size | Hash value | Total size |
| --- | --- | --- | --- |
| 1.0 | 1.6G | 5b54c5d5 | 400 million |
| 2.0 | 3.6G | 03f11a48 | 800 million |
| 3.0 | 4.9G | f27692e1 | 1.1 billion |
So I wonder whether the parquet files in the different versions correspond one-to-one.
I downloaded the 400 million version of the dataset. What should I do if I'd like to download the newest version without downloading the duplicate files again?
The other two versions you mention are works in progress and are not yet fully ready for use. For example, versions 2 and 3 are not fully randomly shuffled, unlike version 1, and shuffling is an important property for use of the dataset.
We will release a larger version of the dataset with a few billion samples in a few months.
Do you have any deadlines / uses of the larger dataset (larger than 400m) on your side?
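In the meantime, if you want to see which entries in a newer listing you do not already have, one possible approach is to compare the URL lists directly instead of relying on the files corresponding one-to-one. The following is only a minimal sketch under assumptions: the function name `urls_to_download` and the example URLs are hypothetical, and it assumes you have already extracted the URL column of each version into a plain sequence of strings with whatever parquet reader you use.

```python
from typing import Iterable, Iterator


def urls_to_download(old_urls: Iterable[str], newer_urls: Iterable[str]) -> Iterator[str]:
    """Yield URLs that appear in the newer listing but not in the older one."""
    seen = set(old_urls)
    for url in newer_urls:
        if url not in seen:
            seen.add(url)  # also drops repeats within the newer listing
            yield url


# Hypothetical stand-ins for the URL columns of two metadata versions.
old_version = ["http://example.com/a.jpg", "http://example.com/b.jpg"]
new_version = [
    "http://example.com/b.jpg",
    "http://example.com/c.jpg",
    "http://example.com/c.jpg",
]

to_download = list(urls_to_download(old_version, new_version))
```

Keep in mind that holding hundreds of millions of URLs in an in-memory set needs a lot of RAM, so at full scale you would probably need to process the listings in chunks or compare hashes of the URLs instead.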