Download stall at the end #164
I remember one run terminating cleanly with Ctrl-C when I used the CLI argument --image_size 512 instead of --resize_mode no, but I can't be sure. The default command suggested at https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md
How's your resource usage when it's stuck (CPU, network, disk)? You may choose keep_ratio mode if you want to resize without a border.
Also, you want to set the process count to roughly your number of cores.
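To illustrate the process-count suggestion, here is a minimal sketch that builds an img2dataset command line with `--processes_count` set to the number of cores the OS reports. The helper name `build_command` and the `cc3m.tsv` / `cc3m` paths are placeholders of mine, not the command used in this issue, and the remaining flags from the cc3m example are left out.

```python
import os
import shlex


def build_command(url_list, output_folder, processes=None):
    """Build an img2dataset argument list (hypothetical helper, not part of img2dataset)."""
    # Default to the core count reported by the OS; fall back to 1 if unknown.
    if processes is None:
        processes = os.cpu_count() or 1
    return [
        "img2dataset",
        "--url_list", url_list,
        "--output_folder", output_folder,
        "--processes_count", str(processes),
        # keep_ratio resizes without adding a border.
        "--resize_mode", "keep_ratio",
    ]


if __name__ == "__main__":
    # Placeholder paths; print the command so it can be inspected before running.
    print(shlex.join(build_command("cc3m.tsv", "cc3m")))
```

Printing the command first makes it easy to compare the process count against the instance's actual specs before starting a long download.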
I did not set up a Knot Resolver. Regarding setting the process count to the number of cores, I think the instance has matching specs. The weird thing is that every run seems to stall only after it has downloaded everything it can. For both tries, the number of images downloaded is the same. For cc3m, both runs stopped at a little over 2.7 million. I also have one run for cc12m, which stalled after 11.5 million.
Also, by stalled I mean that progress seems to have stopped and stdout keeps showing the same number of images done. I believe the program is still running, as I am able to terminate it with Ctrl-C.
When this happens, the timestamps of the tar files in the download folder can be far in the past; for example, the most recently modified file may be a few hours old.
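One rough way to confirm a stall from the output side is to measure how long ago the newest tar file in the download folder was modified. This is only a sketch using the standard library; the function name and the idea of treating a large age as "stalled" are my assumptions, not something img2dataset provides.

```python
import time
from pathlib import Path


def newest_tar_age_seconds(folder, now=None):
    """Return seconds since the most recently modified .tar in `folder`, or None if there are none."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(folder).glob("*.tar")]
    if not mtimes:
        return None
    return now - max(mtimes)
```

Calling this periodically on the output folder and seeing the value grow into the hours while the process is still alive would match the behaviour described above.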
I see. This seems to be the same thing as #74, which I wasn't able to reproduce in my environment. I'd like to be able to reproduce it and find out why it gets stuck. In practice, though, the output should be OK if you Ctrl+C. Do you see anything wrong with the output? Are you using webdataset to load the output?
I'm using webdataset to load the files for training DALL-E 2 models. Training stopped with a dataloader process complaining about an abrupt end of file in the tar files.
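To find which shards are affected before training, a check along these lines may help. It is a sketch using only Python's `tarfile` module, with function names of my own choosing: it reads every member of each tar and reports the files where reading fails. Note that it may not flag an archive that was cut exactly at a member boundary, since `tarfile` tolerates a missing end-of-archive marker.

```python
import tarfile
from pathlib import Path


def is_tar_readable(path):
    """Return True if every regular file in the tar at `path` can be read to the end."""
    try:
        with tarfile.open(path, "r") as tar:
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    # Read in 1 MiB chunks; truncated member data raises tarfile.ReadError.
                    while f.read(1 << 20):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        return False


def find_unreadable_tars(folder):
    """Return the sorted list of .tar paths in `folder` that fail the read check."""
    return [p for p in sorted(Path(folder).glob("*.tar")) if not is_tar_readable(p)]
```

The shards it reports can then be excluded from the training list or downloaded again.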
Can you share the errors? And can you use a loader with error handling, like this one: https://github.com/rom1504/laion-prepro/blob/main/laion5B/usage_guide/dataloader_pytorch.py ?
Just an update, I was able to complete one round of cc3m download with the following parameters:
I'm now rerunning the above with --resize_mode no to see if it is the culprit.
This should now be solved thanks to the retrying feature. Please update and try again.
I'm trying to download the CC3M dataset on an AWS SageMaker Notebook instance. I first ran pip install img2dataset. Then I opened a terminal and ran
The code runs and downloads but stalls towards the end. On the first run I terminated it by restarting the instance; as a result, some .tar files give the read error "Unexpected end of file" when used for training. On a second run I terminated it with Ctrl-C, which resulted in the same read error when using the tar files for training. The difference between the two termination methods is that the latter seemed to do some cleanup, which removed the "_tmp" folder within the download folder.