Error when training on Synth 90k #31
Thanks for the report! Could you please provide the image in the dataset that errors out? Does this error appear from the beginning, or at some point in the middle of the training process?
It happens in the middle, roughly around 1300-1500 steps with a batch size of 512. I tried hard but could not identify the problematic image. Please help.
It's pretty much impossible to help unless I know what the image is, unfortunately. You can try inserting a debugging line that outputs the list of images in the batch and then narrowing it down to a particular one, or just add a catch that ignores a failed batch and continues training. It might be that your dataset is corrupted.
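For illustration, a minimal sketch of the suggested debugging step; the `(filename, bytes)` batch structure and the helper name are assumptions, not the project's actual pipeline:

```python
# Hypothetical debugging helper; the (filename, bytes) batch structure
# is an assumption, not the project's actual data pipeline.
from io import BytesIO

from PIL import Image

def find_broken_images(batch):
    """Try to decode every image in a batch and report the ones that fail."""
    broken = []
    for filename, img_bytes in batch:
        try:
            Image.open(BytesIO(img_bytes)).convert('RGB')
        except Exception as e:
            print('failed to decode %s: %s' % (filename, e))
            broken.append(filename)
    return broken
```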
Hi @emedvedev, I've faced similar errors while training on my data. I debugged which images were producing these errors, and when I used ImageMagick's convert command to convert those images to grayscale, the commands worked fine. I think the issue is with this line. How about changing it to:
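The proposed snippet did not survive in this thread; for illustration only, a minimal sketch of the kind of change being discussed, assuming decoding moves from TensorFlow's decoder to PIL (hypothetical, not the actual patch):

```python
# Hypothetical sketch, not the actual proposed patch: decode with PIL,
# which sniffs the real format from the file's bytes, instead of
# tf.image.decode_png, which assumes the data is PNG.
from io import BytesIO

import numpy as np
from PIL import Image

def decode_image(img_bytes):
    image = Image.open(BytesIO(img_bytes)).convert('L')  # 'L' = 8-bit grayscale
    return np.asarray(image)
```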
@tumusudheer thanks for investigating the issue! Can you confirm that the proposed change works with a "broken" image? @thisismohitgupta you can try applying the proposed patch and re-training your model. Please tell me if it helps!
@emedvedev it didn't work for me.
Adding the following lines solves the problem:

```python
# Assumes IO is a BytesIO alias (e.g. from io import BytesIO as IO)
# and Image comes from PIL; this runs inside the dataset reading loop.
try:
    image = Image.open(IO(img)).convert('RGB')
except Exception:
    # Skip images that fail to decode and move on to the next one.
    continue
if self.max_width and (image.size[0] <= self.max_width):
    ...
```
Well, then we're just silently skipping broken images, which isn't very good. Is there any way to make image reading/conversion more bulletproof so that we wouldn't skip anything?
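For illustration, one direction would be Pillow's tolerance switch for truncated files; a minimal sketch (the switch is real, but whether it rescues these particular images is an assumption):

```python
# Sketch: make decoding more tolerant instead of skipping images.
# LOAD_TRUNCATED_IMAGES is a real Pillow switch; whether it rescues the
# particular broken Synth 90k images is an assumption.
from io import BytesIO

from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # pad out truncated data instead of raising

def load_rgb(img_bytes):
    image = Image.open(BytesIO(img_bytes)).convert('RGB')
    return image
```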
Hi @emedvedev, I trained with my proposed change yesterday, and I just verified the results. They are good. This change worked for me.
While skipping the images, we can create a log file that lists all broken images. After preparing training data, people can check what is wrong with the images in the log file and try to fix or verify them.
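For illustration, a minimal sketch of the skip-and-log idea (the log file name and function shape are assumptions):

```python
# Sketch of the skip-and-log idea; the log file name and function shape
# are assumptions, not the project's actual code.
import logging
from io import BytesIO

from PIL import Image

logging.basicConfig(filename='broken_images.log', level=logging.WARNING)

def try_load(path, img_bytes):
    """Return the decoded image, or None (and log the path) if it is broken."""
    try:
        return Image.open(BytesIO(img_bytes)).convert('RGB')
    except Exception as e:
        logging.warning('Skipping broken image %s: %s', path, e)
        return None
```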
Sweet! Would you mind opening a PR with the change, then? If you want, you can also implement image skipping there; that would be great.
Hi @emedvedev,
Hi, I ran into a problem when trying to train: NotFoundError (see above for traceback): ./home/user/Dataset/CAPTCHAs/training.tfrecords Have you seen this problem before? I don't know how to figure it out.
Your dataset path is incorrect. I'd say it's that dot in the beginning. :)
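For illustration, a quick sanity check on the path before training; the corrected path is an assumption based on the error message above:

```python
# Quick sanity check; the corrected path is an assumption based on the
# error message above ('./home/...' resolves relative to the current
# directory, so the intended path presumably starts at '/home').
import os

path = '/home/user/Dataset/CAPTCHAs/training.tfrecords'
print(os.path.exists(path))
```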
Hi guys, I encountered the same problem, but for some reason both solutions are not working for me. So I set the batch size to 1 and checked which image caused the problem; it is: mnt/ramdisk/max/90kDICT32px/2194/2/334_EFFLORESCENT_24742.jpg But there could be more... Cheers,
@MBleeker Hi there! Just checking: have you set the
If it's set correctly, then could you provide the full log of your run?
Hi @emedvedev, I found the problem already. I had never used setup.py before... I did not know about the .egg files, so the updates I made were therefore not being used while running the code. It is working now. There are several corrupted images.
About the bias terms we discussed in #70: I added them, and the results do not seem to be significantly better, but not worse either (the only problem is that you cannot use previously trained models anymore, because the variables are not stored in the checkpoint). Did you try this code with a different set of hyperparameters than the defaults? Any different results? Cheers,
If there's no visible benefit, I'd rather maintain backward compatibility, but if you find that bias terms do have a significant benefit with some datasets, please submit a PR; I'd really appreciate it!
I tried to tweak it a little, but it mostly just depends on the dataset. I find the defaults sensible, but maybe someone else will have something to add here, too. :)
I'll close the issue since the original problem has been fixed, so if anyone else has issues with Synth90k, just open a new one. :)
Original issue description:

```
2017-10-22 23:07:17.471187: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Invalid JPEG data, size 1024
```

raised in:

```python
image = tf.image.decode_png(img, channels=1)
```