Not getting accuracy on MMRecognition SAR Model Training #858

Closed
payal211 opened this issue Mar 21, 2022 · 31 comments


@payal211

Hi @gaotongxiao,

I am training a custom dataset for text recognition using the SAR model. I have a total of 7K+ images for training. Can you please tell me how long I should expect to wait for the trained model?
So far it has completed the 65th epoch, and the accuracy metrics at the 65th epoch are as follows:

2022-03-21 08:06:45,433 - mmocr - INFO - Epoch(val) [65][100] 0_word_acc: 0.0000, 0_word_acc_ignore_case: 0.0000, 0_word_acc_ignore_case_symbol: 0.0000, 0_char_recall: 0.1346, 0_char_precision: 0.1089, 0_1-N.E.D: 0.0776

As you can see, precision and recall are very low.

Also, can you please suggest any preprocessing techniques you are aware of for achieving good accuracy on the text recognition task?

Here is the attached screenshot of the training in progress:
Training_66th_epoch

@Mountchicken
Collaborator

Hi @payal211, can you share your config file?

@gaotongxiao
Collaborator

According to your log, it still needs 18 days to complete training. The accuracy is still very low, and that probably has something to do with your hyperparameter configuration. FYI, we trained SAR on 48 GPUs, so you might need to scale down the learning rate accordingly. We have also provided a detailed log for reference: https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json.

As for data augmentation techniques, ABINet's pipeline is empirically effective in boosting a model's final performance, but using it won't necessarily reduce the convergence time.
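The learning-rate advice above follows the common linear scaling rule. A minimal sketch of the arithmetic (the helper name and the reference per-GPU batch numbers are illustrative assumptions, not values taken from the official MMOCR config):

```python
# Linear scaling rule: keep the learning rate proportional to the total
# batch size. The 48-GPU reference setup below mirrors the comment above;
# the per-GPU batch sizes are assumed for illustration.

def scaled_lr(base_lr, base_gpus, base_samples_per_gpu, gpus, samples_per_gpu):
    """Scale a reference learning rate to a new (GPU count, batch size) setup."""
    ref_batch = base_gpus * base_samples_per_gpu
    new_batch = gpus * samples_per_gpu
    return base_lr * new_batch / ref_batch

# Reproducing a hypothetical 48-GPU recipe on a single GPU with the same
# per-GPU batch size shrinks the learning rate by a factor of 48.
single_gpu_lr = scaled_lr(1e-3, 48, 64, 1, 64)
print(single_gpu_lr)
```

Whether strict linear scaling is optimal for SAR is an empirical question; treat it as a starting point, not a guarantee.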

@payal211
Author

payal211 commented Mar 21, 2022

> Hi @payal211, can you share your config file?

Hi @Mountchicken

Here is the attached config file:
sar_r31_parallel_decoder_custom_dataset.txt

Thanks

@payal211
Author

> According to your log, it still needs 18 days to complete training. The accuracy is still very low and it probably has something to do with your hyperparameter configuration. FYI, we trained SAR on 48 GPUs, and you might scale down the learning rate accordingly. We have also provided the detailed log for reference https://download.openmmlab.com/mmocr/textrecog/sar/20210330_105728.log.json.
>
> As for data augmentation techniques, ABINet's pipeline is empirically effective in boosting the model's final performance. But using them won't necessarily reduce the convergence time.

Hi @gaotongxiao,

Thanks for the quick response and the suggestion.
Can you please elaborate on what the hyperparameters should be?
I am training this model on 37 characters (0-9 digits and A-Z alphabets) for 600 epochs.
Is that right, or should I train it for longer?
If ABINet is a good choice for this task, I can try that too.
Thanks for your help.

@Mountchicken
Collaborator

Hi @payal211
You should try a larger batch size for faster training. A batch of 8 samples only occupies about 2 GB of GPU memory in your case; try 32 or 64 if possible.
BTW, can you show me the config file generated when training starts? It contains the full information and should be located under ./work_dirs or somewhere similar.

@payal211
Author

Hi @Mountchicken,
Thanks for pointing that out.
Here is the config file generated when training starts:
sar_r31_parallel_decoder_custom_dataset.txt

@Mountchicken
Collaborator

Mountchicken commented Mar 21, 2022

Hi @payal211
What does your data look like? BTW, I see that you are using DICT90; if you are predicting only characters 0-9 and A-Z, remember to modify it.

@Mountchicken
Collaborator

And there is a hidden bug in SAR: it can't recognize the number 0, and if every test image has a 0 in its label, the accuracy will always be 0.00%.

@payal211
Author

Hi @Mountchicken
Can you please point out where I have to make the modification?
To my knowledge, I modified these two files:

  1. .\mmocr-main\mmocr\models\textrecog\convertors\base.py
    line 22: DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

  2. .\mmocr-main\configs\_base_\recog_models\sar.py
    line 2: type='AttnConvertor', dict_type='DICT90', with_unknown=True) to type='AttnConvertor', dict_type='DICT36', with_unknown=True)

Also, any suggestion on how to overcome this hidden bug in SAR?

Thanks.

@Mountchicken
Collaborator

Mountchicken commented Mar 22, 2022

Sorry for the late reply. It's fine to modify the dictionary in this way. Here is the bug: when calculating the CrossEntropy loss in SAR, we set ignore_index to 0 by default. For the attention mechanism, ignore_index should point to the <PAD> token in the dictionary. However, when the dictionary is built, the <PAD> token is not placed at index 0 but at the end of the dictionary (check here). So the model has to correctly predict the <PAD> token to minimize the CE loss, which is unnecessary and may make the model hard to converge in your situation.
A quick fix is to move the <PAD> token to the front of the dictionary by modifying these two lines as follows:

# self.idx2char.append(padding_token)        # old: <PAD> appended at the end
self.idx2char.insert(0, padding_token)       # new: <PAD> placed at index 0
# self.padding_idx = len(self.idx2char) - 1  # old: padding index at the end
self.padding_idx = 0                         # new: matches ignore_index=0
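The fix can be sanity-checked in isolation. A minimal sketch of the dictionary construction (plain Python, not the actual MMOCR convertor code; `build_dict` is a hypothetical helper):

```python
# Build a character dictionary with <PAD> at index 0, mirroring the fix
# above: the hard-coded ignore_index=0 in the CE loss then really ignores
# padding positions instead of a real character.

def build_dict(chars, padding_token='<PAD>'):
    idx2char = list(chars)
    idx2char.insert(0, padding_token)  # fixed: <PAD> goes to the front
    padding_idx = 0                    # now consistent with ignore_index=0
    return idx2char, padding_idx

idx2char, padding_idx = build_dict('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
print(idx2char[0], padding_idx)  # <PAD> 0
```

With the buggy append-at-the-end variant, index 0 would be the character '0', so the loss would ignore every '0' in the labels, which matches the "can't recognize the number 0" symptom described above.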

@payal211
Author

Thank you so much @Mountchicken.
I will start training and update you with the results.

@payal211
Author

Hi @Mountchicken & @gaotongxiao
I started training on 22nd March 2022; after the 15th epoch the accuracy is:

[23-03-2022 09:33]
mmocr - INFO - Epoch(val) [15][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9500, 0_1-N.E.D: 0.9679

Then after the 21st epoch, the accuracy is:
2022-03-23 12:04:30,316 - mmocr - INFO - Epoch(val) [21][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

Then after the 45th epoch, the accuracy is:
2022-03-25 03:38:54,303 - mmocr - INFO - Epoch(val) [45][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9800, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9680

Then after the 91st epoch, the accuracy is:

2022-03-28 06:06:50,483 - mmocr - INFO - Epoch(val) [91][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

and after the 99th epoch there is still no difference in the accuracy:

2022-03-28 16:55:05,712 - mmocr - INFO - Epoch(val) [99][3056] 0_word_acc: 0.8273, 0_word_acc_ignore_case: 0.8273, 0_word_acc_ignore_case_symbol: 0.8273, 0_char_recall: 0.9799, 0_char_precision: 0.9501, 0_1-N.E.D: 0.9679

So is there anything I am missing here?
Precision and recall are pretty good, but the model is still not able to recognize text properly on the test dataset.

@Mountchicken
Collaborator

@payal211
Sorry for the late reply. The training process seems stuck after the 15th epoch, and the strange thing is that the char precision is so high.

  • Is it possible that some characters in your dataset are not in DICT90?
  • Could you please describe what the images in your dataset look like? The max decode sequence length is set to 30 here, and if your label lengths exceed 30, that can also cause such a phenomenon.
  • BTW, your training batch size is small, only 8. Try a larger one after we solve this problem.
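On the second point above, the decode-length cap is a config value. A hedged fragment showing where it can be set (field placement assumed from MMOCR 0.x; verify against the full config generated in work_dirs by searching for max_seq_len):

```python
# Hedged sketch of where the decode-length cap may live in an MMOCR 0.x
# SAR config; the exact placement can differ between versions.
label_convertor = dict(
    type='AttnConvertor', dict_type='DICT36', with_unknown=True,
    max_seq_len=30)  # raise this if any training label is longer than 30
```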

@payal211
Author

payal211 commented Apr 1, 2022

Hi @Mountchicken,

> Is it possible that there are some characters in your dataset that are not in DICT90?

I am training on DICT36, as mentioned earlier:
.\mmocr-main\mmocr\models\textrecog\convertors\base.py
line 22: DICT36 = tuple('0123456789abcdefghijklmnopqrstuvwxyz') to DICT36 = tuple('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')

> Could you please describe what the images in your dataset look like? The max decode sequence length is set to be 30 here, and your label length may exceed 30 and that can also cause such a phenomenon.

Sure. Can you please share your email ID so I can send you a sample image? And the decoded sequence length will increase based on character accuracy, as it decodes multiple characters with different probabilities for one character.

> BTW, your training batch size is small, only 8. Try a larger one after we solve this problem.

Yes, I changed it to 64, and it is taking 10 GB of GPU memory out of 24 GB, so I will try a larger one after we solve this problem.

@payal211
Author

payal211 commented Apr 1, 2022

Here are the attached configs for batch size 64 and the DICT36 classes, which I modified:
sar_r31_parallel_decoder_custom_dataset_batch64.txt
base.txt


@Mountchicken
Collaborator

@payal211 927922033@qq.com

@Mountchicken
Collaborator

Mountchicken commented Apr 1, 2022

Hi @payal211
Your config file is totally fine. It seems that the images you sent me are contextless. Random combinations of numbers and characters can easily confuse SAR, as reviewed in the first image. Those pictures come from the RobustScanner paper. The accuracy table below is from an experiment that tests recognition algorithms on random, contextless text like yours. As you can see, the word accuracy of SAR is the worst there, even though the character accuracy can still be high, as in the first picture.
image
image

  • This might be the reason. A practical way to check is to look at the predictions on your test set. You can use the following command to visualize the predictions. If the predictions share the same failure pattern as the first picture above, then the problem lies with the algorithm itself.
python tools/recog_test_imgs.py {PATH_TO_YOUR_TEST_IMAGES} {PATH_TO_YOUR_TXT_FORMAT_LABEL} configs/textrecog/sar/sar_r31_parallel_decoder_custom_dataset_batch64.py {PATH_TO_CHECKPOINTS}
  • BTW, your dataset seems to have a clean background, so I recommend you use CRNN instead. CRNN works pretty well on contextless datasets, and training CRNN is also much faster with lower GPU cost.

@payal211
Author

payal211 commented Apr 1, 2022

Hi @Mountchicken
Really appreciated.
Thank you for all these details. I will definitely try CRNN and check the accuracy.

@payal211
Author

payal211 commented Apr 8, 2022

Hi @Mountchicken

I trained the CRNN model, but after the second epoch both loss_ctc and loss went to infinity.
Here is the log file for your reference:
20220407_095658.log
Can you please look into this?

Thank you

@Mountchicken
Collaborator

Hi @payal211
You can replace loss=dict(type='CTCLoss') with loss=dict(type='CTCLoss', flatten=False, zero_infinity=True) in https://github.com/open-mmlab/mmocr/blob/main/configs/_base_/recog_models/crnn.py#L10. You can also take a look at this issue.
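For intuition on why zero_infinity helps: CTC assigns infinite loss to targets that cannot fit into the available time steps. A minimal feasibility check (a simplified illustration of the CTC constraint, not MMOCR or PyTorch code; `ctc_feasible` is a hypothetical helper):

```python
# CTC emits at most one label per time step, and adjacent repeated labels
# need a blank frame between them. A target is infeasible (loss = inf)
# whenever input_len < len(target) + number_of_adjacent_repeats.

def ctc_feasible(input_len, target):
    """Return True if a frame sequence of length `input_len` can encode `target`."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return input_len >= len(target) + repeats

print(ctc_feasible(5, "ABC"))   # True: 3 labels fit into 5 frames
print(ctc_feasible(3, "AABB"))  # False: needs at least 4 + 2 = 6 frames
```

With zero_infinity=True, such infeasible samples contribute zero loss instead of inf, keeping the gradients finite; shortening labels or widening the input feature sequence addresses the root cause.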

@payal211
Author

Hi @Mountchicken

I have tested both models, SAR and CRNN, and I am not able to recognize the text with good accuracy.

Here I am sharing the accuracy log.
I trained the SAR model and checked the accuracy at the 183rd epoch; here is the attached log file:
20220411_043023.log

I trained the CRNN model and checked the accuracy at the 421st epoch; here is the attached log file:
20220411_074324.log

As of now the CRNN model recognizes only digits, with very low scores.
Can you please help me? Should I stop training or continue?

Thank you

@Mountchicken
Collaborator

Hi @payal211
I think we should stop training now.

  • I rechecked your log file and found that the number of repetitions in both your training and test sets is 100. Let's start by setting them to 1 (this is an example showing where to change train_repeat and test_repeat). This can save you a lot of training time and may even be the problem.
  • I also rechecked the training-set image you sent me and found that there is a large amount of white border in the whole image, and the text area only takes up a small piece. I presume this is a screenshot and not the actual dataset? If the dataset really looks like this, you need to crop out the text areas separately and feed them to the recognizer.

@payal211
Author

@Mountchicken

Thanks, I will do the needful.
And yes, I sent you the raw data; I am cropping the particular portion containing text and feeding those cropped images into training.

@payal211
Author

Hi @Mountchicken,

I tried your suggestion, but no luck.

After the 1200th epoch, the accuracy results are in the attached log file:
20220412_100903.log

and I continued until the 2400th epoch; that log file is here:
20220412_144955.log

@Mountchicken
Collaborator

We are now at a bottleneck.

  • Is the log file above about training CRNN? Previously, your SAR was able to reach about 80% accuracy, so why is the accuracy of these runs 0? Maybe we can try SAR again with repeat = 1.
  • I am also confused now. Do the test-set images have the same style as the training set? For example, do the test-set images have a black background and light text like the one you sent me before? If the test set is quite different from the training set, you may need to consider data augmentation. All I can think of right now is to comment out the Normalize operation (NormalizeOCR in SAR) in the train and test pipelines first, because your image style is not very suitable for normalization.

@payal211
Author

Yes, the above log file is about CRNN.
Previously, the SAR model showed precision and recall above 80%, but it didn't work well on the test data.
Here is the attached log file for the SAR model:
20220411_043023.log

So okay, I can give SAR one more try with repeat = 1 and without NormalizeOCR.

Yes, my test data has the same style as the training set.

@balandongiv
Contributor

Hi @Mountchicken, I am still unable to understand the purpose of repetition and how it affects the training time.

> I rechecked your log file and found that the number of repetitions in both your training and test sets is 100. Let's start by setting them to 1. (This is an example to show you where to change train_repeat and test_repeat.) This can save you a lot of training time and may even be the problem.

The documentation defines repeat as "Repeated times of dataset".

Correct me if I am wrong. Say, for example, we set repeat=100 for both the training and test sets. Does that mean the dataset is trained or evaluated 100 times?

@Mountchicken
Collaborator

@balandongiv
Yes. For example, if repeat is set to 10, then the number of training iterations will also be expanded by a factor of ten. However, the repeat value for the test set should be 1.

@balandongiv
Contributor

Thanks for the confirmation, @Mountchicken.

But is there any particular reason to repeat the training dataset x times? Won't this cause overfitting to the training dataset? Also, any advice or recommendation on the maximum value for repeat? I noticed that, at least in the toy datasets, the value was set to 100.

@gaotongxiao
Collaborator

Sometimes this feature is needed when we train a model on a set of datasets with imbalanced sizes, where tiling the small dataset several times is the most straightforward way to alleviate the bias brought by the large ones. SAR is an example.
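The tiling idea can be sketched in a few lines. A minimal pure-Python stand-in for a repeat-style dataset wrapper (the class name mirrors the common RepeatDataset pattern; this is an illustration, not the actual MMOCR implementation):

```python
class RepeatDataset:
    """Make a dataset appear `times` as long; indices wrap around."""
    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times

    def __len__(self):
        return self.times * len(self.dataset)

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        return self.dataset[idx % len(self.dataset)]

small = ['a', 'b']        # a tiny dataset of 2 samples
large = ['x'] * 10        # a much larger companion dataset
mixed = list(RepeatDataset(small, 5)) + large
print(len(mixed))         # 20: the small set now contributes half the epoch
```

Without the tiling, the small set would supply only 2 of 12 samples per epoch and the model would be dominated by the large dataset; repeating it 5 times balances the mixture.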

@balandongiv
Contributor

Thanks for the detailed explanation, @gaotongxiao.

@gaotongxiao closed this as not planned (won't fix, can't repro, duplicate, stale) on Jun 30, 2022