
throwing errors #2

Closed
rpro91 opened this issue Nov 6, 2019 · 10 comments

Comments

@rpro91

rpro91 commented Nov 6, 2019

Hi Sushant,

Great work by you!! kudos sir.

I am facing the following issues while running this model.

In the DataLoader.py file, when reading the data from the ground-truth text file:

```python
# GT text are columns starting at 10
gtText_list = lineSplit[9].split('|')
gtText = self.truncateLabel(' '.join(gtText_list), maxTextLen)
```

this throws an "index out of range" error. It works after correcting the index:

```python
gtText_list = lineSplit[8].split('|')
```
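For context, here is a minimal sketch of the parsing step in question, assuming the IAM annotation format where fields are space-separated and the transcription is the ninth field (index 8), with words joined by `|`. The function name and sample line are illustrative, not from the repository:

```python
# Sketch (hypothetical helper): parse one IAM ground-truth annotation line.
# Expected field layout: <id> <ok|err> <graylevel> <components> <x> <y> <w> <h> <transcription>
def parse_gt_line(line, max_text_len=100):
    line_split = line.strip().split(' ')
    # transcription is the field at index 8; words inside it are separated by '|'
    gt_text = ' '.join(line_split[8].split('|'))
    return gt_text[:max_text_len]

sample = "a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell"
print(parse_gt_line(sample))  # A MOVE to stop Mr. Gaitskell
```

A hardcoded index of 9 would only work on a file variant with one extra column, which would explain the "index out of range" error above.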

Also, in the main.py file:

```python
totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', totalEpoch)
```

also throws an error. Commenting out the totalEpoch line and passing epoch to the print statement works:

```python
#totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', epoch)
```
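One plausible cause (a guess, since the error message is not quoted): if `loader.trainSamples` is a list of sample objects rather than an integer count, `trainSamples // batchSize` raises a TypeError. Taking `len()` first gives the intended number of batches per epoch (note the expression computes batches per epoch, not total epochs). All names below are stand-ins:

```python
# Hypothetical reconstruction of the failing line and a working fix.
class Loader:
    def __init__(self):
        # stand-in for the parsed samples; the real loader holds Sample objects
        self.trainSamples = list(range(115320))

batch_size = 50  # assumed Model.batchSize
loader = Loader()

# loader.trainSamples // batch_size would raise:
#   TypeError: unsupported operand type(s) for //: 'list' and 'int'
total_batches = len(loader.trainSamples) // batch_size
print('batches per epoch:', total_batches)
```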

Also, Autocorrect in spellchecker.py is shown as deprecated, so I changed it to pyspellchecker v.4.0.

I am able to run the model, but when training from scratch it shows a very high validation CER of around 43%.
Let me know whether the spell-checker change and the other changes I made could cause this, and whether some other approach should be taken for training this model on the IAM line-based dataset.
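For reference, the CER (character error rate) discussed in this thread is typically the Levenshtein edit distance between the prediction and the ground truth, divided by the reference length. A minimal self-contained sketch:

```python
# Levenshtein edit distance via a single rolling DP row.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cer(predicted, reference):
    """Character error rate in percent."""
    return 100.0 * edit_distance(predicted, reference) / len(reference)

print(round(cer("helo wrld", "hello world"), 1))  # 18.2
```

A CER of 43% therefore means nearly half the characters in the validation transcriptions need edits, which usually indicates a training or data-parsing problem rather than a decoding one.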

@Sammul40619

I also get a very high validation CER of around 43% when training. I want to know why.

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 is there any way to reduce the CER, or any other idea related to handwritten text recognition?

@Sammul40619

@rpro91 I am trying; I am checking the model and the input dataset, but haven't found anything yet. How about you?

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 Not yet. I've tried every method. Now I'm running the Line HTR model of lamhoangtung (the parent version of this model) with the IAM dataset. Let's see if it works. If you find any solution or any other model, please let me know.

@Sammul40619

@rpro91 OK, I also want to try the model of lamhoangtung. Are you Chinese? If you find any solution or any other model, please let me know too.

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 I am an Indian :) :) surely will let you know!!

@Sammul40619

@rpro91 ok! keep in touch!

@NonMundaneDev

Has anyone found any solution to this?

@sushant097
Owner

sushant097 commented Jan 14, 2020

@rpro91 Are you using the word IAM dataset or the lines IAM dataset? It depends on whether you parse the lines.txt or the words.txt file. I have no idea why such a high CER occurs. Try a batch size of 50, and check whether you normalize the images before training. Even without data augmentation it should reach around 23%.
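On the normalization point above: a common preprocessing step for HTR models is per-image standardization to zero mean and unit variance before feeding images to the network. This is a generic sketch, not the repository's actual preprocessing code:

```python
# Sketch (assumption): per-image zero-mean / unit-variance normalization.
import numpy as np

def normalize(img):
    img = img.astype(np.float32)
    mean, std = img.mean(), img.std()
    # guard against constant images, where std == 0
    return (img - mean) / std if std > 0 else img - mean

img = np.array([[0, 128], [255, 128]], dtype=np.uint8)
norm = normalize(img)
print(norm.mean(), norm.std())  # approximately 0 and 1
```

Skipping this step (or applying it inconsistently between training and validation) is one plausible contributor to a high CER.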
