
throwing errors #2

Closed
rpro91 opened this issue Nov 6, 2019 · 10 comments

Comments

@rpro91

rpro91 commented Nov 6, 2019

Hi Sushant,

Great work by you!! kudos sir.

I am facing the following issues while running this model.

In the DataLoader.py file, when reading the data from the ground-truth text file:

```python
# GT text are columns starting at 10
gtText_list = lineSplit[9].split('|')
gtText = self.truncateLabel(' '.join(gtText_list), maxTextLen)
```

this throws an "index out of range" error. It works after correcting the index:

```python
gtText_list = lineSplit[8].split('|')
```
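For context, here is a minimal sketch of the parsing step in question, assuming the IAM annotation format where fields are space-separated and the transcription is the ninth field (index 8), with words joined by `|`. The function name and sample line are illustrative, not from the repository:

```python
# Sketch (hypothetical helper): parse one IAM ground-truth annotation line.
# Expected field layout: <id> <ok|err> <graylevel> <components> <x> <y> <w> <h> <transcription>
def parse_gt_line(line, max_text_len=100):
    line_split = line.strip().split(' ')
    # transcription is the field at index 8; words inside it are separated by '|'
    gt_text = ' '.join(line_split[8].split('|'))
    return gt_text[:max_text_len]

sample = "a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell"
print(parse_gt_line(sample))  # A MOVE to stop Mr. Gaitskell
```

A hardcoded index of 9 would only work on a file variant with one extra column, which would explain the "index out of range" error above.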

Also, in the main.py file:

```python
totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', totalEpoch)
```

also throws an error. Commenting out the totalEpoch line and passing epoch to the print statement works:

```python
#totalEpoch = loader.trainSamples//Model.batchSize # loader.numTrainSamplesPerEpoch

while True:
    epoch += 1
    print('Epoch:', epoch, '/', epoch)
```
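One plausible cause (a guess, since the error message is not quoted): if `loader.trainSamples` is a list of sample objects rather than an integer count, `trainSamples // batchSize` raises a TypeError. Taking `len()` first gives the intended number of batches per epoch (note the expression computes batches per epoch, not total epochs). All names below are stand-ins:

```python
# Hypothetical reconstruction of the failing line and a working fix.
class Loader:
    def __init__(self):
        # stand-in for the parsed samples; the real loader holds Sample objects
        self.trainSamples = list(range(115320))

batch_size = 50  # assumed Model.batchSize
loader = Loader()

# loader.trainSamples // batch_size would raise:
#   TypeError: unsupported operand type(s) for //: 'list' and 'int'
total_batches = len(loader.trainSamples) // batch_size
print('batches per epoch:', total_batches)
```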

Also, Autocorrect in spellchecker.py is shown as deprecated, so I changed it to pyspellchecker v.4.0.

I am able to run the model, but when training from scratch it shows a very high validation CER of around 43%.
Let me know whether the spell-checker change and the other changes I made could cause this, and whether some other approach should be taken for training this model on the IAM line-based dataset.
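For reference, the CER (character error rate) discussed in this thread is typically the Levenshtein edit distance between the prediction and the ground truth, divided by the reference length. A minimal self-contained sketch:

```python
# Levenshtein edit distance via a single rolling DP row.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # min of deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cer(predicted, reference):
    """Character error rate in percent."""
    return 100.0 * edit_distance(predicted, reference) / len(reference)

print(round(cer("helo wrld", "hello world"), 1))  # 18.2
```

A CER of 43% therefore means nearly half the characters in the validation transcriptions need edits, which usually indicates a training or data-parsing problem rather than a decoding one.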

@Sammul40619

I also get a very high validation CER of around 43% when training. I want to know why.

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 is there any way to reduce the CER, or any other idea related to handwritten text recognition?

@Sammul40619

@rpro91 I am trying; I am checking the model and the input dataset, but haven't found anything yet. How about you?

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 Not yet. I've tried every method. Now I'm running the Line HTR model of lamhoangtung (the parent version of this model) with the IAM dataset. Let's see if it works. If you find any solution or any other model, please let me know.

@Sammul40619

@rpro91 OK, I also want to try the model of lamhoangtung. Are you Chinese? If you find any solution or any other model, please let me know too.

@rpro91
Author

rpro91 commented Nov 7, 2019

@Sammul40619 I am an Indian :) :) surely will let you know!!

@Sammul40619

@rpro91 ok! keep in touch!

@NonMundaneDev

Has anyone found any solution to this?

@sushant097
Owner

sushant097 commented Jan 14, 2020

@rpro91 Are you using the word IAM dataset or the lines IAM dataset? It depends on whether you parse the lines.txt or the words.txt file. I have no idea why such a high CER occurs. Try a batch size of 50, and check whether you normalize the images before training. Even without data augmentation it should reach around 23%.
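On the normalization point above: a common preprocessing step for HTR models is per-image standardization to zero mean and unit variance before feeding images to the network. This is a generic sketch, not the repository's actual preprocessing code:

```python
# Sketch (assumption): per-image zero-mean / unit-variance normalization.
import numpy as np

def normalize(img):
    img = img.astype(np.float32)
    mean, std = img.mean(), img.std()
    # guard against constant images, where std == 0
    return (img - mean) / std if std > 0 else img - mean

img = np.array([[0, 128], [255, 128]], dtype=np.uint8)
norm = normalize(img)
print(norm.mean(), norm.std())  # approximately 0 and 1
```

Skipping this step (or applying it inconsistently between training and validation) is one plausible contributor to a high CER.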
