ruathudo/post-ocr-correction

Post-OCR Correction

To train the model:

python -m src.main --data yle-all.txt --model tf_ctx_trained_full --batch 256 --epoch 3 --rand 0 --window 3

To resume training, add the --resume option to the same command.
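Here is a minimal sketch of how the training and resume commands above could be driven from Python with a small helper. The function name build_train_command is hypothetical. Its defaults just mirror the example command. It assumes --resume is a bare switch that takes no value. What each flag means is defined by src.main itself, not by this helper.

```python
import shlex
import sys
from typing import List


def build_train_command(
    data: str = "yle-all.txt",
    model: str = "tf_ctx_trained_full",
    batch: int = 256,
    epoch: int = 3,
    rand: int = 0,
    window: int = 3,
    resume: bool = False,
) -> List[str]:
    """Hypothetical helper: return the argv for `python -m src.main`
    using only the flags shown in this README."""
    cmd = [
        sys.executable, "-m", "src.main",
        "--data", data,
        "--model", model,
        "--batch", str(batch),
        "--epoch", str(epoch),
        "--rand", str(rand),
        "--window", str(window),
    ]
    if resume:
        # Assumption: --resume is a flag without a value, appended only
        # when continuing an interrupted training run.
        cmd.append("--resume")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it. To actually train, pass the
    # list to subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(build_train_command(resume=True)))
```

Building the command as a list (rather than a single string) avoids shell-quoting issues when it is passed to subprocess.run.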

This code implements the method described in the paper https://arxiv.org/abs/2011.03502. If you use this repository in your research, please cite it as:

@misc{duong2020unsupervised,
      title={An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish}, 
      author={Quan Duong and Mika Hämäläinen and Simon Hengchen},
      year={2020},
      eprint={2011.03502},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Thanks!
