Post-OCR Correction

To train the model:

python -m src.main --data yle-all.txt --model tf_ctx_trained_full --batch 256 --epoch 3 --rand 0 --window 3

To resume training, add the --resume option.
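
If you launch training from a script (for example a batch job), the two commands above can be wrapped as in the minimal sketch below. The helper names `build_train_command` and `run_training` are hypothetical and not part of this repo; the flags are copied verbatim from the training command, and their meaning is whatever `src.main` defines. The sketch assumes it is run from the repository root.

```python
import subprocess
import sys

# Flags copied verbatim from the training command in this README.
BASE_ARGS = [
    "-m", "src.main",
    "--data", "yle-all.txt",
    "--model", "tf_ctx_trained_full",
    "--batch", "256",
    "--epoch", "3",
    "--rand", "0",
    "--window", "3",
]


def build_train_command(resume=False, python=sys.executable):
    """Return the argv list for a training run.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    cmd = [python, *BASE_ARGS]
    if resume:
        # The README states that --resume continues a previous run.
        cmd.append("--resume")
    return cmd


def run_training(resume=False):
    """Launch training as a subprocess and raise if it exits with an error."""
    # Run from the repository root so that `src.main` is importable.
    return subprocess.run(build_train_command(resume), check=True)
```

Calling `run_training()` starts a fresh run, and `run_training(resume=True)` appends `--resume` to the same command.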

This code implements the method described in the paper https://arxiv.org/abs/2011.03502. If you use this repository in your research, please cite it as:

@misc{duong2020unsupervised,
      title={An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish}, 
      author={Quan Duong and Mika Hämäläinen and Simon Hengchen},
      year={2020},
      eprint={2011.03502},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Thanks!
