Project Aim: Rewrite the Nougat project from scratch for personal learning purposes. Also aiming to productionize an open-source version of this tool for the web.

shahbaz-mogal/AcademicPDFParser

AcademicPDFParser

Project Aim: To rewrite lukas-blecher's Nougat (Meta Research's NLP project) from scratch for personal learning purposes. A further aim is to productionize this tool as an open-source, free-to-use platform, as practice in MLOps skills.

Goal: This tool accepts non-editable academic PDFs and converts them to editable Markdown files with LaTeX math. Following Nougat, it treats this as an end-to-end Optical Character Recognition task: a Swin Transformer image encoder reads each rendered page, and an mBART Transformer decoder generates the markup token by token from the encoder's output.
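To make the encoder/decoder split concrete, here is a minimal, stdlib-only Python sketch. It is not code from this repository or from Nougat: the 896×672 page size, the 4-pixel patch size, and the four-stage layout are assumed values for a typical Swin setup, and `encoder_grid`, `greedy_decode`, `Step`, and the toy step function are hypothetical names. It shows how many visual tokens the encoder would hand to the decoder under those assumptions, and how a decoder emits markup one token at a time until an end-of-sequence id.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed settings for illustration only (not read from this repository).
PAGE_HEIGHT, PAGE_WIDTH = 896, 672  # rendered page image size in pixels
PATCH_SIZE = 4                      # Swin patch-embedding stride
NUM_STAGES = 4                      # Swin stages, with 2x2 patch merging between them


def encoder_grid(height: int, width: int,
                 patch: int = PATCH_SIZE, stages: int = NUM_STAGES) -> Tuple[int, int]:
    """Return the (rows, cols) grid of visual tokens after the last Swin stage."""
    rows, cols = height // patch, width // patch  # patch embedding
    for _ in range(stages - 1):                   # each patch merging halves both sides
        rows, cols = rows // 2, cols // 2
    return rows, cols


# Hypothetical decoder-step type: (encoder output, tokens so far) -> next token id.
# In a real model this is the Transformer decoder cross-attending to the encoder output.
Step = Callable[[object, Sequence[int]], int]


def greedy_decode(step: Step, memory: object, bos_id: int,
                  eos_id: int, max_len: int) -> List[int]:
    """Generate token ids autoregressively, feeding each prediction back in."""
    tokens = [bos_id]
    while len(tokens) < max_len:
        next_id = step(memory, tokens)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


if __name__ == "__main__":
    rows, cols = encoder_grid(PAGE_HEIGHT, PAGE_WIDTH)
    print(rows, cols, rows * cols)  # 28 21 588

    # Toy stand-in for a trained decoder: replays fixed ids; id 2 acts as EOS.
    script = [5, 6, 7, 2]
    toy_step: Step = lambda memory, prefix: script[len(prefix) - 1]
    print(greedy_decode(toy_step, memory=None, bos_id=0, eos_id=2, max_len=16))
    # [0, 5, 6, 7, 2]
```

Greedy decoding is only the simplest strategy and is not a claim about what this project uses; a trained decoder's step would score the whole vocabulary and pick a token, and beam search or repetition checks are possible alternatives.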

Citation for the Nougat project:

@misc{blecher2023nougat,
      title={Nougat: Neural Optical Understanding for Academic Documents}, 
      author={Lukas Blecher and Guillem Cucurull and Thomas Scialom and Robert Stojnic},
      year={2023},
      eprint={2308.13418},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
