
pd3f

PDF text extraction pipeline: self-hosted, local-first and Docker-based

Pinned

  1. pd3f Public

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML 220 29

  2. pd3f-core Public

    📑 Python Package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook 26 6

  3. dehyphen Public

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python 32 5

Repositories

  • pd3f Public

    🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

    HTML 220 AGPL-3.0 29 14 3 Updated Oct 13, 2023
  • pd3f-core Public

    📑 Python Package to reconstruct the original continuous text from PDFs with language models

    Jupyter Notebook 26 AGPL-3.0 6 1 23 Updated Sep 8, 2023
  • pd3f.com Public

    📝 Website to advertise & document pd3f

    JavaScript 1 MIT 2 0 1 Updated Jan 22, 2023
  • pd3f-dataset-bmjv Public

    Dataset of (mostly German) PDFs used to develop pd3f

    Python 1 MIT 1 0 5 Updated Dec 8, 2022
  • dehyphen Public

    📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF

    Python 32 GPL-3.0 5 4 1 Updated Mar 8, 2022
  • pd3-flair Public

    Flair's language models without unnecessary dependencies

    Python 3 2,054 0 0 Updated Sep 15, 2020
  • pd3f-results Public

    Results with pd3f on some PDF datasets

    Jupyter Notebook 1 GPL-3.0 1 0 0 Updated Aug 21, 2020
