György Orosz (oroszgy)

Freelance NLP engineer

169 followers · 234 following

@ec-doris
Budapest, Hungary
https://gyorgy.orosz.link
in/oroszgy

Highlights

Developer Program Member

Stars

Document structure analysis

73 repositories

DevashishPrasad / CascadeTabNet

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Python · 1,553 stars · 429 forks · Updated Aug 27, 2021

tabulapdf / tabula-java

Extract tables from PDF files

Java · 2,018 stars · 450 forks · Updated Mar 19, 2025

camelot-dev / camelot

A Python library to extract tabular data from PDFs

Python · 3,647 stars · 535 forks · Updated Mar 13, 2026

the-black-knight-01 / Tabulo

Table Detection and Extraction Using Deep Learning (built in Python using Luminoth, TensorFlow < 2.0, and Sonnet)

Python · 198 stars · 40 forks · Updated Nov 24, 2022

phamquiluan / PubLayNet

ICDAR 2019: Mask R-CNN on the PubLayNet dataset. Paragraph detection, table detection, figure detection, and more.

Python · 183 stars · 38 forks · Updated May 11, 2021

deepdoctection / deepdoctection

A Repo For Document AI

Python · 3,146 stars · 188 forks · Updated Mar 14, 2026

madhav1ag / CDeCNet

CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images

Python · 134 stars · 33 forks · Updated Sep 11, 2025

cellsrg / tabbypdf

A tool for extracting arbitrary tables from untagged PDF documents

Java · 40 stars · 18 forks · Updated Jan 8, 2021

tabbydoc / tabbypdf2

PDF table extraction

Java · 10 stars · 2 forks · Updated Dec 14, 2021

pdfminer / pdfminer.six

Community-maintained fork of pdfminer: we fathom PDF

Python · 6,930 stars · 1,026 forks · Updated Mar 13, 2026

cndplab-founder / ICDAR2019_cTDaR

The ICDAR 2019 cTDaR evaluates the performance of methods for table detection (Track A) and table recognition (Track B). For the first track, document images containing one or several tables a…

178 stars · 68 forks · Updated Aug 10, 2022

ibm-aur-nlp / PubTabNet

Jupyter Notebook · 482 stars · 85 forks · Updated Jul 8, 2025

ibm-aur-nlp / PubLayNet

Jupyter Notebook · 1,042 stars · 165 forks · Updated Jul 9, 2025

hikopensource / DAVAR-Lab-OCR

OCR toolbox from Davar-Lab

Python · 762 stars · 155 forks · Updated Nov 16, 2023

tabulapdf / tabula

Tabula is a tool for liberating data tables trapped inside PDF files

CSS · 7,358 stars · 684 forks · Updated Mar 14, 2025

atlanhq / camelot

Camelot: PDF Table Extraction for Humans

Python · 3,716 stars · 362 forks · Updated Jan 5, 2023

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python · 9,924 stars · 871 forks · Updated Jan 28, 2026

doc-analysis / TableBank

TableBank: A Benchmark Dataset for Table Detection and Recognition

1,082 stars · 146 forks · Updated Aug 12, 2024

doc-analysis / DocBank

DocBank: A Benchmark Dataset for Document Layout Analysis

Python · 639 stars · 79 forks · Updated Aug 12, 2024

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python · 22,046 stars · 2,693 forks · Updated Jan 23, 2026

doc-analysis / ReadingBank

ReadingBank: A Benchmark Dataset for Reading Order Detection

117 stars · 4 forks · Updated Aug 26, 2024

open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark

Python · 32,493 stars · 9,846 forks · Updated Aug 21, 2024

Layout-Parser / layout-model-training

The scripts for training Detectron2-based Layout Models on popular layout analysis datasets

Python · 218 stars · 60 forks · Updated Sep 26, 2023

dhlab-epfl / dhSegment

Generic framework for historical document processing

Python · 382 stars · 113 forks · Updated Jul 9, 2021

tstanislawek / awesome-document-understanding

A curated list of resources for Document Understanding (DU) topic

1,504 stars · 166 forks · Updated Jun 2, 2023

microsoft / table-transformer

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python · 2,876 stars · 309 forks · Updated Jun 24, 2024

facebookresearch / detr

End-to-End Object Detection with Transformers

Python · 15,160 stars · 2,664 forks · Updated Mar 12, 2024

thoqbk / traprange

(Java) A Method to Extract Tabular Content from PDF Files

HTML · 337 stars · 133 forks · Updated Apr 22, 2023

BobLd / DocumentLayoutAnalysis

Document layout analysis resources and repos for development with PdfPig.

C# · 631 stars · 69 forks · Updated Oct 1, 2023

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

C# · 2,379 stars · 311 forks · Updated Mar 15, 2026
