Stars

Document structure analysis

73 repositories

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Python 1,553 429 Updated Aug 27, 2021

Extract tables from PDF files

Java 2,018 450 Updated Mar 19, 2025

A Python library to extract tabular data from PDFs

Python 3,647 535 Updated Mar 13, 2026

Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0 and Sonnet)

Python 198 40 Updated Nov 24, 2022

ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...

Python 183 38 Updated May 11, 2021

A Repo For Document AI

Python 3,146 188 Updated Mar 14, 2026

CDeC-Net: Composite Deformable Cascade Network for Table Detection in Document Images

Python 134 33 Updated Sep 11, 2025

A tool for extracting arbitrary tables from untagged PDF documents

Java 40 18 Updated Jan 8, 2021

PDF table extraction

Java 10 2 Updated Dec 14, 2021

Community maintained fork of pdfminer - we fathom PDF

Python 6,930 1,026 Updated Mar 13, 2026

The ICDAR 2019 cTDaR is to evaluate the performance of methods for table detection (TRACK A) and table recognition (TRACK B). For the first track, document images containing one or several tables a…

178 68 Updated Aug 10, 2022

Jupyter Notebook 482 85 Updated Jul 8, 2025

Jupyter Notebook 1,042 165 Updated Jul 9, 2025

OCR toolbox from Davar-Lab

Python 762 155 Updated Nov 16, 2023

Tabula is a tool for liberating data tables trapped inside PDF files

CSS 7,358 684 Updated Mar 14, 2025

Camelot: PDF Table Extraction for Humans

Python 3,716 362 Updated Jan 5, 2023

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

Python 9,924 871 Updated Jan 28, 2026

TableBank: A Benchmark Dataset for Table Detection and Recognition

1,082 146 Updated Aug 12, 2024

DocBank: A Benchmark Dataset for Document Layout Analysis

Python 639 79 Updated Aug 12, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 22,046 2,693 Updated Jan 23, 2026

ReadingBank: A Benchmark Dataset for Reading Order Detection

117 4 Updated Aug 26, 2024

OpenMMLab Detection Toolbox and Benchmark

Python 32,493 9,846 Updated Aug 21, 2024

The scripts for training Detectron2-based Layout Models on popular layout analysis datasets

Python 218 60 Updated Sep 26, 2023

Generic framework for historical document processing

Python 382 113 Updated Jul 9, 2021

A curated list of resources for Document Understanding (DU) topic

1,504 166 Updated Jun 2, 2023

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python 2,876 309 Updated Jun 24, 2024

End-to-End Object Detection with Transformers

Python 15,160 2,664 Updated Mar 12, 2024

(Java) A Method to Extract Tabular Content from PDF Files

HTML 337 133 Updated Apr 22, 2023

Document Layout Analysis resources repos for development with PdfPig.

C# 631 69 Updated Oct 1, 2023

Read and extract text and other content from PDFs in C# (port of PDFBox)

C# 2,379 311 Updated Mar 15, 2026