yolo-doclaynet

👏 Update 7/10/2024 - Add YOLOv10 models.

👏 Update 6/21/2024 - Add YOLOv9 models.

Prediction results from yolov8n-doclaynet

Why this repo?

RAG is very popular these days, and many applications support talking to documents. However, performance drops sharply on complex documents because of their complex structure, so extracting content from a complex document and organizing it into a parsable form is a challenge. This repo aims to solve that challenge with a method that is both fast and accurate.

YOLO is a state-of-the-art family of detection models. YOLOv8, developed by Ultralytics, comes in 5 base model sizes together with a powerful framework for training and deployment, so I chose YOLO to solve this challenge.
DocLayNet is a human-annotated document layout segmentation dataset containing 80,863 pages from a broad variety of document sources. As far as I know, it is the highest-quality dataset for document layout analysis.

What I did?

Offer a script that converts the DocLayNet dataset into a dataset ready for YOLO detection training.
Offer training, evaluation and serving code.
Train and release 5 sizes of YOLOv8 models: yolov8n, yolov8s, yolov8m, yolov8l and yolov8x.
- yolov8n, yolov8s and yolov8m can be found on HuggingFace.
- yolov8l and yolov8x are only slightly better than yolov8m. If you really want to try them, please buy them via the yolov8l and yolov8x links, as I rented GPUs to train them.

How to use?

from ultralytics import YOLO

model = YOLO("{path to model file}")
pred = model("{path to test image}")
print(pred)

For the structure of the prediction result, please refer to the Ultralytics doc.
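As a rough illustration of what a result holds, the sketch below flattens one result into plain dictionaries. It relies on the `boxes.xyxy`, `boxes.cls`, `boxes.conf` and `names` attributes of an Ultralytics result. The helper name `result_to_records` is my own and is not part of this repo.

```python
def result_to_records(result):
    """Flatten one Ultralytics result into a list of plain dicts.

    `result` is a single element of the list returned by `model(image)`.
    """
    boxes = result.boxes
    records = []
    for xyxy, cls_id, conf in zip(
        boxes.xyxy.tolist(), boxes.cls.tolist(), boxes.conf.tolist()
    ):
        records.append(
            {
                "label": result.names[int(cls_id)],  # class id -> label name
                "confidence": float(conf),
                "box": [float(v) for v in xyxy],  # [x1, y1, x2, y2] in pixels
            }
        )
    return records


# Usage (assumes ultralytics is installed and the placeholders are filled in):
#   from ultralytics import YOLO
#   model = YOLO("{path to model file}")
#   for record in result_to_records(model("{path to test image}")[0]):
#       print(record)
```

Plain dictionaries like these are easy to serialize to JSON or to pass on to a downstream parsing step.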

Server

Run python main.py to serve the model, then open http://localhost:8000/redoc to check the API.

Dataset

More details about DocLayNet, and the download itself, are available at this link. It has 11 labels:

Text: Regular paragraphs.
Picture: A graphic or photograph.
Caption: Special text outside a picture or table that introduces this picture or table.
Section-header: Any kind of heading in the text, except overall document title.
Footnote: Typically small text at the bottom of a page, with a number or symbol that is referred to in the text above.
Formula: Mathematical equation on its own line.
Table: Material arranged in a grid alignment with rows and columns, often with separator lines.
List-item: One element of a list, in a hanging shape, i.e., from the second line onwards the paragraph is indented more than the first line.
Page-header: Repeating elements like page number at the top, outside of the normal text flow.
Page-footer: Repeating elements like page number at the bottom, outside of the normal text flow.
Title: Overall title of a document, (almost) exclusively on the first page and typically appearing in large font.
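To show how these labels can help when feeding a page to a RAG pipeline, here is a minimal sketch that drops page headers and footers and orders the remaining regions from top to bottom. The record layout (a dict with "label" and "box"), the choice of labels to skip and the function name `to_reading_order` are all assumptions made for illustration, not something this repo provides.

```python
# Labels treated as page furniture rather than body content (an assumption).
SKIP_LABELS = {"Page-header", "Page-footer"}


def to_reading_order(records, skip_labels=SKIP_LABELS):
    """Drop page furniture and sort the rest top-to-bottom, then left-to-right.

    Each record is a dict with "label" and "box" ([x1, y1, x2, y2] in pixels).
    This naive sort only suits single-column pages.
    """
    kept = [r for r in records if r["label"] not in skip_labels]
    return sorted(kept, key=lambda r: (r["box"][1], r["box"][0]))


regions = [
    {"label": "Text", "box": [50, 300, 500, 400]},
    {"label": "Page-header", "box": [50, 10, 500, 30]},
    {"label": "Title", "box": [50, 60, 500, 120]},
    {"label": "Page-footer", "box": [50, 980, 500, 1000]},
]
print([r["label"] for r in to_reading_order(regions)])  # ['Title', 'Text']
```

Multi-column layouts would need a smarter ordering, for example grouping regions into columns first.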

Prepare data

Download the DocLayNet dataset from this link.
Unzip it into the datasets folder.
Use my convert script to make the dataset ready for training.

wget https://codait-cos-dax.s3.us.cloud-object-storage.appdomain.cloud/dax-doclaynet/1.0.0/DocLayNet_core.zip
mkdir datasets
mv DocLayNet_core.zip datasets/
cd datasets/ && unzip DocLayNet_core.zip && rm DocLayNet_core.zip
cd ../
python convert_dataset.py
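The core of such a conversion is a change of box format. The sketch below assumes the annotations are COCO-style boxes ([x_min, y_min, width, height] in pixels) and turns them into YOLO label lines (class id, then center and size normalized by the image size). It illustrates the idea only; convert_dataset.py may be implemented differently, and both function names are hypothetical.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO (x_center, y_center, width, height), normalized to [0, 1]."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLO label .txt file."""
    cx, cy, w, h = coco_bbox_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Illustrative values only: a 200x100 box at (100, 50) on a 1000x1000 page.
print(yolo_label_line(9, [100, 50, 200, 100], 1000, 1000))
# 9 0.200000 0.100000 0.200000 0.100000
```

YOLO class ids start at 0, so category ids that start at 1 in the source annotations would have to be shifted accordingly.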

Train & Eval

Train

After preparing data, thanks to Ultralytics, training is super easy. You can choose base models from this link. I use the YOLOv8 series.

python train.py {base-model}
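Ultralytics training reads the dataset location and class names from a dataset YAML file. The sketch below generates the text of such a file. The folder names, the `datasets/doclaynet` root and the class order (alphabetical, as in the results table) are assumptions of mine; the files produced by convert_dataset.py and used by train.py may look different.

```python
# Assumed class order: alphabetical, as in the results table. The order that
# convert_dataset.py actually writes may differ.
LABELS = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]


def build_data_yaml(root, labels=LABELS):
    """Return the text of an Ultralytics dataset YAML for a detection dataset.

    The images/train, images/val and images/test sub-folders are hypothetical.
    """
    lines = [
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(labels)]
    return "\n".join(lines) + "\n"


print(build_data_yaml("datasets/doclaynet"))
```

With ultralytics installed, a file like this is what the `data` argument of `YOLO(...).train(data=...)` points to.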

Eval

After training, you can evaluate your best model on the test split.

python eval.py {path-to-your-model}

Result

Figure: overall mAP50-95 on the test split for the different models.

Full table comparing mAP50-95 on the test split across the different models.

label	boxes	yolov8n	yolov9t	yolov10n	yolov8s	yolov9s	yolov10s	yolov8m	yolov9m	yolov10m	yolov10b	yolov8l	yolov9c	yolov10l	yolov8x	yolov10x
Params (M)		3.2	2.0	2.3	11.2	7.2	7.2	25.9	20.1	15.4	19.1	43.7	25.5	24.4	68.2	29.5
Caption	1542	0.682	0.68	0.713	0.721	0.735	0.738	0.746	0.749	0.761	0.762	0.75	0.746	0.772	0.753	0.77
Footnote	387	0.614	0.638	0.642	0.669	0.684	0.681	0.696	0.693	0.713	0.72	0.702	0.689	0.722	0.717	0.725
Formula	1966	0.655	0.678	0.648	0.695	0.719	0.698	0.723	0.737	0.727	0.715	0.75	0.752	0.736	0.747	0.76
List-item	10521	0.789	0.802	0.803	0.818	0.827	0.833	0.836	0.838	0.845	0.844	0.841	0.843	0.851	0.841	0.849
Page-footer	3987	0.588	0.599	0.6	0.61	0.612	0.614	0.64	0.62	0.645	0.659	0.641	0.65	0.671	0.655	0.661
Page-header	3365	0.707	0.731	0.699	0.754	0.77	0.761	0.769	0.77	0.765	0.774	0.776	0.785	0.779	0.784	0.79
Picture	3497	0.723	0.764	0.749	0.762	0.789	0.778	0.789	0.787	0.79	0.803	0.796	0.796	0.8	0.805	0.806
Section-header	8544	0.709	0.72	0.71	0.727	0.736	0.729	0.742	0.742	0.742	0.744	0.75	0.741	0.743	0.748	0.748
Table	2394	0.82	0.86	0.839	0.854	0.88	0.863	0.88	0.881	0.879	0.879	0.885	0.884	0.891	0.886	0.889
Text	29917	0.845	0.856	0.85	0.86	0.869	0.868	0.876	0.874	0.879	0.874	0.878	0.877	0.88	0.877	0.882
Title	334	0.762	0.778	0.774	0.806	0.81	0.822	0.83	0.836	0.838	0.846	0.846	0.838	0.845	0.84	0.848
All	66454	0.718	0.737	0.73	0.752	0.766	0.762	0.775	0.775	0.78	0.784	0.783	0.782	0.79	0.787	0.793
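For readers unfamiliar with the metric: mAP50-95 averages the average precision over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05. The sketch below shows the IoU computation behind it and which thresholds a single predicted box would pass. The two boxes are made-up values, and this is not the evaluation code used by eval.py.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds that mAP50-95 averages over: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

predicted = [0, 0, 100, 100]
ground_truth = [0, 0, 100, 80]
score = iou(predicted, ground_truth)  # 8000 / 10000 = 0.8
passed = [t for t in THRESHOLDS if score >= t]
print(score, passed)
```

A box that is only roughly right counts as a hit at the loose thresholds but as a miss at the strict ones, which is why mAP50-95 rewards tight localization.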
