Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

This repository contains code for the paper Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis.

Siwen Luo*, Yihao Ding*, Siqu Long, Soyeon Caren Han, Josiah Poon

Dataset Preparation

This paper uses three widely used benchmark datasets: FUNSD (paper), PubLayNet (paper), and DocBank (paper). All three datasets are publicly available and can be downloaded via their officially provided links.

Before the features are fed into the various graphs to obtain enhanced feature representations, some preprocessing is required to generate the multi-aspect feature representations. For the detailed procedure, please refer here. For the PubLayNet dataset, we use the Google Cloud Vision OCR tool to extract the text content before feeding it into the pre-trained BERT to get textual representations.

Acquire Multi-aspect Features

We provide a tutorial showing how to generate an appropriate JSON file on the FUNSD dataset, either for training or for obtaining GCN-enhanced representations from our pre-trained GCN models. The other two datasets can follow the same procedure to produce JSON files in the required format for the subsequent steps. Please refer to our paper for detailed descriptions of the node and edge representations, and check the Google Colab notebook to see how to generate them.

Graph Construction

We use GCNs to enhance the four proposed aspect feature representations: Appearance, Density, Semantic and Syntactic. These GCNs share the same architecture but differ in their node and edge representations. We divide the graphs into two types based on their edge representations:

Appearance and Density Graphs (Gap-distance Weighted)

The first type is gap-distance based and includes the appearance and density graphs, whose edge features are the inverse of the gap distance to the top-k nearest segments. The node features of this type are the visual and density features of each segment, respectively. Please refer to this ipynb notebook to see how it works on the FUNSD dataset.
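The gap-distance weighting can be illustrated with a minimal sketch. It is not the repository's implementation: the box format `(x0, y0, x1, y1)`, the function names `box_gap` and `gap_distance_adjacency`, the Euclidean definition of the gap, and the smoothing constant `eps` are all assumptions made for illustration. Please check the notebook and the paper for the exact definitions.

```python
import numpy as np

def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes (x0, y0, x1, y1).

    Returns 0.0 when the boxes touch or overlap. This gap definition is an
    assumption for illustration, not necessarily the one used in the paper.
    """
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return float(np.hypot(dx, dy))

def gap_distance_adjacency(boxes, k=2, eps=1.0):
    """Weighted adjacency: each segment links to its k nearest segments,
    with edge weight 1 / (gap + eps). eps (hypothetical) avoids division by zero.
    """
    n = len(boxes)
    adj = np.zeros((n, n), dtype=np.float64)
    for i in range(n):
        gaps = np.array([box_gap(boxes[i], boxes[j]) for j in range(n)])
        gaps[i] = np.inf  # never pick the segment itself as a neighbour
        nearest = np.argsort(gaps)[: min(k, n - 1)]
        for j in nearest:
            adj[i, j] = 1.0 / (gaps[j] + eps)
    return adj

# Toy layout with four segments (made-up coordinates).
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (0, 30, 10, 40), (100, 100, 110, 110)]
adj = gap_distance_adjacency(boxes, k=2)
```

In this toy layout, segment 0 is connected only to segments 1 and 2, and the closer segment 1 receives the larger weight. Note that the resulting matrix is not necessarily symmetric, because the k-nearest relation is not symmetric.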

Semantic and Syntactic Graphs (Parent-Child based)

The other type is parent-child relation based (see the example on the FUNSD dataset). If two segments have a parent-child relation, the edge value is set to 1, otherwise 0. The graph construction workflow is shown in the figures below. More detailed information can be found in our paper and Appendix.
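As a rough sketch of the binary edge rule described above, the adjacency matrix can be built from a list of parent-child index pairs. The function name `parent_child_adjacency`, the pair-list input format, and the choice to make the matrix symmetric are assumptions for illustration, not details confirmed by the repository.

```python
import numpy as np

def parent_child_adjacency(num_segments, parent_child_pairs, symmetric=True):
    """Binary adjacency: 1 where two segments have a parent-child relation, else 0.

    parent_child_pairs is a list of (parent_index, child_index) tuples.
    symmetric=True (an assumption) treats the relation as undirected.
    """
    adj = np.zeros((num_segments, num_segments), dtype=np.float32)
    for parent, child in parent_child_pairs:
        adj[parent, child] = 1.0
        if symmetric:
            adj[child, parent] = 1.0
    return adj

# Toy example: segment 0 is the parent of 1 and 2, and segment 2 is the parent of 3.
pairs = [(0, 1), (0, 2), (2, 3)]
adj = parent_child_adjacency(4, pairs)
```

Segments with no parent-child relation, such as 1 and 3 here, keep an edge value of 0.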

Classifier

After obtaining the enhanced feature representations, we feed them into our model for training and testing. We also provide an ipynb notebook to show how it works on the FUNSD dataset. Please refer to the Doc-GCN paper for a more detailed description of our classifier.

Evaluation Results

Doc-GCN achieves SoTA performance against the other baselines across extensive experiments. Here we show only the overall performance on the three benchmark datasets; more result analysis and ablation studies can be found in Section 5 of our paper.

The overall performance of Doc-GCN compared to the baselines on the test sets, in Precision (%), Recall (%) and F1 score (%). The second best is underlined. Doc-GCN achieves the highest performance across all benchmark datasets and evaluation metrics.
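For readers unfamiliar with the three metrics, the following sketch shows how per-class Precision, Recall and F1 (in %) can be computed from predicted and gold labels. The helper name `precision_recall_f1` and the toy labels are made up for illustration. How the paper aggregates scores across classes is not stated here, so this example deliberately covers a single class only.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 in percent for one target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return 100 * precision, 100 * recall, 100 * f1

# Toy predictions (not real results): one Text segment is mislabelled as List.
y_true = ["Text", "Text", "List", "Title", "List"]
y_pred = ["Text", "List", "List", "Title", "List"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "List")
```

In this toy case the List class has perfect recall but reduced precision, since a Text segment was predicted as List.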

Case Study

We visualized the predicted results from Doc-GCN and compared them with the top-3 baseline models for each dataset. Here is an example on the PubLayNet dataset. The figure below shows that RoBERTa and Faster-RCNN have wrongly recognized a Text as a List, whereas our Doc-GCN has accurately recognized all components. This is because, when considering only the semantic or visual information, it is hard to distinguish List from Text, which indicates the importance of capturing the multi-aspect features and structural relationships between layout components for better performance. More case studies can be found in Appendix B of the Doc-GCN paper.
