Unifying Vision, Text, and Layout for Universal Document Processing

Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

Code Release Here

Code is rehosted as part of the i-code project.

Open Source Checklist:

Release Model (Encoder + Text decoder)
Release Most Scripts
Vision Decoder / Weights (because of ethical concerns about fake document generation, we plan to release this functionality as an Azure API)
Demos

Introduction

UDOP unifies vision, text, and layout through a vision-text-layout Transformer and unified generative pretraining tasks, including vision, text, layout, and mixed tasks. We show the task prompts (left) and task targets (right) for all self-supervised objectives (joint text-layout reconstruction, visual text recognition, layout modeling, and masked autoencoding) and two example supervised objectives (question answering and layout analysis).
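To make the prompt/target framing concrete, here is a minimal Python sketch of how a joint text-layout reconstruction example might be built from OCR words and their boxes. It is not taken from the UDOP release: the sentinel name `<text_layout_0>`, the prompt prefix, the 500-bin floor quantization of normalized coordinates, and the helpers `Word`, `layout_tokens`, and `joint_text_layout_example` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: normalized coordinates are quantized into 500 discrete layout bins.
LAYOUT_BINS = 500

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]


@dataclass
class Word:
    text: str
    box: Box


def layout_tokens(box: Box, bins: int = LAYOUT_BINS) -> str:
    """Floor-quantize each normalized coordinate into a token such as <62>."""
    return "".join(f"<{min(int(c * bins), bins - 1)}>" for c in box)


def joint_text_layout_example(words: List[Word], start: int, end: int) -> Tuple[str, str]:
    """Mask words[start:end] behind one sentinel and build a (prompt, target) pair.

    The target asks the model to recover both the masked text and the
    bounding box of the masked span, expressed as layout tokens.
    """
    sentinel = "<text_layout_0>"  # hypothetical sentinel token name
    span = words[start:end]
    # Union bounding box covering every word in the masked span.
    span_box = (
        min(w.box[0] for w in span),
        min(w.box[1] for w in span),
        max(w.box[2] for w in span),
        max(w.box[3] for w in span),
    )
    visible = [w.text for w in words[:start]] + [sentinel] + [w.text for w in words[end:]]
    prompt = "Joint Text-Layout Reconstruction. " + " ".join(visible)  # hypothetical prefix
    target = f"{sentinel} {' '.join(w.text for w in span)} {layout_tokens(span_box)}"
    return prompt, target


if __name__ == "__main__":
    words = [
        Word("Ship", (0.125, 0.25, 0.25, 0.375)),
        Word("Date", (0.25, 0.25, 0.375, 0.375)),
        Word("Total", (0.125, 0.5, 0.25, 0.625)),
    ]
    prompt, target = joint_text_layout_example(words, 0, 2)
    print(prompt)  # Joint Text-Layout Reconstruction. <text_layout_0> Total
    print(target)  # <text_layout_0> Ship Date <62><125><187><187>
```

Encoding layout as discrete tokens in the target string is what lets a single text decoder emit text and positions together; the other objectives (layout modeling, visual text recognition) could be framed the same way by swapping which of text or layout is hidden in the prompt.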
