How were the 2D positional segments generated from OCR for the pretraining tasks of LayoutLMv3 #838
Many OCR engines support line finding. For example, the paper on the open-source Tesseract OCR engine discusses "Line Finding" in its Section 3.1. Here is a sample line-level read result (in the Azure Read API v3.2 format):
{
  "status": "succeeded",
  "createdDateTime": "2021-02-04T06:32:08.2752706+00:00",
  "lastUpdatedDateTime": "2021-02-04T06:32:08.7706172+00:00",
  "analyzeResult": {
    "version": "3.2",
    "readResults": [
      {
        "page": 1,
        "angle": 2.1243,
        "width": 502,
        "height": 252,
        "unit": "pixel",
        "lines": [
          {
            "boundingBox": [58, 42, 314, 59, 311, 123, 56, 121],
            "text": "Tabs vs",
            "appearance": {
              "style": {
                "name": "handwriting",
                "confidence": 0.96
              }
            },
            "words": [
              {
                "boundingBox": [68, 44, 225, 59, 224, 122, 66, 123],
                "text": "Tabs",
                "confidence": 0.933
              },
              {
                "boundingBox": [241, 61, 314, 72, 314, 123, 239, 122],
                "text": "vs",
                "confidence": 0.977
              }
            ]
          }
        ]
      }
    ]
  }
}

We use the extracted lines as the segments.
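Concretely, each line's 8-value `boundingBox` polygon from a read result like the one above can be reduced to an axis-aligned box and scaled to the 0–1000 range that LayoutLM-style models expect. A minimal sketch (the function names are my own, and the JSON structure is assumed to match the example above; the 0–1000 normalization follows the convention of the LayoutLM papers):

```python
def polygon_to_box(poly):
    """Reduce an 8-value polygon [x1, y1, ..., x4, y4] to [x0, y0, x1, y1]."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]

def normalize_box(box, width, height):
    """Scale a pixel-space box to the 0-1000 range used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return [int(1000 * x0 / width), int(1000 * y0 / height),
            int(1000 * x1 / width), int(1000 * y1 / height)]

def line_segments(result):
    """Yield (text, normalized line box) pairs from a read result dict."""
    for page in result["analyzeResult"]["readResults"]:
        w, h = page["width"], page["height"]
        for line in page["lines"]:
            yield line["text"], normalize_box(polygon_to_box(line["boundingBox"]), w, h)
```

Applied to the "Tabs vs" line above (page size 502×252), this yields the segment box `[111, 166, 625, 488]`, which would then be shared by every word token in that line.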
Model: LayoutLMv3
I understand that LayoutLMv3 uses 2D positional encodings for whole text segments instead of per-word positional encodings. How were these generated for the pretraining tasks? Was a specific OCR engine used for this?
I understand the data for the FUNSD fine-tuning task was regrouped based on the training labels (which key group / value group each word belonged to), but how are the text-segment positional encodings generated during pretraining, where no such labels are available?
Is there any discussion of the OCR engine used for pretraining to obtain segment positions instead of word-level positions? Is it an off-the-shelf OCR engine, or a model trained specifically for segment extraction? It's unclear to me how these segments were produced, unless I'm just missing something.
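To make the question concrete, here is a toy sketch of my understanding of the difference (the box values are illustrative only, derived by hand from the word and line bounding boxes in the JSON above, normalized to 0–1000):

```python
words = ["Tabs", "vs"]                  # the two words of one OCR line

# Word-level 2D positions (LayoutLM / LayoutLMv2 style): one box per word.
word_boxes = [[131, 174, 448, 488],     # "Tabs"
              [476, 242, 625, 488]]     # "vs"

# Segment-level 2D positions (LayoutLMv3 style, as I understand it):
# every word token in the line/segment shares the segment's box.
segment_box = [111, 166, 625, 488]      # the whole line's box
segment_boxes = [segment_box for _ in words]
```

My question is how the segment (line) grouping itself was obtained at pretraining scale, since no annotation labels exist there.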
Thanks!