An end-to-end model for two sub-tasks of table recognition: table structure recognition and cell detection
Dataset: PubTabNet
Architecture: based on this paper
- Consists of one shared encoder, one shared decoder, and three separate decoders for three sub-tasks
- The shared encoder uses a CNN backbone network as the feature extractor
- The four decoders are inspired by the original Transformer decoder
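The layout above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes PyTorch, and the class name `MultiTaskTableModel`, the tiny two-layer CNN, the layer sizes, and the `out_dims` parameter are all hypothetical. Positional encoding is left out for brevity, and every task head here is a plain linear layer, whereas a real cell detection head would likely regress box coordinates.

```python
import torch
import torch.nn as nn


class MultiTaskTableModel(nn.Module):
    """Illustrative only: shared encoder -> shared decoder -> three task decoders."""

    def __init__(self, vocab_size=100, d_model=64, nhead=4, out_dims=(100, 100, 100)):
        super().__init__()
        # Shared encoder: a tiny CNN standing in for a real backbone (assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.embed = nn.Embedding(vocab_size, d_model)

        def make_decoder():
            layer = nn.TransformerDecoderLayer(
                d_model, nhead, dim_feedforward=128, batch_first=True
            )
            return nn.TransformerDecoder(layer, num_layers=1)

        # One shared decoder plus one decoder per sub-task (four in total).
        self.shared_decoder = make_decoder()
        self.task_decoders = nn.ModuleList([make_decoder() for _ in out_dims])
        self.heads = nn.ModuleList([nn.Linear(d_model, d) for d in out_dims])

    def forward(self, images, tokens):
        feats = self.encoder(images)               # (B, C, H', W')
        memory = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        tgt = self.embed(tokens)                   # (B, T, C)
        steps = tokens.size(1)
        # Causal mask: True marks positions a token may NOT attend to.
        causal = torch.triu(
            torch.ones(steps, steps, dtype=torch.bool, device=tokens.device),
            diagonal=1,
        )
        shared = self.shared_decoder(tgt, memory, tgt_mask=causal)
        # Each task decoder refines the shared representation separately.
        return [
            head(dec(shared, memory, tgt_mask=causal))
            for dec, head in zip(self.task_decoders, self.heads)
        ]
```

Calling the model with a batch of images and a batch of token ids returns one tensor per sub-task, each of shape `(batch, sequence_length, out_dim)`.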
- config.py: contains the hyperparameters
- parsing_data.py: matches raw data from PubTabNet to its annotations
- tokenizer.py: encodes characters and HTML tags
- sub_module.py: builds the necessary sub-modules, such as Cross Attention, Self Attention, Positional Encoding, ...
- main_model: builds the final model from the sub-modules
- train_infer.py: training loop
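As an illustration of what encoding characters and HTML tags in tokenizer.py could look like, here is a minimal sketch in plain Python. It is not the repository's implementation: the class name `SimpleTokenizer`, the special tokens, and the short tag list are assumptions, and the tag vocabulary in the actual PubTabNet annotations is larger than the one used here.

```python
import re

# Hypothetical special tokens and a deliberately small tag set (assumptions).
SPECIAL = ["<pad>", "<sos>", "<eos>", "<unk>"]
HTML_TAGS = ["<thead>", "</thead>", "<tbody>", "</tbody>",
             "<tr>", "</tr>", "<td>", "</td>"]


class SimpleTokenizer:
    """Maps HTML tags to single tokens and everything else to characters."""

    def __init__(self, chars):
        self.vocab = SPECIAL + HTML_TAGS + sorted(set(chars))
        self.token_to_id = {tok: i for i, tok in enumerate(self.vocab)}
        # Try to match a known tag first, otherwise take one character.
        tag_pattern = "|".join(re.escape(tag) for tag in HTML_TAGS)
        self._pattern = re.compile(tag_pattern + "|.", re.DOTALL)

    def encode(self, text):
        unk = self.token_to_id["<unk>"]
        ids = [self.token_to_id.get(tok, unk)
               for tok in self._pattern.findall(text)]
        return [self.token_to_id["<sos>"]] + ids + [self.token_to_id["<eos>"]]

    def decode(self, ids):
        # Special tokens (including <unk>) are dropped from the output.
        return "".join(self.vocab[i] for i in ids
                       if self.vocab[i] not in SPECIAL)
```

A round trip such as `tok.decode(tok.encode("<tr><td>ab1</td></tr>"))` returns the input string as long as every character was in the set passed to the constructor; unknown characters map to `<unk>` and are lost on decoding.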