
Inference individual Image for table detection #17

Open
matchalambada opened this issue Jan 4, 2022 · 14 comments
Labels
question: Further information is requested

Comments

@matchalambada

Hi authors,
I would like to visualize the table detection result for a specific image. Which output in the code should I take and modify in order to get the coordinates of the predicted bounding boxes and visualize them on the inferred image?

@bsmock added the question label Jan 4, 2022
@mzhadigerov

mzhadigerov commented Jan 17, 2022

Is there any update on that?

@Architectshwet

How can we extract data in row/column format from a table image using the trained model?

@bsmock
Collaborator

bsmock commented Feb 3, 2022

In the current version of the code, you can find the function that takes the model output and processes it into a table representation here:

pred_table_structures, pred_cells, pred_confidence_score = objects_to_cells(pred_bboxes, pred_labels, pred_scores,
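For anyone who wants the bounding boxes themselves (e.g. to visualize them, as asked above), here is a minimal sketch of how pred_bboxes, pred_labels, and pred_scores can be recovered from the raw outputs. It assumes the DETR-style output format this repo uses (per-query class logits with a trailing "no object" class, plus normalized (cx, cy, w, h) boxes); the helper name and threshold are illustrative, not part of the repo.

import torch

# A minimal sketch, assuming DETR-style raw outputs:
# outputs["pred_logits"] has shape [batch, queries, num_classes + 1]
# (the last class is "no object") and outputs["pred_boxes"] holds
# normalized (cx, cy, w, h) boxes.
def raw_outputs_to_objects(outputs, img_width, img_height, threshold=0.5):
    # class probabilities per query, dropping the "no object" class
    probs = outputs["pred_logits"].softmax(-1)[0, :, :-1]
    scores, labels = probs.max(-1)

    # rescale normalized (cx, cy, w, h) to pixel (xmin, ymin, xmax, ymax)
    cx, cy, w, h = outputs["pred_boxes"][0].unbind(-1)
    boxes = torch.stack([(cx - w / 2) * img_width,
                         (cy - h / 2) * img_height,
                         (cx + w / 2) * img_width,
                         (cy + h / 2) * img_height], dim=-1)

    # keep only confident predictions for visualization
    keep = scores > threshold
    return boxes[keep].tolist(), labels[keep].tolist(), scores[keep].tolist()

The returned boxes are in image pixel coordinates, so they can be drawn directly on the inferred image or passed on to objects_to_cells.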

@Jiangwentai

Jiangwentai commented Feb 5, 2022

@bsmock

Hello, I want to know: if I use the function objects_to_cells, how can I get the page_tokens when using a new input image?

@bsmock
Collaborator

bsmock commented Feb 8, 2022

How can I get the page_tokens when using a new input image?

Right now the code is written to be used with the PubTables-1M dataset or any dataset in the same format. For each table image in PubTables-1M, there is also a JSON file with a list of words in the image, which is read in as page_tokens. So the input image and the list of words (page_tokens) are what you need for inference.

You can have a look at the dataset to see examples of the format for page_tokens. Basically page_tokens needs to be a list of dicts, where each dict corresponds to a word or token and looks like this:
{"text": "Table", "bbox": [xmin, ymin, xmax, ymax], "flags": 0, "block_num": 0, "line_num": 0, "span_num": 0}

At a minimum you'll need to fill in the "text", "bbox", and "span_num" fields, where "span_num" is an integer that puts the words in some order. When the code returns the text for each cell as a string, the words in the text string will be sorted by "block_num", then "line_num", then "span_num". So you can leave "flags", "block_num", and "line_num" as 0 as long as you put a unique integer for each word in "span_num".
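As a minimal sketch (with hypothetical words and boxes, e.g. from an OCR engine), building and saving page_tokens in this format could look like:

import json

# Hypothetical (word, bbox) pairs in reading order, e.g. from OCR;
# bbox is [xmin, ymin, xmax, ymax] in image pixel coordinates.
words = [("Table", [10, 12, 58, 30]), ("1.", [62, 12, 78, 30])]

# "span_num" preserves the reading order; the other grouping fields
# can stay 0, as described above.
page_tokens = [
    {"text": text, "bbox": bbox, "flags": 0,
     "block_num": 0, "line_num": 0, "span_num": i}
    for i, (text, bbox) in enumerate(words)
]

with open("table_words.json", "w") as f:
    json.dump(page_tokens, f)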

@jshtok

jshtok commented Sep 21, 2022

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data.
Thank you!

@suonbo

suonbo commented Sep 21, 2022

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data. Thank you!

You can find some samples from here:
https://drive.google.com/drive/folders/0B5h08T2mGP3ffnZLbTZ0WVNRT3Zjdjl2eC11aW0tOFVCaU5Mb2c2Q0dmc21lNWo1Y3BuT3c?resourcekey=0-bphHgPyZKg0yT5V8F7BWjw&usp=sharing

@jshtok

jshtok commented Sep 21, 2022

@bsmock, can you please add at least one example image with all the required data structures to make a working inference example? It would help to understand the format without downloading 110 GB of data. Thank you!

You can find some samples from here: https://drive.google.com/drive/folders/0B5h08T2mGP3ffnZLbTZ0WVNRT3Zjdjl2eC11aW0tOFVCaU5Mb2c2Q0dmc21lNWo1Y3BuT3c?resourcekey=0-bphHgPyZKg0yT5V8F7BWjw&usp=sharing

Thank you, @suonbo, but in that location I can only see the .jpg images (and they are cropped tables, not whole pages). I am looking for an example with the data required by the inference command:

python main.py --mode eval --data_type structure --config_file structure_config.json --data_root_dir /path/to/pascal_voc_structure_data --model_load_path /path/to/structure_model --table_words_dir /path/to/json_table_words_data

Specifically, I need the config file (not in the repo!), the pascal_voc_structure data, the table_words_dir (what goes there?), the json_table_words_data ...

@Danferno

To anyone interested, I uploaded an example of the table structure recognition files here. It contains the annotation (Pascal VOC), the words (JSON), and the table image (.jpg).

@mineshmathew

Has anyone figured out how to run table detection alone?

@Danferno

Has anyone figured out how to run table detection alone?

NielsRogge made a notebook with examples

@muneeb2001

NielsRogge made a notebook with examples

Can you share a tutorial where the table is converted to CSV or HTML?

@nuocheng

nuocheng commented Dec 1, 2023

Has anyone figured out how to run table detection alone?

NielsRogge made a notebook with examples

Hello, thank you for providing a simple example.
I encountered an issue while running the Jupyter notebook: the microsoft/table-transformer-detection configuration depends on resnet18, but downloading it through the third-party Python library timm failed. Is there a way to make table-transformer-detection load a local resnet18 checkpoint?
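One possible workaround, as a sketch: pre-download the resnet18 checkpoint and point timm at the local file. This assumes timm >= 0.9 (which added pretrained_cfg_overlay) and a hypothetical local path; whether it plugs directly into the table-transformer-detection notebook depends on how its backbone is constructed, so treat this as an assumption rather than a confirmed fix.

import timm

# A minimal sketch, assuming timm >= 0.9: load resnet18 weights from a
# local checkpoint file instead of downloading them. The path below is
# hypothetical.
backbone = timm.create_model(
    "resnet18",
    pretrained=True,
    pretrained_cfg_overlay=dict(file="/path/to/resnet18.pth"),
)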

@NielsRogge

Hi,

See #158 with updated notebooks and demos
