Bounding boxes are displaced from math regions #3

Open
VladimirKalachikhin opened this issue Jun 15, 2020 · 17 comments

Comments

@VladimirKalachikhin

Yes, I rendered the images at the sizes given in the file_sizes file, but the bounding boxes are completely displaced.
[image: 1]

I see that page numbering in the math_gt .csv files starts from 0, but convert_pdf_to_image.py numbers the pages from 1. Also, convert_pdf_to_image.py creates images whose sizes differ from those in file_sizes.

I wrote my own convert_pdf_to_image, which renders the images at the correct sizes. I tried numbering the pages from 0 and from 1. Neither made any difference.

I tried http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf
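
A minimal sketch of the two checks described above (the page-number offset and the size comparison), for anyone who wants to reproduce them. The function name, the dictionary layout, and the sample sizes are hypothetical; how the sizes are read from file_sizes and from the rendered images is left out because the file layout is not shown in this thread.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def find_size_mismatches(
    rendered: Dict[int, Size],
    expected: Dict[int, Size],
    rendered_first_page: int = 1,
) -> List[Tuple[int, Size, Size]]:
    """Return (zero_based_page, rendered_size, expected_size) per mismatch.

    `rendered` is keyed by the page number used in the image file names
    (starting at `rendered_first_page`); `expected` is keyed by the
    zero-based page index used in the annotation files.
    """
    mismatches = []
    for page_number, size in sorted(rendered.items()):
        # Shift the rendered page number to the zero-based annotation index.
        page_index = page_number - rendered_first_page
        want = expected.get(page_index)
        if want is not None and want != size:
            mismatches.append((page_index, size, want))
    return mismatches


# Hypothetical sizes, for illustration only.
rendered_sizes = {1: (1700, 2200), 2: (1654, 2339)}
expected_sizes = {0: (1700, 2200), 1: (1700, 2200)}
print(find_size_mismatches(rendered_sizes, expected_sizes))
```

An empty result means every rendered page already has the annotated size, so any remaining displacement has to come from something other than the image dimensions.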

@MaliParag
Owner

Did you try other pdfs?

@VladimirKalachikhin
Author

Yes, I downloaded these files:
http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf ,AIF_1970_493_498.pdf
http://aif.centre-mersenne.org/article/AIF_1999__49_2_375_0.pdf ,AIF_1999_375_404.pdf
http://www.numdam.org/article/ASENS_1970_4_3_3_273_0.pdf ,ASENS_1970_273_284.pdf
http://www.numdam.org/article/ASENS_1997_4_30_3_367_0.pdf ,ASENS_1997_367_384.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323452/pdf/pnas00314-0027.pdf ,Borcherds86.pdf
http://www.numdam.org/article/BSMF_1970__98__165_0.pdf ,BSMF_1970_165_192.pdf
http://www.numdam.org/article/BSMF_1998__126_2_245_0.pdf ,BSMF_1998_245_271.pdf
http://people.virginia.edu/~lls2l/finite_dimensional.pdf ,Cline88.pdf

Other files are unavailable.

The bounding boxes are placed correctly on the math regions only for Borcherds86.pdf and Cline88.pdf. For the other files they are completely displaced.

@BigPandaCPU

Dear sir,
I got the same error too. The bounding boxes are displaced for 9 PDF files:
AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284,
Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271,
InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292.
The others match the labels well.
The following is AIF_1999_375_404.pdf, 1.png:
[image: 1]

@MaliParag
Owner

Which version of pdf2image are you using?

I think I used the following version -

Name: pdf2image
Version: 1.5.4

@MaliParag reopened this Jul 20, 2020

@macqueen09

Many of the PDF links are not available.
Does anyone have a package of all the PDF files? Could you share a link via Google Drive, Baidu, or something else? Thanks.

@VladimirKalachikhin
Author

The answer to questions:
https://github.com/VladimirKalachikhin/marmot-to-ICDAR

@humeme

humeme commented Nov 12, 2020

@MaliParag I got the same problem on AIF_1999_375_404.pdf, 2.png, with pdf2image version 1.5.4.
[image: 222]

[image: 2]

@Jeozhao

Jeozhao commented Jan 14, 2021

Hi @VladimirKalachikhin ,
I have the same problem as you. I found that some images do not match their corresponding GT. Have you solved this problem now?
Thank you!

@Jeozhao

Jeozhao commented Jan 14, 2021

Hi @MaliParag ,

Could you please share your image dataset with us?
I found that different download sources and different versions of the PDF-to-PNG conversion tool can cause the images not to match the GT, so we would be very grateful if you could share your dataset.

@VladimirKalachikhin
Author

> Have you solved this problem now?

I used MARMOT dataset, see above.

@Jeozhao

Jeozhao commented Jan 14, 2021

> > Have you solved this problem now?
>
> I used MARMOT dataset, see above.

Hi @VladimirKalachikhin,
Can this data be converted to match TFD-ICDAR2019?
Or is it only the format that can be made consistent, while the content differs?
Thanks!

@VladimirKalachikhin
Author

I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.

@Jeozhao

Jeozhao commented Jan 14, 2021

> I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.

Thank you for your reply. I understand what you mean now.

@ducMNSD

ducMNSD commented Feb 20, 2021

> Dear sir,
> I got the same error too. The bounding boxes are displaced for 9 PDF files:
> AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284,
> Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271,
> InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292.
> The others match the labels well.
> The following is AIF_1999_375_404.pdf, 1.png:
> [image: 1]

Could you share all the image datasets that you created? Thank you very much!

@BigPandaCPU

> > Dear sir,
> > I got the same error too. The bounding boxes are displaced for 9 PDF files:
> > AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284,
> > Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271,
> > InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292.
> > The others match the labels well.
> > The following is AIF_1999_375_404.pdf, 1.png:
> > [image: 1]
>
> Could you share all the image datasets that you created? Thank you very much!

I got the data from here:
https://github.com/MaliParag/TFD-ICDAR2019#download-instructions
[image: QQ screenshot 20210224093816]

The download link file:
[image: 22]

@MingchangLi

NOTE: If you find the bounding boxes are displaced from math regions, it is because the document image that you have rendered is of different size than the one used while annotating. datasetV2 provides file sizes for each image. Resize the image that you have rendered to the size provided in datasetV2 and you should be able to use the annotations.
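
If resizing the rendered images is inconvenient, the same correction can in principle be applied to the annotations instead, by scaling each box from the annotated page size to the rendered page size. The sketch below is only an illustration: the `(x_min, y_min, x_max, y_max)` box layout and the function name are assumptions, not the documented column order of the math_gt files, and it only helps when the displacement really is a pure size difference.

```python
from typing import Tuple

Size = Tuple[int, int]                   # (width, height) in pixels
Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def scale_box(box: Box, annotated_size: Size, rendered_size: Size) -> Box:
    """Map a box from the annotated image size onto the rendered image size."""
    # Independent horizontal and vertical scale factors.
    sx = rendered_size[0] / annotated_size[0]
    sy = rendered_size[1] / annotated_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Hypothetical numbers: annotations made at 1000x2000, page rendered at 500x1000.
print(scale_box((100, 200, 300, 400), (1000, 2000), (500, 1000)))
```

If the two scale factors differ noticeably for a page, the rendered page probably has a different aspect ratio from the annotated one, and neither resizing nor scaling is likely to line the boxes up.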

@VladimirKalachikhin
Author

> datasetV2 provides file sizes for each image.

I know.
