Segmentation fault during processing #42

Open
mmartinello opened this issue Nov 7, 2019 · 9 comments

Comments

@mmartinello

pdf2htmlEX version 0.18.7, installed on Debian Stretch with a locally compiled poppler 0.81.0.

I get a segmentation fault when I try to process a PDF file:

me@server:~$ pdf2htmlEX --version
pdf2htmlEX version 0.18.7
Copyright 2012-2015 Lu Wang <coolwanglu@gmail.com> and other contributors
Libraries: 
  poppler 0.81.0
  libfontforge 20190114
  cairo 1.14.8
Default data-dir: /usr/share/pdf2htmlEX
Supported image format: png jpg svg

me@server:~$ pdf2htmlEX --embed cfijo --fit-width 1024 --bg-format jpg --split-page 1 --dest-dir /tmp/test/ file.pdf 
Preprocessing: 44/44
Segmentation fault
@starryfu

starryfu commented Nov 7, 2019

Please share a link to the PDF file.

@stephengaito
Copy link
Contributor

@mmartinello I know there are segfault problems with PDFs that use "non-standard" fonts: while I had updated pdf2htmlEX for the new poppler, I did not realize that I had not done the same for fontforge.

I have now tooled up to attempt the fontforge work; unfortunately this might take some time (the initial poppler work certainly did).

However, as @starryfu suggests, it would be very helpful to have a copy of your PDF file (as it must be using fonts that trigger pdf2htmlEX's use of fontforge). Without your PDF file I will not be able to verify that I have found the fontforge related problems.

(This issue is likely related to #41, though in #41 the problems were detected at compile time rather than at run time.)

@GMolini

GMolini commented Nov 19, 2019

Hi. I'm also getting a segmentation fault. I have uploaded the PDF file here: https://gofile.io/?c=lONa9b

If I use the --split-pages 1 option, it works fine until page 139/156, then crashes.
This is my pdf2htmlEX version:

pdf2htmlEX version 0.18.7
Copyright 2012-2015 Lu Wang <coolwanglu@gmail.com> and other contributors
Libraries:
  poppler 0.81.0
  libfontforge 20191113
  cairo 1.15.10
Default data-dir: /usr/local/share/pdf2htmlEX
Supported image format: png jpg svg
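Since the crash appears partway through the document (page 139 of 156), it can help to narrow it down to a single page. A minimal sketch, assuming a POSIX shell and pdf2htmlEX's `-f`/`--first-page` and `-l`/`--last-page` options; `find_crash_page` and `convert_page` are hypothetical helper names, and `file.pdf` and `/tmp/test/` are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: run CMD once per page (the page number is appended as
# the last argument) and print the first page on which CMD dies with SIGSEGV.
# A process killed by signal 11 is reported by the shell as status 139 (128 + 11).
find_crash_page() {
  first=$1
  last=$2
  shift 2
  page=$first
  while [ "$page" -le "$last" ]; do
    "$@" "$page" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 139 ]; then
      echo "$page"
      return 0
    fi
    page=$((page + 1))
  done
  # No page segfaulted in the given range.
  return 1
}

# Hypothetical wrapper converting one page into its own output directory;
# assumes pdf2htmlEX's -f/-l page-range options. file.pdf is a placeholder.
convert_page() {
  pdf2htmlEX -f "$1" -l "$1" --dest-dir "/tmp/test/page-$1" file.pdf
}
```

Usage: `find_crash_page 1 156 convert_page`. Because each page runs in its own process, a crash that depends on fonts installed while rendering earlier pages may not reproduce this way.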

@stephengaito
Contributor

@GMolini,

Many thanks for uploading your PDF. I have just tested it on my development server running Ubuntu 19.10, and I have no problems.

I can see from the pdf2htmlEX version information that you have recently recompiled libfontforge. Can you tell me which version of the sources you are using?

Could you also let me know which underlying operating system you are using?

@guoxuequan

guoxuequan commented Nov 20, 2019

I have the same font-related problem. I have tested it both on macOS and in Docker on CentOS 7.
The pdf2htmlEX version on macOS is:

pdf2htmlEX version 0.14.6
Copyright 2012-2015 Lu Wang <coolwanglu@gmail.com> and other contributors
Libraries: 
  poppler 0.57.0
  libfontforge 20191114
  cairo 1.16.0
Default data-dir: /usr/local/Cellar/pdf2htmlex/0.14.6_23/share/pdf2htmlEX
Supported image format: png jpg svg

And here is the backtrace from lldb:


Core file '/cores/core.88120' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x000000010b6f5a1d libfontforge.2.dylib`otf_dumpALookup + 696
    frame #1: 0x000000010b6f11f1 libfontforge.2.dylib`dumpg___info + 1781
    frame #2: 0x000000010b6f21d9 libfontforge.2.dylib`otf_dumpgsub + 34
    frame #3: 0x000000010b7063a9 libfontforge.2.dylib`initATTables + 2182
    frame #4: 0x000000010b6fd5a5 libfontforge.2.dylib`initTables + 1751
    frame #5: 0x000000010b6fc8ec libfontforge.2.dylib`_WriteTTFFont + 387
    frame #6: 0x000000010b6fdd87 libfontforge.2.dylib`WriteTTFFont + 121
    frame #7: 0x000000010b61f72f libfontforge.2.dylib`_DoSave + 750
    frame #8: 0x000000010b622848 libfontforge.2.dylib`GenerateScript + 2220
    frame #9: 0x000000010b1692df pdf2htmlEX`ffw_save + 93
    frame #10: 0x000000010b15a65a pdf2htmlEX`pdf2htmlEX::HTMLRenderer::embed_font(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, GfxFont*, pdf2htmlEX::FontInfo&, bool) + 6548
    frame #11: 0x000000010b15b648 pdf2htmlEX`pdf2htmlEX::HTMLRenderer::install_embedded_font(GfxFont*, pdf2htmlEX::FontInfo&) + 124
    frame #12: 0x000000010b15b3bd pdf2htmlEX`pdf2htmlEX::HTMLRenderer::install_font(GfxFont*) + 1357
    frame #13: 0x000000010b160fec pdf2htmlEX`pdf2htmlEX::HTMLRenderer::check_state_change(GfxState*) + 456
    frame #14: 0x000000010b161f5f pdf2htmlEX`pdf2htmlEX::HTMLRenderer::drawString(GfxState*, GooString*) + 155
    frame #15: 0x000000010b20acab libpoppler.68.dylib`Gfx::doShowText(GooString*) + 2627
    frame #16: 0x000000010b1fe842 libpoppler.68.dylib`Gfx::opShowSpaceText(Object*, int) + 340
    frame #17: 0x000000010b203196 libpoppler.68.dylib`Gfx::go(bool) + 642
    frame #18: 0x000000010b202e9c libpoppler.68.dylib`Gfx::display(Object*, bool) + 202
    frame #19: 0x000000010b242d02 libpoppler.68.dylib`Page::displaySlice(OutputDev*, double, double, int, bool, bool, int, int, int, int, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) + 362
    frame #20: 0x000000010b242b92 libpoppler.68.dylib`Page::display(OutputDev*, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) + 66
    frame #21: 0x000000010b24568c libpoppler.68.dylib`PDFDoc::displayPage(OutputDev*, int, double, double, int, bool, bool, bool, bool (*)(void*), void*, bool (*)(Annot*, void*), void*, bool) + 162
    frame #22: 0x000000010b14fb64 pdf2htmlEX`pdf2htmlEX::HTMLRenderer::process(PDFDoc*) + 1138
    frame #23: 0x000000010b149987 pdf2htmlEX`main + 1122
    frame #24: 0x00007fff67fa42e5 libdyld.dylib`start + 1
    frame #25: 0x00007fff67fa42e5 libdyld.dylib`start + 1

I have uploaded the PDF file here: https://we.tl/t-tiBbvvGSqF

@GMolini

GMolini commented Nov 20, 2019

So, I'm using it with this fontforge version: https://github.com/fontforge/fontforge/releases/tag/20190413

I can confirm this only happens with the versions of pdf2htmlEX and fontforge that I posted. I have tried it on another machine with the following versions, and it doesn't crash:

pdf2htmlEX version 0.15.0
Copyright 2012-2015 Lu Wang <coolwanglu@gmail.com> and other contributors
Libraries:
  poppler 0.63.0
  libfontforge 20181213
  cairo 1.14.6
Default data-dir: /usr/local/share/pdf2htmlEX
Supported image format: png jpg svg

The thing is, I had this old version on both machines, then decided to upgrade one of them to version 0.18.7 and poppler 0.81 to see if it would work. It has worked mostly fine; so far only this PDF is giving me trouble.

@stephengaito
Contributor

stephengaito commented Nov 20, 2019

@GMolini (and @guoxuequan )

So... the pdf2htmlEX sources have not (yet) been updated to use any fontforge release after tag/20170731 (which most Ubuntu releases are still using).

I am, at the moment, exploring how to release pdf2htmlEX as both an AppImage and a Docker image so that pdf2htmlEX can use the more recent fontforge and poppler releases on older distributions.

Once I have done that, I will then update the pdf2htmlEX sources to use the most recent poppler and fontforge (stable) releases.

Alas, this might take me a couple of weeks.

I know that with the current fontforge, whether a given PDF succeeds or segfaults is very hit-or-miss. It depends entirely on the fonts embedded in the PDF.

I would suggest working on Ubuntu 18.04 with the older .deb that I released until I can get the AppImage and Docker images working.

(@guoxuequan I have put your PDF into my collection of examples and will have a look at it as soon as I can).

@guoxuequan

guoxuequan commented Nov 27, 2019

Thanks. I have built a Docker image based on Ubuntu 19.10 with pdf2htmlex_0.18.6-1.git20190927r583b1-0ubuntu1.disco1_amd64.deb. It runs well for my PDF.

This is my Docker image: https://hub.docker.com/repository/docker/guoxuequan/pdf2htmlex
docker pull guoxuequan/pdf2htmlex

@langsz

langsz commented Jan 18, 2020

We found that if the PDF content contains mixed punctuation, for example a string made up mostly of Chinese characters but with a single English punctuation mark, pdf2htmlEX throws an exception and exits.
