This repository has been archived by the owner on Jun 15, 2023. It is now read-only.

URLs truncated at line endings #21

Open
bitsgalore opened this issue Oct 28, 2016 · 4 comments

Comments

@bitsgalore

First of all: great tool! I did, however, come across a problem with URLs that span more than one line. I've attached a PDF that reproduces the problem here:

testpdfx.pdf

Command:

pdfx -v testpdfx.pdf -o testpdfx.txt

The URL in the footnote is extracted as:

http://jpylyzer.openpreservation.org//2016/01/06/Release-of-

Whereas this should be:

http://jpylyzer.openpreservation.org//2016/01/06/Release-of-jpylyzer-1-17-0
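One plausible cause, assuming URLs are matched in the extracted text with a pattern that stops at whitespace, is that the line break after `Release-of-` ends the match. The sketch below is not pdfx's actual code: the regex, the helpers `extract_urls_naive` and `join_broken_urls`, and the sample footnote text are all hypothetical. It reproduces the truncation and shows a simple heuristic that rejoins a URL whose line ends in `-` or `/`.

```python
import re

# Hypothetical URL pattern (an assumption, not pdfx's real regex):
# "http://" or "https://" followed by any run of non-whitespace.
URL_RE = re.compile(r"https?://\S+")

# A line that ends inside a URL whose last character is "-" or "/".
BROKEN_URL_AT_EOL = re.compile(r"https?://\S*[-/]$")


def extract_urls_naive(text):
    """Match URLs directly; a newline inside a URL cuts the match short."""
    return URL_RE.findall(text)


def join_broken_urls(text):
    """Hypothetical heuristic: if the previous line ends with a URL
    fragment ending in '-' or '/', append the next non-blank line to it
    with no whitespace in between. Blank lines are never joined."""
    out = []
    for line in text.splitlines():
        if out and line.strip() and BROKEN_URL_AT_EOL.search(out[-1]):
            out[-1] += line.lstrip()
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up footnote text that breaks the URL the same way as the PDF.
    sample = (
        "See http://jpylyzer.openpreservation.org//2016/01/06/Release-of-\n"
        "jpylyzer-1-17-0 for details."
    )
    print(extract_urls_naive(sample))                    # truncated URL
    print(extract_urls_naive(join_broken_urls(sample)))  # rejoined URL
```

The heuristic keeps the trailing hyphen because it is part of this URL, but it can over-join: a URL that genuinely ends in `/` at a line end would be glued to the first word of the next line.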

I used pdfx version 1.3.1 on Linux Mint.

@aberja

aberja commented Mar 29, 2017

Hi, I'm not sure whether you are still working on this code, but in case you are, I wanted to let you know that I see the same issue that bitsgalore reported above, also with pdfx v1.3.1.

@Doubledimas

I would love to see a solution to this issue. It is one of two problems that is stopping me from using pdfx for my academic research.

@markratledge

I see the same issue: URLs reported as either good or 404 are truncated at 20 characters when I use this command:

pdfx testpdfx.pdf -c

@sscirrus

Same issue here. Many URLs are ignored or treated as invalid because they span multiple lines in a PDF (especially when the lines are narrow). Please fix this: it is a critical issue that prevents me from using pdfx!

Labels: None yet
Projects: None yet
Development: No branches or pull requests
5 participants