Expose PDF.js getTextContent method via a Content property. #20

yveszoundi · 2014-03-12T22:33:20Z

This is a suggested implementation for #19.

Right now I roll my own build of pdf2json with a minor modification to the pdf.js script.

Thank you for your great work.

palin27 · 2014-03-28T10:09:12Z

Great suggestion!

modesty · 2014-03-30T23:14:40Z

I can see that a "content" property would be useful in certain use cases, but it doesn't fit the current output format. The output format is designed to be a simplified structure that can be used to reconstruct the PDF content (not just text, but also colors, styles, lines, positions, sizes, fields, types, formats, etc.) in a client renderer, and the text content is already part of the Texts property.

If you only need the text from a PDF and don't care about the other content, I'd suggest adding another top-level method that returns only the text content.

palin27 · 2014-03-31T07:59:42Z

I know the goal of your work is to reconstruct the PDF content. I only need the text, but I really don't understand why the Texts property contains ASCII character codes, for example 2C instead of ','. If I instead use the promise object from page.getTextContent(), it returns clear text!

You did great work.
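The 2C mentioned above matches URI percent-encoding, where a comma is written as %2C. A minimal sketch of decoding such values back to clear text follows, assuming each entry in Texts carries its text runs in an `R` array with the encoded string in a `T` field. That page shape, the helper name `decodePageTexts`, and the sample data are assumptions for illustration and are not taken from this pull request.

```javascript
// Hypothetical helper: turn the percent-encoded text runs of one page
// into plain strings. The { Texts: [{ R: [{ T }] }] } shape is an assumption.
function decodePageTexts(page) {
  return page.Texts.map((text) =>
    // Decode each run, then join the runs that belong to the same text block.
    text.R.map((run) => decodeURIComponent(run.T)).join("")
  );
}

// Illustrative sample data only, not real pdf2json output.
const samplePage = {
  Texts: [
    { R: [{ T: "Hello%2C%20world" }] },
    { R: [{ T: "100%25" }] },
  ],
};

const decoded = decodePageTexts(samplePage);
console.log(decoded); // [ 'Hello, world', '100%' ]
```

`decodeURIComponent` is a standard JavaScript built-in, so no change to pdf2json itself is needed for this kind of post-processing.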

modesty · 2014-07-20T19:09:57Z

I've had a couple more inquiries lately about getting raw text content out of PDFs, so I reopened this pull request and merged it for more testing. Thanks for the contribution, and sorry for the delay.

Expose PDF.js getTextContent method via a Content property.

76a38db

modesty closed this Mar 30, 2014

modesty mentioned this pull request Mar 30, 2014

Expose pdf.js getTextContent method for a pdf page #19

Closed

GeoffreyBooth mentioned this pull request Mar 31, 2014

Pass in options that tell pdf2json what to output #21

Open

modesty reopened this Jul 20, 2014

modesty merged commit ec960ba into modesty:master Jul 20, 2014
