Interactive form (AcroForm) support #7613

Open
timvandermeij opened this issue Sep 7, 2016 · 23 comments

Comments

@timvandermeij
Contributor

commented Sep 7, 2016

This is a tracking issue only, so this is not the place for any other questions or discussions. Open a new issue for that.

This is a meta issue for interactive form (AcroForm) support according to Chapter 12.7 of the PDF reference (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737). This includes all form elements except for signature fields, which are tracked in #1076. The objective is to get https://github.com/mozilla/pdf.js/blob/master/test/pdfs/f1040.pdf.link to render completely, but also to resolve other open issues and PRs.

General

  • Prepare core and display layer for implementing form elements (#7596)
  • Reference testing (#7602)
  • Preference (#7602)
  • Remove global PDFJS.renderInteractiveForms usage (#7640)
  • Refactor field name construction code in WidgetAnnotation (#7775)
  • Refactor or clarify where annotations are rendered
    • Mostly in the display layer, but text widget annotations with appearance streams are rendered in the core layer, which causes confusion...
  • Appearances
    • It seems like the current font code for text widgets is dead as fontRefName is never set...
    • Parse AcroForm dictionary
  • Storing entered values for when the page is destroyed when it is not visible
  • Printing entered values
    • Either print the HTML elements or render the contents onto the canvas (use appendToOperatorList)
  • Form actions/interaction between elements
    • Full names, reset/submit form, scripts
  • Enable by default (and adjust the Chrome manifest to add a title)
  • Remove the fallback bar (and maybe preference)
  • Update the example (#8030)
  • Creating more reference tests and unit tests for good coverage
    • The reference tests do not display selection marks in fields when checkbox/radio button styling is supported by the browser.

Text widgets

  • Rendering of single-line fields (#7602)
  • Handle maximum length (#7622)
  • Handle flags: multiline and read-only (#7633)
  • Handle flags: comb (#7649)
  • Handle justification (#7622)
  • Sanitize maxLen and textAlignment in the core layer and unit tests for this (#7629)

Choice widgets

  • Rendering of combo boxes (#7671)
  • Rendering of list boxes (#7671)

Button widgets

  • Rendering of pushbuttons (#9191)
  • Rendering of checkboxes (#7898)
  • Rendering of radio buttons (#7898)
  • Handle uncommon entries/flags/behavior: Opt, NoToggleToOff, RadiosInUnison and checkboxes working like radio buttons in the UI
@Snuffleupagus

Contributor

commented Sep 8, 2016

This is a meta issue for tracking interactive form (AcroForm) support according to Chapter 8.6 of the PDF reference (https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf#page=671&zoom=auto,-246,244).

It might be a good idea to instead base the work on the latest version of the PDF specification, just in case there are any differences: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf#G11.2110737.

Also, perhaps a good idea to add a "General" TODO item about ensuring proper test-coverage?

@timvandermeij

Contributor Author

commented Sep 8, 2016

Both items have been addressed. Thank you!

@Snuffleupagus

Contributor

commented Sep 17, 2016

I think that we're also going to have to actually parse the contents of the AcroForm dictionary, since otherwise we're not able to e.g. load all the necessary font resources.
Obviously, we cannot use custom fonts in the display layer, but we should be able to at least infer the correct font-family (and things like e.g. bold/italic) that should be used and pass that info on to the display layer.

Also, for printing forms, we might be able to utilize (or build upon) the already existing appendToOperatorList functionality, but that will definitely require that the font resources present in the AcroForm dictionary have been loaded.

Another thing that we probably should attempt to support, is using the correct text colour in the display layer (note how in Adobe Reader the text in the form fields of f1040.pdf is blue). This probably ties in to better and more complete Appearance stream support.
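
As a rough illustration of the appearance information discussed above, the following is a minimal sketch of pulling the font name, font size and fill colour out of a default appearance (DA) string, which is a short content-stream fragment containing a Tf operator and optionally a colour operator. The function names parseDefaultAppearance and toCssColor are hypothetical and not part of PDF.js, the sample DA string is made up, and only the g (grey) and rg (RGB) colour operators are handled.

```javascript
// Hypothetical helper (not PDF.js code): extract font name, size and fill
// colour from a default appearance string such as "/Helv 9 Tf 0 0 1 rg".
function parseDefaultAppearance(da) {
  const result = { fontName: null, fontSize: 0, color: [0, 0, 0] };
  const operands = [];
  for (const token of da.trim().split(/\s+/)) {
    switch (token) {
      case "Tf": // operands: /FontName size
        if (operands.length >= 2) {
          result.fontName = operands[operands.length - 2].replace(/^\//, "");
          result.fontSize = parseFloat(operands[operands.length - 1]);
        }
        operands.length = 0;
        break;
      case "g": // operand: a single grey level in [0, 1]
        if (operands.length >= 1) {
          const grey = Number(operands[operands.length - 1]);
          result.color = [grey, grey, grey];
        }
        operands.length = 0;
        break;
      case "rg": // operands: red green blue, each in [0, 1]
        if (operands.length >= 3) {
          result.color = operands.slice(-3).map(Number);
        }
        operands.length = 0;
        break;
      default: // anything else is treated as an operand
        operands.push(token);
    }
  }
  return result;
}

// Convert a [r, g, b] triple in the range [0, 1] to a CSS colour string
// that the display layer could apply to an input element.
function toCssColor([r, g, b]) {
  const to255 = (v) => Math.round(v * 255);
  return `rgb(${to255(r)}, ${to255(g)}, ${to255(b)})`;
}

const sample = parseDefaultAppearance("/Helv 9 Tf 0 0 1 rg");
console.log(sample.fontName, sample.fontSize, toCssColor(sample.color));
```

Mapping the PDF font name to a CSS font-family (and to bold/italic) would still need a lookup through the font resources of the AcroForm dictionary, which this sketch does not attempt.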

Finally, a general question: Will we actually be able to support forms in a meaningful way, without partial (and well sanitized) script support?

@timvandermeij

Contributor Author

commented Sep 17, 2016

Good points. I just added them to the item list above. I don't think we really need script support as the AcroForms generally just require filling and printing. AFAIK scripts are only used for interaction between elements, but we can implement the most used functionality ourselves (such as resetting the form or button actions for printing it). We'll have to see how widely used such script functionality is.

@Snuffleupagus

Contributor

commented Sep 18, 2016

Handle flags: multiline and read-only

There are other flags that we might need to try and support as well. One example is comb, which controls the spacing between the characters in an input field. That one is actually used on the second page of f1040.pdf, see the "Personal identification number (PIN)" field.
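
For reference, a minimal sketch of how the text field flags mentioned here could be decoded from the Ff integer. The bit positions are the 1-based positions from the field flag tables in PDF 32000-1:2008; the names FieldFlag, hasFlag and describeTextField are made up for this example and are not the PDF.js implementation. The comb check follows the rule in the specification that the flag is only meaningful when MaxLen is present and the multiline, password and file select flags are clear.

```javascript
// 1-based bit positions of some field flags (PDF 32000-1:2008, section 12.7).
const FieldFlag = {
  READONLY: 1,
  MULTILINE: 13,
  PASSWORD: 14,
  FILESELECT: 21,
  COMB: 25,
};

// Test whether the flag at the given 1-based bit position is set.
function hasFlag(flags, position) {
  return (flags & (1 << (position - 1))) !== 0;
}

// Hypothetical helper: summarise the flags that matter for a text widget.
function describeTextField(flags, maxLen) {
  const multiline = hasFlag(flags, FieldFlag.MULTILINE);
  const comb =
    hasFlag(flags, FieldFlag.COMB) &&
    !multiline &&
    !hasFlag(flags, FieldFlag.PASSWORD) &&
    !hasFlag(flags, FieldFlag.FILESELECT) &&
    Number.isInteger(maxLen) &&
    maxLen > 0;
  return {
    readOnly: hasFlag(flags, FieldFlag.READONLY),
    multiline,
    comb,
  };
}

// A comb field with five character positions, such as a PIN field.
console.log(describeTextField(1 << 24, 5));
```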

@timvandermeij

Contributor Author

commented Sep 18, 2016

Sounds like a good idea. I have added it to the list.

@Snuffleupagus

Contributor

commented Sep 24, 2016

It would probably also be a good idea to see if the WidgetAnnotation code that builds the fullName property can be cleaned up or improved upon, see:

// Building the full field name by collecting the field and
// its ancestors 'T' data and joining them using '.'.
var fieldName = [];
var namedItem = dict;
var ref = params.ref;
while (namedItem) {
  var parent = namedItem.get('Parent');
  var parentRef = namedItem.getRaw('Parent');
  var name = namedItem.get('T');
  if (name) {
    fieldName.unshift(stringToPDFString(name));
  } else if (parent && ref) {
    // The field name is absent, that means more than one field
    // with the same name may exist. Replacing the empty name
    // with the '`' plus index in the parent's 'Kids' array.
    // This is not in the PDF spec but necessary to id the
    // input controls.
    var kids = parent.get('Kids');
    var j, jj;
    for (j = 0, jj = kids.length; j < jj; j++) {
      var kidRef = kids[j];
      if (kidRef.num === ref.num && kidRef.gen === ref.gen) {
        break;
      }
    }
    fieldName.unshift('`' + j);
  }
  namedItem = parent;
  ref = parentRef;
}
data.fullName = fieldName.join('.');
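
To make the behaviour of the snippet above easier to experiment with, here is a standalone sketch of the same idea. It uses plain objects with T, parent and kids properties as stand-ins for PDF dictionaries and references (these stand-ins and the buildFullName name are assumptions for illustration, not PDF.js API), and it compares objects by identity instead of by reference number. One difference to note: an unnamed field that is missing from its parent's kids list yields index -1 here, whereas the loop above would yield kids.length.

```javascript
// Hypothetical standalone version: walk from the field up to the root,
// collecting partial names ('T') and joining them with '.'.
function buildFullName(field) {
  const parts = [];
  for (let node = field; node; node = node.parent) {
    if (node.T) {
      parts.unshift(node.T);
    } else if (node.parent) {
      // Unnamed field: use '`' plus its index among the parent's kids so
      // that sibling widgets still get distinct names.
      parts.unshift("`" + node.parent.kids.indexOf(node));
    }
  }
  return parts.join(".");
}

// Build a tiny field tree: form1 -> address -> two unnamed widgets.
const rootField = { T: "form1", parent: null, kids: [] };
const groupField = { T: "address", parent: rootField, kids: [] };
const firstWidget = { parent: groupField };
const secondWidget = { parent: groupField };
rootField.kids.push(groupField);
groupField.kids.push(firstWidget, secondWidget);

console.log(buildFullName(groupField)); // form1.address
console.log(buildFullName(secondWidget)); // form1.address.`1
```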

Also, regarding WidgetAnnotations it seems that different types can have different requirements for the V entry in the annotation dictionary, so it might be better to fetch and validate data.fieldValue in each specific WidgetAnnotation subclass.

@timvandermeij

Contributor Author

commented Sep 25, 2016

The first point is now in the list, for which I've got some ideas. I found out about the second point in a patch I'm currently finalizing for choice widget annotations, so that will be addressed there.

@lexcorp

commented Oct 5, 2016

Hey @timvandermeij
When will this functionality be available? How can I help?

@timvandermeij

Contributor Author

commented Oct 5, 2016

We're currently in the process of implementing this, but it's a large piece of functionality that will take time before it's complete. The ticked boxes above show which elements are already implemented and for other boxes there are already work-in-progress pull requests, so we're on track with this functionality. Feel free to test it by using the master branch and setting the renderInteractiveForms parameter to true. It's disabled by default as it's not ready yet.

@lexcorp

commented Oct 5, 2016

Thank you Tim. What can you tell me about digital signatures? There is progress according to discussion thread #1076.

This was reported by the user soa-x, who opened the issue on 13 Jan 2012.

Almost 5 years have passed since it was reported.

Someone has even already done much of the implementation:

viveksjain commented on February 22
@complience Hi, I have a proof-of-concept working at https://github.com/viveksjain/pdf.js/tree/sig-verify-support. You can try it by using git clone --recursive https://github.com/viveksjain/pdf.js.git. With a little bit more work it should be ready for a pull request into this repo, but I just have not had the time yet.

Do you know if this work was added to recent versions of pdf.js?

@Snuffleupagus

Contributor

commented Oct 5, 2016

Re: #7613 (comment)

Signatures in PDF files is a big and complex topic, one which is somewhat orthogonal to implementation of basic AcroForm support (which is what this particular issue is tracking).

The current issue is just a tracking issue for implementation of basic AcroForm features, signatures are already tracked elsewhere (in #1076, which is where that feature should be discussed).

@lexcorp Please refrain from posting unrelated information and/or asking questions here, since it detracts from the purpose of this issue (which is to track support for basic AcroForm features).
Also, you've now posted basically the same information in three different issues, please do not spam the issue tracker in this way!

@anujgeek

commented Oct 5, 2016

Hello @timvandermeij @Snuffleupagus,
We really like your solution for adding support for AcroForm fields. We're planning to use these features in an app we're currently developing. We'd really appreciate it if you could give us a tentative date by which you'd be able to add support for all types of form fields (checkboxes, etc.) and for exporting the filled-in data to an XFDF file or any other format. Thanks.

@Snuffleupagus

Contributor

commented Oct 6, 2016

@anujgeek As I've already mentioned in #7613 (comment), this is a tracking issue and not really a good place for this kind of general discussion and/or asking questions!

There's a number of fairly difficult TODOs left to implement, see the possibly incomplete list above, hence it's not possible to give any sort of estimate of when, or even if, this feature will be completely implemented.

Also, note that so far all work has been done by contributors, and given that Mozilla is replacing PDF.js in Firefox (see https://wiki.mozilla.org/Mortar_Project) forms support will most likely take a while to complete.

@timvandermeij timvandermeij moved this from TODO to In progress in AcroForms support Feb 18, 2017

@mozilla mozilla locked and limited conversation to collaborators Mar 20, 2017

@mozilla mozilla unlocked this conversation Mar 20, 2017

@timvandermeij

Contributor Author

commented Mar 20, 2017

This is a tracking issue (refer to #7613 (comment)), so this is not the place for discussion or questions. Contact us on IRC in case of questions or file a separate issue if you found a bug. Thanks.

(I'm unlocking the conversation to be able to let users use the reaction button to measure the interest for this feature, but irrelevant comments will be removed.)

@Alex-DE-74

commented Dec 11, 2017

Hello everyone!

What is the progress with AcroForm filling?
The example https://www.irs.gov/pub/irs-pdf/f1040.pdf (and others) still does not work. Or is it not enabled by default?
Is support for some basic JavaScript, such as setting field(s), clearing field(s) and a send button, mentioned anywhere?

Thanks.

@Snuffleupagus

Contributor

commented Dec 11, 2017

@Alex-DE-74 Please read through the above comments carefully, in particular #7613 (comment) and #7613 (comment) are relevant.
Furthermore, you've already asked these questions in #9261 (where answers were provided); please let's try and keep this tracking issue free from that kind of general discussion.

@Alex-DE-74

commented Dec 11, 2017

@Snuffleupagus

Excuse me, but for me it's not really traceable across the many topics which item is at which stage, and cyclic references are not helpful at all. From https://github.com/mozilla/pdf.js/projects/1 it is clear to me which piece of AcroForms is (completely) supported now and what is planned. Moreover, many topics address rendering/viewing, but say nothing about interactive features such as fill/check/select/submit. For example, the "Text widgets" part above has nothing about typing text. Also, if the "AcroForm dictionary" is currently not parsed at all, how can it really work well?
Maybe it would be helpful for "users" to see a simple table listing the AcroForm features with their properties and the state of their support (whole/partial/planned). (Why is this shown in bold?!)

P.S. It pains me that I'm not a JS/HTML5 expert, but I have done a lot of work on the other side (creating PDFs with C#) and am familiar with other programming languages too. Is it worth it for me to try to understand the current code in order to provide more interactive support and help develop this project? Or would it take a huge amount of time just to understand the current architecture?

@timvandermeij

Contributor Author

commented Dec 11, 2017

I have removed the bold style for you. I would like to emphasize again that this is not the place for such a discussion; a channel like IRC would be more appropriate so we can give some background information. Filling in/submitting/printing forms is in fact in the checkbox list above, it just hasn't been implemented yet. The "text widgets" part is about rendering text widgets, which means the input fields you can type in. That's done; the part that remains is storing the entered values. Anyone is welcome to help out with implementing this.

@kekkc

commented Feb 4, 2018

BTW: Chrome is also not able to save PDFs with forms, but there's a workaround. Forms are rendered by default and one is able to print them and one can even print them as PDF by default, including the form input.

Maybe this is applicable for pdf.js, too and we can just utilize the existing FF save as PDF ( https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/tabs/saveAsPDF )?

@timvandermeij timvandermeij removed the 1-core label Aug 3, 2018

@dhufnagel

commented Aug 12, 2018

I am playing around with pdf.js, trying to print entered form text field values. I have a rudimentary working proof of concept where I can render entered values to the printed PDF. I now want to discuss my approach and see if someone comes up with a better or simpler one.

In my approach I pass the entered values to the worker task by adding a map to the task. This map is currently filled on the 'beforeprint' event.
In the 'getOperatorList' method of the 'TextWidgetAnnotation' I read the object stream and replace the old text value of the 'Tj' operator with the new one. This works, but comes with a lot of problems. The first is that it fails if the stream has no 'Tj' operator because the field had no value. The second is that the placement for alignments other than 'left' will be wrong.
So the next idea is to create a completely new stream, calculating all values myself. This will be a lot of work, so I wanted to discuss this approach first.
I can already create a new stream and display the values, but again, there is the problem with the offset values of the 'Td' operation. I dug into the code a bit and I think I need to calculate the X and Y offsets by taking into account the width and height of the string in the given font. I found the FontDescriptor for one embedded font, but not for a system font. With the font descriptor I have the ascent and descent values of the font, with which I think I can calculate the y offset. The x offset will be fixed for left-aligned text, but needs to be calculated for centered or right-aligned text. I think I am able to do this with the widths array of the Font xRef, but again, there is no such array for system fonts. So I think I would have to use a canvas and the measureText method.
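
The offset calculation described in this comment could look roughly like the sketch below. It assumes glyph widths, ascent and descent are given in 1/1000 text-space units (descent negative), that quadding uses the PDF values 0 (left), 1 (centered) and 2 (right), and that a fixed padding of 2 units is acceptable; the padding value, the default glyph width and the function names measureTextWidth and computeTextOffset are all assumptions made for illustration, not PDF.js code.

```javascript
// Sum the glyph widths (1/1000 units) and scale by the font size.
// `widths` is a stand-in for the font's widths array, keyed by character.
function measureTextWidth(text, widths, fontSize, defaultWidth = 500) {
  let total = 0;
  for (const ch of text) {
    total += widths[ch] !== undefined ? widths[ch] : defaultWidth;
  }
  return (total / 1000) * fontSize;
}

// Compute the operands for 'Td' relative to the lower-left corner of the
// field rectangle [x1, y1, x2, y2].
function computeTextOffset(options) {
  const { text, widths, fontSize, ascent, descent, rect, quadding } = options;
  const padding = options.padding !== undefined ? options.padding : 2;
  const fieldWidth = rect[2] - rect[0];
  const fieldHeight = rect[3] - rect[1];
  const textWidth = measureTextWidth(text, widths, fontSize);

  let x;
  if (quadding === 1) {
    x = (fieldWidth - textWidth) / 2; // centered
  } else if (quadding === 2) {
    x = fieldWidth - textWidth - padding; // right-aligned
  } else {
    x = padding; // left-aligned
  }

  // Center the ascent/descent box vertically; the baseline sits above the
  // bottom of that box by the (negative) descent.
  const textHeight = ((ascent - descent) / 1000) * fontSize;
  const y = (fieldHeight - textHeight) / 2 - (descent / 1000) * fontSize;
  return { x, y };
}

const demoOffset = computeTextOffset({
  text: "AB",
  widths: { A: 600, B: 400 },
  fontSize: 10,
  ascent: 800,
  descent: -200,
  rect: [0, 0, 100, 20],
  quadding: 1,
});
console.log(demoOffset);
```

For system fonts without a widths array, measureTextWidth would have to be replaced by something like the canvas measureText call mentioned above.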

So as you see there is a lot of 'thinking'. But before I try to implement and test my approach, I'd like to know what others are thinking of it.

@timvandermeij

Contributor Author

commented Aug 12, 2018

Some time ago we had a discussion about how we could approach this. Refer to https://mozilla.logbot.info/pdfjs/20161219. The idea is to have two different operator lists: one for the UI and one for printing. In the one for printing, we would replace operations based on the entered/selected value in the widget.

I think this is somewhat easier than what you're describing since we let the remaining logic do the heavy lifting for us; we just have to provide the correct operator list.

This is a problem that we have to solve in multiple small steps. The first step is to make the annotation code asynchronous, which is done by @dmitryskey in #9822. The next step would be to parse the AcroForm dictionary for, e.g., fonts and to parse the default appearance entry in the annotation dictionary for all appearance information. For this we can probably use the evaluator to get the information as an operator list, which requires the annotation code to be asynchronous. Then, we can create the printing operator lists for each annotation type.

@dhufnagel

commented Aug 12, 2018

I also thought of creating the operator list myself, but this would be more complicated for me than my approach. I just create the PDF object stream with 'BMC ... EMC' and pass the stream to the evaluator, which generates the operator list.
If I create the operator list array myself, I will have the same problems as with generating a new object stream. But IMHO it is more complicated to create the operator list than to create a string and convert it to an object stream. This already works in my proof of concept.
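
A minimal sketch of the string-building step described here, assuming the font resource name, font size and Td offsets are already known. The helper names escapePdfString and buildTextAppearance are hypothetical; the sketch only escapes backslashes and parentheses and does not deal with text outside the font's encoding. The resulting string would still have to be wrapped in a stream object before it can be handed to the evaluator.

```javascript
// Escape the characters that are special inside a PDF literal string.
function escapePdfString(text) {
  return text.replace(/[\\()]/g, "\\$&");
}

// Hypothetical helper: build the content of a variable-text appearance
// stream for a single-line text field ('/Tx BMC ... EMC').
function buildTextAppearance({ text, fontName, fontSize, x, y }) {
  return [
    "/Tx BMC",
    "q",
    "BT",
    `/${fontName} ${fontSize} Tf`,
    `${x} ${y} Td`,
    `(${escapePdfString(text)}) Tj`,
    "ET",
    "Q",
    "EMC",
  ].join("\n");
}

console.log(
  buildTextAppearance({ text: "Smith (Jr.)", fontName: "Helv", fontSize: 12, x: 2, y: 7 })
);
```

Because an empty value simply produces an empty '() Tj', this avoids the failure case of having no existing 'Tj' operator to replace.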
