-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better parsing of annotation data in worker thread. #10716
Better parsing of annotation data in worker thread. #10716
Conversation
…IRT, RT, StateModel, State, CreationDate, ModifiedDate, RichText, IT, ... . _preparePopup replaced with MarkupAnnotation base class. Added classes for missing annotations, so all anotations have correct subtype in data property.
…t annotation has reply type Group.
I'm not sure if it's a good idea to add support for this only for the API. Setting only the data property may give a false impression that we actually support those types whereas we only expose the most basic properties and not all specific ones for those annotation types. Moreover, we don't actually render them with this patch, so even though they don't show up in the viewer they also don't cause a warning in the console anymore and may give more problems if https://github.com/mozilla/pdf.js/blob/master/src/display/annotation_layer.js#L266 is being called for unsupported annotation types in the display layer. I think it would be better to support annotation types in both the core and the display layer, which is how we have been implementing the existing types so far. It will take more time, but is also more complete and easier to verify/review since it's a small, self-contained part. Finally, it also allows us to add more tests, i.e., also reference tests on top of the unit tests.
This PR is, as-is, too large. If we were to choose that only supporting the types in the API is OK, then it will need to be split up for review. |
Yeah, that is true and I undestand your concern, but at least you will know they exist in pdf and you can do something with that in display layer (you can say ... ok this annotation is here, it is of this type and if it is markup annotation - it has this title). Even now, there are implemented annotations, which doesn't have 100% support of all features (which is why I did this PR - I was missing markup properties), so there is already precedence. Let's say, that every annotation must implement parsing of it's required properties (according to pdf spec) before it can be added into API. Then at least caret annotation is ok to add (it is also a markup annotation, so it's not just parsed subtype). Or we can add warning message into cases, where there is only the basic implementation for those annotations (movie, sound etc.)
I think this should not be a problem, since in code you mentioned, in switch above are explicitly named supported annotations and as default is this base class, which throws warning. It doesn't matter if subtype is undefined or not named in cases. It is well designed for this situation :).
There is opened issue #5283 which is asking for improved annotation API already, so I think people want to have api first and display later. But I understand, there might occur issues later when implementing display layer (they shouldn't, but we all know they might), which might need change of API and that may bring some breaking changes in the future, so I understand this concern as well ... but that happens ...
Yeah it is maybe too much all at once. So how should I split it?
And then create PR for carret annotation in api + in display to satisfy #10677 ? |
Thank you for replying. You actually made me remember that I already prepared most work for caret annotations (the test file is even already in the repository), but I simply forgot to finalize it. I just did and made #10723 for that so we have support for caret annotations in both the core and the display layer, so that's then already split off. I'm hoping to do free text annotations next, for which we already have an open pull request that needs to be finalized as well. After that I think we should check on a case-by-case basis how we can integrate the other parts of this patch. |
Nice, you're welcome :). What about splitting it even more … for the first step I would try to replace _preparePopup method in API for MarkupAnnotation base class . If it won't break tests, can we move on to adding other features then? |
The implementation of caret annotations in the core and display layer is now merged.
I looked into that and found that the specification defines that abstraction too, so it sounds like a good idea to do that to match the specification more closely. The advantage is that that's only refactoring, so it can be its own PR and not breaking existing tests is enough for that to be accepted. After that we can see what's best to do next. |
Ok, so I created pull request for that (#10728). |
What is next ? :) |
About creation date and modified date PR (Date PR) … there is some polishing work to be done and mainly, we need to decide what do we want to get..
Please correct me if I'm wrong at something. Thank you for your guidance :). |
The implementation of free text annotations in the core and display layer is now merged too. I have some ideas for how to approach the date parsing since I already wrote some code for that. I'll try to check if we can extract that so we can simply add the creation/modification date attributes. Most likely we can also show them in the display layer quite easily so we have that covered as well. While I look into that, could you share some more information about how Adobe Reader handles discussion threads for annotations (since that's a good reference implementation for us)? For example, are they shown in a sidebar or inline in the popup? If possible, could you make some screenshots (I work on Linux and always had issues getting Adobe Reader to run)? This will help to determine how we can display this information to the user. I found the specification for Implementing core layer support for this functionality seems OK to me as long as there is sufficient unit test coverage; display layer support is more difficult and can be done in a follow-up. |
Sure. Every MarkupAnnotation can have replies to it. Reply is always text annotation, which has RT=Reply and IRT is reference to annotation, to which this annotation replies (same as popup has parent reference). Reply text is stored in Contents and author in Title. Now it gets little complicated. So in previous example, let's say, we will place 'accepted' review state annotation (SA) to a Reply2 and we also mark it, we will get this: Let's say, I changed my mind and switched state to rejected, that will add a next state annotation: So there is always stored history of states to each annotation (even reply ones). Adobe enables to display whole history through properties menu (right click on annotation or reply). But in UI, it only shows last state per each user. Marked states behave even more interestingly (at least in Acrobat) and I don't know how to do that in pdf.js … Marked states are just check marks which are displayed on annotations or it's replies. Trick is, that Acrobat stores user name (it takes it from windows account and it cannot be changed by user) and it displays only marked states, which were made by current user. (It compares string name from it's settings with string name in title property of state annotation). User, who is running Acrobat and is seeing pdf file, can only see marked states that he made, but not marked states of other users which are stored in pdf file. Then there are Group annotations. Main difference between Group and Reply is, that Group can be of any other markup annotation type and has appearance, but reply is always text annotation and is visible only in discussion area of pdf viewer … I call reply annotations "modifiers" as they serve as extra meta-data providers for it's parent annotation … but groups are multiple annotations which behaves as one. I think, you got it right. It should be bundled together, but it is not that simple because of how replies are stored (linked list .. and I am not sure, that order of annotations is guaranteed … that parent is before child ). And with states coming into mix … there is more bundling then it seems … Generally speaking, the master markup annotation for display should have these subsets:
I hope I got it right, but this is how I understand this topic … :) Images will follow in next comment, now i need to reinstall Acrobat and restart PC ;) (I wanted to change language to english … but that's not that easy … it requires to install completely different version of acrobat :/ ). |
How Acrobat displays annotations: This is what Acrobat displays on hover - it is same with what pdf.js has. When annotation has no replies, it doesn't show bracket area "(3 replies)" at all. This is what it displays, when top annotation has states (it doesn't count marked states or states of replies at all) This is how it looks, when you click on annotation and "Comment" side panel is closed (hover is on state, so you can see what is in tooltip) This is how it looks, when you click on annotation and "Comment" side panel is opened This is how a review history looks: This is how it looks, when two users sets a multiple states (different states at top, same at bottom) Another cool feature is, that popup on hover follows mouse as it moves over annotation … I think that should be doable here too. Testing pdf here: annotations_acrobat_welcome.pdf |
Closing since most parts are now merged and the final part is now in a new pull request, so let's track it there from now on. Thank you for your efforts so far! |
Added parsing of IRT, RT, StateModel, State, CreationDate, ModifiedDate, RichText, IT, ... .
_preparePopup replaced with MarkupAnnotation base class.
Added classes for missing annotations, so all anotations have correct subtype in data property.
Date parsing is done like this: pdf date string -> ISO 8601 string ( parse pdf date with regex and construct ISO string from that) -> new Date constructor. There are also unit tests, which should cover all cases (I hope).
There are also some unit tests to test some of the newly parsed properties.
This should help others (and pdf.js too) to build better UI layer for displaying annotations, because worker thread will provide a lot more info about annotations.
As of issue #10677 : there is missing implementation of caret annotation element in viewer part, but that should be easy, as there is nothing special ... it should be same as strikeout annotation.
To make work strikeout + caret, there will be needed some changes in viewer part to correctly display master annotation(caret) popup on child annotation (strikeout) or some more changes in worker part.
Child annotation has it's own popup, but during parsing, popup annotation is grabing raw data from child annotation (not-yet-parsed), so the master content is not there yet :/. I'm affraid, it is not currently possible to access parsed data of other annotations during parsing process ... so it should be probably handled in viewer part or there should be added some after-process into worker part, which will handle this ... Or maybe, popup annot could look at parent, check if it is RT Group and if it is, then go for it's parent ?
What do you think guys?