Issue on 'for loop' with shapes inside? #61

fernandobrito · 2017-07-02T14:48:22Z

Hello,

When I have a "for loop" with a shape inside, and the loop runs more than once, Word claims that output is corrupted.

Sample input:

If loop runs only once, output is as expected (one blue rectangle). If loop is run more than once, Word 2016 complains:

But if I try to proceed and let Word recover the document, output is as expected (two rectangles). If I replace the figure with text, everything works fine.

I will try to investigate the issue soon. Just posting it here to force myself to come later with a following up and to ask if this is a known bug.

Thanks again for the great work on the gem.

fernandobrito · 2017-07-02T15:30:10Z

The error occurs because the for loop generates figures with duplicated ids. I will see if I can come up with a fix.

senny · 2017-07-03T09:24:57Z

@fernandobrito thanks for looking into this! You are definitely on the right track. The way sablon loops work is that they copy the contents for every iteration. If there are elements inside the loop that have to have unique properties, Word will most likely complain.

fernandobrito · 2017-07-03T18:49:51Z

Any suggestions on how to fix it? I needed something quick for a client, so I'm just appending a unique number to each node inside the loop body that has an "id" attribute. It solved my issue with figures.

However, I'm not sure if it is always safe to update all nodes with an "id" attribute or if other types of nodes may have different unique ids but use some other name on the node attribute.

fernandobrito@eebd895

stadelmanma · 2017-07-03T23:22:49Z

@fernandobrito I would assume id's are unique across the entire document but to really be sure you'd need to check the XML spec (5500 pages of light reading if you need something to do on a weekend). While gigantic it is very well organized and bookmarked so its not as bad as it seems. Another good reference for WordML and DrawingML: http://officeopenxml.com, they reference the 3rd edition XML spec but I think that is simply due to websites age.

Could you throw out a small excerpt of the corrupted version i.e. duplicated id's and a excerpt of your fixed version? Seeing what is actually going on will help me out a bit.

I think the best way to fix it is to add functionality that finds the max id value in use and increments it from that point forward. You'll need to check every element that has an id attribute not just the drawing related ones. I'd track the value on the environment instance. You may need to check other files such as footers, headers, etc. as well. Depending on the implementation the 'next' id values could just be stored in a hash with a key matching the unique attribute name.

fernandobrito · 2017-07-14T02:00:55Z

@stadelmanma thanks for the references.

I've done a quick search on the XML spec and it seems most figures/drawings have their unique ids described as an attribute either on the wp:docPr or on the wp:cNvPr tag. I couldn't really understand when which one is used.

The description for the id attribute (which should be a unique integer) is:

id (Unique Identifier)
Specifies a unique identifier for the current DrawingML object within the current
document. This ID can be used to assist in uniquely identifying this object so that it can
be referred to by other parts of the document

If multiple objects within the same document share the same id attribute value, then the
document shall be considered non-conformant.
https://msdn.microsoft.com/en-us/library/documentformat.openxml.drawing.wordprocessing.docproperties(v=office.15).aspx

Here you can find a small example of a rectangle inside a loop: https://github.com/fernandobrito/sablon/blob/eebd895954b157e659fbf180ea8312b027a05a69/test/fixtures/xml/figure_loop.xml#L30. Line 30 has a docPr element with an id attribute which will get duplicated when sablon make copies of the loop body, corrupting the output file.

I've started playing around with your idea of finding the highest id and then assigning new ids for each loop iteration (on this fork), but I just realized I did a big mistake. I started doing my work on top of the images support PR :(. When I have more time I will cherry-pick my changes on top of senny/sablon master so you can take a look and maybe provide some feedback.

By the way, I've been using a lot a tool from Microsoft: https://www.microsoft.com/en-us/download/details.aspx?id=30425. It lets me generate diffs between OOXML files and also validate them. Is there anywhere where I should add it for future contributors? Perhaps on a wiki page on this repo?

stadelmanma · 2017-07-20T23:57:06Z

@fernandobrito I completely missed the last bit of your comment about the MS tool. Sadly, I don't have access to a Windows computer to try it out.

Does it generate a diff like git showing you the line by line changes to each XML file in the document?

The validation part sounds extremely useful since MS Word tends to be pretty ambiguous when a docx gets corrupted. What kind of information does it generate when validating a document?

fernandobrito · 2017-08-01T08:22:18Z

@stadelmanma I had to install Windows on a virtual machine in order to use it.

Yes, the tool provides diffs, such as:
, but I think you can use normal diff tools or web tools such as: https://www.corefiling.com/opensource/xmldiff/.

Maybe the biggest win is the validation feature. It shows which lines are causing problems:

That's how I found about the duplicated id issue on figures.

stadelmanma · 2018-01-18T02:52:49Z

I think this is more easily fixable with the new DOM logic, I have a branch on my fork to work on it. All r:id attributes should be unmodified because that will break relationships defined in a *.rels file.

Additional notes on any elements that use the id attribute:

Table cell identifier (17.4.66), w:id, only unique within the table itself, optional, any string value. Probably can be ignored.
Annotation identifier (17.13.4.2) w:id, applies to comments it says that if more than one comment has the same value for this attribute then only one of them is ignored.
- This appears to be the same for all of the change tracking elements from 17.13.4 - 17.13.5. I think I should leave these id attributes alone.
- Changes tracking in a template doesn't make sense from a use case aside from providing the consumer with notes on what they may need to manually up date so I'm less worried about support this corner case.
The w:bookmarkStart and w:bookmarkEnd elements share a unique ID value.
- If I copy and change one I need to ensure the corresponding end element also gets changed.
- If both elements aren't present in the repeated content then I should just drop the clones
The w:permStart and w:permEnd elements follow the same convention as bookmarks.
The last relevant id is in cNcPR (19.3.1.12) which we already know needs to be unique.

Side note there is a <w:id> element but we shouldn't need to worry about this one (17.5.2.18).

stadelmanma · 2018-01-25T00:44:17Z

Implementation stages:

Auto-increment the id attribute in the wp:docPr and wp:cNvPr tags. I think I should find these tags for any namespace (https://stackoverflow.com/questions/4440451/how-to-ignore-namespaces-with-xpath)
- I'll need to test what happens when shapes are in footnotes, endnotes, headers, etc.
~~Handle the bookmark tags and migrate this logic to perm tags~~
- Several bookmarks with overlapping id's does not appear to corrupt the document. I will leave this to the end user to handle as bookmarks spanning a loop in a odd fashion shouldn't be the original intent

stadelmanma added the bug label Jul 20, 2017

stadelmanma mentioned this issue Jan 12, 2018

Implement dom for templates #85

Merged

3 tasks

stadelmanma self-assigned this Jan 24, 2018

stadelmanma mentioned this issue Jan 26, 2018

Fix unique ids in loops #89

Merged

3 tasks

stadelmanma closed this as completed in #89 Feb 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue on 'for loop' with shapes inside? #61

Issue on 'for loop' with shapes inside? #61

fernandobrito commented Jul 2, 2017 •

edited

Loading

fernandobrito commented Jul 2, 2017

senny commented Jul 3, 2017

fernandobrito commented Jul 3, 2017

stadelmanma commented Jul 3, 2017 •

edited

Loading

fernandobrito commented Jul 14, 2017 •

edited

Loading

stadelmanma commented Jul 20, 2017

fernandobrito commented Aug 1, 2017

stadelmanma commented Jan 18, 2018

stadelmanma commented Jan 25, 2018 •

edited

Loading

Issue on 'for loop' with shapes inside? #61

Issue on 'for loop' with shapes inside? #61

Comments

fernandobrito commented Jul 2, 2017 • edited Loading

fernandobrito commented Jul 2, 2017

senny commented Jul 3, 2017

fernandobrito commented Jul 3, 2017

stadelmanma commented Jul 3, 2017 • edited Loading

fernandobrito commented Jul 14, 2017 • edited Loading

stadelmanma commented Jul 20, 2017

fernandobrito commented Aug 1, 2017

stadelmanma commented Jan 18, 2018

stadelmanma commented Jan 25, 2018 • edited Loading

fernandobrito commented Jul 2, 2017 •

edited

Loading

stadelmanma commented Jul 3, 2017 •

edited

Loading

fernandobrito commented Jul 14, 2017 •

edited

Loading

stadelmanma commented Jan 25, 2018 •

edited

Loading