New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to Accept ("Preserve") DOCX Commented Changes In order to parse resulting text #566
Comments
Well, the short answer is there is no support yet for revision marks in The If you format the XML, properly indicating the hierarchical relationships, it's easier to see what's going on (also, you should trim out all those 'undefined' items, those are not meaningful or valid .docx XML). Also, I've removed all the redundant font formatting since that doesn't bear on the current question. <w:p>
<w:r>
<w:t>SOME CHANGE</w:t>
</w:r>
<w:ins w:author="Leroy Jim" w:date="2018-10-10T14:12:00Z" w:id="1480">
<w:r w:rsidR="00464094">
<w:t> AND ANOTHER CHANGE</w:t>
</w:r>
</w:ins>
<w:del w:author="Leroy Jim" w:date="2018-10-10T14:12:00Z" w:id="1481">
<w:r w:rsidDel="00464094">
<w:delText>A DELETE CHANGE HERE</w:delText>
</w:r>
</w:del>
<w:r>
<w:t AND MORE CHANGES HERE</w:t>
</w:r>
</w:p> Basically, you have a paragraph element If you want to process these to get a different view (like perhaps "with changes") you need to parse the |
Okay thank you that makes sense. So is there a way to access a list of |
You'll have to use |
Hi,
Thank you for this amazing library! I'm trying to read the the
.text
attribute ofparagraph
s and recently realized that the state of thattext
content can be defined by an array of DOCX commented changes.My question is how can I programmatically "accept" all commented changed in a DOCX file, so that the resulting text content has all of its diff (or versioned history) merged?
Some background:
What makes parsing commented DOCX files confusing is that in Microsoft word you might not know that comments exist because you can ignore the comments (allowing you to see the full merge of different author's changes), but when parsing with python-docx, unless the comments are "accepted" or "rejected", only the characters not affected by the "diff" (or series of changes) are defined in the
.text
attribute. So in order to see the resulting text, you must accept or reject all comments first, save the document, then parse it. Only then will the document you viewed in Word match the resulting text parsed.Here is a snippet of text (from a large document, I hope it's more helpful having a sample of!) that should read:
SOME CHANGE AND ANOTHER CHANGE AND MORE CHANGES HERE
But actually, that text is formed by a series of
<w:t></w:t>
objects, some additive, and one deletion defined as follows:So in order to parse the text, all comments must be accepted (marked with "preserve") as follows:
What would be the recipe for "preserving" all commented changes so that I get the expected result? Is there a shortcut to just "accept" all? Or will I need to edit the raw xml of this document?
Thank you, and here are related issues to handling comments that I found useful:
#483
#93
The text was updated successfully, but these errors were encountered: