How to get the comments and corresponding content from a docx file? #483

smile-kindred · 2018-03-21T02:36:56Z

<w:commentRangeStart w:id="0"/>
<w:r>...</w:r>
<w:commentRangeEnd w:id="0"/>

How can I get the <w:r>...</w:r> element above, i.e. the run that sits between the two comment range markers?

zhnzhang · 2022-07-23T14:21:46Z

You could try lxml.etree, for example:

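from lxml import etree
import zipfile


# Namespace map so that XPath expressions can use the 'w:' prefix.
ooXMLns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
docxFilePath = "Your Docx File Path."

# A .docx file is a zip archive; the main body is stored in word/document.xml.
docxZip = zipfile.ZipFile(docxFilePath)
documentXML = docxZip.read('word/document.xml')
et = etree.XML(documentXML)
commentRangeStarts = et.xpath('//w:commentRangeStart', namespaces=ooXMLns)

# The element that directly follows the first comment range marker.
elem = commentRangeStarts[0].getnext()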

getnext() returns the next sibling of the <w:commentRangeStart> element, which in the snippet from the question is the <w:r>...</w:r> run.
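
Note that getnext() only returns the immediate next sibling, while a commented range can contain several runs. A minimal sketch of one way to handle that is to keep calling getnext() until the matching <w:commentRangeEnd> is reached. The helper name elements_in_comment_range and the sample paragraph are made up for illustration, and the sketch assumes the start and end markers share the same parent element.

```python
from lxml import etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NSMAP = {"w": W_NS}


def elements_in_comment_range(start):
    """Collect the siblings between a commentRangeStart and its matching end.

    Hypothetical helper: if no matching end marker is found among the
    siblings, every following sibling is returned.
    """
    comment_id = start.get(f"{{{W_NS}}}id")
    collected = []
    node = start.getnext()
    while node is not None:
        if (node.tag == f"{{{W_NS}}}commentRangeEnd"
                and node.get(f"{{{W_NS}}}id") == comment_id):
            break
        collected.append(node)
        node = node.getnext()
    return collected


# Hypothetical paragraph, made up for illustration.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:commentRangeStart w:id="0"/>'
    '<w:r><w:t>Hello </w:t></w:r>'
    '<w:r><w:t>world</w:t></w:r>'
    '<w:commentRangeEnd w:id="0"/>'
    '<w:r><w:t> and more</w:t></w:r>'
    '</w:p>'
)

paragraph = etree.XML(SAMPLE)
start = paragraph.xpath("//w:commentRangeStart", namespaces=NSMAP)[0]
runs = elements_in_comment_range(start)
commented_text = "".join(r.xpath("string(.)") for r in runs)
```

Here runs holds the two <w:r> elements inside the range, and commented_text is their concatenated text.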

regstuff · 2023-05-14T11:15:35Z

For anyone coming to this from Google (like me), this worked for me:

from lxml import etree
import zipfile

ooXMLns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

def get_document_comments(docxFileName):
    """Return two dicts keyed by comment id: comment text and commented text."""
    comments_dict = {}
    comments_of_dict = {}
    # Comments are stored in word/comments.xml; the ranges they refer to
    # are marked in word/document.xml. read() raises KeyError if the
    # document has no comments part.
    with zipfile.ZipFile(docxFileName) as docx_zip:
        comments_xml = docx_zip.read('word/comments.xml')
        comments_of_xml = docx_zip.read('word/document.xml')
    et_comments = etree.XML(comments_xml)
    et_comments_of = etree.XML(comments_of_xml)
    comments = et_comments.xpath('//w:comment', namespaces=ooXMLns)
    comments_of = et_comments_of.xpath('//w:commentRangeStart', namespaces=ooXMLns)
    for c in comments:
        comment = c.xpath('string(.)', namespaces=ooXMLns)
        comment_id = c.xpath('@w:id', namespaces=ooXMLns)[0]
        comments_dict[comment_id] = comment
    for c in comments_of:
        comments_of_id = c.xpath('@w:id', namespaces=ooXMLns)[0]
        # Runs that are siblings of both the start and the end marker.
        parts = et_comments_of.xpath(
            f'//w:r[preceding-sibling::w:commentRangeStart[@w:id="{comments_of_id}"]'
            f' and following-sibling::w:commentRangeEnd[@w:id="{comments_of_id}"]]',
            namespaces=ooXMLns)
        comment_of = ''
        for part in parts:
            comment_of += part.xpath('string(.)', namespaces=ooXMLns)
        # Store the result once per comment, even when no runs matched.
        comments_of_dict[comments_of_id] = comment_of
    return comments_dict, comments_of_dict

Courtesy: https://stackoverflow.com/a/75169632/3016570
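
One limitation worth noting: the XPath above only matches runs that are siblings of both the start and the end marker, so a comment whose range spans more than one paragraph (or whose runs are nested in another element) would come back incomplete. A rough sketch of an alternative is to walk the document in order and collect <w:t> text while a range is open. The function name commented_text_by_id and the sample XML are hypothetical, and the sketch uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def commented_text_by_id(document_root):
    """Map each comment id to the text between its range start and end markers."""
    open_ids = []  # ids whose range has started but not yet ended
    texts = {}
    # iter() visits elements in document order, so markers and text
    # are seen in the same order as they appear in the file.
    for node in document_root.iter():
        if node.tag == W + "commentRangeStart":
            cid = node.get(W + "id")
            open_ids.append(cid)
            texts.setdefault(cid, "")
        elif node.tag == W + "commentRangeEnd":
            cid = node.get(W + "id")
            if cid in open_ids:
                open_ids.remove(cid)
        elif node.tag == W + "t" and node.text:
            # Text belongs to every comment range that is currently open.
            for cid in open_ids:
                texts[cid] += node.text
    return texts


# Hypothetical body with a comment range crossing a paragraph boundary.
SAMPLE = (
    f'<w:body xmlns:w="{W_NS}">'
    '<w:p><w:r><w:t>Before. </w:t></w:r>'
    '<w:commentRangeStart w:id="7"/>'
    '<w:r><w:t>First part</w:t></w:r></w:p>'
    '<w:p><w:r><w:t> second part</w:t></w:r>'
    '<w:commentRangeEnd w:id="7"/>'
    '<w:r><w:t> After.</w:t></w:r></w:p>'
    '</w:body>'
)

result = commented_text_by_id(ET.fromstring(SAMPLE))
```

For the sample, result maps "7" to the text of both paragraphs' commented runs. No separator is inserted at paragraph boundaries; that is a simplification of this sketch.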

evbo mentioned this issue Oct 31, 2018

How to Accept ("Preserve") DOCX Commented Changes In order to parse resulting text #566

Closed
