extract text associated with the comment #14

Closed
kguidonimartins opened this issue Apr 5, 2018 · 3 comments

@kguidonimartins
Contributor

Very useful package! I really appreciate it! Thank you!

Is there a way to extract the text associated with the comments?

I unzipped the attached test.docx file and explored its contents.

The word/document.xml file contains the following markup:

<w:commentRangeStart w:id="1"/>
<w:r>
<w:rPr/>
<w:t xml:space="preserve">
Five quacking zephyrs jolt my wax bed. Flummoxed by job, kvetching W. zaps Iraq. Cozy sphinx waves quart jug of bad milk. A very bad quack might jinx zippy fowls. Few quips galvanized the mock jury box. Quick brown dogs jump over the lazy fox. The jay, pig, fox, zebra, and my wolves quack! Blowzy red vixens fight for a quick jump. Joaquin Phoenix was gazed by MTV for luck.
</w:t>
</w:r>
<w:commentRangeEnd w:id="1"/>

The associated comment is stored in the word/comments.xml file:

<w:comment w:id="1" w:author="Unknown Author" w:date="2018-04-05T13:58:02Z" w:initials="">
<w:p>
<w:r>
<w:rPr>
<w:rFonts w:eastAsia="Noto Sans CJK SC Regular" w:cs="FreeSans" w:ascii="Liberation Serif" w:hAnsi="Liberation Serif"/>
<w:b w:val="false"/>
<w:bCs w:val="false"/>
<w:i w:val="false"/>
<w:iCs w:val="false"/>
<w:caps w:val="false"/>
<w:smallCaps w:val="false"/>
<w:strike w:val="false"/>
<w:dstrike w:val="false"/>
<w:outline w:val="false"/>
<w:shadow w:val="false"/>
<w:emboss w:val="false"/>
<w:imprint w:val="false"/>
<w:color w:val="auto"/>
<w:spacing w:val="0"/>
<w:w w:val="100"/>
<w:position w:val="0"/>
<w:sz w:val="20"/>
<w:szCs w:val="24"/>
<w:u w:val="none"/>
<w:vertAlign w:val="baseline"/>
<w:em w:val="none"/>
<w:lang w:bidi="hi-IN" w:eastAsia="zh-CN" w:val="en-US"/>
</w:rPr>
<w:t>All paragraph.</w:t>
</w:r>
</w:p>
</w:comment>

The two appear to be linked by the shared w:id="1" attribute: the commentRangeStart and commentRangeEnd elements in word/document.xml carry the same w:id as the comment element in word/comments.xml.

It would be very useful if docx_extract_all_cmnts() returned a tibble with a column containing the text each comment is attached to.
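
As a rough illustration of the w:id link described above, here is a minimal sketch in R using the xml2 package. It is not the package's implementation; the two XML strings are trimmed-down stand-ins for word/document.xml and word/comments.xml, and the names `doc`, `cmnts`, `ids`, `word_src`, and `result` are made up for this example. It also assumes comment ranges do not need special handling for nesting or overlap.

```r
library(xml2)

# WordprocessingML namespace bound to the "w" prefix
ns_uri <- "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Stand-in for word/document.xml: one run outside and one inside the comment range
doc <- read_xml(paste0(
  '<w:document xmlns:w="', ns_uri, '"><w:body><w:p>',
  '<w:r><w:t xml:space="preserve">Not commented. </w:t></w:r>',
  '<w:commentRangeStart w:id="1"/>',
  '<w:r><w:t xml:space="preserve">Five quacking zephyrs jolt my wax bed.</w:t></w:r>',
  '<w:commentRangeEnd w:id="1"/>',
  '</w:p></w:body></w:document>'
))

# Stand-in for word/comments.xml
cmnts <- read_xml(paste0(
  '<w:comments xmlns:w="', ns_uri, '">',
  '<w:comment w:id="1" w:author="Unknown Author">',
  '<w:p><w:r><w:t>All paragraph.</w:t></w:r></w:p>',
  '</w:comment></w:comments>'
))

ns <- xml_ns(doc)

# One node per comment; the id attribute is the join key
comments <- xml_find_all(cmnts, "//w:comment", ns)
ids <- xml_attr(comments, "id")

# For each id, collect every w:t that sits after the matching
# commentRangeStart and before the matching commentRangeEnd
word_src <- vapply(ids, function(id) {
  xpath <- sprintf(
    paste0("//w:t[preceding::w:commentRangeStart[@w:id='%s'] and ",
           "following::w:commentRangeEnd[@w:id='%s']]"),
    id, id
  )
  paste(xml_text(xml_find_all(doc, xpath, ns)), collapse = "")
}, character(1), USE.NAMES = FALSE)

result <- data.frame(
  id = ids,
  comment_text = xml_text(comments),
  word_src = word_src,
  stringsAsFactors = FALSE
)
```

Here `result` has one row per comment, pairing the comment text with the document text it was attached to. The XPath `preceding::` and `following::` axes keep the lookup short, though they rescan the document for every comment, so a large file may call for a single pass instead.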

test.docx.zip

hrbrmstr self-assigned this Apr 6, 2018
hrbrmstr added a commit that referenced this issue Apr 6, 2018
@hrbrmstr
Owner

hrbrmstr commented Apr 6, 2018

Stellar idea! (thx for checking out the pkg and taking time to file an enhancement request!)

This is a first stab at accommodating the functionality. I added an include_text parameter to the docx_extract_all_cmnts() function. Please let me know what additional features it should have (if any), or if it fails on any other test files you may have.

read_docx("~/Downloads/test.docx") %>% 
   docx_extract_all_cmnts(include_text = TRUE)
# A tibble: 4 x 6
  id    author         date                 initials comment_text                     word_src                            
  <chr> <chr>          <chr>                <chr>    <chr>                            <chr>                               
1 0     Unknown Author 2018-04-05T13:58:51Z ""       One word                         "How "                              
2 1     Unknown Author 2018-04-05T13:58:02Z ""       All paragraph.                   "Five quacking zephyrs jolt my wax …
3 2     Unknown Author 2018-04-05T13:58:22Z ""       One phrase inside the paragraph. "Brawny gods just flocked up to qui…
4 3     Unknown Author 2018-04-05T13:57:50Z ""       source                           from: http://www.blindtextgenerator…

@hrbrmstr
Owner

hrbrmstr commented Apr 6, 2018

Also, once we're done figuring out the best API for this, pls double-check your attribution in the DESCRIPTION file to make sure I copy/pasted the info right.

@kguidonimartins
Contributor Author

Perfect!
