Alternative way of supporting doc files #23

Open
bedantaguru opened this issue Jul 23, 2019 · 0 comments

Thanks a lot for such a great package.

I was trying out docxtractr::read_docx on .doc files on Windows 10 with LibreOffice 6.2.5.2 (x64).

It was horribly slow (due to LibreOffice, I guess) unless LibreOffice was already open (started manually, outside R). Once I closed LibreOffice and ran the same code in R again, it was slow again.

fn <- "rough/messy_files/doc.doc"
library(tictoc)

# LibreOffice never opened since the last PC reboot
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 285.63 sec elapsed
# 4.7 min !

# LibreOffice open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 1.1 sec elapsed

# LibreOffice closed after open
tic()
tmp <- docxtractr::read_docx(fn)
toc()
# 24.21 sec elapsed

That is OK for a single file, but with a large batch of files it is definitely a problem.
I was wondering whether an alternative way of supporting doc files could be offered to users.
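As a stopgap for the batch case, here is a minimal sketch that converts all .doc files in a single headless LibreOffice run and then reads the resulting .docx files. This is not how docxtractr works internally. It assumes the per-file cost in the timings above comes mostly from LibreOffice starting up, that `soffice` is on the PATH (otherwise pass its full path), and that the file list fits within the OS command-line length limit. The helper names `build_convert_args`, `expected_docx_paths`, and `convert_docs_once` are hypothetical.

```r
# Hypothetical helper: arguments for one headless LibreOffice run that
# converts every file in `files` to .docx inside `out_dir`.
build_convert_args <- function(files, out_dir) {
  c("--headless", "--convert-to", "docx",
    "--outdir", shQuote(out_dir), shQuote(files))
}

# Hypothetical helper: where the converted files are expected to land.
expected_docx_paths <- function(files, out_dir) {
  stems <- tools::file_path_sans_ext(basename(files))
  file.path(out_dir, paste0(stems, ".docx"))
}

# Hypothetical helper: start LibreOffice once for the whole batch.
# `soffice` is an assumption; adjust it to the local install path.
convert_docs_once <- function(files, out_dir = tempdir(), soffice = "soffice") {
  status <- system2(soffice, build_convert_args(files, out_dir))
  if (status != 0) stop("LibreOffice conversion failed with status ", status)
  expected_docx_paths(files, out_dir)
}

# Usage (not run):
# doc_files  <- list.files("rough/messy_files", pattern = "\\.doc$",
#                          full.names = TRUE)
# docx_files <- convert_docs_once(doc_files)
# docs       <- lapply(docx_files, docxtractr::read_docx)
```

With this approach the startup cost should be paid once per batch rather than once per file, though the LibreOffice dependency itself remains.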

For example, docx4j could be used, as mentioned in this repository. The system dependency on LibreOffice would then go away, and I believe it would run more smoothly as well.

Ref #5
