Chicago press full text issue #49

fangzhou-xie · 2020-06-14T01:00:40Z

Session Info

crminer_0.3.5.91

> library(crminer)
> link <- crm_links("10.1086/250113")
> link
$unspecified
<url> http://www.journals.uchicago.edu/doi/pdf/10.1086/250113

> ft <- crm_text(link, "pdf", overwrite_unspecified = T)
using cached file: /Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf
date created (size, mb): 2020-06-12 22:59:56 (0)
Extracting text from pdf...
Error in poppler_pdf_info(loadfile(pdf), opw, upw) : PDF parsing failure.

Sorry for posting this, since it is clearly similar to #41 and other issues here, but this time it happens with U Chicago Press. The full-text link opens as a PDF file when copied and pasted into a web browser.

sckott · 2020-06-16T01:00:41Z

thanks! I'll have a look

sckott · 2020-06-16T01:04:03Z

i can't replicate the error. can you see if that pdf file is a valid pdf? or is it gibberish? non-pdf content in it?
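One way to answer that question from R is to look at the first bytes of the cached file. The sketch below is only an illustration: `looks_like_pdf()` is a hypothetical helper, not a crminer function, and it applies a strict check that the file begins with the `%PDF-` signature. An empty file or an HTML error page saved under a `.pdf` name would fail it.

```r
# Hypothetical helper (not part of crminer): TRUE if the file exists,
# is non-empty, and starts with the "%PDF-" signature.
looks_like_pdf <- function(path) {
  if (!file.exists(path) || file.size(path) == 0) {
    return(FALSE)
  }
  # Read the first five bytes and compare them with the PDF signature.
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Example call, using the cached path reported in the error output:
# looks_like_pdf("/Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf")
```

A `FALSE` result suggests the download itself failed and the cached copy should be removed before retrying.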

fangzhou-xie · 2020-06-16T14:42:29Z

Thanks for your reply! I found that the cached PDF file is a zero-byte object, so it appears to be corrupted. After installing the newest version and removing all the cached files, it works without any issue. I also succeeded with #46.
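For anyone who runs into the same thing, here is a minimal sketch of the cleanup step in base R. It assumes the cache is a plain directory of `.pdf` files, as the path in the error output suggests. `remove_empty_pdfs()` is a hypothetical helper, not something crminer provides, and it defaults to a dry run that only lists the empty files.

```r
# Hypothetical helper (not part of crminer): list zero-byte PDFs in a
# cache directory and, when dry_run = FALSE, delete them.
remove_empty_pdfs <- function(cache_dir, dry_run = TRUE) {
  pdfs <- list.files(cache_dir, pattern = "\\.pdf$", full.names = TRUE,
                     ignore.case = TRUE)
  sizes <- file.size(pdfs)
  empty <- pdfs[!is.na(sizes) & sizes == 0]
  if (!dry_run && length(empty) > 0) {
    unlink(empty)
  }
  # Return the paths that were (or would be) removed.
  empty
}

# Example call; the directory is the one shown in the error output.
# remove_empty_pdfs("~/Library/Caches/R/crminer", dry_run = FALSE)
```

After the empty files are gone, calling `crm_text()` again should trigger a fresh download instead of reusing the broken cached copy.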

Thank you so much!

sckott · 2020-06-16T17:37:17Z

great, glad it works

fangzhou-xie closed this as completed Jun 16, 2020

fangzhou-xie mentioned this issue Jun 16, 2020 in "Some crm_pdf/cr_text tests failing" #46 (Closed)
