This repository has been archived by the owner on May 10, 2022. It is now read-only.

Chicago press full text issue #49

Closed
fangzhou-xie opened this issue Jun 14, 2020 · 4 comments

@fangzhou-xie

Session Info
crminer_0.3.5.91

```r
> library(crminer)
> link <- crm_links("10.1086/250113")
> link
$unspecified
<url> http://www.journals.uchicago.edu/doi/pdf/10.1086/250113

> ft <- crm_text(link, "pdf", overwrite_unspecified = T)
using cached file: /Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf
date created (size, mb): 2020-06-12 22:59:56 (0)
Extracting text from pdf...
Error in poppler_pdf_info(loadfile(pdf), opw, upw) : PDF parsing failure.
```

Sorry for posting this, since it is clearly similar to #41 and other issues here, but this time it happens with U Chicago Press. The full-text link opens as a PDF file when copied and pasted into a web browser.

@sckott
Contributor

sckott commented Jun 16, 2020

thanks! I'll have a look

@sckott
Contributor

sckott commented Jun 16, 2020

I can't replicate the error. Can you check whether that PDF file is a valid PDF? Or is it gibberish, or does it have non-PDF content in it?
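One quick way to check this, sketched in base R: the log above reports a size of 0 MB for the cached file, so a size check plus a look at the first bytes is usually enough. `looks_like_pdf` is a hypothetical helper written for this thread, not a crminer function, and the `%PDF-` signature test is only a rough heuristic, not a full validation.

```r
# Hypothetical helper (not part of crminer): returns TRUE when the file is
# non-empty and begins with the "%PDF-" signature, FALSE otherwise.
looks_like_pdf <- function(path) {
  if (!file.exists(path)) return(FALSE)
  size <- file.size(path)
  # A zero-byte or tiny file cannot hold a PDF header
  if (is.na(size) || size < 5) return(FALSE)
  header <- readBin(path, what = "raw", n = 5L)
  identical(header, charToRaw("%PDF-"))
}

# Path copied from the "using cached file" message in the report above
looks_like_pdf("/Users/xiefangzhou/Library/Caches/R/crminer/250113.pdf")
```

A `FALSE` here would suggest the cached download is empty or is something other than a PDF (for example an HTML error page).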

@fangzhou-xie
Author

Thanks for your reply! I found that the cached PDF file is a zero-byte object, so it appears to be corrupted. After installing the newest version and removing all the cached files, it works without any issue. I also succeeded with #46.
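For anyone hitting the same error, here is a minimal base R sketch of the cleanup. Note that it is narrower than what I did: it removes only the zero-byte files rather than the whole cache. `remove_empty_files` is a hypothetical helper, not a crminer function, and the directory is the one shown in the "using cached file" message above, so it will differ on other machines.

```r
# Hypothetical helper (not part of crminer): delete zero-byte files in `dir`
# and return the paths that were removed (invisibly).
remove_empty_files <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  files <- files[!dir.exists(files)]          # skip subdirectories
  sizes <- file.size(files)
  empty <- files[!is.na(sizes) & sizes == 0]  # keep only zero-byte files
  unlink(empty)
  invisible(empty)
}

# Cache directory taken from the log output in the original report
removed <- remove_empty_files("/Users/xiefangzhou/Library/Caches/R/crminer")
print(removed)
```

After that, rerunning `crm_text()` should download the file again instead of reusing the empty cached copy.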

Thank you so much!

@sckott
Contributor

sckott commented Jun 16, 2020

great, glad it works
