Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to parse .xlsx content from crul response? #161

Closed
moldach opened this issue Jul 6, 2021 · 4 comments
Closed

How to parse .xlsx content from crul response? #161

moldach opened this issue Jul 6, 2021 · 4 comments

Comments

@moldach
Copy link

moldach commented Jul 6, 2021

I'm trying to figure out how to parse the response since it is an .xlsx EXCEL spreadsheet (and not JSON)

suppressMessages(library(crul))
suppressMessages(library(lubridate))
suppressMessages(library(openxlsx))

# Set ENV variables
to_date <- lubridate::as_date("2021-06-29")
from_date <- lubridate::as_date("2021-06-01")
version = 63973
token = [PRIVATE KEY HERE]

url <- paste0('https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=', from_date, '&to_date=', to_date, '&version=', version, '&token=', token)

(cc <- Async$new(
  url = url
))
(res <- cc$get())

Showing res:

[[1]]
<crul response> 
  url: https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-06-01&to_date=2021-06-29&version=63973&token=[PRIVATE]
  request_headers: 
  response_headers: 
    status: HTTP/2 200
    date: Tue, 06 Jul 2021 16:06:53 GMT
    content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    content-length: 10815234
    x-powered-by: Express
    access-control-allow-origin: *
    content-disposition: attachment; filename=DATASET-V2-2021-06-01_2021-06-29.xlsx
    etag: W/"a50702-sV45umM0y/Nr9u0V3u0ml8yerK0"
    set-cookie: api_cookie=s%3AgDQh8H24Ajac8HLcr56ns-FHtfdbkTxW.VzXMnwrwHCXiWK4pcT7M1BFauGR9Apy5MW0tRm0qMo0; Path=/; HttpOnly
  params: 
    from_date: 2021-06-01
    to_date: 2021-06-29
    version: 63973
    token: PRIVATE
  status: 200

I thought I could try to adapt the write to disk section of the documentation in order to first save the response to a tempfile and then use {openxlsx} to read the data; however, this did not work:

x <- HttpClient$new(url = url)
f <- tempfile()
res <- x$get(disk = f)
openxlsx::read.xlsx(res$content)

Error: openxlsx can only read .xlsx files

@sckott
Copy link
Collaborator

sckott commented Jul 9, 2021

Try this:

library(crul)
url = "https://github.com/ycphs/openxlsx/blob/master/inst/extdata/namedRegions.xlsx?raw=true"
cc <- Async$new(url = url)
res <- cc$get()
class(res[[1]]$content)
writeBin(res[[1]]$content, (f=tempfile(fileext=".xlsx")))
openxlsx::read.xlsx(f)

the data in $content is of type raw, you can write it to a file first, then read with openxlsx

let me know if that works for yoyu

@moldach
Copy link
Author

moldach commented Jul 15, 2021

Thank you very much.
Ultimately I'm trying to adapt this from Async -> AsyncQueue and the url example you provided works when I try it for AsyncQueue like this:

> reqlist <- list(
  HttpRequest$new(url = "https://github.com/ycphs/openxlsx/blob/master/inst/extdata/namedRegions.xlsx?raw=true")$get(),
  HttpRequest$new(url = "https://github.com/ycphs/openxlsx/blob/master/inst/extdata/namedRegions.xlsx?raw=true")$get(),
  HttpRequest$new(url = "https://github.com/ycphs/openxlsx/blob/master/inst/extdata/namedRegions.xlsx?raw=true")$get()
)
> out <- AsyncQueue$new(.list = reqlist, req_per_min=3)
> out$request() # make requests
> out$responses()[[1]]
<crul response> 
  url: https://raw.githubusercontent.com/ycphs/openxlsx/master/inst/extdata/namedRegions.xlsx
  request_headers: 
  response_headers: 
    status: HTTP/2 200
    cache-control: max-age=300
    content-security-policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
    content-type: application/octet-stream
    etag: W/"485760db4f84748f4d4cfa4ea6fa6b5711ca08f7adf6424667f42f171301ee35"
    strict-transport-security: max-age=31536000
    x-content-type-options: nosniff
    x-frame-options: deny
    x-xss-protection: 1; mode=block
    x-github-request-id: 7CA6:7A92:39526:ACBFC:60F05459
    accept-ranges: bytes
    date: Thu, 15 Jul 2021 16:12:46 GMT
    via: 1.1 varnish
    x-served-by: cache-sjc10030-SJC
    x-cache: HIT
    x-cache-hits: 1
    x-timer: S1626365567.904596,VS0,VE0
    vary: Authorization,Accept-Encoding
    access-control-allow-origin: *
    x-fastly-request-id: 0e888ce59af5c93de05cbe1bc3dbdd42cf2c1df3
    expires: Thu, 15 Jul 2021 16:17:46 GMT
    source-age: 0
    content-length: 8209
  status: 200
> class(out$content()[[1]])
[1] "raw"
> writeBin(out$content()[[1]], (f=tempfile(fileext=".xlsx")))
> openxlsx::read.xlsx(f)
  This.is.C2     X2
1       Col1   Col2
2     Data11 Data12
3     Data21 Data22
4     Data31 Data32

However, when I try it on my URL(s) only the Async method works but not the AsyncQueue method? (I get a 504?)

Works

> url <- "https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-04-03&to_date=2021-06-02&version=63973&token=[TOKEN]"
> cc <- Async$new(url = url)
> res <- cc$get()
> class(res[[1]]$content)
[1] "raw"
> writeBin(res[[1]]$content, (f=tempfile(fileext=".xlsx")))
> openxlsx::read.xlsx(f)
   Target.ID Target     Action Genetic.Studies/Variants Genetic.Studies/Variants.(literature.term) Uniprot.ID Gene
1          5    ALK Inhibitors                     <NA>                                       <NA>     Q9UM73  ALK
2          5    ALK Inhibitors                     <NA>                                       <NA>     Q9UM73  ALK
...

Doesn't Work

> reqlist <- list(
    HttpRequest$new(url = "https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-04-03&to_date=2021-06-02&version=63973&token=[TOKEN]")$get(),
    HttpRequest$new(url = "https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-02-20&to_date=2021-04-03&version=63973&token=[TOKEN]")$get(),
    HttpRequest$new(url = "https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-01-01&to_date=2021-02-20&version=63973&token=[TOKEN]")$get()
)
> out <- AsyncQueue$new(.list = reqlist, req_per_min=3)
> out$request() # make requests
> out$responses()[[1]]
<crul response> 
  url: https://api.targetsafety.info/api/alertcenter/dataset/param?from_date=2021-04-03&to_date=2021-06-02&version=63973&token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjAwMDE2IiwibmFtZSI6IkdlbmVudGVjaCIsImRlcGFydG1lbnQiOiJHZW5lbnRlY2ggQVBJIn0.p01jxx7PekOUQLOGjvlhR3YWP2GdcD7_r6MM515bYmM
  request_headers: 
  response_headers: 
    status: HTTP/2 504
    server: awselb/2.0
    date: Thu, 15 Jul 2021 16:25:36 GMT
    content-type: text/html
    content-length: 132
  params: 
    from_date: 2021-04-03
    to_date: 2021-06-02
    version: 63973
    token: [TOKEN]
  status: 504
>  class(out$content()[[1]])
[1] "raw"
> writeBin(out$content()[[1]], (f=tempfile(fileext=".xlsx")))
> openxlsx::read.xlsx(f)
Error in file(con, "r") : invalid 'description' argument
In addition: Warning message:
In unzip(xlsxFile, exdir = xmlDir) : error 1 in extracting from zip file

Please let me know if you'd like DM privately to debug so I could share the token 💁🏼

@sckott
Copy link
Collaborator

sckott commented Oct 2, 2021

Sorry for the long delay. Yeah, can you send me the token so I can try to debug it? You can use the email associated with this pkg

@sckott
Copy link
Collaborator

sckott commented Nov 20, 2021

I get those 504 errors, but I do with every request. I think what's happening is intermittent server downtime, resulting in the 504s. I doubt this has anything to do with Async vs. AsyncQueue. Unless you can make a reproducible example.

By the way, you're token is above in #161 (comment), you should get a new one

@sckott sckott closed this as completed Dec 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants