
questions about caching with path #442

@mkoohafkan

I'm a little confused about whether and how caching works when `path` is passed to `req_perform()`. Here is a modified version of the example from the `req_cache()` documentation:

```r
library(httr2)

# create cache directory
td <- file.path(tempdir(), "cache")
dir.create(td)

url <- paste0(
  "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/",
  "master/inst/extdata/penguins.csv"
)

# Here I set debug = TRUE so you can see what's happening
req <- request(url) |> req_cache(td, debug = TRUE)

# First request downloads the data
tf <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf)

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw\\file53f823441cef.csv"

# Second request retrieves it from the cache
tf2 <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf2)
## Found url in cache "d5d1ddd7f99f55dbc920c63f942804c0"
## Cached value is fresh; retrieving response from cache

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw/foo/d5d1ddd7f99f55dbc920c63f942804c0.body"

file.exists(tf2)
## [1] FALSE

# Wait a while; the cache is now stale
tf3 <- tempfile(fileext = ".csv")
resp <- req |> req_perform(path = tf3)
## Found url in cache "d5d1ddd7f99f55dbc920c63f942804c0"
## Cached value is stale; checking for updates
## Cached value still ok; retrieving body from cache

toString(resp$body)
## [1] "C:\\Temp\\1\\RtmpExocAw\\file53f815c51993.csv"

file.exists(tf3)
## [1] TRUE
```

When the cached value is fresh, the response body points to the cached `.body` file rather than to the requested path, and `tf2` is never created. That part makes sense to me, since there is no guarantee that the path specified in the original call still exists when the request is made a second time. My questions are:

  1. Is the file downloaded once and written to both `path` and the cache? (This appears to be the case.)
  2. Is there a way to make the cache use the same file extension as `path`? Cached bodies are stored with a `.body` extension, which could be a problem for functions that expect a particular file extension.
  3. When the cache is stale, the file appears to be re-downloaded. The debug output says the cached value is still OK and that the body is being retrieved from the cache, yet the response body points to `path` (`tf3`), and that file exists. This happens on every subsequent request: the cache entry is never refreshed, so it keeps being treated as stale. Is this a bug?
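For anyone hitting the same behaviour, here is a minimal base-R sketch of a stopgap, assuming (as the reprex output suggests) that `resp$body` holds a file path whenever `path` is supplied. `ensure_body_at_path()` and `cache_snapshot()` are hypothetical helpers, not part of httr2. The first copies the body to the requested path (keeping whatever extension you chose) when it was served from somewhere else, such as a cached `.body` file. The second lists the files in the cache directory with their sizes and modification times, which can help check whether a revalidation ever rewrites the cached entry.

```r
# Hypothetical helpers, NOT part of httr2; base R only.

# Copy a response body file to `path` unless it is already the same file.
# Assumes `body_path` is the file path stored in resp$body when
# req_perform(path = ...) is used.
ensure_body_at_path <- function(body_path, path) {
  body_path <- as.character(body_path)
  same_file <- normalizePath(body_path, mustWork = FALSE) ==
    normalizePath(path, mustWork = FALSE)
  if (!same_file) {
    if (!file.copy(body_path, path, overwrite = TRUE)) {
      stop("Could not copy ", body_path, " to ", path)
    }
  }
  invisible(path)
}

# Size and modification time of every file in the cache directory, so you
# can see whether a "still ok" revalidation actually touched the entry.
cache_snapshot <- function(cache_dir) {
  files <- list.files(cache_dir, full.names = TRUE)
  info <- file.info(files)
  data.frame(
    file = basename(files),
    size = info$size,
    mtime = info$mtime,
    row.names = NULL
  )
}

# Example usage with the objects from the reprex (needs network access):
# resp <- req |> req_perform(path = tf2)
# ensure_body_at_path(resp$body, tf2)
# file.exists(tf2)
# cache_snapshot(td)
```

Copying rather than moving leaves the cached file untouched, so later cache hits still work.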

Labels: bug (an unexpected problem or unintended behavior)