
list_regions sends down CMAs that don't have census data #50

Closed
mountainMath opened this issue Aug 15, 2017 · 9 comments

Comments

@mountainMath
Owner

I noticed that the list_regions call sends down data for some CMAs that don't actually have census data attached to them. This is fixed on the server now; it will take a day for the server-side cache to expire, so we should all make sure to refresh the cached regions in 24 hours.

@dshkol
Collaborator

dshkol commented Aug 15, 2017

Does it make sense to add a function to wipe caches? Or is that taken care of by forcing the load function to redownload the data? It's kind of a power-user need.

@atheriel
Collaborator

The existing use_cache = FALSE parameter will do just that. I think I mentioned it in the documentation, too.
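For reference, a minimal sketch of what that looks like from the user's side (the dataset identifier `"CA16"` is illustrative; this assumes list_regions accepts a dataset argument alongside use_cache):

```r
library(cancensus)

# Bypass the local cache and force a fresh download of the region list.
# use_cache = FALSE is the parameter discussed above.
regions <- list_regions("CA16", use_cache = FALSE)
```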

@atheriel
Collaborator

(So I think we can close this.)

@mountainMath
Owner Author

I am starting to think that we need to strike a balance between caching and making sure data is up to date. For example, with a cached list_datasets call the user may never find out about new datasets that we add. Even if that list is slow-changing, it could cause problems. Also, list_vectors will likely undergo changes as I clean up the server-side data.

I see three reasons we do caching:

  1. to avoid unnecessarily spending API points (the reason for the limit on API points is point 2)
  2. to reduce server load
  3. to allow people to run and refine their analysis offline

Point 1 only applies to the load_data calls, and I think letting the user decide when to reload the data is good. For the other calls it would be just fine if we only made calls, say, once a day, so that the user doesn't have to worry about refinements to the vector data. Or we could use the cache-expiry headers from the HTTP call to determine how long cached objects stay fresh; that way we can increase that time server-side at a later stage, once the calls become stable.
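The cache-expiry-header idea could be sketched roughly like this (pure header parsing, no cancensus internals assumed; `parse_max_age` is a hypothetical helper):

```r
# Sketch: read a cache lifetime out of an HTTP Cache-Control header,
# so the server can lengthen the expiry later once the calls stabilize.
parse_max_age <- function(cache_control, default = 86400) {
  m <- regmatches(cache_control, regexpr("max-age=[0-9]+", cache_control))
  if (length(m) == 0) return(default)  # no directive: fall back to one day
  as.numeric(sub("max-age=", "", m))
}

parse_max_age("public, max-age=86400")  # one day, in seconds
```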

@atheriel
Collaborator

One way to do this would be to cache the caching information from the server (e.g. the ETag) along with the object, and then send an If-None-Match request to the server. I believe this is what browsers do, but I'd have to think about how to do it in R.
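In httr terms that revalidation might look roughly like this (a sketch only; `refresh_if_changed`, the cached ETag, and the cache handling are placeholders, not existing package code):

```r
library(httr)

# Sketch: revalidate a cached object against the server using its stored
# ETag. An HTTP 304 response means the cached copy is still current.
refresh_if_changed <- function(url, cached_etag, cached_object) {
  resp <- GET(url, add_headers(`If-None-Match` = cached_etag))
  if (status_code(resp) == 304L) {
    cached_object              # not modified: keep the cached copy
  } else {
    content(resp)              # modified: replace the cache and store
  }                            # the new headers(resp)[["etag"]]
}
```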

@atheriel
Collaborator

Alternatively, we could store the cache timestamp and force an update if it gets too old.
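The timestamp-based version is even simpler. A sketch, assuming cached objects live as files on disk (`cache_is_stale` is hypothetical, not existing package code):

```r
# Sketch: treat a cached file as stale once it is older than max_age seconds.
cache_is_stale <- function(path, max_age = 86400) {
  if (!file.exists(path)) return(TRUE)   # nothing cached yet
  age <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "secs"))
  age > max_age
}
```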

@dshkol
Collaborator

dshkol commented Aug 15, 2017

That could work. You could prompt the user with a note that their cached data is old and give them the option to reload, but that might be overkill.

@atheriel
Collaborator

Actually, it's likely a simple warning would suffice, and be the least intrusive.

@atheriel
Collaborator

This should be closed now that #54 is merged. If we want to have more discussion of cache invalidation, it should be in a separate issue I think.
