
File-based cache #41

Closed
UmanShahzad opened this issue May 19, 2021 · 5 comments · Fixed by #59

UmanShahzad (Contributor) commented May 19, 2021

We should implement a file-based cache with a relatively low TTL so that users save on quota when repeatedly requesting the same data, especially in batch operations.

The TTL and similar details should be configurable where possible.

The cache should be bypassable via a flag on any command that uses it.

Some nested cache-management subcommands should exist, e.g. to clear the cache.
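
For illustration only, the requested behavior might look like the following session (the flag and subcommand names here are hypothetical placeholders, not a settled design):

```console
$ ipinfo 8.8.8.8             # first lookup hits the API and fills the cache
$ ipinfo 8.8.8.8             # repeat lookups within the TTL are served from disk
$ ipinfo 8.8.8.8 --nocache   # hypothetical flag: bypass the cache for this call
$ ipinfo cache clear         # hypothetical subcommand: wipe the cache
```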

UmanShahzad added this to the v3 milestone May 27, 2021
eacp (Contributor) commented Sep 29, 2021

Would you like the cache to be stored at a specific location on disk?

Would you like to create a single file per IP (like 1.1.1.1.txt or 1.1.1.1.json), or a single cache file for all of them?

What kind of serialization would you like for on-disk storage? JSON? Binary (gob)? Plaintext?

UmanShahzad (Contributor, Author):
It would most likely be an implementation of the cache interface from the IPinfo Go SDK, using BoltDB or similar as a single-file database, and supporting TTLs & LRU eviction.

How do you support TTLs + LRU on a file-based cache?

TTL: lazy deletion. Each entry has an expiry date attached to it; when an entry is retrieved and its expiry has passed, it is deleted and treated as a cache miss.

LRU: when the file reaches some configured limit, a job is run which scans through the database and deletes the X oldest entries (e.g. X=1000), or however many are needed to bring usage back down to, say, 50% of capacity.

As an additional measure, on every Xth (e.g. X=10) use of the CLI, a job runs before the CLI does its main work and sweeps the entire database, even if it isn't full, evicting expired entries. This amortizes the cost of a full database scan, so expired entries get cleared before the database fills up and the scan becomes noticeably slow.
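
A minimal sketch of the lazy-TTL half over bbolt, assuming the SDK's cache interface boils down to a Get/Set pair (the `BoltCache` type, bucket name, and entry layout here are illustrative, not the planned design):

```go
package boltcache

import (
	"encoding/json"
	"errors"
	"time"

	bolt "go.etcd.io/bbolt"
)

var (
	bucketName = []byte("cache")
	ErrMiss    = errors.New("cache miss")
)

// entry wraps a cached value with its expiry; the layout is illustrative.
type entry struct {
	Value   json.RawMessage `json:"value"`
	Expires time.Time       `json:"expires"`
}

// BoltCache is a hypothetical single-file cache backed by bbolt.
type BoltCache struct {
	db  *bolt.DB
	ttl time.Duration
}

// Set stores a value with an expiry stamp attached.
func (c *BoltCache) Set(key string, value json.RawMessage) error {
	return c.db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists(bucketName)
		if err != nil {
			return err
		}
		raw, err := json.Marshal(entry{Value: value, Expires: time.Now().Add(c.ttl)})
		if err != nil {
			return err
		}
		return b.Put([]byte(key), raw)
	})
}

// Get implements lazy deletion: an entry whose expiry has passed is
// deleted on read and reported as a miss.
func (c *BoltCache) Get(key string) (json.RawMessage, error) {
	var val json.RawMessage
	err := c.db.Update(func(tx *bolt.Tx) error { // Update, not View: we may delete
		b, err := tx.CreateBucketIfNotExists(bucketName)
		if err != nil {
			return err
		}
		raw := b.Get([]byte(key))
		if raw == nil {
			return ErrMiss
		}
		var e entry
		if err := json.Unmarshal(raw, &e); err != nil {
			return err
		}
		if time.Now().After(e.Expires) {
			if err := b.Delete([]byte(key)); err != nil {
				return err
			}
			return ErrMiss
		}
		val = e.Value
		return nil
	})
	return val, err
}
```

The LRU pass would be a separate job that iterates the bucket with a cursor and deletes the oldest entries; that's the trickier half, since bbolt orders keys by byte value, not by access time, so recency would have to be tracked explicitly.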


Just an FYI that this is not easy to implement well, so any contribution would come under tight scrutiny and may or may not be accepted. We want to get this one done really well.

UmanShahzad (Contributor, Author) commented Sep 29, 2021

Another, much simpler strategy, which wouldn't be too bad given the size of disks these days and how unnoticeable caches are to people (I'm shocked whenever I look at the size of the cache folders the apps I use get away with): keep filling and using the cache until it's full (say, 2 GB max), then dump it and start again; also dump it forcibly every 24 hours as a "global" TTL.
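
A sketch of that dump-and-restart policy, run once at CLI startup; `maybeReset`, the limits, and the use of mtime as a proxy for the cache's age are all assumptions for illustration:

```go
package boltcache

import (
	"errors"
	"os"
	"time"
)

// maybeReset drops the whole cache file when it exceeds the size cap or the
// "global" TTL. Note: mtime tracks the last write, not the creation time, so
// on a busy cache a real implementation would persist a creation timestamp
// instead (Go has no portable way to read a file's birth time).
func maybeReset(path string, maxBytes int64, maxAge time.Duration) error {
	fi, err := os.Stat(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil // no cache yet, nothing to reset
	}
	if err != nil {
		return err
	}
	if fi.Size() >= maxBytes || time.Since(fi.ModTime()) >= maxAge {
		return os.Remove(path) // dump it; the cache rebuilds from scratch
	}
	return nil
}
```

For the limits above, this would be called as, e.g., `maybeReset(path, 2<<30, 24*time.Hour)`.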

eacp (Contributor) commented Sep 29, 2021

@UmanShahzad I believe the second approach would be easier to implement. We just have to decide how to get the appropriate folder depending on the platform and write JSON files to it. The library can already parse JSON.
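
For the per-platform folder, Go's standard library already solves this with os.UserCacheDir; a minimal sketch (the "ipinfo" subdirectory and filename are just an illustration):

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

func main() {
	// os.UserCacheDir resolves the conventional cache folder per platform:
	// $XDG_CACHE_HOME or ~/.cache on Linux, ~/Library/Caches on macOS,
	// %LocalAppData% on Windows.
	dir, err := os.UserCacheDir()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(filepath.Join(dir, "ipinfo", "cache.db")) // hypothetical path
}
```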

UmanShahzad (Contributor, Author):
@eacp We shouldn't write JSON files, because every update would require deserializing and rewriting the entire file. Some people have done that, but it's a terrible idea past roughly 100 MB.
