Add option for keeping local cache of recent data. #585
@@ -1,140 +0,0 @@
/*
Why deleted?
This file is unused now
Let's create a separate cleanup PR with details on why the files are not used. This change is clearly unrelated to caching.
This change is clearly unrelated to caching.
This file was used to combine two execution plans (remote and memory). Because caching is now added conditionally, there are three plans to combine, so I had to refactor the related code in the schema provider file.
Please fix commit message with details
Fix the commit message with more details on why we added this and how it works.
Missing items
What does this mapping in the .cache.json file mean?
If I restart the server and change the cache size via
@nitisht It's a mapping from store path to where the file is stored in the local cache.
That is clear, but
I will do this in a separate PR.
Let's finish in this PR.
This PR adds a local cache / hot tier for Parseable. This option can be enabled by setting the following env vars:
P_CACHE_DIR - Local path for the file cache
P_CACHE_SIZE - Size of the cache in human-readable form (mb/mib/gib/gb)
When these are set, the sync flow moves the parquet file from staging into the specified cache directory instead of deleting it. On insertion, least recently used entries are deleted as needed to satisfy the cache size constraint. LocalCacheManager is responsible for updating and persisting this data structure.
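The eviction behaviour described here can be illustrated with a minimal sketch. It is not the code from this PR: `FileCache`, its fields, and its methods are hypothetical names, and it uses only the standard library rather than the hashlru crate that the PR relies on. It assumes the cache is bounded by a byte budget (the role `P_CACHE_SIZE` would play) and maps each store path to a local cache path.

```rust
use std::collections::{HashMap, VecDeque};
use std::path::PathBuf;

/// Hypothetical size-bounded LRU index of cached files (illustration only).
pub struct FileCache {
    capacity: u64,     // byte budget, e.g. derived from P_CACHE_SIZE
    current_size: u64, // sum of the sizes of all cached files
    // store path -> (local cache path, file size in bytes)
    files: HashMap<String, (PathBuf, u64)>,
    // recency order: front is least recently used, back is most recent
    order: VecDeque<String>,
}

impl FileCache {
    pub fn new(capacity: u64) -> Self {
        Self {
            capacity,
            current_size: 0,
            files: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    pub fn current_size(&self) -> u64 {
        self.current_size
    }

    /// Move `store_path` to the most recently used position.
    fn touch(&mut self, store_path: &str) {
        self.order.retain(|k| k.as_str() != store_path);
        self.order.push_back(store_path.to_string());
    }

    /// Look up a cached file and mark it as recently used.
    pub fn get(&mut self, store_path: &str) -> Option<&PathBuf> {
        if self.files.contains_key(store_path) {
            self.touch(store_path);
        }
        self.files.get(store_path).map(|(path, _)| path)
    }

    /// Record a file in the cache. Returns the local paths the caller
    /// should delete from disk: the evicted entries, or the new file
    /// itself if it is larger than the whole budget and was not cached.
    pub fn insert(&mut self, store_path: &str, local_path: PathBuf, size: u64) -> Vec<PathBuf> {
        if size > self.capacity {
            return vec![local_path];
        }
        // Drop the accounting for an existing entry under the same key.
        if let Some((_, old_size)) = self.files.remove(store_path) {
            self.current_size -= old_size;
            self.order.retain(|k| k.as_str() != store_path);
        }
        let mut evicted = Vec::new();
        // Evict least recently used entries until the new file fits.
        while self.current_size + size > self.capacity {
            match self.order.pop_front() {
                Some(lru) => {
                    if let Some((path, freed)) = self.files.remove(&lru) {
                        self.current_size -= freed;
                        evicted.push(path);
                    }
                }
                None => break,
            }
        }
        self.files.insert(store_path.to_string(), (local_path, size));
        self.order.push_back(store_path.to_string());
        self.current_size += size;
        evicted
    }
}
```

In this sketch the index only does bookkeeping and hands back the paths to delete, so filesystem work and persistence of the mapping stay with the caller. That split is a design choice of the sketch, not something the PR states.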
version: "v1".to_string(),
current_size: 0,
capacity,
files: Cache::new(100),
Why only 100 here?
The backing hashlru is capacity based. The capacity is arbitrary and is increased when needed.
Who increases it: the user or the backing cache?
Description
This PR has: