-
Notifications
You must be signed in to change notification settings - Fork 127
internal/function: use cache for expensive language operations #457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
internal/function/language.go
Outdated
enry "gopkg.in/src-d/enry.v1" | ||
"gopkg.in/src-d/go-mysql-server.v0/sql" | ||
) | ||
|
||
const defaultLanguageCacheSize = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we are trying to let apps adjust cache sizes (and pass them down to packages like go-git), so maybe in this case we can also think about how easily we can modify this cache in the future (at least size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this should be configurable, right @ajnavarro? Environment variable or CLI flag that passes down to here? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, it should be configurable. it should be a env var o cli flag, yep.
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Added configuration to the cache size via env var |
docs/using-gitbase/configuration.md
Outdated
@@ -12,6 +12,7 @@ | |||
| `GITBASE_INDEX_DIR` | directory to save indexes, default `/var/lib/gitbase/index` | | |||
| `GITBASE_TRACE` | enable jaeger tracing, default disabled | | |||
| `GITBASE_READONLY` | allow read queries only, disabling creating and deleting indexes, default disabled | | |||
| `GITBASE_LANGUAGE_CACHE_SIZE` | size of the cache for the `language` UDF. The size is the maximum number of elements kept in the cache | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
docs/using-gitbase/configuration.md
Outdated
@@ -12,6 +12,7 @@ | |||
| `GITBASE_INDEX_DIR` | directory to save indexes, default `/var/lib/gitbase/index` | | |||
| `GITBASE_TRACE` | enable jaeger tracing, default disabled | | |||
| `GITBASE_READONLY` | allow read queries only, disabling creating and deleting indexes, default disabled | | |||
| `GITBASE_LANGUAGE_CACHE_SIZE` | size of the cache for the `language` UDF. The size is the maximum number of elements kept in the cache | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add the default value here?
enry "gopkg.in/src-d/enry.v1" | ||
"gopkg.in/src-d/go-mysql-server.v0/sql" | ||
) | ||
|
||
const ( | ||
languageCacheSizeKey = "GITBASE_LANGUAGE_CACHE_SIZE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add that env variable as we are doing with others? check https://github.com/src-d/gitbase/blob/master/cmd/gitbase/command/server.go#L50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add it there we will have to pass it all the way down until the infinite. Besides, there are other examples of environment variables such as blob max size that are not there but where they're used
Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
Added default value to docs |
Closes #442
This adds a two queue lru cache to language UDF. Why a 2Q cache instead of a regular LRU? Because we do not only want the most recent but the most frequent as well to maximize the cache usage.
Since running enry on a file_path only is just marginally slower than using the cache, I decided to not use the cache unless the blob_content is being provided (we could use two caches, though) to not pollute the cache and use it only for the really expensive operations: reading stuff from blobs.
Note that the cache is per gitbase instance.
Results are pretty much the ones @ajnavarro describes in the issue.
Before:
Now: