Kurtosis Package Indexer is a backend service that searches GitHub for Kurtosis packages and stores them in memory. It is currently consumed by the Kurtosis frontend to power the Kurtosis Packages Catalog.
The service periodically runs a job that searches for all Kurtosis packages currently on GitHub.
- The background job runs every two hours. Results are stored in memory for now, so restarting the service re-runs the job.
- It searches for `kurtosis.yml` files on GitHub. It then checks that each `kurtosis.yml` file can be parsed and that there is a valid `main.star` file next to it. Any folder that does not meet these criteria is discarded.
Searches on GitHub must be authenticated. Kurtosis Package Indexer can authenticate with GitHub in two ways: with a token read from an environment variable, or with a token read from a file in an S3 bucket.
Right now, the indexer first tries reading the `GITHUB_USER_TOKEN` environment variable and, if it is empty, falls back to the S3 bucket option.
The environment variable option is the simplest: the indexer expects a valid GitHub token stored in the `GITHUB_USER_TOKEN` environment variable.
The indexer can also get the GitHub token from a file stored in an S3 bucket.
The file storing the GitHub token should be named `github-user-token.txt` and should contain only the GitHub token, as plain text.
To access this file, the indexer requires the following environment variables to be set:
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` for AWS authentication
- `AWS_BUCKET_REGION` and `AWS_BUCKET_NAME` to identify the AWS S3 bucket. The user linked to the key above needs to be able to do `GetObject` on this bucket
- `AWS_BUCKET_FOLDER` (optional) in case the file `github-user-token.txt` is located inside a folder in this S3 bucket
The indexer consumes some Kurtosis public metrics (only package run counts for now) in order to provide this information to indexer clients such as the package catalog.
Snowflake is the Kurtosis metrics storage at the moment, and the indexer uses the Go Snowflake client (`gosnowflake`) to execute queries on it.
A user must be authenticated before any query can be executed on this storage. We created a new service account and a new role for this purpose; you can log in to the Kurtosis Snowflake account to get this information.
The indexer requires the following environment variables to be set:
- `KURTOSIS_SNOWFLAKE_ACCOUNT_IDENTIFIER` to identify the Kurtosis Snowflake account using this format.
- `KURTOSIS_SNOWFLAKE_DB` to specify the metrics database name
- `KURTOSIS_SNOWFLAKE_USER` the Kurtosis backend service account user
- `KURTOSIS_SNOWFLAKE_PASSWORD` the Kurtosis backend service account password
- `KURTOSIS_SNOWFLAKE_ROLE` the specific role that grants access to the public metrics
- `KURTOSIS_SNOWFLAKE_WAREHOUSE` the metrics warehouse name
Kurtosis package information is stored in memory by default. Every time the indexer is restarted, it re-runs the GitHub searches to fetch the latest information about the packages on GitHub.
There is also the option of persisting the data to a bolt key-value store, so that services can be restarted with the data intact. To use it, set the environment variable `BOLT_DATABASE_FILE_PATH` to point to a file on disk that bolt will use to store the data. If the indexer runs in a container, a persistent volume should be used to fully benefit from this feature.
Ultimately, to make the indexer fully stateless, data can also be stored in an external etcd key-value store. Once the etcd cluster is up and running, the indexer can be started with the environment variable `ETCD_DATABASE_URLS` set to a comma-separated list of etcd node URLs, for example `http://etcd.node.1:2379,http://etcd.node.2:2379,http://etcd.node.3:2379`.
The bolt db and etcd db implementations are deprecated. They were not used in production, so we deprecated them to simplify code maintenance.
The following arguments can be passed to the package:
    {
        // Set to false when developing locally or in CI; metrics reporting will not be set up
        // If set to true, the Snowflake fields must be set
        "is_running_in_prod": "false",
        // Token used to authenticate with GitHub
        // If empty, the AWS info will be used to retrieve the token
        "github_user_token": "",
        // Optionally, a custom version of the indexer image can be used. Useful to run a dev version, like on CI
        // If empty, a local image will be built from the repo code
        "kurtosis_package_indexer_version": "0.0.32",
        // Snowflake fields for setting up metrics reporting if running in production
        "snowflake_env": {
            "kurtosis_snowflake_account_identifier": "<KURTOSIS_SNOWFLAKE_ACCOUNT_IDENTIFIER>",
            "kurtosis_snowflake_db": "<KURTOSIS_SNOWFLAKE_DB>",
            "kurtosis_snowflake_password": "<KURTOSIS_SNOWFLAKE_PASSWORD>",
            "kurtosis_snowflake_role": "<KURTOSIS_SNOWFLAKE_ROLE>",
            "kurtosis_snowflake_user": "<KURTOSIS_SNOWFLAKE_USER>",
            "kurtosis_snowflake_warehouse": "<KURTOSIS_SNOWFLAKE_WAREHOUSE>"
        },
        // If the service is expected to get the GitHub user token from an S3 bucket, set the AWS fields
        // `aws_bucket_user_folder` can remain empty if the file containing the token is at the root of the bucket
        "aws_env": {
            "aws_access_key_id": "<AWS_KEY_ID_TO_AUTHENTICATE>",
            "aws_secret_access_key": "<AWS_SECRET_ACCESS_KEY_TO_AUTHENTICATE>",
            "aws_bucket_region": "<AWS_BUCKET_REGION>",
            "aws_bucket_name": "<AWS_BUCKET_NAME>",
            "aws_bucket_user_folder": "<OPTIONAL_FOLDER_IN_AWS_BUCKET>"
        }
    }
Note that when running this package on Kurtosis Cloud, the package automatically uses the AWS environment variables provided to it to fetch the GitHub token from AWS S3.