Publish Docker images for the GC tool #8055

Merged
merged 7 commits into from
Feb 14, 2024
Changes from 5 commits
22 changes: 22 additions & 0 deletions .github/docker-sync/regsync.yml
@@ -15,6 +15,7 @@ defaults:
retry: 15m
parallel: 2
sync:
# Server
- source: ghcr.io/projectnessie/nessie-unstable
target: quay.io/projectnessie/nessie-unstable
type: repository
@@ -35,3 +36,24 @@ sync:
- source: ghcr.io/projectnessie/nessie
target: docker.io/projectnessie/nessie
type: repository
# GC Tool
- source: ghcr.io/projectnessie/nessie-gc-unstable
target: quay.io/projectnessie/nessie-gc-unstable
type: repository
tags:
Review comment (Member):
You can omit the tags here. We only needed them for the Nessie server, so that super-old versions are not mirrored.
Since you're already touching this file: can you change the regex for the Nessie server above to something like 0[.][789]\\d[.].*|[123][.].* so that it includes everything from 0.7[0-9].x onward, plus 1.x and newer?

allow:
- "latest.*"
- "0[.][567]\\d+[.].*"
- source: ghcr.io/projectnessie/nessie-gc-unstable
target: docker.io/projectnessie/nessie-gc-unstable
type: repository
tags:
allow:
- "latest.*"
- "0[.][567]\\d+[.].*"
- source: ghcr.io/projectnessie/nessie-gc
target: quay.io/projectnessie/nessie-gc
type: repository
- source: ghcr.io/projectnessie/nessie-gc
target: docker.io/projectnessie/nessie-gc
type: repository
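
The tag filter suggested in the review comment above can be tried out locally before `regsync.yml` is changed. The following is a minimal sketch, not part of this PR: it tests a few sample tags against the suggested pattern with `grep -E`. Assumptions: `\d` is written as `[0-9]` because POSIX extended regular expressions have no `\d`; the explicit `^( ... )$` anchoring assumes that regsync matches each pattern against the whole tag; the helper name `matches_tag` and the sample tags are made up for illustration.

```shell
# Suggested pattern from the review, with \d written as [0-9] for grep -E.
# The ^( ... )$ anchoring is an assumption about how regsync applies "allow" patterns.
pattern='^(0[.][789][0-9][.].*|[123][.].*)$'

# Hypothetical helper: succeeds if the tag would be allowed by the pattern.
matches_tag() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

for tag in 0.76.6 0.69.2 1.0.0 latest; do
  if matches_tag "$tag"; then
    echo "$tag: allowed"
  else
    echo "$tag: filtered out"
  fi
done
```

With these samples, `0.76.6` and `1.0.0` are allowed, while `0.69.2` and `latest` are filtered out. The `latest` tags would therefore still need their own `latest.*` entry, as in the `allow` lists in this diff.
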
33 changes: 32 additions & 1 deletion .github/workflows/ci.yml
@@ -610,6 +610,7 @@ jobs:
DOCKER_VERSION="${VERSION%-SNAPSHOT}"
echo "DOCKER_VERSION=${DOCKER_VERSION}" >> ${GITHUB_ENV}
echo "DOCKER_IMAGE=localhost:5000/nessie-testing" >> ${GITHUB_ENV}
echo "DOCKER_GC_IMAGE=localhost:5000/nessie-gc-testing" >> ${GITHUB_ENV}

- name: Prepare Gradle build cache
uses: ./.github/actions/ci-incr-build-cache-prepare
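
A side note on the step above: `${VERSION%-SNAPSHOT}` is a shell parameter expansion that removes a trailing `-SNAPSHOT`, and the pull step later in this file expects four tags per image. The naming scheme can be sketched as follows. The function name `docker_tags` and the sample version are assumptions for illustration, and the tag list is inferred from the `docker pull` lines in this diff rather than from `build-push-images.sh` itself.

```shell
# Hypothetical helper: prints the tags the CI job pulls for one image.
docker_tags() {
  version="$1"
  # Strip a trailing "-SNAPSHOT", if present (same expansion as in ci.yml).
  docker_version="${version%-SNAPSHOT}"
  printf '%s\n' "latest" "latest-java" "${docker_version}" "${docker_version}-java"
}

docker_tags "0.76.7-SNAPSHOT"
```
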
@@ -625,7 +626,16 @@ jobs:
-a "${ARTIFACTS}" \
-g ":nessie-quarkus" \
-p "servers/quarkus-server" \
-d "Dockerfile-server" \
${DOCKER_IMAGE}

tools/dockerbuild/build-push-images.sh \
-a "${ARTIFACTS}" \
-g ":nessie-gc-tool" \
-p "gc/gc-tool" \
-d "Dockerfile-gctool" \
${DOCKER_GC_IMAGE}

rm -rf "${ARTIFACTS}"

- name: Cleanup buildx
@@ -639,6 +649,10 @@ jobs:
docker pull ${DOCKER_IMAGE}:latest-java
docker pull ${DOCKER_IMAGE}:${DOCKER_VERSION}
docker pull ${DOCKER_IMAGE}:${DOCKER_VERSION}-java
docker pull ${DOCKER_GC_IMAGE}:latest
docker pull ${DOCKER_GC_IMAGE}:latest-java
docker pull ${DOCKER_GC_IMAGE}:${DOCKER_VERSION}
docker pull ${DOCKER_GC_IMAGE}:${DOCKER_VERSION}-java
cat <<! >> $GITHUB_STEP_SUMMARY
## Docker images

@@ -647,7 +661,7 @@ jobs:
\`\`\`
!

- name: Check if Docker Java image works
- name: Check if Server Docker Java image works
run: |
docker run --detach --name nessie ${DOCKER_IMAGE}:latest-java
echo "Let Nessie Java Docker image run for one minute (to make sure it starts up fine)..."
@@ -672,6 +686,23 @@ jobs:
docker stop nessie
docker rm nessie

- name: Check if GC Tool Docker Java image works
run: |
if docker run --rm --name nessie-gc ${DOCKER_GC_IMAGE}:latest-java --help | grep -q "Usage: nessie-gc"; then
echo "## GC Tool Java Docker image smoke test: PASSED" >> $GITHUB_STEP_SUMMARY
echo "GC Tool Java Docker image smoke test: PASSED"
else
echo "GC Tool Java Docker image smoke test: FAILED" > /dev/stderr
cat <<! >> $GITHUB_STEP_SUMMARY
## GC Tool Java Docker image FAILED

\`\`\`
$(docker logs nessie-gc)
\`\`\`
!
exit 1
fi

nesqueit:
name: CI NesQuEIT
runs-on: ubuntu-22.04
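
One detail of the GC smoke test above may be worth double-checking: the container is started with `--rm`, so by the time the failure branch calls `docker logs nessie-gc`, the container has presumably already been removed. A possible alternative, sketched here under that assumption and not taken from the PR, captures the command output once and reuses it for both the check and the failure report. The function name `smoke_test` is hypothetical.

```shell
# Hypothetical helper: runs a command, captures stdout and stderr,
# and checks that the output contains the expected fixed string.
smoke_test() {
  expected="$1"
  shift
  # "|| true" keeps a non-zero exit status from aborting a "set -e" script.
  output="$("$@" 2>&1)" || true
  if printf '%s\n' "$output" | grep -qF -- "$expected"; then
    echo "smoke test: PASSED"
  else
    echo "smoke test: FAILED" >&2
    # The captured output stands in for "docker logs".
    printf '%s\n' "$output" >&2
    return 1
  fi
}
```

In the workflow this could be called as `smoke_test "Usage: nessie-gc" docker run --rm "${DOCKER_GC_IMAGE}:latest-java" --help`, with the failure branch additionally writing the captured output to `$GITHUB_STEP_SUMMARY` if desired.
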
9 changes: 9 additions & 0 deletions .github/workflows/release-publish.yml
@@ -134,12 +134,19 @@ jobs:
-a "${ARTIFACTS}" \
-g ":nessie-quarkus" \
-p "servers/quarkus-server" \
-d "Dockerfile-server" \
ghcr.io/projectnessie/nessie

# Add version to the openapi file name
cp api/model/build/generated/openapi/META-INF/openapi/openapi.yaml api/model/build/nessie-openapi-${RELEASE_VERSION}.yaml

cp gc/gc-tool/build/executable/nessie-gc gc/gc-tool/build/executable/nessie-gc-${RELEASE_VERSION}
tools/dockerbuild/build-push-images.sh \
-a "${ARTIFACTS}" \
-g ":nessie-gc-tool" \
-p "gc/gc-tool" \
-d "Dockerfile-gctool" \
ghcr.io/projectnessie/nessie-gc
Review comment (Contributor Author):
@snazy This would end up as a new repo in Quay and Docker Hub as well. Do we need to create those repos upfront? Do we need to change the regsync settings?

Review comment (Member):
Yes, the regsync.yml file needs to be changed.
I don't think we need to create the repos upfront. Let's see whether the snapshot-publish job complains about anything; I guess it won't.


echo "QUARKUS_UBER_JAR=${ARTIFACTS}/nessie-quarkus-${RELEASE_VERSION}-runner.jar" >> ${GITHUB_ENV}
echo "CLI_UBER_JAR=${ARTIFACTS}/nessie-quarkus-cli-${RELEASE_VERSION}-runner.jar" >> ${GITHUB_ENV}
@@ -219,6 +226,8 @@ jobs:
(\`chmod 744 nessie-gc-${RELEASE_VERSION}\` after download.)
Can also be run using \`java -jar nessie-gc-${RELEASE_VERSION}\`, because it is actually a Java archive.
Shell completion can be generated from the \`nessie-gc\` tool.
Nessie GC tool is also available as Docker image:
\`docker run --rm ghcr.io/projectnessie/nessie-gc:${RELEASE_VERSION} --help\`.

The attached [\`nessie-helm-${RELEASE_VERSION}.tgz\`](${Q_HELM_CHART_URL}) is a packaged Helm chart, which can be downloaded and installed via Helm.
There is also the [Nessie Helm chart repo](https://charts.projectnessie.org/), which can be added and used to install the Nessie Helm chart.
7 changes: 7 additions & 0 deletions .github/workflows/snapshot-publish.yml
@@ -63,4 +63,11 @@ jobs:
-g ":nessie-quarkus" \
-p "servers/quarkus-server" \
ghcr.io/projectnessie/nessie-unstable
tools/dockerbuild/build-push-images.sh \
-a "${ARTIFACTS}" \
-g ":nessie-gc-tool" \
-p "gc/gc-tool" \
-d "Dockerfile-gctool" \
ghcr.io/projectnessie/nessie-gc-unstable

rm -rf "${ARTIFACTS}"
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ as necessary. Empty sections will not end in the release notes.

### Highlights

* The Nessie GC tool is now published as a Docker image. See the [GC Tool documentation
page](https://projectnessie.org/features/gc) for more.

### Upgrade notes

### Breaking changes
2 changes: 1 addition & 1 deletion README.md
@@ -127,7 +127,7 @@ build a docker image for testing purposes, simply run the following command:

```shell
./gradlew :nessie-quarkus:clean :nessie-quarkus:quarkusBuild
-docker build -f ./tools/dockerbuild/docker/Dockerfile-jvm -t nessie-unstable:latest ./servers/quarkus-server
+docker build -f ./tools/dockerbuild/docker/Dockerfile-server -t nessie-unstable:latest ./servers/quarkus-server
```

Check that your image is available locally:
8 changes: 1 addition & 7 deletions gc/README.md
@@ -1,9 +1,3 @@
# Nessie GC

-See [here](../site/docs/features/gc-internals.md).
-
-
-```shell
-docker run --rm -e POSTGRES_USER=pguser -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=nessie_gc -p 5432:5432 postgres:14
-gc/gc-tool/build/executable/nessie-gc create-sql-schema --jdbc-url jdbc:postgresql://127.0.0.1:5432/nessie_gc --jdbc-user pguser --jdbc-password mysecretpassword
-```
+See [here](../site/docs/features/gc).
2 changes: 1 addition & 1 deletion site/docs/features/_config
@@ -5,6 +5,6 @@ arrange:
- best-practices.md
- transactions.md
- management.md
- gc-internals.md
- gc.md
- security.md
- metadata_authorization.md
193 changes: 190 additions & 3 deletions site/docs/features/gc-internals.md → site/docs/features/gc.md
@@ -1,9 +1,196 @@
# Nessie GC

Nessie GC is a tool to clean up orphaned files in a Nessie repository. It is designed to be run
periodically to keep the repository clean and to avoid unnecessary storage costs.

## Requirements

The Nessie GC tool is a standalone executable, but requires Java 11 or later to be available on the
host where it is running.

The Nessie GC tool requires a running Nessie server and a JDBC-compliant database. The Nessie server
must be reachable from the host where the GC tool is running. The JDBC-compliant database must also
be reachable from the host where the GC tool is running. The database is used to store the live
content sets and the deferred deletes.

!!! note
Although the GC tool can run in in-memory mode, it is recommended to use a persistent database
for production use. Any JDBC compliant database can be used, but it must be created and the
schema initialized before running the Nessie GC tool.

## Running locally

The Nessie GC tool can be downloaded from the [GitHub
Releases](https://github.com/projectnessie/nessie/releases) page, for example:

```shell
curl -L -o nessie-gc https://github.com/projectnessie/nessie/releases/download/nessie-0.76.6/nessie-gc-0.76.6
chmod +x nessie-gc
```

To see the available commands and options, run:

```shell
./nessie-gc --help
```

You should see the following output:

```text
Usage: nessie-gc [-hV] [COMMAND]
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Commands:
help Display help information about the specified
command.
mark-live, identify, mark Run identify-live-content phase of Nessie GC,
must not be used with the in-memory
contents-storage.
sweep, expire Run expire-files + delete-orphan-files phase
of Nessie GC using a live-contents-set
stored by a previous run of the mark-live
command, must not be used with the in-memory
contents-storage.
gc Run identify-live-content and expire-files +
delete-orphan-files.
list List existing live-sets, must not be used with
the in-memory contents-storage.
delete Delete a live-set, must not be used with the
in-memory contents-storage.
list-deferred List files collected as deferred deletes, must
not be used with the in-memory
contents-storage.
deferred-deletes Delete files collected as deferred deletes,
must not be used with the in-memory
contents-storage.
show Show information of a live-content-set, must
not be used with the in-memory
contents-storage.
show-sql-create-schema-script Print DDL statements to create the schema.
create-sql-schema JDBC schema creation.
completion-script Extracts the command-line completion script.
```

The following example assumes that you have a Nessie server running at `http://localhost:19120` and
a PostgreSQL instance running at `jdbc:postgresql://localhost:5432/nessie_gc` with user `pguser` and
password `mysecretpassword`.

Create the database schema if required:

```shell
./nessie-gc create-sql-schema \
--jdbc-url jdbc:postgresql://localhost:5432/nessie_gc \
--jdbc-user pguser \
--jdbc-password mysecretpassword
```

Now we can run the Nessie GC tool:

```shell
./nessie-gc gc \
--uri http://localhost:19120/api/v2 \
--jdbc \
--jdbc-url jdbc:postgresql://localhost:5432/nessie_gc \
--jdbc-user pguser \
--jdbc-password mysecretpassword
```

## Running with Docker

The tool is also available as a Docker image, hosted on [GitHub Container Registry]. Images are also
mirrored to [Docker Hub] and [Quay.io].

[GitHub Container Registry]: https://ghcr.io/projectnessie/nessie-gc
[Docker Hub]: https://hub.docker.com/r/projectnessie/nessie-gc
[Quay.io]: https://quay.io/repository/projectnessie/nessie-gc

See [Docker](../try/docker.md) for more information.

For testing purposes, let's create a JDBC datastore as follows:

```shell
docker run --rm -e POSTGRES_USER=pguser -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_DB=nessie_gc -p 5432:5432 postgres:16.2
```

Create the database schema if required:

```shell
docker run --rm ghcr.io/projectnessie/nessie-gc:latest create-sql-schema \
--jdbc-url jdbc:postgresql://127.0.0.1:5432/nessie_gc \
--jdbc-user pguser \
--jdbc-password mysecretpassword
```

Now we can run the Nessie GC tool:

```shell
docker run --rm ghcr.io/projectnessie/nessie-gc:latest gc \
--jdbc-url jdbc:postgresql://127.0.0.1:5432/nessie_gc \
--jdbc-user pguser \
--jdbc-password mysecretpassword
```
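
A note on networking for the Docker commands above: with Docker's default bridge network, `127.0.0.1` inside the GC container refers to the container itself, not to the host where the PostgreSQL port is published. One way around this, sketched here as an assumption rather than as the project's recommended setup, is to put both containers on a user-defined network and address the database by container name. The network name `nessie-gc-net`, the container name `nessie-gc-postgres`, the host name `nessie` for the Nessie server, and the helper `gc_docker` are all made up for this sketch; `--uri` and `--jdbc` are the options shown in the local example.

```shell
# One-time setup (run manually):
#   docker network create nessie-gc-net
#   docker run --rm -d --name nessie-gc-postgres --network nessie-gc-net \
#     -e POSTGRES_USER=pguser -e POSTGRES_PASSWORD=mysecretpassword \
#     -e POSTGRES_DB=nessie_gc postgres:16.2
NET=nessie-gc-net
JDBC_URL="jdbc:postgresql://nessie-gc-postgres:5432/nessie_gc"

# Hypothetical helper: runs a GC tool command on the shared network.
# With DRY_RUN set, the docker command is printed instead of executed.
gc_docker() {
  ${DRY_RUN:+echo} docker run --rm --network "$NET" \
    ghcr.io/projectnessie/nessie-gc:latest "$@" \
    --jdbc-url "$JDBC_URL" \
    --jdbc-user pguser \
    --jdbc-password mysecretpassword
}

# Usage:
#   gc_docker create-sql-schema
#   gc_docker gc --uri http://nessie:19120/api/v2 --jdbc
```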

The GC tool has a great number of options, which can be seen by running `docker run --rm
ghcr.io/projectnessie/nessie-gc:latest --help`. The main command is `gc`, which is followed by
subcommands and options. Check the available subcommands and options by running `docker run --rm
ghcr.io/projectnessie/nessie-gc:latest gc --help`.

## Running with Kubernetes

The Nessie GC tool can be executed as a Job or a CronJob in a Kubernetes cluster.

The following example assumes that you have a Nessie deployment and a PostgreSQL instance, all
running in the same cluster and in the same namespace.

Create a secret for the database credentials:

```shell
kubectl create secret generic nessie-gc-credentials \
--from-literal=JDBC_URL=jdbc:postgresql://postgresql:5432/nessie_gc \
--from-literal=JDBC_USER=pguser \
--from-literal=JDBC_PASSWORD=mysecretpassword
```

Assuming that the Nessie service is reachable at `nessie:19120`, create the following Kubernetes job
to run the GC tool:

```shell
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
name: nessie-gc-job
spec:
template:
spec:
containers:
- name: nessie-gc
image: ghcr.io/projectnessie/nessie-gc
args:
- gc
- --uri
- http://nessie:19120/api/v2
- --jdbc
- --jdbc-url
- "\$(JDBC_URL)"
- --jdbc-user
- "\$(JDBC_USER)"
- --jdbc-password
- "\$(JDBC_PASSWORD)"
envFrom:
- secretRef:
name: nessie-gc-credentials
restartPolicy: Never
EOF
```
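
The section above mentions that the GC tool can also run as a CronJob but only shows a Job. The following is a minimal sketch of the CronJob variant, reusing the container spec and the secret from the Job example. The helper name `nessie_gc_cronjob`, the resource name `nessie-gc-cronjob`, the default schedule, and `concurrencyPolicy: Forbid` are assumptions of this sketch, not part of the PR.

```shell
# Hypothetical helper: prints a CronJob manifest for the GC tool.
# The first argument is the cron schedule (defaults to 03:00 every day).
nessie_gc_cronjob() {
  schedule="${1:-0 3 * * *}"
  cat <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nessie-gc-cronjob
spec:
  schedule: "${schedule}"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: nessie-gc
              image: ghcr.io/projectnessie/nessie-gc
              args:
                - gc
                - --uri
                - http://nessie:19120/api/v2
                - --jdbc
                - --jdbc-url
                - "\$(JDBC_URL)"
                - --jdbc-user
                - "\$(JDBC_USER)"
                - --jdbc-password
                - "\$(JDBC_PASSWORD)"
              envFrom:
                - secretRef:
                    name: nessie-gc-credentials
          restartPolicy: Never
EOF
}
```

The manifest can be applied with `nessie_gc_cronjob "0 3 * * *" | kubectl apply -f -`. `concurrencyPolicy: Forbid` is chosen here so that a slow GC run is not overlapped by the next scheduled one.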

# Nessie GC Internals

-_aka Nessie-aware delete-orphan-files_
+The rest of this document describes the internals of the Nessie GC tool and is intended for
+developers who want to understand how the tool works.

-Consists of a `gc-base` module, which contains the general base functionality to access a
-repository to identify the live contents, to identify the live files, to list the existing files
+The GC tool consists of a `gc-base` module, which contains the general base functionality to access
+a repository to identify the live contents, to identify the live files, to list the existing files
and to purge orphan files.

Modules that supplement the `gc-base` module:
2 changes: 1 addition & 1 deletion site/docs/features/management.md
@@ -34,7 +34,7 @@ downloaded from the [release page on GitHub](https://github.com/projectnessie/ne
recommended as the storage for the live-content-sets.

!!! info
-Information about the internals of Nessie GC can be found [here](./gc-internals.md).
+Information about Nessie GC can be found [here](./gc).

## Nessie GC tool

2 changes: 1 addition & 1 deletion site/docs/guides/minikube.md
@@ -139,7 +139,7 @@ Then, build the Docker image and install it in your Minikube node as follows:

```bash
eval $(minikube docker-env)
-docker build -f ./tools/dockerbuild/docker/Dockerfile-jvm -t nessie-test:latest ./servers/quarkus-server
+docker build -f ./tools/dockerbuild/docker/Dockerfile-server -t nessie-test:latest ./servers/quarkus-server
```

By running `eval $(minikube docker-env)`, you are setting the Docker environment variables to point