Ensure that you have the following:
- Helm 3.3 or newer, installed and configured on your machine.
- Kubectl 1.18 or newer, installed on your machine.
- Cluster administrator access to a Kubernetes cluster, such as Kind.
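The version requirements above can be checked from a shell before you start. The following is a minimal sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils). The helper names `version_ge` and `check_prereqs` are hypothetical, and the way the version strings are extracted from `helm` and `kubectl` output may need adjusting for your client versions.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on sort -V placing the smaller version first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: compare the installed Helm and kubectl client versions
# with the minimums listed above (3.3 and 1.18).
check_prereqs() {
  helm_version="$(helm version --short)"   # typically looks like v3.12.1+g<commit>
  helm_version="${helm_version#v}"         # drop the leading "v"
  helm_version="${helm_version%%+*}"       # drop the "+g<commit>" suffix
  kubectl_version="$(kubectl version --client -o json |
    sed -n 's/.*"gitVersion": *"v\([^"]*\)".*/\1/p' | head -n 1)"

  version_ge "$helm_version" 3.3 || echo "Helm 3.3 or newer is required (found $helm_version)"
  version_ge "$kubectl_version" 1.18 || echo "kubectl 1.18 or newer is required (found $kubectl_version)"
}
```

After sourcing the snippet, run `check_prereqs`; it prints nothing when both tools satisfy the minimum versions.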
Install Trino and MinIO using the following commands:
cd trino-iceberg-minio/
docker-compose up -d
cd ..
Then create a bucket named iceberg in MinIO.
Install Fybrik by following the Fybrik Quick Start (v1.1), skipping the Install modules section.
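The iceberg bucket can be created in the MinIO web console or from the command line. The following is a minimal sketch using the MinIO client `mc`, which is assumed to be installed separately. The helper name `create_bucket`, the alias `local`, and the endpoint `http://localhost:9000` are assumptions; take the real endpoint and credentials from the MinIO deployment started by docker-compose.

```shell
# create_bucket ALIAS ENDPOINT ACCESS_KEY SECRET_KEY BUCKET
# Registers the MinIO endpoint under ALIAS, then creates BUCKET if it is missing.
create_bucket() {
  mc alias set "$1" "$2" "$3" "$4" &&
    mc mb --ignore-existing "$1/$5"
}
```

For example, `create_bucket local http://localhost:9000 <access_key> <secret_key> iceberg` creates the bucket on a MinIO instance listening on the assumed local endpoint.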
kubectl apply -f trino-module.yaml -n fybrik-system
kubectl create namespace fybrik-notebook-sample
kubectl config set-context --current --namespace=fybrik-notebook-sample
Replace the values of endpoint, bucket, and object_key in the sample_assets/asset-iceberg.yaml file according to your created asset. Then add the asset to the internal catalog using the following command:
kubectl apply -f sample_assets/asset-iceberg.yaml -n fybrik-notebook-sample
The asset is marked as finance data, and columns a and d are tagged as PII.
Replace the values of access_key and secret_key in the sample_assets/secret-iceberg.yaml file with the values from the object storage service that you used, then run:
kubectl apply -f sample_assets/secret-iceberg.yaml -n fybrik-notebook-sample
Register a policy. The example policy removes columns tagged as PII from datasets marked as finance.
kubectl -n fybrik-system create configmap sample-policy --from-file=sample_assets/sample-policy.rego
kubectl -n fybrik-system label configmap sample-policy openpolicyagent.org/policy=rego
while [[ $(kubectl get cm sample-policy -n fybrik-system -o 'jsonpath={.metadata.annotations.openpolicyagent\.org/policy-status}') != '{"status":"ok"}' ]]; do echo "waiting for policy to be applied" && sleep 5; done
kubectl apply -f fybrikapplication.yaml
Run the following command to wait for the Fybrik module:
while [[ ($(kubectl get fybrikapplication my-notebook -o 'jsonpath={.status.ready}') != "true") || ($(kubectl get jobs my-notebook-fybrik-notebook-sample-trino-module -n fybrik-blueprints -o 'jsonpath={.status.conditions[0].type}') != "Complete") ]]; do echo "waiting for FybrikApplication" && sleep 5; done
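The polling loops above run until their condition is met, so they never return if the policy or the FybrikApplication fails to become ready. The following is a minimal sketch of a bounded alternative. The helper names `wait_until` and `fybrikapplication_ready` are hypothetical, and the readiness check mirrors only the first condition of the loop above.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds; returns 1 once TIMEOUT_SECONDS have elapsed.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds once the FybrikApplication reports ready (first condition of the loop above).
fybrikapplication_ready() {
  [ "$(kubectl get fybrikapplication my-notebook -o 'jsonpath={.status.ready}')" = "true" ]
}
```

With these definitions, `wait_until 300 5 fybrikapplication_ready` polls every 5 seconds and gives up after 5 minutes with a non-zero exit status, which makes the step usable in scripts that should fail rather than hang.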
The module runs Python code that registers the asset in Trino and applies the policy to create a virtual dataset. Use the following username to connect to Trino:
"name": "user1"
For example, you can run queries from the Trino CLI inside the Trino Docker container. First, find the name of the Trino container (the container running the image trinodb/trino:latest). Then start the Trino CLI in that container as user1:
docker ps | grep trinodb/trino:latest
docker container exec -it <trino_container_name> trino --user user1
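The container name can also be looked up without reading the `docker ps` output by hand. The following is a minimal sketch that uses the `ancestor` filter and the `--format` option of `docker ps`. The helper name `trino_container_name` is hypothetical, and the sketch assumes a single running container created from the image.

```shell
# Print the name of the first running container created from the given image
# (defaults to trinodb/trino:latest, the image named in the text above).
trino_container_name() {
  docker ps --filter "ancestor=${1:-trinodb/trino:latest}" --format '{{.Names}}' | head -n 1
}
```

The result can then be substituted for `<trino_container_name>`, for example `docker container exec -it "$(trino_container_name)" trino --user user1`.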
Check the tables that user1 can see. Only view1 should be listed:
show tables from iceberg.icebergtrino;
You can run a query to select from the created view. It should return only allowed columns according to the policies:
select * from iceberg.icebergtrino.view1;
The output contains only columns b and c, not a and d, because a and d carry the PII tag.
You can log in to Trino as the admin user using the following command (after exiting the user1 session in the Trino container):
docker container exec -it <trino_container_name> trino --user admin
The admin user can see the original table, logs:
show tables from iceberg.icebergtrino;
The show tables command should return the original table logs and the created view view1.
You can run a query to select from the logs table. It should return all the columns:
select * from iceberg.icebergtrino.logs;
In the output we should see columns (a, b, c, d).
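The two views of the data can also be compared without opening an interactive session. The following is a minimal sketch that passes a single statement to the Trino CLI with its `--execute` option. The helper name `trino_query` is hypothetical, and `-it` is omitted because no terminal is needed.

```shell
# trino_query CONTAINER USER SQL
# Runs one SQL statement through the Trino CLI inside CONTAINER as USER
# and prints the result to standard output.
trino_query() {
  docker container exec "$1" trino --user "$2" --execute "$3"
}
```

For example, `trino_query <trino_container_name> user1 'select * from iceberg.icebergtrino.view1'` runs the restricted query, and `trino_query <trino_container_name> admin 'select * from iceberg.icebergtrino.logs'` runs the query against the original table.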
When you're finished experimenting with a sample, you can clean up as follows.
- Delete the view using the DROP command:
drop view iceberg.icebergtrino.view1;
- Delete the Iceberg table. This must be done by the admin user:
docker container exec -it <trino_container_name> trino --user admin
drop table iceberg.icebergtrino.logs;
- Clean up the Docker containers:
cd trino-iceberg-minio/
docker-compose down
cd ..
- Delete the fybrik-notebook-sample namespace:
kubectl delete namespace fybrik-notebook-sample
- Delete the policy created in the fybrik-system namespace:
NS="fybrik-system"; kubectl -n $NS get configmap | awk '/sample/{print $1}' | xargs kubectl delete -n $NS configmap