Operate First is the concept of bringing pre-release open source software to a production cloud environment. The Mass Open Cloud (MOC) is the production cloud where Operate First projects run. Deploying Fybrik on Operate First is the first step toward integrating Fybrik with Open Data Hub, which will make Fybrik more easily accessible to data scientists. For questions about Operate First or about contributing to its GitHub repositories, join the Slack channel here.
Accessing the MOC Smaug cluster
The Smaug cluster is where all user workloads are deployed. A deployment of Open Data Hub (ODH) is also managed on the Smaug cluster. This is valuable for Fybrik because this ODH deployment includes JupyterHub, which data scientists can use to run notebooks.
To log in to the Smaug cluster, your GitHub username must be listed in this file, which grants membership in the Fybrik user group. If you would like to be added as a user, create a PR in the operate-first/apps repository that adds your GitHub username to group.yaml.
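As a rough illustration of that PR, the sketch below appends a username to the users list of an OpenShift Group manifest. It assumes group.yaml is a plain user.openshift.io/v1 Group with a block-style users: list directly under a top-level key; the helper name and the sample manifest are hypothetical, and editing the file by hand in the GitHub web UI works just as well.

```python
def add_user_to_group(group_yaml: str, username: str) -> str:
    """Return group.yaml text with `username` appended to the top-level users: list.

    Hypothetical helper. Assumes a block-style list (one "- name" entry per line)
    directly below a top-level "users:" key, as in an OpenShift Group manifest.
    """
    lines = group_yaml.splitlines()
    if "users:" not in lines:
        raise ValueError("no top-level 'users:' key found")
    start = lines.index("users:")

    # Collect the existing entries and find the line just past the list.
    end = start + 1
    members = []
    while end < len(lines) and lines[end].lstrip().startswith("- "):
        members.append(lines[end].lstrip()[2:].strip())
        end += 1

    if username in members:
        return group_yaml  # already a member, nothing to change

    # Reuse the indentation of the existing entries (default: two spaces).
    indent = "  "
    if members:
        first = lines[start + 1]
        indent = first[: len(first) - len(first.lstrip())]
    lines.insert(end, f"{indent}- {username}")
    return "\n".join(lines) + "\n"


# Hypothetical sample contents; the real file may list more users.
group = """apiVersion: user.openshift.io/v1
kind: Group
metadata:
  name: fybrik
users:
  - existing-user
"""
print(add_user_to_group(group, "your-github-username"))
```

The function is idempotent, so running it twice for the same username leaves the file unchanged.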
You can access the Smaug cluster with this OpenShift console login link. Click operate-first to log in with GitHub authentication. Once you are logged in to the OpenShift console, you can use this link to get an oc login command with a token that lets you log in to the Smaug OpenShift cluster from your terminal.
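If you prefer to script the terminal login, the copied command can be run from Python with subprocess. This is a minimal sketch that assumes the oc CLI is installed and on your PATH; the helper names are hypothetical, and the server URL and token must come from the oc login command you copied from the console.

```python
import subprocess


def oc_login_argv(server: str, token: str) -> list[str]:
    """Build the argument list for `oc login` with a token copied from the console."""
    return ["oc", "login", f"--token={token}", f"--server={server}"]


def login_and_check(server: str, token: str) -> str:
    """Log in with the oc CLI and return the authenticated user name."""
    # check=True raises CalledProcessError if oc exits with a non-zero status.
    subprocess.run(oc_login_argv(server, token), check=True)
    # Confirm which user the CLI is now authenticated as.
    result = subprocess.run(
        ["oc", "whoami"], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()
```

After login_and_check(server, token) succeeds, commands such as oc project fybrik-applications operate against the Smaug cluster. Keep in mind that a token passed on the command line is visible in the local process list while oc runs.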
In the Operate First environment, cluster-scoped resource manifests must be added to the operate-first/apps repository to be deployed on the Smaug cluster, for security reasons. This integration/operate-first directory contains the raw YAML files of the cluster-scoped resources deployed for Fybrik, mainly the custom resource definitions (CRDs) used by Fybrik. These files are generated from the Helm charts in charts/fybrik.
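To illustrate what separating the cluster-scoped resources from a chart involves, the sketch below splits the multi-document output of helm template into individual manifests and groups the cluster-scoped ones by kind. It is not the Makefile used in integration/operate-first: the set of kinds treated as cluster-scoped is an assumption (not exhaustive), and the regex-based parsing is a simplification chosen to avoid a YAML dependency.

```python
import re
from collections import defaultdict

# Kinds treated as cluster-scoped in this sketch (not an exhaustive list).
CLUSTER_SCOPED_KINDS = {
    "CustomResourceDefinition",
    "ClusterRole",
    "ClusterRoleBinding",
    "Namespace",
    "MutatingWebhookConfiguration",
    "ValidatingWebhookConfiguration",
}

# A top-level "kind:" line, i.e. not indented. Nested fields such as a CRD's
# spec.names.kind are indented and therefore ignored.
TOP_LEVEL_KIND = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# A YAML document separator on its own line.
DOC_SEPARATOR = re.compile(r"^---[ \t]*$", re.MULTILINE)


def group_cluster_scoped(rendered: str) -> dict[str, list[str]]:
    """Split `helm template` output and group the cluster-scoped manifests by kind."""
    groups = defaultdict(list)
    for doc in DOC_SEPARATOR.split(rendered):
        doc = doc.strip()
        match = TOP_LEVEL_KIND.search(doc)
        if match and match.group(1) in CLUSTER_SCOPED_KINDS:
            groups[match.group(1)].append(doc)
    return dict(groups)
```

For example, the output of helm template fybrik charts/fybrik could be passed in, and each group written to a subdirectory named after its kind, mirroring the organization by resource type used in operate-first/apps.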
If the Helm chart has been updated, follow the steps below to generate the new YAML files:
- Install yq and Helm in this repo. Run these commands from the root directory of this repo:
  ```bash
  cd hack/tools
  ./install_yq.sh
  ./install_helm.sh
  # Return to the root directory of the repo
  cd ../..
  ```
- From the root directory, go to the integration/operate-first folder and set up the Python environment there:
  ```bash
  cd integration/operate-first
  pipenv install
  pipenv shell
  ```
- Run the Makefile to generate new YAML files from the Helm charts:
  ```bash
  make all
  ```

After the cluster-scoped YAML files are generated, create a PR to the operate-first/apps repository that adds them to the cluster-scope/base directory, placing each file in the subdirectory for its resource type. Any resource added to base must also be added to kustomization.yaml in cluster-scope/overlays, because resources are only deployed to the Smaug cluster if they are included in that kustomization.yaml file. Namespaces must also be added to this file to be created on the Smaug cluster. We currently have three namespaces that anyone in the Fybrik user group can access: fybrik-system, fybrik-blueprints, and fybrik-applications. More documentation about contributing to the operate-first/apps repository can be found here.
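The kustomization.yaml bookkeeping can be pictured with the sketch below, which renders a Kustomization whose resources list covers a set of base directories plus the three Fybrik namespaces. The relative prefix ../../base, the core/namespaces/<name> layout, and the sample base directory are hypothetical; check the existing entries in cluster-scope/overlays for the real paths before opening a PR.

```python
def render_kustomization(
    base_dirs: list[str], namespaces: list[str], base_prefix: str = "../../base"
) -> str:
    """Render a kustomization.yaml listing base resource directories and namespaces.

    Hypothetical helper: base_prefix and the core/namespaces/<name> layout are
    assumptions for illustration, not the exact operate-first/apps paths.
    """
    resources = [f"{base_prefix}/{d}" for d in base_dirs]
    resources += [f"{base_prefix}/core/namespaces/{ns}" for ns in namespaces]
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        "resources:",
    ]
    # Deduplicate and sort so repeated runs produce a stable diff.
    lines += [f"  - {path}" for path in sorted(set(resources))]
    return "\n".join(lines) + "\n"


print(
    render_kustomization(
        base_dirs=["apiextensions.k8s.io/customresourcedefinitions"],  # hypothetical
        namespaces=["fybrik-system", "fybrik-blueprints", "fybrik-applications"],
    )
)
```

Sorting the entries keeps the generated file stable, so a PR only shows the resources that were actually added.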
Operate First has an ArgoCD instance deployed on MOC that can be used to deploy OpenShift resources stored in a Git repository. Only namespace-scoped resources can be deployed with ArgoCD; any cluster-scoped resource, such as a CRD or a cluster role, is blocked. The namespace-scoped resources required for Fybrik have been onboarded to ArgoCD by following these instructions, and an ArgoCD project has been created for Fybrik. You can log in to the ArgoCD instance with the same login method as above. We have deployed two ArgoCD applications on the Smaug cluster, fybrik and vault, which are automatically synced with the fybrik/charts repository so that they follow the latest release of Fybrik.
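Because ArgoCD renders each chart with the Helm parameters from its Application manifest, you can approximate what it will apply by running helm template locally with the same values passed as --set flags. The sketch below builds that command for the fybrik application's parameters (clusterScoped=false and applicationNamespace=fybrik-applications) and checks the rendered output for kinds that ArgoCD would block. The helper names and the kind list are assumptions, and ArgoCD's own rendering may differ in details such as the release name.

```python
import re
import subprocess

# Kinds that ArgoCD would block in this setup (assumed, not exhaustive).
CLUSTER_SCOPED_KINDS = {
    "CustomResourceDefinition",
    "ClusterRole",
    "ClusterRoleBinding",
    "Namespace",
}
TOP_LEVEL_KIND = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)

# Helm parameters used by the fybrik ArgoCD application.
FYBRIK_PARAMETERS = [
    {"name": "clusterScoped", "value": "false"},
    {"name": "applicationNamespace", "value": "fybrik-applications"},
]


def helm_template_argv(
    release: str, chart: str, namespace: str, parameters: list[dict[str, str]]
) -> list[str]:
    """Translate ArgoCD-style Helm parameters (name/value pairs) into `helm template` flags."""
    argv = ["helm", "template", release, chart, "--namespace", namespace]
    for param in parameters:
        argv += ["--set", f"{param['name']}={param['value']}"]
    return argv


def blocked_kinds(rendered: str) -> set[str]:
    """Return the top-level kinds in the rendered manifests that are cluster-scoped."""
    return {kind for kind in TOP_LEVEL_KIND.findall(rendered) if kind in CLUSTER_SCOPED_KINDS}


def render_and_check(chart_dir: str) -> set[str]:
    """Render a local checkout of charts/fybrik (requires the helm CLI) and report blocked kinds."""
    argv = helm_template_argv("fybrik", chart_dir, "fybrik-system", FYBRIK_PARAMETERS)
    rendered = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return blocked_kinds(rendered)
```

An empty set from render_and_check suggests that clusterScoped=false has kept cluster-scoped resources out of the rendered chart, which is what ArgoCD requires here.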
The following are the ArgoCD application manifests which have been added to the operate-first/apps repository:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: fybrik
spec:
  destination:
    name: smaug
    namespace: fybrik-system
  source:
    path: charts/fybrik
    repoURL: 'https://github.com/fybrik/charts'
    targetRevision: HEAD
    helm:
      parameters:
        # Disable deploying Fybrik cluster-scoped resources
        - name: clusterScoped
          value: "false"
        # Only watch for FybrikApplications in fybrik-applications
        - name: applicationNamespace
          value: fybrik-applications
  project: fybrik
```

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vault
spec:
  project: fybrik
  source:
    repoURL: 'https://github.com/fybrik/charts'
    path: charts/vault
    targetRevision: HEAD
    helm:
      valueFiles:
        - env/dev/vault-single-cluster-values.yaml
      parameters:
        # authDelegator enables a cluster role binding to be attached to the service account.
        # The cluster role binding is already deployed on the Smaug cluster, so authDelegator can be disabled.
        - name: vault.server.authDelegator.enabled
          value: 'false'
        - name: vault.global.openshift
          value: 'true'
        - name: vault.injector.enabled
          value: 'false'
        - name: vault.server.dev.enabled
          value: 'true'
      values: |
        plugins:
          vaultPluginSecretsKubernetesReader:
            enabled: true
            clusterScope: false
            namespaces:
              - fybrik-applications
              - fybrik-system
        modulesNamespace: "fybrik-blueprints"
  destination:
    namespace: fybrik-system
    name: smaug
```

To run the Fybrik notebook sample on the Smaug cluster:
- Follow the steps in the Fybrik notebook sample to prepare a dataset to be accessed by the notebook, register the dataset in a data catalog, and define data access policies. Make sure to use the fybrik-applications namespace instead of the fybrik-notebook-sample namespace, since fybrik-applications has already been created on the Smaug cluster.
- Access the JupyterHub instance deployed on the Smaug OpenShift cluster here and log in with the method described above.
- Start a notebook server using the Elyra Notebook Image or any image of your choosing.
- Create a notebook and insert a new notebook cell with the Python code in Step 2 of Read the dataset from the notebook. Make sure to change the asset to fybrik-applications/paysim-csv instead of fybrik-notebook-sample/paysim-csv.
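For orientation, the notebook cell in the sample reads the dataset over Arrow Flight. The sketch below shows that pattern with the asset set to fybrik-applications/paysim-csv. The helper names are hypothetical and the endpoint is not shown: take it from the status of your FybrikApplication as the sample describes. The pyarrow.flight calls are standard pyarrow APIs, but treat this as an approximation of the sample's cell rather than a copy of it.

```python
import json


def build_request(asset: str, columns=None) -> str:
    """Serialize the Flight command that identifies the dataset to read."""
    request = {"asset": asset}
    if columns:
        # Optional: request only specific columns.
        request["columns"] = list(columns)
    return json.dumps(request)


def read_asset(endpoint: str, asset: str):
    """Fetch an asset from the Fybrik Arrow Flight module as a pandas DataFrame."""
    # Imported here so build_request can be used without pyarrow installed.
    import pyarrow.flight as fl

    client = fl.connect(endpoint)
    descriptor = fl.FlightDescriptor.for_command(build_request(asset))
    info = client.get_flight_info(descriptor)
    reader = client.do_get(info.endpoints[0].ticket)
    return reader.read_pandas()  # requires pandas
```

In the notebook, df = read_asset(endpoint, "fybrik-applications/paysim-csv") then returns the dataset with the data access policies already applied by Fybrik.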