# Integration of lakeFS with Dagster

## Use Case: Isolating a Dagster job run and atomically promoting its results to production

## Prerequisites

###### This notebook requires a connection to a lakeFS server.
###### To spin up lakeFS quickly, use the Playground (https://demo.lakefs.io), which provides a lakeFS server on demand with a single click.
###### Alternatively, refer to the lakeFS Quickstart doc (https://docs.lakefs.io/quickstart/installing.html).

## Setup Task: Change your lakeFS credentials

In [None]:
lakefsEndPoint = '<lakeFS Endpoint URL>' # e.g. 'https://username.aws_region_name.lakefscloud.io'
lakefsAccessKey = '<lakeFS Access Key>'
lakefsSecretKey = '<lakeFS Secret Key>'

## Setup Task: Choose the lakeFS repository name (an existing repository or a new one)

In [None]:
repo = "dagster-existing-dag-repo"

## Setup Task: Versioning Information

In [None]:
sourceBranch = "main"
newBranch = "dagster_demo_existing_dag"

## Setup Task: Storage Information
#### Change the Storage Namespace to a location in the bucket you’ve configured. The storage namespace is a location in the underlying storage where data for this repository will be stored.

In [None]:
storageNamespace = 's3://<S3 Bucket Name>/' # e.g. "s3://username-lakefs-cloud/"

## Setup Task: Import Python packages

In [None]:
from dagster import RunConfig  # execute_job is not used in this notebook
from jobs.Existing_DAG.lakefs_wrapper_dag import lakefs_wrapper_dag, LakeFSOpConfig

## Setup Task: Set environment variables

In [None]:
import os
os.environ["LAKEFS_ENDPOINT"] = lakefsEndPoint
os.environ["LAKEFS_CREDENTIALS_ACCESS_KEY_ID"] = lakefsAccessKey
os.environ["LAKEFS_CREDENTIALS_SECRET_ACCESS_KEY"] = lakefsSecretKey
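Because the credential cells ship with `<...>` placeholders, it can help to confirm that the variables were actually filled in before the job runs. The following is a minimal sketch, not part of the lakeFS or Dagster APIs: the helper name `missing_lakefs_settings` is hypothetical, and it only assumes the three variable names set in the cell above.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = (
    "LAKEFS_ENDPOINT",
    "LAKEFS_CREDENTIALS_ACCESS_KEY_ID",
    "LAKEFS_CREDENTIALS_SECRET_ACCESS_KEY",
)


def missing_lakefs_settings(env=None):
    """Return the required names that are unset, empty, or still a '<...>' placeholder.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Treat an untouched template value such as '<lakeFS Access Key>' as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


problems = missing_lakefs_settings()
if problems:
    print("Set these before running the job:", ", ".join(problems))
```

Running this right after the environment variables are set should surface a forgotten placeholder early, rather than as an authentication error inside the Dagster run.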

## Working with the lakeFS Python client API

###### Note: To learn more about the lakeFS Python integration, visit https://docs.lakefs.io/integrations/python.html

In [None]:
%xmode Minimal
if 'client' not in locals():
    import lakefs_client
    from lakefs_client import models
    from lakefs_client.client import LakeFSClient

    # lakeFS credentials and endpoint
    configuration = lakefs_client.Configuration()
    configuration.username = lakefsAccessKey
    configuration.password = lakefsSecretKey
    configuration.host = lakefsEndPoint

    client = LakeFSClient(configuration)
    print("Created lakeFS client.")

## Create Repository (skip this step if the repository already exists)

In [None]:
client.repositories.create_repository(
    repository_creation=models.RepositoryCreation(
        name=repo,
        storage_namespace=storageNamespace,
        default_branch=sourceBranch))
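If the notebook is re-run, the repository creation call above is expected to fail because the repository is already there. One way to make the cell safe to repeat is sketched below. It rests on two assumptions that are worth verifying against your lakeFS version: that the client raises an exception carrying the HTTP code in a `status` attribute, and that an existing repository is reported as `409`. The helper name `create_unless_exists` is hypothetical.

```python
def create_unless_exists(create, conflict_status=409):
    """Call create() and return its result, or None if the server reports a conflict.

    Hypothetical helper. It inspects a `status` attribute on the raised
    exception (assumed to hold the HTTP status code) and treats
    `conflict_status` as "already exists". Any other error is re-raised.
    """
    try:
        return create()
    except Exception as exc:
        if getattr(exc, "status", None) == conflict_status:
            return None
        raise


# Usage with the names defined earlier in this notebook (left commented out,
# since it needs a live lakeFS server):
#
# created = create_unless_exists(
#     lambda: client.repositories.create_repository(
#         repository_creation=models.RepositoryCreation(
#             name=repo,
#             storage_namespace=storageNamespace,
#             default_branch=sourceBranch)))
# print("Repository created." if created is not None else "Repository already exists.")
```

Passing the call as a lambda keeps the helper independent of the client library, so the same wrapper could guard other create-style calls.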

## Review the [lakeFS Wrapper DAG](./jobs/Existing_DAG/lakefs_wrapper_dag.py) and [Dagster ETL DAG](./jobs/Existing_DAG/lakefs_tutorial_taskflow_api_etl.py) code before running the job

## Execute lakeFS Wrapper DAG

In [None]:
job_result = lakefs_wrapper_dag.execute_in_process(
    run_config=RunConfig(
        {
            "create_etl_branch": LakeFSOpConfig(repo=repo, sourceBranch=sourceBranch, newBranch=newBranch),
            "trigger_existing_dag": LakeFSOpConfig(repo=repo, sourceBranch=sourceBranch, newBranch=newBranch),
            "commit_etl_branch": LakeFSOpConfig(repo=repo, sourceBranch=sourceBranch, newBranch=newBranch),
            "merge_etl_branch": LakeFSOpConfig(repo=repo, sourceBranch=sourceBranch, newBranch=newBranch),
        }
    )
)
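The run configuration above passes the same `LakeFSOpConfig` values to each of the four ops. As a possible way to avoid the repetition, the per-op mapping can be built from a list of op names. This is only a sketch: `build_ops_config` and `OP_NAMES` are hypothetical names introduced here, and the op names are copied from the cell above.

```python
# Op names copied from the run configuration above.
OP_NAMES = (
    "create_etl_branch",
    "trigger_existing_dag",
    "commit_etl_branch",
    "merge_etl_branch",
)


def build_ops_config(make_config, op_names=OP_NAMES):
    """Return {op_name: make_config()} with a fresh config object per op.

    Hypothetical helper; `make_config` is any zero-argument callable.
    """
    return {name: make_config() for name in op_names}


# Usage with the names defined earlier in this notebook (left commented out,
# since it needs Dagster and the jobs package):
#
# ops = build_ops_config(
#     lambda: LakeFSOpConfig(repo=repo, sourceBranch=sourceBranch, newBranch=newBranch))
# job_result = lakefs_wrapper_dag.execute_in_process(run_config=RunConfig(ops=ops))
# print("Job succeeded:", job_result.success)
```

A factory callable is used rather than one shared object so that each op receives its own config instance. The result returned by `execute_in_process` exposes a `success` attribute, which is a quick way to confirm the run finished cleanly before inspecting the merged data.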

## More Questions?

###### Join the lakeFS Slack group - https://lakefs.io/slack