# [Integration of lakeFS with Airflow](https://docs.lakefs.io/integrations/airflow.html)

## Use Case: Troubleshooting production issues

## Prerequisites

###### This notebook requires a connection to a lakeFS server.
###### To spin up lakeFS quickly, use the Playground (https://demo.lakefs.io), which provides a lakeFS server on demand with a single click.
###### Alternatively, refer to the lakeFS Quickstart documentation (https://docs.lakefs.io/quickstart/installing.html).

## Setup Task: Change your lakeFS credentials

In [None]:
lakefsEndPoint = '<lakeFS Endpoint URL>' # e.g. 'https://username.aws_region_name.lakefscloud.io'
lakefsAccessKey = '<lakeFS Access Key>'
lakefsSecretKey = '<lakeFS Secret Key>'
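
Hard-coded keys in a notebook are easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming you export the three values as environment variables before starting Jupyter. The variable names `LAKEFS_ENDPOINT`, `LAKEFS_ACCESS_KEY`, and `LAKEFS_SECRET_KEY` and the helper `load_lakefs_credentials` are hypothetical and are not read by lakeFS itself:

```python
import os


def load_lakefs_credentials(env=None):
    """Return (endpoint, access_key, secret_key) from a mapping.

    Defaults to os.environ. The variable names are hypothetical;
    any names work as long as they match what you exported.
    """
    env = os.environ if env is None else env
    names = ("LAKEFS_ENDPOINT", "LAKEFS_ACCESS_KEY", "LAKEFS_SECRET_KEY")
    # Fail early, naming every variable that is unset or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in names)


# Demonstration with an explicit mapping (placeholder values, not real keys).
example_env = {
    "LAKEFS_ENDPOINT": "https://example.lakefscloud.io",
    "LAKEFS_ACCESS_KEY": "example-access-key",
    "LAKEFS_SECRET_KEY": "example-secret-key",
}
endpoint, access_key, secret_key = load_lakefs_credentials(example_env)
```

In the notebook, calling `load_lakefs_credentials()` with no argument and assigning the result to `lakefsEndPoint`, `lakefsAccessKey`, and `lakefsSecretKey` would replace the literals above.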

## Setup Task: Optionally change the lakeFS repository name (use an existing repository or provide a new name)

In [None]:
repo = "new-dag-repo"

## Setup Task: Versioning Information

In [None]:
sourceBranch = "main"
newBranch = "airflow_demo_new_dag"
newPath = "partitioned_data"

## Setup Task: Storage Information - Optional on Playground
#### Change the Storage Namespace to a location in the bucket you’ve configured. The storage namespace is a location in the underlying storage where data for this repository will be stored.

In [None]:
storageNamespace = 's3://<S3 Bucket Name>/' # e.g. "s3://username-lakefs-cloud/"
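
A forgotten placeholder or a malformed URI in the storage namespace only surfaces later, when the repository is created. A minimal sketch of a local sanity check, assuming an S3-backed installation as in the example above. The helper `split_storage_namespace` is hypothetical and is not part of the lakeFS client:

```python
from urllib.parse import urlparse


def split_storage_namespace(namespace):
    """Split an s3:// storage namespace into (bucket, prefix).

    Raises ValueError if a <placeholder> is still present or the
    value is not an s3:// URI with a bucket name.
    """
    if "<" in namespace or ">" in namespace:
        raise ValueError("placeholder not replaced: " + namespace)
    parsed = urlparse(namespace)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected s3://bucket/optional-prefix, got: " + namespace)
    # The bucket is the URI authority; the prefix is the path without slashes at either end.
    return parsed.netloc, parsed.path.strip("/")


bucket, prefix = split_storage_namespace("s3://username-lakefs-cloud/")
```

Calling `split_storage_namespace(storageNamespace)` before creating the repository fails fast if the template value was left unchanged.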

## Setup Task: Run additional [Setup](./airflow/New_DAG/Setup.ipynb) tasks here

In [None]:
%run ./airflow/New_DAG/Setup.ipynb

## Create Repository - Optional on Playground or if repository exists

In [None]:
client.repositories.create_repository(
    repository_creation=models.RepositoryCreation(
        name=repo,
        storage_namespace=storageNamespace,
        default_branch=sourceBranch))

## You can review the [lakeFS New DAG](./airflow/dags/lakefs_new_dag.py) program.

## Set the fileName Airflow variable. The [lakeFS New DAG](./airflow/dags/lakefs_new_dag.py) uses the file named by this variable.

In [None]:
fileName = "lakefs_test.csv"
! airflow variables set fileName $fileName

## Find the Airflow admin password and copy it

In [None]:
! cat ./airflow/standalone_admin_password.txt

## Visualize the [lakeFS New DAG Graph](http://127.0.0.1:8080/dags/lakefs_new_dag/graph) in the Airflow UI. Log in with the username admin and the password returned by the previous command.

## Trigger lakeFS New DAG

In [None]:
! airflow dags unpause lakefs_new_dag
! airflow dags trigger lakefs_new_dag

## Visualize the [lakeFS New DAG Graph](http://127.0.0.1:8080/dags/lakefs_new_dag/graph).
### Toggle the Auto Refresh switch in the DAG Graph to watch the workflow progress continuously.
### Click any task box, then click the Log button and search for "lakeFS URL" (this URL takes you to the applicable branch, commit, or data file).

## Once the lakeFS New DAG finishes (in around 5 minutes), switch to the latest file. This file contains bad data and will cause the workflow to fail.

In [None]:
fileName = "lakefs_test_latest_file.csv"
! airflow variables set fileName $fileName

## Trigger the demo workflow again, using the latest file

In [None]:
! airflow dags trigger lakefs_new_dag

## Visualize the [lakeFS New DAG Graph](http://127.0.0.1:8080/dags/lakefs_new_dag/graph) for the new run with the latest file.

### Task "etl_task3" fails in this case. Click the "etl_task3" task box, then click the Log button and search for "Exception". You will see the following exception:
### "Partition column _c4 not found in schema struct<_c0:string,_c1:string,_c2:string,_c3:string>"

### This exception occurs because column "_c4" (the 5th column) is missing from the latest file.
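
A pre-flight check on the input file can catch a missing column before the DAG is triggered. A minimal sketch, assuming the file is a header-less CSV and the partition column is the 5th one (the `_c0` to `_c4` names match Spark's default naming for header-less CSV columns). The helper names and the sample rows are hypothetical and are not taken from the demo files:

```python
import csv
import io


def first_row_width(csv_text):
    """Return the number of fields in the first CSV row (0 if the text is empty)."""
    reader = csv.reader(io.StringIO(csv_text))
    first_row = next(reader, None)
    return 0 if first_row is None else len(first_row)


def has_column(csv_text, column_index):
    """True if the first row has a field at the zero-based column_index."""
    return first_row_width(csv_text) > column_index


# Hypothetical sample rows: five fields versus four fields.
good_sample = "1,alice,2021,NY,US\n"
bad_sample = "1,alice,2021,NY\n"

PARTITION_INDEX = 4  # zero-based index of "_c4", the 5th column

print(has_column(good_sample, PARTITION_INDEX))  # True
print(has_column(bad_sample, PARTITION_INDEX))   # False
```

The sketch inspects only the first row, which is enough to detect a file whose schema stops at `_c3`. A stricter check would scan every row.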

## More Questions?

###### Join the lakeFS Slack group - https://lakefs.io/slack