<img src="./images/logo.svg" alt="lakeFS logo" width=300/> 

# Using multiple [Lua hooks](https://docs.lakefs.io/hooks/) in lakeFS (similar to GitHub Actions)

Use Cases:

1. Don't allow PII data
2. Don't allow unintended schema changes

## Config

**_If you're not using the provided lakeFS server and MinIO storage, change these values to match your environment_**

### lakeFS endpoint and credentials

In [1]:
lakefsEndPoint = 'http://lakefs:8000' # e.g. 'https://username.aws_region_name.lakefscloud.io' 
lakefsAccessKey = 'AKIAIOSFOLKFSSAMPLES'
lakefsSecretKey = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
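
The values above are sample credentials for the bundled environment. Outside the demo, one option is to read them from environment variables instead of hard-coding them. The following is a minimal sketch under that assumption: the helper `load_lakefs_settings` and the variable names `LAKEFS_ENDPOINT`, `LAKEFS_ACCESS_KEY` and `LAKEFS_SECRET_KEY` are hypothetical names chosen for this example, not something lakeFS defines.

```python
import os

def load_lakefs_settings(env=None):
    """Return (endpoint, access_key, secret_key), preferring environment variables.

    The variable names are hypothetical; the defaults are the sample values
    used by this notebook.
    """
    env = os.environ if env is None else env
    endpoint = env.get("LAKEFS_ENDPOINT", "http://lakefs:8000")
    access_key = env.get("LAKEFS_ACCESS_KEY", "AKIAIOSFOLKFSSAMPLES")
    secret_key = env.get("LAKEFS_SECRET_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
    # Drop a trailing slash so the endpoint can be joined with paths later.
    return endpoint.rstrip("/"), access_key, secret_key

lakefsEndPoint, lakefsAccessKey, lakefsSecretKey = load_lakefs_settings()
```

With none of the variables set, this yields the same sample values as the cell above.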

### Object Storage

In [2]:
storageNamespace = 's3://example' # e.g. "s3://bucket"

---

## Setup

**(you shouldn't need to change anything in this section, just run it)**

In [3]:
repo_name = "schema-and-pii-validation-example"

### Create lakeFSClient

In [4]:
import lakefs_client
from lakefs_client.models import *
from lakefs_client.client import LakeFSClient

# lakeFS credentials and endpoint
configuration = lakefs_client.Configuration()
configuration.username = lakefsAccessKey
configuration.password = lakefsSecretKey
configuration.host = lakefsEndPoint

lakefs = LakeFSClient(configuration)

#### Verify lakeFS credentials by getting lakeFS version

In [5]:
print("Verifying lakeFS credentials…")
try:
    v=lakefs.config.get_config()
except Exception:
    print("🛑 failed to get lakeFS version")
else:
    print(f"…✅lakeFS credentials verified\n\nℹ️lakeFS version {v['version_config']['version']}")

Verifying lakeFS credentials…
…✅lakeFS credentials verified

ℹ️lakeFS version 0.104.0


### Define lakeFS Repository

In [6]:
import os  # needed for the os._exit() calls below
from lakefs_client.exceptions import NotFoundException

try:
    repo=lakefs.repositories.get_repository(repo_name)
    print(f"Found existing repo {repo.id} using storage namespace {repo.storage_namespace}")
except NotFoundException as f:
    print(f"Repository {repo_name} does not exist, so going to try and create it now.")
    try:
        repo=lakefs.repositories.create_repository(repository_creation=RepositoryCreation(name=repo_name,
                                                                                                storage_namespace=f"{storageNamespace}/{repo_name}"))
        print(f"Created new repo {repo.id} using storage namespace {repo.storage_namespace}")
    except lakefs_client.ApiException as e:
        print(f"Error creating repo {repo_name}. Error is {e}")
        os._exit(00)
except lakefs_client.ApiException as e:
    print(f"Error getting repo {repo_name}: {e}")
    os._exit(00)

Repository schema-and-pii-validation-example does not exist, so going to try and create it now.
Created new repo schema-and-pii-validation-example using storage namespace s3://example/schema-and-pii-validation-example


### Set up Spark

In [7]:
from pyspark.sql.types import ByteType, IntegerType, LongType, StringType, StructType, StructField

In [8]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("lakeFS / Jupyter") \
        .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.hadoop.fs.s3a.endpoint", lakefsEndPoint) \
        .config("spark.hadoop.fs.s3a.path.style.access", "true") \
        .config("spark.hadoop.fs.s3a.access.key", lakefsAccessKey) \
        .config("spark.hadoop.fs.s3a.secret.key", lakefsSecretKey) \
        .config("spark.jars.packages", "io.delta:delta-core_2.12:2.3.0") \
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
        .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
        .config("spark.delta.logStore.class", "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore") \
        .getOrCreate()
spark.sparkContext.setLogLevel("INFO")

spark

In [9]:
mainBranch = "main"
schemaValidationBranch1stAttempt = "schema_validation_branch_1st_attempt"
schemaValidationBranch2ndAttempt = "schema_validation_branch_2nd_attempt"
schemaChangeBranch = "schema_change_branch"

---

# Main demo starts here 🚦 👇🏻

## Setup and Configure Hook

### Configure hooks in the repository

* Upload the [hooks config YAML file](./hooks/pre-merge-schema-and-pii-validation.yaml), which defines the checks (blocked PII columns and schema changes) that run before data is merged into the main branch
* The hooks config file must be uploaded under the `_lakefs_actions` prefix

In [10]:
hooks_config_yaml = "pre-merge-schema-and-pii-validation.yaml"
hooks_prefix = "_lakefs_actions"
with open(f'./hooks/{hooks_config_yaml}', 'rb') as f:
    lakefs.objects.upload_object(repository=repo.id, 
                                 branch=mainBranch, 
                                 path=f'{hooks_prefix}/{hooks_config_yaml}', 
                                 content=f
                                )
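
The contents of the hooks config file are not shown in this notebook. As a rough illustration of what such a file might contain, here is a hedged sketch held in a Python string. The action name and hook ids are taken from the error messages later in this notebook; the mapping of each hook to a script under `scripts/` is an assumption, and any `args` the real scripts expect are left out. `HOOKS_CONFIG_SKETCH` and `hook_ids` are hypothetical names.

```python
# Hypothetical contents for pre-merge-schema-and-pii-validation.yaml.
# Action name and hook ids come from the error messages in this notebook;
# the script-to-hook mapping is an assumption and hook arguments are omitted.
HOOKS_CONFIG_SKETCH = """\
name: pre merge checks on main branch
on:
  pre-merge:
    branches:
      - main
hooks:
  - id: check_blocked_pii_columns
    type: lua
    properties:
      script_path: scripts/parquet_schema_validator.lua
  - id: check_schema_changes
    type: lua
    properties:
      script_path: scripts/parquet_schema_change.lua
"""

def hook_ids(config_text):
    """List hook ids by scanning for '- id:' lines (no YAML parser needed)."""
    ids = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- id:"):
            ids.append(stripped[len("- id:"):].strip())
    return ids

print(hook_ids(HOOKS_CONFIG_SKETCH))
```

The file actually shipped in `./hooks/` is the source of truth; compare it against this sketch before relying on any field shown here.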

### Upload scripts

Two Lua scripts are uploaded from `hooks/` to the `scripts` prefix: `parquet_schema_validator.lua` and `parquet_schema_change.lua`. The latter checks for any schema changes.

In [11]:
lua_script_file_names = ["parquet_schema_validator.lua", "parquet_schema_change.lua"]
lua_scripts_path = "scripts"

In [12]:
for fn in lua_script_file_names:
    with open(f'./hooks/{fn}', 'rb') as f:
        lakefs.objects.upload_object(repository=repo.id, 
                                    branch=mainBranch, 
                                    path=f'{lua_scripts_path}/{fn}', 
                                    content=f
                                    )

### Commit changes to the lakeFS repo

In [13]:
lakefs.commits.commit(
    repository=repo.id,
    branch=mainBranch,
    commit_creation=CommitCreation(
        message='Added hooks config file validation scripts'))

{'committer': 'everything-bagel',
 'creation_date': 1689579956,
 'id': '37faec094cabb3c0c1b9aafdb56508b186a1b2682c9fd8e82b1915c77beddc2b',
 'message': 'Added hooks config file validation scripts',
 'meta_range_id': '',
 'metadata': {},
 'parents': ['7d3d1d36617495a0feeb8a79e513f878b96ce38a4bbd61fdd984a2250d583c5a']}

# ETL Job Starts

## Create a new branch which will be used to ingest data

In [14]:
lakefs.branches.create_branch(
    repository=repo.id, 
    branch_creation=BranchCreation(
        name=schemaValidationBranch1stAttempt, source=mainBranch))

'37faec094cabb3c0c1b9aafdb56508b186a1b2682c9fd8e82b1915c77beddc2b'

## This demo uses the [Orion Star - Sports and outdoors RDBMS dataset](https://www.kaggle.com/datasets/chethanp11/orion-star-sports-and-outdoors-rdbms-dataset) from [Kaggle](https://www.kaggle.com/).

## Define [CUSTOMER.csv](../data/samples/OrionStar/CUSTOMER.csv) data file schema

#### Notice that the 1st column, "user_id", is a blocked PII column and is not allowed

In [15]:
customersSchema = StructType([
  StructField("user_id", IntegerType(), False), # "user_id" is a blocked PII column and is not allowed.
  StructField("Country", StringType(), False),
  StructField("Gender", StringType(), False),
  StructField("Personal_ID", IntegerType(), True),
  StructField("Customer_Name", StringType(), False),
  StructField("Customer_FirstName", StringType(), False),
  StructField("Customer_LastName", StringType(), False),
  StructField("Birth_Date", StringType(), False),
  StructField("Customer_Address", StringType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Street_Number", IntegerType(), False),
  StructField("Customer_Type_ID", IntegerType(), False)
])

## Define [ORDER_FACT.csv](../data/samples/OrionStar/ORDER_FACT.csv) data file schema

#### Notice that the 1st column, "user_id", is a blocked PII column and is not allowed

In [16]:
ordersSchema = StructType([
  StructField("user_id", IntegerType(), False), # "user_id" is a blocked PII column and is not allowed.
  StructField("Employee_ID", IntegerType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Order_Date", StringType(), False),
  StructField("Delivery_Date", StringType(), False),
  StructField("Order_ID", LongType(), True),
  StructField("Order_Type", ByteType(), False),
  StructField("Product_ID", LongType(), False),
  StructField("Quantity", ByteType(), False),
  StructField("Total_Retail_Price", StringType(), False),
  StructField("CostPrice_Per_Unit", StringType(), False),
  StructField("Discount", LongType(), False)
])
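
To illustrate the kind of check the PII hook performs, here is a minimal Python sketch of a blocked-column lookup that could be run before writing any data. It is not the Lua hook itself: `BLOCKED_COLUMNS` and `find_blocked_columns` are hypothetical, and "user_id" is the only blocked name this notebook confirms.

```python
# Assumption: "user_id" is the only blocked name confirmed by this notebook;
# the authoritative list lives in the hook configuration.
BLOCKED_COLUMNS = {"user_id"}

def find_blocked_columns(column_names, blocked=BLOCKED_COLUMNS):
    """Return the column names that exactly match a blocked name."""
    return [name for name in column_names if name in blocked]

# With the Spark schemas above this could be called as
# find_blocked_columns(customersSchema.fieldNames()).
customer_columns = ["user_id", "Country", "Gender", "Personal_ID"]
print(find_blocked_columns(customer_columns))  # ['user_id']
```

A local check like this only gives earlier feedback; the pre-merge hook remains the enforcement point.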

## Create Customers delta table in the new branch (using [CUSTOMER.csv](./data/samples/OrionStar/CUSTOMER.csv) file)

In [17]:
customersTablePath = f"s3a://{repo.id}/{schemaValidationBranch1stAttempt}/tables/customers"
df = spark.read.csv('/data/OrionStar/CUSTOMER.csv',header=True,schema=customersSchema)
df.write.format("delta").mode("overwrite").save(customersTablePath)
df.show(10)

+-------+-------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|user_id|Country|Gender|Personal_ID|    Customer_Name|Customer_FirstName|Customer_LastName|Birth_Date|    Customer_Address| Street_ID|Street_Number|Customer_Type_ID|
+-------+-------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|      4|     US|     M|       null|    James Kvarniq|             James|          Kvarniq| 27JUN1974|      4382 Gralyn Rd|9260106519|         4382|            1020|
|      5|     US|     F|       null|Sandrina Stephano|          Sandrina|         Stephano| 09JUL1979|    6468 Cog Hill Ct|9260114570|         6468|            2020|
|      9|     DE|     F|       null|   Cornelia Krahl|          Cornelia|            Krahl| 27FEB1974|   Kallstadterstr. 9|3940106659|            9|            2020|
|   

## Create Orders delta table in the new branch (using [ORDER_FACT.csv](./data/samples/OrionStar/ORDER_FACT.csv) file)

In [18]:
ordersTablePath = f"s3a://{repo.id}/{schemaValidationBranch1stAttempt}/tables/orders"
df = spark.read.csv('/data/OrionStar/ORDER_FACT.csv',header=True,schema=ordersSchema)
df.write.format("delta").mode("overwrite").save(ordersTablePath)
df.show(10)

+-------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|user_id|Employee_ID| Street_ID|Order_Date|Delivery_Date|  Order_ID|Order_Type|  Product_ID|Quantity|Total_Retail_Price|CostPrice_Per_Unit|Discount|
+-------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|     63|     121039|9260125492| 11JAN2003|    11JAN2003|1230058123|         1|220101300017|       1|            $16.50|             $7.45|    null|
|      5|   99999999|9260114570| 15JAN2003|    19JAN2003|1230080101|         2|230100500026|       1|           $247.50|           $109.55|    null|
|     45|   99999999|9260104847| 20JAN2003|    22JAN2003|1230106883|         2|240600100080|       1|            $28.30|             $8.55|    null|
|     41|     120174|1600101527| 28JAN2003|    28JAN2003|1230147441|         1|240600100010|       2|     

## Commit changes and attach some metadata

In [19]:
lakefs.commits.commit(
    repository=repo.id,
    branch=schemaValidationBranch1stAttempt,
    commit_creation=CommitCreation(
        message='Added customers and orders Delta tables!', 
        metadata={'using': 'python_api'}))

{'committer': 'everything-bagel',
 'creation_date': 1689580024,
 'id': '82c80ffa943caf3f6495e3a0b2d74e86b32dfa542bb0606249b28c7a141e7f3d',
 'message': 'Added customers and orders Delta tables!',
 'meta_range_id': '',
 'metadata': {'using': 'python_api'},
 'parents': ['37faec094cabb3c0c1b9aafdb56508b186a1b2682c9fd8e82b1915c77beddc2b']}

## Merge new branch to the main branch.

#### 🛑🛑 The merge will fail because the Delta tables contain a blocked column, i.e. user_id. Review the error message.

In [20]:
lakefs.refs.merge_into_branch(
    repository=repo.id,
    source_ref=schemaValidationBranch1stAttempt, 
    destination_branch=mainBranch)

ApiException: (412)
Reason: Precondition Failed
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Request-Id': 'fb257460-4d71-41bd-a680-f8e9c6f701a4', 'Date': 'Mon, 17 Jul 2023 07:47:05 GMT', 'Content-Length': '401'})
HTTP response body: {"message":"update branch main: pre-merge hook aborted, run id '5hvqk222tk3c76sjuklg': 1 error occurred:\n\t* hook run id '0000_0000' failed on action 'pre merge checks on main branch' hook 'check_blocked_pii_columns': runtime error: [string \"lua\"]:47: Column is not allowed: 'user_id': type: INT32 in path: tables/customers/part-00000-ce44ff32-60a2-473b-b865-e54ca873885a-c000.snappy.parquet\n\n"}



The error will look like this: 
    
```
(412)
Reason: Precondition Failed
```
    
```
update branch main: pre-merge hook aborted, run id '5ir9j6ol1aas77gat540': 1 error occurred:
    * hook run id '0000_0000' failed on action 'pre merge checks on main branch' 
                                          hook 'check_blocked_pii_columns': 
    runtime error: [string \"lua\"]:47: Column is not allowed: 'user_id': type: INT32 
    in path: tables/customers/part-00000-bfe440bc-787d-4b4f-a8f8-0b0761e13536-c000.snappy.parquet
```
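
When the merge is run from a script rather than a notebook, it can help to pull the failing action and hook out of the response body. The sketch below assumes the body has the JSON shape shown above; `parse_hook_failure` and `HOOK_FAILURE` are hypothetical names, and the regular expression is tailored to this one message format.

```python
import json
import re

# Matches the "failed on action '...' hook '...': <reason>" part of the message.
HOOK_FAILURE = re.compile(
    r"failed on action '(?P<action>[^']+)'\s+hook '(?P<hook>[^']+)':\s*(?P<reason>.*)",
    re.DOTALL,
)

def parse_hook_failure(body):
    """Extract action, hook and reason from a pre-merge hook error body."""
    message = json.loads(body)["message"]
    match = HOOK_FAILURE.search(message)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "hook": match.group("hook"),
        "reason": match.group("reason").strip(),
    }

# Sample body rebuilt from the error shown above.
sample_body = json.dumps({
    "message": (
        "update branch main: pre-merge hook aborted, run id '5hvqk222tk3c76sjuklg': "
        "1 error occurred:\n\t* hook run id '0000_0000' failed on action "
        "'pre merge checks on main branch' hook 'check_blocked_pii_columns': "
        "runtime error: [string \"lua\"]:47: Column is not allowed: 'user_id': "
        "type: INT32 in path: tables/customers/"
        "part-00000-ce44ff32-60a2-473b-b865-e54ca873885a-c000.snappy.parquet\n\n"
    )
})

failure = parse_hook_failure(sample_body)
print(failure["hook"])  # check_blocked_pii_columns
```

In the notebook, the body would typically come from the caught exception, for example `parse_hook_failure(e.body)` inside an `except lakefs_client.ApiException as e:` block.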

---


## Let's attempt to ingest data again without any PII columns

#### Create a new branch for 2nd attempt

In [21]:
lakefs.branches.create_branch(
    repository=repo.id, 
    branch_creation=BranchCreation(
        name=schemaValidationBranch2ndAttempt, source=mainBranch))

'37faec094cabb3c0c1b9aafdb56508b186a1b2682c9fd8e82b1915c77beddc2b'

## Change "user_id" column to "Customer_ID" in the schema

In [22]:
customersSchema = StructType([
  StructField("Customer_ID", IntegerType(), False), # Change "user_id" column to "Customer_ID"
  StructField("Country", StringType(), False),
  StructField("Gender", StringType(), False),
  StructField("Personal_ID", IntegerType(), True),
  StructField("Customer_Name", StringType(), False),
  StructField("Customer_FirstName", StringType(), False),
  StructField("Customer_LastName", StringType(), False),
  StructField("Birth_Date", StringType(), False),
  StructField("Customer_Address", StringType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Street_Number", IntegerType(), False),
  StructField("Customer_Type_ID", IntegerType(), False)
])

In [23]:
ordersSchema = StructType([
  StructField("Customer_ID", IntegerType(), False), # Change "user_id" column to "Customer_ID"
  StructField("Employee_ID", IntegerType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Order_Date", StringType(), False),
  StructField("Delivery_Date", StringType(), False),
  StructField("Order_ID", LongType(), True),
  StructField("Order_Type", ByteType(), False),
  StructField("Product_ID", LongType(), False),
  StructField("Quantity", ByteType(), False),
  StructField("Total_Retail_Price", StringType(), False),
  StructField("CostPrice_Per_Unit", StringType(), False),
  StructField("Discount", LongType(), False)
])

## Create Customers delta table in the new branch (using [CUSTOMER.csv](./data/samples/OrionStar/CUSTOMER.csv) file)

In [24]:
customersTablePath = f"s3a://{repo.id}/{schemaValidationBranch2ndAttempt}/tables/customers"
df = spark.read.csv('/data/OrionStar/CUSTOMER.csv',header=True,schema=customersSchema)
df.write.format("delta").mode("overwrite").save(customersTablePath)
df.show(10)

+-----------+-------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|Customer_ID|Country|Gender|Personal_ID|    Customer_Name|Customer_FirstName|Customer_LastName|Birth_Date|    Customer_Address| Street_ID|Street_Number|Customer_Type_ID|
+-----------+-------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|          4|     US|     M|       null|    James Kvarniq|             James|          Kvarniq| 27JUN1974|      4382 Gralyn Rd|9260106519|         4382|            1020|
|          5|     US|     F|       null|Sandrina Stephano|          Sandrina|         Stephano| 09JUL1979|    6468 Cog Hill Ct|9260114570|         6468|            2020|
|          9|     DE|     F|       null|   Cornelia Krahl|          Cornelia|            Krahl| 27FEB1974|   Kallstadterstr. 9|3940106659|            

## Create Orders delta table in the new branch (using [ORDER_FACT.csv](./data/samples/OrionStar/ORDER_FACT.csv) file)

In [25]:
ordersTablePath = f"s3a://{repo.id}/{schemaValidationBranch2ndAttempt}/tables/orders"
df = spark.read.csv('/data/OrionStar/ORDER_FACT.csv',header=True,schema=ordersSchema)
df.write.format("delta").mode("overwrite").save(ordersTablePath)
df.show(10)

+-----------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|Customer_ID|Employee_ID| Street_ID|Order_Date|Delivery_Date|  Order_ID|Order_Type|  Product_ID|Quantity|Total_Retail_Price|CostPrice_Per_Unit|Discount|
+-----------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|         63|     121039|9260125492| 11JAN2003|    11JAN2003|1230058123|         1|220101300017|       1|            $16.50|             $7.45|    null|
|          5|   99999999|9260114570| 15JAN2003|    19JAN2003|1230080101|         2|230100500026|       1|           $247.50|           $109.55|    null|
|         45|   99999999|9260104847| 20JAN2003|    22JAN2003|1230106883|         2|240600100080|       1|            $28.30|             $8.55|    null|
|         41|     120174|1600101527| 28JAN2003|    28JAN2003|1230147441|         1

## Commit changes and attach some metadata

In [26]:
lakefs.commits.commit(
    repository=repo.id,
    branch=schemaValidationBranch2ndAttempt,
    commit_creation=CommitCreation(
        message='Added customers and orders Delta tables without any PII columns!', 
        metadata={'using': 'python_api'}))

{'committer': 'everything-bagel',
 'creation_date': 1689580078,
 'id': '33da22dfdee479f9bf5bf0aa4360d32b1086056b9260bac64989f7cad93ba812',
 'message': 'Added customers and orders Delta tables without any PII columns!',
 'meta_range_id': '',
 'metadata': {'using': 'python_api'},
 'parents': ['37faec094cabb3c0c1b9aafdb56508b186a1b2682c9fd8e82b1915c77beddc2b']}

## Merge new branch to the main branch

#### Merge will succeed this time because there are no PII columns in the Delta tables

In [27]:
lakefs.refs.merge_into_branch(
    repository=repo.id,
    source_ref=schemaValidationBranch2ndAttempt, 
    destination_branch=mainBranch)

{'reference': 'f7b09abc3fecb329def104b8307f1f7734e42b01dc285ad77f0283daeaa93e6a'}

# Check for any schema changes next

## Create a new branch which will be used to ingest data

In [28]:
lakefs.branches.create_branch(
    repository=repo.id, 
    branch_creation=BranchCreation(
        name=schemaChangeBranch, source=mainBranch))

'f7b09abc3fecb329def104b8307f1f7734e42b01dc285ad77f0283daeaa93e6a'

## Change "Country" column to "Country_Name" in the schema

In [29]:
customersSchema = StructType([
  StructField("Customer_ID", IntegerType(), False),
  StructField("Country_Name", StringType(), False), # Column name changes from Country to Country_Name
  StructField("Gender", StringType(), False),
  StructField("Personal_ID", IntegerType(), True),
  StructField("Customer_Name", StringType(), False),
  StructField("Customer_FirstName", StringType(), False),
  StructField("Customer_LastName", StringType(), False),
  StructField("Birth_Date", StringType(), False),
  StructField("Customer_Address", StringType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Street_Number", IntegerType(), False),
  StructField("Customer_Type_ID", IntegerType(), False)
])

## Change data type for column "Quantity" from ByteType to LongType

In [30]:
ordersSchema = StructType([
  StructField("Customer_ID", IntegerType(), False),
  StructField("Employee_ID", IntegerType(), False),
  StructField("Street_ID", LongType(), False),
  StructField("Order_Date", StringType(), False),
  StructField("Delivery_Date", StringType(), False),
  StructField("Order_ID", LongType(), True), 
  StructField("Order_Type", ByteType(), False),
  StructField("Product_ID", LongType(), False),
  StructField("Quantity", LongType(), False), # Data type changes from ByteType() to LongType()
  StructField("Total_Retail_Price", StringType(), False),
  StructField("CostPrice_Per_Unit", StringType(), False),
  StructField("Discount", LongType(), False)
])

## Create Customers delta table in the new branch (using [CUSTOMER.csv](./data/samples/OrionStar/CUSTOMER.csv) file)

In [31]:
customersTablePath = f"s3a://{repo.id}/{schemaChangeBranch}/tables/customers"
df = spark.read.csv('/data/OrionStar/CUSTOMER.csv',header=True,schema=customersSchema)
df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(customersTablePath)
df.show(10)

+-----------+------------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|Customer_ID|Country_Name|Gender|Personal_ID|    Customer_Name|Customer_FirstName|Customer_LastName|Birth_Date|    Customer_Address| Street_ID|Street_Number|Customer_Type_ID|
+-----------+------------+------+-----------+-----------------+------------------+-----------------+----------+--------------------+----------+-------------+----------------+
|          4|          US|     M|       null|    James Kvarniq|             James|          Kvarniq| 27JUN1974|      4382 Gralyn Rd|9260106519|         4382|            1020|
|          5|          US|     F|       null|Sandrina Stephano|          Sandrina|         Stephano| 09JUL1979|    6468 Cog Hill Ct|9260114570|         6468|            2020|
|          9|          DE|     F|       null|   Cornelia Krahl|          Cornelia|            Krahl| 27FEB1974|   Kallstadter

## Create Orders delta table in the new branch (using [ORDER_FACT.csv](./data/samples/OrionStar/ORDER_FACT.csv) file)

In [32]:
ordersTablePath = f"s3a://{repo.id}/{schemaChangeBranch}/tables/orders"
df = spark.read.csv('/data/OrionStar/ORDER_FACT.csv',header=True,schema=ordersSchema)
df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(ordersTablePath)
df.show(10)

+-----------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|Customer_ID|Employee_ID| Street_ID|Order_Date|Delivery_Date|  Order_ID|Order_Type|  Product_ID|Quantity|Total_Retail_Price|CostPrice_Per_Unit|Discount|
+-----------+-----------+----------+----------+-------------+----------+----------+------------+--------+------------------+------------------+--------+
|         63|     121039|9260125492| 11JAN2003|    11JAN2003|1230058123|         1|220101300017|       1|            $16.50|             $7.45|    null|
|          5|   99999999|9260114570| 15JAN2003|    19JAN2003|1230080101|         2|230100500026|       1|           $247.50|           $109.55|    null|
|         45|   99999999|9260104847| 20JAN2003|    22JAN2003|1230106883|         2|240600100080|       1|            $28.30|             $8.55|    null|
|         41|     120174|1600101527| 28JAN2003|    28JAN2003|1230147441|         1

## Commit changes and attach some metadata

In [33]:
lakefs.commits.commit(
    repository=repo.id,
    branch=schemaChangeBranch,
    commit_creation=CommitCreation(
        message='Added customers and orders Delta tables with schema changes!', 
        metadata={'using': 'python_api'}))

{'committer': 'everything-bagel',
 'creation_date': 1689580084,
 'id': 'af57899b3eaa5cb0378c0117bc85ea00a629b57af458c1c4068c988762bf8e48',
 'message': 'Added customers and orders Delta tables with schema changes!',
 'meta_range_id': '',
 'metadata': {'using': 'python_api'},
 'parents': ['f7b09abc3fecb329def104b8307f1f7734e42b01dc285ad77f0283daeaa93e6a']}

## Merge new branch to the main branch

#### The merge will fail because the schema changed. Review the error message.

In [34]:
lakefs.refs.merge_into_branch(
    repository=repo.id,
    source_ref=schemaChangeBranch, 
    destination_branch=mainBranch)

ApiException: (412)
Reason: Precondition Failed
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Request-Id': '6380ac28-6936-431c-9ed9-35517d446952', 'Date': 'Mon, 17 Jul 2023 07:48:04 GMT', 'Content-Length': '666'})
HTTP response body: {"message":"update branch main: pre-merge hook aborted, run id '5hvqjj22tk3c76sjulf0': 1 error occurred:\n\t* hook run id '0000_0001' failed on action 'pre merge checks on main branch' hook 'check_schema_changes': runtime error: [string \"lua\"]:109: Schema changed for 'tables/customers/part-00000-f67b5e71-7a9a-4c84-b66b-45738a1ef779-c000.snappy.parquet'. Column name changed. Original column name was 'Country' and new column name is 'Country_Name'. Schema changed for 'tables/orders/part-00000-e1fda77b-6436-4bc7-bc10-b56be703c365-c000.snappy.parquet'. Data type for column 'Quantity' changed. Original data type was 'INT32' and new data type is 'INT64'. \n\n"}



The error will look like this:
    
```
(412)
Reason: Precondition Failed
```

```
update branch main: pre-merge hook aborted, run id '5ir9htgl1aas77gat5eg': 1 error occurred:
    * hook run id '0000_0001' failed on action 'pre merge checks on main branch' hook 'check_schema_changes': 
runtime error: [string \"lua\"]:109: 
    Schema changed for 'tables/customers/part-00000-dbbb3d72-f911-4bdf-ad8f-b6f073be5df1-c000.snappy.parquet'. 
    Column name changed. Original column name was 'Country' and new column name is 'Country_Name'. 
    
    Schema changed for 'tables/orders/part-00000-6020f550-c445-4e01-b72b-91d542850b5e-c000.snappy.parquet'. 
    Data type for column 'Quantity' changed. Original data type was 'INT32' and new data type is 'INT64'
```
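
To make the two reported changes concrete, here is a minimal Python sketch of a position-by-position schema comparison. It is not the logic of `parquet_schema_change.lua`, whose source is not shown here; `diff_schemas` and the shortened schemas are hypothetical, and the message wording simply mirrors the error above.

```python
def diff_schemas(old, new):
    """Compare two schemas given as ordered lists of (column_name, data_type).

    Columns are compared position by position, so a renamed column is
    reported as a name change rather than as a drop plus an add.
    """
    changes = []
    if len(old) != len(new):
        changes.append(f"Column count changed from {len(old)} to {len(new)}.")
    for (old_name, old_type), (new_name, new_type) in zip(old, new):
        if old_name != new_name:
            changes.append(
                f"Column name changed. Original column name was '{old_name}' "
                f"and new column name is '{new_name}'."
            )
        elif old_type != new_type:
            changes.append(
                f"Data type for column '{old_name}' changed. Original data type "
                f"was '{old_type}' and new data type is '{new_type}'."
            )
    return changes

# Shortened, illustrative schemas (not the full tables).
customers_old = [("Customer_ID", "INT32"), ("Country", "BYTE_ARRAY")]
customers_new = [("Customer_ID", "INT32"), ("Country_Name", "BYTE_ARRAY")]
orders_old = [("Customer_ID", "INT32"), ("Quantity", "INT32")]
orders_new = [("Customer_ID", "INT32"), ("Quantity", "INT64")]

for change in diff_schemas(customers_old, customers_new) + diff_schemas(orders_old, orders_new):
    print(change)
```

Comparing by position is a design choice made for this sketch; a comparison keyed on column name would instead report the rename as one removed and one added column.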

## You can also review all Actions in the lakeFS UI

👉🏻 http://localhost:8000/repositories/schema-and-pii-validation-example/actions

![Actions UI](./images/LuaHooks/Actions.png)

## Click on any Run ID to review Action details in the lakeFS UI

#### Click on the "pre merge checks on main branch" Action in the left panel. Expand the sections in the right panel to see logs and error messages.

![Action Details UI](./images/LuaHooks/ActionDetails.png)

## More Questions?

###### Join the lakeFS Slack group - https://lakefs.io/slack