## 🧪 Databricks Constraints – Hands-On Mini-Lab

This exercise helps students understand enforced and informational constraints in Databricks Delta tables. We'll:

- Explore NOT NULL and CHECK (enforced) constraints
- Show primary key (PK) and foreign key (FK) informational constraints (not enforced)
- Demonstrate what happens when you add a constraint to a table with existing data


### 1. Create a Delta Table Without Constraints

```sql
CREATE OR REPLACE TABLE employees (
  id INT,
  name STRING,
  salary INT,
  dept STRING
);
```


### 2. Insert Sample Data, Including Some Problematic Rows

```sql
INSERT INTO employees VALUES (1, 'Alice', 100000, 'HR');
INSERT INTO employees VALUES (2, NULL, 90000, 'IT');     -- NULL name (succeeds now; will block NOT NULL in step 3)
INSERT INTO employees VALUES (3, 'Bob', -5000, 'Sales'); -- Negative salary (succeeds now; will block CHECK in step 4)
INSERT INTO employees VALUES (4, 'Carol', 80000, NULL);
INSERT INTO employees VALUES (4, 'Eve', 70000, 'HR');    -- Duplicate id for PK test
```


### 3. Add a NOT NULL Constraint

Try making the `name` column NOT NULL **after data already exists**:

```sql
ALTER TABLE employees ALTER COLUMN name SET NOT NULL;
```

- **Question:** What error do you get? What does this tell you about NOT NULL enforcement and existing data?


### 4. Add a CHECK Constraint

Try adding a constraint that salary >= 0:

```sql
ALTER TABLE employees ADD CONSTRAINT chk_salary_nonnegative CHECK (salary >= 0);
```

- **Question:** Does this command succeed or fail, and why?
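Both `ALTER TABLE` attempts depend on what is already in the table, so it can help to count the offending rows before adding a constraint. A minimal sketch, assuming the sample rows from step 2 are still in place (the column aliases are illustrative):

```sql
-- Count existing rows that would violate each planned constraint
SELECT
  COUNT_IF(name IS NULL) AS null_names,
  COUNT_IF(salary < 0)   AS negative_salaries
FROM employees;
```

With the step 2 data, each count should be 1. Any non-zero count means the matching `ALTER TABLE` will be rejected.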


### 5. Try Adding a NOT NULL or CHECK Constraint That Can Succeed

Clean up the data so that no name is NULL and no salary is negative, then try again:

```sql
DELETE FROM employees WHERE name IS NULL OR salary < 0;

ALTER TABLE employees ALTER COLUMN name SET NOT NULL;
ALTER TABLE employees ADD CONSTRAINT chk_salary_nonnegative CHECK (salary >= 0);
```

- **Question:** What happens now? Can you still insert new rows that break the constraints?


### 6. Test Constraint Enforcement by Inserting Invalid Data

Try these:

```sql
INSERT INTO employees VALUES (5, NULL, 85000, 'Finance');      -- will NOT work (name NULL)
INSERT INTO employees VALUES (6, 'Dave', -100, 'IT');          -- will NOT work (salary < 0)
```

- **Question:** What error messages do you see?
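Enforcement applies to every write, not only to `INSERT`. The sketch below assumes the constraints from step 5 are in place and that the row with `id = 1` still exists. Both statements are expected to fail and leave the table unchanged:

```sql
-- Expected to fail: violates chk_salary_nonnegative
UPDATE employees SET salary = -1 WHERE id = 1;

-- Expected to fail: name is declared NOT NULL
UPDATE employees SET name = NULL WHERE id = 1;
```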


### 7. Add Informational PRIMARY KEY and FOREIGN KEY Constraints

(Run this after the cleanup in step 5. PRIMARY KEY and FOREIGN KEY constraints require a Unity Catalog table.)

```sql
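-- Primary key columns must be NOT NULL before the constraint can be added
ALTER TABLE employees ALTER COLUMN id SET NOT NULL;
ALTER TABLE employees ADD CONSTRAINT emp_pk PRIMARY KEY (id);

-- Parent table for the FK; the referenced column must be its primary key
-- (dept_pk is an illustrative constraint name)
CREATE TABLE IF NOT EXISTS to_departments (id INT, name STRING NOT NULL, CONSTRAINT dept_pk PRIMARY KEY (name));
ALTER TABLE employees ADD CONSTRAINT emp_dept_fk FOREIGN KEY (dept) REFERENCES to_departments(name);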
```

- **Bonus:** Insert another `id = 1` into employees. Does Databricks stop you? Why or why not?
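Because PK and FK constraints are informational, checking uniqueness and referential integrity is left to you. A minimal sketch of two validation queries, assuming `to_departments` exists with a `name` column as described above:

```sql
-- Duplicate ids that the informational PK does not prevent
SELECT id, COUNT(*) AS n
FROM employees
GROUP BY id
HAVING COUNT(*) > 1;

-- dept values with no match in the parent table (FK "orphans")
SELECT e.id, e.dept
FROM employees e
LEFT ANTI JOIN to_departments d ON e.dept = d.name
WHERE e.dept IS NOT NULL;
```

Queries like these can run as a scheduled data-quality check, since the engine will not raise an error on its own.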


### 8. Explore Constraints Metadata

Check how Databricks shows constraints:

```sql
DESCRIBE DETAIL employees;
SHOW TBLPROPERTIES employees;
DESCRIBE TABLE EXTENDED employees;  -- lists PK/FK constraints and column nullability
```

- **Question:** Which constraints are marked as "enforced"? Which as "not enforced"?
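One way to see an explicit enforced flag is the Unity Catalog information schema. This is a sketch that assumes `employees` lives in a Unity Catalog schema and that the query runs in the catalog containing it. Which constraint types are listed may depend on your workspace, so treat the output as something to explore rather than a fixed result:

```sql
-- Requires Unity Catalog; information_schema is resolved in the current catalog
SELECT constraint_name, constraint_type, enforced
FROM information_schema.table_constraints
WHERE table_name = 'employees';
```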


## Reflection/Discussion Questions

- What happens if the data in a table violates a constraint you try to add?
- Are NOT NULL and CHECK enforced for existing and new data?
- Are PRIMARY KEY and FOREIGN KEY constraints enforced by Databricks?
- Why is documenting intention still valuable even if PK/FK aren't enforced?

---

## 🗝️ Answer/Instructor Key

**Step 3:** Adding `NOT NULL` to `name` fails if existing NULLs are present. Databricks checks all current data before adding the constraint and raises an error if any rows already violate it.[^2][^3][^7]

**Step 4:** Adding a CHECK constraint (salary >= 0) fails if any existing row has a negative salary.[^3][^7][^2]

**Step 5:** If you remove bad data first, you can successfully add `NOT NULL` and `CHECK` constraints. Afterward, Databricks will block any inserts/updates violating these constraints.

**Step 6:** Inserting a NULL into `name` or a negative salary after the constraint is added causes the write to fail—the transaction errors out with a clear constraint violation message.

**Step 7:** **PRIMARY KEY and FOREIGN KEY constraints are informational only on Databricks:**[^1][^4][^6][^2]

- PK: You can still insert duplicate `id` values. Databricks does NOT stop you from violating this "constraint."
- FK: You can still insert `dept` values not present in the referenced table. These constraints are not enforced—they just declare intent for tools/optimizers.

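**Step 8:** `DESCRIBE DETAIL` and `SHOW TBLPROPERTIES` expose CHECK constraints as table properties (for example, `delta.constraints.chk_salary_nonnegative`). NOT NULL is recorded in the column schema rather than as a named constraint. PK/FK constraints are listed in the constraints section of `DESCRIBE TABLE EXTENDED`; these are the ones that are informational and not enforced.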

### Key Learning:

- **NOT NULL** and **CHECK**: Enforced. Existing rows are validated when the constraint is added, and every later write is checked.
- **PK/FK**: Metadata/documentation only, not enforced on data.
- **Databricks won’t let you add an enforced constraint if existing data violates it.**

***
