
# MCQs: Databricks Security & Governance

***

**Flashcard 1**
Q: What does RBAC stand for in Databricks and what are its key components?
A: Role-Based Access Control. Key components:

- Principals (users, groups, service principals)
- Securable objects (catalogs, schemas, tables, functions, volumes)
- Privileges (SELECT, CREATE, MODIFY, etc.)
- Groups, which play the part of roles by bundling privileges for their members

***

**Flashcard 2**
Q: How does privilege inheritance work in Unity Catalog?
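A: Privileges granted on a parent securable are inherited by its children: a grant on a catalog applies to every current and future schema, table, and view inside it, and a grant on a schema applies to everything in that schema. Grants are additive, so a principal's effective privileges on an object are the union of the grants on the object and on its parents. Unity Catalog has no `DENY` statement; to restrict access, revoke the broader grant or avoid granting it in the first place.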
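
The downward cascade can be pictured with a toy model in plain Python. This is only a sketch, not how Unity Catalog is implemented: the `grants` store and the functions `grant`, `self_and_parents`, and `effective_privileges` are hypothetical names, and the model ignores the `USE CATALOG` and `USE SCHEMA` privileges that a real principal also needs on the parents.

```python
# Toy grant store: (securable, principal) -> set of privileges.
grants = {}

def grant(privilege, securable, principal):
    """Record a privilege on a securable such as 'catalog.schema.table'."""
    grants.setdefault((securable, principal), set()).add(privilege)

def self_and_parents(securable):
    """'c.s.t' -> ['c.s.t', 'c.s', 'c']"""
    parts = securable.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

def effective_privileges(securable, principal):
    """Union of the grants on the object and on every parent level."""
    result = set()
    for level in self_and_parents(securable):
        result |= grants.get((level, principal), set())
    return result

# Hypothetical catalog, schemas, tables, and group.
grant("SELECT", "sales", "analysts")                    # catalog-level grant
grant("MODIFY", "sales.orders.line_items", "analysts")  # table-level grant

print(sorted(effective_privileges("sales.orders.line_items", "analysts")))  # ['MODIFY', 'SELECT']
print(sorted(effective_privileges("sales.returns.refunds", "analysts")))    # ['SELECT']
```

The catalog-level `SELECT` reaches both tables, while the table-level `MODIFY` stays on the one table where it was granted.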

***

**Flashcard 3**
Q: Which SQL command creates a group and which one adds a user to it?
A:

- Create group: `CREATE GROUP group_name;`
- Add user: `` ALTER GROUP group_name ADD USER `user@example.com`; `` (the user principal is a backtick-quoted identifier, not a string literal)

***

**Flashcard 4**
Q: What’s the difference between an enforced constraint and a metadata-only constraint?
A:

- Enforced (NOT NULL, CHECK): Validated at write time by Delta Lake; a write that contains a violating row fails.
- Metadata-only (PRIMARY KEY, FOREIGN KEY): Informational; recorded for governance and documentation, not enforced at runtime.

***

**Flashcard 5**
Q: How do you define a CHECK constraint on a Delta table?
A:

```sql
ALTER TABLE table_name
ADD CONSTRAINT chk_name CHECK (column_name condition);
```


***

**Flashcard 6**
Q: What happens when you add a NOT NULL or CHECK constraint to a table that already has violating data?
A: The `ALTER TABLE` fails; Delta checks existing data and refuses to add the constraint until all rows satisfy it.

***

**Flashcard 7**
Q: How do you capture data lineage automatically in Unity Catalog for a SQL table creation?
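A: No extra step is needed. Unity Catalog captures lineage automatically for queries that run on Unity Catalog-enabled compute, so a `CREATE TABLE ... AS SELECT ...` records its source and target tables in the lineage graph.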

***

**Flashcard 8**
Q: Which system table shows table-level lineage in Databricks?
A: `system.access.table_lineage`

***

**Flashcard 9**
Q: Which system table shows column-level lineage in Databricks?
A: `system.access.column_lineage`

***

**Flashcard 10**
Q: What is the purpose of watermarking in Structured Streaming?
A: Watermarks define how late data can arrive before being considered “too late.” They bound state size by allowing old state to be cleaned up after the watermark passes.

***

**Flashcard 11**
Q: How do you set a watermark on an event-time column?
A:

```python
# withWatermark returns a new DataFrame; it does not modify df in place.
watermarked = df.withWatermark("event_time", "10 minutes")
```


***

**Flashcard 12**
Q: What does `.dropDuplicates(["order_id","order_timestamp"])` do when used with a watermark?
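A: Drops repeated streaming rows that share the same `order_id` and `order_timestamp`. With a watermark on `order_timestamp`, Spark keeps deduplication state only for keys newer than the watermark, so state stays bounded, and rows that arrive later than the watermark are dropped.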
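
A toy model in plain Python (no Spark) can make the interaction between deduplication state and the watermark concrete. It is a simplification under stated assumptions: times are integer minutes, the watermark advances once per micro-batch to the maximum event time seen minus the delay, and the function name `dedupe_with_watermark` and the sample orders are hypothetical.

```python
def dedupe_with_watermark(batches, delay):
    """Simulate dropDuplicates on (order_id, event_time) under a watermark.

    batches: list of micro-batches, each a list of (order_id, event_time).
    Returns (emitted rows in order, deduplication state left at the end).
    """
    watermark = None        # no watermark until the first batch completes
    max_event_time = None   # largest event time seen so far
    state = set()           # keys remembered for duplicate detection
    emitted = []
    for batch in batches:
        for order_id, event_time in batch:
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
            if watermark is not None and event_time < watermark:
                continue    # too late: older than the watermark
            key = (order_id, event_time)
            if key in state:
                continue    # duplicate of a row already emitted
            state.add(key)
            emitted.append(key)
        if max_event_time is not None:
            # Advance the watermark between batches, then purge old state.
            watermark = max_event_time - delay
            state = {k for k in state if k[1] >= watermark}
    return emitted, state

batches = [
    [("A", 100), ("A", 100), ("B", 105)],  # second A is a duplicate
    [("C", 120), ("A", 100)],              # A is still in state: duplicate
    [("A", 100), ("D", 121)],              # A is now behind the watermark
]
emitted, state = dedupe_with_watermark(batches, delay=10)
print(emitted)        # [('A', 100), ('B', 105), ('C', 120), ('D', 121)]
print(sorted(state))  # [('C', 120), ('D', 121)]
```

The repeated `("A", 100)` is rejected twice for different reasons: first as a duplicate found in state, then as a late row after its state has been purged.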

***

**Flashcard 13**
Q: What is the difference between encryption at rest and encryption in transit?
A:

- At rest: Data is encrypted on disk (e.g., EBS, S3, Delta files)
- In transit: Data is encrypted over the network (e.g., TLS/SSL for cluster communication, JDBC)

***

**Flashcard 14**
Q: How do you enable encryption at rest for a Databricks AWS cluster’s EBS volumes?
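A: Encrypt them with a customer-managed AWS KMS key: register a key configuration for workspace storage at the account level and set `reuse_key_for_cluster_volumes` to `true` so the key also covers cluster EBS volumes. Separately, the cluster setting `enable_local_disk_encryption` encrypts data written to instance-local disks.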

***

**Flashcard 15**
Q: How does `foreachBatch` in Structured Streaming differ from `writeStream.toTable`?
A: `foreachBatch` runs arbitrary batch logic (e.g., `MERGE` upserts, multi-sink writes) on each micro-batch DataFrame. `writeStream.toTable` writes the stream directly to a table with no custom logic.
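
A minimal sketch of a `foreachBatch` handler that upserts each micro-batch with `MERGE`. The table `orders`, the view name `order_updates`, the key column `order_id`, and the function name `upsert_orders` are all hypothetical, and the handler assumes PySpark 3.3 or later, where `DataFrame.sparkSession` is available.

```python
MERGE_SQL = """
MERGE INTO orders AS t
USING order_updates AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

def upsert_orders(batch_df, batch_id):
    """Upsert one micro-batch into the (hypothetical) `orders` Delta table."""
    # Expose the micro-batch to SQL under a temporary view name.
    batch_df.createOrReplaceTempView("order_updates")
    # Run the MERGE on the session that owns this micro-batch.
    batch_df.sparkSession.sql(MERGE_SQL)

# Wiring, assuming `stream_df` is a streaming DataFrame and the
# checkpoint path is replaced with a real location:
# (stream_df.writeStream
#     .foreachBatch(upsert_orders)
#     .option("checkpointLocation", "/path/to/checkpoint")
#     .start())
```

If a micro-batch can contain several rows for the same `order_id`, deduplicate it inside the handler before the `MERGE`, since a merge fails when multiple source rows match one target row.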

***

**Flashcard 16**
Q: What does `trigger(availableNow=True)` do in a streaming query?
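A: Processes all data available when the query starts, in one or more micro-batches that still honor rate limits such as `maxFilesPerTrigger`, then stops the query. It suits scheduled incremental loads and backfills.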

***

**Flashcard 17**
Q: Name two advantages of using UDFs in Databricks.
A:

- Extend built-in functionality with custom logic
- Handle specialized business rules (e.g., encryption, parsing) not covered by native functions
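
As an illustration of a specialized business rule, here is a small masking function of the kind that could be wrapped as a UDF. The name `mask_email` and the masking rule are hypothetical; the registration call in the comment assumes an active `spark` session.

```python
def mask_email(email):
    """Keep the first character of the local part and the full domain."""
    if email is None or "@" not in email:
        return email  # leave nulls and non-email values untouched
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

print(mask_email("alice@example.com"))  # a***@example.com

# Possible registration for SQL use, assuming `spark` exists:
# spark.udf.register("mask_email", mask_email, "string")
```

Keeping the logic in an ordinary function means it can be unit-tested without a cluster before it is registered.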

***

**Flashcard 18**
Q: Why is it best practice to use groups rather than individual user grants?
A: Groups simplify management and ensure consistency. New users inherit group permissions automatically without individual grant changes.

***

**Flashcard 19**
Q: How can you monitor access to sensitive PII tables?
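A: Query `system.access.audit` for Unity Catalog events on the tables tagged as containing PII, filtering on `user_identity`, `action_name`, and the table name in `request_params`. Table reads appear as Unity Catalog actions such as `getTable`, not as literal `SELECT` events.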
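
A hedged sketch of such a query, built as a string in Python so it can be passed to `spark.sql` or a SQL warehouse client. The function name `pii_access_query` and the sample table are hypothetical, and the filter on `request_params['full_name_arg']` is an assumption about where the table name appears for table-level actions; check it against the audit rows in your own account.

```python
import re

# Accept only plain three-level names (catalog.schema.table) so the
# value can be embedded in the SQL text without quoting problems.
_FULL_NAME = re.compile(r"^[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")

def pii_access_query(table_full_name, days=7):
    """Build SQL listing recent Unity Catalog audit events for one table."""
    if not _FULL_NAME.match(table_full_name):
        raise ValueError(f"unexpected table name: {table_full_name!r}")
    return f"""
SELECT event_time, user_identity.email AS user_email, action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND request_params['full_name_arg'] = '{table_full_name}'  -- assumed key
  AND event_time >= current_timestamp() - INTERVAL {int(days)} DAYS
ORDER BY event_time DESC
"""

print(pii_access_query("main.hr.employees", days=30))
```

Rejecting anything that is not a plain three-level name keeps the helper from building SQL out of untrusted input.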

***

**Flashcard 20**
Q: What is Zero Trust, and how can it be implemented in Databricks?
A: Zero Trust assumes no user, device, or network is trusted by default and verifies every access request. In Databricks, combine IP access lists, MFA through the identity provider, least-privilege role-based grants, risk-based policies, and network isolation (VPC, private endpoints).

