1. Core Concepts and Architecture (Fundamentals)

These questions test the candidate's understanding of what UC is and where it sits in the architecture.

1. What is Unity Catalog (UC)?

* Explain the primary goal and value proposition of Unity Catalog compared to the legacy Hive Metastore.

2. The Three-Level Namespace:

* Describe the three-level namespace structure introduced by UC. How is a table referenced in a UC-enabled environment?

3. Metastore and Cloud Storage:

* How does the UC Metastore interact with cloud storage (e.g., ADLS, S3, GCS)? Where is the actual data stored, and what does the Metastore manage?

* What is the Managed Storage location, and what is its purpose?

4. Security Model:

* Explain the UC security model. Which objects can you grant permissions on, and what is the concept of a "securable object"?

5. Data Isolation:

* How does UC fundamentally improve data isolation and prevent configuration drift compared to workspace-level access controls?

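Explain the primary goal and value proposition of Unity Catalog compared to the legacy Hive Metastore.
* The primary goal of Unity Catalog (UC) is centralized governance across an organization's entire Databricks environment. The legacy Hive Metastore is scoped to a single workspace, so access rules must be configured and maintained workspace by workspace. UC instead defines access control, auditing, lineage, and data discovery once at the account level and applies them to every attached workspace, which simplifies security and keeps it consistent.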


* Describe the three-level namespace structure introduced by UC. How is a table referenced in a UC-enabled environment?
The three-level namespace provides a hierarchical, unified way to organize and reference data objects across all Databricks workspaces linked to a single UC Metastore.

The three levels are:

1. **Catalog (Highest Level):** The primary unit of data organization, often aligned with software environments (dev, staging, prod), business units, or major projects. Catalogs are the first unit of isolation for data and permissions.

2. **Schema (or Database) (Middle Level):** Contained within a Catalog. Used for logical grouping of data assets, typically by data quality stage (e.g., bronze, silver, gold) or by application domain.

3. **Table, View, or Volume (Lowest Level):** The actual data object contained within a Schema.

**How a Table is Referenced**

In a UC-enabled environment, a table is referenced by its fully qualified name, catalog.schema.table. A shorter name resolves only against the session's current catalog and schema (set with USE CATALOG and USE SCHEMA), so the fully qualified form is the unambiguous one.

Example: to reference a customer table in the production environment's gold layer, the name would be prod.gold.customer_dim.
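
As a minimal sketch of how the three levels combine into one name, the helper below joins them and backtick-quotes any part that contains special characters. The function names are hypothetical and not part of any Databricks API; the quoting rule assumes standard Spark SQL backtick escaping.

```python
import re

# Identifiers made only of letters, digits, and underscores need no quoting.
_PLAIN_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier only when it contains special characters."""
    if _PLAIN_IDENTIFIER.match(name):
        return name
    # Inside backticks, a literal backtick is escaped by doubling it.
    return "`" + name.replace("`", "``") + "`"


def full_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into catalog.schema.table."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


print(full_table_name("prod", "gold", "customer_dim"))  # prod.gold.customer_dim
```

In a notebook attached to a UC-enabled cluster, the resulting string would typically be passed to something like spark.table(...), assuming a `spark` session is available.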

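**Objects You Can Grant Permissions On**

A securable object is any object in the Metastore on which privileges can be granted to a principal (a user, group, or service principal). Privileges are managed across the three-level namespace hierarchy and other related objects, and a privilege granted on a parent object is inherited by the objects it contains:

* Metastore: The root object. Privileges granted here govern metastore-wide actions (e.g., CREATE CATALOG).
* Catalog: The primary unit of isolation. Privileges granted here are inherited by all contained schemas and tables (e.g., CREATE SCHEMA, USE CATALOG).
* Schema (or Database): Contains tables, views, and volumes. Privileges granted here are inherited by all contained data assets (e.g., CREATE TABLE, USE SCHEMA).
* Table/View: The data assets themselves (e.g., SELECT, MODIFY). MODIFY covers inserting, updating, and deleting rows; there is no separate DELETE privilege.
* Volumes: For non-tabular data access (e.g., READ VOLUME, WRITE VOLUME).
* External Locations/Storage Credentials: Control access to external cloud storage paths (e.g., READ FILES, WRITE FILES, and CREATE EXTERNAL TABLE on an External Location; CREATE EXTERNAL LOCATION on a Storage Credential).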
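
To make the hierarchy concrete: reading a single table requires a privilege at each level, USE CATALOG on the catalog, USE SCHEMA on the schema, and SELECT on the table. The sketch below only builds the corresponding GRANT statements as strings; the helper name and the `analysts` group are assumptions for illustration.

```python
def read_access_grants(catalog: str, schema: str, table: str, principal: str) -> list:
    """Build the three GRANT statements a principal needs to read one table."""
    who = f"`{principal}`"  # principals are backtick-quoted in GRANT statements
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


# `analysts` is a hypothetical account-level group.
for statement in read_access_grants("prod", "gold", "customer_dim", "analysts"):
    print(statement)
```

Each statement could then be run by a principal allowed to grant on that object, for example through spark.sql(statement) in a notebook.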

* Data Isolation

1. Centralized Identity and Policy (Data Isolation)
The fundamental improvement is the shift in the security boundary:

* Legacy Model (Workspace-Level): Access controls were tied to the individual workspace and often relied on DBFS mounts (which used cloud credentials). A user might have different permissions or roles in Workspace A versus Workspace B, and data access was granted at the infrastructure level (the mount point).

* Unity Catalog Model (Account-Level): UC is managed at the Databricks account level, with one Metastore per region shared by every workspace attached to it.

* Identity: Access is managed through central Account-Level Users and Groups.

* Policy: Security policies (the GRANT statements) are defined once on the Catalog, Schema, or Table object and are enforced uniformly across every workspace that connects to that Metastore. If a user is granted SELECT access on prod.gold.table_a, that permission is valid in any attached workspace they log into.
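
The toy model below illustrates the account-level idea: grants live in one shared metastore object, so every workspace attached to it gives the same answer. It is a simplified sketch, not how UC is implemented; the class names are hypothetical, and it ignores the USE CATALOG and USE SCHEMA privileges that real UC also requires.

```python
from dataclasses import dataclass, field


@dataclass
class Metastore:
    """Toy account-level metastore: one grant table shared by all workspaces."""
    # Maps (principal, privilege) -> set of securable names such as "prod.gold".
    grants: dict = field(default_factory=dict)

    def grant(self, principal: str, privilege: str, securable: str) -> None:
        self.grants.setdefault((principal, privilege), set()).add(securable)

    def is_allowed(self, principal: str, privilege: str, securable: str) -> bool:
        granted = self.grants.get((principal, privilege), set())
        parts = securable.split(".")
        # A grant on the object itself or on any parent (schema, catalog) applies.
        return any(".".join(parts[:i]) in granted for i in range(1, len(parts) + 1))


@dataclass
class Workspace:
    name: str
    metastore: Metastore

    def can_select(self, principal: str, table: str) -> bool:
        # The workspace holds no policy of its own; it asks the metastore.
        return self.metastore.is_allowed(principal, "SELECT", table)


uc = Metastore()
uc.grant("analysts", "SELECT", "prod.gold")
workspace_a = Workspace("workspace-a", uc)
workspace_b = Workspace("workspace-b", uc)

print(workspace_a.can_select("analysts", "prod.gold.table_a"))    # True
print(workspace_b.can_select("analysts", "prod.gold.table_a"))    # True
print(workspace_b.can_select("analysts", "prod.silver.table_b"))  # False
```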

2. Eliminating Configuration Drift
"Configuration drift" is the state in which security settings, data schemas, or mount points become inconsistent across workspaces. UC prevents this by:

* Three-Level Namespace Enforcement: Every UC object has exactly one fully qualified name (catalog.schema.table) that resolves identically in every workspace attached to the Metastore. This formalizes the logical organization of data, so workspaces cannot register the "same" data under different names or security contexts.

* Decoupling Storage from Compute: In the legacy model, permissions were often managed via DBFS mounts, which mixed compute configuration with security settings. UC replaces mounts with External Locations and Storage Credentials.

* Single Source of Truth: UC acts as the single source of truth for data access and holds the credentials needed to reach the cloud storage.

* No Backdoor Paths: Workspaces do not reach UC-managed data through direct cloud paths (like s3://...) or old DBFS mounts, so all access goes through the central policies defined in the Metastore. This prevents local overrides and unmanaged backdoor access paths.

In summary, by shifting the control plane from the individual workspace to the central account Metastore, Unity Catalog enforces consistency and uniformity, making the entire data environment auditable and eliminating the "snowflake" problem of differing security settings between workspaces.

2. Administration and Setup (Configuration)

These questions focus on the practical deployment and configuration of UC.

**Admin Roles:**

* What is the difference between the Metastore Admin and the Account Admin in the context of Unity Catalog?

**External Locations and Credentials:**

* What are External Locations and Storage Credentials? Why are they mandatory for external tables in UC, and what security principle do they enforce?

External Locations and Storage Credentials

1. What are External Locations?

An External Location is a named, securable object in Unity Catalog that pairs a specific path in your cloud storage (e.g., an S3 bucket or an ADLS container) with the Storage Credential that authorizes access to it.

Role: It acts as the governed "root" path that determines where users and processes can access data outside of UC's managed storage.

2. What are Storage Credentials?

Storage Credentials are also securable objects in Unity Catalog. A Storage Credential encapsulates the long-lived, sensitive authentication information needed to access cloud storage, and External Locations reference it.

* Role: A credential typically takes the form of an AWS IAM role, an Azure managed identity or service principal, or a GCP service account that has the necessary read/write permissions on the underlying storage.

Why They are Mandatory for External Tables in UC

External Locations and Storage Credentials are mandatory for creating External Tables in Unity Catalog because they are how UC, rather than the cluster or the individual user, holds and delegates access to storage.

* Mandatory: When you define an External Table, its LOCATION path must fall under an existing External Location, which in turn relies on a Storage Credential.

* The Problem They Solve: Without them, Databricks would either need to rely on the cluster's broad instance profile (legacy, insecure) or require users to manually input sensitive cloud credentials (unmanageable, insecure).
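
The "must fall under an External Location" rule is essentially a path-prefix check. The sketch below shows one way to express it; the bucket, the location names, and the helper functions are all hypothetical, and real UC performs this validation itself when the table is created.

```python
from typing import Optional

# Hypothetical external locations: name -> cloud storage URL.
EXTERNAL_LOCATIONS = {
    "landing_zone": "s3://example-bucket/landing",
    "finance_raw": "s3://example-bucket/finance/raw",
}


def is_under(location_url: str, path: str) -> bool:
    """True if `path` equals `location_url` or lies beneath it."""
    base = location_url.rstrip("/") + "/"
    # Compare with trailing slashes so "landing" does not match "landing-old".
    return (path.rstrip("/") + "/").startswith(base)


def covering_location(path: str, locations: dict) -> Optional[str]:
    """Return the name of an external location that covers `path`, or None."""
    for name, url in locations.items():
        if is_under(url, path):
            return name
    return None


print(covering_location("s3://example-bucket/landing/orders/2024", EXTERNAL_LOCATIONS))  # landing_zone
print(covering_location("s3://example-bucket/landing-old/orders", EXTERNAL_LOCATIONS))   # None
```

A path with no covering location is one where an External Table could not be created.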

Security Principle Enforced

The primary security principles enforced by this mechanism are Separation of Duties and Least Privilege Access:

* Separation of Duties: A Metastore Admin (or another principal with the CREATE STORAGE CREDENTIAL privilege) creates the Storage Credential. A different user (the Data Engineer) then creates the External Location referencing that credential. A third user (the Data Analyst) is granted only CREATE EXTERNAL TABLE on the External Location and never sees the underlying sensitive credential.

* Least Privilege Access: When a user queries an External Table, UC uses the associated, securely managed Storage Credential (IAM role, managed identity, or service principal) to generate temporary, short-lived tokens that grant the requesting cluster just enough permission to read or write the necessary files. This prevents users and processes from having persistent, broad access to the entire cloud storage bucket.

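
The separation-of-duties chain can be sketched as a sequence of SQL statements, each run by a different role. The helper below only assembles the statements as strings; all object names, group names, and the table columns are assumptions, the Storage Credential is assumed to exist already, and the analyst would also need the usual USE CATALOG, USE SCHEMA, and CREATE TABLE privileges on the target schema.

```python
def external_table_setup(credential: str, location: str, url: str,
                         engineers: str, analysts: str, table: str) -> list:
    """Return (role, SQL) steps for the separation-of-duties chain."""
    return [
        # 1. The admin lets engineers build locations on top of the credential.
        ("metastore admin",
         f"GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL {credential} "
         f"TO `{engineers}`"),
        # 2. The engineer maps a storage path to the credential.
        ("data engineer",
         f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} URL '{url}' "
         f"WITH (STORAGE CREDENTIAL {credential})"),
        # 3. The engineer lets analysts create tables under that path only.
        ("data engineer",
         f"GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION {location} "
         f"TO `{analysts}`"),
        # 4. The analyst creates the table without ever seeing the credential.
        ("data analyst",
         f"CREATE TABLE {table} (id BIGINT, amount DOUBLE) LOCATION '{url}/orders'"),
    ]


steps = external_table_setup(
    credential="finance_cred",          # hypothetical, assumed to exist
    location="finance_raw",
    url="s3://example-bucket/finance/raw",
    engineers="data_engineers",
    analysts="data_analysts",
    table="prod.bronze.orders",
)
for role, sql in steps:
    print(f"[{role}] {sql}")
```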
**Catalogs and Schemas:**

* When designing a new data platform, how would you decide whether to create a new Catalog versus a new Schema (within an existing Catalog)? Give examples (e.g., environment separation, business domains).

**Cluster Access Modes:**

* What are the two primary Unity Catalog cluster access modes (Single User and Shared), and which would you choose for production workloads that use UC, and why?

**Data Sharing:**

* Explain Delta Sharing and its relationship to Unity Catalog. How does UC enable external data sharing without replicating the data?

Delta Sharing and Unity Catalog

Delta Sharing is an open protocol developed by Databricks that enables secure sharing of live data stored in Delta Lake tables with external organizations or platforms. It is designed to overcome the limitations of traditional data sharing methods (like FTP, replication, or proprietary APIs).

Relationship with Unity Catalog (UC)

Unity Catalog is the governance layer that provides the control plane for Delta Sharing. UC manages the metadata, identity, and audit logs required to make sharing secure and simple:

* Identity and Security: UC registers the external Recipients (the organizations or entities receiving the data). It issues secure, 
time-bound tokens or uses cloud identity mechanisms to authenticate the recipient.

* Centralized Definition: UC manages the definition of the Shares. A Share is a logical grouping of tables (datasets) that you intend to 
share.

* Auditing and Lineage: All data access requests made by external recipients are logged and audited within Unity Catalog, leveraging its comprehensive lineage and activity tracking capabilities.
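
On the provider side, Shares and Recipients are ordinary UC objects defined with SQL. The sketch below assembles the usual sequence of statements as strings; the share, recipient, and table names are hypothetical.

```python
def share_statements(share: str, tables: list, recipient: str) -> list:
    """Build the provider-side SQL that defines a share and grants it."""
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Each shared table is added by its fully qualified UC name.
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements


for statement in share_statements(
    share="sales_share",                # hypothetical share name
    tables=["prod.gold.customer_dim"],
    recipient="partner_co",             # hypothetical recipient name
):
    print(statement)
```

No data moves when these statements run; they only record in UC what is shared and with whom.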

How UC Enables Sharing Without Replicating Data

Delta Sharing's greatest value proposition is its ability to share data in place without replication. This is achieved through the following steps, all governed by UC:

* Authentication: When a recipient requests data (e.g., runs a SQL query), the Delta Sharing server (managed by Databricks/UC) securely authenticates the request.

* Metadata Exchange: The server sends the recipient only metadata, specifically short-lived, pre-signed URLs pointing directly to the underlying Parquet data files of the Delta table in the provider's cloud storage.

* Client-Side Read: The recipient's system (whether it's a Databricks environment, a competing platform, or an open-source client like pandas) uses these temporary URLs to read the data files directly from the provider's cloud storage.

* No Copying: Since the recipient reads directly from the source files, there is no need to copy, export, or move the data. This saves storage costs, avoids replication lag, and ensures the recipient always sees the most up-to-date version of the Delta table.

In summary, Unity Catalog acts as the central authorization and control authority that dictates who can access what data via the open Delta Sharing protocol, making data sharing governed, simple, and zero-copy.
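
On the recipient side, the open-source Python connector addresses a shared table with a URL of the form profile-file#share.schema.table, where the profile file holds the endpoint and token issued by the provider. The sketch below only builds that URL; the profile path and the share, schema, and table names are hypothetical.

```python
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table URL expected by the delta-sharing Python connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


url = sharing_table_url("config.share", "sales_share", "gold", "customer_dim")
print(url)  # config.share#sales_share.gold.customer_dim

# With the connector installed (pip install delta-sharing) and a real profile
# file from the provider, the table could then be read in place, for example:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```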