Creating tables in a Lakehouse (such as in Microsoft Fabric) involves a few steps, depending on whether you’re creating them from scratch, from files, or from existing datasets.

# Step 1 – Open or Create a Lakehouse

1. Go to Microsoft Fabric Portal → Sign in with your Microsoft 365 account.
2. On the left menu, click Workspaces → Open your desired workspace.
3. Click New → Select Lakehouse (you’ll find it under Data Engineering).
4. Give it a name → Click Create.

Once it opens, you’ll see two main sections:

- Tables – for structured tables stored in Delta format.

- Files – for raw or staged files in any format.

# Step 2 – Understand the Table Types

In a Lakehouse, you can have:

Managed Tables

- Stored in the Tables section.

- Backed by Delta Lake (ACID transactions).

- Can be queried directly via SQL endpoint.

External Tables

- Reference data in the Files folder (or external OneLake/ADLS location).

- Useful for large raw datasets without moving them.
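As a rough illustration of the difference, the sketch below writes the same DataFrame either as a managed table or as an external table whose data sits under Files. The helper name `save_table` and the path `Files/external/sales_raw` are made up for this example. It assumes `df` is a PySpark DataFrame from a Fabric notebook and relies on standard Spark behavior, where supplying a `path` option registers the table over data at that location.

```python
def save_table(df, name, external_path=None):
    """Write df as a Delta table; managed unless external_path is given.

    Hypothetical helper for illustration. `df` is expected to be a
    PySpark DataFrame, e.g. one created in a Fabric notebook.
    """
    writer = df.write.format("delta")
    if external_path is not None:
        # With a "path" option, Spark registers the table over the data at
        # that location instead of managing the storage itself.
        writer = writer.option("path", external_path)
    writer.saveAsTable(name)
    return "external" if external_path is not None else "managed"
```

In a notebook this might be called as `save_table(df, "SalesData")` for a managed table, or `save_table(df, "SalesRaw", "Files/external/sales_raw")` for an external one.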

# Step 3 – Ways to Create a Table

### A. Create from the UI (File or Blank Table)

In the Lakehouse, click New Table → Choose From file or Blank Table.

From File:

- Select a file from:

    - Local computer
     
    - OneLake
     
    - Other cloud storage (via shortcut)

- Supported formats: CSV, TSV, Parquet, JSON, Delta.

- Choose:

    - Table name
    
    - Delimiter (for CSV/TSV)
    
    - Whether the first row is headers

    - Schema mapping (data types)

- Click Create.
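The choices in the From File dialog correspond roughly to options on Spark's CSV reader, so the same load can be scripted. This is a minimal sketch, not what the UI runs internally: `csv_options` and `load_file_as_table` are hypothetical helper names, and `spark` is assumed to be the session a Fabric notebook provides.

```python
def csv_options(delimiter=",", first_row_is_header=True, infer_types=True):
    """Translate the dialog's choices into Spark CSV reader options."""
    return {
        "sep": delimiter,
        "header": str(first_row_is_header).lower(),
        "inferSchema": str(infer_types).lower(),
    }


def load_file_as_table(spark, path, table_name, **choices):
    """Read a delimited file and save it as a Delta table.

    `choices` are forwarded to csv_options (delimiter, header, inference).
    """
    df = spark.read.options(**csv_options(**choices)).csv(path)
    df.write.format("delta").saveAsTable(table_name)
    return df
```

For a tab-separated file without a header row, the call could look like `load_file_as_table(spark, "Files/sales.tsv", "SalesData", delimiter="\t", first_row_is_header=False)`; the file name here is only an example.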

Blank Table:

- Define table name.

- Add columns:

    - Name
     
    - Data type (STRING, INT, DATE, etc.)
    
    - Nullable or not

- Click Create.
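The column list entered for a blank table (name, data type, nullable) is the same information a `CREATE TABLE` statement carries. The sketch below builds such a statement from a list of tuples; `create_table_sql` is a hypothetical helper, and the column definitions are borrowed from the SalesData example used elsewhere in this guide.

```python
def create_table_sql(table_name, columns):
    """Build a Spark SQL CREATE TABLE statement for a Delta table.

    columns: iterable of (name, data_type, nullable) tuples.
    """
    column_defs = []
    for name, data_type, nullable in columns:
        suffix = "" if nullable else " NOT NULL"
        column_defs.append(f"    {name} {data_type}{suffix}")
    body = ",\n".join(column_defs)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{body}\n)\nUSING DELTA"


ddl = create_table_sql("SalesData", [
    ("SaleID", "INT", False),
    ("ProductName", "STRING", True),
    ("Quantity", "INT", True),
    ("SaleDate", "DATE", True),
])
print(ddl)
# In a notebook, the statement could then be run with: spark.sql(ddl)
```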

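### B. Create via SQL (Spark SQL)

- The Lakehouse SQL analytics endpoint is read-only for tables: it can query them but cannot create them. Run table DDL as Spark SQL instead, for example in a notebook cell that starts with `%%sql`.

- Run SQL DDL commands:

```sql
CREATE TABLE SalesData (
    SaleID INT,
    ProductName STRING,
    Quantity INT,
    SaleDate DATE
)
USING DELTA;
```

- USING DELTA ensures ACID and versioning support.

- This creates a managed table in the Tables folder.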
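Versioning here means that each write to a Delta table produces a new table version that can be listed and read back. The sketch below only builds the two SQL statements involved; `history_sql` and `version_sql` are hypothetical helpers, and the `VERSION AS OF` syntax is assumed to be supported by the Delta Lake release in your Spark runtime.

```python
def history_sql(table_name):
    """Statement that lists the versions (commits) of a Delta table."""
    return f"DESCRIBE HISTORY {table_name}"


def version_sql(table_name, version):
    """Statement that reads the table as it was at a given version."""
    if version < 0:
        raise ValueError("Delta table versions start at 0")
    return f"SELECT * FROM {table_name} VERSION AS OF {int(version)}"


print(history_sql("SalesData"))
print(version_sql("SalesData", 0))
# In a notebook: spark.sql(history_sql("SalesData")).show()
```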

### C. Create via Notebook (PySpark / SparkSQL)

1. Click New Notebook in the Lakehouse.

2. Run:

```python
# Sample data
data = [(1, "Laptop", 5, "2025-08-15"),
        (2, "Mouse", 10, "2025-08-16")]
columns = ["SaleID", "ProductName", "Quantity", "SaleDate"]

# Create Spark DataFrame
df = spark.createDataFrame(data, columns)

# Save as table
df.write.format("delta").saveAsTable("SalesData")
```

Or using Spark SQL:

```sql
CREATE TABLE SalesData
USING DELTA
AS SELECT 1 AS SaleID, 'Laptop' AS ProductName, 5 AS Quantity, DATE('2025-08-15') AS SaleDate;
```
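When `createDataFrame` is given only column names, Spark infers types from the Python values, so a date written as a string stays a string column and Python integers become 64-bit longs. One way to get the same column types as the DDL examples is to pass an explicit schema. The sketch below assumes the sample rows from the PySpark example; `typed_rows` and `SALES_SCHEMA` are names invented for illustration.

```python
from datetime import date

# DDL-style schema string; createDataFrame accepts this form.
SALES_SCHEMA = "SaleID INT, ProductName STRING, Quantity INT, SaleDate DATE"


def typed_rows(raw_rows):
    """Convert the ISO date string in each row to a datetime.date."""
    return [(sale_id, product, qty, date.fromisoformat(day))
            for sale_id, product, qty, day in raw_rows]


raw = [(1, "Laptop", 5, "2025-08-15"),
       (2, "Mouse", 10, "2025-08-16")]
rows = typed_rows(raw)

# In a notebook, where `spark` is predefined:
# df = spark.createDataFrame(rows, schema=SALES_SCHEMA)
# df.write.format("delta").saveAsTable("SalesData")
```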


# Step 4 – Verify the Table

- In the Tables pane, your new table will appear.

- Click it → View Schema and Sample Data.

- You can query it in:

    - SQL Endpoint

    - Notebooks

    - Power BI (direct Lakehouse connection)
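The same check can be done from a notebook by comparing the table's column types against what you expected to create. This is a sketch under assumptions: `schema_mismatches` and `EXPECTED` are hypothetical names, the expected types follow the SalesData DDL shown earlier, and `spark.catalog.tableExists` requires a reasonably recent PySpark (3.3 or later).

```python
def schema_mismatches(expected, actual):
    """Return {column: (expected_type, actual_type)} for every difference.

    A type of None means the column is missing on that side.
    """
    problems = {}
    for name in sorted(set(expected) | set(actual)):
        want, got = expected.get(name), actual.get(name)
        if want != got:
            problems[name] = (want, got)
    return problems


# Type names as reported by DataFrame.dtypes (lowercase Spark type strings).
EXPECTED = {"SaleID": "int", "ProductName": "string",
            "Quantity": "int", "SaleDate": "date"}

# In a notebook, where `spark` is predefined:
# assert spark.catalog.tableExists("SalesData")
# actual = dict(spark.table("SalesData").dtypes)
# print(schema_mismatches(EXPECTED, actual))
```

An empty result means the table matches; anything else lists the columns to look at.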

# Step 5 – Best Practices

- Use Delta format for all analytical workloads.

- Keep schema consistent: Delta enforces the table schema on write, so appends with mismatched columns or types fail unless schema evolution is explicitly enabled.

- For ingestion automation, use:

    - Dataflows Gen2

    - Pipelines

    - Notebook scripts

- Avoid storing raw files directly in Tables folder; use the Files area for staging.
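For notebook-based ingestion, one possible convention is to stage files under a folder in Files and derive each table name from the file name. The sketch below shows only the naming step; `table_name_for` is a hypothetical helper and `Files/staging/` is an assumed folder layout, not something Fabric prescribes.

```python
import re
from pathlib import PurePosixPath


def table_name_for(staged_path):
    """Derive a safe table name from a staged file path."""
    stem = PurePosixPath(staged_path).stem
    # Keep letters, digits and underscores; replace everything else.
    name = re.sub(r"[^0-9A-Za-z_]", "_", stem).strip("_")
    if not name:
        raise ValueError(f"cannot derive a table name from {staged_path!r}")
    if name[0].isdigit():
        # Prefix names that would otherwise start with a digit.
        name = f"t_{name}"
    return name


print(table_name_for("Files/staging/sales.csv"))
# In a notebook, each staged file could then be loaded with something like:
# df = spark.read.option("header", "true").csv(path)
# df.write.format("delta").mode("append").saveAsTable(table_name_for(path))
```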

# Step 6 – Example Full Workflow

1. Create Lakehouse → Upload a CSV (sales.csv) into Files.

2. Open Notebook → Load CSV as DataFrame:

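```python
# inferSchema lets Spark detect numeric columns instead of loading everything as strings
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Files/sales.csv"))
```

3. Write it as a managed table:

```python
df.write.format("delta").saveAsTable("SalesData")
```

4. Switch to SQL Endpoint:

```sql
SELECT * FROM SalesData WHERE Quantity > 5;
```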
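CSV columns load as strings unless types are inferred, so a filter such as `Quantity > 5` depends on how the column was typed at load time. One option is to cast each column explicitly before writing the table. The sketch below assumes the header row of sales.csv uses the column names from the earlier SalesData examples, which the source does not state; `cast_exprs`, `csv_to_table`, and `SALES_TYPES` are hypothetical names.

```python
def cast_exprs(column_types):
    """Build one CAST expression per column for DataFrame.selectExpr."""
    return [f"CAST(`{name}` AS {sql_type}) AS `{name}`"
            for name, sql_type in column_types.items()]


# Assumed column names and types for sales.csv (not given in the source).
SALES_TYPES = {"SaleID": "INT", "ProductName": "STRING",
               "Quantity": "INT", "SaleDate": "DATE"}


def csv_to_table(spark, csv_path, table_name, column_types):
    """Load a CSV with a header row, cast its columns, save as a Delta table."""
    raw = spark.read.format("csv").option("header", "true").load(csv_path)
    typed = raw.selectExpr(*cast_exprs(column_types))
    typed.write.format("delta").saveAsTable(table_name)
    return typed
```

In a notebook the whole workflow then reduces to `csv_to_table(spark, "Files/sales.csv", "SalesData", SALES_TYPES)`, followed by the query above.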

