This is a step-by-step guide to creating a Data Factory pipeline in Microsoft Fabric, covering the prerequisites, components, and configuration. It closes with an example use case that ties the steps together.

# 1) Prerequisites & setup

1. Fabric workspace & role

You need a Fabric workspace (trial or capacity) and permissions to create/edit items (Admin, Member, or Contributor typically). See Fabric workspace roles/permissions. 

2. Connections (the Fabric replacement for ADF “linked services”)

In Fabric, connections are managed centrally under Settings → Manage connections and gateways. You’ll create reusable connections here (SQL, ADLS Gen2, Blob, Salesforce, etc.).

3. Credentials & secret storage

Connections support auth types like OAuth2 and Service Principal. Fabric also supports Azure Key Vault–stored secrets for connection authentication (preview).

4. On-premises / private network sources

If your source is on-premises or in a private network, set up an on-premises data gateway or a virtual network (VNet) data gateway.

# 2) Create your first pipeline

1. In the Fabric portal, switch to the Data Factory experience.

2. New → Data pipeline. Give it a descriptive name (e.g., pl_ingest_sales_daily).

3. You’ll land on the pipeline canvas (activities are available from the Activities tab; the properties pane sits below the canvas). The exact layout can differ between Fabric releases.

Microsoft’s “Create your first pipeline” quickstart shows the exact UI and where things live.

# 3) Add a Copy activity (fastest: Copy Assistant)

Most pipelines start by moving data. The Copy Assistant wizard builds a Copy activity for you:

1. On the canvas, select Copy data → Use copy assistant. 

2. Source

    Pick your source type, then choose an existing connection or Create new connection. Select the objects (tables/files). 

3. Destination

    Choose your destination (e.g., Lakehouse, Data Warehouse, ADLS, Blob, etc.). Map to a table or file/folder path. 


4. Review → Finish to drop the activity on the canvas. You can fine-tune in the activity tabs (Source, Destination, Mapping, Settings). 

Key Copy settings to know (Settings tab):

    Throughput (aka intelligent throughput / DIUs), parallelism, fault tolerance, staging, logging, compression. Fabric documents each knob and the JSON equivalents. 


Gateway rule: if you copy between two on-premises data stores, both ends must use the same gateway. Otherwise, stage the data in the cloud and copy it in two hops.

# 4) Transform data (optional)

After (or before) Copy, add transformation activities:

    - Notebook (Spark on Fabric) for Python/Scala data prep.

    - Dataflow Gen2 for low-code transformations. (Listed under Data transformation activities.)

    - SQL Script / Stored Procedure to push work down to SQL engines.


Add these from Activities → Data transformation; configure in the properties pane. See Fabric’s Activity overview for what’s available (including HDInsight, Spark Job Definition).

# 5) Orchestrate with control flow

Fabric has a rich set of control activities (very similar to ADF):

    - If Condition, Switch, ForEach, Until, Wait, Set/Append Variable, Get Metadata, Lookup, Web/Webhook, Invoke pipeline, etc. 

Typical orchestration pattern:

    - Lookup list of tables/files → ForEach over the list → inside loop Copy (and branch If on row counts), finally Invoke pipeline to modularize big workflows.
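
The control flow of this pattern can be sketched in plain Python. This is only a simulation, not Fabric code: `lookup_tables`, the `copy_table` callback, and the row counts are hypothetical stand-ins for the Lookup, Copy, and If Condition activities.

```python
from typing import Callable


def lookup_tables() -> list[dict]:
    """Stand-in for a Lookup activity: returns the list the ForEach iterates."""
    return [
        {"schema": "Sales", "table": "SalesOrderHeader"},
        {"schema": "Sales", "table": "SalesOrderDetail"},
    ]


def run_pipeline(copy_table: Callable[[dict], int]) -> dict[str, str]:
    """Lookup -> ForEach -> Copy -> If Condition on the rows copied."""
    results = {}
    for item in lookup_tables():            # ForEach over the Lookup output
        rows_copied = copy_table(item)      # Copy activity (hypothetical callback)
        name = f"{item['schema']}.{item['table']}"
        if rows_copied > 0:                 # If Condition: branch on row count
            results[name] = "loaded"
        else:
            results[name] = "empty"
    return results


# Hypothetical row counts so the sketch runs without any data source.
fake_counts = {"SalesOrderHeader": 100, "SalesOrderDetail": 0}
print(run_pipeline(lambda item: fake_counts[item["table"]]))
```

In a real pipeline each iteration would be a Copy activity inside the ForEach container, and the branch would be an If Condition reading the Copy activity's output.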

# 6) Parameterization & dynamic content (must-have)

Define pipeline parameters on the pipeline's Parameters tab (select a blank area of the canvas if the tab is not showing) and use them across activities (e.g., file paths, table names).

In activity fields, choose Add dynamic content to reference @pipeline().parameters.MyParam, variables, or system values.

For Copy to Workspace items (Lakehouse/Data Warehouse/KQL DB), Fabric shows how to pass the object ID via parameters (handy for promoting between workspaces/environments). 

#### Pro tip – common expressions:

Build a dated folder: 
    - @concat('/landing/sales/date=', formatDateTime(utcNow(),'yyyy-MM-dd'), '/')

Use Lookup result: 
    - @activity('GetTables').output.value

Use ForEach item: 
    - @item()
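
To check what the dated-folder expression should produce, the same string can be built in Python. This is a rough equivalent for reasoning about the output, not the Fabric expression engine; the function name `dated_folder` is made up for this sketch.

```python
from datetime import datetime, timezone
from typing import Optional


def dated_folder(now: Optional[datetime] = None) -> str:
    """Mirror @concat('/landing/sales/date=', formatDateTime(utcNow(),'yyyy-MM-dd'), '/')."""
    now = now or datetime.now(timezone.utc)  # utcNow()
    return "/landing/sales/date=" + now.strftime("%Y-%m-%d") + "/"


print(dated_folder(datetime(2024, 1, 5, tzinfo=timezone.utc)))
# /landing/sales/date=2024-01-05/
```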

# 7) Scheduling vs. event-based triggers
### A) Wall-clock schedules (every N minutes/hours/days/weeks)

In the pipeline editor, select Home → Schedule and define frequency, start/end dates, and time zone. (You can add multiple schedules per pipeline in the latest experience; schedule definitions live with the pipeline.) 

The Pipeline runs concept article shows the Schedule button and on-demand runs. 

Note: Historically Fabric required an end date. If you want “run forever,” set an end date far in the future. Check your tenant’s current experience and docs for exact behavior. 

### B) Event triggers (file-driven, job events, workspace events)

Fabric now supports storage/event triggers wired through Activator (formerly Data Activator, now part of Real-Time Intelligence) and eventstreams.

In your pipeline, select Home → Trigger to open Set alert and pick OneLake or Azure Blob events, filter by folder/file patterns, then create the trigger (an Activator item, called a Reflex in earlier releases, is created in your workspace).

Using event data inside your pipeline:

Built-in trigger parameters expose the file and folder from the event. Example:

    @pipeline()?.TriggerEvent?.FileName

Use the Trigger parameters tab in Expression Builder to insert them. (The ?. handles nulls when you run the pipeline manually without an event.)
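
The null-safe behavior of `?.` is similar to chained dictionary lookups that fall back to `None`. The sketch below models it in Python; the `pipeline_context` dictionary shape is an assumption made for illustration and is not how Fabric stores trigger data.

```python
from typing import Optional


def trigger_file_name(pipeline_context: dict) -> Optional[str]:
    """Return the event file name, or None when the run had no trigger event."""
    event = pipeline_context.get("TriggerEvent") or {}  # like pipeline()?.TriggerEvent
    return event.get("FileName")                        # like ?.FileName


event_run = {"TriggerEvent": {"FileName": "sales_2024-01-05.csv"}}
manual_run = {}  # a manual run carries no event payload

print(trigger_file_name(event_run))
print(trigger_file_name(manual_run))
```

A downstream activity should treat the missing value as a signal to fall back to a default path or parameter.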


# 8) Run & monitor

Manual run: Home → Run (Fabric prompts to Save and run). Watch the Output pane for activity progress. 

### Monitor:

Open the pipeline and select Run history (the Pipeline runs article in the Data Factory docs walks through this view). You’ll see Succeeded/Failed status, duration, rows read/written, throughput, and similar metrics. Drill into Activity run details for errors and metrics.

# 9) Worked example (copy Azure SQL → Lakehouse, then transform)

1. Connections

Create two connections: Azure SQL DB (Service Principal/OAuth2) and Lakehouse (Workspace) under Manage connections and gateways. Test both.

2. Pipeline

New → Data pipeline → Copy Assistant.

Source: Azure SQL Database → select Sales.SalesOrderHeader & Sales.SalesOrderDetail. 

Destination: Lakehouse → Tables → create tables sales_order_header and sales_order_detail. Review mappings. 

Settings:

Start with Intelligent throughput = Auto and Parallelism left default; enable Fault tolerance if you expect bad rows. Consider Staging only for tricky cross-region or large loads. 

3. Transform (optional)

Add a Notebook activity after Copy to clean/enrich data (e.g., dedupe, cast types), writing to a curated Lakehouse table. 

4. Orchestrate

Wrap the two Copy activities in a ForEach driven by a parameterized table list, or use Invoke pipeline to keep ingestion modular. 
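
A parameterized table list for the ForEach might look like the output of the sketch below. The key names (`sourceSchema`, `sourceTable`, `destinationTable`) are hypothetical; inside the loop they would be read with expressions such as @item().sourceTable.

```python
import json
import re


def to_snake_case(name: str) -> str:
    """SalesOrderHeader -> sales_order_header."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def build_table_list(qualified_names: list[str]) -> list[dict]:
    """Turn 'Schema.Table' names into items a ForEach could iterate over."""
    items = []
    for qualified in qualified_names:
        schema, table = qualified.split(".", 1)
        items.append({
            "sourceSchema": schema,
            "sourceTable": table,
            "destinationTable": to_snake_case(table),
        })
    return items


table_list = build_table_list(["Sales.SalesOrderHeader", "Sales.SalesOrderDetail"])
print(json.dumps(table_list, indent=2))
```

The printed JSON array could be pasted in as the default value of an Array-type pipeline parameter, so adding a table becomes a parameter change rather than a new Copy activity.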

5. Parameterize

Add pipeline parameters like Environment, LandingPath, LakehouseId. Use Add dynamic content in Copy Source/Destination to compose paths and pass the Lakehouse object ID. 

6. Schedule or event

Schedule daily at 06:00 with your time zone. Or create an event trigger for file arrival to kick off the pipeline. 

7. Run & monitor

Save and run, check Output; open Run details to confirm rows read/written and throughput.


# 10) Performance & reliability tips

- Throughput & parallelism: Start with Auto throughput; raise only if bottlenecked and the source/destination can handle it. 

- Partitioning & pushdown: For large SQL sources, use Query/Stored procedure and partition predicates; for files, use folder partitioning. 

- Fault tolerance & logging: Enable skip incompatible rows and logging for messy files. 

- Gateway constraints: A single Copy activity can use only one on-premises gateway; stage the data if you need to bridge two different gateways.

- Security: Prefer Service principal/managed identities and Key Vault-stored secrets (preview) over user creds.

- Item limits: By default up to 120 activities per pipeline (includes inner activities). If you’re approaching that, break into child pipelines and Invoke pipeline.
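
To see how close a pipeline is to the activity limit, inner activities have to be counted too. The sketch below walks a pipeline definition recursively. It assumes an ADF-style JSON layout (`typeProperties.activities`, `ifTrueActivities`, `ifFalseActivities`, `cases`, `defaultActivities`); verify the property names against an exported definition of your own pipeline before relying on it.

```python
def count_activities(activities: list[dict]) -> int:
    """Count activities, including those nested in container activities."""
    total = 0
    for activity in activities:
        total += 1
        props = activity.get("typeProperties", {})
        # ForEach / Until keep their inner activities under "activities".
        total += count_activities(props.get("activities", []))
        # If Condition keeps separate true and false branches.
        total += count_activities(props.get("ifTrueActivities", []))
        total += count_activities(props.get("ifFalseActivities", []))
        # Switch has per-case lists plus a default list.
        for case in props.get("cases", []):
            total += count_activities(case.get("activities", []))
        total += count_activities(props.get("defaultActivities", []))
    return total


# Hypothetical pipeline definition, trimmed to the fields the counter reads.
pipeline = {"properties": {"activities": [
    {"name": "GetTables", "type": "Lookup"},
    {"name": "ForEachTable", "type": "ForEach", "typeProperties": {"activities": [
        {"name": "CopyTable", "type": "Copy"},
        {"name": "CheckRows", "type": "IfCondition", "typeProperties": {
            "ifTrueActivities": [{"name": "RunNotebook", "type": "Notebook"}],
        }},
    ]}},
]}}

count = count_activities(pipeline["properties"]["activities"])
print(f"{count} of 120 activities used")
```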

# 11) Where to find “what is possible” at a glance

- Activity overview (all movement, transform, control flow activities + general settings like timeout/retry/secure input or output). 

- Copy activity (assistant, mappings, settings, parameters). 

- Schedule pipelines and Pipeline runs (on-demand vs. scheduled). 

- Event triggers (OneLake/Azure Blob, Reflex integration, trigger parameters). 
 
- Monitoring runs (UI walkthrough). 

- Connections & gateways (central management).

The rest of this guide is a full blueprint for building a Data Factory pipeline in Microsoft Fabric around a real-world example, including a step-by-step UI walkthrough, parameters, triggers, and a visual workflow diagram.

Let's assume this example scenario for clarity:

Use Case:
Ingest Sales data (CSV) from Azure Data Lake Storage Gen2 into a Microsoft Fabric Lakehouse every day at 2:00 AM.

After copying, run a PySpark notebook to transform data and create an aggregated table.

# 12. Prerequisites

Before building the pipeline:

✅ Microsoft Fabric Workspace → Create or select an existing workspace.

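✅ Confirm Fabric is enabled → A Fabric admin allows item creation in the Admin Portal (Tenant settings, or delegated per capacity). Data Factory is part of Fabric and typically needs no separate switch.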

✅ Prepare Source & Destination:

    - Source: CSV files in Azure Data Lake Gen2.

    - Destination: Fabric Lakehouse.

✅ Set Up Connections:

    - Go to Settings → Manage connections and gateways.

Create a connection for:

    - Azure Data Lake Gen2 → Use OAuth2 or Service Principal.

    - Lakehouse → Select your workspace lakehouse.

# 13. Create a New Data Pipeline

Go to Microsoft Fabric → Data Factory experience.

- Click + New → Data pipeline.
 
- Name it: pl_ingest_sales_data.

- You’ll land on the pipeline canvas.

# 14. Add Activities to the Pipeline
### Step 14.1: Copy Data Activity

We’ll use the Copy Assistant for simplicity.

1. Click + → Copy data → Use Copy Assistant.

2. Select Source:

    - Type: Azure Data Lake Gen2.

    - Select your connection.

    - Choose the container & folder where CSV files are stored.

3. Select Destination:

    - Type: Fabric Lakehouse.

    - Choose your Lakehouse connection.

    - Select Tables and name it Sales_Raw.

4. Mapping:

    - Auto-map columns (you can manually map if needed).

5. Settings:

    - Enable Skip incompatible rows.

    - Enable Fault tolerance logging to capture rejected records.

6. Click Finish — your Copy Data activity will appear on the canvas.
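
Conceptually, Skip incompatible rows with logging behaves like the sketch below: rows that fail type conversion are diverted to a reject log instead of failing the whole copy. This plain-Python model only illustrates the idea; the column names `order_id` and `amount` are assumptions, and the real Copy activity writes its log to the location you configure.

```python
import csv
import io


def copy_with_skip(csv_text: str) -> tuple[list[dict], list[dict]]:
    """Load rows; route rows that fail type conversion to a reject log."""
    loaded, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            loaded.append({
                "order_id": int(row["order_id"]),
                "amount": float(row["amount"]),
            })
        except (TypeError, ValueError) as err:
            rejected.append({"line": line_no, "row": row, "error": str(err)})
    return loaded, rejected


sample = "order_id,amount\n1,19.99\n2,not_a_number\n3,5.00\n"
loaded, rejected = copy_with_skip(sample)
print(len(loaded), "loaded;", len(rejected), "rejected")
```

Reviewing the reject log after each run matters: silently skipped rows can hide an upstream schema change.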

### Step 14.2: Add a Notebook Activity (Transformation)

We’ll run a PySpark notebook to clean & aggregate data.

1. Drag Notebook activity from Activities → Data Transformation.

2. Configure:

    - Notebook → Select your Fabric notebook item from the workspace (e.g., Sales_Aggregation).
    
    - Parameters → Pass the Lakehouse table name as a parameter.

3. Connect the Copy Data activity → Notebook activity using a success dependency.
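
The notebook's logic might look like the sketch below. It is written in plain Python so it runs anywhere; in the actual notebook the same dedupe, cast, and aggregate steps would be done on PySpark DataFrames. The `table_name` variable stands in for a notebook parameter that the pipeline overrides, and the column names are assumptions.

```python
from collections import defaultdict

# Parameter cell: the pipeline's Notebook activity can override this default.
table_name = "Sales_Raw"


def clean_and_aggregate(rows: list[dict]) -> dict[str, float]:
    """Dedupe on order_id, cast amount to float, then total amount per region."""
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        if row["order_id"] in seen:   # dedupe: keep the first occurrence
            continue
        seen.add(row["order_id"])
        totals[row["region"]] += float(row["amount"])  # cast, then aggregate
    return dict(totals)


# Hypothetical raw rows, as strings, the way they might arrive from CSV.
raw_rows = [
    {"order_id": "1", "region": "West", "amount": "10.50"},
    {"order_id": "1", "region": "West", "amount": "10.50"},  # duplicate
    {"order_id": "2", "region": "East", "amount": "4.00"},
    {"order_id": "3", "region": "West", "amount": "2.25"},
]
print(table_name, clean_and_aggregate(raw_rows))
```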


# 15. Parameterize the Pipeline

Make your pipeline dynamic:

1. Go to Pipeline → Parameters.

2. Add:

    - p_input_path → /sales/raw/
    
    - p_output_table → Sales_Raw

3. In Copy Data Source → File path:
    Use:
        @pipeline().parameters.p_input_path

4. In Notebook Activity:
    Pass: 
        @pipeline().parameters.p_output_table
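
As a mental model of what these references resolve to at run time, the toy resolver below substitutes parameter values into an expression string. It is not Fabric's expression engine and handles only plain `@pipeline().parameters.<name>` references.

```python
import re

PARAM_REF = re.compile(r"@pipeline\(\)\.parameters\.(\w+)")


def resolve(expression: str, parameters: dict[str, str]) -> str:
    """Substitute each @pipeline().parameters.<name> reference with its value."""
    return PARAM_REF.sub(lambda match: parameters[match.group(1)], expression)


params = {"p_input_path": "/sales/raw/", "p_output_table": "Sales_Raw"}
print(resolve("@pipeline().parameters.p_input_path", params))
print(resolve("@pipeline().parameters.p_output_table", params))
```

Changing the parameter values at run time (for example, pointing p_input_path at a test folder) changes what both activities receive without editing either activity.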


# 16. Add a Trigger
### Option 1: Scheduled Trigger (Daily)

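1. In the pipeline editor, select Home → Schedule. (Fabric schedules are configured on the pipeline itself rather than as separate ADF-style trigger objects.)

2. Turn the schedule on.

3. Repeat: Daily.

4. Time: 02:00 AM, in your time zone.

5. Set the start and end dates, then apply.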
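
To sanity-check when a daily 02:00 schedule should fire next, the calculation can be sketched as below. This models the arithmetic only, under the assumption of a fixed UTC offset; it does not call any Fabric API and does not handle daylight-saving transitions.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    """Return the next occurrence of run_at strictly after now, in now's time zone."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:          # today's slot has passed: move to tomorrow
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 5, 14, 30, tzinfo=timezone.utc)
print(next_daily_run(now))
```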

### Option 2: Event Trigger (File Arrival)

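- Trigger the pipeline when a new CSV arrives in Azure Data Lake:

    - In the pipeline editor, select Home → Trigger to open Set alert.

    - Choose Azure Blob Storage events as the source and select the storage account behind your Azure Data Lake connection.

    - Filter on the folder path /sales/raw/.

    - The pipeline runs automatically when a new file is dropped.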
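
The folder filter amounts to a path check like the one sketched below. The function and its defaults are illustrative assumptions; the actual filtering is configured in the trigger UI, and its matching rules may differ (for example, on subfolders or case).

```python
from fnmatch import fnmatchcase


def should_trigger(blob_path: str, folder: str = "/sales/raw/", pattern: str = "*.csv") -> bool:
    """True when the file sits directly in `folder` and matches `pattern`."""
    if not blob_path.startswith(folder):
        return False
    file_name = blob_path[len(folder):]
    if "/" in file_name:          # ignore files in nested subfolders
        return False
    return fnmatchcase(file_name.lower(), pattern.lower())


for path in ["/sales/raw/orders.csv", "/sales/raw/archive/orders.csv", "/sales/raw/orders.json"]:
    print(path, should_trigger(path))
```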

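# 17. Validate and Save

1. Click Validate → Fix any warnings or errors.

2. Click Save. Unlike Azure Data Factory, Fabric pipelines have no separate Publish all step; the saved pipeline is the version that scheduled and triggered runs use.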

# 18. Monitor the Pipeline

1. Open Monitor from the Fabric navigation pane, or select Run history in the pipeline editor.

2. You can see:

    - Success / Failed runs.

    - Execution duration.

    - Rows read / written.

3. Drill into activity logs for troubleshooting.

# 19. Visual Workflow Diagram

Here's the high-level flow:

          ┌────────────────────┐
          │   Azure Data Lake  │
          │   (CSV Source)     │
          └────────┬───────────┘
                   │
           Copy Data Activity
                   │
                   ▼
          ┌────────────────────┐
          │   Fabric Lakehouse │
          │   (Raw Table)      │
          └────────┬───────────┘
                   │
          Notebook Activity
    (Data Cleaning & Aggregation)
                   │
                   ▼
          ┌────────────────────┐
          │ Lakehouse (Curated)│
          │ Sales_Aggregated   │
          └────────────────────┘

   Trigger → Daily at 2:00 AM / Event-driven


# 20. Best Practices

Use parameters for folder paths, table names, and environment configs.

Store secrets in Azure Key Vault instead of hardcoding credentials.

Enable retry policies for all critical activities.

Use parallelism for large datasets (Copy settings → Degree of copy parallelism).

Always monitor pipelines after saving and scheduling them to catch issues early.
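
The retry policy recommended above is set per activity through its Retry and Retry interval settings. The sketch below models what such a policy does; `run_with_retry` and `flaky_copy` are hypothetical names, and the `sleep` argument is injectable so the example runs instantly.

```python
import time


def run_with_retry(activity, retries: int = 3, interval_seconds: float = 30.0, sleep=time.sleep):
    """Call `activity`; on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return activity()
        except Exception:
            if attempt == retries:    # retries exhausted: surface the failure
                raise
            sleep(interval_seconds)   # wait before the next attempt


attempts = {"count": 0}


def flaky_copy() -> str:
    """Hypothetical activity that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient source error")
    return "Succeeded"


print(run_with_retry(flaky_copy, retries=3, sleep=lambda seconds: None))
```

Retries help with transient faults such as throttling or brief network drops; they will not fix a wrong path or a schema mismatch, so keep the count small.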

# Summary Table
| **Step** | **Action**                     | **Outcome**                  |
| -------- | ------------------------------ | ---------------------------- |
| **1**    | Set up connections             | Ready to access data         |
| **2**    | Create pipeline                | Workspace pipeline created   |
| **3**    | Add Copy + Notebook activities | Ingest + transform data      |
| **4**    | Add parameters                 | Pipeline becomes reusable    |
| **5**    | Add trigger                    | Automates ingestion          |
| **6**    | Validate & save                | Pipeline ready to run        |
| **7**    | Monitor                        | Check success & troubleshoot |
