In Microsoft Fabric, Dataflows (Gen2) connect to various data sources and perform transformations in [Power Query Online](https://learn.microsoft.com/en-us/power-query/power-query-ui). They can then be used in Data Pipelines to ingest data into a lakehouse or other analytical store, or to define a dataset for a Power BI report.

## Prerequisites ##
- Access to a Microsoft Fabric enabled tenant
- At least [**Contributor**](https://learn.microsoft.com/en-us/fabric/fundamentals/roles-workspaces) access to a workspace
- The workspace must be assigned to a [**Fabric capacity**](https://learn.microsoft.com/en-us/fabric/enterprise/licenses)
    - (hint: Workspace → Workspace settings → License info → Edit)

## Create Workspace ##
1) Navigate to the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric)
2) In the menu bar on the left, select **Workspaces** (the icon looks similar to 🗇)
3) Create a new workspace with a name of your choice, selecting a licensing mode that includes Fabric capacity (Trial, Premium, or Fabric).

## Create Lakehouse ##
1) Inside the workspace, click **+ New**.
2) Select Lakehouse under the **Data Engineering** section.
3) Click **Create**.

Once created, your lakehouse will have:
- Tables folder → structured Delta tables
- Files folder → unstructured / raw data
- Built-in SQL endpoint for querying
- Auto-created Power BI semantic model

## [Ingest data to your lakehouse using Dataflow (Gen2)](https://learn.microsoft.com/en-us/fabric/data-factory/create-first-dataflow-gen2) ##
Define a **dataflow** that encapsulates an extract, transform, and load (ETL) process
1) On the home page for your lakehouse, select **Get data → New Dataflow Gen2**.
    - After a few moments, the **Power Query editor** for your new dataflow opens.
2) Select **Import from a Text/CSV file**, and create a new **data source** with the following settings:
- **Link to file**: Selected
- **File path or URL**: https://raw.githubusercontent.com/MicrosoftLearning/dp-data/main/orders.csv
- **Connection**: Create new connection
- **Connection name**: Specify a unique name
- **Data gateway**: (none)
- **Authentication kind**: Anonymous
3) Transform the data
- Add column → Custom column
    - **New column name:** MonthNo
    - **Data type:** Whole Number
    - **Formula:** Date.Month([OrderDate]) 
4) Add **data destination**
- Home → Query → Add data destination
- Disable the **Use automatic settings** option
- Select **Append** and then **Save settings**
- **View** → **Diagram view**
- **Home** → **Save & run**
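
To make steps 3 and 4 concrete, here is a minimal local sketch in Python of what the custom column and the **Append** update method do. It is not Fabric or Power Query code: the sample rows, the ISO date format, and the helper names `add_month_no` and `load` are assumptions made for illustration, and the real `orders.csv` may be shaped differently.

```python
import csv
import io
from datetime import date

# Hypothetical sample rows; the real orders.csv may use other columns or date formats.
SAMPLE_CSV = """SalesOrderNumber,OrderDate,Item
SO1,2019-07-01,Road Bike
SO2,2019-12-15,Helmet
"""


def add_month_no(rows):
    """Return copies of the rows with a MonthNo column, mirroring Date.Month([OrderDate])."""
    out = []
    for row in rows:
        order_date = date.fromisoformat(row["OrderDate"])  # assumes yyyy-mm-dd dates
        out.append({**row, "MonthNo": order_date.month})   # whole number 1..12
    return out


def load(destination, new_rows, update_method="append"):
    """Simulate a data destination: 'append' keeps existing rows, 'replace' overwrites them."""
    if update_method == "append":
        return destination + new_rows
    if update_method == "replace":
        return list(new_rows)
    raise ValueError(f"unknown update method: {update_method}")


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
transformed = add_month_no(rows)

table = load([], transformed)     # first run of the dataflow
table = load(table, transformed)  # second run appends the same rows again

print([r["MonthNo"] for r in transformed])  # [7, 12]
print(len(table))                           # 4
```

The second `load` call shows why the update method matters: with **Append**, every run adds the incoming rows to whatever the destination table already holds, so re-running against an unchanged source duplicates rows.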

## Add a dataflow to a pipeline ##
You can include a dataflow as an **activity** in a pipeline. Pipelines **orchestrate** data ingestion and processing activities, so you can combine dataflows with other kinds of operations in a single, **scheduled** process. Pipelines can be created in a few different experiences, including the **Data Factory** experience.
1) Fabric-enabled workspace → **+ New item** → Data pipeline → Create
2) Pipeline activity → Dataflow
3) Settings → Dataflow drop-down list (select the dataflow you created previously)
4) Home → 🖫 (Save) icon → ▷ Run
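
Besides the **Run** button, a saved pipeline can likely also be started programmatically. The sketch below only builds the HTTP request and does not send it. It assumes the Fabric REST API's on-demand job endpoint with `jobType=Pipeline`; the IDs and token are hypothetical placeholders, and `build_run_request` is an illustrative helper name, so check the current Fabric REST API reference before relying on it.

```python
import urllib.request

# Assumed base URL of the Fabric REST API.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_run_request(workspace_id, pipeline_id, token):
    """Build (but do not send) a POST request that asks Fabric to run a pipeline on demand."""
    url = (
        f"{FABRIC_API}/workspaces/{workspace_id}"
        f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical placeholder values; substitute real IDs and a valid Microsoft Entra access token.
req = build_run_request("WORKSPACE_ID", "PIPELINE_ID", "ACCESS_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would submit it. That call is left out here because it needs real IDs and credentials.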

**Tip:** 
- In Power BI Desktop, you can connect directly to the data transformed by your dataflow by using the Power BI dataflows (Legacy) connector. You can also apply additional transformations, publish the result as a new dataset, and distribute it to the intended audience as a specialized dataset.
- You can split dataflows by data slices (like region or year), and once you have a central trusted dataflow, analysts can build smaller, customized models from it without re-creating the data from scratch.
- In Fabric, a global dataflow provides standardized enterprise data, and horizontal partitioning lets us split it by region, year, or business unit. Analysts then build specialized semantic models from the global dataflow instead of pulling from raw systems, which improves performance, governance, and consistency.
- Dataflows Gen2 simplifies ingestion and transformation in Fabric by writing curated data directly to OneLake and enabling reuse across teams. It works best for batch-based, low-code ETL and semantic model preparation, but it's not suitable for real-time processing, very large-scale transformations, or complex DevOps-driven workloads.
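
The idea of splitting a central dataflow into data slices (horizontal partitioning) can be illustrated with a small local Python sketch. It is a conceptual illustration only, not how Fabric implements partitioning: the sample rows, the column names, and the helper `partition_by` are hypothetical.

```python
from collections import defaultdict


def partition_by(rows, key):
    """Group rows into slices that share the same value for `key` (e.g. region or year)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


# Hypothetical rows standing in for the output of a central, trusted dataflow.
orders = [
    {"OrderId": 1, "Region": "EMEA", "Year": 2023},
    {"OrderId": 2, "Region": "APAC", "Year": 2023},
    {"OrderId": 3, "Region": "EMEA", "Year": 2024},
]

by_region = partition_by(orders, "Region")
by_year = partition_by(orders, "Year")

print(sorted(by_region))        # ['APAC', 'EMEA']
print(len(by_region["EMEA"]))   # 2
print(sorted(by_year))          # [2023, 2024]
```

Each slice contains whole rows from the same source, so an analyst working on one region or year builds on the same trusted data rather than re-creating it from the raw systems.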