# Tutorial: Extract and Calculate PowerBI Measures from Jupyter Notebook
This notebook illustrates how to use SemPy to calculate measures in PowerBI datasets.

### In this tutorial, you learn how to:
- Evaluate measures defined in a Power BI dataset programmatically from a Jupyter notebook;
- Get familiar with the components of Semantic Link's Python library ([SemPy](https://learn.microsoft.com/en-us/python/api/semantic-link-sempy)) that help bridge the gap between AI and BI. These components include:
    - FabricDataFrame - a pandas-like structure enhanced with additional semantic information;
    - Useful functions that allow you to fetch PowerBI datasets, including raw data, configuration and measures.

### Prerequisites

* A [Microsoft Fabric subscription](https://learn.microsoft.com/fabric/enterprise/licenses). Or sign up for a free [Microsoft Fabric (Preview) trial](https://learn.microsoft.com/fabric/get-started/fabric-trial).
* Sign in to [Microsoft Fabric](https://fabric.microsoft.com/).
* Go to the Data Science experience in Microsoft Fabric.
* Select **Workspaces** from the left navigation pane to find and select your workspace. This workspace becomes your current workspace.
* Download the _Retail Analysis Sample PBIX.pbix_ dataset from the [fabric-samples GitHub repository](https://github.com/microsoft/fabric-samples/blob/09cb40f1ffe0a7cfec67ec0ba2fcfdc95ba750a8/docs-samples/data-science/datasets/Customer%20Profitability%20Sample.pbix) and upload it to your workspace.
* Open your notebook. You have two options:
    * [Import this notebook into your workspace](https://learn.microsoft.com/en-us/fabric/data-engineering/how-to-use-notebook#import-existing-notebooks). You can import from the Data Science homepage.
    * Alternatively, you can create [a new notebook](https://learn.microsoft.com/fabric/data-engineering/how-to-use-notebook#create-notebooks) to copy/paste code into cells.
* In the Lakehouse explorer section of your notebook, add a new or existing lakehouse to your notebook. For more information on how to add a lakehouse, see [Attach a lakehouse to your notebook](https://learn.microsoft.com/en-us/fabric/data-science/tutorial-data-science-prepare-system#attach-a-lakehouse-to-the-notebooks).

## Set up the notebook

In this section, you'll set up a notebook environment with the necessary modules and data.

First, install `SemPy` from PyPI using the `%pip` magic command:

In [1]:
%pip install semantic-link

Collecting semantic-link
  Downloading semantic_link-0.3.1-py3-none-any.whl (8.1 kB)
Collecting semantic-link-functions-phonenumbers==0.3.1
  Downloading semantic_link_functions_phonenumbers-0.3.1-py3-none-any.whl (4.3 kB)
Collecting semantic-link-sempy==0.3.1
  Downloading semantic_link_sempy-0.3.1-py3-none-any.whl (2.8 MB)
Collecting semantic-link-functions-meteostat==0.3.1
  Downloading semantic_link_functions_meteostat-0.3.1-py3-none-any.whl (4.5 kB)
Collecting semantic-link-functions-holidays==0.3.1
  Downloading semantic_link_functions_holidays-0.3.1-py3-none-any.whl (4.2 kB)
Collecting semantic-link-functions-validators==0.3.1
  Downloading semantic_link_functions_validators-0.3.1-py3-none-any.whl (4.7 kB)
Collecting semantic-link-functions-geopandas==0.3.1
  Downloading semantic_link_functions_geopandas-0.3.1-py3-none-any.whl (4.0 kB)

Then, import the module that you'll need later on:

In [2]:
import sempy.fabric as fabric

Next, connect to the Power BI workspace and list the datasets it contains:

In [3]:
fabric.list_datasets()

| | Dataset Name | Dataset ID | Created Timestamp | Last Update |
|---|---|---|---|---|
| 0 | Customer Profitability Sample | 5efc976e-272a-4f9d-a20f-8f27f4036428 | 2014-07-22 03:50:22 | 0001-01-01 00:00:00 |
| 1 | Retail Analysis Sample PBIX | 52f903b2-aa50-45ea-809d-d47800494464 | 2014-05-30 20:16:22 | 0001-01-01 00:00:00 |


In this example, we use the Retail Analysis Sample PBIX:

In [4]:
dataset = "Retail Analysis Sample PBIX"

## List Workspace Measures

Start by listing measures in the dataset using SemPy's `list_measures` function as follows:

In [5]:
fabric.list_measures(dataset)

| | Table Name | Measure Name | Measure Expression | Measure Data Type |
|---|---|---|---|---|
| 0 | Store | Average Selling Area Size | AVERAGE([SellingAreaSize]) | Double |
| 1 | Store | New Stores | CALCULATE(COUNTA([Store Type]), FILTER(ALL(Sto... | Int64 |
| 2 | Store | New Stores Target | 14 | Int64 |
| 3 | Store | Total Stores | COUNTA([StoreNumberName]) | Int64 |
| 4 | Store | Open Store Count | COUNTA([OpenDate]) | Int64 |
| 5 | Store | Count of OpenDate | COUNTA('Store'[OpenDate]) | Int64 |
| 6 | Sales | Regular_Sales_Dollars | SUM([Sum_Regular_Sales_Dollars]) | Double |
| 7 | Sales | Markdown_Sales_Dollars | SUM([Sum_Markdown_Sales_Dollars]) | Double |
| 8 | Sales | TotalSales | [Regular_Sales_Dollars]+[Markdown_Sales_Dollars] | Double |
| 9 | Sales | TotalSalesLY | CALCULATE([TotalSales], Sales[ScenarioID]=2) | Double |
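
Several of the listed measures are defined in terms of other measures (for example, `TotalSales` adds two other measures). The following sketch, which is not part of SemPy, finds such references by scanning the expressions for `[Name]` tokens. The helper name `referenced_measures` is hypothetical, the expressions are copied from the listing above, and the approach assumes that no column shares its name with a measure.

```python
import re

# Measure names and expressions copied from the list_measures output.
measures = {
    "Regular_Sales_Dollars": "SUM([Sum_Regular_Sales_Dollars])",
    "Markdown_Sales_Dollars": "SUM([Sum_Markdown_Sales_Dollars])",
    "TotalSales": "[Regular_Sales_Dollars]+[Markdown_Sales_Dollars]",
    "TotalSalesLY": "CALCULATE([TotalSales], Sales[ScenarioID]=2)",
}


def referenced_measures(expression, known_measures):
    """Return the known measure names that appear as [Name] in a DAX expression."""
    # Bracketed tokens can be columns or measures; keep only the known measures.
    bracketed = re.findall(r"\[([^\]]+)\]", expression)
    return [name for name in bracketed if name in known_measures]


dependencies = {
    name: referenced_measures(expression, measures)
    for name, expression in measures.items()
}

for name, used in dependencies.items():
    print(name, "->", used)
```

Reading the result, `TotalSalesLY` depends on `TotalSales`, which in turn depends on the two `*_Sales_Dollars` measures. This is a text-level heuristic only; it does not parse DAX.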


## Evaluate Measures

### Evaluate a Raw Measure

In the code below, we use SemPy's `evaluate_measure` function to calculate a preconfigured measure called "Average Selling Area Size" by supplying its name in the `measure` parameter. You can see the measure's underlying formula in the output of the previous cell.

In [6]:
fabric.evaluate_measure(dataset, measure="Average Selling Area Size")

| | Average Selling Area Size |
|---|---|
| 0 | 24326.923077 |


### Evaluate a Measure with GroupBy Columns

We can group the measure output by one or more columns by supplying the additional `groupby_columns` parameter:

In [7]:
fabric.evaluate_measure(dataset, measure="Average Selling Area Size", groupby_columns=[("Store", "Chain"), ("Store", "DistrictName")])

| | Chain | DistrictName | Average Selling Area Size |
|---|---|---|---|
| 0 | Fashions Direct | FD - District #1 | 43888.888889 |
| 1 | Fashions Direct | FD - District #2 | 47777.777778 |
| 2 | Fashions Direct | FD - District #3 | 50000.0 |
| 3 | Fashions Direct | FD - District #4 | 50500.0 |
| 4 | Lindseys | LI - District #1 | 10384.615385 |
| 5 | Lindseys | LI - District #2 | 10909.090909 |
| 6 | Lindseys | LI - District #3 | 10333.333333 |
| 7 | Lindseys | LI - District #4 | 12500.0 |
| 8 | Lindseys | LI - District #5 | 11785.714286 |


In the above example, we group by the "Chain" and "DistrictName" columns of the "Store" table in the dataset. Each entry in `groupby_columns` is a `(table, column)` tuple.
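
The `(table, column)` tuples mirror the DAX `'Table'[Column]` notation that appears in the measure expressions (for example, `'Store'[OpenDate]`). If you already have column references in that notation, a small converter can build the `groupby_columns` list. The helper below is a hypothetical convenience, not a SemPy function, and it only handles simple references with an optional pair of single quotes around the table name.

```python
import re


def to_groupby_column(reference):
    """Convert a DAX-style Table[Column] reference into a (table, column) tuple."""
    match = re.fullmatch(r"\s*'?([^'\[\]]+?)'?\s*\[([^\[\]]+)\]\s*", reference)
    if match is None:
        raise ValueError(f"Expected Table[Column], got {reference!r}")
    return (match.group(1), match.group(2))


# Builds the same list that the grouped evaluation cell passes by hand.
groupby_columns = [
    to_groupby_column(reference)
    for reference in ["Store[Chain]", "'Store'[DistrictName]"]
]
print(groupby_columns)
```

The resulting list can be passed as `groupby_columns=groupby_columns` to `fabric.evaluate_measure`.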

### Evaluate a Measure with Filters

We can also restrict the result to rows where given columns take specific values. The `filters` parameter does this and can be used as follows:

In [12]:
fabric.evaluate_measure(
    dataset,
    measure="Total Units Last Year",
    groupby_columns=[("Store", "Territory")],
    filters={("Store", "Territory"): ["PA", "TN", "VA"], ("Store", "Chain"): ["Lindseys"]},
)

| | Territory | Total Units Last Year |
|---|---|---|
| 0 | PA | 11309 |
| 1 | TN | 81663 |
| 2 | VA | 160863 |


Note that "Store" is the name of the table, "Territory" is the name of the column, and "PA" is one of the values allowed by the filter.
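
To make the shape of the `filters` dictionary concrete, here is a plain-Python sketch of how we read it: a row is kept only if every filtered column holds one of its allowed values. This reading is an assumption based on the example output, not a description of how SemPy or Power BI implement filtering, and the `stores` rows are made up for illustration.

```python
# Toy rows standing in for the "Store" table (illustrative values only).
stores = [
    {"Territory": "PA", "Chain": "Lindseys"},
    {"Territory": "PA", "Chain": "Fashions Direct"},
    {"Territory": "OH", "Chain": "Lindseys"},
    {"Territory": "VA", "Chain": "Lindseys"},
]

# Same shape as the `filters` argument: (table, column) -> list of allowed values.
filters = {
    ("Store", "Territory"): ["PA", "TN", "VA"],
    ("Store", "Chain"): ["Lindseys"],
}


def row_passes(row, filters, table="Store"):
    """Assumed semantics: every filtered column must hold one of its allowed values."""
    return all(
        row[column] in allowed
        for (filter_table, column), allowed in filters.items()
        if filter_table == table
    )


kept = [row for row in stores if row_passes(row, filters)]
print(kept)
```

Under this reading, the Fashions Direct store in PA and the Lindseys store in OH are both dropped, because each fails one of the two conditions.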

### Across Multiple Tables

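The group-by columns can come from different tables in the dataset. The following example groups by a column of the "Store" table and a column of the "Sales" table: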

In [9]:
fabric.evaluate_measure(dataset, measure="Total Units Last Year", groupby_columns=[("Store", "Territory"), ("Sales", "ItemID")])

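| | Territory | ItemID | Total Units Last Year |
|---|---|---|---|
| 0 | DE | 18049 | 1 |
| 1 | DE | 18069 | 1 |
| 2 | DE | 18079 | 1 |
| 3 | DE | 18085 | 1 |
| 4 | DE | 18087 | 3 |
| ... | ... | ... | ... |
| 178636 | WV | 244167 | 13 |
| 178637 | WV | 244223 | 4 |
| 178638 | WV | 244242 | 2 |
| 178639 | WV | 244246 | 2 |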


### Evaluate Multiple Measures

The `evaluate_measure` function also accepts a list of measure names and returns the calculated values as columns of the same dataframe, as in the example below:

In [10]:
fabric.evaluate_measure(dataset, measure=["Average Selling Area Size", "Total Stores"], groupby_columns=[("Store", "Chain"), ("Store", "DistrictName")])

| | Chain | DistrictName | Average Selling Area Size | Total Stores |
|---|---|---|---|---|
| 0 | Fashions Direct | FD - District #1 | 43888.888889 | 9 |
| 1 | Fashions Direct | FD - District #2 | 47777.777778 | 9 |
| 2 | Fashions Direct | FD - District #3 | 50000.0 | 9 |
| 3 | Fashions Direct | FD - District #4 | 50500.0 | 10 |
| 4 | Lindseys | LI - District #1 | 10384.615385 | 13 |
| 5 | Lindseys | LI - District #2 | 10909.090909 | 11 |
| 6 | Lindseys | LI - District #3 | 10333.333333 | 15 |
| 7 | Lindseys | LI - District #4 | 12500.0 | 14 |
| 8 | Lindseys | LI - District #5 | 11785.714286 | 14 |
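
Having both measures in one result makes it possible to sanity-check them against each other. The sketch below copies the rows from the output above and combines the district averages into a single overall average, weighting each district by its store count. It assumes that every store belongs to exactly one district and has a selling area size recorded; under those assumptions the weighted result should agree with the ungrouped "Average Selling Area Size" evaluated earlier.

```python
# (chain, district, average selling area size, total stores), copied from the output.
rows = [
    ("Fashions Direct", "FD - District #1", 43888.888889, 9),
    ("Fashions Direct", "FD - District #2", 47777.777778, 9),
    ("Fashions Direct", "FD - District #3", 50000.0, 9),
    ("Fashions Direct", "FD - District #4", 50500.0, 10),
    ("Lindseys", "LI - District #1", 10384.615385, 13),
    ("Lindseys", "LI - District #2", 10909.090909, 11),
    ("Lindseys", "LI - District #3", 10333.333333, 15),
    ("Lindseys", "LI - District #4", 12500.0, 14),
    ("Lindseys", "LI - District #5", 11785.714286, 14),
]

# Store counts are additive across districts.
total_stores = sum(count for _, _, _, count in rows)

# Weight each district average by its store count before combining.
weighted_average = sum(avg * count for _, _, avg, count in rows) / total_stores

# A plain mean of the district averages ignores district size.
naive_average = sum(avg for _, _, avg, _ in rows) / len(rows)

print(total_stores, round(weighted_average, 2), round(naive_average, 2))
```

The weighted average comes out at about 24326.92, matching the ungrouped value of 24326.923077, while the plain mean of district averages is noticeably higher. This illustrates why an average measure should be re-evaluated at the level you need instead of being averaged again from grouped output.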


## Use XMLA Connector

The default dataset client is backed by Power BI's REST APIs. If queries fail with this client, you can switch the backend to Power BI's XMLA interface by passing `use_xmla=True`. The other SemPy parameters for measure calculation remain the same with XMLA.

In [11]:
fabric.evaluate_measure(
    dataset,
    measure=["Average Selling Area Size", "Total Stores"],
    groupby_columns=[("Store", "Chain"), ("Store", "DistrictName")],
    filters={("Store", "Territory"): ["PA", "TN", "VA"], ("Store", "Chain"): ["Lindseys"]},
    use_xmla=True,
)

| | Chain | DistrictName | Average Selling Area Size | Total Stores |
|---|---|---|---|---|
| 0 | Lindseys | LI - District #2 | 11000 | 10 |
| 1 | Lindseys | LI - District #5 | 12000 | 5 |
| 2 | Lindseys | LI - District #1 | 10000 | 1 |
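
One way to apply the "switch on failure" advice is a thin wrapper that tries the default client first and retries once with `use_xmla=True`. The sketch below is a hypothetical helper, not part of SemPy. It takes the evaluation function as an argument so that it can be demonstrated with a stand-in (`fake_evaluate`, which simulates a failing default client); in a Fabric notebook you would pass `fabric.evaluate_measure` instead. Catching a broad `Exception` is an assumption, since this tutorial does not say which errors the default client raises.

```python
def evaluate_with_fallback(evaluate, dataset, **kwargs):
    """Call `evaluate` with the default client; retry once with use_xmla=True on failure."""
    try:
        return evaluate(dataset, **kwargs)
    except Exception as error:  # broad on purpose: failure types are an unknown here
        print(f"Default client failed ({error!r}); retrying with use_xmla=True")
        return evaluate(dataset, **{**kwargs, "use_xmla": True})


def fake_evaluate(dataset, use_xmla=False, **kwargs):
    """Stand-in for fabric.evaluate_measure that fails unless XMLA is requested."""
    if not use_xmla:
        raise RuntimeError("simulated REST failure")
    return {"dataset": dataset, "backend": "xmla", **kwargs}


result = evaluate_with_fallback(
    fake_evaluate, "Retail Analysis Sample PBIX", measure="Total Stores"
)
print(result)

# In a Fabric notebook (not executed here), the call could look like:
# import sempy.fabric as fabric
# evaluate_with_fallback(fabric.evaluate_measure, dataset, measure="Total Stores")
```

Passing the function in keeps the wrapper independent of SemPy, which also makes the fallback path easy to test.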


## Next step

Try evaluating measures from your own Power BI datasets using the techniques shown in this tutorial.

## Related content

Check out other tutorials for Semantic Link / SemPy:
1. Analyze Functional Dependencies in a PowerBI Sample Dataset
1. Discover Relationships in SYNTHEA dataset Using Semantic Link
1. Discover Relationships in a PowerBI Dataset Using Semantic Link
1. Clean Data with Functional Dependencies