# Lab 2: Semantic Link


## Step 1: Set up your notebook

### Select Lakehouse (FC_Workshop)
First, add the Lakehouse you created in the previous lab exercise.

![Add a Lakehouse to the notebook](https://synapseaisolutionsa.blob.core.windows.net/public/Fabric-Conference/add-lakehouse.png)

### Select environment or install within session
![Select Environment and then select your environment from the list](https://synapseaisolutionsa.blob.core.windows.net/public/Fabric-Conference/AttachEnv2.png)

In [None]:
# Install the library, or use the myEnv environment created earlier
%pip install semantic-link

In [None]:
# Load the sempy extension so the %%dax cell magic is available
%load_ext sempy

## Step 2: Import Semantic Link

Semantic link is a feature that allows you to establish a connection between semantic models and Synapse Data Science in Microsoft Fabric.

![Overview of semantic link](https://learn.microsoft.com/en-us/fabric/data-science/media/semantic-link-overview/data-flow-with-semantic-link.png)

With semantic link, you can use semantic models from Power BI in the Data Science experience to perform tasks such as in-depth statistical analysis and predictive modeling with machine learning techniques. The output of your data science work can be stored in OneLake using Apache Spark and ingested into Power BI using Direct Lake.

You can learn more about semantic link in [What is semantic link?](https://learn.microsoft.com/en-us/fabric/data-science/semantic-link-overview).

In [None]:
import sempy.fabric as fabric
from sempy.relationships import plot_relationship_metadata
from sempy.fabric._client._tools import import_pbix_sample

## Step 3: Let's explore our model

In [None]:
# Load our Churn dataset - a PBIX sample has been pre-configured with the relationships and semantic info
dataset = 'Churn'
import_pbix_sample([dataset])

In [None]:
df_relationships = fabric.list_relationships(dataset)
plot_relationship_metadata(df_relationships)

In [None]:
fabric.list_measures(dataset)

In [None]:
# Outstanding balances per country

df_balance_by_geography = fabric.evaluate_measure(
    dataset,
    ["Balance"],
    ["Customers[Geography]"])

df_balance_by_geography.set_index('Geography').plot.bar()

In [None]:
df_measures = fabric.evaluate_measure(
    dataset,
    ["Number Of Products", "Last Credit Score", "Balance"],
    ["Customers[CustomerId]"])

df_measures


In [None]:
df_customer = fabric.read_table(dataset, "Customers")
df_customer

In [None]:
df_account = fabric.read_table(dataset, "Accounts")
df_account

In [None]:
# Merge customers, measures, and accounts on the columns they share
df_churn = df_customer.merge(df_measures).merge(df_account)
df_churn

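When `merge` is called without an `on` argument, pandas joins on the column names the two frames have in common, using an inner join. The sketch below illustrates this on small made-up frames. The column names and values are assumptions for illustration only and may not match the Churn model.

```python
import pandas as pd

# Hypothetical stand-ins for df_customer, df_measures and df_account
customers = pd.DataFrame(
    {"CustomerId": [1, 2, 3], "Geography": ["France", "Spain", "Germany"]}
)
measures = pd.DataFrame({"CustomerId": [1, 2, 3], "Balance": [100.0, 0.0, 250.0]})
accounts = pd.DataFrame({"CustomerId": [1, 2, 3], "Tenure": [2, 5, 7]})

# With no `on` argument, merge joins on the shared column names
# (here only CustomerId) and keeps rows present in both frames.
merged = customers.merge(measures).merge(accounts)
print(merged)
```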
Semantic data frames provide convenience methods to write data to a lakehouse.

In [None]:
df_churn.to_lakehouse_table('ChurnFromSemanticLink', mode="overwrite")

# Exercise 1: Analyze data with Pandas plots

You can learn more about plotting in Pandas from the [`DataFrame.plot` documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html).

In this exercise, you will:
- Plot the average credit score by geography and tenure.
- Analyze how the impact of tenure varies by geography.

To do this, you will:
1. TODO: Write code that creates `df_credit_score_by_geo_tenure`, starting from the DataFrame named `df_churn`. Follow these steps:
   * **Group the data** by the 'Geography' and 'Tenure' columns. This is done using the `groupby` method. *Hint:* By setting `as_index=False`, you ensure that the grouping columns are not used as the index in the resulting DataFrame.
   * **Calculate the mean** of the 'Last Credit Score' for each group. This is achieved by selecting the 'Last Credit Score' column and applying the `mean` function.
   * **Pivot the result** to reformat the data so that 'Tenure' becomes the index, 'Geography' becomes the column labels, and the values are the mean 'Last Credit Score'. This is done using the `pivot` method, where `index='Tenure'`, `columns='Geography'`, and `values='Last Credit Score'`.
1. Plot the DataFrame as a bar chart. This step is provided below.
1. Customize the bar chart to show credit scores above or below a common baseline (e.g. using the `bottom` argument). This step is provided below.

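The group, mean, and pivot pattern described above can be tried on a small unrelated table first. This is a minimal sketch with made-up data; the frame `df_scores` and its column names are hypothetical and deliberately differ from those in `df_churn`.

```python
import pandas as pd

# Made-up data; column names are illustrative, not those of df_churn
df_scores = pd.DataFrame(
    {
        "Region": ["North", "North", "South", "South", "North"],
        "Year": [1, 2, 1, 2, 1],
        "Score": [600, 700, 650, 710, 640],
    }
)

# Group by two columns and average a third; as_index=False keeps
# Region and Year as ordinary columns instead of a MultiIndex.
mean_scores = df_scores.groupby(["Region", "Year"], as_index=False)["Score"].mean()

# Reshape: one row per Year, one column per Region, mean Score as values.
pivoted = mean_scores.pivot(index="Year", columns="Region", values="Score")
print(pivoted)
```

Keeping the grouping columns as regular columns is what lets `pivot` refer to them by name in its `index` and `columns` arguments.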

In [None]:
# Modify this code using the instructions above
df_credit_score_by_geo_tenure = # Complete

# Do not change this code
baseline = 650

(df_credit_score_by_geo_tenure - baseline).plot.bar(bottom=baseline) 

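The plotting line above subtracts the baseline before plotting and then passes the same value as `bottom`. The sketch below shows only the arithmetic behind that, using hypothetical scores: each bar's height is the signed distance from the baseline, so a bar drawn from `bottom=baseline` ends at the original value, and values below the baseline extend downward.

```python
import pandas as pd

baseline = 650  # same reference value as in the cell above

# Hypothetical mean scores, shaped like the pivoted exercise result
scores = pd.DataFrame(
    {"France": [660.0, 640.0], "Spain": [650.0, 655.0]},
    index=pd.Index([1, 2], name="Tenure"),
)

heights = scores - baseline  # signed distance of each value from the baseline
tops = heights + baseline    # where each bar ends when drawn from bottom=baseline
print(heights)
```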
# Exercise 2: Leverage DAX for computations

Repeat Exercise 1, but this time perform the computation using DAX.

In this exercise, you will do the following:
1. Add the appropriate DAX query to compute the average credit score per Geography per Tenure. You can use the `%%dax` notebook magic to experiment within the notebook.
1. Pivot the data using Pandas to get a DataFrame like this:

    ![Pivoted DataFrame of average credit score by tenure and geography](https://synapseaisolutionsa.blob.core.windows.net/public/Fabric-Conference/semantic-link-pivot-df.png)

In [None]:
%%dax Churn

// Write your DAX query here

In [None]:
df = fabric.evaluate_dax(dataset,
"""
UPDATE WITH DAX QUERY ABOVE
""")

df.head()

In [None]:
df.pivot(index='Customers[Tenure]', columns='Customers[Geography]', values='[Average Credit Score]')