#### Report Usage Datasets 

##### Data ingestion strategy:
<mark style="background: #88D5FF;">**N/A**</mark>

##### Related pipeline:

**Ext_Load_PBI_Report_Usage_E2E**

##### Source:

**Table** in FUAM_Ext_Lakehouse, named by the **gold_table_name** variable

##### Target:

**1 Data pipeline** (the related pipeline listed above)
- **odata_response_json**: variable value returned by the notebook

In [1]:
import json
from pyspark.sql import SparkSession
from notebookutils import mssparkutils # type: ignore

print("Successfully imported all packages for this notebook.")

Successfully imported all packages for this notebook.


In [2]:
#
# Create the Spark session
#
app_name = "GetReportUsageDatasets"

# Get the current Spark session
spark = SparkSession.builder \
    .appName(app_name) \
    .getOrCreate()

print(f"Spark session {app_name} has been created successfully.")

Spark session GetReportUsageDatasets has been created successfully.


In [3]:
## Parameters
display_data = True

## Variables
gold_table_name = "workspace_datasets"

print("Successfully configured all parameters for this run.")

Successfully configured all parameters for this run.


In [4]:
# Read from the Lakehouse table
df = spark.read.table(gold_table_name)

# Filter by DatasetName
# filtered_df = df.filter(df["DatasetName"] == "Report Usage Metrics Model") # Old usage report format
filtered_df = df.filter(df["DatasetName"] == "Usage Metrics Report") # New usage report format

# Select only needed columns
selected_df = filtered_df.select("WorkspaceId", "DatasetId")

print(f"Gold layer table {gold_table_name} has been read and filtered successfully.")

Gold layer table workspace_datasets has been read and filtered successfully.
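
The filter above matches only the new report name. If a tenant still holds datasets in the old format, both names could be accepted. The sketch below mirrors the filter-and-select logic on plain Python dicts so it runs without a Spark session; in PySpark the equivalent condition would use `Column.isin`. The names `USAGE_DATASET_NAMES` and `select_usage_datasets` and the sample rows are hypothetical and not part of the original notebook.

```python
# Dataset names taken from the filter cell above
USAGE_DATASET_NAMES = {
    "Report Usage Metrics Model",  # old usage report format
    "Usage Metrics Report",        # new usage report format
}


def select_usage_datasets(rows, names=USAGE_DATASET_NAMES):
    """Keep rows whose DatasetName is a usage dataset and project two columns.

    Hypothetical helper: a plain-Python mirror of the filter and select steps.
    """
    return [
        {"WorkspaceId": row["WorkspaceId"], "DatasetId": row["DatasetId"]}
        for row in rows
        if row["DatasetName"] in names
    ]


# Placeholder rows, not real workspace or dataset IDs
sample_rows = [
    {"WorkspaceId": "ws-1", "DatasetId": "ds-1", "DatasetName": "Usage Metrics Report"},
    {"WorkspaceId": "ws-2", "DatasetId": "ds-2", "DatasetName": "Sales Model"},
    {"WorkspaceId": "ws-3", "DatasetId": "ds-3", "DatasetName": "Report Usage Metrics Model"},
]

selected = select_usage_datasets(sample_rows)
print(selected)
```

Whether the old format should be included depends on what the downstream pipeline expects; the notebook as written deliberately keeps only the new format.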


In [5]:
if display_data:
    display(selected_df)

In [6]:
# Collect the rows on the driver as a list of dicts
results = [row.asDict() for row in selected_df.collect()]

# Wrap the results in the desired format
odata_response = {
    "@odata.context": "https://wabi-us-east2-d-primary-redirect.analysis.windows.net/v1.0/myorg/admin/$metadata#groups",
    "@odata.count": len(results),
    "value": results
}

# Convert the response to a JSON string
odata_response_json = json.dumps(odata_response)
if display_data:
    print(odata_response_json)

{"@odata.context": "https://wabi-us-east2-d-primary-redirect.analysis.windows.net/v1.0/myorg/admin/$metadata#groups", "@odata.count": 4, "value": [{"WorkspaceId": "43DA5FAA-1C21-4314-BD0A-C15AE4887314", "DatasetId": "0270E8BE-EA7A-47DC-B698-6FD6A8D1A372"}, {"WorkspaceId": "3012B677-8AE6-499E-9273-150DF5D9D8D2", "DatasetId": "28678A20-198B-4FA5-8CB2-D211F273AF85"}, {"WorkspaceId": "3AC7CE42-AE74-4E7D-8AC3-5CE8358A30DF", "DatasetId": "A4C91678-F53C-479B-B3D2-BF573EF36660"}, {"WorkspaceId": "08D4AB9A-6333-461F-85D4-BA3F7A7330F6", "DatasetId": "F82EA03D-E70E-4675-82FB-83F256673C03"}]}
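
The wrapping step above can be factored into a small helper, which makes the envelope easier to check in isolation. This is a minimal sketch and not part of the original notebook: the function name `build_odata_envelope`, its `context_url` parameter, the placeholder URL, and the sample rows are all assumptions made for illustration.

```python
import json


def build_odata_envelope(rows, context_url):
    """Wrap a list of row dicts in an OData-style response envelope.

    Hypothetical helper: mirrors the dictionary built in the cell above.
    """
    rows = list(rows)
    return {
        "@odata.context": context_url,
        "@odata.count": len(rows),
        "value": rows,
    }


# Placeholder rows and URL, not real IDs or endpoints
sample_rows = [
    {"WorkspaceId": "workspace-1", "DatasetId": "dataset-1"},
    {"WorkspaceId": "workspace-2", "DatasetId": "dataset-2"},
]

envelope = build_odata_envelope(sample_rows, "https://example.invalid/$metadata#groups")
envelope_json = json.dumps(envelope)
print(envelope_json)
```

Deriving `@odata.count` from the same list that fills `value` keeps the two fields consistent by construction.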


In [7]:
type(odata_response_json)

str

In [8]:
print("Successfully created the result JSON which will be returned to the pipeline.")

# Send result to pipeline variable
mssparkutils.notebook.exit(odata_response_json)

Successfully created the result JSON which will be returned to the pipeline.
ExitValue: {"@odata.context": "https://wabi-us-east2-d-primary-redirect.analysis.windows.net/v1.0/myorg/admin/$metadata#groups", "@odata.count": 4, "value": [{"WorkspaceId": "43DA5FAA-1C21-4314-BD0A-C15AE4887314", "DatasetId": "0270E8BE-EA7A-47DC-B698-6FD6A8D1A372"}, {"WorkspaceId": "3012B677-8AE6-499E-9273-150DF5D9D8D2", "DatasetId": "28678A20-198B-4FA5-8CB2-D211F273AF85"}, {"WorkspaceId": "3AC7CE42-AE74-4E7D-8AC3-5CE8358A30DF", "DatasetId": "A4C91678-F53C-479B-B3D2-BF573EF36660"}, {"WorkspaceId": "08D4AB9A-6333-461F-85D4-BA3F7A7330F6", "DatasetId": "F82EA03D-E70E-4675-82FB-83F256673C03"}]}
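
The exit value is a JSON string, so a consumer has to parse it before iterating over the workspace and dataset pairs. How the pipeline itself does this is not shown in the notebook; the sketch below only illustrates the parsing step in Python. The function name `extract_pairs` is hypothetical, and the sample payload is shortened to the first row of the output above.

```python
import json


def extract_pairs(exit_value):
    """Parse the notebook exit value into (WorkspaceId, DatasetId) tuples.

    Hypothetical helper: raises ValueError if the declared count
    does not match the number of rows in "value".
    """
    payload = json.loads(exit_value)
    rows = payload.get("value", [])
    if payload.get("@odata.count") != len(rows):
        raise ValueError("@odata.count does not match the number of rows")
    return [(row["WorkspaceId"], row["DatasetId"]) for row in rows]


# Shortened sample built from the first row of the notebook output
sample_exit_value = json.dumps({
    "@odata.context": "https://wabi-us-east2-d-primary-redirect.analysis.windows.net/v1.0/myorg/admin/$metadata#groups",
    "@odata.count": 1,
    "value": [
        {
            "WorkspaceId": "43DA5FAA-1C21-4314-BD0A-C15AE4887314",
            "DatasetId": "0270E8BE-EA7A-47DC-B698-6FD6A8D1A372",
        }
    ],
})

pairs = extract_pairs(sample_exit_value)
print(pairs)
```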