## Install Dependencies
Install the NuGet packages required by the Entity Store SDK.

In [1]:
#r "nuget:Azure.Identity,1.1.1"
#r "nuget:Azure.Storage.Blobs,12.6.0"

Installing package Azure.Storage.Blobs, version 12.6.0...
Installing package Azure.Identity, version 1.1.1...
Installed package Azure.Storage.Blobs version 12.6.0
Installed package Azure.Identity version 1.1.1

## Configuration
Provide connection details for the data lake and the SQL pool.

In [4]:
using Microsoft.Spark.Extensions.Azure.Synapse.Analytics.Utils;

dynamic dataLakeConnectionInfo = new {
    StorageAccountName = "",
    Container = "",
    EntityStoreMetadataPath = "",
};

dynamic sqlPoolConnectionInfo = new {
    PoolName = "",
    Schema = "",
    Database = "",
};


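// Read both connection strings from Azure Key Vault; replace the placeholder
// vault and secret names with your own.
var sqlPoolConnectionString = TokenLibrary.GetSecret("<KEYVAULT-NAME>", "SECRET-NAME-SQL");
string adlsConnectionString = TokenLibrary.GetSecret("<KEYVAULT-NAME>", "SECRET-NAME-ADLS");
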


## Dynamics Entity Store SDK

In [5]:
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Microsoft.Spark;
using Microsoft.Spark.Sql;

class EntityStoreSdk {

    private string SqlConnectionString {get; set;}
    private Microsoft.Spark.Sql.SparkSession SparkSession {get; set;}

    public EntityStoreSdk(string sqlConnectionString, Microsoft.Spark.Sql.SparkSession sparkSession)
    {
        this.SqlConnectionString = sqlConnectionString;
        this.SparkSession = sparkSession;
    }

    // Reads 'measurement.json' from the root of the zipped aggregate measurement package.
    public JObject ReadMetadata(MemoryStream memoryStream)
    {
        using (var zip = new ZipArchive(memoryStream, ZipArchiveMode.Read))
        {
            // Find the root measurement metadata entry.
            var measurementEntry = zip.Entries.FirstOrDefault(e => e.FullName == "measurement.json");

            if (measurementEntry == null)
            {
                throw new Exception("Cannot find measurement metadata file 'measurement.json' in the root folder of the archive.");
            }

            using (var stream = measurementEntry.Open())
            using (var sr = new StreamReader(stream))
            using (var jsonTextReader = new JsonTextReader(sr))
            {
                JObject measurementMetadata = JObject.Load(jsonTextReader);

                Console.WriteLine($"Processing measurement '{measurementMetadata["Label"]}' ({measurementMetadata["Name"]})");

                return measurementMetadata;
            }
        }
    }

    public string ConvertReservedWords(dynamic dimensionAttribute)
    {
        HashSet<string> reservedWords = new HashSet<string>()
        {
            "KEY",
            "DATE",
        };

        if (reservedWords.Contains(dimensionAttribute.ToString().ToUpper()))
        {
            return $"{dimensionAttribute}_";
        }

        return dimensionAttribute.ToString();
    }

    public DataFrame QueryTable(string query)
    {
        var df = this.SparkSession.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", this.SqlConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query)   // use dbtable for the entire table
              .Option("isolationLevel", "READ_UNCOMMITTED")
              .Load();

        return df;
    }
}
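
To make the `ConvertReservedWords` rule concrete, here is a minimal standalone sketch of the same idea: a column name that matches one of the two reserved words (`KEY`, `DATE`) in any casing gets a trailing underscore, and everything else passes through unchanged. This is why a join column such as `Date` appears as `Date_` in the generated query later in this notebook. The names `ReservedWordDemo` and `Convert` are hypothetical, and the case-insensitive `HashSet` is a substitution for the `ToUpper()` call in the SDK class, not the SDK's own code.

```csharp
using System;
using System.Collections.Generic;

static class ReservedWordDemo
{
    // Same two reserved words as EntityStoreSdk.ConvertReservedWords.
    // Assumption: a case-insensitive comparer stands in for ToUpper().
    private static readonly HashSet<string> ReservedWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "KEY", "DATE" };

    // Appends "_" only when the whole column name is a reserved word.
    public static string Convert(string columnName) =>
        ReservedWords.Contains(columnName) ? columnName + "_" : columnName;

    public static void Main()
    {
        Console.WriteLine(Convert("Date"));   // prints "Date_"
        Console.WriteLine(Convert("PBIKey")); // prints "PBIKey" (contains "Key" but is not equal to it)
    }
}
```

Only exact matches are rewritten, so names that merely contain a reserved word, such as `PBIKey`, are left alone.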



## Load Aggregate Measurement Metadata

In [6]:
using global::Azure.Identity;
using global::Azure.Storage.Blobs;

var entityStoreSdk = new EntityStoreSdk(sqlPoolConnectionString, spark);
JObject measureMetadata = null;

BlobServiceClient blobServiceClient = new BlobServiceClient(adlsConnectionString);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(dataLakeConnectionInfo.Container);

BlobClient blobClient = containerClient.GetBlobClient(dataLakeConnectionInfo.EntityStoreMetadataPath);
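if (await blobClient.ExistsAsync())
{
    Console.WriteLine("Downloading Entity Store Metadata...");

    using (var memoryStream = new MemoryStream())
    {
        await blobClient.DownloadToAsync(memoryStream);
        measureMetadata = entityStoreSdk.ReadMetadata(memoryStream);
    }
}
else
{
    // Later cells dereference measureMetadata, so report the missing blob here.
    Console.WriteLine($"Entity Store metadata not found at '{dataLakeConnectionInfo.EntityStoreMetadataPath}'.");
}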

Downloading Entity Store Metadata...
Processing measurement 'RLX BI Ledger Cube' (RLXBILedgerCube)


## Display Temp Variable

In [7]:
display(measureMetadata["MeasureGroups"]);

[{"Name":"BudgetRegisterEntry","Table":"BudgetRegisterEntryEntity","Attributes":[{"Name":"AmountType","NameField":"AmountType","KeyFields":[{"DimensionField":"AmountType"}]},{"Name":"BudgetCode","NameField":"BudgetCode","KeyFields":[{"DimensionField":"BudgetCode"}]},{"Name":"BudgetType","NameField":"BudgetType","KeyFields":[{"DimensionField":"BudgetType"}]},{"Name":"Comment","NameField":"Comment","KeyFields":[{"DimensionField":"Comment"}]},{"Name":"EntryNumber","NameField":"EntryNumber","KeyFields":[{"DimensionField":"EntryNumber"}]},{"Name":"TransactionStatus","NameField":"RLXTransactionStatus","KeyFields":[{"DimensionField":"RLXTransactionStatus"}]}],"CalculatedMeasures":[],"Dimensions":[{"Name":"BudgetModel","DimensionName":"RLXBIBudgetModelTableDim","UseTableRelations":0,"DimensionRelations":[{"Name":"RLXBIBudgetModelView","DimensionAttribute":"PBIKey","Constraints":[{"Name":"RLXBudgetModelPBIKey","Field":"PBIKey","RelatedField":"RLXBudgetModelPBIKey"}]}]},{"Name":"Company","Dimens

## Query Azure Synapse Table

The example below queries an Azure Synapse table from Spark using the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15), the same connector that the Entity Store SDK's `QueryTable` method wraps.

In [8]:
var query = "select top(100) * from Weather";

var df = spark.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", sqlPoolConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query) //use dbtable for the entire table
              .Load();

display(df.Head(15));


## Join Fact and Dimension Tables in Azure Synapse (Data Cooking)

In the example below, we traverse the measurement metadata to join the fact and dimension tables, running each generated query through the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15) via the Entity Store SDK.

In [15]:
var entityStoreSdk = new EntityStoreSdk(sqlPoolConnectionString, spark);

try {
    // Iterate over each MeasureGroup/FactTable.
    foreach (var measureGroup in measureMetadata["MeasureGroups"])
    {
        // Select the measure group's own attribute columns first.
        var selectMeasureGroupQuery = "SELECT ";
        foreach (var attribute in measureGroup["Attributes"])
        {
            selectMeasureGroupQuery += $"MG.{attribute["KeyFields"][0]["DimensionField"]}, ";
        }

        // Add one INNER JOIN (aliased D1, D2, ...) per dimension relation.
        var joinQuery = string.Empty;
        int count = 1;
        foreach (var dimension in measureGroup["Dimensions"])
        {
            foreach (var dimensionRelation in dimension["DimensionRelations"])
            {
                selectMeasureGroupQuery += $"MG.{dimensionRelation["Constraints"][0]["RelatedField"]}, ";

                joinQuery += $" INNER JOIN {measureMetadata["Name"]}_{dimension["Name"]} AS D{count} ON MG.{dimensionRelation["Constraints"][0]["RelatedField"]} = D{count}.{entityStoreSdk.ConvertReservedWords(dimensionRelation["DimensionAttribute"])}";
                count++;
            }
        }

        // Trim the trailing ", " before appending the FROM clause.
        selectMeasureGroupQuery = selectMeasureGroupQuery.Remove(selectMeasureGroupQuery.Length - 2);
        selectMeasureGroupQuery += $" FROM [DBO].[{measureGroup["Table"]}] AS MG";
        selectMeasureGroupQuery += joinQuery;

        Console.WriteLine($"Executing query: {selectMeasureGroupQuery}\n");

        var starSchemaDf = entityStoreSdk.QueryTable(selectMeasureGroupQuery);

        display(starSchemaDf);

        // Implicit write
        /*starSchemaDf.Write().Format("com.microsoft.cdm")
        .Option("storage", dataLakeConnectionInfo.StorageAccountName + ".dfs.core.windows.net")
        .Option("manifestPath", dataLakeConnectionInfo.Container + $"/{measureGroup["Table"].ToString()}/default.manifest.cdm.json")
        .Option("entity", $"{measureGroup["Table"].ToString()}")
        .Option("format", "csv")
        .Mode(SaveMode.Overwrite)
        .Save();*/
    }
}
catch ( Exception ex ) {
    Console.WriteLine($"Error processing aggregate measurement: {measureMetadata["Name"].ToString()} \n {ex.Message}");
}


Executing query: SELECT MG.AmountType, MG.BudgetCode, MG.BudgetType, MG.Comment, MG.EntryNumber, MG.RLXTransactionStatus, MG.RLXBudgetModelPBIKey, MG.LegalEntityId, MG.CurrencyCode, MG.RLXMainAccountKey, MG.Date, MG.RLXBrandKey, MG.RLXCostCentreKey, MG.RLXEmployeeKey, MG.RLXIntercoKey, MG.RLXInternalOrderKey, MG.RLXNatureKey, MG.RLXSectionKey, MG.RLXDimension1Key, MG.RLXDimension2Key, MG.RLXDimension3Key, MG.RLXDimension4Key, MG.RLXDimension5Key, MG.RLXDimension6Key, MG.RLXDimension7Key, MG.RLXDimension8Key, MG.RLXDimension9Key, MG.RLXDimension10Key FROM [DBO].[BudgetRegisterEntryEntity] AS MG INNER JOIN RLXBILedgerCube_BudgetModel AS D1 ON MG.RLXBudgetModelPBIKey = D1.PBIKey INNER JOIN RLXBILedgerCube_Company AS D2 ON MG.LegalEntityId = D2.Company INNER JOIN RLXBILedgerCube_Currency AS D3 ON MG.CurrencyCode = D3.Currency INNER JOIN RLXBILedgerCube_MainAccount AS D4 ON MG.RLXMainAccountKey = D4.MainAccountRecId INNER JOIN RLXBILedgerCube_PostingDate AS D5 ON MG.Date = D5.Date_ INNER JO
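
To see how a query of this shape comes together without a Spark session or a metadata file, the sketch below rebuilds a two-join version of it from in-memory sample values. It is an illustration under stated assumptions, not the notebook's implementation. `StarSchemaQueryDemo`, `Build`, and the tuple layout are hypothetical stand-ins for the `MeasureGroups` JSON. The sample names are taken from the output above. The dimension attribute is assumed to have already passed through reserved-word conversion. `string.Join` is used in place of trimming a trailing comma.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StarSchemaQueryDemo
{
    // Builds "SELECT ... FROM fact AS MG INNER JOIN ..." for one measure group.
    // 'relations' is a hypothetical flattened form of the DimensionRelations metadata.
    public static string Build(
        string measurementName,
        string factTable,
        IEnumerable<string> attributeFields,
        IReadOnlyList<(string Dimension, string RelatedField, string DimensionAttribute)> relations)
    {
        // Fact-table columns: attribute fields first, then each relation's key field.
        var columns = attributeFields
            .Concat(relations.Select(r => r.RelatedField))
            .Select(field => "MG." + field);
        var columnList = string.Join(", ", columns);

        // One INNER JOIN per relation, aliased D1, D2, ... in metadata order.
        var joins = relations.Select((r, i) =>
            $" INNER JOIN {measurementName}_{r.Dimension} AS D{i + 1}" +
            $" ON MG.{r.RelatedField} = D{i + 1}.{r.DimensionAttribute}");

        return $"SELECT {columnList} FROM [DBO].[{factTable}] AS MG" + string.Concat(joins);
    }

    public static void Main()
    {
        var relations = new List<(string Dimension, string RelatedField, string DimensionAttribute)>
        {
            ("BudgetModel", "RLXBudgetModelPBIKey", "PBIKey"),
            ("PostingDate", "Date", "Date_"), // attribute already converted from "Date"
        };

        Console.WriteLine(Build(
            "RLXBILedgerCube",
            "BudgetRegisterEntryEntity",
            new[] { "AmountType", "BudgetCode" },
            relations));
    }
}
```

Joining the column list in one call avoids the edge case where trimming two characters from a `SELECT ` with no columns would cut into the keyword itself.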

## Store Results in Azure Synapse SQL (Dedicated SQL Pool)

The Azure Synapse team recommends the ["synapsesql" connector](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for reading and writing data in Azure Synapse SQL.
This connector is currently available only in Scala, so to use it from this notebook we first register the DataFrame as a temporary view,
then use the connector from a Scala cell to write that view into Azure Synapse, as in the cells below.

Note that the "Storage Blob Data Contributor" role must be granted on the storage account used by the Azure Synapse workspace.
See the [documentation](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for more details.



In [9]:
df.CreateOrReplaceTempView("DestinationWeatherTable");

In [10]:
%%spark
val scala_df = spark.sqlContext.sql ("select * from DestinationWeatherTable")

scala_df.write.
option(Constants.SERVER, "synapsews-dev-westus2-c30d4.database.windows.net").
synapsesql("SampleSQL.dbo.DestinationWeatherTable", Constants.INTERNAL)

## Store Results in the Data Lake in CDM Format

You can write the contents of the DataFrame to the data lake in CDM format with the [CDM Spark connector](https://github.com/Azure/spark-cdm-connector). To set it up:

1. Download the latest CDM Spark connector jar, for example `spark-cdm-connector-assembly-0.19.1.jar`.
2. Upload the package to the workspace following these [instructions](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-scala-packages).
3. Install the package on the Spark pool used by the notebook (or on all Spark pools by default).




In [None]:
// See examples in https://github.com/Azure/spark-cdm-connector/blob/master/samples/SparkCDMsample.scala

// Implicit write
df.Write().Format("com.microsoft.cdm")
  .Option("storage", dataLakeConnectionInfo.StorageAccountName + ".dfs.core.windows.net")
  .Option("manifestPath", dataLakeConnectionInfo.Container + "/nestedImplicit/default.manifest.cdm.json")
  .Option("entity", "NestedExampleImplicit")
  .Option("format", "parquet")
  .Mode(SaveMode.Append)
  .Save();