## Install Dependencies

In [1]:
#r "nuget:Azure.Identity,1.1.1"
#r "nuget:Azure.Storage.Blobs,12.6.0"

StatementMeta(patc6sx8767zd5k, 13, 1, Finished, Available)

Installing package Azure.Identity, version 1.1.1...
Installing package Azure.Storage.Blobs, version 12.6.0...
Installed package Azure.Identity version 1.1.1
Installed package Azure.Storage.Blobs version 12.6.0

## Configuration

In [2]:
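// Data lake location of the Entity Store metadata package and the
// service principal used to read it. Fill in the values for your environment.
dynamic dataLakeConnectionInfo = new {
    StorageAccountName = "",
    Container = "",
    EntityStoreMetadataPath = "",
    TenantId = "",
    AppId = "",
    AppKey = ""
};

// Azure Synapse SQL pool connection settings.
dynamic sqlPoolConnectionInfo = new {
    PoolName = "",
    Schema = "",
    Database = "",
    Username = "",
    Password = ""
};

// JDBC connection string for the SQL pool. trustServerCertificate=false keeps
// certificate validation enabled, so hostNameInCertificate actually takes effect.
var sqlPoolConnectionString = $"jdbc:sqlserver://{sqlPoolConnectionInfo.PoolName}.sql.azuresynapse.net:1433;database={sqlPoolConnectionInfo.Database};user={sqlPoolConnectionInfo.Username};password={sqlPoolConnectionInfo.Password};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;";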



StatementMeta(patc6sx8767zd5k, 13, 2, Finished, Available)



## Dynamics Entity Store SDK

In [15]:
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Microsoft.Spark;
using Microsoft.Spark.Sql;

class EntityStoreSdk {

    private string SqlConnectionString {get; set;}
    private Microsoft.Spark.Sql.SparkSession SparkSession {get; set;}

    public EntityStoreSdk(string sqlConnectionString, Microsoft.Spark.Sql.SparkSession sparkSession)
    {
        this.SqlConnectionString = sqlConnectionString;
        this.SparkSession = sparkSession;
    }

    // Reads 'measurement.json' from the root of the zipped metadata package
    // and returns its contents as a JObject.
    public JObject ReadMetadata(MemoryStream memoryStream)
    {
        using (var zip = new ZipArchive(memoryStream, ZipArchiveMode.Read))
        {
            // Find the root measurement metadata entry.
            var measurementEntry = zip.Entries.FirstOrDefault(e => e.FullName == "measurement.json");

            if (measurementEntry == null)
            {
                throw new InvalidDataException("Cannot find the measurement metadata file 'measurement.json' in the root folder of the archive.");
            }

            using (var stream = measurementEntry.Open())
            using (var sr = new StreamReader(stream))
            using (var jsonTextReader = new JsonTextReader(sr))
            {
                JObject measurementMetadata = JObject.Load(jsonTextReader);

                Console.WriteLine($"Processing measurement '{measurementMetadata["Label"]}' ({measurementMetadata["Name"]})");

                return measurementMetadata;
            }
        }
    }

    public DataFrame QueryTable(string query)
    {
        var df = this.SparkSession.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", this.SqlConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query)   // use dbtable for the entire table
              .Option("isolationLevel", "READ_UNCOMMITTED")
              .Load();

        return df;
    }
}

StatementMeta(patc6sx8767zd5k, 13, 15, Finished, Available)
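
The entry lookup in `ReadMetadata` can be tried without a storage account. The sketch below builds a small in-memory archive with a root-level `measurement.json` and reads the `Name` property back. It is an illustration only: the helper names `BuildSampleArchive` and `ReadMeasurementName` are hypothetical, and it uses `System.Text.Json` instead of Newtonsoft.Json so that it runs with no extra packages.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: builds a zip archive in memory that contains a
// root-level "measurement.json", standing in for the downloaded blob.
byte[] BuildSampleArchive()
{
    using (var buffer = new MemoryStream())
    {
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            var entry = zip.CreateEntry("measurement.json");
            using (var writer = new StreamWriter(entry.Open()))
            {
                writer.Write("{\"Name\":\"RLXBILedgerCube\",\"Label\":\"RLX BI Ledger Cube\"}");
            }
        }
        // The central directory is written when the ZipArchive is disposed,
        // so the bytes are only complete at this point.
        return buffer.ToArray();
    }
}

// Hypothetical helper: same root-entry lookup as ReadMetadata, returning
// only the measurement name.
string ReadMeasurementName(byte[] archiveBytes)
{
    using (var stream = new MemoryStream(archiveBytes))
    using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
    {
        var entry = zip.Entries.FirstOrDefault(e => e.FullName == "measurement.json")
            ?? throw new InvalidDataException("measurement.json not found in the archive root.");

        using (var entryStream = entry.Open())
        using (var doc = JsonDocument.Parse(entryStream))
        {
            return doc.RootElement.GetProperty("Name").GetString();
        }
    }
}

var name = ReadMeasurementName(BuildSampleArchive());
Console.WriteLine(name); // prints RLXBILedgerCube
```

An entry stored in a subfolder (for example `metadata/measurement.json`) would not match, because the lookup compares `FullName` against the bare file name.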



## Load Aggregate Measurement Metadata

In [16]:
using global::Azure.Identity;
using global::Azure.Storage.Blobs;

var entityStoreSdk = new EntityStoreSdk(sqlPoolConnectionString, spark);
JObject measureMetadata = null;

// Authenticate against the storage account with the service principal.
var tokenCredential = new ClientSecretCredential(
                dataLakeConnectionInfo.TenantId,
                dataLakeConnectionInfo.AppId,
                dataLakeConnectionInfo.AppKey);

var blobEndpoint = new Uri($"https://{dataLakeConnectionInfo.StorageAccountName}.dfs.core.windows.net");

BlobServiceClient blobServiceClient = new BlobServiceClient(blobEndpoint, tokenCredential);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(dataLakeConnectionInfo.Container);

BlobClient blobClient = containerClient.GetBlobClient(dataLakeConnectionInfo.EntityStoreMetadataPath);
if (await blobClient.ExistsAsync())
{
      Console.WriteLine("Downloading Entity Store Metadata...");

      using (var memoryStream = new MemoryStream())
      {
            await blobClient.DownloadToAsync(memoryStream);
            memoryStream.Position = 0; // rewind before handing the stream to the reader
            measureMetadata = entityStoreSdk.ReadMetadata(memoryStream);
      }
}
else
{
      // Without this branch a missing blob silently leaves measureMetadata null.
      Console.WriteLine($"Entity Store metadata not found at '{dataLakeConnectionInfo.EntityStoreMetadataPath}'.");
}

StatementMeta(patc6sx8767zd5k, 13, 16, Finished, Available)

Downloading Entity Store Metadata...
Processing measurement 'RLX BI Ledger Cube' (RLXBILedgerCube)


In [18]:
display(measureMetadata["MeasureGroups"]);

StatementMeta(patc6sx8767zd5k, 13, 18, Finished, Available)

[{"Name":"BudgetRegisterEntry","Table":"BudgetRegisterEntryEntity","Attributes":[{"Name":"AmountType","NameField":"AmountType","KeyFields":[{"DimensionField":"AmountType"}]},{"Name":"BudgetCode","NameField":"BudgetCode","KeyFields":[{"DimensionField":"BudgetCode"}]},{"Name":"BudgetType","NameField":"BudgetType","KeyFields":[{"DimensionField":"BudgetType"}]},{"Name":"Comment","NameField":"Comment","KeyFields":[{"DimensionField":"Comment"}]},{"Name":"EntryNumber","NameField":"EntryNumber","KeyFields":[{"DimensionField":"EntryNumber"}]},{"Name":"TransactionStatus","NameField":"RLXTransactionStatus","KeyFields":[{"DimensionField":"RLXTransactionStatus"}]}],"CalculatedMeasures":[],"Dimensions":[{"Name":"BudgetModel","DimensionName":"RLXBIBudgetModelTableDim","UseTableRelations":0,"DimensionRelations":[{"Name":"RLXBIBudgetModelView","DimensionAttribute":"PBIKey","Constraints":[{"Name":"RLXBudgetModelPBIKey","Field":"PBIKey","RelatedField":"RLXBudgetModelPBIKey"}]}]},{"Name":"Company","Dimens

## Query Azure Synapse Table

The example below queries an Azure Synapse table from Spark using the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15), the same connector that the Entity Store SDK's `QueryTable` method wraps.

In [8]:
var query = "select top(100) * from Weather";

var df = spark.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", sqlPoolConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query) //use dbtable for the entire table
              .Load();

display(df.Head(15));


StatementMeta(patc6sx8767zd5k, 12, 9, Finished, Available)

index,Schema,Values
0,Microsoft.Spark.Sql.Types.StructType,"[ 20061003, 12507, 0, 71.2 ]"
1,Microsoft.Spark.Sql.Types.StructType,"[ 20121230, 62697, 0.2, 34 ]"
2,Microsoft.Spark.Sql.Types.StructType,"[ 20011106, 78539, 0, 54 ]"
3,Microsoft.Spark.Sql.Types.StructType,"[ 20000711, 62665, 0, 80.5 ]"
4,Microsoft.Spark.Sql.Types.StructType,"[ 20070602, 70017, 0.09, 78.8 ]"
5,Microsoft.Spark.Sql.Types.StructType,"[ 20130828, 70017, 0.09, 78 ]"
6,Microsoft.Spark.Sql.Types.StructType,"[ 20040322, 269329, 0, 35.2 ]"
7,Microsoft.Spark.Sql.Types.StructType,"[ 20000818, 69970, 0, 71.9 ]"
8,Microsoft.Spark.Sql.Types.StructType,"[ 20100930, 188335, 0, 65.8 ]"
9,Microsoft.Spark.Sql.Types.StructType,"[ 20100927, 188335, 0, 62.4 ]"
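
The comment in the cell above mentions `dbtable` as the alternative to `query`. Spark's JDBC source accepts one or the other but not both, so a small helper can make that choice explicit. The sketch below is hypothetical (`JdbcReadOptions` is not part of the notebook or of any library) and only builds the option dictionary; the Spark call is left as a comment because it needs a live session.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds reader options for either a pushed-down
// query or a whole-table read.
Dictionary<string, string> JdbcReadOptions(string url, string query = null, string table = null)
{
    // Exactly one of the two must be supplied.
    if ((query == null) == (table == null))
    {
        throw new ArgumentException("Specify exactly one of query or table.");
    }

    var options = new Dictionary<string, string>
    {
        ["url"] = url,
        ["driver"] = "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    };

    if (query != null)
    {
        options["query"] = query;
    }
    else
    {
        options["dbtable"] = table;
    }

    return options;
}

// The URL is a placeholder, not a real endpoint.
var wholeTable = JdbcReadOptions("jdbc:sqlserver://example", table: "Weather");
var topRows = JdbcReadOptions("jdbc:sqlserver://example", query: "select top(100) * from Weather");

// In the notebook, the dictionary would be handed to the reader, assuming the
// Options(Dictionary<string, string>) overload of DataFrameReader:
// var df = spark.Read().Format("com.microsoft.sqlserver.jdbc.spark").Options(wholeTable).Load();
```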


## Join Fact and Dimension Tables in Azure Synapse

In the example below, we traverse the measurement metadata to join the fact and dimension tables, running the resulting query through the Entity Store SDK's `QueryTable` method, which uses the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15).

In [30]:
using System.Text;

// Use the first measure group from the metadata.
var measureGroup = measureMetadata["MeasureGroups"].First();
var factTable = measureGroup["Table"].ToString();

// QueryTable expects a SQL query, not a bare table name.
var factTableDf = entityStoreSdk.QueryTable($"SELECT * FROM {factTable}");

var selectMeasureGroupDimensionQuery = new StringBuilder($"SELECT * FROM {factTable}");
foreach (var dimension in measureGroup["Dimensions"])
{
    var dimensionRelation = dimension["DimensionRelations"]?.FirstOrDefault(); // gets the first relation
    if (dimensionRelation == null)
    {
        continue; // skip dimensions that declare no relation
    }

    // NOTE: this emits only the dimension attribute after ON. A complete join predicate
    // also needs the matching fact-table column, which the relation's "Constraints" describe.
    selectMeasureGroupDimensionQuery.Append($" INNER JOIN {dimensionRelation["Name"]} ON {dimensionRelation["DimensionAttribute"]}");
}

Console.WriteLine($"Executing query: {selectMeasureGroupDimensionQuery}");

var starSchemaDf = entityStoreSdk.QueryTable(selectMeasureGroupDimensionQuery.ToString());

display(starSchemaDf.Head(15));


StatementMeta(patc6sx8767zd5k, 13, 30, Finished, Available)

Executing query: SELECT * FROM BudgetRegisterEntryEntity INNER JOIN RLXBIBudgetModelView ON PBIKey  INNER JOIN Company ON Company  INNER JOIN BICurrencyView ON Currency  INNER JOIN RLXBIMainAccountView ON MainAccountRecId  INNER JOIN RLXBIDateDimensionValueView ON Date_  INNER JOIN RLXBIBrandView ON Key  INNER JOIN RLXBICostCentreView ON Key  INNER JOIN RLXBIEmployeeView ON Key  INNER JOIN RLXBIIntercoView ON Key  INNER JOIN RLXBIInternalOrderView ON Key  INNER JOIN RLXBINatureView ON Key  INNER JOIN RLXBISectionView ON Key  INNER JOIN RLXBIDimension1View ON Key  INNER JOIN RLXBIDimension2View ON Key  INNER JOIN RLXBIDimension3View ON Key  INNER JOIN RLXBIDimension4View ON Key  INNER JOIN RLXBIDimension5View ON Key  INNER JOIN RLXBIDimension6View ON Key  INNER JOIN RLXBIDimension7View ON Key  INNER JOIN RLXBIDimension8View ON Key  INNER JOIN RLXBIDimension9View ON Key  INNER JOIN RLXBIDimension10View ON Key 
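
The query printed above puts a bare column name after each `ON`, which is not a complete join predicate. The metadata shown earlier carries a `Constraints` array with `Field` and `RelatedField` for each relation, which looks like enough to build one. The sketch below does that for a trimmed excerpt of the first measure group. It rests on an unverified assumption, flagged in the code: that `Field` names the column on the dimension view and `RelatedField` the column on the fact table. `BuildStarSchemaQuery` is a hypothetical name, and `System.Text.Json` is used in place of Newtonsoft.Json to keep the example dependency-free.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.Json;

// Trimmed excerpt of one measure group from the metadata printed earlier.
const string measureGroupJson = @"{
  ""Table"": ""BudgetRegisterEntryEntity"",
  ""Dimensions"": [{
    ""Name"": ""BudgetModel"",
    ""DimensionRelations"": [{
      ""Name"": ""RLXBIBudgetModelView"",
      ""DimensionAttribute"": ""PBIKey"",
      ""Constraints"": [{ ""Field"": ""PBIKey"", ""RelatedField"": ""RLXBudgetModelPBIKey"" }]
    }]
  }]
}";

// Hypothetical helper: builds a SELECT with one INNER JOIN per dimension.
string BuildStarSchemaQuery(JsonElement measureGroup)
{
    var fact = measureGroup.GetProperty("Table").GetString();
    var sql = new StringBuilder($"SELECT * FROM [{fact}]");

    foreach (var dimension in measureGroup.GetProperty("Dimensions").EnumerateArray())
    {
        // Only the first relation is used, as in the notebook cell.
        var relation = dimension.GetProperty("DimensionRelations").EnumerateArray().FirstOrDefault();
        if (relation.ValueKind != JsonValueKind.Object)
        {
            continue; // dimension without relations
        }

        var view = relation.GetProperty("Name").GetString();

        // ASSUMPTION: "Field" is the dimension-view column and
        // "RelatedField" is the fact-table column.
        var predicates = relation.GetProperty("Constraints").EnumerateArray().Select(c =>
        {
            var factColumn = c.GetProperty("RelatedField").GetString();
            var dimensionColumn = c.GetProperty("Field").GetString();
            return $"[{fact}].[{factColumn}] = [{view}].[{dimensionColumn}]";
        });

        sql.Append($" INNER JOIN [{view}] ON {string.Join(" AND ", predicates)}");
    }

    return sql.ToString();
}

string query;
using (var doc = JsonDocument.Parse(measureGroupJson))
{
    query = BuildStarSchemaQuery(doc.RootElement);
}

Console.WriteLine(query);
```

If the assumption about `Field` and `RelatedField` turns out to be reversed for your metadata, swapping the two `GetProperty` calls is the only change needed.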


## Store Results in Azure Synapse SQL (Dedicated SQL Pool)

The Azure Synapse team recommends the ["synapsesql" connector](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for reading and writing data in Azure Synapse SQL.
This connector is currently available only in Scala, so we first register the DataFrame as a temp view and then use the connector from a Scala cell to write that view to Azure Synapse, as the cells below show.

Note that the "Storage Blob Data Contributor" role must be granted on the storage account used by the Azure Synapse workspace.
See the [documentation](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for more details.



In [9]:
df.CreateOrReplaceTempView("DestinationWeatherTable");

StatementMeta(patc6sx8767zd5k, 12, 10, Finished, Available)



In [10]:
%%spark
val scala_df = spark.sqlContext.sql ("select * from DestinationWeatherTable")

scala_df.write.
option(Constants.SERVER, "synapsews-dev-westus2-c30d4.database.windows.net").
synapsesql("SampleSQL.dbo.DestinationWeatherTable", Constants.INTERNAL)

StatementMeta(patc6sx8767zd5k, 12, 11, Finished, Available)

scala_df: org.apache.spark.sql.DataFrame = [DateID: int, GeographyID: int ... 2 more fields]


## Store Results in the Data Lake in CDM Format

You can write the contents of the DataFrame to the data lake in Common Data Model (CDM) format using the [CDM Spark connector](https://github.com/Azure/spark-cdm-connector).
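
As a rough sketch of what such a write could look like from C#, the helper below assembles the options the connector's README describes, as far as I recall them: `storage`, `manifestPath`, `entity`, and the service principal settings `tenantId`, `appId`, and `appKey`. Treat the option names and the `com.microsoft.cdm` format name as assumptions to check against the connector version you install. `CdmWriteOptions` and all sample values are hypothetical, and the Spark call is left as a comment because it needs a live session with the connector attached.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects the CDM connector write options in one place.
Dictionary<string, string> CdmWriteOptions(
    string storageAccountName, string container, string manifestFolder, string entity,
    string tenantId, string appId, string appKey)
{
    return new Dictionary<string, string>
    {
        // ASSUMPTION: option names as recalled from the connector's README.
        ["storage"] = $"{storageAccountName}.dfs.core.windows.net",
        ["manifestPath"] = $"{container}/{manifestFolder.Trim('/')}/default.manifest.cdm.json",
        ["entity"] = entity,
        ["tenantId"] = tenantId,
        ["appId"] = appId,
        ["appKey"] = appKey,
    };
}

// Placeholder values for illustration only.
var cdmOptions = CdmWriteOptions(
    "mystorageaccount", "mycontainer", "/weather/", "Weather",
    "<tenant-id>", "<app-id>", "<app-key>");

Console.WriteLine(cdmOptions["manifestPath"]);

// In the notebook, assuming the Options(Dictionary<string, string>) overload
// of DataFrameWriter:
// df.Write().Format("com.microsoft.cdm").Options(cdmOptions).Save();
```

The credentials could reuse the values already held in `dataLakeConnectionInfo`, so that the service principal is configured in one place.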
