## Install Dependencies
Install the NuGet packages required for the Entity Store SDK.

In [1]:
#r "nuget:Azure.Identity,1.1.1"
#r "nuget:Azure.Storage.Blobs,12.6.0"

## Configuration
Provide connection details for the data lake and the SQL pool.

In [2]:
dynamic dataLakeConnectionInfo = new {
    StorageAccountName = "",
    Container = "",
    EntityStoreMetadataPath = "",
    TenantId = "",
    AppId = "",
    AppKey = ""
};

dynamic sqlPoolConnectionInfo = new {
    PoolName = "",
    Schema = "",
    Database = "",
    Username = "",
    Password = "",
};


var sqlPoolConnectionString = $"jdbc:sqlserver://{sqlPoolConnectionInfo.PoolName}.sql.azuresynapse.net:1433;database={sqlPoolConnectionInfo.Database};user={sqlPoolConnectionInfo.Username};password={sqlPoolConnectionInfo.Password};encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;";
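
The placeholders above are empty strings, so interpolating them as-is produces a URL that only fails later, when Spark tries to connect. A minimal sketch of a helper that fails fast instead, assuming the same URL shape as the cell above; the class `SqlPoolConnectionSketch` and its method are hypothetical and not part of the Entity Store SDK.

```csharp
using System;

// Hypothetical helper (not part of the Entity Store SDK): builds the same JDBC URL
// shape as the configuration cell, but rejects blank settings up front.
static class SqlPoolConnectionSketch
{
    public static string BuildJdbcUrl(string poolName, string database, string username, string password)
    {
        Require(poolName, nameof(poolName));
        Require(database, nameof(database));
        Require(username, nameof(username));
        Require(password, nameof(password));

        return $"jdbc:sqlserver://{poolName}.sql.azuresynapse.net:1433;" +
               $"database={database};user={username};password={password};" +
               "encrypt=true;trustServerCertificate=true;" +
               "hostNameInCertificate=*.sql.azuresynapse.net;loginTimeout=30;";
    }

    // Throws if a required connection setting is null, empty, or whitespace.
    private static void Require(string value, string name)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException($"Connection setting '{name}' must not be empty.", name);
        }
    }
}
```

Calling `SqlPoolConnectionSketch.BuildJdbcUrl(...)` with the four values from `sqlPoolConnectionInfo` would yield a string usable wherever `sqlPoolConnectionString` is used.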



## Dynamics Entity Store SDK

In [15]:
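using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Microsoft.Spark;
using Microsoft.Spark.Sql; // DataFrame and SaveMode live in this namespace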

class EntityStoreSdk {

    private string SqlConnectionString {get; set;}
    private Microsoft.Spark.Sql.SparkSession SparkSession {get; set;}

    public EntityStoreSdk(string sqlConnectionString, Microsoft.Spark.Sql.SparkSession sparkSession)
    {
        this.SqlConnectionString = sqlConnectionString;
        this.SparkSession = sparkSession;
    }

    public JObject ReadMetadata(MemoryStream memoryStream)
    {
        using (var zip = new ZipArchive(memoryStream, ZipArchiveMode.Read))
        {
            // Finds the root measurement metadata
            var entryList = zip.Entries.ToList();
            var measurementEntry = entryList.FirstOrDefault(e => e.FullName == "measurement.json");

            if (measurementEntry == null)
            {
                throw new Exception("Cannot find measurement metadata file 'measurement.json' in the root folder of the archive.");
            }

            using (var stream = measurementEntry.Open())
            {
                using (var sr = new StreamReader(stream))
                using (var jsonTextReader = new JsonTextReader(sr))
                {
                    dynamic measurementMetadata = JObject.Load(jsonTextReader);

                    Console.WriteLine($"Processing measurement '{measurementMetadata.Label}' ({measurementMetadata.Name})");

                    return measurementMetadata;
                }
            }
        }

    }

    public DataFrame QueryTable(string query)
    {
        var df = this.SparkSession.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", this.SqlConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query)   // use dbtable for the entire table
              .Option("isolationLevel", "READ_UNCOMMITTED")
              .Load();

        return df;
    }
}
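
`ReadMetadata` expects a zip archive with `measurement.json` at its root. The sketch below shows that layout by building such an archive in memory and reading the entry back, using only the base class library. The class name `ArchiveLayoutSketch` is hypothetical, and it returns the raw JSON text rather than a `JObject` to avoid the Newtonsoft.Json dependency.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical illustration of the archive layout that ReadMetadata expects.
static class ArchiveLayoutSketch
{
    // Builds a zip archive in memory with a single root-level measurement.json entry.
    public static byte[] BuildArchive(string measurementJson)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the archive is disposed,
            // which is when the zip central directory is written.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
            using (var writer = new StreamWriter(zip.CreateEntry("measurement.json").Open()))
            {
                writer.Write(measurementJson);
            }

            return buffer.ToArray();
        }
    }

    // Returns the text of the root measurement.json entry, or null if it is missing.
    public static string ReadMeasurementJson(byte[] archiveBytes)
    {
        using (var zip = new ZipArchive(new MemoryStream(archiveBytes), ZipArchiveMode.Read))
        {
            var entry = zip.Entries.FirstOrDefault(e => e.FullName == "measurement.json");
            if (entry == null)
            {
                return null;
            }

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Passing the bytes from `BuildArchive` to `ReadMeasurementJson` returns the original JSON text. An entry stored under a subfolder (for example `metadata/measurement.json`) would not match, because the lookup compares against `FullName`.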

## Load Aggregate Measurement Metadata

In [1]:
using global::Azure.Identity;
using global::Azure.Storage.Blobs;

var entityStoreSdk = new EntityStoreSdk(sqlPoolConnectionString, spark);
JObject measureMetadata = null;

var tokenCredential = new ClientSecretCredential(
                dataLakeConnectionInfo.TenantId,
                dataLakeConnectionInfo.AppId,
                dataLakeConnectionInfo.AppKey);

var blobEndpoint = new Uri($"https://{dataLakeConnectionInfo.StorageAccountName}.dfs.core.windows.net");

BlobServiceClient blobServiceClient = new BlobServiceClient(blobEndpoint, tokenCredential);
BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(dataLakeConnectionInfo.Container);

BlobClient blobClient = containerClient.GetBlobClient(dataLakeConnectionInfo.EntityStoreMetadataPath);
if (await blobClient.ExistsAsync())
{
      Console.WriteLine("Downloading Entity Store Metadata...");

      using (var memoryStream = new MemoryStream())
      {
            blobClient.DownloadTo(memoryStream);
            measureMetadata = entityStoreSdk.ReadMetadata(memoryStream);
      }
}

## Query Azure Synapse Table

The example below queries an Azure Synapse table from Spark using the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15), the same connector that the Entity Store SDK's `QueryTable` method uses.

In [8]:
var query = "select top(100) * from Weather";

var df = spark.Read()
              .Format("com.microsoft.sqlserver.jdbc.spark")
              .Option("url", sqlPoolConnectionString)
              .Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
              .Option("query", query) //use dbtable for the entire table
              .Load();

display(df.Head(15));


## Join Fact and Dimension Tables in Azure Synapse

In the example below, we traverse the measurement metadata to join fact and dimension tables, running each query through the Entity Store SDK, which uses the [SQL Server connector](https://docs.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15).

In [30]:
// Iterate over each MeasureGroup/FactTable.
foreach (var measureGroup in measureMetadata["MeasureGroups"]) {
    // Handle errors per measure group so its name is in scope for the message
    // and one failing group does not stop the others.
    try {
        var factTable = measureGroup["Table"].ToString();
        var selectMeasureGroupDimensionQuery = new StringBuilder($"SELECT * FROM {factTable}");
        foreach (var dimension in measureGroup["Dimensions"])
        {
            var dimensionRelation = dimension["DimensionRelations"]?.FirstOrDefault(); // gets the first relation
            if (dimensionRelation == null)
            {
                continue; // nothing to join on for this dimension
            }
            selectMeasureGroupDimensionQuery.Append($" INNER JOIN {dimensionRelation["Name"]} ON {dimensionRelation["DimensionAttribute"]}");
        }

        var starSchemaQuery = selectMeasureGroupDimensionQuery.ToString();
        Console.WriteLine($"Executing query: {starSchemaQuery}");

        var starSchemaDf = entityStoreSdk.QueryTable(starSchemaQuery);

        display(starSchemaDf.Head(15));

        // Implicit write of the joined result (not the earlier Weather DataFrame)
        starSchemaDf.Write().Format("com.microsoft.cdm")
            .Option("storage", dataLakeConnectionInfo.StorageAccountName + ".dfs.core.windows.net")
            .Option("manifestPath", dataLakeConnectionInfo.Container + $"/{factTable}/default.manifest.cdm.json")
            .Option("entity", factTable)
            .Option("format", "csv")
            .Mode(SaveMode.Overwrite)
            .Save();
    }
    catch (Exception ex) {
        Console.WriteLine($"Error processing measure group: {measureGroup["Name"]} \n {ex.Message}");
    }
}
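
To see what query the traversal produces without a Spark session, the string-building step can be run on its own. This is a minimal sketch under stated assumptions: the property names (`MeasureGroups`, `Table`, `Dimensions`, `DimensionRelations`, `Name`, `DimensionAttribute`) come from the cell above, but the sample values are invented, and treating `DimensionAttribute` as a complete join condition is an assumption about the metadata. `System.Text.Json` stands in for Newtonsoft.Json so the block has no package dependency, and the class `JoinQuerySketch` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical, Spark-free version of the query-building loop.
static class JoinQuerySketch
{
    // Invented sample metadata; real Entity Store metadata may differ in shape and content.
    public const string SampleMetadata =
        "{ \"MeasureGroups\": [ { \"Name\": \"Sales\", \"Table\": \"FactSales\", \"Dimensions\": [" +
        " { \"DimensionRelations\": [ { \"Name\": \"DimCustomer\"," +
        " \"DimensionAttribute\": \"FactSales.CustomerKey = DimCustomer.CustomerKey\" } ] }," +
        " { \"DimensionRelations\": [] } ] } ] }";

    // Returns one SELECT ... INNER JOIN ... statement per measure group.
    public static List<string> BuildJoinQueries(string metadataJson)
    {
        var queries = new List<string>();
        using (var document = JsonDocument.Parse(metadataJson))
        {
            foreach (var measureGroup in document.RootElement.GetProperty("MeasureGroups").EnumerateArray())
            {
                var sql = new StringBuilder($"SELECT * FROM {measureGroup.GetProperty("Table").GetString()}");
                foreach (var dimension in measureGroup.GetProperty("Dimensions").EnumerateArray())
                {
                    // Skip dimensions that have no relation to join on.
                    if (!dimension.TryGetProperty("DimensionRelations", out var relations)
                        || relations.ValueKind != JsonValueKind.Array
                        || relations.GetArrayLength() == 0)
                    {
                        continue;
                    }

                    var relation = relations[0]; // first relation only, as in the cell above
                    sql.Append($" INNER JOIN {relation.GetProperty("Name").GetString()}");
                    sql.Append($" ON {relation.GetProperty("DimensionAttribute").GetString()}");
                }
                queries.Add(sql.ToString());
            }
        }
        return queries;
    }
}
```

Calling `JoinQuerySketch.BuildJoinQueries(JoinQuerySketch.SampleMetadata)` yields a single query that joins `FactSales` to `DimCustomer`; the second dimension is skipped because its relation list is empty.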


## Store Results in Azure Synapse SQL (Dedicated SQL Pool)

The Azure Synapse team recommends the ["synapsesql" connector](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for reading and writing data in Azure Synapse SQL.
This connector is currently available only in Scala, so we first register the DataFrame as a temp view (Hive table),
then use the connector from a Scala cell to write that view into Azure Synapse, as in the cells below.

Note that you must be granted the "Storage Blob Data Contributor" role on the storage account used by the Azure Synapse workspace.
See the [documentation](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/synapse-spark-sql-pool-import-export#use-pyspark-with-the-connector) for more details.



In [9]:
df.CreateOrReplaceTempView("DestinationWeatherTable");

In [10]:
%%spark
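// Typically pre-imported in Synapse notebooks; listed here for clarity.
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

val scala_df = spark.sqlContext.sql("select * from DestinationWeatherTable")

scala_df.write.
option(Constants.SERVER, "synapsews-dev-westus2-c30d4.database.windows.net").
synapsesql("SampleSQL.dbo.DestinationWeatherTable", Constants.INTERNAL)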

## Store Results in the Data Lake in CDM Format

You can write the contents of the DataFrame to the data lake using the [CDM Spark connector](https://github.com/Azure/spark-cdm-connector):

1. Download the latest CDM Spark connector jar, e.g. spark-cdm-connector-assembly-0.19.1.jar.
2. Upload the package to the workspace following these [instructions](https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-scala-packages).
3. Install the package in the Spark pool used by the notebook (or in all Spark pools by default).




In [None]:
// See examples in https://github.com/Azure/spark-cdm-connector/blob/master/samples/SparkCDMsample.scala

// Implicit write
df.Write().Format("com.microsoft.cdm")
  .Option("storage", dataLakeConnectionInfo.StorageAccountName + ".dfs.core.windows.net")
  .Option("manifestPath", dataLakeConnectionInfo.Container + "/nestedImplicit/default.manifest.cdm.json")
  .Option("entity", "NestedExampleImplicit")
  .Option("format", "parquet")
  .Mode(SaveMode.Append)
  .Save();