# Displaying Streaming Data in Power BI

## Create a streaming dataset

![08](https://i.loli.net/2021/02/22/EvWNOuCxjL9aynA.png)

## Configure the streaming dataset schema

The dataset schema must match the DataFrame's columns. Both the column names and the column types must be the same.
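You can catch mismatches before sending any data. PySpark's `df.dtypes` returns a list of `(column name, type string)` pairs, and you can compare that list with the fields you defined in Power BI. The sketch below is illustrative only. The Spark-to-Power BI mapping `SPARK_TO_PBI` and the helper `schema_mismatches` are hypothetical. The Power BI field types `Text`, `Number` and `DateTime` are assumed to be the ones offered in the streaming dataset editor.

```python
# Hypothetical mapping from Spark type strings (as returned by df.dtypes)
# to the field types assumed to be offered by the Power BI streaming dataset editor.
SPARK_TO_PBI = {
    "string": "Text",
    "int": "Number",
    "bigint": "Number",
    "double": "Number",
    "float": "Number",
    "timestamp": "DateTime",
}


def schema_mismatches(df_dtypes, pbi_fields):
    """List problems between a DataFrame schema and a Power BI dataset schema.

    df_dtypes:  list of (column name, Spark type string), e.g. df.dtypes
    pbi_fields: dict of Power BI field name -> Power BI field type
    """
    problems = []
    df_map = dict(df_dtypes)
    # Every Power BI field needs a DataFrame column with a compatible type
    for name, pbi_type in pbi_fields.items():
        if name not in df_map:
            problems.append(f"missing column: {name}")
            continue
        mapped = SPARK_TO_PBI.get(df_map[name])
        if mapped != pbi_type:
            problems.append(
                f"type mismatch for {name}: {df_map[name]} -> {mapped}, expected {pbi_type}"
            )
    # DataFrame columns that Power BI does not know about
    for name in df_map:
        if name not in pbi_fields:
            problems.append(f"extra column: {name}")
    return problems


# Example with the columns produced later in this notebook; the Power BI types are hypothetical.
dtypes = [("storeid", "string"), ("timestamp", "string"), ("SKU", "int")]
fields = {"storeid": "Text", "timestamp": "Text", "SKU": "Number"}
print(schema_mismatches(dtypes, fields))  # []
```

An empty list means every column lines up. In a notebook you would pass `df.dtypes` directly as the first argument.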

![09](https://i.loli.net/2021/02/22/7ixZVLwsQAGaDFB.png)

## Get the streaming dataset's API endpoint

Copy the Push URL.
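The push URL is a REST endpoint. Power BI accepts an HTTP POST to it whose JSON body is an array of row objects, keyed by the dataset's column names. Below is a minimal standard-library sketch that assumes the rows are already plain Python dicts. `build_payload` and `push_rows` are illustrative helpers and are not part of `PysparkPowerBIStreaming`. `PUSH_URL` is a placeholder for the URL you copied.

```python
import json
import urllib.request

# Placeholder: the push URL copied from Power BI (it embeds a key, so keep it secret).
PUSH_URL = "<your push URL>"


def build_payload(rows):
    """Serialize an iterable of row dicts into the JSON array body sent to the push URL."""
    return json.dumps(list(rows)).encode("utf-8")


def push_rows(url, rows, timeout=10):
    """POST the rows to the streaming dataset and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Here is a usage example, with values taken from the sample output later in this article: `push_rows(PUSH_URL, [{"storeid": "store123", "timestamp": "2020-12-24 03:42:17", "SKU": 15}])`.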

![10](https://i.loli.net/2021/02/22/K9LVGx5npDQ1JId.png)

## Install the Databricks CLI

Installation guide: [Databricks CLI](https://docs.microsoft.com/en-us/azure/databricks/dev-tools/cli/)

Installing it with Python's pip (`pip install databricks-cli`) is recommended.

## Configure the Databricks CLI

```shell
databricks configure --token
```

When prompted, enter your Azure Databricks (ADB) workspace host URL and a personal access token.
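The legacy `databricks-cli` saves these answers in an INI-style file, `~/.databrickscfg`, under a `[DEFAULT]` profile with `host` and `token` keys. The sketch below reads that file back with Python's `configparser` as a quick sanity check. `read_databricks_profile` is a hypothetical helper, and it only checks that both keys are present. It does not test the credentials against the workspace.

```python
import configparser
import os


def read_databricks_profile(path="~/.databrickscfg", profile="DEFAULT"):
    """Return (host, token) from a databricks-cli config file, or raise if incomplete."""
    config = configparser.ConfigParser()
    # read() silently skips missing files, so check its return value explicitly
    if not config.read(os.path.expanduser(path)):
        raise FileNotFoundError(path)
    section = config[profile]  # raises KeyError for an unknown profile
    host, token = section.get("host"), section.get("token")
    if not host or not token:
        raise ValueError(f"profile {profile!r} is missing host or token")
    return host, token
```

Avoid printing the returned token, because it grants access to the workspace.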

## Store the push URL as an encrypted secret

**Syntax:**

```shell
databricks secrets create-scope --scope <scope-name> --initial-manage-principal users
```

**Example:**

```shell
databricks secrets create-scope --scope powerbi --initial-manage-principal users
```

**Syntax:**

```shell
databricks secrets put --scope <scope-name> --key <key-name>
```

**Example:**

```shell
databricks secrets put --scope powerbi --key skusensorstreamingapi
```

In the editor that opens, paste the push URL, then save and exit.
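Inside a Databricks notebook, you can read the stored URL back with `dbutils.secrets.get(scope=..., key=...)`. Databricks redacts secret values if you try to display them. The sketch below wraps that call with a basic sanity check. `get_push_url` is a hypothetical helper. We are assuming, not confirming, that `PowerBIStreaming` reads the secret in a similar way from the `dbutils` object you pass it.

```python
def get_push_url(dbutils, scope, key):
    """Read the Power BI push URL from a Databricks secret scope and sanity-check it."""
    url = dbutils.secrets.get(scope=scope, key=key)
    # Stray whitespace or a trailing newline can slip in when pasting into the CLI editor
    url = url.strip()
    if not url.startswith("https://"):
        raise ValueError(f"secret {scope}/{key} does not look like an HTTPS push URL")
    return url


# Usage in a notebook (dbutils is predefined there):
# push_url = get_push_url(dbutils, "powerbi", "skusensorstreamingapi")
```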

## Usage

Import the wheel (`.whl`) package that provides `PysparkPowerBIStreaming.PowerBIStreaming`:

![11](https://i.loli.net/2021/02/22/8lvbdAyKtsxRYZa.png)

## Import the package in a Python notebook

```python
from PysparkPowerBIStreaming.PowerBIStreaming import PowerBIStreaming
```

…

Create a `pyspark.sql.dataframe.DataFrame` whose schema matches the Power BI streaming dataset.

…

**Syntax for instantiating the `PowerBIStreaming` object:**

```python
powerbi = PowerBIStreaming(dbutils, "<scope-name>", "<key-name>")
```

**Example:**

```python
powerbi = PowerBIStreaming(dbutils, "powerbi", "skusensorstreamingapi")
```

**Run:**

```python
powerbi.sendBatch(<DataFrame object>)
```

**Example:**

```python
powerbi.sendBatch(df)
```
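The internals of `sendBatch` are not documented here. If you need to push rows yourself, a common pattern is to split them into fixed-size chunks and pause briefly between requests, because Power BI enforces request limits on streaming datasets. Check its documentation for the current values. The sketch below assumes the rows are plain dicts and that you supply a `send` callable, such as an HTTP POST to the push URL. `chunked` and `send_in_chunks` are hypothetical helpers. The defaults for `chunk_size` and `delay_s` are placeholders, not documented Power BI limits.

```python
import time
from itertools import islice


def chunked(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def send_in_chunks(rows, send, chunk_size=100, delay_s=0.2):
    """Call send(chunk) for each chunk, sleeping between calls; return the number of chunks."""
    count = 0
    for chunk in chunked(rows, chunk_size):
        if count:
            time.sleep(delay_s)  # simple pacing between consecutive requests
        send(chunk)
        count += 1
    return count
```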

## Key code

```python
from pyspark.sql.functions import get_json_object

# Read the stream from Event Hubs
connectionString = "Your Connection String"  # placeholder: your Event Hubs connection string

# Connector versions 2.3.15 and later expect the connection string to be encrypted
conf = {
    "eventhubs.connectionString": sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
}

read_df = spark.readStream.format("eventhubs").options(**conf).load()

# The Event Hubs "body" column is binary: cast it to a JSON string, then extract the fields
df = (
    read_df.withColumn("Body", read_df["body"].cast("string"))
    .withColumn("storeid", get_json_object("Body", "$.storeid").cast("string"))
    .withColumn("timestamp", get_json_object("Body", "$.timestamp").cast("string"))
    .withColumn("SKU", get_json_object("Body", "$.SKU").cast("integer"))
    .select("storeid", "timestamp", "SKU")
)

display(df)
```

Sample output of `display(df)`:

| storeid | timestamp | SKU |
|---|---|---|
| store123 | 2020-12-24 03:42:17 | 15 |
| store123 | 2020-12-24 03:42:27 | 13 |
| store123 | 2020-12-24 03:42:37 | 8 |
| store123 | 2020-12-24 03:42:47 | 6 |
| store123 | 2020-12-24 03:42:58 | 2 |
| store123 | 2020-12-24 03:43:08 | 20 |
| store123 | 2020-12-24 03:43:18 | 16 |
| store123 | 2020-12-24 03:43:28 | 13 |
| store123 | 2020-12-24 03:43:38 | 10 |
| store123 | 2020-12-24 03:43:48 | 9 |
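When testing, it helps to know what each Event Hubs message body must look like. Judging from the `get_json_object` paths and the output above, each body is assumed to be a flat JSON object with `storeid`, `timestamp` and `SKU` keys. The plain-Python sketch below roughly mirrors the Spark extraction for a single message. `parse_event_body` and the sample message are illustrative and are not part of the original pipeline. The way it handles non-integer `SKU` values is a simplification of Spark's cast rules.

```python
import json


def parse_event_body(body):
    """Roughly mirror the Spark transformation for one message: body -> (storeid, timestamp, SKU).

    Missing fields become None, like the null that get_json_object returns for a missing path.
    """
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    record = json.loads(text)
    sku = record.get("SKU")
    try:
        sku = int(sku) if sku is not None else None
    except (TypeError, ValueError):
        sku = None  # simplification: treat values that int() rejects as null
    storeid = record.get("storeid")
    timestamp = record.get("timestamp")
    return (
        None if storeid is None else str(storeid),
        None if timestamp is None else str(timestamp),
        sku,
    )


sample = b'{"storeid": "store123", "timestamp": "2020-12-24 03:42:17", "SKU": 15}'
print(parse_event_body(sample))  # ('store123', '2020-12-24 03:42:17', 15)
```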


```python
# For Structured Streaming + Power BI DirectQuery: stream the data into a Delta Lake table
(
    df.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/delta/events/_checkpoints/sku")
    .table("sku")
)
```

```python
# For Power BI streaming
from PysparkPowerBIStreaming.PowerBIStreaming import PowerBIStreaming, PowerBIBatchStreaming

powerbi = PowerBIStreaming(dbutils, "powerbipoc", "skusensorstreamingapi")
```

```python
powerbi.sendBatch(df)
```
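`df` here is a streaming DataFrame, and this notebook does not show how `sendBatch` consumes it. If you want explicit control over each micro-batch, Spark Structured Streaming's `foreachBatch` (PySpark 2.4+) is an alternative approach that differs from what the article uses. Spark calls your function with each micro-batch as a regular DataFrame plus a batch id. The sketch below is hypothetical. `make_batch_handler` and the `send_rows` callable (for example, an HTTP POST to the push URL) are illustrative names. Collecting each batch to the driver is only reasonable when batches are small.

```python
def make_batch_handler(send_rows):
    """Build a foreachBatch callback that forwards each micro-batch as a list of dicts."""

    def handle(batch_df, batch_id):
        # collect() brings the whole micro-batch to the driver; keep batches small
        rows = [row.asDict() for row in batch_df.collect()]
        if rows:
            send_rows(rows)

    return handle


# Usage in the notebook (send_rows is a hypothetical function that POSTs to the push URL):
# query = df.writeStream.foreachBatch(make_batch_handler(send_rows)).start()
```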

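**Note: for version 2.3.15 and later of the Event Hubs connector for Spark (azure-event-hubs-spark), the connection string must be encrypted in the configuration:**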

```python
conf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
```

## References

[Structured Streaming + Azure Event Hubs integration guide for PySpark: Event Hubs configuration](https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/PySpark/structured-streaming-pyspark.md#event-hubs-configuration)