# **Azure Synapse and Azure OpenAI interaction**

[https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/using-openai-gpt-in-synapse-analytics/ba-p/3751815](https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/using-openai-gpt-in-synapse-analytics/ba-p/3751815)

[https://github.com/Azure/azure-openai-workshop](https://github.com/Azure/azure-openai-workshop)

To use Azure OpenAI from Synapse Spark, we'll use three components. Their setup is out of scope for this article.

- A Synapse Analytics workspace with a Spark pool
- An Azure OpenAI cognitive service with the text-davinci-003 model deployed
- An Azure Key Vault to store the OpenAI API key

In [1]:
%%configure -f
{
  "name": "synapseml",
  "conf": {
      "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.10.2",
      "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
      "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
      "spark.yarn.user.classpath.first": "true"
  }
}


**Prompt function**

In [2]:
print("Create UDF 'restaurant_prompt_udf() -> str'")

Create UDF 'restaurant_prompt_udf() -> str'


**prompt 1**
```
<|im_start|>system
Generate a json containing a restaurant positive, negative or neutral review. Use the following json structure:
{
        "restaurant": "",
        "review": ""
}
<|im_end|>
```

In [20]:
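from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Prompt text sent to the model for every row.
# Adjacent string literals are joined, so no stray backslashes end up in the prompt.
RESTAURANT_PROMPT = (
    "Generate a json containing a restaurant positive, negative or neutral review. "
    "Generate real world restaurant reviews negative, positive or neutral. "
    "Use the following json structure:\n"
    "{\n"
    '    "restaurant": "",\n'
    '    "review": ""\n'
    "}"
)

@udf(returnType=StringType())
def restaurant_prompt_udf():
    # Takes no input columns and returns the same prompt for each row.
    return RESTAURANT_PROMPT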



In [21]:
print("UDF created")


UDF created


**Get the OpenAI key from Azure Key Vault**

[https://oai.azure.com/](https://oai.azure.com/)

In [24]:
from synapse.ml.core.platform import find_secret

# Fill in the following lines with your service information
service_name = ""  # Name of your Azure OpenAI service
deployment_name = ""  # Name of your model deployment in Azure OpenAI
key = find_secret("", "")  # Replace with your secret name and Key Vault name

NrOfReviews = 5  # Number of restaurant reviews to generate


**Generate data**

In [25]:
# One row per review to generate, each carrying the same prompt text.
dfreviewid = spark.range(1, NrOfReviews + 1) \
    .withColumnRenamed("id", "reviewid") \
    .withColumn("prompt", restaurant_prompt_udf())

display(dfreviewid)


**Create the OpenAI Spark client**

In [26]:
from synapse.ml.cognitive import OpenAICompletion

# Use a separate name for the configured instance so the OpenAICompletion class is not shadowed.
review_completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(service_name))
    .setMaxTokens(2048)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("response")
)

This section can be modified to use the batch prompt approach, in which each row carries an array of prompt strings instead of a single prompt.

In [27]:
from pyspark.sql.functions import col

# Keep the first completion choice for each generated review.
dfreview = review_completion.transform(dfreviewid) \
    .select(col('reviewid'), col('response.choices.text').getItem(0).alias('reviewobject')) \
    .cache()

display(dfreview)
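
The batch prompt approach mentioned above needs the prompts grouped into arrays first. The following is a minimal plain-Python sketch of that grouping step only; `chunk_prompts` and the sample prompt text are hypothetical names introduced for illustration. In SynapseML the resulting arrays would go into a single column that the transformer is pointed at (to my knowledge via `setBatchPromptCol`, but check the documentation for the installed version).

```python
from typing import List


def chunk_prompts(prompts: List[str], batch_size: int) -> List[List[str]]:
    """Group prompts into lists of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


# Illustrative prompts only; the notebook uses one shared prompt per row.
prompts = [f"Write restaurant review number {n}" for n in range(1, 6)]
batches = chunk_prompts(prompts, batch_size=2)

for batch in batches:
    print(len(batch), batch)
```

Each inner list would become one row of the batch prompt column, so five prompts with a batch size of two result in three requests instead of five.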


In [28]:
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import col, from_json

# Schema of the JSON object the prompt asks the model to return.
schema = StructType([
    StructField("restaurant", StringType(), False),
    StructField("review", StringType(), False),
])

# Parse the raw completion text and flatten the parsed struct into columns.
df = dfreview.withColumn("json", from_json(col("reviewobject"), schema)) \
    .select(col("reviewid"), col("json.*"))

display(df)
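
The model is not guaranteed to return clean JSON, and rows that `from_json` cannot parse may come back with null values. As a rough way to inspect individual responses outside Spark, the sketch below pulls the first `{...}` span out of a completion and validates the two expected fields. `parse_review` and the sample strings are hypothetical and not part of the original notebook.

```python
import json
from typing import Dict, Optional


def parse_review(raw: str) -> Optional[Dict[str, str]]:
    """Return the restaurant/review pair from a completion, or None if unusable."""
    # Completions often start with newlines or extra text, so locate the braces first.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        obj = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    restaurant, review = obj.get("restaurant"), obj.get("review")
    if not isinstance(restaurant, str) or not isinstance(review, str):
        return None
    return {"restaurant": restaurant, "review": review}


# Made-up sample completions for illustration.
good = parse_review('\n\n{"restaurant": "Casa Luna", "review": "Great tacos."}')
bad = parse_review("Sorry, I cannot help with that.")
print(good)
print(bad)
```

A helper like this could also be wrapped in a UDF if stricter validation than `from_json` is wanted, at the cost of slower row-by-row Python execution.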


**prompt 2**
```
Classify the sentiment of following restaurant review.
Classifications: [Positive, Negative, Neutral]

Review:"""The food was excellent and cooked to perfection. The staff were very friendly and happy to answer questions about the menu. The atmosphere was lively and the prices were very reasonable. Highly recommended!"""

Classification:
```

In [29]:
from pyspark.sql.functions import col, concat, lit

# Wrap each generated review in the classification prompt.
dfprompt = df.withColumn(
    "prompt",
    concat(
        lit("Classify the sentiment of following restaurant review.\n"
            "Classifications: [Positive, Negative, Neutral]\n"
            "Review:\"\"\""),
        col("review"),
        lit("\"\"\"\nClassification:"),
    ),
)

#display(dfprompt)


In [30]:
display(dfprompt.select("prompt").limit(1))
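
To check the wording of the classification prompt without running Spark, the same template can be built in plain Python. This is only a sketch with tidied whitespace; `build_sentiment_prompt` is a hypothetical helper, not something the notebook defines.

```python
def build_sentiment_prompt(review: str) -> str:
    """Wrap a review in the sentiment classification prompt used in prompt 2."""
    return (
        "Classify the sentiment of following restaurant review.\n"
        "Classifications: [Positive, Negative, Neutral]\n"
        'Review:"""' + review + '"""\n'
        "Classification:"
    )


# Made-up review text for illustration.
example_prompt = build_sentiment_prompt("The soup was cold.")
print(example_prompt)
```

Ending the prompt with `Classification:` nudges a completion model to answer with just the label.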


In [31]:
from synapse.ml.cognitive import OpenAICompletion

completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(service_name))
    .setMaxTokens(2048)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("response")
)



In [32]:
from pyspark.sql.functions import col

completed_df = completion.transform(dfprompt).cache()

display(completed_df)


In [33]:
from pyspark.sql.functions import col

# Show the model's label next to the review it classified.
display(
    completed_df.select(
        col('response.choices.text').getItem(0).alias('openai_sentiment'),
        col('restaurant'),
        col('reviewid'),
        col('review')
    )
)


In [34]:
import pandas  # toPandas() below returns a pandas DataFrame

plotdf = completed_df.select(
        col('response.choices.text').getItem(0).alias('openai_sentiment'),
        col('restaurant'),
        col('reviewid'),
        col('review')
    )

display(plotdf)
# Collect the small result set to the driver as a pandas DataFrame for plotting.
plotpandasdf = plotdf.toPandas()

# plotpandasdf.plot(kind="bar", x="openai_sentiment", y="Restaurant")
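
A bar chart needs numeric values, so one option is to count how many reviews fall under each label before plotting. The sketch below does that in plain Python and also normalizes the raw completion text, which often carries leading spaces or a trailing period. `count_sentiments`, `LABELS`, and the sample values are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

LABELS = ("Positive", "Negative", "Neutral")


def count_sentiments(raw_labels: Iterable[Optional[str]]) -> Dict[str, int]:
    """Count labels after trimming whitespace, trailing periods and case differences."""
    counts: Counter = Counter()
    for raw in raw_labels:
        # Treat missing values as empty text so they land in "Other".
        label = (raw or "").strip().rstrip(".").capitalize()
        counts[label if label in LABELS else "Other"] += 1
    return dict(counts)


# Made-up model outputs for illustration.
sample = [" Positive", "negative\n", "Neutral.", "Positive", "Mixed feelings", None]
sentiment_counts = count_sentiments(sample)
print(sentiment_counts)
```

With the notebook's data, passing `plotpandasdf["openai_sentiment"]` to `count_sentiments` would give one count per label, which can then be drawn as a bar chart.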
