### Microsoft Fabric Book - Azure AI Services demonstration

#### Analyze hotel reviews with AI in Fabric

*Imagine you’re planning a trip to Yellowstone National Park and need to book a hotel that meets your preferences. With hundreds of reviews to sift through, it can be overwhelming to find the most relevant information and filter hotels accordingly. However, with Fabric, you can easily translate, extract, and classify hotel reviews with zero setup effort using prebuilt AI services. Then, with Power BI, you can create a visual report that allows you to filter hotels by categories and view their ratings and comments.*

**SynapseML Installation**

We use [SynapseML](https://microsoft.github.io/SynapseML/docs/Overview/) to analyze the hotel reviews. SynapseML is an open-source library for building large-scale machine learning pipelines. Fabric has the latest SynapseML package preinstalled and integrated with prebuilt AI models, so you can build scalable pipelines without any extra setup.

In [1]:
import synapse.ml.core
from synapse.ml.services import *
from pyspark.sql.functions import col, flatten, udf, lower, trim
from pyspark.sql.types import StringType


In [2]:
df_result = spark.sql("SELECT * FROM LakeDBIA.hotel_review_ai_services LIMIT 1000")
display(df_result)


**Classification using Azure OpenAI**

*The [Azure OpenAI service](https://azure.microsoft.com/products/ai-services/openai-service/) provides REST API access to OpenAI’s powerful language models, including the GPT-4, GPT-3.5-Turbo, and Embeddings model series.*

*These models can be adapted to specific tasks such as content generation, summarization, and natural language to code translation.*

**In Fabric, you can access the prebuilt Azure OpenAI service through the [REST API](https://learn.microsoft.com/fabric/data-science/ai-services/how-to-use-openai-via-rest-api), the [Python SDK](https://learn.microsoft.com/fabric/data-science/ai-services/how-to-use-openai-sdk-synapse), or [SynapseML](https://learn.microsoft.com/fabric/data-science/ai-services/how-to-use-openai-sdk-synapse).**

To learn more about the Azure OpenAI models that Fabric supports, see the [AI services documentation](https://learn.microsoft.com/fabric/data-science/ai-services/ai-services-overview#azure-openai-service).


In [3]:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

# Define the schema of the structured chat message
message_schema = ArrayType(StructType([
    StructField("role", StringType(), True),
    StructField("content", StringType(), True),
    StructField("name", StringType(), True)
]))

# Helper that builds the message list (system + user) for one review
def create_messages(text):
    return [
        {"role": "system", "content": "You are a classifier. Classify the following hotel review into exactly one of these categories: Service, Location, Facilities, Sanitation. Respond with a single word only.", "name": "system"},
        {"role": "user", "content": f"Review: {text}, Classified category:", "name": "user"}
    ]

# UDF that produces the "messages" column
process_column = udf(create_messages, message_schema)

# Apply the UDF to the translated review text
df_en_key_prompt = df_result.withColumn("messages", process_column(df_result["translation"])).cache()
display(df_en_key_prompt.tail(5))


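Because the UDF wraps an ordinary Python function, the message-building step can be checked without a Spark session. The following is a minimal sketch under stated assumptions: `CATEGORIES`, `build_messages`, and the exact prompt wording are hypothetical names chosen for illustration, and the optional `name` field used in the notebook is omitted for brevity.

```python
from typing import Dict, List, Optional

# Hypothetical constant: the four categories named in the system prompt.
CATEGORIES = ("Service", "Location", "Facilities", "Sanitation")


def build_messages(review: Optional[str]) -> List[Dict[str, str]]:
    """Return a system + user message pair for one review (illustrative only)."""
    # Treat a missing review as empty text so the prompt never contains "None".
    text = (review or "").strip()
    system_prompt = (
        "You are a classifier. Classify the following hotel review into "
        "exactly one of these categories: " + ", ".join(CATEGORIES) + ". "
        "Respond with a single word only."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Review: {text}\nClassified category:"},
    ]


messages = build_messages("  The room was spotless.  ")
print(messages[1]["content"])
```

Keeping the category list in one constant means the prompt and any later validation of the model's replies can share the same source of truth.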

In [4]:
from synapse.ml.core.platform import find_secret

# Fill in the following lines with your service information
service_name = "fgiai"
deployment_name = "gpt-4.1"  # name of the chat model deployment
keyvault = "fgi"

# Read the API key from Azure Key Vault instead of hard-coding it
key = find_secret(
    "openai-api-key-fabric",
    keyvault=keyvault
)

assert key is not None and service_name is not None


In [5]:
from synapse.ml.services.openai import OpenAIChatCompletion
from pyspark.sql.functions import col, lower, trim
from pyspark.sql import Row

chat_completion = (
    OpenAIChatCompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setCustomServiceName(service_name)
    .setMessagesCol("messages")
    .setErrorCol("error")
    .setOutputCol("classification")
)

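# Call the model for each row, then take the first choice's reply and
# normalize it (lower-cased, trimmed) into a "class" column
completed_df = (
    chat_completion.transform(df_en_key_prompt)
    .withColumn("class", trim(lower(col("classification.choices.message.content")[0])))
    .cache()
)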

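The column expression `classification.choices.message.content` followed by `[0]` selects the reply text of the first choice. The plain-Python sketch below mirrors that extraction on a dictionary so the logic is easier to follow. The function name `first_choice_label` and the sample payload are hypothetical; the sample is hand-written and is not real service output.

```python
from typing import Any, Dict, Optional


def first_choice_label(response: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the trimmed, lower-cased content of the first choice, or None."""
    if not response:
        return None
    choices = response.get("choices") or []
    if not choices:
        return None
    content = (choices[0].get("message") or {}).get("content")
    if content is None:
        return None
    return content.strip().lower()


# Hand-written sample shaped like a chat completion result.
sample = {"choices": [{"message": {"role": "assistant", "content": " Location\n"}}]}
print(first_choice_label(sample))  # location
```

Unlike Spark's `trim`, which by default removes only space characters, Python's `str.strip()` also removes newlines and tabs, so a sketch like this would already absorb replies such as `"\nlocation"`.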

In [6]:
display(completed_df)


In [7]:
# Keep the first eight source columns plus the predicted class
df_final = completed_df.select(completed_df.columns[:8]+["class"])
display(df_final.tail(5))


In [8]:
df_final.select("class").distinct().show(truncate = False)


+----------+
|class     |
+----------+
|facilities|
|location  |
|sanitation|
|service   |
+----------+
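Listing the distinct values is a quick way to spot replies that fall outside the four expected categories. A minimal sketch of the same audit in plain Python follows; `ALLOWED` and `audit_labels` are hypothetical names, and the sample labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed set of valid labels after lower-casing and trimming.
ALLOWED = {"service", "location", "facilities", "sanitation"}


def audit_labels(labels: Iterable[Optional[str]]) -> Tuple[Dict[Optional[str], int], List[Optional[str]]]:
    """Count each label and list the ones outside ALLOWED."""
    counts = Counter(labels)
    # key=str keeps the sort well-defined even if a label is None.
    unexpected = sorted((label for label in counts if label not in ALLOWED), key=str)
    return dict(counts), unexpected


counts, unexpected = audit_labels(["service", "location", "service", "sanitation."])
print(counts)
print(unexpected)
```

In a notebook, the input list could be collected with something like `[row["class"] for row in df_final.select("class").collect()]`, which is practical only when the DataFrame is small enough to bring to the driver.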



In [48]:
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf

def translate(mapping):
    # Build a UDF that replaces known noisy labels and leaves other values unchanged
    def translate_(value):
        return mapping.get(value) or value
    return udf(translate_, StringType())

# Noisy model replies -> clean label (one consistent spelling per category)
mapping = {
    # facilities
    'category: facilities': 'facilities', 'facilities.': 'facilities',
    '\nfacilities': 'facilities', '\n\nfacilities': 'facilities',
    '\n\nfacilities.': 'facilities', 'facilities\n': 'facilities',
    # location
    'location.': 'location', '\nlocation': 'location', '\n\nlocation': 'location',
    '\n\nlocation, sanitation, facilities, service': 'location',
    # sanitation
    'sanitation.': 'sanitation', 'sanitation.\n': 'sanitation',
    '\nsanitation': 'sanitation', '\n\nsanitation': 'sanitation', 'sanitat': 'sanitation',
    # services
    'service': 'services', 'service.': 'services', 'customer service': 'services',
    '\nservice': 'services', '\n\nservice': 'services', 'service\n': 'services',
    '\tservice': 'services',
    # replies that name two categories
    '1) facilities2) sanitation': 'facilities and sanitation',
    'facilities, sanitation': 'facilities and sanitation',
    # no usable reply
    'null': 'none',
}

df_final = df_final.withColumn("class", translate(mapping)("class"))

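A hand-written mapping has to list every noisy variant the model happens to produce. As an alternative, the sketch below searches each reply for the category keywords and accepts it only when exactly one category is found. This is an illustration, not the notebook's method: `CANONICAL` and `normalize_label` are hypothetical names, the plural `services` follows the label shown in the notebook's mapped output, and returning `None` for ambiguous replies is a design choice that differs from the mapping above.

```python
import re
from typing import Optional

# Keyword found in the model reply -> label written to the table.
CANONICAL = {
    "facilities": "facilities",
    "location": "location",
    "sanitation": "sanitation",
    "service": "services",
}

_KEYWORDS = re.compile("|".join(CANONICAL))


def normalize_label(raw: Optional[str]) -> Optional[str]:
    """Map a noisy model reply to one label, or None if ambiguous or unknown."""
    if raw is None:
        return None
    found = set(_KEYWORDS.findall(raw.lower()))
    if len(found) != 1:
        return None  # zero matches or several categories: leave for manual review
    return CANONICAL[found.pop()]


for reply in ["\n\nSanitation.", "customer service", "facilities, sanitation", "null"]:
    print(repr(reply), "->", normalize_label(reply))
```

Truncated replies such as `sanitat` would not match any keyword and would need either a looser pattern or an explicit entry. The function could be wrapped with `udf(normalize_label, StringType())` in the same way as `translate_`.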

In [49]:
df_final.select("class").distinct().show(truncate = False)


+----------+
|class     |
+----------+
|facilities|
|sanitation|
|location  |
|services  |
+----------+



In [9]:
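# Enable V-Order for the session so the Parquet files are optimized for reads
spark.conf.set('spark.sql.parquet.vorder.enabled', 'true')

# Save the classified reviews as a Delta table
(df_final
.write
.mode("overwrite")
.format("delta")
.option("parquet.vorder.enabled", "true")
.saveAsTable("hotel_review_classification")
)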


In [10]:
df_result = spark.sql("SELECT * FROM LakeDBIA.hotel_review_classification")
display(df_result)
