# Transform and enrich data with AI functions
Microsoft Fabric AI Functions let business professionals, from developers to analysts, transform and enrich their enterprise data using generative AI.

AI functions use industry-leading large language models (LLMs) for summarization, classification, text generation, and more. With a single line of code, you can:

- `ai.analyze_sentiment`: Detect the emotional state of input text.
- `ai.classify`: Categorize input text according to your labels.
- `ai.extract`: Extract specific types of information from input text (for example, locations or names).
- `ai.fix_grammar`: Correct the spelling, grammar, and punctuation of input text.
- `ai.generate_response`: Generate responses based on your own instructions.
- `ai.similarity`: Compare the meaning of input text with a single text value, or with text in another column.
- `ai.summarize`: Get summaries of input text.
- `ai.translate`: Translate input text into another language.

You can incorporate these functions as part of data science and data engineering workflows, whether you're working with pandas or Spark. 


## Import required libraries

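```python
# Import the AI functions library for PySpark; required before calling the df.ai.* methods below
import synapse.ml.spark.aifunc as aifunc
```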


## Apply AI functions
Each of the following functions invokes the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. AI functions work with both pandas DataFrames and Spark DataFrames. The examples that follow use Spark DataFrames (PySpark).

### Detect sentiment with ai.analyze_sentiment
The `ai.analyze_sentiment` function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank.

For more detailed instructions about the use of `ai.analyze_sentiment` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/analyze-sentiment).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("The cleaning spray permanently stained my beautiful kitchen counter. Never again!",),
        ("I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",),
        ("I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",),
        ("The umbrella is OK, I guess.",)
    ], ["reviews"])

sentiment = df.ai.analyze_sentiment(input_col="reviews", output_col="sentiment")
display(sentiment)
```
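
`ai.analyze_sentiment` returns one of four labels, or a blank value when it can't decide. Tallying the results is a quick way to review them before you rely on them. The sketch below is plain Python and not part of the AI functions API. It assumes you have already collected the `sentiment` column into a list, for example with `[row.sentiment for row in sentiment.collect()]`. It also assumes that a blank result arrives as `None` or an empty string. The helper name `tally_sentiment` and the sample values are hypothetical.

```python
from collections import Counter

# The four labels that ai.analyze_sentiment is described as returning.
SENTIMENT_LABELS = {"positive", "negative", "mixed", "neutral"}

def tally_sentiment(values):
    """Count each sentiment label, grouping blank and unexpected values separately.

    Assumption: a blank result arrives as None or as an empty/whitespace string.
    """
    counts = Counter()
    for value in values:
        if value is None or not str(value).strip():
            counts["(blank)"] += 1
        elif value in SENTIMENT_LABELS:
            counts[value] += 1
        else:
            counts["(unexpected)"] += 1
    return counts

# Hypothetical values for illustration, not real model output.
print(tally_sentiment(["negative", "positive", "mixed", None]))
```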


### Categorize text with ai.classify
The `ai.classify` function invokes AI to categorize input text according to custom labels you choose.

For more detailed instructions about the use of `ai.classify` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/classify).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])

categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)
```
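
AI output should always be reviewed. One cheap check after `ai.classify` runs is to confirm that every result is one of the labels you passed in. This minimal plain-Python sketch assumes the `categories` column has been collected into a list, for example with `[row.categories for row in categories.collect()]`. The helper `find_off_label_rows` and the sample values are hypothetical.

```python
def find_off_label_rows(values, labels):
    """Return (index, value) pairs whose value is not one of the allowed labels.

    Missing results (None) are flagged as well, so every flagged row can be reviewed.
    """
    allowed = set(labels)
    return [(i, value) for i, value in enumerate(values) if value not in allowed]

labels = ["kitchen", "bedroom", "garage", "other"]
# Hypothetical values for illustration, not real model output.
print(find_off_label_rows(["bedroom", "kitchen", "car"], labels))  # prints [(2, 'car')]
```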

### Extract entities with ai.extract
The `ai.extract` function invokes AI to scan input text and extract specific types of information that are designated by labels you choose (for example, locations or names). 

For more detailed instructions about the use of `ai.extract` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/extract).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)
```

### Fix grammar with ai.fix_grammar
The `ai.fix_grammar` function invokes AI to correct the spelling, grammar, and punctuation of input text. 

For more detailed instructions about the use of `ai.fix_grammar` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/fix-grammar).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",)
    ], ["text"])

corrections = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(corrections)
```
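
`ai.fix_grammar` returns a rewritten sentence. Reviewing that output is easier when you can see exactly which words changed. The sketch below compares an input with its correction word by word, using Python's standard `difflib` module. `changed_words` is a hypothetical helper. The corrected sentence in the usage line was written by hand for illustration; it was not produced by the function.

```python
import difflib

def changed_words(original, corrected):
    """Return (removed, added) word lists between two versions of a sentence."""
    removed, added = [], []
    for token in difflib.ndiff(original.split(), corrected.split()):
        if token.startswith("- "):
            removed.append(token[2:])
        elif token.startswith("+ "):
            added.append(token[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added

# Hand-written correction, for illustration only.
print(changed_words("There are an error here.", "There is an error here."))  # prints (['are'], ['is'])
```

To check real results, you could pair the `text` and `corrections` columns from `corrections.collect()` and pass each pair to the helper.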

### Answer custom user prompts with ai.generate_response
The `ai.generate_response` function invokes AI to generate custom text based on your own instructions. 

For more detailed instructions about the use of `ai.generate_response` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/generate-response).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Scarves",),
        ("Snow pants",),
        ("Ski goggles",)
    ], ["product"])

responses = df.ai.generate_response(prompt="Write a short, punchy email subject line for a winter sale.", output_col="response")
display(responses)
```
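
Fabric's documentation for `ai.generate_response` also describes a template mode (`is_prompt_template=True`). In that mode, column names in braces, such as `{product}`, are filled in with each row's values. Check the linked article for the exact behavior. The plain-Python sketch below does not call AI. It only mimics that substitution locally, to show what each row's prompt would look like. `render_prompts` and the row dictionaries are hypothetical.

```python
def render_prompts(template, rows):
    """Fill {column} placeholders in the template with each row's values.

    Local illustration only: builds the per-row prompt text and does not call AI.
    A row that lacks a referenced column raises KeyError.
    """
    return [template.format_map(row) for row in rows]

rows = [{"product": "Scarves"}, {"product": "Snow pants"}, {"product": "Ski goggles"}]
template = "Write a short, punchy email subject line for a winter sale on {product}."

for prompt in render_prompts(template, rows):
    print(prompt)
```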

### Calculate similarity with ai.similarity
The `ai.similarity` function compares each input text value either to one common reference text or to the corresponding value in another column (pairwise mode). The output similarity score values are relative, and they can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. 

For more detailed instructions about the use of `ai.similarity` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/similarity).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Bill Gates", "Technology"),
        ("Satya Nadella", "Healthcare"),
        ("Joan of Arc", "Agriculture")
    ], ["names", "industries"])

similarity = df.ai.similarity(input_col="names", other_col="industries", output_col="similarity")
display(similarity)
```
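
The -1 to 1 score range matches the range of cosine similarity between two vectors, which gives some intuition for reading the scores. The source doesn't say how `ai.similarity` computes its scores, so treat this purely as an analogy. The sketch computes cosine similarity on small hand-made vectors, not on text. `cosine_similarity` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))    # orthogonal: 0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction: close to -1
```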

### Summarize text with ai.summarize
The `ai.summarize` function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). 

For more detailed instructions about the use of `ai.summarize` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/summarize).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization, a workspace for real-time
        collaboration and communication, meetings, file and app sharing, and even the
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement,
        data processing, ingestion, transformation, and report building into a seamless,
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summary")
display(summaries)
```
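
The triple-quoted descriptions above carry a leading newline, indentation, and hard line breaks into the DataFrame. The source doesn't say whether that extra whitespace affects the summaries. If you'd rather pass clean single-line text, you can collapse the whitespace first. `normalize_whitespace` is a hypothetical helper, shown here in plain Python.

```python
def normalize_whitespace(text):
    """Collapse runs of spaces, tabs, and newlines into single spaces and trim the ends."""
    return " ".join(text.split())

raw = """
        An enterprise-ready, end-to-end analytics platform that unifies data movement,
        data processing, ingestion, transformation, and report building into a seamless,
        user-friendly SaaS experience.
        """
print(normalize_whitespace(raw))
```

You can apply the same cleanup to a Spark column inside the notebook with `pyspark.sql.functions`: `df.withColumn("description", F.trim(F.regexp_replace("description", r"\s+", " ")))`, after `import pyspark.sql.functions as F`.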

### Translate text with ai.translate
The `ai.translate` function invokes AI to translate input text to a new language of your choice. 

For more detailed instructions about the use of `ai.translate` with PySpark, [see this article](https://learn.microsoft.com/en-us/fabric/data-science/ai-functions/pyspark/translate).

```python
# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Hello! How are you doing today?",),
        ("Tell me what you'd like to know, and I'll do my best to help.",),
        ("The only thing we have to fear is fear itself.",),
    ], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)
```