## Use Azure AI Language text analytics in Fabric with REST API and SynapseML

Azure AI Language is an Azure AI service that enables you to perform text mining and text analysis with Natural Language Processing (NLP) features.

### Set up authentication and endpoints

In [1]:
# Get workload endpoints and access token
from synapse.ml.fabric.service_discovery import get_fabric_env_config
from synapse.ml.fabric.token_utils import TokenUtils
import json
import requests


> This code uses Fabric's built-in authentication. The get_fabric_env_config function automatically retrieves your workspace credentials and connects to the prebuilt AI services. No API key is required.

In [2]:
fabric_env_config = get_fabric_env_config().fabric_env_config
auth_header = TokenUtils().get_openai_auth_header()

# Make a RESTful request to the AI service
prebuilt_AI_base_host = fabric_env_config.ml_workload_endpoint + "cognitive/textanalytics/"
print("Workload endpoint for AI service: \n" + prebuilt_AI_base_host)

service_url = prebuilt_AI_base_host + "language/:analyze-text?api-version=2022-05-01"
print("Service URL: \n" + service_url)

auth_headers = {
    "Authorization" : auth_header
}

def print_response(response):
    if response.status_code == 200:
        print(json.dumps(response.json(), indent=2))
    else:
        print(f"Error: {response.status_code}, {response.content}")


Workload endpoint for AI service: 
https://pbipindcen2-centralindia.pbidedicated.windows.net/webapi/capacities/d40e324e-1d9e-4953-aed7-9f4787b9536b/workloads/ML/ML/Automatic/workspaceid/813bdb96-27f1-4c2a-b79c-c24a08722010/cognitive/textanalytics/
Service URL: 
https://pbipindcen2-centralindia.pbidedicated.windows.net/webapi/capacities/d40e324e-1d9e-4953-aed7-9f4787b9536b/workloads/ML/ML/Automatic/workspaceid/813bdb96-27f1-4c2a-b79c-c24a08722010/cognitive/textanalytics/language/:analyze-text?api-version=2022-05-01
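
Every request in the sections that follow posts a body with the same three top-level keys: `kind`, `parameters`, and `analysisInput`. The sketch below shows one way to assemble that body from a list of strings. `build_payload` is a hypothetical helper written for this walkthrough, not part of SynapseML or any Azure SDK, and it assumes that numbering documents from "1" is acceptable for your use case.

```python
def build_payload(kind, texts, language=None, **parameters):
    """Assemble an analyze-text request body (hypothetical helper)."""
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        # Language detection requests omit the language hint entirely.
        if language is not None:
            doc["language"] = language
        documents.append(doc)

    # Start from the model version used throughout this notebook,
    # then let the caller add or override task-specific parameters.
    params = {"modelVersion": "latest"}
    params.update(parameters)

    return {
        "kind": kind,
        "parameters": params,
        "analysisInput": {"documents": documents},
    }


payload = build_payload(
    "KeyPhraseExtraction",
    ["Dr. Smith has a very modern medical office, and she has great staff."],
    language="en",
)
```

The resulting dictionary can be passed as `json=payload` to `requests.post`, in the same way as the hand-written payloads in the cells below.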


## REST API approach: Direct HTTP calls to the service (recommended for beginners)

### Sentiment Analysis

The Sentiment Analysis feature detects sentiment labels (such as "negative," "neutral," and "positive") at the sentence and document level. For each document, and for each sentence within it, the feature also returns confidence scores between 0 and 1 for positive, neutral, and negative sentiment. See the [Sentiment Analysis and Opinion Mining language support](https://learn.microsoft.com/en-us/azure/ai-services/language-service/sentiment-opinion-mining/language-support) for the list of enabled languages.

In [7]:
payload = {
    "kind": "SentimentAnalysis",
    "parameters": {
        "modelVersion": "latest",
        "opinionMining": True
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language":"en",
                "text": "The food and service were unacceptable. The concierge was nice, however."
            }
        ]
    }
} 

response = requests.post(service_url, json=payload, headers=auth_headers)


# Print the JSON response body, or the status code and content on failure
print_response(response)


{
  "kind": "SentimentAnalysisResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "sentiment": "negative",
        "confidenceScores": {
          "positive": 0.0,
          "neutral": 0.0,
          "negative": 1.0
        },
        "sentences": [
          {
            "sentiment": "negative",
            "confidenceScores": {
              "positive": 0.0,
              "neutral": 0.0,
              "negative": 1.0
            },
            "offset": 0,
            "length": 40,
            "text": "The food and service were unacceptable. "
          },
          {
            "sentiment": "neutral",
            "confidenceScores": {
              "positive": 0.22,
              "neutral": 0.75,
              "negative": 0.04
            },
            "offset": 40,
            "length": 32,
            "text": "The concierge was nice, however."
          }
        ]
      }
    ],
    "errors": [],
    "modelVersion": "2025-01-01"
  }
}
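
The nested response above can be flattened into one row per sentence, which makes the sentence-level labels and scores easier to scan. The following is a minimal sketch that assumes the response shape shown in the preceding output. `summarize_sentiment` is a hypothetical helper, and `sample` is a trimmed copy of that output rather than a new service result.

```python
def summarize_sentiment(result):
    """Return [(doc id, doc sentiment, [(sentence, label, score), ...]), ...]."""
    rows = []
    for doc in result["results"]["documents"]:
        sentences = []
        for sentence in doc["sentences"]:
            label = sentence["sentiment"]
            # Pick the confidence score that belongs to the assigned label.
            score = sentence["confidenceScores"].get(label)
            sentences.append((sentence["text"].strip(), label, score))
        rows.append((doc["id"], doc["sentiment"], sentences))
    return rows


# Trimmed copy of the response printed above.
sample = {
    "kind": "SentimentAnalysisResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "sentiment": "negative",
                "confidenceScores": {"positive": 0.0, "neutral": 0.0, "negative": 1.0},
                "sentences": [
                    {
                        "sentiment": "negative",
                        "confidenceScores": {"positive": 0.0, "neutral": 0.0, "negative": 1.0},
                        "text": "The food and service were unacceptable. ",
                    },
                    {
                        "sentiment": "neutral",
                        "confidenceScores": {"positive": 0.22, "neutral": 0.75, "negative": 0.04},
                        "text": "The concierge was nice, however.",
                    },
                ],
            }
        ],
        "errors": [],
    },
}

summary = summarize_sentiment(sample)
for doc_id, doc_sentiment, sentences in summary:
    print(f"Document {doc_id}: {doc_sentiment}")
    for text, label, score in sentences:
        print(f"  [{label} {score}] {text}")
```

In a live notebook, `summarize_sentiment(response.json())` would apply the same flattening to the actual response.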


In [8]:
payload = {
  "kind": "SentimentAnalysis",
  "parameters": {
    "modelVersion": "latest"
  },
  "analysisInput": {
    "documents": [
      {
        "id": "1",
        "language": "en-US",
        "text": "I am so happy today, its sunny!"
      },
      {
        "id": "2",
        "language": "en-US",
        "text": "I am frustrated by this rush hour traffic"
      },
      {
        "id": "3",
        "language": "en-US",
        "text": "The cognitive services on Fabric aint bad"
      }
    ]
  }
}

response = requests.post(service_url, json=payload, headers=auth_headers)


# Print the JSON response body, or the status code and content on failure
print_response(response)


{
  "kind": "SentimentAnalysisResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "sentiment": "positive",
        "confidenceScores": {
          "positive": 1.0,
          "neutral": 0.0,
          "negative": 0.0
        },
        "sentences": [
          {
            "sentiment": "positive",
            "confidenceScores": {
              "positive": 1.0,
              "neutral": 0.0,
              "negative": 0.0
            },
            "offset": 0,
            "length": 31,
            "text": "I am so happy today, its sunny!",
            "targets": [],
            "assessments": []
          }
        ]
      },
      {
        "id": "2",
        "sentiment": "negative",
        "confidenceScores": {
          "positive": 0.0,
          "neutral": 0.0,
          "negative": 1.0
        },
        "sentences": [
          {
            "sentiment": "negative",
            "confidenceScores": {
              "positive": 0.0,
              "neutral

### Language detector
The Language Detector evaluates the text of each document and returns a language identifier with a confidence score that indicates the strength of the analysis. This capability is useful for content stores that collect arbitrary text, where the language is unknown. See the [Supported languages for language detection](https://learn.microsoft.com/en-us/azure/ai-services/language-service/language-detection/language-support) for the list of enabled languages.

In [9]:
payload = {
    "kind": "LanguageDetection",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "text": "This is a document written in English."
            }
        ]
    }
}

response = requests.post(service_url, json=payload, headers=auth_headers)

# Print the JSON response body, or the status code and content on failure
print_response(response)


{
  "kind": "LanguageDetectionResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "detectedLanguage": {
          "name": "English",
          "iso6391Name": "en",
          "confidenceScore": 0.95
        }
      }
    ],
    "errors": [],
    "modelVersion": "2024-11-01"
  }
}
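
When the detected language feeds a later step, such as routing documents to a language-specific pipeline, it can help to reduce the response to a simple lookup and drop uncertain detections. The sketch below assumes the response shape shown in the preceding output. `detected_languages` and its `min_confidence` threshold are hypothetical names chosen for this example, and the 0.8 cutoff is arbitrary.

```python
def detected_languages(result, min_confidence=0.0):
    """Map document id -> (ISO 639-1 code, confidence score)."""
    languages = {}
    for doc in result["results"]["documents"]:
        detected = doc["detectedLanguage"]
        # Skip detections that fall below the caller's threshold.
        if detected["confidenceScore"] >= min_confidence:
            languages[doc["id"]] = (
                detected["iso6391Name"],
                detected["confidenceScore"],
            )
    return languages


# Copy of the response printed above.
sample = {
    "kind": "LanguageDetectionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "detectedLanguage": {
                    "name": "English",
                    "iso6391Name": "en",
                    "confidenceScore": 0.95,
                },
            }
        ],
        "errors": [],
        "modelVersion": "2024-11-01",
    },
}

confident = detected_languages(sample, min_confidence=0.8)
print(confident)
```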


### Key Phrase Extractor
Key Phrase Extraction evaluates unstructured text and returns a list of key phrases. This capability is useful if you need to quickly identify the main points in a collection of documents. See the [Supported languages for key phrase extraction](https://learn.microsoft.com/en-us/azure/ai-services/language-service/key-phrase-extraction/language-support) for the list of enabled languages.

In [10]:
payload = {
    "kind": "KeyPhraseExtraction",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language":"en",
                "text": "Dr. Smith has a very modern medical office, and she has great staff."
            }
        ]
    }
}

response = requests.post(service_url, json=payload, headers=auth_headers)

# Print the JSON response body, or the status code and content on failure
print_response(response)


{
  "kind": "KeyPhraseExtractionResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "keyPhrases": [
          "modern medical office",
          "Dr. Smith",
          "great staff"
        ]
      }
    ],
    "errors": [],
    "modelVersion": "2022-10-01"
  }
}
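
To find the main points across a collection, the per-document phrase lists can be tallied into a single frequency count. This is a minimal sketch that assumes the response shape shown in the preceding output. `key_phrase_counts` is a hypothetical helper, and lowercasing phrases before counting is a design choice made here so that differently capitalized phrases are merged, not something the service does.

```python
from collections import Counter


def key_phrase_counts(result):
    """Count how often each key phrase appears across all documents."""
    counts = Counter()
    for doc in result["results"]["documents"]:
        # Normalize case so "Great Staff" and "great staff" are merged.
        counts.update(phrase.lower() for phrase in doc["keyPhrases"])
    return counts


# Copy of the response printed above.
sample = {
    "kind": "KeyPhraseExtractionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "keyPhrases": ["modern medical office", "Dr. Smith", "great staff"],
            }
        ],
        "errors": [],
        "modelVersion": "2022-10-01",
    },
}

counts = key_phrase_counts(sample)
for phrase, count in counts.most_common():
    print(f"{count}  {phrase}")
```

With a single document every phrase has a count of 1. The tally becomes informative once the payload carries many documents.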


### Named Entity Recognition (NER)
Named Entity Recognition (NER) identifies entities in text and categorizes them into predefined classes or types, such as person, location, event, product, and organization. See the [NER language support](https://learn.microsoft.com/en-us/azure/ai-services/language-service/named-entity-recognition/language-support?tabs=ga-api) for the list of enabled languages.

In [11]:
payload = {
    "kind": "EntityRecognition",
    "parameters": {
        "modelVersion": "latest"
    },
    "analysisInput":{
        "documents":[
            {
                "id":"1",
                "language": "en",
                "text": "I had a wonderful trip to Seattle last week."
            }
        ]
    }
}

response = requests.post(service_url, json=payload, headers=auth_headers)

# Print the JSON response body, or the status code and content on failure
print_response(response)


{
  "kind": "EntityRecognitionResults",
  "results": {
    "documents": [
      {
        "id": "1",
        "entities": [
          {
            "text": "trip",
            "category": "Event",
            "offset": 18,
            "length": 4,
            "confidenceScore": 0.66
          },
          {
            "text": "Seattle",
            "category": "Location",
            "subcategory": "City",
            "offset": 26,
            "length": 7,
            "confidenceScore": 1.0
          },
          {
            "text": "last week",
            "category": "DateTime",
            "subcategory": "DateRange",
            "offset": 34,
            "length": 9,
            "confidenceScore": 1.0
          }
        ]
      }
    ],
    "errors": [],
    "modelVersion": "2025-02-01"
  }
}
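
Each entity in the response above carries an `offset` and a `length` that locate it in the input text. The sketch below groups entities by category and checks each span against the original string. It assumes the offsets line up with Python string indices, which holds for plain ASCII text like this sample. For other text, how offsets are counted depends on the service's `stringIndexType` setting. `entities_by_category` is a hypothetical helper.

```python
from collections import defaultdict


def entities_by_category(result, source_texts):
    """Group entity text by category, checking each span against its source."""
    grouped = defaultdict(list)
    for doc in result["results"]["documents"]:
        text = source_texts[doc["id"]]
        for entity in doc["entities"]:
            start = entity["offset"]
            span = text[start:start + entity["length"]]
            # A mismatch means the offsets do not index this string as assumed.
            if span != entity["text"]:
                raise ValueError(f"Offset mismatch for {entity['text']!r}")
            grouped[entity["category"]].append(span)
    return dict(grouped)


# Input text keyed by document id, plus a trimmed copy of the response above.
source_texts = {"1": "I had a wonderful trip to Seattle last week."}
sample = {
    "kind": "EntityRecognitionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "entities": [
                    {"text": "trip", "category": "Event",
                     "offset": 18, "length": 4, "confidenceScore": 0.66},
                    {"text": "Seattle", "category": "Location", "subcategory": "City",
                     "offset": 26, "length": 7, "confidenceScore": 1.0},
                    {"text": "last week", "category": "DateTime", "subcategory": "DateRange",
                     "offset": 34, "length": 9, "confidenceScore": 1.0},
                ],
            }
        ],
        "errors": [],
    },
}

grouped = entities_by_category(sample, source_texts)
print(grouped)
```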


## SynapseML approach: Using Spark DataFrames for larger-scale processing

### Import required libraries

In [2]:
import synapse.ml.core
from synapse.ml.services.language import AnalyzeText
from pyspark.sql.functions import col


### Sentiment analysis

In [3]:
df = spark.createDataFrame([
    ("Great atmosphere. Close to plenty of restaurants, hotels, and transit! Staff are friendly and helpful.",),
    ("What a sad story!",)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("SentimentAnalysis")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("sentiment", col("documents.sentiment"))

display(result.select("text", "sentiment"))


### Language detector

In [4]:
df = spark.createDataFrame([
    (["Hello world"],),
    (["Bonjour tout le monde", "Hola mundo", "Tumhara naam kya hai?"],),
    (["你好"],),
    (["日本国（にほんこく、にっぽんこく、英"],)
], ["text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("LanguageDetection")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("detectedLanguage", col("documents.detectedLanguage.name"))

display(result.select("text", "detectedLanguage"))


### Key Phrase Extractor

In [5]:
df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Text Analytics is one of the Azure Cognitive Services."),
    ("en", "My cat might need to see a veterinarian.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("KeyPhraseExtraction")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("keyPhrases", col("documents.keyPhrases"))

display(result.select("text", "keyPhrases"))


### Named Entity Recognition (NER)

In [7]:
df = spark.createDataFrame([
    ("en", "Microsoft was founded by Bill Gates and Paul Allen."),
    ("en", "Pike place market is my favorite Seattle attraction.")
], ["language", "text"])

model = (AnalyzeText()
        .setTextCol("text")
        .setKind("EntityRecognition")
        .setOutputCol("response"))

result = model.transform(df)\
        .withColumn("documents", col("response.documents"))\
        .withColumn("entityNames", col("documents.entities.text"))

display(result.select("text", "entityNames"))
