# AnalyzeText with SynapseML and Azure AI Language
[Azure AI Language](https://learn.microsoft.com/azure/ai-services/language-service/overview) is a cloud-based service that provides Natural Language Processing (NLP) features for understanding and analyzing text. Use this service to help build intelligent applications using the web-based Language Studio, REST APIs, and client libraries.
You can use SynapseML with Azure AI Language for **named entity recognition**, **language detection**, **entity linking**, **key phrase extraction**, **PII entity recognition**, and **sentiment analysis**. Each feature is selected by passing its name to `AnalyzeText.setKind`, as the examples below show.

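```python
from synapse.ml.services.language import AnalyzeText
from synapse.ml.core.platform import find_secret

# Key for the Azure AI services resource, read from Azure Key Vault
ai_service_key = find_secret(
    secret_name="ai-services-api-key", keyvault="mmlspark-build-keys"
)
# Region of the Azure AI services resource
ai_service_location = "eastus"
```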
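
`find_secret` reads the key from the `mmlspark-build-keys` Key Vault, which is likely reachable only from the SynapseML project's own environment. If you run this notebook elsewhere, one option is to read the key from an environment variable instead. The sketch below assumes a variable named `AI_SERVICE_KEY`; both that name and the `get_ai_service_key` helper are hypothetical and not part of SynapseML.

```python
import os


def get_ai_service_key(env_var: str = "AI_SERVICE_KEY") -> str:
    """Return the Azure AI services key stored in `env_var`.

    The helper and the default variable name are hypothetical choices for
    this sketch; SynapseML does not define them.
    """
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Azure AI services key."
        )
    return key
```

With the variable set, `ai_service_key = get_ai_service_key()` can take the place of the `find_secret` call, while `ai_service_location` stays set to your resource's region.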

## Named Entity Recognition 
[Named Entity Recognition](https://learn.microsoft.com/azure/ai-services/language-service/named-entity-recognition/overview) (NER) is one of the features offered by Azure AI Language, a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. NER identifies and categorizes entities in unstructured text, such as people, places, organizations, and quantities. See which [natural languages are supported](https://learn.microsoft.com/azure/ai-services/language-service/named-entity-recognition/language-support?tabs=ga-api) by the NER feature of Azure AI Language.

```python
# `spark` and `display` are provided by the notebook environment
df = spark.createDataFrame(
    data=[
        ["en", "Dr. Smith has a very modern medical office, and she has great staff."],
        ["en", "I had a wonderful trip to Seattle last week."],
    ],
    schema=["language", "text"],
)

# Configure AnalyzeText to run named entity recognition on the "text" column
entity_recognition = (
    AnalyzeText()
    .setKind("EntityRecognition")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("entities")
    .setErrorCol("error")  # column that receives any service errors
    .setLanguageCol("language")  # language of each row's text
)

df_results = entity_recognition.transform(df)
# Show the recognized entities for each document
display(df_results.select("language", "text", "entities.documents.entities"))
```

| language | text | entities |
|---|---|---|
| en | Dr. Smith has a very modern medical office, and she has great staff. | List(List(Person, 0.98, 5, 4, null, Smith), List(Location, 0.79, 14, 28, Structural, medical office), List(PersonType, 0.85, 5, 62, null, staff)) |
| en | I had a wonderful trip to Seattle last week. | List(List(Event, 0.74, 4, 18, null, trip), List(Location, 1.0, 7, 26, GPE, Seattle), List(DateTime, 0.8, 9, 34, DateRange, last week)) |
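
Each entity above appears to list its fields as category, confidence score, length, offset, subcategory, and text; for example, `Smith` has length 5 and offset 4. That field order is inferred from the displayed values, not taken from the SynapseML schema. The minimal sketch below uses a hypothetical `entity_spans` helper to slice each entity back out of its source document, which is a quick way to see how `offset` and `length` relate to the input. For text with emoji or combining characters, the service's offset units may not line up with Python string indices.

```python
from typing import List, Optional, Tuple

# Positional view of one entity: (category, confidenceScore, length, offset,
# subcategory, text). This field order is inferred from the displayed output.
Entity = Tuple[str, float, int, int, Optional[str], str]


def entity_spans(text: str, entities: List[Entity]) -> List[Tuple[str, str]]:
    """Return (category, substring) pairs by slicing `text` at each entity's offset and length."""
    spans = []
    for category, _score, length, offset, _subcategory, _entity_text in entities:
        spans.append((category, text[offset:offset + length]))
    return spans


# Input text and entity values copied from the first row of the output above
doc = "Dr. Smith has a very modern medical office, and she has great staff."
entities: List[Entity] = [
    ("Person", 0.98, 5, 4, None, "Smith"),
    ("Location", 0.79, 14, 28, "Structural", "medical office"),
    ("PersonType", 0.85, 5, 62, None, "staff"),
]

for category, span in entity_spans(doc, entities):
    print(category, repr(span))
# Person 'Smith'
# Location 'medical office'
# PersonType 'staff'
```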


## Language Detection
[Language detection](https://learn.microsoft.com/azure/ai-services/language-service/language-detection/overview) detects the language a document is written in and returns a language code for a wide range of languages, variants, dialects, and some regional and cultural languages. See the [natural languages that language detection supports](https://learn.microsoft.com/en-us/azure/ai-services/language-service/language-detection/language-support).

```python
df = spark.createDataFrame(
    data=[
        ["This is a document written in English."],
        ["这是一份用中文写的文件"],
    ],
    schema=["text"],
)

# No language column is set: the service detects the language of each row
language_detection = (
    AnalyzeText()
    .setKind("LanguageDetection")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("detected_language")
    .setErrorCol("error")
)

df_results = language_detection.transform(df)
# Show the detected language for each document
display(df_results.select("text", "detected_language.documents.detectedLanguage"))
```

| text | detectedLanguage |
|---|---|
| This is a document written in English. | List(English, en, 0.99) |
| 这是一份用中文写的文件 | List(Chinese_Simplified, zh_chs, 1.0) |
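
Each `detectedLanguage` value above holds a language name, a language code, and a confidence score, in that displayed order. A common follow-up is to keep the detected code only when the service is confident and fall back to a default otherwise. The plain-Python sketch below illustrates that idea; the `pick_language` helper, the 0.8 threshold, and the `"en"` default are hypothetical choices, not SynapseML behavior. Codes such as `zh_chs` may also need mapping before another feature accepts them as a language hint.

```python
from typing import Tuple

# (name, code, confidenceScore), in the order shown by `display` above
Detection = Tuple[str, str, float]


def pick_language(detection: Detection, threshold: float = 0.8, default: str = "en") -> str:
    """Return the detected code if its confidence meets `threshold`, else `default`."""
    _name, code, confidence = detection
    return code if confidence >= threshold else default


print(pick_language(("English", "en", 0.99)))                # en
print(pick_language(("Chinese_Simplified", "zh_chs", 1.0)))  # zh_chs
# Hypothetical low-confidence result, not from the output above
print(pick_language(("French", "fr", 0.42)))                 # en
```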


## Entity Linking
[Entity linking](https://learn.microsoft.com/azure/ai-services/language-service/entity-linking/overview) identifies entities in text and disambiguates their identity. For example, in the sentence "We went to Seattle last week.", "Seattle" is identified and linked to more information on Wikipedia. Entity linking [supports English and Spanish](https://learn.microsoft.com/azure/ai-services/language-service/entity-linking/language-support).

```python
df = spark.createDataFrame(
    data=[
        ["Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975."],
        ["We went to Seattle last week."],
    ],
    schema=["text"],
)

# Configure AnalyzeText for entity linking on the "text" column
entity_linking = (
    AnalyzeText()
    .setKind("EntityLinking")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("entity_linking")
    .setErrorCol("error")
)

df_results = entity_linking.transform(df)
# Show the linked entities for each document
display(df_results.select("text", "entity_linking.documents.entities"))
```

| text | entities |
|---|---|
| Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975. | List(List(a093e9b9-90f5-a3d5-c4b8-5855e1b01f85, Wikipedia, Microsoft, en, List(List(0.48, 9, 0, Microsoft)), Microsoft, https://en.wikipedia.org/wiki/Microsoft), List(0d47c987-0042-5576-15e8-97af601614fa, Wikipedia, Bill Gates, en, List(List(0.52, 10, 25, Bill Gates)), Bill Gates, https://en.wikipedia.org/wiki/Bill_Gates), List(df2c4376-9923-6a54-893f-2ee5a5badbc7, Wikipedia, Paul Allen, en, List(List(0.54, 10, 40, Paul Allen)), Paul Allen, https://en.wikipedia.org/wiki/Paul_Allen), List(52535f87-235e-b513-54fe-c03e4233ac6e, Wikipedia, April 4, en, List(List(0.38, 7, 54, April 4)), April 4, https://en.wikipedia.org/wiki/April_4)) |
| We went to Seattle last week. | List(List(5fbba6b8-85e1-4d41-9444-d9055436e473, Wikipedia, Seattle, en, List(List(0.17, 7, 11, Seattle)), Seattle, https://en.wikipedia.org/wiki/Seattle)) |
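
Each linked entity above carries a list of matches, whose items appear to hold a confidence score, length, offset, and matched text, followed by the entity's name and Wikipedia URL; this reading is inferred from the displayed values. The minimal sketch below uses a hypothetical `confident_links` helper and a hypothetical threshold of 0.3 to keep only entities whose best match clears the threshold, returning a name-to-URL mapping. With the scores from the output above, the low-scoring `Seattle` match (0.17) is dropped.

```python
from typing import Dict, List, Tuple

# Simplified view of one linked entity: (name, url, match confidence scores),
# with values copied from the output above
LinkedEntity = Tuple[str, str, List[float]]


def confident_links(entities: List[LinkedEntity], threshold: float = 0.3) -> Dict[str, str]:
    """Map entity name to URL for entities whose best match score meets `threshold`."""
    return {
        name: url
        for name, url, scores in entities
        if scores and max(scores) >= threshold
    }


entities = [
    ("Microsoft", "https://en.wikipedia.org/wiki/Microsoft", [0.48]),
    ("Bill Gates", "https://en.wikipedia.org/wiki/Bill_Gates", [0.52]),
    ("Paul Allen", "https://en.wikipedia.org/wiki/Paul_Allen", [0.54]),
    ("April 4", "https://en.wikipedia.org/wiki/April_4", [0.38]),
    ("Seattle", "https://en.wikipedia.org/wiki/Seattle", [0.17]),
]

for name, url in confident_links(entities).items():
    print(name, url)
```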


## Key Phrase Extraction
[Key phrase extraction](https://learn.microsoft.com/en-us/azure/ai-services/language-service/key-phrase-extraction/overview) is one of the features offered by Azure AI Language, a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Use key phrase extraction to quickly identify the main concepts in text. For example, for the text "The food was delicious and the staff were wonderful.", key phrase extraction returns the main topics "food" and "wonderful staff". See the [natural languages that key phrase extraction supports](https://learn.microsoft.com/azure/ai-services/language-service/key-phrase-extraction/language-support).


```python
df = spark.createDataFrame(
    data=[
        ["Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975."],
        ["Dr. Smith has a very modern medical office, and she has great staff."],
    ],
    schema=["text"],
)

# Configure AnalyzeText for key phrase extraction on the "text" column
key_phrase_extraction = (
    AnalyzeText()
    .setKind("KeyPhraseExtraction")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("key_phrase_extraction")
    .setErrorCol("error")
)

df_results = key_phrase_extraction.transform(df)
# Show the extracted key phrases for each document
display(df_results.select("text", "key_phrase_extraction.documents.keyPhrases"))
```

| text | keyPhrases |
|---|---|
| Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975. | List(Bill Gates, Paul Allen, Microsoft, April) |
| Dr. Smith has a very modern medical office, and she has great staff. | List(modern medical office, Dr. Smith, great staff) |

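Once key phrases are extracted, one way to use them is as a lightweight index of which documents mention which concepts. The sketch below builds such an inverted index in plain Python from the key phrases shown above; the `build_phrase_index` helper and the lowercase normalization are illustrative choices, not part of SynapseML.

```python
from collections import defaultdict
from typing import Dict, List


def build_phrase_index(phrases_per_doc: List[List[str]]) -> Dict[str, List[int]]:
    """Map each lowercased key phrase to the indices of the documents that contain it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, phrases in enumerate(phrases_per_doc):
        # Deduplicate within a document so each doc_id is recorded once per phrase
        for phrase in {p.lower() for p in phrases}:
            index[phrase].append(doc_id)
    return dict(index)


# Key phrases copied from the output above, one list per document
phrases_per_doc = [
    ["Bill Gates", "Paul Allen", "Microsoft", "April"],
    ["modern medical office", "Dr. Smith", "great staff"],
]

index = build_phrase_index(phrases_per_doc)
print(index["microsoft"])    # [0]
print(index["great staff"])  # [1]
```
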

## PII Entity Recognition
The PII detection feature can identify, categorize, and redact sensitive information in unstructured text, such as phone numbers, email addresses, and forms of identification. Detecting PII in conversations works differently from the other use cases and is documented separately. See the [natural languages that PII detection supports](https://learn.microsoft.com/azure/ai-services/language-service/personally-identifiable-information/language-support?tabs=documents).

```python
df = spark.createDataFrame(
    data=[
        ["Call our office at 312-555-1234, or send an email to support@contoso.com"],
        ["Dr. Smith has a very modern medical office, and she has great staff."],
    ],
    schema=["text"],
)

# Configure AnalyzeText for PII entity recognition on the "text" column
pii_entity_recognition = (
    AnalyzeText()
    .setKind("PiiEntityRecognition")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("pii_entity_recognition")
    .setErrorCol("error")
)

df_results = pii_entity_recognition.transform(df)
# Show the detected PII entities for each document
display(df_results.select("text", "pii_entity_recognition.documents.entities"))
```

| text | entities |
|---|---|
| Call our office at 312-555-1234, or send an email to support@contoso.com | List(List(PhoneNumber, 0.8, 12, 19, null, 312-555-1234), List(Email, 0.8, 19, 53, null, support@contoso.com)) |
| Dr. Smith has a very modern medical office, and she has great staff. | List(List(Person, 0.93, 5, 4, null, Smith)) |

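The PII entities above appear to use the same field order as the NER output, so each one carries a length and an offset into the original text (again inferred from the displayed values). Those two numbers are enough to redact the text locally. The sketch below uses a hypothetical `mask_spans` helper that overwrites each span with `*` characters; it illustrates the idea and is not how the service itself performs redaction.

```python
from typing import Iterable, Tuple


def mask_spans(text: str, spans: Iterable[Tuple[int, int]], mask: str = "*") -> str:
    """Overwrite each (offset, length) span of `text` with `mask`, one per character."""
    chars = list(text)
    for offset, length in spans:
        chars[offset:offset + length] = [mask] * length
    return "".join(chars)


doc = "Call our office at 312-555-1234, or send an email to support@contoso.com"
# (offset, length) pairs copied from the output above: phone number, then email
print(mask_spans(doc, [(19, 12), (53, 19)]))
```

Both the phone number and the email address come back as runs of asterisks, and the rest of the sentence is unchanged.
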

## Sentiment Analysis
[Sentiment analysis](https://learn.microsoft.com/en-us/azure/ai-services/language-service/sentiment-opinion-mining/overview) and opinion mining are features offered by the Language service, a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. These features help you find out what people think of your brand or topic by mining text for clues about positive or negative sentiment, and they can associate that sentiment with specific aspects of the text.

See the [natural languages that sentiment analysis supports](https://learn.microsoft.com/en-us/azure/ai-services/language-service/sentiment-opinion-mining/language-support).

```python
df = spark.createDataFrame(
    data=[
        ["The food and service were unacceptable. The concierge was nice, however."],
        ["It taste great."],
    ],
    schema=["text"],
)

# Configure AnalyzeText for sentiment analysis on the "text" column
sentiment_analysis = (
    AnalyzeText()
    .setKind("SentimentAnalysis")
    .setLocation(ai_service_location)
    .setSubscriptionKey(ai_service_key)
    .setTextCol("text")
    .setOutputCol("sentiment_analysis")
    .setErrorCol("error")
)

df_results = sentiment_analysis.transform(df)
# Show the document-level sentiment label for each row
display(df_results.select("text", "sentiment_analysis.documents.sentiment"))
```

| text | sentiment |
|---|---|
| The food and service were unacceptable. The concierge was nice, however. | mixed |
| It taste great. | positive |
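
The first review is labeled `mixed`, presumably because it contains both a complaint and a compliment. The Azure sentiment analysis documentation describes the document label as derived from sentence labels: `mixed` when there is at least one positive and at least one negative sentence, `positive` or `negative` when only one of those appears alongside neutral sentences, and `neutral` when every sentence is neutral. The sketch below restates that rule in plain Python for illustration. It is our reading of the docs, not the service's code, and the sentence labels used below are assumed rather than taken from the output above.

```python
from typing import Iterable


def document_sentiment(sentence_labels: Iterable[str]) -> str:
    """Combine sentence-level labels into a document label, following the
    rule described in the Azure sentiment analysis docs (illustrative only)."""
    labels = set(sentence_labels)
    has_positive = "positive" in labels
    has_negative = "negative" in labels
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


# Assumed sentence labels for the first review: the complaint is negative,
# the compliment about the concierge is positive
print(document_sentiment(["negative", "positive"]))  # mixed
print(document_sentiment(["positive"]))              # positive
```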
