# Microsoft Cognitive Services: Text Analysis
<a name="HOLTop"></a>



Refer to the [API definitions](//go.microsoft.com/fwlink/?LinkID=759346) for technical documentation on the APIs.

## Prerequisites

You must have a [Cognitive Services API account](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account) with the **Text Analytics API**. You can use the **free tier for 5,000 transactions/month** to complete this walkthrough.

You must also have the [endpoint and access key](../How-tos/text-analytics-how-to-access-key.md) that were generated for you during sign-up.

To continue with this walkthrough, replace the value of `subscription_key` with the valid subscription key you obtained earlier.

```python
subscription_key = "YOUR API KEY"  # replace with your own Text Analytics key
assert subscription_key  # only checks that the key is a non-empty string
```
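Note that `assert subscription_key` only checks that the string is non-empty, so the placeholder `"YOUR API KEY"` passes it. Hard-coding a key in a notebook also makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you store the key in an environment variable; the variable name `TEXT_ANALYTICS_KEY` and the helper `load_subscription_key` are hypothetical choices, not part of the service:

```python
import os

def load_subscription_key(var_name="TEXT_ANALYTICS_KEY"):
    """Return the key stored in the (hypothetical) environment variable var_name.

    Raises a RuntimeError with a readable message instead of letting the
    request fail later with an authentication error from the service.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {0} environment variable to your Text Analytics key.".format(var_name))
    return key

# Usage, assuming TEXT_ANALYTICS_KEY has been exported in your shell:
# subscription_key = load_subscription_key()
```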

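Next, set the base URL of the Text Analytics API. The URL below is for the West Central US region; if your key was issued for a different region, use the endpoint shown for your key instead: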

```python
text_analytics_base_url = "https://westcentralus.api.cognitive.microsoft.com/text/analytics/v2.0/"
```
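Each operation in this walkthrough is reached by appending its path (`languages`, `sentiment`, `keyPhrases`) to the base URL. A small sketch that builds all three URLs for a given region, assuming your region uses the same `https://<region>.api.cognitive.microsoft.com/text/analytics/v2.0/` host pattern as the West Central US URL above (compare against the endpoint shown for your own key). `text_analytics_urls` is a hypothetical helper name:

```python
def text_analytics_urls(region):
    """Build the v2.0 Text Analytics operation URLs for a region.

    Assumes the regional host pattern of the westcentralus URL above.
    """
    base = "https://{0}.api.cognitive.microsoft.com/text/analytics/v2.0/".format(region)
    # The base URL ends with "/", so plain string concatenation is enough.
    return {op: base + op for op in ("languages", "sentiment", "keyPhrases")}

urls = text_analytics_urls("westcentralus")
print(urls["languages"])
```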

<a name="Detect"></a>

## Detect languages

The Language Detection API detects the language of a text document using the [Detect Language method](https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c7). The language detection endpoint for your region is the base URL followed by `languages`:

```python
language_api_url = text_analytics_base_url + "languages"
print(language_api_url)
```

The payload to the API consists of a list of `documents`, each of which in turn contains an `id` and a `text` attribute. The `text` attribute stores the text to be analyzed. 

To detect the language of your own text, replace the `text` values in the `documents` dictionary below.

```python
documents = { 'documents': [
    { 'id': '1', 'text': 'This is a document written in English.' },
    { 'id': '2', 'text': 'Este es un documento escrito en Español.' },
    { 'id': '3', 'text': '这是一个用中文写的文件' },
    { 'id': '4', 'text': '**************' }  # no natural-language content
]}
```
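Because every document needs a unique `id`, it can be convenient to build the payload from a plain list of strings. A minimal sketch; `make_documents` is a hypothetical helper, and numbering ids from `'1'` simply mirrors the hand-written example above. The optional `language` argument adds the `language` field that the sentiment and key phrase requests later in this walkthrough use:

```python
def make_documents(texts, language=None):
    """Wrap a list of strings in the {'documents': [...]} payload shape.

    Ids are the 1-based position of each text, as strings. If language is
    given (e.g. 'en'), it is attached to every document.
    """
    docs = []
    for i, text in enumerate(texts, start=1):
        doc = {'id': str(i), 'text': text}
        if language is not None:
            doc['language'] = language
        docs.append(doc)
    return {'documents': docs}

payload = make_documents(['This is a document written in English.', '这是一个用中文写的文件'])
```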

The following code calls the language detection API with the Python `requests` library to determine the language of each document.

```python
import requests
from pprint import pprint

headers   = {"Ocp-Apim-Subscription-Key": subscription_key}
response  = requests.post(language_api_url, headers=headers, json=documents)
response.raise_for_status()  # raise an HTTPError if the request failed (e.g. invalid key)
languages = response.json()
pprint(languages)
```
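Each entry in the response's `documents` list carries the same `id` as the request and a `detectedLanguages` list whose items have `name` and `score` fields (the fields the HTML table in the next cell reads). The sketch below also assumes that documents the service could not process are reported in a top-level `errors` list whose items carry an `id`; check that against the API definitions. `top_languages` is a hypothetical helper, and `sample` is hand-written for illustration, not real service output:

```python
def top_languages(response):
    """Map each document id to its highest-scoring detected language name.

    Ids listed under the (assumed) top-level 'errors' key map to None.
    """
    result = {}
    for doc in response.get("documents", []):
        candidates = doc.get("detectedLanguages", [])
        if candidates:
            best = max(candidates, key=lambda lang: lang["score"])
            result[doc["id"]] = best["name"]
        else:
            result[doc["id"]] = None
    for err in response.get("errors", []):
        result[err["id"]] = None
    return result

# Hand-written sample in the shape described above (not real service output).
sample = {
    "documents": [
        {"id": "1", "detectedLanguages": [{"name": "English", "score": 1.0}]},
        {"id": "2", "detectedLanguages": [{"name": "Spanish", "score": 1.0}]},
    ],
    "errors": [{"id": "5", "message": "illustrative error"}],
}
print(top_languages(sample))
```

With the live response from the previous cell, call `top_languages(languages)`.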

The following lines of code render the JSON data as an HTML table.

```python
import html
from IPython.display import HTML

# Look up each document's original text by id.
texts = {d["id"]: d["text"] for d in documents["documents"]}
table = []
for document in languages["documents"]:
    text  = html.escape(texts[document["id"]])
    langs = ", ".join("{0} ({1})".format(lang["name"], lang["score"])
                      for lang in document["detectedLanguages"])
    table.append("<tr><td>{0}</td><td>{1}</td></tr>".format(text, langs))
HTML("<table><tr><th>Text</th><th>Detected languages (scores)</th></tr>{0}</table>".format("\n".join(table)))
```

<a name="SentimentAnalysis"></a>

## Analyze sentiment

The Sentiment Analysis API detects the sentiment of a set of text records using the [Sentiment method](https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c9). The following example scores four documents.

In [None]:
sentiment_api_url = text_analytics_base_url + "sentiment"
print(sentiment_api_url)

**Replace the `text` fields with whatever you like!** I'm using quotes from the Overheard at UC Berkeley Facebook group. As a side note, it might be cool to analyze the overall sentiment of the group!

```python
# All four quotes are written in English, so each is tagged 'en'.
documents = {'documents' : [
  {'id': '1', 'language': 'en', 'text': 'Yeah I lied, I always lie, I’m an ASUC Senator'},
  {'id': '2', 'language': 'en', 'text': 'In my opinion, I’m a really handsome person'},
  {'id': '3', 'language': 'en', 'text': 'Women should be respected, not just because they’re moms, wives, and sisters, not just because we’re born from one, but because they’re people.'},
  {'id': '4', 'language': 'en', 'text': 'All roads lead to Taco Bell'}
]}
```

Now call the sentiment API to score each document.

```python
headers    = {"Ocp-Apim-Subscription-Key": subscription_key}
response   = requests.post(sentiment_api_url, headers=headers, json=documents)
response.raise_for_status()  # raise an HTTPError if the request failed
sentiments = response.json()
pprint(sentiments)
```

The sentiment score for a document is between $0$ and $1$, with a higher score indicating a more positive sentiment.
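Because the score is a single number, a common next step is to bucket it into labels. The sketch below assumes each entry in the response's `documents` list carries its `id` and a `score`; the thresholds 0.4 and 0.6 and the helper name `label_sentiments` are arbitrary illustrative choices, not values defined by the service:

```python
def label_sentiments(response, negative_below=0.4, positive_above=0.6):
    """Map document ids to 'negative', 'neutral', or 'positive'.

    The default thresholds are hypothetical; tune them for your own data.
    """
    labels = {}
    for doc in response.get("documents", []):
        score = doc["score"]  # 0 = most negative, 1 = most positive
        if score < negative_below:
            labels[doc["id"]] = "negative"
        elif score > positive_above:
            labels[doc["id"]] = "positive"
        else:
            labels[doc["id"]] = "neutral"
    return labels

# Hand-written sample, not real service output.
print(label_sentiments({"documents": [
    {"id": "1", "score": 0.1}, {"id": "2", "score": 0.5}, {"id": "3", "score": 0.95}]}))
```

With the live response, call `label_sentiments(sentiments)`.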

<a name="KeyPhraseExtraction"></a>

## Extract key phrases

The Key Phrase Extraction API extracts key phrases from a text document using the [Key Phrases method](https://westus.dev.cognitive.microsoft.com/docs/services/TextAnalytics.V2.0/operations/56f30ceeeda5650db055a3c6). This section of the walkthrough extracts key phrases from the same documents used for sentiment analysis.

The key phrase endpoint is the base URL followed by `keyPhrases`:

```python
key_phrase_api_url = text_analytics_base_url + "keyPhrases"
print(key_phrase_api_url)
```

The request reuses the `documents` dictionary from the sentiment analysis section. Here is what it currently contains:

```python
pprint(documents)
```

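Call the key phrase API with the same request headers:

```python
headers     = {"Ocp-Apim-Subscription-Key": subscription_key}
response    = requests.post(key_phrase_api_url, headers=headers, json=documents)
response.raise_for_status()  # raise an HTTPError if the request failed
key_phrases = response.json()
pprint(key_phrases)
```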
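Since the sentiment and key phrase requests were sent with the same `documents`, their results share document ids and can be joined. A minimal sketch, assuming each sentiment entry has a `score` and each key phrase entry has a `keyPhrases` list (the field the table in the next cell reads); `merge_results` is a hypothetical helper:

```python
def merge_results(documents, sentiments, key_phrases):
    """Join request texts, sentiment scores, and key phrases on document id.

    Ids missing from a response (for example, documents the service
    rejected) get None for that field.
    """
    scores = {d["id"]: d["score"] for d in sentiments.get("documents", [])}
    phrases = {d["id"]: d["keyPhrases"] for d in key_phrases.get("documents", [])}
    return [
        {"id": d["id"], "text": d["text"],
         "score": scores.get(d["id"]), "keyPhrases": phrases.get(d["id"])}
        for d in documents["documents"]
    ]

# Usage with the variables from this walkthrough:
# for row in merge_results(documents, sentiments, key_phrases):
#     print(row)
```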

The JSON object can once again be rendered as an HTML table using the following lines of code:

```python
import html
from IPython.display import HTML

# Look up each document's original text by id.
texts = {d["id"]: d["text"] for d in documents["documents"]}
table = []
for document in key_phrases["documents"]:
    text    = html.escape(texts[document["id"]])
    phrases = html.escape(", ".join(document["keyPhrases"]))
    table.append("<tr><td>{0}</td><td>{1}</td></tr>".format(text, phrases))
HTML("<table><tr><th>Text</th><th>Key phrases</th></tr>{0}</table>".format("\n".join(table)))
```