# Introduction to Cognitive Services on Azure

Use state-of-the-art Cognitive Services models...

(...with a few mouse clicks using Azure, Google Colaboratory and a lot of code snippets).


## Set Up the Colaboratory Environment
- Install the Azure CLI and Python modules
- Import the Python modules
- Log in to Azure


In [0]:
#@title Install Azure CLI & python modules
!curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
!pip install ffmpeg-python
!pip install wavio
!pip install pysoundfile

In [0]:
#@title Import Python Packages
from IPython.display import display, Javascript, HTML, Audio, Markdown, clear_output
from google.colab.output import eval_js
from base64 import b64decode
import http.client, urllib.request, urllib.parse, urllib.error, base64
import numpy as np
from scipy.io.wavfile import read as wav_read
from scipy.io.wavfile import write as wav_write
import io
import ffmpeg
import wavio
from PIL import Image, ImageDraw
import json
import os
import requests
from io import BytesIO
import time
from xml.etree import ElementTree
import soundfile

In [0]:
#@title Login to Azure
!az login

## Setup Azure Cognitive Services Resources

Create the following Cognitive Services resources on Azure:
- ComputerVision
- Face
- LUIS
- TextAnalytics
- SpeechServices 


In [0]:
%%bash
#@title Create the Cognitive Services resources on Azure via a bash script
resourceGroup=MyGroup
# create resource group
az group create -l westeurope -n $resourceGroup

# create array with all the resources
# If you wish to add more, list all valid kinds with `az cognitiveservices account list-kinds`
resources=( 
"ComputerVision"
"Face"
"LUIS"
"TextAnalytics"
"SpeechServices")

# loop over resources and create them
for i in "${resources[@]}"; do
    echo Creating Resource for $i
    echo $(az cognitiveservices account create --name $i --resource-group $resourceGroup \
           --kind $i --sku F0 --location WestEurope --yes)
done

# create the file where the keys are saved (-f: no error if it does not exist yet)
rm -f keys.py
touch keys.py

echo "subscriptions = {" > keys.py
# for every resource find the key and append it to keys.py
for i in "${resources[@]}"; do
    echo "Retrieving Key for Resource $i "
    echo -e "\t\"$i\"": $(az cognitiveservices account keys list --name $i \
         --resource-group $resourceGroup | grep key1 | cut -d' ' -f 4) >> keys.py
done && echo "}" >> keys.py && echo "print(subscriptions)" >> keys.py

In [0]:
#@title Import keys into jupyter notebook
from keys import subscriptions

## Deploy Utilities 

### Description Utilities
- `get_headers_body`: builds the `headers` and `body` that define a request to Azure Cognitive Services
- `send_request`: opens a connection, sends the request and retrieves the response from the server


#### **`get_headers_body`**

Requests to Azure Cognitive Services are defined by `headers` and a `body`.

The function `get_headers_body` builds both dynamically. The `headers` contain the authorization key for a resource and the type of data we will send. The `body` specifies the data we want to send.


In detail:
- The `headers` of most of our requests are a dictionary with two keys:
  - `Content-Type`: specifies the type of data to be sent (`"application/json"`, `"application/octet-stream"`, etc.)
  
  - `Ocp-Apim-Subscription-Key`: specifies our **authorization key for the specific resource we will use**.
  

- The `body` depends on the `Content-Type` of the headers and is where we put the data to be sent, or point to it.
  - For `"application/json"`, we send a dictionary, for example:
```python
body = {"url" : "<http://www.with_some_image.jpg>"}
```
 
  - For `"application/octet-stream"`, the body is made up of binary data. We use this to send **local files** to the cloud. For example:
```python
body = open('<filepath>', 'rb')
```


Example - local file:

```python
headers, body = get_headers_body(API='ComputerVision', data='path_to_img', localfile=True)

```
Our headers and body are now defined as:
```python
headers = {"Content-Type": "application/octet-stream",
                    "Ocp-Apim-Subscription-Key": subscriptions['ComputerVision']}
                    
body = open('path_to_img', 'rb')
```
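
Example - remote URL:

For comparison, a remote image only needs a JSON body. The sketch below is self-contained, so it uses a hypothetical helper `build_remote_request` and a placeholder key rather than the notebook's `subscriptions` dictionary. It builds the body with `json.dumps`, which guarantees valid JSON.

```python
import json

def build_remote_request(subscription_key, url):
    """Hypothetical helper: headers and JSON body for an image referenced by URL."""
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    # json.dumps quotes the key and escapes the URL correctly
    body = json.dumps({"url": url})
    return headers, body

# Placeholder values, for illustration only
headers, body = build_remote_request("<your-key>", "https://example.com/image.jpg")
print(headers["Content-Type"])
print(body)
```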


#### **`send_request`**
A second function, `send_request`, opens a connection, sends our `headers`, `body` and request parameters to a specific API endpoint, and returns the parsed response from the server.
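
As a rough illustration of how an endpoint and its request parameters end up in one request path, the sketch below uses a hypothetical helper `build_request_path`. Unlike the notebook's own code, it leaves out the `?` when there are no parameters; that is a choice of this sketch, not a requirement of the service.

```python
from urllib.parse import urlencode

def build_request_path(endpoint, params):
    """Hypothetical helper: joins an endpoint and a dict of query parameters."""
    query = urlencode(params)
    return f"{endpoint}?{query}" if query else endpoint

path = build_request_path("/vision/v2.0/describe", {"maxCandidates": 3, "language": "en"})
print(path)
```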

### Deploy Utilities

In [0]:
#@title Deploy utilities to notebook
def get_headers_body(API, data, localfile=False):
    '''Returns headers and body for the respective API using your resource key.
    API [str]: refers to the resource name as in `subscriptions`
    data [str]: local path of the data if localfile is True, otherwise a URL
    localfile [bool]: whether data is local or remote'''

    # Defines Body and its content type
    if localfile:
      # local files are sent as binary data
      content_type = 'application/octet-stream'
      body = open(data, 'rb')
    else:
      # remote files are referenced by URL in a valid JSON document
      content_type = 'application/json'
      body = json.dumps({'url': data})

    # Defines Headers
    headers = {'Content-Type': content_type,
               'Ocp-Apim-Subscription-Key': subscriptions[API]}

    return headers, body

def send_request(endpoint, headers, body, params, location='westeurope'):
  '''Sends headers, body and parameters to a given API and endpoint'''
  conn = http.client.HTTPSConnection(f'{location}.api.cognitive.microsoft.com')
  try:
    conn.request("POST", f"{endpoint}?{params}", body, headers)
    response = conn.getresponse()
    data = response.read()
    return json.loads(data.decode())
  except Exception as e:
    # not every exception carries errno/strerror, so print the exception itself
    print(f"Request failed: {e}")
  finally:
    conn.close()
      

## Deploy Actuators 


### Description Actuators 

- `take_picture`: accepts a filepath to save the picture.
- `record_audio`: accepts a filepath to record the audio file. 

### Deploy Actuators

In [0]:
def take_picture(filename='photo.jpg'):
  '''Takes picture and saves it as <filename>'''
  quality=0.8
  js = Javascript('''
    async function takePhoto(quality) {
      const div = document.createElement('div');
      const capture = document.createElement('button');
      capture.textContent = 'Capture';
      div.appendChild(capture);

      const video = document.createElement('video');
      video.style.display = 'block';
      const stream = await navigator.mediaDevices.getUserMedia({video: true});

      document.body.appendChild(div);
      div.appendChild(video);
      video.srcObject = stream;
      await video.play();

      // Resize the output to fit the video element.
      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);

      // Wait for Capture to be clicked.
      await new Promise((resolve) => capture.onclick = resolve);

      const canvas = document.createElement('canvas');
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      canvas.getContext('2d').drawImage(video, 0, 0);
      stream.getVideoTracks()[0].stop();
      div.remove();
      return canvas.toDataURL('image/jpeg', quality);
      }
    ''')
  display(js)
  data = eval_js('takePhoto({})'.format(quality))
  binary = b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)
  return filename

def record_audio(message, audiofile = 'audio.wav'):
  '''Records audio and saves it as <audiofile>'''
  
  AUDIO_HTML = """
<script>
var my_div = document.createElement("DIV");
var my_p = document.createElement("P");
var my_btn = document.createElement("BUTTON");
var t = document.createTextNode("Press to start recording");
my_btn.appendChild(t);
//my_p.appendChild(my_btn);
my_div.appendChild(my_btn);
document.body.appendChild(my_div);

var base64data = 0;
var reader;
var recorder, gumStream;
var recordButton = my_btn;

var handleSuccess = function(stream) {
  gumStream = stream;
  var options = {
    //bitsPerSecond: 16000, //chrome seems to ignore, always 48k
    //mimeType : 'audio/webm;codecs=opus' ! -> changed to PCM
    mimeType : 'audio/webm;codecs=pcm'
  };            
  //recorder = new MediaRecorder(stream, options);
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = function(e) {            
    var url = URL.createObjectURL(e.data);
    var preview = document.createElement('audio');
    preview.controls = true;
    preview.src = url;
    document.body.appendChild(preview);

    reader = new FileReader();
    reader.readAsDataURL(e.data); 
    reader.onloadend = function() {
      base64data = reader.result;
      //console.log("Inside FileReader:" + base64data);
    }
  };
  recorder.start();
  };

recordButton.innerText = "Recording... press to stop";
navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);
function toggleRecording() {
  if (recorder && recorder.state == "recording") {
      recorder.stop();
      gumStream.getAudioTracks()[0].stop();
      recordButton.innerText = "Saving the recording..."
  }
}

// https://stackoverflow.com/a/951057
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

var data = new Promise(resolve=>{
//recordButton.addEventListener("click", toggleRecording);
recordButton.onclick = ()=>{
toggleRecording()

sleep(2000).then(() => {
  // wait 2000ms for the data to be available...
  // ideally this should use something like await...
  //console.log("Inside data:" + base64data)
  resolve(base64data.toString())
});
}
});  
</script>
"""
  if message is not None:
    _ = input(f'{message}')
  display(HTML(AUDIO_HTML))
  data=eval_js('data')
  binary = b64decode(data.split(',')[1])
  
  process = (ffmpeg
    .input('pipe:0')
    .output('pipe:1', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
  )
  output, err = process.communicate(input=binary)
  
  riff_chunk_size = len(output) - 8
  q = riff_chunk_size
  b = []
  for i in range(4):
      q, r = divmod(q, 256)
      b.append(r)

  riff = output[:4] + bytes(b) + output[8:]
  
  sr, audio = wav_read(io.BytesIO(riff))
  print('Audio recorded and saved as {}'.format(audiofile))
  _ = wavio.write(audiofile, audio, sr, sampwidth=3)
  # converting to the right format for the Cognitive Services Speech API
  data, samplerate = soundfile.read(audiofile)
  soundfile.write(audiofile, data, samplerate)
  return audiofile


## Deploy Effectors 


### Description Effectors 

The corresponding effectors present the data:

- `show_picture`: receives a filepath or URL of an image and displays it.
- `play_audio`: receives the filepath of an audio file and plays it.
- `draw_show_boxes`: draws labeled bounding boxes on a picture and displays the result.
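
Bounding boxes in this notebook are given as `(x, y, width, height)`, measured from the top-left corner of the image. The sketch below shows how such a box can be turned into the corner points of its outline; `box_corners` is a hypothetical helper used only for illustration.

```python
def box_corners(box):
    """Returns the four corners of an (x, y, width, height) box, starting at the top left."""
    x, y, width, height = box
    top_left, top_right = (x, y), (x + width, y)
    bottom_right, bottom_left = (x + width, y + height), (x, y + height)
    return [top_left, top_right, bottom_right, bottom_left]

corners = box_corners([10, 20, 100, 50])
# Repeating the first corner closes the outline when the points are drawn as a line
outline = corners + [corners[0]]
print(outline)
```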

### Deploy Effectors

In [0]:

def show_picture(img='photo.jpg', localfile=False):
  '''Displays the picture in <img> (a local path or a URL)'''
  if localfile:
    display(Image.open(img))
  else:
    response = requests.get(img)
    img = Image.open(BytesIO(response.content))
    display(img)

def play_audio(audiofile='audio.wav', autoplay=False):
  '''Returns an audio player for <audiofile>'''
  return Audio(audiofile, autoplay=autoplay)


def draw_show_boxes(img, regions, boxes, localfile=False):
    '''Draws the labeled bounding boxes on the image and displays the result'''
    if not localfile: # we need to download the image
      r = requests.get(img, allow_redirects=True)
      img = 'temp.jpg'
      with open(img, 'wb') as f:
        f.write(r.content)
    # Drawing boxes
    im = Image.open(img)   
    d = ImageDraw.Draw(im)
    for i, label in zip(boxes, regions):
    # `i` will need to represent the x-coordinate of the left edge, 
    # the y-coordinate of the top edge, width, and height of the bounding box
      x, y, width, height = i
      top, top_right = (x, y), (x+width, y)
      bottom, bottom_right = (x, y+height), (x+width, y+height)
      line_color = (0, 0, 255)
      d.line([top, top_right, bottom_right, bottom, top], fill=line_color, width=2)
      d.text(top,str(label),(0,0,255))
    im.save('temp.jpg')
    return show_picture('temp.jpg', localfile=True)

# Cognitive Services Snippets


### Notes
For each API, we define a few functions and walk through practical examples of how to use them.

Every function follows the same pattern:

- The headers, body and request parameters (`params`) are defined.
- The request is sent and a response is retrieved.
- The response is parsed to extract the useful information.
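
As a sketch of the last step, the snippet below parses a hand-written stand-in for a response. The dictionary layout mirrors the fields that the `describe` function further down reads; the caption texts are made up and `extract_captions` is a hypothetical helper.

```python
# Hand-written stand-in for a response; real responses contain more fields
sample_response = {
    "description": {
        "captions": [
            {"text": "a dog sitting on a bench"},
            {"text": "a dog in a park"},
        ]
    }
}

def extract_captions(response):
    """Hypothetical helper: returns the capitalized caption texts of a response."""
    return [c["text"].capitalize() for c in response["description"]["captions"]]

captions = extract_captions(sample_response)
print("\n".join(captions))
```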



### Computer Vision API

> Extract rich information from images to categorize and process visual data

We will implement four functions, each of which talks to a specific API endpoint (passed to `send_request`) to analyze images.

#### `describe`: 
* Returns one or more textual descriptions of an image. 

In [0]:
#@title define function _describe_
def describe(img, number_of_descriptions=1, localfile=False):
    '''Returns a number_of_descriptions from an image'''
    params = urllib.parse.urlencode({
    # Request parameters
    'maxCandidates': "{}".format(number_of_descriptions),
    'language': 'en'})
    
    # Defining body and headers
    headers, body = get_headers_body(API='ComputerVision', data=img, localfile=localfile)
    
    # Send request and retrieve response
    r = send_request('/vision/v2.0/describe', params=params, headers=headers, body=body)
    
    # Parse response to return the info of interest
    description = '\n' 
    for i in range(len(r['description']['captions'])):
      description += r['description']['captions'][i]['text'].capitalize()+'\n'
    return description

In [0]:
# showing image
img = 'https://images.unsplash.com/photo-1563207153-f403bf289096?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=80'
# showing and describing image
show_picture(img)

In [0]:
# describing image
description = describe(img, number_of_descriptions=3)
print(description)

#### `classify`

* Assigns a category to an image and tags it. 

In [0]:
#@title define function _classify_
def classify(img, localfile=False, visualFeatures='Description,Categories,Faces'):
    '''Returns the categories and tags found in an image.
    visualFeatures: a string of comma-separated keywords, e.g. 'Brands,Adult,Objects'
    '''
    params = urllib.parse.urlencode({
      # Request parameters
      'visualFeatures': '{}'.format(visualFeatures),
      'details': '',
      'language': 'en'})
    
    # Defining body and headers
    headers, body = get_headers_body(API='ComputerVision', data=img, localfile=localfile)
    # Send headers, body and params and retrieve the response
    r = send_request("/vision/v2.0/analyze", params=params, headers=headers, body=body)
    
    # Parse response to return the info of interest
    info = '\nCategories in Image: \n'
    for i in r['categories']:
      info+=("\t"+ i['name'] +'\n')
    info+='Tags found in the image \n\t {}'.format(r['description']['tags'])
    return info
    

In [0]:
# showing picture
img='https://images.unsplash.com/photo-1553196798-b71feabce946?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=80'
show_picture(img)


In [0]:
# retrieving and printing info
print(classify(img))

#### `read`: 

* Perform OCR and retrieve text from images.
* Draws regions where text is located and displays the result.
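
The OCR response groups the text into regions, each with a bounding box encoded as a comma-separated string and a list of lines made of words. The sketch below parses one hand-written stand-in region with that layout; the values are made up and `parse_region` is a hypothetical helper.

```python
# Hand-written stand-in for one OCR region; the values are made up
sample_region = {
    "boundingBox": "21,16,304,451",
    "lines": [
        {"words": [{"text": "Handbook"}, {"text": "of"}]},
        {"words": [{"text": "Information"}, {"text": "Security"}]},
    ],
}

def parse_region(region):
    """Hypothetical helper: returns ([x, y, width, height], text) for one region."""
    box = [int(n) for n in region["boundingBox"].split(",")]
    lines = [" ".join(w["text"] for w in line["words"]) for line in region["lines"]]
    return box, "\n".join(lines)

box, text = parse_region(sample_region)
print(box)
print(text)
```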

In [0]:
#@title define function _read_
def read(img, localfile=False):
  '''Retrieves text from images using OCR'''
  # Request parameters
  params = urllib.parse.urlencode({
      'language': 'unk',
      'detectOrientation': 'true'})
  
  # Defining body and headers
  headers, body = get_headers_body(API='ComputerVision', data=img, localfile=localfile)
  
  # Send headers, body and params and retrieve the response
  r = send_request('/vision/v2.0/ocr', params=params, headers=headers, body=body)
  
  # Parse response to return the info of interest:
  boxes, texts = [], []  # bounding box and text of each region in the image
  for region in r['regions']:
    boxes.append([int(n) for n in region['boundingBox'].split(',')])
    text = ''
    for line in region['lines']:
      text += ' '.join(w['text'] for w in line['words']) + '\n'
    texts.append(text)

  # draws rectangles surrounding the objects and info
  IDs = ['Region'+str(i) for i in range(len(boxes))]
  draw_show_boxes(img, IDs, boxes, localfile)
  return texts


In [0]:
# showing picture
img= 'https://www.oreilly.com/library/view/handbook-of-information/9780471648307/images/titlepage.jpg'
show_picture(img)

In [0]:
# retrieving and printing text
for line in read(img):
  print(line)

#### `see` 

* Performs object detection and retrieves recognized objects and bounding boxes.
* Draws and labels bounding boxes around the objects and displays result.

In [0]:
#@title define function _see_
def see(img, localfile=False):
    '''Object Detection, displays the labeled bounding boxes in the image'''    
    # Defining params, body and headers, send and retrieve response
    params = urllib.parse.urlencode({})
    headers, body = get_headers_body(API='ComputerVision', data=img, localfile=localfile)
    r = send_request('/vision/v2.0/detect', params=params, headers=headers, body=body)
    # manipulate response to extract useful information
    print(f"In the image of size {r['metadata']['width']} by {r['metadata']['height']} pixels, {len(r['objects'])} objects were detected")
                                                                               
    found_objects, object_boxes = [], []
    for i in r['objects']:
        found_objects.append(i['object'])
        print('{} detected at region {}'.format(i['object'], i['rectangle']))
        box = []
        for el in 'xywh':
          box.append(i['rectangle'][el])
        object_boxes.append(box)

    # In this case, the response returns the recognized objects and their bounding boxes.
    # We can draw them on the image with the following function:
    draw_show_boxes(img, found_objects, object_boxes, localfile)
    return found_objects

In [0]:
see('https://images.unsplash.com/photo-1493857671505-72967e2e2760?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=800&q=80')

### Face API

> Detect faces

#### `detect_face`

* `detect_face`: detects faces and returns information about each of them (including age, gender and emotion).

* It also draws a labeled bounding box around each face.
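
The emotion attribute of a face is a dictionary that maps each emotion to a score between 0 and 1. One way to pick the dominant emotion is sketched below, using hand-written stand-in scores and a hypothetical helper `dominant_emotion`.

```python
# Hand-written stand-in for the emotion scores of one face
sample_emotions = {"anger": 0.05, "happiness": 0.75, "neutral": 0.15, "sadness": 0.05}

def dominant_emotion(emotions):
    """Hypothetical helper: returns the highest-scoring emotion as 'NN% name'."""
    name, score = max(emotions.items(), key=lambda item: item[1])
    return f"{round(score * 100)}% {name}"

print(dominant_emotion(sample_emotions))
```

Comparing the raw scores and converting to a percentage only for display keeps the comparison on a single scale.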

In [0]:
#@title define function _detect_face_
def detect_face(img, localfile=False):
    """Finds location and attributes of human faces in a picture (age, gender, emotion) and displays them"""
    # Define headers, body and params, send and retrieve response
    params = urllib.parse.urlencode({
        'returnFaceId': 'true',
        'returnFaceLandmarks': 'false',
        'returnFaceAttributes': 'age,gender,smile,facialHair,glasses,emotion,hair',
        'recognitionModel': 'recognition_01',
        'returnRecognitionModel': 'false'})
    headers, body = get_headers_body(API='Face', data=img, localfile=localfile)
    r = send_request('/face/v1.0/detect', params=params, headers=headers, body=body)
    # manipulate the response to extract information
    boxes, genders, ages, emotions = [], [], [], []
    for face in r:
      boxes.append([face['faceRectangle'][el] for el in ['left', 'top', 'width', 'height']])
      genders.append(' '+face['faceAttributes']['gender'].capitalize())
      ages.append(int(face['faceAttributes']['age']))
      # keep the emotion with the highest score, shown as a percentage
      e, score = max(face['faceAttributes']['emotion'].items(), key=lambda kv: kv[1])
      emotions.append(f'{int(score*100)}% {e}')
    print(f'{len(boxes)} faces recognized')

    # Draws object and writes information _ _ _ _ _ _ _
    def draw_show_boxes(img, boxes, genders, ages, emotions, localfile=False):
      if not localfile: # We need to download the image
        r = requests.get(img, allow_redirects=True)
        img = 'faces.jpg'
        with open(img, 'wb') as f:
          f.write(r.content)
      im = Image.open(img)   
      # Drawing boxes
      d = ImageDraw.Draw(im)
      for i, gender, age, emotion in zip(boxes, genders, ages, emotions):
        # represent the x-coordinate of the left edge, the y-coordinate of the top edge, width, and height of the bounding box
        x, y, width, height = i
        top, top_right = (x, y), (x+width, y)
        bottom, bottom_right = (x, y+height), (x+width, y+height)
        line_color = (0, 0, 255)
        d.line([top, top_right, bottom_right, bottom, top], fill=line_color, width=2)
        d.text(top,f'{gender} {age}',(0,0,255))
        d.text(bottom, emotion,(0,0,255))
      im.save(img)
      return show_picture(img, localfile=True)

    # draw boxes and writes info
    return draw_show_boxes(img, boxes, genders, ages, emotions, localfile)

In [0]:

r = detect_face('http://secureservercdn.net/198.71.233.19/3ec.e4c.myftpupload.com/wp-content/uploads/2014/04/Larremore-3.jpg')


### Real-world example: Capture and Send a Picture

Take a picture and analyze it with Azure Cognitive Services.

**_hint_**: All of our functions accept the argument `localfile=True`, which sends a local file as binary data.

In [0]:
# take a picture and save it as photo.jpg
img = take_picture('photo.jpg')

In [0]:
# describe photo.jpg with 5 sentences
print(describe(img, localfile=True, number_of_descriptions=5))

In [0]:
# classify photo.jpg
print(classify(img, localfile=True))

In [0]:
# read and display any text 
print(read(img, localfile=True))

In [0]:
# see the objects recognized 
see(img, localfile=True)

In [0]:
# detect and display information about faces in the picture
detect_face(img, localfile=True)




### Text Analytics API

> Used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction and language detection


**NOTE**: This API requires a different `body`: a list of dictionaries, each with the `text` to be analyzed and an `id`. Our functions build this body from `*args`, which lets us send multiple texts in a single request.
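
A minimal sketch of how such a body can be built from a variable number of texts is shown below. `build_documents_body` is a hypothetical helper, and writing the ids as strings is an assumption of this sketch.

```python
import json

def build_documents_body(*documents):
    """Hypothetical helper: JSON body with one entry per text, each with a unique id."""
    entries = [{"id": str(i), "text": text} for i, text in enumerate(documents)]
    return json.dumps({"documents": entries})

body = build_documents_body("Was soll das denn", "No tengo ni idea")
print(body)
```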

#### `detect_language` 
* Returns the language detected for each string in a list of strings. 


In [0]:
#@title define function _detect_language_
def detect_language(*documents):
    '''Returns language detected for each string in documents (*args)'''
    
    params = urllib.parse.urlencode({}) 
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': subscriptions['TextAnalytics']}
    
    # redefine body for proper text format
    body={"documents": []}
    ID = 0
    for document in documents:
       doc = {
            # assign a unique id
            "id": ID,
            "text": "'{}'".format(document)
        }
       body['documents'].append(doc)
       ID+= 1
    body = json.dumps(body)
    
    r = send_request('/text/analytics/v2.0/languages', params=params, headers=headers, body=body)
    return  [i['detectedLanguages'][0]['name'] for i in  r['documents']]
    

In [0]:
# because of *args, our function can take a variable number of strings
detect_language('Was soll das denn', 'No tengo ni idea', "Don't look at me", 'ごめんなさい', 'Sacré bleu!')

#### `key_phrases`
* Returns the key phrases found in each of the given strings `documents`: *a list of strings denoting the key talking points in the input text*.

In [0]:
#@title define function _key_phrases_
def key_phrases(*documents):
    # Define the parameters
    params = urllib.parse.urlencode({})
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': subscriptions['TextAnalytics']}
    body={"documents": []}
    ID = 0
    for document in documents:
       doc = {
            # assign a unique id
            "id": ID,
            "text": "'{}'".format(document)
        }
       body['documents'].append(doc)
       ID+= 1
    body = json.dumps(body)
    r = send_request('/text/analytics/v2.0/keyPhrases', headers, body, params)
    # collect the key phrases of each document
    results = []
    for document in r['documents']:
        results.append(list(document['keyPhrases']))
    return results

In [0]:
for i in key_phrases('I just spoke with the supreme leader of the galactic federation', 'I was dismissed', 'But I managed to steal the key', 'It was in his coat'):
  print(i)

#### `check_sentiment`

* Assigns a positive, negative or neutral category to the strings given.
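
The service returns a score between 0 and 1 for each text. The mapping from score to category can be sketched as follows, with 0.5 as the neutral point; `sentiment_label` is a hypothetical helper and the threshold is the one used in this notebook.

```python
def sentiment_label(score):
    """Hypothetical helper: maps a score between 0 and 1 to a category."""
    if score > 0.5:
        return "positive"
    if score < 0.5:
        return "negative"
    return "neutral"

labels = [sentiment_label(s) for s in (0.9, 0.5, 0.1)]
print(labels)
```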

In [0]:
#@title define function _check_sentiment_
def check_sentiment(*documents):
     # Define the parameters
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': subscriptions['TextAnalytics']}
    params = urllib.parse.urlencode({})
    ID = 0
    body={"documents": []}
    for document in documents:
       doc = {
            # assign a unique id
            "id": ID,
            "text": "'{}'".format(document)
        }
       body['documents'].append(doc)
       ID+=1
    body = json.dumps(body)
    r = send_request('/text/analytics/v2.0/sentiment', headers, body, params)
    # map each score (close to 0: negative, close to 1: positive) to a category
    sentiments = []
    for document in r['documents']:
      # above 0.5 is positive, exactly 0.5 is neutral, below 0.5 is negative
      if document["score"] > 0.5:
        sentiment = "positive"
      elif document['score'] == 0.5:
        sentiment = 'neutral'
      else:
        sentiment = "negative"
      sentiments.append(sentiment)
    return sentiments

In [0]:
print(check_sentiment('Not bad', "Not good", 'Good to know', 'Not bad to know', 
                      "I didn't eat the hot dog", 'Kill all the aliens'))

#### `find_entities`

* Returns a list of entities, each assigned to a category and, if available, a Wikipedia link.
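
The sketch below formats one entity as a line of text and adds the link only when a Wikipedia match with a sufficient score exists. The entity is a hand-written stand-in with a made-up score, and `format_entity` is a hypothetical helper.

```python
# Hand-written stand-in for one recognized entity; the score is made up
sample_entity = {
    "name": "Cornell University",
    "type": "Organization",
    "wikipediaUrl": "https://en.wikipedia.org/wiki/Cornell_University",
    "matches": [{"wikipediaScore": 0.8}],
}

def format_entity(entity, min_score=0.10):
    """Hypothetical helper: 'name, Type: type' plus a link when the score is high enough."""
    info = f"{entity['name']}, Type: {entity['type']}"
    matches = entity.get("matches") or [{}]
    score = matches[0].get("wikipediaScore", 0)
    url = entity.get("wikipediaUrl")
    if url and score >= min_score:
        info += f", Wiki [Link]({url})"
    return info

print(format_entity(sample_entity))
```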


In [0]:
#@title define function _find_entities_
def find_entities(*documents):  
  """Returns a list of entities recognized in each document of `documents`"""
  headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': subscriptions['TextAnalytics']}
  params = urllib.parse.urlencode({})
  ID = 0
  body={"documents": []}
  for document in documents:
      doc = {
          "id": ID,
          "text": "'{}'".format(document)
      }
      body['documents'].append(doc)
      ID+=1
  body = json.dumps(body)
  r = send_request('/text/analytics/v2.1/entities', headers, body, params)
  # _ _ _ _ _ _ _ 
  results = []
  for doc in r['documents']:
    names, types, links = [], [], []
    for e in doc['entities']:
      names.append(f"{e['name']}, ")
      types.append(f"Type: {e['type']}, ")
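      # append the Wikipedia link if its score is above a minimum threshold,
      # otherwise append None so that names, types and links stay aligned
      try:
        if e['matches'][0]['wikipediaScore'] >= 0.10:
          links.append(f"Wiki [Link]({e['wikipediaUrl']})")
        else:
          links.append(None)
      except (KeyError, IndexError):
        links.append(None)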
    docs = []
    for n, t, l in zip(names, types, links):
      if l is None:
        doc_info= n + t
      else:
        doc_info = n + t + l
      docs.append(doc_info)
    results.append(docs)
  return results

In [0]:
find_entities('I attended the lecture of Richard Feynman at Cornell ')

### Real-world example: OCR + Text Analytics
The `report` function extracts the individual text regions, analyzes them and compiles a report with the results:
 



In [0]:
#@title define function _report_
def report(img, localfile=False):
  """Recognizes and displays regions in the image with text.
  Extracts text and analyzes its language, sentiment, key phrases and entities
  Displays nicely in Markdown"""
  # list of texts in all regions recognized with OCR
  regions = read(img, localfile)
  # list of languages for all regions
  langs = detect_language(*regions)
  # list of sentiments for all regions
  sentiments = check_sentiment(*regions)
  # list of entities for all regions
  entities = find_entities(*regions)
  # list of key phrases for all regions
  keys = key_phrases(*regions)

  # looping over lists to print results. 
  m = ''
  m +='# Report \n'
  count = 0
  for r, l, s, e, k in zip(regions,langs, sentiments, entities, keys):
    m += f'## Region {count}\n'
    m +='> "'+r[0:82].rstrip()+'...'+'"\n\n'
    m+= f'- Language: {l}\n'
    m += f'- Sentiment: {s}\n'
    m += '- Entities:\n'
    for entity in e:
      m += f' - {entity}\n'
    m += '- Keys:\n'
    for key in k:
      m += f' - {key}\n'
    m += "\n"
    count += 1
  #return display(Markdown(m)) -> for Jupyter
  return print(m)

In [0]:
report('https://i.pinimg.com/originals/5d/0c/90/5d0c90add4024dae1020e4e7fb545f7e.jpg')

## Speech Services API

> Convert speech to text and text to speech

We will use it to transform voice to text and vice versa, which lets us listen and talk to our notebook. Before we can use Speech Services, we need a token (a secret string) that is valid for 10 minutes. The function `get_token` below retrieves one and reuses it until it is about to expire:

In [0]:
#@title define function _get_token_
def get_token(subscription_key):
    '''Retrieves a token for Speech Services'''
    global last_time, last_token
    this_time = time.time()
    fetch_token_url = 'https://westeurope.api.cognitive.microsoft.com/sts/v1.0/issueToken'
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key
    }
    try:
      # if more than 9.5 minutes have passed retrieve and return a new token
      if (this_time - last_time)/60 >= 9.5:
        response = requests.post(fetch_token_url, headers=headers)
        new_token = str(response.text)
        last_time = time.time()
        last_token = new_token
        print('9.5 minutes have passed, a new token has been retrieved')
        return new_token
      else:
        #print('9.5 minutes have not passed, using last available token')
        pass
    except NameError: # `last_time` does not exist yet, retrieve first token
      last_time = time.time()
      response = requests.post(fetch_token_url, headers=headers)
      last_token = str(response.text)
      # first token requested in session
      print('First token retrieved')
      return last_token
    # last available token
    return last_token

**Note**: the token is valid for 10 minutes.
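
The reuse-until-expiry idea can also be written without global variables. The sketch below is one possible shape, not the notebook's implementation: `TokenCache` is a hypothetical class, and the fetch function and clock are passed in so the behavior can be tried without calling the service.

```python
import time

class TokenCache:
    """Hypothetical helper: reuses a token until it is close to expiring."""

    def __init__(self, fetch_token, max_age_seconds=9.5 * 60, clock=time.time):
        self._fetch_token = fetch_token  # callable that returns a fresh token
        self._max_age = max_age_seconds
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._max_age
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token

# Stand-in fetcher that counts its calls; a real one would request a token from the service
calls = []
def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}"

cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()  # served from the cache, no second fetch
print(first, second, len(calls))
```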

### `speech_to_text`

* To transform speech to text in [various languages](https://docs.microsoft.com/en-gb/azure/cognitive-services/speech-service/language-support#speech-to-text).


In [0]:
#@title define function _speech_to_text_
def speech_to_text(path_to_wav, language='en-US'):
    '''Takes a .wav audio file and returns the text recognized in the given language.
    See the docs for valid language arguments: https://docs.microsoft.com/en-gb/azure/cognitive-services/speech-service/language-support#speech-to-text'''
    # opening audio file
    with open("{}".format(path_to_wav), mode="rb") as audio_file:
        audio_data =  audio_file.read()
    
    speechToTextEndPoint = "https://westeurope.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1"
    headers = {"Content-type": "audio/wav; codec=audio/pcm; samplerate=16000", 
            "Authorization": "Bearer " + get_token(subscriptions['SpeechServices']),
            "Expect":"100-continue"}
    params = {"language":language}
    body = audio_data

    # Connect to server, post the request, and get the result
    response = requests.post(speechToTextEndPoint,data=body, params=params, headers=headers)
    result = str(response.text)
    return json.loads(result)['DisplayText']

### `text_to_speech`

* Now let's give voice to our notebook with the function `text_to_speech`.

In [0]:
#@title define function _text_to_speech_
def text_to_speech(text, path_to_wav='temp.wav'):
  headers = {
      "X-Microsoft-OutputFormat": 'riff-24khz-16bit-mono-pcm',
      'Content-Type':'application/ssml+xml',
      'Authorization': "Bearer " + get_token(subscriptions['SpeechServices']),
      "User-Agent": "SmartColabNotebook"
  }
  endpoint = 'https://westeurope.tts.speech.microsoft.com/cognitiveservices/v1'
  xml_body = ElementTree.Element('speak', version='1.0')
  xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
  voice = ElementTree.SubElement(xml_body, 'voice')
  voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
  voice.set('name', 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)')
  voice.text = text
  body = ElementTree.tostring(xml_body)

  r = requests.post(endpoint, headers=headers, data=body)
  if r.status_code == 200:
    # write the audio to the path requested by the caller
    with open(path_to_wav, 'wb') as audio:
      audio.write(r.content)
    return path_to_wav
  return "Voice could not be made"
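
To see what is actually sent to the service, it can help to build the SSML body on its own and print it. The sketch below isolates that step. The helper `build_ssml` is a hypothetical refactoring of the code above, and it uses one language code for both elements for simplicity.

```python
from xml.etree import ElementTree

XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

def build_ssml(text, lang='en-US',
               voice_name='Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'):
    """Builds the SSML request body as bytes."""
    speak = ElementTree.Element('speak', version='1.0')
    speak.set(XML_LANG, lang)
    voice = ElementTree.SubElement(speak, 'voice')
    voice.set(XML_LANG, lang)
    voice.set('name', voice_name)
    # ElementTree escapes special characters such as & and < on serialization
    voice.text = text
    return ElementTree.tostring(speak)

print(build_ssml("Salt & pepper").decode())
```

Building the body with `ElementTree` rather than string formatting means that text containing `&` or `<` is escaped automatically.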

Because `speech_to_text` receives an audio file and returns words, and `text_to_speech` receives words and returns an audio file, we can do something like this:

In [0]:
voice = text_to_speech("Hi, I'm your virtual assistant")
play_audio(voice)

In [0]:
speech_to_text(voice)

We can also record our voice with `record_audio`, transform it into words with `speech_to_text`, do something with the words and speak the results out loud.

In [0]:
def motivational_bot():
  '''Checks for feeling in your speech and says something'''
  voice = record_audio('Press ENTER to record how you are feeling', 'voice.wav')
  clear_output()
  text = speech_to_text(voice)
  sentiment = check_sentiment(text)[0]
  if sentiment == 'negative':
    voice = text_to_speech("Call 911, you are in deep trouble")
  else:
    voice = text_to_speech("Nice to see you so positive today! Keep it up!")
  return play_audio(voice, autoplay=True)

Start the flow:

In [0]:
motivational_bot()

Instead of checking for feelings in the words, we could translate them into different languages (see the [Text Translator API](https://azure.microsoft.com/en-us/services/cognitive-services/translator-text-api/)) or look something up on the web (see [Bing Search](https://dev.cognitive.microsoft.com/docs/services/f40197291cd14401b93a478716e818bf/operations/56b4447dcf5ff8098cef380d)).

## Language Understanding
> A machine learning-based service to build natural language understanding into apps, bots and IoT devices.

Let's make our notebook more "intelligent" by giving it the ability to understand certain intentions in the language using the LUIS API. In short, we will train a linguistic model that recognizes certain intentions in the language.

For example, let's say we have the intent to `take_picture`. After training our model, if our notebook 'hears' sentences like:
- take a photo
- use the camera and take a screenshot
- take a pic

It will know that our intention is to `take_picture`. We call these phrases **utterances**, and they are what we need to provide to teach the language model how to recognize our **intents** (the tasks or actions we want to perform).


By using varied and nonredundant utterances, as well as adding additional linguistic components such as entities and roles, you can create flexible and robust models tailored to your needs. Well-implemented language models (backed up by the proper software) are what allow answer engines to respond to questions like "What is the weather in San Francisco?", "How many kilometers from Warsaw to Prague?", "How far is the Sun?" etc.
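
One simple way to check a list of utterances for redundancy is to compare them after lowercasing and stripping punctuation. The sketch below is only an illustration of that idea. The helpers `normalize` and `find_redundant` are hypothetical, and LUIS itself may treat near-duplicates differently.

```python
import string

def normalize(utterance):
    """Lowercases and strips punctuation and extra whitespace."""
    table = str.maketrans('', '', string.punctuation)
    return ' '.join(utterance.lower().translate(table).split())

def find_redundant(utterances):
    """Returns utterances whose normalized form was already seen."""
    seen, redundant = set(), []
    for utterance in utterances:
        key = normalize(utterance)
        if key in seen:
            redundant.append(utterance)
        seen.add(key)
    return redundant

examples = ["Take a photo", "take a photo!", "Take a pic"]
print(find_redundant(examples))  # ['take a photo!']
```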

For this post though, we will keep things simple and assign four utterances to each of a handful of intents. As you might presume, the intents will match some of the functions that we've already implemented.


### Activate LUIS 

In contrast to all the services we've seen, LUIS is a complex tool that comes with its own "Portal", where you manage your LUIS apps and create, train, test and iteratively improve your models. But before we can use it, we need to [activate a LUIS subscription](https://eu.luis.ai/). Once you've done this:

- Go to the LUIS dashboard and retrieve the **Authoring Key** of your account. If you need help with this step, take a look at the post or [refer here](https://docs.microsoft.com/en-us/azure/cognitive-services/luis/luis-concept-keys) for more detailed information.

In [0]:
# paste your Authoring Key here (fake example)
authoring_key = 'keykeykeykey'

### Create a LUIS app

The LUIS Portal makes it very easy to create your LUIS models, but we'll be using the [LUIS programmatic API](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c2f) to set things up from within the notebook using the `authoring_key`.

- Let's start off by creating an app with the function [`create_luis_app`](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c2f):

In [0]:
def create_luis_app(app_name):
  headers = {
      # Request headers
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': authoring_key}

  luis_config = {"name": f"{app_name}",
          "culture": "en-us",
          "description": "Language Understanding for Notebook",
          "InitialVersionId": "1.0",
          }
  body = json.dumps(luis_config)
  params = urllib.parse.urlencode({
  })
  
  r = send_request("/luis/api/v2.0/apps", headers=headers, body=body, params=params)
  return r, luis_config

In [0]:
app_id, luis_config = create_luis_app('NoteBot')

In this implementation, we keep the app ID returned by the server and the parameters we specified in the global variables `app_id` and `luis_config` for later use.



### Add intents and utterances
Let's now define a function to add intents and a function to add their respective utterances.

- `create_intent`: adds one intent to the luis app (specified by the variables `app_id` and `luis_config`).

- `add_utterances`: adds a batch of examples/utterances to an existing intent.
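
Both calls post to a path under the app's version and differ only in the last path segment and in the JSON body. The sketch below isolates those two pieces so they can be inspected without sending a request. The helper names `authoring_path` and `examples_payload` and the app ID `my-app-id` are hypothetical placeholders.

```python
import json

def authoring_path(app_id, version_id, resource):
    """Builds the versioned authoring path for a resource such as 'intents' or 'examples'."""
    return f"/luis/api/v2.0/apps/{app_id}/versions/{version_id}/{resource}"

def examples_payload(intent, utterances):
    """Labels every utterance with the intent it belongs to."""
    return [{"text": utterance, "intentName": intent} for utterance in utterances]

print(authoring_path("my-app-id", "1.0", "intents"))
print(json.dumps(examples_payload("take_picture", ["take a photo", "take a pic"]), indent=2))
```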

In [0]:
def create_intent(intent):
  headers = {
      # Request headers
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': authoring_key,
  }

  params = urllib.parse.urlencode({
  })

  body = {"name": intent}
  body = json.dumps(body)
  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("POST", f"/luis/api/v2.0/apps/{app_id}/versions/{luis_config['InitialVersionId']}/intents?%s" % params, body, headers)
      response = conn.getresponse()
      data = response.read()
      print(data)
      conn.close()
  except Exception as e:
      # a generic Exception has no errno/strerror attributes, so print it directly
      print(f"Request failed: {e}")

In [0]:
def add_utterances(intent, utterances):
  headers = {
      # Request headers
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': authoring_key,
  }

  params = urllib.parse.urlencode({
  })

  body = []
  for utterance in utterances:
    body.append({"text": utterance,
                 "intentName": intent})
  body = json.dumps(body)

  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("POST", f"/luis/api/v2.0/apps/{app_id}/versions/{luis_config['InitialVersionId']}/examples?%s" % params, body, headers)
      response = conn.getresponse()
      data = response.read()
      print(data)
      conn.close()
  except Exception as e:
      # a generic Exception has no errno/strerror attributes, so print it directly
      print(f"Request failed: {e}")

With these functions in place, let's define our language model as a dictionary, as seen below, and apply them to it. There is plenty of room for experimentation at this stage.

In [0]:
# a lot of room for experimentation here
intentions = {

'describe': ["Can you describe what you see",
             "Give a detailed account in words of the picture",
             "Give me some descriptions about the image",
             "Tell me something about the image"], # -> speak

'see': ["What objects can you detect",
        "What objects do you recognize",
        "What things can you perceive",
        "Perform object detection"], # -> show and speak

'detect_faces': ["Do you see any humans",
                 "Are there any men in the picture",
                 "Show me the faces",
                 "Do you detect any faces in the picture"], # -> show and speak

'read': ["What can you read",
         "Extract text from a picture",
         "Can you read this out for me please",
         "Read what you are seeing"], # -> speak
}
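
Before sending such a dictionary to LUIS, it can be worth checking that every intent really carries the number of utterances you expect. Python silently concatenates adjacent string literals, so a missing comma between two utterances merges them into one without raising an error. The helper `check_utterance_counts` below is a hypothetical sanity check, not part of the LUIS API.

```python
def check_utterance_counts(intentions, expected=4):
    """Returns {intent: actual_count} for intents that deviate from `expected`."""
    return {intent: len(utterances)
            for intent, utterances in intentions.items()
            if len(utterances) != expected}

sample = {
    'ok': ["a", "b", "c", "d"],
    # missing comma after "c": Python joins "c" and "d" into "cd"
    'merged': ["a", "b", "c" "d"],
}
print(check_utterance_counts(sample))  # {'merged': 3}
```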

The keys of this dictionary will be the intents for our application. Let's loop over them and create them:

In [0]:
intents = intentions.keys()
for intent in intents:
  create_intent(intent)

Each intent has 4 examples/utterances. Let's now add these to their respective intents.

In [0]:
for intent, utterances in intentions.items():
  add_utterances(intent=intent, utterances=utterances)

### Train the model

Let's now train the model with the information we've specified with [`train_luis_app`](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c45).

In [0]:
def train_luis_app(app_id, luis_config):
  headers = {
      # Request headers
      'Ocp-Apim-Subscription-Key': authoring_key,
  }
  params = urllib.parse.urlencode({
  })
  body = ""
  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("POST", f"/luis/api/v2.0/apps/{app_id}/versions/{luis_config['InitialVersionId']}/train?%s" % params, body, headers)
      response = conn.getresponse()
      data = response.read()
      print(data)
      conn.close()
  except Exception as e:
      # a generic Exception has no errno/strerror attributes, so print it directly
      print(f"Request failed: {e}")

In [0]:
train_luis_app(app_id, luis_config)
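
Training runs asynchronously, so the request above returns before the model is ready. Rather than waiting a fixed amount of time, one could poll the training status until every model has finished. The sketch below assumes the status response is a list of entries with a `details.status` field holding values such as `"Success"`, `"UpToDate"`, `"InProgress"` or `"Fail"`. The function `fetch_status` is a hypothetical callable that performs the status request and returns the parsed JSON.

```python
import time

DONE = {"Success", "UpToDate"}  # assumed status values for a finished model

def training_finished(statuses):
    """True when every model reports a finished status; raises if one failed."""
    states = [model["details"]["status"] for model in statuses]
    if "Fail" in states:
        raise RuntimeError("LUIS training failed for at least one model")
    return all(state in DONE for state in states)

def wait_for_training(fetch_status, interval_s=2.0, max_tries=30, sleep=time.sleep):
    """Polls `fetch_status` until training has finished or the tries run out."""
    for _ in range(max_tries):
        if training_finished(fetch_status()):
            return True
        sleep(interval_s)
    return False
```

With a helper like this, publishing can start as soon as `wait_for_training` returns `True`.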

### Publish the Application

We are now ready to publish the application with [`publish_app`](https://westus.dev.cognitive.microsoft.com/docs/services/5890b47c39e2bb17b84a55ff/operations/5890b47c39e2bb052c5b9c3b).

In [0]:
def publish_app(app_id):
  headers = {
      # Request headers
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': authoring_key,
  }

  params = urllib.parse.urlencode({
  })

  body = {
      'versionId': luis_config['InitialVersionId'],
      'isStaging': False,
      'directVersionPublish': False
  }
  body = json.dumps(body)
  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("POST", f"/luis/api/v2.0/apps/{app_id}/publish?%s" % params, body, headers)
      response = conn.getresponse()
      data = response.read()
      print(data)
      conn.close()
  except Exception as e:
      # a generic Exception has no errno/strerror attributes, so print it directly
      print(f"Request failed: {e}")

In [0]:
# leave some time after training before publishing
publish_app(app_id)

### Making a prediction

Let's see if our trained model is any good at predicting our intents. Note that LUIS has a separate API for predictions (the [LUIS endpoint API](https://westus.dev.cognitive.microsoft.com/docs/services/5819c76f40a6350ce09de1ac/operations/5819c77140a63516d81aee79)). Let's write the LUIS [prediction call](https://westus.dev.cognitive.microsoft.com/docs/services/5819c76f40a6350ce09de1ac/operations/5819c77140a63516d81aee78) as the function `understand`.

In [0]:
def understand(text, app_id=app_id):
  headers = {
      # Request headers
      'Content-Type': 'application/json',
      'Ocp-Apim-Subscription-Key': authoring_key,
  }

  params = urllib.parse.urlencode({
      # Request parameters
      'appId': app_id})

  body = json.dumps(text)
  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("POST", f"/luis/v2.0/apps/{app_id}?%s" % params, body, headers)
      response = conn.getresponse()
      data = response.read()
      data = json.loads(data)
      conn.close()
      try:
        return data['topScoringIntent']['intent']
      except (KeyError, TypeError):
        # the response carried no prediction (e.g. an error payload)
        return None
  except Exception as e:
      print(f"Request failed: {e}")

In [0]:
understand('can you describe the picture')
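
The prediction response carries a score next to the intent name, which `understand` ignores. A minimal sketch of how a confidence threshold could be applied is shown below. It assumes the response has the shape `{"topScoringIntent": {"intent": ..., "score": ...}}`; the helper `confident_intent` and the threshold of `0.5` are arbitrary illustrations.

```python
def confident_intent(prediction, min_score=0.5):
    """Returns the top intent name, or None if it is missing, 'None', or below min_score."""
    top = prediction.get("topScoringIntent") or {}
    intent, score = top.get("intent"), top.get("score", 0.0)
    if intent is None or intent == "None" or score < min_score:
        return None
    return intent

# hand-written sample responses, not real service output
sure = {"query": "describe the picture",
        "topScoringIntent": {"intent": "describe", "score": 0.93}}
unsure = {"query": "hmm",
          "topScoringIntent": {"intent": "read", "score": 0.21}}
print(confident_intent(sure))
print(confident_intent(unsure))
```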



Our notebook can now understand our intentions from plain language. But having to type the text ourselves isn't very helpful. The notebook should hear what we say and understand the intention that we have. Let's address this by writing a function `hear` that uses `understand` together with the functions `record_audio` and `speech_to_text`.

In [0]:
def hear():
  """Hears speech, transforms to text and predicts using the LUIS model specified by app_id"""
  # starts recording
  voice = record_audio(message=None, audiofile='voice.wav')
  clear_output()
  try:
    text = speech_to_text(voice)
  except Exception:
    print('Could not understand')
    return None
  # understand returns None when no intent could be predicted
  return understand(text)

def speak(text):
    voice = text_to_speech(text)
    display(play_audio(voice, autoplay=True))

We can now call `hear` to speak into the mic, transcribe our speech to words and predict the intent we mean using our LUIS app.

In [0]:
# say something, the function will predict your intention
intent = hear()
# speak out loud the result
speak(intent)

### Using your creation

Let's write a function to trigger a set of actions based on the predicted/recognized intent of our LUIS model.
In a sentence: let's define a function to execute what happens when a certain intent is predicted. (I'm sure you can come up with **many** ways to experiment at this step).


In [0]:
def Notebot():
  '''Executes a set of actions based on what you say'''
  # greeting message
  speak('Hi there!')
  # sleep while sound is being played
  time.sleep(1)
  clear_output()
  
  def run():
    speak('What can I do for you?')
    time.sleep(2)
    # record voice, transform to text and predict the intention
    intent = hear()
    # carry out the predicted intent
    if intent == 'describe':
      speak("Alright, take a picture and I will describe it for you")
      img = take_picture('photo.jpg')
      clear_output()
      description = describe(img, localfile=True)
      print(description)
      speak('That looks like ' + description.strip())

    elif intent == 'see':
      speak("Alright, take a picture and I'll see what objects I can find")
      img = take_picture('photo.jpg')
      clear_output()
      # recognize objects 
      objects = see(img, localfile=True)
      # making up results into a sentence
      result = 'I can see '
      for i in range(len(objects)):
        if len(objects) > 1 and i == len(objects) - 1:
          result = result[:-2] + f' and a {objects[i]}.'
        else:
          result += f'a {objects[i]}, '
      # speak out loud the results
      if len(objects) == 0:
        result = "Sorry I couldn't recognize anything"
      speak(result)

    elif intent == 'detect_faces':
      speak("Alright, take a picture and I'll find any humans")
      img = take_picture('photo.jpg')
      clear_output()
      detect_face(img, localfile=True)

    elif intent == "read":
      speak("Alright, put a document in front of me and I'll read it for you")
      img = take_picture('photo.jpg')
      clear_output()
      text = read(img, localfile=True)
      result = ' '.join(text).replace('\n', ' ')
      if result == '':
        speak("Hmm I couldn't find any text to read")
      else:
        speak(result)

    # if no intent can be recognized, intent will be None
    else:
      speak("Sorry I could not understand you")
      time.sleep(2)
      clear_output()
      run()

  run()
  return "I hope that was useful"
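
The sentence-building loop in the `see` branch can also be written as a small standalone function, which makes it easier to test without a camera. The helper `describe_objects` below is a hypothetical refactoring. Unlike the loop above, it ends the single-object case with a full stop instead of a trailing comma.

```python
def describe_objects(objects):
    """Turns a list of object names into a spoken sentence."""
    if not objects:
        return "Sorry I couldn't recognize anything"
    items = [f'a {name}' for name in objects]
    if len(items) == 1:
        return f'I can see {items[0]}.'
    # join all but the last item with commas, then attach the last with "and"
    return 'I can see ' + ', '.join(items[:-1]) + f' and {items[-1]}.'

print(describe_objects(['cat', 'dog', 'cup']))  # I can see a cat, a dog and a cup.
```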

To finish, let's summon the *Notebot* to fulfill our wishes. Depending on what you say, the `Notebot` can take a picture and:
- speak out loud a description
- display any detected objects
- display any detected faces
- apply OCR and read out loud the results

In [0]:
Notebot()



Let's sum up what happens when you call it. First you will hear a greeting message. The Notebot then calls the function `hear` and starts recording what you say. Your speech (the percept) is transcribed to words and sent to the LUIS application, which predicts the intention that you have. Based on this prediction, a different set of actions is executed. If no clear intent is recognized from your speech, the intent "None" is predicted and the `Notebot` starts over.



Viewed from above, the `Notebot` acts as a simple reflex-based agent: it finds a rule whose condition matches the current situation and executes it (in this case, what the Notebot does depends on what you say). At this point you might like to upgrade your agent with additional concepts, e.g. adding memory about what is perceived. But I'll leave that task for the diligent reader.
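
As a possible starting point for that exercise, the rule lookup can be expressed as a dictionary from intents to actions, with a list that records what has been perceived so far. This is only a sketch: the class `ReflexAgent` is hypothetical, and the lambdas stand in for the real notebook functions.

```python
class ReflexAgent:
    """Maps each intent to an action and remembers what it has perceived."""

    def __init__(self, rules, fallback):
        self.rules = rules        # dict: intent name -> callable taking no arguments
        self.fallback = fallback  # callable used when no rule matches
        self.memory = []          # (intent, result) pairs, oldest first

    def step(self, intent):
        # condition-action rule: look up the action for the perceived intent
        action = self.rules.get(intent, self.fallback)
        result = action()
        self.memory.append((intent, result))
        return result

agent = ReflexAgent(
    rules={'describe': lambda: 'a desk with a laptop',
           'read': lambda: 'hello world'},
    fallback=lambda: 'Sorry I could not understand you',
)
print(agent.step('describe'))
print(agent.step('dance'))
print(agent.memory)
```

Keeping the rules in a dictionary means a new intent only needs a new entry, not another `elif` branch.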


## Cleaning up



In [0]:
%%bash
resourceGroup=MyGroup
az group delete --name $resourceGroup --yes
rm -f keys.py

In [0]:
def delete_luis_app(app_id):
  headers = {
      # Request headers
      'Ocp-Apim-Subscription-Key': authoring_key,
  }

  params = urllib.parse.urlencode({
      # Request parameters
      'force': 'false',
  })

  try:
      conn = http.client.HTTPSConnection('westeurope.api.cognitive.microsoft.com')
      conn.request("DELETE", f"/luis/api/v2.0/apps/{app_id}?%s" % params, "", headers)
      response = conn.getresponse()
      data = response.read()
      print(data)
      conn.close()
  except Exception as e:
      # a generic Exception has no errno/strerror attributes, so print it directly
      print(f"Request failed: {e}")


In [0]:
delete_luis_app(app_id)