# **Microsoft Azure Cognitive Services study**

The aim of this notebook is to study the impact that beauty filters might have on Image Tagging Algorithms (ITAs), specifically on the one developed by Microsoft Azure.

The prerequisites for this study are:
- Installing the Computer Vision Software Development Kit
- Installing the Python Imaging Library (PIL)
- Creating a folder called "images" in the same directory as this notebook and adding some images to it. In this case, the loaded images consist of the faces of people of different races, before and after a beauty filter has been applied.

## **Prerequisites and imports**

In [None]:
#Install the Computer Vision SDK
!pip install --upgrade azure-cognitiveservices-vision-computervision

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting azure-cognitiveservices-vision-computervision
  Downloading azure_cognitiveservices_vision_computervision-0.9.0-py2.py3-none-any.whl (39 kB)
Collecting azure-common~=1.1
  Downloading azure_common-1.1.28-py2.py3-none-any.whl (14 kB)
Collecting msrest>=0.5.0
  Downloading msrest-0.7.1-py3-none-any.whl (85 kB)
Collecting isodate>=0.6.0
  Downloading isodate-0.6.1-py2.py3-none-any.whl (41 kB)
Collecting azure-core>=1.24.0
  Downloading azure_core-1.26.2-py3-none-any.whl (173 kB)
Installing collected packages: azure-common, iso

In [None]:
#Install the Python Imaging Library
!pip install pillow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes, Details
from msrest.authentication import CognitiveServicesCredentials


from array import array
import os
from PIL import Image
import sys
import time
import numpy as np
import json
import pandas as pd
from pandas import json_normalize

## **Authentication**
Use credentials to authenticate and create a client.
To do so, enter your own key from your Azure subscription.

In [None]:
'''
Authenticate
Authenticates your credentials and creates a client.
'''
subscription_key = "enter_subscription_key_here"
endpoint = "https://imagetagging2023.cognitiveservices.azure.com/"

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))
'''
END - Authenticate
'''

'\nEND - Authenticate\n'
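
Typing the key directly into the notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key and endpoint are stored in environment variables. The variable names `AZURE_CV_KEY` and `AZURE_CV_ENDPOINT` and the helper `load_azure_settings` are hypothetical and not part of the Azure SDK.

```python
import os

def load_azure_settings(env=os.environ):
    """Return (key, endpoint) read from environment variables.

    The variable names are illustrative assumptions, not an Azure convention.
    """
    key = env.get("AZURE_CV_KEY")
    endpoint = env.get("AZURE_CV_ENDPOINT")
    if not key or not endpoint:
        raise RuntimeError("Set AZURE_CV_KEY and AZURE_CV_ENDPOINT first")
    return key, endpoint
```

The returned values could then be passed to `ComputerVisionClient(endpoint, CognitiveServicesCredentials(key))` in the same way as in the cell above.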

## **Load local images**
To carry out the study, load all the selected images from the race folders. The images are split into original and beautified versions, and each group is further divided by race.

The names of the race folders are the same for both the beautified and the original cases.

In [None]:
#Filename of the current notebook
__file__ = 'azure_tagging.ipynb'

#Route of the beautified images
filter_beauty = "images/beauty"
beauty_folder = os.path.join(os.path.dirname(os.path.abspath(__file__)), filter_beauty)

#Route of the original images
filter_original = "images/original"
original_folder = os.path.join(os.path.dirname(os.path.abspath(__file__)),filter_original)

#Folder names for the different races
race_folders = os.listdir(beauty_folder)
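
`os.listdir` returns entries in arbitrary order and also lists hidden entries (for example `.ipynb_checkpoints`) and plain files. The following is a minimal sketch of a stricter listing, assuming every race is a visible subdirectory. The helper name `list_race_folders` is hypothetical.

```python
import os

def list_race_folders(root):
    """Return the visible subdirectories of `root`, sorted by name."""
    return sorted(
        name for name in os.listdir(root)
        # Skip hidden entries and anything that is not a directory
        if not name.startswith(".") and os.path.isdir(os.path.join(root, name))
    )
```

Sorting keeps the folder order stable between the run on the beautified images and the run on the original ones.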

## **Tag an Image**
The aim of this section is to **obtain and store** a **set of tags** for each **loaded image**, whether it is beautified or not.

For every race, the corresponding images are loaded one by one and the API is called to obtain the set of tags that are assigned to each of them. The sets of tags are stored in a list called tag_results, in the same order in which they have been obtained.

An important point here is that no more than 20 calls per minute can be made to the API. To handle this, the number of successive calls is counted and, once it reaches 20, the loop waits one minute before calling the API again.

This process is repeated for both the folder with the original images and the one with the beautified images.

In [None]:
'''
Tag an Image - local
This example returns a tag (key word) for each thing in the image.
'''
tag_results = [ ]


n_calls = 0   #nº of consecutive calls to API

for race in range(len(race_folders)-1):
  # Open each race folder and list local image files
  folder_name = os.path.join(beauty_folder, race_folders[race+1])
  img_files = os.listdir(folder_name)
  
  n_imgs = len(img_files) #nº images in the folder
  
  for img in range(n_imgs):
    #Path of the current image
    local_image_path = os.path.join(folder_name, img_files[img])
    
    #Control nº of calls/min
    if n_calls >= 20:
      n_calls = 0
      print("===== Wait for 1 min =====")
      time.sleep(60) #wait for 1 minute
      
    n_calls += 1

    # Call API (the image file is closed as soon as the call returns)
    print("===== Tag an Image - local =====")
    with open(local_image_path, "rb") as local_image:
      tags_result_local = computervision_client.tag_image_in_stream(local_image)
    
    #Store tags set
    tag_results.append(tags_result_local.tags)
'''
END - Tag an Image - local
'''
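
A fixed counter pauses for a full minute after every 20 calls, regardless of how long those calls took. A sliding-window limiter is one alternative. This is a minimal sketch, not the method used in this study, and the class name `RateLimiter` and its parameters are hypothetical. The clock and sleep functions are injectable so that the logic can be checked without actually waiting.

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of the recorded calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Calling `limiter.wait()` just before `tag_image_in_stream` would take the place of the `n_calls` bookkeeping.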

## **Save the results in a dictionary**
Once the tags have been obtained for every image, the next step is to store them in a dictionary. This dictionary has two keys: '#ID' and 'tags'.

The value for '#ID' is the list of image filenames used to call the API, and the value for 'tags' is the list of tag sets obtained for those images, each in the form *tag_name: confidence* (as a percentage). Both lists are in the same order.

This process needs to be repeated for both the original images folder and the beautified images folder, producing two different dictionaries.

In [None]:
#Create dictionary
tags_dict_all = {'#ID':[ ],
              'tags':[ ]}
              
k = 0   #position of the current image in tag_results

for race in range(len(race_folders)-1):
  # Open each race folder and list local image files
  folder_name = os.path.join(beauty_folder, race_folders[race+1])
  img_files = os.listdir(folder_name)
  
  for img_file in img_files:
    #Store image filenames one by one
    tags_dict_all['#ID'].append(img_file)
    #Save tags for each image (tag:confidence in %)
    #A running counter is used so folders may hold different numbers of images
    tags_dict_all['tags'].append({tag.name: tag.confidence*100 for tag in tag_results[k]})
    k += 1
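
Keeping the filenames and the tag sets in two parallel lists makes the pairing between them explicit. The following is a minimal sketch of that idea on made-up data. `Tag` is a stand-in for the tag objects returned by the API (assumed to expose `name` and `confidence`, as used in this notebook), and `build_tags_dict` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for the tag objects returned by the API (assumption for illustration)
Tag = namedtuple("Tag", ["name", "confidence"])

def build_tags_dict(filenames, tag_results):
    """Build {'#ID': [...], 'tags': [...]} from two parallel lists."""
    if len(filenames) != len(tag_results):
        raise ValueError("filenames and tag_results must have the same length")
    return {
        "#ID": list(filenames),
        # Confidence is converted from the [0, 1] range to a percentage
        "tags": [{t.name: t.confidence * 100 for t in tags} for tags in tag_results],
    }

# Made-up filenames and confidences, not results from the study
example = build_tags_dict(
    ["a.png", "b.png"],
    [[Tag("person", 0.5), Tag("smile", 0.25)], [Tag("person", 0.75)]],
)
```

The length check fails early if an API call was skipped, instead of silently shifting every later tag set onto the wrong image.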


## **Write the tags dictionary to a JSON file**
To avoid losing the obtained results, the created dictionary is saved in a JSON file.

This file is also useful to work with dataframes in order to analyse the tags that are obtained.

This process is also repeated for both the original images folder and the beautified images folder, so two different JSON files are obtained at the end, with names "*beauty_tags.json*" and "*original_tags.json*".

In [None]:
# Write tags to a JSON file
with open("beauty_tags.json", "w") as outfile:
    json.dump(tags_dict_all, outfile)

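
Because the whole procedure is run twice, the output filename can be made a parameter. The following is a minimal sketch with a hypothetical helper `save_tags`, applied to a toy dictionary and followed by a reload to confirm that the file round-trips.

```python
import json
import os
import tempfile

def save_tags(tags_dict, path):
    """Write a tags dictionary to `path` as JSON."""
    with open(path, "w") as outfile:
        json.dump(tags_dict, outfile)

# Toy dictionary with the same layout as tags_dict_all (made-up values)
sample = {"#ID": ["a.png"], "tags": [{"person": 98.5}]}

# A temporary directory is used here so the sketch does not overwrite real results
path = os.path.join(tempfile.mkdtemp(), "beauty_tags.json")
save_tags(sample, path)

with open(path) as infile:
    reloaded = json.load(infile)
```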

## **Convert local JSON files into Pandas DataFrames**
The local JSON files generated previously are read with the Pandas *read_json()* function.

This function extracts the data from a JSON file and stores it as a DataFrame.

Doing so, two dataframes are generated: one for the beautified images and the other one for the original ones.

These dataframes consist of two columns, the first one for all the analysed images and the second one for the corresponding set of tags obtained for each of these images.

In [None]:
#1st dataframe version

#Dataframe for the beautified images
df1beauty = pd.read_json('beauty_tags.json')

#Dataframe for the original images
df1original = pd.read_json('original_tags.json')

| | #ID | tags |
|---|---|---|
| 0 | 7733_01.png | {'person': 98.46811294555664, 'human face': 98... |
| 1 | 1705_01.png | {'human face': 99.54532384872437, 'person': 99... |
| 2 | 3547_01.png | {'human face': 99.87502098083496, 'person': 99... |
| 3 | 396_01.png | {'person': 99.15273785591125, 'human face': 98... |
| 4 | 10834_01.png | {'human face': 99.50470328330994, 'forehead': ... |
| ... | ... | ... |
| 296 | 7385_01.png | {'human face': 98.80687594413757, 'person': 98... |
| 297 | 2600_01.png | {'human face': 99.3657112121582, 'fashion acce... |
| 298 | 9348_01.png | {'human face': 99.90034103393555, 'person': 99... |
| 299 | 8793_01.png | {'person': 98.4387993812561, 'human face': 97.... |


## **Format DataFrame structure**
In the previous DataFrames, the nested set of tags for each image is packed into a single column, 'tags'. Here that nested structure is flattened so that every tag gets its own column.

The JSON files are loaded with the Python json module, and the resulting object is passed to the Pandas json_normalize() function with the argument *record_path* set to ['tags'], which flattens the nested list in tags.

For the result to include the image filenames, they are collected from the folders (they are the same and in the same order for the beautified and the original images) and stored in the *img_index* list.
Finally, this list is assigned to the index of each DataFrame.

In [None]:
#2nd dataframe version

#Dataframe for the BEAUTIFIED images
with open('beauty_tags.json') as infile:
  df2beauty = json.load(infile)

#Flatten nested list in tags
df2beauty = pd.json_normalize(df2beauty, record_path=['tags'])

#Confidence=0 for tags not being assigned to specific images
df2beauty = df2beauty.fillna(0)

#Collect image filenames (same folders and order as in the tagging loop)
img_index = []
for race in range(len(race_folders)-1):
  folder_name = os.path.join(beauty_folder, race_folders[race+1])
  img_files = os.listdir(folder_name)
  for img in img_files:
    img_index.append(img)

#Assign image filenames to DataFrame indices
df2beauty.index = img_index


#Dataframe for the ORIGINAL images
with open('original_tags.json') as infile:
  df2original = json.load(infile)

#Flatten nested list in tags
df2original = pd.json_normalize(df2original, record_path=['tags'])

#Confidence=0 for tags not being assigned to specific images
df2original = df2original.fillna(0)

#Assign image filenames to DataFrame indices
df2original.index = img_index

Unnamed: 0,person,human face,eyebrow,lip,skin,cheek,smile,portrait photography,girl,forehead,...,mammal,blue,building,crown jewels,temple,screen,newborn,beanie,knit cap,earphone
5797_01.png,98.468113,98.408616,94.973540,94.833589,92.442918,89.450395,88.346857,87.934518,87.507850,87.333477,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
10396_01.png,99.086314,99.545324,97.391045,93.091023,94.030166,95.854425,0.000000,0.000000,0.000000,97.695827,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
6921_01.png,99.473584,99.875021,97.466552,86.560416,90.748370,89.743829,97.387266,0.000000,0.000000,96.050119,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
4251_01.png,99.152738,98.333335,94.530892,90.832853,94.658494,89.918983,0.000000,0.000000,71.804047,86.446774,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
7385_01.png,96.265638,99.504703,96.621042,89.517248,90.185213,92.662573,0.000000,0.000000,0.000000,97.395873,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3360_01.png,98.469973,98.806876,87.848037,92.030001,89.097011,86.121821,94.727314,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
9798_01.png,93.424058,99.365711,0.000000,93.042856,84.871948,90.977645,0.000000,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,68.870705
1730_01.png,99.026918,99.900341,0.000000,0.000000,0.000000,0.000000,98.833615,0.000000,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
8090_01.png,98.438799,97.949147,0.000000,90.487278,88.906950,89.659834,90.398860,0.000000,57.921273,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
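
The flattening step can be seen on a toy dictionary with the same layout as the JSON files. This is a minimal sketch, and the filenames and confidence values are made up for illustration.

```python
import pandas as pd

# Toy data with the same two keys as the JSON files (made-up values)
toy = {
    "#ID": ["a.png", "b.png"],
    "tags": [{"person": 99.0, "smile": 80.0}, {"person": 98.0, "lip": 70.0}],
}

# One row per image, one column per tag seen in any image
flat = pd.json_normalize(toy, record_path=["tags"])

# A tag that was not assigned to an image gets confidence 0
flat = flat.fillna(0)

# Use the filenames stored alongside the tags as the index
flat.index = toy["#ID"]
```

Taking the index from the '#ID' list stored next to the tags is one way to keep filenames and rows aligned without listing the folders a second time.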


# **Some statistics of the Microsoft Azure ITA**
This section analyses the tags obtained with the Image Tagging Algorithm (ITA) by Microsoft Azure.

## **Unique tags count**
The number of unique tags obtained for the original images and for the beautified images is counted by measuring the number of columns in each DataFrame.

The number of unique tags assigned by the Azure service to original faces and their beautified versions are the following:
* **200 unique tags** are assigned to **original** faces.
* **173 unique tags** are assigned to **beautified** faces.

Note that the number of unique tags decreases after beautification.

In [None]:
#Collect tags assigned to original & beautified faces
original_tags = df2original.columns.values
beauty_tags = df2beauty.columns.values

#nº unique tags in original & beautified faces
N_original = len(original_tags)
N_beauty = len(beauty_tags)

## **Lost and new tags count**
Here, the number of original tags that are lost (present only in the original case) and the number of new tags (present only after beautification) are counted.

To do so, the tags in the original set are checked one by one against the beautified tag set. If a tag is found there, the tag and its index in the beautified set are stored. Otherwise, the tag is stored in the *lost_tags* list.

The new tags are those tags in the beautified set whose indices were not stored as repeated tags.

Measuring the length of the *new_tags* and *lost_tags* lists gives the following numbers:
* Among the original 200 tags, **47** (more than 23%) are **lost** (not present after beautification).
* On the other hand, **20** tags are **new** (only present after beautification).

In [None]:
#Count repeated, lost and new tags
N_repeated = 0
repeated_tags = []
lost_tags = []

#Indices of the repeated tags
rep_index = []

#Go through original tags one by one
for t in range(len(original_tags)):
  #Check if original tag is in beauty set
  a = np.where(beauty_tags == original_tags[t])
  if (np.size(a) > 0):
    #Repeated tag: save tag & index
    rep_index.append(a[0][0])
    repeated_tags.append(original_tags[t])
  else:
    #Lost tag
    lost_tags.append(original_tags[t])

#New tags: beauty tags that aren't repeated
new_tags = np.delete(beauty_tags, rep_index)

#Repeated tags in original & beauty
N_repeated = len(repeated_tags)

#Original tags lost
N_lost = len(lost_tags)

#New beauty tags
N_new = len(new_tags)
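
The same three groups can also be obtained with Python set operations. The following is a minimal sketch on made-up tag names, not on the data from the study.

```python
# Made-up tag lists standing in for original_tags and beauty_tags
toy_original = ["person", "smile", "beanie", "temple"]
toy_beauty = ["person", "smile", "makeover"]

original_set = set(toy_original)
beauty_set = set(toy_beauty)

toy_repeated = sorted(original_set & beauty_set)  # present in both sets
toy_lost = sorted(original_set - beauty_set)      # only before beautification
toy_new = sorted(beauty_set - original_set)       # only after beautification
```

Set difference expresses "only in one set" directly, so no index bookkeeping is needed.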

## **Attractiveness rating**
Here, a study is carried out on how beauty filters change the algorithm's rating of attractiveness. To that end, the faces that are considered attractive only after beautification and only before beautification are counted.

To do so, a set of tags referring to attractiveness is defined among all the tags obtained with the Azure algorithm, namely "*child model, cool, gentleman, dating, posing, love, kiss, romance, makeover*". An image is considered attractive if it is assigned at least one of these tags.

Then, going through the image filenames one by one (they are the same for an original face and its beautified version), each image is checked for attractiveness tags in its original version and in its beautified one. If any is found, the index of the image is stored for later use.

Once these checks are done, the attractive faces in each group (original and beautified) are the images at the stored indices, and the non-attractive faces are all the remaining images.

The faces that are considered attractive only after beautification are the intersection of the original non-attractive subset and the beautified attractive subset.

Conversely, the faces that are considered attractive only before beautification are the intersection of the original attractive subset and the beautified non-attractive subset.

Measuring the length of the intersected lists gives the following numbers:
* The faces that are considered **attractive** by the algorithm **only after beautification** are **16**, which is **5.32%** of images.
* Conversely, the faces that are considered **attractive only before beautification** are **15**, which is **4.98%** of images.

In [None]:
#Set of tags referring to attractiveness
att_set = ['child model', 'cool', 'gentleman', 'dating', 'posing', 'love', 'kiss', 'romance', 'makeover'] 

att_beauty = []
att_original = []

#Check if attractiveness tags are in original & beautified sets
for tag in att_set:
  b = np.where(df2beauty.columns.values == tag) #check whether the beauty set contains this tag
  c = np.where(df2original.columns.values == tag) #check whether the original set contains this tag
  if np.size(b)>0:
    att_beauty.append(tag)
  if np.size(c)>0:
    att_original.append(tag)

#Attractiveness subset
beauty = df2beauty.loc[:, att_beauty]
original = df2original.loc[:, att_original]

beauty_index = []
original_index = []

for img in range(len(img_index)):
  #Check if any attractiveness tag is assigned
  d = np.where(beauty.loc[img_index[img],:]>0)
  e = np.where(original.loc[img_index[img],:]>0)
  #Collect indexes of attractive images
  if (np.size(d)>0):
    beauty_index.append(img)
  if (np.size(e)>0):
    original_index.append(img)

#Non attractive: images that aren't in attractive indices
non_attractive_original = np.delete(img_index, original_index)
non_attractive_beauty = np.delete(img_index, beauty_index)

#Attractive: images that are in attractive indices
attractive_original = [img_index[i] for i in original_index]
attractive_beauty = [img_index[i] for i in beauty_index]


#Attractive only after beautification
only_beauty = np.intersect1d(non_attractive_original, attractive_beauty)
N_only_after= len(only_beauty)

#Attractive only before beautification
only_original = np.intersect1d(non_attractive_beauty, attractive_original)
N_only_before = len(only_original)
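
The comparison can also be written with boolean masks in Pandas. The following is a minimal sketch on toy tables with made-up confidences. The helper `is_attractive` is hypothetical, and only three of the attractiveness tags are used to keep the example short.

```python
import pandas as pd

toy_att_set = ["cool", "posing", "makeover"]  # subset of the attractiveness tags

# Toy confidence tables: rows are images, columns are tags (made-up values)
toy_original = pd.DataFrame(
    {"person": [99.0, 98.0, 97.0], "cool": [0.0, 60.0, 0.0]},
    index=["a.png", "b.png", "c.png"],
)
toy_beauty = pd.DataFrame(
    {"person": [99.0, 98.0, 97.0], "posing": [55.0, 0.0, 0.0],
     "makeover": [0.0, 0.0, 0.0]},
    index=["a.png", "b.png", "c.png"],
)

def is_attractive(df, tags):
    """Boolean Series: True where any of `tags` has confidence > 0."""
    present = [t for t in tags if t in df.columns]
    return df[present].gt(0).any(axis=1)

mask_original = is_attractive(toy_original, toy_att_set)
mask_beauty = is_attractive(toy_beauty, toy_att_set)

# Attractive only after / only before beautification
toy_only_after = list(mask_beauty[mask_beauty & ~mask_original].index)
toy_only_before = list(mask_original[mask_original & ~mask_beauty].index)
```

In this toy case `a.png` gains an attractiveness tag after beautification and `b.png` loses one, while `c.png` has none in either version.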

In [None]:
#Statistics summary
# % of lost tags from original case when beautified
perc_lost = (N_lost/N_original)*100

# % of faces that are considered attractive only after beautification
perc_after = (N_only_after/len(img_index))*100

# % of faces that are considered attractive only before beautification
perc_before = (N_only_before/len(img_index))*100
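
As a quick consistency check, the tag counts reported in the sections above can be plugged into the same formula. The image-level percentages are left out of this sketch because they depend on `len(img_index)`, which is only known at run time.

```python
# Counts reported in the sections above
N_original = 200  # unique tags on original faces
N_beauty = 173    # unique tags on beautified faces
N_lost = 47       # tags present only before beautification
N_new = 20        # tags present only after beautification

# Tags shared by both sets, computed from either side
shared_from_original = N_original - N_lost
shared_from_beauty = N_beauty - N_new

# % of original tags lost after beautification
perc_lost = (N_lost / N_original) * 100
```

Both subtractions give 153 shared tags, and `perc_lost` comes out at about 23.5, in line with the "more than 23%" quoted earlier.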