# Block C: Responsible AI 

For details regarding ILO 3.1 and the use-case, please refer to the Assessment rubric in Microsoft Teams, and the [DataLab: Responsible AI](https://adsai.buas.nl/Study%20Content/Responsible%20and%20Explainable%20AI/UseCases.html) GitHub page.

## Use-case 1: Identifying, and describing bias

**- Define the concept of bias in relation to the Imsitu dataset, and a computer vision task:**

In relation to the Imsitu dataset and a computer vision task, bias is the tendency of a machine learning model to make systematically skewed decisions that do not follow from what is actually shown in an image, but from preconceived notions, stereotypes, or preferences that ended up in the data or in the design of the system. This can lead to incorrect or unfair results for certain populations or groups. Bias in computer vision tasks can be caused by a variety of factors, including the dataset used to train the model, the type of algorithm chosen, and the assumptions of the people who build and annotate the system.

**- List, and describe the type of bias that you identified in the Imsitu dataset:**

Representation bias:
For the verb praying, one posture is heavily overrepresented: most images show people praying with both hands put together. Of the 19 pictures that show up when searching for the verb praying, only 2 show hands that are not put together.
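
The imbalance can be expressed as a share of the search results. The sketch below only uses the two counts mentioned above (19 results, 2 with hands apart); the variable names are my own.

```python
# Counts taken from the manual inspection of the "praying" search results.
total_images = 19   # images returned for the verb "praying"
hands_apart = 2     # images where the hands are not put together

hands_together = total_images - hands_apart
share_together = hands_together / total_images
share_apart = hands_apart / total_images

print(f"hands together: {hands_together}/{total_images} ({share_together:.1%})")
print(f"hands apart:    {hands_apart}/{total_images} ({share_apart:.1%})")
```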

**- Discuss the possible ramification (e.g., harm) in terms of fairness of the identified bias instance:
Why, and when, is this particular instance of bias undesirable? In other words, who might be disproportionally affected by this particular instance of bias, and when does this negative effect come into play?**

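This bias negatively affects people of different religions. Most of the pictures show Christian people praying, so other religions such as Hinduism, Islam, Judaism, and Buddhism are disproportionately affected: a model trained on these images may fail to recognise their ways of praying whenever it is used on images of people from these groups.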


## Use-case 2: Propose individual fairness method

**- Identify a sensitive/protected attribute in the Imsitu dataset:**

Race/ethnicity

**- Mitigate bias in the Imsitu dataset by applying the 'Fairness Through Unawareness' or 'Fairness Through Awareness' method to this sensitive/protected attribute.**

Fairness Through Unawareness: This method removes all labels that are related to race or ethnicity from the dataset. This makes sure that the dataset does not contain any explicit information about the race/ethnicity of the people in the images, which is intended to eliminate potential biases tied to this attribute.

**- Elaborate on the individual fairness method that you applied, and why you think it is a good method to mitigate bias in the Imsitu dataset:**

This method is a good way to mitigate bias because it removes every label that refers to race or ethnicity. This keeps the dataset free of labels that could lead to discrimination or prejudice based on this attribute. The method also allows the dataset to remain open and accessible to all users, regardless of their race/ethnicity.
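
A minimal sketch of what Fairness Through Unawareness could look like on annotation records. The records, the `ethnicity` key, and the helper `drop_protected` are hypothetical and only illustrate the idea; they are not taken from the Imsitu files.

```python
# Hypothetical annotation records; "ethnicity" stands in for any label
# that encodes the protected attribute.
records = [
    {"file_name": "praying_1.jpg", "verb": "praying", "ethnicity": "group_a"},
    {"file_name": "praying_2.jpg", "verb": "praying", "ethnicity": "group_b"},
]

PROTECTED = {"ethnicity"}

def drop_protected(record, protected=PROTECTED):
    """Return a copy of the record without the protected attribute keys."""
    return {key: value for key, value in record.items() if key not in protected}

unaware_records = [drop_protected(record) for record in records]
print(unaware_records)
```

This only removes explicit labels. The image pixels can still carry the same information, which is a commonly mentioned limitation of this method.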




## Use-case 3: Create a subset of images from the original dataset

Write your text for use-case 3 here

In [9]:
#Write your Python code for use-case 3 here
#Determine which agent codes are associated with which nouns:
import json

# load imsitu_space.json file
with open("Data/imsitu_space.json") as f:
    imsitu_space = json.load(f)

nouns = imsitu_space["nouns"]
verbs = imsitu_space["verbs"]

# function to print all agent codes for a specific agent/noun
def get_agent_codes(agent = "banana"):
    for noun in nouns:
        # exact match on the first gloss; `in` between two strings would also match substrings
        if nouns[noun]['gloss'][0] == agent:
            print(f"{agent} found")
            print(noun)

# get all agent codes for bananas (use your own nouns here)
get_agent_codes("banana")
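
The lookup can be tried without the JSON file. This is a sketch on a small stand-in dictionary: the two banana codes are the ones used later in this notebook, while their glosses, the third entry, and the helper name `find_agent_codes` are assumptions for illustration. It returns the codes instead of printing them and matches the first gloss exactly.

```python
# Tiny stand-in for imsitu_space["nouns"].
sample_nouns = {
    "n07753592": {"gloss": ["banana"]},
    "n12352287": {"gloss": ["banana", "banana tree"]},
    "n00000001": {"gloss": ["apple"]},  # hypothetical code
}

def find_agent_codes(nouns, agent):
    """Return every noun code whose first gloss equals `agent`."""
    return [code for code, entry in nouns.items() if entry["gloss"][0] == agent]

codes = find_agent_codes(sample_nouns, "banana")
print(codes)
```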

In [None]:
# Code to extract the images that contain a specific verb and agent value:
import json

def get_verb_agent(json_file, verb_custom, agent_custom):
    # annotations are stored as {file_name: {'verb': ..., 'frames': [...]}}
    with open(json_file) as f:
        train = json.load(f)
    verb_value = []
    agent_key = []
    agent_value = []
    file_path = []
    for i in train:
        verb = train[i]['verb']
        if verb != verb_custom:
            continue
        for frame in train[i]['frames']:
            value = frame.get('agent')
            # keep each image once, on the first frame whose agent matches
            if value in agent_custom and i not in file_path:
                agent_key.append('agent')
                agent_value.append(value)
                file_path.append(i)
                verb_value.append(verb)
    return file_path, verb_value, agent_key, agent_value, len(file_path)

get_verb_agent('Data/train.json', 'spoiling', ['n12352287', 'n07753592' ])
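
To check the filtering logic without loading train.json, the same idea can be run on an in-memory dictionary. This is only a sketch: the file names, the verb "peeling", the code n00000001, and the helper `select_images` are made up, and the structure mirrors the `verb`/`frames`/`agent` keys used in the function above.

```python
# Stand-in for the contents of train.json.
sample_train = {
    "spoiling_1.jpg": {"verb": "spoiling",
                       "frames": [{"agent": "n07753592"}, {"agent": "n07753592"}]},
    "spoiling_2.jpg": {"verb": "spoiling", "frames": [{"agent": "n00000001"}]},
    "peeling_1.jpg": {"verb": "peeling", "frames": [{"agent": "n07753592"}]},
}

def select_images(annotations, verb, agent_codes):
    """Return file names with the given verb and a matching agent in at least one frame."""
    selected = []
    for file_name, annotation in annotations.items():
        if annotation["verb"] != verb:
            continue
        if any(frame.get("agent") in agent_codes for frame in annotation["frames"]):
            selected.append(file_name)
    return selected

selected = select_images(sample_train, "spoiling", {"n07753592", "n12352287"})
print(selected)
```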

In [None]:
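# Select the images that contain the predefined verb and agent values, and store them in a new folder
import os
import shutil

def img_to_folder(dirs_original, dirs_destination):
    # use the same verb as the other cells ('spoiling'); 'dusting' returns no banana images
    image_list = get_verb_agent('Data/train.json', 'spoiling', ['n12352287', 'n07753592' ])[0]
    # create the destination folder if it does not exist yet
    os.makedirs(dirs_destination, exist_ok=True)
    for img in image_list:
        # os.path.join adds the path separator that plain string concatenation misses
        shutil.copy(os.path.join(dirs_original, img), os.path.join(dirs_destination, img))

# NOTE: the first argument must be the folder that holds the image files;
# "Data/dev.json" is an annotation file, so replace it with your image folder.
img_to_folder("Data/dev.json", "Data/Spoiling")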

In [None]:
#Creating a csv file that contains the file name, verb and agent value for each image
import pandas as pd

def lists_to_df(dirs_destination, col1_name, col2_name, col3_name):
    # call get_verb_agent once and unpack file names, verbs and agent values
    col1, col2, _, col3, _ = get_verb_agent('Data/train.json', 'spoiling', ['n12352287', 'n07753592' ])
    df = pd.DataFrame(list(zip(col1, col2, col3)), columns=[col1_name, col2_name, col3_name])
    df.to_csv(dirs_destination, index=False)
    return df

lists_to_df('Data/Dusting_train.csv', 'file_name','spoiling', 'banana')

In [None]:
df_imsitu= pd.read_csv(r"C:/Users/neilr/github-classroom/BredaUniversityADSAI/2022-23c-1fcmgt-reg-ai-01-neildaniel221270/Data/imsitu2.csv")
# create a new column 'condition' based on the values in the 'agent' column
df_imsitu['condition'] = df_imsitu['agent'].apply(lambda x: 'spoiled' if x in ['n12352287', 'n07753592'] else 'yellow')
df_imsitu

In [None]:
import os
# Set the path to the image directory of google scraped images using a raw string
image_dir = r"C:/Users/neilr/OneDrive/Documents/BUAS/Year 1/Block C/Creative brief/Data/Resized bannanas/spoiled"

# Get a list of all image file names in the directory
image_files = os.listdir(image_dir)

# Create a dictionary to store the data
data = {"file_name": image_files, "condition": ["spoiled"] * len(image_files)}

# Create a dataframe from the dictionary
df_spoilscraped = pd.DataFrame(data)

# Save the dataframe to a CSV file
df_spoilscraped.to_csv("Data/image_spoiled_scraped.csv", index=False)

In [None]:

df_igspoil = pd.merge(df_imsitu, df_spoilscraped, on=['file_name', 'condition'], how='outer')
df_igspoil.head()

In [None]:
# Set the path to the image directory of google scraped images using a raw string
image_dir = r"C:/Users/neilr/OneDrive/Documents/BUAS/Year 1/Block C/Creative brief/Data/Resized bannanas/not_spoiled"

# Get a list of all image file names in the directory
image_files = os.listdir(image_dir)

# Create a dictionary to store the data
data = {"file_name": image_files, "condition": ["not spoiled"] * len(image_files)}

# Create a dataframe from the dictionary
df_notspoiled = pd.DataFrame(data)

# Save the dataframe to a CSV file
df_notspoiled.to_csv("Data/image_notspoiled_scraped.csv", index=False)
df_notspoiled.head()

In [None]:
df_igbananas = pd.merge(df_igspoil, df_notspoiled, on=['file_name', 'condition'], how='outer')
df_igbananas

In [None]:
from sklearn.model_selection import train_test_split

# Split the merged dataframe into train (64%), validation (16%) and test (20%) sets;
# only the first split is stratified on the 'condition' column
train_df, test_df = train_test_split(df_igbananas, test_size=0.2, stratify=df_igbananas["condition"], random_state=42)
train_df, val_df = train_test_split(train_df, test_size=0.2, random_state=42)

# Check the size of each set
print(f'Training set size: {len(train_df)}')
print(f'Validation set size: {len(val_df)}')
print(f'Testing set size: {len(test_df)}')
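
Because only the first split is stratified, it can be useful to compare the class shares of the resulting sets. A minimal sketch with the standard library follows; the label lists are hypothetical stand-ins for `train_df["condition"]` and `val_df["condition"]`, and `label_shares` is my own helper name.

```python
from collections import Counter

def label_shares(labels):
    """Return the fraction of each label in a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical label lists for two splits.
train_labels = ["spoiled"] * 6 + ["not spoiled"] * 4
val_labels = ["spoiled"] * 1 + ["not spoiled"] * 3

print(label_shares(train_labels))
print(label_shares(val_labels))
```

With the dataframes from the cell above, the call would be something like `label_shares(train_df["condition"].tolist())`.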

## Use-case 4: Write Python functions; group fairness metrics

Write your text for use-case 4 here

In [None]:
import numpy as np

In [10]:
#Write your Python code for use-case 4 here

def load_confusion_matrices():
    # each file holds a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]
    cm_priv = np.load('./Responsible_AI/confusion_matrix_priv_female.npy')
    tn_priv, fp_priv, fn_priv, tp_priv = cm_priv.ravel()
    cm_unpriv = np.load('./Responsible_AI/confusion_matrix_unpriv_male.npy')
    tn_unpriv, fp_unpriv, fn_unpriv, tp_unpriv = cm_unpriv.ravel()
    return [[tn_priv, fp_priv, fn_priv, tp_priv], [tn_unpriv, fp_unpriv, fn_unpriv, tp_unpriv]]

load_confusion_matrices()

def demographic_parity():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    NR = (tp + fp)/(tn + fp + fn + tp)
    NRm = (tpm + fpm)/(tnm + fpm + fnm + tpm)
    return [NR, NRm, NR-NRm]
demographic_parity()

def predictive_parity():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    PPV = tp/(tp+fp)
    PPVm = tpm/(tpm+fpm)
    return [PPVm, PPV, PPVm - PPV]
predictive_parity()

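def equalized_odds():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    TPR = tp/(tp + fn)       # true positive rate, privileged group
    TPRm = tpm/(tpm + fnm)   # true positive rate, unprivileged group
    TNR = tn/(tn + fp)       # true negative rate (1 - FPR), privileged group
    TNRm = tnm/(tnm + fpm)   # true negative rate (1 - FPR), unprivileged group
    # TNRm - TNR equals FPR - FPRm, so both entries are privileged minus unprivileged
    return [TPR - TPRm, TNRm - TNR]
equalized_odds()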

def conditional_use_accuracy_equality():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    CAE_PPV = - (tp )/(tp + fp ) + (tpm )/(tpm + fpm )
    CAE_NPV = - (tn)/(tn + fn) + (tnm)/(tnm + fnm)
    return [CAE_PPV, CAE_NPV]
conditional_use_accuracy_equality()

def equal_selection_parity():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    SP = tp +fp
    SPm = tpm + fpm
    return [SPm,SP, SP - SPm]
equal_selection_parity()

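def equal_opportunity():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    EO = tp/(fn+tp)       # true positive rate, privileged group
    EOm = tpm/(fnm+tpm)   # true positive rate, unprivileged group
    # same values as before; only the swapped variable names are corrected
    return [EOm, EO, EO - EOm]
equal_opportunity()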

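def predictive_equality():
    [[tn,fp,fn,tp],[tnm,fpm,fnm,tpm]]=load_confusion_matrices()
    # predictive equality compares false positive rates; the true negative rate (1 - FPR)
    # is used here, so equal TNRs imply equal FPRs and the difference only changes sign
    PE = tn/(fp+tn)
    PEm = tnm/(fpm+tnm)
    return [PEm, PE, PEm -PE]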
predictive_equality()
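
To see how the metrics behave without the .npy files, here is a worked sketch on two hypothetical confusion matrices. The numbers are made up, the layout [[TN, FP], [FN, TP]] is assumed to match the indexing in `load_confusion_matrices`, and `rates` is my own helper name.

```python
# Hypothetical 2x2 confusion matrices laid out as [[TN, FP], [FN, TP]].
cm_priv = [[50, 10], [5, 35]]
cm_unpriv = [[60, 5], [15, 20]]

def rates(cm):
    """Compute the per-group rates that the group fairness metrics compare."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "positive_rate": (tp + fp) / total,  # demographic parity
        "ppv": tp / (tp + fp),               # predictive parity
        "tpr": tp / (tp + fn),               # equal opportunity
        "fpr": fp / (fp + tn),               # predictive equality
    }

priv, unpriv = rates(cm_priv), rates(cm_unpriv)
# Gap per metric, privileged minus unprivileged; 0 means parity on that metric.
gaps = {name: priv[name] - unpriv[name] for name in priv}
for name, gap in gaps.items():
    print(f"{name}: priv={priv[name]:.3f} unpriv={unpriv[name]:.3f} gap={gap:+.3f}")
```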


## Use-case 5: Write Python function; group fairness taxonomy

I have not done this use-case because it is only required for the excellent criteria of the ILO.

## Use-case 6: Apply one/multiple explainable AI method(s) to the image classifier

Write your text for use-case 6 here

In [1]:
!pip install tf_explain
!pip install opencv-python

In [2]:
#load libraries
import numpy as np
import tensorflow as tf
import PIL

#load GradCAM
from tf_explain.core.grad_cam import GradCAM

In [10]:
IMAGE_PATH = r"C:\Users\neilr\github-classroom\BredaUniversityADSAI\2022-23c-1fcmgt-reg-ai-01-neildaniel221270\360_F_368360619_9ddfAPtaAp0ZMPdroyQIMIhhS6SyKCVK.jpeg" 
# ImageNet class index to explain: 954 is "banana" (281 is "tabby, tabby cat")
class_index = 954

In [11]:
img = tf.keras.preprocessing.image.load_img(IMAGE_PATH, target_size=(224, 224))
img = tf.keras.preprocessing.image.img_to_array(img)

In [12]:
model = tf.keras.applications.vgg16.VGG16(weights="imagenet", include_top=True)
#get model summary
model.summary()

Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 block1_conv1 (Conv2D)       (None, 224, 224, 64)      1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 224, 224, 64)      36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 112, 112, 64)      0         
                                                                 
 block2_conv1 (Conv2D)       (None, 112, 112, 128)     73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 112, 112, 128)     147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 56, 56, 128)       0     

In [13]:
#first create the input in a format that the explainer expects (a tuple)
input_img = (np.array([img]), None)

#initialize the explainer as an instance of the GradCAM object
explainer = GradCAM()

# Obtain explanations for your image using VGG 16 and GradCAM
grid = explainer.explain(input_img,
                         model,
                         class_index=class_index
                         )

#save the resulting image
explainer.save(grid, "C:/Users/neilr/github-classroom/BredaUniversityADSAI/2022-23c-1fcmgt-reg-ai-01-neildaniel221270/", "grad_cam_spoiled1.png")

In [14]:
IMAGE_PATH2 = r"C:\Users\neilr\github-classroom\BredaUniversityADSAI\2022-23c-1fcmgt-reg-ai-01-neildaniel221270\Picture1.jpg" 
# ImageNet class index to explain: 954 is "banana" (281 is "tabby, tabby cat")
class_index = 954

In [15]:
img2 = tf.keras.preprocessing.image.load_img(IMAGE_PATH2, target_size=(224, 224))
img2 = tf.keras.preprocessing.image.img_to_array(img2)

In [16]:
#first create the input in a format that the explainer expects (a tuple)
input_img = (np.array([img2]), None)

#initialize the explainer as an instance of the GradCAM object
explainer = GradCAM()

# Obtain explanations for your image using VGG 16 and GradCAM
grid = explainer.explain(input_img,
                         model,
                         class_index=class_index
                         )

#save the resulting image
explainer.save(grid, "C:/Users/neilr/github-classroom/BredaUniversityADSAI/2022-23c-1fcmgt-reg-ai-01-neildaniel221270/", "grad_cam_notspoiled1.png")
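
To make clearer what the heatmap represents, below is a minimal sketch of the textbook Grad-CAM computation in plain Python: each feature map is weighted by the mean of its gradients, the weighted maps are summed, and negative values are clipped with a ReLU. The activations and gradients are made-up toy numbers, and the implementation inside tf_explain may differ in details such as gradient filtering, resizing, and overlaying the heatmap on the image.

```python
# Toy feature maps A[k][i][j] and gradients dY/dA[k][i][j] for two channels.
activations = [
    [[1.0, 2.0], [0.0, 1.0]],  # channel 0
    [[0.5, 0.0], [3.0, 1.0]],  # channel 1
]
gradients = [
    [[0.2, 0.2], [0.2, 0.2]],      # channel 0 supports the class
    [[-0.1, -0.1], [-0.1, -0.1]],  # channel 1 counts against the class
]

def grad_cam(activations, gradients):
    """Weight each channel by its mean gradient, sum the channels, then apply ReLU."""
    height, width = len(activations[0]), len(activations[0][0])
    weights = [sum(map(sum, grad)) / (height * width) for grad in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, channel in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * channel[i][j]
    return [[max(value, 0.0) for value in row] for row in heatmap]

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

Cells where the negatively weighted channel dominates end up at zero, which is why Grad-CAM heatmaps only highlight regions that support the chosen class.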