# Select dataset for age
In this notebook we select the images used for the age bias analysis. Our experiments use the models OSCAR, SAT, NIC_PLUS and NIC_EQUALIZER, whose captions are stored in separate files. We keep only the images that appear in every file (the intersection), which still leaves enough images for our experiments.
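
As a minimal sketch of the intersection idea, the snippet below keeps only the image IDs shared by every source. The dictionary `ids_per_source` and its contents are made up for illustration; the real IDs come from the pickle files loaded later in the notebook.

```python
from functools import reduce

# Hypothetical image IDs per source (illustrative values only).
ids_per_source = {
    "human": {192, 241, 294, 328},
    "oscar": {192, 241, 294},
    "sat": {192, 241, 294, 338},
}

# Keep only the IDs that are present in every source.
common_ids = reduce(set.intersection, ids_per_source.values())
print(sorted(common_ids))
```

The notebook itself reaches the same result with successive inner joins on `img_id`, which also carries the caption columns along.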

In [1]:
import os
import re
import pickle
import pandas as pd

In [2]:
DATA_DIR = '../data/bias_data'

In [3]:
human_file = 'Human_Ann/gender_obj_cap_mw_entries.pkl'
model_files = {
    'oscar': 'Oscar/gender_val_oscar_cap_mw_entries.pkl',
    'sat': 'Show-Attend-Tell/gender_val_sat_cap_mw_entries.pkl',
    'nicplus': 'Woman-Snowboard/gender_val_baselineft_cap_mw_entries.pkl',
    'niceq': 'Woman-Snowboard/gender_val_snowboard_cap_mw_entries.pkl',
}

## Human captions dataset

In [4]:
human_path = os.path.join(DATA_DIR, human_file)
main_df = pd.DataFrame(pd.read_pickle(human_path))[['img_id', 'caption_list']]
main_df

,img_id,caption_list
0,192,[A group of baseball players is crowded at the...
1,241,[a man standing holding a game controller and ...
2,294,[A man standing in front of a microwave next t...
3,328,[Three men in military suits are sitting on a ...
4,338,[Two people standing in a kitchen looking arou...
...,...,...
10775,579902,"[A person riding a motorcycle down a street., ..."
10776,580197,[Two men in bow ties standing next to steel ra...
10777,580294,[Person cooking an eggs on a black pot on a st...
10778,581317,"[A woman holding a small item in a field., Wom..."


## Model captions datasets
All the model files turned out to contain captions for the same images, so the intersection is the whole set.

In [5]:
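for model_name, model_file in model_files.items():
    model_path = os.path.join(DATA_DIR, model_file)
    # Keep only the image ID and the predicted caption of this model.
    df = pd.DataFrame(pd.read_pickle(model_path))[['img_id', 'pred']]
    df = df.rename(columns={'pred': f'pred_{model_name}'})
    print(model_name, len(df))
    # The inner join keeps only the images present in both tables.
    main_df = pd.merge(main_df, df, how='inner', on=['img_id'])
# Drop any row that is missing a caption.
main_df = main_df.dropna()
main_df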

oscar 10780
sat 10780
nicplus 10780
niceq 10780


,img_id,caption_list,pred_oscar,pred_sat,pred_nicplus,pred_niceq
0,192,[A group of baseball players is crowded at the...,a baseball player holding a bat on top of a fi...,a batter catcher and umpire during a baseball ...,a baseball player holding a bat on top of a fi...,a baseball player holding a bat on a field.
1,241,[a man standing holding a game controller and ...,a man standing in a living room holding a nint...,a couple of people that are playing a video game,a group of people playing a game with nintendo...,a group of people playing a video game.
2,294,[A man standing in front of a microwave next t...,a man standing in front of a bunch of pots and...,a woman is pouring wine into a wine glass,a woman standing in a kitchen preparing food.,a man standing in a kitchen holding a knife.
3,328,[Three men in military suits are sitting on a ...,a group of three men sitting on top of a bench.,a group of people sitting on a bench,a black and white photo of a group of people s...,a black and white photo of a group of people s...
4,338,[Two people standing in a kitchen looking arou...,a couple of women standing in a kitchen next t...,a group of people standing in a kitchen,a woman standing in a kitchen next to a stove.,a woman standing in a kitchen next to a stove.
...,...,...,...,...,...,...
10775,579902,"[A person riding a motorcycle down a street., ...",a man riding a motorcycle down a street next t...,a man riding a motorcycle down a street,a man riding a motorcycle down a street.,a man riding a motorcycle down a street.
10776,580197,[Two men in bow ties standing next to steel ra...,a couple of men standing next to each other in...,a man in a suit and tie in a room,a man in a suit and tie standing next to a woman.,a man in a suit and tie standing next to anoth...
10777,580294,[Person cooking an eggs on a black pot on a st...,a woman in a kitchen making pancakes on a stove.,a woman is preparing food in a kitchen,a woman standing in a kitchen preparing food.,a woman standing in a kitchen preparing food.
10778,581317,"[A woman holding a small item in a field., Wom...",a woman standing in a field looking at her cel...,a woman in a field with a cell phone,a woman standing in a field with a frisbee.,a woman is standing in the grass talking on a ...
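
One way to confirm that an inner join drops no images is to run the same merge as an outer join with `indicator=True` and look for rows that are not marked `both`. The sketch below uses two toy frames, `left` and `right`, with made-up values; they only stand in for `main_df` and a model table.

```python
import pandas as pd

# Toy stand-ins for main_df and one model table (values are made up).
left = pd.DataFrame({"img_id": [192, 241, 294],
                     "caption_list": [["a"], ["b"], ["c"]]})
right = pd.DataFrame({"img_id": [192, 241, 338],
                      "pred_toy": ["x", "y", "z"]})

# An outer join with indicator=True adds a '_merge' column that labels
# each row as 'both', 'left_only' or 'right_only'.
check = pd.merge(left, right, how="outer", on="img_id", indicator=True)

# Rows not marked 'both' are the ones an inner join would drop.
dropped = check.loc[check["_merge"] != "both", ["img_id", "_merge"]]
dropped_ids = sorted(dropped["img_id"].tolist())
print(dropped_ids)
```

If the list is empty for every model, the intersection is indeed the whole set.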


## Save captions
The output file contains the human captions and the captions from every model, indexed by image ID.

In [6]:
main_df.set_index('img_id')

img_id,caption_list,pred_oscar,pred_sat,pred_nicplus,pred_niceq
192,[A group of baseball players is crowded at the...,a baseball player holding a bat on top of a fi...,a batter catcher and umpire during a baseball ...,a baseball player holding a bat on top of a fi...,a baseball player holding a bat on a field.
241,[a man standing holding a game controller and ...,a man standing in a living room holding a nint...,a couple of people that are playing a video game,a group of people playing a game with nintendo...,a group of people playing a video game.
294,[A man standing in front of a microwave next t...,a man standing in front of a bunch of pots and...,a woman is pouring wine into a wine glass,a woman standing in a kitchen preparing food.,a man standing in a kitchen holding a knife.
328,[Three men in military suits are sitting on a ...,a group of three men sitting on top of a bench.,a group of people sitting on a bench,a black and white photo of a group of people s...,a black and white photo of a group of people s...
338,[Two people standing in a kitchen looking arou...,a couple of women standing in a kitchen next t...,a group of people standing in a kitchen,a woman standing in a kitchen next to a stove.,a woman standing in a kitchen next to a stove.
...,...,...,...,...,...
579902,"[A person riding a motorcycle down a street., ...",a man riding a motorcycle down a street next t...,a man riding a motorcycle down a street,a man riding a motorcycle down a street.,a man riding a motorcycle down a street.
580197,[Two men in bow ties standing next to steel ra...,a couple of men standing next to each other in...,a man in a suit and tie in a room,a man in a suit and tie standing next to a woman.,a man in a suit and tie standing next to anoth...
580294,[Person cooking an eggs on a black pot on a st...,a woman in a kitchen making pancakes on a stove.,a woman is preparing food in a kitchen,a woman standing in a kitchen preparing food.,a woman standing in a kitchen preparing food.
581317,"[A woman holding a small item in a field., Wom...",a woman standing in a field looking at her cel...,a woman in a field with a cell phone,a woman standing in a field with a frisbee.,a woman is standing in the grass talking on a ...


In [7]:
OUT_FILE = '../res/IntersectionSAT-OSCAR-NICPL-NICEQ.csv'
main_df.set_index('img_id').to_csv(OUT_FILE, index_label='img_id')
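
Because `caption_list` holds Python lists, `to_csv` writes each one as its string representation, and a plain `read_csv` returns strings rather than lists. The sketch below shows one possible way to restore the lists on load with `ast.literal_eval`. The frame `toy` and its captions are made up, and an in-memory buffer stands in for the CSV file.

```python
import ast
import io

import pandas as pd

# Toy frame with a list column and a string column (captions are made up).
toy = pd.DataFrame({
    "img_id": [192, 241],
    "caption_list": [["a cap", "another cap"], ["one cap"]],
    "pred_sat": ["a caption", "another caption"],
})

# Write to an in-memory buffer in place of a file on disk.
buffer = io.StringIO()
toy.set_index("img_id").to_csv(buffer, index_label="img_id")
buffer.seek(0)

# The converter parses each stringified list back into a Python list.
loaded = pd.read_csv(buffer, index_col="img_id",
                     converters={"caption_list": ast.literal_eval})
print(type(loaded.loc[192, "caption_list"]))
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of parsing.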