# **[GXY] Deus Ex Machina: Power, Perception, and Prejudice**

Ana Azevedo (FEUP/FCUP, M.IA)

Diogo Silva (FEUP/FCUP, M.IA)

Félix Martins (FEUP/FCUP, M.IA)

Francisco Campos (FEUP/FCUP, M.IA)

João Figueiredo (FEUP/FCUP, M.IA)

### **Dataset and Problem Description**

This project uses the **MBA Admission Dataset, Class of 2025**, and treats admission prediction as a classification problem. Each instance is one application described by 9 features, and the target variable is the **admission status** (Admit, Waitlist, or Deny). The dataset is relatively balanced, but it may encode bias related to gender, race, and academic background, which can affect the fairness of the admissions process.

### **Features**
The dataset contains the following columns:

- **application_id**: Unique identifier for each application
- **gender**: Applicant's gender (Male, Female)
- **international**: Whether the applicant is an international student (TRUE/FALSE)
- **gpa**: Grade Point Average (on a 4.0 scale)
- **major**: Undergraduate major (Business, STEM, Humanities)
- **race**: Racial background of the applicant (e.g., White, Black, Asian, Hispanic, Other); null for international students
- **gmat**: GMAT score (out of 800)
- **work_exp**: Number of years of work experience
- **work_industry**: Industry of previous work experience (e.g., Consulting, Finance, Technology, etc.)
- **admission**: Admission status, the target variable (Admit, Waitlist, Deny); denied applications appear as missing values in the raw file
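
The ranges documented in this list (GPA on a 4.0 scale, GMAT out of 800) lend themselves to a quick sanity check once the file is loaded. The sketch below is only an illustration on a small hand-made frame: the rows, the helper name `check_ranges`, and the lower bounds of 0 are assumptions, not part of the dataset documentation.

```python
import pandas as pd

# Hypothetical rows that mimic the documented schema; not real applications.
sample = pd.DataFrame({
    "gpa": [3.30, 3.28, 4.20],
    "gmat": [620.0, 680.0, 710.0],
    "work_exp": [3.0, 5.0, -1.0],
})

# Upper bounds come from the feature list; the lower bounds of 0 are assumed.
BOUNDS = {"gpa": (0.0, 4.0), "gmat": (0.0, 800.0), "work_exp": (0.0, float("inf"))}

def check_ranges(frame, bounds):
    """Count, per column, the values that fall outside [low, high].

    Missing values are counted as violations too, because
    Series.between returns False for NaN.
    """
    return {
        col: int((~frame[col].between(low, high)).sum())
        for col, (low, high) in bounds.items()
    }

violations = check_ranges(sample, BOUNDS)
print(violations)
```

On the real data, a non-zero count would point to rows worth inspecting before any modelling.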

### **Why this dataset?**

The **MBA Admission Dataset** and the **Rotterdam system** raise the same concerns about **fairness** and **explainability**, but in different contexts. The Rotterdam system is punitive, as it denies essential benefits. The MBA admission process is not punitive, yet it still has a significant impact on candidates' future professional opportunities. Both can be affected by bias related to **gender**, **ethnicity**, and other demographic factors, which may influence decisions unjustly.

Just as the Rotterdam system was criticized for a lack of fairness and transparency in its decisions, the MBA admissions process raises similar concerns. Factors such as **gender**, **race**, **academic performance**, and **work experience** can inadvertently introduce bias and lead to unfair outcomes for certain groups, so **explainability** is crucial in both cases. For MBA admissions, decisions must be transparent and the criteria must be well understood by all stakeholders. Only then can we verify that admissions are based on merit rather than bias, and that the process is just, transparent, and accountable.

In both cases, addressing fairness and improving explainability will lead to better decision-making, ensuring that individuals are not unfairly excluded or advantaged based on arbitrary or discriminatory factors.



In [126]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder


df = pd.read_csv("MBA.csv")
df.head()

| | application_id | gender | international | gpa | major | race | gmat | work_exp | work_industry | admission |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Female | False | 3.3 | Business | Asian | 620.0 | 3.0 | Financial Services | Admit |
| 1 | 2 | Male | False | 3.28 | Humanities | Black | 680.0 | 5.0 | Investment Management | NaN |
| 2 | 3 | Female | True | 3.3 | Business | NaN | 710.0 | 5.0 | Technology | Admit |
| 3 | 4 | Male | False | 3.47 | STEM | Black | 690.0 | 6.0 | Technology | NaN |
| 4 | 5 | Male | False | 3.35 | STEM | Hispanic | 590.0 | 5.0 | Consulting | NaN |


In [127]:
# Applications without a recorded decision are treated as not admitted.
df['admission'] = df['admission'].fillna('No admit')
df.head()

| | application_id | gender | international | gpa | major | race | gmat | work_exp | work_industry | admission |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Female | False | 3.3 | Business | Asian | 620.0 | 3.0 | Financial Services | Admit |
| 1 | 2 | Male | False | 3.28 | Humanities | Black | 680.0 | 5.0 | Investment Management | No admit |
| 2 | 3 | Female | True | 3.3 | Business | NaN | 710.0 | 5.0 | Technology | Admit |
| 3 | 4 | Male | False | 3.47 | STEM | Black | 690.0 | 6.0 | Technology | No admit |
| 4 | 5 | Male | False | 3.35 | STEM | Hispanic | 590.0 | 5.0 | Consulting | No admit |
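
Because the missing labels become a class of their own, it is worth checking how large that class is relative to the others. A minimal sketch on made-up labels follows; the values are illustrative only and say nothing about the real class shares.

```python
import pandas as pd

# Hypothetical labels; None stands for an application without a recorded decision.
admission = pd.Series(
    ["Admit", None, "Admit", None, None, "Waitlist"], name="admission"
)

# Same fill as in the cell above, followed by relative class frequencies.
filled = admission.fillna("No admit")
shares = filled.value_counts(normalize=True)
print(shares)
```

Running the same two lines on `df['admission']` would show whether the target classes are in fact balanced.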


In [128]:
df = df.drop(columns=['application_id'])  # unique identifier, not a predictive feature

In [129]:
df[df.international]["race"].unique()

array([nan], dtype=object)
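
The check above shows that international applicants have no race value. Relabeling every missing race as international also relies on the converse, namely that no domestic applicant has a missing race. The sketch below tests both directions on a hypothetical frame; only the pattern of missing values matters, and the rows are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows; not real applications.
toy = pd.DataFrame({
    "international": [False, False, True, False, True],
    "race": ["Asian", "Black", None, "White", None],
})

# Direction 1: every international applicant lacks a race value.
intl_all_missing = toy.loc[toy["international"], "race"].isna().all()

# Direction 2: every missing race value belongs to an international applicant.
missing_all_intl = toy.loc[toy["race"].isna(), "international"].all()

print(intl_all_missing, missing_all_intl)
```

If both flags are true on the real data, the `race` and `international` columns carry the same information about international status.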

In [130]:
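# International applicants have no race value (checked above); missing values are
# relabeled as 'International', which makes the 'international' flag redundant.
df['race'] = df['race'].fillna('International')
df = df.drop(columns=['international'])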

In [131]:
df.isnull().sum().sum()

0

In [132]:
df['gender'].unique()

array(['Female', 'Male'], dtype=object)

In [133]:
df['gender'] = (df['gender'] == 'Female').astype(int)  # 1 = Female, 0 = Male
df.head()

| | gender | gpa | major | race | gmat | work_exp | work_industry | admission |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3.3 | Business | Asian | 620.0 | 3.0 | Financial Services | Admit |
| 1 | 0 | 3.28 | Humanities | Black | 680.0 | 5.0 | Investment Management | No admit |
| 2 | 1 | 3.3 | Business | International | 710.0 | 5.0 | Technology | Admit |
| 3 | 0 | 3.47 | STEM | Black | 690.0 | 6.0 | Technology | No admit |
| 4 | 0 | 3.35 | STEM | Hispanic | 590.0 | 5.0 | Consulting | No admit |


In [134]:
admission_mapping = {'Admit': 2, 'Waitlist': 1, 'No admit': 0}
df['admission'] = df.admission.map(admission_mapping)

In [135]:
df.groupby("race")["admission"].mean()  # mean of the 0/1/2 outcome code, not an admission rate

race
Asian            0.350480
Black            0.185590
Hispanic         0.221477
International    0.318132
Other            0.405063
White            0.353022
Name: admission, dtype: float64
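
The group means above average the ordinal code (2 = Admit, 1 = Waitlist, 0 = No admit), so they mix admissions and waitlist decisions. A more direct fairness summary is the admit rate per group, together with the gap between the lowest and the highest rate. The sketch below uses hypothetical groups `A` and `B` with made-up outcomes; the variable names `parity_difference` and `parity_ratio` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical encoded rows (2 = Admit, 1 = Waitlist, 0 = No admit); not dataset values.
toy = pd.DataFrame({
    "race": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "admission": [2, 2, 0, 1, 2, 0, 0, 0],
})

# Share of applicants in each group whose outcome is Admit.
admit_rate = (toy["admission"] == 2).groupby(toy["race"]).mean()

# Two simple gap summaries between the least and the most favoured group.
parity_difference = admit_rate.max() - admit_rate.min()
parity_ratio = admit_rate.min() / admit_rate.max()

print(admit_rate)
print(parity_difference, parity_ratio)
```

A difference near 0 (or a ratio near 1) indicates similar admit rates across groups. It does not by itself establish that the process is fair, since the groups may differ in other features.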

In [136]:
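# Encode the remaining text columns: major, race, and work_industry.
# LabelEncoder assigns integer codes in sorted order; the codes carry no real ordering.
for col in df.select_dtypes(include=['object']).columns:
    df[col] = LabelEncoder().fit_transform(df[col])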
df.head()

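| | gender | gpa | major | race | gmat | work_exp | work_industry | admission |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3.3 | 0 | 0 | 620.0 | 3.0 | 3 | 2 |
| 1 | 0 | 3.28 | 1 | 1 | 680.0 | 5.0 | 6 | 0 |
| 2 | 1 | 3.3 | 0 | 3 | 710.0 | 5.0 | 13 | 2 |
| 3 | 0 | 3.47 | 2 | 1 | 690.0 | 6.0 | 13 | 0 |
| 4 | 0 | 3.35 | 2 | 2 | 590.0 | 5.0 | 1 | 0 |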
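
Integer codes such as these suggest an order (for example, Business < Humanities < STEM) that nominal categories do not have, which can mislead distance-based and linear models. One common alternative, shown here as a sketch rather than as the approach used in this notebook, is one-hot encoding with `pd.get_dummies`. The rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows with one nominal and one numeric column.
toy = pd.DataFrame({
    "major": ["Business", "Humanities", "STEM", "Business"],
    "gpa": [3.30, 3.28, 3.47, 3.35],
})

# One indicator column per category, so no artificial order is implied.
encoded = pd.get_dummies(toy, columns=["major"], dtype=int)
print(encoded.columns.tolist())
```

The trade-off is width: a column with many categories, such as `work_industry`, expands into one indicator column per category.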


In [137]:
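# Pearson correlation; coefficients for the label-encoded columns (major, race, work_industry) reflect arbitrary codes.
df.corr()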

| | gender | gpa | major | race | gmat | work_exp | work_industry | admission |
|---|---|---|---|---|---|---|---|---|
| gender | 1.0 | -0.02221 | -0.025773 | -0.002748 | -0.022815 | 0.007427 | -0.001822 | 0.122788 |
| gpa | -0.02221 | 1.0 | -0.006697 | -0.013919 | 0.577539 | 0.000346 | -0.009687 | 0.290997 |
| major | -0.025773 | -0.006697 | 1.0 | -0.016681 | 0.003594 | 0.006741 | 0.014921 | -0.003042 |
| race | -0.002748 | -0.013919 | -0.016681 | 1.0 | -0.018811 | 0.011747 | -0.009973 | 0.036006 |
| gmat | -0.022815 | 0.577539 | 0.003594 | -0.018811 | 1.0 | -0.000999 | -0.001258 | 0.356453 |
| work_exp | 0.007427 | 0.000346 | 0.006741 | 0.011747 | -0.000999 | 1.0 | -0.009811 | 0.009433 |
| work_industry | -0.001822 | -0.009687 | 0.014921 | -0.009973 | -0.001258 | -0.009811 | 1.0 | -0.004361 |
| admission | 0.122788 | 0.290997 | -0.003042 | 0.036006 | 0.356453 | 0.009433 | -0.004361 | 1.0 |
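
Pearson coefficients are informative for the numeric columns (for example, `gpa` and `gmat`), but for nominal columns such as `major`, `race`, and `work_industry` they depend on the arbitrary integer codes. A measure designed for categorical pairs, such as Cramér's V, is one possible replacement. The sketch below computes it from a contingency table with plain NumPy. The function name `cramers_v` and the toy labels are assumptions for illustration, and no bias correction is applied.

```python
import numpy as np
import pandas as pd

def cramers_v(x, y):
    """Cramér's V between two categorical series (0 = no association, 1 = perfect).

    Assumes each series has at least two distinct categories.
    """
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: outer product of the margins over n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical labels, chosen only to exercise the function.
toy = pd.DataFrame({
    "major": ["Business", "Business", "STEM", "STEM"],
    "dependent": ["Admit", "Admit", "No admit", "No admit"],
    "independent": ["Admit", "No admit", "Admit", "No admit"],
})

v_dependent = cramers_v(toy["major"], toy["dependent"])
v_independent = cramers_v(toy["major"], toy["independent"])
print(v_dependent, v_independent)
```

Unlike the Pearson coefficient, this value does not change when the categories are relabeled or reordered.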
