# Critically Engaging with AI Ethics

### Task 2: Identifying Bias

#### Task 2-a: Understanding the Scope of Bias

**Types of Bias Described:**
1. Selection Bias
2. Measurement Bias
3. Algorithmic Bias
4. Confirmation Bias
5. Label Bias

**Known Biases:**
- Selection Bias
- Confirmation Bias

**New Biases:**
- Measurement Bias
- Algorithmic Bias
- Label Bias

**Additional Biases:**
- Survivorship Bias
- Sampling Bias

#### Task 2-b: Exploring Bias in the Jigsaw Toxic Comment Classification Challenge

**Findings:**
- **Identified Biases:** 
  - Selection Bias: Certain types of comments might be underrepresented.
  - Label Bias: Human annotators might have inconsistent labeling standards.
  - Algorithmic Bias: The model might perform differently across various demographic groups.

**Discussion:**
- **Key Points:**
  - It's important to identify and mitigate biases at every stage of the machine learning pipeline.
  - Awareness and continuous monitoring are crucial for ensuring fairness and accuracy.
- **Mitigation Strategies:**
  - Diversifying the dataset to include a wide range of comment types.
  - Standardising labeling procedures to reduce label bias.
  - Regularly evaluating model performance across different subgroups to detect algorithmic bias.
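
The last mitigation strategy above, evaluating performance across subgroups, can be sketched in a few lines. This is a minimal illustration, not the method used in the Jigsaw challenge: the toy labels, predictions, and subgroup names `"a"` and `"b"` are hypothetical, and the false positive rate is only one of several metrics that could be compared.

```python
from collections import defaultdict


def false_positive_rate_by_subgroup(labels, predictions, subgroups):
    """Return {subgroup: FPR}, where FPR is the share of non-toxic
    comments (label 0) that the model flagged as toxic (prediction 1)."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for label, prediction, group in zip(labels, predictions, subgroups):
        if label == 0:
            negatives[group] += 1
            if prediction == 1:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Hypothetical toy data: 1 = toxic, 0 = not toxic.
labels      = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
predictions = [0, 0, 0, 1, 1, 1, 1, 0, 1, 1]
subgroups   = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

fpr = false_positive_rate_by_subgroup(labels, predictions, subgroups)
print(fpr)
```

A large gap between subgroups, as in this made-up data, would be a signal to look more closely at how comments mentioning that subgroup are labelled and represented in the training set.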


### Task 3: Large Language Models and Bias: Word Embedding Demo

#### Task 3.1: Initial Exploration of Words and Relationships

- **Apple:**
  - **Observation:** Words like "computer" and "mac" are close to "apple."
- **Silver:**
  - **Observation:** Words like "gold," "copper," "iron," and "bronze" are close to "silver."
- **Sound:**
  - **Observation:** Words like "sounds" and "soundtrack" are close to "sound."

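**Conclusion:** Words that are related in meaning or usage generally sit closer together in the Word2Vec embedding space. The words nearest to "apple" also show that closeness reflects how a word is used in the training text (here, the company rather than the fruit).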
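
To make "closer together" concrete, the sketch below ranks words by cosine similarity, one of the distance measures the Embedding Projector offers. The three-dimensional vectors are hypothetical values chosen for illustration; real Word2Vec vectors have hundreds of dimensions and are learned from text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, NOT real Word2Vec values.
toy_vectors = {
    "silver": [0.9, 0.1, 0.0],
    "gold":   [0.8, 0.2, 0.1],
    "sound":  [0.0, 0.1, 0.9],
}


def nearest(word, vectors):
    """Return the other words, most similar to `word` first."""
    others = [w for w in vectors if w != word]
    return sorted(
        others,
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
        reverse=True,
    )


ranking = nearest("silver", toy_vectors)
print(ranking)
```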

#### Task 3.2: Exploring "Word2Vec All" for Patterns


- **Engineer:**
  - **Observation:** Related words include "engineering," "inventors," "mathematicians," "physicists," and "designers."
- **Drummer:**
  - **Observation:** Related words include "guitarist," "bassist," "musician," "singer," and "songwriter."

#### Discussion on Gender Bias

- **Observation:** There are potential concerns of gender bias. None of the neighbouring words is explicitly gendered, but the words related to "engineer" are predominantly traditionally male-dominated professions. The words related to "drummer" are more gender-neutral, yet they still cluster around roles in music that have traditionally been male-associated.
- **Conclusion:** This suggests that gender biases may be present in the word embeddings, reflecting societal biases in occupational roles.
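
One common way to probe this kind of bias more directly, which was not part of the demo itself, is to project occupation words onto a "gender direction" such as the difference between the vectors for "he" and "she". The sketch below shows only the mechanics: the vectors are hypothetical values made up for illustration, so the resulting numbers say nothing about real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, NOT taken from Word2Vec.
toy = {
    "he":       [1.0, 0.2, 0.0],
    "she":      [-1.0, 0.2, 0.0],
    "engineer": [0.4, 0.9, 0.1],
    "drummer":  [0.1, 0.3, 0.9],
}

# Direction pointing from "she" towards "he".
gender_direction = [a - b for a, b in zip(toy["he"], toy["she"])]

# Positive values lean towards "he", negative towards "she", 0 is neutral.
lean = {
    word: cosine_similarity(toy[word], gender_direction)
    for word in ("engineer", "drummer")
}
print(lean)
```

With real embeddings, a score far from zero for an occupation word would support the concern raised above more directly than inspecting nearest neighbours by eye.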

#### Attribution

- The embeddings and visualizations are from the [TensorFlow Embedding Projector](https://projector.tensorflow.org/).




### Task 4: Thinking about AI Fairness

#### Task 4-a: Topics in AI Fairness

- **Criteria Described:**
  - **Demographic Parity / Statistical Parity:**
    - Ensures the model selects people in proportions that match the group membership percentages of the applicants.
    - Example: A conference selecting speakers with a model that ensures 50% of selected candidates are women if 50% of the attendees are women.
  - **Equal Opportunity:**
    - Ensures the true positive rate (TPR) is equal for each group.
    - Example: A medical tool designed to have a high TPR that is equal for each demographic group.
  - **Equal Accuracy:**
    - Ensures the model has the same percentage of correct classifications for each group.
    - Example: A bank loan approval model that is equally accurate for all demographic groups.
  - **Group Unaware / "Fairness through Unawareness":**
    - Removes all group membership information from the dataset to prevent bias based on these groups.
    - Example: Removing gender, race, or age data from the dataset, while also removing proxies such as zip code that would allow group membership to be inferred.

- **Known and New Criteria:**
  - Criteria I already knew about before this course:
    - Group unaware / "Fairness through unawareness"
  - Criteria that were new to me:
    - Demographic parity / statistical parity
    - Equal opportunity
    - Equal accuracy

- **Additional Criteria:**
  - Transparency in model decision-making
  - Accountability in AI deployment
  - Continuous monitoring and updating of AI models to ensure fairness
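
The first three criteria described above can each be checked by computing one number per group and comparing across groups. The sketch below is a minimal illustration with hypothetical toy data; the function name `group_metrics` and the group labels are assumptions. Demographic parity is checked here through the selection rate, since equal selection rates across groups mean the selected people match the applicants' group proportions.

```python
def group_metrics(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate (TPR), and accuracy."""
    metrics = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, grp in zip(y_true, y_pred, groups) if grp == g]
        selected = sum(p for _, p in rows)
        positives = sum(t for t, _ in rows)
        true_positives = sum(1 for t, p in rows if t == 1 and p == 1)
        correct = sum(1 for t, p in rows if t == p)
        metrics[g] = {
            # Demographic parity compares this value across groups.
            "selection_rate": selected / len(rows),
            # Equal opportunity compares this value across groups.
            "tpr": true_positives / positives if positives else float("nan"),
            # Equal accuracy compares this value across groups.
            "accuracy": correct / len(rows),
        }
    return metrics


# Hypothetical toy data: 1 = should be / was selected, 0 = otherwise.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = group_metrics(y_true, y_pred, groups)
for group, values in metrics.items():
    print(group, values)
```

In this made-up data the model satisfies equal accuracy while violating both demographic parity and equal opportunity, which illustrates that the criteria can disagree.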




#### Task 4-b: AI Fairness in the Context of the Credit Card Dataset


**Key Points and Findings:**

1. **Baseline Model:**
   - **Performance:**
     - Total approvals: 38,246
     - Group A approvals: 8,028 (21%)
     - Group B approvals: 30,218 (79%)
     - Overall accuracy: 94.89%
     - Group A accuracy: 94.56%
     - Group B accuracy: 95.02%
     - True positive rate (TPR) for Group A: 77.23%
     - TPR for Group B: 98.03%
   - **Observations:**
     - Group B had a higher representation in approved applicants.
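     - Group B's TPR is about 20.8 percentage points higher than Group A's (98.03% vs. 77.23%). Applicants who should be approved are therefore far more likely to be approved if they belong to Group B, which indicates an unfair advantage in model approval decisions.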

2. **Group Unaware Model:**
   - **Performance:**
     - Removing group information did not significantly reduce the disparity in the fairness metrics.
   - **Observations:**
     - Although removing group membership information was intended to eliminate bias, proxies within the data may still perpetuate inequality.
     - Demographic parity, equal accuracy, and equal opportunity were not fully achieved.

3. **Demographic Parity Model:**
   - **Performance:**
     - Adjusted thresholds to balance approval rates between groups.
     - This approach improved demographic parity but may have affected other fairness criteria, such as equal accuracy and equal opportunity.
   - **Observations:**
     - Balancing representation in approved applicants was partially successful.
     - Trade-offs between different fairness criteria were evident.
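
The threshold adjustment described for the demographic parity model can be sketched as follows. This is a hedged illustration rather than the exercise's actual code: it assumes the model outputs a score per applicant, that a separate cutoff is chosen per group, and that there are no tied scores. The scores, the target rate, and the function name `threshold_for_rate` are hypothetical.

```python
def threshold_for_rate(scores, target_rate):
    """Return the score cutoff that approves the top `target_rate`
    share of `scores` (assumes no tied scores)."""
    ranked = sorted(scores, reverse=True)
    n_approve = round(target_rate * len(ranked))
    if n_approve == 0:
        return float("inf")  # approve nobody
    return ranked[n_approve - 1]


def approval_rate(scores, threshold):
    """Share of applicants whose score reaches the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical model scores for two groups of applicants.
scores_a = [0.20, 0.35, 0.50, 0.65, 0.80]
scores_b = [0.40, 0.55, 0.70, 0.85, 0.90]

# A single shared threshold gives different approval rates per group.
shared = 0.60
rate_a_shared = approval_rate(scores_a, shared)
rate_b_shared = approval_rate(scores_b, shared)

# Group-specific thresholds chosen to approve the same share of each group.
target = 0.4
thr_a = threshold_for_rate(scores_a, target)
thr_b = threshold_for_rate(scores_b, target)
rate_a = approval_rate(scores_a, thr_a)
rate_b = approval_rate(scores_b, thr_b)
print(thr_a, thr_b, rate_a, rate_b)
```

Equalising approval rates this way means applicants with the same score can receive different decisions depending on their group, which is one source of the trade-offs noted above.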

**Discussion and Reflections:**

- **Fairness Criteria:**
  - **Demographic Parity:** Group B had an unfair advantage in the baseline model. Adjusting thresholds improved this but introduced trade-offs.
  - **Equal Accuracy:** The baseline model was slightly more accurate for Group B (95.02% vs. 94.56%), leading to potential unfairness.
  - **Equal Opportunity:** Group B had a significantly higher TPR in the baseline model (98.03% vs. 77.23%), highlighting a bias in positive classifications.

- **Model Fairness:**
  - The fairness of models is complex and context-dependent. Removing group membership alone does not ensure fairness due to hidden proxies.
  - Ensuring fairness requires a balance between various criteria and careful consideration of the data and context.
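
The gaps between the two groups can be checked directly from the baseline figures reported earlier in this task. The snippet below only redoes that arithmetic; the variable names are ours.

```python
# Baseline figures as reported in the findings above.
approvals_a, approvals_b = 8028, 30218
total_approvals = approvals_a + approvals_b

share_a = approvals_a / total_approvals  # Group A share of approvals
share_b = approvals_b / total_approvals  # Group B share of approvals

tpr_gap = 98.03 - 77.23       # percentage points, Group B minus Group A
accuracy_gap = 95.02 - 94.56  # percentage points, Group B minus Group A

print(f"Total approvals: {total_approvals}")
print(f"Approval shares: A {share_a:.1%}, B {share_b:.1%}")
print(f"TPR gap: {tpr_gap:.2f} points, accuracy gap: {accuracy_gap:.2f} points")
```

The accuracy gap is under half a percentage point, while the TPR gap is roughly 21 points, so equal opportunity is the criterion on which the baseline model falls furthest short.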

**Conclusion:**
Achieving fairness in AI models is a nuanced and ongoing challenge. It involves not only model training but also understanding and mitigating biases in data collection and preprocessing. This exercise highlights the importance of evaluating multiple fairness criteria and recognising the trade-offs involved in striving for equitable AI systems.




### Task 5: AI and Explainability

**Introduction to Permutation Importance:**
Permutation importance is a technique used to determine the importance of features in a machine learning model by measuring the change in model performance when the values of a feature are shuffled. If shuffling the values of a feature significantly decreases model performance, that feature is considered important.
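
The procedure can be written out in plain Python. This is a minimal sketch rather than the library code used in the tutorial: the stand-in model `toy_fare_model`, the data, the use of mean squared error as the performance measure, and the `n_repeats` and `seed` parameters are all assumptions made for illustration.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    `predict` maps one row (a list of feature values) to a prediction.
    Returns one importance value per column, in column order.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving all other columns in place.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            score = mean_squared_error(targets, [predict(r) for r in permuted])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_fare_model(row):
    """Hypothetical stand-in model: fare depends on distance only."""
    distance_km, passenger_count = row
    return 2.5 + 1.8 * distance_km


# Hypothetical rows: [distance_km, passenger_count].
rows = [[1.0, 1], [3.0, 2], [5.0, 1], [8.0, 4], [12.0, 2], [20.0, 3]]
targets = [toy_fare_model(r) for r in rows]

importances = permutation_importance(toy_fare_model, rows, targets)
print(importances)
```

Shuffling the distance column increases the error, while shuffling the passenger count changes nothing because the stand-in model ignores it, so its importance is exactly zero.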

**Exercise Overview:**
We followed the tutorial on Permutation Importance at Kaggle, which provided an example of predicting whether a soccer team will have the "Man of the Game" winner based on team statistics. We then moved on to a hands-on exercise involving a Taxi Fare Prediction dataset.

**Task 1-a: Analysis of Taxi Fare Prediction Dataset**

1. **Number of Features:**
   - The dataset used in the exercise contains several features, including:
     - pickup_latitude
     - pickup_longitude
     - dropoff_latitude
     - dropoff_longitude
     - passenger_count
     - pickup_hour
     - pickup_minute
     - pickup_day_of_week

2. **Results and Intuition:**
   - **Results:** The permutation importance revealed that features like pickup_longitude, dropoff_longitude, and pickup_latitude had high importance, while features like passenger_count and pickup_minute were less important.
   - **Contrary to Intuition?** The results were largely intuitive. Geographic features (pickup and dropoff locations) are naturally expected to be highly predictive of taxi fare because pricing depends on distance and location. However, some might intuitively expect passenger_count to have more importance, which was not the case here, likely because fare primarily depends on distance and time rather than the number of passengers.

3. **Discussion with Peers:**
   - Our peer discussion highlighted that while the results aligned with general expectations about fare prediction, they also underscored the value of empirical feature importance assessment. It was noted that certain intuitions, like the perceived importance of passenger_count, might not always hold true in practice.



**Task 1-b: Reflecting on Permutation Importance**

1. **Reasonableness of Permutation Importance:**
   - Permutation importance is a reasonable measure of feature importance because it directly evaluates the impact of each feature on model performance. It provides a straightforward and interpretable way to understand feature contributions.

2. **Potential Issues with Permutation Importance:**
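   - **Correlated Features:** If features are highly correlated, shuffling one feature may not significantly impact model performance because the model can still obtain similar information from the correlated feature. The importance of both features may then be underestimated.
   - **Non-Linear Relationships:** Permutation importance reports a single number per feature. When a model relies on complex, non-linear interactions between features, the effect of an interaction is folded into the scores of the individual features and cannot be separated out.
   - **Data Sensitivity:** The technique is sensitive to the specific data and to the random shuffles used. Different subsets of data, or different random seeds, might yield different importance rankings, so it helps to repeat the shuffling several times and look at the spread of the results.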

**Example Issue:**
   - Consider a model predicting house prices with features like square footage and number of bedrooms. These features are often correlated; shuffling square footage might not drastically affect the model if the number of bedrooms remains in place, potentially underestimating the importance of square footage.
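
The effect in this example can be reproduced with a small sketch. It uses an extreme, hypothetical case: the second feature is an exact copy of square footage (a stand-in for a strongly correlated feature such as number of bedrooms), and the two hand-written models make identical predictions on the original data. All names and values are assumptions made for illustration.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def importance_of_first_feature(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in MSE after shuffling column 0."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[0] for r in rows]
        rng.shuffle(shuffled)
        permuted = [[v] + r[1:] for r, v in zip(rows, shuffled)]
        increases.append(mse(targets, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical rows: [square_footage, correlated_copy].
rows = [[float(s), float(s)] for s in (50, 80, 100, 120, 150, 200)]
targets = [3.0 * r[0] for r in rows]


def model_sqft_only(row):
    """Uses square footage alone."""
    return 3.0 * row[0]


def model_split(row):
    """Spreads the same weight across both correlated features."""
    return 1.5 * row[0] + 1.5 * row[1]


imp_only = importance_of_first_feature(model_sqft_only, rows, targets)
imp_split = importance_of_first_feature(model_split, rows, targets)
print(imp_only, imp_split)
```

Both models fit the data perfectly, yet square footage looks only a quarter as important in the second model because the correlated copy still carries its information when it is shuffled.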

**Discussion:**
   - Despite these issues, permutation importance remains a valuable tool for feature importance assessment, particularly when used alongside other methods. It encourages data scientists to critically evaluate model behavior and ensure robustness in their feature selection process.

**Conclusion:**
Permutation importance provides a valuable lens through which to view and understand feature importance in machine learning models. By highlighting both intuitive and non-intuitive results, it aids in refining models and enhancing their interpretability and fairness.


