# Multi-Signal Gender Inference Demo

This notebook demonstrates how the system combines **three independent signals**  
(name, sport gender, and photo) to infer a likely gender with weighted evidence.

The notebook walks through:

1. Loading the inference engine  
2. Creating several example athlete profiles  
3. Running inference on each  
4. Inspecting how each signal contributed to the final result  


## Imports & Auto-Reload

In [1]:
%load_ext autoreload
%autoreload 2

from inference import infer_gender, InferenceConfig
from dataclasses import asdict


## Quick Test: Calling the Vision Model Directly

Before running the full inference pipeline, it is useful to test the photo-based
signal on its own. The following cell calls `get_photo_signal` on a single example
image and displays the returned `Signal`.


In [2]:
from signals import get_photo_signal

signal = get_photo_signal("../examples/test_1.jpg")
signal


Signal(source='photo', p_male=0.0, p_female=1.0, quality='high', raw_value='../examples/test_1.jpg', meta={'notes': 'The image features a single female face with clear visibility and good lighting.'})

⚠️ **Note on the vision model and processing time**

When you run this block, the image is sent to OpenAI’s vision model for
processing, which may take a few seconds depending on network latency and
model load. The model returns a structured JSON response containing
`p_male`, `p_female`, and a `quality` label.
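
As a rough illustration of handling such a response, the sketch below parses a JSON string into a small record and normalizes the two scores. `PhotoSignal`, `parse_photo_response`, the `notes` key, and the fallback rules are assumptions made for this example; the actual parsing inside `signals.get_photo_signal` is not shown in this notebook and may differ.

```python
import json
from dataclasses import dataclass, field

# Quality labels mentioned in this notebook; treated here as the only valid ones.
VALID_QUALITIES = {"low", "medium", "high"}


@dataclass
class PhotoSignal:
    """Hypothetical stand-in for the notebook's Signal object."""
    p_male: float
    p_female: float
    quality: str
    meta: dict = field(default_factory=dict)


def parse_photo_response(raw_json: str) -> PhotoSignal:
    """Parse a JSON reply with p_male, p_female, and quality (assumed schema)."""
    data = json.loads(raw_json)

    # Clamp each score into [0, 1] in case the model returns something odd.
    p_male = min(max(float(data["p_male"]), 0.0), 1.0)
    p_female = min(max(float(data["p_female"]), 0.0), 1.0)

    # Renormalize so the two scores sum to 1; fall back to 0.5/0.5 if both are 0.
    total = p_male + p_female
    if total == 0:
        p_male = p_female = 0.5
    else:
        p_male, p_female = p_male / total, p_female / total

    # Unknown or missing quality labels are treated as "low" (an assumption).
    quality = str(data.get("quality", "low")).lower()
    if quality not in VALID_QUALITIES:
        quality = "low"

    return PhotoSignal(p_male, p_female, quality, {"notes": data.get("notes", "")})


example = parse_photo_response(
    '{"p_male": 0.0, "p_female": 1.0, "quality": "high", "notes": "single face"}'
)
print(example)
```

Treating unrecognized quality labels as `"low"` is a conservative choice: a malformed reply then carries less weight rather than more.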

---

⚠️ **Limitations of the vision-based confidence**

In this prototype, the photo signal comes from an LLM-based vision model
(**GPT-4o-mini**) rather than a dedicated **CNN/ResNet classifier** with a
trained softmax head. The numeric values `p_male` and `p_female` are therefore
**confidence-like scores generated through prompting**, not calibrated probabilities produced
by a supervised computer-vision model.

Because time constraints ruled out building or fine-tuning a standalone
classifier, the fusion logic intentionally:

- assigns **lower base weight** to the photo signal  
- **down-weights** it further for low-quality cases (group photos, blurry images, multiple faces)

This helps reduce the influence of any potential misclassification from the
vision model and keeps the overall inference stable.
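
The down-weighting described above could look roughly like the sketch below. Every number here (the quality multipliers, the group-photo multiplier, the base weight of 0.2) is an illustrative assumption, and `photo_weight` is a hypothetical helper; the real values live in the `inference` module and are not shown in this notebook.

```python
# Illustrative multipliers only; the real values are not shown in this notebook.
QUALITY_MULTIPLIER = {"high": 1.0, "medium": 0.6, "low": 0.3}
GROUP_PHOTO_MULTIPLIER = 0.25


def photo_weight(base_weight, quality, group_photo=False, low_quality_photo=False):
    """Return the photo signal's raw weight after quality and context penalties."""
    # An explicit low_quality_photo flag overrides the model's own quality label.
    if low_quality_photo:
        quality = "low"

    # Unknown quality labels fall back to the "low" multiplier.
    weight = base_weight * QUALITY_MULTIPLIER.get(quality, QUALITY_MULTIPLIER["low"])

    # Group photos are penalized further because the target face is ambiguous.
    if group_photo:
        weight *= GROUP_PHOTO_MULTIPLIER
    return weight


print(photo_weight(0.2, "high"))                          # clean single-face photo
print(photo_weight(0.2, "high", group_photo=True))        # group photo
print(photo_weight(0.2, "high", low_quality_photo=True))  # flagged as low quality
```

The penalties multiply rather than replace each other, so a blurry group photo ends up with less weight than either problem would cause alone.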


<table>
  <tr>
    <td align="center">
      <img src="../examples/test_1.jpg" width="160"><br>
      <sub>test_1</sub>
    </td>
    <td align="center">
      <img src="../examples/test_2.jpg" width="160"><br>
      <sub>test_2</sub>
    </td>
    <td align="center">
      <img src="../examples/test_3.jpg" width="160"><br>
      <sub>test_3 (group photo)</sub>
    </td>
    <td align="center">
      <img src="../examples/test_4.jpg" width="160"><br>
      <sub>test_4</sub>
    </td>
    <td align="center">
      <img src="../examples/test_5.jpg" width="160"><br>
      <sub>test_5</sub>
    </td>
    <td align="center">
      <img src="../examples/test_6.jpg" width="160"><br>
      <sub>test_6</sub>
    </td>
  </tr>
</table>


## 1. Create Example Athlete Profiles

These profiles simulate real cases the system might encounter.

Each profile includes:
- **first name**  
- **sport gender**  
- **photo path**  
- optional flags like `low_quality_photo`, `group_photo`, or explicit `gender`

Explicit gender (e.g., “Female”) **skips the entire inference pipeline**.
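
None of the demo profiles in this notebook sets `gender`, so the skip path is not exercised in the cells that follow. A minimal sketch of the short-circuit, assuming the explicit value is read from the `gender` key (as the results cell does with `p.get("gender")`), might look like this. `resolve_gender` and `stub_infer` are hypothetical names used only for illustration, not part of the `inference` module.

```python
from typing import Callable, Tuple

calls = []  # records every profile that reaches the (stubbed) inference step


def stub_infer(profile: dict) -> str:
    """Stand-in for the real pipeline; always answers 'Unknown'."""
    calls.append(profile)
    return "Unknown"


def resolve_gender(profile: dict, infer: Callable[[dict], str]) -> Tuple[str, bool]:
    """Return (gender, skipped), where skipped is True if inference was bypassed."""
    explicit = profile.get("gender")
    if explicit:
        # Explicit gender wins outright; no signal is computed.
        return explicit, True
    return infer(profile), False


explicit_result = resolve_gender({"first_name": "Mary", "gender": "Female"}, stub_infer)
inferred_result = resolve_gender({"first_name": "Mary"}, stub_infer)
print(explicit_result, inferred_result, len(calls))
```

Checking the explicit value first also avoids an unnecessary vision-model call for profiles that already state a gender.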


In [3]:
# Demo input profiles for the multi-signal gender inference pipeline.
# Each profile intentionally varies signal quality, sport labels, and image context.

profiles = [
    # 1. Female-leaning name, no sport gender, high-quality image
    {
        "id": 1,
        "first_name": "Mary",
        "sport_gender": None,
        "photo_path": "../examples/test_1.jpg",
    },

    # 2. Unknown sport label → only name and image carry information
    {
        "id": 2,
        "first_name": "Alice",
        "sport_gender": "Unknown",
        "photo_path": "../examples/test_2.jpg",
    },

    # 3. Group/low-context image → photo signal heavily down-weighted
    {
        "id": 3,
        "first_name": "Alex",
        "sport_gender": "Unknown",
        "photo_path": "../examples/test_3.jpg",
        "group_photo": True,
    },

    # 4. Clear male signals from name, sport, and image
    {
        "id": 4,
        "first_name": "James",
        "sport_gender": "Male",
        "photo_path": "../examples/test_4.jpg",
    },

    # 5. Female image but sport listed as male → conflict test case
    {
        "id": 5,
        "first_name": "Emily",
        "sport_gender": "Male",
        "photo_path": "../examples/test_5.jpg",
    },

    # 6. Strong male signals across name, sport, and image
    {
        "id": 6,
        "first_name": "Michael",
        "sport_gender": "Male",
        "photo_path": "../examples/test_6.jpg",
    },
]


## 2. Run the Inference Engine

The following cell runs inference on every profile and displays:

- inferred gender  
- confidence score  
- probability of male / female  
- whether explicit gender caused inference to be skipped  

This gives a high-level view of system behavior.


In [4]:
import pandas as pd

config = InferenceConfig(min_confidence=0.6)
rows = []

for p in profiles:
    result = infer_gender(p, config=config)
    row = {
        "id": p["id"],
        "first_name": p.get("first_name"),
        "sport_gender": p.get("sport_gender"),
        "explicit_gender": p.get("gender"),
        "inferred_gender": result.inferred_gender,
        "confidence": round(result.confidence, 3),
        "p_male": round(result.p_male, 3),
        "p_female": round(result.p_female, 3),
        "skipped_due_to_explicit_gender": result.skipped_due_to_explicit_gender,
    }
    rows.append(row)

df = pd.DataFrame(rows)
df


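|   | id | first_name | sport_gender | explicit_gender | inferred_gender | confidence | p_male | p_female | skipped_due_to_explicit_gender |
|---|----|------------|--------------|-----------------|-----------------|------------|--------|----------|--------------------------------|
| 0 | 1 | Mary |  |  | Female | 0.907 | 0.093 | 0.907 | False |
| 1 | 2 | Alice | Unknown |  | Female | 0.908 | 0.092 | 0.908 | False |
| 2 | 3 | Alex | Unknown |  | Unknown | 0.508 | 0.492 | 0.508 | False |
| 3 | 4 | James | Male |  | Male | 0.997 | 0.997 | 0.003 | False |
| 4 | 5 | Emily | Male |  | Male | 0.697 | 0.697 | 0.303 | False |
| 5 | 6 | Michael | Male |  | Male | 0.997 | 0.997 | 0.003 | False |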
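
In the results above, the confidence always equals the larger of `p_male` and `p_female`, and the only row labeled Unknown (Alex, 0.508) falls below `min_confidence=0.6`. The sketch below shows one decision rule consistent with that pattern. It is an inference from the output, not the confirmed logic of `infer_gender`, and `decide` is a hypothetical helper.

```python
def decide(p_male, p_female, min_confidence=0.6):
    """Pick a label from fused probabilities; abstain below the threshold."""
    # Confidence is taken as the probability of the more likely label.
    confidence = max(p_male, p_female)
    if confidence < min_confidence:
        return "Unknown", confidence
    label = "Male" if p_male > p_female else "Female"
    return label, confidence


print(decide(0.492, 0.508))  # Alex: below the 0.6 threshold
print(decide(0.697, 0.303))  # Emily
print(decide(0.093, 0.907))  # Mary
```

Abstaining with Unknown instead of forcing a label keeps ambiguous profiles from being silently misclassified.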


### Note on Current Photo Classification Limitations

The system does not currently use a trained face-gender classifier that outputs calibrated numerical confidence.
Instead, it relies on an LLM vision model (GPT-4o-mini).

Because LLM-based photo outputs are less quantitatively reliable, the weighting logic intentionally prioritizes
**name-based priors** and **sport-gender signals** over photo signals.
As a result, an image can appear clearly female or male while the final inference leans the other way,
because the non-photo signals are weighted more heavily. Profile 5 (Emily) is one such case: the photo
signal for `test_5` returns `p_female=1.0`, yet the final result is Male with `p_male=0.697`, driven by
the Male sport label.

That said, the photo signal is **not ignored**.
If you compare the confidence and `p_male` / `p_female` values for the examples using `test_4`, `test_5`, and `test_6`,
you can see that the photo input still nudges the final probabilities: the scores shift in response to the image,
even though name and sport remain the primary drivers.


In [6]:
from signals import get_photo_signal

signal = get_photo_signal("../examples/test_5.jpg")
signal


Signal(source='photo', p_male=0.0, p_female=1.0, quality='high', raw_value='../examples/test_5.jpg', meta={'notes': 'The image features a single female athlete in action, indicating high quality.'})

## 3. Inspect Attribution for One Profile

The table above gives the final result, but sometimes we want to see:

- how each signal was weighted  
- how much each contributed  
- how each signal's quality label (low/medium/high) affected its weight  
- whether context flags (group photo, low quality) reduced photo weight  
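
The attribution printed for Profile 3 later in this section can be checked by hand. The sketch below assumes the final probabilities are a normalized weighted average of the per-signal probabilities, which is consistent with the `weight` and `weighted_p_*` fields in that output. The weights and probabilities are copied from the attribution; `fuse` is a hypothetical helper, not a function from the `inference` module.

```python
# Per-signal values copied from the attribution output for Profile 3 (Alex).
signals = [
    {"source": "name",  "weight": 0.4186046511627907,  "p_male": 0.55, "p_female": 0.45},
    {"source": "sport", "weight": 0.5232558139534884,  "p_male": 0.5,  "p_female": 0.5},
    {"source": "photo", "weight": 0.05813953488372094, "p_male": 0.0,  "p_female": 1.0},
]


def fuse(signals):
    """Weighted average of per-signal probabilities (assumed fusion rule)."""
    total = sum(s["weight"] for s in signals)
    p_male = sum(s["weight"] * s["p_male"] for s in signals) / total
    p_female = sum(s["weight"] * s["p_female"] for s in signals) / total
    return p_male, p_female


p_male, p_female = fuse(signals)
print(round(p_male, 3), round(p_female, 3))  # 0.492 0.508
```

The photo signal reports `p_female=1.0`, but at a weight of about 0.058 it moves the fused result only slightly, which is why this profile stays close to an even split.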

Below, we inspect the full attribution for **Profile 3 (Alex)**.


In [5]:
profile = profiles[2]  # Alex
result = infer_gender(profile, config=config)
print("Profile:", profile)
print("Inference result:", result.inferred_gender, "confidence", round(result.confidence, 3))

import pprint
pp = pprint.PrettyPrinter(indent=2)
print("\nAttribution:")
pp.pprint(result.attribution)


Profile: {'id': 3, 'first_name': 'Alex', 'sport_gender': 'Unknown', 'photo_path': '../examples/test_3.jpg', 'group_photo': True}
Inference result: Unknown confidence 0.508

Attribution:
[ { 'meta': {'ambiguous': True, 'db_hit': True},
    'p_female': 0.45,
    'p_male': 0.55,
    'quality': 'medium',
    'source': 'name',
    'weight': 0.4186046511627907,
    'weighted_p_female': 0.18837209302325583,
    'weighted_p_male': 0.23023255813953492},
  { 'meta': {'reason': 'neutral_sport_category'},
    'p_female': 0.5,
    'p_male': 0.5,
    'quality': 'medium',
    'source': 'sport',
    'weight': 0.5232558139534884,
    'weighted_p_female': 0.2616279069767442,
    'weighted_p_male': 0.2616279069767442},
  { 'meta': { 'notes': 'The image features multiple female athletes running on '
                       'a track.'},
    'p_female': 1.0,
    'p_male': 0.0,
    'quality': 'high',
    'source': 'photo',
    'weight': 0.05813953488372094,
    'weighted_p_female': 0.05813953488372094,
    'w