# 🚀 Gender Classification with Zero-Shot Learning 🎯

This Google Colab notebook demonstrates a simple gender classification task using a zero-shot learning approach. The objective is to classify sentences into three categories: sentences with a male subject 👨, sentences with a female subject 👩, and neutral sentences with an inanimate or non-gendered subject 🔄.

We use the Hugging Face Transformers library 🤗 to build a zero-shot classification pipeline around a pre-trained model, then apply it to a small hand-written dataset of 15 sentences to see how well it identifies whether each subject is male, female, or neutral.
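The zero-shot pipeline relies on a natural language inference (NLI) model. When no model is given, it uses `facebook/bart-large-mnli`. Each candidate label is turned into a hypothesis with the default template `"This example is {}."`, and every (sentence, hypothesis) pair is scored for entailment. With the default `multi_label=False`, the entailment logits are softmaxed across the candidate labels. The plain-Python sketch below reproduces only that last step. The logit values are invented for illustration, and the function names are hypothetical, not part of the Transformers API.

```python
import math

# Default hypothesis template of the Transformers zero-shot pipeline
HYPOTHESIS_TEMPLATE = "This example is {}."

def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def rank_labels(labels, entailment_logits):
    """Sort labels by softmaxed entailment score, highest first
    (mirroring the 'labels' / 'scores' fields of the pipeline output)."""
    scores = softmax(entailment_logits)
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked], [score for _, score in ranked]

labels = ["human male subject", "human female subject", "neutral or inanimate subject"]
print(build_hypotheses(labels)[0])  # This example is human male subject.

# Hypothetical entailment logits for "John is a great athlete."
ranked_labels, ranked_scores = rank_labels(labels, [3.1, -0.4, -1.2])
print(ranked_labels[0])  # human male subject
```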

Finally, we compute a confusion matrix 📊. For each true category, it counts how many sentences the classifier assigned to each predicted category. This shows both the overall accuracy and any confusion between male, female, and neutral subjects.

Explore this notebook to see how zero-shot learning can handle a classification task with no task-specific training data, only a list of candidate label names 🌟.


# install the transformers library

In [None]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.3-py3-none-any.whl (6.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.8/6.8 MB[0m [31m34.0 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.6/7.6 MB[0m [31m44.4 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.3-py3-none-any.whl (199 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m199.8/199.8 KB[0m [31m10.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.3 tokenizers-0.13.2 transformers-4.27.3


# create a simple dataset of gendered and neutral sentences

In [None]:
male_sentences = [
    "John is a great athlete.",
    "Bob loves to play video games.",
    "He is a doctor at the hospital.",
    "David is a great cook.",
    "My brother is an engineer.",
]

female_sentences = [
    "Samantha is a great dancer.",
    "Emily loves to read books.",
    "She is a teacher at the school.",
    "Laura is a great singer.",
    "My sister is a nurse.",
]

neutral_sentences = [
    "The sun is shining.",
    "The book is on the table.",
    "The car is parked outside.",
    "The tree is tall.",
    "The coffee is hot.",
]
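The true category of every sentence is known in advance. The three lists can therefore also be flattened into one list of sentences with a parallel list of gold labels, so that a single pipeline call covers the whole dataset. The sketch below uses two sentences per class from the dataset above. `flatten_dataset`, `texts`, and `gold` are hypothetical names, not part of the notebook.

```python
def flatten_dataset(groups):
    """Flatten {label: [sentences]} into parallel lists of texts and gold labels."""
    texts, gold = [], []
    for label, sentences in groups.items():
        texts.extend(sentences)
        gold.extend([label] * len(sentences))
    return texts, gold

# Two sentences per class from the dataset above, for brevity
groups = {
    "human male subject": ["John is a great athlete.", "He is a doctor at the hospital."],
    "human female subject": ["Samantha is a great dancer.", "She is a teacher at the school."],
    "neutral or inanimate subject": ["The sun is shining.", "The coffee is hot."],
}

texts, gold = flatten_dataset(groups)
print(len(texts), gold[2])  # 6 human female subject
```

With the full lists, calling the classifier once on `texts` with the notebook's `candidate_labels` would return one result per sentence, in the same order as `gold`.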

# load the classifier and classify sentences by subject gender

### suppress warnings

In [None]:
import warnings
# Globally ignore the "Length of IterableDataset ..." UserWarning that can appear
# when the pipeline processes a list of inputs; other warnings are still shown
warnings.filterwarnings("ignore", message="Length of IterableDataset")
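`warnings.filterwarnings` treats `message` as a regular expression matched (case-insensitively) against the start of the warning text. Only warnings beginning with "Length of IterableDataset" are therefore hidden, and everything else still surfaces. The self-contained check below uses only the standard library. The warning texts are made up for the demonstration, and `emitted_messages` is a hypothetical helper.

```python
import warnings

def emitted_messages(filter_message):
    """Emit two warnings under a temporary filter and return the ones that got through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning by default...
        warnings.filterwarnings("ignore", message=filter_message)  # ...except matches
        warnings.warn("Length of IterableDataset was reported to be 2", UserWarning)
        warnings.warn("Some other warning", UserWarning)
    return [str(w.message) for w in caught]

print(emitted_messages("Length of IterableDataset"))  # ['Some other warning']
```

Because `catch_warnings` restores the previous filters on exit, this check does not change the global filter set in the cell above.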

### run inference on the dataset

In [None]:
from transformers import pipeline

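# Load the zero-shot classification pipeline. Naming the model explicitly pins the
# checkpoint that would otherwise be picked by default (facebook/bart-large-mnli)
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")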

# Define the candidate labels for the classification task
label_1 = "human male subject"
label_2 = "human female subject"
label_3 = "neutral or inanimate subject"

candidate_labels = [label_1, label_2, label_3]

# Classify the male sentences
male_results = classifier(male_sentences, candidate_labels)

# Classify the female sentences
female_results = classifier(female_sentences, candidate_labels)

# Classify the neutral sentences
neutral_results = classifier(neutral_sentences, candidate_labels)
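Each element of `male_results`, `female_results`, and `neutral_results` is a dict with three fields. `sequence` holds the input sentence, `labels` lists the candidates from most to least likely, and `scores` gives the matching scores, which sum to 1 when `multi_label` is left at its default of `False`. The confusion-matrix step only needs the top label. The sketch below uses a hand-written result in that format with invented scores. `top_prediction` is a hypothetical helper.

```python
def top_prediction(result):
    """Return (label, score) for the highest-ranked candidate of one pipeline result."""
    # Labels and scores are already sorted in descending order of score
    return result["labels"][0], result["scores"][0]

# Hand-written example in the pipeline's output format (scores are invented)
example_result = {
    "sequence": "My sister is a nurse.",
    "labels": ["human female subject", "human male subject", "neutral or inanimate subject"],
    "scores": [0.91, 0.06, 0.03],
}

label, score = top_prediction(example_result)
print(f"{example_result['sequence']!r} -> {label} ({score:.2f})")
```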


# confusion matrix

In [None]:
from sklearn.metrics import confusion_matrix

# Collect the top-ranked predicted label and the ground-truth label for every sentence
predictions = []
true_labels = []
for results, category in [(male_results, label_1), (female_results, label_2), (neutral_results, label_3)]:
    for r in results:
        predictions.append(r["labels"][0])  # labels are sorted by score, so [0] is the top prediction
        true_labels.append(category)

# Compute the confusion matrix (rows: true label, columns: predicted label)
cm = confusion_matrix(true_labels, predictions, labels=candidate_labels)
print(cm)


[[5 0 0]
 [0 5 0]
 [0 0 5]]
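In scikit-learn's `confusion_matrix`, row *i* counts sentences whose true label is `labels[i]`, and column *j* counts those predicted as `labels[j]`, so correct predictions land on the diagonal. A simplified plain-Python equivalent makes that layout explicit. The function name, the shortened labels, and the one misclassification in the toy data are all hypothetical. Unlike scikit-learn, this version raises `KeyError` for labels missing from `labels` instead of ignoring them.

```python
def simple_confusion_matrix(y_true, y_pred, labels):
    """Rows = true label, columns = predicted label, both ordered as in `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix

labels = ["male", "female", "neutral"]
y_true = ["male", "male", "female", "neutral"]
y_pred = ["male", "female", "female", "neutral"]  # one male sentence predicted as female

for row in simple_confusion_matrix(y_true, y_pred, labels):
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [0, 0, 1]
```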


In [None]:
# All 15 sentences land on the diagonal: 100% accuracy on this small, hand-written
# dataset, whose sentences carry obvious cues (names, pronouns, "brother"/"sister")
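Accuracy is the sum of the diagonal divided by the total count. Per-class recall divides each diagonal entry by the sum of its row. The plain-Python sketch below applies both to the matrix printed above. The helper name is hypothetical.

```python
def accuracy_and_recall(matrix):
    """Return overall accuracy and per-class recall from a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    # Recall for class i: correct predictions / all sentences whose true class is i
    recall = [matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0
              for i in range(len(matrix))]
    return correct / total, recall

cm = [[5, 0, 0],
      [0, 5, 0],
      [0, 0, 5]]

accuracy, recall = accuracy_and_recall(cm)
print(accuracy, recall)  # 1.0 [1.0, 1.0, 1.0]
```

Inside the notebook, the same accuracy follows directly from the NumPy array returned by scikit-learn as `cm.trace() / cm.sum()`.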