# Evaluating CLIP's Content Classification Against Human Annotations

### Overview
The goal of this project is to compare human annotations of Instagram images with CLIP's classifications when identifying content that is inappropriate for children. This project examines a small, curated sample of eight images from publicly available Instagram posts, selected as a proof of concept. Someone with access to an internal API could scale this project to a larger dataset, enabling more rigorous analysis. A key challenge is that Instagram is relatively safe compared with other social media platforms, yet it still hosts posts that most reasonable adults would consider inappropriate for children. Because hate speech and nudity are already well regulated on the platform, this analysis intentionally excludes them and focuses on content that has a substantial presence there.

### Image Selection
The images were chosen to represent various categories of inappropriate content and are not a random sample. The set includes clearly inappropriate images, benign images, and images whose appropriateness is ambiguous.

### Annotation Method
Images were annotated using an ordinal rating scale with three levels: major violation, minor violation, and no violation.
- Major Violation: The image contains content that a majority of parents would not want their children to see.
- Minor Violation: The content is somewhat controversial—between 5% and 50% of parents might object to it, but the majority would find it acceptable.
- No Violation: Fewer than 5% of parents would likely object to the image.
  
While more precise annotation scales are often useful, this three-tier system is sufficient for our project. Additionally, the term "children" is used broadly: some content that is inappropriate for a six-year-old may be acceptable for a teenager. More granular age-based annotations would be valuable but are beyond the scope of this study.
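
The three levels above can be read as thresholds on the share of parents who would object. The following is a minimal sketch of that reading, not part of the original notebook: the function name `rating_from_objection_rate` is hypothetical, the objection rate is the annotator's own estimate rather than a measured quantity, and treating exactly 50% as a minor violation is an assumption, since the definitions leave that boundary open.

```python
def rating_from_objection_rate(objection_rate):
    """Map an estimated share of objecting parents (0.0 to 1.0) to a rating level."""
    if not 0.0 <= objection_rate <= 1.0:
        raise ValueError("objection_rate must be between 0 and 1")
    if objection_rate > 0.5:
        return "major"  # a majority of parents would object
    if objection_rate >= 0.05:
        return "minor"  # between 5% and 50% might object (50% boundary is an assumption)
    return "no"         # fewer than 5% would object


# Illustrative estimates only; these are not measurements from the project.
for estimate in (0.80, 0.20, 0.01):
    print(estimate, "->", rating_from_objection_rate(estimate))
```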

Each image is assessed across the following content annotation dimensions, which represent key aspects of content that may impact child-appropriateness.

### Content Annotation Dimensions
#### Attire
Does the clothing worn by subjects in the image appear appropriate for a child audience? Consider whether it is overly revealing, suggestive, or intended to emphasize intimate body parts. Examples: Subjects in lingerie, see-through clothing, or revealing swimwear. This dimension does not include clothing that violates another category (e.g., a t-shirt with an offensive gesture would be categorized under Gestures, not Attire).
#### Sexual Content
Does the image contain explicit or suggestive body language, positioning, or actions that imply something sexual? Examples: Seductive poses, accentuating intimate body parts, passionate kissing. Scoring: A couple making out would be a major violation, whereas a couple cuddling in bed may be considered a minor violation.
#### Gestures
Does the image depict offensive or aggressive hand gestures? Examples: A raised middle finger, gang signs, or other explicit gestures. Scoring: Gang signs or a middle finger would be a major violation, while an aggressive "finger gun" gesture could be a minor violation.
#### Substances (Alcohol, Tobacco, Drugs)
Does the image contain alcohol, tobacco, or other substances associated with drug use? This applies whether or not the substance is actively being consumed. Scoring: Clearly identifiable alcoholic beverages, drug paraphernalia, or visible cigarette/cigar use are major violations.  Beverages with ambiguous contents (e.g., a glass that could contain alcohol) are minor violations.
#### Violence
Does the image contain depictions of violence, physical harm, or aggression? Examples: Weapons, fighting, or acts of torture. Consider whether a child might find the image disturbing. Scoring: The severity of violence should guide classification. Graphic or intense scenes should be major violations, while less extreme depictions, such as Looney Tunes-style cartoon violence, could be minor violations.
#### Gore
Does the image depict blood, bodily harm, or other disturbing imagery? Examples: Dead bodies, visible wounds, or dismembered body parts. Scoring: The severity should be judged based on how likely a child would be to find the image upsetting. Extreme depictions should be major violations, while less severe imagery may be minor violations.
#### Overall Rating
This is a holistic rating based on the combined severity of all violations in the image:
- Inappropriate: The image contains one or more major violations and is not suitable for children.
- Questionable: The image contains minor violations, which may make it inappropriate for younger children but acceptable for older ones.
- Appropriate: The image has no significant violations, or at most one minor violation that does not substantially impact child-appropriateness.

#### Explanation
A brief (2–3 sentence) summary explaining the image and how it was rated. This explanation should be self-sufficient, meaning a reader should be able to understand the ratings without seeing the image.
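
To make the annotation schema concrete, here is a minimal sketch of one annotation record with a basic validity check. The `Annotation` class and its `validate` method are hypothetical helpers, not part of the original notebook; the field names and allowed values follow the dimensions and rating levels described above, and the example record is invented for illustration.

```python
from dataclasses import dataclass

DIMENSIONS = ("Attire", "Sexual Content", "Gestures", "Substances", "Violence", "Gore")
DIMENSION_RATINGS = {"no", "minor", "major"}
OVERALL_RATINGS = {"appropriate", "questionable", "inappropriate"}


@dataclass
class Annotation:
    image: str        # image name without extension
    ratings: dict     # dimension name -> "no" | "minor" | "major"
    overall: str      # holistic rating
    explanation: str  # short self-sufficient summary

    def validate(self):
        """Return a list of problems; an empty list means the record is well-formed."""
        problems = []
        for dim in DIMENSIONS:
            value = str(self.ratings.get(dim, "")).strip().lower()
            if value not in DIMENSION_RATINGS:
                problems.append(f"{dim}: unexpected rating {value!r}")
        if self.overall.strip().lower() not in OVERALL_RATINGS:
            problems.append(f"Overall: unexpected rating {self.overall!r}")
        if not self.explanation.strip():
            problems.append("Explanation is empty")
        return problems


# Hypothetical record for illustration; it does not describe any image in the sample.
example = Annotation(
    image="post0",
    ratings={**{dim: "no" for dim in DIMENSIONS}, "Substances": "minor"},
    overall="appropriate",
    explanation="A person holds a glass with unidentifiable contents at a dinner table.",
)
print(example.validate())  # prints []
```

A check like this would catch typos in a hand-edited spreadsheet before the annotations are compared against model output.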
#### Final Note
This list is not exhaustive. In practice, content moderation may involve additional categories and a more nuanced classification system.

### Annotated Images

In [1]:
import os
import pandas as pd
from PIL import Image
from IPython.display import display, HTML

# Set the folder path containing the images (adjust if needed)
folder_path = "posts"

# Load CSV annotations
csv_path = "inappropriate_analysis - Sheet1.csv"  # Update path if needed
df_human = pd.read_csv(csv_path)

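# Function to append "violation" to annotation values
def format_violation(value):
    # Only dimension ratings ("no", "minor", "major") get the suffix;
    # overall ratings and non-string values (e.g. NaN from empty cells) pass through unchanged
    if isinstance(value, str) and value.strip().lower() in ["no", "minor", "major"]:
        return f"{value.strip()} violation"
    return value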

# Get all image filenames (assuming there are exactly 8 images)
image_files = sorted([f for f in os.listdir(folder_path) if f.lower().endswith((".jpg", ".jpeg", ".png"))])[:8]

# Loop through images and display them with annotations
for image_file in image_files:
    img_path = os.path.join(folder_path, image_file)
    img = Image.open(img_path)
    img.thumbnail((250, 250))  # Resize for display

    # Extract image name without extension
    image_name = os.path.splitext(image_file)[0]

    # Find the corresponding annotation in the CSV
    annotation_row = df_human[df_human["Image"] == image_name]

    # If annotation exists, format it; otherwise use a placeholder so that
    # annotation_text is always defined and never reused from a previous image
    if not annotation_row.empty:
        row = annotation_row.iloc[0]
        annotation_text = f"<b>Human Annotations for {image_file}:</b><br>"
        annotation_text += "<br>".join([f"<b>{col}:</b> {format_violation(row[col])}" for col in ["Attire", "Sexual Content", "Gestures", "Substances", "Violence", "Gore"]])
        annotation_text += "<br>"  # Added extra line break here
        annotation_text += f"<b>Overall:</b> {format_violation(row['Overall'])}<br>"
        annotation_text += f"<br><br><b>Explanation:</b> {row['Explanation']}"
    else:
        annotation_text = f"<b>No human annotation found for {image_file}</b>"

    # Print text and display image side by side
    display(HTML(f"""
    <div style="display: flex; align-items: center; margin-bottom: 20px;">
        <div style="margin-right: 20px;"> <img src="{img_path}" width="250"> </div>
        <div style="max-width: 600px; font-size: 14px;"> {annotation_text} </div>
    </div>
    <hr>
    """))

### Image Classification with CLIP
CLIP is a publicly available neural network developed by OpenAI. According to its [GitHub Repository](https://github.com/openai/CLIP):
> CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

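To use the model, an image and a set of natural-language prompts are provided as input, and the model returns one probability per prompt. Because these probabilities are a softmax over all the supplied prompts, each value indicates how well that prompt matches the image relative to the other prompts, rather than an independent likelihood that the described content is present.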
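
The probabilities in this notebook come from a softmax over all prompts supplied together (the `logits_per_image.softmax(dim=-1)` call in the classification cell), so the values for one image sum to 1 and compete with one another. The sketch below illustrates that behavior in plain Python; the logit values are hypothetical and chosen only for illustration, not taken from CLIP.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical image-text logits for three prompts (illustrative values only).
prompts = ["skimpy clothing", "alcohol", "violence"]
logits = [24.0, 22.0, 18.0]
probs = dict(zip(prompts, softmax(logits)))

# Adding one more strongly matching prompt lowers every other probability,
# even though the image itself has not changed.
probs_with_extra = dict(zip(prompts + ["smoking"], softmax(logits + [24.0])))

for name in prompts:
    print(f"{name}: {probs[name]:.3f} -> {probs_with_extra[name]:.3f}")
```

One consequence is that an image showing several kinds of inappropriate content splits its probability mass across the matching prompts, which can keep every individual dimension below a fixed threshold.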

One challenge was crafting prompts that accurately aligned with the annotation dimensions and were consistently detected by the model with a significant probability. For example, prompts like “bare skin” and “exposed skin” returned very low probabilities, while “skimpy clothing” was much more effective. Some dimensions required multiple prompts to improve detection. With larger datasets, prompt optimization could enhance accuracy, but for this project, the following prompts were used:

- Attire – "skimpy clothing"
- Sexual Content – "sexually suggestive", "sexual content"
- Gestures – "offensive gesture"
- Substances (Alcohol, Tobacco, Drugs) – "smoking", "cigarette", "alcohol", "drugs"
- Violence – "violence"
- Gore – "gore"

### Mapping CLIP Probabilities to Annotation Ratings
CLIP can detect the presence of offensive content, but unlike a human annotator, it cannot explicitly assess the degree of harm posed by that content. To compare CLIP’s predictions with human annotations, we map the highest probability among the prompts for a given dimension to a categorical rating using the following function:
$$
\text{rating}(p) =
\begin{cases}
\text{Major Violation}, & p \geq 0.5 \\
\text{Minor Violation}, & 0.1 \leq p < 0.5 \\
\text{No Violation}, & p < 0.1
\end{cases}
$$
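
As a worked instance of this mapping, take the Substances dimension for post3, using the raw probabilities reported in the CLIP output further down (smoking 0.087, cigarette 0.017, alcohol 0.440, drugs 0.028, rounded to three decimals):

$$
p_{\text{Substances}} = \max(0.087,\ 0.017,\ 0.440,\ 0.028) = 0.440, \qquad 0.1 \leq 0.440 < 0.5 \;\Rightarrow\; \text{Minor Violation}
$$

This agrees with the "minor" rating that the notebook assigns to Substances for post3.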

### Annotations Using CLIP

In [2]:
import os
import torch
import clip
import pandas as pd
from PIL import Image

# Load CLIP model and preprocessing
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Define classification prompts (order corresponds to category grouping)
text_prompts = [
    "skimpy clothing",         # Attire
    "sexually suggestive",     # Sexual Content
    "sexual content",          # Sexual Content
    "smoking",                 # Substances
    "cigarette",               # Substances
    "alcohol",                 # Substances
    "drugs",                   # Substances
    "offensive gesture",       # Gestures
    "violence",                # Violence
    "gore"                     # Gore
]
text_inputs = clip.tokenize(text_prompts).to(device)

# Set the folder path containing the images
folder_path = "posts"

# List to accumulate CLIP raw probabilities
results_list = []

# Loop through images and compute CLIP probabilities
for image_file in sorted(os.listdir(folder_path)):
    if image_file.lower().endswith((".jpg", ".jpeg", ".png")):
        img_path = os.path.join(folder_path, image_file)
        image = Image.open(img_path)
        # Process image using CLIP's preprocessing pipeline
        processed_image = preprocess(image).unsqueeze(0).to(device)

        # Run CLIP classification
        with torch.no_grad():
            logits_per_image, _ = model(processed_image, text_inputs)
            probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

        # Create a dictionary for raw probabilities for each prompt
        prob_dict = {"Image": os.path.splitext(image_file)[0]}
        for idx, prompt in enumerate(text_prompts):
            prob_dict[prompt] = probs[idx]
        results_list.append(prob_dict)

# Create a DataFrame from the results list
df_prob = pd.DataFrame(results_list)

In [3]:
df_prob

| | Image | skimpy clothing | sexually suggestive | sexual content | smoking | cigarette | alcohol | drugs | offensive gesture | violence | gore |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | post1 | 0.000128 | 0.00819 | 0.823498 | 0.004925 | 0.005398 | 0.005393 | 0.089631 | 0.005767 | 0.052816 | 0.004253 |
| 1 | post2 | 0.012965 | 0.030211 | 0.007404 | 0.00643 | 0.022654 | 0.00074 | 0.014749 | 0.060595 | 0.308278 | 0.535973 |
| 2 | post3 | 0.00679 | 0.035207 | 0.30418 | 0.086693 | 0.017033 | 0.439727 | 0.028007 | 0.058659 | 0.019071 | 0.004634 |
| 3 | post4 | 0.00036 | 0.038146 | 0.105708 | 0.042967 | 0.008827 | 0.014975 | 0.027154 | 0.704586 | 0.021907 | 0.035371 |
| 4 | post5 | 0.816471 | 0.01022 | 0.006519 | 0.113479 | 0.025846 | 0.001396 | 0.0048 | 0.015908 | 0.002045 | 0.003317 |
| 5 | post6 | 0.218317 | 0.190915 | 0.014945 | 0.232233 | 0.305086 | 0.007764 | 0.005841 | 0.011401 | 0.003102 | 0.010396 |
| 6 | post7 | 0.002066 | 0.036958 | 0.003245 | 0.039372 | 0.054113 | 0.003753 | 0.029271 | 0.817967 | 0.006878 | 0.006377 |
| 7 | post8 | 0.002486 | 0.661263 | 0.289163 | 0.002036 | 0.000969 | 0.001518 | 0.003016 | 0.000631 | 0.018643 | 0.020277 |


In [4]:
import pandas as pd

# Define a mapping function that converts a probability to a rating.
def map_probability(prob):
    if prob >= 0.5:
        return "major"
    elif prob >= 0.1:
        return "minor"
    else:
        return "no"

# Define which columns (from df_prob) correspond to each category.
# Note: For Sexual Content we use the prompts "sexually suggestive" and "sexual content".
# For Substances we use the prompts "smoking", "cigarette", "alcohol", "drugs".
category_mapping = {
    "Attire": ["skimpy clothing"],
    "Sexual Content": ["sexually suggestive", "sexual content"],
    "Substances": ["smoking", "cigarette", "alcohol", "drugs"],
    "Gestures": ["offensive gesture"],
    "Violence": ["violence"],
    "Gore": ["gore"]
}

results_list = []

# Iterate over each row in df_prob.
for idx, row in df_prob.iterrows():
    new_row = {"Image": row["Image"]}
    major_count = 0
    minor_count = 0
    
    # For each category, compute the maximum probability among its associated prompts.
    for category, prompt_cols in category_mapping.items():
        max_prob = max(row[col] for col in prompt_cols)
        rating = map_probability(max_prob)
        new_row[category] = rating
        
        # Count the number of "major" and "minor" ratings.
        if rating == "major":
            major_count += 1
        elif rating == "minor":
            minor_count += 1
    
    # Apply overall rules:
    #   one or more major violation OR 4+ minor violations -> inappropriate
    #   2-3 minor violations -> questionable
    #   0-1 minor violations -> appropriate
    if major_count >= 1 or minor_count >= 4:
        overall = "inappropriate"
    elif 2 <= minor_count <= 3:
        overall = "questionable"
    else:
        overall = "appropriate"
    new_row["Overall"] = overall
    
    results_list.append(new_row)

# Create the DataFrame df_clip from the accumulated results.
df_clip = pd.DataFrame(results_list)

In [5]:
df_clip

| | Image | Attire | Sexual Content | Substances | Gestures | Violence | Gore | Overall |
|---|---|---|---|---|---|---|---|---|
| 0 | post1 | no | major | no | no | no | no | inappropriate |
| 1 | post2 | no | no | no | no | minor | major | inappropriate |
| 2 | post3 | no | minor | minor | no | no | no | questionable |
| 3 | post4 | no | minor | no | major | no | no | inappropriate |
| 4 | post5 | major | no | minor | no | no | no | inappropriate |
| 5 | post6 | minor | minor | minor | no | no | no | questionable |
| 6 | post7 | no | no | no | major | no | no | inappropriate |
| 7 | post8 | no | major | no | no | no | no | inappropriate |


In [10]:
import os
import pandas as pd
from IPython.display import display, HTML

# Assume df_clip and df_prob are already created.
# They should have a common "Image" column (file name without extension).

# Also assume text_prompts and categories are defined as follows:
text_prompts = [
    "skimpy clothing",         # Attire
    "sexually suggestive",     # Sexual Content
    "sexual content",          # Sexual Content
    "smoking",                 # Substances
    "cigarette",               # Substances
    "alcohol",                 # Substances
    "drugs",                   # Substances
    "offensive gesture",       # Gestures
    "violence",                # Violence
    "gore"                     # Gore
]

categories = {
    "Attire": [0],
    "Sexual Content": [1, 2],
    "Substances": [3, 4, 5, 6],
    "Gestures": [7],
    "Violence": [8],
    "Gore": [9]
}

# Set the folder path containing the images
folder_path = "posts"

# Define fixed column widths for the HTML table
col1_width = "150px"  # for dimension and rating
col2_width = "250px"  # for prompt
col3_width = "80px"   # for probability

# Loop through each row in df_clip and display annotations alongside raw probabilities from df_prob
for idx, clip_row in df_clip.iterrows():
    image_name = clip_row["Image"]
    
    # Construct full image path by checking common extensions
    found_image = None
    for ext in [".jpg", ".jpeg", ".png"]:
        candidate = os.path.join(folder_path, f"{image_name}{ext}")
        if os.path.exists(candidate):
            found_image = candidate
            break
    if found_image is None:
        continue

    # Retrieve corresponding row from df_prob
    prob_df = df_prob[df_prob["Image"] == image_name]
    if prob_df.empty:
        continue
    prob_row = prob_df.iloc[0]
    
    # Build the HTML table for annotations
    table_html = "<table style='font-size:14px; border-collapse: collapse;'>"
    # For each category, add rows for each prompt in that category.
    for category, indices in categories.items():
        cat_rating = clip_row[category]  # the CLIP rating (e.g., major, minor, no)
        for j, idx_val in enumerate(indices):
            prompt_text = text_prompts[idx_val]
            raw_prob = prob_row[prompt_text]
            if j == 0:
                # For the first prompt, show the category and its rating in the first column.
                table_html += (
                    "<tr>"
                    f"<td style='padding: 4px; vertical-align: top; width: {col1_width};'><b>{category}</b>: {cat_rating}</td>"
                    f"<td style='padding: 4px; vertical-align: top; width: {col2_width};'>{prompt_text}</td>"
                    f"<td style='padding: 4px; vertical-align: top; text-align: right; width: {col3_width};'>{raw_prob:0.2f}</td>"
                    "</tr>"
                )
            else:
                # For subsequent prompts, leave the first cell empty.
                table_html += (
                    "<tr>"
                    f"<td style='padding: 4px; vertical-align: top; width: {col1_width};'></td>"
                    f"<td style='padding: 4px; vertical-align: top; width: {col2_width};'>{prompt_text}</td>"
                    f"<td style='padding: 4px; vertical-align: top; text-align: right; width: {col3_width};'>{raw_prob:0.2f}</td>"
                    "</tr>"
                )
    # Append overall rating on its own row, placing it in the first column only.
    table_html += (
        "<tr>"
        f"<td style='padding: 8px; vertical-align: top; width: {col1_width};'><b>Overall</b>: {clip_row['Overall']}</td>"
        f"<td style='padding: 8px; vertical-align: top; width: {col2_width};'></td>"
        f"<td style='padding: 8px; vertical-align: top; width: {col3_width};'></td>"
        "</tr>"
    )
    table_html += "</table>"
    
    # Create the complete HTML block with the image on the left and the table on the right.
    html_block = f"""
    <div style="display: flex; align-items: flex-start; margin-bottom: 20px;">
      <div style="margin-right: 20px;">
         <img src="{found_image}" width="250">
      </div>
      <div style="max-width: 600px;">
         {table_html}
      </div>
    </div>
    <hr>
    """
    
    display(HTML(html_block))

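post1

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.00 |
| Sexual Content: major | sexually suggestive | 0.01 |
| | sexual content | 0.82 |
| Substances: no | smoking | 0.00 |
| | cigarette | 0.01 |
| | alcohol | 0.01 |
| | drugs | 0.09 |
| Gestures: no | offensive gesture | 0.01 |
| Violence: no | violence | 0.05 |
| Gore: no | gore | 0.00 |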


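post2

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.01 |
| Sexual Content: no | sexually suggestive | 0.03 |
| | sexual content | 0.01 |
| Substances: no | smoking | 0.01 |
| | cigarette | 0.02 |
| | alcohol | 0.00 |
| | drugs | 0.01 |
| Gestures: no | offensive gesture | 0.06 |
| Violence: minor | violence | 0.31 |
| Gore: major | gore | 0.54 |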


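post3

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.01 |
| Sexual Content: minor | sexually suggestive | 0.04 |
| | sexual content | 0.30 |
| Substances: minor | smoking | 0.09 |
| | cigarette | 0.02 |
| | alcohol | 0.44 |
| | drugs | 0.03 |
| Gestures: no | offensive gesture | 0.06 |
| Violence: no | violence | 0.02 |
| Gore: no | gore | 0.00 |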


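post4

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.00 |
| Sexual Content: minor | sexually suggestive | 0.04 |
| | sexual content | 0.11 |
| Substances: no | smoking | 0.04 |
| | cigarette | 0.01 |
| | alcohol | 0.01 |
| | drugs | 0.03 |
| Gestures: major | offensive gesture | 0.70 |
| Violence: no | violence | 0.02 |
| Gore: no | gore | 0.04 |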


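post5

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: major | skimpy clothing | 0.82 |
| Sexual Content: no | sexually suggestive | 0.01 |
| | sexual content | 0.01 |
| Substances: minor | smoking | 0.11 |
| | cigarette | 0.03 |
| | alcohol | 0.00 |
| | drugs | 0.00 |
| Gestures: no | offensive gesture | 0.02 |
| Violence: no | violence | 0.00 |
| Gore: no | gore | 0.00 |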


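post6

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: minor | skimpy clothing | 0.22 |
| Sexual Content: minor | sexually suggestive | 0.19 |
| | sexual content | 0.01 |
| Substances: minor | smoking | 0.23 |
| | cigarette | 0.31 |
| | alcohol | 0.01 |
| | drugs | 0.01 |
| Gestures: no | offensive gesture | 0.01 |
| Violence: no | violence | 0.00 |
| Gore: no | gore | 0.01 |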


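post7

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.00 |
| Sexual Content: no | sexually suggestive | 0.04 |
| | sexual content | 0.00 |
| Substances: no | smoking | 0.04 |
| | cigarette | 0.05 |
| | alcohol | 0.00 |
| | drugs | 0.03 |
| Gestures: major | offensive gesture | 0.82 |
| Violence: no | violence | 0.01 |
| Gore: no | gore | 0.01 |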


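post8

| Dimension: rating | Prompt | Probability |
|---|---|---:|
| Attire: no | skimpy clothing | 0.00 |
| Sexual Content: major | sexually suggestive | 0.66 |
| | sexual content | 0.29 |
| Substances: no | smoking | 0.00 |
| | cigarette | 0.00 |
| | alcohol | 0.00 |
| | drugs | 0.00 |
| Gestures: no | offensive gesture | 0.00 |
| Violence: no | violence | 0.02 |
| Gore: no | gore | 0.02 |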


### CLIP Results
The probabilities given by CLIP vary in quality, producing good annotations for some images but poor ones for others. Let's see how they compare with the human annotations. The table below shows, for each image, the number of dimensions whose ratings did not match and whether the overall ratings matched.

In [11]:
# List of dimension columns to compare
dimensions_list = ["Attire", "Sexual Content", "Gestures", "Substances", "Violence", "Gore"]

results = []

# Loop through each row in the human annotation DataFrame
for idx, human_row in df_human.iterrows():
    image_name = str(human_row["Image"]).strip()  # e.g., "post1"
    
    # Find the corresponding row in df_clip
    clip_matches = df_clip[df_clip["Image"] == image_name]
    if clip_matches.empty:
        continue
    clip_row = clip_matches.iloc[0]
    
    # Count the number of dimensions that do not match
    discrepancy_count = 0
    for dim in dimensions_list:
        human_val = str(human_row[dim]).strip().lower()
        clip_val = str(clip_row[dim]).strip().lower()
        if human_val != clip_val:
            discrepancy_count += 1
    
    # Compare the overall rating (case-insensitive)
    overall_match = (str(human_row["Overall"]).strip().lower() == str(clip_row["Overall"]).strip().lower())
    
    results.append({
        "Image": image_name,
        "dimensions": discrepancy_count,
        "overall": overall_match
    })

# Create a DataFrame from the results and display it
df_diff = pd.DataFrame(results)

In [14]:
HTML(df_diff.to_html(index=False))

| Image | dimensions | overall |
|---|---|---|
| post1 | 1 | False |
| post2 | 1 | True |
| post3 | 2 | True |
| post4 | 2 | False |
| post5 | 2 | True |
| post6 | 1 | False |
| post7 | 0 | True |
| post8 | 1 | True |
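
The per-image results can be condensed into two raw agreement rates. The sketch below is an addition to the original notebook and simply re-enters the values from the table above; the variable names are illustrative. Raw agreement does not correct for agreement expected by chance, so it should be read as a rough summary only.

```python
# Per-image results copied from the comparison table:
# (image, number of mismatched dimensions, whether the overall rating matched)
comparison = [
    ("post1", 1, False),
    ("post2", 1, True),
    ("post3", 2, True),
    ("post4", 2, False),
    ("post5", 2, True),
    ("post6", 1, False),
    ("post7", 0, True),
    ("post8", 1, True),
]
NUM_DIMENSIONS = 6  # Attire, Sexual Content, Gestures, Substances, Violence, Gore

total_ratings = len(comparison) * NUM_DIMENSIONS
mismatched = sum(count for _, count, _ in comparison)
dimension_agreement = 1 - mismatched / total_ratings
overall_matches = sum(1 for _, _, matched in comparison if matched)
overall_agreement = overall_matches / len(comparison)

print(f"Dimension-level agreement: {dimension_agreement:.1%} "
      f"({total_ratings - mismatched}/{total_ratings})")
print(f"Overall agreement: {overall_agreement:.1%} ({overall_matches}/{len(comparison)})")
# Dimension-level agreement: 79.2% (38/48)
# Overall agreement: 62.5% (5/8)
```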


### Conclusion
Although CLIP, when properly prompted, performs reasonably well on the labels in this project, it still makes frequent errors even on a small sample: its overall rating matched the human rating for only five of the eight images, and 10 of the 48 per-dimension ratings disagreed. In its current form, the model is not suitable for classifying posts without human review, as even a single inappropriate post can significantly damage a brand's reputation and negatively impact children. While the model shows promise, with many classifications aligning with human annotators, it would require significant fine-tuning on high-quality data to be practically useful. Moreover, perceptions of what is inappropriate are inherently subjective, evolving over time and varying by demographic. A robust team of human annotators is essential not only to capture these shifting standards and analyze trends but also to generate the reliable training data necessary for future models. Human annotation will thus remain a vital component of content moderation, ensuring safe spaces while enabling models to eventually operate with reduced oversight.