<a href="https://colab.research.google.com/github/sisaruiz/tesi/blob/main/KOCD_survey_results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# KOCD Survey Results Analysis
This notebook presents the workflow adopted to analyse the responses collected through the human evaluation questionnaire on the Kaggle Oral Cancer Dataset (KOCD).

The experimental design, including the number of images, classes, and generation techniques, is described in detail in the thesis document. Here, we focus exclusively on the analysis of the collected responses.

The objective of this analysis is to assess human performance in two related tasks:

*   the classification of oral cavity images into cancerous and non-cancerous categories;
*   the discrimination between real images and AI-generated ones.

The pipeline consists of four phases:
1.   data loading
2.   data cleaning
3.   data restructuring
4.   metric computation

The same pipeline will be adopted later for analysing the results of the PhotoMOCI survey.

In [None]:
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt

## Data Loading

In [None]:
responses_raw = pd.read_csv("Survey KOCD.csv")
keys_raw = pd.read_csv("Survey KOCD (Keys) - Foglio1.csv")

print("Responses shape:", responses_raw.shape)
print("Keys shape:", keys_raw.shape)

responses_raw.head()


Responses shape: (15, 722)
Keys shape: (120, 6)


Unnamed: 0,Informazioni cronologiche,Risultato totale,Q1A: Which class does the image shown in the figure belong to?,Q1A: Which class does the image shown in the figure belong to? [Punteggio],Q1A: Which class does the image shown in the figure belong to? [Feedback],Q1B: Is the image shown in the figure AI generated or real?,Q1B: Is the image shown in the figure AI generated or real? [Punteggio],Q1B: Is the image shown in the figure AI generated or real? [Feedback],Q2A: Which class does the image shown in the figure belong to?,Q2A: Which class does the image shown in the figure belong to? [Punteggio],...,Q119A: Which class does the image shown in the figure belong to? [Feedback],Q119B: Is the image shown in the figure AI generated or real?,Q119B: Is the image shown in the figure AI generated or real? [Punteggio],Q119B: Is the image shown in the figure AI generated or real? [Feedback],Q120A: Which class does the image shown in the figure belong to?,Q120A: Which class does the image shown in the figure belong to? [Punteggio],Q120A: Which class does the image shown in the figure belong to? [Feedback],Q120B: Is the image shown in the figure AI generated or real?,Q120B: Is the image shown in the figure AI generated or real? [Punteggio],Q120B: Is the image shown in the figure AI generated or real? [Feedback]
0,2025/11/28 8:24:28 PM CET,7.00 / 240,Cancerous,0.00 / 1,,AI generated,1.00 / 1,,,,...,,,,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
1,2025/11/29 10:27:19 AM CET,142.00 / 240,Cancerous,0.00 / 1,,AI generated,1.00 / 1,,Cancerous,0.00 / 1,...,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,,Real,0.00 / 1,
2,2025/12/07 12:27:58 AM CET,194.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
3,2025/12/07 4:59:28 PM CET,194.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
4,2025/12/13 11:42:30 AM CET,172.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,


## Data Cleaning

Before the analysis, the collected responses are inspected to remove invalid submissions. While the KOCD questionnaire was being created, the author completed the form twice for testing purposes; the first two rows of the response dataset are therefore excluded from the analysis.

In [None]:
# Drop the two test submissions (positions 0 and 1) and work on a copy
responses = responses_raw.iloc[2:].copy()
responses

Unnamed: 0,Informazioni cronologiche,Risultato totale,Q1A: Which class does the image shown in the figure belong to?,Q1A: Which class does the image shown in the figure belong to? [Punteggio],Q1A: Which class does the image shown in the figure belong to? [Feedback],Q1B: Is the image shown in the figure AI generated or real?,Q1B: Is the image shown in the figure AI generated or real? [Punteggio],Q1B: Is the image shown in the figure AI generated or real? [Feedback],Q2A: Which class does the image shown in the figure belong to?,Q2A: Which class does the image shown in the figure belong to? [Punteggio],...,Q119A: Which class does the image shown in the figure belong to? [Feedback],Q119B: Is the image shown in the figure AI generated or real?,Q119B: Is the image shown in the figure AI generated or real? [Punteggio],Q119B: Is the image shown in the figure AI generated or real? [Feedback],Q120A: Which class does the image shown in the figure belong to?,Q120A: Which class does the image shown in the figure belong to? [Punteggio],Q120A: Which class does the image shown in the figure belong to? [Feedback],Q120B: Is the image shown in the figure AI generated or real?,Q120B: Is the image shown in the figure AI generated or real? [Punteggio],Q120B: Is the image shown in the figure AI generated or real? [Feedback]
2,2025/12/07 12:27:58 AM CET,194.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
3,2025/12/07 4:59:28 PM CET,194.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
4,2025/12/13 11:42:30 AM CET,172.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
5,2025/12/15 3:38:24 AM CET,191.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
6,2025/12/15 10:56:38 AM CET,168.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
7,2025/12/19 5:14:33 PM CET,185.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
8,2025/12/19 5:15:31 PM CET,190.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
9,2025/12/19 5:19:40 PM CET,198.00 / 240,Non Cancerous,1.00 / 1,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
10,2025/12/19 5:21:45 PM CET,175.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
11,2025/12/19 5:25:29 PM CET,166.00 / 240,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,,Non Cancerous,1.00 / 1,...,,Real,0.00 / 1,,Non Cancerous,1.00 / 1,,AI generated,1.00 / 1,
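
Dropping rows by position relies on the test submissions being the first two rows of the export. As a cross-check, the same rows can be selected by timestamp. The sketch below runs on a toy frame that mimics the export: the cutoff date is a hypothetical value chosen for illustration, not one stated in the thesis, and the parsing assumes that every timestamp ends with the `CET` suffix.

```python
import pandas as pd

# Toy frame mimicking the first rows of the Google Forms export
toy = pd.DataFrame({
    "Informazioni cronologiche": [
        "2025/11/28 8:24:28 PM CET",
        "2025/11/29 10:27:19 AM CET",
        "2025/12/07 12:27:58 AM CET",
    ],
    "Risultato totale": ["7.00 / 240", "142.00 / 240", "194.00 / 240"],
})

# Strip the time-zone suffix (assumed to be always " CET") and parse
timestamps = pd.to_datetime(
    toy["Informazioni cronologiche"].str.replace(" CET", "", regex=False),
    format="%Y/%m/%d %I:%M:%S %p",
)

# Hypothetical cutoff: submissions before this date are treated as tests
cutoff = pd.Timestamp("2025-12-01")
filtered = toy[timestamps >= cutoff]

print(filtered.index.tolist())
```

If the positional and the timestamp-based selections disagree, the export order has probably changed and the `iloc[2:]` slice should be reviewed.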


We then remove the feedback columns, which were not configured in the questionnaire but are added automatically by Google Forms to the exported spreadsheet.
The timestamp and total score columns are also removed, as they are not relevant to the analysis.
Finally, all columns containing the selected answer labels are discarded, since this information can be derived from the score assigned to each question (see the columns containing the string *[Punteggio]*) and from the ground-truth key file (`keys_raw`, loaded above).

In [None]:
# Keep only score columns
score_cols = [c for c in responses.columns if "punteggio" in c.lower()]
responses = responses[score_cols]

responses

Unnamed: 0,Q1A: Which class does the image shown in the figure belong to? [Punteggio],Q1B: Is the image shown in the figure AI generated or real? [Punteggio],Q2A: Which class does the image shown in the figure belong to? [Punteggio],Q2B: Is the image shown in the figure AI generated or real? [Punteggio],Q3A: Which class does the image shown in the figure belong to? [Punteggio],Q3B: Is the image shown in the figure AI generated or real? [Punteggio],Q4A: Which class does the image shown in the figure belong to? [Punteggio],Q4B: Is the image shown in the figure AI generated or real? [Punteggio],Q5A: Which class does the image shown in the figure belong to? [Punteggio],Q5B: Is the image shown in the figure AI generated or real? [Punteggio],...,Q116A: Which class does the image shown in the figure belong to? [Punteggio],Q116B: Is the image shown in the figure AI generated or real? [Punteggio],Q117A: Which class does the image shown in the figure belong to? [Punteggio],Q117B: Is the image shown in the figure AI generated or real? [Punteggio],Q118A: Which class does the image shown in the figure belong to? [Punteggio],Q118B: Is the image shown in the figure AI generated or real? [Punteggio],Q119A: Which class does the image shown in the figure belong to? [Punteggio],Q119B: Is the image shown in the figure AI generated or real? [Punteggio],Q120A: Which class does the image shown in the figure belong to? [Punteggio],Q120B: Is the image shown in the figure AI generated or real? [Punteggio]
2,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
3,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1
4,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
5,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
6,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1
7,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,...,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
8,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,...,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
9,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
10,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,1.00 / 1
11,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,...,1.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1,0.00 / 1,0.00 / 1,1.00 / 1,1.00 / 1
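
A quick sanity check can confirm that the column filter kept exactly one score column per question. Judging from the header shown above, the raw export appears to hold 2 metadata columns plus 3 columns (answer, score, feedback) for each of the 240 questions, which matches the reported 722 columns (2 + 240 × 3 = 722). The sketch below reproduces that layout on a toy frame with two images; the question text and the variable `n_images` are placeholders introduced for illustration.

```python
import pandas as pd

n_images = 2  # placeholder; the key file above lists 120 rows

# Rebuild the export layout: 2 metadata columns + 3 columns per question
cols = ["Informazioni cronologiche", "Risultato totale"]
for q in range(1, n_images + 1):
    for task in ("A", "B"):
        base = f"Q{q}{task}: question text"
        cols += [base, f"{base} [Punteggio]", f"{base} [Feedback]"]

toy = pd.DataFrame([["x"] * len(cols)], columns=cols)

# Same filter as in the cell above
score_cols = [c for c in toy.columns if "punteggio" in c.lower()]

expected = 2 * n_images  # one class question and one real/AI question per image
print(len(cols), len(score_cols), expected)
```

On the real data, the same comparison would be between `len(score_cols)` and twice the number of rows in `keys_raw`, assuming the key file has one row per image.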


Score values exported by Google Forms are converted to integers to enable quantitative analysis.

In [None]:
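def parse_score(value):
    """Convert a Google Forms score such as "1.00 / 1" to an integer (1)."""
    if isinstance(value, str):
        # Keep the part before the slash, e.g. "1.00", and cast it
        return int(float(value.split("/")[0].strip()))
    # Non-string values (e.g. missing values) are left unchanged
    return value

# Element-wise conversion; DataFrame.applymap is deprecated since pandas 2.1
responses = responses.apply(lambda col: col.map(parse_score))

responses.head()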


Unnamed: 0,Q1A: Which class does the image shown in the figure belong to? [Punteggio],Q1B: Is the image shown in the figure AI generated or real? [Punteggio],Q2A: Which class does the image shown in the figure belong to? [Punteggio],Q2B: Is the image shown in the figure AI generated or real? [Punteggio],Q3A: Which class does the image shown in the figure belong to? [Punteggio],Q3B: Is the image shown in the figure AI generated or real? [Punteggio],Q4A: Which class does the image shown in the figure belong to? [Punteggio],Q4B: Is the image shown in the figure AI generated or real? [Punteggio],Q5A: Which class does the image shown in the figure belong to? [Punteggio],Q5B: Is the image shown in the figure AI generated or real? [Punteggio],...,Q116A: Which class does the image shown in the figure belong to? [Punteggio],Q116B: Is the image shown in the figure AI generated or real? [Punteggio],Q117A: Which class does the image shown in the figure belong to? [Punteggio],Q117B: Is the image shown in the figure AI generated or real? [Punteggio],Q118A: Which class does the image shown in the figure belong to? [Punteggio],Q118B: Is the image shown in the figure AI generated or real? [Punteggio],Q119A: Which class does the image shown in the figure belong to? [Punteggio],Q119B: Is the image shown in the figure AI generated or real? [Punteggio],Q120A: Which class does the image shown in the figure belong to? [Punteggio],Q120B: Is the image shown in the figure AI generated or real? [Punteggio]
2,1,1,1,1,0,0,0,0,1,0,...,1,1,1,1,1,1,0,1,1,1
3,1,1,1,1,0,0,1,1,1,1,...,1,1,1,1,1,0,0,0,1,1
4,1,1,1,1,0,1,0,0,1,0,...,1,1,1,1,1,1,1,1,1,1
5,1,1,1,1,1,1,1,1,0,1,...,1,1,1,1,1,1,0,1,1,1
6,1,1,1,1,1,1,0,1,1,0,...,1,1,1,0,1,1,0,0,1,1
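
Because `parse_score` returns non-string values unchanged, unanswered questions would pass through as missing values. A minimal check, sketched here on a toy frame, counts the missing entries and verifies that every remaining value is 0 or 1. The names `toy`, `parsed`, `n_missing`, and `all_binary` are introduced only for this example.

```python
import pandas as pd

def parse_score(value):
    # Same logic as the cell above
    if isinstance(value, str):
        return int(float(value.split("/")[0].strip()))
    return value

# Toy scores, including one unanswered question
toy = pd.DataFrame({
    "Q1A": ["1.00 / 1", "0.00 / 1"],
    "Q1B": ["0.00 / 1", None],
})
parsed = toy.apply(lambda col: col.map(parse_score))

# Number of unanswered questions across all respondents
n_missing = int(parsed.isna().sum().sum())

# Every non-missing value must be a binary score
all_binary = bool((parsed.isin([0, 1]) | parsed.isna()).all().all())

print(n_missing, all_binary)
```

How missing answers should be treated (excluded or counted as errors) is a design decision that this sketch does not settle.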


For ease of processing, column names are shortened to the corresponding question identifiers (QxA/QxB).

In [None]:
def simplify_column_name(col_name):
    match = re.search(r"(Q\d+[AB])", col_name)
    if match:
        return match.group(1)
    return col_name

responses = responses.rename(
    columns={c: simplify_column_name(c) for c in responses.columns}
)

responses.head()

Unnamed: 0,Q1A,Q1B,Q2A,Q2B,Q3A,Q3B,Q4A,Q4B,Q5A,Q5B,...,Q116A,Q116B,Q117A,Q117B,Q118A,Q118B,Q119A,Q119B,Q120A,Q120B
2,1,1,1,1,0,0,0,0,1,0,...,1,1,1,1,1,1,0,1,1,1
3,1,1,1,1,0,0,1,1,1,1,...,1,1,1,1,1,0,0,0,1,1
4,1,1,1,1,0,1,0,0,1,0,...,1,1,1,1,1,1,1,1,1,1
5,1,1,1,1,1,1,1,1,0,1,...,1,1,1,1,1,1,0,1,1,1
6,1,1,1,1,1,1,0,1,1,0,...,1,1,1,0,1,1,0,0,1,1
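
After renaming, it is worth checking that no two columns collapsed onto the same identifier, since duplicate column names would silently complicate later selection. The short identifiers also make it easy to separate the two tasks: `A` columns refer to the class question and `B` columns to the real/AI question. The following is an illustrative sketch on a handful of column names; the lists `task_a` and `task_b` are names assumed for this example.

```python
import re

def simplify_column_name(col_name):
    # Same logic as the cell above
    match = re.search(r"(Q\d+[AB])", col_name)
    if match:
        return match.group(1)
    return col_name

raw_cols = [
    "Q1A: Which class does the image shown in the figure belong to? [Punteggio]",
    "Q1B: Is the image shown in the figure AI generated or real? [Punteggio]",
    "Q120A: Which class does the image shown in the figure belong to? [Punteggio]",
    "Q120B: Is the image shown in the figure AI generated or real? [Punteggio]",
]
simplified = [simplify_column_name(c) for c in raw_cols]

# Identifiers must stay unique after shortening
has_duplicates = len(set(simplified)) != len(simplified)

# Split the identifiers by task
task_a = [c for c in simplified if c.endswith("A")]  # cancerous vs non-cancerous
task_b = [c for c in simplified if c.endswith("B")]  # real vs AI-generated

print(simplified, has_duplicates)
```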
