# **Turing Test**

This notebook shows how we selected the questionnaires for the Turing Test and presents the results we obtained.

In [1]:
import os
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [2]:
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), os.pardir))

## **Selecting the best questionnaires**

The selection of these questionnaires was based on specific criteria: the first questionnaire was chosen for its high intra-questionnaire similarity; the second was selected for its strong semantic similarity; and the third was identified as one of the best based on its serendipity measures.

The questionnaires submitted during the test are in [this folder](/results/turing_test/submitted).
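Each criterion amounts to ranking questionnaires by one score column and keeping the top entries. The sketch below illustrates that idea only: `top_questionnaires` is a hypothetical helper that is not part of this project, and the toy scores are invented. The column names mirror those used in the cells that follow.

```python
import pandas as pd


def top_questionnaires(scores, score_column, n=1):
    """Return the n rows with the highest score_column value (hypothetical helper)."""
    columns = ["EXPERIMENT_ID", "QUESTIONNAIRE_ID", score_column]
    return scores.nlargest(n, score_column)[columns].reset_index(drop=True)


# Toy data for illustration only; these are not real experiment results.
toy_scores = pd.DataFrame(
    {
        "EXPERIMENT_ID": ["exp_a", "exp_a", "exp_b"],
        "QUESTIONNAIRE_ID": [1, 2, 1],
        "ROUGE_L_F1": [0.10, 0.40, 0.25],
    }
)

best = top_questionnaires(toy_scores, "ROUGE_L_F1", n=2)
print(best)
```

The same call could be reused with `FINAL_SCORE_MEAN` or `SERENDIPITY_SCORE` as `score_column`, assuming the corresponding DataFrame carries the two identifier columns.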

In [4]:
result_path = os.path.join(PROJECT_ROOT, "results")
intraqst_sim_filename = "Intraquestionnaire_Syntactic_Similarity.csv"
sem_sim_filename = "SemanticSimilarity_questionnaires.csv"
serendipity_filename = "Serendipity_Scores.csv"

# Collect one DataFrame per experiment and concatenate once at the end;
# this avoids repeatedly concatenating onto an initially empty DataFrame.
intraqst_frames, sem_frames, serendipity_frames = [], [], []

for subfolder in os.listdir(result_path):
    experiment_path = os.path.join(result_path, subfolder)
    if not os.path.isdir(experiment_path):
        continue

    try:
        intraqst_sim_scores = pd.read_csv(os.path.join(experiment_path, intraqst_sim_filename))
        semsim_scores = pd.read_csv(os.path.join(experiment_path, sem_sim_filename))
        serendipity_scores = pd.read_csv(os.path.join(experiment_path, serendipity_filename))
    except FileNotFoundError:
        print(f"Files not found for experiment: {subfolder}")
        continue

    # Tag every row with the experiment it came from
    intraqst_sim_scores["EXPERIMENT_ID"] = subfolder
    semsim_scores["EXPERIMENT_ID"] = subfolder
    serendipity_scores["EXPERIMENT_ID"] = subfolder

    intraqst_frames.append(intraqst_sim_scores)
    sem_frames.append(semsim_scores)
    serendipity_frames.append(serendipity_scores)

intraqst_sims = pd.concat(intraqst_frames)
sem_sims = pd.concat(sem_frames)
serendipity = pd.concat(serendipity_frames)



Files not found for experiment: 0s_gpt-35-turbo-dev_6000MT_0T_1FP
Files not found for experiment: failures_log
Files not found for experiment: turing_test


In [4]:
intraqst_sims[intraqst_sims["ROUGE_L_F1"] >= 0.14][["EXPERIMENT_ID", "QUESTIONNAIRE_ID", "ROUGE_L_F1"]]

Unnamed: 0,EXPERIMENT_ID,QUESTIONNAIRE_ID,ROUGE_L_F1
0,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0.5FP,1173005,0.159395
1,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0.5FP,1213249,0.250000
2,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0.5FP,500100043,0.386168
0,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,1146002,0.277665
1,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,1146004,0.401602
...,...,...,...
1,1s_gpt-4-dev_4000MT_0T_1FP_JSON,500100007,0.333333
3,1s_gpt-4-dev_4000MT_0T_1FP_JSON,500100023,0.224831
4,1s_gpt-4-dev_4000MT_0T_1FP_JSON,500100028,0.416469
5,1s_gpt-4-dev_4000MT_0T_1FP_JSON,500100029,0.358333


In [22]:
sem_sims[sem_sims["FINAL_SCORE_MEAN"].max() == sem_sims["FINAL_SCORE_MEAN"]][["EXPERIMENT_ID", "QUESTIONNAIRE_ID", "FINAL_SCORE_MEAN"]]

Unnamed: 0,EXPERIMENT_ID,QUESTIONNAIRE_ID,FINAL_SCORE_MEAN
40,1s_gpt-35-turbo-dev_6000MT_0.25T_0FP,500100028,0.865941


In [6]:
serendipity[serendipity["SERENDIPITY_SCORE"].max() > serendipity["SERENDIPITY_SCORE"]][["EXPERIMENT_ID", "QUESTIONNAIRE_ID", "SERENDIPITY_SCORE"]].sort_values(by="SERENDIPITY_SCORE", ascending=False).head(20)

Unnamed: 0,EXPERIMENT_ID,QUESTIONNAIRE_ID,SERENDIPITY_SCORE
9,0s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,5100002,1.2
39,1s_FULL_gpt-4-dev_4000MT_0T_0FP_JSON,500100027,1.2
38,1s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,500100027,1.2
19,1s_FULL_gpt-4-dev_4000MT_0.5T_0FP_JSON,500100007,1.166667
8,1s_FULL_gpt-4-dev_4000MT_0.25T_0.5FP_JSON,5100002,1.1
39,1s_FULL_gpt-35-turbo-dev_6000MT_0.5T_0FP,500100027,1.1
49,1s_FULL_gpt-4-dev_4000MT_0.25T_0FP_JSON,500100037,1.1
9,1s_FULL_gpt-4-dev_4000MT_0.5T_0FP_JSON,5100002,1.1
38,1s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,500100027,1.1
9,1s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,5100002,1.1


In [7]:
serendipity[serendipity["SERENDIPITY_SCORE"] > 1][["EXPERIMENT_ID", "QUESTIONNAIRE_ID", "SERENDIPITY_SCORE"]]

Unnamed: 0,EXPERIMENT_ID,QUESTIONNAIRE_ID,SERENDIPITY_SCORE
9,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,5100002,1.1
39,0s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,500100027,1.3
39,0s_FULL_gpt-35-turbo-dev_6000MT_0.5T_0FP,500100027,1.1
9,0s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,5100002,1.2
39,0s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,500100027,1.3
19,0s_FULL_gpt-4-dev_4000MT_0.5T_0FP_JSON,500100007,1.083333
19,0s_FULL_gpt-4-dev_4000MT_0T_0FP_JSON,500100007,1.083333
38,1s_FULL_gpt-35-turbo-dev_6000MT_0.25T_0FP,500100027,1.1
39,1s_FULL_gpt-35-turbo-dev_6000MT_0.5T_0FP,500100027,1.1
9,1s_FULL_gpt-35-turbo-dev_6000MT_0T_0FP,5100002,1.1


In [8]:
serendipity[serendipity["EXPERIMENT_ID"] == "1s_gpt-4-dev_4000MT_0.5T_0FP_JSON"]

Unnamed: 0,QUESTIONNAIRE_ID,SERENDIPITY_SCORE,EXPERIMENT_ID
0,1146002,1.000000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
1,1146004,0.875000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
2,1173001,0.875000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
3,1173002,0.666667,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
4,1173003,0.750000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
...,...,...,...
74,500100062,0.777778,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
75,500100063,1.000000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
76,500100064,0.900000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON
77,500100065,0.900000,1s_gpt-4-dev_4000MT_0.5T_0FP_JSON


## **Results**

Here we analyze the collected answers to understand which characteristics the participants took into account when deciding which questionnaire was AI-generated.

The collected results are in [this folder](/results/turing_test/answers/).
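One quantity such an analysis can report is the share of respondents who correctly identified the AI-generated questionnaire. The following is a minimal sketch under stated assumptions: `identification_rate` is a hypothetical helper, the `Choice` column name is a placeholder, and the toy answers are invented rather than taken from the collected data.

```python
import pandas as pd


def identification_rate(answers, choice_column, correct_answer):
    """Fraction of respondents whose choice equals correct_answer (hypothetical helper)."""
    choices = answers[choice_column].dropna()
    if choices.empty:
        return float("nan")
    return float((choices == correct_answer).mean())


# Toy answers for illustration only; not the collected survey data.
toy_answers = pd.DataFrame({"Choice": ["A", "B", "B", "A", "B"]})

rate = identification_rate(toy_answers, "Choice", "B")
print(f"Correct identifications: {rate:.0%}")
```

A rate close to 50% would suggest that respondents could not reliably tell the two questionnaires apart, although with only a handful of respondents such a figure should be read cautiously.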

In [9]:
ANSWERS_PATH = os.path.join(PROJECT_ROOT, "results", "turing_test", "answers")

KICKOFF_MEETING_PATH = os.path.join(ANSWERS_PATH, "KickOffMeeting.csv")
EMPLOYEE_FEEDBACK_PATH = os.path.join(ANSWERS_PATH, "EmployeeFeedback.csv")
STRESS_TOLERANCE_PATH = os.path.join(ANSWERS_PATH, "StressTolerance.csv")

In [10]:
km_results = pd.read_csv(KICKOFF_MEETING_PATH)
ef_results = pd.read_csv(EMPLOYEE_FEEDBACK_PATH)
st_results = pd.read_csv(STRESS_TOLERANCE_PATH)

In [11]:
km_results

Unnamed: 0,ID,Start time,Completion time,Email,Name,Company Kick-off - Which Questionnaire is AI Generated?,"Why the questionnaire you chose is, according to you, AI generated?"
0,1,6/28/24 12:14:28,6/28/24 12:14:42,anonymous,,B,Language style;
1,2,6/28/24 12:14:21,6/28/24 12:14:42,anonymous,,B,Variability of questions;
2,3,6/28/24 12:14:22,6/28/24 12:14:43,anonymous,,B,Language style;Consistency between question an...
3,4,6/28/24 12:14:21,6/28/24 12:14:55,anonymous,,A,Variability of response types;Language style;
4,5,6/28/24 12:14:26,6/28/24 12:14:55,anonymous,,A,Consistency between question and related answe...
5,6,6/28/24 12:14:20,6/28/24 12:14:57,anonymous,,B,Variability of answers;
6,7,6/28/24 12:14:23,6/28/24 12:14:58,anonymous,,B,Questions sequence/order;
7,8,6/28/24 12:14:22,6/28/24 12:15:06,anonymous,,A,Questions sequence/order;Variability of respon...
8,9,6/28/24 12:14:49,6/28/24 12:15:13,anonymous,,A,Relevance to topic;Consistency between questio...
9,10,6/28/24 12:14:20,6/28/24 12:15:18,anonymous,,B,Variability of questions;Variability of answer...


In [12]:
ef_results

Unnamed: 0,ID,Start time,Completion time,Email,Name,Employee's feedback - Which Questionnaire is AI Generated?,"Why the questionnaire you chose is, according to you, AI generated?"
0,1,6/28/24 12:17:01,6/28/24 12:17:19,anonymous,,B,Variability of questions;Language style;
1,2,6/28/24 12:17:01,6/28/24 12:17:20,anonymous,,A,Consistency between question and response;Lang...
2,3,6/28/24 12:17:01,6/28/24 12:17:27,anonymous,,A,Variability of questions;
3,4,6/28/24 12:17:04,6/28/24 12:17:27,anonymous,,A,
4,5,6/28/24 12:17:00,6/28/24 12:17:31,anonymous,,A,Consistency between question and response;
5,6,6/28/24 12:17:01,6/28/24 12:17:36,anonymous,,A,Relevance to topic;
6,7,6/28/24 12:17:01,6/28/24 12:17:36,anonymous,,A,Language style;Variability of questions;
7,8,6/28/24 12:17:00,6/28/24 12:17:37,anonymous,,B,Variability of responses;
8,9,6/28/24 12:17:00,6/28/24 12:17:41,anonymous,,B,Language style;Consistency between question an...
9,10,6/28/24 12:17:00,6/28/24 12:17:47,anonymous,,B,Language style;Variability of response types;


In [13]:
st_results

Unnamed: 0,ID,Start time,Completion time,Email,Name,Stress tolerance feedback - Which Questionnaire is AI Generated?,"Why the questionnaire you chose is, according to you, AI generated?"
0,1,6/28/24 12:19:53,6/28/24 12:20:08,anonymous,,B,Language style;
1,2,6/28/24 12:19:51,6/28/24 12:20:10,anonymous,,B,Variability of questions;Relevance to topic;
2,3,6/28/24 12:19:51,6/28/24 12:20:14,anonymous,,B,Language style;Relevance to topic;
3,4,6/28/24 12:19:51,6/28/24 12:20:14,anonymous,,B,Variability of questions;Variability of answer...
4,5,6/28/24 12:19:51,6/28/24 12:20:15,anonymous,,A,Language style;Variability of response types;
5,6,6/28/24 12:19:55,6/28/24 12:20:16,anonymous,,B,Consistency between question and response;
6,7,6/28/24 12:19:51,6/28/24 12:20:16,anonymous,,B,Variability of answers;Variability of question...
7,8,6/28/24 12:19:52,6/28/24 12:20:19,anonymous,,A,Language style;
8,9,6/28/24 12:19:51,6/28/24 12:20:19,anonymous,,B,Language style;
9,10,6/28/24 12:19:51,6/28/24 12:20:23,anonymous,,A,Variability of response types;


In [14]:
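def plot_questionnaire_analysis(df, topic, choice_column, reason_column, correct_answer, width=800, height=600):
    """Build two bar charts: respondents' choices (green = correct) and their stated reasons per choice."""
    # Plot 1: Distribution of AI Generated Questionnaire Choices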
    choices_count = df[choice_column].value_counts().reset_index()
    choices_count.columns = [choice_column, 'Count']
    choices_count['Correct'] = choices_count[choice_column] == correct_answer

    fig1 = px.bar(
        choices_count, 
        x=choice_column, 
        y='Count', 
        color='Correct',
        color_discrete_map={True: 'green', False: 'red'},
        title=f'{topic} - Distribution of AI Generated Questionnaire Choices',
        labels={choice_column: 'Questionnaire', 'Count': 'Count of Respondents'}, 
        text='Count'
    )
    fig1.update_traces(texttemplate='%{text:.2s}', textposition='outside')
    fig1.update_layout(
        uniformtext_minsize=8, 
        uniformtext_mode='hide',
        width=width,
        height=height
    )

    # Plot 2: Distribution of Reasons Grouped by Questionnaire Choice
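    # Split the semicolon-separated reasons into one row per reason,
    # working on a copy so the caller's DataFrame is not modified.
    df_exploded = df.assign(Reasons=df[reason_column].str.split(';')).explode('Reasons')
    df_exploded['Reasons'] = df_exploded['Reasons'].str.strip()
    # Drop missing reasons and the empty string left by the trailing ';'
    df_exploded = df_exploded[df_exploded['Reasons'].notna() & (df_exploded['Reasons'] != '')]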

    reason_counts = df_exploded.groupby([choice_column, 'Reasons']).size().reset_index(name='Count')

    fig2 = px.bar(
        reason_counts, 
        x=choice_column, 
        y='Count', 
        color='Reasons', 
        title=f'{topic} - Distribution of Reasons for AI Generated Questionnaire Choices',
        labels={choice_column: 'Questionnaire', 'Count': 'Count of Reasons'}, 
        text='Count'
    )
    fig2.update_traces(texttemplate='%{text:.2s}', textposition='outside')
    fig2.update_layout(
        uniformtext_minsize=8, 
        uniformtext_mode='hide', 
        barmode='stack',
        width=width,
        height=height
    )

    return fig1, fig2
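The reason-counting step of the function can be checked in isolation on a tiny table. This is a sketch for illustration only: the `Choice` and `Why` column names and the three toy responses are assumptions, not the survey data.

```python
import pandas as pd

# Toy responses for illustration only; column names are hypothetical.
toy = pd.DataFrame(
    {
        "Choice": ["A", "B", "B"],
        "Why": ["Language style;", "Language style;Relevance to topic;", None],
    }
)

# One row per individual reason, without modifying the original frame.
exploded = toy.assign(Reasons=toy["Why"].str.split(";")).explode("Reasons")
exploded["Reasons"] = exploded["Reasons"].str.strip()
# Drop the empty string left by the trailing ';' and rows with no reason given.
exploded = exploded[exploded["Reasons"].notna() & (exploded["Reasons"] != "")]

reason_counts = exploded.groupby(["Choice", "Reasons"]).size().reset_index(name="Count")
print(reason_counts)
```

Because each answer ends with a `;`, splitting always yields a trailing empty string, which is why the filter on `''` is needed before counting.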

In [15]:
fig1, fig2 = plot_questionnaire_analysis(km_results, "Company Kick-off", "Company Kick-off - Which Questionnaire is AI Generated?", "Why the questionnaire you chose is, according to you, AI generated?", "B")

fig1.show()
fig2.show()

In [16]:
fig1, fig2 = plot_questionnaire_analysis(ef_results, "Employee's feedback", "Employee's feedback - Which Questionnaire is AI Generated?", "Why the questionnaire you chose is, according to you, AI generated?", "A")

fig1.show()
fig2.show()

In [17]:
fig1, fig2 = plot_questionnaire_analysis(st_results, "Stress tolerance", "Stress tolerance feedback - Which Questionnaire is AI Generated?", "Why the questionnaire you chose is, according to you, AI generated?", "B")

fig1.show()
fig2.show()