# Set up environment (Google Colab)

In [2]:
!pip install selenium
!pip install pandas
!pip install ollama

Collecting selenium
  Downloading selenium-4.39.0-py3-none-any.whl.metadata (7.5 kB)
Collecting trio<1.0,>=0.31.0 (from selenium)
  Downloading trio-0.32.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket<1.0,>=0.12.2 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting outcome (from trio<1.0,>=0.31.0->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket<1.0,>=0.12.2->selenium)
  Downloading wsproto-1.3.2-py3-none-any.whl.metadata (5.2 kB)
Downloading selenium-4.39.0-py3-none-any.whl (9.7 MB)
Downloading trio-0.32.0-py3-none-any.whl (512 kB)
Downloading trio_websocket-0.12.2-py3-none-any.whl (21 kB)
Downloadin

In [8]:
!sudo apt update
!sudo apt install -y pciutils
!curl -fsSL https://ollama.com/install.sh | sh

Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 https://cli.github.com/packages stable InRelease
Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [83.6 kB]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [2,201 kB]
Hit:6 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:7 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:8 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:9 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:10 

In [29]:
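import threading
import subprocess
import time

def run_ollama_serve():
    # Launch the Ollama server as a background process.
    # Popen returns immediately, so the server keeps running after this call.
    subprocess.Popen(["ollama", "serve"])

# Start the server from a separate thread so the notebook cell does not block
thread = threading.Thread(target=run_ollama_serve)
thread.start()

# Give the server a few seconds to start before sending requests
time.sleep(5)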
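
A fixed five-second sleep can be too short on a slow runtime and needlessly long on a fast one. The sketch below polls the server port until it accepts a connection. The helper name `wait_for_port` is hypothetical, and the default port 11434 is an assumption based on Ollama's default configuration; adjust it if the server is started differently.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 11434,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and retry.
            time.sleep(interval)
    return False
```

In this notebook, a call such as `wait_for_port()` right after `thread.start()` could stand in for the fixed sleep, with a `False` result treated as a failed start.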

# Prepare data/files for testing

In [30]:
import ollama
import json as js
import pandas as pd
import requests

In [31]:
ollama.pull("qwen3-vl:8b")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [40]:
# Paths to the screenshot files used in the evaluation
chatgpt_homepage_img_path = "images/chatgpt_loggedin.jpeg"
chatgpt_languages_img_path = "images/chatgpt_languages.jpeg"
claude_homepage_img_path = "images/claude_loggedin.jpeg"
claude_languages_img_path = "images/claude_languages.jpeg"

In [41]:
# Open the image file in binary read mode
with open(chatgpt_homepage_img_path, "rb") as f:
    chatgpt_bytes = f.read()

with open(chatgpt_languages_img_path, "rb") as f:
    chatgpt_languages_bytes = f.read()

with open(claude_homepage_img_path, "rb") as f:
    claude_bytes = f.read()

with open(claude_languages_img_path, "rb") as f:
    claude_languages_bytes = f.read()

# Test L4 Indicators using Qwen3-VL:8B - ChatGPT

## Create an empty dataframe for storing test results

In [63]:
chat_gpt_test_results_df = pd.DataFrame(columns=["Website", "L4_Indicator", "Assigned_Score", "Max_Score", "Reasoning"])

## L3 Subdimension: The AI is accessible and inclusive across abilities and language

### L4: WCAG-aligned accessibility features available

This L4 category covers a wide variety of possible accessibility features as defined in the [WCAG 2.1 Guidelines](https://www.w3.org/TR/WCAG21/). Because I am completing this project as an individual rather than as part of a group, these guidelines are too broad in scope to be covered fully by this work.

Instead, I have chosen to evaluate this L4 category as a composite score of all the following L4 categories that are evaluated in this notebook. The reasoning is that every rating assigned is based on one or more of the Success Criteria listed in the WCAG 2.1 guidelines, so each one falls under the umbrella of "WCAG-aligned accessibility features."

This notebook acts as a proof of concept that additional guidelines and features can be added to this evaluation pipeline, following the same structure, to build up this composite WCAG-alignment score, so long as the ratings are based on WCAG guidance, as they are in this project.

As a result, this section will be evaluated again at the end of this notebook once all other L4s have been scored. It is placed here for the time being to remain within the logical grouping defined in the AI Ethics Index Tree (under the L3 Subdimension: The AI is accessible and inclusive across abilities and languages).
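
One way the composite could be computed is sketched below: sum the assigned and maximum scores over the other L4 rows. The function name `composite_score` and the choice of a simple unweighted sum are assumptions for illustration, not necessarily the final method; the example rows reuse the ChatGPT scores reported later in this notebook.

```python
def composite_score(rows):
    """Sum assigned and maximum scores across L4 rows and report their ratio."""
    assigned = sum(row["Assigned_Score"] for row in rows)
    maximum = sum(row["Max_Score"] for row in rows)
    ratio = assigned / maximum if maximum else 0.0
    return {"Assigned_Score": assigned, "Max_Score": maximum, "Ratio": ratio}


# Scores taken from the ChatGPT results reported in this notebook.
rows = [
    {"L4_Indicator": "Multilingual support for intended locales",
     "Assigned_Score": 4, "Max_Score": 4},
    {"L4_Indicator": "Senior-focused simplification & stepwise guidance",
     "Assigned_Score": 0, "Max_Score": 2},
    {"L4_Indicator": "Senior motor and voice accessibility options",
     "Assigned_Score": 2, "Max_Score": 4},
    {"L4_Indicator": "Plain-language & readability thresholds met",
     "Assigned_Score": 8, "Max_Score": 10},
]

composite = composite_score(rows)
print(composite["Assigned_Score"], "/", composite["Max_Score"])
```

An unweighted sum gives indicators with more sub-checks more influence; averaging each indicator's ratio instead would weight them equally.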

### L4: Multilingual support for intended locales

In [64]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the ChatGPT help page listing the languages the platform is localized into. Your task is to scan the page for the list of supported languages, count them, then score the page based on the number of supported languages on the following scale:

0: The website supports 5 or fewer languages
1: The website supports 10 or fewer languages
2: The website supports 20 or fewer languages
3: The website supports 32 or fewer languages
4: The website supports 33 or more languages.

The response should be written in valid JSON format. The example below shows properly formatted output. The structure and the Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Multilingual support for intended locales",
        "Assigned_Score": ,
        "Max_Score": 4,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_languages_bytes]
)

# Save the model's response as json
eval_result = response['response']

Print the results of the prompt for easy reading

In [65]:
print(eval_result)

{
    "results": [
        {
            "Website": "ChatGPT",
            "L4_Indicator": "Multilingual support for intended locales",
            "Assigned_Score": 4,
            "Max_Score": 4,
            "Reasoning": "The page lists 39 supported languages, which meets the criterion for 33 or more languages (score 4)."
        }
    ]
}
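
The rubric in the prompt can also be written as a small deterministic function, which offers a way to cross-check the model's `Assigned_Score` against the language count it reports in its reasoning. This is a sketch only: `languages_to_score` is a hypothetical helper, and it assumes each band applies to the first threshold the count fits under.

```python
def languages_to_score(n_languages: int) -> int:
    """Map a supported-language count to the 0-4 scale used in the prompt."""
    # (upper bound, score) pairs, checked in ascending order.
    thresholds = [(5, 0), (10, 1), (20, 2), (32, 3)]
    for upper_bound, score in thresholds:
        if n_languages <= upper_bound:
            return score
    # 33 or more languages.
    return 4


# The model reported 39 languages for ChatGPT.
expected = languages_to_score(39)
print(expected)
```

If the model's score and this function disagree for the count the model itself reported, the response is worth re-running or reviewing by hand.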


Append results to the final dataframe for comparison later

In [66]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Multilingual support for intended locales',
   'Assigned_Score': 4,
   'Max_Score': 4,
   'Reasoning': 'The page lists 39 supported languages, which meets the criterion for 33 or more languages (score 4).'}]}
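
`js.loads` works here because the model returned clean JSON, but vision models sometimes wrap their output in a markdown code fence or add extra text. The following is a minimal sketch of a more defensive parser; `parse_eval_result` and `REQUIRED_FIELDS` are hypothetical names, and the validation rules are assumptions based on the example structure given in the prompt.

```python
import json

REQUIRED_FIELDS = ("Website", "L4_Indicator", "Assigned_Score", "Max_Score", "Reasoning")


def parse_eval_result(text: str) -> dict:
    """Extract and validate the JSON object embedded in a model response."""
    # Keep only the outermost {...} span, dropping code fences or stray prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(text[start:end + 1])

    for result in data["results"]:
        missing = [field for field in REQUIRED_FIELDS if field not in result]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        score, max_score = result["Assigned_Score"], result["Max_Score"]
        if not isinstance(score, int) or not 0 <= score <= max_score:
            raise ValueError(f"score {score!r} outside 0..{max_score}")
    return data
```

Raising on malformed output makes it easier to retry a prompt than to discover a bad row later in the results dataframe.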

In [67]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."


In [68]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df,new_df])
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."


### L4: Motor and voice accessibility options (voice input, large targets, reduced precision)

This L4 indicator is functionally identical to the L4 indicator "Senior motor and voice accessibility options (voice input, large targets, reduced precision)," so the scores given to that L4 are duplicated for this L4 category.

## L3 Subdimension: The AI supports accessibility & cognitive-load mitigation for seniors

### L4: Senior focused simplification and stepwise assistance available

In [69]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of help menu functionality on this home page based on the unmodified presentation of the homepage upon accessing it.

Based only on what you see in the image of the homepage, rate the accessibility of help functionality on the following scale ranging from 0 to 2:

0: There are no obvious help icons or links present on the homepage.
1: There are help icons or links available on the page, but they are not immediately obvious (for example, they are presented with only an icon and no text, or are in a very small size, smaller than 44 by 44 CSS pixels).
2: A help icon or link is available on the home page and is easily visible with text and in a size larger than 44 by 44 CSS pixels.

The response should be in valid JSON format. The example below shows properly formatted output. The structure and the Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Senior-focused simplification & stepwise guidance availabile",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']
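
Each indicator in this notebook repeats the same pattern: a task description, a scoring scale, a JSON template, and a call to the vision model. The sketch below factors that pattern into two helpers. `build_prompt` and `run_indicator` are hypothetical names, and the model call is passed in as a function (for example `ollama.generate`) so the helpers do not depend on a running server.

```python
import json
from typing import Callable


def build_prompt(task: str, scale: dict, website: str, indicator: str) -> str:
    """Assemble a rubric prompt with the JSON template the model should fill in."""
    max_score = max(scale)
    scale_lines = "\n".join(f"{score}: {text}" for score, text in sorted(scale.items()))
    template = (
        '{\n    "results": [\n        {\n'
        f'        "Website": {json.dumps(website)},\n'
        f'        "L4_Indicator": {json.dumps(indicator)},\n'
        '        "Assigned_Score": ,\n'
        f'        "Max_Score": {max_score},\n'
        '        "Reasoning": ""\n'
        '        }\n    ]\n}'
    )
    return (
        f"{task}\n\n{scale_lines}\n\n"
        "The response should be in valid JSON format. Only fill in "
        "Assigned_Score and Reasoning; keep every other field unchanged.\n\n"
        f"{template}\n"
    )


def run_indicator(generate: Callable, model: str, prompt: str, image: bytes) -> dict:
    """Send one prompt and image to the model and parse the JSON reply."""
    response = generate(model=model, prompt=prompt, images=[image])
    return json.loads(response["response"])
```

With the server running, a call might look like `run_indicator(ollama.generate, "qwen3-vl:8b", prompt, chatgpt_bytes)`, which also keeps `Max_Score` in the template tied to the scale rather than typed by hand.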

Print the results of the prompt for easy reading

In [70]:
print(eval_result)

{
    "results": [
        {
            "Website": "ChatGPT",
            "L4_Indicator": "Senior-focused simplification & stepwise guidance availabile",
            "Assigned_Score": 0,
            "Max_Score": 2,
            "Reasoning": "No help icons or links are present on the homepage. The top-right corner contains user and settings icons, but these do not constitute a help functionality and are not accompanied by descriptive text or sized appropriately for accessibility."
        }
    ]
}


In [71]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Senior-focused simplification & stepwise guidance availabile',
   'Assigned_Score': 0,
   'Max_Score': 2,
   'Reasoning': 'No help icons or links are present on the homepage. The top-right corner contains user and settings icons, but these do not constitute a help functionality and are not accompanied by descriptive text or sized appropriately for accessibility.'}]}

In [72]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...


In [73]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df,new_df])
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...


### L4: Senior motor and voice accessibility options (voice input, large targets, reduced precision)

In [74]:
l4_results = []

In [75]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of motor accessibility and voice input options.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All input targets (buttons, links, images, etc.) are below the WCAG recommended size of 44 by 44 CSS pixels.
1: Some input targets (buttons, links, images, etc.) are below the WCAG recommended size of 44 by 44 CSS pixels.
2: All input targets (buttons, links, images, etc.) are at or above the WCAG recommended size of 44 by 44 CSS pixels.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Senior motor and voice accessibility options (voice input, large targets, reduced precision)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [76]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': "The microphone icon for voice input and the '+' button next to 'Ask anything' are likely smaller than 44x44 CSS pixels, indicating some input targets are below the WCAG recommended size for motor accessibility."}]}
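
The model's reasoning says the targets are "likely" under 44 by 44 CSS pixels, which reflects that it is estimating sizes from a screenshot. Where measured element sizes are available (for instance from Selenium, which is installed above), the same 0 to 2 rubric could be applied deterministically. This is a sketch under that assumption; `target_size_score` is a hypothetical helper and the sizes in the example are made up for illustration.

```python
MIN_TARGET_PX = 44  # target size used in the prompt's rubric


def target_size_score(sizes, minimum=MIN_TARGET_PX):
    """Score a list of (width, height) target sizes on the prompt's 0-2 scale."""
    if not sizes:
        raise ValueError("at least one input target is required")
    passing = [width >= minimum and height >= minimum for width, height in sizes]
    if all(passing):
        return 2  # every target meets the minimum size
    if any(passing):
        return 1  # some targets are too small
    return 0      # every target is too small


# Illustrative sizes only: one large button and one small icon.
example_score = target_size_score([(120, 48), (24, 24)])
print(example_score)
```

Comparing this score with the vision model's score on the same page would give a rough check of how well the model estimates sizes from an image.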

In [77]:
l4_results.append(out)

In [78]:
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of motor accessibility and voice input options.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: The website provides no visible voice input accessibility options.
1: The website provides a voice input mode, but does not indicate it clearly (e.g., uses an image but does not label it with text).
2: The website provides a voice input mode that is clearly identifiable by both image and text.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Senior motor and voice accessibility options (voice input, large targets, reduced precision)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [79]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': "The page includes a microphone icon (image) for voice input, but there is no explicit text label indicating its function (e.g., 'Voice Input' or 'Speak'). While the microphone icon is a common visual cue for voice input, the lack of accompanying text means it does not clearly indicate the voice input mode as per the criteria for score 2."}]}

In [80]:
l4_results.append(out)

Aggregate scores and reasoning for the different subsections evaluated for this L4 indicator

In [81]:
combined_results = {}
combined_results["Website"] = ""
combined_results["L4_Indicator"] = ""
combined_results["Assigned_Score"] = 0
combined_results["Max_Score"] = 0
combined_results["Reasoning"] = ""

for json_obj in l4_results:
    for result in json_obj["results"]:
        for field in result:
            if field == "Assigned_Score":
                combined_results[field] += result[field]
            elif field == "Reasoning":
                combined_results[field] += " "
                combined_results[field] += result[field]
            elif field == "Max_Score":
                combined_results[field] += result[field]
            else:
                combined_results[field] = result[field]

print(combined_results)

{'Website': 'ChatGPT', 'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)', 'Assigned_Score': 2, 'Max_Score': 4, 'Reasoning': " The microphone icon for voice input and the '+' button next to 'Ask anything' are likely smaller than 44x44 CSS pixels, indicating some input targets are below the WCAG recommended size for motor accessibility. The page includes a microphone icon (image) for voice input, but there is no explicit text label indicating its function (e.g., 'Voice Input' or 'Speak'). While the microphone icon is a common visual cue for voice input, the lack of accompanying text means it does not clearly indicate the voice input mode as per the criteria for score 2."}
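
Since the same aggregation is repeated for each multi-part indicator, it could be wrapped in a function. The sketch below follows the logic of the loop above, with one small difference: it joins the reasoning strings with single spaces so the combined text has no leading space. `combine_results` is a hypothetical name.

```python
def combine_results(l4_results):
    """Merge several sub-check results for one L4 indicator into a single row."""
    combined = {"Website": "", "L4_Indicator": "", "Assigned_Score": 0, "Max_Score": 0}
    reasons = []
    for payload in l4_results:
        for result in payload["results"]:
            # Labels are taken from the most recent result, as in the loop above.
            combined["Website"] = result["Website"]
            combined["L4_Indicator"] = result["L4_Indicator"]
            # Scores and maximums accumulate across sub-checks.
            combined["Assigned_Score"] += result["Assigned_Score"]
            combined["Max_Score"] += result["Max_Score"]
            reasons.append(result["Reasoning"].strip())
    combined["Reasoning"] = " ".join(reasons)
    return combined
```

The result can be passed to `pd.DataFrame(combined, index=[0])` in the same way as `combined_results`.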


In [82]:
new_df = pd.DataFrame(combined_results, index=[0])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...


In [83]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df,new_df])
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
0,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...


### L4: Plain-language & readability thresholds met (senior-appropriate)

In [84]:
l4_results = []

In [85]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text items have line spacing below 1.5 times the font size.
1: Some of the visible text items have line spacing below 1.5 times the font size.
2: All of the visible text items have line spacing at or above 1.5 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [86]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ChatGPT",\n            "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n            "Assigned_Score": 2,\n            "Max_Score": 2,\n            "Reasoning": "All visible text items with multiple lines (e.g., sidebar menu items, \'Greg Knapp\' and \'Free\' in the bottom left) exhibit line spacing at or above 1.5 times the font size. Single-line text elements (e.g., \'What\'s on your mind today?\', \'Ask anything\', \'Get Plus\') do not require line spacing evaluation, as line spacing applies to multi-line text. No visible text items violate the line spacing threshold of 1.5x font size, resulting in a maximum score of 2."\n        }\n    ]\n}'

In [87]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "All visible text items with multiple lines (e.g., sidebar menu items, 'Greg Knapp' and 'Free' in the bottom left) exhibit line spacing at or above 1.5 times the font size. Single-line text elements (e.g., 'What's on your mind today?', 'Ask anything', 'Get Plus') do not require line spacing evaluation, as line spacing applies to multi-line text. No visible text items violate the line spacing threshold of 1.5x font size, resulting in a maximum score of 2."}]}

In [88]:
l4_results.append(out)

In [89]:
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text items have spacing between paragraphs below 2 times the font size.
1: Some of the visible text items have spacing between paragraphs below 2 times the font size.
2: All of the visible text items have spacing between paragraphs at or above 2 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [90]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ChatGPT",\n            "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n            "Assigned_Score": 0,\n            "Max_Score": 2,\n            "Reasoning": "All visible text items (sidebar menu items, main heading, and input field placeholder) have spacing between paragraphs below 2 times the font size. The spacing between menu items in the left sidebar, between the main heading and input field, and other adjacent text elements consistently fall below the 2x font size threshold."\n        }\n    ]\n}'

In [91]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 0,
   'Max_Score': 2,
   'Reasoning': 'All visible text items (sidebar menu items, main heading, and input field placeholder) have spacing between paragraphs below 2 times the font size. The spacing between menu items in the left sidebar, between the main heading and input field, and other adjacent text elements consistently fall below the 2x font size threshold.'}]}

In [92]:
l4_results.append(out)

In [93]:
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text characters have spacing between characters below 0.12 times the font size.
1: Some of the visible text characters have spacing between characters below 0.12 times the font size.
2: All of the visible text characters have spacing between characters at or above 0.12 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [96]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ChatGPT",\n            "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n            "Assigned_Score": 2,\n            "Max_Score": 2,\n            "Reasoning": "The visible text elements in the ChatGPT homepage (e.g., menu items like \'New chat\', heading text, and input prompts) display adequate character spacing (kerning) consistent with standard accessibility design practices. No visible text elements exhibit character spacing below 0.12 times the font size, as typical UI text styling for readability and accessibility ensures sufficient inter-character spacing."\n        }\n    ]\n}'

In [94]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "The visible text elements in the ChatGPT homepage (e.g., menu items like 'New chat', heading text, and input prompts) display adequate character spacing (kerning) consistent with standard accessibility design practices. No visible text elements exhibit character spacing below 0.12 times the font size, as typical UI text styling for readability and accessibility ensures sufficient inter-character spacing."}]}

In [95]:
l4_results.append(out)

In [97]:
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text words have spacing between words below 0.16 times the font size.
1: Some of the visible text words have spacing between words below 0.16 times the font size.
2: All of the visible text words have spacing between words at or above 0.16 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [100]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ChatGPT",\n            "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n            "Assigned_Score": 2,\n            "Max_Score": 2,\n            "Reasoning": "The visible text elements (e.g., \\"What\'s on your mind today?\\", \\"Ask anything\\", sidebar items like \\"New chat\\", \\"Search chats\\", and footer elements) show adequate word spacing. In standard UI design, proper word spacing typically meets or exceeds the 0.16× font size threshold for readability. The spacing between words appears sufficient, indicating no words fall below the 0.16× font size criterion, thus qualifying for a maximum score of 2."\n        }\n    ]\n}'

In [98]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': 'The visible text elements (e.g., "What\'s on your mind today?", "Ask anything", sidebar items like "New chat", "Search chats", and footer elements) show adequate word spacing. In standard UI design, proper word spacing typically meets or exceeds the 0.16× font size threshold for readability. The spacing between words appears sufficient, indicating no words fall below the 0.16× font size criterion, thus qualifying for a maximum score of 2.'}]}

In [99]:
l4_results.append(out)
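
The four spacing prompts share the same thresholds relative to font size: 1.5 for line spacing, 2 for paragraph spacing, 0.12 for character spacing, and 0.16 for word spacing. If computed style values were available (an assumption; this notebook only uses screenshots), the rubric could be checked numerically. `spacing_score` and `SPACING_RATIOS` are hypothetical names, and the measurements in the example are illustrative.

```python
# Minimum spacing as a multiple of font size, taken from the prompts above.
SPACING_RATIOS = {"line": 1.5, "paragraph": 2.0, "letter": 0.12, "word": 0.16}


def spacing_score(measurements, kind):
    """Score (font_size_px, spacing_px) pairs on the prompts' 0-2 scale."""
    ratio = SPACING_RATIOS[kind]
    if not measurements:
        # Mirrors the prompt rule: a non-applicable criterion gets the maximum.
        return 2
    passing = [spacing >= ratio * font_size for font_size, spacing in measurements]
    if all(passing):
        return 2
    if any(passing):
        return 1
    return 0


# Illustrative values: 16px text with 24px line height passes, 18px does not.
line_score = spacing_score([(16, 24), (16, 18)], "line")
print(line_score)
```

A numeric check like this would be less sensitive to the model's tendency, visible in the reasoning above, to assume that typical UI styling meets the thresholds.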

In [101]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of the reading level of the text.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text is above a 7th grade reading level.
1: Some of the visible text is above a 7th grade reading level
2: All of the visible text is at or below a 7th grade reading level.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']


In [102]:
eval_result


'{\n    "results": [\n        {\n        "Website": "ChatGPT",\n        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n        "Assigned_Score": 2,\n        "Max_Score": 2,\n        "Reasoning": "All visible text on the page (e.g., \'What\'s on your mind today?\', \'Ask anything\', navigation items like \'New chat\', \'Search chats\', \'Library\', \'Projects\') uses simple, common vocabulary and short phrases that are easily understood at a basic reading level. None of the text appears to require a 7th grade reading level or higher; the language is conversational and accessible to younger readers."\n        }\n    ]\n}'

In [103]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "All visible text on the page (e.g., 'What's on your mind today?', 'Ask anything', navigation items like 'New chat', 'Search chats', 'Library', 'Projects') uses simple, common vocabulary and short phrases that are easily understood at a basic reading level. None of the text appears to require a 7th grade reading level or higher; the language is conversational and accessible to younger readers."}]}
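
`js.loads` raises an error whenever the model wraps its reply in a Markdown code fence or adds text around the JSON object, which vision models occasionally do. A minimal sketch of a more forgiving parser is shown below. `extract_json` is a hypothetical helper, not part of the original pipeline, and it assumes the reply contains exactly one top-level JSON object.

```python
import json


def extract_json(raw_text):
    """Parse the span from the first '{' to the last '}' of a model reply.

    Hypothetical helper: assumes the reply holds a single top-level JSON
    object, possibly surrounded by a code fence or stray commentary.
    """
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(raw_text[start:end + 1])


# Made-up reply wrapped in a Markdown fence, for illustration only
fence = "`" * 3
fenced_reply = (
    fence + "json\n"
    + '{"results": [{"Assigned_Score": 2, "Max_Score": 2}]}'
    + "\n" + fence
)
parsed = extract_json(fenced_reply)
print(parsed["results"][0]["Assigned_Score"])
```

A reply that is already plain JSON passes through unchanged, so the helper could stand in for the direct `js.loads` call without altering the existing results.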

In [104]:
l4_results.append(out)

Aggregate scores and reasoning for the different subsections evaluated for this L4 indicator

In [105]:
# Start from an empty row; scores accumulate and reasoning is concatenated
combined_results = {
    "Website": "",
    "L4_Indicator": "",
    "Assigned_Score": 0,
    "Max_Score": 0,
    "Reasoning": "",
}

for json_obj in l4_results:
    for result in json_obj["results"]:
        for field, value in result.items():
            if field in ("Assigned_Score", "Max_Score"):
                # Sub-evaluation scores add up to the indicator total
                combined_results[field] += value
            elif field == "Reasoning":
                # Join the reasoning from each sub-evaluation with a space
                combined_results[field] += " " + value
            else:
                # Website and L4_Indicator are the same for every sub-evaluation
                combined_results[field] = value

print(combined_results)

{'Website': 'ChatGPT', 'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)', 'Assigned_Score': 8, 'Max_Score': 10, 'Reasoning': ' All visible text items with multiple lines (e.g., sidebar menu items, \'Greg Knapp\' and \'Free\' in the bottom left) exhibit line spacing at or above 1.5 times the font size. Single-line text elements (e.g., \'What\'s on your mind today?\', \'Ask anything\', \'Get Plus\') do not require line spacing evaluation, as line spacing applies to multi-line text. No visible text items violate the line spacing threshold of 1.5x font size, resulting in a maximum score of 2. All visible text items (sidebar menu items, main heading, and input field placeholder) have spacing between paragraphs below 2 times the font size. The spacing between menu items in the left sidebar, between the main heading and input field, and other adjacent text elements consistently fall below the 2x font size threshold. The visible text elements in the ChatGPT hom

In [106]:
new_df = pd.DataFrame(combined_results, index=[0])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...


In [107]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df,new_df])
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
0,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
0,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...


### L4: Cognitive-load reduction patterns present (chunking, progressive disclosure)

In [108]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ChatGPT's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of cognitive load reduction through proper header and label text.

Based only on what you see in the image of the homepage, rate the clarity of header and label text on the following scale ranging from 0 to 2:

0: None of the visible text headers and labels clearly explain the purpose of their associated interactive elements.
1: Some visible text headers and labels do not clearly explain the purpose of their associated interactive elements.
2: All visible text headers and labels clearly explain the purpose of their associated interactive elements.

The response should be in valid JSON format. The example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ChatGPT",
        "L4_Indicator": "Cognitive-load reduction patterns present (chunking, progressive disclosure)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[chatgpt_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [109]:
eval_result

'{\n    "results": [\n        {\n        "Website": "ChatGPT",\n        "L4_Indicator": "Cognitive-load reduction patterns present (chunking, progressive disclosure)",\n        "Assigned_Score": 2,\n        "Max_Score": 2,\n        "Reasoning": "All visible text headers and labels (e.g., \'New chat\', \'Search chats\', \'Ask anything\') clearly explain the purpose of their associated interactive elements, reducing cognitive load by providing immediate context for user actions."\n        }\n    ]\n}'

In [110]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ChatGPT',
   'L4_Indicator': 'Cognitive-load reduction patterns present (chunking, progressive disclosure)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "All visible text headers and labels (e.g., 'New chat', 'Search chats', 'Ask anything') clearly explain the purpose of their associated interactive elements, reducing cognitive load by providing immediate context for user actions."}]}
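
The prompt asks the model to leave `Website`, `L4_Indicator` and `Max_Score` untouched, but nothing in the notebook checks that it did. The sketch below is one way to verify a parsed reply before it is appended to the results dataframe. `check_result` is a hypothetical helper, the expected values come from the prompt above, and the `Reasoning` text in the example is abbreviated.

```python
def check_result(parsed, website, indicator, max_score):
    """Return a list of problems found in a parsed model reply (empty if none)."""
    problems = []
    results = parsed.get("results", [])
    if len(results) != 1:
        problems.append(f"expected 1 result, got {len(results)}")
        return problems
    result = results[0]
    if result.get("Website") != website:
        problems.append("Website was changed")
    if result.get("L4_Indicator") != indicator:
        problems.append("L4_Indicator was changed")
    if result.get("Max_Score") != max_score:
        problems.append("Max_Score was changed")
    score = result.get("Assigned_Score")
    # bool is a subclass of int in Python, so exclude it explicitly
    is_int = isinstance(score, int) and not isinstance(score, bool)
    if not is_int or not 0 <= score <= max_score:
        problems.append(f"Assigned_Score {score!r} is outside 0..{max_score}")
    return problems


indicator = "Cognitive-load reduction patterns present (chunking, progressive disclosure)"
reply = {
    "results": [
        {
            "Website": "ChatGPT",
            "L4_Indicator": indicator,
            "Assigned_Score": 2,
            "Max_Score": 2,
            "Reasoning": "All visible text headers and labels clearly explain their purpose.",
        }
    ]
}
print(check_result(reply, "ChatGPT", indicator, 2))
```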

Append results to the final dataframe for comparison later

In [112]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."


In [113]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df,new_df])
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
0,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
0,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
0,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."


## Finalize ChatGPT Test Results

In [115]:
chat_gpt_test_results_df.reset_index(inplace=True)
chat_gpt_test_results_df

Unnamed: 0,index,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
1,0,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
2,0,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
3,0,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
4,0,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."


In [116]:
chat_gpt_test_results_df.drop(columns=['index'], inplace=True)
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
1,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
2,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
3,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
4,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."


### Duplicate L4: Motor and voice accessibility options (voice input, large targets, reduced precision)

The L4 indicator "Senior motor and voice accessibility options (voice input, large targets, reduced precision)" is essentially identical to the "Motor and voice accessibility options" L4 under the other L3 subdimension. The senior-focused score is therefore duplicated so that it appears under both L4s on the tree.

In [120]:
# Copy the senior-focused row and relabel it for the general motor/voice L4.
# .copy() makes an independent frame, which avoids pandas' SettingWithCopyWarning.
seniors_row = chat_gpt_test_results_df.iloc[[2]].copy()
seniors_row['L4_Indicator'] = "Motor and voice accessibility options (voice input, large targets, reduced precision)"
seniors_row


Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
2,ChatGPT,Motor and voice accessibility options (voice i...,2,4,The microphone icon for voice input and the '...


In [125]:
chat_gpt_test_results_df = pd.concat([chat_gpt_test_results_df, seniors_row], ignore_index=True)
chat_gpt_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
1,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
2,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
3,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
4,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."
5,ChatGPT,Motor and voice accessibility options (voice i...,2,4,The microphone icon for voice input and the '...


### Return to L4: WCAG-aligned accessibility features available

Now that the other L4 indicators have been scored, we can sum their assigned and maximum scores to get a composite score for the WCAG-alignment indicator.

In [126]:
wcag_alignment_assigned_score = chat_gpt_test_results_df['Assigned_Score'].sum()
wcag_alignment_max_score = chat_gpt_test_results_df['Max_Score'].sum()

In [127]:
wcag_alignment_values = {
    "Website": "ChatGPT",
    "L4_Indicator": "WCAG-aligned accessibility features available",
    "Assigned_Score": wcag_alignment_assigned_score,
    "Max_Score": wcag_alignment_max_score,
    "Reasoning": "This value is the summation of all the other L4 indicators handled in this test, since they are meant to be aligned with WCAG guidance."
}

In [129]:
new_df = pd.DataFrame(wcag_alignment_values, index=[6])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
6,ChatGPT,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


In [130]:
final_chat_gpt_scores_df = pd.concat([chat_gpt_test_results_df, new_df])

In [131]:
final_chat_gpt_scores_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
1,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
2,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
3,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
4,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."
5,ChatGPT,Motor and voice accessibility options (voice i...,2,4,The microphone icon for voice input and the '...
6,ChatGPT,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


# Test L4 Indicators using Qwen3-vl:8B - ClaudeAI

## Create dataframe for storing Claude test results

In [162]:
# Create a dataframe to hold ClaudeAI test results
claudeai_test_results_df = pd.DataFrame(columns=["Website", "L4_Indicator", "Assigned_Score", "Max_Score", "Reasoning"])

## L3 Subdimension: The AI is accessible and inclusive across abilities and language

### L4: WCAG-aligned accessibility features available

This L4 category covers a wide variety of possible accessibility features as defined in the [WCAG 2.1 Guidelines](https://www.w3.org/TR/WCAG21/). Because I am completing this project individually rather than as part of a group, the full scope of these guidelines is too broad for this work to cover adequately.

Instead, I have chosen to evaluate this L4 category as a composite score of all the other L4 categories evaluated in this notebook. The reasoning is that every rating assigned here is based on one or more of the Success Criteria listed in the WCAG 2.1 guidelines, so they all fall under the umbrella of "WCAG-aligned accessibility features."

This notebook acts as a proof of concept: additional guidelines and features can be added to this evaluation pipeline following the same structure, building up the composite WCAG-alignment score, so long as the ratings are based on WCAG guidance as they are in this project.

As a result, this section will be evaluated again at the end of this notebook once all other L4s have been scored. It is placed here for now to remain within the logical grouping defined in the AI Ethics Index Tree (under the L3 Subdimension: The AI is accessible and inclusive across abilities and language).
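
As a concrete illustration of the composite, the sketch below sums the assigned and maximum scores of the individual L4 rows. The numbers are the ChatGPT scores reported earlier in this notebook. The `composite_score` name and the list-of-tuples layout are assumptions made for the example.

```python
def composite_score(rows):
    """Sum (assigned, maximum) pairs into one composite pair."""
    assigned = sum(a for a, _ in rows)
    maximum = sum(m for _, m in rows)
    return assigned, maximum


# (Assigned_Score, Max_Score) for each ChatGPT L4 row, in table order
chatgpt_rows = [(4, 4), (0, 2), (2, 4), (8, 10), (2, 2), (2, 4)]
assigned, maximum = composite_score(chatgpt_rows)
print(f"{assigned}/{maximum}")  # 18/26, matching the ChatGPT WCAG row
```

Because the motor and voice row is duplicated under two L4s, it contributes to the composite twice.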

### L4: Multilingual support for intended locales

In [163]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the ClaudeAI help page listing the languages the platform is localized into. Your task is to scan the page for the list of supported languages, count them, then score the page based on the number of supported languages on the following scale:

0: The website supports 5 or fewer languages
1: The website supports 10 or fewer languages
2: The website supports 20 or fewer languages
3: The website supports 32 or fewer languages
4: The website supports 33 or more languages.

The response should be written in valid JSON format. The example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Multilingual support for intended locales",
        "Assigned_Score": ,
        "Max_Score": 4,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_languages_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [164]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ClaudeAI",\n            "L4_Indicator": "Multilingual support for intended locales",\n            "Assigned_Score": 2,\n            "Max_Score": 4,\n            "Reasoning": "The page lists 11 supported languages, which is within the 20 or fewer languages category."\n        }\n    ]\n}'

In [165]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Multilingual support for intended locales',
   'Assigned_Score': 2,
   'Max_Score': 4,
   'Reasoning': 'The page lists 11 supported languages, which is within the 20 or fewer languages category.'}]}
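
The rubric in this prompt is a plain threshold table, so the score the model assigns can be cross-checked against the language count it reports. The sketch below encodes the same thresholds. `score_language_count` is a hypothetical helper, and the two counts are the ones quoted in the model's reasoning for ClaudeAI (11) and ChatGPT (39).

```python
def score_language_count(count):
    """Map a supported-language count to the 0-4 rubric used in the prompt."""
    thresholds = [(5, 0), (10, 1), (20, 2), (32, 3)]
    for upper_bound, score in thresholds:
        if count <= upper_bound:
            return score
    return 4  # 33 or more languages


print(score_language_count(11))  # 2
print(score_language_count(39))  # 4
```

Both values agree with the scores the model assigned, which suggests it applied the scale consistently to its own count. It does not confirm that the count itself is correct.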

In [166]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."


In [167]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df,new_df])
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."


### L4: Motor and voice accessibility options (voice input, large targets, reduced precision)

This L4 indicator is functionally identical to the L4 indicator "Senior motor and voice accessibility options (voice input, large targets, reduced precision)", so the scores given to that L4 will be duplicated for this L4 category.

## L3 Subdimension: The AI supports accessibility & cognitive-load mitigation for seniors

### L4: Senior focused simplification and stepwise assistance available

In [168]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of help menu functionality on this home page based on the unmodified presentation of the homepage upon accessing it.

Based only on what you see in the image of the homepage, rate the accessibility of help functionality on the following scale ranging from 0 to 2:

0: There are no obvious help icons or links present on the homepage.
1: There are help icons or links available on the page, but they are not immediately obvious (for example, they are presented with only an icon and no text, or are in a very small size, smaller than 44 by 44 CSS pixels).
2: A help icon or link is available on the home page and is easily visible with text and in a size larger than 44 by 44 CSS pixels.

The response should be in valid JSON format. The example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Senior-focused simplification & stepwise guidance availabile",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [169]:
print(eval_result)

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Senior-focused simplification & stepwise guidance availabile",
        "Assigned_Score": 2,
        "Max_Score": 2,
        "Reasoning": "The 'Help me write' button is clearly visible with text and serves as a help-related link. It meets the criteria of being easily visible with text and sized larger than 44 by 44 CSS pixels, as standard interactive elements in web interfaces typically adhere to touch target sizes (44x44 or larger)."
        }
    ]
}


In [170]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Senior-focused simplification & stepwise guidance availabile',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "The 'Help me write' button is clearly visible with text and serves as a help-related link. It meets the criteria of being easily visible with text and sized larger than 44 by 44 CSS pixels, as standard interactive elements in web interfaces typically adhere to touch target sizes (44x44 or larger)."}]}

In [171]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...


In [172]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df,new_df])
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...


### L4: Senior motor and voice accessibility options (voice input, large targets, reduced precision)

In [173]:
l4_results = []

In [174]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of motor accessibility and voice input options.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All input targets (buttons, links, images, etc.) are below the WCAG recommended size of 44 by 44 CSS pixels.
1: Some input targets (buttons, links, images, etc.) are below the WCAG recommended size of 44 by 44 CSS pixels.
2: All input targets (buttons, links, images, etc.) are at or above the WCAG recommended size of 44 by 44 CSS pixels.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Senior motor and voice accessibility options (voice input, large targets, reduced precision)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [175]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': "Several input targets such as 'Help me write', 'Learn about', 'Analyze Image', 'Summarize text', and '+ See More' buttons appear to be smaller than the 44x44 CSS pixel recommendation for touch targets. While elements like 'Start new' and voice input icons may meet or exceed the size requirement, the presence of multiple smaller buttons indicates 'some input targets are below the WCAG recommended size'."}]}

In [176]:
l4_results.append(out)

In [177]:
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of motor accessibility and voice input options.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: The website provides no visible voice input accessibility options.
1: The website provides a voice input mode, but does not indicate it clearly (e.g., it uses an image but does not label it with text).
2: The website provides a voice input mode that is clearly identifiable by both image and text.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Senior motor and voice accessibility options (voice input, large targets, reduced precision)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [178]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': 'The page includes a microphone icon (image) for voice input, but no accompanying text label to explicitly indicate its function. While the microphone icon is a standard visual cue for voice input, the absence of descriptive text means it is not clearly identifiable by both image and text.'}]}

In [179]:
l4_results.append(out)
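
The two prompts for this indicator differ only in their rubric lines, so they could be generated from one template. The sketch below shows the idea. `build_prompt` and its parameters are hypothetical, and the wording is condensed from the prompts above rather than copied exactly.

```python
import json


def build_prompt(website, indicator, focus, rubric, max_score=2):
    """Assemble an evaluation prompt from a rubric (list of scale descriptions)."""
    scale = "\n".join(f"{score}: {text}" for score, text in enumerate(rubric))
    example = {
        "results": [
            {
                "Website": website,
                "L4_Indicator": indicator,
                "Assigned_Score": "<your score>",
                "Max_Score": max_score,
                "Reasoning": "<your reasoning>",
            }
        ]
    }
    return (
        f"This image shows the homepage for {website}'s chat model interaction "
        f"interface. You are tasked with evaluating the accessibility of this "
        f"page in terms of {focus}.\n\n"
        f"Based only on what you see in the image of the homepage, rate the "
        f"webpage on the following scale ranging from 0 to {max_score}:\n\n"
        f"{scale}\n\n"
        "The response should be in valid JSON format, following this example. "
        "Only replace the Assigned_Score and Reasoning values.\n\n"
        f"{json.dumps(example, indent=4)}\n"
    )


voice_rubric = [
    "The website provides no visible voice input accessibility options.",
    "The website provides a voice input mode, but does not indicate it clearly.",
    "The website provides a voice input mode that is clearly identifiable by both image and text.",
]
voice_prompt = build_prompt(
    "ClaudeAI",
    "Senior motor and voice accessibility options (voice input, large targets, reduced precision)",
    "motor accessibility and voice input options",
    voice_rubric,
)
print(voice_prompt)
```

Unlike the hand-written prompts, this version puts placeholder strings in the example so that the template itself is valid JSON. Whether that changes the model's behaviour has not been tested here.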

Aggregate scores and reasoning for the different subsections evaluated for this L4 indicator

In [180]:
# Start from an empty row; scores accumulate and reasoning is concatenated
combined_results = {
    "Website": "",
    "L4_Indicator": "",
    "Assigned_Score": 0,
    "Max_Score": 0,
    "Reasoning": "",
}

for json_obj in l4_results:
    for result in json_obj["results"]:
        for field, value in result.items():
            if field in ("Assigned_Score", "Max_Score"):
                # Sub-evaluation scores add up to the indicator total
                combined_results[field] += value
            elif field == "Reasoning":
                # Join the reasoning from each sub-evaluation with a space
                combined_results[field] += " " + value
            else:
                # Website and L4_Indicator are the same for every sub-evaluation
                combined_results[field] = value

print(combined_results)

{'Website': 'ClaudeAI', 'L4_Indicator': 'Senior motor and voice accessibility options (voice input, large targets, reduced precision)', 'Assigned_Score': 2, 'Max_Score': 4, 'Reasoning': " Several input targets such as 'Help me write', 'Learn about', 'Analyze Image', 'Summarize text', and '+ See More' buttons appear to be smaller than the 44x44 CSS pixel recommendation for touch targets. While elements like 'Start new' and voice input icons may meet or exceed the size requirement, the presence of multiple smaller buttons indicates 'some input targets are below the WCAG recommended size'. The page includes a microphone icon (image) for voice input, but no accompanying text label to explicitly indicate its function. While the microphone icon is a standard visual cue for voice input, the absence of descriptive text means it is not clearly identifiable by both image and text."}
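
The same aggregation loop is repeated for every multi-part indicator, so it may be worth wrapping in a function. The sketch below is one possible version. `combine_l4_results` is a hypothetical name, and the `Reasoning` strings in the example are abbreviated stand-ins for the model output shown above.

```python
def combine_l4_results(l4_results):
    """Merge sub-evaluation replies into one row: sum scores, join reasoning."""
    combined = {
        "Website": "",
        "L4_Indicator": "",
        "Assigned_Score": 0,
        "Max_Score": 0,
        "Reasoning": "",
    }
    reasons = []
    for reply in l4_results:
        for result in reply["results"]:
            combined["Website"] = result["Website"]
            combined["L4_Indicator"] = result["L4_Indicator"]
            combined["Assigned_Score"] += result["Assigned_Score"]
            combined["Max_Score"] += result["Max_Score"]
            reasons.append(result["Reasoning"])
    # Joining once at the end avoids the leading space the inline loop produces
    combined["Reasoning"] = " ".join(reasons)
    return combined


label = "Senior motor and voice accessibility options (voice input, large targets, reduced precision)"
sub_results = [
    {"results": [{"Website": "ClaudeAI", "L4_Indicator": label,
                  "Assigned_Score": 1, "Max_Score": 2,
                  "Reasoning": "Several input targets appear smaller than 44x44 CSS pixels."}]},
    {"results": [{"Website": "ClaudeAI", "L4_Indicator": label,
                  "Assigned_Score": 1, "Max_Score": 2,
                  "Reasoning": "The microphone icon has no text label."}]},
]
combined = combine_l4_results(sub_results)
print(combined["Assigned_Score"], combined["Max_Score"])  # 2 4
```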


In [181]:
new_df = pd.DataFrame(combined_results, index=[0])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...


In [182]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df,new_df])
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
0,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...


### L4: Plain-language & readability thresholds met (senior-appropriate)

In [None]:
l4_results = []

In [183]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text items have line spacing below 1.5 times the font size.
1: Some of the visible text items have line spacing below 1.5 times the font size.
2: All of the visible text items have line spacing at or above 1.5 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [184]:
eval_result

'{\n    "results": [\n        {\n            "Website": "ClaudeAI",\n            "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n            "Assigned_Score": 2,\n            "Max_Score": 2,\n            "Reasoning": "Line spacing is not applicable to any visible text items, as all text elements (e.g., headings, buttons, input placeholders) are single-line. The criteria requires evaluating line spacing (vertical space between multiple lines of text), which does not exist here. Thus, the criteria is not applicable, and the maximum score is assigned."\n        }\n    ]\n}'

In [185]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': 'Line spacing is not applicable to any visible text items, as all text elements (e.g., headings, buttons, input placeholders) are single-line. The criteria requires evaluating line spacing (vertical space between multiple lines of text), which does not exist here. Thus, the criteria is not applicable, and the maximum score is assigned.'}]}
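
Each evaluation in this notebook is a single model call followed by `js.loads`, so one malformed reply stops the run. A hedged sketch of a retry loop is shown below. It takes the generation function as an argument so it can be tried without a running Ollama server; in the notebook one would presumably pass `ollama.generate`. `evaluate_with_retry`, `max_attempts` and `fake_generate` are hypothetical names.

```python
import json


def evaluate_with_retry(generate_fn, max_attempts=3, **generate_kwargs):
    """Call generate_fn until the 'response' text it returns parses as JSON."""
    last_error = None
    for _ in range(max_attempts):
        response = generate_fn(**generate_kwargs)
        try:
            return json.loads(response["response"])
        except json.JSONDecodeError as error:
            last_error = error  # remember the failure and try again
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts") from last_error


# Stand-in for the model call: fails once, then returns valid JSON
replies = iter(["not json", '{"results": [{"Assigned_Score": 2, "Max_Score": 2}]}'])


def fake_generate(**kwargs):
    return {"response": next(replies)}


out = evaluate_with_retry(
    fake_generate,
    model="qwen3-vl:8b",
    prompt="placeholder prompt",
    images=[],
)
print(out["results"][0]["Assigned_Score"])
```

Retrying changes what is being measured slightly, since a reply that fails to parse is discarded rather than scored, so the number of attempts is worth recording alongside the result.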

In [186]:
l4_results.append(out)

In [187]:
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text items have spacing between paragraphs below 2 times the font size.
1: Some of the visible text items have spacing between paragraphs below 2 times the font size.
2: All of the visible text items have spacing between paragraphs at or above 2 times the font size.

If any criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [188]:
eval_result

'{\n    "results": [\n        {\n        "Website": "ClaudeAI",\n        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n        "Assigned_Score": 1,\n        "Max_Score": 2,\n        "Reasoning": "Some visible text items (e.g., spacing between \'No Chat History\' and \'Gregory Knapp\' in the sidebar) have minimal vertical spacing that appears to be below 2 times the font size. This indicates that not all text spacing meets the threshold of at or above 2 times the font size."\n        }\n    ]\n}'
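
The prompts in this section differ only in the aspect being evaluated and in the three scale lines, so they could be generated from one template instead of being copied by hand. The sketch below shows one way to do that, assuming a hypothetical helper named `build_prompt`; the fixed wording is taken from the prompts in this notebook, with minor wording corrections.

```python
def build_prompt(focus, scale, indicator, website="ClaudeAI"):
    """Assemble a scoring prompt from the parts that vary between sub-checks."""
    # One "score: description" line per scale entry, in ascending score order.
    scale_lines = [f"{score}: {text}" for score, text in sorted(scale.items())]
    # The JSON skeleton the model is asked to fill in (values left blank on purpose).
    json_example = "\n".join([
        "{",
        '    "results": [',
        "        {",
        f'        "Website": "{website}",',
        f'        "L4_Indicator": "{indicator}",',
        '        "Assigned_Score": ,',
        '        "Max_Score": 2,',
        '        "Reasoning":',
        "        }",
        "    ]",
        "}",
    ])
    parts = [
        f"This image shows the homepage for {website}'s chat model interaction interface. "
        f"You are tasked with evaluating the accessibility of this page in terms of {focus}.",
        "Based only on what you see in the image of the homepage, rate the webpage "
        "on the following scale ranging from 0 to 2:",
        "\n".join(scale_lines),
        "If a criterion is not applicable to the webpage, the maximum score should be "
        "given and this should be mentioned in the reasoning.",
        "The response should be in JSON format. The structure, Website, L4_Indicator and "
        "Max_Score properties should not be changed. Insert your values for "
        "Assigned_Score and Reasoning.",
        json_example,
    ]
    return "\n\n".join(parts)


# The character-spacing scale, as written in one of the prompts in this section.
char_spacing_scale = {
    0: "All of the visible text characters have spacing between characters below 0.12 times the font size.",
    1: "Some of the visible text characters have spacing between characters below 0.12 times the font size.",
    2: "All of the visible text characters have spacing between characters at or above 0.12 times the font size.",
}
prompt_in = build_prompt(
    focus="text spacing features",
    scale=char_spacing_scale,
    indicator="Plain-language & readability thresholds met (senior-appropriate)",
)
print(prompt_in)
```

Keeping each scale in a dictionary also makes it easier to spot a copy-and-paste slip in one of the three levels before the prompt is sent.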

In [189]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': "Some visible text items (e.g., spacing between 'No Chat History' and 'Gregory Knapp' in the sidebar) have minimal vertical spacing that appears to be below 2 times the font size. This indicates that not all text spacing meets the threshold of at or above 2 times the font size."}]}

In [190]:
l4_results.append(out)

In [191]:
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text characters have spacing between characters below 0.12 times the font size.
1: Some of the visible text characters have spacing between characters below 0.12 times the font size.
2: All of the visible text characters have spacing between characters at or above 0.12 times the font size.

If a criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [192]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "All visible text elements (e.g., 'How can I help you?', 'Type a message...', menu options) exhibit standard, non-restricted letter spacing. The UI follows typical web design conventions where character spacing exceeds 0.12× font size, ensuring readability without visible character crowding or overlapping."}]}

In [193]:
l4_results.append(out)

In [194]:
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of text spacing features.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text words have spacing between words below 0.16 times the font size.
1: Some of the visible text words have spacing between words below 0.16 times the font size.
2: All of the visible text words have spacing between words at or above 0.16 times the font size.

If a criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [195]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 0,
   'Max_Score': 2,
   'Reasoning': "Based on the visible text elements (e.g., 'How can I help you?', 'Type a message...', button labels), the spacing between words is below 0.16 times the font size for all visible text. For example, with a typical font size of 14px for body text, 0.16 * 14 = 2.24px; standard word spacing of ~2px is below this threshold. Similarly, larger headings (e.g., 24px font size) have word spacing below 0.16x. Thus, all visible text words meet the condition for a score of 0."}]}
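
The spacing prompts in this section express each threshold as a multiple of the font size: 2 times for paragraph spacing, 0.12 times for character spacing, and 0.16 times for word spacing. The sketch below turns those factors into pixel values for a given font size. The function names are hypothetical, and the 14px font size and 2px word gap are the figures the model assumed in its reasoning, not measurements taken from the page.

```python
# Multipliers taken from the scale lines of the spacing prompts in this section.
SPACING_FACTORS = {"paragraph": 2.0, "letter": 0.12, "word": 0.16}


def spacing_thresholds(font_size_px):
    """Return the minimum spacing in pixels for each spacing type."""
    return {kind: round(factor * font_size_px, 2) for kind, factor in SPACING_FACTORS.items()}


def meets_threshold(measured_px, font_size_px, kind):
    """True when the measured spacing is at or above the threshold for that type."""
    return measured_px >= SPACING_FACTORS[kind] * font_size_px


print(spacing_thresholds(14))
# {'paragraph': 28.0, 'letter': 1.68, 'word': 2.24}

# The model's own example: a ~2px word gap at an assumed 14px font size.
print(meets_threshold(2.0, 14, "word"))
# False
```

A vision model cannot measure these distances from a screenshot with pixel accuracy, so values like these are best read as a sanity check on the model's stated arithmetic.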

In [196]:
l4_results.append(out)

In [197]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of the reading level of the text.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All of the visible text is above a 7th grade reading level.
1: Some of the visible text is above a 7th grade reading level.
2: All of the visible text is at or below a 7th grade reading level.

If a criterion is not applicable to the webpage, the maximum score should be given and this should be mentioned in the reasoning.

The response should be in JSON format. The following example below shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You should insert your values for Assigned_Score and Reasoning based on the results of your analysis of the image.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning":
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']


In [198]:
eval_result


'{\n    "results": [\n        {\n        "Website": "ClaudeAI",\n        "L4_Indicator": "Plain-language & readability thresholds met (senior-appropriate)",\n        "Assigned_Score": 1,\n        "Max_Score": 2,\n        "Reasoning": "The visible text includes \'Analyze Image\', which contains the word \'Analyze\' (Flesch-Kincaid Grade Level ~7.5), placing it above a 7th grade reading level. All other visible text elements (e.g., \'Start new\', \'How can I help you?\', \'Create an image\', \'Help me write\', etc.) are at or below the 7th grade level. Thus, some visible text is above 7th grade while others are not, resulting in a score of 1."\n        }\n    ]\n}'

In [199]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)',
   'Assigned_Score': 1,
   'Max_Score': 2,
   'Reasoning': "The visible text includes 'Analyze Image', which contains the word 'Analyze' (Flesch-Kincaid Grade Level ~7.5), placing it above a 7th grade reading level. All other visible text elements (e.g., 'Start new', 'How can I help you?', 'Create an image', 'Help me write', etc.) are at or below the 7th grade level. Thus, some visible text is above 7th grade while others are not, resulting in a score of 1."}]}

In [200]:
l4_results.append(out)

Aggregate scores and reasoning for the different subsections evaluated for this L4 indicator

In [201]:
combined_results = {}
combined_results["Website"] = ""
combined_results["L4_Indicator"] = ""
combined_results["Assigned_Score"] = 0
combined_results["Max_Score"] = 0
combined_results["Reasoning"] = ""

for json_obj in l4_results:
    for result in json_obj["results"]:
        for field in result:
            if field == "Assigned_Score":
                combined_results[field] += result[field]
            elif field == "Reasoning":
                combined_results[field] += " "
                combined_results[field] += result[field]
            elif field == "Max_Score":
                combined_results[field] += result[field]
            else:
                combined_results[field] = result[field]

print(combined_results)

{'Website': 'ClaudeAI', 'L4_Indicator': 'Plain-language & readability thresholds met (senior-appropriate)', 'Assigned_Score': 8, 'Max_Score': 14, 'Reasoning': " Several input targets such as 'Help me write', 'Learn about', 'Analyze Image', 'Summarize text', and '+ See More' buttons appear to be smaller than the 44x44 CSS pixel recommendation for touch targets. While elements like 'Start new' and voice input icons may meet or exceed the size requirement, the presence of multiple smaller buttons indicates 'some input targets are below the WCAG recommended size'. The page includes a microphone icon (image) for voice input, but no accompanying text label to explicitly indicate its function. While the microphone icon is a standard visual cue for voice input, the absence of descriptive text means it is not clearly identifiable by both image and text. Line spacing is not applicable to any visible text items, as all text elements (e.g., headings, buttons, input placeholders) are single-line. T
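
The aggregation loop above sums every entry in `l4_results`, so the totals are only meaningful if the list holds this indicator's sub-results and nothing else. The sketch below wraps the same logic in a function with a fresh accumulator, assuming a hypothetical name `combine_results`. The placeholder entries use the five sub-check scores shown in this section (2, 1, 2, 0, 1); the two "stale" entries worth 1 out of 2 each are an assumption for illustration only.

```python
def combine_results(result_sets):
    """Sum scores and join reasoning across parsed model replies."""
    combined = {"Website": "", "L4_Indicator": "", "Assigned_Score": 0,
                "Max_Score": 0, "Reasoning": ""}
    for json_obj in result_sets:
        for result in json_obj["results"]:
            combined["Website"] = result["Website"]
            combined["L4_Indicator"] = result["L4_Indicator"]
            combined["Assigned_Score"] += result["Assigned_Score"]
            combined["Max_Score"] += result["Max_Score"]
            # Join reasoning with single spaces and no leading space.
            combined["Reasoning"] = (combined["Reasoning"] + " " + result["Reasoning"]).strip()
    return combined


def make_entry(score, indicator="Plain-language & readability"):
    """Build a placeholder reply with the notebook's result structure."""
    return {"results": [{"Website": "ClaudeAI", "L4_Indicator": indicator,
                         "Assigned_Score": score, "Max_Score": 2,
                         "Reasoning": f"score {score}"}]}


# Scores of the five sub-checks shown in this section.
readability = [make_entry(s) for s in (2, 1, 2, 0, 1)]
fresh = combine_results(readability)
print(fresh["Assigned_Score"], fresh["Max_Score"])
# 6 10

# Hypothetical leftovers from an earlier indicator that were never cleared.
stale = [make_entry(1, "Earlier indicator"), make_entry(1, "Earlier indicator")]
carried = combine_results(stale + readability)
print(carried["Assigned_Score"], carried["Max_Score"])
# 8 14
```

The second total matches the 8 out of 14 printed above, which is consistent with two earlier entries still being in the list, although that is an inference from the output and not something the notebook confirms. Resetting `l4_results = []` before starting each indicator would rule it out.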

In [202]:
new_df = pd.DataFrame(combined_results, index=[0])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...


In [203]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df,new_df])
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
0,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
0,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...


### L4: Cognitive-load reduction patterns present (chunking, progressive disclosure)

In [204]:
# Prompt for this L4 Indicator
prompt_in = """
This image shows the homepage for ClaudeAI's chat model interaction interface. You are tasked with evaluating the accessibility of this page in terms of cognitive load reduction through proper header and label text.

Based only on what you see in the image of the homepage, rate the webpage on the following scale ranging from 0 to 2:

0: All visible text headers and labels do not clearly explain the purpose of their associated interactive elements.
1: Some visible text headers and labels do not clearly explain the purpose of their associated interactive elements.
2: All visible text headers and labels clearly explain the purpose of their associated interactive elements.

The response should be in valid JSON format. The following example shows properly formatted output. The structure, Website, L4_Indicator and Max_Score properties should not be changed. You are only writing the values for Assigned_Score and Reasoning in the response.

{
    "results": [
        {
        "Website": "ClaudeAI",
        "L4_Indicator": "Cognitive-load reduction patterns present (chunking, progressive disclosure)",
        "Assigned_Score": ,
        "Max_Score": 2,
        "Reasoning": ""
        }
    ]
}
"""

# Interact with the vision model
response = ollama.generate(
    model="qwen3-vl:8b",  # Use the name of the vision model you pulled
    prompt=prompt_in,
    images=[claude_bytes]
)

# Save the model's response as json
eval_result = response['response']

In [205]:
print(eval_result)

{
    "results": [
        {
            "Website": "ClaudeAI",
            "L4_Indicator": "Cognitive-load reduction patterns present (chunking, progressive disclosure)",
            "Assigned_Score": 2,
            "Max_Score": 2,
            "Reasoning": "All visible text headers and labels clearly explain the purpose of their associated interactive elements. The main header 'How can I help you?' provides a clear context. The input field label 'Type a message...' explicitly describes the expected action. Interactive elements like 'Create an image', 'Help me write', 'Analyze image', and 'Summarize text' each have labels that directly state their function, eliminating ambiguity about their purpose."
        }
    ]
}


In [206]:
out = js.loads(eval_result)
out

{'results': [{'Website': 'ClaudeAI',
   'L4_Indicator': 'Cognitive-load reduction patterns present (chunking, progressive disclosure)',
   'Assigned_Score': 2,
   'Max_Score': 2,
   'Reasoning': "All visible text headers and labels clearly explain the purpose of their associated interactive elements. The main header 'How can I help you?' provides a clear context. The input field label 'Type a message...' explicitly describes the expected action. Interactive elements like 'Create an image', 'Help me write', 'Analyze image', and 'Summarize text' each have labels that directly state their function, eliminating ambiguity about their purpose."}]}

In [208]:
new_df = pd.DataFrame(out['results'])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...


In [209]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df,new_df])

In [210]:
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
0,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
0,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...
0,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...


## Finalize ClaudeAI Test Results

In [211]:
claudeai_test_results_df.reset_index(inplace=True)
claudeai_test_results_df

Unnamed: 0,index,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,0,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,0,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,0,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...
4,0,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...


In [212]:
claudeai_test_results_df.drop(columns=['index'], inplace=True)
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...
4,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...


### Duplicate L4: Motor and voice accessibility options (voice input, large targets, reduced precision)

Since the L4 for Senior motor and voice accessibility options (voice input, large targets, reduced precision) is essentially identical to the general motor and voice L4 in the other L3 Subcategory, the senior-focused score is duplicated so that it appears under both L4s on the tree.

In [213]:
# Copy the senior motor/voice row (position 2) and relabel it for the general L4.
# .copy() makes an independent DataFrame, which avoids pandas' SettingWithCopyWarning.
seniors_row = claudeai_test_results_df.iloc[[2]].copy()
seniors_row['L4_Indicator'] = "Motor and voice accessibility options (voice input, large targets, reduced precision)"
seniors_row


Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
2,ClaudeAI,Motor and voice accessibility options (voice i...,2,4,Several input targets such as 'Help me write'...


In [214]:
claudeai_test_results_df = pd.concat([claudeai_test_results_df, seniors_row]).reset_index().drop(columns=['index'])
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,ClaudeAI,Plain-language & readability thresholds met (s...,8,14,Several input targets such as 'Help me write'...
4,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...
5,ClaudeAI,Motor and voice accessibility options (voice i...,2,4,Several input targets such as 'Help me write'...


In [223]:
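# Max_Score came out as 14 instead of 10. The combined reasoning starts with the motor/voice text, so l4_results likely still held those two sub-results; if so, Assigned_Score should be re-checked as well.
claudeai_test_results_df.loc[3, 'Max_Score'] = 10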

In [224]:
claudeai_test_results_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,ClaudeAI,Plain-language & readability thresholds met (s...,8,10,Several input targets such as 'Help me write'...
4,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...
5,ClaudeAI,Motor and voice accessibility options (voice i...,2,4,Several input targets such as 'Help me write'...


### Return to L4: WCAG-aligned accessibility features available

Now that the other L4 indicators are complete, we can aggregate them to get a score for the WCAG compliance metric.

In [225]:
wcag_alignment_assigned_score = claudeai_test_results_df['Assigned_Score'].sum()
wcag_alignment_max_score = claudeai_test_results_df['Max_Score'].sum()

In [226]:
wcag_alignment_values = {
    "Website": "ClaudeAI",
    "L4_Indicator": "WCAG-aligned accessibility features available",
    "Assigned_Score": wcag_alignment_assigned_score,
    "Max_Score": wcag_alignment_max_score,
    "Reasoning": "This value is the summation of all the other L4 indicators handled in this test, since they are meant to be aligned with WCAG guidance."
}

In [227]:
new_df = pd.DataFrame(wcag_alignment_values, index=[6])
new_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
6,ClaudeAI,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


In [228]:
final_claudeai_gpt_scores_df = pd.concat([claudeai_test_results_df, new_df])

In [229]:
final_claudeai_gpt_scores_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,ClaudeAI,Plain-language & readability thresholds met (s...,8,10,Several input targets such as 'Help me write'...
4,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...
5,ClaudeAI,Motor and voice accessibility options (voice i...,2,4,Several input targets such as 'Help me write'...
6,ClaudeAI,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


# View Final Results of both tests

In [230]:
final_chat_gpt_scores_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ChatGPT,Multilingual support for intended locales,4,4,"The page lists 39 supported languages, which m..."
1,ChatGPT,Senior-focused simplification & stepwise guida...,0,2,No help icons or links are present on the home...
2,ChatGPT,Senior motor and voice accessibility options (...,2,4,The microphone icon for voice input and the '...
3,ChatGPT,Plain-language & readability thresholds met (s...,8,10,All visible text items with multiple lines (e...
4,ChatGPT,Cognitive-load reduction patterns present (chu...,2,2,"All visible text headers and labels (e.g., 'Ne..."
5,ChatGPT,Motor and voice accessibility options (voice i...,2,4,The microphone icon for voice input and the '...
6,ChatGPT,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


In [231]:
final_claudeai_gpt_scores_df

Unnamed: 0,Website,L4_Indicator,Assigned_Score,Max_Score,Reasoning
0,ClaudeAI,Multilingual support for intended locales,2,4,"The page lists 11 supported languages, which i..."
1,ClaudeAI,Senior-focused simplification & stepwise guida...,2,2,The 'Help me write' button is clearly visible ...
2,ClaudeAI,Senior motor and voice accessibility options (...,2,4,Several input targets such as 'Help me write'...
3,ClaudeAI,Plain-language & readability thresholds met (s...,8,10,Several input targets such as 'Help me write'...
4,ClaudeAI,Cognitive-load reduction patterns present (chu...,2,2,All visible text headers and labels clearly ex...
5,ClaudeAI,Motor and voice accessibility options (voice i...,2,4,Several input targets such as 'Help me write'...
6,ClaudeAI,WCAG-aligned accessibility features available,18,26,This value is the summation of all the other L...


Given that this is a fairly simple metric (with plenty of room for expansion by implementing additional WCAG guidelines, either here or in other L3 Subcategories throughout the AI Ethics Index), it is not surprising that the results were quite similar.

Both landing pages take inspiration from each other and favor simplicity, a common design philosophy among tech companies, which keeps the focus on the chat models themselves.

Interestingly, although the overall L4 metric, WCAG alignment, came out equal (both sites reached the same aggregate score of 18/26 across the other L4s), the per-indicator scores differed in two places.

ChatGPT received a 4/4 for language support, since the model counted 39 supported languages on its page (as of 12/9/2025), while Claude's page listed 11 and scored 2/4. On the other hand, Claude scored better for help functionality and guidance (2/2 versus 0/2), given the presence of a 'Help me write' function, while ChatGPT provides no visible guidance for a user beyond what is presented on the screen.
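
The comparison described here can be checked directly from the two final tables. The sketch below is written in plain Python so it runs without the notebook's DataFrames; the dictionary keys are shortened labels chosen for this example, and the (assigned, max) pairs are copied from the tables above.

```python
# (Assigned_Score, Max_Score) per indicator, copied from the final tables.
chatgpt_scores = {
    "Multilingual support": (4, 4),
    "Senior-focused simplification": (0, 2),
    "Senior motor and voice": (2, 4),
    "Plain-language & readability": (8, 10),
    "Cognitive-load reduction": (2, 2),
    "Motor and voice": (2, 4),
}
claude_scores = {
    "Multilingual support": (2, 4),
    "Senior-focused simplification": (2, 2),
    "Senior motor and voice": (2, 4),
    "Plain-language & readability": (8, 10),
    "Cognitive-load reduction": (2, 2),
    "Motor and voice": (2, 4),
}


def totals(scores):
    """Return (sum of assigned scores, sum of maximum scores)."""
    assigned = sum(a for a, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return assigned, maximum


def differences(left, right):
    """Map each indicator whose assigned score differs to (left, right) scores."""
    return {name: (left[name][0], right[name][0])
            for name in left if left[name][0] != right[name][0]}


print(totals(chatgpt_scores), totals(claude_scores))
# (18, 26) (18, 26)
print(differences(chatgpt_scores, claude_scores))
# {'Multilingual support': (4, 2), 'Senior-focused simplification': (0, 2)}
```

The two differences cancel out in the sum, which is why the aggregate WCAG row is identical for both sites.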

This result shows that, at least from a reasoning perspective, Qwen is able to view the images, find discernible differences, and grade them according to the instructions given.