# About this surveyeval-lite notebook (local version)

This notebook provides a simple example of an automated AI workflow. It's a much-simplified version of [the surveyeval toolkit available here in GitHub](https://github.com/higherbar-ai/survey-eval). This version, self-contained in this single notebook, uses [the ai-workflows package](https://github.com/higherbar-ai/ai-workflows), along with an OpenAI LLM, to:

1. Parse a survey file into a series of questions

2. Loop through each question to:

    1. Evaluate the question for potential phrasing issues
    2. Evaluate the question for potential bias issues

3. Assemble and output all findings and recommendations

See [the ai-workflows GitHub repo](https://github.com/higherbar-ai/ai-workflows) for more details.

## Configuration

Before attempting to run this notebook, be sure to set up your Python environment using the code in `initial-setup.ipynb` and configure the `.ini` file as discussed below.

The notebook begins by loading credentials and configuration from an `.ini` file stored in `~/.hbai/ai-workflows.ini`. The `~` in the path refers to the current user's home directory, and the `.ini` file contents should follow this format (with keys, models, and paths as appropriate):

    [openai]
    openai-api-key=keyhere-with-sk-on-front
    openai-model=gpt-4o
    azure-api-key=keyhere-or-blank
    azure-api-base=azure-base-url-here
    azure-api-engine=gpt-4o
    azure-api-version=2024-02-01

    [langsmith]
    langsmith-api-key=leave-blank-unless-you're-using-langsmith

    [files]
    input-dir=~/Files/ai-workflows/inputs
    output-dir=~/Files/ai-workflows/outputs

You can leave the Azure settings blank if you're using OpenAI (or vice versa). You also don't need to supply a Langsmith API key unless you're using Langsmith. The `input-dir` and `output-dir` settings are used to specify the directories where input and output files are stored, respectively.

## Initializing

This next code block initializes the notebook, reading parameters from the configuration file and initializing an LLM interface.

In [1]:
# for convenience, auto-reload modules when they've changed
%load_ext autoreload
%autoreload 2

import logging
import configparser
import os
from ai_workflows.llm_utilities import LLMInterface 
from ai_workflows.document_utilities import DocumentInterface

# set log level to WARNING
logging.basicConfig(level=logging.INFO)

# load credentials and other configuration from a local ini file
inifile_location = os.path.expanduser("~/.hbai/ai-workflows.ini")
inifile = configparser.RawConfigParser()
inifile.read(inifile_location)

# load configuration
openai_api_key = inifile.get("openai", "openai-api-key")
openai_model = inifile.get("openai", "openai-model")
azure_api_key = inifile.get("openai", "azure-api-key")
azure_api_base = inifile.get("openai", "azure-api-base")
azure_api_engine = inifile.get("openai", "azure-api-engine")
azure_api_version = inifile.get("openai", "azure-api-version")
input_dir = os.path.expanduser(inifile.get("files", "input-dir"))
output_dir = os.path.expanduser(inifile.get("files", "output-dir"))
langsmith_api_key = inifile.get("langsmith", "langsmith-api-key")

# initialize LangSmith API (if key specified)
if langsmith_api_key:
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = "local"
    os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    os.environ["LANGCHAIN_API_KEY"] = langsmith_api_key

# initialize the LLM
llm = LLMInterface(
    openai_api_key=openai_api_key,
    openai_model=openai_model,
    azure_api_key=azure_api_key,
    azure_api_base=azure_api_base,
    azure_api_engine=azure_api_engine,
    azure_api_version=azure_api_version,
    langsmith_api_key=langsmith_api_key
)

# initialize our document processor
doc_interface = DocumentInterface(llm_interface=llm)

# report success
print("Local configuration loaded, LLM and document processor initialized.")

Local configuration loaded, LLM and document processor initialized.


## Reading survey file and converting to Markdown

When you run this next code cell, it will grab the first file in the input directory and use that for the survey evaluation, converting it to Markdown format and outputting the converted text so that you can verify that it looks okay.

If you don't have a survey file handy, you can use the DHS example in the resources folder.

In [6]:
from ai_workflows.document_utilities import DocumentInterface

# find the first file in the input directory
file_path = None
for filename in os.listdir(input_dir):
    if os.path.isfile(os.path.join(input_dir, filename)) and not filename.startswith('.'):
        file_path = os.path.join(input_dir, filename)
        break

if not file_path:
    raise ValueError("No files found in the input directory.")

# extract file text as Markdown
doc_processor = DocumentInterface(llm_interface=llm)
survey_text = doc_processor.convert_to_markdown(file_path)

# report results
print(f"Processing this survey file: {file_path}")
print()
print(f"Extracted Markdown text: {survey_text}")

Processing /Users/crobert/Files/ai-workflows/inputs/sample_dhs_questions.txt...
Found the following file to use as the survey file: /Users/crobert/Files/ai-workflows/inputs/sample_dhs_questions.txt

Extracted Markdown text: ```
pubs/pdf/DHSQ8/DHS8_Womans_QRE_EN_14Feb2023_DHSQ8.pdf on Oct. 5, 2024, following a
link from https://dhsprogram.com/publications/publication-DHSQ8-DHS-Questionnairesand-Manuals.cfm.
**Section 11: Other Health Issues**
- **Now I would like to ask you some questions on smoking and tobacco use. Do you
currently smoke cigarettes every day, some days, or not at all?**
 - (1) Every day
 - (2) Some days
 - (3) Not at all
- **On average, how many cigarettes do you currently smoke each day?**
 - [Numeric response expected]
- **Do you currently smoke or use any other type of tobacco every day, some days, or
not at all?**
 - (1) Every day
 - (2) Some days
 - (3) Not at all
- **What other type of tobacco do you currently smoke or use?**
 - (A) Kreteks
 - (B) Pipes full of t

## Parsing the survey text

The next code block will use the LLM to parse the survey text into a list of questions.

In [8]:
# configure our parameters for JSON conversion

json_context = "The file contains a survey instrument or digital form."

json_job = f"""Your job is to extract survey questions or form fields from the file's content, including question IDs, instructions, and multiple-choice options, and to return it all in a specific JSON format. More specifically:

* **Your job is to extract verbatim text:** In the JSON you return, only ever include text content, directly quoted without modification, from the survey text you are supplied (i.e., never add or invent any text and never revise or rephrase any text).

* **Only respond with valid JSON that precisely follows the format specified below:** Your response should only include valid JSON and nothing else; if you cannot find any questions to return, simply return an empty questions list.

* **Treat translations as separate questions:** If you see one or more translated versions of a question, include them as separate questions in the JSON you return."""

json_output_spec = f"""Return JSON with the following fields (and only the following fields):

* `questions` (list): The list of questions extracted, or an empty list if none found. Each question should be a dictionary with the following keys:

  * `question_id` (string): The numeric or alphanumeric identifier or short variable name identifying the question (if any), usually located just before or at the beginning of the question. "" if none found.

  * `question` (string): The exact text of the question or form field, including any introductory text that provides context or explanation. Often follows a unique question ID of some sort, like "2.01." or "gender:". Should not include response options, which should be included in the 'options' field, or extra enumerator or interviewer instructions (including interview probes), which should be included in the 'instructions' field. Be careful: the same question might be asked in multiple languages, and each translation should be included as a separate question. Never translate between languages or otherwise alter the question text in any way.

  * `instructions` (string): Instructions or other guidance about how to ask or answer the question (if any), including enumerator or interviewer instructions. If the question includes a list of specific response options, do NOT include those in the instructions. However, if there is guidance as to how to fill out an open-ended numeric or text response, or guidance about how to choose among the options, include that guidance here. "" if none found.

  * `options` (string): The list of specific response options for multiple-choice questions in a single string, including both the label and the internal value (if specified) for each option. For example, a 'Male' label might be coupled with an internal value of '1', 'M', or even 'male'. Separate response options with a space, three pipe symbols ('|||'), and another space, and, if there is an internal value, add a space, three # symbols ('###'), and the internal value at the end of the label. For example: 'Male ### 1 ||| Female ### 2' (codes included) or 'Male ||| Female' (no codes); 'Yes ### yes ||| No ### no', 'Yes ### 1 ||| No ### 0', 'Yes ### y ||| No ### n', or 'YES ||| NO'. Do NOT include fill-in-the-blank content here, only multiple-choice options. "" if the question is open-ended (i.e., does not include specific multiple-choice options)."""

# process the file
all_responses = doc_processor.markdown_to_json(survey_text, json_context, json_job, json_output_spec)

# combine all responses into a single list of questions
questions = []
for response in all_responses:
    if 'questions' in response:
        questions += response['questions']

# output results
if questions:
    # output summary of results
    num_questions = len(questions)
    num_question_ids = len(set([q['question_id'] for q in questions]))
    num_instructions = len(set([q['instructions'] for q in questions]))
    num_options = len(set([q['options'] for q in questions]))
    print(f"Parsed {num_questions} questions ({num_question_ids} with IDs, {num_instructions} with instructions, and {num_options} with multiple-choice options)")
else:
    print(f"Failed to parse any questions from file.")

INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
INFO:root:Extracted JSON from Markdown: {
  "questions": [
    {
      "question_id": "",
      "question": "Now I would like to ask you some questions on smoking and tobacco use. Do you currently smoke cigarettes every day, some days, or not at all?",
      "instructions": "",
      "options": "Every day ### 1 ||| Some days ### 2 ||| Not at all ### 3"
    },
    {
      "question_id": "",
      "question": "On average, how many cigarettes do you currently smoke each day?",
      "instructions": "[Numeric response expected]",
      "options": ""
    },
    {
      "question_id": "",
      "question": "Do you currently smoke or use any other type of tobacco every day, some days, or not at all?",
      "instructions": "",
      "options": "Every day ### 1 ||| Some days ### 2 ||| Not at all ### 3"
    },
    {
      "question_id": 

Parsed 19 questions (1 with IDs, 6 with instructions, and 8 with multiple-choice options)


## Reviewing the survey questions

This next code block will review each question in the survey, asking the LLM for advice re: question phrasing as well as potential biased or stereotypical language.

In [9]:
# loop through every question, reviewing them and saving results as we go
all_results = []
for question in questions:
    # format our question for the LLM
    question_text = f"""* Question ID: {question['question_id']}
* Instructions: {question['instructions']}
* Question: {question['question']}
* Options: {question['options']}"""

    # set up our phrasing-review prompt for the LLM
    phrasing_prompt = f"""You are an AI designed to evaluate questionnaires and other survey instruments used by researchers and M&E professionals. You are an expert in survey methodology with training equivalent to a member of the American Association for Public Opinion Research (AAPOR) with a Ph.D. in survey methodology from University of Michigan’s Institute for Social Research. You consider primarily the content, context, and questions provided to you, and then content and methods from the most widely-cited academic publications and public and nonprofit research organizations.

You always give truthful, factual answers. When asked to give your response in a specific format, you always give your answer in the exact format requested. You never give offensive responses. If you don’t know the answer to a question, you truthfully say you don’t know.

You will be given the raw text from a questionnaire or survey instrument between |!| and |!| delimiters. You will also be given a specific question from that text to evaluate between |@| and |@| delimiters. The question will be supplied in the following format:

* Question ID: ID (if any)
* Instructions: Instructions (if any)
* Question: Question text
* Options: Multiple-choice options (if any), with each separated by three pipe symbols (|||) and option values (if any) separated from option labels by three hash symbols (###)

Evaluate the question only, but also consider its context within the larger survey.

Assume that this survey will be administered by a trained enumerator who asks each question and reads each prompt or instruction as indicated in the excerpt. Your job is to anticipate the phrasing or translation issues that would be identified in a rigorous process of pre-testing (with cognitive interviewing) and piloting.

When evaluating the question, DO:

1. Ensure that the question will be understandable by substantially all respondents.

2. Consider the question in the context of the excerpt, including any instructions, related questions, or prompts that precede it.

3. Ignore question numbers and formatting.

4. Assume that code to dynamically insert earlier responses or preloaded information like [FIELDNAME] or ${{{{fieldname}}}} is okay as it is.

5. Ignore HTML or other formatting, and focus solely on question phrasing (assume that HTML tags will be for visual formatting only and will not be read aloud).

When evaluating the question, DON'T:

1. Recommend translating something into another language (i.e., suggestions for rephrasing should always be in the same language as the original text).

2. Recommend changes in the overall structure of a question (e.g., changing from multiple choice to open-ended or splitting one question into multiple), unless it will substantially improve the quality of the data collected.

3. Comment on HTML tags or formatting.

Respond in JSON format with all of the following fields:

* `Phrases` (list): a list containing all phrases from the excerpt that pre-testing or piloting is likely to identify as problematic (each phrase should be an exact quote)

* `Number of phrases` (number): the exact number of phrases in Phrases [ Note that this key must be exactly "Number of phrases", with exactly that capitalization and spacing ]

* `Recommendations` (list): a list containing suggested replacement phrases, one for each of the phrases in Phrases (in the same order as Phrases; each replacement phrase should be an exact quote that can exactly replace the corresponding phrase in Phrases; and each replacement phrase should be in the same language as the original phrase)

* `Explanations` (list): a list containing explanations for why the authors should consider revising each phrase, one for each of the phrases in Phrases (in the same order as Phrases). Do not repeat the entire phrase in the explanation, but feel free to reference specific words or parts as needed.

* `Severities` (list): a list containing the severity of each identified issue, one for each of the phrases in Phrases (in the same order as Phrases); each severity should be expressed as a number on a scale from 1 for the least severe issues (minor phrasing issues that are very unlikely to substantively affect responses) to 5 for the most severe issues (problems that are likely to substantively affect responses in a way that introduces bias and/or variance)

Raw text:
|!|
{survey_text}
|!|

Question to evaluate:
|@|
{question_text}
|@|

Your JSON response following the format described above:"""

    # call out to the LLM
    print()
    print(f"Evaluating question for phrasing: {question['question']}")
    response_text, response_dict = llm.process_json_response(llm.llm_json_response_with_timeout(phrasing_prompt))

    # save and output results
    if response_dict is not None:
        if 'Number of phrases' in response_dict and response_dict['Number of phrases'] > 0:
            print(f"  Identified {response_dict['Number of phrases']} issue(s)")
            all_results += [response_dict]
        else:
            print("  No issues identified")
    else:
        print(f"  Failed to get a valid response. Response text: {response_text}")

    # set up our bias-review prompt for the LLM
    # Note that this prompt was inspired by the example in this blog post:
    # https://www.linkedin.com/pulse/using-chatgpt-counter-bias-prejudice-discrimination-johannes-schunter/
    bias_prompt = f"""You are an AI designed to evaluate questionnaires and other survey instruments used by researchers and M&E professionals. You are an expert in survey methodology with training equivalent to a member of the American Association for Public Opinion Research (AAPOR) with a Ph.D. in survey methodology from University of Michigan’s Institute for Social Research. You are also an expert in the areas of gender equality, discrimination, anti-racism, and anti-colonialism. You consider primarily the content, context, and questions provided to you, and then content and methods from the most widely-cited academic publications and public and nonprofit research organizations.

You always give truthful, factual answers. When asked to give your response in a specific format, you always give your answer in the exact format requested. You never give offensive responses. If you don’t know the answer to a question, you truthfully say you don’t know.

You will be given the raw text from a questionnaire or survey instrument between |!| and |!| delimiters. You will also be given a specific question from that text to evaluate between |@| and |@| delimiters. The question will be supplied in the following format:

* Question ID: ID (if any)
* Instructions: Instructions (if any)
* Question: Question text
* Options: Multiple-choice options (if any), with each separated by three pipe symbols (|||) and option values (if any) separated from option labels by three hash symbols (###)

Evaluate the question only, but also consider its context within the larger survey.

Assume that this survey will be administered by a trained enumerator who asks each question and reads each prompt or instruction as indicated in the excerpt. Your job is to review the question for:

a. Stereotypical representations of gender, ethnicity, origin, religion, or other social categories.

b. Distorted or biased representations of events, topics, groups, or individuals.

c. Use of discriminatory or insensitive language towards certain groups or topics.

d. Implicit or explicit assumptions made in the text or unquestioningly adopted that could be based on prejudices.

e. Prejudiced descriptions or evaluations of abilities, characteristics, or behaviors.

Respond in JSON format with all of the following fields:

* `Phrases`: a list containing all problematic phrases from the excerpt that you found in your review (each phrase should be an exact quote from the excerpt)

* `Number of phrases`: the exact number of phrases in Phrases [ Note that this key must be exactly "Number of phrases", with exactly that capitalization and spacing ]

* `Recommendations`: a list containing suggested replacement phrases, one for each of the phrases in Phrases (in the same order as Phrases; each replacement phrase should be an exact quote that can exactly replace the corresponding phrase in Phrases)

* `Explanations`: a list containing explanations for why the phrases are problematic, one for each of the phrases in Phrases (in the same order as Phrases)

* `Severities`: a list containing the severity of each identified issue, one for each of the phrases in Phrases (in the same order as Phrases); each severity should be expressed as a number on a scale from 1 for the least severe issues (minor phrasing issues that are very unlikely to offend respondents or substantively affect their responses) to 5 for the most severe issues (problems that are very likely to offend respondents or substantively affect responses in a way that introduces bias and/or variance)

Raw text:
|!|
{survey_text}
|!|

Question to evaluate:
|@|
{question_text}
|@|

Your JSON response following the format described above:"""

    # call out to the LLM
    print()
    print(f"Evaluating question for bias: {question['question']}")
    response_text, response_dict = llm.process_json_response(llm.llm_json_response_with_timeout(bias_prompt))

    # save and output results
    if response_dict is not None:
        if 'Number of phrases' in response_dict and response_dict['Number of phrases'] > 0:
            print(f"  Identified {response_dict['Number of phrases']} issue(s)")
            all_results += [response_dict]
        else:
            print("  No issues identified")
    else:
        print(f"  Failed to get a valid response. Response text: {response_text}")


Evaluating question for phrasing: Now I would like to ask you some questions on smoking and tobacco use. Do you currently smoke cigarettes every day, some days, or not at all?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Now I would like to ask you some questions on smoking and tobacco use. Do you currently smoke cigarettes every day, some days, or not at all?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: On average, how many cigarettes do you currently smoke each day?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: On average, how many cigarettes do you currently smoke each day?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: Do you currently smoke or use any other type of tobacco every day, some days, or not at all?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Do you currently smoke or use any other type of tobacco every day, some days, or not at all?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: What other type of tobacco do you currently smoke or use?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 9 issue(s)

Evaluating question for bias: What other type of tobacco do you currently smoke or use?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: Now I would like to ask you some questions about drinking alcohol. Have you ever consumed any alcohol, such as beer, wine, spirits, or [ADD OTHER LOCAL EXAMPLES]?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Now I would like to ask you some questions about drinking alcohol. Have you ever consumed any alcohol, such as beer, wine, spirits, or [ADD OTHER LOCAL EXAMPLES]?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: During the last one month, on how many days did you have an alcoholic drink?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: During the last one month, on how many days did you have an alcoholic drink?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: We count one drink of alcohol as one can or bottle of beer, one glass of wine, one shot of spirits, or one cup of [ADD OTHER LOCAL EXAMPLES]. In the last one month, on the days that you drank alcohol, how many drinks did you usually have per day?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 2 issue(s)

Evaluating question for bias: We count one drink of alcohol as one can or bottle of beer, one glass of wine, one shot of spirits, or one cup of [ADD OTHER LOCAL EXAMPLES]. In the last one month, on the days that you drank alcohol, how many drinks did you usually have per day?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Many different factors can prevent women from getting medical advice or treatment for themselves. When you are sick and want to get medical advice or treatment, is each of the following a big problem or not a big problem?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 2 issue(s)

Evaluating question for bias: Many different factors can prevent women from getting medical advice or treatment for themselves. When you are sick and want to get medical advice or treatment, is each of the following a big problem or not a big problem?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Getting permission to go to the doctor


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Getting permission to go to the doctor


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Getting money needed for advice or treatment


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Getting money needed for advice or treatment


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: The distance to the health facility


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: The distance to the health facility


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Not wanting to go alone


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Not wanting to go alone


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Are you covered by any health insurance?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for bias: Are you covered by any health insurance?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: What type of health insurance are you covered by?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: What type of health insurance are you covered by?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: How long does it take in minutes to go from your home to the nearest healthcare facility, which could be a hospital, a health clinic, a medical doctor, or a health post?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: How long does it take in minutes to go from your home to the nearest healthcare facility, which could be a hospital, a health clinic, a medical doctor, or a health post?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: How do you travel to this healthcare facility from your home?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: How do you travel to this healthcare facility from your home?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: Now I’m going to ask you about tests a healthcare worker can do to check for cervical cancer, which is cancer in the cervix. The cervix connects the womb to the vagina. To be checked for cervical cancer, a woman is asked to lie on her back with her legs apart. Then the healthcare worker will use a brush or swab to collect a sample from inside her. The sample is sent to a laboratory for testing. This test is called a Pap smear or HPV test. Another method is called a VIA or Visual Inspection with Acetic Acid. In this test, the healthcare worker puts vinegar on the cervix to see if there is a reaction.


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 6 issue(s)

Evaluating question for bias: Now I’m going to ask you about tests a healthcare worker can do to check for cervical cancer, which is cancer in the cervix. The cervix connects the womb to the vagina. To be checked for cervical cancer, a woman is asked to lie on her back with her legs apart. Then the healthcare worker will use a brush or swab to collect a sample from inside her. The sample is sent to a laboratory for testing. This test is called a Pap smear or HPV test. Another method is called a VIA or Visual Inspection with Acetic Acid. In this test, the healthcare worker puts vinegar on the cervix to see if there is a reaction.


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for phrasing: Has a doctor or other healthcare worker ever tested you for cervical cancer?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Has a doctor or other healthcare worker ever tested you for cervical cancer?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified

Evaluating question for phrasing: Has a doctor or other healthcare provider examined your breasts to check for breast cancer?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  Identified 1 issue(s)

Evaluating question for bias: Has a doctor or other healthcare provider examined your breasts to check for breast cancer?


INFO:httpx:HTTP Request: POST https://hbai-openai-useast2.openai.azure.com//openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"


  No issues identified


## Organizing and outputting the results

This final code block organizes and outputs final results, saving them in an output file named `survey-review-results.txt`.

In [11]:
# generate report
if len(all_results) == 0:
    report = "No results to save"
else:
    report = "Survey review results:\n"
    for result in all_results:
        if 'Phrases' in result and result['Number of phrases'] > 0:
            # loop through all recommendations, treating lists as parallel arrays
            for phrase, recommendation, explanation, severity in zip(result['Phrases'], result['Recommendations'], result['Explanations'], result['Severities']):
                report += f"\n---\n\nSuggest replacing this: {phrase}\n\nWith this: {recommendation}\n\n{explanation}\n\nImportance: {severity} out of 5\n"

# save the report to file
output_path = os.path.join(output_dir, "survey-review-results.txt")
with open(output_path, "w") as f:
    f.write(report)

print(f"All recommendations saved to {output_path}")

All recommendations saved to /Users/crobert/Files/ai-workflows/outputs/survey-review-results.txt
