# Analyzing course evaluations
This notebook provides sample EDSL code for using a language model to analyze a set of course evaluations. We design the analysis as a survey: a set of questions about the evaluations that we prompt an AI agent to answer, with a language model generating the responses as a dataset.

[EDSL](https://pypi.org/project/edsl/) is an open-source Python package for simulating surveys and experiments with AI agents and language models. Please [see our docs](https://docs.expectedparrot.com/en/latest/index.html#) for tips on getting started.

## Technical setup
Before running the code below, please see instructions for [installing EDSL](https://docs.expectedparrot.com/en/latest/installation.html) and [storing API keys](https://docs.expectedparrot.com/en/latest/api_keys.html) for the language models that you want to use. 

In [1]:
# pip install edsl

## Create questions
We start by creating questions about the evaluations for an agent to answer. EDSL comes with a [variety of question types](https://docs.expectedparrot.com/en/latest/questions.html) (multiple choice, free text, etc.) that we can choose from based on the desired format of the response (e.g., a selection from a list of options, unstructured text, etc.). We can use a `{{ placeholder }}` in each question text in order to parameterize it with each evaluation. This allows us to create different "scenarios" of the questions that we can administer together.
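
To make the idea of parameterizing a question concrete, here is a minimal sketch of how a `{{ placeholder }}` can be filled with different values. It uses only the standard library; the `render` helper is hypothetical and is not part of EDSL, which handles templating internally. The two sample texts are short excerpts from the mock evaluations used later in this notebook.

```python
import re

def render(template: str, values: dict) -> str:
    """Fill each {{ name }} placeholder with the matching value (illustrative helper, not EDSL)."""
    def substitute(match: re.Match) -> str:
        return str(values[match.group(1)])
    # Match "{{ name }}" with optional whitespace inside the braces
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

question_text = "What is the overall sentiment of the following evaluation: {{ evaluation }}"

# Short excerpts standing in for full evaluations
sample_evaluations = [
    "Excellent introductory course!",
    "This class was a struggle for me.",
]

# One rendered prompt per evaluation: the same question, different "scenarios"
prompts = [render(question_text, {"evaluation": e}) for e in sample_evaluations]
for prompt in prompts:
    print(prompt)
```

Each rendered prompt corresponds to one scenario of the question.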

Here we select some question types:

In [2]:
from edsl.questions import QuestionList, QuestionMultipleChoice

Here we compose some questions in the relevant question type templates (see [examples of all types](https://docs.expectedparrot.com/en/latest/questions.html#question-type-classes) in the docs):

In [3]:
q_sentiment = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the overall sentiment of the following evaluation: {{ evaluation }}",
    question_options = ["Positive", "Neutral", "Negative"]
)

q_themes = QuestionList(
    question_name = "themes",
    question_text = "Identify the key points in the following evaluation and summarize each point individually: {{ evaluation }}",
    max_list_items = 3 # Optional 
)

q_improvements = QuestionList(
    question_name = "improvements",
    question_text = "Identify areas for improvement of the course based on the following evaluation and summarize them individually: {{ evaluation }}",
    max_list_items = 3
)

## Construct a survey
Next we combine our questions into a survey. This allows us to administer the questions asynchronously (by default), or according to any desired [survey logic or rules](https://docs.expectedparrot.com/en/latest/surveys.html) that we want to add, such as skip/stop rules or giving an agent "memories" of other questions in the survey. Here we create a simple asynchronous survey by passing the list of questions to a `Survey` object:

In [4]:
from edsl import Survey

survey = Survey(questions = [q_sentiment, q_themes, q_improvements])

## Select data for review
Next we identify the data to be analyzed. Here we use some mock evaluations for an Econ 101 course stored as a list of texts:

In [5]:
evaluations = [
    "I found the course very engaging and informative. The professor did an excellent job breaking down complex concepts, making them accessible to those of us new to economics. However, the pace was a bit fast, and I sometimes struggled to keep up with the weekly readings.",
    "This class was a struggle for me. The material felt dry and difficult to connect with real-world applications, which I think could have made it more interesting. More examples from current events would definitely have helped spark my interest.",
    "Excellent introductory course! The professor was enthusiastic and always willing to offer extra help during office hours. The interactive lectures and the practical assignments made the theory much more digestible and engaging.",
    "As someone with a strong background in math, I appreciated the analytical rigor of this course. However, I wish there had been more discussions that connected the theories we learned to everyday economic issues. It felt a bit isolated from practical realities at times.",
    "I enjoyed the course, especially the group projects, which were both challenging and rewarding. It was great to apply economic concepts to solve real-life problems. I did feel, however, that the feedback on assignments could be more detailed to help us understand our mistakes.",
    "The course content was well-organized, but the lectures were somewhat monotonous and hard to follow. I would suggest incorporating more visual aids and maybe some guest lectures from industry professionals to liven up the sessions.",
    "This was my favorite class this semester! The mix of theory and case studies was perfect, and the exams were fair. I also really appreciated the diversity of perspectives we explored in class, especially in terms of global economic policies.",
    "I found the textbook to be overly complex for an introductory course. It often used jargon that hadn't been explained in lectures, which was confusing. Simpler reading materials or more explanatory lectures would make a big difference for newcomers to economics.",
    "The professor was knowledgeable and clearly passionate about economics, but I felt the course relied too heavily on tests rather than more creative forms of assessment. More varied assignments would make the course more accessible to students with different learning styles.",
    "This class was a solid introduction to economics, though it leaned heavily on theoretical aspects. I would have liked more opportunities to discuss the real-world implications of economic theories, which I believe would enhance understanding and retention of the material.",
]

## Add data to the questions
Next we create a `Scenario` for each evaluation that we will add to the questions when we run the survey:

In [6]:
from edsl import Scenario

scenarios = [Scenario({"evaluation":e}) for e in evaluations]

## Design AI agents
Next we design agents with relevant traits and personas for the model to use in answering the questions. This can be useful if we want to compare responses among different audiences. We do this by passing a dictionary of `traits` to each `Agent` object. We can also choose whether to give an agent additional instructions for answering the survey (independent of individual question texts). Here we create a persona for the professor of the course and pass it some special instructions:

In [7]:
from edsl import Agent

persona = "You are a professor reviewing student evaluations for your recent Econ 101 course."
instruction = "Be very specific and constructive in providing feedback and suggestions."

agent = Agent(traits = {"persona": persona}, instruction = instruction)

## Select language models
EDSL works with many popular language models that we can use to generate responses for our survey. We can see a current list of all available models:

In [8]:
from edsl import Model

Model.available()

[['01-ai/Yi-34B-Chat', 'deep_infra', 0],
 ['Austism/chronos-hermes-13b-v2', 'deep_infra', 1],
 ['Gryphe/MythoMax-L2-13b', 'deep_infra', 2],
 ['Gryphe/MythoMax-L2-13b-turbo', 'deep_infra', 3],
 ['HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1', 'deep_infra', 4],
 ['Phind/Phind-CodeLlama-34B-v2', 'deep_infra', 5],
 ['bigcode/starcoder2-15b', 'deep_infra', 6],
 ['bigcode/starcoder2-15b-instruct-v0.1', 'deep_infra', 7],
 ['claude-3-haiku-20240307', 'anthropic', 8],
 ['claude-3-opus-20240229', 'anthropic', 9],
 ['claude-3-sonnet-20240229', 'anthropic', 10],
 ['codellama/CodeLlama-34b-Instruct-hf', 'deep_infra', 11],
 ['codellama/CodeLlama-70b-Instruct-hf', 'deep_infra', 12],
 ['cognitivecomputations/dolphin-2.6-mixtral-8x7b', 'deep_infra', 13],
 ['databricks/dbrx-instruct', 'deep_infra', 14],
 ['deepinfra/airoboros-70b', 'deep_infra', 15],
 ['gemini-pro', 'google', 16],
 ['google/codegemma-7b-it', 'deep_infra', 17],
 ['google/gemma-1.1-7b-it', 'deep_infra', 18],
 ['gpt-3.5-turbo', 'openai', 19],


We select models to use with a survey by creating `Model` objects for them. The default model is GPT 4 Preview, meaning that EDSL will use it to run our survey if we do not specify a different model (with API keys stored). For purposes of demonstration, we'll explicitly specify this model the way that we do any other model:

In [9]:
model = Model('gpt-4-1106-preview')

Learn more about available [language models and methods](https://docs.expectedparrot.com/en/latest/language_models.html).

## Run the survey
Now we add the scenarios and agent to the survey, and then run it with the specified model. This will generate a dataset of responses that we can store and begin analyzing:

In [10]:
results = survey.by(scenarios).by(agent).by(model).run()
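
As a rough way to reason about the size of the resulting dataset, we can think of each `.by()` call as adding one dimension to a grid of combinations. The sketch below is a conceptual illustration using the standard library, not EDSL's implementation; it assumes one result per scenario, agent and model combination, and the agent label is a placeholder.

```python
from itertools import product

n_scenarios = 10                   # one scenario per evaluation in this notebook
agents = ["professor"]             # placeholder label for the single agent
models = ["gpt-4-1106-preview"]    # the single model selected above
question_names = ["sentiment", "themes", "improvements"]

# Assumption: one result per (scenario, agent, model) combination
combinations = list(product(range(n_scenarios), agents, models))
n_results = len(combinations)

# Each result holds one answer per question in the survey
n_answers = n_results * len(question_names)
print(n_results, n_answers)
```

Adding a second agent or model to the lists doubles both numbers, which is worth keeping in mind when estimating API usage.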

## Inspect the responses
EDSL comes with [built-in methods for analyzing results](https://docs.expectedparrot.com/en/latest/results.html) in data tables, dataframes, SQL queries and other formats. We can print a list of all the components that can be accessed:

In [11]:
results.columns

['agent.agent_instruction',
 'agent.agent_name',
 'agent.persona',
 'answer.improvements',
 'answer.sentiment',
 'answer.themes',
 'comment.improvements_comment',
 'comment.sentiment_comment',
 'comment.themes_comment',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.improvements_system_prompt',
 'prompt.improvements_user_prompt',
 'prompt.sentiment_system_prompt',
 'prompt.sentiment_user_prompt',
 'prompt.themes_system_prompt',
 'prompt.themes_user_prompt',
 'question_options.improvements_question_options',
 'question_options.sentiment_question_options',
 'question_options.themes_question_options',
 'question_text.improvements_question_text',
 'question_text.sentiment_question_text',
 'question_text.themes_question_text',
 'question_type.improvements_question_type',
 'question_type.sentiment_question_type',
 'question_type.themes_

For example, we can transform the results into a dataframe:

In [12]:
df = results.to_pandas()
df.head()

Unnamed: 0,agent.agent_instruction,agent.agent_name,agent.persona,answer.improvements,answer.sentiment,answer.themes,comment.improvements_comment,comment.sentiment_comment,comment.themes_comment,iteration.iteration,...,question_text.themes_question_text,question_type.improvements_question_type,question_type.sentiment_question_type,question_type.themes_question_type,raw_model_response.improvements_raw_model_response,raw_model_response.sentiment_raw_model_response,raw_model_response.themes_raw_model_response,scenario.edsl_class_name,scenario.edsl_version,scenario.evaluation
0,Be very specific and constructive in providing...,Agent_0,You are a professor reviewing student evaluati...,"['Adjust course pace', 'Balance workload', 'Su...",Positive,"['engaging and informative', 'excellent at bre...",The feedback indicates that while the course c...,The evaluation reflects a positive sentiment o...,The student appreciated the engaging nature of...,0,...,Identify the key points in the following evalu...,list,multiple_choice,list,{'id': 'chatcmpl-9OUMTjhvDgNCiUjpl4CgIYrZY1skB...,{'id': 'chatcmpl-9OU6G4h9uiBF1AVtRRWXADirLiFOe...,{'id': 'chatcmpl-9OUMTHdbeFQlQfILIu1e8Sm3pj2gl...,Scenario,0.1.21,I found the course very engaging and informati...
1,Be very specific and constructive in providing...,Agent_0,You are a professor reviewing student evaluati...,"['Incorporate current events', 'Real-world app...",Negative,"['material felt dry', 'difficult to connect wi...",To enhance student engagement and understandin...,The student expressed difficulty engaging with...,The student found the course content to be une...,0,...,Identify the key points in the following evalu...,list,multiple_choice,list,{'id': 'chatcmpl-9OUMT5o8NwVgPoqtczXFo3V8wjVTf...,{'id': 'chatcmpl-9OU6Gw9lJaZI1Gj1jIatspMFWyCHJ...,{'id': 'chatcmpl-9OUMTci4D30J69XFQ2BpKrZui0Req...,Scenario,0.1.21,This class was a struggle for me. The material...
2,Be very specific and constructive in providing...,Agent_0,You are a professor reviewing student evaluati...,[],Positive,"['Enthusiastic teaching', 'Availability for ex...",The student evaluation is positive without any...,The evaluation is positive as it praises the c...,The evaluation reflects a positive reception o...,0,...,Identify the key points in the following evalu...,list,multiple_choice,list,{'id': 'chatcmpl-9OUMT63f973BB5zLGGnWFukAc3qnG...,{'id': 'chatcmpl-9OU6Gf0qaSmq4br1TpDQS5Al0itC8...,{'id': 'chatcmpl-9OUMTHZRGCiX90dEQ9jOAMJO5zKBM...,Scenario,0.1.21,Excellent introductory course! The professor w...
3,Be very specific and constructive in providing...,Agent_0,You are a professor reviewing student evaluati...,"['Incorporate real-world applications', 'Facil...",Neutral,"['appreciation of analytical rigor', 'desire f...",The student's feedback suggests a need for the...,The evaluation reflects a mixed sentiment. The...,The student valued the analytical depth of the...,0,...,Identify the key points in the following evalu...,list,multiple_choice,list,{'id': 'chatcmpl-9OUMTtCDUXqVZYPqSxZexLuX7YkQa...,{'id': 'chatcmpl-9OU6GToQ479RGEuW3MMM3kGHHQ961...,{'id': 'chatcmpl-9OUMTgpZ7IyVjivK5QTE9VTejpZsM...,Scenario,0.1.21,"As someone with a strong background in math, I..."
4,Be very specific and constructive in providing...,Agent_0,You are a professor reviewing student evaluati...,"['Detailed feedback', 'Understanding mistakes'...",Positive,"['enjoyed group projects', 'application of con...",The student appreciates the practical applicat...,The evaluation expresses a positive sentiment ...,The student appreciated the practical applicat...,0,...,Identify the key points in the following evalu...,list,multiple_choice,list,{'id': 'chatcmpl-9OUMT1OqPDQEYOQAT2HH3vYElYZcK...,{'id': 'chatcmpl-9OU6Gnzy1ZlyjtjRyjIDGPvEb7qrG...,{'id': 'chatcmpl-9OUMTo07KeqUqLBQbk6YGXbVvhCRb...,Scenario,0.1.21,"I enjoyed the course, especially the group pro..."


Here we select just the responses to the questions and display them in a table:

In [13]:
results.select("sentiment", "themes", "improvements").print(format="rich")

We can do a quick tally of the sentiments:

In [14]:
df_sentiment = results.to_pandas()['answer.sentiment']
df_sentiment.value_counts()

answer.sentiment
Positive    4
Neutral     4
Negative    2
Name: count, dtype: int64
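
If proportions are more useful than raw counts, the tally above can be converted to shares. This is a small standard-library sketch that takes the counts shown in the output as given.

```python
from collections import Counter

# Counts copied from the tally above
sentiment_counts = Counter({"Positive": 4, "Neutral": 4, "Negative": 2})

total = sum(sentiment_counts.values())
shares = {label: count / total for label, count in sentiment_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

With pandas, calling `value_counts(normalize=True)` on the same column gives the proportions directly.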

## Use responses to construct new questions
We can use the responses to our initial questions to construct more questions about the texts. For example, we can prompt a model to condense the individual lists of themes and areas for improvement into short lists, and then use the new lists to quantify the topics across the set of evaluations.

Here we take the lists of themes in each evaluation, flatten them into a (duplicative) list, and then create a new question prompting a model to condense it for us:

In [15]:
themes = results.select("themes").to_list(flatten=True)
themes

['engaging and informative',
 'excellent at breaking down complex concepts',
 'pace too fast',
 'material felt dry',
 'difficult to connect with real-world applications',
 'lack of current event examples',
 'Enthusiastic teaching',
 'Availability for extra help',
 'Interactive and practical approach',
 'appreciation of analytical rigor',
 'desire for practical connections',
 'feeling of isolation from real-world issues',
 'enjoyed group projects',
 'application of concepts',
 'detailed feedback needed',
 'well-organized content',
 'monotonous lectures',
 'use more visual aids and guest lectures',
 'Engaging course content',
 'Balanced assessment methods',
 'Inclusive curriculum',
 'textbook complexity',
 'unexplained jargon',
 'need for simpler materials or clearer lectures',
 'knowledgeable and passionate',
 'reliance on tests',
 'need for varied assignments',
 'solid introduction',
 'theoretical focus',
 'lack of real-world discussion']
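
Conceptually, flattening turns one list per evaluation into a single list. The sketch below shows the idea with the standard library, assuming the first six themes above came from the first two evaluations (three each). It also shows an optional, hypothetical `dedupe_case_insensitive` helper for dropping exact repeats before prompting the model; the extra `"Pace too fast"` item is added only to demonstrate it.

```python
from itertools import chain

# Assumed grouping: one inner list of themes per evaluation
nested_themes = [
    ["engaging and informative", "excellent at breaking down complex concepts", "pace too fast"],
    ["material felt dry", "difficult to connect with real-world applications", "lack of current event examples"],
]

# Flatten the per-evaluation lists into one list, preserving order
flat_themes = list(chain.from_iterable(nested_themes))

def dedupe_case_insensitive(items):
    """Drop repeats that differ only in case or surrounding whitespace, keeping first occurrences."""
    seen = set()
    unique = []
    for item in items:
        key = item.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

# "Pace too fast" is a made-up duplicate used only for illustration
unique_themes = dedupe_case_insensitive(flat_themes + ["Pace too fast"])
print(unique_themes)
```

Exact-match deduplication only catches literal repeats; semantically similar themes still need the condensing step that follows.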

Next we construct a question to condense the list into a new list:

In [16]:
q_condensed_themes = QuestionList(
    question_name = "condensed_themes",
    question_text = """Combine the following list of themes extracted from the evaluations 
    into a consolidated, non-redundant list: """ + ", ".join(themes),
    max_list_items = 10
)

Now we run the question and select the new list. Note that using the agent is optional: here we run the question without it simply by not adding the agent when we run the question:

In [17]:
condensed_themes = q_condensed_themes.run().select("condensed_themes").to_list()[0]
condensed_themes

['Engaging and informative',
 'Excellent at breaking down complex concepts',
 'Interactive and practical approach',
 'Need for real-world applications and discussions',
 'Pace and complexity adjustments',
 'Knowledgeable and enthusiastic teaching',
 'Variety in assessments and assignments',
 'Inclusivity and organization of content',
 'Enhanced learning aids (visuals, guest lectures)',
 'Availability for extra help and detailed feedback']

Now we can create a question to identify all the themes in the list that appear in each evaluation (our new list becomes the list of answer options):

In [18]:
from edsl.questions import QuestionCheckBox

q_themes_list = QuestionCheckBox(
    question_name = "themes_list",
    question_text = "Select all of the themes that are mentioned in this evaluation: {{ evaluation }}",
    question_options = condensed_themes
)

Here we run the question and show a table listing all the themes for each evaluation in the results:

In [19]:
themes_lists = q_themes_list.by(scenarios).by(agent).run()
themes_lists.select("evaluation", "themes_list").print(format="rich")

Now we can count the number of evaluations that mention each of the themes:

In [20]:
import pandas as pd
from collections import Counter

themes_lists = themes_lists.select("themes_list").to_list()

# Pair each selected theme with its evaluation index; set() drops repeats within an evaluation
flat_list = [(theme, idx) for idx, themes in enumerate(themes_lists) for theme in themes]
count = Counter(theme for theme, idx in set(flat_list))

df_themes = pd.DataFrame(list(count.items()), columns=['Theme', 'Evaluations'])
print(df_themes.sort_values(by='Evaluations', ascending=False))

                                               Theme  Evaluations
4   Need for real-world applications and discussions            5
0                           Engaging and informative            4
2                 Interactive and practical approach            3
3  Availability for extra help and detailed feedback            2
5                    Pace and complexity adjustments            2
6            Inclusivity and organization of content            2
7            Knowledgeable and enthusiastic teaching            2
1             Variety in assessments and assignments            1
8   Enhanced learning aids (visuals, guest lectures)            1
9        Excellent at breaking down complex concepts            1
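
The counting step can also be written without pandas. The sketch below uses hypothetical checkbox answers (not real model output) to show why each evaluation's selections are deduplicated before counting: a theme selected twice for the same evaluation should still count as one evaluation.

```python
from collections import Counter

# Hypothetical checkbox answers for three evaluations (not real model output)
selections = [
    ["Engaging and informative", "Pace and complexity adjustments"],
    ["Need for real-world applications and discussions"],
    ["Engaging and informative", "Engaging and informative"],  # repeated on purpose
]

# set(chosen) removes repeats within one evaluation,
# so each theme is counted at most once per evaluation
counts = Counter(theme for chosen in selections for theme in set(chosen))

# Themes ordered from most to least frequently mentioned
ranked = counts.most_common()
for theme, n_evaluations in ranked:
    print(f"{n_evaluations}  {theme}")
```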


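We can do the same thing with the areas for improvement: flatten the lists, condense them, use the condensed list as checkbox options, and count the evaluations that mention each option: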

In [21]:
improvements = results.select("improvements").to_list(flatten=True)
improvements

['Adjust course pace',
 'Balance workload',
 'Supplemental resources',
 'Incorporate current events',
 'Real-world applications',
 'Engaging content',
 'Incorporate real-world applications',
 'Facilitate more discussions',
 'Bridge theory and practice',
 'Detailed feedback',
 'Understanding mistakes',
 'Assignment clarity',
 'Incorporate visual aids',
 'Invite guest lecturers',
 'Improve lecture delivery',
 'Simplify textbook',
 'Clarify jargon in lectures',
 'Provide introductory reading materials',
 'Diversify assessments',
 'Incorporate varied assignments',
 'Accommodate different learning styles',
 'Incorporate real-world examples',
 'Interactive discussions',
 'Application-focused activities']

In [22]:
q_condensed_improvements = QuestionList(
    question_name = "condensed_improvements",
    question_text = """Combine the following list of areas for improvement from the evaluations 
    into a consolidated, non-redundant list: """ + ", ".join(improvements),
    max_list_items = 10
)

In [23]:
condensed_improvements = q_condensed_improvements.run().select("condensed_improvements").to_list()[0]
condensed_improvements

['Adjust course pace',
 'Balance workload',
 'Supplemental resources',
 'Incorporate current events and real-world examples',
 'Engage through interactive content and discussions',
 'Provide detailed feedback and clarify assignments',
 'Use visual aids and invite guest lecturers',
 'Bridge theory with practical application',
 'Diversify and accommodate various learning styles',
 'Simplify and clarify course materials']

In [24]:
q_improvements_list = QuestionCheckBox(
    question_name = "improvements_list",
    question_text = "Select all of the improvements that are mentioned in this evaluation: {{ evaluation }}",
    question_options = condensed_improvements
)

In [25]:
improvements_lists = q_improvements_list.by(scenarios).by(agent).run()
improvements_lists.select("evaluation", "improvements_list").print(format="rich")

In [26]:
import pandas as pd
from collections import Counter

improvements_lists = improvements_lists.select("improvements_list").to_list()

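# Pair each selected improvement with its evaluation index; set() drops repeats within an evaluation
flat_list = [(improvement, idx) for idx, selected in enumerate(improvements_lists) for improvement in selected]
count = Counter(improvement for improvement, idx in set(flat_list))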

df_improvements = pd.DataFrame(list(count.items()), columns=['Improvement', 'Evaluations'])
print(df_improvements.sort_values(by='Evaluations', ascending=False))

                                         Improvement  Evaluations
1           Bridge theory with practical application            5
2  Incorporate current events and real-world exam...            4
0  Diversify and accommodate various learning styles            2
4  Engage through interactive content and discuss...            2
3                                   Balance workload            1
5                                 Adjust course pace            1
6              Simplify and clarify course materials            1
7  Provide detailed feedback and clarify assignments            1
8         Use visual aids and invite guest lecturers            1
9                             Supplemental resources            1


## Summarize the review
Here we create another question prompting the agent to summarize the analysis that was done, using the results of the prior steps:

In [27]:
from edsl.questions import QuestionFreeText

q_summary = QuestionFreeText(
    question_name = "summary",
    question_text = "Consider the following analyses of the evaluations and draft a paragraph summarizing them. " +
    "Evaluation counts by theme:\n" + df_themes.to_string() +
    "\nEvaluation counts by area for improvement:\n" + df_improvements.to_string()
)

summary = q_summary.by(agent).run()
summary.select("summary").print(format="rich")
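
When several tables are combined into one prompt, explicit separators help keep the section labels distinct from the data. The following is a minimal sketch of one way to assemble such a prompt from plain dictionaries; the `format_counts` helper is hypothetical, and the counts are a small subset of the tables shown earlier.

```python
# A subset of the counts from the tables above
theme_counts = {
    "Need for real-world applications and discussions": 5,
    "Engaging and informative": 4,
}
improvement_counts = {
    "Bridge theory with practical application": 5,
    "Diversify and accommodate various learning styles": 2,
}

def format_counts(title, counts):
    """Render a titled, one-item-per-line block of counts (illustrative helper)."""
    lines = [f"{title}:"]
    lines.extend(f"- {name}: {n}" for name, n in counts.items())
    return "\n".join(lines)

# Blank lines separate the instruction from each block of counts
prompt = "\n\n".join([
    "Consider the following analyses of the evaluations and draft a paragraph summarizing them.",
    format_counts("Evaluation counts by theme", theme_counts),
    format_counts("Evaluation counts by area for improvement", improvement_counts),
])
print(prompt)
```

The resulting string could be passed as the `question_text` of a free text question in place of the concatenated dataframe strings.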

## Other examples
Please check out the [EDSL Docs](https://docs.expectedparrot.com/en/latest/index.html) for examples of other methods and templates for use cases, and [join our Discord channel](https://discord.com/invite/mxAYkjfy9m) to ask questions and connect with other users!