# Using EDSL for concept induction
This notebook shows how [EDSL](https://docs.expectedparrot.com) can be used to perform "concept induction" on unstructured text. In a series of steps, we use EDSL to create simple methods for:

* Synthesizing topics from a set of unstructured texts
* Creating criteria for determining whether a text addresses a given topic
* Scoring how well each text satisfies each criterion

This example is inspired by Michelle Lam & team's fascinating recent work on concept induction: "Analyzing Unstructured Text with High-Level Concepts Using LLooM" (https://twitter.com/michelle123lam/status/1781031027567390749).

<i>[EDSL](https://docs.expectedparrot.com) is an open-source Python package for conducting surveys and experiments with language models. Please see our docs for details on [getting started](https://docs.expectedparrot.com/en/latest/starter_tutorial.html).</i>

## Extending the analysis
These methods can easily be modified to perform a different analysis by editing the <i>question_text</i> instructions for the language model. The analysis can also be extended by comparing results across any of the [available models for EDSL](https://docs.expectedparrot.com/en/latest/language_models.html), and by examining how results change when different AI agent personas conduct the analysis. For example, we can create agents with personas relevant to the texts being analyzed and prompt the language model to answer as those agents (see details on [creating AI agents](https://docs.expectedparrot.com/en/latest/agents.html) in our docs).

### Importing the tools
EDSL comes with a variety of standard question types (see examples of all [question types](https://docs.expectedparrot.com/en/latest/questions.html#question-type-classes)). Here we import the ones that we want to use based on the desired form of the response. `QuestionList` will format the response as a list of strings, `QuestionFreeText` will return unstructured text and `QuestionNumerical` will return a numerical value:

```python
from edsl.questions import QuestionList, QuestionFreeText, QuestionNumerical
```
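Because each helper below takes the first extracted answer and passes it straight to the next step, it can be worth checking that an answer actually has the form its question type is meant to produce before using it. The following is a minimal sketch in plain Python: `has_expected_shape` is a hypothetical helper, not part of EDSL, and it assumes answers arrive as ordinary Python values (a malformed model response may not).

```python
from numbers import Real

# Hypothetical helper (not part of EDSL): check that an extracted answer has
# the form each question type is meant to produce.
def has_expected_shape(question_type: str, answer) -> bool:
    """Return True if `answer` matches the form expected for `question_type`."""
    if question_type == "QuestionList":
        # A list whose items are all strings
        return isinstance(answer, list) and all(isinstance(a, str) for a in answer)
    if question_type == "QuestionFreeText":
        # Unstructured text
        return isinstance(answer, str)
    if question_type == "QuestionNumerical":
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(answer, Real) and not isinstance(answer, bool)
    raise ValueError(f"Unknown question type: {question_type}")
```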

```python
# Return a list of topics addressed in a piece of text
def get_topics(text):
    q = QuestionList(
        question_name = "topics",
        question_text = f"""Return a list of topics addressed in the following text: {text}"""
    )
    results = q.run().select("topics").to_list()[0]
    return results

# Condense a list of topics
def condense_topics(topics):
    q = QuestionList(
        question_name = "condense",
        question_text = f"""Return a condensed non-duplicative list of the following topics: {topics}"""
    )
    results = q.run().select("condense").to_list()[0]
    return results

# Create criteria for a given topic
def get_criteria(topic):
    q = QuestionFreeText(
        question_name = "criteria",
        question_text = f"""Consider the following topic: {topic}.
        Briefly describe some criteria for determining whether a given text addresses this topic."""
    )
    results = q.run().select("criteria").to_list()[0]
    return results

# Score how well a text satisfies the criteria for a topic
def get_score(topic, criteria, text):
    q = QuestionNumerical(
        question_name = "score",
        question_text = f"""Consider the following topic and criteria for determining whether a given text addresses the topic:
        Topic: {topic}
        Criteria: {criteria}
        On a scale from 0 to 100, how well does the following text satisfy these criteria such that we are confident
        that the text addresses the topic? (0 = Not at all, 100 = Perfectly)
        Text: {text}"""
    )
    results = q.run().select("score").to_list()[0]
    return results
```
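The helpers embed their instructions in indented triple-quoted f-strings, so the continuation lines of each prompt carry the source code's leading spaces (probably harmless to the model, but noisy). One option, sketched below under the assumption that you want to edit and preview prompts without spending API calls, is to keep the instructions in a module-level template cleaned with `textwrap.dedent`. `SCORE_TEMPLATE` and `build_score_prompt` are hypothetical names; the template text mirrors the `get_score` question.

```python
import textwrap

# Hypothetical template mirroring the get_score question text. Keeping it at
# module level makes it easy to edit the instructions for a different analysis.
SCORE_TEMPLATE = textwrap.dedent("""\
    Consider the following topic and criteria for determining whether a given text addresses the topic:
    Topic: {topic}
    Criteria: {criteria}
    On a scale from 0 to 100, how well does the following text satisfy these criteria such that we are confident
    that the text addresses the topic? (0 = Not at all, 100 = Perfectly)
    Text: {text}""")

def build_score_prompt(topic: str, criteria: str, text: str) -> str:
    """Fill the template so the prompt can be inspected before it is sent."""
    return SCORE_TEMPLATE.format(topic=topic, criteria=criteria, text=text)
```

The returned string could then be passed as `question_text` inside `get_score` in place of the inline f-string.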

```python
# Run the methods for a set of texts

texts = [
    "Sean Duffy. Honestly, I'm disgusted by these remarks! Democrats are mad at Biden, but for the wrong reason! I thought we were better than this but apparently, we're not there yet...",
    "Senator Chuck Schumer. It's been incredible to meet with so many New Yorkers today as we celebrate #MLKDay and as we work to honor the life and legacy of Reverend Dr. Martin Luther King, Jr. by continuing his march toward equality for all.",
    "U.S. Senator Elizabeth Warren. House Republicans want to impose a national sales tax, while giving more tax breaks to the rich. The result? Higher costs for working families on everything from gas to groceries. It's outrageous.",
    "Ron Johnson. Congrats Ronna McDaniel. Looking forward to your continued leadership.",
    "Senator Michael Bennet. The fatal beating of Tyre Nichols is horrifying. I'm devastated for his family and the Memphis community. We must fight for a world that ends this injustice and inhumane brutality at last.",
    "Kyrsten Sinema. 'Close to 80% of the graduates are women and people of color, fulfilling a key diversity marker the airline aimed to achieve.' This is awesome - sending a warm congraulations to this talented class of future pilots! 'United Aviate Academy in Arizona graduates its 1st class of future pilots. Why that's big.'",
    "Senator Chuck Schumer. Today, we mark international Holocaust Remembrance Day. We will never forget the 6 million Jewish victims and other victims of the Nazis. And on a day when innocent victims are murdered in a terror attack in a Jerusalem synagogue, we must continue to fight antisemitism and hatred."
]

def analyze_texts(texts):
    topics = []
    for text in texts:
        topics.append(get_topics(text))

    condensed_topics = condense_topics(topics)

    topics_criteria = {}
    for topic in condensed_topics:
        topics_criteria[topic] = get_criteria(topic)

    analyzed_texts = []
    for text in texts:
        topics_scores = {}
        topics_scores["text"] = text
        for topic, criteria in topics_criteria.items():
            topics_scores[topic] = get_score(topic, criteria, text)
        analyzed_texts.append(topics_scores)

    return analyzed_texts
```
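Since every step of `analyze_texts` calls a language model, it can help to exercise the control flow offline first. A minimal sketch, assuming you are willing to restructure slightly: `analyze_texts_with` (a hypothetical name) keeps the same four steps but takes each one as a function argument, and the `fake_*` functions are hypothetical keyword-matching stand-ins, not a real topic model.

```python
from typing import Callable, Dict, List

def analyze_texts_with(
    texts: List[str],
    topics_fn: Callable[[str], List[str]],
    condense_fn: Callable[[List[List[str]]], List[str]],
    criteria_fn: Callable[[str], str],
    score_fn: Callable[[str, str, str], float],
) -> List[Dict[str, object]]:
    """Same control flow as analyze_texts, with each step passed in as a function."""
    topics = [topics_fn(text) for text in texts]                    # step 1: topics per text
    condensed = condense_fn(topics)                                  # step 2: one merged list
    criteria = {topic: criteria_fn(topic) for topic in condensed}   # step 3: criteria per topic
    analyzed = []
    for text in texts:                                               # step 4: score every pair
        row: Dict[str, object] = {"text": text}
        for topic, crit in criteria.items():
            row[topic] = score_fn(topic, crit, text)
        analyzed.append(row)
    return analyzed

# Hypothetical offline stand-ins (no language model involved).
def fake_topics(text: str) -> List[str]:
    # Treat every word longer than six characters as a "topic"
    return [w.strip(".,!?'").lower() for w in text.split() if len(w) > 6]

def fake_condense(topic_lists: List[List[str]]) -> List[str]:
    # Merge and de-duplicate the per-text lists
    return sorted({t for topics in topic_lists for t in topics})

def fake_criteria(topic: str) -> str:
    return f"The text mentions '{topic}'."

def fake_score(topic: str, criteria: str, text: str) -> float:
    # 100 if the keyword appears in the text, else 0
    return 100.0 if topic in text.lower() else 0.0

rows = analyze_texts_with(
    ["Taxes raise grocery costs.", "Honoring Reverend King"],
    fake_topics, fake_condense, fake_criteria, fake_score,
)
```

Calling `analyze_texts_with(texts, get_topics, condense_topics, get_criteria, get_score)` should behave like the original function. Note the cost of a full run: it makes `len(texts) + 1 + T + len(texts) * T` model calls, where `T` is the number of condensed topics; for the seven texts above and, say, 10 topics, that is 7 + 1 + 10 + 70 = 88 calls.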

```python
analyze_texts(texts)
```
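The list returned by `analyze_texts` holds one dictionary per text: a `"text"` key plus one score per condensed topic. To read it as concept labels, a small helper can keep the topics that score above a cutoff. This is a sketch: `top_topics` is a hypothetical name, the default 50-point threshold is an arbitrary assumption rather than anything this notebook prescribes, and it assumes a score may arrive as a number, a numeric string, or `None` for a failed response.

```python
from typing import Dict, List, Optional, Tuple

def top_topics(
    row: Dict[str, object], threshold: float = 50, k: Optional[int] = None
) -> List[Tuple[str, float]]:
    """Return (topic, score) pairs at or above `threshold`, highest score first.

    Missing or non-numeric scores are skipped rather than treated as zero.
    """
    scored = []
    for topic, score in row.items():
        if topic == "text":
            continue
        try:
            value = float(score)
        except (TypeError, ValueError):
            continue
        if value >= threshold:
            scored.append((topic, value))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if k is None else scored[:k]
```

For example, `[top_topics(row, k=3) for row in analyze_texts(texts)]` would give up to three labels per text.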

### Using AI agents
This analysis can be extended by comparing results for different personas that we prompt the language model to reference in answering the questions. This is done by passing a dictionary of desired traits to an `Agent` object. For example, we could create an agent representing a political strategist to answer the questions in the above methods as follows:

```python
# Create an Agent with some traits
from edsl import Agent

agent = Agent(name = "Political strategist", traits = {"persona": "You are a political strategist..."})

q = QuestionFreeText(
    question_name = "important_topics",
    question_text = "What are the most important topics in the following texts: {{ texts }}"
)

# We can use Scenario objects to parameterize questions
from edsl import Scenario

scenario = Scenario({"texts": texts})

# Run the question with the specified agent
results = q.by(scenario).by(agent).run()

# Inspect results
results.select("persona", "important_topics").print()
```

| agent.persona | answer.important_topics |
|---|---|
| You are a political strategist... | The important topics in the texts include criticism of President Biden by members of his own party, the celebration and continuation of Dr. Martin Luther King Jr.'s legacy, opposition to a proposed national sales tax and its impact on working families, leadership within the Republican party, the call for justice and an end to police brutality following the death of Tyre Nichols, the achievement of diversity goals in a pilot training academy, and the commemoration of International Holocaust Remembrance Day coupled with a denunciation of a recent terror attack and a call to fight antisemitism and hatred. |
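To compare personas on the full concept-induction pipeline, one could run `analyze_texts` once per persona (assuming the helpers are modified to pass an agent, e.g. `q.by(agent).run()`) and measure how far the scores diverge. Because each run synthesizes its own topic list, only topic names shared by both runs can be compared; fixing the topics and criteria and varying only the scoring step would make the comparison cleaner. The sketch below is plain Python; `mean_abs_difference` is a hypothetical helper, and mean absolute difference is just one simple choice of agreement measure.

```python
from statistics import mean
from typing import Dict, List

def mean_abs_difference(
    run_a: List[Dict[str, object]], run_b: List[Dict[str, object]]
) -> Dict[str, float]:
    """Per-topic mean absolute score difference between two runs over the same texts.

    Only topics present in both runs are compared; pairs where either score is
    missing or non-numeric are skipped.
    """
    if [row["text"] for row in run_a] != [row["text"] for row in run_b]:
        raise ValueError("Both runs must score the same texts in the same order")
    diffs: Dict[str, List[float]] = {}
    for row_a, row_b in zip(run_a, run_b):
        for topic in (row_a.keys() & row_b.keys()) - {"text"}:
            try:
                a, b = float(row_a[topic]), float(row_b[topic])
            except (TypeError, ValueError):
                continue
            diffs.setdefault(topic, []).append(abs(a - b))
    return {topic: mean(values) for topic, values in sorted(diffs.items())}
```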


Learn more about creating AI agents to use with surveys [here](https://docs.expectedparrot.com/en/latest/agents.html). Learn more about working with survey results [here](https://docs.expectedparrot.com/en/latest/results.html).

### Selecting language models
In the methods above we did not specify a language model, so the default model, GPT-4, was used to generate results. To compare results across language models, we create `Model` objects for the desired models and modify the methods to run each question with those models.

To see currently available models:

```python
from edsl import Model

Model.available()
```

```
['claude-3-haiku-20240307',
 'claude-3-opus-20240229',
 'claude-3-sonnet-20240229',
 'dbrx-instruct',
 'gemini_pro',
 'gpt-3.5-turbo',
 'gpt-4-1106-preview',
 'llama-2-13b-chat-hf',
 'llama-2-70b-chat-hf',
 'mixtral-8x7B-instruct-v0.1']
```

To specify a model to use:

```python
model = Model('gpt-3.5-turbo')
```

To run a question or survey with a specified model, we chain another `by` call, just as we did for scenarios and agents:

```python
results = q.by(scenario).by(agent).by(model).run()
```
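If `analyze_texts` is run once per model (assuming the helpers are modified so each question runs via `q.by(model).run()`), the per-model outputs can be flattened into one row per (model, text, topic), which is convenient for side-by-side comparison or for loading into a table library such as pandas via `pd.DataFrame(rows)`. `to_long_format` is a hypothetical helper, and the labels are whatever names you choose, for example strings from `Model.available()`.

```python
from typing import Dict, List

def to_long_format(runs: Dict[str, List[Dict[str, object]]]) -> List[Dict[str, object]]:
    """Flatten {model label: analyze_texts-style rows} into one row per (model, text, topic)."""
    long_rows = []
    for label, rows in runs.items():
        for row in rows:
            for topic, score in row.items():
                if topic == "text":
                    continue  # the text itself is carried on every output row
                long_rows.append(
                    {"model": label, "text": row["text"], "topic": topic, "score": score}
                )
    return long_rows
```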

Learn more about specifying models to use with surveys [here](https://docs.expectedparrot.com/en/latest/language_models.html).