alter_ego

alter_ego is a library that allows you to run experiments with LLMs. ego is a command-line helper included with this package.

alter_ego allows you to run microexperiments using a simple shorthand. You can also create more advanced experiments. For turn-based interactive experiments, a builder is available.

Read our paper.

You can browse our source code in this repository. Here are autogenerated docs that can make it easier to find what you're looking for.

Getting started

Prerequisites

  1. Install Python, version 3.8 or later. If you are on Windows, make sure the installer adds Python to the PATH.
  2. Create a virtual environment and activate it. On Linux and macOS, this is very simple. Just open a terminal and execute the following commands:
user@host:~$ python -m venv env
user@host:~$ source env/bin/activate

Note: In this document, you may have to replace python by python3 and pip by pip3. This depends on your system's settings.

On Windows, consider using this tutorial to create and activate a virtual environment.

Some editors also do this for you.

  3. Install alter_ego using
(env) user@host:~$ pip install -U alter_ego_llm

Note how (env) signals that we are in the virtual environment created earlier.
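
If you are unsure whether the prerequisites are met, a short standard-library check can help. This is a minimal sketch and not part of alter_ego; the function names are made up for illustration.

```python
import sys


def python_is_recent_enough(minimum=(3, 8)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


def in_virtual_environment():
    """Return True if the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment, while
    sys.base_prefix still points at the base installation.
    """
    return sys.prefix != sys.base_prefix


print("Python 3.8 or later:", python_is_recent_enough())
print("Inside a virtual environment:", in_virtual_environment())
```

Run it with the same python command you use for your experiments, so that it reports on the right interpreter.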

For the remainder of this document, we assume that your editor's current directory is also your terminal's present working directory. From within your terminal, you can find out the present working directory using pwd — this should show the very same directory as opened in your editor.

Using alter_ego with GPT

Note: If you do not want to use GPT for now, simply change GPTThread to CLIThread in the examples below and skip this section.

  1. Obtain an API key from OpenAI. Here is more information. An API key looks like sk-***. Copy yours to your clipboard.
  2. Create a new file in your editor.
  3. Put the content of your clipboard into the file openai_key in your current directory. The file must not have a file extension—it is literally just called openai_key.
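
A common slip is saving the key as openai_key.txt. The following sketch checks the file before you run an experiment. The helper read_api_key is hypothetical and is not how alter_ego itself loads the key; the demonstration uses a throwaway directory and a placeholder key.

```python
import tempfile
from pathlib import Path


def read_api_key(directory="."):
    """Return the key stored in <directory>/openai_key, or raise a helpful error."""
    folder = Path(directory)
    key_file = folder / "openai_key"
    if not key_file.is_file():
        hint = ""
        if (folder / "openai_key.txt").is_file():
            hint = " (found 'openai_key.txt'; remove the extension)"
        raise FileNotFoundError(
            f"No file named 'openai_key' in {folder.resolve()}{hint}"
        )
    key = key_file.read_text(encoding="utf-8").strip()
    if not key.startswith("sk-"):
        raise ValueError("The file does not look like it contains an API key")
    return key


# Demonstration with a placeholder key in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "openai_key").write_text("sk-placeholder\n", encoding="utf-8")
    demo_key = read_api_key(tmp)
    print("Key file looks fine:", demo_key.startswith("sk-"))
```

In your own project, call read_api_key() without arguments from the directory that holds your script.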

Developing a simple microexperiment

New: 📺 WATCH VIDEO TUTORIAL

Let's create a minimal experiment using alter_ego's shorthand feature.

  1. Create a new file in your editor, first_experiment.py. Here's its code:
import alter_ego.agents
from alter_ego.utils import extract_number
from alter_ego.experiment import factorial

def agent():
  return alter_ego.agents.GPTThread(model="gpt-3.5-turbo", temperature=1.0)

prompt = "Estimate the public approval rating of {{politician}} during the {{time}} of their presidency. Only return a single percentage from 0 to 100."

data = factorial(
  prompt,
  politician=["George W. Bush", "Barack Obama"],
  time=["1st year", "8th year"]
).run(agent, extract_number, times=1)

for row in data:
  print(row)

Note how we use variables within the prompt. The crucial feature of alter_ego is how these variables are automatically replaced based on treatment.

  2. In the terminal, run
(env) user@host:~$ python first_experiment.py
  3. This will take a few seconds and give you output similar to this:
{'politician': 'George W. Bush', 'time': '1st year', 'result': None}
{'politician': 'George W. Bush', 'time': '8th year', 'result': None}
{'politician': 'Barack Obama', 'time': '1st year', 'result': 63}
{'politician': 'Barack Obama', 'time': '8th year', 'result': 8}

As you see, GPT did not give a valid response for George W. Bush. Let's debug by changing the line with run to:

).run(agent, extract_number, times=1, keep_retval=True)

Rerunning our script gives:

{'politician': 'George W. Bush', 'time': '1st year', 'result': 62, 'retval': 'Approximately 62%.'}
{'politician': 'George W. Bush', 'time': '8th year', 'result': None, 'retval': "It is difficult to provide an accurate estimate without conducting a specific poll or analysis. However, based on historical data and trends, it is common for a president's approval rating to decline over the course of their second term. Taking into account various factors such as the economic recession and the ongoing Iraq War during George W. Bush's final year in office (2008), it is reasonable to estimate his public approval rating to be around 25-35%. Please note that this estimation is subjective and might not perfectly reflect the actual public sentiment at that time."}

(Here, I have shown only two rows of the output.)

As you see, in the cases where GPT returned only a single number, alter_ego was able to correctly extract it. Unfortunately, GPT-3.5 tends to refuse requests to just return a single number. GPT-4 works better. If you have access to GPT-4 over the API, you can change model="gpt-3.5-turbo" to model="gpt-4". This is the resulting output (where I have omitted the retval once again):

{'politician': 'George W. Bush', 'time': '1st year', 'result': 57}
{'politician': 'George W. Bush', 'time': '8th year', 'result': 34}
{'politician': 'Barack Obama', 'time': '1st year', 'result': 57}
{'politician': 'Barack Obama', 'time': '8th year', 'result': 55}
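
To build intuition for why the long GPT-3.5 reply earlier produced None while 'Approximately 62%.' produced 62, here is a deliberately strict extractor. It is a hypothetical stand-in written for this illustration, not the library's extract_number, whose exact rules may differ.

```python
import re


def extract_single_number(text):
    """Return the number in text if there is exactly one, otherwise None."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    if len(numbers) != 1:
        # No number, or several candidates (e.g. a year and a range): give up.
        return None
    value = float(numbers[0])
    return int(value) if value.is_integer() else value


short_reply = "Approximately 62%."
long_reply = "In his final year in office (2008), it is reasonable to estimate around 25-35%."

print(extract_single_number(short_reply))
print(extract_single_number(long_reply))
```

The first call yields 62 and the second yields None, because the long reply contains several numbers and none of them is clearly the answer.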

Here you can view the documentation for run. run allows you to quickly execute an experiment defined by what highfalutin scientists call a “factorial design.” This is because the possibilities of politician (George W. Bush, Barack Obama) were “multiplied” by the possibilities for time (1st year, 8th year).
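
The "multiplication" can be made concrete with the standard library. The sketch below builds every combination of the two factors and fills the prompt's {{...}} variables for each one. The helper fill is made up for this illustration; it is not how alter_ego renders prompts internally.

```python
import re
from itertools import product


def fill(template, values):
    """Replace every {{name}} placeholder in template with values[name]."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(values[match.group(1)]),
        template,
    )


prompt = "Estimate the public approval rating of {{politician}} during the {{time}} of their presidency."

factors = {
    "politician": ["George W. Bush", "Barack Obama"],
    "time": ["1st year", "8th year"],
}

# One dict per treatment: 2 politicians x 2 times = 4 treatments.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]
prompts = [fill(prompt, treatment) for treatment in treatments]

for text in prompts:
    print(text)
```

Adding a third factor with three levels would yield 2 x 2 x 3 = 12 treatments, which is why factorial designs grow quickly.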

The nice thing about these microexperiments is that you can easily carry the output forward to Pandas, Polars, etc.—this is because data is only a “list of dicts,” and as such it is trivial to convert to a DataFrame. This allows you to analyze data received straight from an LLM.
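
As a small illustration of working with the "list of dicts," the sketch below averages the results per politician using only the standard library. The rows are copied from the GPT-4 output above; the variable names are chosen for this example.

```python
from collections import defaultdict
from statistics import mean

data = [
    {"politician": "George W. Bush", "time": "1st year", "result": 57},
    {"politician": "George W. Bush", "time": "8th year", "result": 34},
    {"politician": "Barack Obama", "time": "1st year", "result": 57},
    {"politician": "Barack Obama", "time": "8th year", "result": 55},
]

# Group results by politician, skipping rows where extraction failed (None).
grouped = defaultdict(list)
for row in data:
    if row["result"] is not None:
        grouped[row["politician"]].append(row["result"])

averages = {name: mean(values) for name, values in grouped.items()}
print(averages)
```

If you have pandas installed, pandas.DataFrame(data) accepts the same list directly and gives you one column per key.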

Of course, you will often want to set the temperature to 0.0 or another low value. This depends on the nature of your use-case.

Using the builder to construct a turn-based experiment

New: 📺 WATCH VIDEO TUTORIAL

We offer a web app to build simple experiments between multiple LLMs. The builder can be found here, with its source code being available here.

  1. For now, just read through the app (it showcases an example of a framed ultimatum game) and scroll down.

  2. Copy the code shown below “Export or import scenario” on the web app into a new file named built.json in your current project directory. The file must have exactly this name.

  3. Open a terminal and execute

(env) user@host:~$ ego run built
  4. This will show “System instructions” for two separate players. Note how they vary: One player (the first one) is the proposer and the second player is the responder.

  5. The proposer is now asked to put in a proposal in JSON. Let's do it:

{"keep": 4.2}
  6. As you see, the responder is notified and can now ACCEPT or REJECT. Let's accept:
ACCEPT
  7. This completes the experiment. You will see something like:
Experiment c6627c4e-f17f-4cdc-ba47-462eced3e489 OK
  8. Let's look at the data that was generated. We can get it in CSV format by executing:
(env) user@host:~$ ego data built c6627c4e-f17f-4cdc-ba47-462eced3e489 > data.csv

(You need to replace c6627c4e-f17f-4cdc-ba47-462eced3e489 with your actual experiment ID)

This should tell you that 2 lines were written. If you open data.csv in your preferred spreadsheet application, you will see the following output:

| choice | convo | experiment | i | round | tainted | thread | thread_type | treatment |
|---|---|---|---|---|---|---|---|---|
| {"keep": 4.2} | 03ba9edf-99c0-46d6-8c26-42c26683197c | c6627c4e-f17f-4cdc-ba47-462eced3e489 | 1 | 1 | False | 1ce5faaf-cc1f-438f-8231-8b7e0d96fb07 | CLIThread | take |
| "ACCEPT" | 03ba9edf-99c0-46d6-8c26-42c26683197c | c6627c4e-f17f-4cdc-ba47-462eced3e489 | 2 | 1 | False | 9043cd57-3e83-42d2-8d62-c783725e05e7 | CLIThread | take |

This is obviously easy to post-process in whatever statistics software you use.
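
If you prefer Python for post-processing, the sketch below decodes the choice column. It assumes that data.csv is comma-separated with a header row and that each choice is JSON-encoded, as the two example rows suggest. To stay self-contained, it builds a small in-memory sample with a subset of the columns instead of reading your file.

```python
import csv
import io
import json

# Build an in-memory sample that mirrors the two rows shown above.
columns = ["choice", "i", "round", "tainted", "thread_type", "treatment"]
sample = io.StringIO()
writer = csv.DictWriter(sample, fieldnames=columns)
writer.writeheader()
writer.writerow({"choice": '{"keep": 4.2}', "i": 1, "round": 1,
                 "tainted": False, "thread_type": "CLIThread", "treatment": "take"})
writer.writerow({"choice": '"ACCEPT"', "i": 2, "round": 1,
                 "tainted": False, "thread_type": "CLIThread", "treatment": "take"})
sample.seek(0)

# Read the rows back and decode each choice from JSON.
rows = list(csv.DictReader(sample))
choices = [json.loads(row["choice"]) for row in rows]

print(choices[0]["keep"])  # the proposer's amount
print(choices[1])          # the responder's decision
```

For your real data, replace the sample with open("data.csv", newline="") and pass that file object to csv.DictReader.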

If you re-run the experiment, enter garbage instead of the expected inputs and re-export the data, you will see that the tainted column becomes True. You can check for tainted to verify that inputs were received and processed as expected. Note that once any Thread responds invalidly, the Conversation will be stopped and all Threads will have tainted set to True. Thus, ego data's output may be partial.
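
A typical first cleaning step is to drop every conversation that contains a tainted row. The following sketch assumes rows as produced by csv.DictReader, so tainted arrives as the text "True" or "False". The conversation IDs are shortened placeholders and the function name is chosen for this example.

```python
def untainted_rows(rows):
    """Keep only rows from conversations in which no row is tainted."""
    tainted_convos = {row["convo"] for row in rows if row["tainted"] == "True"}
    return [row for row in rows if row["convo"] not in tainted_convos]


rows = [
    {"convo": "convo-a", "i": "1", "tainted": "False", "choice": '{"keep": 4.2}'},
    {"convo": "convo-a", "i": "2", "tainted": "False", "choice": '"ACCEPT"'},
    {"convo": "convo-b", "i": "1", "tainted": "True", "choice": ""},
]

clean = untainted_rows(rows)
print(len(clean), "of", len(rows), "rows kept")
```

Filtering by conversation rather than by row keeps partial conversations out of your analysis even if only some of their rows are flagged.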

You can run your scenario five times by doing

(env) user@host:~$ ego run -n 5 built

You can, of course, replace 5 with any positive integer.

Feel free to experiment with our builder.

Note: Experts can set the environment variable BUILT_FILE to have ego use a different file name from built.json.

Using alter_ego with oTree

New: 📺 WATCH VIDEO TUTORIAL

oTree is a relatively popular framework for web-based experiments.

You can attach Threads (i.e., LLMs) and Conversations (i.e., bundles of LLMs) to oTree objects (participants, players, subsessions or sessions). This basically works as follows (in your app's __init__.py):

from alter_ego.agents import *
from alter_ego.utils import from_file
from alter_ego.exports.otree import link as ai

...

def creating_session(subsession):
    for player in subsession.get_players():
        ai(player).set(GPTThread(model="gpt-3.5-turbo", temperature=1.0))

Here, each player would get their own personal GPT agent. If you want to assign such an agent to a group, just do this:

def creating_session(subsession):
    for group in subsession.get_groups():
        ai(group).set(GPTThread(model="gpt-3.5-turbo", temperature=1.0))

Then, within your code, you can access the agent using a context manager. Here's an example of a simple live_method-based chat if we attached the Thread to the player object:

class Chat(Page):
    def live_method(player, data):
        if isinstance(data, str):
            with ai(player) as llm:
                # this submits the player's message and gets the response
                response = llm.submit(data, max_tokens=500)

                # note: if you put the AI on the "group" object or somewhere other than
                # the player, you may want to change this
                return {player.id_in_group: response}

If you want to set the system prompt, you can do:

def before_next_page(player, timeout_happened):
    with ai(player) as llm:
        # this sets the "system" prompt
        llm.system("You are participating in an experiment.")

You can do whatever you want, but always remember to open the LLM's context (using with ai(...) as llm) before performing any action. (This extra step is necessary because of oTree's ORM, which otherwise couldn't notice changes deep down in the Thread.)
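
The reason for this pattern can be illustrated without oTree. In the sketch below, Record stands in for a database row that only persists what is assigned to its field, and open_stored is a hypothetical context manager that loads the object on entry and writes it back on exit. This is a simplified model of the idea, not alter_ego's actual implementation.

```python
import pickle


class Record:
    """Stand-in for an ORM row: only assignments to blob are persisted."""

    def __init__(self):
        self.blob = pickle.dumps([])


class open_stored:
    """Hypothetical helper: load the object on enter, save it on exit."""

    def __init__(self, record):
        self.record = record

    def __enter__(self):
        self.obj = pickle.loads(self.record.blob)
        return self.obj

    def __exit__(self, exc_type, exc, traceback):
        # Writing back is what makes in-place changes visible to the store.
        self.record.blob = pickle.dumps(self.obj)
        return False


record = Record()

# With the context manager, the change is written back.
with open_stored(record) as history:
    history.append("hello")

# Without it, we only modify a detached copy; the record never notices.
detached = pickle.loads(record.blob)
detached.append("lost")

print(pickle.loads(record.blob))
```

Only "hello" survives in the record, which mirrors why changes made outside the with block would not be saved.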

You can also attach a whole Conversation to the aforementioned oTree objects.

We provide a simple Chat in this repository, see the directory otree/ego_chat.

Remember to put your API key into your oTree project folder. alter_ego saves message histories automatically in .ego_output in your oTree project folder.

Developing full-fledged experiments

New: 📺 WATCH VIDEO TUTORIAL

You can use the primitives exposed by this library to develop full-fledged experiments that go beyond the capabilities of our builder. The directory scenarios/ contains a bunch of examples, including the code for our paper's machine-machine interaction example (ego_prereg.py). Watch the video tutorial to get a feeling for what's possible.

Citation

When using any part of alter_ego in a scientific context, cite the following work:

@article{ego,
  title={Integrating Machine Behavior into Human Subject Experiments: A User-friendly Toolkit and Illustrations},
  author={Engel, Christoph and Grossmann, Max R. P. and Ockenfels, Axel},
  year={2023},
}

License

alter_ego is © Max R. P. Grossmann et al., 2023. It is licensed under LGPLv3+. Please see LICENSE for details.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU Lesser General Public License for more details.