
1. Environment 🗺️

John Yang edited this page Jun 27, 2023 · 1 revision

Inter(active) Code(-ing)

Intercode presents a framework for defining an interactive environment where the main form of interaction is code. In this environment, an agent must modify the execution environment and produce standard output to accomplish a task described by a natural language query. The Intercode framework adopts and builds upon the classic OpenAI gym feedback loop.

In this setting, an agent is first presented with a natural language query that describes a coding task to complete within the context of the environment. The agent can then submit executable code as an action to:

  1. Explore and understand the given context
  2. Get standard output and feedback from executing specific commands

Upon receiving a line of code from the agent, the Intercode environment will execute the command within the given context and respond to the agent with an observation, a reward, and miscellaneous info.

  • observation is the standard output from the execution of the agent's action.
  • reward is a value between 0 and 1 that quantifies how correct the standard output and the environment's state are so far with respect to accomplishing the task described by the natural language instruction.
  • info is a dictionary that serves as an additional, optional store of information for environment signals that fall outside the purpose of observation and reward. For instance, in a bash system, the current working directory might be reflected here.
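The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Intercode's actual API: `ToyShellEnv`, `run_episode`, the three toy commands, and the all-or-nothing reward are hypothetical stand-ins chosen so the example runs without Docker or any external package.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, float, Dict[str, str]]


class ToyShellEnv:
    """Hypothetical in-memory stand-in for a code environment (not Intercode's API)."""

    def __init__(self, query: str, target_file: str) -> None:
        self.query = query              # natural language task description
        self.target_file = target_file  # file whose existence counts as success
        self.files: Set[str] = set()
        self.cwd = "/home/agent"

    def reset(self) -> str:
        """Clear the environment state and return the task query."""
        self.files = set()
        return self.query

    def step(self, action: str) -> Step:
        """Execute one command and return (observation, reward, info)."""
        parts = action.split()
        if parts == ["ls"]:
            observation = " ".join(sorted(self.files))
        elif parts == ["pwd"]:
            observation = self.cwd
        elif len(parts) == 2 and parts[0] == "touch":
            self.files.add(parts[1])
            observation = ""
        else:
            observation = f"unknown command: {action}"
        # Assumed reward rule: 1.0 once the target file exists, otherwise 0.0.
        reward = 1.0 if self.target_file in self.files else 0.0
        # Signals outside observation and reward, such as the working directory.
        info = {"cwd": self.cwd}
        return observation, reward, info


def run_episode(env: ToyShellEnv, actions: List[str]):
    """Feed actions to the environment until the task is solved or actions run out."""
    query = env.reset()
    history = []
    for action in actions:
        observation, reward, info = env.step(action)
        history.append((action, observation, reward, info))
        if reward == 1.0:
            break
    return query, history


env = ToyShellEnv("Create an empty file named notes.txt", "notes.txt")
query, history = run_episode(env, ["ls", "touch notes.txt", "ls"])
for action, observation, reward, info in history:
    print(f"{action!r} -> observation={observation!r}, reward={reward}, info={info}")
```

Here a fixed list of actions plays the role of the agent; a real agent would choose each action after reading the previous observation and reward.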

At a high level, the task formulation and feedback loop presented by the Intercode framework resemble the everyday practice of programmers and software engineers, who write code, run it, and revise it based on the output.

Framework Overview

Coding tasks pose a wealth of interaction and reasoning challenges, and interest in the decision-making capabilities of digital agents continues to grow. Intercode aims to be a tool and testbed for training, evaluating, and augmenting such capabilities.

The engineering approaches for creating an interactive coding environment are often complex and varied. An environment built with a specific task, coding language, or execution context in mind may stop working when one of these settings changes. The result is a landscape of benchmarks and tasks that are hard to compare with one another.

Intercode aims to unify the formulation of such benchmarks and abstract away their foundational engineering challenges, so that practitioners can focus on designing worthwhile code understanding and reasoning challenges through customizable settings and datasets.

The primary deliverable for this goal is the IntercodeEnv abstraction. IntercodeEnv inherits from the OpenAI gym package to frame code interaction as an action-observation loop. On top of this, IntercodeEnv includes logic under the hood to:

  • Expressively define a coding environment via Dockerfile
  • Automate dataset management and logging
  • Configure and contextualize the environment for each task
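To make the dataset management, logging, and per-task configuration responsibilities more concrete, here is a rough sketch of how such an abstraction might be layered. Everything in it is an assumption made for illustration: `CodeEnvSketch`, `KeyValueEnv`, the `configure`/`execute`/`score` hooks, and the task dictionary keys (`query`, `setup`, `gold`) are hypothetical and are not taken from IntercodeEnv. The Dockerfile-based environment definition is left out so the example runs with the standard library alone.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class CodeEnvSketch(ABC):
    """Hypothetical base class; subclasses supply the execution context."""

    def __init__(self, dataset: List[Dict[str, Any]]) -> None:
        if not dataset:
            raise ValueError("dataset must contain at least one task")
        self.dataset = dataset
        self.task_index = -1
        self.log: List[Dict[str, Any]] = []

    def reset(self) -> str:
        """Advance to the next task, configure the environment, start a log entry."""
        self.task_index = (self.task_index + 1) % len(self.dataset)
        task = self.dataset[self.task_index]
        self.configure(task)
        self.log.append({"query": task["query"], "turns": []})
        return task["query"]

    def step(self, action: str) -> Tuple[str, float, Dict[str, Any]]:
        """Run the action, score the resulting state, and record the turn."""
        observation = self.execute(action)
        reward = self.score()
        self.log[-1]["turns"].append(
            {"action": action, "observation": observation, "reward": reward}
        )
        return observation, reward, {"task_index": self.task_index}

    @abstractmethod
    def configure(self, task: Dict[str, Any]) -> None: ...

    @abstractmethod
    def execute(self, action: str) -> str: ...

    @abstractmethod
    def score(self) -> float: ...


class KeyValueEnv(CodeEnvSketch):
    """Toy context: actions are `set <key> <value>` or `get <key>`."""

    def configure(self, task: Dict[str, Any]) -> None:
        self.state = dict(task["setup"])  # fresh copy of the initial state per task
        self.gold = task["gold"]          # expected final state

    def execute(self, action: str) -> str:
        parts = action.split()
        if len(parts) == 3 and parts[0] == "set":
            self.state[parts[1]] = parts[2]
            return "ok"
        if len(parts) == 2 and parts[0] == "get":
            return self.state.get(parts[1], "")
        return f"unknown command: {action}"

    def score(self) -> float:
        return 1.0 if self.state == self.gold else 0.0


dataset = [
    {"query": "Set x to 3", "setup": {"x": "0"}, "gold": {"x": "3"}},
    {"query": "Set y to hello", "setup": {}, "gold": {"y": "hello"}},
]
env = KeyValueEnv(dataset)
first_query = env.reset()
obs, reward, info = env.step("set x 3")
second_query = env.reset()
_, reward2, _ = env.step("get y")
log_as_json = json.dumps(env.log, indent=2)  # the log is plain data, so it serializes directly
```

Keeping `reset` and `step` in the base class means that task selection and logging behave the same way for every execution context, while each subclass only decides how to set up, run, and score code.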

Features

  • The IntercodeEnv class defines an abstraction that makes it easy to set up an interactive environment that can be configured via a Dockerfile to any coding language and execution context of your choice.
  • Intercode currently features IntercodeEnv-standardized environments for bash and SQL.
  • IntercodeEnv environments can be used in a variety of ways. This repository includes documentation for how to use IntercodeEnv as:
    • A training + evaluation environment for NL-to-code generation agents
    • A wrapper for connecting code agents to real-world code tasks and settings
    • A tool that language models can use for code-adjacent downstream tasks

The next several sections discuss how to quickly set up an interactive code environment using InterCode, detail additional features of the framework, and demonstrate the variety of ways in which an IntercodeEnv class can be used.