In-context_Learning_for_Automated_Driving 🤔

The code in this repository is based on the paper Reward Design with Language Models.

This repository contains the prompts that we used for each domain as well as code to train an RL agent with an LLM in the loop using those prompts. Each domain (Ultimatum Game, Matrix Games, DealOrNoDeal) has a separate directory and will need a separate conda/virtual environment.

Please check out the READMEs in each directory for more information on how to run things.

Description

We implemented a framework that uses LLMs to help Reinforcement Learning (RL) agents better imitate human driving behavior. First, we define the desired driving style and output format and pass them to the LLM as a predefined prompt. Then, we describe the current scenario faced by the ego car to the LLM. In every round after the first, we also provide the LLM with the decision it produced in the previous round, giving it better context. By comparing the LLM's output decision with the decision made by the RL agent, we generate a reward signal to train the LLM-RL agent.
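
The comparison step can be summarized as a simple agreement check. The sketch below is illustrative only and is not the repository's exact code; the action labels are made-up examples:

```python
# Illustrative sketch: the agent earns reward 1.0 when its action matches the decision
# the LLM produced for the same scene, and 0.0 otherwise.
def llm_reward(llm_decision: str, agent_action: str) -> float:
    """Binary reward: agree with the LLM's decision -> 1.0, disagree -> 0.0."""
    return 1.0 if llm_decision == agent_action else 0.0

# Example with made-up action labels:
assert llm_reward("SLOWER", "SLOWER") == 1.0
assert llm_reward("SLOWER", "FASTER") == 0.0
```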

Model Setup

  • We use GPT-3 for our experiments. You will need an OpenAI API key saved in a file named MY_KEY.txt (a minimal usage sketch follows this list).
  • Use requirements.txt for environment setup: pip install -r requirements.txt
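
A minimal sketch of how the key file and a GPT-3 call fit together, assuming the legacy openai Python package; the engine name and sampling parameters here are illustrative and may differ from what the repository actually uses:

```python
# Minimal sketch: load the API key from MY_KEY.txt and query GPT-3.
# Assumes the legacy `openai` package; the engine name below is an assumption.
import openai

with open("MY_KEY.txt") as f:
    openai.api_key = f.read().strip()

def query_llm(prompt: str) -> str:
    """Send one prompt to GPT-3 and return the raw text completion."""
    response = openai.Completion.create(
        engine="text-davinci-003",  # assumed engine; use whichever GPT-3 model you have access to
        prompt=prompt,
        max_tokens=64,
        temperature=0.0,            # deterministic replies are easier to parse into rewards
    )
    return response["choices"][0]["text"].strip()
```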

Model Workflow

Our framework integrates a Language Model (LM) that takes textual prompts as input and generates a reward signal. The input prompt consists of three primary components: the Task Description, the User Objective, and the Last Outcome. The LM accepts a concatenation of these components and returns a textual string, which a parser then translates into a binary reward signal. That binary reward is used to train a Reinforcement Learning (RL) agent. By employing the LM as a proxy for a hand-crafted reward function, the framework remains compatible with a wide range of RL training algorithms.
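
As a concrete illustration of the parser stage, the keyword check below is a hedged sketch; the repository's actual parsing rules may be more elaborate:

```python
# Hedged sketch of the parser: the LM returns free text, and the parser maps it to the
# binary reward used for RL training. The substring match here is an assumption, not the
# repository's exact rule.
def parse_reward(lm_output: str, agent_action: str) -> int:
    """Return 1 if the action named in the LM's reply matches the agent's action, else 0."""
    return int(agent_action.lower() in lm_output.lower())

print(parse_reward("Decision: the ego car should go slower.", "SLOWER"))  # 1
print(parse_reward("Decision: the ego car should go slower.", "FASTER"))  # 0
```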

Prompt Structure


A prompt for one interaction consists of the last action, the predefined prompt, and the scene description. This description includes available actions, a general overview, and a safety evaluation.
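
To make this structure concrete, here is a hedged sketch of how such a prompt might be assembled; all field contents are invented examples, and the actual wording lives in the repository's prompt files:

```python
# Illustrative prompt assembly, mirroring the structure described above.
# Every string below is a made-up example, not the repository's actual prompt text.
def build_prompt(predefined_prompt: str, last_action: str, scene_description: str) -> str:
    """Concatenate the three prompt components for one interaction round."""
    return "\n".join([
        predefined_prompt,                        # driving style + output format
        f"Your last action was: {last_action}.",  # decision from the previous round
        scene_description,                        # available actions, overview, safety evaluation
        "What should the ego car do next?",
    ])

example = build_prompt(
    predefined_prompt="You are a conservative driver. Answer with one of the available actions.",
    last_action="IDLE",
    scene_description=(
        "Available actions: LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER.\n"
        "Overview: the ego car is in the middle lane at 25 m/s; a slower car is 30 m ahead.\n"
        "Safety evaluation: changing to the left lane is currently safe."
    ),
)
print(example)
```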

Conservative Model and Aggressive Model


An example of the conservative model:

Conservative Model

An example of the aggressive model:

Aggressive Model
