KPIRL-KLA

KPIRL is a non-linear extension to Abbeel and Ng's linear Projection IRL algorithm (detailed in "Apprenticeship Learning via Inverse Reinforcement Learning").

KLA is an RL algorithm designed specifically for use with KPIRL in large state/action spaces with unknown rewards.

Installation

  1. Clone the repository

Requirements

  • Matlab
    • Statistics and Machine Learning Toolbox (for pdist in k_norm.m)
    • Parallel Computing Toolbox (for parfor in kla.m and trajectories_from_simulations.m)
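
If you want to confirm that both toolboxes are available before running anything, a check along these lines can be run first. It is not part of the repository; the license feature names are MathWorks' standard identifiers for these toolboxes.

```matlab
% Optional sanity check (not part of this repository): verify the two
% required toolboxes are licensed before running the quick-start scripts.
assert(license('test', 'Statistics_Toolbox') == 1, ...
    'Statistics and Machine Learning Toolbox (pdist) is required.');
assert(license('test', 'Distrib_Computing_Toolbox') == 1, ...
    'Parallel Computing Toolbox (parfor) is required.');
```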

Directory Structure

  • algorithms - contains all algorithm implementations (many are included for comparison purposes only)
    • inverse reinforcement learning
      • PIRL - Projection inverse reinforcement learning (paper)
      • KPIRL - Kernel projection inverse reinforcement learning
    • reinforcement learning
      • KLA - Kernel lookup approximation
      • KLSPI - Kernel-based least squares policy iteration (paper)
      • LSPI - Least-squares policy iteration (paper)
  • domains - specific problem domain implementations
    • <domain name> - folder name unique for each domain
      • data - contains the raw data for the specific domain (no standardization here)
      • algos - contains the necessary function implementations for the various algorithms
      • work - catch-all folder for domain-specific work/research (no standardization here)
  • shared - a collection of utility functions that can be used across domains
    • kernel - implementations of popular kernel methods that can be used with KPIRL (an illustrative sketch follows this list)
    • basis - utility functions to turn features into the basis forms required by the algorithms
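
For orientation, a kernel function in the spirit of those under shared/kernel might look like the sketch below. The function name, signature, and the choice of a Gaussian kernel are illustrative assumptions, not the repository's actual API.

```matlab
% Hypothetical Gaussian kernel in the spirit of shared/kernel (illustrative
% only; the real files in that folder may use different names/signatures).
function K = gaussian_kernel(X1, X2, sigma)
    % X1, X2 -- matrices whose columns are basis/feature vectors
    % K      -- kernel matrix, K(i,j) = exp(-||x1_i - x2_j||^2 / (2*sigma^2))
    D = pdist2(X1', X2');                  % pairwise Euclidean distances
    K = exp(-(D .^ 2) ./ (2 * sigma ^ 2));
end
```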

Quick Start

Three example files have been provided in the root directory for a "quick start". These files use the "huge" domain, but can easily be adapted to any domain. They should run "out-of-the-box", and further documentation is provided in-line within each file. A minimal session is sketched after the list below.

  • qs_compare.m - compares the performance of three different RL algorithms in the "huge" domain. A number of random reward functions are generated, and a policy is learned for each of these functions with each RL algorithm. Using the learned policies, a number of random episodes are generated; each episode's value is calculated, and the expected value for each RL algorithm is output for comparison.

  • qs_inverse.m - uses KPIRL on the "huge" domain to recover the reward function.

  • qs_paths.m - adds all required paths to Matlab for the duration of the current session.
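
A minimal session, assuming the repository root is the current working directory (the script names are those listed above):

```matlab
% Run from the repository root.
qs_paths;    % add all required folders to the Matlab path for this session
qs_inverse;  % run KPIRL on the "huge" domain to recover a reward function
qs_compare;  % compare the RL algorithms on randomly generated reward functions
```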

Algorithm Functions

KPIRL Functions

* <domain>_reward_basis
	* Input:
		* there is no input for this function
	* Output:
		* r_i -- a function that can:
			* take no input and return the number of basis combinations
			* take a matrix of states and return a row vector of the basis index for each state
			* take a cell array of states and return a row vector of the basis index for each state
		* r_p -- a function that can:
			* take no input and return the number of basis features
			* take a matrix of states or basis indexes and return a matrix of basis features
			* take a cell array of states or basis indexes and return a matrix of basis features
	* examples (a fuller sketch follows this list):
		* given a state `s`, the predicate `r_p(s) == r_p(r_i(s))` is always true
		* to get all potential basis permutations one can do `r_p(1:r_i())`
		* to get a random basis permutation one can do `r_p(randi(r_i()))`
		* to pre-allocate a basis matrix for `n` states one can do `zeros(r_p(), n)`
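
A minimal sketch of what a `<domain>_reward_basis` implementation could look like, assuming a hypothetical domain whose states are 2-by-1 integer vectors with entries in 1..3 and whose features are one-hot indicators; none of these specifics come from the repository's domains.

```matlab
function [r_i, r_p] = example_reward_basis()
    % Hypothetical domain: states are 2x1 vectors with integer entries in 1..3,
    % giving 9 basis combinations; features are one-hot indicators of the index.
    n = 9;

    r_i = @indexes;
    r_p = @features;

    function i = indexes(s)
        if nargin == 0, i = n; return; end      % no input: number of combinations
        if iscell(s), s = cell2mat(s); end      % cell array of states -> matrix
        i = sub2ind([3 3], s(1,:), s(2,:));     % row vector of basis indexes
    end

    function p = features(x)
        if nargin == 0, p = n; return; end      % no input: number of features
        if iscell(x), x = cell2mat(x); end
        if size(x,1) > 1, x = indexes(x); end   % states -> basis indexes first
        p = full(sparse(x, 1:numel(x), 1, n, numel(x)));  % one column per input
    end
end
```

With this sketch the examples above behave as described: `r_p(s) == r_p(r_i(s))` holds and `r_p(1:r_i())` enumerates all nine basis vectors.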

		
* <domain>_reward_trajectories
	* Input:
		* reward -- a function handle that takes a state and returns a reward value (e.g., `@(state) ...`). The function should also be able to take a set of states (represented by a matrix or cell array, [s_1, s_2, s_3, ...]) and return a row vector of rewards ([r_1, r_2, r_3, ...]).
	* Output:
		* trajectories -- a cell array of optimal trajectories when following the given reward function. Each trajectory can be represented either as a matrix whose column vectors are states or as a cell array of states (i.e., trajectories = {trajectory_1, trajectory_2, ...} and trajectory_1 = <[s_1, s_2, ...] | {s_1, s_2, ...}>). To get an accurate understanding of how the reward function affects behavior, the trajectories should be generated from randomly selected initial states within the MDP (a sketch follows this item).
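
As a rough illustration of the expected shape only: the helper names `example_random` and `example_transitions` below are hypothetical, and the one-step greedy rollout stands in for whatever policy-optimization step (e.g. an RL algorithm such as KLA) a real domain would use.

```matlab
function trajectories = example_reward_trajectories(reward)
    % Illustrative shape only: a real domain would learn a policy for the given
    % reward function rather than use the greedy one-step rollout below.
    n_trajectories = 10;
    n_steps        = 20;
    trajectories   = cell(1, n_trajectories);

    for t = 1:n_trajectories
        s          = example_random();              % hypothetical: random initial state
        trajectory = zeros(numel(s), n_steps);      % states stored as column vectors

        for k = 1:n_steps
            trajectory(:, k) = s;
            next_states      = example_transitions(s);    % hypothetical: candidate successors
            [~, best]        = max(reward(next_states));  % reward accepts a matrix of states
            s                = next_states(:, best);
        end

        trajectories{t} = trajectory;
    end
end
```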

* <domain>_expert_trajectories
	* Output:
		* trajectories -- a cell array of observed expert trajectories. Each trajectory can be represented either as a matrix whose column vectors are states or as a cell array of states (i.e., trajectories = {trajectory_1, trajectory_2, ...} and trajectory_1 = <[s_1, s_2, ...] | {s_1, s_2, ...}>)
	
* <domain>_parameters
	* Input:
		* p_in -- an optional struct that will be used to change the existing parameters. If not passed in, the current settings are returned unchanged.
	* Output:
		* p_out -- a struct which contains the parameters for the various algorithms. This function needs to persist the parameters from call to call in order to work properly. The example domains do this via Matlab's `persistent` command, though it could be done in other ways if necessary (a sketch follows this item).
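
A minimal sketch using `persistent`, as described above; the parameter names (`gamma`, `episodes`) are hypothetical placeholders, not the settings the repository's algorithms actually use.

```matlab
function p_out = example_parameters(p_in)
    % Hypothetical defaults; a real domain would expose whatever settings its
    % algorithms need. The struct persists from call to call via `persistent`.
    persistent p
    if isempty(p)
        p = struct('gamma', 0.9, 'episodes', 100);
    end
    if nargin > 0
        for f = fieldnames(p_in)'          % overwrite only the supplied fields
            p.(f{1}) = p_in.(f{1});
        end
    end
    p_out = p;
end
```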

KLA Functions

* <domain>_actions
* <domain>_random
* <domain>_transitions
* <domain>_value_basis
* <domain>_parameters

LSPI Functions

* Look at the README file in the LSPI algorithm folder

KLSPI Functions

* All the standard LSPI functions (see the README referenced above)
* <domain>_value_basis_klspi