ACTD is a Scala library that implements the Temporal Difference (TD) reinforcement learning algorithm, based on the Actor/Critic method with an Artificial Neural Network.
The Status trait models the environment task as an MDP (Markov Decision Process).
You have to implement three methods of the trait:
def toDenseVector: DenseVector[Double]
The returned vector contains the values for each feature describing the status.
def finalStatus: Boolean
In the case of an episodic task, the method should return true if this is a final state.
def apply(action: Action): Feedback
This is the core method of the environment task, implementing the MDP rules. It applies the selected action, computing the next status and the reward. The feedback is in SARS format (Status, Action, Reward, Status): the current status, the applied action, the reward for the transition, and the next status.
If the current status is the end of an episode, the next status should be the initial status of the next episode and the reward should be zero.
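Putting the three signatures together, the trait can be sketched roughly as follows (a reconstruction from the descriptions above rather than the library source; the Action and Feedback types are provided by the library, DenseVector by Breeze):

import breeze.linalg.DenseVector

trait Status {
  // the values of the features describing the status
  def toDenseVector: DenseVector[Double]
  // true if this is a final state of an episodic task
  def finalStatus: Boolean
  // applies an action and returns the SARS feedback for the transition
  def apply(action: Action): Feedback
}

For example, a simple three-state task can be modeled as: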
import breeze.linalg.DenseVector

object MyStatus {

  // Initial state: action 0 moves to S1, any other action stays in S0
  val S0: Status = new Status() {
    val toDenseVector = DenseVector(1.0, 0.0)
    val finalStatus = false
    def apply(action: Action) = if (action == 0) {
      Feedback(this, action, -1.0, S1)
    } else {
      Feedback(this, action, -1.0, S0)
    }
  }

  // Intermediate state: action 0 moves to the final state S2 with reward 1
  val S1: Status = new Status() {
    val toDenseVector = DenseVector(0.0, 1.0)
    val finalStatus = false
    def apply(action: Action) = if (action == 0) {
      Feedback(this, action, 1.0, S2)
    } else {
      Feedback(this, action, -1.0, S1)
    }
  }

  // Final state: any action restarts the episode at S0 with zero reward
  val S2: Status = new Status() {
    val toDenseVector = DenseVector(0.0, 0.0)
    val finalStatus = true
    def apply(action: Action) = Feedback(this, action, 0.0, S0)
  }
}
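A single transition can then be obtained by applying an action to a status. The snippet below assumes Action is an Int-compatible type, as the action == 0 comparison above suggests, and destructures the resulting Feedback:

// Applying action 0 to S0 produces the SARS feedback (S0, 0, -1.0, S1)
val Feedback(_, _, reward, next) = MyStatus.S0(0)
// reward == -1.0, next == MyStatus.S1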
The Agent is the intelligent component of the system.
It can be created with:
val parms = TDParms(
  alpha = 10e-3,
  beta = 1.0,
  gamma = 0.99,
  epsilon = 0.1,
  lambda = 0.1,
  eta = 1.0,
  maxTrainingSample = 1) // number of recent samples used in the training process

val agent = TDAgent(
  parms, // the TD parameters above
  1.0,   // sigma
  10,    // statusSize
  4,     // actionCount
  10)
For an explanation of the parameters see ...
The environment interaction consists of a cycle: the agent selects an action based on the current status, the action is applied to the status to get the environment feedback, and the feedback is used by the current agent to train a new agent. The feedback contains the new status of the environment.
val status: Status = ...
val agent: TDAgent = ...

// the agent selects an action for the current status
val action = agent.action(status)
// the environment applies the action and returns the SARS feedback
val feedback = status(action)
// the current agent trains a new agent from the feedback
val (agent1, error) = agent.train(feedback)
// the feedback carries the reward and the next status
val Feedback(_, _, reward, status1) = feedback
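This cycle can be repeated to run a whole episode. The sketch below is only an illustration built from the calls shown above; it loops until a final status is reached, threading the newly trained agent and the next status through each step:

// Run a single episode, returning the trained agent
def runEpisode(initStatus: Status, initAgent: TDAgent): TDAgent = {
  var status = initStatus
  var agent = initAgent
  while (!status.finalStatus) {
    val action = agent.action(status)
    val feedback = status(action)
    val (trained, _) = agent.train(feedback)
    val Feedback(_, _, _, next) = feedback
    agent = trained
    status = next
  }
  agent
}

For the MyStatus example above, an episode can be run with runEpisode(MyStatus.S0, agent).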
The agent may be saved into a set of CSV files:
agent.write("myagent")
The method writes the following files:
myagent-parms.csv with the parameter set
myagent-critic-n.csv with the number of critic layers
myagent-critic-0.csv with the weights of the critic network for the first layer
myagent-critic-{i}.csv with the weights of the critic network for the (i+1)-th layer
myagent-critic-{n-1}.csv with the weights of the critic network for the last layer
myagent-actor-n.csv with the number of actor layers
myagent-actor-0.csv with the weights of the actor network for the first layer
myagent-actor-{i}.csv with the weights of the actor network for the (i+1)-th layer
myagent-actor-{n-1}.csv with the weights of the actor network for the last layer
The agent may be created by reading a set of CSV files:
val agent = TDAgent("myagent")
The method reads the same files used to save the agent.
The environment interaction is also implemented with Akka actors, which allow concurrent training while interacting with the environment.
To use the Akka implementation you need to create an environment actor:
val initStatus: Status = ...
val agent: TDAgent = ...
val environmentActor = system.actorOf(EnvironmentActor.props(initStatus, agent))
Then you send an Interact message to the actor:
environmentActor ! Interact
The actor will reply with a Step(feedback, error, agent) message containing the feedback, the error, and the newly generated agent.
You may query the current agent by sending a QueryAgent message:
environmentActor ! QueryAgent
The actor will reply with a CurrentAgent(agent) message containing the current agent.
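The following sketch shows one way to drive this message flow from a classic Akka actor. Only the Interact, Step, QueryAgent and CurrentAgent message names come from the descriptions above; the DriverActor itself is hypothetical:

import akka.actor.{Actor, ActorRef}

// Hypothetical driver: requests a fixed number of interaction steps and then
// saves the resulting agent
class DriverActor(environmentActor: ActorRef, steps: Int) extends Actor {

  override def preStart(): Unit = environmentActor ! Interact

  def receive: Receive = running(steps)

  private def running(remaining: Int): Receive = {
    // each Step reply carries the feedback, the error and the new agent
    case Step(_, _, _) =>
      if (remaining > 1) {
        environmentActor ! Interact
        context.become(running(remaining - 1))
      } else {
        environmentActor ! QueryAgent
      }

    // the current (trained) agent after the requested number of steps
    case CurrentAgent(agent) =>
      agent.write("myagent")
      context.stop(self)
  }
}

The driver can then be started with system.actorOf(Props(new DriverActor(environmentActor, 100))).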
The Akka implementation creates a trainer actor that trains the critic neural network in the background.
The maxTrainingSample parameter of TDParms sets the number of recent samples used in the training process.
If maxTrainingSample is set to zero, no batch trainer will be created.
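For example, to run without the background batch trainer the parameters can be created with maxTrainingSample set to zero (the other values are the ones from the earlier example):

// no batch trainer: the background trainer actor is not created
val parmsNoBatch = TDParms(
  alpha = 10e-3,
  beta = 1.0,
  gamma = 0.99,
  epsilon = 0.1,
  lambda = 0.1,
  eta = 1.0,
  maxTrainingSample = 0)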