martin-fabbri/reinforcement-learning-playground

Reinforcement Learning Playground

What is reinforcement learning?

  • Reinforcement learning is a branch of machine learning.
  • It involves an agent and an environment.
  • The agent learns optimal behavior for maximizing rewards.

When should we worry about sequential decision making?

Limited supervision: you know what you want, but not how to get it.

Delayed consequences: an action taken now may affect rewards only much later.

Why learn RL?

  • Not just for games
  • Make optimal decisions
  • Maximize efficiency

What are RL applications?

  • Robotics
  • Self-driving cars
  • Inventory management
  • Financial investments
  • Decision-based situations

RL terminology

What is the agent?

  • The agent is the algorithm.
  • It decides which action to take.
  • It monitors the environment.
  • It is the one that learns.
  • Its only outputs are decisions (actions, controls).

What is an environment?

  • The environment is everything the agent can interact with.
  • Agent's actions affect the environment.
  • It responds to the agent's actions with consequences (observations, rewards).

What is a state?

  • The state is a representation of what the agent can sense.
  • Does not always involve the entire environment. It's limited to what the agent can sense.
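
The difference between the full environment and the state the agent senses can be sketched as follows. This is only an illustration: the one-dimensional `WORLD`, the `observe` function, and the `view_radius` parameter are hypothetical names made up for this example.

```python
# Hypothetical 1-D world: the environment holds the full layout,
# but the agent only senses cells within `view_radius` of its position.
WORLD = ["wall", "empty", "empty", "goal", "empty", "pit", "wall"]


def observe(world, agent_pos, view_radius=1):
    """Return the part of the world the agent can sense (its state)."""
    lo = max(0, agent_pos - view_radius)
    hi = min(len(world), agent_pos + view_radius + 1)
    return tuple(world[lo:hi])


# The agent at position 2 senses three cells, not the whole world.
state = observe(WORLD, agent_pos=2)
```

Here the pit at index 5 exists in the environment but is not part of the agent's state until the agent moves close enough to sense it.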

What is an action?

  • An action is what an agent can do in a given state.
  • Actions are limited by the environment.
  • The action's goal is to maximize reward.
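
One way to picture "actions are limited by the environment" is a small grid where moves that would leave the grid are not available. The grid size, the `MOVES` table, and `valid_actions` are assumptions made for this sketch.

```python
# Hypothetical 3x2 grid; a state is an (x, y) cell.
GRID_WIDTH, GRID_HEIGHT = 3, 2
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def valid_actions(state):
    """List the moves that keep the agent inside the grid."""
    x, y = state
    allowed = []
    for name, (dx, dy) in MOVES.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < GRID_WIDTH and 0 <= ny < GRID_HEIGHT:
            allowed.append(name)
    return allowed
```

In the top-left corner `(0, 0)` only `down` and `right` remain, so the set of actions depends on the state the agent is in.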

What is the reward?

  • It is the result of taking an action.
  • It is feedback from the environment.
  • It can be positive or negative.
  • It helps encourage or discourage certain actions, policies, or behaviors.
  • It is what the agent tries to optimize.
  • Reward functions are often hard to formulate.
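
A reward signal with positive and negative values might look like the sketch below. The outcome names and the numeric values are arbitrary assumptions; choosing them well is exactly the hard part mentioned above.

```python
def reward(outcome):
    """Hypothetical reward: positive for the goal, negative for a pit,
    and a small cost for every other step."""
    table = {"goal": 10.0, "pit": -10.0}
    return table.get(outcome, -1.0)


# The quantity the agent tries to optimize is the reward summed over an episode.
episode_outcomes = ["empty", "empty", "goal"]
total_reward = sum(reward(outcome) for outcome in episode_outcomes)
```

The small step cost discourages wandering, while the large goal reward encourages reaching the goal quickly.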

Where do rewards come from?

  • When playing video games, rewards come from scores.

Are there other forms of supervision?

  • Learning from demonstrations.

    • Directly copying observed behavior.
    • Inferring rewards from observed behavior.
  • Learning from observing the world.

    • Learning to predict.
    • Unsupervised learning.
  • Learning from other tasks

    • Transfer learning
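
As an illustration of "directly copying observed behavior", the sketch below builds a lookup-table policy that repeats the most frequent expert action in each state. The demonstration data and the `clone_behavior` function are hypothetical; practical systems usually fit a learned model instead of a table.

```python
from collections import Counter, defaultdict

# Hypothetical expert demonstrations as (state, action) pairs.
demonstrations = [
    ("red_light", "stop"),
    ("red_light", "stop"),
    ("green_light", "go"),
    ("green_light", "go"),
    ("green_light", "stop"),  # experts are not always consistent
]


def clone_behavior(demos):
    """Build a policy that picks the expert's most frequent action per state."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


policy = clone_behavior(demonstrations)
```

Note that no reward is used here: the supervision comes entirely from the demonstrations.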

What is the standard reinforcement loop?

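  • The agent observes the current state.
  • The agent chooses an action.
  • The environment responds with a reward and a new state.
  • The loop repeats until the episode ends.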
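
The interaction between agent and environment described in the terminology above can be sketched as a loop. Everything here is an assumption made for illustration: `CorridorEnv`, `AlwaysRightAgent`, and `run_episode` are hypothetical names, and the `reset`/`step` interface is only loosely modeled on common RL toolkits.

```python
class CorridorEnv:
    """Hypothetical environment: cells 0..length-1 with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else -0.1
        return self.pos, reward, done


class AlwaysRightAgent:
    """Placeholder agent with a fixed policy; a real agent would learn from rewards."""

    def act(self, state):
        return 1


def run_episode(env, agent, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # agent decides
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), AlwaysRightAgent())` walks the agent to the goal in four steps. A learning agent would additionally use each reward to adjust how it chooses actions.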

What is Deep Reinforcement Learning?

  • Deep learning: end-to-end training of expressive, multi-layer models.
  • Deep models are what allow RL algorithms to solve complex problems end-to-end.

Why Deep Reinforcement Learning?

  • Deep = can process complex sensory input

What can deep learning & RL do well now?

  • Acquire a high degree of proficiency in domains governed by simple, known rules.
  • Learn simple skills with raw sensory inputs, given enough experience.
  • Learn from imitating enough human-provided expert behavior.

What has proven challenging so far?

  • Humans can learn incredibly quickly, while deep RL methods are usually slow to learn
  • Humans can reuse past knowledge
    • Transfer learning in deep RL is an open problem
  • It is often not clear what the reward function should be

How do we build intelligent machines?

Learning as the basis of intelligence.

  • Some things we can all do.
  • Some things we can only learn.