# 1 Introduction

* Psygrammer / Decision-Making RL: Part 6 - Reinforcement Learning for Conversational Agents [1]
* 김무성

# Contents
* 1.1 The Design Problem for Spoken Dialogue Systems
* 1.2 Overview 
* 1.3 Structure of the Book

#### See also

##### WOZ
* [2] Wizard of Oz experiment (Wikipedia) - https://en.wikipedia.org/wiki/Wizard_of_Oz_experiment
* [3] Language Technology II Language-Based Interaction: Dialogue design and evaluation - http://www.coli.uni-saarland.de/courses/late2/slides/NLI%20-%20Design%20and%20evaluation_Web.ppt

##### Recent trends
* [4] Chat Bots - http://web.stanford.edu/class/cs124/lec/chatbot.pdf
* [5] Deep Reinforcement Learning for Dialogue Generation - https://arxiv.org/abs/1606.01541

The past decade has seen something of a revolution in the field of spoken dialogue systems. 
* As in other areas of Computer Science and Artificial Intelligence, data-driven methods are being used to drive new methodologies for system development and evaluation. 
* These methods are proving to be more robust, flexible, and adaptive than the rule-based approaches which preceded them.
* This book is therefore intended as a guide which navigates through a detailed case study in data-driven methods for development and evaluation of spoken dialogue systems. 
* It focusses on Dialogue Management and Natural Language Generation, rather than speech recognition and spoken language understanding.

# 1.1 The Design Problem for Spoken Dialogue Systems

#### dialogue strategies

* The design of Spoken Dialogue Systems (SDS) is not only concerned with integrating speech and language processing modules such as 
    - Automatic Speech Recognition (ASR), 
    - Spoken Language Understanding (SLU), 
    - Natural Language Generation (NLG), and 
    - Text-to-Speech (TTS) synthesis systems.
* It also requires the development of skills for “what to say next”:
    - dialogue strategies 
        - which take into account the performance of these components, 
            - the nature of the user’s tasks 
                - (e.g. information-seeking,
                - tutoring, or 
                - robot control), and 
            - other features of the operating environment such as 
                - the user’s behaviour and 
                - preferences.

#### rule-based, hand-coded strategies

* In conventional, rule-based, dialogue development many expensive iterations of manual design and re-design are necessary in order to produce good strategies. 
* In addition, such hand-coded strategies are not re-usable from task to task, are not scalable, require a substantial amount of human labour and expertise, and are not guaranteed to be optimal.

#### machine learning methods

For these reasons machine learning methods (such as Reinforcement Learning) for dialogue strategy design have been a leading research area for several years. Compared with hand-coded rules, they promise:
* a data-driven automatic development cycle
* provably optimal action policies
* a principled mathematical model for action selection
* possibilities for generalisation to unseen states
* reduced development and deployment costs.
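
The "principled mathematical model" in this list is usually a Markov Decision Process. As a sketch in standard textbook notation (an assumption here, since these notes do not define any symbols), with dialogue state $s$, system action $a$, transition probabilities $P$, reward $R$ and discount factor $\gamma$, the optimal action-value function and the policy derived from it satisfy:

$$
Q^*(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma \max_{a'} Q^*(s',a')\bigr],
\qquad
\pi^*(s) = \arg\max_{a} Q^*(s,a)
$$

Read this way, "provably optimal" means optimal with respect to the modelled environment and its reward function, not with respect to real users directly.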

#### chicken-and-egg

* However, in cases where a system is designed from scratch, there is often no suitable in-domain data to enable such a design. 
* Collecting dialogue data without a working prototype is problematic, leaving the developer with a classic “chicken-and-egg” problem.

<font color="red">One of the main issues that this book addresses is how to use a data-driven development methodology when little or no in-domain data exists.</font>

# 1.2 Overview

#### simulation-based Reinforcement Learning

In this book we propose to learn dialogue strategies by simulation-based Reinforcement Learning (RL) (Sutton and Barto, 1998), 
* where a simulated environment is learned 
    - from small amounts of Wizard-of-Oz (WOZ) data.

#### Wizard-of-Oz (WOZ) data

* Using WOZ data rather than data from real Human-Computer Interaction (HCI) allows us to learn optimal strategies for domains where no working dialogue system already exists. 

#### bootstrapping

* Automatic strategy learning has been applied to dialogue systems which have already been deployed in the real world using handcrafted strategies.
    - In such work, strategy learning was performed based on already present extensive online-operation experience, e.g. (Henderson et al, 2005, 2008; Singh et al, 2002).
* In contrast to this preceding work, our approach enables strategy learning in domains where no prior system is available. 
    - Optimised learned strategies are then available from the first moment of online-operation, and labour-intensive handcrafting of dialogue strategies is avoided. 
* <font color="red">This independence from large amounts of in-domain dialogue data allows researchers to apply RL to new application areas beyond the scope of existing dialogue systems. We call this method “bootstrapping”.</font>

#### 5-step procedure

* This book first provides the general proof-of-concept that RL-based strategies outperform handcrafted strategies which are manually tuned for a wide spectrum of application scenarios. 
* We propose to learn dialogue strategies by simulation-based RL, where the simulated environment is learned from small amounts of WOZ data.
* We therefore introduce a 5-step procedure:

1. Collect data 
    - in a WOZ experiment.
2. Use this data to construct a simulated learning environment 
    - using data-driven methods only.
3. Train an RL-based dialogue policy 
    - by interacting with the simulated environment.
    - We compare this policy 
        - against a supervised baseline. 
    - This comparison allows us to 
        - measure the relative improvements 
            - over the WOZ strategies contained in the training data.
4. Evaluate the learned policy 
    - with real users.
5. Show that “bootstrapping” from WOZ data 
    - is a valid estimate of real HCI 
        - by comparing different aspects of 
            - the 3 corpora gathered so far: 
                - the WOZ study, 
                - the dialogues generated in simulation, and 
                - the final user tests.
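
Steps 1 to 3 of the procedure above can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical stand-in, not the book's actual setup: the hand-written `WOZ_TURNS` corpus, the single-parameter user simulation, the reward values in `PRESENT_REWARD`, and the choice of tabular Q-learning as the RL algorithm. The state is simply the number of filled search slots, and the system either asks for another constraint or presents results.

```python
import random
from collections import Counter, defaultdict

ACTIONS = ("ask", "present")
MAX_SLOTS = 3
# Assumed toy reward: presenting with more filled slots is better; each turn costs 1.
PRESENT_REWARD = {0: -10.0, 1: -5.0, 2: 5.0, 3: 10.0}
TURN_COST = -1.0

# Step 1 (stand-in): WOZ turns as (slots_filled, wizard_action, user_gave_new_slot).
WOZ_TURNS = (
    [(0, "ask", True)] * 4 + [(0, "ask", False)]
    + [(1, "ask", True)] * 4 + [(1, "ask", False)]
    + [(2, "present", None)] * 3 + [(2, "ask", True)]
    + [(3, "present", None)]
)

def estimate_user_model(turns):
    """Step 2: estimate P(user supplies a new slot | system asks) by counting."""
    outcomes = [gave for _, action, gave in turns if action == "ask"]
    return sum(outcomes) / len(outcomes)

def supervised_policy(turns):
    """Supervised baseline: copy the wizards' most frequent action in each state."""
    votes = defaultdict(Counter)
    for state, action, _ in turns:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def step(state, action, p_fill, rng):
    """Simulated environment: returns (next_state, reward, done)."""
    if action == "present":
        return state, PRESENT_REWARD[state], True
    gained = rng.random() < p_fill
    return min(state + int(gained), MAX_SLOTS), TURN_COST, False

def q_learning(p_fill, rng, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.2):
    """Step 3: tabular Q-learning against the simulated user."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 20:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action, p_fill, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, turns = nxt, turns + 1
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_SLOTS + 1)}

def mean_return(policy, p_fill, rng, episodes=2000):
    """Evaluate a policy in simulation with the same reward used for training."""
    total = 0.0
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 20:
            state, reward, done = step(state, policy[state], p_fill, rng)
            total += reward
            turns += 1
    return total / episodes

rng = random.Random(0)
p_fill = estimate_user_model(WOZ_TURNS)
sl_policy = supervised_policy(WOZ_TURNS)
rl_policy = q_learning(p_fill, rng)
print("SL policy:", sl_policy, "mean return:", mean_return(sl_policy, p_fill, rng))
print("RL policy:", rl_policy, "mean return:", mean_return(rl_policy, p_fill, rng))
```

In this toy setting the supervised policy reproduces the wizards' habit of presenting early, while the learned policy is free to deviate from the data when the simulated reward favours it. Steps 4 and 5 (real-user tests and the comparison of corpora) cannot be simulated and are left out.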

<font color="red">We apply this framework to optimise multimodal Dialogue Management strategies and Natural Language Generation.</font>

#### multimodal Dialogue Management strategies
* In the first case we consider Dialogue Management and content selection as two closely interrelated problems for information-seeking dialogues:
    - the decision of <font color="red">when to present information</font> 
    - depends on <font color="red">how many pieces of information to present</font> 
        - and the available options for how to present them, and vice versa.
* We therefore formulate the problem as a <font color="blue">hierarchy of joint learning decisions</font> which are optimised together.

#### Natural Language Generation (NLG)
* The second study describes a new approach to generating Natural Language in interactive systems. 
* Natural Language Generation (NLG) addresses the problem of 
    - <font color="red">“how to say”</font> an utterance, 
        - once “what to say” has been determined by the Dialogue Manager.
* We treat NLG as planning under uncertainty 
    - for information-seeking dialogue systems, 
    - where the strategy for 
        - information presentation and
        - its associated attributes 
        - are incrementally selected 
            - using hierarchical learning.

#### RL vs SL
* Our results in both studies show that RL significantly outperforms supervised learning (SL) when interacting in simulation as well as for interactions with real users.

#### objective function

* One focus of this book is to optimise dialogue strategies with respect to real user preferences. A major advantage of RL-based dialogue strategy development is that the dialogue strategy can be automatically trained and evaluated using the same objective function.
* This book is the first to explore learning with data-driven, non-linear objective functions. 
* We also propose a new method for meta-evaluation of the objective function.

# 1.3 Structure of the Book

#### Chapter 2 (Background)
* This chapter provides the reader with relevant background knowledge for the research. After introducing some general information about Spoken Dialogue Systems, we contrast different methods applied in research and industry to develop dialogue strategies.

#### Chapter 3 (Reinforcement Learning)
* This chapter provides technical background on RL for dialogue strategy development and discusses simulation-based learning in particular.

#### Chapter 4 (Proof-of-Concept: Information Seeking Strategies)
* Here we develop the theoretical proof-of-concept that RL-based strategies outperform hand-coded strategies, which are tuned to the same objective function.
* We show this for a wide range of application scenarios, e.g. for different user types and noise conditions. 
* This chapter also demonstrates how to apply simulation-based RL to solve a complex and challenging problem for information-seeking dialogue systems.

#### Chapter 5 (A Bootstrapping Approach to Develop Reinforcement Learning-based Strategies)
* This chapter introduces a 5-step procedure model to bootstrap optimal RL-based strategies from WOZ data.

#### Chapter 6 (Data Collection in a Wizard-of-Oz Experiment)
* Here we describe the experimental setup of the WOZ experiment. 
* We explain which changes to the conventional WOZ method are necessary for strategy learning.

#### Chapter 7 (Building a Simulated Learning Environment from Wizard-of-Oz Data)
* This chapter uses the WOZ data to construct a simulated learning environment. 
* We therefore introduce methods suited to build and validate simulations from small amounts of data.

#### Chapter 8 (Comparing Reinforcement and Supervised Learning of Dialogue Policies with Real Users)
* In this chapter we evaluate the learned strategy with real users. 
* We therefore develop a music-player dialogue system using a rapid development tool, where the learned strategy is implemented using a table look-up between states and learned actions. 
* We report detailed results from the real user tests.
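
A "table look-up between states and learned actions" can be as simple as a dictionary. The sketch below is only an illustration: the state encoding, the action names, and the fallback for unseen states are all hypothetical and are not taken from the book's music-player system.

```python
from typing import Dict, Tuple

# Hypothetical state encoding: (number of filled search slots, size of the result set).
State = Tuple[int, str]

# Hypothetical excerpt of a learned policy, exported as a plain state -> action table.
POLICY_TABLE: Dict[State, str] = {
    (0, "many"): "ask_for_constraint",
    (1, "many"): "ask_for_constraint",
    (2, "few"): "present_list",
    (3, "one"): "present_single_item",
}

def select_action(state: State, table: Dict[State, str],
                  fallback: str = "ask_for_constraint") -> str:
    """Return the stored action for `state`, or a fallback if the state is not in the table."""
    return table.get(state, fallback)

print(select_action((2, "few"), POLICY_TABLE))
print(select_action((1, "few"), POLICY_TABLE))  # not in the table, so the fallback is used
```

A plain table does not generalise to states missing from it, so some fallback rule (assumed here) is needed at deployment time.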

#### Chapter 9 (Natural Language Generation)
* This chapter further develops the methodology to encompass elements of policy learning for adaptive Natural Language Generation in spoken dialogue systems.

#### Chapter 10 (Conclusion)
* Finally, we conclude by summarising the main contributions of this work.
* We also report on “lessons learned” to provide guidance for future researchers.

# References
* [1] Reinforcement Learning for Adaptive Dialogue Systems: A Data-driven Methodology for Dialogue Management and Natural Language Generation - https://www.amazon.com/Reinforcement-Learning-Adaptive-Dialogue-Systems/dp/3642439845
* [2] Wizard of Oz experiment (Wikipedia) - https://en.wikipedia.org/wiki/Wizard_of_Oz_experiment
* [3] Language Technology II Language-Based Interaction: Dialogue design and evaluation - http://www.coli.uni-saarland.de/courses/late2/slides/NLI%20-%20Design%20and%20evaluation_Web.ppt
* [4] Chat Bots - http://web.stanford.edu/class/cs124/lec/chatbot.pdf
* [5] Deep Reinforcement Learning for Dialogue Generation - https://arxiv.org/abs/1606.01541