
Source

Deep Learning from Scratch 3 (밑바닥부터 시작하는 딥러닝 3): https://www.hanbit.co.kr/store/books/look.php?p_code=B6627606922

Source code: https://github.com/WegraLee/deep-learning-from-scratch-3

[DeZero] 3. Higher-Order Differentiation

  • Higher-order derivative : the result of differentiating a function two or more times

1. Visualizing Computational Graphs

Graphviz and the DOT Language

  • Graphviz : a tool for visualizing graphs (data structures made of nodes and arrows)
  • Installing Graphviz
    1. Download and run the Windows installer.
    2. Add C:\Program Files\Graphviz\bin to the PATH environment variable.
    3. In the Anaconda prompt, install graphviz and then try the dot command.
      conda install python-graphviz
      dot -V
  • DOT language : describes graphs with a simple syntax
    • Basic structure : digraph g {...}
    • Nodes are separated by line breaks
    • A node ID is an integer of 0 or greater and must not collide with any other node's ID
    • Example (result : x -> Exp -> y; note that DOT quotes labels with double quotes)
      digraph g {
      1 [label="x", color=orange, style=filled]
      2 [label="y", color=orange, style=filled]
      3 [label="Exp", color=lightblue, style=filled, shape=box]
      1 -> 3
      3 -> 2
      }
    • Converting to an image : dot sample.dot -T png -o sample.png

Computational-Graph Visualization Code

  • dezero/utils.py
    • Helper functions used only by get_dot_graph get an underscore (_) prefixed to their names.
    • id() : a Python built-in that returns the identity of the given object, which makes it a good unique node ID.
    • get_dot_graph is implemented with almost the same flow as the Variable.backward() method.
  • Computational-graph visualization example
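As a self-contained illustration of the two points above (id() as the unique node ID, underscore-prefixed helpers), here is a minimal sketch that emits DOT text for the x -> Exp -> y example; the plain `object()` instances are toy stand-ins for DeZero's Variable and Function instances:

```python
# Toy stand-ins for the Variable x, the Function Exp, and the output Variable y.
x, f, y = object(), object(), object()

def _dot_var(v, label):
    # helper used only by the graph builder, hence the leading underscore
    return f'{id(v)} [label="{label}", color=orange, style=filled]\n'

def _dot_func(fn, label):
    return f'{id(fn)} [label="{label}", color=lightblue, style=filled, shape=box]\n'

txt = _dot_var(x, 'x') + _dot_var(y, 'y') + _dot_func(f, 'Exp')
txt += f'{id(x)} -> {id(f)}\n{id(f)} -> {id(y)}\n'
dot_graph = 'digraph g {\n' + txt + '}'
print(dot_graph)  # save this text and feed it to the `dot` command to render it
```

Because id() is unique per live object, no two nodes can collide, which is exactly why the text recommends it.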

2. ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜ ๋ฏธ๋ถ„

ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜ ์ด๋ก 

  • ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜(Taylor Series) : ์–ด๋–ค ํ•จ์ˆ˜๋ฅผ ๋‹คํ•ญ์‹์œผ๋กœ ๊ทผ์‚ฌํ•˜๋Š” ๋ฐฉ๋ฒ•
    • : ์  ์—์„œ ์˜ ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜, ํ•ญ์ด ๋งŽ์•„์งˆ์ˆ˜๋ก ๊ทผ์‚ฌ์˜ ์ •ํ™•๋„๊ฐ€ ๋†’์•„์ง
  • ๋งคํด๋กœ๋ฆฐ ์ „๊ฐœ(Maclaurin's series) : ์ผ ๋•Œ์˜ ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜
  • ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜ ๊ตฌํ˜„ ์˜ˆ์ œ
    • ํ…Œ์ผ๋Ÿฌ ๊ธ‰์ˆ˜์˜ ์ž„๊ณ—๊ฐ’(threshhold)์„ ์ž‘๊ฒŒํ•  ์ˆ˜๋ก ์ด๋ก ์ƒ์œผ๋กœ ๊ทผ์‚ฌ ์ •๋ฐ€๋„๊ฐ€ ๋†’์•„์ง„๋‹ค. ํ•˜์ง€๋งŒ ์ปดํ“จํ„ฐ ๊ณ„์‚ฐ์—์„œ๋Š” ์ž๋ฆฟ์ˆ˜ ๋ˆ„๋ฝ์ด๋‚˜ ๋ฐ˜์˜ฌ๋ฆผ์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค.

3. Function Optimization

Optimization

  • Optimization : given a function, finding the 'input (the function's arguments)' that yields its minimum (or maximum); the goal of neural-network training is optimizing a loss function.
  • Example function
    • Rosenbrock function (banana function)
      • y = b(x1 - x0^2)^2 + (a - x0)^2 (where a and b are constants; a = 1, b = 100 is the usual choice)
      • Why it is a popular benchmark : compared with the gradients pointing down into the valley, the gradient along the valley floor toward the global minimum is far too small, which makes the function hard to optimize.

Gradient Descent

  • Gradient : at each point, points in the direction that (at least locally) increases (+) or decreases (-) the function's output the most.
  • Gradient descent : repeat the cycle of moving a fixed distance in the direction of the gradient multiplied by minus one and recomputing the gradient there, gradually approaching the desired point.
    • The learning rate α is set manually; x is updated by moving α times the gradient (first derivative): x ← x - α f'(x)
  • Gradient-descent optimization example
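The update rule above applied to the Rosenbrock function can be sketched with hand-derived gradients (DeZero would obtain them via backward(); the starting point and learning rate here are illustrative choices):

```python
def rosenbrock(x0, x1):
    # y = 100 * (x1 - x0^2)^2 + (1 - x0)^2, global minimum at (1, 1)
    return 100 * (x1 - x0 ** 2) ** 2 + (1 - x0) ** 2

def rosenbrock_grad(x0, x1):
    # hand-derived partial derivatives of the function above
    gx0 = -400 * x0 * (x1 - x0 ** 2) - 2 * (1 - x0)
    gx1 = 200 * (x1 - x0 ** 2)
    return gx0, gx1

x0, x1 = 0.0, 2.0   # illustrative starting point
lr = 0.001          # learning rate, set manually
for _ in range(10000):
    gx0, gx1 = rosenbrock_grad(x0, x1)
    x0 -= lr * gx0  # move opposite to the gradient
    x1 -= lr * gx1
print(x0, x1)       # creeps along the valley floor toward (1, 1)
```

Even after many iterations the iterate is still short of (1, 1), which is exactly the "tiny gradient along the valley floor" behavior the benchmark is known for.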

Newton's Method

  • Approximate with the Taylor series up to the second-order term
    • Approximate some function f(x) around a by the quadratic f(x) ≈ f(a) + f'(a)(x - a) + (1/2)f''(a)(x - a)^2.
    • The minimum of the approximating quadratic is at the position where the quadratic's derivative is 0, i.e., x = a - f'(a)/f''(a).
  • Newton's method (Newton's method) : uses the second derivative to adjust the step size automatically: x ← x - f'(x)/f''(x)
  • Newton's-method optimization example
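The update x ← x - f'(x)/f''(x) can be sketched on f(x) = x⁴ - 2x² with hand-written derivatives (the starting point 2.0 is an illustrative choice):

```python
def f(x):
    return x ** 4 - 2 * x ** 2

def f1(x):                 # first derivative f'(x)
    return 4 * x ** 3 - 4 * x

def f2(x):                 # second derivative f''(x)
    return 12 * x ** 2 - 4

x = 2.0
for i in range(10):
    x -= f1(x) / f2(x)     # Newton update: x <- x - f'(x) / f''(x)
print(x)                   # converges rapidly to the minimum at x = 1
```

Compared with gradient descent, no learning rate appears: the curvature f''(x) sets the step size, which is the point of the method.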

4. Higher-Order Differentiation

Goal

  • Compute higher-order derivatives by making the backward pass itself build a computational graph.

Method

  • The connections of the computational graph are created in the Function class's __call__ method when the forward pass is computed.
  • If connections can also be created while computing the backward pass, higher-order differentiation becomes possible.
  • Therefore, make the derivative (gradient) a Variable instance
    1. Make the output variable's grad a Variable instance.
    2. Keep the values computed in Function's backward as Variables as well.
    3. Then grad stays a Variable instance in the following generations too.
  • Backprop enabled/disabled mode (create_graph)
    • Backprop enabled
      • In the (Function's backward method -> (another Function instance's) __call__ method) process, the variables for the next backward pass are stored and connections are created.
      • Therefore a computational graph is drawn for the gradient computation itself. (higher-order differentiation possible)
      • Used as y.backward(create_graph=True).
    • Backprop disabled
      • In the (Function's backward method -> (another Function instance's) __call__ method) process, the variables for the next backward pass are not stored and no connections are created.
      • Therefore no computational graph is drawn for the gradient computation. (no higher-order derivatives needed)
      • Used as gx.backward().
  • Resetting the gradient
    • Problem : gradients accumulate
      y.backward(create_graph=True)
      print(x.grad) # x.grad = dy/dx
      gx = x.grad
      gx.backward()
      print(x.grad) # x.grad = dy/dx + d2y/dx2
    • Solution : reset the gradient
      y.backward(create_graph=True)
      print(x.grad) # x.grad = dy/dx
      gx = x.grad
      x.cleargrad() # reset the gradient (x.grad = None)
      gx.backward()
      print(x.grad) # x.grad = d2y/dx2

Higher-Order Differentiation Implementation

import contextlib
import weakref

import numpy as np

class Variable:
    def __init__(self, data, name=None):
        if data is not None:
            if not isinstance(data, np.ndarray):
                raise TypeError('{} is not supported'.format(type(data)))

        self.data = data
        self.name = name
        self.grad = None
        self.creator = None
        self.generation = 0

    def set_creator(self, func):
        self.creator = func
        self.generation = func.generation + 1
    
    def cleargrad(self):
      self.grad = None
    
    def backward(self, retain_grad=False, create_graph=False): 
        if self.grad is None:
            # self.grad = np.ones_like(self.data)
            self.grad = Variable(np.ones_like(self.data)) # make grad a Variable instance

        funcs = []
        seen_set = set()
        def add_func(f):
            if f not in seen_set:
                funcs.append(f)
                seen_set.add(f)
                funcs.sort(key=lambda x : x.generation)

        add_func(self.creator)
        while funcs:
            f = funcs.pop()
            gys = [output().grad for output in f.outputs]

            with using_config('enable_backprop', create_graph): # switch backprop enabled/disabled mode
                gxs = f.backward(*gys)
                if not isinstance(gxs, tuple):
                    gxs = (gxs,)
            
                for x, gx in zip(f.inputs, gxs):
                    if x.grad is None:
                        x.grad = gx
                    else:
                        x.grad = x.grad + gx
                    if x.creator is not None:
                        add_func(x.creator)
            
            if not retain_grad:
                for y in f.outputs:
                    y().grad = None

# ===========================================================
class Config:
    enable_backprop = True

@contextlib.contextmanager
def using_config(name, value):
    old_value = getattr(Config, name)
    setattr(Config, name, value)
    try:
        yield
    finally:
        setattr(Config, name, old_value)

# ===========================================================
def as_array(x):
    if np.isscalar(x):
        return np.array(x)
    return x

def as_variable(obj):
    if isinstance(obj, Variable):
        return obj
    return Variable(obj)

class Function:
    def __call__(self, *inputs):
        inputs = [as_variable(x) for x in inputs]

        xs = [x.data for x in inputs]
        ys = self.forward(*xs)
        if not isinstance(ys, tuple):
            ys = (ys,)
        outputs = [Variable(as_array(y)) for y in ys]

        if Config.enable_backprop: # backprop enabled/disabled mode control
            for output in outputs:
                output.set_creator(self) # create the computational-graph connections
            self.inputs = inputs
            self.outputs = [weakref.ref(output) for output in outputs]
            self.generation = max([x.generation for x in inputs])
        
        return outputs if len(outputs) > 1 else outputs[0]

    def forward(self, xs):
        raise NotImplementedError()
    
    def backward(self, gys):
        raise NotImplementedError()

# === operator overloading ==================================
class Mul(Function):
    def forward(self, x0, x1):
        y = x0 * x1
        return y
    def backward(self, gy):
        # x0, x1 = self.inputs[0].data, self.inputs[1].data
        x0, x1 = self.inputs # compute with Variable instances to create the connections
        return gy * x1, gy * x0

5. Implementing Functions

6. Appendix

Limitations of Newton's Method and Alternatives

  • Newton's method for multivariable functions (i.e., Newton's method for y = f(x) when the input x is a vector)
    • ∇f(x) : the gradient, the derivatives of f with respect to each element of x
    • ∇²f(x) : the Hessian matrix (Hessian matrix)
    • x is updated in the gradient direction, and the travel distance is adjusted using the inverse of the Hessian matrix: x ← x - [∇²f(x)]⁻¹ ∇f(x)
  • Limitation : as the number of parameters grows, computing the inverse Hessian consumes far too many resources.
    • With n parameters, the Hessian needs memory on the order of n² and its inverse costs on the order of n³ operations — for example, at n = 1,000,000 parameters the Hessian alone already has 10¹² elements.
  • Alternatives
    • Quasi-Newton methods (Quasi-Newton Method, QNM) : the family of Newton-style methods that use an approximation of the inverse Hessian
      • e.g., L-BFGS
    • Optimization using only gradients
      • e.g., SGD, Momentum, Adam

Various Uses of Double Backprop

  • double backpropagation : running backpropagation again through a computation that itself performed backpropagation
  • Uses
    • higher-order derivatives
    • differentiating an expression that itself contains a derivative
    • Hessian-vector products (Hessian-vector product): ∇²f(x)v = ∇(∇f(x)ᵀv)
      • As the right-hand side shows, by first taking the dot product of the gradient with the vector and then taking the gradient of that scalar result, the product can be obtained without ever forming the Hessian matrix itself.
      • Example : TRPO (Trust Region Policy Optimization) uses double backprop when computing Hessian-vector products.
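A numerical sanity check of the identity ∇²f(x)v = ∇(∇f(x)ᵀv): double backprop computes the right-hand side exactly, while here central differences stand in for that second backward pass, and f(x) = Σxᵢ⁴ is an illustrative choice with a known diagonal Hessian:

```python
import numpy as np

def grad_f(x):
    # analytic gradient of f(x) = sum(x**4)
    return 4 * x ** 3

def hvp_full(x, v):
    # the quantity we want: Hessian times v, where H = diag(12 x^2)
    return 12 * x ** 2 * v

def hvp_via_scalar(x, v, eps=1e-5):
    # s(x) = grad_f(x) . v is a scalar; its gradient equals H v.
    # Double backprop would differentiate s exactly; central
    # differences approximate that second differentiation here.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (grad_f(x + e) @ v - grad_f(x - e) @ v) / (2 * eps)
    return g

x = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])
print(hvp_full(x, v))        # [  6. -48. 216.]
print(hvp_via_scalar(x, v))  # matches, without forming the full n x n Hessian
```

Note that hvp_via_scalar only ever touches n-dimensional vectors, which is the whole appeal: the n × n Hessian is never materialized.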