I read a very interesting blog post, [How we can make machine learning algorithms tunable](https://engraved.ghost.io/how-we-can-make-machine-learning-algorithms-tunable/), on Engraved by J. Degrave and I. Korshunova.
Here, I summarize the results and give my understanding.

# The Solve-The-Dual Method

In [None]:
# Lagrangian for: minimize loss_1(θ) subject to loss_2(θ) <= ε
def loss(θ, λ, ε):
  return loss_1(θ) - λ*(ε - loss_2(θ))

loss_derivative = grad(loss)
ε = 0.3 
λ = solve_dual(ε)  # The crux

for gradient_step in range(200):
  gradient = loss_derivative(θ, λ, ε)
  θ = θ - 0.02 * gradient

This does not work, since it is just the original linear trade-off between the two losses.
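
To make this concrete, note that for a fixed $\lambda$ the term $\lambda\varepsilon$ is a constant. Writing $L_1$ and $L_2$ for `loss_1` and `loss_2` (a shorthand introduced here, not the post's notation), the gradient used in the loop is:

$$
\nabla_\theta \left[ L_1(\theta) - \lambda \left( \varepsilon - L_2(\theta) \right) \right] = \nabla_\theta L_1(\theta) + \lambda \, \nabla_\theta L_2(\theta)
$$

So $\varepsilon$ only enters through the choice of $\lambda$ in `solve_dual`, and the descent itself is the same as minimizing the weighted sum $L_1 + \lambda L_2$.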

# The Hard Constraint First Method

In [None]:
# Non-negative exactly when the constraint loss_2(θ) <= ε holds
def constraint(θ, ε):
  return ε - loss_2(θ)

optimization_derivative = grad(loss_1)
constraint_derivative = grad(constraint)

ε = 0.7

for gradient_step in range(200):
  while constraint(θ, ε) < 0:
    # ascend on the constraint until it is non-negative again
    gradient = constraint_derivative(θ, ε)
    θ = θ + 0.02 * gradient
    
  gradient = optimization_derivative(θ)
  θ = θ - 0.02 * gradient

In each iteration of this method, $\theta$ is first updated repeatedly until the constraint is satisfied. As a result, `loss_1` has no effect on $\theta$ until the constraint holds.
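
The loop can be tried on a small problem. The sketch below is only an illustration: the scalar parameter, the two quadratic losses, and the value of `eps` are assumptions made here, not taken from the post, and the gradients are written by hand instead of using `grad`.

```python
# Toy stand-ins for the post's losses (assumptions for illustration only).
def loss_1(theta):
    return theta ** 2            # minimized at theta = 0

def loss_2(theta):
    return (theta - 1.0) ** 2    # minimized at theta = 1

def grad_loss_1(theta):
    return 2.0 * theta

def grad_constraint(theta):
    # Derivative of (eps - loss_2(theta)) with respect to theta.
    return -2.0 * (theta - 1.0)

def hard_constraint_first(theta=0.0, eps=0.25, lr=0.02, steps=200):
    for _ in range(steps):
        # Restore feasibility first: ascend on the constraint while it is violated.
        while eps - loss_2(theta) < 0:
            theta = theta + lr * grad_constraint(theta)
        # Only then take a single descent step on loss_1.
        theta = theta - lr * grad_loss_1(theta)
    return theta

theta_final = hard_constraint_first()
print(theta_final, loss_2(theta_final))
```

With these choices the feasible set is 0.5 ≤ θ ≤ 1.5. The iterate ends up hovering within about one step of the boundary θ = 0.5: each descent step on `loss_1` pushes it slightly outside, and the next inner loop pulls it back.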

The authors said:  
"Additionally, this method does not work that well when you want to use stochastic gradient descent rather than gradient descent. Since the constraint is defined on the average loss across all data, you do not want to enforce the hard constraint on a sample of your data where it is not satisfied yet, as long as it is satisfied in the general case. And this issue is hard to overcome."

# Basic Differential Multiplier Method

In [None]:
def lagrangian(θ, λ, ε):
  return loss_1(θ) - λ*(ε - loss_2(θ))

derivative = grad(lagrangian, (0,1))
ε = 0.7
λ = 0.0

for gradient_step in range(200):
  gradient_θ, gradient_λ = derivative(θ,λ,ε)
  θ = θ - 0.02 * gradient_θ  # Gradient descent
  λ = λ + gradient_λ  # Gradient ascent!
  if λ < 0:
    λ = 0
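
The gradient of the Lagrangian with respect to λ is `loss_2(θ) - ε`, so the ascent step raises λ while the constraint is violated and lowers it (down to the clamp at 0) once it holds. Below is a minimal sketch of that update with hand-written gradients. The toy losses `loss_1(θ) = θ²` and `loss_2(θ) = (θ - 1)²`, the value of `eps`, the number of steps, and the step size for λ are all assumptions picked for this small example, not values from the post.

```python
def loss_2(theta):
    # Toy constraint loss (assumption for illustration).
    return (theta - 1.0) ** 2

def bdmm(theta=0.0, lam=0.0, eps=0.25, lr_theta=0.02, lr_lam=0.1, steps=1000):
    for _ in range(steps):
        # Gradients of L = loss_1 - lam * (eps - loss_2), with loss_1 = theta**2.
        # Both are evaluated at the current (theta, lam) before either is updated.
        grad_theta = 2.0 * theta + lam * 2.0 * (theta - 1.0)
        grad_lam = loss_2(theta) - eps
        theta = theta - lr_theta * grad_theta      # gradient descent on theta
        lam = max(0.0, lam + lr_lam * grad_lam)    # gradient ascent on lam, clamped at 0
    return theta, lam

theta_final, lam_final = bdmm()
print(theta_final, lam_final)
```

For this toy problem the constrained minimum is at θ = 0.5 with λ = 1, where both `2θ + 2λ(θ - 1) = 0` and `loss_2(θ) = ε` hold, and the iterates should settle close to that point.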

# Modified Differential Method of Multipliers

In [None]:
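def lagrangian(θ, λ, ε):
  # Damping term, proportional to the constraint slack; stop_gradient makes it a constant during differentiation
  damp = 10 * stop_gradient(ε-loss_2(θ))
  return loss_1(θ) - (λ-damp) * (ε-loss_2(θ))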

derivative = grad(lagrangian, (0,1))
ε = 0.7
λ = 0.0

for gradient_step in range(200):
  gradient_θ, gradient_λ = derivative(θ, λ, ε)
  θ = θ - 0.02 * gradient_θ
  λ = λ + gradient_λ
  if λ < 0:
    λ = 0
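
Because of `stop_gradient`, `damp` behaves as a constant during differentiation. The θ-gradient is therefore that of `loss_1(θ) + (λ - damp) * loss_2(θ)`, while the λ-gradient is unchanged from the basic method. A minimal sketch with hand-written gradients follows. The toy losses `loss_1(θ) = θ²` and `loss_2(θ) = (θ - 1)²` and the value of `eps` are assumptions made for illustration, not from the post.

```python
def loss_2(theta):
    # Toy constraint loss (assumption for illustration).
    return (theta - 1.0) ** 2

def mdmm(theta=0.0, lam=0.0, eps=0.25, damping=10.0,
         lr_theta=0.02, lr_lam=1.0, steps=200):
    for _ in range(steps):
        slack = eps - loss_2(theta)      # negative while the constraint is violated
        damp = damping * slack           # held constant, mirroring stop_gradient
        # Gradient of loss_1 - (lam - damp) * (eps - loss_2), with loss_1 = theta**2.
        grad_theta = 2.0 * theta + (lam - damp) * 2.0 * (theta - 1.0)
        grad_lam = -slack
        theta = theta - lr_theta * grad_theta      # gradient descent on theta
        lam = max(0.0, lam + lr_lam * grad_lam)    # gradient ascent on lam, clamped at 0
    return theta, lam

theta_final, lam_final = mdmm()
print(theta_final, lam_final)
```

On this toy problem the iterates should approach θ = 0.5 and λ = 1, the point where the constraint is exactly active and `damp` vanishes. Setting `damping=0.0` recovers the update of the basic method.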