### Problem 1 (50 points) 

Vapor-liquid equilibria data are correlated using two adjustable parameters $A_{12}$ and $A_{21}$ per binary
mixture. For low pressures, the equilibrium relation can be formulated as:

$$
\begin{aligned}
p = & x_1\exp\left(A_{12}\left(\frac{A_{21}x_2}{A_{12}x_1+A_{21}x_2}\right)^2\right)p_{water}^{sat}\\
& + x_2\exp\left(A_{21}\left(\frac{A_{12}x_1}{A_{12}x_1+A_{21}x_2}\right)^2\right)p_{1,4 dioxane}^{sat}.
\end{aligned}
$$

Here the saturation pressures are given by the Antoine equation

$$
\log_{10}(p^{sat}) = a_1 - \frac{a_2}{T + a_3},
$$

where $T = 20\,^{\circ}\mathrm{C}$ and the coefficients $a_1$, $a_2$, $a_3$ for the water and 1,4-dioxane
system are given below.

|             | $a_1$     | $a_2$      | $a_3$     |
|:------------|:--------|:---------|:--------|
| Water       | 8.07131 | 1730.63  | 233.426 |
| 1,4 dioxane | 7.43155 | 1554.679 | 240.337 |
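
As a quick check of the Antoine equation, the sketch below evaluates both saturation pressures at $T = 20\,^{\circ}\mathrm{C}$ with the coefficients from the table. The helper name `antoine_psat` is an illustrative choice, not something the assignment prescribes.

```python
def antoine_psat(a1, a2, a3, T):
    """Saturation pressure from the Antoine equation; T in degrees Celsius."""
    return 10 ** (a1 - a2 / (T + a3))

T = 20.0
p_water = antoine_psat(8.07131, 1730.63, 233.426, T)
p_dioxane = antoine_psat(7.43155, 1554.679, 240.337, T)
print(p_water, p_dioxane)
```

These two values are the pure-component limits of the model: the predicted pressure equals `p_water` at $x_1 = 1$ and `p_dioxane` at $x_1 = 0$, whatever $A_{12}$ and $A_{21}$ are.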


The following table lists the measured data. Recall that in a binary system $x_1 + x_2 = 1$.

|$x_1$ | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|:-----|:--------|:---------|:--------|:-----|:-----|:-----|:-----|:-----|:-----|:-----|:-----|
|$p$| 28.1 | 34.4 | 36.7 | 36.9 | 36.8 | 36.7 | 36.5 | 35.4 | 32.9 | 27.7 | 17.5 |

Estimate $A_{12}$ and $A_{21}$ using data from the above table: 

1. Formulate the least-squares problem;
2. Since the model is nonlinear, the problem does not have an analytical solution. Therefore, solve it using the gradient descent or Newton's method implemented in HW1;
3. Compare your optimized model with the data. Does your model fit the data well?

---

### Least-squares formulation
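
One way to write the formulation, consistent with the loss computed in the code below, is as an unconstrained minimization of the sum of squared residuals over the 11 measurements. The superscript $(i)$ for the $i$-th data point is notation introduced here for illustration.

$$
\min_{A_{12},\,A_{21}} \quad \sum_{i=1}^{11} \left( p\left(x_1^{(i)}, x_2^{(i)}; A_{12}, A_{21}\right) - p^{(i)} \right)^2,
\qquad x_2^{(i)} = 1 - x_1^{(i)},
$$

where $p(x_1, x_2; A_{12}, A_{21})$ is the model pressure from the equilibrium relation in the problem statement and $p^{(i)}$ is the measured pressure at $x_1^{(i)}$.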

In [123]:
import torch as t
from torch.autograd import Variable
import numpy as np


# Data Points
xi = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

# Binary system: compute x2 from x1
var = lambda x :  [x, 1-x]

# Build the (x1, x2) pairs from the given data points (float64 tensor of shape (11, 2))
x = t.from_numpy(np.array(list((map(var, xi)))))

# Define the variable to optimize; requires_grad=True lets PyTorch take the gradient with respect to it
a = Variable(t.tensor([1.0 , 1.0 ]), requires_grad=True)

# calculate psat values for water and 1,4-dioxane
def psat(liq):
    if liq == "water":
        a1 = 8.07131
        a2 = 1730.63
        a3 = 233.426
    else:
        a1 = 7.43155
        a2 = 1554.679
        a3 = 240.337
    
    return (10**(a1 - (a2/(20+a3))))

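# Define loss as the sum of squared residuals between model and measured pressures
def lse_loss(x, y, a):
    A = a.clone()
    # NOTE: as written, the first exponent evaluates to A12*A21*x2/(A12*x1+A21*x2)**2 (and the second
    # to A21*A12*x1/(...)**2): only the denominator is squared, whereas the model equation squares the whole fraction
    pi = lambda x, a: x[0]*t.exp(a[0]*(a[1]*x[1])/(a[0]*x[0]+a[1]*x[1])**2)*psat("water") \
        + x[1]*t.exp(a[1]*(a[0]*x[0])/(a[0]*x[0]+a[1]*x[1])**2)*psat("dioxane") 
    
    # stack the predicted pressure for every data point
    p = t.stack([pi(x[i], A) for i in range(x.size()[0])])
    
    # (p - y)^T (p - y), returned as a 1x1 tensor
    return t.mm((p - y).reshape(1, -1), 
                (p - y).reshape(1, -1).t())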


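def lin_search(f, grad, x, a):
    # Backtracking (Armijo) line search along the steepest-descent direction d = -grad.
    # f(x, a) must return the scalar objective, e.g. lambda x, a: lse_loss(x, y, a).
    tilt = 0.8   # used both as the sufficient-decrease fraction and as the step shrink factor
    alpha = 1
    d = -grad
    with t.no_grad():
        f0 = f(x, a)
        slope = t.dot(grad, d)   # directional derivative, always <= 0
        while f(x, a+alpha*d) > (f0+tilt*alpha*slope):
            alpha *= tilt
    return alpha
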


# measured data : y
y = t.tensor([28.1, 34.4, 36.7, 36.9, 36.8, 36.7, 36.5, 35.4, 32.9, 27.7, 17.5])


# Equilibrium pressure formula (same expression as inside lse_loss), used to evaluate the fitted model
pi = lambda x, a: x[0]*t.exp(a[0]*(a[1]*x[1])/(a[0]*x[0]+a[1]*x[1])**2)*psat("water") \
        + x[1]*t.exp(a[1]*(a[0]*x[0])/(a[0]*x[0]+a[1]*x[1])**2)*psat("dioxane") 

# Fix the step size
alpha = 0.001


# Start gradient descent
for i in range(80):  # TODO: change the termination criterion
    loss = lse_loss(x, y, a)
    
    #loss = a[0]
    loss.backward(retain_graph=True)
    
    # no_grad() specifies that the operations within this context are not part of the computational graph, i.e., we don't need the gradient descent update itself to be differentiable with respect to a
    with t.no_grad():
        #t.autograd.set_detect_anomaly(True)
        #alpha = lin_search(pi, a.grad, x, a)
        a -= alpha * a.grad
        # need to clear the gradient at every step, or otherwise it will accumulate...
        #print(loss.data.numpy())
        a.grad.zero_()
            
print(a.data.numpy())
print(loss.data.numpy())


# Check the gradient. numpy() turns the variable from a PyTorch tensor to a numpy array.
#a.grad.numpy()

[0.8901894 1.1093879]
[[18.6297129]]


In [129]:
print("Estimated Value with A =", a.data.numpy())

print ('------')
print(t.stack([pi(x[i], a) for i in range(x.size()[0])]).detach().numpy())

Estimated Value with A = [0.8901894 1.1093879]
------
[28.82409953 31.90601618 34.44887909 36.37314567 37.57910457 37.9370817
 37.27215448 35.34008284 31.78898603 26.09748822 17.47325208]
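
The equilibrium relation in the problem statement squares the entire fraction inside each exponential, so operator precedence matters when transcribing it. Below is a minimal vectorized sketch of that relation and of the corresponding sum of squared residuals. The names `model_pressure` and `sse` are illustrative and not part of the cells above, and the snippet only evaluates the loss and its gradient at the starting point $A_{12} = A_{21} = 1$; it does not reproduce the fitted values printed above.

```python
import torch as t

# Antoine saturation pressures at T = 20 degrees C, coefficients from the problem table
P_WATER = 10 ** (8.07131 - 1730.63 / (20 + 233.426))
P_DIOXANE = 10 ** (7.43155 - 1554.679 / (20 + 240.337))

def model_pressure(x1, a):
    """Model pressure for water mole fractions x1 (1-D tensor) and a = [A12, A21]."""
    x2 = 1 - x1
    denom = a[0] * x1 + a[1] * x2
    # the whole fraction is squared before it is multiplied by A12 (or A21)
    g1 = t.exp(a[0] * (a[1] * x2 / denom) ** 2)
    g2 = t.exp(a[1] * (a[0] * x1 / denom) ** 2)
    return x1 * g1 * P_WATER + x2 * g2 * P_DIOXANE

def sse(x1, p_meas, a):
    """Sum of squared residuals between model and measured pressures."""
    return t.sum((model_pressure(x1, a) - p_meas) ** 2)

x1 = t.tensor([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0], dtype=t.float64)
p_meas = t.tensor([28.1, 34.4, 36.7, 36.9, 36.8, 36.7, 36.5, 35.4, 32.9, 27.7, 17.5],
                  dtype=t.float64)

a = t.tensor([1.0, 1.0], dtype=t.float64, requires_grad=True)
loss = sse(x1, p_meas, a)
loss.backward()          # a.grad now holds d(loss)/d(A12, A21)
print(loss.item(), a.grad)
```

Working on whole tensors avoids the per-point Python loop, and keeping everything in `float64` avoids mixing precisions between the data and the parameters.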


In [None]:
## Newton's Method
alpha = 0.005

# Scalar objective as a function of the parameters only, needed for the Hessian
f = lambda a_: lse_loss(x, y, a_).squeeze()

# Start damped Newton iterations
for i in range(80):  # TODO: change the termination criterion
    loss = lse_loss(x, y, a)
    
    #loss = a[0]
    loss.backward(retain_graph=True)
    
    # Hessian of the loss with respect to a (2 x 2) at the current iterate
    H = t.autograd.functional.hessian(f, a.detach())
    
    # no_grad() specifies that the operations within this context are not part of the computational graph, i.e., we don't need the Newton update itself to be differentiable with respect to a
    with t.no_grad():
        #t.autograd.set_detect_anomaly(True)
        #alpha = lin_search(pi, a.grad, x, a)
        # damped Newton step: a <- a - alpha * H^{-1} * grad
        a -= alpha * t.linalg.solve(H, a.grad)
        # need to clear the gradient at every step, or otherwise it will accumulate...
        print(loss.data.numpy())
        a.grad.zero_()

In [76]:
p = x[0]*t.exp(a[0]*(a[1]*x[1]))#/(a[0]*x[0]+a[1]*x[1])**2)*psat("water")

In [108]:
ans = t.stack([pi(x[i], a) for i in range(x.size()[0])])

In [109]:
ans - y

tensor([ 0.7241, -2.4940, -2.2511, -0.5269,  0.7791,  1.2371,  0.7722, -0.0599,
        -1.1110, -1.6025, -0.0267], dtype=torch.float64,
       grad_fn=<SubBackward0>)

In [None]:
# Here is code for gradient descent without line search

import torch as t
from torch.autograd import Variable

x = Variable(t.tensor([1.0, 0.0]), requires_grad=True)

# Fix the step size
alpha = 0.01

# Start gradient descent
for i in range(1000):  # TODO: change the termination criterion
    loss = (x[0] - 1)**2 + (x[1] - 2)**2
    loss.backward()
    
    # no_grad() specifies that the operations within this context are not part of the computational graph, i.e., we don't need the gradient descent algorithm itself to be differentiable with respect to x
    with t.no_grad():
        x -= alpha * x.grad
        
        # need to clear the gradient at every step, or otherwise it will accumulate...
        x.grad.zero_()
        
print(x.data.numpy())
print(loss.data.numpy())
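
The loop above runs for a fixed number of iterations and leaves the termination criterion as a TODO. One common choice, sketched here under the assumption that a small gradient norm is an acceptable stopping test, is to stop once $\lVert \nabla f \rVert$ falls below a tolerance. The function name `gradient_descent` and its default parameters are illustrative, not taken from HW1.

```python
import torch as t

def gradient_descent(f, x0, alpha=0.01, tol=1e-6, max_iter=10000):
    """Fixed-step gradient descent that stops when the gradient norm drops below tol."""
    x = x0.clone().detach().requires_grad_(True)
    for k in range(max_iter):
        loss = f(x)
        loss.backward()
        with t.no_grad():
            if x.grad.norm() < tol:   # termination criterion
                break
            x -= alpha * x.grad
            x.grad.zero_()            # clear the gradient so it does not accumulate
    return x.detach(), loss.item(), k

# Same quadratic test problem as in the cell above, in double precision
quad = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2
x_opt, f_opt, n_iter = gradient_descent(quad, t.tensor([1.0, 0.0], dtype=t.float64))
print(x_opt.numpy(), f_opt, n_iter)
```

Double precision is used deliberately: in `float32` the update `alpha * x.grad` can fall below the resolution of `x` before a tight tolerance is reached, so the loop would stall without ever satisfying the test.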


### Problem 2 (50 points) 

Solve the following problem using Bayesian Optimization:
$$
    \min_{x_1, x_2} \quad \left(4-2.1x_1^2 + \frac{x_1^4}{3}\right)x_1^2 + x_1x_2 + \left(-4 + 4x_2^2\right)x_2^2,
$$
for $x_1 \in [-3,3]$ and $x_2 \in [-2,2]$. A tutorial on Bayesian Optimization can be found [here](https://thuijskens.github.io/2016/12/29/bayesian-optimisation/).
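
The objective above is the six-hump camel function, whose commonly reported global minimum value is about $-1.0316$, attained near $(0.0898, -0.7126)$ and $(-0.0898, 0.7126)$. The following is a minimal sketch of a Bayesian optimization loop for it, under several assumptions that the assignment does not fix: a scikit-learn Gaussian process with a Matérn kernel as the surrogate, expected improvement as the acquisition function, and random candidate sampling in place of a proper acquisition optimizer. The function names and default parameters are illustrative.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def camel(x):
    """Objective from the problem statement; x has shape (..., 2)."""
    x1, x2 = x[..., 0], x[..., 1]
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def expected_improvement(cand, gp, best_y, xi=0.01):
    """Expected improvement (for minimization) at candidate points of shape (n, 2)."""
    mu, sigma = gp.predict(cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)      # guard against division by zero
    imp = best_y - mu - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(n_init=10, n_iter=30, n_cand=2000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-3.0, -2.0]), np.array([3.0, 2.0])
    X = rng.uniform(lo, hi, size=(n_init, 2))   # initial random design
    y = camel(X)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-6)
    for _ in range(n_iter):
        gp.fit(X, y)                              # refit the surrogate
        cand = rng.uniform(lo, hi, size=(n_cand, 2))
        ei = expected_improvement(cand, gp, y.min())
        x_next = cand[np.argmax(ei)]              # most promising candidate
        X = np.vstack([X, x_next])
        y = np.append(y, camel(x_next))
    i = np.argmin(y)
    return X[i], y[i], y[:n_init].min()

x_best, y_best, y_init_best = bayes_opt()
print(x_best, y_best)
```

How close `y_best` gets to the known minimum depends on the seed, the kernel, and the iteration budget, so the result is best compared against $-1.0316$ rather than assumed to match it.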
