# Week 2-3: Logistic Regression, Differential Privacy Basics


We consider the task of binary classification: we are given labeled data $\{(x_i, y_i)\}_{i \in [n]}$
with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$, and the goal is to learn a classifier $f_\theta : \mathbb{R}^d \rightarrow \{-1, 1\}$ parameterized by $\theta \in \mathbb{R}^d$. Given a new sample $x$, the predicted label is $\hat{y} = f_\theta(x)$. One
way to learn such a classifier is the logistic model, defined by the conditional
probability


$P(y = 1|x, \theta) = \frac{e^{\theta^T x}}{1+ e^{\theta^T x}}$

a) For the above model, verify that checking $P(y = 1|x) > P(y = -1|x)$ is equivalent
to checking $\theta^T x > 0$. Given this, what would be your strategy to estimate the label
once you know $\theta$?

**Solution:**


$$\begin{align*}
P(y = -1|x, \theta) &=  1 - P(y = 1|x, \theta) \\
&= 1 -\frac{e^{\theta^T x}}{1+ e^{\theta^T x}} \\ 
&= \frac{1}{1+ e^{\theta^T x}}
\end{align*}$$

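Then, because $1 + e^{\theta^T x} > 0$ and $\ln$ is increasing, each of the following inequalities is equivalent to the next:
$$P(y = 1|x, \theta) > P(y = -1|x, \theta)$$
$$\frac{e^{\theta^T x}}{1+ e^{\theta^T x}} > \frac{1}{1+ e^{\theta^T x}}$$
$$e^{\theta^T x} > 1$$
$$\theta^T x >\ln{1}$$
$$\theta^T x > 0$$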


This suggests the following rule: for a given sample $x$, predict $\hat{y} = 1$ if $\theta^T x > 0$ and $\hat{y} = -1$ if $\theta^T x < 0$. When $\theta^T x = 0$ both labels are equally likely, so either choice is acceptable.
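The decision rule can be sketched in a few lines of NumPy. The helper name `predict_label`, the example vectors, and the choice to break ties at $\theta^T x = 0$ in favor of $-1$ are illustrative assumptions, not part of the exercise.

```python
import numpy as np

def predict_label(theta, x):
    """Return +1 if theta^T x > 0, otherwise -1 (ties go to -1 by assumption)."""
    return 1 if np.dot(theta, x) > 0 else -1

theta = np.array([1.0, -2.0])
print(predict_label(theta, np.array([3.0, 1.0])))  # theta^T x = 1.0
print(predict_label(theta, np.array([1.0, 1.0])))  # theta^T x = -1.0
```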


b) Write down the log likelihood function for the above model and describe an algorithm
to find the maximum likelihood estimate for θ.

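**Solution:**

For this part we recode the labels as $y_i \in \{0, 1\}$, with $y_i = 0$ standing for the label $-1$. Assuming the samples are independent, the likelihood function is
$$\begin{align*}
\mathcal{L}(\theta) &= \prod_{i:\, y_i = 1} P(y=1 |x_i,\theta) \cdot \prod_{i:\, y_i = 0} (1 - P(y=1 |x_i,\theta)) \\
&= \prod_{i=1}^{n} P(y=1 |x_i,\theta)^{y_i} (1 - P(y=1 |x_i,\theta))^{(1-y_i)}
\end{align*}
$$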

Taking the logarithm gives the log-likelihood $\ell(\theta) = \log \mathcal{L}(\theta)$:
$$
\begin{align*}
\ell(\theta) &= \sum_{i=1}^{n} \left[ y_i \log P(y=1 |x_i,\theta) + (1-y_i) \log\left(1 - P(y=1 |x_i,\theta)\right) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \log\frac{e^{\theta^T x_i}}{1+ e^{\theta^T x_i}} + (1-y_i) \log\frac{1}{1+e^{\theta^T x_i}} \right] \\
&= \sum_{i=1}^{n} \left[ y_i \left( \log\frac{e^{\theta^T x_i}}{1+e^{\theta^T x_i}} - \log\frac{1}{1+e^{\theta^T x_i}} \right) + \log\frac{1}{1+e^{\theta^T x_i}} \right] \\
&= \sum_{i=1}^{n} \left[ y_i \left( \log(e^{\theta^T x_i}) - \log(1+e^{\theta^T x_i}) + \log(1+e^{\theta^T x_i}) \right) + \log\frac{1}{1+e^{\theta^T x_i}} \right] \\
&= \sum_{i=1}^{n} \left[ y_i \log(e^{\theta^T x_i}) - \log(1+e^{\theta^T x_i}) \right] \\
&= \sum_{i=1}^{n} \left[ y_i \theta^T x_i - \log(1+e^{\theta^T x_i}) \right]
\end{align*}$$
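As a sanity check on the algebra, the first and last lines of the derivation can be compared numerically. This is only a sketch on made-up data: the sample size, dimension, and random seed are arbitrary assumptions, and the labels are coded as $\{0, 1\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 illustrative samples with d = 3
y = rng.integers(0, 2, size=5)   # labels coded as {0, 1}
theta = rng.normal(size=3)

z = X @ theta
p = np.exp(z) / (1 + np.exp(z))  # P(y = 1 | x_i, theta)

# First line of the derivation: Bernoulli log-likelihood.
bernoulli_form = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
# Last line of the derivation: simplified form.
simplified_form = np.sum(y * z - np.log(1 + np.exp(z)))

print(np.isclose(bernoulli_form, simplified_form))
```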
To maximize the log-likelihood $\ell(\theta)$, we compute its gradient and look for a point where it vanishes:
$$\begin{align*}
\nabla_{\theta} \ell(\theta) &= \nabla_{\theta} \sum_{i=1}^{n} \left[ y_i \theta^T x_i - \log(1+e^{\theta^T x_i}) \right] \\
&= \sum_{i=1}^{n} \left[ y_i x_i - \frac{e^{\theta^T x_i}}{1+e^{\theta^T x_i}} x_i \right] \\
&= \sum_{i=1}^{n} \left( y_i - P(y=1|x_i,\theta) \right) x_i
\end{align*}
$$
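One way to gain confidence in the gradient formula is to compare it with central finite differences of the log-likelihood. The sketch below assumes labels in $\{0, 1\}$; the function names `log_lik` and `grad_log_lik`, the synthetic data, and the step `eps` are illustrative choices.

```python
import numpy as np

def log_lik(theta, X, y):
    z = X @ theta
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) stably.
    return np.sum(y * z - np.logaddexp(0.0, z))

def grad_log_lik(theta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # P(y = 1 | x_i, theta)
    return X.T @ (y - p)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
y = rng.integers(0, 2, size=20)
theta = rng.normal(size=4)

eps = 1e-6
# Central difference along each coordinate direction e.
numeric = np.array([
    (log_lik(theta + eps * e, X, y) - log_lik(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(4)
])
max_abs_diff = np.max(np.abs(numeric - grad_log_lik(theta, X, y)))
print(max_abs_diff)
```

A small `max_abs_diff` indicates that the analytic gradient and the numerical one agree at the tested point.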

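Setting the gradient to zero has no closed-form solution, so we look for a root numerically. Bisection only applies when the unknown is a scalar ($d = 1$); for general $d$ we can apply the Newton-Raphson method to the gradient, or maximize the log-likelihood directly by gradient ascent. Since the log-likelihood is concave, any point where the gradient vanishes is a global maximizer.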
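For general $d$, one simple option is gradient ascent on the average log-likelihood. The sketch below is not the notebook's Newton-Raphson code: the step size, tolerance, sample size, and `theta_true` are assumptions chosen for illustration, and the labels are sampled from the logistic model and coded as $\{0, 1\}$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gradient_ascent(X, y, step=0.5, tol=1e-6, max_iter=10_000):
    """Maximize the average log-likelihood by gradient ascent; y in {0, 1}."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(max_iter):
        grad = X.T @ (y - sigmoid(X @ theta)) / n  # averaged gradient
        if np.linalg.norm(grad) < tol:
            break
        theta = theta + step * grad
    return theta

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -0.5, 0.25])
# Labels are sampled from the model rather than thresholded, so the data
# are not perfectly separable and a finite maximizer is expected to exist.
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

theta_hat = fit_gradient_ascent(X, y)
grad_norm = np.linalg.norm(X.T @ (y - sigmoid(X @ theta_hat)) / n)
print(theta_hat, grad_norm)
```

Gradient ascent needs only first derivatives, at the cost of more iterations than Newton-Raphson typically requires.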

In [1]:
import numpy as np

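In [53]:
def log_likelihood(theta, X, y):
    # Log-likelihood sum_i [y_i * theta^T x_i - log(1 + exp(theta^T x_i))], labels y_i in {0, 1}.
    linear_combination = np.dot(X, theta)

    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow for large z.
    log_ll = np.sum(y * linear_combination - np.logaddexp(0, linear_combination))

    return log_ll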

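In [142]:
def log_likelihood_derivative(theta, X, y):
    # Gradient sum_i (y_i - p_i) x_i of the log-likelihood, labels y_i in {0, 1}.
    linear_combination = np.dot(X, theta)

    # p_i = P(y = 1 | x_i, theta)
    probabilities = 1 / (1 + np.exp(-linear_combination))

    # X.T so that an (n, d) sample matrix yields a length-d gradient;
    # a single sample stored as a length-d vector works as well.
    derivative = np.dot(X.T, y - probabilities)

    return derivative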

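In [143]:
n = 1
d = 50
# Single sample (n = 1): X is one length-d feature vector with i.i.d. N(0, 0.1) entries.
X = np.random.normal(loc=0, scale=np.sqrt(0.1), size=(d))

theta_star = np.random.rand(d)

linear_combination = X.dot(theta_star)

probabilities = 1 / (1 + np.exp(-linear_combination))

# Labels coded as {0, 1} (0 stands for -1) to match the log-likelihood above.
# Thresholding at 0.5 gives linearly separable labels, for which the
# log-likelihood has no finite maximizer.
y = np.where(probabilities >= 0.5, 1, 0)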

In [144]:
log_likelihood_derivative(theta_star, X, y)

0.2295895308167729


array([-8.88501488e-03,  2.24605949e-01, -2.34908514e-02,  3.89568894e-02,
        2.80001272e-01,  1.34268372e-02,  8.93147388e-01,  3.24878405e-01,
        4.39213670e-01, -1.19095040e-01,  5.94148081e-01, -7.06566562e-02,
        4.64684129e-03, -1.28205363e-01, -3.78811218e-01,  3.59469793e-01,
        1.93390057e-02, -8.76323825e-02,  3.97683869e-01, -3.34196169e-01,
        6.31477698e-01, -2.64627420e-01, -3.93000809e-01, -8.63378317e-02,
       -2.23740179e-01, -9.47762131e-01,  6.41333339e-02,  3.01255089e-01,
        1.47534352e-01,  5.83242033e-01,  8.41582660e-01, -6.98405843e-02,
       -1.03600065e-01,  8.44082666e-01,  4.09991363e-02,  2.20946365e-01,
        5.24242301e-01,  1.63283460e-02, -2.89170247e-01, -3.67078266e-01,
        3.76752893e-01, -3.98665663e-01,  2.72135931e-01, -3.17630736e-01,
        4.31133793e-01,  2.83042354e-04, -1.19155683e-01, -1.73815129e-01,
        9.66818680e-02, -1.14075901e-02])

In [109]:
def log_likelihood_hessian(theta, X, y):
    # Hessian of the log-likelihood: -X^T diag(p * (1 - p)) X with p = sigmoid(X theta).
    # y is unused; it is kept so the signature matches log_likelihood_derivative.
    X = np.atleast_2d(X)
    p = 1 / (1 + np.exp(-np.dot(X, theta)))
    return -np.dot(X.T * (p * (1 - p)), X)

def newton_raphson(f, J, X, y, theta, tol=1e-6, max_iter=100):
    # Find a root of f (here the gradient) using its Jacobian J (here the Hessian).
    p = np.array(theta, dtype=float)
    for _ in range(max_iter):
        f_val = f(p, X, y)
        if np.linalg.norm(f_val) < tol:
            return p
        J_val = J(p, X, y)
        # Least-squares solve of J_val @ step = f_val, so a singular J_val (e.g. n < d) does not crash.
        step = np.linalg.lstsq(J_val, f_val, rcond=None)[0]
        p = p - step
    raise ValueError("Newton-Raphson method did not converge within the maximum number of iterations.")

In [110]:
theta_initial_guess = np.zeros(d)

# Newton-Raphson looks for a root of the gradient, so pass the gradient and its Jacobian (the Hessian).
theta_mle = newton_raphson(log_likelihood_derivative, log_likelihood_hessian, X, y, theta_initial_guess)
theta_mle