# Week 2-3: Logistic Regression, Differential Privacy Basics


We will consider the task of binary classification, where we are given labeled data $\{(x_i, y_i)\}_{i \in [n]}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. The goal is to learn a classifier $f_\theta : \mathbb{R}^d \rightarrow \{-1, 1\}$ parameterized by $\theta \in \mathbb{R}^d$. Given a new sample $x$, our predicted label would be $\hat{y} = f_\theta(x)$. One way to learn such a classifier is to use the logistic model, defined through the conditional probability


$$P(y = 1 \mid x, \theta) = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$
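
A minimal NumPy sketch of this model, assuming a single sample `x` and a parameter vector `theta` given as 1-D arrays. The helper names `prob_positive` and `prob_negative` are hypothetical. The first evaluates $e^{t}/(1+e^{t})$ in the algebraically equivalent form $1/(1+e^{-t})$ with $t = \theta^T x$, and the second is simply one minus the first.

```python
import numpy as np


def prob_positive(theta, x):
    """P(y = 1 | x, theta) = e^{theta^T x} / (1 + e^{theta^T x})."""
    t = np.dot(theta, x)
    # 1 / (1 + e^{-t}) is algebraically equal to e^t / (1 + e^t)
    return 1.0 / (1.0 + np.exp(-t))


def prob_negative(theta, x):
    """P(y = -1 | x, theta) = 1 - P(y = 1 | x, theta) = 1 / (1 + e^{theta^T x})."""
    return 1.0 - prob_positive(theta, x)


theta = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])   # theta^T x = 0, so both labels are equally likely
print(prob_positive(theta, x), prob_negative(theta, x))   # 0.5 0.5
```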

a) For the above model, verify that checking $P(y = 1 \mid x) > P(y = -1 \mid x)$ is equivalent to checking $\theta^T x > 0$. Given this, what would be your strategy to estimate the label once you know $\theta$?

**Solution:**


$$\begin{align*}
P(y = -1|x, θ) &=  1 - P(y = 1|x, θ) \\
&= 1 -\frac{e^{\theta^T x}}{1+ e^{\theta^T x}} \\ 
&= \frac{1}{1+ e^{\theta^T x}}
\end{align*}$$

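Since $1 + e^{\theta^T x} > 0$, we can multiply both sides of the inequality by it, and since $\ln$ is strictly increasing, we can take logarithms:
$$\begin{align*}
P(y = 1 \mid x, \theta) > P(y = -1 \mid x, \theta)
&\iff \frac{e^{\theta^T x}}{1 + e^{\theta^T x}} > \frac{1}{1 + e^{\theta^T x}} \\
&\iff e^{\theta^T x} > 1 \\
&\iff \theta^T x > \ln 1 = 0.
\end{align*}$$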


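This suggests the following strategy: for a given sample $x$, assign the label $\hat{y} = 1$ if $\theta^T x > 0$ and $\hat{y} = -1$ if $\theta^T x < 0$, i.e. $\hat{y} = \operatorname{sign}(\theta^T x)$. The tie $\theta^T x = 0$, where both labels have probability $1/2$, can be broken arbitrarily.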
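
The decision rule can be written as a vectorized predictor. In this sketch, the name `predict_labels` is hypothetical, and ties are sent to $-1$ as one arbitrary choice. The function takes a matrix `X` with one sample per row. The sketch also checks on random data that thresholding $\theta^T x$ at zero agrees with comparing the two probabilities directly.

```python
import numpy as np


def predict_labels(theta, X):
    """Return 1 where theta^T x_i > 0 and -1 otherwise (ties go to -1)."""
    return np.where(X @ theta > 0, 1, -1)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
theta = rng.normal(size=5)

# Direct comparison of P(y=1|x) = e^t/(1+e^t) against P(y=-1|x) = 1/(1+e^t)
t = X @ theta
p_pos = np.exp(t) / (1 + np.exp(t))
p_neg = 1 / (1 + np.exp(t))
direct = np.where(p_pos > p_neg, 1, -1)

print(np.array_equal(predict_labels(theta, X), direct))   # True
```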


b) Write down the log likelihood function for the above model and describe an algorithm to find the maximum likelihood estimate for $\theta$.

**Solution:**

Because the labels take values $\pm 1$, it is convenient to recode them as $\tilde{y}_i = (1 + y_i)/2 \in \{0, 1\}$. The likelihood function is then
$$\begin{align*}
\mathcal{L}(\theta) &= \prod_{i:\, y_i = 1} P(y = 1 \mid x_i, \theta) \cdot \prod_{i:\, y_i = -1} \big(1 - P(y = 1 \mid x_i, \theta)\big) \\
&= \prod_{i=1}^{n} P(y = 1 \mid x_i, \theta)^{\tilde{y}_i} \big(1 - P(y = 1 \mid x_i, \theta)\big)^{1 - \tilde{y}_i}
\end{align*}$$

Taking logarithms, we get the log-likelihood:
$$\begin{align*}
\ell(\theta) &= \sum_{i=1}^{n} \Big[\tilde{y}_i \log P(y = 1 \mid x_i, \theta) + (1 - \tilde{y}_i) \log\big(1 - P(y = 1 \mid x_i, \theta)\big)\Big] \\
&= \sum_{i=1}^{n} \Big[\tilde{y}_i \log \frac{e^{\theta^T x_i}}{1 + e^{\theta^T x_i}} + (1 - \tilde{y}_i) \log \frac{1}{1 + e^{\theta^T x_i}}\Big] \\
&= \sum_{i=1}^{n} \Big[\tilde{y}_i \Big(\log \frac{e^{\theta^T x_i}}{1 + e^{\theta^T x_i}} - \log \frac{1}{1 + e^{\theta^T x_i}}\Big) + \log \frac{1}{1 + e^{\theta^T x_i}}\Big] \\
&= \sum_{i=1}^{n} \Big[\tilde{y}_i \Big(\theta^T x_i - \log(1 + e^{\theta^T x_i}) + \log(1 + e^{\theta^T x_i})\Big) - \log(1 + e^{\theta^T x_i})\Big] \\
&= \sum_{i=1}^{n} \Big[\tilde{y}_i\, \theta^T x_i - \log(1 + e^{\theta^T x_i})\Big]
\end{align*}$$

To maximize the log-likelihood, we set its gradient to zero. Using $\frac{d}{dt}\log(1 + e^{t}) = \frac{e^{t}}{1 + e^{t}}$:
$$\begin{align*}
\nabla_{\theta} \ell(\theta) &= \nabla_{\theta} \sum_{i=1}^{n} \Big[\tilde{y}_i\, \theta^T x_i - \log(1 + e^{\theta^T x_i})\Big] \\
&= \sum_{i=1}^{n} \Big(\tilde{y}_i - \frac{e^{\theta^T x_i}}{1 + e^{\theta^T x_i}}\Big) x_i = \sum_{i=1}^{n} \big(\tilde{y}_i - P(y = 1 \mid x_i, \theta)\big) x_i
\end{align*}$$
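
The simplified log-likelihood and its gradient can be sanity-checked numerically. This sketch uses random data, and its function names are hypothetical. It recodes labels as $\tilde y_i = (1 + y_i)/2$ and then performs two comparisons. First, it compares the closed form $\sum_i [\tilde y_i \theta^T x_i - \log(1 + e^{\theta^T x_i})]$ with $\sum_i \log P(y_i \mid x_i, \theta)$ computed straight from the model. Second, it compares the analytic gradient $\sum_i (\tilde y_i - P(y = 1 \mid x_i, \theta))\, x_i$ with central finite differences (the step `h = 1e-6` is an assumed choice).

```python
import numpy as np


def log_lik(theta, X, y):
    """Closed form: sum_i [y01_i theta^T x_i - log(1 + e^{theta^T x_i})], y in {-1, 1}."""
    y01 = (y + 1) / 2
    t = X @ theta
    return np.sum(y01 * t - np.logaddexp(0, t))   # logaddexp(0, t) = log(1 + e^t)


def log_lik_direct(theta, X, y):
    """sum_i log P(y_i | x_i, theta) straight from the model definition."""
    p_pos = 1 / (1 + np.exp(-(X @ theta)))
    return np.sum(np.log(np.where(y == 1, p_pos, 1 - p_pos)))


def grad(theta, X, y):
    """Analytic gradient: sum_i (y01_i - p_i) x_i."""
    y01 = (y + 1) / 2
    p_pos = 1 / (1 + np.exp(-(X @ theta)))
    return X.T @ (y01 - p_pos)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = rng.choice([-1, 1], size=200)
theta = rng.normal(size=4)

# Central finite differences, one coordinate direction e at a time
h = 1e-6
fd = np.array([(log_lik(theta + h * e, X, y) - log_lik(theta - h * e, X, y)) / (2 * h)
               for e in np.eye(4)])

print(np.isclose(log_lik(theta, X, y), log_lik_direct(theta, X, y)))   # True
print(np.allclose(grad(theta, X, y), fd, rtol=1e-5, atol=1e-5))        # True
```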

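The equation $\nabla_\theta \ell(\theta) = 0$ for the log-likelihood $\ell(\theta)$ has no closed-form solution, so we find a root numerically by applying the Newton-Raphson method to the gradient. Writing $p_i = P(y = 1 \mid x_i, \theta)$, the Hessian of the log-likelihood is
$$\nabla^2_{\theta} \ell(\theta) = -\sum_{i=1}^{n} p_i (1 - p_i)\, x_i x_i^T,$$
and each iteration performs the update
$$\theta \leftarrow \theta - \big[\nabla^2_{\theta} \ell(\theta)\big]^{-1} \nabla_{\theta} \ell(\theta).$$
Since the Hessian is negative semidefinite, $\ell$ is concave, so any root of the gradient is a global maximum. If the data are linearly separable, the likelihood has no finite maximizer and the iteration does not converge.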
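
The Hessian used by Newton-Raphson, $-\sum_i p_i(1 - p_i)\, x_i x_i^T$ with $p_i = P(y = 1 \mid x_i, \theta)$, can be checked in the same way. This sketch uses random data, and its helper names are hypothetical. It compares the Hessian against central finite differences of the gradient and confirms that its eigenvalues are non-positive, which is what makes the log-likelihood concave.

```python
import numpy as np


def grad(theta, X, y01):
    """Gradient of the log-likelihood for labels y01 in {0, 1}."""
    p = 1 / (1 + np.exp(-(X @ theta)))
    return X.T @ (y01 - p)


def hessian(theta, X):
    """-sum_i p_i (1 - p_i) x_i x_i^T; it does not depend on the labels."""
    p = 1 / (1 + np.exp(-(X @ theta)))
    return -(X.T * (p * (1 - p))) @ X


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y01 = rng.integers(0, 2, size=300)
theta = rng.normal(size=3)

# Column j of the Hessian is d(grad)/d(theta_j); approximate it by central differences
h = 1e-6
fd = np.column_stack([(grad(theta + h * e, X, y01) - grad(theta - h * e, X, y01)) / (2 * h)
                      for e in np.eye(3)])

H = hessian(theta, X)
print(np.allclose(H, fd, rtol=1e-5, atol=1e-5))   # True
print(np.all(np.linalg.eigvalsh(H) <= 0))         # True
```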

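```python
import numpy as np
```

```python
def log_likelihood(theta, X, y):
    """Log-likelihood l(theta) for X of shape (n, d) and labels y in {-1, 1}."""
    y01 = (y + 1) / 2                    # recode labels {-1, 1} -> {0, 1}
    linear_combination = X @ theta       # theta^T x_i for every sample
    # log(1 + e^t) is computed as logaddexp(0, t) to avoid overflow
    log_ll = np.sum(y01 * linear_combination - np.logaddexp(0, linear_combination))
    return log_ll
```

```python
def log_likelihood_derivative(theta, X, y):
    """Gradient of l(theta): sum_i (y01_i - p_i) x_i."""
    y01 = (y + 1) / 2
    linear_combination = X @ theta
    probabilities = 1 / (1 + np.exp(-linear_combination))   # p_i = P(y=1 | x_i, theta)
    derivative = X.T @ (y01 - probabilities)                 # shape (d,)
    return derivative


def log_likelihood_hessian(theta, X, y):
    """Hessian of l(theta): -sum_i p_i (1 - p_i) x_i x_i^T (y is unused)."""
    probabilities = 1 / (1 + np.exp(-(X @ theta)))
    weights = probabilities * (1 - probabilities)
    return -(X.T * weights) @ X                              # shape (d, d)
```

```python
n = 1000  # assumed sample size: with n = 1 the single point is separable and the MLE does not exist
d = 50
# n samples x_i ~ N(0, 0.1 I), stacked as the rows of X
X = np.random.normal(loc=0, scale=np.sqrt(0.1), size=(n, d))

theta_star = np.random.rand(d)

linear_combination = X @ theta_star

probabilities = 1 / (1 + np.exp(-linear_combination))

# Sample y_i from the logistic model; thresholding at 0.5 would make the data
# separable by theta_star, in which case the MLE does not exist either
y = np.where(np.random.rand(n) < probabilities, 1, -1)
```

```python
theta_star
```

```
array([7.21976832e-01, 4.23757145e-01, 9.75226835e-01, 9.64227763e-01,
       1.30489622e-01, 7.28939573e-01, 4.70736645e-01, 9.34978389e-01,
       7.42560143e-01, 7.21370679e-02, 7.01804745e-01, 8.51162958e-02,
       4.85854510e-01, 8.61446580e-01, 4.80594813e-01, 8.60402519e-01,
       1.70401160e-01, 8.41748787e-02, 2.93764765e-01, 9.51097056e-01,
       3.82254151e-01, 7.48721538e-02, 3.16105800e-01, 4.28513587e-01,
       6.64364639e-01, 4.95115432e-01, 4.44202765e-01, 1.60771795e-01,
       6.19324042e-01, 5.02171786e-01, 4.91717277e-01, 7.23189401e-01,
       2.76368204e-01, 7.53938236e-01, 8.99719038e-01, 3.12210528e-01,
       6.59394038e-01, 3.58514879e-01, 1.23134678e-01, 9.90829113e-02,
       1.31259072e-01, 1.61318974e-01, 2.90815208e-01, 5.17993698e-01,
       8.12507734e-02, 4.54659952e-05, 6.81215874e-02, 5.06304393e-01,
       5.45531048e-01, 7.87865690e-01])
```

```python
def newton_raphson(f, J, X, y, theta, tol=1e-6, max_iter=1000):
    """Find a root of f(theta, X, y) = 0, where J(theta, X, y) is the Jacobian of f."""
    p = np.array(theta, dtype=float)   # copy, so the caller's array is not modified
    for _ in range(max_iter):
        f_val = f(p, X, y)             # evaluate at the current iterate
        if np.linalg.norm(f_val) < tol:
            return p
        J_val = J(p, X, y)
        p -= np.linalg.solve(J_val, f_val)   # p <- p - J^{-1} f
    raise ValueError("Newton-Raphson method did not converge within the maximum number of iterations.")
```

```python
theta_initial_guess = theta_star

# Root of the gradient (f), using the Hessian as its Jacobian (J)
theta_mle = newton_raphson(log_likelihood_derivative, log_likelihood_hessian, X, y, theta_initial_guess)
theta_mle
```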
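
As a sanity check on the fitting procedure, here is a compact, self-contained sketch. It simulates data from the model, fits $\theta$ by Newton-Raphson starting from zero, and reports two quantities: the cosine similarity between the estimate and `theta_star`, and the training accuracy of the rule $\hat{y} = \operatorname{sign}(\theta^T x)$. The sizes `n = 2000` and `d = 50`, the seed, and the helper name `fit_logistic_newton` are all assumptions. Labels are sampled from the model rather than thresholded, because thresholded labels are separable by `theta_star`, and then no finite MLE exists.

```python
import numpy as np


def fit_logistic_newton(X, y, tol=1e-8, max_iter=50):
    """Newton-Raphson on the gradient of the log-likelihood; y in {-1, 1}, start at 0."""
    y01 = (y + 1) / 2
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-(X @ theta)))
        g = X.T @ (y01 - p)                      # gradient
        if np.linalg.norm(g) < tol:
            return theta
        H = -(X.T * (p * (1 - p))) @ X           # Hessian
        theta = theta - np.linalg.solve(H, g)
    raise ValueError("Newton-Raphson did not converge")


rng = np.random.default_rng(3)
n, d = 2000, 50                                  # assumed sizes, with n much larger than d
X = rng.normal(loc=0, scale=np.sqrt(0.1), size=(n, d))
theta_star = rng.random(d)
y = np.where(rng.random(n) < 1 / (1 + np.exp(-(X @ theta_star))), 1, -1)

theta_mle = fit_logistic_newton(X, y)
cosine = theta_mle @ theta_star / (np.linalg.norm(theta_mle) * np.linalg.norm(theta_star))
accuracy = np.mean(np.where(X @ theta_mle > 0, 1, -1) == y)
print(f"cosine(theta_mle, theta_star) = {cosine:.3f}, training accuracy = {accuracy:.3f}")
```

Starting from zero, instead of from `theta_star`, avoids handing the solver the answer. For non-separable data like this, Newton-Raphson from zero typically converges in a handful of iterations.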

## 2. Differential Privacy Basics