## Mathematics Behind Multinomial Logistic Regression

### Formulating the Problem
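Suppose the response variable has 3 classes, denoted Class 1, Class 2 and Class 3. Let $\vec{x}_i$ be the feature vector of the $i$-th observation, and let $\vec{\beta}_1$ and $\vec{\beta}_2$ be the coefficient vectors of Class 1 and Class 2; Class 3 serves as the reference class.

We set,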
$$
P(Y = 1|\vec{x}_i) = \frac{e^{\vec{x}_i^T.\vec{\beta}_1}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}, \ \ 
P(Y = 2|\vec{x}_i) = \frac{e^{\vec{x}_i^T.\vec{\beta}_2}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}} 
$$


As a result,
$$
\begin{aligned}
P(Y = 3|\vec{x}_i) 
 &= 1 \ - \ P(Y = 1|\vec{x}_i) - P(Y = 2|\vec{x}_i) \\
 \\
 &= 1 - \frac{e^{\vec{x}_i^T.\vec{\beta}_1}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}-\frac{e^{\vec{x}_i^T.\vec{\beta}_2}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}} \\
 \\
 &= \frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}
\end{aligned}
$$
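The three class probabilities above can be evaluated directly. The following is a minimal sketch in plain Python; the function name `class_probabilities` and the numeric values of `x`, `beta1` and `beta2` are illustrative assumptions, not part of the derivation.

```python
import math

def class_probabilities(x, beta1, beta2):
    """Return [P(Y=1|x), P(Y=2|x), P(Y=3|x)] with Class 3 as the reference class."""
    eta1 = sum(xj * bj for xj, bj in zip(x, beta1))  # x^T beta_1
    eta2 = sum(xj * bj for xj, bj in zip(x, beta2))  # x^T beta_2
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)    # shared denominator
    return [math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom]

# Illustrative (made-up) feature vector and coefficients.
x = [1.0, 0.5, -2.0]
beta1 = [0.2, -0.1, 0.4]
beta2 = [-0.3, 0.8, 0.1]

probs = class_probabilities(x, beta1, beta2)
print(probs, sum(probs))  # the three probabilities add up to 1
```

With both coefficient vectors set to zero, each class receives probability 1/3, which is a quick sanity check on the formulas.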

### Likelihood and Log-Likelihood
So the likelihood, with $I(\cdot)$ denoting the indicator function, is
$$\begin{aligned}
L &= \prod_{i=1}^n \left[P(y_i = 1 \ \ | \ \ \vec{x}_i)^{I(y_i=1)}\right].\left[P(y_i = 2 \ \ | \ \ \vec{x}_i)^{I(y_i=2)}\right].\left[P(y_i =  3 \ \ | \ \ \vec{x}_i)^{I(y_i=3)}\right] \\
 &= \prod_{i=1}^n \left[\frac{e^{\vec{x}_i^T.\vec{\beta}_1}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}\right]^{I(y_i=1)}\left[\frac{e^{\vec{x}_i^T.\vec{\beta}_2}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}\right]^{I(y_i=2)}\left[\frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}\right]^{I(y_i=3)} \\
 &= \prod_{i=1}^n \ e^{\vec{x}_i^T.\vec{\beta}_1.I(y_i=1)} \ . \ e^{\vec{x}_i^T.\vec{\beta}_2.I(y_i=2)}.\left[\frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}\right]^{[I(y_i=1) + I(y_i=2) + I(y_i=3)]} \\
 &= \prod_{i=1}^n \ e^{\vec{x}_i^T.\vec{\beta}_1.I(y_i=1)} \ . \ e^{\vec{x}_i^T.\vec{\beta}_2.I(y_i=2)}.\frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}
\end{aligned}
$$


The last equality holds because $I(y_i=1) + I(y_i=2) + I(y_i=3) = 1$: each observation belongs to exactly one class.

So,
$$\begin{aligned}
log L &= log \ \prod_{i=1}^n \ e^{\vec{x}_i^T.\vec{\beta}_1.I(y_i=1)} \ . \ e^{\vec{x}_i^T.\vec{\beta}_2.I(y_i=2)}.\frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}} \\
 &= \sum_{i=1}^n \left\{ log\left[e^{\vec{x}_i^T.\vec{\beta}_1.I(y_i=1)}\right] + log\left[e^{\vec{x}_i^T.\vec{\beta}_2.I(y_i=2)}\right] - log\left[1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}\right] \right\} \\
 &= \sum_{i=1}^n I(y_i=1).\vec{x}_i^T.\vec{\beta}_1 + \sum_{i=1}^nI(y_i=2).\vec{x}_i^T.\vec{\beta}_2 - \sum_{i=1}^nlog\left[1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}\right]
\end{aligned}$$
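The final expression for $log L$ translates almost term by term into code. This is a minimal sketch in plain Python; the helper `dot`, the function name `log_likelihood`, and the small data set are illustrative assumptions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def log_likelihood(X, y, beta1, beta2):
    """log L for labels y in {1, 2, 3}, with Class 3 as the reference class."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        eta1, eta2 = dot(x_i, beta1), dot(x_i, beta2)
        # I(y_i=1) * x_i^T beta_1 + I(y_i=2) * x_i^T beta_2
        total += (y_i == 1) * eta1 + (y_i == 2) * eta2
        # minus log of the shared denominator
        total -= math.log(1.0 + math.exp(eta1) + math.exp(eta2))
    return total

# Illustrative (made-up) data: first column acts as an intercept.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.0]]
y = [1, 3, 2, 1]
beta1 = [0.3, -0.7]
beta2 = [-0.2, 0.9]

ll = log_likelihood(X, y, beta1, beta2)
print(ll)
```

For large linear predictors `math.exp` can overflow; a production implementation would typically use a log-sum-exp formulation, which is left out here to stay close to the formula.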

### Deriving the First Derivative
The first derivative (gradient) of the log-likelihood is
$$\begin{aligned}\frac{\partial}{\partial \vec{\beta}}log L &= \begin{pmatrix}\frac{\partial}{\partial \vec{\beta}_1}log L \\ 
\frac{\partial}{\partial \vec{\beta}_2}log L \end{pmatrix}
\end{aligned}$$

Now,
$$\begin{aligned}
\frac{\partial}{\partial \vec{\beta_1}}log L &= \frac{\partial}{\partial \vec{\beta_1}} \left[\sum_{i=1}^n I(y_i=1).\vec{x}_i^T.\vec{\beta}_1 + \sum_{i=1}^nI(y_i=2).\vec{x}_i^T.\vec{\beta}_2 - \sum_{i=1}^nlog\left[1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}\right]\right] \\ 
 &= \sum_{i=1}^n I(y_i=1).\vec{x}_i \ + \ 0  \ - \ \sum_{i=1}^n\frac{1}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}.e^{\vec{x}_i^T.\vec{\beta}_1}.\vec{x}_i \\ 
 &= \sum_{i=1}^n I(y_i=1) \ . \ \vec{x}_i \ \ - \ \ \sum_{i=1}^n P(y_i=1|\vec{x}_i) \ . \ \vec{x}_i \\ 
 &= \sum_{i=1}^n \left[ \ \ I(y_i=1) - P(y_i=1|\vec{x}_i) \ \ \right].\vec{x}_i
\end{aligned}
$$

Similarly,
$$\frac{\partial}{\partial \vec{\beta_2}}log L = \sum_{i=1}^n \left[ \ \ I(y_i=2) - P(y_i=2|\vec{x}_i) \ \ \right].\vec{x}_i$$

So,
$$\begin{aligned}
\frac{\partial}{\partial \vec{\beta}}log L &= \begin{pmatrix}\frac{\partial}{\partial \vec{\beta}_1}log L \\ 
\frac{\partial}{\partial \vec{\beta}_2}log L \end{pmatrix} \\ 
 \\
 &= \begin{bmatrix}\sum_{i=1}^n \left[ \ \ I(y_i=1) - P(y_i=1|\vec{x}_i) \ \ \right].\vec{x}_i \\ 
 \sum_{i=1}^n \left[ \ \ I(y_i=2) - P(y_i=2|\vec{x}_i) \ \ \right].\vec{x}_i\end{bmatrix} \\ 
 \\ 
 &= \sum_{i=1}^n \begin{bmatrix} \left[ \ \ I(y_i=1) - P(y_i=1|\vec{x}_i) \ \ \right].\vec{x}_i \\ 
 \left[ \ \ I(y_i=2) - P(y_i=2|\vec{x}_i) \ \ \right].\vec{x}_i\end{bmatrix} \\ 
 \\ 
 &= \sum_{i=1}^n \begin{bmatrix}\ \ I(y_i=1) - P(y_i=1|\vec{x}_i) \ \  \\ 
 \ \ I(y_i=2) - P(y_i=2|\vec{x}_i) \ \ \end{bmatrix} \otimes \ \vec{x}_i
\end{aligned}$$

Here, $\vec{\beta}^T = \left(\vec{\beta}_1^T \ \ \ \vec{\beta}_2^T\right)$ is the stacked parameter vector and $\otimes$ is the **Kronecker product**.
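The Kronecker form of the gradient can be checked against a numerical derivative. The sketch below assumes NumPy is available; the function names (`indicators`, `probs`, `log_lik`, `gradient`), the random data and the step size `eps` are all illustrative choices. The coefficients are stored as a `(2, p)` array `B` whose rows are $\vec{\beta}_1$ and $\vec{\beta}_2$.

```python
import numpy as np

def indicators(y):
    """(n, 2) array with columns I(y_i=1), I(y_i=2)."""
    return np.stack([y == 1, y == 2], axis=1).astype(float)

def probs(X, B):
    """(n, 2) array with columns P(y_i=1|x_i), P(y_i=2|x_i)."""
    expo = np.exp(X @ B.T)
    return expo / (1.0 + expo.sum(axis=1, keepdims=True))

def log_lik(X, y, B):
    eta = X @ B.T
    return np.sum(indicators(y) * eta) - np.sum(np.log1p(np.exp(eta).sum(axis=1)))

def gradient(X, y, B):
    """Sum over i of [I - P]_i (Kronecker) x_i, a vector of length 2p."""
    resid = indicators(y) - probs(X, B)
    return sum(np.kron(resid[i], X[i]) for i in range(len(y)))

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = rng.integers(1, 4, size=n)            # labels in {1, 2, 3}
B = rng.normal(scale=0.5, size=(2, p))

g = gradient(X, y, B)

# Central finite differences on the stacked vector (beta_1, beta_2).
eps = 1e-6
g_fd = np.zeros(2 * p)
for k in range(2 * p):
    step = np.zeros(2 * p)
    step[k] = eps
    step = step.reshape(2, p)
    g_fd[k] = (log_lik(X, y, B + step) - log_lik(X, y, B - step)) / (2 * eps)

print(np.max(np.abs(g - g_fd)))           # should be close to zero
```

Flattening `B` row by row gives exactly the stacking order $\left(\vec{\beta}_1^T \ \ \vec{\beta}_2^T\right)$, which is why `np.kron(resid[i], X[i])` lines up with the finite-difference vector.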

### Deriving the Hessian Matrix
The Hessian matrix is
$$\begin{aligned}
H &= \frac{\partial}{\partial \vec{\beta}}\left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T \\
 &= \begin{bmatrix}\frac{\partial}{\partial \vec{\beta}_1} \left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T \\ 
    \frac{\partial}{\partial \vec{\beta}_2} \left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T
    \end{bmatrix}
\end{aligned}
$$
Now,
$$\begin{aligned}
\frac{\partial}{\partial \vec{\beta}_1} \left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T &= \frac{\partial}{\partial \vec{\beta}_1} \sum_{i=1}^n\left[ \ I(y_i=1) - P(y_i=1|\vec{x}_i) \ \ \ \ I(y_i=2) - P(y_i=2|\vec{x}_i) \ \right] \otimes \vec{x}_i^T \\ 
 &= \sum_{i=1}^n\left[ \ \frac{\partial}{\partial \vec{\beta}_1}\left[I(y_i=1) - P(y_i=1|\vec{x}_i)\right] \ \ \ \ \ \frac{\partial}{\partial \vec{\beta}_1}\left[I(y_i=2) - P(y_i=2|\vec{x}_i)\right] \ \right] \otimes \vec{x}_i^T \\
 &= \sum_{i=1}^n \left[ \ \ 0 - \frac{\partial}{\partial \vec{\beta}_1}P(y_i=1|\vec{x}_i) \ \ \ \  0 - \frac{\partial}{\partial \vec{\beta}_1}P(y_i=2|\vec{x}_i) \ \ \right] \otimes \vec{x}_i^T \\ 
 &= -\sum_{i=1}^n \left[ \ \ \frac{\partial}{\partial \vec{\beta}_1}P(y_i=1|\vec{x}_i) \ \ \ \  \frac{\partial}{\partial \vec{\beta}_1}P(y_i=2|\vec{x}_i) \ \ \right] \otimes \vec{x}_i^T
 \end{aligned}$$

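Now, differentiating with respect to a single component $\beta_{1j}$ of $\vec{\beta}_1$ using the quotient rule,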
$$\begin{aligned}
\frac{\partial}{\partial \beta_{1j}} P(y_i=1|\vec{x}_i) &= \frac{\partial}{\partial \beta_{1j}} \frac{e^{\vec{x}_i^T.\vec{\beta}_1}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}} \\ 
 \\
 &= \frac{(1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}).e^{\vec{x}_i^T.\vec{\beta}_1}.x_{ij} - (e^{\vec{x}_i^T.\vec{\beta}_1})^2.x_{ij}}{(1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2})^2} \\ 
 \\
 &= \frac{e^{\vec{x}_i^T.\vec{\beta}_1}.x_{ij}}{(1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2})^2}.[1+e^{\vec{x}_i^T.\vec{\beta}_2}] \\ 
 \\
 &= P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).x_{ij}
\end{aligned}$$

So,
$$\begin{aligned}
\frac{\partial}{\partial \vec{\beta}_1}P(y_i=1|\vec{x}_i) &= 
\begin{pmatrix}
\frac{\partial}{\partial \beta_{11}} P(y_i=1|\vec{x}_i) \\ 
\frac{\partial}{\partial \beta_{12}} P(y_i=1|\vec{x}_i) \\ 
. \\ 
. \\ 
. \\ 
\frac{\partial}{\partial \beta_{1p}} P(y_i=1|\vec{x}_i) \\ 
\end{pmatrix} \\ 
 \\ 
 &= \begin{pmatrix}P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).x_{i1} \\ 
 P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).x_{i2} \\ 
 . \\ 
 . \\ 
 . \\ 
 P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).x_{ip} 
 \end{pmatrix} \\ 
 \\ 
 &= P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).\begin{pmatrix}x_{i1} \\ 
 x_{i2} \\ 
 x_{i3} \\ 
 . \\ 
 . \\ 
 . \\ 
 x_{ip}
 \end{pmatrix} \\ 
 \\ 
 &= P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).\vec{x}_i
\end{aligned}
$$

For the second entry, recall that
$$P(y_i=2|\vec{x}_i) = \frac{e^{\vec{x}_i^T.\vec{\beta}_2}}{1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2}}$$

Only the denominator depends on $\vec{\beta}_1$, so
$$\begin{aligned}
\frac{\partial}{\partial \beta_{1j}} P(y_i=2|\vec{x}_i) &= - \frac{e^{\vec{x}_i^T.\vec{\beta}_2}.e^{\vec{x}_i^T.\vec{\beta}_1}.x_{ij}}{(1+e^{\vec{x}_i^T.\vec{\beta}_1}+e^{\vec{x}_i^T.\vec{\beta}_2})^2} \\ 
 &= - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i).x_{ij} \\ 
\end{aligned}$$

So,
$$\frac{\partial}{\partial \vec{\beta}_1}P(y_i=2|\vec{x}_i) = - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i).\vec{x}_i$$

So,
$$\begin{aligned}
\frac{\partial}{\partial \vec{\beta}_1}\left[\frac{\partial}{\partial \vec{\beta}}log L\right]^T &= -\sum_{i=1}^n \left[ \ \ \frac{\partial}{\partial \vec{\beta}_1}P(y_i=1|\vec{x}_i) \ \ \ \  \frac{\partial}{\partial \vec{\beta}_1}P(y_i=2|\vec{x}_i) \ \ \right] \otimes \vec{x}_i^T \\ 
 &= -\sum_{i=1}^n \left[P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i).\vec{x}_i \ \ \ \  - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i).\vec{x}_i\right] \otimes \vec{x}_i^T \\ 
 &= -\sum_{i=1}^n\left[P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) \ \ \ \ - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i)\right]\otimes\vec{x}_i\otimes\vec{x}_i^T \\ 
 &= -\sum_{i=1}^n\left[P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) \ \ \ \ - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i)\right]\otimes\vec{x}_i.\vec{x}_i^T\end{aligned}$$

 Similarly,
 $$
 \begin{aligned}
 \frac{\partial}{\partial \vec{\beta}_2}\left[\frac{\partial}{\partial \vec{\beta}}log L\right]^T = -\sum_{i=1}^n\left[-P(y_i=2|\vec{x}_i)P(y_i=1|\vec{x}_i) \ \ \ \ P(y_i=2|\vec{x}_i)P(y_i\neq2|\vec{x}_i)\right]\otimes\vec{x}_i.\vec{x}_i^T\end{aligned}$$

So,
 $$\begin{aligned}
 H &= \begin{bmatrix}\frac{\partial}{\partial \vec{\beta}_1} \left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T \\ 
    \frac{\partial}{\partial \vec{\beta}_2} \left[ \frac{\partial}{\partial \vec{\beta}}log \ L \right]^T
    \end{bmatrix} \\ 
  \\ 
  &= \begin{bmatrix}-\sum_{i=1}^n\left[P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) \ \ \ \ \ \ - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i)\right]\otimes\vec{x}_i.\vec{x}_i^T \\ 
  -\sum_{i=1}^n\left[-P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) \ \ \ \ \ \ \ P(y_i=2|\vec{x}_i).P(y_i\neq2|\vec{x}_i)\right]\otimes\vec{x}_i.\vec{x}_i^T\end{bmatrix} \\ 
  \\ 
  &= -\sum_{i=1}^n\begin{bmatrix}P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) & - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
  -P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) & P(y_i=2|\vec{x}_i).P(y_i\neq2|\vec{x}_i)
  \end{bmatrix} \otimes \vec{x}_i.\vec{x}_i^T
 \end{aligned}
 $$
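The closed form of $H$ can be assembled observation by observation. The sketch below assumes NumPy; the function name `hessian` and the random inputs are illustrative. The $2 \times 2$ probability matrix is built as `diag(p) - p p^T`, which is algebraically the same matrix because $P(y_i=j|\vec{x}_i).P(y_i\neq j|\vec{x}_i) = P(y_i=j|\vec{x}_i) - P^2(y_i=j|\vec{x}_i)$.

```python
import numpy as np

def hessian(X, B):
    """Hessian of log L at B (rows beta_1, beta_2); X is (n, p). Returns (2p, 2p)."""
    expo = np.exp(X @ B.T)
    probs = expo / (1.0 + expo.sum(axis=1, keepdims=True))   # (n, 2)
    p = X.shape[1]
    H = np.zeros((2 * p, 2 * p))
    for prob_i, x_i in zip(probs, X):
        # 2x2 matrix with P_j * (1 - P_j) on the diagonal and -P_1 * P_2 off it
        W_i = np.diag(prob_i) - np.outer(prob_i, prob_i)
        # subtract W_i (Kronecker) x_i x_i^T
        H -= np.kron(W_i, np.outer(x_i, x_i))
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))               # illustrative random features
B = rng.normal(scale=0.5, size=(2, 3))     # illustrative coefficients

H = hessian(X, B)
print(np.allclose(H, H.T))                 # H is symmetric
print(np.linalg.eigvalsh(H).max())         # largest eigenvalue, expected to be <= 0
```

Looking at the largest eigenvalue is only a numerical spot check for one data set, not a proof of negative semi-definiteness.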

For the log-likelihood to be concave, so that every stationary point is a global maximum, $H$ has to be negative semi-definite, or equivalently $-H$ positive semi-definite. (For the maximum to be unique, $H$ must in addition be negative definite.) We can prove the semi-definiteness in the following way.
1.    The sum of positive semi-definite matrices is positive semi-definite. $-H$ is a sum of matrices. So, if we can show that each term inside the sum is positive semi-definite, then $-H$ is positive semi-definite.
2.    The Kronecker product of two symmetric positive semi-definite matrices is positive semi-definite. It follows from the following theorem:  

    **Theorem:** Let $A_{m \times m}$ have eigenvectors $x_i$ with corresponding eigenvalues $\lambda_i, \ i = 1, \dots, m$;  
    let $B_{n \times n}$ have eigenvectors $y_j$ with corresponding eigenvalues $\mu_j, \ j = 1, \dots, n$.  
    Then $x_i \otimes y_j$ is an eigenvector of $A\otimes B$ with corresponding eigenvalue $\lambda_i.\mu_j$.  
    
    **Proof:** Here, $Ax_i = \lambda_i x_i$ and $By_j = \mu_j y_j$. Now, by the mixed-product property of the Kronecker product, $(A\otimes B)(x_i\otimes y_j) = (Ax_i)\otimes (By_j)=(\lambda_i x_i)\otimes (\mu_j y_j) = \lambda_i\mu_j(x_i\otimes y_j)$. So, $x_i\otimes y_j$ is an eigenvector corresponding to the eigenvalue $\lambda_i\mu_j$.

    Now, if $A$ and $B$ are symmetric positive semi-definite, then all $\lambda_i$ and $\mu_j$ are non-negative, and so is every product $\lambda_i\mu_j$. Since $A\otimes B$ is then symmetric and the $mn$ vectors $x_i\otimes y_j$ built from orthonormal eigenbases form a complete set of eigenvectors, these products are all of its eigenvalues, showing that $A\otimes B$ is positive semi-definite.

    So, if we can show that both operands of the Kronecker product are positive semi-definite, then each term of $-H$, and hence $-H$ itself, will be positive semi-definite.
3.    $\vec{x}_i.\vec{x}_i^T$ is positive semi-definite, because $\vec{z}^T.\vec{x}_i.\vec{x}_i^T.\vec{z} = (\vec{x}_i^T.\vec{z})^2 \geq 0$ for every $\vec{z}$. We have to prove that $\begin{bmatrix}P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) & - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
  -P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) & P(y_i=2|\vec{x}_i).P(y_i\neq2|\vec{x}_i)
  \end{bmatrix}$ is positive semi-definite.

  Now let,
  $$\begin{aligned}
   P &= \begin{bmatrix}
   P(y_i=1|\vec{x}_i).P(y_i\neq1|\vec{x}_i) & - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
  -P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) & P(y_i=2|\vec{x}_i).P(y_i\neq2|\vec{x}_i) 
  \end{bmatrix} \\ 
   \\ 
   &= \begin{bmatrix}
   P(y_i=1|\vec{x}_i).[1-P(y_i=1|\vec{x}_i)] & - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
  -P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) & P(y_i=2|\vec{x}_i).[1-P(y_i=2|\vec{x}_i)]
  \end{bmatrix} \\ 
   \\
   &= \begin{bmatrix}
   P(y_i=1|\vec{x}_i)-P^2(y_i=1|\vec{x}_i) & - P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
  -P(y_i=2|\vec{x}_i).P(y_i=1|\vec{x}_i) & P(y_i=2|\vec{x}_i)-P^2(y_i=2|\vec{x}_i)
  \end{bmatrix} \\ 
   \\
   &= \begin{bmatrix}
   P(y_i=1|\vec{x}_i) & 0 \\ 
   0 & P(y_i=2|\vec{x}_i)
   \end{bmatrix} - 
   \begin{bmatrix}
   P^2(y_i=1|\vec{x}_i) & P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) \\ 
   P(y_i=1|\vec{x}_i).P(y_i=2|\vec{x}_i) & P^2(y_i=2|\vec{x}_i)\end{bmatrix} \\ 
   \\ 
   &= \begin{bmatrix}
   P(y_i=1|\vec{x}_i) & 0 \\ 
   0 & P(y_i=2|\vec{x}_i)
   \end{bmatrix} - 
   \begin{bmatrix}
   P(y_i=1|\vec{x}_i) \\ 
   P(y_i=2|\vec{x}_i)\end{bmatrix}.
   \begin{bmatrix}
   P(y_i=1|\vec{x}_i) \\ 
   P(y_i=2|\vec{x}_i)\end{bmatrix}^T
   \end{aligned}$$

   So,
   $$\begin{aligned}
   &  \vec{z}^T.P.\vec{z} \\ 
   &= \vec{z}^T.\begin{bmatrix}
   P(y_i=1|\vec{x}_i) & 0 \\ 
   0 & P(y_i=2|\vec{x}_i)
   \end{bmatrix}.\vec{z} - \vec{z}^T.
   \begin{bmatrix}
   P(y_i=1|\vec{x}_i) \\ 
   P(y_i=2|\vec{x}_i)\end{bmatrix}.
   \begin{bmatrix}
   P(y_i=1|\vec{x}_i) \\ 
   P(y_i=2|\vec{x}_i)\end{bmatrix}^T.\vec{z} \\ 
   \\
   &= \sum_{j=1}^2P(y_i=j|\vec{x}_i)z_j^2 \ \ - \ \ \left(\sum_{j=1}^2P(y_i=j|\vec{x}_i)z_j\right)^2  \end{aligned}$$

Now, by **Cauchy-Schwarz Inequality**, $$\left(\sum_{j=1}^n u_jv_j\right)^2 \ \ \leq \ \ \sum_{j=1}^n u_j^2.\sum_{j=1}^nv_j^2$$

If we set $n = 2$ (here $n$ is the number of terms, not the sample size), $u_j=\sqrt{P(y_i=j|\vec{x}_i)}$ and $v_j=\sqrt{P(y_i=j|\vec{x}_i)}.z_j$, where the square roots exist because $P(y_i=j|\vec{x}_i) \geq 0$, we get,
$$\begin{aligned}
 &\left(\sum_{j=1}^2 \sqrt{P(y_i=j|\vec{x}_i)}.\sqrt{P(y_i=j|\vec{x}_i)}.z_j\right)^2 \leq \sum_{j=1}^2P(y_i=j|\vec{x}_i).\sum_{j=1}^2P(y_i=j|\vec{x}_i).z_j^2 \\ 
 \implies & \left(\sum_{j=1}^2 P(y_i=j|\vec{x}_i).z_j\right)^2 \leq \sum_{j=1}^2P(y_i=j|\vec{x}_i).z_j^2
\end{aligned}$$

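The implication holds because $\sum_{j=1}^2P(y_i=j|\vec{x}_i) = 1 - P(y_i=3|\vec{x}_i) \leq 1$ and $\sum_{j=1}^2P(y_i=j|\vec{x}_i).z_j^2 \geq 0$.

So, $\vec{z}^T.P.\vec{z} \geq 0$ for every $\vec{z}$. So, $P$ is positive semi-definite.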
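The inequality $\vec{z}^T.P.\vec{z} \geq 0$ can also be spot-checked numerically. The following plain-Python sketch draws random probabilities and random vectors $\vec{z}$ and records the smallest value of the quadratic form; the function name `quadratic_form`, the seed, the sampling ranges and the number of trials are arbitrary illustrative choices.

```python
import random

def quadratic_form(p1, p2, z1, z2):
    """z^T P z = sum_j p_j z_j^2 - (sum_j p_j z_j)^2 for the 2x2 matrix P."""
    return p1 * z1 ** 2 + p2 * z2 ** 2 - (p1 * z1 + p2 * z2) ** 2

random.seed(0)
worst = float("inf")
for _ in range(10000):
    # Random class probabilities (p1, p2, p3) that sum to 1.
    a, b, c = (random.random() for _ in range(3))
    s = a + b + c
    p1, p2 = a / s, b / s
    # Random direction z.
    z1, z2 = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    worst = min(worst, quadratic_form(p1, p2, z1, z2))

print(worst)  # smallest value observed; expected to be non-negative up to rounding
```

A random search like this cannot replace the Cauchy-Schwarz argument, but it is a cheap way to catch sign errors in the matrix.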

Using the above steps, each term of the sum in $-H$ is positive semi-definite, so $-H$ is positive semi-definite and $H$ is negative semi-definite.
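The Kronecker eigenvalue property used in step 2 can be illustrated numerically. The sketch below assumes NumPy; the matrices `A` and `B` are randomly generated symmetric positive semi-definite examples (with `B` chosen as a rank-one outer product, the same shape of matrix as $\vec{x}_i.\vec{x}_i^T$), and the tolerances are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

# A: a random symmetric positive semi-definite 2x2 matrix.
M = rng.normal(size=(2, 2))
A = M @ M.T

# B: a rank-one positive semi-definite 3x3 matrix of the form x x^T.
x = rng.normal(size=3)
B = np.outer(x, x)

lam = np.linalg.eigvalsh(A)                    # eigenvalues of A
mu = np.linalg.eigvalsh(B)                     # eigenvalues of B
kron_eigs = np.linalg.eigvalsh(np.kron(A, B))  # eigenvalues of A (Kronecker) B, ascending

# All pairwise products lambda_i * mu_j, sorted to match eigvalsh's ordering.
products = np.sort(np.outer(lam, mu).ravel())

print(np.allclose(kron_eigs, products, atol=1e-8))
print(kron_eigs.min() >= -1e-8)
```

`np.linalg.eigvalsh` is used because both matrices, and therefore their Kronecker product, are symmetric.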