# MSV / SS 2023 - Exercise 4

### 4.1 Binary Sentiment Classification with Logistic Regression - Example from Jurafsky & Martin

In [1]:
import pandas as pd
import numpy as np
import math

In [2]:
text = (
    "It's hokey . There are virtually no surprises , "
    "and the writing is second-rate . So why was it so enjoyable ? "
    "For one thing , the cast is great . Another nice touch is the music . "
    "I was overcome with the urge to get off the couch and start dancing . "
    "It sucked me in , and it'll do the same to you ."
)
text

"It's hokey . There are virtually no surprises , and the writing is second-rate . So why was it so enjoyable ? For one thing , the cast is great . Another nice touch is the music . I was overcome with the urge to get off the couch and start dancing . It sucked me in , and it'll do the same to you ."

In [3]:
sentiment_feat = [("# pos. words", 3, 2.5),
                  ("# neg. words", 2, -5),
                  ("'no' in document", 1, -1.2),
                  ("# 1st or 2nd person pronouns", 3, 0.5),
                  ("'!' in document", 0, 2),
                  ("log(word count)", 4.19, 0.7)]

# One row per feature: its value x_i for the review above and its weight w_i
sentiment_feat = pd.DataFrame(sentiment_feat, columns=['features', 'values', 'weights'])

print(sentiment_feat)

                       features  values  weights
0                  # pos. words    3.00      2.5
1                  # neg. words    2.00     -5.0
2              'no' in document    1.00     -1.2
3  # 1st or 2nd person pronouns    3.00      0.5
4               '!' in document    0.00      2.0
5               log(word count)    4.19      0.7
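
The feature values in the table can be reproduced from the review text. The following is only a minimal sketch: the three small lexicons are hypothetical and were picked to cover this single review, and the text is assumed to be tokenized by whitespace, as it is above.

```python
import math

# The review from the cell above, already tokenized by whitespace.
text = (
    "It's hokey . There are virtually no surprises , "
    "and the writing is second-rate . So why was it so enjoyable ? "
    "For one thing , the cast is great . Another nice touch is the music . "
    "I was overcome with the urge to get off the couch and start dancing . "
    "It sucked me in , and it'll do the same to you ."
)

# Hypothetical mini-lexicons (assumption): they only cover this one review.
POSITIVE = {"enjoyable", "great", "nice"}
NEGATIVE = {"hokey", "second-rate"}
PRONOUNS_1ST_2ND = {"i", "me", "my", "mine", "we", "us", "our", "you", "your", "yours"}

def extract_features(doc):
    """Return the six feature values in the same order as the table."""
    tokens = doc.lower().split()
    return [
        sum(tok in POSITIVE for tok in tokens),          # number of positive words
        sum(tok in NEGATIVE for tok in tokens),          # number of negative words
        1 if "no" in tokens else 0,                      # 'no' in document
        sum(tok in PRONOUNS_1ST_2ND for tok in tokens),  # 1st/2nd person pronouns
        1 if "!" in tokens else 0,                       # '!' as a separate token
        round(math.log(len(tokens)), 2),                 # natural log of the token count
    ]

features = extract_features(text)
print(features)  # [3, 2, 1, 3, 0, 4.19]
```

The review has 66 whitespace-separated tokens (punctuation included), and the natural logarithm of 66 rounds to 4.19.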


#### Bias

In [4]:
bias = 0.1

#### "Testing": The sigmoid as classification function

In [6]:
def sigmoid(x):
    return 1/(1 + math.exp(-x))
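
The definition above is fine for the small values of `x` in this exercise, but `math.exp(-x)` raises an `OverflowError` for strongly negative inputs (around `x = -710` and below). A possible variant that avoids this, shown only as a sketch (the name `stable_sigmoid` is not part of the exercise):

```python
import math

def stable_sigmoid(x):
    """Sigmoid that never calls math.exp on a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is below 1 and cannot overflow
    return e / (1.0 + e)

print(stable_sigmoid(0))      # 0.5
print(stable_sigmoid(-1000))  # 0.0
print(stable_sigmoid(1000))   # 1.0
```

Both branches compute the same function; the second one just multiplies numerator and denominator by `exp(x)`.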

#### P(+|x)

In [7]:
z = sum(sentiment_feat['values']*sentiment_feat['weights'])+bias
print('z = ',z)
print('P(+|x) = ',sigmoid(z))

z =  0.8330000000000001
P(+|x) =  0.6969888901292717


#### P(-|x)

In [8]:
print('P(-|x) = ',1-sigmoid(z))

P(-|x) =  0.3030111098707283
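
The same computation can be written as a dot product with NumPy arrays. This is a sketch with the feature values and weights copied from the table; the 0.5 decision threshold is an assumption, since the notebook itself only prints the probabilities.

```python
import numpy as np

x = np.array([3, 2, 1, 3, 0, 4.19])         # feature values
w = np.array([2.5, -5, -1.2, 0.5, 2, 0.7])  # weights
b = 0.1                                     # bias

z = np.dot(w, x) + b             # weighted sum plus bias
p_pos = 1 / (1 + np.exp(-z))     # P(+|x) = sigmoid(z)
p_neg = 1 - p_pos                # P(-|x)
label = "+" if p_pos > 0.5 else "-"  # assumed decision threshold of 0.5

print(f"z = {z:.3f}, P(+|x) = {p_pos:.2f}, P(-|x) = {p_neg:.2f}, decision: {label}")
# z = 0.833, P(+|x) = 0.70, P(-|x) = 0.30, decision: +
```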


### 4.2 Training
#### Cross-entropy as the loss function

Goal:
* smaller loss when the model's prediction is correct
* larger loss when the model's prediction is wrong

$L(\hat{y},y) = -[y \log \hat{y} + (1 - y) \log (1 - \hat{y})]$

with $y = 1$: $L(\hat{y},y) = -\log \hat{y}$


with $y = 0$: $L(\hat{y},y) = -\log (1 - \hat{y})$


(and remember: $\hat{y} = \sigma(w \cdot x + b)$)
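
The formula translates directly into a small function. The sketch below (the name `cross_entropy` is chosen here for illustration) prints the loss for a few predictions to show how it grows as the prediction moves away from the gold label.

```python
import math

def cross_entropy(y_hat, y):
    """Binary cross-entropy for one example; y is the gold label (0 or 1)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

for y_hat in (0.9, 0.7, 0.1):
    print(f"y_hat = {y_hat}: "
          f"L(y=1) = {cross_entropy(y_hat, 1):.3f}, "
          f"L(y=0) = {cross_entropy(y_hat, 0):.3f}")
```

A confident correct prediction ($\hat{y} = 0.9$, $y = 1$) costs about 0.105, while the same prediction with $y = 0$ costs about 2.303.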

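#### Loss for $\hat{y} = 0.70$ with $y = 1$

In [12]:
# Recompute the prediction for the example review (six features, bias = 0.1)
z = sum(sentiment_feat['values']*sentiment_feat['weights'])+bias
pred_y = sigmoid(z)
# y = 1, so only the -log(y_hat) term of the cross-entropy remains
verlust = -(np.log(pred_y))
print(f'{verlust:.4f}')

0.3610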

And what if the correct answer is $y = 0$ and $\hat{y} = 0.70$?

$L(\hat{y},y) = -[(1 - y) \log (1 - \hat{y})] = -\log (1 - \hat{y})$

#### Loss for $\hat{y} = 0.70$ with $y = 0$

In [15]:
# y = 0, so only the -log(1 - y_hat) term of the cross-entropy remains
verlust = -(np.log(1-pred_y))
print(f'{verlust:.4f}')

1.1940
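
Because $\hat{y} = \sigma(z)$, both losses can also be written directly in terms of $z$: $-\log \sigma(z) = \log(1 + e^{-z})$ and $-\log(1 - \sigma(z)) = \log(1 + e^{z})$, so the two values differ by exactly $z$. A short sketch using `np.logaddexp`, with $z = 0.833$ hard-coded from the $P(+|x)$ cell:

```python
import numpy as np

z = 0.833  # w·x + b for the example review, taken from the P(+|x) cell

loss_y1 = np.logaddexp(0, -z)  # log(1 + e^{-z}) = -log(sigmoid(z))
loss_y0 = np.logaddexp(0, z)   # log(1 + e^{z})  = -log(1 - sigmoid(z))

print(f"{loss_y1:.4f} {loss_y0:.4f}")  # 0.3610 1.1940
print(f"{loss_y0 - loss_y1:.3f}")      # 0.833, i.e. z
```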

### Gradient Descent: 1 Step

In [16]:
text

"It's hokey . There are virtually no surprises , and the writing is second-rate . So why was it so enjoyable ? For one thing , the cast is great . Another nice touch is the music . I was overcome with the urge to get off the couch and start dancing . It sucked me in , and it'll do the same to you ."

In [17]:
sentiment_feat_s = [("x1: # pos. words", 3, 0),
                    ("x2: # neg. words", 2, 0)]


sentiment_feat_s = pd.DataFrame(sentiment_feat_s, columns=['features', 'values', 'weights'])

print(sentiment_feat_s)

           features  values  weights
0  x1: # pos. words       3        0
1  x2: # neg. words       2        0


#### Bias & Learning Rate

In [18]:
bias = 0
learn_rate = 0.1

#### Gradient vector (3-dim)

$y = 1$

$\nabla_{w,b} = \begin{bmatrix} (\sigma(w \cdot x + b)-y)x_1 \\ (\sigma(w \cdot x + b)-y)x_2 \\ \sigma(w \cdot x + b)-y\end{bmatrix}  $ 

(1) Classification function for the estimated $\hat{y}$ (all weights and the bias are 0, so $w \cdot x + b = 0$)

$= \begin{bmatrix} (\sigma(0)-1)x_1 \\ (\sigma(0)-1)x_2 \\ \sigma(0)-1\end{bmatrix} = \begin{bmatrix} (0.5-1)x_1 \\ (0.5-1)x_2 \\ 0.5-1\end{bmatrix}$

(2) Loss & gradient vector

$= \begin{bmatrix} -0.5 x_1 \\ -0.5 x_2 \\ -0.5 \end{bmatrix} =
\begin{bmatrix} -0.5 \cdot 3 \\ -0.5 \cdot 2 \\ -0.5 \end{bmatrix} =
\begin{bmatrix} -1.5 \\ -1.0 \\ -0.5 \end{bmatrix}$
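
The hand calculation can be checked numerically. The following sketch assumes the setup from the cells above ($x_1 = 3$, $x_2 = 2$, all weights and the bias 0, $y = 1$); the function name `gradient` is chosen here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient(w, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. (w_1, ..., w_n, b) for one example."""
    error = sigmoid(np.dot(w, x) + b) - y  # sigma(w·x + b) - y
    return np.append(error * x, error)     # last entry is the bias gradient

x = np.array([3.0, 2.0])  # x1 = # pos. words, x2 = # neg. words
w = np.zeros(2)           # initial weights
b = 0.0                   # initial bias
y = 1                     # gold label

grad = gradient(w, b, x, y)
print(grad)  # entries: -1.5, -1.0, -0.5
```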

#### What do the new parameters look like after 1 step?

Learning Rate

$\eta = 0.1$

Gradient Vector

$\begin{bmatrix} -1.5 \\ -1.0 \\ -0.5\end{bmatrix}$


Parameter update (in the opposite direction of the gradient):

$\theta_{t+1} = \theta_t - \eta\nabla L(f(x;\theta),y)$

$\theta_{1} = \begin{bmatrix} w_1 \\ w_2 \\ b\end{bmatrix} - 0.1
\begin{bmatrix} -1.5 \\ -1.0 \\ -0.5\end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0\end{bmatrix} - 
\begin{bmatrix} -.15 \\ -.1 \\ -.05\end{bmatrix} = \begin{bmatrix} .15 \\ .1 \\ .05\end{bmatrix}$ 
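
The update rule can be sketched as a single function. One possible design, used here as an assumption and not prescribed by the exercise, is to pack the parameters as $\theta = [w_1, w_2, b]$ and append a constant 1 to $x$, so that the bias is updated like any other weight.

```python
import numpy as np

def sgd_step(theta, x, y, eta):
    """One gradient descent step for logistic regression on a single example."""
    x_aug = np.append(x, 1.0)                        # constant 1 pairs with the bias
    y_hat = 1 / (1 + np.exp(-np.dot(theta, x_aug)))  # sigma(w·x + b)
    grad = (y_hat - y) * x_aug                       # gradient w.r.t. [w1, w2, b]
    return theta - eta * grad                        # move against the gradient

theta_0 = np.zeros(3)  # w1 = w2 = b = 0
theta_1 = sgd_step(theta_0, np.array([3.0, 2.0]), y=1, eta=0.1)
print(np.round(theta_1, 2))  # entries: 0.15, 0.1, 0.05
```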

## Homework

### Exercise 4.1

Compute one more gradient descent step using the following example:

y = 0

In [9]:
bias = 0.5

sentiment_feat = [("# pos. words", 1, .15),
                  ("# neg. words", 4, .1)]

sentiment_feat = pd.DataFrame(sentiment_feat, columns=['features', 'values', 'weights'])

print(sentiment_feat)

       features  values  weights
0  # pos. words       1     0.15
1  # neg. words       4     0.10
