WT-2_Such-_und_Texttechnologien
- Which part is most frequent?
- (A ratio rather than a probability, since the evidence term is omitted) $$ \hat{y} = \underset{k \in \{1,\dots,K\}}{\operatorname{argmax}}\ p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k) $$
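The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming categorical features and plain relative frequencies without smoothing; the names `train` and `predict` and the toy weather data in the usage note are hypothetical and not part of the notes.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count classes and (feature index, value) pairs per class.

    samples: list of (features, label) pairs, features being a tuple.
    """
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[label][(i, value)] += 1
    return class_counts, feature_counts


def predict(features, class_counts, feature_counts):
    """Return the class C_k that maximises p(C_k) * prod_i p(x_i | C_k)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior p(C_k)
        for i, value in enumerate(features):
            # likelihood p(x_i | C_k) as a relative frequency (no smoothing)
            score *= feature_counts[label][(i, value)] / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

The score is only proportional to the posterior because the evidence is never divided out, which is enough for picking the argmax. For example, after `model = train([(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")])`, a call looks like `predict(("rainy", "mild"), *model)`.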
prune(B, C) if (
B.len()/A.len() * B.err() + C.len()/A.len() * C.err()
) > A.err()
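The pruning condition above can be written as a small runnable check. This is a sketch under assumptions: `Node`, `size`, `error`, and `should_prune` are hypothetical names, with `parent` standing for A and `left`/`right` for the children B and C.

```python
from dataclasses import dataclass


@dataclass
class Node:
    size: int     # number of instances reaching this node, i.e. len()
    error: float  # error rate of the node, i.e. err()


def should_prune(parent, left, right):
    """True if the size-weighted error of the children exceeds the parent's error."""
    weighted_child_error = (
        left.size / parent.size * left.error
        + right.size / parent.size * right.error
    )
    return weighted_child_error > parent.error
```

With `Node(10, 0.2)` as the parent and children `Node(6, 0.5)` and `Node(4, 0.0)`, the weighted child error is 0.3, so the split would be pruned.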
- TP: true positives
- FP: false positives
- FN: false negatives
- TN: true negatives
- P = TP + FN -- all positive instances
- N = FP + TN -- all negative instances
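As a small illustration of these definitions, the four counts and the totals P and N can be derived from two boolean lists. The function name `confusion_counts` and its argument names are assumptions made for this sketch.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for boolean labels and derive P and N."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1  # positive instance, predicted positive
        elif not a and p:
            fp += 1  # negative instance, predicted positive
        elif a and not p:
            fn += 1  # positive instance, predicted negative
        else:
            tn += 1  # negative instance, predicted negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "P": tp + fn, "N": fp + tn}
```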
1. Choose a value of k.
2. Select k objects arbitrarily and use them as the initial set of k centroids.
3. Assign each object to the cluster whose centroid is nearest.
4. Recalculate the centroids of the k clusters.
5. Repeat steps 3 and 4 until the centroids no longer move.
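The steps above can be sketched in plain Python. This is a minimal version under assumptions: points are tuples of floats, distance is Euclidean, the "arbitrary" initial centroids are simply the first k points, and `max_iter` is an added safety cap that the notes do not mention.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points into k groups; returns (centroids, assignment)."""
    # Step 2: arbitrary initial centroids (assumption: the first k points).
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(max_iter):
        # Step 3: assign each point to the cluster with the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        # Step 5: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment
```

The result depends on the initial centroids, so a different arbitrary choice in step 2 can lead to a different clustering.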
- $|p_j|$: number of outgoing links of page $p_j$
- $p_j \in B_{p_i}$: all pages that link to $p_i$
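The notes list only the notation here, so the following sketch assumes the usual PageRank iteration $PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in B_{p_i}} \frac{PR(p_j)}{|p_j|}$ with damping factor $d = 0.85$. The function name, the fixed iteration count, and the dict-based graph format are assumptions, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    # B_{p_i}: all pages p_j that link to p_i
    backlinks = {p: [q for q in pages if p in links[q]] for p in pages}
    for _ in range(iterations):
        rank = {
            p: (1 - damping) / n
            # each backlinking page p_j passes on rank / |p_j|
            + damping * sum(rank[q] / len(links[q]) for q in backlinks[p])
            for p in pages
        }
    return rank
```

For instance, `pagerank({"a": ["c"], "b": ["c"], "c": ["a", "b"]})` ranks `c` highest, since both other pages link to it.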
- Log Term Freq
0 if tf(d,t) == 0 else log(tf(d,t), 10)
- $df(t)$: number of documents that contain term $t$
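Both quantities can be computed directly from tokenised documents. This is a small sketch, assuming each document is a list of tokens; the function names `log_tf` and `document_frequency` are hypothetical.

```python
import math


def log_tf(tokens, term):
    """Log term frequency: 0 if the term is absent, else log10 of its count."""
    tf = tokens.count(term)
    return 0.0 if tf == 0 else math.log10(tf)


def document_frequency(documents, term):
    """df(t): number of documents that contain the term at least once."""
    return sum(1 for doc in documents if term in doc)
```

Under this definition a term that occurs exactly once also gets weight 0, because log10(1) = 0. A common variant uses 1 + log10(tf) to avoid that.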