<?php
/**
* Create a table with example ratings to illustrate the accuracy metrics
*/
function recsys_wb_evaluation_example_accuracy_table() {
$style = 'text-align:center;vertical-align:middle';
$header = array(
array( 'data' => t('User'), 'style' => $style ),
array( 'data' => t('Algorithm prediction'), 'style' => $style ),
array( 'data' => t('User rating'), 'style' => $style ),
);
$rows = array(
array(
array( 'data' => 'User 1', 'style' => $style ),
array( 'data' => 4, 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
),
array(
array( 'data' => 'User 2', 'style' => $style ),
array( 'data' => 2, 'style' => $style ),
array( 'data' => 4, 'style' => $style ),
),
array(
array( 'data' => 'User 3', 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
),
array(
array( 'data' => 'User 4', 'style' => $style ),
array( 'data' => 5, 'style' => $style ),
array( 'data' => 4, 'style' => $style ),
),
array(
array( 'data' => 'User 5', 'style' => $style ),
array( 'data' => 2, 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
),
);
return theme(
'table',
array(
'header' => $header,
'rows' => $rows,
)
);
}
/**
* Create a table with example ranks to illustrate the ranking metrics
*/
function recsys_wb_evaluation_example_rank_table() {
$style = 'text-align:center;vertical-align:middle';
$header = array(
array( 'data' => t('Item'), 'style' => $style ),
array( 'data' => t('Algorithm prediction'), 'style' => $style ),
array( 'data' => t('User rating'), 'style' => $style ),
);
$rows = array(
array(
array( 'data' => 'Item 1', 'style' => $style ),
array( 'data' => 4.5, 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
),
array(
array( 'data' => 'Item 2', 'style' => $style ),
array( 'data' => 4.3, 'style' => $style ),
array( 'data' => 4.5, 'style' => $style ),
),
array(
array( 'data' => 'Item 3', 'style' => $style ),
array( 'data' => 4, 'style' => $style ),
array( 'data' => 4.5, 'style' => $style ),
),
array(
array( 'data' => 'Item 4', 'style' => $style ),
array( 'data' => 3.5, 'style' => $style ),
array( 'data' => 4, 'style' => $style ),
),
array(
array( 'data' => 'Item 5', 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
array( 'data' => 3, 'style' => $style ),
),
);
return theme(
'table',
array(
'header' => $header,
'rows' => $rows,
)
);
}
/**
* Explain why and what evaluation is
*/
function recsys_wb_explain_evaluation() {
$rs_introduction_link = l(
'[2]',
'aboutthis',
array( 'fragment' => 'References')
);
$read_more = l(
"Read more",
"readmore",
array( 'attributes' => array('target' => '_blank') )
);
$explanation = "Evaluating recommender algorithms is not an easy task, as
there are simply too many possible objective functions. However, since
recommender systems are widely used and many of them are available, something
is needed to determine which one to use<sup>$rs_introduction_link</sup>.<br/>
There are two basic types of evaluation metrics:
<ul>
<li>Accuracy of prediction<br/>
Evaluation metrics such as Mean Absolute Error (MAE) or Root Mean Squared Error
(RMSE) evaluate the algorithms in terms of their prediction accuracy.
</li>
<li>Accuracy of rank<br/>
Evaluation metrics such as Mean Reciprocal Rank (MRR) or Normalized Discounted
Cumulative Gain (NDCG) evaluate the algorithms in terms of how accurate their
predictions are with respect to the rank of the items (the prediction score is
ignored, only the rank is important).
</li>
</ul>
$read_more about evaluating recommender systems.";
return $explanation;
}
/**
* Explain the Mean Absolute Error
*/
function recsys_wb_explain_mae() {
$title = "<strong><h3>Mean Absolute Error</h3></strong>";
$explanation = "<div class='tex2jax'>";
$explanation .= "The Mean Absolute Error metric determines by how much the
average prediction is off. The formula is straightforward: ";
$mae = 'MAE(r) = {\sum_{t=1}^{|T|} { | pred(r,t) - rating(t) |} \over |T|}';
$explanation .= mathBlock($mae);
$explanation .= "Where " . mathInline('T') . " represents the whole test set, "
. mathInline('pred(r,t)') . " is the prediction of the recommender algorithm "
. mathInline('r') . " for the test-item " . mathInline('t') . " and "
. mathInline('rating(t)') . " is the
actual rating of test-item " . mathInline('t') . ". A test-item "
. mathInline('t') . " is unique as it represents the user-item combination,
therefore " . mathInline('rating(t) = rating(u,i)') . " which is the rating of
user " . mathInline('u') . " on item " . mathInline('i') . ".<br/>Consider the
following simple example (1 item, 5 users):";
$explanation .= recsys_wb_evaluation_example_accuracy_table();
$explanation .= "So the MAE would simply be: ";
$mae_example = '{|(4-3)| + |(2-4)| + |(3-3)| + |(5-4)| + |(2-3)| \over 5} =';
$mae_example .= ' 1';
$explanation .= mathBlock($mae_example);
$explanation .= "The MAE by itself does not help a lot. In our previous
example we had an MAE of 1. Is it good? Is it bad? Well, this depends on the
rating scale. On a scale from 0 to 100, 1 is a very good result (as the
algorithm prediction misses the actual rating on average by only 1%). However,
on a scale from 0 to 2, 1 would be very bad (as the algorithm prediction
misses the actual rating on average by 50%). So an MAE of 1.4 on a rating scale
from 0 to 10 is better than an MAE of 0.9 on a rating scale from 0 to 5. But
generally one can say: the smaller the MAE, the better the algorithm.";
return $title . $explanation . "</div>";
}
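/**
 * Illustrative helper (an added sketch, not part of the original module):
 * computes the MAE described above for two parallel arrays of algorithm
 * predictions and actual user ratings, indexed by test-item.
 */
function recsys_wb_example_mae(array $predictions, array $ratings) {
  $sum = 0;
  foreach ($predictions as $t => $prediction) {
    // Accumulate the absolute error |pred(r,t) - rating(t)| per test-item.
    $sum += abs($prediction - $ratings[$t]);
  }
  // Divide by the size of the test set |T|.
  return $sum / count($predictions);
}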
/**
* Explain the Root Mean Squared Error
*/
function recsys_wb_explain_rmse() {
$title = "<strong><h3>Root Mean Squared Error</h3></strong>";
$explanation = "<div class='tex2jax'>";
$explanation .= "The Root Mean Squared Error metric also determines by how much
the average prediction is off. The formula is quite similar to the Mean
Absolute Error, but it penalizes big errors over small ones:";
$rmse = 'RMSE(r) = \sqrt{\sum_{t=1}^{|T|} { (pred(r,t) - rating(t))^2 }
\over |T|}';
$explanation .= mathBlock($rmse);
$explanation .= "Where " . mathInline('T') . " represents the whole test set, "
. mathInline('pred(r,t)') . " is the prediction of the recommender algorithm "
. mathInline('r') . " for the test-item " . mathInline('t') . " and "
. mathInline('rating(t)') . " is the
actual rating of test-item " . mathInline('t') . ". A test-item "
. mathInline('t') . " is unique as it represents the user-item combination,
therefore " . mathInline('rating(t) = rating(u,i)') . " which is the rating of
user " . mathInline('u') . " on item " . mathInline('i') . ".<br/>Consider the
following simple example (1 item, 5 users):";
$explanation .= recsys_wb_evaluation_example_accuracy_table();
$explanation .= "So the RMSE would be: ";
$rmse_example = '\sqrt{(4-3)^2 + (2-4)^2 + (3-3)^2 + (5-4)^2 + (2-3)^2 \over';
$rmse_example .= ' 5} = 1.18';
$explanation .= mathBlock($rmse_example);
$explanation .= "As for the MAE, the RMSE by itself does not help a lot. In
our previous example we had an RMSE of 1.18. Is it good? Is it bad? Well, this
depends on the rating scale. On a scale from 0 to 100, 1.18 is a very good
result (as the algorithm prediction misses the actual rating on average by only
1.18%). However, on a scale from 0 to 2, 1.18 would be very bad (as the
algorithm prediction misses the actual rating on average by 59%). So an RMSE of
1.4 on a rating scale from 0 to 10 is better than an RMSE of 0.9 on a rating
scale from 0 to 5. But generally one can say: the smaller the RMSE, the better
the algorithm.";
return $title . $explanation . "</div>";
}
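/**
 * Illustrative helper (an added sketch, not part of the original module):
 * computes the RMSE described above for two parallel arrays of algorithm
 * predictions and actual user ratings, indexed by test-item.
 */
function recsys_wb_example_rmse(array $predictions, array $ratings) {
  $sum = 0;
  foreach ($predictions as $t => $prediction) {
    // Square the error (pred(r,t) - rating(t))^2 so that big misses are
    // penalized over small ones.
    $error = $prediction - $ratings[$t];
    $sum += $error * $error;
  }
  // Mean of the squared errors over |T|, then take the square root.
  return sqrt($sum / count($predictions));
}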
/**
* Explain the Mean Reciprocal Rank
*/
function recsys_wb_explain_mrr() {
$title = "<strong><h3>Mean Reciprocal Rank</h3></strong>";
$explanation = "<div class='tex2jax'>";
$mrr_citation_link = l('[3]', 'aboutthis', array( 'fragment' => 'References') );
$explanation .= "The Mean Reciprocal Rank metric determines how good the
recommender algorithm is at ranking good items first<sup>$mrr_citation_link
</sup>. The formula is really simple:" . mathBlock('MRR(r) = {1 \over i}');
$explanation .= "Where " . mathInline('i') . " is the rank of the first item
considered as good in the list of predictions produced by the recommender
algorithm " . mathInline('r') . ".<br/>Consider the following example (5 items,
1 user):";
$explanation .= recsys_wb_evaluation_example_rank_table();
$explanation .= "Let's say that a 'good' item is anything that is rated 4
or higher. In this example the MRR would be 0.5: the first item, which the
recommender algorithm predicted to be very good, is actually not that good;
the user rated it only 3, so it is not a good item. The second item, which the
recommender algorithm also predicted to be very good, actually is good: the
user rated it 4.5. As the second item is the first 'good' item, the MRR of
this recommender algorithm is " . mathInline('1 \over 2') . ".";
return $title . $explanation . "</div>";
}
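/**
 * Illustrative helper (an added sketch, not part of the original module):
 * computes the reciprocal rank 1/i of the first 'good' item in a list of
 * actual ratings ordered by the algorithm's predictions (best first).
 */
function recsys_wb_example_reciprocal_rank(array $ratings_in_predicted_order, $good = 4) {
  foreach (array_values($ratings_in_predicted_order) as $index => $rating) {
    if ($rating >= $good) {
      // Ranks are 1-based, so rank i corresponds to array index i - 1.
      return 1 / ($index + 1);
    }
  }
  // No good item anywhere in the list.
  return 0;
}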
/**
* Explain the NDCG
*/
function recsys_wb_explain_ndgc() {
$title = "<strong><h3>Normalized Discounted Cumulative Gain</h3></strong>";
$explanation = "<div class='tex2jax'>";
$ndcg_citation_link = l('[3]', 'aboutthis', array( 'fragment' => 'References') );
$explanation .= "The Normalized Discounted Cumulative Gain metric determines
how correct the algorithm's predictions are with respect to the rank in which
the items appear. It compares the ranked list of the recommender algorithm with
a perfectly sorted list of the items<sup>$ndcg_citation_link</sup>. The result
of this comparison will be smaller than or equal to 1, where a perfectly sorted
list gets exactly 1. So the closer to 1 the result is, the better the algorithm
sorts the items. The formula is as follows:";
$ndcg = 'NDCG(r) = {DCG(r) \over DCG(L_{perfect})}';
$dcg = 'DCG(r) = \sum_{t=1}^{|T|} { rating(t) \times discount(t)}';
$discount = 'discount(t) = {1 \over max(1,\log_2{(rank(t))})}';
$explanation .= mathBlock($ndcg) . mathBlock($dcg) . mathBlock($discount);
$explanation .= "Where " . mathInline('r') . " is the recommender algorithm, "
. mathInline('L_{perfect}') . " is the list of items perfectly sorted by the
users' ratings, " . mathInline('T') . " represents the whole test set, "
. mathInline('rank(t)') . " is the position of test-item " . mathInline('t')
. " in the list sorted by the algorithm's predictions and "
. mathInline('rating(t)') . " is the actual rating of test-item "
. mathInline('t') . ". A test-item " . mathInline('t') . " is unique as it
represents the user-item combination, therefore "
. mathInline('rating(t) = rating(u,i)') . " which is the rating of user "
. mathInline('u') . " on item " . mathInline('i') . ".<br/>As one can see, the
first ratings are far more important than the last ones (caused by the
logarithmic discount). This is because the first items are much more likely to
be selected by a user, so they should have more impact on the result.<br/>Let's
have a look at the following example (5 items, 1 user): ";
$explanation .= recsys_wb_evaluation_example_rank_table();
$explanation .= "In this example the rank of the items produced by the
algorithm is obviously Item 1, Item 2, Item 3, Item 4, Item 5. The "
. mathInline('L_{perfect}') . " however is Item 2, Item 3, Item 4, Item 1, Item
5. This leads to:";
$dcg_r = 'DCG(r) = 3 \times 1 + 4.5 \times 1 + 4.5 \times 0.63 + 4 \times
0.5 + 3 \times 0.43 = 13.63';
$dcg_l_perfect = 'DCG(L_{perfect}) = 4.5 \times 1 + 4.5 \times 1 + 4 \times
0.63 + 3 \times 0.5 + 3 \times 0.43 = 14.31';
$ndcg_r = 'NDCG(r) = {13.63 \over 14.31} = 0.95';
$explanation .= mathBlock($dcg_r) . mathBlock($dcg_l_perfect)
. mathBlock($ndcg_r);
return $title . $explanation . "</div>";
}
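/**
 * Illustrative helpers (added sketches, not part of the original module):
 * compute the DCG of a list of actual ratings ordered by the algorithm's
 * predictions, and the NDCG as the ratio against the perfectly sorted list
 * L_perfect.
 */
function recsys_wb_example_dcg(array $ratings) {
  $dcg = 0;
  foreach (array_values($ratings) as $index => $rating) {
    $rank = $index + 1;
    // discount(t) = 1 / max(1, log2(rank(t))).
    $dcg += $rating / max(1, log($rank, 2));
  }
  return $dcg;
}
function recsys_wb_example_ndcg(array $ratings_in_predicted_order) {
  $perfect = $ratings_in_predicted_order;
  // L_perfect: the same ratings sorted from best to worst.
  rsort($perfect);
  return recsys_wb_example_dcg($ratings_in_predicted_order)
    / recsys_wb_example_dcg($perfect);
}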
?>