In [97]:
import pandas as pd
ex1 = pd.read_csv('ex1.csv')
ex1 = ex1.rename(index=int, columns={"Unnamed: 0": "Value"})
diff = -0.014048786592986806
per_diff = -1.8152497147529003
diff = round(diff, 3)
per_diff = round(per_diff, 3)

## How to use this document
You can skip the code blocks unless you want to examine a particular calculation. The text walks step by step through a single table (`ex1.csv`), a worked example of what happens to surfaced hits for strikers and non-strikers during a 30% strike of ML-1M.

Some useful context to begin: for ML-1M, the difference in surfaced hits between SVD (fully personalized) and MovieMean (un-personalized) is 0.014, or 1.4% *of all* surfaced hits. This is a -1.815 *percent change* in surfaced hits. 

This will be important later for considering what these results mean in terms of "how far does a strike take performance towards un-personalized results".

In [98]:
print('Difference in % of surfaced hits', round(diff * 100, 1))
print('Percent difference in surfaced hits', per_diff)

Difference in % of surfaced hits -1.4
Percent difference in surfaced hits -1.815


One other useful piece of context: because we use 5-fold cross-validation, the maximum number of hits is equal to 115056.2 (the total hits in ML-1M divided by 5).
So to convert from surfaced hits to total hits we just multiply by 115056.2.

For instance, below we'll see that the surfaced hits (SH) value for all users with SVD is 77.393%.
This corresponds to 89k hits.

In [114]:
round(115056.2 * 0.77393, -2)

89000.0
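
The conversion above can also be wrapped in a small helper. This is only a sketch: the name `sh_to_hits` is made up for illustration, and the constant is the per-fold maximum quoted in the text.

```python
# Hypothetical helper; the constant is the per-fold maximum from the text above.
MAX_HITS_PER_FOLD = 115056.2  # total hits in ML-1M divided by 5 folds


def sh_to_hits(sh_percent):
    """Convert a surfaced-hits percentage (e.g. 77.393) into a hit count."""
    return MAX_HITS_PER_FOLD * sh_percent / 100


# Baseline SH for everyone with SVD, rounded to the nearest hundred hits.
baseline_hits = round(sh_to_hits(77.393), -2)
print(baseline_hits)
```

Any other SH value in the table can be passed through the same helper to express it as a hit count.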

First, let's look at the baseline Surfaced Hits values for our strikers, non-strikers, and the whole system ("everyone").

In [115]:
ex1[ex1.index == 0]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393


### Interpreting these baseline Surfaced Hits values
Without any strike action:
* the participants of the strike (30% of random users) were responsible for 23% of surfaced hits
* the non-participants were responsible for 54%
* together these values add up to about 77.39, the "everyone" value

In [116]:
round(23.227 + 54.167, 2)

77.39

We can verify that the strikers were contributing about 30% of surfaced hits in the pre-strike condition.

In [117]:
round(23.227 / 77.393 * 100, 2)

30.01

Next, let's look at the second row of this table, which shows how surfaced hits change after the strike.

### Surfaced Hits after strike

In [118]:
ex1[ex1.index.isin([0,1])]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393
1,1) SH after strike,22.77,53.917,76.687


Looks like surfaced hits decreased a bit for each group after the strike. We may also want to look directly at the *change* in hits.

To do so, we subtract row 0) from row 1); the resulting change will be negative.

### Change in Surfaced Hits

In [119]:
ex1[ex1.index.isin([0,1,2])]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393
1,1) SH after strike,22.77,53.917,76.687
2,2) change in SH,-0.456,-0.25,-0.706


Calculating the change in surfaced hits, it becomes more obvious that the strikers themselves experience more loss in hits than the non-strikers. Recall that strikers are getting un-personalized recommendations. This is a key take-away from this worked example.
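
As a rough check, row 2) can be approximately reproduced from the rounded values in rows 0) and 1). The dictionaries below are just hand-copied table values (not read from `ex1.csv`), so the result can differ from the table in the last digit.

```python
# Rounded values copied by hand from rows 0) and 1) of the table.
baseline = {"strikers": 23.227, "non-strikers": 54.167, "everyone": 77.393}
after_strike = {"strikers": 22.77, "non-strikers": 53.917, "everyone": 76.687}

# Change in SH = after-strike value minus baseline value, per group.
change = {
    group: round(after_strike[group] - baseline[group], 3)
    for group in baseline
}
print(change)
```

Because the inputs are already rounded to three decimals, the strikers value may be off by 0.001 from the -0.456 shown in the table.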

More specifically, from row 2), we can actually read the exact effect of the two data labor factors. The strikers column corresponds to the effect of strikers on themselves. The non-strikers column corresponds to the effect of strikers on non-strikers.

Interestingly, for a 30% boycott of ML-1M, the strikers getting (slightly worse than) un-personalized recommendations have about twice the effect of the non-strikers getting slightly worse recommendations.

We also verify that the effect on the whole system is equal to the sum of these two components.
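
A minimal sketch of both checks, using the rounded row 2) values: the two components should sum to the "everyone" change, and the strikers' component should be roughly twice the non-strikers' component.

```python
# Rounded values copied from row 2) of the table.
change_strikers = -0.456
change_non_strikers = -0.25
change_everyone = -0.706

# The system-wide change is the sum of the two group-level changes.
total = round(change_strikers + change_non_strikers, 3)

# How much larger is the strikers' own loss than the loss they cause for others?
ratio = round(change_strikers / change_non_strikers, 2)

print(total, ratio)
```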

Next, it will be useful to think about how this change in hits translates to *loss in personalization*. In other words, how far does the strike take performance towards un-personalized?

### Change in Surfaced Hits, normalized w.r.t un-personalized results

In [120]:
ex1[ex1.index.isin([0,1,2,3])]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393
1,1) SH after strike,22.77,53.917,76.687
2,2) change in SH,-0.456,-0.25,-0.706
3,"3) change in SH, normalized w.r.t un-personalized",32.471,17.771,50.241


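If we recall that the difference between SVD and MovieMean (i.e. the personalization gap) is equal to 1.4% of surfaced hits, it makes sense that a loss of 0.25 among non-participants takes the system 17.77% of the way towards un-personalized results. The quick check below uses the rounded values 0.25 and 1.4, so it lands at 17.86 rather than exactly 17.771.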

In [121]:
round(-0.250 / -1.4 * 100, 2)

17.86

One puzzling result: why isn't the change in hits for participants equal to 100% (since they are getting un-personalized recommendations)? This is because the strikers only comprise 30% of the population and we're still looking at total surfaced hits.

In other words, the 30% of strikers can at most take us about 30% of the way towards un-personalized results. If we divide the change in hits for strikers by the personalization gap, we see the amount is indeed about 30%.

In [122]:
round(-0.456 / -1.4 * 100, 2)

32.57
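
The two checks above use the gap rounded to 1.4, which is why they come out slightly above the table values. Here is a sketch that uses the unrounded `diff` from the first cell instead; the function name `normalize` is made up for illustration, and the changes are still the rounded row 2) values.

```python
# Unrounded SVD-vs-MovieMean gap from the first cell, converted to % of surfaced hits.
GAP = -0.014048786592986806 * 100


def normalize(change, gap=GAP):
    """Express a change in SH as a percentage of the personalization gap."""
    return change / gap * 100


non_strikers_norm = normalize(-0.25)   # row 3) reports 17.771
strikers_norm = normalize(-0.456)      # row 3) reports 32.471
print(round(non_strikers_norm, 2), round(strikers_norm, 2))
```

The results land much closer to row 3); the remaining small difference comes from the changes themselves being rounded to three decimals.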

To get more insight into what's happening from a group's perspective, it might be interesting to consider *percent change* instead of raw change. Specifically, this is percent change for each group, i.e. the change is relative to the group's baseline.

Critically, for an aggregate metric like surfaced hits, each group's baseline is proportional to its size.

We can look at row 0) to review what the baselines look like.

In [123]:
ex1[ex1.index.isin([0,1,2,3,4])]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393
1,1) SH after strike,22.77,53.917,76.687
2,2) change in SH,-0.456,-0.25,-0.706
3,"3) change in SH, normalized w.r.t un-personalized",32.471,17.771,50.241
4,4) % change in SH,-1.964,-0.461,-0.912
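
Row 4) can be approximately reproduced by dividing each group's change by that group's own baseline. This is a sketch using hand-copied, rounded values from rows 0) and 2), so the last digit may not match the table.

```python
# Rounded values copied from rows 0) and 2) of the table.
baseline = {"strikers": 23.227, "non-strikers": 54.167, "everyone": 77.393}
change = {"strikers": -0.456, "non-strikers": -0.25, "everyone": -0.706}

# Percent change is relative to each group's own baseline, not the system total.
pct_change = {
    group: round(change[group] / baseline[group] * 100, 3)
    for group in baseline
}
print(pct_change)
```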


Finally, we can also think of percent change normalized w.r.t un-personalized results.

In other words, how close to un-personalized results did the strike bring *each individual group*? Here, we'd expect the strikers to experience a change of around, or slightly above, 100%.

In [124]:
ex1[ex1.index.isin([0,1,2,3,4,5])]

Unnamed: 0,Value,strikers,non-strikers,everyone
0,0) baseline,23.227,54.167,77.393
1,1) SH after strike,22.77,53.917,76.687
2,2) change in SH,-0.456,-0.25,-0.706
3,"3) change in SH, normalized w.r.t un-personalized",32.471,17.771,50.241
4,4) % change in SH,-1.964,-0.461,-0.912
5,"5) % change in SH, normalized w.r.t un-persona...",108.199,25.399,50.241


Let's dig into why the normalized % change in hits is not the same as normalized change in hits for non-participants, but it is for "everyone".

Non-participants experienced a -0.461% change in hits. The aggregate percent difference between fully- and un-personalized results is about -1.8.

So a drop of 0.25% of hits corresponds to the system going 17.77% of the way towards un-personalized.

In [125]:
round(0.25 / 1.4 * 100, 2)

17.86

And a percent change of -0.461 corresponds to non-participants experiencing a change that's 25.4% of the way towards un-personalized.

In other words, the assumption is that because the whole group experiences a -1.8 percent change when going from fully- to un-personalized, a group that experiences a -1.8 percent change is getting roughly un-personalized results.

* Note the use of the word roughly here. If we wanted to be extra precise with the un-personalized analysis, we would actually run a third set of experiments where we run omniscient MovieMean for every user combination. However, since MovieMean is relatively static and these numbers are just used for context, this is unnecessary.

In [126]:
round(0.461 / 1.815 * 100, 2)

25.4
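
The same division can be applied to every column of row 4) to approximate row 5). This sketch uses the rounded `per_diff` from the first cell and the rounded row 4) values, so small differences from the table are expected.

```python
# Rounded percent change from SVD to MovieMean (per_diff in the first cell).
per_diff = -1.815

# Rounded values copied from row 4) of the table.
pct_change = {"strikers": -1.964, "non-strikers": -0.461, "everyone": -0.912}

# How far each group moved towards un-personalized results, as a percentage.
normalized_pct = {
    group: round(value / per_diff * 100, 2)
    for group, value in pct_change.items()
}
print(normalized_pct)
```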

Finally, we clarify the distinction here. When normalizing change in *units of surfaced hits*, we're always considering the whole system. The "how far towards un-personalized" question is about how close we are to getting the experience of a totally un-personalized system.

Specifically, the full question reads: "How far does a given group take the system towards un-personalized results?" This question is a bit unintuitive and so the results are not too useful.

To get the total normalized change in units of surfaced hits, we can just add the value for each group.

In [127]:
round(32.471 + 17.771, 2)

50.24

When normalizing percent change, we're only considering one group. The "how far towards un-personalized" question is about how close we are to getting the experience of an un-personalized system *for this group*.

Specifically, the full question here should read: "How far does a given group go towards un-personalized results for that group?"
This is probably the more intuitive question we might ask about change relative to un-personalized results. This is the number we report in Section 5.1.

To get the total normalized percent change, we can take a sum weighted by the size of each group.

In [128]:
round(108.2 * 0.3 + 25.4 * 0.7, 2)

50.24
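
One way to see why the weighted sum works is to weight each group's row 5) value by its share of baseline surfaced hits, which should recover that group's row 3) value. Using baseline shares in place of the 0.3 and 0.7 group sizes is an assumption of this sketch, relying on baselines being proportional to group size.

```python
# Rounded values copied from rows 0) and 5) of the table.
baseline = {"strikers": 23.227, "non-strikers": 54.167}
everyone_baseline = 77.393
normalized_pct = {"strikers": 108.199, "non-strikers": 25.399}

# Weight each group's normalized % change by its share of baseline hits.
contributions = {
    group: normalized_pct[group] * baseline[group] / everyone_baseline
    for group in baseline
}
weighted_total = sum(contributions.values())

print({g: round(v, 2) for g, v in contributions.items()}, round(weighted_total, 2))
```

The per-group contributions should sit close to the row 3) values (32.471 and 17.771), and their total close to the "everyone" value of 50.241.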

A final note: the summative properties explored here apply only to surfaced hits. 
Taking the same approach with traditional metrics like NDCG, it is impossible to sum various groups to get "everyone". 
Instead, you'd need weighted averages,