# Analyzing Shreddit's Q2 Top 5 voting

This started out as a curiosity. I wanted to know what it would take to combine a bunch of "Top X" lists and then ask questions of the data, like "What thing was number one the most?" or "If the votes are weighted, what does the actual top X look like?" Then I remembered that Shreddit had just held a vote. ;)

This isn't a scientifically accurate analysis rooted in best practices. But I'm also just getting started with data analysis. So there's that.

```python
# set up all the data for the rest of the notebook
import json
from collections import Counter
from itertools import chain
from IPython.display import HTML

def vote_table(votes):
    """Render a crappy HTML table for easy display. I'd use Pandas, but that seems like
    complete overkill for this simple task.
    """
    base_table = """
    <table>
        <tr><td>Position</td><td>Album</td><td>Votes</td></tr>
        {}
    </table>
    """

    base_row = "<tr><td>{0}</td><td>{1}</td><td>{2}</td></tr>"
    # number rows from 1 in the order given (most_common() is already sorted by count)
    vote_rows = [base_row.format(idx, name, vote) for idx, (name, vote) in enumerate(votes, 1)]
    return HTML(base_table.format('\n'.join(vote_rows)))

# each ballot is a list of up to five lowercased "band - album" strings;
# utf-8 is spelled out because album names like Misþyrming aren't ASCII
with open('shreddit_q2_votes.json', 'r', encoding='utf-8') as fh:
    ballots = json.load(fh)

# album -> vote count, as tallied by hand in the thread
with open('tallied_votes.json', 'r', encoding='utf-8') as fh:
    tallied = Counter(json.load(fh))

# count every appearance equally, regardless of its position on the ballot
equal_placement_ballots = Counter(chain.from_iterable(ballots))
```

## Equal Placement Ballots

The equal placement ballot treats every position on a ballot as equal to every other. Since that is how the voting was designed, it makes the most sense to look at it first. My numbers differ a bit from the thread's. /u/kaptain_carbon tallied by hand, while I copy-pasted the ballots manually (regex is hard) and then had to massage some of the data (fixing names and the like), so differences are to be expected. Also, all the data in my set is lowercased so that the same album is always counted under one name. My analysis also includes submissions from *after* voting closed, mostly because I was too lazy to check dates.
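
The normalization step is what makes equal placement counting work. Once names are normalized, `Counter(chain.from_iterable(...))` counts each album once per appearance, wherever it sits on the ballot. Below is a minimal sketch, assuming ballots are lists of "band - album" strings. The `normalize` helper and the sample ballots are made up for illustration. The sketch also uses `str.casefold()` plus whitespace collapsing, which is a bit more aggressive than the plain lowercasing described above.

```python
from collections import Counter
from itertools import chain

def normalize(name):
    """Hypothetical helper: fold case and collapse stray whitespace so that
    'Elder - Lore' and ' elder  -  lore' count as the same album."""
    return " ".join(name.casefold().split())

# Made-up ballots for illustration only, not the real Shreddit data.
sample_ballots = [
    ["Elder - Lore", "Enslaved - In Times"],
    [" elder  -  lore", "Melechesh - Enki"],
    ["ELDER - LORE"],
]

# Equal placement: every appearance counts once, regardless of position.
counts = Counter(normalize(name) for name in chain.from_iterable(sample_ballots))
print(counts.most_common(1))  # [('elder - lore', 3)]
```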

I'm also playing fast and loose with items that end up with the same total, rather than doing the "right thing" and giving them the same position. So, there's that.
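
For reference, the "right thing" is usually standard competition ranking. Tied items share a position, and the next item skips ahead by the number of ties (1, 1, 3, ...). Below is a minimal sketch of that, assuming a `Counter` like the ones in this notebook. `competition_ranks` is a hypothetical helper that the notebook itself doesn't use. The sample totals are the first five rows of the thread's table, keyed by band for brevity.

```python
from collections import Counter

def competition_ranks(counter):
    """Return (rank, name, total) tuples using standard competition
    ("1224") ranking: equal totals share a rank, and the next distinct
    total takes its 1-based position in the sorted list."""
    ranked = []
    prev_total = None
    rank = 0
    for idx, (name, total) in enumerate(counter.most_common(), 1):
        if total != prev_total:
            rank = idx  # a new total starts at its own list position
            prev_total = total
        ranked.append((rank, name, total))
    return ranked

# First five rows of the thread's table (33, 33, 28, 28, 25).
tally = Counter({"sulphur aeon": 33, "misþyrming": 33, "leviathan": 28,
                 "blind guardian": 28, "elder": 25})
for rank, name, total in competition_ranks(tally):
    print(rank, name, total)
```

To use this in the notebook, `vote_table` would need to take the rank from these tuples instead of from `enumerate`.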

Here's the top ten from the table in the post:

```python
vote_table(tallied.most_common(10))
```

| Position | Album | Votes |
|---|---|---|
| 1 | Sulphur Aeon - Gateway to the Antisphere | 33 |
| 2 | Misþyrming - Söngvar elds og óreiðu | 33 |
| 3 | Leviathan - Scar Sighted | 28 |
| 4 | Blind Guardian - Beyond the Red Mirror | 28 |
| 5 | Elder - Lore | 25 |
| 6 | Enslaved - In Times | 25 |
| 7 | High on Fire - Lumineferous | 16 |
| 8 | Melechesh - Enki | 15 |
| 9 | Visigoth - The Revenant King | 15 |


And here's the top ten from my computed tally:

```python
vote_table(equal_placement_ballots.most_common(10))
```

| Position | Album | Votes |
|---|---|---|
| 1 | misþyrming - söngvar elds og óreiðu | 30 |
| 2 | sulphur aeon - gateway to the antisphere | 29 |
| 3 | blind guardian - beyond the red mirror | 27 |
| 4 | leviathan - scar sighted | 25 |
| 5 | elder - lore | 25 |
| 6 | enslaved - in times | 24 |
| 7 | deathhammer - evil power | 16 |
| 8 | melechesh - enki | 15 |
| 9 | visigoth - the revenant king | 15 |


## Weighted Tally Ballot

But that's boring. What if we pretended for a second that everyone submitted a ballot with the albums actually ranked one through five? What would the top ten look like then? There are a few ways to figure this out. My first thought was to assign each vote a number from 1 to 5 based on its position and then look for the lowest sum. The problem is that an album that appears only once gets a tiny sum and would be treated as the most preferred. That won't work. Counting backwards from five to one for each vote and then taking the largest total probably would:

```python
weighted_ballot = Counter()

for ballot in ballots:
    # first place on a ballot is worth 5 points, second 4, ..., fifth 1
    for item, weight in zip(ballot, range(5, 0, -1)):
        weighted_ballot[item] += weight
```
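
To see why the lowest-sum idea breaks, here is a toy comparison of the two scoring schemes on three made-up ballots (not the real data). The album listed on every ballot should come out on top. Under the lowest-sum scheme, though, the one-off pick wins.

```python
from collections import Counter

# Made-up ballots: "a" is on every ballot, "z" shows up once, at the top.
toy_ballots = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["z", "a", "b"],
]

# Rejected idea: score each vote by its position (1 = best), lowest sum wins.
position_sum = Counter()
for ballot in toy_ballots:
    for position, item in enumerate(ballot, 1):
        position_sum[item] += position

# Approach used above: count down from five, highest total wins, so both
# appearing more often and placing higher push an album's total up.
descending = Counter()
for ballot in toy_ballots:
    for item, weight in zip(ballot, range(5, 0, -1)):
        descending[item] += weight

print(min(position_sum, key=position_sum.get))  # z  (sum of 1)
print(descending.most_common(1))                # [('a', 13)]
```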

Because `zip` stops at the end of the shorter input, this also handles ballots that aren't full (fewer than five votes). Those make up a surprisingly non-trivial share of the ballots:

```python
# fraction of ballots with fewer than five albums (True counts as 1 in sum())
sum(len(ballot) < 5 for ballot in ballots) / len(ballots)
```

0.1125
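
Here is a quick check of that `zip` behaviour on a made-up three-album ballot. A short ballot uses only the top of the 5..1 range. The same short-ballot share is also computed on toy data, just to show the arithmetic.

```python
# zip() stops as soon as either iterable runs out, so a three-album ballot
# only consumes the weights 5, 4 and 3.
short_ballot = ["leviathan - scar sighted", "elder - lore", "enslaved - in times"]
weights = dict(zip(short_ballot, range(5, 0, -1)))
print(weights)
# {'leviathan - scar sighted': 5, 'elder - lore': 4, 'enslaved - in times': 3}

# Toy ballots (not the real data): one of four has fewer than five entries.
toy_ballots = [["x"] * 5, ["x"] * 3, ["x"] * 5, ["x"] * 5]
short_share = sum(len(b) < 5 for b in toy_ballots) / len(toy_ballots)
print(short_share)  # 0.25
```

One consequence of this design is that an album's points depend only on how far from the top it was listed. A third pick on a three-album ballot scores the same 3 points as a third pick on a full ballot.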

Anyway, what does the top ten look like with weighted votes?

```python
vote_table(weighted_ballot.most_common(10))
```

| Position | Album | Votes |
|---|---|---|
| 1 | misþyrming - söngvar elds og óreiðu | 105 |
| 2 | sulphur aeon - gateway to the antisphere | 100 |
| 3 | leviathan - scar sighted | 98 |
| 4 | blind guardian - beyond the red mirror | 93 |
| 5 | elder - lore | 90 |
| 6 | enslaved - in times | 76 |
| 7 | visigoth - the revenant king | 52 |
| 8 | melechesh - enki | 50 |
| 9 | deathhammer - evil power | 50 |


Hm, it's not actually all that different. Some albums move around a little. Deathhammer, which sits at 22nd in the thread's tally, lands in the top ten with this method. But overall, the general spread is pretty much the same.

It's also interesting to compare each album's position in the weighted tally with its position in the thread. There are some big differences between the two. They come both from the different voting method and from my including submissions from after voting closed. There's also a missing band. :?

```python
# map each album (lowercased to match my data) to its position in the thread's tally
regular_tally_spots = {name.lower(): pos for pos, (name, _) in enumerate(tallied.most_common(), 1)}

base_table = """
<table>
    <tr><td>Album</td><td>Regular Spot</td><td>Weighted Spot</td></tr>
    {}
</table>
"""
base_row = "<tr><td>{0}</td><td>{1}</td><td>{2}</td></tr>"

rows = [base_row.format(name, regular_tally_spots[name], pos)
        for pos, (name, _) in enumerate(weighted_ballot.most_common(), 1)
        # some albums didn't make it, like Arcturian D:
        if name in regular_tally_spots]

HTML(base_table.format('\n'.join(rows)))
```

| Album | Regular Spot | Weighted Spot |
|---|---|---|
| misþyrming - söngvar elds og óreiðu | 2 | 1 |
| sulphur aeon - gateway to the antisphere | 1 | 2 |
| leviathan - scar sighted | 3 | 3 |
| blind guardian - beyond the red mirror | 4 | 4 |
| elder - lore | 5 | 5 |
| enslaved - in times | 6 | 6 |
| visigoth - the revenant king | 9 | 7 |
| melechesh - enki | 8 | 8 |
| deathhammer - evil power | 22 | 9 |
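
The comprehension above quietly skips albums that never appear in the thread's tally. The small sketch below lists those albums explicitly and computes how far each album moves between the two rankings. It uses toy stand-ins for `tallied` and `weighted_ballot`. The three real album totals come from the tables above, and `band x - album y` is a made-up placeholder for a missing album.

```python
from collections import Counter

# Toy stand-ins for the notebook's `tallied` and `weighted_ballot` counters.
thread_tally = Counter({"Sulphur Aeon - Gateway to the Antisphere": 33,
                        "Misþyrming - Söngvar elds og óreiðu": 33,
                        "Leviathan - Scar Sighted": 28})
weighted = Counter({"misþyrming - söngvar elds og óreiðu": 105,
                    "sulphur aeon - gateway to the antisphere": 100,
                    "leviathan - scar sighted": 98,
                    "band x - album y": 3})  # made-up count

thread_spots = {name.lower(): pos for pos, (name, _) in enumerate(thread_tally.most_common(), 1)}
weighted_spots = {name: pos for pos, (name, _) in enumerate(weighted.most_common(), 1)}

# Albums that got weighted points but never made the thread's table.
missing = sorted(set(weighted_spots) - set(thread_spots))

# Positive shift means the album climbs when positions are weighted.
shifts = {name: thread_spots[name] - weighted_spots[name]
          for name in weighted_spots if name in thread_spots}
print(missing)  # ['band x - album y']
print(shifts)
```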


## What album appeared at number one most often?

Another question I've been pondering is, "How do you figure out what thing appears at number one most often?" Again, this is assuming everyone submitted a ballot with the intention of it being read as ranked. Turns out, doing this isn't that hard either:

```python
number_one = Counter(b[0] for b in ballots)
vote_table(number_one.most_common(10))
```

| Position | Album | Votes |
|---|---|---|
| 1 | leviathan - scar sighted | 12 |
| 2 | sulphur aeon - gateway to the antisphere | 11 |
| 3 | elder - lore | 10 |
| 4 | misþyrming - söngvar elds og óreiðu | 10 |
| 5 | blind guardian - beyond the red mirror | 9 |
| 6 | enslaved - in times | 6 |
| 7 | visigoth - the revenant king | 5 |
| 8 | melechesh - enki | 5 |
| 9 | high on fire - lumineferous | 4 |


This paints a slightly different picture of the top ten. While the names are largely the same, Scar Sighted was picked as the top album most often, despite landing third or fourth under the other methods. And Misþyrming is tied for third (the table says 4; again, fast and loose with ties) despite being the top choice in both of my other tallies.
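
The number-one count is really just one column of a bigger table: how often each album landed at each ballot position. The small sketch below uses made-up ballots with the same list-of-strings shape. `position_counts` is a hypothetical name. Its position-1 entries reproduce a `number_one`-style count.

```python
from collections import Counter, defaultdict

# Made-up ballots for illustration only.
toy_ballots = [
    ["leviathan", "elder", "misþyrming"],
    ["misþyrming", "leviathan"],
    ["leviathan", "misþyrming", "elder"],
]

# position_counts[album][pos] = number of ballots listing `album` at position `pos`
position_counts = defaultdict(Counter)
for ballot in toy_ballots:
    for pos, album in enumerate(ballot, 1):
        position_counts[album][pos] += 1

# Pulling out position 1 gives the same thing as counting b[0] for each ballot.
number_one = Counter({album: counts[1]
                      for album, counts in position_counts.items() if counts[1]})
print(number_one.most_common())                         # [('leviathan', 2), ('misþyrming', 1)]
print(sorted(position_counts["misþyrming"].items()))    # [(1, 1), (2, 1), (3, 1)]
```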

## The Takeaway

There are lots of different ways to look at the ballots and to tally them. Weighted voting is certainly more interesting than straight-up counting votes the usual way.

Originally, I had wondered whether I'd need something along the lines of Instant Runoff Voting, or data processing packages like Pandas, NumPy or SciPy. But for basic prodding and poking, it turns out the stdlib is just fine.
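
For the curious, here is roughly what the Instant Runoff Voting route could look like with nothing but the stdlib. This is a minimal single-winner sketch. Like the weighted tally, it assumes ballots are ranked. The function name, the made-up ballots, and the alphabetical tie-break for elimination are all assumptions rather than part of the analysis above.

```python
from collections import Counter
from itertools import chain

def instant_runoff(ballots):
    """Return the instant-runoff winner of a list of ranked ballots.

    Each round, every ballot counts for its highest-ranked album that is
    still in the race. An album with a strict majority of those votes wins;
    otherwise the album with the fewest votes is eliminated. Ties for last
    place are broken alphabetically, which is an arbitrary choice.
    """
    remaining = set(chain.from_iterable(ballots))
    while remaining:
        firsts = Counter()
        for ballot in ballots:
            for album in ballot:
                if album in remaining:
                    firsts[album] += 1
                    break  # only the top surviving choice counts
        live_votes = sum(firsts.values())
        leader, votes = firsts.most_common(1)[0]
        if votes * 2 > live_votes:
            return leader
        # firsts[album] is 0 for albums nobody currently ranks first
        loser = min(remaining, key=lambda album: (firsts[album], album))
        remaining.discard(loser)
    return None  # only reached when there are no ballots at all

toy_ballots = [["a", "b"]] * 3 + [["b", "c"]] * 2 + [["c", "b"]] * 2
print(Counter(b[0] for b in toy_ballots).most_common(1))  # [('a', 3)]
print(instant_runoff(toy_ballots))                        # c
```

On these toy ballots, "a" has the most first-place votes. Once "b" is eliminated, though, its voters' second choice gives "c" a majority. That kind of shift is invisible to a plain number-one count.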

Also: there's a lot of awesome music on these lists that I haven't listened to at all this year (I've been tied up with [Peace is the Mission](https://www.youtube.com/watch?v=Z4TwbrihsNw) for the last few weeks, too, sorry guys).

## The full tables

Because someone will ask for them, here are the full tables from my analysis:

```python
# regular tallying
vote_table(equal_placement_ballots.most_common())
```

| Position | Album | Votes |
|---|---|---|
| 1 | misþyrming - söngvar elds og óreiðu | 30 |
| 2 | sulphur aeon - gateway to the antisphere | 29 |
| 3 | blind guardian - beyond the red mirror | 27 |
| 4 | leviathan - scar sighted | 25 |
| 5 | elder - lore | 25 |
| 6 | enslaved - in times | 24 |
| 7 | deathhammer - evil power | 16 |
| 8 | melechesh - enki | 15 |
| 9 | visigoth - the revenant king | 15 |


```python
# weighted ballot
vote_table(weighted_ballot.most_common())
```

| Position | Album | Votes |
|---|---|---|
| 1 | misþyrming - söngvar elds og óreiðu | 105 |
| 2 | sulphur aeon - gateway to the antisphere | 100 |
| 3 | leviathan - scar sighted | 98 |
| 4 | blind guardian - beyond the red mirror | 93 |
| 5 | elder - lore | 90 |
| 6 | enslaved - in times | 76 |
| 7 | visigoth - the revenant king | 52 |
| 8 | melechesh - enki | 50 |
| 9 | deathhammer - evil power | 50 |


```python
# number one count
vote_table(number_one.most_common())
```

| Position | Album | Votes |
|---|---|---|
| 1 | leviathan - scar sighted | 12 |
| 2 | sulphur aeon - gateway to the antisphere | 11 |
| 3 | elder - lore | 10 |
| 4 | misþyrming - söngvar elds og óreiðu | 10 |
| 5 | blind guardian - beyond the red mirror | 9 |
| 6 | enslaved - in times | 6 |
| 7 | visigoth - the revenant king | 5 |
| 8 | melechesh - enki | 5 |
| 9 | high on fire - lumineferous | 4 |
