# Machine Learning with the Preferential Balloting Random Forest

### Disclaimer: 

This extra exercise is meant for entertainment purposes and is inspired by Nick Parker, who shared his project on GitHub. We modified his work and ran the model on our own movie dataset, producing results different from his (of course). ヽ(✿ﾟ▽ﾟ)ノ



> https://github.com/njparker1993/oscars_predictions/blob/master/3-machine_learning_preferential_ballot.ipynb


---
In this notebook, we will try to create a Random Forest setup by running many decorrelated, individual Decision Tree Classifiers on our movie dataset. We will be simulating the Oscar academy with a voting body of 10,000 members, hence the name Preferential Balloting Random Forest.

<img src="img/crowd.png" width="700" align="center"/>

# Let's begin!

In [1]:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier

In [2]:
data = pd.read_csv('data/movie_dataset_final.csv')
data.head()

Unnamed: 0,Year,Movie,Oscar_winner,Oscar_nominee,Runtime (min),Certificate,Directors,Actors,Metascore,IMDb_rating,...,Golden_Bear_winner,Golden_Bear_nominee,Golden_Lion_winner,Golden_Lion_nominee,PCA_winner,PCA_nominee,NYFCC_winner,NYFCC_nominee,OFCS_winner,OFCS_nominee
0,1999,Fight Club,0,0,139,R(A),David Fincher,"['Brad Pitt', 'Edward Norton', 'Meat Loaf', 'Z...",66,8.8,...,0,0,0,0,0,0,0,0,0,1
1,1999,The Matrix,0,0,136,PG,Lana Wachowski Lilly Wachowski,"['Keanu Reeves', 'Laurence Fishburne', 'Carrie...",73,8.7,...,0,0,0,0,0,0,0,0,0,0
2,1999,The Green Mile,0,1,189,R(A),Frank Darabont,"['Tom Hanks', 'Michael Clarke Duncan', 'David ...",61,8.6,...,0,0,0,0,0,0,0,0,0,0
3,1999,American Beauty,1,1,122,R(A),Sam Mendes,"['Kevin Spacey', 'Annette Bening', 'Thora Birc...",84,8.3,...,0,0,0,0,1,1,0,1,1,1
4,1999,The Sixth Sense,0,1,107,PG,M. Night Shyamalan,"['Bruce Willis', 'Haley Joel Osment', 'Toni Co...",64,8.1,...,0,0,0,0,0,0,0,0,0,0


We will train the model on all movies from 1999 to 2018 and try to predict only the Best Picture winner of 2019 from the 100 movies released that year.

In [3]:
# Training Set - Excluding 2019
train = data.loc[((data['Year'] >= 1999) & (data['Year'] < 2019))]
test = data.loc[(data['Year']==2019)]

print('training set contains:', train.shape[0], 'movies')
print('Predicting on:', test.shape[0], 'movies')

training set contains: 2000 movies
Predicting on: 100 movies


In [24]:
# The 100 movie candidates
candidates = list(test.Movie)
print(candidates)

['Joker', 'Avengers: Endgame', 'Once Upon a Time in Hollywood', 'Captain Marvel', 'Parasite', 'Star Wars: Episode IX - The Rise of Skywalker', 'Spider-Man: Far from Home', 'The Irishman', '1917', 'Knives Out', 'John Wick: Chapter 3 - Parabellum', 'Shazam!', 'Alita: Battle Angel', 'Marriage Story', 'Aladdin', 'Us', 'Ford v Ferrari', 'Glass', 'Jojo Rabbit', 'The Lion King', 'Toy Story 4', 'It Chapter Two', 'El Camino: A Breaking Bad Movie', 'Ad Astra', 'Fast & Furious Presents: Hobbs & Shaw', 'Uncut Gems', 'Midsommar', 'Dark Phoenix', 'Jumanji: The Next Level', 'Pokémon Detective Pikachu', 'Godzilla: King of the Monsters', 'Terminator: Dark Fate', '6 Underground', 'Rocketman', 'Zombieland: Double Tap', 'Frozen II', 'Triple Frontier', 'Doctor Sleep', 'Men in Black: International', 'The Lighthouse', 'How to Train Your Dragon: The Hidden World', 'The Gentlemen', 'Little Women', 'Yesterday', 'Murder Mystery', 'The Two Popes', 'Long Shot', 'Ready or Not', 'Booksmart', 'Escape Room', 'Pet Sema

---
In an attempt to make the balloting process fair and just, we will be including ALL the features available in our movie dataset.

In [5]:
features = [ 'Runtime (min)', 'Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime', 
             'Drama', 'Family','Fantasy', 'History', 'Horror', 'Musical', 'Mystery', 'Romance', 
             'Sci-Fi', 'Sport','Thriller', 'War', 'Western','Budget','Domestic (US) gross',
             'International gross','Worldwide gross','Metascore', 'IMDb_rating', 'IMDb_votes', 'RT_rating', 'RT_review',
             'GG_drama_winner', 'GG_drama_nominee', 'GG_comedy_winner', 'GG_comedy_nominee',
             'BAFTA_winner', 'BAFTA_nominee', 'DGA_winner', 'DGA_nominee',
             'PGA_winner', 'PGA_nominee', 'CCMA_winner', 'CCMA_nominee',
             'Golden_Palm_winner', 'Golden_Palm_nominee', 'Golden_Bear_winner', 'Golden_Bear_nominee',
             'Golden_Lion_winner', 'Golden_Lion_nominee', 'PCA_winner', 'PCA_nominee',
             'NYFCC_winner', 'NYFCC_nominee', 'OFCS_winner', 'OFCS_nominee']  

# Simulating a Voter using a Decision Tree

Each voter is a Decision Tree trained on a smaller part of the data, which produces that voter's ranking of the 100 movies of 2019. In other words, each voter ranks every 2019 movie from 1st place, 2nd place, all the way down to 100th place according to his/her own preference.
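As a rough illustration of how one voter's view of the data could be narrowed, the sketch below draws a random subset of features. The feature names are a small hand-picked sample of the dataset's columns and the seed is arbitrary. Unlike the `simulate_a_vote` function defined later in this notebook, this variant assumes sampling without replacement and at least one feature per voter.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# A small sample of column names from the dataset (illustration only)
features = ['Metascore', 'IMDb_rating', 'BAFTA_winner',
            'DGA_winner', 'PGA_winner', 'Budget']

# Upper bound on how many features a voter may look at (same 1.7 divisor as the notebook)
max_features = int(len(features) / 1.7)

# Assumption: every voter looks at 1..max_features distinct features
num_features = int(rng.integers(1, max_features + 1))
picked = rng.choice(features, size=num_features, replace=False)

# The 'Noise' column stands in for the voter's personal bias
voter_features = [str(name) for name in picked] + ['Noise']
print(voter_features)
```

Sampling without replacement avoids feeding the same column to a tree twice. Whether that is preferable to the notebook's bootstrapping-style draw is a design choice rather than a correction.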

In [6]:
voter1 = DecisionTreeClassifier(splitter='random',
                                max_depth=3,  # Low depth allows for some randomness
                                min_samples_leaf=3,
                                random_state = 92)

Each Decision Tree in our Preferential Balloting Random Forest needs to produce a ranked ballot rather than a classification per film. A tree in a traditional Random Forest outputs one class label (winner or not) for each film, whereas a tree in this Preferential Balloting Random Forest outputs an ordering of all the films.
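To make the contrast concrete, here is a minimal sketch on made-up toy data (a single feature and three hypothetical test films, none of which come from the movie dataset). It shows the label output of `predict` next to a ballot derived from `predict_proba`. The double `argsort` is a compact way to turn scores into ranks, and ties are broken arbitrarily here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, invented for illustration: one feature, binary "winner" label
X_train = np.array([[0.1], [0.2], [0.4], [0.6], [0.8], [0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.15], [0.85], [0.45]])  # three hypothetical films

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Traditional Random Forest tree: one class label per film
labels = tree.predict(X_test)

# Preferential-ballot tree: win probabilities turned into ranks (1 = favourite)
proba = tree.predict_proba(X_test)[:, 1]
ballot = (-proba).argsort().argsort() + 1

print('labels:', labels)
print('ballot:', ballot)
```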

In [7]:
def simulate_a_vote(model, train_df, to_predict_df, features):
    """
    This function creates, trains, and predicts with a DecisionTree to simulate an Academy voter.
    Each tree only sees a part of the data and gets Noise to decorrelate them from each other.
    The prediction is then ranked to create our ballot for Preferential Balloting
    """
    
    train = train_df.copy()
    test = to_predict_df.copy()
    
    # A noise column, randomly generated each time represents a voter's bias
    train.loc[:,'Noise'] = np.random.rand(train_df.shape[0])
    test.loc[:,'Noise'] = np.random.rand(to_predict_df.shape[0])

    # Looking at a random number of award shows (similar to bootstrapping)
    # This reflects a voter's attention to the season
    # num_features is how many of the features they care about;
    # it is drawn from 0 up to (but not including) int(len(features)/1.7),
    # and the features are sampled with replacement, so repeats can occur
    num_features = np.random.choice(int(len(features)/1.7))
    voter_features = list(np.random.choice(features, num_features)) + ['Noise']

    x = np.array(train[voter_features])
    y = np.array(train['Oscar_winner'])
    
    model.fit(x,y)
    
    # The voter's predicted win probabilities (predict_proba) drive the ranking
    ballot_clean = model.predict_proba(np.array(test[voter_features]))[:,1]
    # Add small random values to break up ties
    ballot = ballot_clean + np.random.rand(len(ballot_clean))/10000
    
    # Use np.argsort() to rank the films by probability
    # The Academy uses ranked votes to calculate the winner
    temp = ballot.argsort()
    ranks = np.empty_like(temp)
    ranks[temp] = np.arange(len(ballot))
    ranks = np.abs(ranks - len(ballot))
    return ranks
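The ranking step at the end of the function can be traced on a tiny example. The four scores below are hypothetical win probabilities, not model output; the steps mirror the `argsort` logic above.

```python
import numpy as np

scores = np.array([0.10, 0.80, 0.35, 0.60])  # hypothetical win probabilities

order = scores.argsort()                # film indices from lowest to highest score
ranks = np.empty_like(order)
ranks[order] = np.arange(len(scores))   # 0 = lowest score ... n-1 = highest score
ranks = len(scores) - ranks             # flip so that rank 1 = highest score

print(ranks)  # [4 1 3 2]
```

The film with the highest score (index 1) ends up in 1st place and the one with the lowest score (index 0) in last place.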

# Simulating the Entire Academy

By casting a vote many times, we can collect the ballots of the entire academy.

In [8]:
def simulate_voting_body(num_voters, model, train_df, to_predict_df, features):
    """
    Runs simulate_a_vote and collects ballots from an academy of num_voters size
    """
    collected_ballots = np.zeros((num_voters, to_predict_df.shape[0]))
    for i in range(num_voters):
        collected_ballots[i,:] = simulate_a_vote(model, train_df, to_predict_df, features)
    return collected_ballots

In [21]:
n = 3
print(f'Here is an example of a {n}-person Academy:')
print(simulate_voting_body(n, voter1, train, test, features))

Here is an example of a 3-person Academy:
[[  3. 100.   4.  48.  10.  73.  91.   7.   1.   6.  84.  97.  49.   5.
   75.  38.   8.  30.   2.  24.  92.  44.  95.  31.  90.  74.  35.  39.
   63.  28.  64.  14.  19.  68.  26.  42.  78.  54.  70.  21.  89.  67.
    9.  61.  96.  33.  13.  99.  69.  12.  51.  55.  36.  46.  40.  37.
   88.  56.  16.  41.  65.  85.  23.  83.  87.  27.  62.  11.  29.  52.
   98.  79.  17.  82.  25.  45.  76.  43.  58.  81.  77.  22.  66.  18.
   34.  32.  71.  93.  86.  57.  20.  53.  72.  59.  94.  47.  60.  80.
   15.  50.]
 [ 66.  28.   5.  65.   1.  97.  51.   3.   4.  15.  38.  49.  20.  24.
   60.  79.  26.  11.   2.  43.  33.  92.  90.  30.  91.  55.  67.  62.
   23.  18.  87.  40.   7.  94.  93.  39.  85.  46.  75.  58.  41.  99.
   47.  83.  19.  44.  72.  81.  50.  21.  77.   6.  73.  54.  96.  34.
   36.  53.  80.  74.  52.  89.  32.  29.  64.  22.  82.  10.  76.  12.
   59.  27.  78.  88.  48.  56.  86.  68.  63.  31.  14.  98.  25.  42.
   37.  6

So basically, each of these 3 voters ranked all 100 movies in order of preference.

In [10]:
def tally_votes(voting_body, list_of_nominees):
    # List of nominees must be in the same order as the vote index
    firsts = np.where(voting_body==1,1,0)
    tally = np.sum(firsts, axis = 0)
    tallied_votes_df = pd.DataFrame(tally, columns=['Votes']).T
    tallied_votes_df.columns = list_of_nominees
    return tallied_votes_df.T.sort_values('Votes', ascending = False)
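The tally counts only first-place rankings. A small sketch with five invented ballots and three placeholder film names shows the idea; it uses a pandas Series instead of the DataFrame returned by `tally_votes`, purely to keep the example short.

```python
import numpy as np
import pandas as pd

# Each row is one voter's ballot; each column is a film (1 = first choice)
ballots = np.array([[1, 2, 3],
                    [2, 1, 3],
                    [1, 3, 2],
                    [3, 1, 2],
                    [1, 2, 3]])
films = ['Film A', 'Film B', 'Film C']  # placeholder names

# Count how many voters ranked each film first
first_place = (ballots == 1).sum(axis=0)
tally = pd.Series(first_place, index=films, name='Votes').sort_values(ascending=False)
print(tally)
```

Film A collects 3 first-place votes, Film B collects 2, and Film C collects none.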

In [None]:
n = 1000
this_academy = simulate_voting_body(n, voter1, train, test, features)
print(f"Overall, this {n}-person academy's top picks look like this:")
plot_df1 = tally_votes(this_academy, candidates)

# Tiered Voting Changes
We start eliminating the film with the fewest first-place votes from the ballots and re-ranking the remaining films.

In [12]:
def remove_least(voting_body, list_of_nominees):
    """
    A function used for the elimination step of Preferential Balloting
    This function determines which film has the least #1 rankings and removes it
    """
    # List of nominees must be in the same order as the vote index
    firsts = np.where(voting_body==1,1,0)
    tally = np.sum(firsts, axis = 0)
    least_votes_index = np.argmin(tally)
    
    # Drop the column of the film with the fewest #1 rankings
    voting_body = np.delete(voting_body, least_votes_index, axis = 1)
    # pop() removes by position, which stays correct even if two films share a title
    list_of_nominees.pop(least_votes_index)
    return voting_body, list_of_nominees

In [13]:
def re_rank_ballots(voting_body):
    """
    Another function used for the elimination step of Preferential Balloting
    Takes a voting body (numpy array)
    Makes sure each row goes from 1 to shape[1]
    """
    re_ranked = np.zeros(voting_body.shape)
    for i in range(voting_body.shape[0]):
        temp = voting_body[i,:].argsort()
        ranks = np.empty_like(temp)
        ranks[temp] = np.arange(len(voting_body[i,:]))
        re_ranked[i,:] = ranks + 1
    return re_ranked

In [14]:
def run_one_round_of_eliminations(voting_body, list_of_nominees):
    """
    A function which runs one elimination step of Preferential Balloting 
    Takes in a Voting Body and List of Nominees and returns them
    with the film that has the fewest #1 votes removed
    """    
    voting_body, list_of_nominees = remove_least(voting_body, list_of_nominees)
    voting_body = re_rank_ballots(voting_body)
    return voting_body, list_of_nominees
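One elimination round can be followed by hand on a toy academy. The ballots and film names below are invented, and the row-wise double `argsort` is a compact stand-in for the loop in `re_rank_ballots`.

```python
import numpy as np

# Five voters ranking four placeholder films (1 = first choice)
ballots = np.array([[1, 2, 3, 4],
                    [2, 1, 4, 3],
                    [3, 1, 2, 4],
                    [1, 3, 2, 4],
                    [2, 4, 1, 3]])
films = ['Film A', 'Film B', 'Film C', 'Film D']

# Step 1: find the film with the fewest first-place votes
first_place = (ballots == 1).sum(axis=0)
loser = int(np.argmin(first_place))

# Step 2: remove its column and its name
remaining = np.delete(ballots, loser, axis=1)
remaining_films = films[:loser] + films[loser + 1:]

# Step 3: close the gaps so every ballot again runs from 1 to the number of films left
re_ranked = remaining.argsort(axis=1).argsort(axis=1) + 1

print(remaining_films)
print(re_ranked)
```

Film D has no first-place votes and is dropped. The second voter's ballot `[2, 1, 4, 3]` becomes `[2, 1, 3]` once the gap left by Film D is closed.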

In [15]:
# Dry run with 1,000 participants for just one round of eliminations
new_votes, new_noms = run_one_round_of_eliminations(this_academy, list(test.Movie))

print(len(new_noms), 'films remaining')
print('\nNew Standings:')
tally_votes(new_votes, new_noms)

99 films remaining

New Standings:


Unnamed: 0,Votes
1917,468
Once Upon a Time in Hollywood,217
Parasite,67
The Irishman,67
Joker,35
...,...
Maleficent: Mistress of Evil,0
Bombshell,0
Cold Pursuit,0
The Lego Movie 2: The Second Part,0


# Re-Rank until one movie has more than 50% of the vote
This is where the real simulation comes in. As soon as a movie holds more than 50% of the first-place votes, the winner has been determined and the balloting process is over. We put together all the previous functions to simulate the result of the 2019 Best Picture voting.

In [16]:
def run_preferential_voting(voting_body,list_of_nominees, show_steps = False):
    """
    Runs the process of Preferential Balloting on a voting_body(matrix)
    Terminates when one movie has greater than 50% of the total votes
    """   
    votes = tally_votes(voting_body, list_of_nominees)['Votes']
    top_pick_percent = votes.max() / votes.sum()
    
    # Keep eliminating while no movie holds a strict majority of first-place votes
    while top_pick_percent <= 0.5:
        voting_body,list_of_nominees = run_one_round_of_eliminations(voting_body, list_of_nominees)
        votes = tally_votes(voting_body, list_of_nominees)['Votes']
        top_pick_percent = votes.max() / votes.sum()
        
        if show_steps:
            print(tally_votes(voting_body, list_of_nominees),'\n')
            
    return voting_body, list_of_nominees
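The whole procedure, including how votes transfer after an elimination, fits in a short self-contained sketch. The function name `instant_runoff`, the ballots and the film names are all hypothetical. It follows the same idea as the functions above but is not a drop-in replacement for them.

```python
import numpy as np

def instant_runoff(ballots, films):
    """Eliminate the film with the fewest first-place votes until one film
    holds a strict majority. Ties for last place go to the lowest index."""
    ballots = np.asarray(ballots)
    films = list(films)
    while True:
        firsts = (ballots == 1).sum(axis=0)
        if firsts.max() > ballots.shape[0] / 2:
            return films[int(firsts.argmax())], dict(zip(films, firsts.tolist()))
        loser = int(firsts.argmin())
        ballots = np.delete(ballots, loser, axis=1)
        del films[loser]
        # Re-rank each ballot so it runs from 1 to the number of films left
        ballots = ballots.argsort(axis=1).argsort(axis=1) + 1

# Five voters, three placeholder films (columns: Film A, Film B, Film C)
toy_ballots = [[1, 2, 3],
               [1, 3, 2],
               [2, 1, 3],
               [3, 1, 2],
               [3, 2, 1]]

winner, final_tally = instant_runoff(toy_ballots, ['Film A', 'Film B', 'Film C'])
print(winner, final_tally)  # Film B {'Film A': 2, 'Film B': 3}
```

Film A and Film B start level on 2 first-place votes each. Film C is eliminated with 1 vote, and its supporter's second choice (Film B) tips the balance to 3 out of 5.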

---
# Let's Simulate the Oscars! ٩(｡・ω・｡)و

In [18]:
print('training set contains:', train.shape[0], 'movies')
print('Predicting on:', test.shape[0], 'movies')

# Pick the model we want for each random voter
voter_model = DecisionTreeClassifier(splitter='random',
                                     max_depth=3,
                                     min_samples_leaf=3,
                                     random_state = 92)

num_voters_academy = 10000
print(f'\nSimulating an Academy with {num_voters_academy} random voters.....')
academy_sim = simulate_voting_body(num_voters=num_voters_academy, model = voter_model, train_df = train, to_predict_df = test, features=features)

print('\nInitial Rankings:\n----------------------------------------')
print(tally_votes(academy_sim, list(test.Movie)),'\n')

print("Now we start eliminating films until one has more than 50% of the top picks:\n------------------------------------------------------")
final_ballot, final_films = run_preferential_voting(academy_sim, list(test.Movie), True)

training set contains: 2000 movies
Predicting on: 100 movies

Simulating an Academy with 10000 random voters.....

Initial Rankings:
----------------------------------------
                               Votes
1917                            4433
Once Upon a Time in Hollywood   2271
Parasite                         767
The Irishman                     691
Jojo Rabbit                      355
...                              ...
Annabelle Comes Home               4
Always Be My Maybe                 4
Escape Room                        3
Triple Frontier                    3
Dumbo                              2

[100 rows x 1 columns] 

Now we start eliminating films until one has more than 50% of the top picks:
----------------------------------------
                                            Votes
1917                                         4433
Once Upon a Time in Hollywood                2271
Parasite                                      767
The Irishman                    

                                       Votes
1917                                    4434
Once Upon a Time in Hollywood           2272
Parasite                                 767
The Irishman                             691
Jojo Rabbit                              356
...                                      ...
Fast & Furious Presents: Hobbs & Shaw      6
Midsommar                                  6
Pokémon Detective Pikachu                  6
6 Underground                              6
Happy Death Day 2U                         5

[84 rows x 1 columns] 

                               Votes
1917                            4434
Once Upon a Time in Hollywood   2272
Parasite                         767
The Irishman                     691
Jojo Rabbit                      356
...                              ...
Brightburn                         6
Child's Play                       6
The Dirt                           6
Ready or Not                       6
El hoyo                     

                               Votes
1917                            4436
Once Upon a Time in Hollywood   2273
Parasite                         769
The Irishman                     693
Jojo Rabbit                      356
...                              ...
Captive State                      8
Booksmart                          8
Hellboy                            8
Maleficent: Mistress of Evil       8
Klaus                              8

[68 rows x 1 columns] 

                               Votes
1917                            4436
Once Upon a Time in Hollywood   2273
Parasite                         769
The Irishman                     693
Jojo Rabbit                      356
...                              ...
Booksmart                          8
Captive State                      8
Hellboy                            8
Klaus                              8
Maleficent: Mistress of Evil       8

[67 rows x 1 columns] 

                               Votes
1917                     

                                            Votes
1917                                         4438
Once Upon a Time in Hollywood                2273
Parasite                                      773
The Irishman                                  694
Jojo Rabbit                                   358
Joker                                         333
Marriage Story                                158
The Lion King                                  60
Uncut Gems                                     53
Knives Out                                     43
Avengers: Endgame                              42
Ford v Ferrari                                 40
Little Women                                   31
Ad Astra                                       28
Us                                             28
The Dead Don't Die                             28
Midway                                         25
Rocketman                                      25
Yesterday                                      25


                                            Votes
1917                                         4440
Once Upon a Time in Hollywood                2274
Parasite                                      773
The Irishman                                  694
Jojo Rabbit                                   358
Joker                                         333
Marriage Story                                158
The Lion King                                  62
Uncut Gems                                     53
Knives Out                                     44
Avengers: Endgame                              43
Ford v Ferrari                                 40
Little Women                                   31
The Dead Don't Die                             29
Ad Astra                                       29
Us                                             28
Rocketman                                      26
Yesterday                                      26
Midway                                         25


                                            Votes
1917                                         4440
Once Upon a Time in Hollywood                2274
Parasite                                      773
The Irishman                                  696
Jojo Rabbit                                   358
Joker                                         335
Marriage Story                                160
The Lion King                                  63
Uncut Gems                                     54
Avengers: Endgame                              47
Knives Out                                     44
Ford v Ferrari                                 42
Little Women                                   31
Ad Astra                                       29
The Dead Don't Die                             29
Us                                             28
Yesterday                                      27
Rocketman                                      26
Midway                                         25


                                            Votes
1917                                         4440
Once Upon a Time in Hollywood                2274
Parasite                                      773
The Irishman                                  698
Jojo Rabbit                                   360
Joker                                         336
Marriage Story                                165
The Lion King                                  63
Uncut Gems                                     55
Avengers: Endgame                              47
Knives Out                                     45
Ford v Ferrari                                 43
Little Women                                   34
Yesterday                                      30
Us                                             29
The Dead Don't Die                             29
Ad Astra                                       29
Rocketman                                      26
Midway                                         25


                                            Votes
1917                                         4441
Once Upon a Time in Hollywood                2275
Parasite                                      775
The Irishman                                  699
Jojo Rabbit                                   361
Joker                                         338
Marriage Story                                166
The Lion King                                  64
Uncut Gems                                     56
Avengers: Endgame                              49
Knives Out                                     47
Ford v Ferrari                                 44
Little Women                                   35
The Dead Don't Die                             31
Us                                             31
Yesterday                                      30
Ad Astra                                       29
Rocketman                                      27
The King                                       26


                                            Votes
1917                                         4441
Once Upon a Time in Hollywood                2275
Parasite                                      777
The Irishman                                  702
Jojo Rabbit                                   362
Joker                                         341
Marriage Story                                167
The Lion King                                  67
Uncut Gems                                     59
Avengers: Endgame                              50
Knives Out                                     49
Ford v Ferrari                                 46
Little Women                                   35
The Dead Don't Die                             33
Rocketman                                      33
Us                                             32
Ad Astra                                       31
Yesterday                                      30
The King                                       28


                                            Votes
1917                                         4444
Once Upon a Time in Hollywood                2283
Parasite                                      778
The Irishman                                  708
Jojo Rabbit                                   364
Joker                                         342
Marriage Story                                170
The Lion King                                  69
Uncut Gems                                     62
Knives Out                                     52
Avengers: Endgame                              51
Ford v Ferrari                                 47
Little Women                                   38
Rocketman                                      38
The Dead Don't Die                             36
Yesterday                                      34
Us                                             33
Ad Astra                                       32
The Lighthouse                                 29


                                     Votes
1917                                  4451
Once Upon a Time in Hollywood         2284
Parasite                               785
The Irishman                           714
Jojo Rabbit                            365
Joker                                  350
Marriage Story                         178
The Lion King                           69
Uncut Gems                              65
Ford v Ferrari                          57
Knives Out                              56
Avengers: Endgame                       55
Little Women                            45
Rocketman                               42
The Dead Don't Die                      40
Yesterday                               38
Us                                      36
Dolemite Is My Name                     35
The King                                35
Midway                                  34
Ad Astra                                34
The Lighthouse                          34
Good Boys  

                               Votes
1917                            4460
Once Upon a Time in Hollywood   2298
Parasite                         796
The Irishman                     723
Jojo Rabbit                      378
Joker                            369
Marriage Story                   186
Uncut Gems                        76
The Lion King                     75
Knives Out                        69
Ford v Ferrari                    68
Avengers: Endgame                 64
Little Women                      56
Rocketman                         54
The King                          53
The Dead Don't Die                52
Us                                50
Yesterday                         46
Ad Astra                          43
Dolemite Is My Name               43
The Lighthouse                    41 

                               Votes
1917                            4462
Once Upon a Time in Hollywood   2300
Parasite                         799
The Irishman                     726

Jojo Rabbit                      511 

                               Votes
1917                            4678
Once Upon a Time in Hollywood   2548
Parasite                        1168
The Irishman                    1007
Joker                            599 

                               Votes
1917                            4809
Once Upon a Time in Hollywood   2694
Parasite                        1345
The Irishman                    1152 

                               Votes
1917                            5256
Once Upon a Time in Hollywood   3028
Parasite                        1716 



In [19]:
tally_votes(final_ballot, final_films)

Unnamed: 0,Votes
1917,5256
Once Upon a Time in Hollywood,3028
Parasite,1716


In [22]:
# tally_votes sorts by votes in descending order, so the first index label is the winner
winner = tally_votes(final_ballot, final_films).index[0]
print(f'And the Oscar goes to...\n🎉🏆 {winner} 🏆🎉')

And the Oscar goes to...
🎉🏆 1917 🏆🎉


Not bad!! The actual Best Picture winner of 2019, Parasite, ranked 3rd in our preferential balloting model. We will take it as a success! In conclusion, this Random Forest setup simulated the Oscar academy and replaced the RF's usual vote-counting process with preferential balloting.