# Open Meta Team Ranker

So here I've created an open meta team ranker based on PvPoke data from the top 55 ranked great league mons on PvPoke.com. All of this is done in Julia, but hopefully these little notes clarify what the ranker is doing. I recommend running everything first and then reading through this. Some sections take a while to run, mostly the initial data collection (80ish seconds for 45 pokemon) and the first histogram, which can take a few minutes.

## Getting Started

### Installing Packages

Here I'm just loading the packages I need: CSV and DataFrames for reading the data, Plots for the figures, Distributions for random distributions to model uncertainty, and ProgressMeter for a progress meter in the summary stats function. Distributed, SharedArrays, and Mmap are there for the parallel run and the memory-mapped results matrix. Basically, I'm just grabbing some tools that make my life easier programming.

In [1]:
using CSV, Plots, DataFrames, Distributions, ProgressMeter, Distributed, SharedArrays, Mmap
gr();

### Reading Data

I've got a folder that contains the matrix battle CSV from PvPoke.

In [2]:
# Recent CSV.jl versions need the sink type as the second argument
rankings = CSV.read("open-ultra.csv", DataFrame)
const numMons = nrow(rankings)
rankings

| Row | Column1 | Giratina (Origin) SC+SB/OW | Giratina (Altered) SC+DC/AP |
|---|---|---|---|
| | String | Int64 | Int64 |
| 1 | Giratina (Origin) SC+SB/OW | 500 | 564 |
| 2 | Giratina (Altered) SC+DC/AP | 435 | 500 |
| 3 | Cresselia C+FS/M | 248 | 586 |
| 4 | Registeel LO+FC/FB | 503 | 583 |
| 5 | Heracross C+CC/M | 121 | 189 |
| 6 | Mew SC+O/P | 398 | 348 |
| 7 | Regirock LO+SE/FB | 475 | 393 |
| 8 | Swampert MS+HC/S | 350 | 393 |
| 9 | Muk (Alolan) S+GS/DP | 536 | 686 |
| 10 | Mewtwo (Armored) C+FS/RS | 196 | 414 |


### Colors

Just defining some colors so that things look pretty. However, there aren't cup typings this time, so it's fairly arbitrary which typing I choose to determine the color of each mon. For Alolan Ninetales, for instance, I think of ice first, and thus that is the color of Alolan Ninetales in this model. Feel free to change these around as you like, but given that they are primarily an aesthetic choice, I elected not to put a whole lot of thought into which color each mon gets.

In [3]:
opacity = 0.7
grass = RGBA(94/255,189/255,91/255, opacity); dragon = RGBA(14/255,104/255,184/255, opacity); dark = RGBA(86/255,86/255,99/255, opacity); normal = RGBA(153/255,159/255,161/255, opacity); fire = RGBA(254/255,163/255,84/255, opacity); ground = RGBA(212/255,141/255,91/255, opacity); poison = RGBA(193/255,98/255,212/255, opacity); rock = RGBA(208/255,196/255,142/255, opacity); ghost = RGBA(89/255,107/255,181/255, opacity); psychic = RGBA(245/255,126/255,121/255, opacity); ice = RGBA(120/255,212/255,192/255, opacity); water = RGBA(86/255,158/255,222/255, opacity); fighting = RGBA(213/255,63/255,91/255, opacity); steel = RGBA(82/255,142/255,160/255, opacity); fairy = RGBA(240/255,152/255,228/255, opacity); flying = RGBA(148/255,171/255,225/255, opacity); bug = RGBA(158/255,195/255,49/255, opacity); electric = RGBA(246/255,215/255,75/255, opacity);

### Team Numbers

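Here I'm mapping each 3-mon team to a number, so that every team has its own unique number from 1 to `numTeams`. A team is a lead plus an unordered pair of back-line mons, so $N$ mons give $\frac{N (N-1)(N-2)}{2}$ teams before exclusions (12180 for 30 mons, $\frac{30 \cdot 29 \cdot 28}{2}$). The `prohibited` function then drops teams that contain certain pairs of mons together, such as mons 1 and 2, the two Giratina forms.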

In [4]:
function prohibited(mon1, mon2, mon3)
    if mon1 == 1 && mon2 == 2 || mon2 == 1 && mon1 == 2 || mon1 == 1 && mon3 == 2 || mon3 == 1 && mon1 == 2 || mon2 == 1 && mon3 == 2 || mon3 == 1 && mon2 == 2
        return true
    elseif mon1 == 10 && mon2 == 44 || mon2 == 10 && mon1 == 44 || mon1 == 10 && mon3 == 44 || mon3 == 10 && mon1 == 44 || mon2 == 10 && mon3 == 44 || mon3 == 10 && mon2 == 44
        return true
    elseif mon1 == 9 && mon2 == 81 || mon2 == 9 && mon1 == 81 || mon1 == 9 && mon3 == 81 || mon3 == 9 && mon1 == 81 || mon2 == 9 && mon3 == 81 || mon3 == 9 && mon2 == 81
        return true
    else
        return false
    end
end

prohibited (generic function with 1 method)
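
The three branches above all ask the same question: does the team contain both mons of some excluded pair, in any slots? Here is a minimal sketch of that idea, assuming the third branch is meant to be the pair (9, 81). `excludedPairs` and `prohibited_sketch` are hypothetical names for illustration, not part of the notebook.

```julia
# Pairs of mon indices that may not appear on the same team
# (read off the three branches of `prohibited`; names here are illustrative).
const excludedPairs = [(1, 2), (10, 44), (9, 81)]

# True if the team contains both members of any excluded pair,
# regardless of which slots they occupy.
prohibited_sketch(team::Integer...) =
    any(a in team && b in team for (a, b) in excludedPairs)

prohibited_sketch(1, 5, 2)    # both Giratina forms on one team
prohibited_sketch(81, 9, 20)  # pair (9, 81) in the lead and second slot
prohibited_sketch(3, 4, 5)    # no excluded pair
```

Keeping the pairs in a list would mean adding an exclusion is a one-line change, which may or may not matter for how often the roster changes.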

In [5]:
const m = Matrix{Int64}[[mon1 mon2 mon3] for mon1 = 1:numMons, mon2 = 1:numMons, mon3 = 1:numMons if mon1 != mon2 && mon1 != mon3 && mon2 < mon3 && !prohibited(mon1, mon2, mon3)];

In [6]:
const numTeams = length(m)
numTeams

484218

In [7]:
teamNumberVar = zeros(numMons, numMons, numMons)
@simd for i = 1:numTeams
    @inbounds teamNumberVar[m[i][1], m[i][2], m[i][3]] = Int(i)
end
const teamNumber = teamNumberVar;
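
As a sanity check on the team count, here is a small sketch that enumerates teams for a toy roster and compares the result with the closed form. The roster size of 6 and the helper names `teamCount` and `teamsWithPair` are made up for illustration. The last line shows that the reported 484218 is consistent with 100 mons and three excluded pairs; that roster size is an inference from the arithmetic, not something the notebook states.

```julia
# Toy roster size, chosen only to keep the enumeration small.
toyMons = 6
r = 1:toyMons

# Same filter as the notebook's comprehension, without exclusions:
# a lead plus two distinct back-line mons with back1 < back2.
toyTeams = [(a, b, c) for a in r, b in r, c in r if a != b && a != c && b < c]

# Closed form: N choices of lead times C(N-1, 2) back-line pairs.
teamCount(n) = n * (n - 1) * (n - 2) ÷ 2

# Each excluded pair removes 3 * (N - 2) teams: either member can lead
# with the other in the back line, or both sit in the back line.
teamsWithPair(n) = 3 * (n - 2)

length(toyTeams) == teamCount(toyMons)
count(t -> 1 in t && 2 in t, toyTeams) == teamsWithPair(toyMons)

# Consistency check against the notebook's reported numTeams.
teamCount(100) - 3 * teamsWithPair(100)
```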

### Setting Up the Tables

I'm grabbing just the data I need and putting it in a constant (for SPEED) matrix, and also defining a matrix to store the outputs in.

In [8]:
# clean up ranking data for simulation
# Defining as constant for SPEED
const ranks = [rankings[i, j] for i = 1:numMons, j = 2:(numMons + 1)];

In [9]:
const processes = 32
addprocs(processes)
@eval @everywhere const numMons=$numMons
@eval @everywhere const m=$m
@eval @everywhere const numTeams=$numTeams
@eval @everywhere const ranks=$ranks
@everywhere using Mmap
@everywhere io = open("/tmp/mmap.bin", "w+")
@everywhere teamBattlesVar = Mmap.mmap(io, Array{UInt8, 2}, (numTeams, numTeams), shared = true)
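
The results matrix holds one entry per pair of teams, which is why it lives in a memory-mapped file at one byte per entry. The sketch below works out the size for the reported team count (about 218 GiB) and then runs the same write, sync, and read-back pattern on a tiny matrix. The temporary path and the 4×4 size are stand-ins, not the notebook's settings.

```julia
using Mmap

# Size of the full results matrix at one byte (UInt8) per matchup.
numTeamsReported = 484218                  # value printed by the notebook
bytesNeeded = numTeamsReported^2           # fits comfortably in Int64
gibNeeded = bytesNeeded / 2^30

# Small stand-in for the results matrix, backed by a temporary file.
n = 4
path = tempname()
io = open(path, "w+")
scores = Mmap.mmap(io, Matrix{UInt8}, (n, n))  # file grows to n*n bytes
scores[2, 3] = 0x2a
Mmap.sync!(scores)                         # flush changes to disk
close(io)

# Reopen read-only, as the summary functions do later.
io2 = open(path)
readBack = Mmap.mmap(io2, Matrix{UInt8}, (n, n))
readBack[2, 3]
close(io2)
```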

## The Model

### Assumptions

So, we all know assumptions are bad. But in data science, sometimes our models need to be simplified so we can work with the information that we have in a reasonable time scale. Therefore, I have made some simplifying assumptions for the model of PvP battles, and will add some uncertainty to account for some of the differences between this model and the reality of PvP battling.

*   **No Switching:** Switching is a weird mechanic for which the timing is never consistent, when and if you should use it in scenario X is hotly debated, and then your opponent switches, which is a lot of variability. As will be a pretty common motivator among all of these reasons, if I have the home team switch perfectly, it involves some knowledge of the opposing team that you don't have in that situation. All of this to say, for this model, nobody switches because it makes everything work better, my life easier, and doesn't run the risk of having the model make decisions better than any player could be expected to. 

*   **Each Mon Gets One Shield:** I know, I know. Doesn't that add up to three shields? Well, yeah. But again, shielding choices add in some variability, and perfect use involves some knowledge of the opposing team that you don't have in that situation. Plus, only one data set is then needed: the 1-1 shield matchups from PvPoke, which I believe also implicitly have the shield used on the first charged move.

*   **Players Play Perfectly Otherwise:** Whoa. In all that avoidance of perfection, now I want my model to be perfect? Well, for one, this is just based on the assumptions in PvPoke. For two, I stand by that decision, as it's perfection that's achievable with the knowledge a player has in a particular situation. You may not know when to shield or switch, but you do probably know that you want an excellent charged move and to tap out fast moves (also I'm assuming everyone is on 1.57 or higher, because I do not want to deal with under- or overtapping). Also, this model assumes the ideal moveset for each mon (though you can change that by changing the CSVs).

*   **Putting a Mon in the Second Position is the Same as Putting it in the Third:** I think this assumption is generally accurate. I think I've seen arguments for putting the fast switch in the third position, but I'm also assuming players play perfectly and there's no switching, so I genuinely think this doesn't affect the model, but I'm including it anyway.

*   **Mons Appear Uniformly Among the Top 30 Mons:** Sorry, Spoink fans. This is to keep the amount of data this processes to a reasonable amount. There is another version of this model that uses the Silph distribution instead; however, Silph data does not exist for the Ferocious Cup yet.

*   **Score Above 1500 is a Win:** This is based on the PvPoke battle score, and since there are three battles, it's out of 3000 instead of 1000. Scoring is explained more below, but this is the assumption of what we do with that score.

None of these assumptions are set in stone. In fact, if you have a way to change them and think that that's more useful to you, 1) go ahead and 2) let me know how you did it. 

### Scoring

I know, this section is already too long, just show you the data. But here's what the data means. Each PvPoke battle is given a score of 0-1000, with 500 representing a tie, above that a win, and below that a loss. The score for the three-mon battle is the sum of three such scores, but which three depends on the matchups you see. The lead pokemon always face each other. If your lead wins, you get the more favorable pairing of the remaining matchups (because your opponent had to put in a pokemon and you can counter it); if your lead loses, you get the less favorable pairing; and if the leads tie, the two pairings are averaged. And, as stated above, over 1500 is a win. To save space, the total is then divided by 30 and stored as a `UInt8` from 0 to 100, so the win cutoff becomes 50.

In [10]:
@everywhere function individual_battle_verbose(home1::Integer,home2::Integer,home3::Integer,away1::Integer,away2::Integer,away3::Integer)
    
    @fastmath             score  = 1000 - ranks[away1, home1]
    @fastmath secondBattleResult1 = 1000 - ranks[away2, home2]
    @fastmath secondBattleResult2 = 1000 - ranks[away3, home2]
    @fastmath thirdBattleResult1  = 1000 - ranks[away2, home3]
    @fastmath thirdBattleResult2  = 1000 - ranks[away3, home3]

    if score > 500 
        @fastmath score += max(secondBattleResult2 + thirdBattleResult1, secondBattleResult1 + thirdBattleResult2) 
    elseif score < 500 
        @fastmath score += min(secondBattleResult2 + thirdBattleResult1, secondBattleResult1 + thirdBattleResult2) 
    else
        @fastmath score += (secondBattleResult2 + thirdBattleResult1 + secondBattleResult1 + thirdBattleResult2) ÷ 2
    end 
    
    return UInt8(score ÷ 30)
    
end;

In [11]:
individual_battle_verbose(1, 2, 3, 1, 2, 3)
@time individual_battle_verbose(1, 2, 3, 1, 2, 4)

  0.000002 seconds (4 allocations: 160 bytes)


0x30
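
To make the scoring rule concrete, here is a worked example on a made-up 4-mon rating matrix. `toyRanks` and `team_score_sketch` are hypothetical names, and the ratings are invented. The function follows the same steps as `individual_battle_verbose` but returns the uncompressed 0-3000 total.

```julia
# Toy rating matrix (made-up numbers): toyRanks[i, j] is mon i's
# 0-1000 battle rating against mon j, in the PvPoke convention.
toyRanks = [500 600 300 700;
            400 500 650 200;
            700 350 500 450;
            300 800 550 500]

# Home team's 0-3000 score, following the same steps as
# individual_battle_verbose but without the UInt8 compression.
function team_score_sketch(home::NTuple{3,Int}, away::NTuple{3,Int})
    rating(h, a) = 1000 - toyRanks[a, h]      # home mon h vs away mon a
    lead = rating(home[1], away[1])
    # Two ways to pair the back-line mons against each other
    straight = rating(home[2], away[2]) + rating(home[3], away[3])
    crossed  = rating(home[2], away[3]) + rating(home[3], away[2])
    if lead > 500
        return lead + max(straight, crossed)  # lead won: better pairing
    elseif lead < 500
        return lead + min(straight, crossed)  # lead lost: worse pairing
    else
        return lead + (straight + crossed) ÷ 2  # tie: average of both
    end
end

total = team_score_sketch((1, 2, 3), (2, 3, 4))  # 600 + max(1100, 700) = 1700
stored = UInt8(total ÷ 30)                       # 56, above the cutoff of 50
```

With a losing lead the same back-line numbers go the other way: `team_score_sketch((2, 1, 3), (1, 3, 4))` takes the lead score of 400 plus the worse pairing of 750, for 1150.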

In [12]:
@everywhere function run_away_teams_verbose(i::Integer, mon1::Integer, mon2::Integer, mon3::Integer)
    @simd for j = 1:numTeams
        # Do the battle!
        # Use the function we wrote above
        @fastmath @inbounds teamBattlesVar[j, i] = individual_battle_verbose(m[j][1],m[j][2],m[j][3], mon1,mon2,mon3)
    end
    Mmap.sync!(teamBattlesVar)
end

In [13]:
run_away_teams_verbose(2, 4, 1, 2)
@time run_away_teams_verbose(numTeams, 3, 1, 2)

  0.025452 seconds (967.93 k allocations: 14.770 MiB)


In [14]:
GC.gc()

In [15]:
function run_home_teams_verbose()
    @sync @distributed for i = 1:numTeams
        @inbounds run_away_teams_verbose(i,m[i][1],m[i][2],m[i][3])
    end
    rmprocs(2:(processes + 1))
    GC.gc()
    close(io)
end

run_home_teams_verbose (generic function with 1 method)

In [None]:
# Run once to compile, run again to test speed 
#run_home_teams_verbose()
@time run_home_teams_verbose()

### Summary Stats

Here I'm going to save the summary stats for every team: the win percentage, the average score, and the indices of the team's three mons.

In [None]:
function generate_summary_stats()
    summaryStats = zeros(numTeams, 5)
    p = x -> (x < 50)
    io = open("/tmp/mmap.bin")
    teamBattles = Mmap.mmap(io, Array{UInt8, 2}, (numTeams, numTeams), shared = true)
    @showprogress for i = 1:numTeams
        @inbounds team = teamBattles[:, i]
        @fastmath numWins = count(p,team)
        @inbounds @fastmath summaryStats[i, :] = [100*numWins/numTeams (100 - mean(team)) m[i][1] m[i][2] m[i][3]]
    end
    return summaryStats
end

In [None]:
@time const summaryStats = generate_summary_stats()
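
Each column of the results matrix holds what every other team scored against one team, so that team's own numbers are the complement. Here is a minimal sketch on a made-up column of five stored scores; `opponentScores`, `winPct`, and `ownAvg` are illustrative names.

```julia
using Statistics

# Made-up column of stored scores: what five opposing teams scored
# (0-100 scale) against the team being summarised.
opponentScores = UInt8[48, 56, 50, 30, 70]

# The team wins whenever the opponent's stored score is below 50.
wins = count(<(50), opponentScores)
winPct = 100 * wins / length(opponentScores)

# The team's own average is the complement of the opponents' average.
ownAvg = 100 - mean(opponentScores)
```

Note that a stored score of exactly 50 is not counted as a win for either side under this rule.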

## Number of Wins

Here we've got the information related to the number of wins. Below is a histogram of the number of teams vs their number of wins, a sorted list that shows the best and worst teams by number of wins, and a histogram comparing the scores of the best and worst teams by number of wins.

In [None]:
histogram(summaryStats[:, 1], bins = 100, label = "Number of Wins", color = :blue, alpha =0.5, xlims = (0, 100))

In [None]:
# readlines on a path opens and closes the file for us
lines = readlines("ultra.txt")

In [None]:
sumStatsNumWins = sortslices(summaryStats, by=x->x[1], dims = 1, rev = true)

names1 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names1[i, 1] = lines[Int(sumStatsNumWins[i, 3])]
end

names2 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names2[i, 1] = lines[Int(sumStatsNumWins[i, 4])]
end

names3 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names3[i, 1] = lines[Int(sumStatsNumWins[i, 5])]
end

teamNumbers = zeros(numTeams)
for i = 1:numTeams
    teamNumbers[i] = Int(teamNumber[Int(sumStatsNumWins[i, 3]), Int(sumStatsNumWins[i, 4]), Int(sumStatsNumWins[i, 5])])
end

sumStatsNumWins = hcat(sumStatsNumWins, names1)
sumStatsNumWins = hcat(sumStatsNumWins, names2)
sumStatsNumWins = hcat(sumStatsNumWins, names3)
sumStatsNumWins = hcat(sumStatsNumWins, teamNumbers)

sumStatsNumWins = DataFrame(sumStatsNumWins)

rename!(sumStatsNumWins, Symbol("x1")=>Symbol("Win %"))
rename!(sumStatsNumWins, Symbol("x2")=>Symbol("Avg Score"))
select!(sumStatsNumWins, Not(:x3))
select!(sumStatsNumWins, Not(:x4))
select!(sumStatsNumWins, Not(:x5))
rename!(sumStatsNumWins, Symbol("x6")=>Symbol("Mon 1"))
rename!(sumStatsNumWins, Symbol("x7")=>Symbol("Mon 2"))
rename!(sumStatsNumWins, Symbol("x8")=>Symbol("Mon 3"))
rename!(sumStatsNumWins, Symbol("x9")=>Symbol("Team Number"));

sumStatsNumWins

In [None]:
CSV.write("ultra.csv", sumStatsNumWins)

In [None]:
summaryStats = sortslices(summaryStats, by=x->x[1], dims = 1, rev = true)

In [None]:
function get_top_counters(n::Integer, m::Integer)
    counters = Array{String,2}(undef, n, m)
    io = open("/tmp/mmap.bin")
    teamBattles = Mmap.mmap(io, Array{UInt8, 2}, (numTeams, numTeams), shared = true)
    @showprogress for i = 1:n
        num = 0
        index = 0
        @inbounds team = teamBattles[:, i]
        while num < m
            index += 1
            team_index = Int(teamNumber[Int(summaryStats[index, 3]), Int(summaryStats[index, 4]), Int(summaryStats[index, 5])])
            if team[team_index] < 30
                num += 1
                counters[i, num] = string(100 - team[team_index]) * " " * string(summaryStats[index, 1]) * " " * 
                    lines[Int(summaryStats[index, 3])] * " " * lines[Int(summaryStats[index, 4])] * 
                    " " * lines[Int(summaryStats[index, 5])]
            end
        end
    end
    close(io)
    return counters
end

In [None]:
@time const counters = get_top_counters(5000, 20)
CSV.write("ultra-counters.csv", DataFrame(counters))
#DataFrame(counters)

Here I've averaged the win percentage over every team that a particular pokemon is a part of, and ranked the pokemon by that average below, again showing the best and worst.

In [None]:
avgNumOfWins = zeros(numMons, 2)
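for i = 1:numTeams
    # summaryStats was re-sorted above, so row i is no longer team m[i];
    # use the mon indices stored in columns 3-5 of the row instead
    avgNumOfWins[Int(summaryStats[i, 3]), 1] += summaryStats[i, 1]
    avgNumOfWins[Int(summaryStats[i, 4]), 1] += summaryStats[i, 1]
    avgNumOfWins[Int(summaryStats[i, 5]), 1] += summaryStats[i, 1]
end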
for i = 1:numMons
    avgNumOfWins[i, 2] = i
    d = x -> (i in x)
    divisor = count(d, m)
    avgNumOfWins[i, 1] /= divisor
end

avgNumOfWinsSorted = sortslices(avgNumOfWins, by=x->x[1], dims = 1, rev = true)

nameNumWins = Array{String}(undef, numMons, 1)
for i = 1:numMons
    nameNumWins[i, 1] = rankings[Int(avgNumOfWinsSorted[i, 2]), 1]
end

avgNumOfWinsSorted = hcat(avgNumOfWinsSorted, nameNumWins)

avgNumOfWinsSorted = DataFrame(avgNumOfWinsSorted)

rename!(avgNumOfWinsSorted, Symbol("x1")=>Symbol("Win %"))
rename!(avgNumOfWinsSorted, Symbol("x2")=>Symbol("Rank"))
rename!(avgNumOfWinsSorted, Symbol("x3")=>Symbol("Name"))
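
The per-mon average is just a grouped mean: add each team's win percentage to all three of its mons, then divide by how many teams each mon appears in. A small sketch on a made-up table follows; `toyStats` and the other names are illustrative. Reading the mon indices from the row itself keeps the result correct even if the table has been sorted.

```julia
# Toy summary table (made-up numbers): win % followed by the three mon indices.
toyStats = [60.0 1 2 3;
            40.0 2 3 4;
            80.0 4 1 2]
toyNumMons = 4

totals = zeros(toyNumMons)
appearances = zeros(Int, toyNumMons)
for row in eachrow(toyStats)
    for mon in Int.(row[2:4])   # mon indices are stored as floats
        totals[mon] += row[1]
        appearances[mon] += 1
    end
end

perMonWinPct = totals ./ appearances
```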

Here I've plotted the PvPoke ranking against the average win percentage. I think it's not necessarily a surprise that there's not a perfect correlation here (you don't put in the top 6 ranked mons generally), but there is a pretty clear negative slope. The big spike, I think, is Lickilicky.

In [None]:
histogram(avgNumOfWins[:, 1], bins = 20, label = "Average Number of Wins")

In [None]:
plot(1:numMons, avgNumOfWins[:, 1], label = "Average Win %")

## Average Score

Here we've got the information related to the average. Below is a histogram of the number of teams vs their average, a sorted list that shows the best and worst teams by average score, and a histogram comparing the scores of the best and worst teams by average score. Note that there are minor differences between the average score statistics and the number of wins. I generally consider number of wins to be a more useful statistic (as it doesn't necessarily matter to me how much I win by as long as I win), but I could see arguments for using this particular statistic instead.

In [None]:
histogram(summaryStats[:, 2], label = "Average Score", color = :blue, alpha =0.5)
vline!([50], label = "W/L Cutoff", color = :red)

In [None]:
sumStatsAvgScore = sortslices(summaryStats, by=x->x[2], dims = 1, rev = true)

# summaryStats columns: 1 = win %, 2 = avg score, 3-5 = mon indices
names1 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names1[i, 1] = rankings[Int(sumStatsAvgScore[i, 3]),1]
end

names2 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names2[i, 1] = rankings[Int(sumStatsAvgScore[i, 4]),1]
end

names3 = Array{String}(undef, numTeams, 1)
for i = 1:numTeams
    names3[i, 1] = rankings[Int(sumStatsAvgScore[i, 5]),1]
end

teamNumbers = zeros(numTeams)
for i = 1:numTeams
    teamNumbers[i] = Int(teamNumber[Int(sumStatsAvgScore[i, 3]), Int(sumStatsAvgScore[i, 4]), Int(sumStatsAvgScore[i, 5])])
end

sumStatsAvgScore = hcat(sumStatsAvgScore, names1)
sumStatsAvgScore = hcat(sumStatsAvgScore, names2)
sumStatsAvgScore = hcat(sumStatsAvgScore, names3)
sumStatsAvgScore = hcat(sumStatsAvgScore, teamNumbers)

sumStatsAvgScore = DataFrame(sumStatsAvgScore)

# After the hcat calls: x3-x5 = mon indices, x6-x8 = names, x9 = team number
rename!(sumStatsAvgScore, Symbol("x1")=>Symbol("Win %"))
rename!(sumStatsAvgScore, Symbol("x2")=>Symbol("Avg Score"))
select!(sumStatsAvgScore, Not(:x3))
select!(sumStatsAvgScore, Not(:x4))
select!(sumStatsAvgScore, Not(:x5))
rename!(sumStatsAvgScore, Symbol("x6")=>Symbol("Mon 1"))
rename!(sumStatsAvgScore, Symbol("x7")=>Symbol("Mon 2"))
rename!(sumStatsAvgScore, Symbol("x8")=>Symbol("Mon 3"))
rename!(sumStatsAvgScore, Symbol("x9")=>Symbol("Team Number"));

sumStatsAvgScore

In [None]:
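avgScore = zeros(numMons, 2)
for i = 1:numTeams
    # summaryStats was re-sorted above, so row i is no longer team m[i];
    # use the mon indices stored in columns 3-5 of the row instead
    avgScore[Int(summaryStats[i, 3]), 1] += summaryStats[i, 2]
    avgScore[Int(summaryStats[i, 4]), 1] += summaryStats[i, 2]
    avgScore[Int(summaryStats[i, 5]), 1] += summaryStats[i, 2]
end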
for i = 1:numMons
    avgScore[i, 2] = i
    d = x -> (i in x)
    divisor = count(d, m)
    avgScore[i, 1] /= divisor
end

avgScoreSorted = sortslices(avgScore, by=x->x[1], dims = 1, rev = true)

nameAvgScore = Array{String}(undef, numMons, 1)
for i = 1:numMons
    nameAvgScore[i, 1] = rankings[Int(avgScoreSorted[i, 2]), 1]
end

avgScoreSorted = hcat(avgScoreSorted, nameAvgScore)

avgScoreSorted = DataFrame(avgScoreSorted)

rename!(avgScoreSorted, Symbol("x1")=>Symbol("Avg Score"))
rename!(avgScoreSorted, Symbol("x2")=>Symbol("Rank"))
rename!(avgScoreSorted, Symbol("x3")=>Symbol("Name"))

Again, the average score isn't strongly correlated with the PvPoke ranking, but that is to be expected. The average score is strongly correlated with the number of wins, which is also to be expected. So this statistic is not the same as the number of wins, and won't always lead to the same conclusion, but the two do track each other.

In [None]:
plot(1:numMons, avgScore[:, 1], label = "Average Score")

In [None]:
plot(avgScore[:, 1], avgNumOfWins[:, 1], seriestype=:scatter, legend = false)
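
If you want a number to go with the scatter plot, a Pearson correlation coefficient is one option. This is a sketch on made-up per-mon values; `toyAvgScore` and `toyWinPct` are illustrative and not results from the ranker.

```julia
using Statistics

# Made-up per-mon averages, only to show the call.
toyAvgScore = [48.0, 50.0, 52.0, 55.0]
toyWinPct   = [40.0, 50.0, 58.0, 70.0]

# Pearson correlation: close to 1 means the two statistics rise together.
r = cor(toyAvgScore, toyWinPct)
```

On the real data the equivalent call would be `cor(avgScore[:, 1], avgNumOfWins[:, 1])`.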