# Pokemon Analysis - Some Questions and Answers

## This notebook answers some theoretical and empirical questions about pokemon typings

In [4]:
import pandas as pd
import numpy as np
from functools import reduce
from itertools import combinations

In [5]:
# These questions can be answered with two datasets:
pokemon = pd.read_csv('Pokemon.csv')
types = pd.read_csv('Types.csv')

In [6]:
# The pokemon dataframe contains all pokemon (first 7 generations) and their stats
pokemon.head()

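| Unnamed: 0 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
| 4 | 4 | Charmander | Fire |  | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |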


In [7]:
# The types dataframe contains type effectiveness information
types

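| Unnamed: 0 | Attacking | Normal | Fire | Water | Electric | Grass | Ice | Fighting | Poison | Ground | Flying | Psychic | Bug | Rock | Ghost | Dragon | Dark | Steel | Fairy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Normal | 1 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 | 0.0 | 1.0 | 1.0 | 0.5 | 1.0 |
| 1 | Fire | 1 | 0.5 | 0.5 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.5 | 1.0 | 0.5 | 1.0 | 2.0 | 1.0 |
| 2 | Water | 1 | 2.0 | 0.5 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 |
| 3 | Electric | 1 | 1.0 | 2.0 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 1.0 |
| 4 | Grass | 1 | 0.5 | 2.0 | 1.0 | 0.5 | 1.0 | 1.0 | 0.5 | 2.0 | 0.5 | 1.0 | 0.5 | 2.0 | 1.0 | 0.5 | 1.0 | 0.5 | 1.0 |
| 5 | Ice | 1 | 0.5 | 0.5 | 1.0 | 2.0 | 0.5 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.5 | 1.0 |
| 6 | Fighting | 2 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.5 | 1.0 | 0.5 | 0.5 | 0.5 | 2.0 | 0.0 | 1.0 | 2.0 | 2.0 | 0.5 |
| 7 | Poison | 1 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 1.0 | 0.0 | 2.0 |
| 8 | Ground | 1 | 2.0 | 1.0 | 2.0 | 0.5 | 1.0 | 1.0 | 2.0 | 1.0 | 0.0 | 1.0 | 0.5 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 |
| 9 | Flying | 1 | 1.0 | 1.0 | 0.5 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0.5 | 1.0 | 1.0 | 1.0 | 0.5 | 1.0 |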


# Question 1: How many unique type combinations are there?


Let $V$ be a one-dimensional (column) vector containing all pokemon types. If we take the outer product with string concatenation in place of multiplication, every type combination appears exactly once in the upper (or lower) triangle of the resulting matrix, diagonal included.

$\text{all\_combos} = V V^T$

Note 1: the diagonal of this matrix holds all monotypes; there are 18 of them.
<br>
Note 2: excluding the diagonal, the upper and lower triangles contain the same combinations, but as different permutations. The permutations can represent the primary and secondary typings of dual-type pokemon. For example, all_permutations[0,1] = NormalFire and all_permutations[1,0] = FireNormal. Since I am not aware of primary/secondary typing having much impact, let us only count unique combinations.
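The "outer product" above can be sketched on a toy subset of three types (an assumption for readability; the notebook uses all 18 from `Types.csv`). This version uses `np.triu_indices` to pick the diagonal and upper-triangle positions directly, which is one way to avoid the `0` placeholder that `np.triu` leaves in the lower triangle.

```python
import numpy as np

# Toy subset of three types (the notebook uses all 18 from Types.csv)
v = np.array(["Fire", "Grass", "Water"], dtype=object)

# Broadcasting a column against a row concatenates every ordered pair of strings
grid = v[:, None] + v[None, :]

# np.triu_indices yields the (i, j) positions with i <= j: the diagonal
# (monotypes) plus the upper triangle (one permutation of each dual type)
rows, cols = np.triu_indices(len(v))
combos = [grid[i, j] for i, j in zip(rows, cols)]
print(combos)
# ['FireFire', 'FireGrass', 'FireWater', 'GrassGrass', 'GrassWater', 'WaterWater']
```

The `dtype=object` matters: it makes `+` fall back to Python string concatenation element by element.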


In [8]:
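types_vector = np.asarray(types['Attacking'])  # object array of the 18 type names
all_permutations = types_vector[:,None]+types_vector  # "outer product" via broadcasting: pairwise string concatenation
all_combos = set(np.triu(all_permutations).flatten())
all_combos.remove(0)  # np.triu fills the lower triangle with 0, which we discard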

In [9]:
print("Answer: There are exactly %d type combinations in Pokemon" % (len(all_combos)))

Answer: There are exactly 171 type combinations in Pokemon


Obviously, it was not necessary to compute all unique combinations explicitly, since the answer is a triangular number: 
<br>
$N(N+1)/2 = 18 \cdot 19 / 2 = 171$

However, storing these combinations for later will help us answer some questions :)
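The triangular-number count can be double-checked by splitting it into monotypes (the diagonal) plus unordered pairs of distinct types, a minimal sketch:

```python
from math import comb

N = 18  # number of types
monotypes = N             # the diagonal
dual_types = comb(N, 2)   # unordered pairs of distinct types
total = monotypes + dual_types

# Both counts agree with the triangular number N(N+1)/2
assert total == N * (N + 1) // 2
print(monotypes, dual_types, total)  # 18 153 171
```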


# Question 2: Are all combinations available?

In [10]:
# This helper function will help us solve some problems
def typeCombinationSplitter(combo: str):
    """
        Args:
            combo (str) : A pokemon typing string, as produced by
            all_combos above. E.g. 'GroundWater', 'WaterWater'

        Returns:
            tuple : the component types, sorted lexicographically;
            a 1-tuple for monotypes such as 'WaterWater'
    """
    # Split the combination typing into its component types.
    # Substring matching works here because no type name appears inside another one.
    t = set()
    for pokemon_type in types_vector:
        if pokemon_type in combo:
            t.add(pokemon_type)
    # Note: we sort lexicographically here to aid in
    # checking for uniqueness later on.
    return tuple(sorted(t))
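As an alternative to substring matching, a combo string can be split on its capital letters. This is a hypothetical helper (`split_combo` is not part of the notebook) and it assumes every type name is a single capitalized word, which holds for the 18 types used here.

```python
import re

def split_combo(combo: str) -> tuple:
    """Hypothetical alternative to typeCombinationSplitter.

    Assumes every type name is one capitalized word (true for the 18 types).
    """
    parts = re.findall(r"[A-Z][a-z]+", combo)
    # set() collapses monotype strings such as 'WaterWater' to one entry
    return tuple(sorted(set(parts)))

print(split_combo("GroundWater"))  # ('Ground', 'Water')
print(split_combo("WaterWater"))   # ('Water',)
print(split_combo("FireBug"))      # ('Bug', 'Fire')
```

Unlike the substring version, this does not need the list of known types, but it would break if a type name ever contained a second capital letter.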

In [11]:
# We need to re-orient all_combos so that type1 and type2 are lexicographically ordered.
# This could also have been achieved by sorting the types in the original types data table.
all_sorted_combos = set()
for combo in all_combos:
    this_combo = "".join(typeCombinationSplitter(combo))
    all_sorted_combos.add(this_combo)

In [12]:
# Now, we can gather the empirical combos for comparison
all_emperical_combos = set()
for ind, row in pokemon.iterrows():
    type1, type2 = row["Type 1"], row["Type 2"]
    nonzero_types = [type1]
    if isinstance(type2, str):  # otherwise it is NaN, i.e. a monotype
        nonzero_types.append(type2)

    # Sorting avoids storing two permutations of the same combination.
    this_combo = "".join(sorted(nonzero_types))
    all_emperical_combos.add(this_combo)

print("There are {} of {} theoretical typings".format(len(all_emperical_combos), len(all_sorted_combos)))
print("The following typings are currently unavailable")
for combo in all_sorted_combos - all_emperical_combos:
    print(combo)

# Sanity check: we should have no empirical combos outside of the theoretical ones
assert len(all_emperical_combos - all_sorted_combos) == 0
    

There are 133 of 171 theoretical typings
The following typings are currently unavailable
NormalRock
DarkFairy
ElectricFighting
NormalSteel
ElectricRock
BugPsychic
PoisonRock
IceNormal
GhostRock
NormalPoison
BugDragon
DarkNormal
BugIce
PoisonSteel
FairyGround
ElectricPsychic
DragonFighting
IceSteel
FightingIce
FightingGhost
DragonNormal
FairyFighting
FightingGround
IcePoison
FairyPoison
BugDark
ElectricPoison
BugFairy
BugNormal
GhostNormal
FairyFire
FairyIce
FireIce
DarkElectric
PoisonPsychic
FireGrass
FairyGhost
FirePoison


Interesting. There are still 38 dual types that do not yet exist :). 
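The row loop above can also be written with pandas' `DataFrame.apply`. This is a sketch on a toy four-row frame standing in for `Pokemon.csv` (an assumption: same `Type 1`/`Type 2` columns, with a missing second type for monotypes), and a hand-picked toy "theoretical" set.

```python
import pandas as pd

# Toy stand-in for Pokemon.csv (assumption: same "Type 1"/"Type 2" columns,
# with a missing second type for monotypes)
df = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Poison"],
    "Type 2": ["Poison", None, "Ground", "Grass"],
})

def combo_key(row):
    # Keep only real type strings; a missing Type 2 (None/NaN) is dropped
    present = [t for t in (row["Type 1"], row["Type 2"]) if isinstance(t, str)]
    return "".join(sorted(present))

keys = df.apply(combo_key, axis=1)
empirical = set(keys)
theoretical = {"Fire", "Water", "GrassPoison", "GroundWater", "FireWater"}
print(sorted(theoretical - empirical))  # ['FireWater', 'Water']
```

Note that GrassPoison and PoisonGrass collapse to the same key, which is exactly why the notebook sorts before concatenating.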

# Question 3: Which types are most and least prevalent?

In [14]:
# To answer this question, we can repurpose the above code to use a counter:
all_emperical_combos = {}
for ind, row in pokemon.iterrows():
    type1, type2 = row["Type 1"], row["Type 2"]
    nonzero_types = [type1]
    if isinstance(type2, str):  # otherwise it is NaN, i.e. a monotype
        nonzero_types.append(type2)

    # Sorting avoids counting two permutations of the same combination.
    this_combo = "".join(sorted(nonzero_types))
    all_emperical_combos[this_combo] = all_emperical_combos.get(this_combo, 0) + 1

ranked_by_prevalance = list(all_emperical_combos.items())
ranked_by_prevalance.sort(key=lambda x: x[1], reverse=True)

print("\n")
print("The top 10 most prevalent pokemon types are:")
for combo, count in ranked_by_prevalance[:10]:
    print(combo, count)

# Note: many combos are tied at a count of 1, so the bottom 10 is an arbitrary subset of those ties
print("\n")
print("The top 10 least prevalent pokemon types are:")
for combo, count in ranked_by_prevalance[-10:]:
    print(combo, count)



The top 10 most prevalent pokemon types are:
Normal 61
Water 59
Psychic 38
Grass 33
Fire 28
Electric 27
FlyingNormal 24
Fighting 20
Bug 17
GrassPoison 15


The top 10 least prevalent pokemon types are:
DragonSteel 1
FireSteel 1
ElectricGround 1
FightingRock 1
GroundNormal 1
DragonPoison 1
FightingFlying 1
ElectricFairy 1
GhostPsychic 1
FireWater 1


Cool! Many people would have guessed that Normal and Water are the most prevalent. However, I was surprised to see some dual types in the top 10, although it must be true that "FlyingNormal" blankets all regions xD.
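The counting above is what `collections.Counter` does out of the box. A minimal sketch on hypothetical combo keys (toy data, not from `Pokemon.csv`), which also reports every combo tied at the lowest count instead of an arbitrary bottom 10:

```python
from collections import Counter

# Hypothetical combo keys, one per pokemon (toy data, not from Pokemon.csv)
keys = ["Normal", "Water", "FlyingNormal", "Water", "Normal", "Normal", "FireWater"]

counts = Counter(keys)
print(counts.most_common(2))  # [('Normal', 3), ('Water', 2)]

# Counts at the bottom are often tied, so "least prevalent" is best
# reported as every combo sharing the minimum count
lowest = min(counts.values())
print(sorted(c for c, n in counts.items() if n == lowest))  # ['FireWater', 'FlyingNormal']
```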

## Question 3 Bonus: How many pokemon are the *only* one of their type combo, and what are they?

This question is begging to be answered by the previous result. How many pokemon are lonely (only ones)?

In [15]:
# To gather them all by name, it's easiest to repurpose the above code (third time)
all_emperical_combos = {}
for ind, row in pokemon.iterrows():
    type1, type2 = row["Type 1"], row["Type 2"]
    nonzero_types = [type1]
    if isinstance(type2, str):  # otherwise it is NaN, i.e. a monotype
        nonzero_types.append(type2)

    # Sorting avoids counting two permutations of the same combination.
    this_combo = "".join(sorted(nonzero_types))
    all_emperical_combos.setdefault(this_combo, []).append(row["Name"])


ranked_by_prevalance = [(combo, len(val), val) for combo, val in all_emperical_combos.items()]
ranked_by_prevalance.sort(key=lambda x: x[1], reverse=True)
# Keep only the combos held by exactly one pokemon (the sort is stable, so order is preserved)
lonely = [entry for entry in ranked_by_prevalance if entry[1] == 1]

print("\n")
print("There are {} pokemon who are the only ones of their type combo! Here they are:".format(len(lonely)))
for combo, count, these_pokemon in lonely:
    print("{} - {}".format(combo, these_pokemon[0]))




There are 24 pokemon who are the only ones of their type combo! Here they are:
FireRock - Magcargo
FlyingSteel - Skarmory
DragonGrass - SceptileMega Sceptile
BugWater - Surskit
BugGhost - Shedinja
DragonFairy - AltariaMega Altaria
GrassGround - Torterra
SteelWater - Empoleon
NormalWater - Bibarel
GhostIce - Froslass
ElectricGhost - Rotom
ElectricFire - RotomHeat Rotom
ElectricIce - RotomFrost Rotom
ElectricGrass - RotomMow Rotom
DragonSteel - Dialga
FireSteel - Heatran
ElectricGround - Stunfisk
FightingRock - Terrakion
GroundNormal - Diggersby
DragonPoison - Dragalge
FightingFlying - Hawlucha
ElectricFairy - Dedenne
GhostPsychic - HoopaHoopa Confined
FireWater - Volcanion
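The grouping by name in the cell above can be sketched more compactly with `collections.defaultdict`. The `(name, combo key)` pairs below are hypothetical stand-ins for the pokemon table, not a reading of `Pokemon.csv`.

```python
from collections import defaultdict

# Hypothetical (name, combo key) pairs standing in for the pokemon table
rows = [("Swinub", "GroundIce"), ("Piloswine", "GroundIce"),
        ("Mamoswine", "GroundIce"), ("Skarmory", "FlyingSteel"),
        ("Heatran", "FireSteel")]

names_by_combo = defaultdict(list)
for name, combo in rows:
    names_by_combo[combo].append(name)

# A combo is "lonely" when exactly one pokemon has it
lonely = {combo: names[0] for combo, names in names_by_combo.items() if len(names) == 1}
print(lonely)  # {'FlyingSteel': 'Skarmory', 'FireSteel': 'Heatran'}
```

Filtering on the group size directly avoids walking backwards through a sorted list to find where the count-1 entries begin.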


# Question 4: Which 4-move movepool has the most super effective coverage?

In [16]:
# We need a helper function to compute type vulnerabilities
def typeComboVulnerabilities(combo):
    # Computes the vector of damage multipliers this type combination
    # takes from each attacking type.
    if len(combo) == 1:
        vals = types[combo[0]].copy()
    else:
        # For dual types, the two defending columns multiply element-wise
        vals = types[combo[0]] * types[combo[1]]
    vals.index = types['Attacking']
    return vals
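To see why the dual-type case is an element-wise product, here is a sketch on a 3x3 slice of the type chart (Fire, Water, Grass only; values from the standard chart). As in `Types.csv`, rows are attacking types and columns are defending types.

```python
import pandas as pd

# A 3x3 slice of the type chart: rows are attacking types, columns are
# defending types (values from the standard chart for these three types)
chart = pd.DataFrame(
    {"Fire": [0.5, 2.0, 0.5], "Water": [0.5, 0.5, 2.0], "Grass": [2.0, 0.5, 0.5]},
    index=["Fire", "Water", "Grass"],
)

# A dual-type defender takes the product of both columns' multipliers
grass_water = chart["Grass"] * chart["Water"]
print(grass_water.to_dict())  # {'Fire': 1.0, 'Water': 0.25, 'Grass': 1.0}
```

The product is what turns a Grass weakness to Fire (2x) and a Water resistance to Fire (0.5x) into a neutral 1x.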

In [17]:
# For example, we can evaluate the vulnerabilities of the BugFire typing (e.g. Volcarona)
type1 = typeCombinationSplitter("BugFire")
print(typeComboVulnerabilities(type1))

Attacking
Normal      1.00
Fire        1.00
Water       2.00
Electric    1.00
Grass       0.25
Ice         0.50
Fighting    0.50
Poison      1.00
Ground      1.00
Flying      2.00
Psychic     1.00
Bug         0.50
Rock        4.00
Ghost       1.00
Dragon      1.00
Dark        1.00
Steel       0.50
Fairy       0.50
dtype: float64


In [18]:
def typePoolEffectiveness(type_pool: list, target_pokemon: set):
    """
        Returns how many typings this movepool hits super effectively
        with at least one move

        Args:
            type_pool (list) : list of move types, representing a pokemon's movepool
            target_pokemon (set) : typing strings to evaluate against, e.g. all
                theoretical typings, or only those present in a given tier
    """
    count = 0
    for pokemon_typing in target_pokemon:
        combo = typeCombinationSplitter(pokemon_typing)
        vuln = typeComboVulnerabilities(combo)
        for movetype in type_pool:
            if vuln[movetype] >= 2:  # not distinguishing between 2x and 4x here; both are super effective
                count += 1
                break
    return count

def getBestMovePoolsUsingNMoves(N, target_pokemon: set):
    """Brute-forces every N-type movepool; returns the best count and all pools tied for it."""
    ans = []
    best = -1
    for type_pool_combo in combinations(types['Attacking'], N):
        val = typePoolEffectiveness(type_pool_combo, target_pokemon)
        if val == best:
            ans.append(type_pool_combo)
        elif val > best:
            best = val
            ans = [type_pool_combo]

    return best, ans
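`getBestMovePoolsUsingNMoves` recomputes every typing's vulnerabilities inside each call, i.e. for each of the C(18, 4) = 3060 four-move pools. One possible speed-up, sketched here under assumptions, is to precompute a boolean "is weak to" matrix once and score each pool with a vectorized `any`. The matrix below is a hypothetical toy, not values from the real chart.

```python
from itertools import combinations
import numpy as np

move_types = ["Ice", "Ground", "Rock", "Fairy"]
# Hypothetical toy matrix: weak[t, m] is True when target typing t takes
# at least 2x damage from move type m (NOT values from the real chart)
weak = np.array([
    [True,  False, False, False],
    [False, True,  False, False],
    [True,  False, False, False],
    [False, True,  False, False],
    [False, False, True,  False],
])

def coverage(pool):
    # A typing counts once if any move in the pool hits it super effectively
    return int(weak[:, list(pool)].any(axis=1).sum())

# Exhaustive search over all 2-move pools, the same idea as getBestMovePoolsUsingNMoves
best_pool = max(combinations(range(len(move_types)), 2), key=coverage)
print([move_types[i] for i in best_pool], coverage(best_pool))  # ['Ice', 'Ground'] 4
```

Unlike the notebook's function, `max` keeps only the first best pool; collecting ties would need the same `==`/`>` bookkeeping used above.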
        

In [19]:
# typePoolEffectiveness gives us the number of typings against which
# a given movepool has at least one super effective move.

# Here is a test of one of the well-known high-coverage movepools, from
# Electivire:
count = typePoolEffectiveness(['Electric', 'Ice', 'Fighting', 'Ground'], all_sorted_combos)
print("Effective against {} out of 171 theoretical typings ({}%)".format(count, count / float(len(all_sorted_combos)) * 100.))
count = typePoolEffectiveness(['Electric', 'Ice', 'Fighting', 'Ground'], all_emperical_combos)
print("Effective against {} out of 133 emperical typings ({}%)".format(count, count / float(len(all_emperical_combos)) * 100.))

Effective against 135 out of 171 theoretical typings (78.94736842105263%)
Effective against 109 out of 133 emperical typings (81.95488721804512%)


Cool! Electivire's movepool can hit 82% of all empirical typings super effectively! That is impressive. Note: as we saw above, pokemon are not spread evenly across these type combos, so the percentage of all actual pokemon that can be hit will be somewhat different. Moreover, it may be interesting to examine the result of getBestMovePoolsUsingNMoves() on different tiers or universes, since this naturally comes up in both competitive and casual play.
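The difference between "share of typings hit" and "share of pokemon hit" comes down to weighting each typing by how many pokemon have it. A minimal sketch with hypothetical counts and a hypothetical covered set (toy numbers, not from `Pokemon.csv`):

```python
# Hypothetical pokemon counts per typing (toy numbers, not from Pokemon.csv)
counts = {"Normal": 6, "Water": 2, "FlyingNormal": 1, "GhostSteel": 1}
# Hypothetical set of typings that some movepool hits super effectively
covered = {"Normal", "GhostSteel"}

typing_share = len(covered) / len(counts)                               # 2 / 4
pokemon_share = sum(counts[t] for t in covered) / sum(counts.values())  # 7 / 10
print(typing_share, pokemon_share)  # 0.5 0.7
```

The per-typing counts from Question 3 could play the role of `counts` here.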

Now, can we find a better movepool?

In [37]:
count, ans = getBestMovePoolsUsingNMoves(4, all_sorted_combos)
print(count, ans)

150 [('Ice', 'Ground', 'Rock', 'Fairy')]


Nice! The movepool ('Ice', 'Ground', 'Rock', 'Fairy') is the single best 4-type movepool, hitting 150 of the 171 theoretical typings!

In [20]:
# Now, let's see what percentage it hits!
best4 = ['Ice', 'Ground', 'Rock', 'Fairy']
count = typePoolEffectiveness(best4, all_sorted_combos)
print("Effective against {} out of 171 theoretical typings ({}%)".format(count, count / float(len(all_sorted_combos)) * 100.))
count = typePoolEffectiveness(best4, all_emperical_combos)
print("Effective against {} out of 133 emperical typings ({}%)".format(count, count / float(len(all_emperical_combos)) * 100.))

Effective against 150 out of 171 theoretical typings (87.71929824561403%)
Effective against 114 out of 133 emperical typings (85.71428571428571%)


That is definitely an improvement! A follow-up investigation: can any pokemon incorporate these 4 types into a valid movepool? This would require more data; namely, we would need to know the movepool of each pokemon. Moreover, not all moves are created equal, so this investigation could balloon in complexity once we account for the damage, accuracy, and effects of the pertinent moves.

# Question 5: Which pokemon has the most offensive dual type?

## Let's simplify this question to: Which dual type has the most super effective STAB coverage?
If you do not know: STAB = Same-Type Attack Bonus; if a pokemon uses a move that matches one of its own types, the move does 1.5x damage.
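A minimal sketch of how STAB combines with type effectiveness. The function names are hypothetical, and the sketch deliberately ignores abilities such as Adaptability, which raise STAB to 2x.

```python
def stab_multiplier(move_type, user_types):
    # 1.5x when the move matches one of the user's types, otherwise 1.0x.
    # Simplification: ignores abilities such as Adaptability (2x STAB).
    return 1.5 if move_type in user_types else 1.0

def damage_multiplier(move_type, user_types, effectiveness):
    # STAB and type effectiveness stack multiplicatively
    return stab_multiplier(move_type, user_types) * effectiveness

# Mamoswine (Ice/Ground) using a Ground move on a Fire type (2x effective)
print(damage_multiplier("Ground", ("Ground", "Ice"), 2.0))  # 3.0
```

This is why STAB coverage matters: a super effective STAB move hits for 3x, versus 2x for a super effective non-STAB move.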

In [23]:
# Functionally, this question is a special case of the one above:
count, ans = getBestMovePoolsUsingNMoves(2, all_sorted_combos)
print(count, ans)

109 [('Ice', 'Ground')]


Nice! A lot of people would not guess this; maybe they would guess FireWater or ElectricIce. Interestingly, this dual typing is a subset of the best 4-move movepool shown above, which suggests that Ground and Ice together provide a large share of the coverage. Now, how many pokemon have this typing?

In [27]:
# For this, we can recycle our data structure from earlier:
all_emperical_combos["GroundIce"]

['Swinub', 'Piloswine', 'Mamoswine']

Wow! That is a single evolutionary line. Take a look at Mamoswine's current Smogon page:
https://www.smogon.com/dex/sm/pokemon/mamoswine/


So the most STAB coverage + 130 base Attack + priority moves!? Is Mamoswine OP?

# Question 6: Which pokemon has the most defensive type?

We can evaluate this question from a couple of different angles. For example, some typings may have few weaknesses, while others may have many invulnerabilities (immunities, i.e. attacking types that do 0x damage). Let's take a look at both:
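Both angles reduce to counting entries in a typing's multiplier vector, like the one `typeComboVulnerabilities` returns. A minimal sketch with a hand-written subset of the multipliers for a pure Ghost type:

```python
import pandas as pd

# Defensive multipliers of a pure Ghost type against a few attacking types
ghost = pd.Series({
    "Normal": 0.0, "Fighting": 0.0, "Poison": 0.5,
    "Bug": 0.5, "Ghost": 2.0, "Dark": 2.0, "Fire": 1.0,
})

weaknesses = int((ghost >= 2).sum())   # super effective attackers
immunities = int((ghost == 0).sum())   # attackers that do no damage
print(weaknesses, immunities)  # 2 2
```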

In [59]:
# based on what is super effective against it
super_effective_counts = {}
for pokemon_typing in all_sorted_combos:
    combo = typeCombinationSplitter(pokemon_typing)
    vuln = typeComboVulnerabilities(combo)
    supp = [val for val in vuln if val >= 2]
    n = len(supp)
    if n in super_effective_counts:
        super_effective_counts[n].append(combo)
    else:
        super_effective_counts[n] = [combo]
        
sorted_defensive_types = [(key, val) for key, val in super_effective_counts.items()]
sorted_defensive_types.sort(key=lambda x:x[0])


print("Most defensive types: (with {} weaknesses)".format(sorted_defensive_types[0][0]))
for combo in sorted_defensive_types[0][1]:
    print(combo)
print("Least defensive types: (with {} weaknesses)".format(sorted_defensive_types[-1][0]))
for combo in sorted_defensive_types[-1][1]:
    print(combo)

Most defensive types: (with 1 weaknesses)
('Dark', 'Poison')
('Ground', 'Water')
('Electric',)
('Ghost', 'Normal')
('Dark', 'Ghost')
('Bug', 'Steel')
('Normal',)
Least defensive types: (with 7 weaknesses)
('Grass', 'Ice')
('Psychic', 'Rock')
('Grass', 'Psychic')
('Fighting', 'Rock')
('Dark', 'Grass')
('Dark', 'Rock')


Nice. One of the commonly touted defensive typings is BugSteel, which ends up here. Interestingly, as monotypes, Normal and Electric both end up with only one weakness (Fighting and Ground, respectively). It's also nice to see GroundWater show up; not only am I a huge Swampert fan, but this combo makes a lot of sense when you think about the constituent types: Water neutralizes Ground's Water and Ice weaknesses, and Ground's immunity cancels Water's Electric weakness, leaving only Grass. Let's list the pokemon of these types:

In [60]:
print("Most defensive types: (with {} weaknesses)".format(sorted_defensive_types[0][0]))
for combo in sorted_defensive_types[0][1]:
    combo = "".join(sorted(combo))
    try:
        print(combo, all_emperical_combos[combo])
    except KeyError:  # no pokemon has this typing
        print("This combo: - {} - doesn't exist!".format(combo))
    print('')
print("Least defensive types: (with {} weaknesses)".format(sorted_defensive_types[-1][0]))
for combo in sorted_defensive_types[-1][1]:
    combo = "".join(sorted(combo))
    try:
        print(combo, all_emperical_combos[combo])
    except KeyError:  # no pokemon has this typing
        print("This combo: - {} - doesn't exist!".format(combo))
    print('')

Most defensive types: (with 1 weaknesses)
DarkPoison ['Stunky', 'Skuntank', 'Drapion']

GroundWater ['Wooper', 'Quagsire', 'Marshtomp', 'Swampert', 'SwampertMega Swampert', 'Barboach', 'Whiscash', 'Gastrodon', 'Palpitoad', 'Seismitoad']

Electric ['Pikachu', 'Raichu', 'Voltorb', 'Electrode', 'Electabuzz', 'Jolteon', 'Pichu', 'Mareep', 'Flaaffy', 'Ampharos', 'Elekid', 'Raikou', 'Electrike', 'Manectric', 'ManectricMega Manectric', 'Plusle', 'Minun', 'Shinx', 'Luxio', 'Luxray', 'Pachirisu', 'Electivire', 'Blitzle', 'Zebstrika', 'Tynamo', 'Eelektrik', 'Eelektross']

This combo: - GhostNormal - doesn't exist!

DarkGhost ['Sableye', 'SableyeMega Sableye', 'Spiritomb']

BugSteel ['Forretress', 'Scizor', 'ScizorMega Scizor', 'WormadamTrash Cloak', 'Escavalier', 'Durant', 'Genesect']

Normal ['Rattata', 'Raticate', 'Meowth', 'Persian', 'Lickitung', 'Chansey', 'Kangaskhan', 'KangaskhanMega Kangaskhan', 'Tauros', 'Ditto', 'Eevee', 'Porygon', 'Snorlax', 'Sentret', 'Furret', 'Aipom', 'Dunsparce', '

In [61]:
# Now, let's answer the same question based on the typing's invulnerabilities (immunities):
# count the attacking types that have no effect on it (0x)
immunity_counts = {}
for pokemon_typing in all_sorted_combos:
    combo = typeCombinationSplitter(pokemon_typing)
    vuln = typeComboVulnerabilities(combo)
    no_effects = [val for val in vuln if val == 0]
    n = len(no_effects)
    immunity_counts.setdefault(n, []).append(combo)

sorted_defensive_types = list(immunity_counts.items())
sorted_defensive_types.sort(key=lambda x: x[0], reverse=True)

print("Most defensive types: (with {} invulnerabilities)".format(sorted_defensive_types[0][0]))
for combo in sorted_defensive_types[0][1]:
    combo = "".join(sorted(combo))
    try:
        print(combo, all_emperical_combos[combo])
    except KeyError:  # no pokemon has this typing
        print("This combo: - {} - doesn't exist!".format(combo))
    print('')
    

Most defensive types: (with 3 invulnerabilities)
GhostGround ['Golett', 'Golurk']

This combo: - GhostNormal - doesn't exist!

FlyingGhost ['Drifloon', 'Drifblim']

DarkGhost ['Sableye', 'SableyeMega Sableye', 'Spiritomb']

GhostSteel ['Honedge', 'Doublade', 'AegislashBlade Forme', 'AegislashShield Forme']

This combo: - FairyGhost - doesn't exist!



Notice that all of these typings include Ghost, since it is the only type with two immunities (Normal and Fighting); reaching three therefore requires Ghost plus a second type that has an immunity of its own.
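This argument can be checked without the full chart by listing the immunities of each defending type (the current type chart; types not listed have none) and taking the union for each pair, a minimal sketch:

```python
from itertools import combinations

# Immunities per defending type (current type chart); types not listed have none
IMMUNITIES = {
    "Normal": {"Ghost"},
    "Flying": {"Ground"},
    "Ghost": {"Normal", "Fighting"},
    "Ground": {"Electric"},
    "Dark": {"Psychic"},
    "Steel": {"Poison"},
    "Fairy": {"Dragon"},
}

# A dual type's immunities are the union of its two types' immunities
three = sorted(
    "".join(sorted(pair))
    for pair in combinations(IMMUNITIES, 2)
    if len(IMMUNITIES[pair[0]] | IMMUNITIES[pair[1]]) == 3
)
print(three)
# ['DarkGhost', 'FairyGhost', 'FlyingGhost', 'GhostGround', 'GhostNormal', 'GhostSteel']
```

The six results match the notebook's output above, including the two combos (GhostNormal, FairyGhost) that no pokemon has.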