# NumPy use case


# Pokemon Dataset

In this exercise you will be analysing a dataset about Pokemon.
<img src="figures/pokemon.png" alt="Pikachu" align="right" width="200"/>

### Intro to Pokemon
Pokémon is a media franchise managed by The Pokémon Company, a Japanese consortium between Nintendo, Game Freak, and Creatures. While the franchise copyright is shared by all three companies, Nintendo is the sole owner of the trademark. The franchise was created by Satoshi Tajiri in 1995, and is centered on fictional creatures called "Pokémon", which humans, known as Pokémon Trainers, catch and train to battle each other for sport.

The name Pokémon is the romanized contraction of the Japanese brand Pocket Monsters. The term Pokémon, in addition to referring to the Pokémon franchise itself, also collectively refers to the 721 known fictional species that have made appearances in Pokémon media as of the release of the sixth generation titles Pokémon X and Y. "Pokémon" is identical in both the singular and plural, as is each individual species name; it is grammatically correct to say "one Pokémon" and "many Pokémon", as well as "one Pikachu" and "many Pikachu". (source: Wikipedia)

### The Dataset
You can find the data in three files in the 'data' directory. These are the raw attributes used to calculate how much damage an attack does in the games. This dataset is about the Pokemon games (NOT Pokemon cards or Pokemon Go).

1) 'pokemon_ids.npy' contains an int ndarray of the ID for each Pokemon

2) 'pokemon_names.npy' contains an ndarray of strings with the name of each Pokemon (Hint: to `load` this file, you need to set the named parameter `allow_pickle` to `True`, as the strings are stored as Python objects)

3) 'pokemon_stats.npy' contains an ndarray with one row per Pokemon and the following 6 columns:

- HP: hit points, or health, defines how much damage a Pokemon can withstand before fainting
- Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
- Defense: the base damage resistance against normal attacks
- SP Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
- SP Def: the base damage resistance against special attacks
- Speed: determines which Pokemon attacks first each round
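Because the stats array is indexed by column position, it can help to name the positions once so that later expressions like `stats[:, 1]` read as "the Attack column". A minimal sketch, assuming the column order listed above; the constant names (`HP`, `ATTACK`, ...) are hypothetical helpers, not part of the data files, and the three rows are copied from the notebook output further down:

```python
import numpy as np

# Hypothetical names for the column positions, in the order listed above
HP, ATTACK, DEFENSE, SP_ATK, SP_DEF, SPEED = range(6)

# First three rows of 'pokemon_stats.npy' (Bulbasaur, Ivysaur, Venusaur),
# copied from the notebook output; a stand-in for the full 800 x 6 array
stats = np.array([[45, 49, 49, 65, 65, 45],
                  [60, 62, 63, 80, 80, 60],
                  [80, 82, 83, 100, 100, 80]])

attack = stats[:, ATTACK]   # one column -> 1-dim array, one value per Pokemon
bulbasaur = stats[0]        # one row -> all 6 stats of a single Pokemon
print(attack)               # [49 62 82]
print(bulbasaur[SPEED])     # 45
```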


### Exercises 1

* 1a) load the dataset into three NumPy arrays 'ids', 'names' and 'stats'. Verify the datatypes of each array.

* 1b) how many rows do you expect the arrays to have? Verify your assumption!

* 1c) inspect the first 10 rows and the last 10 rows - do you notice anything important? Find an explanation for your observation!
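Exercise 1a needs `allow_pickle=True` because NumPy saves arrays of Python objects (such as these strings) with pickle, and `np.load` refuses to unpickle by default. This minimal sketch round-trips a small object array through a temporary file (the file name is hypothetical) to show both behaviours; only set `allow_pickle=True` for files from a source you trust.

```python
import os
import tempfile

import numpy as np

# dtype=object mirrors how the loaded names array is stored
names = np.array(['Bulbasaur', 'Ivysaur', 'Venusaur'], dtype=object)

refused = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'names_demo.npy')  # hypothetical file name
    np.save(path, names)                        # object arrays are written with pickle

    try:
        np.load(path)                           # allow_pickle defaults to False
    except ValueError as err:
        refused = str(err)                      # NumPy refuses to unpickle

    loaded = np.load(path, allow_pickle=True)   # explicit opt-in works

print(refused)
print(loaded.dtype, loaded)
```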


In [2]:
## ---------- SOLUTION 1a

In [3]:
import numpy as np

In [4]:
ids = np.load('data/pokemon_ids.npy')

In [5]:
names = np.load('data/pokemon_names.npy', allow_pickle=True)

In [6]:
stats = np.load('data/pokemon_stats.npy')

In [7]:
print(ids.dtype, names.dtype, stats.dtype)

int64 object int64


In [8]:
## ---------- SOLUTION 1b

In [9]:
# we expect 721 rows

In [10]:
print(len(names), len(stats), len(ids))

800 800 800


In [11]:
print(names.shape, stats.shape, ids.shape)

(800,) (800, 6) (800,)


In [12]:
# oops, we got 800 rows instead of 721. This is strange!

In [13]:
## ---------- SOLUTION 1c

In [14]:
print(names[:10])
print(names[-10:])

['Bulbasaur' 'Ivysaur' 'Venusaur' 'VenusaurMega Venusaur' 'Charmander'
 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Squirtle']
['Noibat' 'Noivern' 'Xerneas' 'Yveltal' 'Zygarde50% Forme' 'Diancie'
 'DiancieMega Diancie' 'HoopaHoopa Confined' 'HoopaHoopa Unbound'
 'Volcanion']


In [15]:
print(ids[:10])
print(ids[-10:])

[1 2 3 3 4 5 6 6 6 7]
[714 715 716 717 718 719 719 720 720 721]


In [16]:
print(stats[:10])
print(stats[-10:])

[[ 45  49  49  65  65  45]
 [ 60  62  63  80  80  60]
 [ 80  82  83 100 100  80]
 [ 80 100 123 122 120  80]
 [ 39  52  43  60  50  65]
 [ 58  64  58  80  65  80]
 [ 78  84  78 109  85 100]
 [ 78 130 111 130  85 100]
 [ 78 104  78 159 115 100]
 [ 44  48  65  50  64  43]]
[[ 40  30  35  45  40  55]
 [ 85  70  80  97  80 123]
 [126 131  95 131  98  99]
 [126 131  95 131  98  99]
 [108 100 121  81  95  95]
 [ 50 100 150 100 150  50]
 [ 50 160 110 160 110 110]
 [ 80 110  60 150 130  70]
 [ 80 160  60 170 130  80]
 [ 80 110 120 130  90  70]]


In [17]:
# to notice: the ID is not unique; some rows are variants of a Pokemon that share its ID
# (e.g. "mega evolutions" such as 'VenusaurMega Venusaur', or other forms such as 'HoopaHoopa Unbound').
# We do have 721 distinct IDs, as expected, but 800 rows.
# Some research shows that certain Pokemon can be temporarily changed into a different form
# (for one fight) with a "mega stone"; after the fight, they revert to their original form.
# These temporary forms share the ID of the original Pokemon.
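The duplicate IDs can also be confirmed directly instead of by eye. A short sketch using `np.unique` with `return_counts=True` on the first 10 IDs printed above; on the full `ids` array the same call would report which IDs have more than one row:

```python
import numpy as np

# The first 10 IDs from the output of 1c
ids = np.array([1, 2, 3, 3, 4, 5, 6, 6, 6, 7])

unique_ids, counts = np.unique(ids, return_counts=True)
print(len(ids), len(unique_ids))   # 10 rows, 7 distinct IDs
print(unique_ids[counts > 1])      # IDs shared by several forms: [3 6]
print(counts[counts > 1])          # rows per shared ID: [2 3]
```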

### Exercises 2

* 2a) find the ids and names of all Pokemon with hit points above 150
* 2b) find the names of all Pokemon that have a higher Attack than Defense (the "attackers")
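Both exercises rely on boolean indexing: a comparison on a column yields one `True`/`False` per row, and that mask selects the matching rows of any array with the same length. A sketch on the first 10 rows from the output of 1c; the combined condition `tough_attackers` (attackers with more than 70 HP) is a hypothetical extra example, not part of the exercise:

```python
import numpy as np

# First 10 rows of names and stats, copied from the output of 1c
names = np.array(['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur',
                  'Charmander', 'Charmeleon', 'Charizard',
                  'CharizardMega Charizard X', 'CharizardMega Charizard Y',
                  'Squirtle'], dtype=object)
stats = np.array([[45, 49, 49, 65, 65, 45],
                  [60, 62, 63, 80, 80, 60],
                  [80, 82, 83, 100, 100, 80],
                  [80, 100, 123, 122, 120, 80],
                  [39, 52, 43, 60, 50, 65],
                  [58, 64, 58, 80, 65, 80],
                  [78, 84, 78, 109, 85, 100],
                  [78, 130, 111, 130, 85, 100],
                  [78, 104, 78, 159, 115, 100],
                  [44, 48, 65, 50, 64, 43]])
hp, attack, defense = stats[:, 0], stats[:, 1], stats[:, 2]

attackers = attack > defense      # one boolean per row
print(names[attackers])           # Charmander and its evolutions/forms

# Masks combine element-wise with & (and) and | (or); the parentheses are
# required because & binds more tightly than > in Python
tough_attackers = (attack > defense) & (hp > 70)
print(names[tough_attackers])
```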

In [18]:
## ---------- SOLUTION 2a

In [19]:
hp = stats[:, 0] # create a 1-dim array of the hp values

In [20]:
large_hp_idx = (hp>150) # create a boolean 1-dim array marking the HP values above 150, for indexing

In [21]:
print(ids[large_hp_idx])
print(names[large_hp_idx])

[113 143 202 242 321 594]
['Chansey' 'Snorlax' 'Wobbuffet' 'Blissey' 'Wailord' 'Alomomola']


In [22]:
## ---------- SOLUTION 2b

In [23]:
# concise solution
attacker_idx = stats[:, 1] > stats[:, 2]

In [24]:
# easier to understand solution
attack = stats[:, 1]
defense = stats[:, 2]
attacker_idx = attack > defense

In [25]:
print(f'We have {attacker_idx.sum()} attackers, the first 20 are: {names[attacker_idx][:20]}')

We have 433 attackers, the first 20 are: ['Charmander' 'Charmeleon' 'Charizard' 'CharizardMega Charizard X'
 'CharizardMega Charizard Y' 'Weedle' 'Beedrill' 'BeedrillMega Beedrill'
 'Pidgey' 'Pidgeotto' 'Pidgeot' 'Rattata' 'Raticate' 'Spearow' 'Fearow'
 'Ekans' 'Arbok' 'Pikachu' 'Raichu' 'Nidoqueen']


### Exercises 3

* 3a) load the dataset

* 3b) print (in one line) the min, mean, max and median of the hitpoints

* 3c) compute a one-dim array containing the sum of Attack and Defense for each Pokemon

* 3d) compute a list 'all' containing the sum of all 6 stats for each Pokemon (using a list comprehension)

* 3e) use boolean indexing and your 'all' list to find the name of the Pokemon with the highest sum of all 6 stats
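Exercises 3c to 3e hinge on the axis a reduction runs along. For a 2-dim array, `axis=1` collapses the columns and yields one value per row (per Pokemon), while `axis=0` yields one value per column (per stat). A sketch on the first three rows from the output of 1c; the per-row totals match the first values printed in solutions 3c and 3d:

```python
import numpy as np

# First three rows of stats (Bulbasaur, Ivysaur, Venusaur)
stats = np.array([[45, 49, 49, 65, 65, 45],
                  [60, 62, 63, 80, 80, 60],
                  [80, 82, 83, 100, 100, 80]])

per_pokemon = stats.sum(axis=1)      # collapse the columns: one total per row
per_stat = stats.sum(axis=0)         # collapse the rows: one total per column
att_def = stats[:, 1:3].sum(axis=1)  # only columns 1 and 2 (Attack, Defense)

print(per_pokemon)  # [318 405 525]
print(per_stat)     # [185 193 195 245 245 185]
print(att_def)      # [ 98 125 165]
```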

In [5]:
## ---------- SOLUTION 3b

In [6]:
hp = stats[:, 0]
print(f'HP: min={hp.min()}, mean={hp.mean()}, max={hp.max()}, median={np.median(hp)}')

HP: min=1, mean=69.25875, max=255, median=65.0


In [7]:
## ---------- SOLUTION 3c

In [8]:
# some possible solutions:
ad = np.add(stats[:, 1], stats[:, 2])   # use the np.add function explicitly
ad = stats[:, 1] + stats[:, 2]          # the + operator calls np.add under the hood
ad = stats[:, 1:3].sum(axis=1)          # sum columns 1 and 2 along axis 1 (per row)
print(ad[:10])

[ 98 125 165 223  95 122 162 241 182 113]


In [9]:
## ---------- SOLUTION 3d

In [10]:
all_list = [int(np.sum(x)) for x in stats]   # int() keeps the list elements plain Python ints
print(all_list[:10])

[318, 405, 525, 625, 309, 405, 534, 634, 634, 314]


In [11]:
# higher performance using np instead of the comprehension (but we wanted to practise comprehensions!)
all_nparray = stats.sum(axis=1)
print(all_nparray[:10])

[318 405 525 625 309 405 534 634 634 314]
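The comment above says the vectorized version performs better. A minimal `timeit` sketch can check this on your own machine; it uses a random 800 x 6 integer array as a made-up stand-in for `stats` and deliberately quotes no timings, since they vary between machines (the vectorized call is typically much faster):

```python
import timeit

import numpy as np

# Random stand-in for the 800 x 6 stats array (values are made up)
rng = np.random.default_rng(0)
stats = rng.integers(1, 256, size=(800, 6))

def with_comprehension():
    return [int(np.sum(row)) for row in stats]   # one NumPy call per row

def vectorized():
    return stats.sum(axis=1)                     # a single call for all rows

# Both approaches give the same totals
assert with_comprehension() == vectorized().tolist()

print('comprehension:', timeit.timeit(with_comprehension, number=100))
print('vectorized:   ', timeit.timeit(vectorized, number=100))
```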


In [12]:
## ---------- SOLUTION 3e

In [13]:
print(names[all_nparray == all_nparray.max()])

['MewtwoMega Mewtwo X' 'MewtwoMega Mewtwo Y' 'RayquazaMega Rayquaza']
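The result contains three names because several Pokemon tie for the highest total. A boolean mask keeps every tie, whereas `np.argmax` returns only the index of the first maximum. The first 10 totals from solution 3d show the difference, since the two Mega Charizard forms tie at 634:

```python
import numpy as np

# Row totals of the first 10 Pokemon, as printed in solution 3d
totals = np.array([318, 405, 525, 625, 309, 405, 534, 634, 634, 314])

print(np.argmax(totals))                        # 7: only the first maximum
print(np.flatnonzero(totals == totals.max()))   # [7 8]: every tied maximum
```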


### Exercises 4

* 4a) find the names of all Pokemon where the sum of defense and special defense is at least twice the sum of attack and special attack (let's call them the 'strong defenders')

* 4b) create a 1-dim array 'att_or_def' stating for each Pokemon if it is a 'Defender' or an 'Attacker' depending on the condition from the previous exercise, using the `np.where` function.

* 4c) create a dict where the keys are the names of the Pokemon and the values are 'Defender' or 'Attacker' (hint: use a 'dict' comprehension and the *zip* function)
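`np.where(mask, a, b)` picks `a` where the mask is `True` and `b` elsewhere, and `zip` pairs each name with its label. A small sketch; the mask is written by hand to agree with the notebook's results (Chansey appears in the 'strong defenders' output of 4a, while Bulbasaur and Squirtle are listed as 'Attacker' in 4c):

```python
import numpy as np

names = np.array(['Bulbasaur', 'Chansey', 'Squirtle'], dtype=object)
# Hand-written mask matching the notebook's results for these three Pokemon
strong_defenders = np.array([False, True, False])

att_or_def = np.where(strong_defenders, 'Defender', 'Attacker')
print(att_or_def)          # ['Attacker' 'Defender' 'Attacker']
print(att_or_def.dtype)    # a fixed-width unicode string dtype

ad_dict = dict(zip(names, att_or_def))
print(ad_dict['Chansey'])  # Defender
```

This assumes the names are unique: if a name appeared twice, the dict would silently keep only its last label.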

In [14]:
## ---------- SOLUTION 4a

In [15]:
# using the np.sum function with an explicit axis
strong_defenders = np.sum(stats[:,(2,4)], axis=1) >= 2*np.sum(stats[:,(1,3)], axis=1)
# using the equivalent ndarray.sum method
strong_defenders = stats[:,(2,4)].sum(1) >= 2*stats[:,(1,3)].sum(1)
# using addition instead of sum
strong_defenders = stats[:,2]+stats[:,4] >= 2*(stats[:,1]+stats[:,3])

print(names[strong_defenders])

['Onix' 'Chansey' 'Magikarp' 'Togepi' 'Marill' 'Shuckle' 'Smeargle'
 'Azurill' 'Nosepass' 'Feebas' 'Duskull' 'Dusclops' 'Wynaut' 'Regirock'
 'Regice' 'Registeel' 'DeoxysDefense Forme' 'Shieldon' 'Bastiodon'
 'Bronzor' 'Happiny' 'Mantyke' 'Probopass' 'Ferroseed'
 'AegislashShield Forme' 'Carbink']


In [16]:
## ---------- SOLUTION 4b

In [17]:
att_or_def = np.where(strong_defenders, 'Defender', 'Attacker')

In [18]:
## ---------- SOLUTION 4c

In [19]:
ad_dict = {x:y for x, y in zip(names, att_or_def)}       # with a dict comprehension
ad_dict = dict(zip(names, att_or_def))                   # with dict/zip

In [20]:
print(list(ad_dict.items())[:10])                        # show 10 entries of the dict

[('Bulbasaur', 'Attacker'), ('Ivysaur', 'Attacker'), ('Venusaur', 'Attacker'), ('VenusaurMega Venusaur', 'Attacker'), ('Charmander', 'Attacker'), ('Charmeleon', 'Attacker'), ('Charizard', 'Attacker'), ('CharizardMega Charizard X', 'Attacker'), ('CharizardMega Charizard Y', 'Attacker'), ('Squirtle', 'Attacker')]


### Exercises 5

At the start of a new game, each player gets a random set of Pokemon to start with. Each Pokemon has a 1.5% chance of being in this list.

* 5a) How many Pokemon do you expect in this starting list?
* 5b) Generate such a 'my_pokemon' set (make sure you think about and pick a suitable data type!) containing the names of the Pokemon. How many does your set contain?  
Hint: **np.random.uniform** may come in handy.
* 5c) Compute the total hit points your starter Pokemon can sustain (i.e. the sum of the hitpoints of your starter Pokemon)
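Comparing a uniform draw from [0, 1) against 0.015 gives each Pokemon an independent 1.5% chance of being picked. The notebook uses the legacy `np.random.seed`/`np.random.uniform` interface; this sketch swaps in NumPy's newer Generator API (`np.random.default_rng`), so its draws will not match the outputs shown below:

```python
import numpy as np

p = 0.015   # chance for each Pokemon to be in the starter list
n = 721     # number of distinct Pokemon IDs

rng = np.random.default_rng(1)   # seeded Generator for reproducible draws
chosen = rng.random(n) < p       # one independent yes/no draw per Pokemon
print(chosen.sum())              # size of this particular starter list

# Over many draws, the fraction of True values settles near p
many = rng.random(1_000_000) < p
print(many.mean())               # close to 0.015
```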

In [21]:
## ---------- SOLUTION 5a

In [22]:
# This sounds easy, and it is - but note that only the base forms can be in our list,
# as the stone-evolved (mega) forms are temporary. So the correct answer is not 0.015 * len(names)
# but rather
0.015 * len(np.unique(ids))

10.815
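Since each of the 721 base Pokemon is picked independently with probability 0.015, the size of the starter list follows a binomial distribution with mean n·p and standard deviation sqrt(n·p·(1 − p)). A quick check of both numbers:

```python
import math

n = 721     # distinct Pokemon IDs, i.e. len(np.unique(ids))
p = 0.015   # chance per Pokemon

expected = n * p                      # mean of a Binomial(n, p) count
spread = math.sqrt(n * p * (1 - p))   # its standard deviation

print(expected)           # 10.815
print(round(spread, 2))   # 3.26
```

With a spread of roughly 3, the lists of 15 (solution 5b) and 9 (the alternative solution) Pokemon are both within about 1.3 standard deviations of the expectation.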

In [23]:
## ---------- SOLUTION 5b

In [24]:
# seed the random number generator to make the following cells reproducible
np.random.seed(1)

In [25]:
# we use a boolean numpy mask: it is compact and can index names and stats directly.
# a regular python list would work as well, and a python 'set' of names is another option
starter_pokemon_b = np.random.uniform(size=len(names)) <= 0.015

In [26]:
# show the indices of all potentially chosen starter pokemon
idx = np.flatnonzero(starter_pokemon_b).tolist()
print(idx)

[2, 98, 149, 196, 250, 441, 443, 487, 538, 545, 563, 569, 677, 690, 739, 744, 765]


In [27]:
print(names[starter_pokemon_b])

['Venusaur' 'Cloyster' 'Omanyte' 'AmpharosMega Ampharos' 'Phanpy' 'Starly'
 'Staraptor' 'Mime Jr.' 'Mesprit' 'GiratinaOrigin Forme' 'Patrat'
 'Liepard' 'Shelmet' 'Vullaby' 'Florges' 'Furfrou' 'Heliolisk']


In [28]:
# filter out all "form" pokemon: keep a row only if it is the first row with its ID
starter_pokemon_idx = [idx for idx in np.flatnonzero(starter_pokemon_b) if idx == 0 or ids[idx-1] != ids[idx]]
starter_pokemon = names[starter_pokemon_idx]
print(starter_pokemon)

['Venusaur' 'Cloyster' 'Omanyte' 'Phanpy' 'Starly' 'Staraptor' 'Mime Jr.'
 'Mesprit' 'Patrat' 'Liepard' 'Shelmet' 'Vullaby' 'Florges' 'Furfrou'
 'Heliolisk']
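The list comprehension above keeps a row only if it is the first row with its ID. Assuming, as in this dataset, that rows sharing an ID are adjacent, the same mask can be built without a Python loop by comparing each ID with its predecessor. A sketch on the first 10 rows from the output of 1c, cross-checked against `np.unique(..., return_index=True)`:

```python
import numpy as np

# First 10 IDs and names from the output of 1c
ids = np.array([1, 2, 3, 3, 4, 5, 6, 6, 6, 7])
names = np.array(['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur',
                  'Charmander', 'Charmeleon', 'Charizard',
                  'CharizardMega Charizard X', 'CharizardMega Charizard Y',
                  'Squirtle'], dtype=object)

# True where a row starts a new ID: idx == 0 or ids[idx-1] != ids[idx]
first = np.concatenate(([True], ids[1:] != ids[:-1]))
print(names[first])

# return_index gives the position of the first occurrence of each ID
_, first_idx = np.unique(ids, return_index=True)
print(first_idx)   # [0 1 2 4 5 6 9]
```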


In [29]:
len(starter_pokemon)

15

In [30]:
## ---------- SOLUTION 5c

In [31]:
# compute the combined health
ch = np.sum(stats[:,0][starter_pokemon_idx])
# Alternatively, we can use the names to compute the boolean indexing array
ch = np.sum(stats[:,0][np.array([(x in starter_pokemon) for x in names])])
# Or, even shorter, you do not need to convert the list of booleans to an np.array,
# as NumPy accepts a boolean list as a mask (the line above may be easier to understand, though)
ch = np.sum(stats[:,0][[(x in starter_pokemon) for x in names]])

print(ch)

924
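The name-based variants above build their mask with a Python loop over all names. `np.isin` performs the same whole-string membership test in a single call. A sketch using the first 10 names and HP values from the output of 1c; the two "starter" picks are hypothetical:

```python
import numpy as np

names = np.array(['Bulbasaur', 'Ivysaur', 'Venusaur', 'VenusaurMega Venusaur',
                  'Charmander', 'Charmeleon', 'Charizard',
                  'CharizardMega Charizard X', 'CharizardMega Charizard Y',
                  'Squirtle'], dtype=object)
hp = np.array([45, 60, 80, 80, 39, 58, 78, 78, 78, 44])
starter_pokemon = np.array(['Venusaur', 'Squirtle'], dtype=object)  # hypothetical picks

# Element-wise membership test on whole strings, so 'VenusaurMega Venusaur'
# is not matched by 'Venusaur'
mask = np.isin(names, starter_pokemon)
print(names[mask])      # ['Venusaur' 'Squirtle']
print(hp[mask].sum())   # 124
```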


---

In [32]:
# alternative SOLUTION 5b & 5c using return_index parameter of np.unique

In [33]:
uix = np.unique(ids, return_index=True)[1]

In [34]:
# compute potential starter (ps) ids, names, stats
ps_ids = ids[uix] # identical to np.unique(ids, return_index=True)[0], i.e. np.arange(1, 722)
ps_names = names[uix]
ps_stats = stats[uix]

In [35]:
# seed the random number generator to make the result deterministic
np.random.seed(42)
# compute the starter mask
starter_pokemon_b = np.random.uniform(size=len(ps_ids)) <= 0.015

In [36]:
print(ps_names[starter_pokemon_b])

['Tentacruel' 'Magikarp' 'Dunsparce' 'Snubbull' 'Swablu' 'Cranidos'
 'Lumineon' 'Glaceon' 'Tirtouga']


In [37]:
np.sum(ps_stats[:,0][starter_pokemon_b])

560

---