# Intro to Map & ApplyMap

**Author:** _Cameron Bronstein | DSI-SF_

Want to run a function on every value in a DataFrame column? Tired of iterating through the values in a Series and appending to a new list? **Try pandas maps!**
- They work with named functions, lambda functions, and dictionaries!

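**`.map`**
- works on a pandas Series (one column at a time)

**`.applymap`**
- works element-wise on a DataFrame, so it can transform multiple columns at once
- is called `DataFrame.map` from pandas 2.1 onward, where `.applymap` is deprecated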
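
Here is a minimal sketch of the difference on a toy DataFrame. The data and the `double` helper are made up for illustration and are not part of the Pokedex file. The `hasattr` check is just one way to keep the example running on both older and newer pandas installs.

```python
import pandas as pd

# Toy data, made up for illustration (not from the Pokedex file)
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def double(x):
    return x * 2

# Series.map: transforms one column at a time
doubled_a = toy["a"].map(double)

# Element-wise on a whole DataFrame. pandas 2.1+ calls this DataFrame.map
# and deprecates DataFrame.applymap, so use whichever this install provides.
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
doubled_all = elementwise(double)

print(doubled_a.tolist())
print(doubled_all)
```

Both calls return new objects; `toy` itself is unchanged until you assign the result back.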

In [1]:
import pandas as pd
import numpy as np

Import the data

In [2]:
df = pd.read_csv('./data/pokedex_basic_corrupted.csv')

What's wrong here?

In [3]:
df.head()

Unnamed: 0,PokedexNumber,Name,Type,Total,HP,Attack,Defense,SpecialAttack,SpecialDefense,Speed
0,1,Bulbasaur,GrassPoison,318 points!,45 points!,49 points!,49 points!,65 points!,65 points!,45 points!
1,2,Ivysaur,GrassPoison,405 points!,60 points!,62 points!,63 points!,80 points!,80 points!,60 points!
2,3,Venusaur,GrassPoison,525 points!,80 points!,82 points!,83 points!,100 points!,100 points!,80 points!
3,3,VenusaurMega Venusaur,GrassPoison,625 points!,80 points!,100 points!,123 points!,122 points!,120 points!,80 points!
4,4,Charmander,Fire,309 points!,39 points!,52 points!,43 points!,60 points!,50 points!,65 points!


In [4]:
df.dtypes

PokedexNumber      int64
Name              object
Type              object
Total             object
HP                object
Attack            object
Defense           object
SpecialAttack     object
SpecialDefense    object
Speed             object
dtype: object

### Steps for mapping

- Find an appropriate test case (for example, the first value in the series).
- Write a named function or a lambda function that transforms the data into the desired output.
    - Confirm that it works!
- Map!


#### Fix the `Total` column
- Remove the `" points!"` suffix
- Convert to integer

In [5]:
test_case = df.loc[0, 'Total']

def fix_data(x):
    return int(x.replace(" points!", ""))

print(fix_data(test_case))
print(type(fix_data(test_case)))

318
<class 'int'>
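
One test case can hide surprises, so it may be worth confirming the function on a few more values before mapping. The sketch below uses hypothetical inputs written in the same format as the corrupted columns; the `cases` dictionary is an assumption for illustration only.

```python
def fix_data(x):
    return int(x.replace(" points!", ""))

# Hypothetical test values in the same "NN points!" format as the data
cases = {"318 points!": 318, "45 points!": 45, "100 points!": 100}

for raw, expected in cases.items():
    result = fix_data(raw)
    # Each value should come back as the expected integer
    assert result == expected, (raw, result)
    assert isinstance(result, int)

print("all test cases passed")
```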


- Map the function to the entire series!
- (remember to overwrite!)

In [6]:
# assign by column name: in pandas 2.x, df.loc[:, 'Total'] = ... writes in place and can keep dtype object
df['Total'] = df['Total'].map(fix_data)

In [7]:
df.head()

Unnamed: 0,PokedexNumber,Name,Type,Total,HP,Attack,Defense,SpecialAttack,SpecialDefense,Speed
0,1,Bulbasaur,GrassPoison,318,45 points!,49 points!,49 points!,65 points!,65 points!,45 points!
1,2,Ivysaur,GrassPoison,405,60 points!,62 points!,63 points!,80 points!,80 points!,60 points!
2,3,Venusaur,GrassPoison,525,80 points!,82 points!,83 points!,100 points!,100 points!,80 points!
3,3,VenusaurMega Venusaur,GrassPoison,625,80 points!,100 points!,123 points!,122 points!,120 points!,80 points!
4,4,Charmander,Fire,309,39 points!,52 points!,43 points!,60 points!,50 points!,65 points!


Confirm by checking the datatypes of the columns!

In [8]:
df.dtypes

PokedexNumber      int64
Name              object
Type              object
Total              int64
HP                object
Attack            object
Defense           object
SpecialAttack     object
SpecialDefense    object
Speed             object
dtype: object
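
Reading `df.dtypes` by eye works for a handful of columns. If you would rather have the notebook fail loudly when a conversion did not take, a check along these lines is one option. This is a sketch on a toy Series whose values are copied from the `head()` output above; `is_integer_dtype` comes from `pandas.api.types`.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Toy stand-in for the corrupted column (values copied from the head() output)
total = pd.Series(["318 points!", "405 points!", "525 points!"], name="Total")
cleaned = total.map(lambda x: int(x.replace(" points!", "")))

# Raises AssertionError if the column is still stored as strings
assert is_integer_dtype(cleaned), cleaned.dtype
print(cleaned.dtype)
```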

**We can also map with a lambda function!**
- Let's try this with the `HP` column.

In [9]:
df.loc[:, 'HP'].map(lambda x: int(x.replace(" points!", ""))).head()

0    45
1    60
2    80
3    80
4    39
Name: HP, dtype: int64
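
The lambda above assumes every value is a string. If a column had missing entries, `x.replace` would fail on them. `Series.map` accepts `na_action="ignore"` to skip missing values. Here is a sketch with a hypothetical column containing one gap (the values are not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry (not taken from the dataset)
hp = pd.Series(["45 points!", np.nan, "80 points!"], name="HP")

# na_action="ignore" leaves missing values alone instead of passing them to the lambda
cleaned = hp.map(lambda x: int(x.replace(" points!", "")), na_action="ignore")

print(cleaned)
```

Because the gap stays as NaN, with default settings pandas typically stores the result as floats rather than integers.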

In [10]:
# overwrite the column (assigning by name lets the dtype change to int64)
df['HP'] = df['HP'].map(lambda x: int(x.replace(" points!", "")))

In [11]:
df.dtypes

PokedexNumber      int64
Name              object
Type              object
Total              int64
HP                 int64
Attack            object
Defense           object
SpecialAttack     object
SpecialDefense    object
Speed             object
dtype: object

#### We can use `.applymap()` to do this on multiple columns at once!

In [12]:
df.loc[:, 'Attack' : 'Speed'].head()

Unnamed: 0,Attack,Defense,SpecialAttack,SpecialDefense,Speed
0,49 points!,49 points!,65 points!,65 points!,45 points!
1,62 points!,63 points!,80 points!,80 points!,60 points!
2,82 points!,83 points!,100 points!,100 points!,80 points!
3,100 points!,123 points!,122 points!,120 points!,80 points!
4,52 points!,43 points!,60 points!,50 points!,65 points!


In [13]:
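# select the columns by name and assign them back so the new integer dtypes are kept
stat_cols = df.loc[:, 'Attack':'Speed'].columns
df[stat_cols] = df[stat_cols].applymap(lambda x: int(x.replace(" points!", "")))
df.head()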

Unnamed: 0,PokedexNumber,Name,Type,Total,HP,Attack,Defense,SpecialAttack,SpecialDefense,Speed
0,1,Bulbasaur,GrassPoison,318,45,49,49,65,65,45
1,2,Ivysaur,GrassPoison,405,60,62,63,80,80,60
2,3,Venusaur,GrassPoison,525,80,82,83,100,100,80
3,3,VenusaurMega Venusaur,GrassPoison,625,80,100,123,122,120,80
4,4,Charmander,Fire,309,39,52,43,60,50,65
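
Slicing `'Attack' : 'Speed'` works because the corrupted columns sit next to each other. When they don't, one alternative is a function that only touches matching strings, so it can run over the whole DataFrame. This is a sketch on a toy frame; `strip_points` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Toy frame mixing clean and corrupted columns (values from the head() output)
toy = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur"],
    "Total": [318, 405],
    "Attack": ["49 points!", "62 points!"],
    "Speed": ["45 points!", "60 points!"],
})

def strip_points(x):
    """Convert 'NN points!' strings to int; return anything else unchanged."""
    if isinstance(x, str) and x.endswith(" points!"):
        return int(x.replace(" points!", ""))
    return x

# DataFrame.map in pandas 2.1+, DataFrame.applymap in older versions
elementwise = toy.map if hasattr(toy, "map") else toy.applymap
cleaned = elementwise(strip_points)

print(cleaned)
```

The guard costs an extra check per cell, but `Name` and `Total` pass through untouched.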


### We can also map dictionaries!

In [14]:
# these are just some of our pokemon types, with their respective frequencies in the dataset
df['Type'].value_counts().head()

Normal     61
Water      59
Psychic    38
Grass      33
Fire       28
Name: Type, dtype: int64

In [15]:
# compute the frequencies once and reuse them
type_counts = df['Type'].value_counts()
types = type_counts.index
counts = type_counts.values

# create a dictionary where the types are the keys, and the frequencies are the values
test_dictionary = dict(zip(types, counts))

In [16]:
# create a new column where the values are the frequencies of that type
df['Type Counts'] = df['Type'].map(test_dictionary)

In [17]:
df.head()

Unnamed: 0,PokedexNumber,Name,Type,Total,HP,Attack,Defense,SpecialAttack,SpecialDefense,Speed,Type Counts
0,1,Bulbasaur,GrassPoison,318,45,49,49,65,65,45,15
1,2,Ivysaur,GrassPoison,405,60,62,63,80,80,60,15
2,3,Venusaur,GrassPoison,525,80,82,83,100,100,80,15
3,3,VenusaurMega Venusaur,GrassPoison,625,80,100,123,122,120,80,15
4,4,Charmander,Fire,309,39,52,43,60,50,65,28
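
One thing to watch with dictionaries: any value without a matching key maps to `NaN`. The sketch below uses a hypothetical Series where `"Dragon"` is deliberately missing from the lookup, and shows that a Series such as the `value_counts()` result can be passed to `.map` directly. The `fillna(0)` step is just one possible way to handle the gap.

```python
import pandas as pd

# Hypothetical types; "Dragon" is deliberately left out of the dictionary
types = pd.Series(["Fire", "Water", "Dragon", "Fire"], name="Type")
lookup = {"Fire": 28, "Water": 59}

# "Dragon" has no key in lookup, so it becomes NaN
mapped = types.map(lookup)

# One possible way to handle the gap: fill with 0 and restore integers
filled = mapped.fillna(0).astype(int)

# A Series works as the lookup too: value_counts() is indexed by the values
counts = types.map(types.value_counts())

print(filled.tolist())
print(counts.tolist())
```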
