[👈 Chapter 19](19-pandas-filtering-sorting.ipynb) -
[🏠 To index](README.md) -
[👉 Chapter 21](21-pandas-plots.ipynb)

# 20 - Pandas: advanced ways to transform data with `apply()`

In [2]:
import pandas as pd
df = pd.read_csv("svb-names-2014.csv")
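If you don't have `svb-names-2014.csv` at hand, you can build a small stand-in from the five rows that `df.head()` prints later in this chapter. This is only a sketch. It assumes the file has `name`, `number` and `gender` columns, as that output suggests, and it contains just those five rows rather than the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for svb-names-2014.csv, using only the five rows
# that df.head() shows in this chapter (the real file has many more rows)
df = pd.DataFrame({
    "name": ["Sophie", "Daan", "Emma", "Bram", "Milan"],
    "number": [836, 751, 728, 727, 700],
    "gender": [1, 0, 1, 0, 0],
})

print(df.shape)  # (5, 3)
```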

In [3]:
# We've already seen many ways to transform or extract data, 
# like the string operations in the previous chapter. 
# But what if we want to do something more complex that isn't available in Pandas by default?
# For example, let's say (this is a pretty contrived example) we want to remove all vowels from the names
# in our dataframe. Usually we would use something like a for loop for that.
names_without_vowels = []
VOWELS = ["a", "e", "o", "i", "u", "y"] # Remember that constants are capitalized by convention

for name in df["name"].values:
    name = name.lower() # Lowercase first, so uppercase vowels (like the 'E' in 'Emma') are removed too
    for vowel in VOWELS:
        name = name.replace(vowel, "")

    names_without_vowels.append(name.capitalize()) # Capitalize the first letter again

df["name_without_vowels"] = names_without_vowels # A list assigned to a new column becomes a Series
df.head()

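|   | name | number | gender | name_without_vowels |
|---|------|--------|--------|---------------------|
| 0 | Sophie | 836 | 1 | Sph |
| 1 | Daan | 751 | 0 | Dn |
| 2 | Emma | 728 | 1 | Mm |
| 3 | Bram | 727 | 0 | Brm |
| 4 | Milan | 700 | 0 | Mln |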
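For this particular task Pandas does have the building blocks, so as a point of comparison here is a sketch using the `.str` string methods instead of a loop. It assumes the same vowels as above, written as the regular-expression character class `[aeiouy]`, and uses a hypothetical stand-in DataFrame with the names from the table above.

```python
import pandas as pd

# Hypothetical stand-in with the names from the table above
df = pd.DataFrame({"name": ["Sophie", "Daan", "Emma", "Bram", "Milan"]})

# Lowercase, strip every character in the class [aeiouy], then capitalize:
# the same three steps as the for loop, applied to the whole column at once
df["name_without_vowels"] = (
    df["name"]
    .str.lower()
    .str.replace(r"[aeiouy]", "", regex=True)
    .str.capitalize()
)

print(df["name_without_vowels"].tolist())  # ['Sph', 'Dn', 'Mm', 'Brm', 'Mln']
```

This only works because removing characters happens to be something the `.str` methods can express. For transformations they can't express, you need your own function.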


In [4]:
# For example, let's (this is a pretty contrived example) we want to remove all vowels from the names 
# in our dataframe. Usually we would use something like a for loop for that.
names_without_vowels = []
VOWELS = ["a", "e", "o", "i", "u", "y"] # Remember that constants are capitalized by convention

for name in df["name"].values:
    for vowel in VOWELS:
        name = name.lower().replace(vowel, "") # Remember that we also need to check for uppercase vowels!
        
    names_without_vowels.append(name.capitalize()) # And re-capitalize again
    
df["name_without_vowels"] = names_without_vowels # When assigning a list to a new column, it becomes a series
df.head()

Unnamed: 0,name,number,gender,name_without_vowels
0,Sophie,836,1,Sph
1,Daan,751,0,Dn
2,Emma,728,1,Mm
3,Bram,727,0,Brm
4,Milan,700,0,Mln


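The `apply()` method in the next cell is what many languages call *mapping*. To make that idea concrete, here is a sketch comparing Python's built-in `map()` on a plain list with `Series.apply()` on a Pandas Series. Both call the given function once per value. The names are a few from the dataset above, and `str.upper` simply stands in for any one-argument function.

```python
import pandas as pd

names = ["Sophie", "Daan", "Emma"]

# Built-in map(): returns a lazy iterator, so we wrap it in list()
mapped_list = list(map(str.upper, names))

# Series.apply(): calls the function on every element and returns a new Series
mapped_series = pd.Series(names).apply(str.upper)

print(mapped_list)             # ['SOPHIE', 'DAAN', 'EMMA']
print(mapped_series.tolist())  # ['SOPHIE', 'DAAN', 'EMMA']
```

In both cases we pass the function itself (`str.upper`, without parentheses), not the result of calling it.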
In [5]:
# Pandas makes this a lot easier with the apply() method, combined
# with a function we write ourselves. In many languages this is called 'mapping':
# you apply a function to every value in a list.

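# Let's rewrite the example above using apply(). First we define the vowels again
VOWELS = ["a", "e", "o", "i", "u", "y"]

# Then we write a function. Note that it accepts one argument (a name)
# and returns that name without vowels
def devowelize(name):
    name = name.lower() # Lowercase first, so uppercase vowels are removed too
    for vowel in VOWELS:
        name = name.replace(vowel, "")

    return name # Unlike the loop above, this doesn't re-capitalize, so the result stays lowercase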

# Then we pass the function (without parentheses) to apply(). apply() runs it
# on every name and returns a new Series, which we assign to the column
df["name_without_vowels"] = df["name"].apply(devowelize)
df.head()

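|   | name | number | gender | name_without_vowels |
|---|------|--------|--------|---------------------|
| 0 | Sophie | 836 | 1 | sph |
| 1 | Daan | 751 | 0 | dn |
| 2 | Emma | 728 | 1 | mm |
| 3 | Bram | 727 | 0 | brm |
| 4 | Milan | 700 | 0 | mln |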

