# Encode Column With List of Strings

Sometimes several features are stored as a list of strings in a single column. The challenge is then to clean those strings and convert them into dummy variables. Below, I show an example of how to do this with pandas in just a few steps.

In [1]:
# Import pandas
import pandas as pd

In [2]:
# Read data from csv
listings = pd.read_csv('../resources/airbnb.csv')

In [3]:
# Show first five rows of the amenities column
listings.amenities.head()

0    {Wifi,Kitchen,"Smoking allowed",Heating,Essent...
1    {TV,Internet,Wifi,Kitchen,"Smoking allowed",He...
2    {Internet,Wifi,"Smoking allowed",Heating,"Fami...
3    {Wifi,"Free parking on premises",Gym,Heating,W...
4    {TV,Internet,Wifi,Kitchen,"Free street parking...
Name: amenities, dtype: object
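
Each value is a single string wrapped in braces, with multi-word amenities enclosed in double quotes. As a minimal sketch of what the cleaning step has to do, the snippet below works on one made-up sample string (`raw` is hypothetical and not taken from the dataset):

```python
# Hypothetical raw value in the same format as the amenities column
raw = '{Wifi,Kitchen,"Smoking allowed",Heating}'

# Remove the surrounding braces, then drop the double quotes
cleaned = raw.strip('{}').replace('"', '')

# Split on commas to get the individual amenity names
parts = cleaned.split(',')
print(parts)
```

Note that this simple approach assumes no amenity name contains a comma itself; a quoted name with an embedded comma would be split in two.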

In [4]:
# Function to remove '{', '}' and '"' from each string
def clean_string(s):
    return s.strip('{}').replace('"', '')

# Clean string and split by ',' to get dummies
amenities = listings.amenities.map(clean_string).str.get_dummies(',')

# Strip leading and trailing whitespace from column names
amenities.columns = amenities.columns.str.strip()

# Order columns alphabetically
amenities.sort_index(axis=1, inplace=True)

In [5]:
# Show first five rows of amenities
amenities.head()

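|   | 24-hour check-in | Air conditioning | Air purifier | BBQ grill | Baby bath | Baby monitor | Babysitter recommendations | Baking sheet | Barbecue utensils | Bathtub | ... | Washer/Dryer | Waterfront | Well-lit path to entrance | Wheelchair accessible | Wide entrance for guests | Wide hallways | Wifi | Window guards | translation missing: en.hosting_amenity_49 | translation missing: en.hosting_amenity_50 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |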
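
To try the whole pipeline without the CSV file, here is a minimal sketch on a small toy DataFrame. The data and the names `toy`, `cleaned`, and `dummies` are made up for illustration; the cleaning is written with pandas' vectorized `.str` methods, which should be equivalent to mapping a cleaning function over the column.

```python
import pandas as pd

# Toy data (hypothetical), mimicking the format of the amenities column
toy = pd.DataFrame({
    'amenities': [
        '{Wifi,Kitchen,"Smoking allowed"}',
        '{TV,Wifi}',
        '{}',
    ]
})

# Same cleaning steps: drop the braces, then remove the double quotes
cleaned = (
    toy['amenities']
    .str.strip('{}')
    .str.replace('"', '', regex=False)
)

# Create one 0/1 column per distinct amenity
dummies = cleaned.str.get_dummies(sep=',')

# Strip whitespace from column names and order them alphabetically
dummies.columns = dummies.columns.str.strip()
dummies = dummies.sort_index(axis=1)

print(dummies)
```

A listing without any amenities (the `'{}'` row) ends up as a row of zeros, so no special handling is needed for empty lists.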
