# Using Select Dtypes

<span>Once you are comfortable with pandas, being able to filter a dataframe quickly and efficiently is a huge asset. Pandas provides the ".select_dtypes" method, which lets you select the columns of a dataframe by their dtype. You can get creative with it and operate on your dataframe in more flexible ways. Check out some of the examples below.</span>
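
Before loading the real data, here is a minimal sketch of the idea on a small, made-up dataframe. The `toy` dataframe and its column names are assumptions for illustration only and are not part of the video games dataset.

```python
import pandas as pd

# Hypothetical toy dataframe (not the video games data used below)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "sales": [1.5, 0.7, 0.2],
})

# Keep only the floating-point columns
float_cols = toy.select_dtypes(include=["float64"])

# Drop the floating-point columns and keep everything else
non_float_cols = toy.select_dtypes(exclude=["float64"])

print(list(float_cols.columns))
print(list(non_float_cols.columns))
```

`include` keeps the columns whose dtype matches, while `exclude` keeps everything that does not match.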

### Import Preliminaries

In [18]:
# Import modules
import pandas as pd

# Import video games data
df = pd.read_csv('Data/Video Games Sales/vgsales.csv')

# View the schema of the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
Rank            16598 non-null int64
Name            16598 non-null object
Platform        16598 non-null object
Year            16327 non-null float64
Genre           16598 non-null object
Publisher       16540 non-null object
NA_Sales        16598 non-null float64
EU_Sales        16598 non-null float64
JP_Sales        16598 non-null float64
Other_Sales     16598 non-null float64
Global_Sales    16598 non-null float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


In [19]:
# View the dataframe
df.head(5)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


### Using the Select DTypes Function

In [20]:
# Select the object features in your dataframe
df.select_dtypes(include=['object']).head(5)

Unnamed: 0,Name,Platform,Genre,Publisher
0,Wii Sports,Wii,Sports,Nintendo
1,Super Mario Bros.,NES,Platform,Nintendo
2,Mario Kart Wii,Wii,Racing,Nintendo
3,Wii Sports Resort,Wii,Sports,Nintendo
4,Pokemon Red/Pokemon Blue,GB,Role-Playing,Nintendo


In [21]:
# Select all the features that are neither object nor int features in the
# dataframe
df.select_dtypes(exclude=['object','int']).head(5)

Unnamed: 0,Year,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,2006.0,41.49,29.02,3.77,8.46,82.74
1,1985.0,29.08,3.58,6.81,0.77,40.24
2,2008.0,15.85,12.88,3.79,3.31,35.82
3,2009.0,15.75,11.01,3.28,2.96,33.0
4,1996.0,11.27,8.89,10.22,1.0,31.37


In [22]:
# View all the non-float features in the dataframe
df.select_dtypes(exclude=['float']).head(5)

Unnamed: 0,Rank,Name,Platform,Genre,Publisher
0,1,Wii Sports,Wii,Sports,Nintendo
1,2,Super Mario Bros.,NES,Platform,Nintendo
2,3,Mario Kart Wii,Wii,Racing,Nintendo
3,4,Wii Sports Resort,Wii,Sports,Nintendo
4,5,Pokemon Red/Pokemon Blue,GB,Role-Playing,Nintendo
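
The `include` and `exclude` arguments can also be passed together in one call. The sketch below uses a made-up `toy` dataframe (an assumption, not the video games data) to keep numeric columns while dropping the float ones.

```python
import pandas as pd

# Hypothetical toy dataframe for illustration
toy = pd.DataFrame({
    "name": ["A", "B"],
    "rank": [1, 2],
    "year": [2006.0, 1985.0],
})

# Keep numeric columns, but leave out the 64-bit float columns
numeric_non_float = toy.select_dtypes(include=["number"], exclude=["float64"])

print(list(numeric_non_float.columns))
```

The two lists should not name the same dtype; according to the pandas documentation, overlapping `include` and `exclude` entries raise a `ValueError`.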


### Use Case: Describing Only Numeric Features

In [23]:
# Filter for float and int features before viewing the data's summary statistics
df.select_dtypes(include=['float','int']).describe()

Unnamed: 0,Rank,Year,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
count,16598.0,16327.0,16598.0,16598.0,16598.0,16598.0,16598.0
mean,8300.605254,2006.406443,0.264667,0.146652,0.077782,0.048063,0.537441
std,4791.853933,5.828981,0.816683,0.505351,0.309291,0.188588,1.555028
min,1.0,1980.0,0.0,0.0,0.0,0.0,0.01
25%,4151.25,2003.0,0.0,0.0,0.0,0.0,0.06
50%,8300.5,2007.0,0.08,0.02,0.0,0.01,0.17
75%,12449.75,2010.0,0.24,0.11,0.04,0.04,0.47
max,16600.0,2020.0,41.49,29.02,10.22,10.57,82.74
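
As an alternative to listing `'float'` and `'int'` separately, `select_dtypes` also accepts the generic `'number'` selector, which matches any NumPy numeric dtype. This matters once a dataframe holds narrower integer columns such as `int8`. The sketch below assumes a made-up `toy` dataframe; its columns are illustrative only.

```python
import pandas as pd

# Hypothetical toy dataframe; genre_code mimics a label-encoded int8 column
toy = pd.DataFrame({
    "genre": ["Sports", "Racing", "Sports"],
    "genre_code": pd.Series([1, 0, 1], dtype="int8"),
    "rank": [1, 2, 3],
    "global_sales": [82.74, 35.82, 33.0],
})

# 'number' selects every numeric column regardless of its bit width
numeric_summary = toy.select_dtypes(include="number").describe()

print(numeric_summary)
```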


### Use Case: Encoding Object Values

In [24]:
# View only the object features
df.select_dtypes(include=['object']).head(5)

Unnamed: 0,Name,Platform,Genre,Publisher
0,Wii Sports,Wii,Sports,Nintendo
1,Super Mario Bros.,NES,Platform,Nintendo
2,Mario Kart Wii,Wii,Racing,Nintendo
3,Wii Sports Resort,Wii,Sports,Nintendo
4,Pokemon Red/Pokemon Blue,GB,Role-Playing,Nintendo


In [26]:
# Record the names of the object features before encoding them
encoded_feature = df.select_dtypes(include=['object']).columns

# Label encode the categorical features (missing values receive the code -1)
for col in encoded_feature:
    df[col] = df[col].astype('category').cat.codes

# View dataframe head
df[encoded_feature].head(5)

Unnamed: 0,Name,Platform,Genre,Publisher
0,11007,26,10,359
1,9327,11,4,359
2,5573,26,6,359
3,11009,26,10,359
4,7346,5,7,359


In [27]:
# View the schema of the dataframe again
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
Rank            16598 non-null int64
Name            16598 non-null int16
Platform        16598 non-null int8
Year            16327 non-null float64
Genre           16598 non-null int8
Publisher       16598 non-null int16
NA_Sales        16598 non-null float64
EU_Sales        16598 non-null float64
JP_Sales        16598 non-null float64
Other_Sales     16598 non-null float64
Global_Sales    16598 non-null float64
dtypes: float64(6), int16(2), int64(1), int8(2)
memory usage: 1005.0 KB
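
Notice that Publisher went from 16540 to 16598 non-null values after encoding. This is because `.cat.codes` assigns the code -1 to missing values instead of keeping them as nulls. The sketch below shows this on a made-up `toy` dataframe (an assumption for illustration) and also keeps a mapping from each code back to its label, which the loop above discards.

```python
import pandas as pd

# Hypothetical toy dataframe with one missing publisher
toy = pd.DataFrame({"publisher": ["Nintendo", None, "Sega", "Nintendo"]})

# Convert to a categorical column first so the categories can be inspected
as_category = toy["publisher"].astype("category")

# Mapping from integer code back to the original label
code_to_label = dict(enumerate(as_category.cat.categories))

# Missing values are encoded as -1
toy["publisher_code"] = as_category.cat.codes

print(toy)
print(code_to_label)
```

If the missing values need to stay distinguishable later on, it may be worth handling the -1 codes explicitly before modeling.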


Author: Kavi Sekhon