# Demo 2.3  ***query()***:  Filtering a *pandas* DataFrame

 
- **Demonstrates**:  
  - Filtering with  ***query()*** 
    - Using  ***in*** 
  - Checking Data Types with ***dtypes***  
  

- Create and fill DataFrames to answer the following questions:  
  - Q1: How many medals did the US win?  
    - ***query("Country == 'United States (USA)'")***    
    - Other Possibilities:  !=   
  - Q2: How many countries won more than 10 Gold medals? 
    - ***query("Gold > 10")***  
    - Other Possibilities:  >=, <, <=  
  - Q3: How did the US, Canada and Mexico compare in medals?    
    - ***query("Country in @selected_countries")***  

- Data file:  **Olympics.csv**  

In [1]:
import pandas as pd

### Read the data file into a *pandas* DataFrame  

In [2]:
df = pd.read_csv("Data/Olympics.csv")

print(df.shape)
df.head()

(87, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42


# Q1: How many medals did the US win? 
- When you query on a string, *unique()* is handy for getting its exact spelling. The comparison is an exact match, so details such as the trailing `*` in 'Brazil (BRA)*' matter.  

In [3]:
df['Country'].unique()

array(['United States (USA)', 'Great Britain (GBR)', 'China (CHN)',
       'Russia (RUS)', 'Germany (GER)', 'Japan (JPN)', 'France (FRA)',
       'South Korea (KOR)', 'Italy (ITA)', 'Australia (AUS)',
       'Netherlands (NED)', 'Hungary (HUN)', 'Brazil (BRA)*',
       'Spain (ESP)', 'Kenya (KEN)', 'Jamaica (JAM)', 'Croatia (CRO)',
       'Cuba (CUB)', 'New Zealand (NZL)', 'Canada (CAN)',
       'Uzbekistan (UZB)', 'Kazakhstan (KAZ)', 'Colombia (COL)',
       'Switzerland (SUI)', 'Iran (IRI)', 'Greece (GRE)',
       'Argentina (ARG)', 'Denmark (DEN)', 'Sweden (SWE)',
       'South Africa (RSA)', 'Ukraine (UKR)', 'Serbia (SRB)',
       'Poland (POL)', 'North Korea (PRK)', 'Belgium (BEL)',
       'Thailand (THA)', 'Slovakia (SVK)', 'Georgia (GEO)',
       'Azerbaijan (AZE)', 'Belarus (BLR)', 'Turkey (TUR)',
       'Armenia (ARM)', 'Czech Republic (CZE)', 'Ethiopia (ETH)',
       'Slovenia (SLO)', 'Indonesia (INA)', 'Romania (ROU)',
       'Bahrain (BRN)', 'Vietnam (VIE)', 'Chinese Taipei
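
When the list of unique values is long, a substring search can narrow it down before you copy the exact spelling into a query. The following is a minimal sketch and not part of the original demo: the `countries` Series is a hand-built sample of four names taken from the output above, and `str.contains` is swapped in here as one possible way to search.

```python
import pandas as pd

# Hypothetical sample: four country names copied from the unique() output
countries = pd.Series(
    ["United States (USA)", "Great Britain (GBR)", "Brazil (BRA)*", "Canada (CAN)"],
    name="Country",
)

# regex=False treats "Bra" as a literal substring rather than a pattern
matches = countries[countries.str.contains("Bra", regex=False)].tolist()

# The match keeps the trailing asterisk, which an exact-match query needs
print(matches)
```

The printed spelling can then be pasted as-is into the string passed to `query()`.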

In [4]:
df_Q1 = df.query("Country == 'United States (USA)'")

print(df_Q1.shape)
df_Q1.head()

(1, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
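
The `!=` operator listed under "Other Possibilities" works the same way as `==`. Below is a small sketch on a hand-built sample (three rows copied from the `head()` output; the `sample` DataFrame and the variable names are assumptions for illustration). It also shows that `query()` returns the same rows as an ordinary boolean mask.

```python
import pandas as pd

# Hypothetical sample: three rows copied from the head() output
sample = pd.DataFrame({
    "Country": ["United States (USA)", "Great Britain (GBR)", "China (CHN)"],
    "Gold": [46, 27, 26],
    "Silver": [37, 23, 18],
    "Bronze": [38, 17, 26],
    "Total": [121, 67, 70],
})

# Rows where Country equals the string exactly
usa = sample.query("Country == 'United States (USA)'")

# Rows where Country is anything else
not_usa = sample.query("Country != 'United States (USA)'")

# The same filter written as a boolean mask
usa_mask = sample[sample["Country"] == "United States (USA)"]

# Pull the single Total value out of the one-row result
usa_total = int(usa["Total"].iloc[0])
print(usa_total)
print(not_usa["Country"].tolist())
```

For Q1, the answer is the value in the Total column of the single matching row.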


# Q2: How many countries won more than 10 Gold medals?

In [5]:
# Check data type for Gold to make sure it's numeric
df.dtypes

Rank        int64
Country    object
Gold        int64
Silver      int64
Bronze      int64
Total       int64
dtype: object
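
Here Gold is already `int64`, so no conversion is needed. If a column that should be numeric had instead been read as text, one option is to convert it before filtering. This is a sketch under that assumption only: the `raw` DataFrame is a hypothetical example with Gold stored as strings, and `pd.to_numeric` is one possible way to convert it.

```python
import pandas as pd

# Hypothetical case: Gold was read as text instead of numbers
raw = pd.DataFrame({
    "Country": ["United States (USA)", "Great Britain (GBR)", "Canada (CAN)"],
    "Gold": ["46", "27", "4"],
})

# Convert the text values to numbers so numeric comparisons are meaningful
raw["Gold"] = pd.to_numeric(raw["Gold"])

# Confirm the column is now numeric, then filter as usual
print(raw.dtypes)
over_10 = raw.query("Gold > 10")
print(over_10)
```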

In [6]:
df_Q2 = df.query("Gold > 10")

print(df_Q2.shape)
df_Q2

(6, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
1,2,Great Britain (GBR),27,23,17,67
2,3,China (CHN),26,18,26,70
3,4,Russia (RUS),19,17,19,55
4,5,Germany (GER),17,10,15,42
5,6,Japan (JPN),12,8,21,41
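
The other comparison operators mentioned for Q2 (`>=`, `<`, `<=`) follow the same pattern. The sketch below uses a hand-built sample of five rows whose Gold counts are copied from the outputs in this demo; the `sample` DataFrame and the `threshold` variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: Gold counts copied from the demo's outputs
sample = pd.DataFrame({
    "Country": ["United States (USA)", "Germany (GER)", "Japan (JPN)",
                "Canada (CAN)", "Mexico (MEX)"],
    "Gold": [46, 17, 12, 4, 0],
})

more_than_10 = sample.query("Gold > 10")    # strictly greater than 10
at_least_12 = sample.query("Gold >= 12")    # 12 or more
fewer_than_12 = sample.query("Gold < 12")   # strictly less than 12
at_most_4 = sample.query("Gold <= 4")       # 4 or fewer

# A Python variable can supply the cutoff when prefixed with @
threshold = 10
same_as_first = sample.query("Gold > @threshold")

# The number of matching rows answers a "how many" question
print(len(more_than_10))
```

In the demo itself, the count asked for in Q2 is the first element of `df_Q2.shape`, which is 6.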


# Q3: How did the US, Canada and Mexico compare in medals?

In [7]:
df['Country'].unique()

array(['United States (USA)', 'Great Britain (GBR)', 'China (CHN)',
       'Russia (RUS)', 'Germany (GER)', 'Japan (JPN)', 'France (FRA)',
       'South Korea (KOR)', 'Italy (ITA)', 'Australia (AUS)',
       'Netherlands (NED)', 'Hungary (HUN)', 'Brazil (BRA)*',
       'Spain (ESP)', 'Kenya (KEN)', 'Jamaica (JAM)', 'Croatia (CRO)',
       'Cuba (CUB)', 'New Zealand (NZL)', 'Canada (CAN)',
       'Uzbekistan (UZB)', 'Kazakhstan (KAZ)', 'Colombia (COL)',
       'Switzerland (SUI)', 'Iran (IRI)', 'Greece (GRE)',
       'Argentina (ARG)', 'Denmark (DEN)', 'Sweden (SWE)',
       'South Africa (RSA)', 'Ukraine (UKR)', 'Serbia (SRB)',
       'Poland (POL)', 'North Korea (PRK)', 'Belgium (BEL)',
       'Thailand (THA)', 'Slovakia (SVK)', 'Georgia (GEO)',
       'Azerbaijan (AZE)', 'Belarus (BLR)', 'Turkey (TUR)',
       'Armenia (ARM)', 'Czech Republic (CZE)', 'Ethiopia (ETH)',
       'Slovenia (SLO)', 'Indonesia (INA)', 'Romania (ROU)',
       'Bahrain (BRN)', 'Vietnam (VIE)', 'Chinese Taipei

In [8]:
# Create a list of the countries we want to filter by
selected_countries = ['United States (USA)', 'Canada (CAN)', 'Mexico (MEX)']

In [9]:
df_Q3 = df.query("Country in @selected_countries")

print(df_Q3.shape)
df_Q3

(3, 6)


Unnamed: 0,Rank,Country,Gold,Silver,Bronze,Total
0,1,United States (USA),46,37,38,121
19,20,Canada (CAN),4,3,15,22
60,61,Mexico (MEX),0,3,2,5
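
The `@` prefix tells `query()` to look up `selected_countries` as a Python variable rather than as a column name. The sketch below illustrates this on a hand-built sample (four rows with Totals copied from the outputs above; the `sample` DataFrame is an assumption for illustration). It also shows `not in` and the equivalent `isin()` mask.

```python
import pandas as pd

# Hypothetical sample: Totals copied from the demo's outputs
sample = pd.DataFrame({
    "Country": ["United States (USA)", "Great Britain (GBR)",
                "Canada (CAN)", "Mexico (MEX)"],
    "Total": [121, 67, 22, 5],
})

selected_countries = ["United States (USA)", "Canada (CAN)", "Mexico (MEX)"]

# Keep rows whose Country appears in the list
picked = sample.query("Country in @selected_countries")

# Keep rows whose Country does not appear in the list
others = sample.query("Country not in @selected_countries")

# The same selection written with isin() and a boolean mask
picked_mask = sample[sample["Country"].isin(selected_countries)]

print(picked)
print(others)
```

The filtered rows keep their original index labels, which is why the demo's result shows 0, 19 and 60 rather than 0, 1 and 2.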
