### Using the DataFrame `.isin()` and `.query()` methods

**Expected Time: 30 Minutes**

**Total Points: 10**

This assignment focuses on two pandas DataFrame methods: `.isin()` and `.query()`. Both select data from a DataFrame based on conditions: `.isin()` tests membership element by element, and `.query()` filters rows with a boolean expression. To explore them, we create a simple DataFrame of basketball players in a game, recording the number of points each player scored and the number of minutes each played.

## Index:

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)


In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

In [3]:
df

| | minutes | points | team |
|---|---|---|---|
| drose | 30 | 13 | knicks |
| lebron | 35 | 21 | lakers |
| kemba | 40 | 50 | knicks |


[Back to top](#Index:) 

### Problem 1

#### Passing a list of values

**2 Points**

The DataFrame `df` has a method `.isin(values)` that returns a boolean DataFrame of the same shape, with `True` wherever an element appears in *values*. The *values* argument should be an iterable (such as a list, tuple, or set), a Series, or a dict. Examine the result of passing the list `[30, 21]` to the `.isin()` method. Save the resulting DataFrame to `ans_1` below.
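
As a minimal sketch of how such a boolean result is commonly used, the example below works on a separate toy frame. The name `demo` and its values are illustrative assumptions, not part of the assignment. It checks membership against a list and then keeps only the rows in which at least one column matched.

```python
import pandas as pd

# Hypothetical toy data, separate from the assignment's `df`
demo = pd.DataFrame({'a': [1, 2, 3],
                     'b': [3, 4, 5]},
                    index=['x', 'y', 'z'])

# Element-wise membership test: same shape as `demo`, all values boolean
mask = demo.isin([1, 4])

# Keep the rows where at least one column matched a listed value
rows_with_match = demo[mask.any(axis=1)]
```

Because the list is checked against every column, a value matches no matter which column it appears in.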

In [4]:
### GRADED

ans_1 = ''

### BEGIN SOLUTION
ans_1 = df.isin([30, 21])
### END SOLUTION

# Answer check
print(type(ans_1))
ans_1

<class 'pandas.core.frame.DataFrame'>


| | minutes | points | team |
|---|---|---|---|
| drose | True | False | False |
| lebron | False | True | False |
| kemba | False | False | False |


In [5]:
### BEGIN HIDDEN TESTS
df_ = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

ans_1_ = df_.isin([30, 21])



pd.testing.assert_frame_equal(ans_1, ans_1_)
### END HIDDEN TESTS

[Back to top](#Index:) 

### Problem 2

#### Passing a dictionary of values

**2 Points**

To pass a dictionary to the method, use column names as keys and a list of the values of interest for each column. For example:

```python
df.isin({'team': ['knicks']})
```

Create a dictionary for which `.isin()` returns `True` in the `points` column where a player scored 13 points and in the `minutes` column where a player played 35 minutes. Assign the dictionary to `values` and the resulting DataFrame to `ans_2` below.
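
To illustrate how the dictionary form differs from the list form, here is a small sketch on a hypothetical frame (`demo`, with made-up city and temperature values). Each column is compared only against the list stored under its own key, and columns absent from the dictionary come back as all `False`.

```python
import pandas as pd

# Hypothetical toy data, separate from the assignment's `df`
demo = pd.DataFrame({'city': ['oslo', 'rome', 'lima'],
                     'temp': [5, 18, 22]})

# Dictionary keys name columns; each column is checked against its own list
by_column = demo.isin({'city': ['rome'], 'temp': [5]})

# A column missing from the dictionary is False everywhere
only_city = demo.isin({'city': ['rome']})

# Series.isin on a single column gives a 1-D mask that filters rows directly
rome_rows = demo[demo['city'].isin(['rome'])]
```

The last line shows a common companion pattern: calling `.isin()` on one column (a Series) and using the result to select rows.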

In [6]:
### GRADED

values = ''
ans_2 = ''

### BEGIN SOLUTION
values = {'points': [13], 'minutes': [35]}
ans_2 = df.isin(values)
### END SOLUTION

# Answer check
print(type(ans_2))
ans_2

<class 'pandas.core.frame.DataFrame'>


| | minutes | points | team |
|---|---|---|---|
| drose | False | True | False |
| lebron | True | False | False |
| kemba | False | False | False |


In [7]:
### BEGIN HIDDEN TESTS
df_ = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

values_ = {'points': [13], 'minutes': [35]}
ans_2_ = df_.isin(values_)
#
#
#
assert values_ == values
pd.testing.assert_frame_equal(ans_2_, ans_2)
### END HIDDEN TESTS

[Back to top](#Index:) 

### Problem 3

#### Using the `.query()` method

**2 Points**

The DataFrame also has a `.query()` method for filtering rows with a boolean expression passed as a string. It returns a DataFrame containing the rows for which the expression evaluates to `True`. For example, to return the players who scored more than 20 points, pass the string expression `'points > 20'`.

In [8]:
df.query('points > 20')

| | minutes | points | team |
|---|---|---|---|
| lebron | 35 | 21 | lakers |
| kemba | 40 | 50 | knicks |
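
For comparison, the sketch below rebuilds the same DataFrame and shows that the query string selects the same rows as an explicit boolean mask. The variable names `via_query` and `via_mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# Filter with a string expression
via_query = df.query('points > 20')

# Filter with an explicit boolean mask
via_mask = df[df['points'] > 20]

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(via_query, via_mask)
```

Which form to use is largely a matter of readability; the query string tends to be shorter once several conditions are involved.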


Use the `.query()` method to select all players who played more than 30 minutes and assign the resulting DataFrame to `ans_3` below.

In [9]:
### GRADED

ans_3 = ''

### BEGIN SOLUTION
ans_3 = df.query('minutes > 30')
### END SOLUTION

# Answer check
print(type(ans_3))
ans_3

<class 'pandas.core.frame.DataFrame'>


| | minutes | points | team |
|---|---|---|---|
| lebron | 35 | 21 | lakers |
| kemba | 40 | 50 | knicks |


In [10]:
### BEGIN HIDDEN TESTS
df_ = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

ans_3_ = df_.query('minutes > 30')
#
#
#
pd.testing.assert_frame_equal(ans_3_, ans_3)
### END HIDDEN TESTS

[Back to top](#Index:) 

### Problem 4

#### Passing multiple conditions to the `.query()` method

**2 Points**

To combine conditions, join the comparisons with `and` or `or` inside the single string passed to `.query()`.

```python
# General pattern: <column> <comparison> <value>, joined by `and` / `or`
df.query('team == "knicks" or points > 20')
```
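
A short sketch of both connectives on the assignment's data follows. It also uses the `@` prefix, which lets a query string refer to a Python variable; the variable `min_points` and the chosen conditions are assumptions made for illustration and differ from the graded task.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# Hypothetical threshold, referenced inside the expression with the @ prefix
min_points = 20

# `and`: both conditions must hold for a row to be kept
both = df.query('team == "knicks" and points > @min_points')

# `or`: a row is kept if either condition holds
either = df.query('team == "lakers" or minutes < 35')
```

Here `both` keeps only `kemba`, while `either` keeps `drose` and `lebron`.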

Use the `.query` method to create a DataFrame with players who played more than 30 minutes and scored more than 25 points.  Save your resulting DataFrame to `ans_4` below.

In [11]:
### GRADED

ans_4 = ''

### BEGIN SOLUTION
ans_4 = df.query('minutes > 30 and points > 25')
### END SOLUTION
print(type(ans_4))
ans_4

<class 'pandas.core.frame.DataFrame'>


| | minutes | points | team |
|---|---|---|---|
| kemba | 40 | 50 | knicks |


In [12]:
### BEGIN HIDDEN TESTS
df_ = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

ans_4_ = df_.query('minutes > 30 and points > 25')
#
#
#
pd.testing.assert_frame_equal(ans_4_, ans_4)
### END HIDDEN TESTS

[Back to top](#Index:) 

### Problem 5

#### Using a string method to query 

**2 Points**

To use string methods inside a query, write the name of the column, followed by the `.str` accessor and the method to call. When calling such methods, pass `engine="python"` to `.query()`. For example, to select the rows whose team name contains the letter "k":

In [13]:
df.query('team.str.contains("k")',
        engine = "python")

| | minutes | points | team |
|---|---|---|---|
| drose | 30 | 13 | knicks |
| lebron | 35 | 21 | lakers |
| kemba | 40 | 50 | knicks |
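
As a rough illustration of what the query string does, the sketch below compares it with the same string method applied through a boolean mask. The substring `"la"` and the variable names are chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# String method inside the query expression, evaluated by the Python engine
via_query = df.query('team.str.contains("la")', engine="python")

# The same condition written as an explicit boolean mask
via_mask = df[df['team'].str.contains("la")]

# Raises an AssertionError if the two results differ
pd.testing.assert_frame_equal(via_query, via_mask)
```

Any method available on the `.str` accessor can be used the same way inside the expression.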


Use a string method to select all rows where the team starts with the letter "k".  Assign your solution to `ans_5` below.

In [14]:
### GRADED

ans_5 = ''

### BEGIN SOLUTION
ans_5 = df.query('team.str.startswith("k")',
                engine = "python")
### END SOLUTION
print(type(ans_5))
ans_5

<class 'pandas.core.frame.DataFrame'>


| | minutes | points | team |
|---|---|---|---|
| drose | 30 | 13 | knicks |
| kemba | 40 | 50 | knicks |


In [15]:
### BEGIN HIDDEN TESTS
df_ = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

ans_5_ = df_.query('team.str.startswith("k")',
                engine = "python")
#
#
#
pd.testing.assert_frame_equal(ans_5_, ans_5)
### END HIDDEN TESTS