### Self-Study Colab Activity 3.1: pandas Basics

**Expected Time: 30 Minutes**



This assignment focuses on two pandas DataFrame methods: `.isin()` and `.query()`. `.isin()` builds a boolean mask marking which values match a set of targets, and `.query()` selects the rows that satisfy a condition. To explore their use, we create a simple DataFrame with information on basketball players in a game, the number of points they scored, and the number of minutes they played.
## Index:

- [Problem 1](#Problem-1)
- [Problem 2](#Problem-2)
- [Problem 3](#Problem-3)
- [Problem 4](#Problem-4)
- [Problem 5](#Problem-5)


In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({'minutes': [30, 35, 40],
                  'points': [13, 21, 50],
                  'team': ['knicks', 'lakers', 'knicks']},
                 index = ['drose', 'lebron', 'kemba'])

In [3]:
df

|        | minutes | points | team   |
|--------|---------|--------|--------|
| drose  | 30      | 13     | knicks |
| lebron | 35      | 21     | lakers |
| kemba  | 40      | 50     | knicks |


[Back to top](#Index:)

### Problem 1

#### Passing a list of values


The DataFrame `df` has a method `.isin(values)` that returns a DataFrame of booleans showing whether each element is contained in the *values* argument, which should be an iterable (list, tuple, set), a Series, a DataFrame, or a dict.

Use the `.isin()` method on `df` with an argument equal to `[30, 21]` and save the resulting DataFrame to `ans_1` below.

In [7]:


ans_1 = df.isin([30, 21])
# Alternative that filters rows on one column instead of returning a boolean mask:
# ans_1 = df[df["minutes"].isin([30, 21])]


# Answer check
print(type(ans_1))
ans_1

<class 'pandas.core.frame.DataFrame'>


|        | minutes | points | team  |
|--------|---------|--------|-------|
| drose  | True    | False  | False |
| lebron | False   | True   | False |
| kemba  | False   | False  | False |
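
The boolean mask returned by `.isin()` is often a stepping stone rather than the final result. The sketch below, which rebuilds the same `df` so it runs on its own, shows two common follow-ups: keeping rows where any cell matched, and checking a single column. The variable names `mask`, `rows_any_match`, and `rows_minutes_match` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# Element-wise mask over every column of the DataFrame
mask = df.isin([30, 21])

# Keep the rows in which at least one cell matched a listed value
rows_any_match = df[mask.any(axis=1)]

# Restrict the check to one column: rows whose `minutes` is 30 or 21
rows_minutes_match = df[df['minutes'].isin([30, 21])]

print(rows_any_match)
print(rows_minutes_match)
```

Calling `.isin()` on a single column returns a boolean Series, which is the form usually passed inside `df[...]` to filter rows.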


[Back to top](#Index:)

### Problem 2

#### Passing a dictionary of values



To pass a dictionary of values to the method, specify the column and values of interest.  For example:

```python
df.isin({'team': ['knicks']})
```

Create a dictionary with keys equal to `points` and `minutes`. Set the values equal to `[13]` for both keys. Assign this dictionary to the variable `values`.

Use the `.isin()` method on `df` with an argument equal to `values` and save the resulting DataFrame to `ans_2` below.

In [8]:


values = {'points': [13], 'minutes': [13]}
ans_2 = df.isin(values)


# Answer check
print(type(ans_2))
ans_2

<class 'pandas.core.frame.DataFrame'>


|        | minutes | points | team  |
|--------|---------|--------|-------|
| drose  | False   | True   | False |
| lebron | False   | False  | False |
| kemba  | False   | False  | False |
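
With a dictionary, each column is compared only against its own list, and columns that are not keys of the dictionary come back entirely `False`. The following sketch illustrates this on the same data and then uses the mask to filter rows. The names `mask`, `matched`, and `not_knicks` are hypothetical helpers, not part of the activity.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# Each key names a column; each value lists the targets for that column only
values = {'points': [13], 'minutes': [13]}
mask = df.isin(values)

# `team` is not a key in `values`, so its column is all False
print(mask['team'].any())

# Rows where any listed column matched its own targets
matched = df[mask.any(axis=1)]

# Negate a column-level mask with ~ to exclude values instead
not_knicks = df[~df['team'].isin(['knicks'])]

print(matched)
print(not_knicks)
```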


[Back to top](#Index:)

### Problem 3

#### Using the `.query()` method



The `.query()` method selects rows of a DataFrame using a boolean expression written as a string. It returns a new DataFrame containing the rows for which the expression is `True`.

For example, to return the players who scored more than 20 points, pass the string expression `points > 20` to the `.query()` method:

In [9]:
df.query('points > 20')

|        | minutes | points | team   |
|--------|---------|--------|--------|
| lebron | 35      | 21     | lakers |
| kemba  | 40      | 50     | knicks |


Use the `.query()` method on `df` to select all players who played more than 30 minutes.

Assign the resulting DataFrame to `ans_3` below.

In [11]:


ans_3 = df.query('minutes > 30')

# Answer check
print(type(ans_3))
ans_3

<class 'pandas.core.frame.DataFrame'>


|        | minutes | points | team   |
|--------|---------|--------|--------|
| lebron | 35      | 21     | lakers |
| kemba  | 40      | 50     | knicks |
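
A query string is a shorthand for boolean indexing, and it can also refer to Python variables by prefixing them with `@`. The sketch below compares the two forms on the same data. The variable `min_minutes` is an assumption introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# Threshold kept in a regular Python variable (hypothetical name)
min_minutes = 30

# Inside the query string, @name looks up a variable from the surrounding scope
via_query = df.query('minutes > @min_minutes')

# The equivalent selection written as boolean indexing
via_mask = df[df['minutes'] > min_minutes]

print(via_query.equals(via_mask))
```

Both forms select the same rows here. The query form tends to read better as conditions grow, while boolean indexing keeps everything in plain Python.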


[Back to top](#Index:)

### Problem 4

#### Passing multiple conditions to the `.query()` method.



To combine conditions, join them with `and` or `or` inside the single string passed to the `.query()` method.


For example, to return the players who scored more than 20 points and played fewer than 50 minutes, pass the string expression `points > 20 and minutes < 50` to the `.query()` method:


In [12]:
df.query('points > 20 and minutes < 50')

|        | minutes | points | team   |
|--------|---------|--------|--------|
| lebron | 35      | 21     | lakers |
| kemba  | 40      | 50     | knicks |


Use the `.query` method on `df` to create a DataFrame with players who played more than 30 minutes and scored more than 25 points.  Save your resulting DataFrame to `ans_4` below.

In [14]:


ans_4 = df.query('minutes > 30 and points > 25')


print(type(ans_4))
ans_4

<class 'pandas.core.frame.DataFrame'>


|       | minutes | points | team   |
|-------|---------|--------|--------|
| kemba | 40      | 50     | knicks |
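
For comparison, here is a sketch of how `and`, `or`, and `not` in a query string map onto the `&`, `|`, and `~` operators used in boolean indexing. Note that boolean indexing needs parentheses around each comparison, which the query string does not. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# `and`: both conditions must hold
both_query = df.query('minutes > 30 and points > 25')
both_mask = df[(df['minutes'] > 30) & (df['points'] > 25)]

# `or`: at least one condition must hold
either_query = df.query('team == "lakers" or points > 25')
either_mask = df[(df['team'] == 'lakers') | (df['points'] > 25)]

# `not`: invert a condition
low_scorers = df.query('not (points > 20)')

print(both_query.equals(both_mask))
print(either_query.equals(either_mask))
print(low_scorers)
```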


[Back to top](#Index:)

### Problem 5

#### Using a string method to query



To use Python object methods inside a query, write the name of the column, followed by the appropriate accessor (such as `.str` for string columns) and the method to call.

When calling a method this way, pass `engine="python"` so that the expression is evaluated by Python rather than by `numexpr`. The code below shows how to select the rows whose team name contains the letter `k`:

In [16]:
df.query('team.str.contains("k")', engine = "python")

|        | minutes | points | team   |
|--------|---------|--------|--------|
| drose  | 30      | 13     | knicks |
| lebron | 35      | 21     | lakers |
| kemba  | 40      | 50     | knicks |


Use the string method `startswith` to select all rows where the team starts with the letter `k`.  Assign your solution to `ans_5` below.

In [17]:


ans_5 = df.query('team.str.startswith("k")', engine = "python")


print(type(ans_5))
ans_5

<class 'pandas.core.frame.DataFrame'>


|       | minutes | points | team   |
|-------|---------|--------|--------|
| drose | 30      | 13     | knicks |
| kemba | 40      | 50     | knicks |
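
The same selection can be written without `.query()` by using the `.str` accessor in boolean indexing, and membership tests can be expressed in a query string with `in`, which ties back to `.isin()` from Problems 1 and 2. The sketch below is one way to compare these forms. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'minutes': [30, 35, 40],
                   'points': [13, 21, 50],
                   'team': ['knicks', 'lakers', 'knicks']},
                  index=['drose', 'lebron', 'kemba'])

# String method inside a query string
via_query = df.query('team.str.startswith("k")', engine='python')

# The same filter as boolean indexing with the .str accessor
via_mask = df[df['team'].str.startswith('k')]

# Membership test inside a query string
via_in = df.query('team in ["knicks"]')

# The same membership test with .isin()
via_isin = df[df['team'].isin(['knicks'])]

print(via_query)
```

With this data all four selections return the rows for `drose` and `kemba`, since `knicks` is the only team name that starts with `k`.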
