<img src='images/pandas.png' width='300px' align=left>
<img src='images/gdd-logo.png' width='200px' align='right' style="padding: 15px">



# Frequently asked questions about Pandas

## Is there an `is in` method when filtering?

If you've worked in `SQL` you'll know about the `IN` operator, which lets you filter rows where a certain column matches one of several specific values. There is something very similar to that in pandas:

In [None]:
import pandas as pd

chickweight = (
    pd.read_csv('data/chickweight.csv')
    .rename(columns=str.lower)
)

chickweight.head()

Imagine a task that requires analysing only days `0`, `4` and `12` (column: `time`). This data could be filtered by chaining conditions with an or (`|`):

In [None]:
(
    chickweight
    .loc[lambda df: (df['time'] == 0) 
         | (df['time'] == 4)
         | (df['time'] == 12)
        ]
)

But this is quite cumbersome. Imagine the goal was to retrieve the rows where `time` is 0, 4, 6, 8, 10, 12, 16, 18 or 21: that would take far too many conditions!

***So what is the alternative?***

This can be written as a single condition, using the method `.isin()`:

In [None]:
(
    chickweight
    .loc[lambda df: df['time'].isin( [0, 4, 12] )]
)
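
To check that the two approaches really are interchangeable, here is a minimal sketch on a small stand-in DataFrame. The name `toy` and its values are made up for illustration and are not taken from `chickweight.csv`:

```python
import pandas as pd

# Hypothetical stand-in for the chickweight data (values are invented)
toy = pd.DataFrame({
    'time': [0, 2, 4, 6, 12, 21],
    'weight': [42, 51, 59, 64, 106, 205],
})

# Filter with comparisons chained by `|`
with_or = toy.loc[
    lambda df: (df['time'] == 0) | (df['time'] == 4) | (df['time'] == 12)
]

# Filter with .isin()
with_isin = toy.loc[lambda df: df['time'].isin([0, 4, 12])]

# Both filters select exactly the same rows
print(with_or.equals(with_isin))  # True
```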

**Note** that the values need to be passed into the `.isin()` method together as one list (or other list-like object). Passing them as separate arguments, as in the following, would not work and raises a `TypeError`:

In [None]:
# (
#     chickweight
#     .loc[lambda df: df['time'].isin( 0, 4, 12 )]
# )
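
If you want to see the failure without breaking your notebook, the sketch below catches the error on a small, made-up Series (`times` is a hypothetical name, not part of the dataset) and then shows the working call:

```python
import pandas as pd

# Hypothetical example values, not read from chickweight.csv
times = pd.Series([0, 2, 4, 12], name='time')

error_message = None
try:
    # Wrong: the values are passed as separate positional arguments
    times.isin(0, 4, 12)
except TypeError as error:
    error_message = str(error)
    print(f'TypeError: {error_message}')

# Right: the values are wrapped in a single list
mask = times.isin([0, 4, 12])
print(mask.tolist())  # [True, False, True, True]
```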

This also means that you can save the values in a Python list and pass that list to the method. This is useful if you need the list again later on, and it makes your code easier to change because the values are defined in one place that is easy to find.

In [None]:
list_of_desired_values = [0, 4, 12]

(
    chickweight
    .loc[lambda df: df['time'].isin( list_of_desired_values )]
)
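
As one possible illustration of reusing the list later on, the sketch below filters with it and then checks which of the desired values never occur in the data. The DataFrame `toy` and the name `missing_values` are assumptions made for this example:

```python
import pandas as pd

# Hypothetical stand-in data (values are invented)
toy = pd.DataFrame({'time': [0, 2, 4, 6], 'weight': [42, 51, 59, 64]})

list_of_desired_values = [0, 4, 12]

# First use: filter the rows
filtered = toy.loc[lambda df: df['time'].isin(list_of_desired_values)]

# Second use: find the desired values that are absent from the data
present = set(toy['time'].tolist())
missing_values = [
    value for value in list_of_desired_values if value not in present
]
print(missing_values)  # [12]
```

Because the values live in a single variable, changing them in one place updates both the filter and the check.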

# Conclusion

You now know the `.isin` method, which can be used when filtering a column to return the rows where the value matches any one of a list of specific values.

The values need to be passed to the `.isin` method within a list (or another list-like Python object, e.g. a `tuple` or a NumPy array). For example: `df['column'].isin(['value1', 'value2', 'value3'])`.
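
A short sketch of the list-like alternatives mentioned above, using a made-up Series (`times` is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Hypothetical example values
times = pd.Series([0, 2, 4, 12])

# The same values passed as a list, a tuple and a NumPy array
from_list = times.isin([0, 4, 12])
from_tuple = times.isin((0, 4, 12))
from_array = times.isin(np.array([0, 4, 12]))

# All three produce the same boolean mask
print(from_list.equals(from_tuple) and from_list.equals(from_array))  # True
```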

You can also use a pre-defined list, for example `my_list = ['value1', 'value2', 'value3']`, which you can then use in the filter: `df['column'].isin(my_list)`.