<img src='images/gdd-logo.png' width='300px' align='right' style="padding: 15px">

# Next Level Pandas

Now that we have covered the essentials of pandas, there are some things we should know to ensure our code can go 'to the next level'. 

In this notebook we will briefly go through a few intermediate and advanced techniques to improve our data analytics and data science skills.

- [Using the pandas pipeline](#pipe)
    - [Pipe exercise](#pipe-ex)
- [Plotting with pandas using subplots](#plot)
    - [Plotting exercise](#plot-ex)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
banking = pd.read_csv('data/banking.csv')

<a id='pipe'></a>

----

# <font color='#1EB0E0'>Pipe it together</font>
    
Now that we are in the habit of chaining methods together, we can take this one step further. Say we have the following chain, where we:
- Drop unwanted columns
- Create the `can_vote` column
- Filter on `exited` and `can_vote`

In [None]:
(
    banking
    .drop(columns = ['customerid', 'surname'])
    .assign(can_vote = lambda df: df['age'] >= 18)
    .loc[lambda df: df['can_vote'] & (df['exited']==1)]
)

We can separate these concerns into functions. Each function should **take the dataframe as its first parameter** and return a dataframe. We can then use the `pipe` method to call the function as a step in the chain.

Let's see this with the first chain:

In [None]:
def remove_unwanted_cols(df):
    return df.drop(columns = ['customerid', 'surname'])

In [None]:
(
    banking
    .pipe(remove_unwanted_cols)
)

We can even be a bit smarter about how we specify the unwanted columns:

In [None]:
def remove_unwanted_cols(df, cols):
    return df.drop(columns = cols)

(
    banking
    .pipe(remove_unwanted_cols, cols=['customerid', 'surname'])
)
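
As a minimal sketch of what is happening here: `df.pipe(func, **kwargs)` is equivalent to calling `func(df, **kwargs)`. The small `toy` dataframe below is hypothetical and only stands in for `banking`; its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the banking dataset
toy = pd.DataFrame({
    'customerid': [1, 2, 3],
    'surname': ['Smith', 'Jones', 'Lee'],
    'balance': [100.0, 250.5, 0.0],
})

def remove_unwanted_cols(df, cols):
    return df.drop(columns=cols)

# Calling the function directly ...
direct = remove_unwanted_cols(toy, cols=['customerid', 'surname'])

# ... gives the same result as passing it to pipe with the same keyword argument
piped = toy.pipe(remove_unwanted_cols, cols=['customerid', 'surname'])

print(piped.equals(direct))   # True
print(list(piped.columns))    # ['balance']
```

Note that `toy` itself is left unchanged, because `drop` returns a new dataframe.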

What benefits do you think come from this method?

<a id='pipe-ex'></a>

## <mark style='background-color:#364069;color:#1EB0E0'>Exercise - The pipe method</mark>

Use the `pipe` method to add the next two steps:
- The function to add the column `can_vote` is already written
    - Add a `pipe` call for this function to the chain below
- Create a new function to filter using the `can_vote` and `exited` columns
    - Add a `pipe` call for this function to the chain below

In [None]:
def remove_unwanted_cols(df, cols):
    return df.drop(columns = cols)

def create_can_vote(df, col='age', age=18):
    return df.assign(can_vote = lambda df: df[col] >= age)

# Write your new function here


In [None]:
(
    banking
    .pipe(remove_unwanted_cols, cols=['customerid', 'surname'])
    # Add your new pipes here
    
)

**Answers**

In [None]:
%load answers/ex-pipe.py

---
<a id='plot'></a>

# <font color='#1EB0E0'>Plotting with pandas</font>

Let's have a look at some plots. Say we want to look at the average (mean) balance split by country:

In [None]:
(
    banking
    .groupby('geography')
    ['balance']
    .mean()
)

It might make sense to plot this to add a visual element to our analysis.

We can use the `kind=` parameter to specify a bar plot.

In [None]:
(
    banking
    .groupby('geography')
    ['balance']
    .mean()
    .plot(kind='bar', title='Mean balance of customers by country')
)

**Questions**

Why is a bar plot the best option here?

What would this line of code do if we added it to the chain just before `.plot(...)`?
```python
.sort_values(ascending=False)
```

With `fig, ax = plt.subplots()` we get explicit handles on the figure and its axes. This lets us set things like the `figsize`, choose which axes each plot is drawn on, and lay out multiple subplots.
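
As a rough sketch of this pattern with a single plot: the `mean_balance` series below is hypothetical, with made-up numbers standing in for the result of the `groupby` above, and the `figsize` is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical mean balances per country (made-up values, not from banking.csv)
mean_balance = pd.Series(
    {'France': 62000.0, 'Germany': 119000.0, 'Spain': 61000.0},
    name='balance',
)

# Create the figure and a single axes, setting the size up front
fig, ax = plt.subplots(figsize=(6, 4))

# Draw the bar plot on the axes we created, rather than letting pandas make one
mean_balance.plot(kind='bar', ax=ax, title='Mean balance by country')

# The axes handle lets us adjust the plot after it has been drawn
ax.set_ylabel('Mean balance')
```

Passing `ax=` is what ties a pandas plot to a specific axes, which becomes important once there is more than one subplot.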

<a id='plot-ex'></a>

## <mark style='background-color:#364069;color:#1EB0E0'>Exercise - Using `plt.subplots()`</mark>

Change the code so that:
- The subplots sit side by side rather than one above the other
- The figsize suits the new layout better
- The colour of the first bar plot is red and the second is green

In [None]:
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(2, figsize=(10, 8))

# subplot 1
(
    banking
    .groupby('geography')
    ['balance']
    .mean()
    .plot(kind='bar', ax=ax1, title='Mean balance of customers by country')
)

# subplot 2
(
    banking
    .groupby('geography')
    ['age']
    .mean()
    .plot(kind='bar', ax=ax2, title='Mean age of customers by country')
)

# Adjust the spacing so the subplots and their titles do not overlap
fig.tight_layout()

**Answers**

In [None]:
%load answers/ex-plot.py