# Prettifying pandas DataFrames

## Data
We will use the Seaborn penguins dataset. It contains measurements of penguins from three species, collected on three islands in the Palmer Archipelago, Antarctica. The dataset has 344 observations and 7 variables and can be loaded with Seaborn's `load_dataset()`.


In [29]:
from seaborn import load_dataset
import pandas as pd
import numpy as np

pd.options.display.precision = 2

# Load the data and shorten the two longest column names
# (the bill columns keep their original names, as the output below shows)
columns = {'flipper_length_mm': 'flipper',
           'body_mass_g': 'mass'}
df = load_dataset('penguins').rename(columns=columns)
df.head()


Unnamed: 0,species,island,bill_length_mm,bill_depth_mm,flipper,mass,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female


In [30]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   species         344 non-null    object 
 1   island          344 non-null    object 
 2   bill_length_mm  342 non-null    float64
 3   bill_depth_mm   342 non-null    float64
 4   flipper         342 non-null    float64
 5   mass            342 non-null    float64
 6   sex             333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 18.9+ KB


## 1. Prettifying DataFrames with Styler
Styler is a pandas class for formatting and styling DataFrames and Series. It has been available since pandas 0.17.1 and is accessed through a DataFrame's `.style` attribute. It is a powerful tool for producing good-looking tables. Styled tables can also be exported to Excel or to HTML, where they can be styled further with CSS.



### 1.1. Gradients
Styler can shade cells with a colour gradient, either across the whole DataFrame or for a subset of columns. The colours are derived from the values in the DataFrame: the values are scaled to the range [0, 1] and then mapped onto a colourmap. By default (`axis=0`) the scaling is done for each column separately; `axis=1` scales each row separately, and `axis=None` uses a single scale for the entire table.

```python
# numeric_only=True skips the text columns (required in pandas 2.0+)
correlation_matrix = df.corr(numeric_only=True)
correlation_matrix.style.background_gradient(cmap='seismic_r', axis=None)
```
**Example 1**

Adding a background gradient takes only one extra line of code. By passing `axis=None`, the colour gradient is applied across the entire table rather than along a specific axis. The name of the desired colour palette is passed to the `cmap` parameter, which accepts any Matplotlib colourmap. Here's a useful tip for colourmaps: if you ever need to flip the colour scale, adding the `_r` suffix to the colourmap name will do the trick. For instance, if we used 'seismic' instead of 'seismic_r', negative correlations would have been blue and positive correlations would have been red.
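
To make the effect of `axis` concrete, here is a simplified sketch of min-max scaling done per column and over the whole table. It only illustrates the idea and is not pandas' actual implementation; the toy values and the helper name `min_max` are made up:

```python
import pandas as pd

# Toy table (made-up values, not the penguins data)
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})

def min_max(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    return (values - values.min()) / (values.max() - values.min())

# Like axis=0: each column is scaled on its own,
# so both columns span the full colour range
per_column = toy.apply(min_max, axis=0)

# Like axis=None: one scale is shared by the whole table,
# so the small values in column 'a' all land near 0
table_min, table_max = toy.min().min(), toy.max().max()
whole_table = (toy - table_min) / (table_max - table_min)

print(per_column)
print(whole_table)
```

For a correlation matrix a shared scale is usually what we want, because all cells are measured in the same units.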

Let's add some more styling to the table. We center-align the values (`{'text-align': 'center'}`) and increase the row height (`{'padding': '12px'}`) with `.set_properties()`. Then, we add a caption above the table with `.set_caption()`.

```python
correlation_matrix.style.background_gradient(cmap='seismic_r', axis=None)\
    .set_properties(**{'text-align': 'center', 'padding': '12px'})\
    .set_caption('Correlation matrix')
```
**Example 2**

In this example, we have applied colour gradients to the background. We can also apply colour gradients to the text with `.text_gradient()`:

```python
correlation_matrix.style.text_gradient(cmap='seismic_r', axis=None)\
    .set_properties(**{'text-align': 'center', 'padding': '12px'})\
    .set_caption('Correlation matrix')
```
__Example 3__

If useful, we can chain both types of gradients as well:
```python
(correlation_matrix.style
     .background_gradient(cmap='YlGn', axis=None)
     .text_gradient(cmap='YlGn_r', axis=None))
```
__Example 4__

To wrap up, let's add a few more touches.
```python
# Create made-up predictions
df['predicted'] = df['species']
df.loc[140:160, 'predicted'] = 'Gentoo'
df.loc[210:250, 'predicted'] = 'Adelie'
# Create confusion matrix
confusion_matrix = pd.crosstab(df['species'], df['predicted'])
confusion_matrix

# Create a styler
(confusion_matrix.style
     .background_gradient('Greys')
     .set_caption('CONFUSION MATRIX')
     .set_properties(**{'text-align': 'center', 
                        'padding': '12px', 
                        'width': '80px'})
     .set_table_styles([{'selector': 'th.col_heading', 
                         'props': 'text-align: center'},
                        {'selector': 'caption', 
                         'props': [('text-align', 'center'),
                                   ('font-size', '11pt'),
                                   ('font-weight', 'bold')]}]))
```
__Example 5__

Since we are already familiar with the first few lines of the styler from the previous examples, let's go through what the remaining code does:
- `.set_properties(**{'width': '80px'})`: increases the column width
- `.set_table_styles([{'selector': 'th.col_heading', 'props': 'text-align: center'}])`: center-aligns the column headers
- `.set_table_styles([{'selector': 'caption', 'props': [('text-align', 'center'), ('font-size', '11pt'), ('font-weight', 'bold')]}])`: center-aligns the caption, increases its font size and makes it bold



In [31]:
# Example 1
correlation_matrix = df.corr(numeric_only=True)
correlation_matrix.style.background_gradient(cmap='seismic_r', axis=None)

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper,mass
bill_length_mm,1.0,-0.235053,0.656181,0.59511
bill_depth_mm,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


In [32]:
# Example 2 
correlation_matrix.style.background_gradient(cmap='seismic_r', axis=None)\
    .set_properties(**{'text-align': 'center', 'padding': '12px'})\
    .set_caption('Correlation matrix')

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper,mass
bill_length_mm,1.0,-0.235053,0.656181,0.59511
bill_depth_mm,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


In [33]:
# Example 3
correlation_matrix.style.text_gradient(cmap='seismic_r', axis=None)\
    .set_properties(**{'text-align': 'center', 'padding': '12px'})\
    .set_caption('Correlation matrix')

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper,mass
bill_length_mm,1.0,-0.235053,0.656181,0.59511
bill_depth_mm,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


In [34]:
# Example 4
(correlation_matrix.style
     .background_gradient(cmap='YlGn', axis=None)
     .text_gradient(cmap='YlGn_r', axis=None))

Unnamed: 0,bill_length_mm,bill_depth_mm,flipper,mass
bill_length_mm,1.0,-0.235053,0.656181,0.59511
bill_depth_mm,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


In [35]:
# Example 5
# Create made-up predictions
df['predicted'] = df['species']
df.loc[140:160, 'predicted'] = 'Gentoo'
df.loc[210:250, 'predicted'] = 'Adelie'
# Create confusion matrix
confusion_matrix = pd.crosstab(df['species'], df['predicted'])

# Create a styler
(confusion_matrix.style
     .background_gradient('Greys')
     .set_caption('CONFUSION MATRIX')
     .set_properties(**{'text-align': 'center', 
                        'padding': '12px', 
                        'width': '80px'})
     .set_table_styles([{'selector': 'th.col_heading', 
                         'props': 'text-align: center'},
                        {'selector': 'caption', 
                         'props': [('text-align', 'center'),
                                   ('font-size', '11pt'),
                                   ('font-weight', 'bold')]}]))


### 1.2. Color Bars
Color bars are a great way to visualize the values in the DataFrame. They are added with the `.bar()` method, which draws a horizontal bar inside each cell with a length that reflects the cell's value. Bars can be added to the whole DataFrame or to a subset of columns. By default, the bar lengths are scaled within each column separately.

First, let's create a pivot table and then add a color bar to it.

```python
pivot = df.pivot_table('mass', ['species', 'island'], 'sex')
pivot.iloc[(-2,0)] = np.nan
# Style
pivot.style.bar(color='aquamarine')
```
__Example 6__

This can be styled further just like in the previous examples:

```python
(pivot.style
     .bar(color='aquamarine')
     .set_properties(padding='8px', width='50px'))  # CSS lengths need a unit
```
__Example 7__

If the data has both positive and negative values, you can pass two colours (`color=['salmon', 'lightgreen']`) and align the bars in the middle (`align='mid'`):

```python
# Style on toy data
(pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e', 'f'],  
               'coefficient': [30, 10, 1, -5, -10, -20]}).style
   .bar(color=['salmon', 'lightgreen'], align='mid')
   .set_properties(**{'text-align': 'center'})
   .set_table_styles([{'selector': 'th.col_heading', 
                       'props': 'text-align: center'}]))
```
__Example 8__


In [36]:
# Example 6
pivot = df.pivot_table('mass', ['species', 'island'], 'sex')
pivot.iloc[(-2,0)] = np.nan
# Style
pivot.style.bar(color='aquamarine')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [37]:
# Example 7
(pivot.style
     .bar(color='aquamarine')
     .set_properties(padding='8px', width='50px'))  # CSS lengths need a unit


Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [38]:
# Example 8
(pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e', 'f'],  
               'coefficient': [30, 10, 1, -5, -10, -20]}).style
   .bar(color=['salmon', 'lightgreen'], align='mid')
   .set_properties(**{'text-align': 'center'})
   .set_table_styles([{'selector': 'th.col_heading', 
                       'props': 'text-align: center'}]))

Unnamed: 0,feature,coefficient
0,a,30
1,b,10
2,c,1
3,d,-5
4,e,-10
5,f,-20


### 1.3. Highlighting
There are times when highlighting values based on conditions can be useful. In this section, we will learn about a few functions to highlight special values.

Firstly, we can highlight minimum values from each column like this:

```python
pivot.style.highlight_min(color='pink')
```
__Example 9__

We can also highlight maximum values from each column:
```python
pivot.style.highlight_max(color='lightgreen')
```
__Example 10__

We can highlight the minimum and maximum values from each column:
```python
pivot.style.highlight_min(color='pink').highlight_max(color='lightgreen')
```
__Example 11__

There is also a function for highlighting missing values. Let’s add it to the previous code snippet:

```python
(pivot.style
      .highlight_min(color='pink')
      .highlight_max(color='lightgreen')
      .highlight_null(color='grey'))  # null_color was removed in pandas 2.0
```
__Example 12__

We can highlight values between a range like below:

```python
pivot.style.highlight_between(left=3500, right=4500, color='gold')
```
__Example 13__

We can also highlight quantiles:

```python
pivot.style.highlight_quantile(q_left=0.7, axis=None, 
                               color='#4ADBC8')
```
__Example 14__
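
To see which cells these methods pick out, the same conditions can be written as plain boolean masks. This is only an illustrative sketch, not how pandas implements the highlighting. The values are copied (rounded) from the pivot output shown earlier, and the row labels are flattened into made-up strings:

```python
import numpy as np
import pandas as pd

# Values copied (rounded) from the pivot output shown earlier
toy = pd.DataFrame(
    {'Female': [3369.32, 3344.44, 3395.83, np.nan, 4679.74],
     'Male': [4050.00, 4045.54, 4034.78, 3938.97, 5484.84]},
    index=['Adelie/Biscoe', 'Adelie/Dream', 'Adelie/Torgersen',
           'Chinstrap/Dream', 'Gentoo/Biscoe'])

# Cells a per-column minimum highlight would select (NaN is skipped)
is_col_min = toy.eq(toy.min())

# Cells inside the closed interval [3500, 4500]
in_range = (toy >= 3500) & (toy <= 4500)

# Missing cells
is_missing = toy.isna()

print(is_col_min)
print(in_range)
print(is_missing)
```

Masks like these are handy for checking a condition before turning it into a style.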



In [39]:
# Example 9
pivot.style.highlight_min(color='pink')


Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [40]:
# Example 10
pivot.style.highlight_max(color='lightgreen')


Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [41]:
# Example 11
pivot.style.highlight_min(color='pink').highlight_max(color='lightgreen')


Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [42]:
# Example 12
(pivot.style
      .highlight_min(color='pink')
      .highlight_max(color='lightgreen')
      .highlight_null(color='grey'))



Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [43]:
# Example 13
pivot.style.highlight_between(left=3500, right=4500, color='gold')


Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


In [44]:
# Example 14
pivot.style.highlight_quantile(q_left=0.7, axis=None, 
                               color='#4ADBC8')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


### 1.4. Custom color-code
In this last section, we will look at a few other ways to colour-code DataFrames using custom functions. We will use the following two methods to apply our custom styling functions:
- `.applymap()`: elementwise (in pandas 2.1 and later this method is deprecated in favour of the equivalent `.map()`)
- `.apply()`: column/row/tablewise
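
The difference between the two lies in what the custom function receives and returns. The sketch below calls two hypothetical functions (`colour_scalar` and `colour_column_max`, names made up for illustration) directly on a toy column, without a Styler, to show their inputs and outputs:

```python
import pandas as pd

def colour_scalar(value):
    """Elementwise: receives one cell value, returns one CSS string."""
    return 'color: blue' if value > 190 else 'color: grey'

def colour_column_max(column):
    """Column-wise: receives a whole Series, returns one CSS string per cell."""
    is_max = column == column.max()
    return ['font-weight: bold' if flag else '' for flag in is_max]

# Toy column (made-up values) to show what each function returns
flipper = pd.Series([181.0, 186.0, 195.0])
elementwise_result = [colour_scalar(value) for value in flipper]
columnwise_result = colour_column_max(flipper)

print(elementwise_result)
print(columnwise_result)
```

A function of the first kind is what `.applymap()` expects, while the second kind would be passed to `.apply()`, which uses `axis=0` (column by column) by default.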


#### 1.4.1. Elementwise application: applymap()
Let’s create a small numerical dataset by taking the first 8 rows of the numerical columns. We will use a lambda function to colour values above 190 blue and the rest grey:

```python
df_num = df.select_dtypes('number').head(8)
(df_num.style
    .applymap(lambda x: f"color: {'blue' if x>190 else 'grey'}"))
```
__Example 15__

Let’s look at another example:

```python
green = 'background-color: lightgreen'
pink = 'background-color: pink; color: white'
(df_num.style
       .applymap(lambda value: green if value>190 else pink))
```
__Example 16__

We can convert the lambda function into a regular function and pass it to .applymap():

```python
def highlight_190(value):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return green if value > 190 else pink
df_num.style.applymap(highlight_190)
```
__Example 17__
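
One thing to keep in mind: the first 8 rows include a row with missing values, and a comparison such as `NaN > 190` evaluates to `False`, so the functions above paint those cells pink. If missing cells should stay unstyled, a possible variant (the name `highlight_190_skip_missing` is hypothetical) looks like this:

```python
import pandas as pd

def highlight_190_skip_missing(value):
    """Variant of highlight_190 that leaves missing cells unstyled."""
    if pd.isna(value):
        return ''  # an empty string means no CSS is applied to the cell
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return green if value > 190 else pink

# Quick check on three sample values
for sample in (195.0, 181.0, float('nan')):
    print(repr(sample), '->', repr(highlight_190_skip_missing(sample)))
```

It can be passed to `.applymap()` in the same way as `highlight_190`.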


In [ ]:
# Example 15
df_num = df.select_dtypes('number').head(8)
(df_num.style
    .applymap(lambda x: f"color: {'blue' if x>190 else 'grey'}"))


In [ ]:
# Example 16
green = 'background-color: lightgreen'
pink = 'background-color: pink; color: white'
(df_num.style
       .applymap(lambda value: green if value>190 else pink))


In [ ]:
# Example 17
def highlight_190(value):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return green if value > 190 else pink

df_num.style.applymap(highlight_190)
