# 0. Data 📦

In [1]:
import numpy as np
import pandas as pd
pd.options.display.precision = 2
from seaborn import load_dataset
# Load sample data
columns = {'bill_length_mm': 'length', 
           'bill_depth_mm': 'depth',
           'flipper_length_mm': 'flipper',
           'body_mass_g': 'mass'}
df = load_dataset('penguins').rename(columns=columns)
df.head()

Unnamed: 0,species,island,length,depth,flipper,mass,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female


# 1. Prettifying DataFrames ✨

To style DataFrames, we access the `.style` attribute, which returns a Styler object:

In [2]:
type(df.style)

pandas.io.formats.style.Styler

This Styler object creates an HTML table which can be further styled using CSS.

## 1.1. Gradients 🌈

In [3]:
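# Keep numeric columns only: recent pandas versions raise an error on text columns
correlation_matrix = df.select_dtypes('number').corr()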
correlation_matrix

Unnamed: 0,length,depth,flipper,mass
length,1.0,-0.24,0.66,0.6
depth,-0.24,1.0,-0.58,-0.47
flipper,0.66,-0.58,1.0,0.87
mass,0.6,-0.47,0.87,1.0


In [4]:
(correlation_matrix
 .style
 .background_gradient(cmap='seismic_r', axis=None))

Unnamed: 0,length,depth,flipper,mass
length,1.0,-0.235053,0.656181,0.59511
depth,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


By passing `axis=None`, the colour gradient is scaled across the entire table rather than within each column or row.

The name of the desired colour palette is passed to the `cmap` parameter. For this parameter, we can use any [Matplotlib colourmap](https://matplotlib.org/stable/tutorials/colors/colormaps.html).

Here’s a useful tip for colourmaps: if you ever need to flip the colour scale, adding the `_r` suffix to the colourmap name will do the trick. For instance, if we used `'seismic'` instead of `'seismic_r'`, low (negative) correlations would have been blue and high (positive) correlations would have been red.

The previous example doesn’t look identical to the example shown at the beginning of this post. It needs a few more customisations to look the same:

In [5]:
(correlation_matrix
 .style
 .background_gradient(cmap='seismic_r', axis=None)
 .set_properties(**{'text-align': 'center', 'padding': '12px'})
 .set_caption('CORRELATION MATRIX'))

Unnamed: 0,length,depth,flipper,mass
length,1.0,-0.235053,0.656181,0.59511
depth,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


We center-aligned the values (`{'text-align': 'center'}`) and increased the cell padding, and with it the row height (`{'padding': '12px'}`), with `.set_properties()`. Then, we added a caption above the table with `.set_caption()`.

In this example, we have applied colour gradients to the background. We can also apply colour gradients to the text with `.text_gradient()`:

In [6]:
(correlation_matrix
 .style
 .text_gradient(cmap='seismic_r', axis=None))

Unnamed: 0,length,depth,flipper,mass
length,1.0,-0.235053,0.656181,0.59511
depth,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


If useful, we can chain both types of gradients as well:

In [7]:
(correlation_matrix
 .style
 .background_gradient(cmap='YlGn', axis=None)
 .text_gradient(cmap='YlGn_r', axis=None))

Unnamed: 0,length,depth,flipper,mass
length,1.0,-0.235053,0.656181,0.59511
depth,-0.235053,1.0,-0.583851,-0.471916
flipper,0.656181,-0.583851,1.0,0.871202
mass,0.59511,-0.471916,0.871202,1.0


Before we wrap up this section, I want to show one more useful example. Let’s imagine we had a simple confusion matrix:

In [8]:
# Create made-up predictions
df['predicted'] = df['species']
df.loc[140:160, 'predicted'] = 'Gentoo'
df.loc[210:250, 'predicted'] = 'Adelie'
# Create confusion matrix
confusion_matrix = pd.crosstab(df['species'], df['predicted'])
confusion_matrix

predicted,Adelie,Chinstrap,Gentoo
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,140,0,12
Chinstrap,10,49,9
Gentoo,31,0,93


We can give it a bit of a makeover to make it more useful and pretty:

In [9]:
(confusion_matrix
 .style
 .background_gradient('Greys')
 .set_caption('CONFUSION MATRIX')
 .set_properties(
     **{'text-align': 'center',
        'padding': '12px',
        'width': '80px'}
 )
 .set_table_styles(
     [{'selector': 'th.col_heading', 
       'props': 'text-align: center'},
      {'selector': 'caption', 
       'props': [
           ('text-align', 'center'), 
           ('font-size', '11pt'), 
           ('font-weight', 'bold')
       ]}
     ]
 )
)

predicted,Adelie,Chinstrap,Gentoo
species,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,140,0,12
Chinstrap,10,49,9
Gentoo,31,0,93


Since we are already familiar with the first few lines of the code from the previous examples, let’s look at what the remaining code is doing:

- `.set_properties(**{'width': '80px'})`: to increase column width

- `.set_table_styles([{'selector': 'th.col_heading', 'props': 'text-align: center'}])`: to center-align column headers

- `.set_table_styles([{'selector': 'caption', 'props': [('text-align', 'center' ), ('font-size', '11pt'), ('font-weight', 'bold')]}])`: to center-align the caption, increase its font size and make it bold.

## 1.2. Colour bars 📊

Now, let’s see how to add data bars to the DataFrame. We will first create a pivot table, then use `.bar()` to create data bars:

In [10]:
# Create a pivot table with missing data
pivot = df.pivot_table('mass', ['species', 'island'], 'sex')
pivot.iloc[-2, 0] = np.nan
# Style
pivot.style.bar(color='aquamarine')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


This can be styled further just like in the previous examples:

In [11]:
(pivot
 .style
 .bar(color='aquamarine')
 .set_properties(padding='8px', width='50px'))

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


Previously we got familiar with this format: `.set_properties(**{'padding': '8px', 'width': '50px'})`. The code above shows an alternative: passing the properties as keyword arguments to `.set_properties()`. This only works for CSS property names that are valid Python identifiers, so hyphenated properties such as `text-align` still need the dictionary form.

If you have positive and negative values, you can format the data as follows by passing two colours (`color=['salmon', 'lightgreen']`) and aligning the bars in the middle (`align='mid'`):

In [12]:
# Style on toy data
(pd.DataFrame({'feature': ['a', 'b', 'c', 'd', 'e', 'f'],
               'coefficient': [30, 10, 1, -5, -10, -20]})
 .style
 .bar(color=['salmon', 'lightgreen'], align='mid')
 .set_properties(**{'text-align': 'center'})
 .set_table_styles([{'selector': 'th.col_heading', 
                     'props': 'text-align: center'}]))

Unnamed: 0,feature,coefficient
0,a,30
1,b,10
2,c,1
3,d,-5
4,e,-10
5,f,-20


## 1.3. Highlights 🔆

Firstly, we can highlight minimum values from each column like this:

In [13]:
pivot.style.highlight_min(color='pink')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


There’s an equivalent function for maximum values:

In [14]:
pivot.style.highlight_max(color='lightgreen')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


We can chain these highlight functions together like this:

In [15]:
(pivot
 .style
 .highlight_min(color='pink')
 .highlight_max(color='lightgreen'))

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


There is also a function for highlighting missing values. Let’s add it to the previous code snippet:

In [16]:
(pivot
 .style
 .highlight_min(color='pink')
 .highlight_max(color='lightgreen')
 .highlight_null('grey'))

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


We can highlight values within a range like this:

In [17]:
pivot.style.highlight_between(left=3500, right=4500, color='gold')

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


We can also highlight quantiles:

In [18]:
pivot.style.highlight_quantile(
    q_left=0.7, axis=None, color='#4ADBC8'
)

Unnamed: 0_level_0,sex,Female,Male
species,island,Unnamed: 2_level_1,Unnamed: 3_level_1
Adelie,Biscoe,3369.318182,4050.0
Adelie,Dream,3344.444444,4045.535714
Adelie,Torgersen,3395.833333,4034.782609
Chinstrap,Dream,,3938.970588
Gentoo,Biscoe,4679.741379,5484.836066


Here, we’ve highlighted the top 30%: the values at or above the 70th percentile of the whole table, since we passed `axis=None`.

We have used a few different colours so far. If you are wondering what other colour names you could use, check out [this resource](https://matplotlib.org/stable/gallery/color/named_colors.html) for colour names. As shown in the example above, you can also use hexadecimal colours, which give you access to a wider range of options (over 16 million colours!). [Here](https://coolors.co/)’s my favourite resource to explore hexadecimal colour codes.

## 1.4. Custom colour-code 🎨

In this last section, we will look at a few other ways to colour-code DataFrames using custom functions. We will use the following two methods to apply our custom styling functions:
    
- `.applymap()`: elementwise (renamed to `.map()` in pandas 2.1, where `.applymap()` is deprecated)
- `.apply()`: column/row/tablewise

### Elementwise application: `.applymap()`

Let’s create a small numerical DataFrame by taking the first 8 rows of the numerical columns. We will use a lambda function to colour values above 190 blue and the rest grey:

In [19]:
df_num = df.select_dtypes('number').head(8)
(df_num
 .style
 .applymap(lambda x: f"color: {'blue' if x>190 else 'grey'}"))

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


In [20]:
green = 'background-color: lightgreen'
pink = 'background-color: pink; color: white'
(df_num
 .style
 .applymap(lambda value: green if value>190 else pink))

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


We can convert the lambda function into a regular function and pass it to `.applymap()`:

In [21]:
def highlight_190(value):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return green if value > 190 else pink

df_num.style.applymap(highlight_190)

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


### Row/Column/Tablewise application: `.apply()`

Let’s see how we could do the same formatting using `.apply()`:

In [22]:
def highlight_190(series):
    green = 'background-color: lightgreen'
    pink = 'background-color: pink; color: white'
    return [green if value > 190 else pink for value in series]

df_num.style.apply(highlight_190)

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


We can also chain them just like the previous functions:

In [23]:
(df_num
 .style
 .apply(highlight_190)
 .applymap(
     lambda value: 'opacity: 40%' if value < 30 else None)
)

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


It’s useful to know how to use both `.apply()` and `.applymap()`. Here’s an example where we can use `.apply()` but not `.applymap()`:

In [24]:
def highlight_above_median(series):
    is_above = series > series.median()
    above = 'background-color: lightgreen'
    below = 'background-color: grey; color: white'
    return [above if value else below for value in is_above]

df_num.style.apply(highlight_above_median)

Unnamed: 0,length,depth,flipper,mass
0,39.1,18.7,181.0,3750.0
1,39.5,17.4,186.0,3800.0
2,40.3,18.0,195.0,3250.0
3,,,,
4,36.7,19.3,193.0,3450.0
5,39.3,20.6,190.0,3650.0
6,38.9,17.8,181.0,3625.0
7,39.2,19.6,195.0,4675.0


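We find the median of each column and highlight values above the median in green and the rest in grey. This needs `.apply()` because the function has to see the whole column to compute the median, whereas `.applymap()` passes it one value at a time. We can also style entire rows based on a condition by passing `axis=1` to `.apply()`: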

In [25]:
def highlight(data):
    n = len(data)
    if data['sex'] == 'Male':
        return n*['background-color: lightblue']
    elif data['sex'] == 'Female':
        return n*['background-color: lightpink']
    else:
        return n*['']
    
df.head(6).style.apply(highlight, axis=1).hide(axis='index')

species,island,length,depth,flipper,mass,sex,predicted
Adelie,Torgersen,39.1,18.7,181.0,3750.0,Male,Adelie
Adelie,Torgersen,39.5,17.4,186.0,3800.0,Female,Adelie
Adelie,Torgersen,40.3,18.0,195.0,3250.0,Female,Adelie
Adelie,Torgersen,,,,,,Adelie
Adelie,Torgersen,36.7,19.3,193.0,3450.0,Female,Adelie
Adelie,Torgersen,39.3,20.6,190.0,3650.0,Male,Adelie


Here, we have hidden the DataFrame’s index with `.hide(axis='index')` for a cleaner look (in pandas versions before 1.4, use `.hide_index()` instead). If needed, you can also hide the column headers, or selected columns via `subset`, with `.hide(axis='columns')`.

Lastly, most of the functions we looked at in this post take optional arguments to customise styling. The following two arguments are common and quite useful to know:

- `axis` to choose which axis to operate along: columns, rows or the entire table

- `subset` to select a subset of columns to style.

To learn more about styling, check out [this useful documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html) by pandas.