## TO DO
**Ask Kainat for support on the following:**
- Setup new Docker job specifically for ipynb of this lesson

# Advanced pandas - Going Beyond the Basics

## DataFrame Table Styles
___
In this notebook, we will go through ___

___
## (1) Import dependencies

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns

# Check pandas version (ensure it is pandas v2 and above)
pd.__version__

'2.0.3'

## (2) Import dataset
- Data Source: https://www.kaggle.com/datasets/datascientistanna/customers-dataset (Database Contents License (DbCL) v1.0)

In [3]:
# Import and read CSV file
df = pd.read_csv('https://raw.githubusercontent.com/kennethleungty/Educative-Advanced-Pandas/main/data/csv/Customers_Mini.csv')

# Set CustomerID as index
df = df.set_index('CustomerID')

# View DataFrame
df

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


___
## (3) View Styler object

We saw earlier in the Introduction section that the `df.style` attribute produces a `Styler` object. We can view it with the code below:

In [4]:
df.style

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


In [5]:
type(df.style)

pandas.io.formats.style.Styler

The output above shows that the `Styler` object renders the same table as the DataFrame, but its type is `pandas.io.formats.style.Styler` rather than the original `pandas.core.frame.DataFrame`.

___
## (4) Formatting
https://pandas.pydata.org/docs/user_guide/style.html#Formatting-Values

### (i) Data

Let us first explore how to adjust the way values in DataFrame cells are displayed. To control this, we can use the `format()` method. Its parameters define the format specification, which describes how the data should be presented. It works by assigning a formatting function (known as a `formatter`) to each cell in the DataFrame.

For instance, we can apply generic formatting to the entire DataFrame that displays floats with a precision of 3, uses a comma as the decimal separator, and uses a period as the thousands separator, as shown below:

In [6]:
df.style.format(precision=3, thousands='.', decimal=',') 

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15.000,39,Scientist,1,4
2,Male,21,35.000,81,Engineer,3,3
3,Female,20,86.000,6,Engineer,1,1
4,Female,23,59.000,77,Lawyer,0,2
5,Female,31,38.000,40,Artist,2,6
6,Female,22,58.000,76,Engineer,0,2
7,Female,35,31.000,6,Scientist,1,3


The output above shows that only the display of the "AnnualIncome" column has changed. This is because "AnnualIncome" is the only column with values in the thousands range (so the thousands separator applies), and there are no float columns for `precision` and `decimal` to act on. An important thing to note is that the underlying data type is unchanged, as shown below, where "AnnualIncome" continues to have the `int64` data type:

In [7]:
df.dtypes

Gender            object
Age                int64
AnnualIncome       int64
SpendingScore      int64
Profession        object
WorkExperience     int64
FamilySize         int64
dtype: object

The `precision`, `thousands`, and `decimal` parameters are just some examples of the parameters native to the `format()` method. The following information explains the various parameters that we can leverage in `format()`:

- `subset`: Defines the columns to apply the formatting function to.
- `na_rep`: Sets the representation for missing values. If `None` (default), no special formatting will be applied.
- `precision`: Floating point precision for the numerical values.
- `decimal`: Character used as the separator for the decimal points.
- `thousands`: Character used as the separator for thousands. Default value is `None`.
- `escape`: Defines the method for escaping special characters. Passing `'html'` and `'latex'` will replace the special characters with HTML-safe and LaTeX-safe sequences respectively.
- `hyperlinks`: Converts hyperlink-like string patterns into clickable links. Passing `'html'` and `'latex'` will set clickable URL hyperlinks and LaTeX href commands respectively.

`decimal` defaults to `.`, while the other parameters default to `None`. When `precision` is left as `None`, the `styler.format.precision` option is used, which is 6 unless changed.

We can also apply a different format specification for each column by providing dictionaries, as seen in the example below:

In [8]:
df.style.format({
               'WorkExperience': '{:,.2f}', # format as float rounded to 2 decimal places
               'SpendingScore': '{:.1%}',  # format as percentage rounded to 1 decimal place
               'AnnualIncome': '{:,}',    # format with comma as thousand separator
               'Gender': '{}'  # format as string
                })

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,"15,000",3900.0%,Scientist,1.00,4
2,Male,21,"35,000",8100.0%,Engineer,3.00,3
3,Female,20,"86,000",600.0%,Engineer,1.00,1
4,Female,23,"59,000",7700.0%,Lawyer,0.00,2
5,Female,31,"38,000",4000.0%,Artist,2.00,6
6,Female,22,"58,000",7600.0%,Engineer,0.00,2
7,Female,35,"31,000",600.0%,Scientist,1.00,3


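The example above shows that each key-value pair of the dictionary maps a column to its format specification, a string that defines how the data should be presented. Note that the `%` presentation type multiplies a value by 100 before appending the percent sign, which is why a "SpendingScore" of 39 is displayed as `3900.0%`.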

Let us look at another example. Suppose we now introduce some null values into our dataset. We can then apply a different set of formatting, as shown below:

In [9]:
# Generate df with NaN values at randomly chosen positions (output varies between runs)
n_nulls = 5
df_nan = df.copy()
rows = np.random.choice(df_nan.index, n_nulls)
cols = np.random.choice(df_nan.columns, n_nulls)

# Replace chosen locations with NaN
for row, col in zip(rows, cols):
    df_nan.at[row, col] = np.nan

# View output
df_nan

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,,1.0,4
2,Male,21,35000,81,Engineer,,3
3,,20,86000,6,Engineer,1.0,1
4,Female,23,59000,77,Lawyer,0.0,2
5,Female,31,38000,40,Artist,2.0,6
6,Female,22,58000,76,Engineer,0.0,2
7,,35,31000,6,Scientist,,3


In [10]:
# Custom formatter: label string values as 'JOB'
func = lambda s: 'JOB' if isinstance(s, str) else 'NO JOB'

# Perform series of formatting changes
df_nan.style.format({'Age': '{:.1f}', 
                     'Profession': func}, 
                     na_rep='MISSING')

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19.0,15000,39,MISSING,1.000000,4
2,Male,21.0,35000,81,JOB,MISSING,3
3,MISSING,20.0,86000,6,JOB,1.000000,1
4,Female,23.0,59000,77,JOB,0.000000,2
5,Female,31.0,38000,40,JOB,2.000000,6
6,Female,22.0,58000,76,JOB,0.000000,2
7,MISSING,35.0,31000,6,JOB,MISSING,3


The output above shows that we have performed the following formatting changes:
- Display values in the "Age" column with 1 decimal place (the `.1f` in the format string rounds to 1 decimal place).
- Replace non-null values in the "Profession" column with the string "JOB" using a custom lambda function. Null values are handled by `na_rep` before the function is called, so its `'NO JOB'` branch is not reached here.
- Replace null values in the entire DataFrame with the string "MISSING".

If the code for defining formats looks familiar, it is because the `format()` method in `pandas` builds on Python's string format syntax, so both use the same format specifications to control how numbers are represented.

For example, the following Python code uses a replacement field surrounded by curly braces `{}` to convert a float into a dollar value with 2 decimal places:

In [11]:
# Convert number into numerical dollar value with 2 decimal places and comma as thousands separator
x = 1012.3456
print('${:,.2f}'.format(x))

$1,012.35


The same concept is applied when we perform the same conversion for the values in the "AnnualIncome" column, as seen below:

In [12]:
# Format the annual income column
df.style.format({'AnnualIncome': '${:,.2f}'})

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,"$15,000.00",39,Scientist,1,4
2,Male,21,"$35,000.00",81,Engineer,3,3
3,Female,20,"$86,000.00",6,Engineer,1,1
4,Female,23,"$59,000.00",77,Lawyer,0,2
5,Female,31,"$38,000.00",40,Artist,2,6
6,Female,22,"$58,000.00",76,Engineer,0,2
7,Female,35,"$31,000.00",6,Scientist,1,3


We can see from the output that each number in the DataFrame column is displayed with two decimal places, commas as thousand separators, and prefixed by a dollar sign.

### (ii) Index
Besides formatting the data within the DataFrame cells, we can also format the displayed text of index labels and column headers. The methods for this are `format_index()` and `relabel_index()`.

For example, we can uppercase all the column headers with `format_index()`, as shown below:

In [13]:
df.style.format_index(str.upper, # String uppercase
                      axis=1 # Represents columns (axis=1)
                      )

CustomerID,GENDER,AGE,ANNUALINCOME,SPENDINGSCORE,PROFESSION,WORKEXPERIENCE,FAMILYSIZE
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


The `format_index()` method has parameters similar to `format()`, except that it has additional parameters like `axis` and `level` to determine the specific way these indices are modified.

For renaming of index labels or column headers, we can use the `relabel_index()` method. For instance, the example below shows how we can use it to rename the index labels (which are the customer IDs):

In [14]:
# Relabel the values in the index (CustomerID)
df.style.relabel_index([f'Customer{i+1}' for i in range(7)], 
                        axis=0 # Represents index (axis=0)
                       )

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
Customer1,Male,19,15000,39,Scientist,1,4
Customer2,Male,21,35000,81,Engineer,3,3
Customer3,Female,20,86000,6,Engineer,1,1
Customer4,Female,23,59000,77,Lawyer,0,2
Customer5,Female,31,38000,40,Artist,2,6
Customer6,Female,22,58000,76,Engineer,0,2
Customer7,Female,35,31000,6,Scientist,1,3


### (iii) Hide

We can use the `hide()` method to hide certain columns, the index and/or column headers, or index names.

The index can be hidden from rendering by calling `hide()` without any arguments, which might be useful if the index is integer-based. Similarly, column headers can be hidden by calling `hide(axis="columns")` without any other arguments.

For example, we can hide the "Age" and "Gender" columns from being displayed, as shown below:

In [15]:
# Hide Age and Gender columns
df.style.hide(subset=['Age', 'Gender'],
              axis='columns' # We can also use axis=1
              )

CustomerID,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,15000,39,Scientist,1,4
2,35000,81,Engineer,3,3
3,86000,6,Engineer,1,1
4,59000,77,Lawyer,0,2
5,38000,40,Artist,2,6
6,58000,76,Engineer,0,2
7,31000,6,Scientist,1,3


We can even chain this `hide()` method with the methods we learned earlier, such as `relabel_index()`. For example, we can first hide the "Age" and "Gender" columns, before relabeling the rest, as demonstrated below:

In [16]:
# Chaining of methods for styling
df.style.hide(['Age', 'Gender'], axis=1)\
        .relabel_index(['A', 'B', 'C', 'D', 'E'], axis=1)

CustomerID,A,B,C,D,E
1,15000,39,Scientist,1,4
2,35000,81,Engineer,3,3
3,86000,6,Engineer,1,1
4,59000,77,Lawyer,0,2
5,38000,40,Artist,2,6
6,58000,76,Engineer,0,2
7,31000,6,Scientist,1,3


### (iv) Concatenate

We can also use the `concat()` method to append another `Styler` object, combining the output into a single table. The purpose of this method is to extend an existing styled DataFrame with other metrics that may be useful but may not conform to the original's structure. Common use cases include adding a subtotal row or displaying metrics such as means, variances, or counts.

Suppose we first subset the original DataFrame into only the numerical columns (i.e., "Age", "AnnualIncome", "SpendingScore", "WorkExperience", and "FamilySize"). We then generate the mean value for each column, as shown below:

In [17]:
# Subset into numerical columns only
df_subset = df[["Age", "AnnualIncome", "SpendingScore", "WorkExperience", "FamilySize"]]

# Obtain mean value for each numerical column
df_mean = df_subset.mean().to_frame(name='Mean').T

# View output
df_mean

Unnamed: 0,Age,AnnualIncome,SpendingScore,WorkExperience,FamilySize
Mean,24.428571,46000.0,46.428571,1.142857,3.0


We can then concatenate the `Styler` object version of both DataFrames (the original DataFrame and the DataFrame with summary statistics we generated above):

In [18]:
# Subset to numerical columns
df_subset = df[["Age", "AnnualIncome", "SpendingScore", "WorkExperience", "FamilySize"]]

# Concatenate Styler objects (obtained with .style) of DataFrames
df_subset.style.concat(df_mean.style)

CustomerID,Age,AnnualIncome,SpendingScore,WorkExperience,FamilySize
1,19.0,15000.0,39.0,1.0,4.0
2,21.0,35000.0,81.0,3.0,3.0
3,20.0,86000.0,6.0,1.0,1.0
4,23.0,59000.0,77.0,0.0,2.0
5,31.0,38000.0,40.0,2.0,6.0
6,22.0,58000.0,76.0,0.0,2.0
7,35.0,31000.0,6.0,1.0,3.0
Mean,24.428571,46000.0,46.428571,1.142857,3.0


The output shows how concatenation appends the summary statistics to the original DataFrame as part of the displayed output.

> **Note**: The other Styler object to be appended in the concatenation must have the same columns as the original.

We can also perform formatting on the `Styler` objects together with the concatenation, as shown below:

In [19]:
# Format objects and concatenate
df_subset.style.format({'AnnualIncome': '${:,.2f}'})\
               .concat(df_mean.style.format('{:.1f}') # 1 decimal place
                      )

CustomerID,Age,AnnualIncome,SpendingScore,WorkExperience,FamilySize
1,19.0,"$15,000.00",39.0,1.0,4.0
2,21.0,"$35,000.00",81.0,3.0,3.0
3,20.0,"$86,000.00",6.0,1.0,1.0
4,23.0,"$59,000.00",77.0,0.0,2.0
5,31.0,"$38,000.00",40.0,2.0,6.0
6,22.0,"$58,000.00",76.0,0.0,2.0
7,35.0,"$31,000.00",6.0,1.0,3.0
Mean,24.4,46000.0,46.4,1.1,3.0


___
## (5) Built-in styles

Having focused on styling the text and numerical values in the DataFrame cells earlier, let us now take a look at how we can incorporate colors and highlights into the table styling.

In particular, we will start by looking at several styling functions that are common enough to have been built into `Styler` objects. This means we can use them without having to write custom functions for those specific styles.

Let us explore how each of these built-in style functions works:

### Null

The `highlight_null()` function allows us to easily highlight and identify missing and null data in the DataFrame, as shown below:

In [20]:
# Generate df with NaN values at randomly chosen positions (output varies between runs)
n_nulls = 5
df_nan = df.copy()
rows = np.random.choice(df_nan.index, n_nulls)
cols = np.random.choice(df_nan.columns, n_nulls)

# Replace chosen locations with NaN
for row, col in zip(rows, cols):
    df_nan.at[row, col] = np.nan

In [21]:
# Highlight null values with yellow color
df_nan.style.highlight_null(color='yellow')

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,,15000.0,39,Scientist,1.0,4.0
2,Male,21.0,35000.0,81,Engineer,3.0,3.0
3,Female,20.0,86000.0,6,Engineer,,
4,Female,23.0,,77,Lawyer,0.0,2.0
5,Female,31.0,38000.0,40,Artist,2.0,6.0
6,Female,22.0,58000.0,76,Engineer,0.0,2.0
7,Female,35.0,31000.0,6,Scientist,1.0,3.0


### Minimum and maximum

We can use `highlight_min()` and `highlight_max()` to identify extremes in numerical data, as demonstrated below:

In [22]:
# Highlight minimum values with defined color
df.style.highlight_min(color='orange')

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


In [23]:
# Highlight maximum values with defined color
df.style.highlight_max(color='green')

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


As described earlier, table styling involves CSS. Here is an example that gives more control over how the output is displayed. In particular, we can use the `props` parameter, which specifies the CSS properties to use for highlighting.

In [24]:
# Highlight maximum values with custom CSS properties
df.style.highlight_max(props='color:white; font-weight:bold; '
                             'background-color:blue;')

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


> **Note**: When values are passed into `props`, the `color` parameter will not be used.

### Quantiles

`highlight_quantile()` highlights values that lie between two quantiles, set with `q_left` and `q_right`, which makes it useful for detecting the highest or lowest values in each column. It only works on numerical columns.

In [25]:
# Subset to numerical cols
df_subset = df[["Age", "AnnualIncome", "SpendingScore", "WorkExperience", "FamilySize"]]

# Highlight values at or above the 75th percentile (0.75 quantile)
df_subset.style.highlight_quantile(q_left=0.75, 
                                   props='color:white; background-color:brown;')

CustomerID,Age,AnnualIncome,SpendingScore,WorkExperience,FamilySize
1,19,15000,39,1,4
2,21,35000,81,3,3
3,20,86000,6,1,1
4,23,59000,77,0,2
5,31,38000,40,2,6
6,22,58000,76,0,2
7,35,31000,6,1,3


### Between
`highlight_between()` is useful for detecting values within a specified range. The `left` and `right` limits can be scalars, or sequences that set a separate limit for each column, as shown below:

In [29]:
# Subset to numerical cols
numerical_cols = ["Age", "AnnualIncome", "SpendingScore", "WorkExperience", "FamilySize"]
df_subset = df[numerical_cols]

# Define left (i.e., minimum) limits, one per column (the Series index holds the column names)
left = pd.Series([25, 40000, 50, 2, 3], 
                 index=numerical_cols)

# Highlight based on values within the ranges set for each column
df_subset.style.highlight_between(left=left, 
                                  right=1000, # Set blanket right limit of 1000
                                  axis=1, # Apply row-wise so that `left` aligns with the column labels
                                  color='pink')

CustomerID,Age,AnnualIncome,SpendingScore,WorkExperience,FamilySize
1,19,15000,39,1,4
2,21,35000,81,3,3
3,20,86000,6,1,1
4,23,59000,77,0,2
5,31,38000,40,2,6
6,22,58000,76,0,2
7,35,31000,6,1,3
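
To see which cells the limits above select, we can build the equivalent boolean mask by hand. This is a minimal sketch that copies the numerical columns into a stand-in DataFrame (`df_demo`) and assumes the default inclusive behavior on both limits. One consequence worth noticing is that the blanket right limit of 1000 excludes every "AnnualIncome" value, because no value can be both at least 40000 and at most 1000:

```python
import pandas as pd

numerical_cols = ["Age", "AnnualIncome", "SpendingScore", "WorkExperience", "FamilySize"]

# Numerical values copied from the dataset shown earlier
df_demo = pd.DataFrame(
    [
        [19, 15000, 39, 1, 4],
        [21, 35000, 81, 3, 3],
        [20, 86000, 6, 1, 1],
        [23, 59000, 77, 0, 2],
        [31, 38000, 40, 2, 6],
        [22, 58000, 76, 0, 2],
        [35, 31000, 6, 1, 3],
    ],
    columns=numerical_cols,
    index=pd.Index(range(1, 8), name="CustomerID"),
)

left = pd.Series([25, 40000, 50, 2, 3], index=numerical_cols)
right = 1000

# A cell is in range if it is >= its column's left limit and <= the right limit
in_range = df_demo.ge(left, axis="columns") & df_demo.le(right)

# Number of cells per column that fall inside the range
counts = in_range.sum()
print(counts["Age"])           # 2
print(counts["AnnualIncome"])  # 0
print(counts["SpendingScore"]) # 3
```

If the income column should be highlighted too, `right` could be passed as a per-column Series in the same way as `left`.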


### Gradients

Besides solid colors, we can apply color gradients based on the numerical scale of the values. In particular, we can apply either background gradients (which change the cell background color) or text gradients (which change the text color), as shown below. We will also use `seaborn` color palettes to generate visually appealing colormaps, which we pass into the `cmap` parameter:

In [27]:
# Set color palette
color_map = sns.color_palette('mako', as_cmap=True)

# Generate background gradients
df.style.background_gradient(cmap=color_map)

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


We can repeat the above gradient coloring but this time using `text_gradient()` for just the text alone:

In [31]:
# Set color palette
color_map = sns.color_palette('magma', as_cmap=True)

# Generate text gradients
df.style.text_gradient(cmap=color_map)

CustomerID,Gender,Age,AnnualIncome,SpendingScore,Profession,WorkExperience,FamilySize
1,Male,19,15000,39,Scientist,1,4
2,Male,21,35000,81,Engineer,3,3
3,Female,20,86000,6,Engineer,1,1
4,Female,23,59000,77,Lawyer,0,2
5,Female,31,38000,40,Artist,2,6
6,Female,22,58000,76,Engineer,0,2
7,Female,35,31000,6,Scientist,1,3


Notice that the non-numerical values are not colored by either gradient method. When no `subset` is given, both methods apply only to the numeric columns.

### Bar

### Combining color displays
SHOW THIS EXAMPLE: https://pandas.pydata.org/docs/user_guide/style.html#:~:text=To%20showcase%20an%20example%20here%E2%80%99s%20how%20you%20can%20change%20the%20above%20with

Need to showcase this point: .background_gradient and .text_gradient have a number of keyword arguments to customise the gradients and colors. See the documentation.

___
## (6) Basic CSS style customization

https://pandas.pydata.org/docs/user_guide/style.html#Acting-on-Data

In [None]:
set_properties()

Give common CSS in a tableform

In [None]:
talk about applymap etc.

### Data

applymap()
apply()

### Index and column headers

applymap_index()
apply_index() 

___
## (7) Advanced style customization


https://pandas.pydata.org/docs/user_guide/style.html#Table-Styles
https://pandas.pydata.org/docs/user_guide/style.html#Methods-to-Add-Styles

___
## (8) Export styled tables

> Many other styles are out of scope of this lesson, such as tooltips and captions (https://pandas.pydata.org/docs/user_guide/style.html#Tooltips-and-Captions) and optimization (https://pandas.pydata.org/docs/user_guide/style.html#Optimization).