# Formatting Tables in Pandas


In [72]:
# Import pandas
import pandas as pd
# Import numpy
import numpy as np



## The data
Let's create some simulated data for two widgets, A and B. We'll create a dataframe for each widget, then concatenate them together. We'll also sort the dataframe by month and reset the index.

In [73]:
# simulated data for widget A
df_a = pd.DataFrame(
    {
        'Month':pd.date_range(
            start = '2012-01-01',
            end = '2022-12-31',
            freq = 'MS'
        ),
        'Quotes':np.random.randint(
            low = 1_000_000,
            high = 2_500_000,
            size = 132
        ),
        'Numbers':np.random.randint(
            low = 300_000,
            high = 500_000,
            size = 132
        ),
        'Amounts':np.random.randint(
            low = 750_000,
            high = 1_250_000,
            size = 132
        )
    }
)

df_a['Product'] = 'A'

# simulated data for widget B
df_b = pd.DataFrame(
    {
        'Month':pd.date_range(
            start = '2012-01-01',
            end = '2022-12-31',
            freq = 'MS'
        ),
        'Quotes':np.random.randint(
            low = 100_000,
            high = 800_000,
            size = 132
        ),
        'Numbers':np.random.randint(
            low = 10_000,
            high = 95_000,
            size = 132
        ),
        'Amounts':np.random.randint(
            low = 450_000,
            high = 750_000,
            size = 132
        )
    }
)

df_b['Product'] = 'B'

# put it together & sort
# (Product is a tie-breaker so that A always comes before B within a month)
df = pd.concat([df_a,df_b],axis = 0)
df.sort_values(by = ['Month','Product'],inplace = True)
df.reset_index(drop = True,inplace = True)
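
Because `np.random.randint` draws fresh values on every run, re-running the notebook will not reproduce the figures shown in the tables. One possible way to make the simulation repeatable, and to avoid writing the same block twice, is sketched below. The helper name `make_widget`, the `demo` variables and the seed value are illustrative choices, not part of the original notebook, and a seeded generator will produce different numbers from the ones shown in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical helper (not in the original notebook): builds one widget's
# monthly data from (low, high) ranges using a shared random generator.
def make_widget(product, quotes, numbers, amounts, rng):
    months = pd.date_range(start='2012-01-01', end='2022-12-31', freq='MS')
    n = len(months)  # 132 month starts from Jan 2012 to Dec 2022
    return pd.DataFrame(
        {
            'Month': months,
            'Quotes': rng.integers(low=quotes[0], high=quotes[1], size=n),
            'Numbers': rng.integers(low=numbers[0], high=numbers[1], size=n),
            'Amounts': rng.integers(low=amounts[0], high=amounts[1], size=n),
            'Product': product,
        }
    )

# Arbitrary seed, assumed purely for illustration
rng = np.random.default_rng(seed=2012)

demo_a = make_widget('A', (1_000_000, 2_500_000), (300_000, 500_000), (750_000, 1_250_000), rng)
demo_b = make_widget('B', (100_000, 800_000), (10_000, 95_000), (450_000, 750_000), rng)

# Combine, sort by month (then product) and reset the index
demo = (
    pd.concat([demo_a, demo_b], axis=0)
    .sort_values(by=['Month', 'Product'])
    .reset_index(drop=True)
)
```

Running the cell twice with the same seed gives the same `demo` dataframe, which makes it easier to compare formatted tables between runs.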

Let’s calculate a few “interesting” statistics — average sale amounts and product conversion:

In [74]:
# average sale
df['Average sale'] = df['Amounts'] / df['Numbers'].replace({0: np.nan})

# conversion
df['Product conversion'] = df['Numbers'] / df['Quotes'].replace({0: np.nan})
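
The `.replace({0: np.nan})` on each denominator is a guard against division by zero: without it, a month with no sales (or no quotes) would produce `inf` rather than a missing value. In this simulation the lower bounds are all above zero, so the guard never triggers, but a small sketch with made-up numbers shows the difference. The series below are illustrative and not taken from the simulated data.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([1000, 500])
numbers = pd.Series([4, 0])  # second entry mimics a month with no sales

# Dividing by zero directly yields infinity for the second entry
unguarded = amounts / numbers

# Replacing zeros with NaN first yields a missing value instead
guarded = amounts / numbers.replace({0: np.nan})

print(unguarded.tolist())
print(guarded.tolist())
```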

In [75]:
df.head(3)

|   | Month | Quotes | Numbers | Amounts | Product | Average sale | Product conversion |
|---|---|---|---|---|---|---|---|
| 0 | 2012-01-01 | 1083423 | 351717 | 1160524 | A | 3.299596 | 0.324635 |
| 1 | 2012-01-01 | 116615 | 56446 | 501802 | B | 8.889948 | 0.484037 |
| 2 | 2012-02-01 | 2425239 | 399001 | 1203083 | A | 3.015238 | 0.16452 |


## Date Formatting
There’s arguably nothing __wrong__ with the formatting, but it could be better. For instance, every monthly figure is dated the first of the month, so the day component of each Month entry tells the reader very little and can be dropped from the display.

In [79]:
# format the date as YYYY-MM
styler = df.iloc[:3].style.format({'Month':'{:%Y-%m}'})
display(styler)

|   | Month | Quotes | Numbers | Amounts | Product | Average sale | Product conversion |
|---|---|---|---|---|---|---|---|
| 0 | 2012-01 | 1083423 | 351717 | 1160524 | A | 3.299596 | 0.324635 |
| 1 | 2012-01 | 116615 | 56446 | 501802 | B | 8.889948 | 0.484037 |
| 2 | 2012-02 | 2425239 | 399001 | 1203083 | A | 3.015238 | 0.16452 |
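
The string passed for `Month` is an ordinary Python format string: everything after the colon is handed to the timestamp as `strftime` directives (`%Y` is the four-digit year, `%m` the zero-padded month number). A quick way to try a date format before putting it in the dictionary is to call `str.format` on a single timestamp. The variable names here are only for illustration.

```python
import pandas as pd

first_month = pd.Timestamp('2012-01-01')

# Same format string as used in the Styler call above
as_year_month = '{:%Y-%m}'.format(first_month)

print(as_year_month)
```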


Now, we can improve readability even further by using the name of each month rather than the month number, and we can do this __*without having to alter the underlying data*__.

In [80]:
styler = df.iloc[:3].style.format({'Month':'{:%B %Y}'})
display(styler)

|   | Month | Quotes | Numbers | Amounts | Product | Average sale | Product conversion |
|---|---|---|---|---|---|---|---|
| 0 | January 2012 | 1083423 | 351717 | 1160524 | A | 3.299596 | 0.324635 |
| 1 | January 2012 | 116615 | 56446 | 501802 | B | 8.889948 | 0.484037 |
| 2 | February 2012 | 2425239 | 399001 | 1203083 | A | 3.015238 | 0.16452 |
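
For contrast, here is a sketch of the alternative that display-only formatting avoids: converting the column itself to text with `Series.dt.strftime`. The result looks the same on screen, but the values become plain strings and lose their date behaviour (sorting, resampling, date arithmetic). The series names are illustrative.

```python
import pandas as pd

months = pd.Series(pd.date_range(start='2012-01-01', periods=3, freq='MS'))

# Converting to text produces a new series of strings
as_text = months.dt.strftime('%B %Y')

print(months.dtype)   # a datetime64 dtype: the original values are untouched
print(as_text.dtype)  # a string/object dtype: no longer dates
```

Formatting through the Styler keeps the `Month` column as real dates, so later steps can still filter or group by date.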


Maybe a little too wordy now — let’s use abbreviations instead (e.g. “Jan” instead of “January”) and we’ll also add a comma before the year.

In [81]:
styler = df.iloc[:3].style.format({'Month':'{:%b, %Y}'})
display(styler)

|   | Month | Quotes | Numbers | Amounts | Product | Average sale | Product conversion |
|---|---|---|---|---|---|---|---|
| 0 | Jan, 2012 | 1083423 | 351717 | 1160524 | A | 3.299596 | 0.324635 |
| 1 | Jan, 2012 | 116615 | 56446 | 501802 | B | 8.889948 | 0.484037 |
| 2 | Feb, 2012 | 2425239 | 399001 | 1203083 | A | 3.015238 | 0.16452 |


## Formatting numbers with a thousand separator
This one is fairly straightforward: we separate the thousands in Quotes and Numbers with commas.

Note, however, that if we also want to keep the formatting we applied to the Month column (we do), we need to extend the formatting dictionary.


In [82]:
styler = df.iloc[:3].style.format(
    {
        'Month':'{:%b, %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}'
    }
)
display(styler)

|   | Month | Quotes | Numbers | Amounts | Product | Average sale | Product conversion |
|---|---|---|---|---|---|---|---|
| 0 | Jan, 2012 | 1,083,423 | 351,717 | 1160524 | A | 3.299596 | 0.324635 |
| 1 | Jan, 2012 | 116,615 | 56,446 | 501802 | B | 8.889948 | 0.484037 |
| 2 | Feb, 2012 | 2,425,239 | 399,001 | 1203083 | A | 3.015238 | 0.16452 |
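
The number formats follow the same format mini-language as the dates: `,` asks for a comma as the thousands separator and `.0f` for fixed-point notation with no decimal places. Trying the string on the values from the first three rows gives a preview of what the styled Quotes and Numbers columns should show. This is plain Python with no Styler involved, and the variable names are illustrative.

```python
thousands = '{:,.0f}'

# Values copied from the first three rows of the table
quotes_preview = [thousands.format(v) for v in (1083423, 116615, 2425239)]
numbers_preview = [thousands.format(v) for v in (351717, 56446, 399001)]

print(quotes_preview)
print(numbers_preview)
```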


## Formatting currencies
The Widget Company just so happens to produce and sell its widgets in a country that uses a currency denoted by £ (I hope somewhere warmer and sunnier than the country where I earn my £).

Let’s reflect that in the table, reminding ourselves that:

- At an overall level, using decimal points is probably a little much
- At a lower level — say for instance, the average sale value — using decimals can be useful.

So we add currency formatting for Amounts and Average sale to our formatting dictionary:

In [None]:
styler = df.iloc[:3].style.format(
    {
        'Month':'{:%b, %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}'
    }
)
display(styler)
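
As a rough check of what the two currency formats produce, they can be tried on the Amounts and Average sale values from the first three rows. This sketch uses plain `str.format` rather than the Styler, and the variable names are illustrative. Note that `.2f` rounds to two decimal places rather than truncating.

```python
amount_fmt = '£{:,.0f}'    # whole pounds with thousands separators
average_fmt = '£{:,.2f}'   # pounds and pence

# Values copied from the first three rows of the table
amounts = (1160524, 501802, 1203083)
average_sales = (3.299596, 8.889948, 3.015238)

amounts_preview = [amount_fmt.format(v) for v in amounts]
averages_preview = [average_fmt.format(v) for v in average_sales]

print(amounts_preview)
print(averages_preview)
```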