### The `tabulate` Library

Before doing anything else, I'm going to configure my notebook so that Jupyter does not auto-wrap outputs.

In [1]:
%%html
<style>
div.output_area pre {
    white-space: pre;
}
</style>

The `tabulate` library is a useful, easy-to-use library for rendering tabular data in a variety of styles, from a variety of data structures.

Documentation for the `tabulate` library can be found [here](https://github.com/astanin/python-tabulate).

To install it, activate your virtual environment, and pip install it:

```bash
pip install tabulate
```

The `tabulate` package comes with both a CLI and the library itself. There are options to install only the library and skip the CLI; refer to the documentation linked above for more details.

For this video I have a CSV file I obtained (and slightly modified) from the World Bank [here](https://data.worldbank.org/indicator/SP.POP.TOTL).

The file is located in the same directory as this notebook, and is named `population.csv`.

Let's take a look at the first few rows of that file:

In [2]:
with open('population.csv') as f:
    for _ in range(3):
        print(next(f).strip(), end=f"\n{'-' * 20}\n")

Country Name,Country Code,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021
--------------------
Aruba,ABW,101665,102050,102565,103165,103776,104339,104865,105361,105846,106310,106766,107195
--------------------
Africa Eastern and Southern,AFE,518468229,532760424,547482863,562601578,578075373,593871847,609978946,626392880,643090131,660046272,677243299,694665117
--------------------


We could try and display it in a slightly better format ourselves:

In [3]:
import csv

In [4]:
with open('population.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)
    print('\t'.join(headers))
    for _ in range(3):
        print("\t".join(next(reader)))

Country Name	Country Code	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
Aruba	ABW	101665	102050	102565	103165	103776	104339	104865	105361	105846	106310	106766	107195
Africa Eastern and Southern	AFE	518468229	532760424	547482863	562601578	578075373	593871847	609978946	626392880	643090131	660046272	677243299	694665117
Afghanistan	AFG	29185511	30117411	31161378	32269592	33370804	34413603	35383028	36296111	37171922	38041757	38928341	39835428


But as you can see, this is still not great: although we instructed Jupyter not to wrap output, using `\t` yields inconsistent alignment.
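
To see why the tabs misalign, here is a small sketch using only the standard library. It assumes tab stops every 8 columns (a common default, but the actual width depends on where the output is rendered), and uses a couple of cells copied from the rows above. A tab only advances to the next tab stop, so a long first cell pushes everything after it further right than a short one does.

```python
# Sample cells taken from the first two data rows of population.csv.
rows = [
    ["Aruba", "ABW", "101665"],
    ["Africa Eastern and Southern", "AFE", "518468229"],
]

# Assumption: tab stops every 8 columns (your renderer may use something else).
TAB_SIZE = 8

# Record the column at which the second cell starts once tabs become spaces.
starts = []
for row in rows:
    expanded = "\t".join(row).expandtabs(TAB_SIZE)
    starts.append(expanded.index(row[1]))
    print(expanded)

print(starts)

# Padding each cell to the widest value in its column gives consistent alignment.
widths = [max(len(cell) for cell in column) for column in zip(*rows)]
aligned = ["  ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
print("\n".join(aligned))
```

With these assumptions the country code starts at column 8 in one row and column 32 in the other. The padding step at the end is roughly the kind of bookkeeping we would otherwise have to write ourselves.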

Instead, we can use the `tabulate` library to do all the display formatting hard work for us:

In [5]:
from tabulate import tabulate, tabulate_formats

In [6]:
with open('population.csv') as f:
    reader = csv.reader(f)
    display_data = tabulate(reader)
    
print(display_data)

----------------------------------------------------  ------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Country Name                                          Country Code  2010        2011        2012        2013        2014        2015        2016        2017        2018        2019        2020        2021
Aruba                                                 ABW           101665      102050      102565      103165      103776      104339      104865      105361      105846      106310      106766      107195
Africa Eastern and Southern                           AFE           518468229   532760424   547482863   562601578   578075373   593871847   609978946   626392880   643090131   660046272   677243299   694665117
Afghanistan                                           AFG           29185511    30117411    31161378    32269592    33370804    34413603    35383028    36296111    371

We now need to make that first line of data a header in the table.

Since that is already part of our CSV file, the simplest way to do this is to use the `headers="firstrow"` argument for `tabulate()`:

In [7]:
with open('population.csv') as f:
    reader = csv.reader(f)
    display_data = tabulate(reader, headers="firstrow")
    
print(display_data)

Country Name                                          Country Code    2010        2011        2012        2013        2014        2015        2016        2017        2018        2019        2020        2021
----------------------------------------------------  --------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Aruba                                                 ABW             101665      102050      102565      103165      103776      104339      104865      105361      105846      106310      106766      107195
Africa Eastern and Southern                           AFE             518468229   532760424   547482863   562601578   578075373   593871847   609978946   626392880   643090131   660046272   677243299   694665117
Afghanistan                                           AFG             29185511    30117411    31161378    32269592    33370804    34413603    35383028    36296

If we do not want to use the first row of the CSV file as headers, or if the file has no header row, we can also pass the headers to `tabulate` directly.

In [8]:
with open('population.csv') as f:
    reader = csv.reader(f)
    headers_do_not_use = next(reader)
    data = list(reader)

Now we have `data`, which does not contain the header row.

Let's generate some column headers:

In [9]:
headers = [f"col_{i}" for i in range(len(headers_do_not_use))]

And let's use those with `tabulate`:

In [10]:
print(tabulate(data, headers=headers))

col_0                                                 col_1    col_2       col_3       col_4       col_5       col_6       col_7       col_8       col_9       col_10      col_11      col_12      col_13
----------------------------------------------------  -------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
Aruba                                                 ABW      101665      102050      102565      103165      103776      104339      104865      105361      105846      106310      106766      107195
Africa Eastern and Southern                           AFE      518468229   532760424   547482863   562601578   578075373   593871847   609978946   626392880   643090131   660046272   677243299   694665117
Afghanistan                                           AFG      29185511    30117411    31161378    32269592    33370804    34413603    35383028    36296111    37171922    38041757    38

We can also have `tabulate` display a row index:

In [11]:
with open('population.csv') as f:
    reader = csv.reader(f)
    data = list(reader)

In [12]:
print(tabulate(data, headers="firstrow", showindex=True))

     Country Name                                          Country Code    2010        2011        2012        2013        2014        2015        2016        2017        2018        2019        2020        2021
---  ----------------------------------------------------  --------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
  0  Aruba                                                 ABW             101665      102050      102565      103165      103776      104339      104865      105361      105846      106310      106766      107195
  1  Africa Eastern and Southern                           AFE             518468229   532760424   547482863   562601578   578075373   593871847   609978946   626392880   643090131   660046272   677243299   694665117
  2  Afghanistan                                           AFG             29185511    30117411    31161378    32269592    33370804    3441

We can even provide a list (or more generally, iterable) of our own row indices if we prefer.

Let's generate a list of index values, which needs to be the same length as the number of data rows in `data` (so one less than `len(data)`, since the first row is a header).

In [13]:
indexes = range(5, 10_000, 5)
indexes = indexes[:len(data) - 1]
print(indexes[0:5])

range(5, 30, 5)


In [14]:
print(tabulate(data, headers="firstrow", showindex=indexes))

      Country Name                                          Country Code    2010        2011        2012        2013        2014        2015        2016        2017        2018        2019        2020        2021
----  ----------------------------------------------------  --------------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------  ----------
   5  Aruba                                                 ABW             101665      102050      102565      103165      103776      104339      104865      105361      105846      106310      106766      107195
  10  Africa Eastern and Southern                           AFE             518468229   532760424   547482863   562601578   578075373   593871847   609978946   626392880   643090131   660046272   677243299   694665117
  15  Afghanistan                                           AFG             29185511    30117411    31161378    32269592    33370804   

`tabulate` also offers a wide variety of styles we can use when generating a table.

In [15]:
tabulate_formats

['fancy_grid',
 'fancy_outline',
 'github',
 'grid',
 'html',
 'jira',
 'latex',
 'latex_booktabs',
 'latex_longtable',
 'latex_raw',
 'mediawiki',
 'moinmoin',
 'orgtbl',
 'pipe',
 'plain',
 'presto',
 'pretty',
 'psql',
 'rst',
 'simple',
 'textile',
 'tsv',
 'unsafehtml',
 'youtrack']

As you can see, there is quite a variety of table styles to choose from - some of them can be used to generate tables for other systems, like Markdown files, or even LaTeX.

If you find yourself generating markdown files and manually building up tables for example, then you can replace all that code and use the `tabulate` output string instead, using the particular style you need.

Let's take a look at a few examples:

In [16]:
print(tabulate(data, headers="firstrow", tablefmt="fancy_outline"))

╒══════════════════════════════════════════════════════╤════════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╤════════════╕
│ Country Name                                         │ Country Code   │ 2010       │ 2011       │ 2012       │ 2013       │ 2014       │ 2015       │ 2016       │ 2017       │ 2018       │ 2019       │ 2020       │ 2021       │
╞══════════════════════════════════════════════════════╪════════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡
│ Aruba                                                │ ABW            │ 101665     │ 102050     │ 102565     │ 103165     │ 103776     │ 104339     │ 104865     │ 105361     │ 105846     │ 106310     │ 106766     │ 107195     │
│ Africa Eastern and Southern                          │ AFE            │ 518468

If we want to generate tables structured for things like Jira, MediaWiki, GitHub, HTML, LaTeX (and more), we simply specify the appropriate style. You can see examples of all the styles, and some explanation of what each one is, in the docs I linked at the top of this notebook.

In [17]:
print(tabulate(data, headers="firstrow", tablefmt="jira"))

|| Country Name                                         || Country Code   || 2010       || 2011       || 2012       || 2013       || 2014       || 2015       || 2016       || 2017       || 2018       || 2019       || 2020       || 2021       ||
| Aruba                                                | ABW            | 101665     | 102050     | 102565     | 103165     | 103776     | 104339     | 104865     | 105361     | 105846     | 106310     | 106766     | 107195     |
| Africa Eastern and Southern                          | AFE            | 518468229  | 532760424  | 547482863  | 562601578  | 578075373  | 593871847  | 609978946  | 626392880  | 643090131  | 660046272  | 677243299  | 694665117  |
| Afghanistan                                          | AFG            | 29185511   | 30117411   | 31161378   | 32269592   | 33370804   | 34413603   | 35383028   | 36296111   | 37171922   | 38041757   | 38928341   | 39835428   |
| Africa Western and Central                           | AFW     

In [18]:
print(tabulate(data, headers="firstrow", tablefmt="mediawiki"))

{| class="wikitable" style="text-align: left;"
|+ <!-- caption -->
|-
! Country Name                                         !! Country Code   !! 2010       !! 2011       !! 2012       !! 2013       !! 2014       !! 2015       !! 2016       !! 2017       !! 2018       !! 2019       !! 2020       !! 2021
|-
| Aruba                                                || ABW            || 101665     || 102050     || 102565     || 103165     || 103776     || 104339     || 104865     || 105361     || 105846     || 106310     || 106766     || 107195
|-
| Africa Eastern and Southern                          || AFE            || 518468229  || 532760424  || 547482863  || 562601578  || 578075373  || 593871847  || 609978946  || 626392880  || 643090131  || 660046272  || 677243299  || 694665117
|-
| Afghanistan                                          || AFG            || 29185511   || 30117411   || 31161378   || 32269592   || 33370804   || 34413603   || 35383028   || 36296111   || 37171922   || 3804175

|}


In [19]:
print(tabulate(data, headers="firstrow", tablefmt="html"))

<table>
<thead>
<tr><th>Country Name                                        </th><th>Country Code  </th><th>2010      </th><th>2011      </th><th>2012      </th><th>2013      </th><th>2014      </th><th>2015      </th><th>2016      </th><th>2017      </th><th>2018      </th><th>2019      </th><th>2020      </th><th>2021      </th></tr>
</thead>
<tbody>
<tr><td>Aruba                                               </td><td>ABW           </td><td>101665    </td><td>102050    </td><td>102565    </td><td>103165    </td><td>103776    </td><td>104339    </td><td>104865    </td><td>105361    </td><td>105846    </td><td>106310    </td><td>106766    </td><td>107195    </td></tr>
<tr><td>Africa Eastern and Southern                         </td><td>AFE           </td><td>518468229 </td><td>532760424 </td><td>547482863 </td><td>562601578 </td><td>578075373 </td><td>593871847 </td><td>609978946 </td><td>626392880 </td><td>643090131 </td><td>660046272 </td><td>677243299 </td><td>694665117 </td></tr>
<

In [20]:
print(tabulate(data, headers="firstrow", tablefmt="github"))

| Country Name                                         | Country Code   | 2010       | 2011       | 2012       | 2013       | 2014       | 2015       | 2016       | 2017       | 2018       | 2019       | 2020       | 2021       |
|------------------------------------------------------|----------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|------------|
| Aruba                                                | ABW            | 101665     | 102050     | 102565     | 103165     | 103776     | 104339     | 104865     | 105361     | 105846     | 106310     | 106766     | 107195     |
| Africa Eastern and Southern                          | AFE            | 518468229  | 532760424  | 547482863  | 562601578  | 578075373  | 593871847  | 609978946  | 626392880  | 643090131  | 660046272  | 677243299  | 694665117  |
| Afghanistan                                          | AFG            | 291855

`tabulate` isn't restricted to tabulating lists of lists. It can handle iterables of dicts (using keys as columns), lists of dataclasses, pandas dataframes, and more. Refer to the docs linked at the top of this notebook.

Let's see how it handles dictionaries:

In [21]:
d = [
    {'a': 1, 'b': 2, 'c': 3},
    {'b': 20, 'c': 30, 'd': 40},
    {'a': 100, 'b': 200, 'c': 300, 'd': 400}
]

In [22]:
print(tabulate(d))

---  ---  ---  ---
  1    2    3
      20   30   40
100  200  300  400
---  ---  ---  ---


As you can see it uses the union of all the keys in the list of dictionaries to determine all the columns - smart!

What about headers? We can tell it to use the keys as the headers:

In [23]:
print(tabulate(d, headers="keys"))

  a    b    c    d
---  ---  ---  ---
  1    2    3
      20   30   40
100  200  300  400


There are quite a few more options available, in particular for controlling column alignment if the defaults are not sufficient. I rarely use these, as the defaults have always worked great for me.

Look at this example of how it intelligently applies alignment to floats, ints, and strings (note that the numeric-looking string `"3.14"` is treated as a number too):

In [24]:
headers = ["Floats", "Integers", "Strings"]
data2 = [
    ["3.14", 10, "*" * 10],
    [400.567, 100, "*" * 5],
    [0.7, 1000, "*" * 15],
    [1, -500, "*" * 2],
]

print(tabulate(data2, headers=headers, tablefmt="fancy_outline"))

╒══════════╤════════════╤═════════════════╕
│   Floats │   Integers │ Strings         │
╞══════════╪════════════╪═════════════════╡
│    3.14  │         10 │ **********      │
│  400.567 │        100 │ *****           │
│    0.7   │       1000 │ *************** │
│    1     │       -500 │ **              │
╘══════════╧════════════╧═════════════════╛
