# Making Tables Pretty (and sometimes more useful)

Tables in the `datascience` library have a number of built-in formatters that make them look pretty. The full reference list is [here](https://www.data8.org/datascience/formats.html).

In this notebook we will review a few of the most useful.

In [1]:
import numpy as np
from datascience import *
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Load some data for the demonstration
COVID_data = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us.csv'
COVID = Table.read_table(COVID_data)
COVID.show(3)

date,geoid,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2020-01-21,USA,1,0.14,0,0,0,0
2020-01-22,USA,0,0.14,0,0,0,0
2020-01-23,USA,0,0.14,0,0,0,0


## DateFormatter

This dataset has a `date` column, but the values were actually read in as strings.

In [3]:
COVID.column('date')

array(['2020-01-21', '2020-01-22', '2020-01-23', ..., '2023-03-21',
       '2023-03-22', '2023-03-23'],
      dtype='<U10')

Python doesn't know they are dates. That means there is no easy way to select a particular date range. How could we find the data for just March 2020, for example?

We need to format this column so Python knows these are dates.

In [4]:
COVID = COVID.set_format('date', DateFormatter(format='%Y-%m-%d'))
COVID.show(3)

date,geoid,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2020-01-21,USA,1,0.14,0,0,0,0
2020-01-22,USA,0,0.14,0,0,0,0
2020-01-23,USA,0,0.14,0,0,0,0


Looks the same, right? But let's see what type of data is stored in the `date` column now.

In [5]:
COVID.column('date')

array([  1.57956480e+09,   1.57965120e+09,   1.57973760e+09, ...,
         1.67935680e+09,   1.67944320e+09,   1.67952960e+09])

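The array now contains numbers that are actually timestamps: seconds since the Unix epoch (January 1, 1970). The table still displays them as dates, but because the stored values are numbers, we can select a date range.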
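
To make the idea of a timestamp concrete, here is a small standalone sketch using only Python's standard library. It is not the `datascience` implementation. The helper names `to_timestamp` and `to_date_string` are made up for illustration, and the sketch assumes dates are interpreted as UTC, which matches the values printed above but may differ from your machine's local time zone.

```python
from datetime import datetime, timezone

DATE_FORMAT = '%Y-%m-%d'  # same pattern that was passed to DateFormatter

def to_timestamp(date_string, fmt=DATE_FORMAT):
    """Parse a date string and return seconds since the Unix epoch (assumes UTC)."""
    parsed = datetime.strptime(date_string, fmt).replace(tzinfo=timezone.utc)
    return parsed.timestamp()

def to_date_string(timestamp, fmt=DATE_FORMAT):
    """Turn seconds since the epoch back into a formatted date string (assumes UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(fmt)

first_day = to_timestamp('2020-01-21')
print(first_day)                  # 1579564800.0
print(to_date_string(first_day))  # 2020-01-21
```

Consecutive days differ by 86,400 seconds, so "on or after March 1" becomes an ordinary numeric comparison.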

In [6]:
import time  # Python's time functions (strptime, mktime)

# Convert the strings for our start and end dates to timestamps
# (seconds since the epoch), the same units now stored in the `date` column.
time1 = time.mktime(time.strptime('2020-03-01', '%Y-%m-%d'))
time2 = time.mktime(time.strptime('2020-03-31', '%Y-%m-%d'))

# Filter the table. are.between excludes its upper bound, so we use
# are.between_or_equal_to to keep March 31 as well.
March2020 = COVID.where('date', are.between_or_equal_to(time1, time2))
March2020.show(3)

date,geoid,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2020-03-01,USA,18,6.66,0,2,0.23,0
2020-03-02,USA,16,9.8,0,3,0.43,0
2020-03-03,USA,21,11.68,0,4,0.67,0


## NumberFormatter
Suppose we want a column to display with a particular number of decimal places. Let's say we want `deaths_avg` to three decimal places.

In [7]:
March2020 = March2020.set_format('deaths_avg', NumberFormatter(decimals=3))
March2020.show(3)

date,geoid,cases,cases_avg,cases_avg_per_100k,deaths,deaths_avg,deaths_avg_per_100k
2020-03-01,USA,18,6.66,0,2,0.23,0
2020-03-02,USA,16,9.8,0,3,0.43,0
2020-03-03,USA,21,11.68,0,4,0.67,0
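
Fixed-decimal display comes down to a format specification. As a rough, library-independent sketch (not the `datascience` implementation; `format_decimals` is a hypothetical helper), this is what "three decimal places" means for the values in `deaths_avg`:

```python
def format_decimals(value, decimals=3, separator=','):
    """Return value as text with a fixed number of decimal places."""
    # Builds a format spec such as ',.3f': thousands separator, 3 decimals.
    return format(value, f'{separator}.{decimals}f')

for value in (0.23, 0.43, 0.67):
    print(format_decimals(value))             # 0.230, then 0.430, then 0.670

print(format_decimals(1234.5))                # 1,234.500
print(format_decimals(1234.5, separator=''))  # 1234.500
```

Only the text shown to the reader changes; the number itself is not rounded or altered.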


NumberFormatter has additional superpowers. Suppose we had a table such as this one:

In [8]:
numbers = make_array('23,000', '32,134', '12,112', '34,244')
comma_numbers = Table().with_columns('some_numbers', numbers)
comma_numbers

some_numbers
23000
32134
12112
34244


It is not unusual to obtain data in which numbers contain commas, as in this example. The column holds an array of strings, but we want actual numbers. NumberFormatter to the rescue!

In [9]:
comma_numbers = comma_numbers.set_format('some_numbers', NumberFormatter())
comma_numbers

some_numbers
23000
32134
12112
34244


The numbers are still displayed with commas, but now the array holds numbers, not strings.

In [10]:
comma_numbers.column('some_numbers')

array([23000, 32134, 12112, 34244])
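
The conversion from comma-separated strings to numbers can be pictured with a few lines of plain Python. This is only a sketch of the idea, not the library's code, and `parse_number` is a hypothetical helper name.

```python
def parse_number(text, separator=','):
    """Convert text such as '23,000' to an int, or to a float if it has a decimal point."""
    cleaned = text.replace(separator, '')
    return float(cleaned) if '.' in cleaned else int(cleaned)

raw = ['23,000', '32,134', '12,112', '34,244']
parsed = [parse_number(item) for item in raw]
print(parsed)       # [23000, 32134, 12112, 34244]
print(sum(parsed))  # 101490 (arithmetic now works)
```

Once the values are numbers, you can sort, sum, and plot them, none of which behaves sensibly on the original strings.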

What if you don't like commas in your numbers? You can change the `separator` from `','` to the empty string `''`.

In [None]:
comma_numbers = comma_numbers.set_format('some_numbers', NumberFormatter(separator=''))
comma_numbers

## CurrencyFormatter
What if those numbers represented currency? We'd like to display a dollar sign in front of each number.

In [None]:
comma_numbers = comma_numbers.set_format('some_numbers', CurrencyFormatter())
comma_numbers

Notice that for currency, the commas were put back in place. It looks better that way. The actual array is still just numbers.

In [11]:
comma_numbers.column('some_numbers')

array([23000, 32134, 12112, 34244])
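
The split between what is stored and what is displayed can be mimicked in plain Python. The sketch below is an illustration under my own assumptions (a hypothetical `format_currency` helper, a `$` symbol, and two decimals for non-integers), not the behavior of `CurrencyFormatter` itself.

```python
def format_currency(value, symbol='$'):
    """Display a number with a currency symbol and thousands separators."""
    if isinstance(value, int):
        return f'{symbol}{value:,}'
    return f'{symbol}{value:,.2f}'  # assumed: two decimals for non-integers

amounts = [23000, 32134, 12112, 34244]
print([format_currency(a) for a in amounts])
# ['$23,000', '$32,134', '$12,112', '$34,244']
print(amounts)  # [23000, 32134, 12112, 34244] (stored values are untouched)
```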

## PercentFormatter
We often have a column of floating-point numbers that represent percentages. Let's create a simple example.

In [12]:
netflix = Table().with_columns(
    'Time', make_array('Time spent looking for a movie', 'Time spent watching a movie'),
    'Percentage', make_array(.95, .5)
)
netflix

Time,Percentage
Time spent looking for a movie,0.95
Time spent watching a movie,0.5


Display numbers as percentages.

In [13]:
netflix = netflix.set_format('Percentage', PercentFormatter)
netflix

Time,Percentage
Time spent looking for a movie,95.00%
Time spent watching a movie,50.00%
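
Note that the stored values are fractions (0.95, not 95); the display multiplies by 100 and appends the percent sign. Python's own format specification does the same, as this small sketch shows. `format_percent` is a hypothetical helper, not part of `datascience`.

```python
def format_percent(fraction, decimals=2):
    """Display a fraction such as 0.95 as a percentage string such as '95.00%'."""
    return f'{fraction:.{decimals}%}'

for fraction in (0.95, 0.5):
    print(format_percent(fraction))     # 95.00%, then 50.00%

print(format_percent(0.5, decimals=0))  # 50%
```

If your data already holds values like 95 and 50, divide by 100 first; otherwise they would display as 9500.00% and 5000.00%.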


**That is a good start. You can look at the documentation to find other formatting options. I'm going to end this tutorial on formatting and go look for a movie to watch.**