# Reading and plotting tables

This notebook provides a basic overview of reading, manipulating, and plotting tables of data. It serves as an introduction to the `astropy.table` module and the `matplotlib` package. Questions, comments, and suggestions to the author are welcome.

1. [Reading in Data](#reading-in-data)
1. [Accessing Rows and Columns](#accessing-rows-and-columns)
1. [Joining Tables](#joining-tables)
1. [Creating Plots](#creating-plots)


## Reading in Data

The astropy package makes reading data from a file very straightforward. Let's start by reading in a CSV file and printing the table (long tables are abbreviated when printed):

In [1]:
from astropy.table import Table

sn_table = Table.read('sn1a_list.csv', format='ascii.csv')
print(sn_table)

    sn    amusing_name      ra         dec      ...  COL   COLerr  Bmag  Bmagerr
--------- ------------ ----------- ------------ ... ------ ------ ------ -------
 SN2012fr      NGC1365 03:33:35.99  -36:07:37.7 ...   0.01  0.014 12.007   0.011
 SN2006mr    NGC1316_1 03:22:42.84  -37:12:28.5 ...  0.695  0.016 15.327   0.018
 SN2006dd    NGC1316_1 03:22:41.64  -37:12:13.2 ... -0.049  0.014 12.242   0.016
  SN2001A      NGC4261 12:19:23.01  +05:49:40.5 ...    nan    nan    nan     nan
  SN2009Y    NGC5728_1 14:42:23.68  -17:14:48.4 ...  0.167  0.015 13.981   0.044
 SN1999ee       IC5179 22:16:09.40  -36:50:31.5 ...  0.252  0.002 14.847   0.009
SN2017dps       IC4296 13:36:40.04 -33:58:01.29 ...    nan    nan    nan     nan
 SN2008ee      NGC0307 00:56:32.96  -01:46:16.0 ...    nan    nan    nan     nan
SN2017cze    2MIG_1546 11:09:46.82 -13:22:50.66 ...    nan    nan    nan     nan
 SN2000ey       IC1481 23:19:25.09  +05:54:21.9 ...    nan    nan    nan     nan
      ...          ...      

When we use `print` to see a table, what we get in return isn't very pretty, or even very useful to look at. Fortunately, astropy tables have a special method called `show_in_notebook` that lets us visualize tables in a cleaner way. This method allows us to control how many entries we see at once, and even search for entries in the table. However, as you might expect from the name, this method only works when using Python in a Jupyter notebook.

Try interacting with the following table:

In [2]:
sn_table.show_in_notebook(display_length=10)

idx,sn,amusing_name,ra,dec,z,ST,STerr,COL,COLerr,Bmag,Bmagerr
0,SN2012fr,NGC1365,03:33:35.99,-36:07:37.7,0.0054,1.077,0.004,0.01,0.014,12.007,0.011
1,SN2006mr,NGC1316_1,03:22:42.84,-37:12:28.5,0.005508,0.5,0.004,0.695,0.016,15.327,0.018
2,SN2006dd,NGC1316_1,03:22:41.64,-37:12:13.2,0.005871,0.972,0.009,-0.049,0.014,12.242,0.016
3,SN2001A,NGC4261,12:19:23.01,+05:49:40.5,0.007469,,,,,,
4,SN2009Y,NGC5728_1,14:42:23.68,-17:14:48.4,0.009316,1.018,0.014,0.167,0.015,13.981,0.044
5,SN1999ee,IC5179,22:16:09.40,-36:50:31.5,0.01141,1.063,0.004,0.252,0.002,14.847,0.009
6,SN2017dps,IC4296,13:36:40.04,-33:58:01.29,0.012465,,,,,,
7,SN2008ee,NGC0307,00:56:32.96,-01:46:16.0,0.0134,,,,,,
8,SN2017cze,2MIG_1546,11:09:46.82,-13:22:50.66,0.01486,,,,,,
9,SN2000ey,IC1481,23:19:25.09,+05:54:21.9,0.02041,,,,,,


In the first example we passed `format='ascii.csv'` explicitly, but for a `.csv` file this is often unnecessary: when no format is given, `Table.read()` tries to identify the format automatically, for example from the file extension. Unfortunately this doesn't always work, and sometimes we need to provide a little more information.

This repository contains some data from the AMUSING survey as ASCII text files. You are encouraged to open these files using a text editor if you don't already know what an ASCII data table looks like.

The `ascii` format is not one of the formats that `Table.read` will try automatically. If we try to read the table without specifying a format we get an error, demonstrated in the following cell. If astropy cannot read a file, it will list the formats that you can try. However, if you ever need to look up the format table, it's easier to look it up online [here](http://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table.read).

In [3]:
# The following line is expected to raise an error
amusing_data = Table.read('amusing_data.txt')

IORegistryError: Format could not be identified.
The available formats are:
           Format           Read Write Auto-identify Deprecated
--------------------------- ---- ----- ------------- ----------
                      ascii  Yes   Yes            No           
               ascii.aastex  Yes   Yes            No           
                ascii.basic  Yes   Yes            No           
                  ascii.cds  Yes    No            No           
     ascii.commented_header  Yes   Yes            No           
                  ascii.csv  Yes   Yes            No           
              ascii.daophot  Yes    No            No           
                 ascii.ecsv  Yes   Yes            No           
           ascii.fast_basic  Yes   Yes            No           
ascii.fast_commented_header  Yes   Yes            No           
             ascii.fast_csv  Yes   Yes            No           
       ascii.fast_no_header  Yes   Yes            No           
             ascii.fast_rdb  Yes   Yes            No           
             ascii.fast_tab  Yes   Yes            No           
          ascii.fixed_width  Yes   Yes            No           
ascii.fixed_width_no_header  Yes   Yes            No           
 ascii.fixed_width_two_line  Yes   Yes            No           
                 ascii.html  Yes   Yes           Yes           
                 ascii.ipac  Yes   Yes            No           
                ascii.latex  Yes   Yes           Yes           
            ascii.no_header  Yes   Yes            No           
                  ascii.rdb  Yes   Yes           Yes           
                  ascii.rst  Yes   Yes            No           
           ascii.sextractor  Yes    No            No           
                  ascii.tab  Yes   Yes            No           
                       fits  Yes   Yes           Yes           
                       hdf5  Yes   Yes           Yes           
                    votable  Yes   Yes           Yes           
                     aastex  Yes   Yes            No        Yes
                        cds  Yes    No            No        Yes
                        csv  Yes   Yes           Yes        Yes
                    daophot  Yes    No            No        Yes
                       html  Yes   Yes            No        Yes
                       ipac  Yes   Yes            No        Yes
                      latex  Yes   Yes            No        Yes
                        rdb  Yes   Yes            No        Yes

Let's try reading the ASCII table the right way. Notice that we use the descriptive variable name `amusing_data` so that we know what kind of data is in the table. This is always a good practice, especially when dealing with multiple tables.

In [None]:
# The following line should not raise an error
amusing_data = Table.read('amusing_data.txt', format='ascii')
amusing_data.show_in_notebook(display_length=10)

Depending on the file you are trying to read, finding the correct format argument to use can be a bit of trial and error. This is unfortunately something that comes best through experience.


## Accessing Rows and Columns

One of the biggest benefits of using astropy tables is how easy they make it to access and modify information. astropy lets us access both the rows and columns of a table using ***indexing***: columns are selected by name, and rows by integer position.

First, let's try accessing some columns in our table `sn_table`. Columns can be accessed by indexing the table using the desired column name as a string. We can index a table using as many column names as we want:

In [4]:
print("The 'sn' column of sn_table:\n")
sn_only_table = sn_table['sn']
print(sn_only_table)

print("\n\nWe can also select multiple columns at once:\n")
multiple_column_table = sn_table['sn', 'ra', 'dec']
print(multiple_column_table)



The 'sn' column of sn_table:

    sn   
---------
 SN2012fr
 SN2006mr
 SN2006dd
  SN2001A
  SN2009Y
 SN1999ee
SN2017dps
 SN2008ee
SN2017cze
 SN2000ey
      ...
 SN2008gp
 SN2008hu
  SN2008O
  SN2008R
 SN2009ab
 SN2009ad
 SN2009ag
 SN2009al
 SN2009cz
 SN2009dc
  SN2009F
Length = 625 rows


We can also select multiple columns at once:

    sn         ra         dec     
--------- ----------- ------------
 SN2012fr 03:33:35.99  -36:07:37.7
 SN2006mr 03:22:42.84  -37:12:28.5
 SN2006dd 03:22:41.64  -37:12:13.2
  SN2001A 12:19:23.01  +05:49:40.5
  SN2009Y 14:42:23.68  -17:14:48.4
 SN1999ee 22:16:09.40  -36:50:31.5
SN2017dps 13:36:40.04 -33:58:01.29
 SN2008ee 00:56:32.96  -01:46:16.0
SN2017cze 11:09:46.82 -13:22:50.66
 SN2000ey 23:19:25.09  +05:54:21.9
      ...         ...          ...
 SN2008gp 03:23:00.73  +01:21:42.8
 SN2008hu 08:09:14.76  -18:39:13.1
  SN2008O 06:57:34.46  -45:48:44.3
  SN2008R 03:03:53.70  -11:59:39.4
 SN2009ab 04:16:36.39  +02:45:51.0
 SN2009ad 05:03:33.38  +06:39:35.7

In [None]:
print("\n\n\nHere is the first row of sn_table:\n")
print(sn_table[0])

print("\n\n\nHere are the first two rows of sn_table:\n")
print(sn_table[0:2])

print("\n\n\nThe first row of the 'sn' column:\n")
print(sn_table['sn'][0])

print("\n\n\nThe 'sn' column of the first row. Notice that this is the same as before:\n")
print(sn_table[0]['sn'])

It's also very easy to add and remove rows and columns from a table. For example:


In [None]:
import numpy as np

# Let's make a copy of the table so that we can preserve the original
example_table = sn_table.copy()

# Remove a single row by its index
example_table.remove_row(0)

# Remove multiple rows by their indices (on the copy, not the original)
example_table.remove_rows([0, 1, 5])

# Append a new row to the end of the table. add_row expects one value per
# column, in the same order as the table's columns.
new_row = [123456789, 'NGC1365', '03:33:35.99', '-36:07:37.7', 0.0054, 1.03,
           'arx', 0.5, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1.077, 0.004, 0.01, 0.014,
           12.007, 0.011, 'PSNJ03333599-3607377']
example_table.add_row(new_row)

# Remove the column 'ST'
example_table.remove_column('ST')

# Keep only the listed columns and drop all the others
example_table.keep_columns(['sn', 'ra', 'dec'])
print(example_table)

# Rows can also be removed by condition. Here `my_table` and `my_col` are
# placeholders for your own table and column name.
bad_indices = np.where(my_table['my_col'] < 12)[0]
my_table.remove_rows(bad_indices)


## Joining Tables

Combining two tables is called ***joining*** them. This lets us merge information from several different tables into a single master table. Joining works by referencing ***key*** values. These are values that are used to match rows together so that we know which rows correspond to each other in different tables.

Imagine you have multiple tables of supernova data. A sensible key to pick when joining tables might be the supernova's name. Unfortunately, it's very common for different data tables to use different naming strategies. The supernova "sn2012fr" might be stored in a table as "2012fr", "sn12fr", or even by a different, proprietary name like "NGC1365".
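One way to cope with inconsistent naming is to convert every name to a common form before joining. The helper below is a hypothetical sketch, not part of astropy: it handles only names built from an optional "SN" prefix, a four-digit year, and letters, and returns `None` for anything else (such as "sn12fr" or "NGC1365") so that those entries can be matched by hand.

```python
import re


def normalize_sn_name(name):
    """Return a canonical 'SN<year><letters>' key, or None (hypothetical helper)."""
    cleaned = name.strip().replace(' ', '')
    match = re.fullmatch(r'(?:sn)?(\d{4})([A-Za-z]+)', cleaned, flags=re.IGNORECASE)
    if match is None:
        return None  # not a four-digit-year designation; needs manual matching
    year, letters = match.groups()
    # Single-letter designations are written in upper case, longer ones in lower case
    letters = letters.upper() if len(letters) == 1 else letters.lower()
    return f'SN{year}{letters}'


for raw_name in ['sn2012fr', '2012fr', 'SN 2001a', 'sn12fr', 'NGC1365']:
    print(raw_name, '->', normalize_sn_name(raw_name))
```

Applying such a function to the key column of each table before joining means that rows are matched on the normalized name rather than on whatever spelling each source happened to use.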

Here's an example of how to combine `sn_table` and the AMUSING data:


In [None]:
from astropy.table import join

amusing_data_copy = amusing_data.copy()
sn_table_copy = sn_table.copy()

sn_table_copy.rename_column('amusing_name', 'key')
amusing_data_copy.rename_column('SNNAME', 'key')

combined_table = join(sn_table_copy, amusing_data_copy, keys='key')

combined_table.show_in_notebook(display_length=10)