# Reading in files and astropy tables

In astronomy, data is often saved in a table, and most online journals provide the option to download those tables as text files. It's really important to know how to open those tables and extract information from them. We have already read in several tables over the course of these tutorials, and now we're going to learn how to do it ourselves.

Now, there are several ways to read in tables: numpy has several built-in routines, and you may have heard of pandas too. If you already know how to use those and find them easier, then go right ahead and stick with them. I, however, personally love using astropy tables: I think they are user-friendly and straightforward, IF the table you are trying to read in is properly formatted. Generally speaking, most of the tables you download from major journals (like ApJ or MNRAS) can be read in pretty easily by astropy, but sometimes you will have to go into the table's text file and format it manually. Python's built-in open() function can be used as a last resort if none of your table-reading modules work: it opens a text file so that you can read it line by line. open() will work for any kind of table, but it is very rudimentary and you will have to put in a lot more work to format things nicely.
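
To give a feel for what the open() fallback involves, here is a minimal sketch. The tiny two-column file is made up for illustration (it reuses two names and radii from the table we work with below), and it is written to a temporary folder only so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical three-line table (one header line, two data rows), written to a
# temporary file so that this example is self-contained.
text = "Name rh\nNGVS-UCD1 14.03\nNGVS-UCD2 29.86\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tiny_table.txt")
    with open(path, "w") as f:
        f.write(text)

    names, radii = [], []
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            parts = line.split()           # split the row on whitespace
            names.append(parts[0])
            radii.append(float(parts[1]))  # convert text to a number by hand

print(names)
print(radii)
```

Every column has to be split and converted by hand, which is exactly the extra work that a table reader saves you.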

This tutorial will be a little different from the previous ones: we will work through a real table from a real paper. The example we will work with is Table 4 from the following Next Generation Virgo Survey (NGVS) paper (https://ui.adsabs.harvard.edu/abs/2020ApJS..250...17L/abstract). This paper is about a survey for ultracompact dwarf galaxies (UCDs) in the Virgo galaxy cluster.

Let's start by loading in the modules that we need:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from astropy.io import ascii
from astropy.table import Table

### Reading in a table

To read in the table, we need to specify the path to the table file. If the file is in the same directory as your Python file/notebook, you can use './table_name.txt' (here './' points to the current working directory, which is normally the directory containing your Python file/notebook). Otherwise, you will need to use your terminal to determine what the path is.
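
If you would rather check the path from inside Python than from the terminal, the standard-library pathlib module can help. This is only a sketch: the file name is the one used in this tutorial, and whether the file actually exists depends on where you saved it.

```python
from pathlib import Path

# Directory that Python is currently running in (for a notebook, this is
# usually the folder that contains the notebook).
here = Path.cwd()
print(here)

# Build the full path to the table and check that the file is really there
# before trying to read it.
table_path = here / "ngvs_ucd_tab4.txt"
print(table_path, table_path.exists())
```

If the last line prints False, the file is somewhere else and './ngvs_ucd_tab4.txt' will not work.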

One can read in the table using the astropy ascii.read() routine:

In [2]:
data = ascii.read('./ngvs_ucd_tab4.txt')
data

Name,MB,MV,rh,e_rh,Mstar,HRV,r_HRV,Class,Envelope,Method,OName
,mag,mag,pc,pc,dex(Msun),km / s,,,,,
str11,float64,float64,float64,float64,float64,int64,str33,int64,int64,str2,str19
NGVS-UCD1,-11.62,-11.38,14.03,0.39,6.5,-38,SDSS,5,0,m1,--
NGVS-UCD2,-9.68,-9.21,29.86,0.76,6.3,--,--,1,0,m5,--
NGVS-UCD3,-9.27,-8.76,15.34,0.38,6.1,--,--,1,0,m5,--
NGVS-UCD4,-9.7,-9.27,29.48,0.54,6.2,--,--,1,0,m5,--
NGVS-UCD5,-9.33,-8.8,22.44,0.28,6.1,--,--,3,0,m5,--
NGVS-UCD6,-9.36,-8.93,24.29,1.07,6.0,--,--,1,0,m5,--
NGVS-UCD7,-9.4,-8.97,24.51,0.37,6.1,--,--,1,0,m5,--
NGVS-UCD8,-9.41,-8.95,23.33,0.37,6.2,--,--,1,0,m5,--
NGVS-UCD9,-9.3,-8.8,25.0,0.68,6.2,--,--,1,0,m5,--
...,...,...,...,...,...,...,...,...,...,...,...


ascii.read() creates a Table object. By default, only a truncated version of this table is displayed: you can view the whole thing with .pprint_all() (in our case, we saved the table into the 'data' variable, so we would use "data.pprint_all()" to view the entire table).

### Extracting information from a table

Let's extract some information from the table. To index a row, type data[row number]: rows are indexed just like a list or an array, so remember to start counting from 0! To index a column, type data['column name']. A lot of tables do not come with column names; if this is the case, you will most likely see 'col1', 'col2', 'col3'... as column headers.

Let's index the first row of the table, as well as the stellar mass column (column name: Mstar) and half-light radius column (column name: rh).

In [3]:
first_row = data[0]
print(first_row)

   Name     MB     MV     rh  e_rh   Mstar    HRV   r_HRV Class Envelope Method OName
           mag    mag     pc   pc  dex(Msun) km / s                                  
--------- ------ ------ ----- ---- --------- ------ ----- ----- -------- ------ -----
NGVS-UCD1 -11.62 -11.38 14.03 0.39       6.5    -38  SDSS     5        0     m1    --


In [4]:
stellar_mass = data['Mstar']
print(stellar_mass)

  Mstar  
dex(Msun)
---------
      6.5
      6.3
      6.1
      6.2
      6.1
      6.0
      6.1
      6.2
      6.2
      ...
      6.8
      6.1
      5.3
      6.4
      6.4
      6.3
      6.6
      6.4
      6.8
      6.6
Length = 828 rows


In [5]:
radius = data['rh']
print(radius)

  rh 
  pc 
-----
14.03
29.86
15.34
29.48
22.44
24.29
24.51
23.33
 25.0
  ...
12.48
23.83
30.57
11.44
12.13
16.08
19.77
11.57
19.86
 31.5
Length = 828 rows


We created variables stellar_mass and radius to hold the stellar mass and radius columns, which are Column objects. In my experience, Column objects can be treated much like numpy arrays, but it's good practice to convert them to actual arrays. You can do this with np.array(COLUMN). After doing so, you can perform any regular numpy mathematical operation on the column.

### Adding constraints to a table

Often, we're interested in certain subsets of a table: what if we want to look at only the largest UCDs (by radius)? Astropy tables make constraints easy to apply. Let's say we want to focus on UCDs with a radius of >50 parsecs:

In [None]:
# this radius_constraint variable is an array of booleans: the value will be True if the radius is > 50, False if not.
# You can print radius_constraint to check.
radius_constraint = np.array(radius) > 50

# applying the radius constraint to our UCD table is as simple as indexing the table with the radius_constraint variable.
# Indexing with the radius constraint returns a new table of only UCDs with rh > 50 parsecs. Let's save this new table as big_ucds.
# Column headers are preserved in the new table.
big_ucds = data[radius_constraint]

big_ucds

Name,MB,MV,rh,e_rh,Mstar,HRV,r_HRV,Class,Envelope,Method,OName
,mag,mag,pc,pc,dex(Msun),km / s,,,,,
str11,float64,float64,float64,float64,float64,int64,str33,int64,int64,str2,str19
NGVS-UCD10,-11.8,-11.39,65.92,0.72,7.0,--,--,3,0,m5,--
NGVS-UCD43,-11.95,-11.51,52.91,1.22,7.2,--,--,3,0,m5,--
NGVS-UCD93,--,--,91.28,6.85,8.9,2074,"VCC,NED",2,0,m1,VCC-373
NGVS-UCD117,-11.11,-11.17,93.67,2.33,6.1,211,NED,6,0,m1,--
NGVS-UCD182,-14.09,-13.32,70.79,7.38,8.4,6840,"SDSS,NED",3,0,m8,--
NGVS-UCD224,-14.05,-13.35,59.0,1.16,8.3,13663,"VCC,SDSS,NED",3,0,m5,--
NGVS-UCD226,-11.06,-11.06,54.75,1.32,5.9,-3,SDSS,6,0,m1,--
NGVS-UCD276,-15.3,-14.57,62.09,4.33,8.9,655,"VCC,NED",2,0,m7,VCC-1146
NGVS-UCD278,-13.12,-12.38,80.46,6.29,8.0,1416,"VCC,SDSS,NED",2,0,m1,VCC-1148
...,...,...,...,...,...,...,...,...,...,...,...


Explicitly, what happened here: we defined an array of booleans called "radius_constraint", which was True if the UCD's radius was above 50pc, False if it wasn't. We then applied this constraint to our original table ("data"), and saved the cropped table as "big_ucds". After printing big_ucds, note that the table length has shrunk dramatically: out of the 828 UCDs we originally had, only 23 have a radius larger than 50 parsecs.

### EXERCISE 1

Using the provided table ngvs_ucd_tab4.txt:

- Compute the B - V color of each UCD.

- Identify the reddest UCDs.

When thinking about colors, remember that (A) a color is always the magnitude in a bluer filter minus the magnitude in a redder filter, and (B) with magnitudes, smaller = brighter. This can be very confusing: explicitly, a UCD with a large (B-V) color is redder than a UCD with a small (B-V) color. Use -0.3 as the blue/red threshold in this exercise.

In [None]:
### YOUR SOLUTION HERE

### Modifying and writing tables

Another important skill is being able to create and save your own tables. This can be very useful if you want to save a copy of your data somewhere to be used later. To create a table, start by initializing an empty Table object:

In [28]:
mydata = Table()

Once the Table object has been initialized, one can add columns to it by doing the following:

In [None]:
mydata['column_1'] = np.arange(1, 10, 1) # adding a column that goes from 1 -> 9
mydata

column_1
int64
1
2
3
4
5
6
7
8
9


Here, the string within the square brackets is the column name, and you are assigning np.arange(1, 10, 1) to be the values within the column. 

We can add more columns by doing the same thing as above, or by using the add_column() routine:

In [None]:
mydata['column_2'] = np.full(9, 0) # adding a column full of zeroes, calling it "column_2"

mydata.add_column([1,1,1,1,1,1,1,1,1], name = 'column_3') # adding a column full of 1s, calling it "column_3"

mydata

column_1,column_2,column_3
int64,int64,int64
1,0,1
2,0,1
3,0,1
4,0,1
5,0,1
6,0,1
7,0,1
8,0,1
9,0,1


Similarly, one can remove columns using the .remove_column() routine:

In [None]:
mydata.remove_column('column_2') # remove column_2
mydata

column_1,column_3
int64,int64
1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1


Columns can also be replaced, if one calls their header:

In [None]:
mydata['column_3'] = np.full(9, 3) # replacing column_3 (originally a column full of 1s) with a column full of 3s
mydata

column_1,column_3
int64,int64
1,3
2,3
3,3
4,3
5,3
6,3
7,3
8,3
9,3


One can also modify individual values in a column:

In [None]:
mydata['column_3'][0] = 5 # change the first row in column_3 to be 5
mydata

column_1,column_3
int64,int64
1,5
2,3
3,3
4,3
5,3
6,3
7,3
8,3
9,3


Note: all of the above is just a quick intro to some of the basic modifications you can make to a table. Astropy does have a lot more functionality, and is a lot more powerful than what I've briefly described. To get a better idea of what else you can do...[read the docs!](https://docs.astropy.org/en/stable/table/modify_table.html)

Finally, let's save our table. We use the ascii.write() routine to save tables; this works very similarly to saving figures (see Tutorial 3). Within ascii.write, you need to specify the table you want to save and the file name to save it under (including the directory, if it is not the current one).

In [35]:
ascii.write(mydata, 'my_first_table.txt')

I didn't specify a directory here, so "my_first_table.txt" is going to be saved in the same directory as this Python notebook. Some of the most common file extensions for text tables are '.txt', '.dat', and '.csv'. These tables, when written out, can be read into Python with ascii.read().

### EXERCISE 2

Using the provided table ngvs_ucd_tab4.txt:

- Delete all columns except for the 'Name' and 'Class' columns.
- Find all of the UCDs that are classified with 'Class = 1'.
- Make a new column full of 'yes' or 'no': if the UCD is 'Class = 1', it's 'yes', otherwise it's 'no'.
    - There are several ways to do this!
- Save the new modified table as "true_UCDs.txt".

In [None]:
## YOUR SOLUTION HERE