# Data from the Monson et al paper

In this notebook we are going over the machine readable tables in the paper by Monson et al, 2017. We will be extracting data from the different tables such as magnitude, right ascension, declination, type of RR Lyrae and other parameters which may help choosing stars for observation in our project.

## Analysing Table 5: Intensity Mean Magnitudes from GLOESS Light Curves

### Importing the data

Start by getting the data from the link given in the paper. The table can easily be read by a human, but passing it to a pandas dataframe will require some playing around.

In [None]:
url_table_5 = "https://cfn-live-content-bucket-iop-org.s3.amazonaws.com/journals/1538-3881/153/3/96/revision1/ajaa531bt5_mrt.txt?AWSAccessKeyId=AKIAYDKQL6LTV7YY2HIK&Expires=1666086155&Signature=DId1BBAb%2F0gP54YSV2Oox0Mh9m0%3D"

In [208]:
import pandas as pd

# Pass the table to a df
dataTable5 = pd.read_table(url_table_5, names = ['Data table 5'])

### Getting the header titles

First we want the header titles. These are in the first lines of the table, presented like this (only first lines shown)
```
   Bytes Format Units   Label   Explanations
--------------------------------------------------------------------------------
   1-  9 A9     ---     Name    Star Name
  11- 16 F6.3   mag     Umag    ?="" Intensity Mean U band magnitude
  18- 22 F5.3   mag   e_Umag    ?="" Uncertainty in Umag
  24- 29 F6.3   mag     Bmag    ?="" Intensity Mean B band magnitude
  31- 35 F5.3   mag   e_Bmag    ?="" Uncertainty in Bmag
  .
  .
  .
```
From here, we want to extract the labels and find the characters of the table they correspond to. It should be noted that due to the way that pandas imported, what we have at the moment is a dataframe with a single column and several rows, and we want to separate that, in this particular case, into 5 columns each with its own information.

In [156]:
# Rows in table 5 corresponding to the information in the header: 6-29
header_rows_temp_table_5 = dataTable5.iloc[6:29]

# Rename the column to something different, code can be refactored to delete this line
header_rows_temp_table_5 = header_rows_temp_table_5.rename(columns = {header_rows_temp_table_5.columns[0]:str(header_rows_temp_table_5.loc[7].values).strip('[\'   \']')})

# Reset index to make the df start from 1
header_rows_temp_table_5.reset_index(inplace = True, drop = True)

# Remove the first 3 lines and reset index again
header_rows_temp_table_5 = header_rows_temp_table_5[3:].reset_index(drop = True)

# Convert to series to be able to work with the Series.str method
header_rows_temp_table_5 = header_rows_temp_table_5[header_rows_temp_table_5.columns[0]]

Now that we have it in a series, we can use the pd.Series.str[slice_start:slice_end] method to divide each column into different ones and get the 5 columns we wanted at the beginning.

In [202]:
header_rows_final_table_5 = pd.DataFrame({'Bytes': header_rows_temp_table_5.str[0:8],
                       'Format': header_rows_temp_table_5.str[8:15],
                       'Units': header_rows_temp_table_5.str[15:20],
                       'Label': header_rows_temp_table_5.str[20:30],
                       'Explanation': header_rows_temp_table_5.str[30:]})
header_rows_final_table_5

Unnamed: 0,Bytes,Format,Units,Label,Explanation
0,1- 9,A9,---,Name,Star Name
1,11- 16,F6.3,mag,Umag,"?="""" Intensity Mean U band magnitude"
2,18- 22,F5.3,mag,e_Umag,"?="""" Uncertainty in Umag"
3,24- 29,F6.3,mag,Bmag,"?="""" Intensity Mean B band magnitude"
4,31- 35,F5.3,mag,e_Bmag,"?="""" Uncertainty in Bmag"
5,37- 42,F6.3,mag,Vmag,"?="""" Intensity Mean V band magnitude"
6,44- 48,F5.3,mag,e_Vmag,"?="""" Uncertainty in Vmag"
7,50- 55,F6.3,mag,Rcmag,"?="""" Intensity Mean Rc band magnitude"
8,57- 61,F5.3,mag,e_Rcmag,"?="""" Uncertainty in Rcmag"
9,63- 68,F6.3,mag,Icmag,"?="""" Intensity Mean Ic band magnitude"


### Extracting the star data

We now follow the same procedure to get the actual magnitude from stars.

In [198]:
data_rows_temp_table_5 = dataTable5.iloc[31:]

# Make it into a series
data_rows_temp_table_5 = data_rows_temp_table_5[data_rows_temp_table_5.columns[0]]

And we use the column labels in the label column from our header dataframe to name the columns we will be splitting the dataframe into. First it needs a bit of formatting, however...

In [195]:
# getting the labels for the columns
column_labels_table_5 = str(header_rows_final_table_5['Label'].values).strip('[\'   \']')
# Get rid of white spaces
column_labels_table_5 = column_labels_table_5.replace(' ', '')
# Get rid of line breaks
column_labels_table_5 = column_labels_table_5.replace('\n', '')
# Get rid of all the weird '' in the text
column_labels_table_5 = column_labels_table_5.replace('\'\'', ',')
#Make it into a list
column_labels_table_5 = column_labels_table_5.split(',')

Finally, we can use the list we just created and the bytes position from the byte column in the header dataframe to split everything nicely. Python indexing starts at 0, so that's why every slice starts a bit earlier that its header dataframe counterpart.

In [207]:
data_rows_final_table_5 = pd.DataFrame({column_labels_table_5[0]: data_rows_temp_table_5.str[0:8],
                                column_labels_table_5[1]: data_rows_temp_table_5.str[10:15],
                                column_labels_table_5[2]: data_rows_temp_table_5.str[17:21],
                                column_labels_table_5[3]: data_rows_temp_table_5.str[23:28],
                                column_labels_table_5[4]: data_rows_temp_table_5.str[30:34],
                                column_labels_table_5[5]: data_rows_temp_table_5.str[36:41],
                                column_labels_table_5[6]: data_rows_temp_table_5.str[43:47],
                                column_labels_table_5[7]: data_rows_temp_table_5.str[49:54],
                                column_labels_table_5[8]: data_rows_temp_table_5.str[56:60],
                                column_labels_table_5[9]: data_rows_temp_table_5.str[62:67],
                                column_labels_table_5[10]: data_rows_temp_table_5.str[69:73],
                                column_labels_table_5[11]: data_rows_temp_table_5.str[75:80],
                                column_labels_table_5[12]: data_rows_temp_table_5.str[82:86],
                                column_labels_table_5[13]: data_rows_temp_table_5.str[88:93],
                                column_labels_table_5[14]: data_rows_temp_table_5.str[95:99],
                                column_labels_table_5[15]: data_rows_temp_table_5.str[101:106],
                                column_labels_table_5[16]: data_rows_temp_table_5.str[108:112],
                                column_labels_table_5[17]: data_rows_temp_table_5.str[114:119],
                                column_labels_table_5[18]: data_rows_temp_table_5.str[121:125],
                                column_labels_table_5[19]: data_rows_temp_table_5.str[127:132]})

data_rows_final_table_5.head()

Unnamed: 0,Name,Umag,e_Umag,Bmag,e_Bmag,Vmag,e_Vmag,Rcmag,e_Rcmag,Icmag,e_Icmag,Jmag,e_Jmag,Hmag,e_Hmag,Ksmag,e_Ksmag,3.6mag,e_3.6mag,4.5mag
31,SW And,10.28,0.02,10.09,0.0,9.69,0.0,9.43,0.02,9.16,0.0,8.75,0.02,8.59,0.01,8.51,0.0,8.48,0.0,8.47
32,XX And,,,11.01,0.0,10.67,0.0,,,10.14,0.0,,,,,,,9.4,0.0,9.38
33,WY Ant,11.26,0.02,11.21,0.0,10.85,0.0,10.6,0.02,10.32,0.0,9.91,0.02,9.69,0.01,9.59,0.0,9.56,0.0,9.54
34,X Ari,10.25,0.01,10.06,0.0,9.56,0.0,9.23,0.02,8.86,0.0,8.3,0.02,8.05,0.01,7.92,0.0,7.88,0.0,7.85
35,AE Boo,,,10.88,0.0,10.64,0.0,,,10.25,0.0,,,,,,,9.75,0.01,9.74


## Analysing Table 1: RRL Galactic Calibrators and Ephemerides

The type of RR Lyrae is in table 2, we are going to repeat the process from earlier with this table and join the tables together.

### Importing the data


In [209]:
url_table_1 = 'https://cfn-live-content-bucket-iop-org.s3.amazonaws.com/journals/1538-3881/153/3/96/revision1/ajaa531bt1_mrt.txt?AWSAccessKeyId=AKIAYDKQL6LTV7YY2HIK&Expires=1666086155&Signature=%2Fny9aRsg73RzeCoz28H2%2Fmh8wW8%3D'

In [210]:
dataTable1 = pd.read_table(url_table_1, names = ['Data table 1'])

### Getting the header titles

In [222]:
# Rows in table 2 corresponding to the information in the header: 6-29
header_rows_temp_table_1 = dataTable1.iloc[6:19]

# Rename the column to something different, code can be refactored to delete this line
header_rows_temp_table_1 = header_rows_temp_table_1.rename(columns = {header_rows_temp_table_1.columns[0]:str(header_rows_temp_table_1.loc[7].values).strip('[\'   \']')})

# Reset index to make the df start from 1
header_rows_temp_table_1.reset_index(inplace = True, drop = True)

# Remove the first 3 lines and reset index again
header_rows_temp_table_1 = header_rows_temp_table_1[3:].reset_index(drop = True)

# Convert to series to be able to work with the Series.str method
header_rows_temp_table_1 = header_rows_temp_table_1[header_rows_temp_table_1.columns[0]]

In [226]:
header_rows_final_table_1 = pd.DataFrame({'Bytes': header_rows_temp_table_1.str[0:8],
                                          'Format': header_rows_temp_table_1.str[8:15],
                                          'Units': header_rows_temp_table_1.str[15:21],
                                          'Label': header_rows_temp_table_1.str[21:30],
                                          'Explanation': header_rows_temp_table_1.str[30:]})
header_rows_final_table_1.head()

Unnamed: 0,Bytes,Format,Units,Label,Explanation
0,1- 9,A9,---,Name,Star Name
1,11- 21,F11.9,d,PerF,Final period
2,23- 34,F12.4,d,HJD-ma,x TMMT HJD-max
3,36- 45,E10.3,d/yr,{zeta},"?="""" Quadratic O-C shape term, if required."
4,47- 50,A4,---,RRL,RR Lyrae Class


### Getting the star data

In [229]:
data_rows_temp_table_1 = dataTable1.iloc[40:]
# Make it into a series
data_rows_temp_table_1 = data_rows_temp_table_1[data_rows_temp_table_1.columns[0]]

In [234]:
# Getting the column labels
column_labels_table_1 = str(header_rows_final_table_1['Label'].values).strip('[\'   \']')
# Get rid of white spaces
column_labels_table_1 = column_labels_table_1.replace(' ', '')
# Get rid of line breaks
column_labels_table_1 = column_labels_table_1.replace('\n', '')
# Get rid of all the weird '' in the text
column_labels_table_1 = column_labels_table_1.replace('\'\'', ',')
#Make it into a list
column_labels_table_1 = column_labels_table_1.split(',')

In [240]:
data_rows_final_table_1 = pd.DataFrame({column_labels_table_1[0]: data_rows_temp_table_1.str[0:8],
                                        column_labels_table_1[1]: data_rows_temp_table_1.str[10:20],
                                        column_labels_table_1[2]: data_rows_temp_table_1.str[22:33],
                                        column_labels_table_1[3]: data_rows_temp_table_1.str[35:44],
                                        column_labels_table_1[4]: data_rows_temp_table_1.str[46:49],
                                        column_labels_table_1[5]: data_rows_temp_table_1.str[51:57],
                                        column_labels_table_1[6]: data_rows_temp_table_1.str[59:63],
                                        column_labels_table_1[7]: data_rows_temp_table_1.str[65:67],
                                        column_labels_table_1[8]: data_rows_temp_table_1.str[69:71],
                                        column_labels_table_1[9]: data_rows_temp_table_1.str[73:75],
                                        })
data_rows_final_table_1.head()

Unnamed: 0,Name,PerF,HJD-ma,{zeta},RRL,PerBL,[Fe/H],r_Par-HI,r_Par-BW,r_Par-HS
40,SW And,0.4422602,2456876.92,1.72,RRa,36.8,-0.2,HI,1.0,
41,XX And,0.722757,2456750.915,,RRa,,-1.9,HI,,
42,WY Ant,0.5743456,2456750.384,-1.46,RRa,,-1.4,HI,,
43,X Ari,0.65117288,2456750.387,-2.4,RRa,,-2.4,HI,4.0,
44,ST Boo,0.622286,2456750.525,,RRa,284.0,-1.7,HI,,


##