# Data collation and analysis
This file handles collecting all collected data into one consistent table.

## Data Collection
First data is collected, then any non-gaia stellar IDs (such as HD numbers) are matched 
with a corresponding Gaia star entry. 

## Analysis
After data has been collected into a common table, mass analysis can be performed

In [1]:
from astropy.table import Table

from astroquery.gaia import Gaia

import pandas as pd

Gaia.MAIN_GAIA_TABLE = "gaiadr3.gaia_source"  # Reselect Data Release 3, default

## Log in to Gaia
Use credentials from gaia/CREDENTIALS file

In [2]:
Gaia.login(credentials_file='gaia/CREDENTIALS')
username = 'mwidmaie'

INFO: Login to gaia TAP server [astroquery.gaia.core]
OK
INFO: Login to gaia data server [astroquery.gaia.core]
OK


## Radii Data
The initial radii dataset was insufficient, so we need to use another dataset and combine them into one. 
We will use the data from `data/stellar_ldd.vot` as well as some Gaia data to manually calculate the stellar radii (using trigonometry).

In [None]:
stellar_ldd_table_name = 'user_' + username + ".stellar_ldd"

try:
    stellar_ldd = Gaia.load_table(stellar_ldd_table_name)
except:
    # Table doesn't exist so we're gonna upload from the local stellar_ldd.vot file
    print("Table does not exist, uploading from local machine.")
    Gaia.upload_table(upload_resource="data/stellar_ldd.vot", table_name='stellar_ldd', format="VOTable")
    stellar_ldd = Gaia.load_table(stellar_ldd_table_name)

## Update metadata for stellar_ldd and stellar_radii
This will make the next steps a lot easier. (and faster)

In [None]:
    Gaia.update_user_table(table_name=stellar_ldd_table_name, list_of_changes=[['"RAJ2000"', 'flags', 'Ra'], ['"DEJ2000"', 'flags', 'Dec'], ['"RAJ2000"', 'indexed', True], ['"DEJ2000"', 'indexed', True]])

## Retrieving Gaia Parallax and Parallax Error
We need to retrieve the parallax and parallax error from Gaia for each star in the stellar_ldd table.

With this data, we can calculate the distance to the star, and then use the angular diameter from the stellar_ldd table to calculate the stellar radius.
We can combine the data using the following ADQL query:
```sql
SELECT recno, source_id, Name, RAJ2000, DEJ2000, LDD, e_LDD, ra as gaia_ra, dec as gaia_dec, vmag, parallax, parallax_error,
DISTANCE(
    POINT(RAJ2000, DEJ2000),
    POINT(ra, dec)
) * 3600. AS dist_arcsec
FROM user_<username>.stellar_ldd AS stellar_ldd
JOIN gaiadr3.gaia_source AS gaia
-- Geometric Cross-Match: =======
ON DISTANCE(
    POINT(RAJ2000, DEJ2000),
    POINT(ra, dec)
) < 1.8 / 3600.
-- Condition: ===================
WHERE ABS(vmag - phot_g_mean_mag) < 2.
```

Processing will be done in Python.

In [None]:
import gzip

job = Gaia.launch_job(f"""SELECT TOP 466000 recno, source_id, Name, RAJ2000, DEJ2000, ra as gaia_ra, dec as gaia_dec, vmag, parallax, parallax_error, LDD, e_LDD,
DISTANCE(
    POINT(RAJ2000, DEJ2000),
    POINT(ra, dec)
) * 3600. AS dist_arcsec
FROM {stellar_ldd_table_name} AS stellar_ldd
JOIN gaiadr3.gaia_source AS gaia
-- Geometric Cross-Match: =======
ON DISTANCE(
    POINT(RAJ2000, DEJ2000),
    POINT(ra, dec)
) < 1.8 / 3600.
-- Condition: ===================
WHERE ABS(vmag - phot_g_mean_mag) < 2.""", dump_to_file=True, output_format='votable', output_file='data/stellar_ldd_gaia.vot.gz', verbose=True)

results = job.get_results()

with gzip.open('data/stellar_ldd_gaia.vot.gz', 'rb') as f:
    with open('data/stellar_ldd_gaia.vot', 'wb') as f2:
        f2.write(f.read())

## Calculating Stellar Radii
### Calculating Distance
We can calculate the distance to the star using the parallax from Gaia, using the following formula:
![Distance Formula](imgs/latex_distance.png)
Where:
d = distance to star (in parsecs)
p = parallax angle (in mas)

### Calculating Stellar Radius
We can calculate the stellar radius using the angular diameter from the stellar_ldd table, using the following formula:
![Stellar Radius Formula](imgs/latex_radius.png)
Where:
r_star = stellar radius (in kilometers)
d = distance to star (in parsecs)
theta = angular diameter (in radians)

### Calculating Stellar Radius Error
We can calculate the stellar radius error using the following formula:
![Stellar Radius Error Formula](imgs/latex_radius_error.png)

In [4]:
import numpy as np
df = Table.read('data/stellar_ldd_gaia.vot').to_pandas()
df['distance'] = 1000 / df['parallax']  # Convert parallax to distance in km
df['LDD_rad'] = df['LDD'] / 1000 / 3600 * np.pi / 180  # Convert LDD to radians (First convert to arcseconds, then to degrees, then to radians)
df['e_LDD_rad'] = df['e_LDD'] / 1000 / 3600 * np.pi / 180  # Convert e_LDD to radians
df['radius'] = 1/2 * df['distance'] * np.tan(df['LDD_rad']) * 3.086e+13  # Calculate radius in km 
df['distance_error'] = 1000 / np.square(df['parallax']) * df['parallax_error']  # Calculate distance error in parsecs
df['distance_error_ratio'] = df['distance_error'] / df['distance']  # Calculate distance error as a ratio
df['radius_error'] = np.sqrt(np.square(1/2 * df['distance'] * 1/(np.square(np.cos(df['LDD_rad']))) * df['e_LDD_rad']) + np.square(1/2 * df['distance_error'] * df['parallax_error'] * np.tan(df['LDD_rad']))) * 3.086e+13 # Calculate radius error in km
df['radius_error_ratio'] = df['radius_error'] / df['radius']  # Calculate radius error as a ratio

df['radius'] = df['radius'] / 695700  # Convert radius to solar radii
df['radius_error'] = df['radius_error'] / 695700  # Convert radius error to solar radii

df.drop(columns=['LDD_rad', 'e_LDD_rad'], inplace=True)
df.dropna(subset=['distance', 'distance_error', 'radius', 'radius_error', 'source_id', 'gaia_dec', 'gaia_ra'], inplace=True)
df.drop(columns=['parallax', 'parallax_error', 'LDD', 'e_LDD', 'RAJ2000', 'DEJ2000', 'Name', 'dist_arcsec', 'vmag', 'distance', 'distance_error', 'distance_error_ratio', 'recno'], inplace=True)
df

Unnamed: 0,source_id,gaia_ra,gaia_dec,radius,radius_error,radius_error_ratio
0,7632157690368,45.034343,0.235390,5.713006,0.127737,0.022359
1,16733192740608,45.152965,0.386342,1.489124,0.040466,0.027174
2,30343944744320,45.094992,0.476836,3.069619,0.085569,0.027876
3,44358422235136,45.501454,0.497697,1.223615,0.030880,0.025237
4,64531884461952,45.490906,0.741372,29.043697,0.770821,0.026540
...,...,...,...,...,...,...
285576,4538856292348877568,278.769402,27.059397,18.348716,0.537857,0.029313
285577,4538859320307829504,278.737038,27.078939,2.003283,0.055428,0.027669
285578,4538864504326336256,278.942034,27.211519,19.537612,0.512465,0.026230
285579,4538867944602450048,278.848242,27.246558,1.626073,0.045976,0.028274


In [6]:
# Save as a VOTable and upload to Gaia
stellar_radii_table_name = 'user_' + username + ".stellar_radii"
stellar_radii_vot = Table.from_pandas(df)
stellar_radii_vot.write('data/stellar_radii_finalized.vot', format='votable', overwrite=True)

Gaia.upload_table(upload_resource='data/stellar_radii_finalized.vot', table_name='stellar_radii', format='votable')
Gaia.update_user_table(table_name=stellar_radii_table_name, list_of_changes=[['"gaia_ra"', 'flags', 'Ra'], ['"gaia_dec"', 'flags', 'Dec'], ['"gaia_ra"', 'indexed', True], ['"gaia_dec"', 'indexed', True]])
Gaia.delete_user_table(table_name=stellar_ldd_table_name) # Delete the old stellar_ldd table

Sending file: data/stellar_radii_finalized.vot
Uploaded table 'stellar_radii'.


NameError: name 'stellar_ldd_table_name' is not defined

## Collating with stellar mass data
Using data from `data/stellar_radii_combined.vot` we can pair up the Gaia source ids from `data/stellar_masses.rawdata` to form a new dataset including both radii and mass, 2 of the 3 data points we need to calculate stellar core temperature

In [None]:
# We need to download and pre-process the data from https://cdsarc.cds.unistra.fr/viz-bin/nph-Cat/txt?I/360/binmass.dat.gz
# This is a table of stellar masses
import requests

url = 'https://cdsarc.cds.unistra.fr/viz-bin/nph-Cat/txt?I/360/binmass.dat'
print("Downloading data from " + url + "... This may take a while.")
r = requests.get(url, allow_redirects=True)

with open('data/binmass.rawdata', 'wb') as f:
   f.write(r.content)
    
print('Download complete, processing...')
fixed_content = ''
# Now we need to process the data, since it's not in a standard format
with open('data/binmass.rawdata', 'r') as f:
    lines = f.readlines()[4:-1]
    for line in lines:
        if line[0] == '#' or line[0] == '-':
            continue
        content = line.replace('|', ',')
        ra_and_dec = content.split(',')[13]
        ra = ra_and_dec.split(' ')[0]
        dec = ra_and_dec.split(' ')[-1]
        content = content.replace(ra_and_dec, ra + ',' + dec).replace(' ', '')
        fixed_content += content
        
with open('data/stellar_masses.rawdata', 'w') as f:
    f.write(fixed_content)
    
print('Done')

## Uploading stellar_masses to Gaia
We need to upload the stellar_masses data to Gaia so we can join it with the stellar_radii data.

In [None]:
# Load stellar_masses.rawdata as a Pandas DataFrame

# Removed the column headers because they weren't specific enough, we'll use these column names instead
columns = ['source_id', 'mass_primary', 'mass_primary_lower', 'mass_primary_upper', 'mass_secondary', 'mass_secondary_lower', 'mass_secondary_upper', 'flux_ratio', 'flux_ratio_lower', 
           'flux_ratio_upper', 'method', 'reference_for_primary_mass', 'flag', 'ra2016', 'dec2016']

# Load the data into a Pandas DataFrame
stellar_masses = pd.read_csv('data/stellar_masses.rawdata', names=columns, skiprows=4)
stellar_masses.drop(columns=['mass_primary_lower', 'mass_primary_upper', 'mass_secondary_lower', 'mass_secondary_upper', 'flux_ratio_lower', 'flux_ratio_upper', 'flag', 'method', 'reference_for_primary_mass'], inplace=True)

# Drop any with NaN ra or dec
stellar_masses.dropna(subset=['ra2016', 'dec2016'], inplace=True)

# Save as a VOTable
stellar_masses_vot = Table.from_pandas(stellar_masses)
stellar_masses_vot.write('data/stellar_masses.vot', format='votable', overwrite=True)

# Upload to Gaia
Gaia.upload_table(upload_resource='data/stellar_masses.vot', table_name='stellar_masses', format='votable')

## Joining stellar_masses and stellar_radii

We can join the stellar_masses and stellar_radii tables using the following ADQL query:
```sql
SELECT TOP 150000 stellar_radii.source_id, gaia_ra as ra, gaia_dec as dec, radius, radius_error, mass_primary, mass_secondary
FROM {stellar_radii_table_name} AS stellar_radii
JOIN {stellar_masses_table_name} AS stellar_masses
ON stellar_radii.source_id = stellar_masses.source_id
```

In [13]:
import gzip

stellar_radii_table_name = 'user_' + username + ".stellar_radii"
stellar_masses_table_name = 'user_' + username + ".stellar_masses"
combined_table_name = 'user_' + username + ".stellar_masses_and_radii"

try:
    stellar_mass_and_radii = Gaia.load_table(combined_table_name)
except:
    # Table doesn't exist so we're gonna upload from the local stellar_ldd.vot file
    print("Table does not exist, running queries and uploading from local machine.")
    job = Gaia.launch_job(f"""SELECT TOP 150000 stellar_radii.source_id, gaia_ra as ra, gaia_dec as dec, radius, radius_error, mass_primary, mass_secondary
FROM {stellar_radii_table_name} AS stellar_radii
JOIN {stellar_masses_table_name} AS stellar_masses
ON stellar_radii.source_id = stellar_masses.source_id""", dump_to_file=True, output_format='votable', output_file='data/stellar_masses_and_radii.vot.gz', verbose=True)
    
    results = job.get_results()
    
    with gzip.open('data/stellar_masses_and_radii.vot.gz', 'rb') as f:
        with open('data/stellar_masses_and_radii.vot', 'wb') as f2:
            f2.write(f.read())
            
    table = Table.read('data/stellar_masses_and_radii.vot')
    # Set units for the table
    table['radius'].unit = 'solar radii'
    table['radius_error'].unit = 'Rsun'
    table['mass_primary'].unit = 'Msun'
    table['mass_secondary'].unit = 'Msun'
    table['ra'].unit = 'deg'
    table['dec'].unit = 'deg'
    
    # Save as a VOTable and upload to Gaia
    table.write('data/stellar_masses_and_radii.vot', format='votable', overwrite=True)
            
    Gaia.upload_table(upload_resource='data/stellar_masses_and_radii.vot', table_name='stellar_masses_and_radii', format='votable')
    Gaia.update_user_table(table_name=combined_table_name, list_of_changes=[['ra', 'flags', 'Ra'], ['dec', 'flags', 'Dec'], ['ra', 'indexed', True], ['dec', 'indexed', True]])
    
    # Remove the old tables
    Gaia.delete_user_table(table_name=stellar_radii_table_name)
    Gaia.delete_user_table(table_name=stellar_masses_table_name)  

Retrieving table 'user_mwidmaie.stellar_masses_and_radii'
500 Error 500:
esavo.tap.TAPException: Code: 404, msg: Table 'user_mwidmaie.stellar_masses_and_radii' not found.
Table does not exist, running queries and uploading from local machine.
Launched query: 'SELECT TOP 150000 stellar_radii.source_id, gaia_ra as ra, gaia_dec as dec, radius, radius_error, mass_primary, mass_secondary
FROM user_mwidmaie.stellar_radii AS stellar_radii
JOIN user_mwidmaie.stellar_masses AS stellar_masses
ON stellar_radii.source_id = stellar_masses.source_id'
------>https
host = gea.esac.esa.int:443
context = /tap-server/tap/sync
Content-type = application/x-www-form-urlencoded
200 200
[('Date', 'Tue, 02 Jan 2024 00:50:44 GMT'), ('Server', 'Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_jk/1.2.43'), ('Cache-Control', 'no-cache, no-store, max-age=0, must-revalidate'), ('Pragma', 'no-cache'), ('Expires', '0'), ('X-XSS-Protection', '1; mode=block'), ('X-Frame-Options', 'SAMEORIGIN'), ('X-Content-Type-Options', '



Sending file: data/stellar_masses_and_radii.vot
Uploaded table 'stellar_masses_and_radii'.
Retrieving table 'user_mwidmaie.stellar_masses_and_radii'
Table 'user_mwidmaie.stellar_masses_and_radii' updated.
Table 'user_mwidmaie.stellar_radii' deleted.
Table 'user_mwidmaie.stellar_masses' deleted.


## Calculating Stellar Core Temperature