# MESA Data Preparation
So, I've finally decided that there is no way around using MESA (Modules for Experiments in Stellar Astrophysics) for this research.

This means we will have to prepare the initial simulation data for MESA. We will randomly sample stars (the sample size is set by `NUM_STARS` below) from the Gaia DR3 dataset retrieved using the following ADQL query:

```sql
SELECT TOP 1000000
	gaia.source_id, 
	gaia.ra as ra,
	gaia.dec as dec,
	parameters.mass_flame AS mass,
	parameters.radius_gspphot AS radius,
	parameters.age_flame as age
FROM gaiadr3.astrophysical_parameters AS parameters
INNER JOIN gaiadr3.gaia_source as gaia
ON gaia.source_id = parameters.source_id
WHERE 
	parameters.mass_flame IS NOT NULL
	AND parameters.fem_gspspec IS NOT NULL
	AND parameters.age_flame IS NOT NULL
	AND parameters.evolstage_flame > 100 -- 100 = Zero age main sequence star
	AND parameters.evolstage_flame < 360 -- 360 = main sequence turn off
```

## The Simulations
We will be running an evolutionary simulation for each star up until its current age (estimated by Gaia).
We will provide metallicity, initial mass, age, and initial helium (calculated from the metallicity) as inputs to the simulation.

We will export the simulation parameters to a CSV file, `data.csv`, and use it as the input for parallel execution with GNU Parallel.
The data will be exported to the `parallel` directory, from which the simulations are then run.
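
As a rough sketch of how one CSV row could become MESA input, the snippet below renders a minimal `&controls` fragment per star. It assumes the CSV carries the `source_id`, `mass`, `age`, `y` and `z` columns of the dataset prepared later in this notebook. `render_inlist` is a hypothetical helper, and the fragment leaves out everything else a real inlist needs; only the four control names (`initial_mass`, `initial_z`, `initial_y`, `max_age`) are actual MESA options.

```python
import csv
import io


def render_inlist(row: dict) -> str:
    """Render a minimal &controls fragment for one star (hypothetical helper)."""
    return "\n".join([
        "&controls",
        f"  initial_mass = {float(row['mass']):.6f}",
        f"  initial_z = {float(row['z']):.6f}",
        f"  initial_y = {float(row['y']):.6f}",
        f"  max_age = {float(row['age']):.6e}",
        "/ ! end of controls namelist",
    ])


# Stand-in for the exported CSV: one row copied from the prepared dataset
sample_csv = io.StringIO(
    "source_id,mass,age,y,z\n"
    "2106822131756839808,1.294340,3.002109e+09,0.281930,0.018395\n"
)

# One rendered fragment per star, keyed by source_id
inlists = {row["source_id"]: render_inlist(row) for row in csv.DictReader(sample_csv)}
print(inlists["2106822131756839808"])
```

Keying the fragments by `source_id` keeps each simulation traceable back to its Gaia source when the jobs are fanned out.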

In [2]:
# Import libraries
import pandas as pd
import numpy as np
from astropy.table import Table
from astropy.io import fits
import os
import requests
import gzip
from collections import namedtuple
from typing import Union
from pathlib import Path

### Import the Gaia Data and convert to a pandas DataFrame
The above query must be run in the [Gaia Archive Advanced Query Tool (ADQL)](https://gea.esac.esa.int/archive/) and then the results must be downloaded and stored in the `data` directory.

First, we must unzip the file, then we can read it into a pandas DataFrame.
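
For large archives, a streaming variant avoids holding the whole decompressed file in memory. This is only a sketch: `gunzip` is a hypothetical helper name, and it assumes the caller passes the same output path that is read afterwards.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, dst: Path) -> Path:
    """Decompress src into dst unless dst already exists, then return dst."""
    if not dst.exists():
        with gzip.open(src, "rb") as compressed, open(dst, "wb") as out:
            # copyfileobj copies in chunks instead of reading everything at once
            shutil.copyfileobj(compressed, out)
    return dst
```

It could be called as `gunzip(Path('data/gaia_astrophysical_parameters.vot.gz'), Path('data/gaia_astrophysical_params.vot'))`.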

In [2]:
filename = 'data/gaia_astrophysical_parameters.vot.gz'
unzipped = 'data/gaia_astrophysical_params.vot'

# If the unzipped file does not exist, decompress the archive.
# The existence check uses the same path that is written here and read in the next cell.
if not os.path.exists(unzipped):
    with gzip.open(filename, 'rb') as f:
        with open(unzipped, 'wb') as g:
            g.write(f.read())

In [3]:
df: pd.DataFrame = Table.read('data/gaia_astrophysical_params.vot').to_pandas()
df

Unnamed: 0,SOURCE_ID,ra,dec,log_surface_gravity,dist,mass,radius,log_fe_h_abundance,log_n_fe_abundance,log_s_fe_abundance,log_metalicity,age,evolution_stage,spectral_type
0,6030024480434041088,256.473786,-28.289514,3.9771,177.857895,1.409851,2.0905,-0.09,0.47,0.11,0.04,2.920355,316,F
1,6030111685419272576,255.038856,-28.811967,4.2980,182.385498,1.171434,1.1288,0.18,,,-0.54,1.242147,160,G
2,5253752950266984704,157.348507,-62.332491,3.7712,279.329987,1.670828,2.9348,-0.19,,0.19,-0.00,1.923422,337,F
3,6030062409175986176,254.932569,-28.931663,3.9315,405.607208,1.477585,2.2852,-0.13,,,0.20,2.654922,327,F
4,5961557062324546304,265.895060,-37.946593,3.9318,342.907501,1.518954,2.1685,0.03,,,0.06,2.108766,277,F
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
205584,1796082381758247552,328.159849,25.054985,4.0129,112.224403,1.343488,1.8720,-0.07,0.02,-0.04,0.25,3.229928,312,F
205585,1794976651018650624,328.512913,22.939668,4.1252,213.870605,1.105674,1.3825,-0.02,,,-0.05,5.438365,337,F
205586,1796123682164850688,326.907823,24.488709,3.9158,355.876312,1.487074,2.3270,-0.16,,0.16,-0.11,2.510785,303,F
205587,1795069284873344640,328.001389,23.488187,4.1705,167.741104,1.119260,1.2895,0.12,,,-0.22,4.767622,320,F


## Spectroscopy Data (APOGEE DR17)
We can improve our MESA simulations by using more fine-grained spectroscopy data.

Download the FITS file from SDSS's (Sloan Digital Sky Survey) [cloud storage](https://data.sdss.org/sas/dr17/apogee/spectro/aspcap/dr17/synspec_rev1/allStar-dr17-synspec_rev1.fits)

This may take a while to download.

In [4]:
url = 'https://data.sdss.org/sas/dr17/apogee/spectro/aspcap/dr17/synspec_rev1/allStar-dr17-synspec_rev1.fits'

# Download only if the file is not already stored locally
if not os.path.exists('data/APOGEE_DR17.fits'):
    response = requests.get(url)
    response.raise_for_status()

    # Store to 'data/APOGEE_DR17.fits'
    with open('data/APOGEE_DR17.fits', 'wb') as f:
        f.write(response.content)

In [4]:
# Load the data
cols = ['GAIAEDR3_SOURCE_ID', 'C_FE', 'C_FE_FLAG', 'N_FE', 'N_FE_FLAG', 'O_FE', 'O_FE_FLAG', 'NA_FE', 'NA_FE_FLAG', 'MG_FE', 'MG_FE_FLAG', 'AL_FE', 'AL_FE_FLAG', 'SI_FE', 'SI_FE_FLAG', 'S_FE', 'S_FE_FLAG', 'K_FE', 'K_FE_FLAG', 'CA_FE', 'CA_FE_FLAG', 'TI_FE', 'TI_FE_FLAG', 'CR_FE', 'CR_FE_FLAG', 'MN_FE', 'MN_FE_FLAG', 'FE_H', 'FE_H_FLAG', 'NI_FE', 'NI_FE_FLAG', 'P_FE', 'P_FE_FLAG']
tbl: Table = Table.read('data/APOGEE_DR17.fits', format='fits', hdu=1)
apogee_spectroscopy = tbl[cols].to_pandas()

# Neon is not measured because its spectral lines are in the UV, so we use the average of [O/Fe] and [Mg/Fe] as a proxy, since those elements are created in the same fusion processes as neon
# Suggested by this paper: https://iopscience.iop.org/article/10.3847/1538-3881/ac9bfa#ajac9bfas3
apogee_spectroscopy['NE_FE'] = (apogee_spectroscopy['O_FE'] + apogee_spectroscopy['MG_FE']) / 2
apogee_spectroscopy['NE_FE_FLAG'] = apogee_spectroscopy['O_FE_FLAG'] + apogee_spectroscopy['MG_FE_FLAG'] # Ad hoc flag that is zero only if both the oxygen and magnesium measurements are unflagged

apogee_spectroscopy

Unnamed: 0,GAIAEDR3_SOURCE_ID,C_FE,C_FE_FLAG,N_FE,N_FE_FLAG,O_FE,O_FE_FLAG,NA_FE,NA_FE_FLAG,MG_FE,...,MN_FE,MN_FE_FLAG,FE_H,FE_H_FLAG,NI_FE,NI_FE_FLAG,P_FE,P_FE_FLAG,NE_FE,NE_FE_FLAG
0,0,0.004847,0,0.124265,0,0.114938,0,0.146668,0,0.035147,...,0.040870,0,0.003463,0,0.051278,0,,2,0.075043,0
1,538028216707715712,0.009295,0,0.151220,0,0.083402,0,0.050112,0,0.030429,...,,64,-0.160680,0,0.007683,0,,2,0.056915,0
2,2413929812587459072,0.061738,0,-0.111900,256,0.235343,0,-0.190826,0,0.165238,...,-0.078027,0,-0.275530,0,0.013930,0,,2,0.200291,0
3,422596679964513792,0.112730,0,0.114560,32,0.045069,0,-0.179346,0,0.038494,...,0.063633,0,-0.252970,0,-0.108750,0,,2,0.041782,0
4,422596679964513792,0.032651,0,-0.489060,288,0.140573,0,-1.166266,0,-0.001095,...,-0.153027,0,-0.214170,0,-0.069440,0,,2,0.069739,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
733896,2341765776376373376,,0,,0,,0,,0,,...,,0,,0,,0,,2,,0
733897,1998097371124974720,-0.060314,0,0.201880,0,0.088913,0,-0.127408,0,0.083532,...,-0.000038,0,-0.236560,0,0.022203,0,,2,0.086223,0
733898,1994741318040223232,-0.011308,0,0.209650,0,0.050574,0,0.042082,0,0.064306,...,0.036437,0,0.114820,0,0.032992,0,,2,0.057440,0
733899,6379914575198998272,-0.616490,0,-0.004530,288,-0.064257,0,0.043602,0,-0.211707,...,-0.584298,0,-1.050500,0,-0.103507,0,,258,-0.137982,0
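
The neon proxy and its flag can be sketched in isolation. The cell above adds the two flags; since the flags are bitmasks, a bitwise OR is an alternative that is also zero only when both inputs are zero, and it keeps the individual bits intact. Both function names are hypothetical, and the flag values are taken from the table above without assuming what each bit means.

```python
def neon_proxy(o_fe: float, mg_fe: float) -> float:
    """Estimate [Ne/Fe] as the mean of [O/Fe] and [Mg/Fe]."""
    return (o_fe + mg_fe) / 2


def combine_flags(flag_a: int, flag_b: int) -> int:
    """Bitwise OR of two quality flags: zero only if both inputs are zero."""
    return flag_a | flag_b


# First row of the table above: O_FE = 0.114938, MG_FE = 0.035147
print(neon_proxy(0.114938, 0.035147))

# 32 and 288 share a bit: the OR keeps the mask at 288, whereas a sum would give 320
print(combine_flags(32, 288))
```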


In [5]:
## Cross-match the Gaia data with the APOGEE data (using `SOURCE_ID` in the Gaia data and `GAIAEDR3_SOURCE_ID` in the APOGEE data)
apogee_spectroscopy.rename(columns={'GAIAEDR3_SOURCE_ID': 'SOURCE_ID'}, inplace=True)
matched_df = pd.merge(df, apogee_spectroscopy[
    ['SOURCE_ID', 'C_FE', 'C_FE_FLAG', 'O_FE', 'O_FE_FLAG', 'NE_FE_FLAG', 'NE_FE_FLAG', 'MG_FE', 'MG_FE_FLAG', 'AL_FE', 'NE_FE', 'NE_FE_FLAG',
     'AL_FE_FLAG', 'SI_FE', 'SI_FE_FLAG', 'K_FE', 'K_FE_FLAG', 'CA_FE', 'CA_FE_FLAG', 'TI_FE', 'TI_FE_FLAG', 'CR_FE',
     'CR_FE_FLAG', 'MN_FE', 'MN_FE_FLAG', 'NI_FE', 'NI_FE_FLAG', 'FE_H', 'FE_H_FLAG', 'P_FE', 'P_FE_FLAG', 'N_FE','N_FE_FLAG', 'S_FE', 'S_FE_FLAG']], on='SOURCE_ID', how='inner')
matched_df.rename(columns={'C_FE': 'C', 'O_FE': 'O', 'NE_FE': 'Ne',
                           'MG_FE': 'Mg', 'AL_FE': 'Al', 'SI_FE': 'Si',
                           'K_FE': 'K', 'CA_FE': 'Ca', 'TI_FE': 'Ti',
                           'CR_FE': 'Cr', 'MN_FE': 'Mn', 'NI_FE': 'Ni',
                           'FE_H': 'Fe', 'P_FE': 'P',
                           'N_FE_FLAG': 'flag_N', 'P_FE_FLAG': 'flag_P',
                           'FE_H_FLAG': 'flag_Fe', 'TI_FE_FLAG': 'flag_Ti',
                           'N_FE': 'N', 'S_FE': 'S', 'C_FE_FLAG': 'flag_C',
                           'O_FE_FLAG': 'flag_O', 'NE_FE_FLAG': 'flag_Ne',
                           'MG_FE_FLAG': 'flag_Mg', 'AL_FE_FLAG': 'flag_Al',
                           'SI_FE_FLAG': 'flag_Si', 'K_FE_FLAG': 'flag_K',
                           'CA_FE_FLAG': 'flag_Ca', 'S_FE_FLAG': 'flag_S',
                           'CR_FE_FLAG': 'flag_Cr', 'MN_FE_FLAG': 'flag_Mn',
                           'NI_FE_FLAG': 'flag_Ni'},
                  inplace=True)

matched_df = matched_df.dropna(subset=['Fe', 'N', 'O', 'C', 'Mg', 'Si', 'Ne'])
matched_df = matched_df[matched_df['flag_Fe'] == 0]

matched_df

Unnamed: 0,SOURCE_ID,ra,dec,log_surface_gravity,dist,mass,radius,log_fe_h_abundance,log_n_fe_abundance,log_s_fe_abundance,...,Ni,flag_Ni,Fe,flag_Fe,P,flag_P,N,flag_N,S,flag_S
0,2264702071538672896,286.165066,73.194919,4.1202,275.134705,1.228819,1.4829,0.04,,,...,0.002124,0,-0.089597,0,,2,-0.126015,288,0.002999,0
2,2264719354486576256,287.199870,73.349541,4.4956,134.355392,0.959756,0.8607,-0.14,,,...,0.000854,0,-0.099274,0,,2,-0.018811,256,0.079976,0
3,2268464325450037120,281.577653,75.303370,4.3004,236.193298,1.060581,1.1950,0.21,,,...,0.039530,0,0.190910,0,,2,0.341450,32,-0.031137,0
4,2268469823008193152,282.016475,75.423453,4.5217,145.762207,0.823140,0.8730,-0.00,,,...,0.094350,0,0.211670,0,,2,0.250330,0,-0.087508,0
5,2266478779249533952,276.523338,71.531131,4.3552,131.852295,1.015231,1.1078,-0.14,,,...,0.026220,0,-0.129550,0,,2,0.005228,32,0.048309,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17077,4273113849562926336,276.690990,-0.100930,4.0678,167.999603,1.262112,1.5603,0.15,,0.20,...,-0.020234,0,0.049639,0,,2,-0.351201,288,-0.060317,0
17078,6761953050076021120,283.186731,-28.729720,4.0894,130.914795,1.365962,1.6302,0.01,,0.31,...,-0.108570,0,-0.209000,0,,2,0.927310,32,0.050425,0
17079,6761953050076021120,283.186731,-28.729720,4.0894,130.914795,1.365962,1.6302,0.01,,0.31,...,-0.014040,0,-0.192390,0,,2,0.044069,32,0.103573,0
17080,6762049497853730944,284.426090,-28.255835,4.5397,107.947098,0.809467,0.7792,0.01,,,...,0.024032,0,-0.114710,0,,2,-0.104517,256,0.039461,0


### Necessary Constants

In [35]:
# According to this paper: https://www.aanda.org/articles/aa/full_html/2020/06/aa37694-20/aa37694-20.html
SOLAR_METALLICITY = np.log10(0.017)

# Gas produced by the Big Bang was approximately 24.9% helium by mass, still according to the above paper
Y_PROTO = 0.249
Y_SUN = 0.279
Z_SUN = 0.017

NUM_STARS = 2000 
SUN_HE_ABUNDANCE = 10.8

HYDROGEN_MASS = 1.00784
HELIUM_MASS = 4.002602
CARBON_MASS = 12.011
NITROGEN_MASS = 14.0067
OXYGEN_MASS = 15.999
NEON_MASS = 20.1797
SODIUM_MASS = 22.989769
MAGNESIUM_MASS = 24.305
ALUMINUM_MASS = 26.981539
SILICON_MASS = 28.0855
PHOSPHORUS_MASS = 30.973762 
SUlFUR_MASS = 32.065
POTASSIUM_MASS = 39.0983
CALCIUM_MASS = 40.078
TITANIUM_MASS = 47.867
CHROMIUM_MASS = 51.9961
MANGANESE_MASS = 54.938044
IRON_MASS = 55.845
NICKEL_MASS = 58.6934

masses = {
    'H': HYDROGEN_MASS,
    'He': HELIUM_MASS,
    'C': CARBON_MASS,
    'N': NITROGEN_MASS,
    'O': OXYGEN_MASS,
    'Ne': NEON_MASS,
    'Na': SODIUM_MASS,
    'Mg': MAGNESIUM_MASS,
    'Al': ALUMINUM_MASS,
    'Si': SILICON_MASS,
    'P': PHOSPHORUS_MASS,
    'S': SUlFUR_MASS,
    'K': POTASSIUM_MASS,
    'Ca': CALCIUM_MASS,
    'Ti': TITANIUM_MASS,
    'Cr': CHROMIUM_MASS,
    'Mn': MANGANESE_MASS,
    'Fe': IRON_MASS,
    'Ni': NICKEL_MASS,
}

SPECTROSCOPY_ATOMS = ['C', 'O', 'N', 'Ne', 'Mg', 'Al', 'Si', 'P', 'S', 'K', 'Ca', 'Ti', 'Cr', 'Mn', 'Fe', 'Ni']

### Stellar Data Model
The stellar data model is an object that contains the following properties for a star:
- mass (MSun)
- relative abundances
- mass fractions (calculated from relative abundances)
- metallicity (calculated from mass fractions)
- age (Gyr)
- radius (RSun)

A stellar data model is the input for a MESA simulation and contains all the information needed to run a stellar evolution simulation.
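
The step from abundances to mass fractions can be written compactly. With logarithmic abundances $A(X) = \log_{10}(N_X / N_H) + 12$ and atomic masses $m_X$, the mass fraction of element $i$ is $10^{A_i} m_i / \sum_j 10^{A_j} m_j$. The sketch below applies this to hydrogen, helium and oxygen only, using the solar values from this notebook, so the numbers are illustrative rather than a full composition. The function name is hypothetical.

```python
def mass_fractions_from_log_abundances(log_abundances: dict, atomic_masses: dict) -> dict:
    """Convert log abundances (hydrogen = 12 scale) to mass fractions."""
    # Number density (relative to hydrogen) times atomic mass for each element
    weights = {el: 10 ** a * atomic_masses[el] for el, a in log_abundances.items()}
    total = sum(weights.values())
    return {el: w / total for el, w in weights.items()}


fracs = mass_fractions_from_log_abundances(
    {"H": 12.0, "He": 10.8, "O": 8.84},
    {"H": 1.00784, "He": 4.002602, "O": 15.999},
)
print(fracs)
```

Normalising by the total guarantees that the fractions sum to one, whatever set of elements is supplied.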

In [7]:
Abundance = namedtuple('Abundance', ['atom', 'relAbundance'])  # Atom name, relative abundance
AbundanceVal = namedtuple('AbundanceVal', ['aMass', 'abundance'])

### Solar Baseline Model
The solar data model is used to calculate metallicities and absolute abundances for any given star.

The solar model is based on the abundances of the Sun as calculated in this [paper](https://iopscience.iop.org/article/10.3847/1538-3881/ac9bfa#ajac9bfas3).

In [8]:
class SunModel:
    def __init__(self):
        self.mass = 1
        self.abundances = {
            'H': AbundanceVal(HYDROGEN_MASS, 12),
            'He': AbundanceVal(HELIUM_MASS, SUN_HE_ABUNDANCE),
            'C': AbundanceVal(CARBON_MASS, 8.62),
            'N': AbundanceVal(NITROGEN_MASS, 7.94),
            'O': AbundanceVal(OXYGEN_MASS, 8.84),
            'Ne': AbundanceVal(NEON_MASS, 7.57),
            'Na': AbundanceVal(SODIUM_MASS, 6.28),
            'Mg': AbundanceVal(MAGNESIUM_MASS, 7.6),
            'Al': AbundanceVal(ALUMINUM_MASS, 6.52),
            'Si': AbundanceVal(SILICON_MASS, 7.65),
            'P': AbundanceVal(PHOSPHORUS_MASS, 5.5),
            'S': AbundanceVal(SUlFUR_MASS, 7.2),
            'K': AbundanceVal(POTASSIUM_MASS, 5.16),
            'Ca': AbundanceVal(CALCIUM_MASS, 6.35),
            'Ti': AbundanceVal(TITANIUM_MASS, 5.05),
            'Cr': AbundanceVal(CHROMIUM_MASS, 5.71),
            'Mn': AbundanceVal(MANGANESE_MASS, 5.42),
            'Fe': AbundanceVal(IRON_MASS, 7.5),
            'Ni': AbundanceVal(NICKEL_MASS, 6.28),
        }
        self.age = 4.57e9
        self.radius = 1.0
        self.mass_fracs = self.get_mass_fracs()

    def get_mass_fracs(self):
        return {
            'y': Y_SUN,
            'z': Z_SUN
        }

    @property
    def z_frac(self):
        return self.mass_fracs['z']

    @property
    def y_frac(self):
        return self.mass_fracs['y']

    @property
    def abundance_np(self):
        _pyarr = []
        for abundance in self.abundances.values():
            _pyarr.append((abundance.abundance, abundance.aMass))

        return np.array(_pyarr)

### Stellar Data Model
The stellar data model represents a star's simulation data.

In [9]:
# Solar model and calculation constants
sun_model = SunModel()
sun_abundances = sun_model.abundance_np
sun_metals = sun_abundances[2:]
sun_metal = np.sum(10 ** sun_metals[:, 0] * sun_metals[:, 1])

In [10]:
class StellarDataModel:
    def __init__(self, mass: float, relative_abundances: list[Abundance], age: float, radius: float,
                 iters: Union[int, str] = 'converge', convergence_thresh: float = 1e-7):
        """
        :param mass: Mass of star in MSun
        :param relative_abundances: List of relative abundances for each element
        :param age: Age of star in Gyr
        :param radius: Radius of star in RSun
        :param iters: Number of iterations, or 'converge' to iterate until the convergence criterion is met
        :param convergence_thresh: Convergence threshold (difference between iterations must be less than this)
        """
        self.mass = mass
        self.abundances = StellarDataModel.get_abundances(relative_abundances)
        self.age = age
        self.mass_fracs = self.get_mass_fracs(iters, convergence_thresh)
        self.radius = radius

    @property
    def y_frac(self):
        return self.mass_fracs['y']

    @property
    def z_frac(self):
        return self.mass_fracs['z']

    # Iteratively calculate the mass fractions for each element
    def get_mass_fracs(self, iters: Union[int, str] = 'converge', convergence_thresh: float = 1e-7):
        idx_to_atom = {}
        _pyarr = []
        for (atom, abundance) in self.abundances.items():
            _pyarr.append((abundance.abundance, abundance.aMass))
            idx_to_atom[len(_pyarr) - 1] = atom

        arr = np.array(_pyarr)

        metal_star = np.sum(10 ** arr[:, 0] * arr[:, 1])
        delta = metal_star / sun_metal
        abundance_helium = np.log10(10 ** SUN_HE_ABUNDANCE * delta)

        new_arr = []
        
        start_idx = len(arr)
        # Add unknown elements to array based on delta
        for element in SPECTROSCOPY_ATOMS:
            if element not in self.abundances:
                idx_to_atom[start_idx] = element
                solar = sun_model.abundances[element].abundance
                estimated_abundance = np.log10(10 ** solar * delta)
                new_arr.append((estimated_abundance, masses[element]))
                start_idx += 1
                
        stack_args = [
            np.array([
                (12, 1),
                (abundance_helium, 4),
            ]),
            arr,
        ]
        
        if len(new_arr) > 0:
            stack_args.append(np.array(new_arr))

        star_new = np.vstack(stack_args)

        f_H_star = np.log10(np.sum(10 ** star_new[:, 0] * star_new[:, 1]))

        atom_fracs = 10 ** (arr[:, 0] - f_H_star)
        mass_fracs = atom_fracs * arr[:, 1]
        metallicity = np.sum(mass_fracs)

        def calc_z_iter(arr: np.ndarray, z: float) -> tuple[float, np.ndarray]:
            delta = z / sun_model.z_frac
            abundance_helium = np.log10(10 ** SUN_HE_ABUNDANCE * delta)
            
            new_arr = []
            
            for element in SPECTROSCOPY_ATOMS:
                if element not in self.abundances:
                    solar = sun_model.abundances[element].abundance
                    estimated_abundance = np.log10(10 ** solar * delta)
                    new_arr.append((estimated_abundance, masses[element]))
                    
            stack_args = [
                np.array([
                    (12, 1),
                    (abundance_helium, 4),
                ]),
                arr,
            ]
            
            if len(new_arr) > 0:
                stack_args.append(np.array(new_arr))
                    
            star_new = np.vstack(stack_args)

            f_H_star = np.log10(np.sum(10 ** star_new[:, 0] * star_new[:, 1]))
            
            if len(new_arr) > 0:
                wanted_atoms = np.vstack((arr, np.array(new_arr)))
            else:
                wanted_atoms = arr

            atom_fracs = 10 ** (wanted_atoms[:, 0] - f_H_star)
            mass_fracs = atom_fracs * wanted_atoms[:, 1]
            metallicity = np.sum(mass_fracs)
            return metallicity, mass_fracs

        z_last = float('inf')
        z_curr = metallicity

        if iters == 'converge':
            # Iterate until successive metallicity estimates differ by less than the threshold
            while np.abs(z_curr - z_last) > convergence_thresh:
                z_last = z_curr
                z_curr, mass_fracs = calc_z_iter(arr, z_last)
        else:
            # Run a fixed number of refinement passes
            for _ in range(iters - 1):
                z_last = z_curr
                z_curr, mass_fracs = calc_z_iter(arr, z_last)

        # Linear helium enrichment anchored at the solar values (slope 2.1)
        y_frac = (z_curr - sun_model.z_frac) * 2.1 + sun_model.y_frac

        mass_fractions = {idx_to_atom[i]: frac for i, frac in enumerate(mass_fracs)}
        mass_fractions['y'] = y_frac
        mass_fractions['z'] = z_curr

        return mass_fractions

    @staticmethod
    def get_abundances(relative_abundances: list[Abundance]) -> dict[str, AbundanceVal]:
        abundances = {}
        for rel_ab in relative_abundances:
            converted_as_base_hydrogen = rel_ab.relAbundance
            abundances[rel_ab.atom] = AbundanceVal(masses[rel_ab.atom],
                                                   converted_as_base_hydrogen + sun_model.abundances[rel_ab.atom].abundance)

        return abundances

    def to_mesa_data(self):
        data = {
            'mass': self.mass,
            'y': self.y_frac,
            'z': self.z_frac,
            'age': self.age * 1e9,
            'radius': self.radius,
        }

        for (atom, mass_frac) in self.mass_fracs.items():
            data[f'{atom}_mass_frac'] = mass_frac

        return data
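
The `'converge'` mode of `get_mass_fracs` is a fixed-point iteration: the metallicity estimate is fed back into the update until two successive values differ by less than `convergence_thresh`. The generic sketch below shows that pattern on its own. The `update` argument is a hypothetical stand-in for `calc_z_iter`, the toy update is not the real one, and the `max_iters` guard is an added assumption (the loop in the class has no such limit).

```python
from typing import Callable


def iterate_to_convergence(update: Callable[[float], float], start: float,
                           thresh: float = 1e-7, max_iters: int = 1000) -> tuple[float, int]:
    """Apply update repeatedly until successive values differ by at most thresh."""
    last, curr, steps = float("inf"), start, 0
    while abs(curr - last) > thresh:
        if steps >= max_iters:
            raise RuntimeError("iteration did not converge")
        last, curr = curr, update(curr)
        steps += 1
    return curr, steps


# Toy contraction whose fixed point is z = 0.02
z, steps = iterate_to_convergence(lambda z: 0.5 * z + 0.01, start=0.017)
print(z, steps)
```

Starting `last` at infinity guarantees at least one update, which mirrors how `z_last` is initialised in the class.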

In [11]:
model = StellarDataModel(1, [
    Abundance('C', 0),
    Abundance('N', 0),
    Abundance('O', 0),
    Abundance('Ne', 0),
    Abundance('Mg', 0),
    Abundance('Al', 0),
    Abundance('Si', 0),
    Abundance('P', 0),
    Abundance('S', 0),
    Abundance('K', 0),
    Abundance('Ca', 0),
    Abundance('Ti', 0),
    Abundance('Cr', 0),
    Abundance('Mn', 0),
    Abundance('Fe', 0),
    Abundance('Ni', 0),
], 4.603, 1)
model.to_mesa_data()

{'mass': 1,
 'y': 0.2806571462460765,
 'z': 0.017789117260036437,
 'age': 4603000000.0,
 'radius': 1,
 'C_mass_frac': 0.003890482537531377,
 'N_mass_frac': 0.0009478947769480424,
 'O_mass_frac': 0.00860037012176686,
 'Ne_mass_frac': 0.0005825577830866611,
 'Mg_mass_frac': 0.0007518304804681824,
 'Al_mass_frac': 6.942102490154739e-05,
 'Si_mass_frac': 0.0009747796707125492,
 'P_mass_frac': 7.6105888463066615e-06,
 'S_mass_frac': 0.0003948712800628494,
 'K_mass_frac': 4.391189486414068e-06,
 'Ca_mass_frac': 6.971566017444136e-05,
 'Ti_mass_frac': 4.173116537229619e-06,
 'Cr_mass_frac': 2.0720253940692677e-05,
 'Mn_mass_frac': 1.1227873118280039e-05,
 'Fe_mass_frac': 0.0013721721440295032,
 'Ni_mass_frac': 8.689875842550044e-05,
 'y_mass_frac': 0.2806571462460765,
 'z_mass_frac': 0.017789117260036437}

In [33]:
# Load past data to exclude from the available stars (don't want to simulate stars that have already been simulated)
glob = Path('simulate/past_data/').glob('*.csv')
past_data = pd.concat([pd.read_csv(file, index_col=0) for file in glob])

# Remove the stars that have already been simulated
available_stars = matched_df[~matched_df.SOURCE_ID.isin(past_data.index)]
available_stars = available_stars.drop_duplicates(subset=['SOURCE_ID'])
available_stars

Unnamed: 0,SOURCE_ID,ra,dec,log_surface_gravity,dist,mass,radius,log_fe_h_abundance,log_n_fe_abundance,log_s_fe_abundance,...,Ni,flag_Ni,Fe,flag_Fe,P,flag_P,N,flag_N,S,flag_S
0,2264702071538672896,286.165066,73.194919,4.1202,275.134705,1.228819,1.4829,0.04,,,...,0.002124,0,-0.089597,0,,2,-0.126015,288,0.002999,0
2,2264719354486576256,287.199870,73.349541,4.4956,134.355392,0.959756,0.8607,-0.14,,,...,0.000854,0,-0.099274,0,,2,-0.018811,256,0.079976,0
3,2268464325450037120,281.577653,75.303370,4.3004,236.193298,1.060581,1.1950,0.21,,,...,0.039530,0,0.190910,0,,2,0.341450,32,-0.031137,0
4,2268469823008193152,282.016475,75.423453,4.5217,145.762207,0.823140,0.8730,-0.00,,,...,0.094350,0,0.211670,0,,2,0.250330,0,-0.087508,0
5,2266478779249533952,276.523338,71.531131,4.3552,131.852295,1.015231,1.1078,-0.14,,,...,0.026220,0,-0.129550,0,,2,0.005228,32,0.048309,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17076,4272885567765195648,276.543831,-0.523455,4.0702,194.401093,1.338504,1.5900,-0.07,,0.11,...,-0.112757,0,-0.013973,0,,2,0.328363,32,-0.021373,0
17077,4273113849562926336,276.690990,-0.100930,4.0678,167.999603,1.262112,1.5603,0.15,,0.20,...,-0.020234,0,0.049639,0,,2,-0.351201,288,-0.060317,0
17078,6761953050076021120,283.186731,-28.729720,4.0894,130.914795,1.365962,1.6302,0.01,,0.31,...,-0.108570,0,-0.209000,0,,2,0.927310,32,0.050425,0
17080,6762049497853730944,284.426090,-28.255835,4.5397,107.947098,0.809467,0.7792,0.01,,,...,0.024032,0,-0.114710,0,,2,-0.104517,256,0.039461,0


### Prepare Dataset
Now we will prepare the dataset for MESA.

- First, convert the age from Gyr to years.
- Next, calculate the initial helium abundance (`y`) from the metallicity.
- Finally, add the mass and metallicity (`mass` and `z`, respectively) to the dataframe.

We will also include the source_id as a column in the dataframe.
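
The two conversions are small enough to show on their own. The helium fraction follows the linear relation used in `get_mass_fracs`, $Y = Y_\odot + 2.1\,(Z - Z_\odot)$, with the solar values defined in the constants cell. The function names here are hypothetical; the constants are repeated so the sketch runs by itself.

```python
Y_SUN = 0.279  # solar helium mass fraction used in this notebook
Z_SUN = 0.017  # solar metallicity used in this notebook
DY_DZ = 2.1    # helium enrichment slope used in get_mass_fracs


def helium_from_metallicity(z: float) -> float:
    """Initial helium mass fraction from metallicity via linear enrichment."""
    return Y_SUN + DY_DZ * (z - Z_SUN)


def age_gyr_to_years(age_gyr: float) -> float:
    """Convert an age in Gyr to years."""
    return age_gyr * 1e9


# Metallicity of the solar test model shown earlier
print(helium_from_metallicity(0.017789117260036437))
print(age_gyr_to_years(4.603))
```

A star with exactly solar metallicity gets exactly the solar helium fraction, which is a quick sanity check on the anchoring.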

In [36]:
def format_test(test_bool):
    return '✔' if test_bool else '✘'


cols = ['source_id', 'mass', 'age', 'y', 'z', 'radius']
for atom in SPECTROSCOPY_ATOMS:
    cols.append(f'{atom}_mass_frac')
dataset = pd.DataFrame(columns=cols)
dataset.set_index('source_id', inplace=True)
df_len = len(available_stars)

# Inference test conditions
# 10% condition: compare against df_len / 10 so the check matches the printed message
print(
    f'The sample is independent if the sample size is less than {df_len / 10}. {NUM_STARS} is less than {df_len / 10}? {format_test(NUM_STARS < df_len / 10)}')
print(
    f'The sample is normal if the sample size is greater than 30 or the sampling or population distribution is normal. We do not know the density profiles of the stars so we will have to rely on the Central Limit Theorem. {NUM_STARS} is greater than 30? {format_test(NUM_STARS > 30)}')
print('The sample is random because we used NumPy random to select indices. ✔')

print()
print(f'The steps for inference have been met? {format_test(NUM_STARS < df_len / 10 and NUM_STARS > 30)}')

if NUM_STARS < df_len:
    random_indices = np.random.choice(df_len, NUM_STARS, replace=False)
else:
    random_indices = np.arange(df_len)

for i in random_indices:
    df_row = available_stars.iloc[i]

    # Collect the measured abundances that are present and unflagged
    abundances: list[Abundance] = []

    for atom in SPECTROSCOPY_ATOMS:
        val = df_row[atom]
        flag = df_row[f'flag_{atom}']
        if atom == 'Ne':
            # 'NE_FE_FLAG' is selected more than once in the merge above, so this lookup returns a Series; take its first entry
            flag = flag.iloc[0]
        if not np.isnan(val) and flag == 0:
            abundances.append(Abundance(atom, val))

    # Create stellar data model
    stellar_data_model = StellarDataModel(df_row['mass'], abundances, df_row['age'], df_row['radius'])
    dataset.loc[df_row.SOURCE_ID] = stellar_data_model.to_mesa_data()

    # A row lookup never returns None, so check for an all-missing row instead
    if dataset.loc[df_row.SOURCE_ID].isna().all():
        print(f'Failed to create stellar data model for {df_row.SOURCE_ID}')

dataset

The sample is independent if the sample size is less than 1128.9. 2000 is less than 1128.9? ✘
The sample is normal if the sample size is greater than 30 or the sampling or population distribution is normal. We do not know the density profiles of the stars so we will have to rely on the Central Limit Theorem. 2000 is greater than 30? ✔
The sample is random because we used NumPy random to select indices. ✔

The steps for inference have been met? ✘


source_id,mass,age,y,z,radius,C_mass_frac,O_mass_frac,N_mass_frac,Ne_mass_frac,Mg_mass_frac,...,Si_mass_frac,P_mass_frac,S_mass_frac,K_mass_frac,Ca_mass_frac,Ti_mass_frac,Cr_mass_frac,Mn_mass_frac,Fe_mass_frac,Ni_mass_frac
2106822131756839808,1.294340,3.002109e+09,0.281930,0.018395,1.6064,0.002968,0.009667,0.001018,0.000541,0.000577,...,0.001148,0.000008,0.000329,0.000005,0.000067,0.000001,0.000015,0.000012,0.001864,0.000084
1304949779086279040,1.109524,4.443443e+09,0.285753,0.020215,1.2402,0.003288,0.011728,0.001093,0.000646,0.000678,...,0.001131,0.000009,0.000327,0.000004,0.000065,0.000002,0.000010,0.000009,0.001062,0.000083
1467947017783706496,1.167186,3.686911e+09,0.283830,0.019300,1.3236,0.003447,0.010827,0.001056,0.000623,0.000683,...,0.001127,0.000008,0.000305,0.000004,0.000063,0.000002,0.000015,0.000010,0.000965,0.000084
4656164360442701312,1.547226,2.066022e+09,0.284734,0.019730,2.2996,0.003844,0.010686,0.001073,0.000584,0.000608,...,0.001140,0.000009,0.000475,0.000005,0.000066,0.000003,0.000007,0.000010,0.001068,0.000079
1467585518976535296,1.241795,1.227007e+09,0.287069,0.020842,1.2681,0.003907,0.011824,0.001118,0.000627,0.000634,...,0.001092,0.000009,0.000340,0.000005,0.000060,0.000002,0.000020,0.000010,0.001050,0.000080
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1727424889991259648,1.796019,1.461672e+09,0.286384,0.020516,3.2201,0.003845,0.011111,0.001105,0.000620,0.000660,...,0.001205,0.000009,0.000469,0.000005,0.000087,0.000004,0.000045,0.000012,0.001190,0.000070
1175399443584763904,1.220439,2.619569e+09,0.287498,0.021047,1.3447,0.003957,0.012012,0.001127,0.000699,0.000774,...,0.001268,0.000009,0.000399,0.000004,0.000076,0.000005,0.000014,0.000009,0.000540,0.000078
2241723584189636736,1.153977,3.198005e+09,0.280731,0.017824,1.2472,0.002493,0.009742,0.000993,0.000554,0.000601,...,0.001018,0.000008,0.000285,0.000002,0.000101,0.000004,0.000011,0.000013,0.001835,0.000087
464647326066028288,0.868027,1.283141e+10,0.284280,0.019514,1.0144,0.003590,0.010217,0.001065,0.000731,0.000997,...,0.001235,0.000009,0.000358,0.000006,0.000073,0.000004,0.000011,0.000009,0.000990,0.000083


In [37]:
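# In case you want to scout out the original rows.
# Caveat: random_indices are positions in available_stars, so matched_df.iloc only lines up with
# the sample if no rows were filtered out; available_stars.iloc[random_indices] always does.
matched_df.iloc[random_indices]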

Unnamed: 0,SOURCE_ID,ra,dec,log_surface_gravity,dist,mass,radius,log_fe_h_abundance,log_n_fe_abundance,log_s_fe_abundance,...,Ni,flag_Ni,Fe,flag_Fe,P,flag_P,N,flag_N,S,flag_S
2368,4069294119280934016,269.418000,-23.699117,4.2516,154.121506,1.198429,1.2585,0.11,,,...,-0.001540,0,-0.224760,0,,2,-0.429040,288,0.021021,0
3098,2509043065948320640,25.448676,-0.361859,3.9629,359.010498,1.432280,2.1645,-0.07,,0.14,...,0.014493,0,0.011086,0,,2,0.363442,32,0.009373,0
11315,1332662763505172224,248.051483,39.792582,4.0350,170.675598,1.231028,1.5853,-0.16,,0.13,...,-0.009810,0,-0.154550,0,,2,-0.356980,288,0.039713,0
702,4455301246958459264,237.874456,9.231977,4.3515,82.934196,1.036051,1.0467,-0.07,,-0.05,...,0.008750,0,-0.074333,0,,2,0.029246,32,-0.033136,0
11288,1332545081401165312,247.905201,39.357942,3.9654,285.105103,1.485658,1.9719,-0.03,0.4,0.07,...,-0.010871,0,-0.051399,0,,2,0.632270,32,-0.184238,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4920,705170309303241984,131.119886,28.456273,4.2619,131.669006,1.027591,1.1895,-0.19,,-0.01,...,0.041270,0,0.128060,0,,2,0.133950,32,-0.074705,0
11929,6049456389831655040,246.053667,-23.500645,4.3350,78.549500,1.074634,1.1380,-0.17,,,...,-0.010284,0,0.078026,0,,258,-0.156175,288,-0.023792,0
260,5278490621832478976,95.529151,-70.456190,4.1856,240.418594,1.308594,1.4246,-0.05,,0.08,...,-0.031980,0,-0.247970,0,,2,0.215710,32,0.023199,0
7946,2436596317809021056,352.209723,-11.174963,4.2105,142.865799,1.166406,1.2637,-0.15,,0.29,...,-0.043770,0,-0.134250,0,,2,0.528510,32,-0.204977,0


In [38]:
# Export the data to a csv file
dataset.to_csv('simulate/03.csv')

### Setting up for simulations
Please be aware that you will need to download and install both GNU Parallel and MESA.

Please follow the instructions on [MESA's website](https://docs.mesastar.org/en/latest/installation.html).

I used SDK Version **23.7.3** and MESA Version **24.03.1**.

I would highly recommend downloading the source directly from the [MESA GitHub](https://github.com/MESAHub/mesa/releases/tag/r24.03.1) 
rather than downloading it from Zenodo since it can be **slow** to download from Zenodo.

I used a **Fedora 40** distribution tailored for astronomy. I downloaded the ISO image from [here](https://fedoraproject.org/labs/astronomy/).