# Tutorial Outline

**Tutorial contents**:
- Adding new yield tables
- Choosing element set
- Training of a neural network
- Running MCMC analysis
- Computing Bayes/LOO-CV scores

The above are based on the Philcox & Rybizki (2017) paper, which should be cited when using this code. The code builds on the $\mathit{Chempy}$ software, described in Rybizki et al. (2017, arXiv:1702.08729); full $\mathit{Chempy}$ tutorials can be found at https://github.com/jan-rybizki/Chempy/tree/master/tutorials.

**Requirements**:
Before running this tutorial, the $\mathit{ChempyScoring}$ code and its dependencies must be installed (https://github.com/oliverphilcox/ChempyScoring/blob/master/requirements.txt).

The authors, Oliver Philcox (ohep2@cam.ac.uk) and Jan Rybizki (rybizki@mpia.de), are happy to assist with any problems that may arise.

## Step 1: Load Yield Tables

First we must load the nucleosynthetic yield table to be tested. Here we test the SN2 net yields of Frischknecht et al. (2016, arXiv:1511.05730). These include s-process elements for rotating stars of mass 15-40 Msun. We implement the yield tables for standard rotation and differing metallicities.

The yield tables provide data for masses of 15, 20, 25 and 40 Msun, metallicities of solar (0.0134), 0.001, 1e-5 and 1e-7, and differing stellar rotation speeds. Here we use only the solar, 0.001 and 1e-5 metallicities with standard rotation, since only these have data for all masses.
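
As a rough illustration of the data layout this step produces, the sketch below builds a toy table with the same shape: a dictionary keyed by metallicity, where each entry is a NumPy record array with one row per stellar mass. The field names follow those used in this tutorial, but the name `toy_table`, the two-element selection and the placeholder zeros are assumptions made for illustration only; no Frischknecht et al. data are used.

```python
import numpy as np

# Grid used in this tutorial (values taken from the text above)
metallicities = [0.0134, 1e-3, 1e-5]
masses = np.array([15, 20, 25, 40])

# Field names: three bookkeeping columns plus one column per element.
# Only two elements are listed here to keep the toy example short.
names = ['Mass', 'mass_in_remnants', 'unprocessed_mass_in_winds', 'C', 'Ca']

toy_table = {}
for Z in metallicities:
    # One float column per field, one row per stellar mass (placeholder zeros)
    columns = [np.zeros(len(masses)) for _ in names]
    subtable = np.rec.fromarrays(columns, names=names)
    subtable['Mass'] = masses
    toy_table[Z] = subtable

# Look up a column for one metallicity, as done later in this tutorial
print(toy_table[1e-3]['Mass'])
```

Keying the dictionary by metallicity and the columns by element name means a yield can be read as `table[metallicity][element][mass_index]`, which is the access pattern used when testing the table.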

To add this to *Chempy* we add a `Frischknecht16_net` method to the `SN2_feedback` class in `yields.py`, as shown below:

In [None]:
## NB: The Frischknecht16 definition should be inserted into the yields.py file

from Chempy import localpath # For file locations
import numpy as np

class SN2_feedback(object):
    def __init__(self):   
        """
        This is the object that holds the feedback table for SN2 stars.
        The different methods load different tables from the literature. They are in the input/yields/ folder.
        """

    def Frischknecht16_net(self):
        """SN2 yields from Frischknecht et al. 2016. These are implemented for masses of 15-40Msun, for rotating stars.
        Yields from stars with 'normal' rotations are used here.
        These are net yields automatically, so no conversions need to be made
        """
        import numpy.lib.recfunctions as rcfuncs
        import os

        # Define metallicities
        self.metallicities = [0.0134,1e-3,1e-5] # First is solar value

        # Define masses
        self.masses = np.array((15,20,25,40))

        # Load yield table dictionary in correct format from npy file if it exists
        saved_yields = localpath+'input/yields/Frischknecht16_net.npy'
        if os.path.exists(saved_yields):
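            # The file stores a pickled dictionary, so allow_pickle is required in recent NumPy versions
            self.table = np.load(saved_yields, allow_pickle=True).item()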

        else:
            # If not, create yield table from .txt file

            # Define data types
            dt = np.dtype('U8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8,f8')

            # Initialise yield table
            yield_table = {}


            # Import full table with correct rows and data-types
            z = np.genfromtxt(localpath+'input/yields/Frischknecht16/yields_total.txt',skip_header=62,dtype=dt)

            # Define isotope indexing. For radioactive isotopes with half-lives << Chempy time_step they are assigned to their daughter element
            # NB: we only use elements up to Ge here, as in the paper
            indexing={}
            indexing['H']=['p','d']
            indexing['He'] = ['he3','he4']
            indexing['Li'] = ['li6','li7']
            indexing['Be']  = ['be9']
            indexing['B']  = ['b10','b11']
            indexing['C']  = ['c12','c13']
            indexing['N']  = ['n14','n15']
            indexing['O']  = ['o16','o17','o18']
            indexing['F']  = ['f19']
            indexing['Ne']  = ['ne20','ne21','ne22']
            indexing['Na']  = ['na23']
            indexing['Mg']  = ['mg24','mg25','mg26','al26']
            indexing['Al']  = ['al27']
            indexing['Si']  = ['si28','si29','si30']
            indexing['P']  = ['p31']
            indexing['S']  = ['s32','s33','s34','s36']
            indexing['Cl']  = ['cl35','cl37']
            indexing['Ar']  = ['ar36','ar38','ar40']
            indexing['K']  = ['k39','k41']
            indexing['Ca']  = ['ca40','ca42','ca43','ca44','ca46','ca48']
            indexing['Sc']  = ['sc45']
            indexing['Ti']  = ['ti46','ti47','ti48','ti49','ti50']
            indexing['V']  = ['v50','v51']
            indexing['Cr']  = ['cr50','cr52','cr53','cr54']
            indexing['Mn']  = ['mn55']
            indexing['Fe']  = ['fe54', 'fe56','fe57','fe58']
            indexing['Co']  = ['fe60', 'co59']
            indexing['Ni']  = ['ni58','ni60','ni61','ni62','ni64']
            indexing['Cu']  = ['cu63','cu65']
            indexing['Zn']  = ['zn64','zn66','zn67','zn68','zn70']
            indexing['Ga']  = ['ga69','ga71']
            indexing['Ge']  = ['ge70','ge72','ge73','ge74','ge76']

            # Define indexed elements 
            self.elements = list(indexing.keys())

            # Create model dictionary indexed by metallicity, giving relevant model number for each choice of mass
            # See Frischknecht info_yields.txt file for model information
            model_dict = {}
            model_dict[0.0134] = [2,8,14,27]
            model_dict[1e-3]=[4,10,16,28]
            model_dict[1e-5]=[6,12,18,29]

            # Import list of remnant masses for each model (from row 32-60, column 6 of .txt file) 
            # NB: these are in solar masses
            rem_mass_table = np.loadtxt(localpath+'input/yields/Frischknecht16/yields_total.txt',skiprows=31,usecols=6)[:29]

            # Create one subtable for each metallicity 
            for metallicity in self.metallicities:
                additional_keys = ['Mass', 'mass_in_remnants','unprocessed_mass_in_winds'] # List of keys for table
                names = additional_keys + self.elements

                # Initialise table and arrays
                base = np.zeros(len(self.masses))
                list_of_arrays = [base for _ in names]
                # np.rec is the public alias of np.core.records;
                # fromarrays copies each input array into its own column
                yield_subtable = np.rec.fromarrays(list_of_arrays,names=names)
                mass_in_remnants = np.zeros(len(self.masses))
                total_mass_fraction = np.zeros(len(self.masses))
                element_mass = np.zeros(len(self.masses))

                # Add masses to table
                yield_subtable['Mass'] = self.masses


                # Extract remnant masses (in solar masses) for each model:
                for mass_index,model_index in enumerate(model_dict[metallicity]):
                    mass_in_remnants[mass_index] = rem_mass_table[model_index-1] 

                # Iterate over all elements
                for element in self.elements:
                    element_mass = np.zeros(len(self.masses))
                    for isotope in indexing[element]: # Iterate over isotopes of each element
                        for mass_index,model_index in enumerate(model_dict[metallicity]): # Iterate over masses 
                            for row in z: # Find required row in table 
                                if row[0] == isotope:
                                    element_mass[mass_index]+=row[model_index] # Compute cumulative mass for all isotopes
                    yield_subtable[element]=element_mass # Add entry to subtable

                # Compute total net mass fraction for every mass (sums to approximately 0)
                for mass_index,model_index in enumerate(model_dict[metallicity]):
                    all_fractions = [row[model_index] for row in z] # This lists all elements (not just up to Ge)
                    total_mass_fraction[mass_index] = np.sum(all_fractions)

                # Add fields for remnant mass (now as a mass fraction) and unprocessed mass fraction
                yield_subtable['mass_in_remnants']=np.divide(mass_in_remnants,self.masses)                    
                yield_subtable['unprocessed_mass_in_winds'] = 1.-(yield_subtable['mass_in_remnants']+total_mass_fraction) # This is all mass not from yields/remnants

                # Add subtable to full table
                yield_table[metallicity]=yield_subtable

            # Define final yield table for output
            self.table = yield_table

            # Save yield table to avoid reloading each time
            np.save(saved_yields,self.table)
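
The core of the function above is the step that sums isotope yields into element yields using the `indexing` dictionary. A minimal sketch of that step is given below, using a dictionary lookup in place of the scan over table rows. The isotope numbers are invented, and the names `isotope_rows` and `sum_isotopes` are hypothetical helpers that are not part of *Chempy*.

```python
import numpy as np

# Hypothetical miniature of the isotope table: one entry per isotope,
# one value per model. The numbers are invented for illustration.
isotope_rows = {
    'ca40': np.array([1.0e-5, 2.0e-5]),
    'ca42': np.array([3.0e-7, 4.0e-7]),
    'sc45': np.array([5.0e-8, 6.0e-8]),
}

# Same mapping structure as `indexing` above, restricted to two elements
indexing = {'Ca': ['ca40', 'ca42'], 'Sc': ['sc45']}

def sum_isotopes(indexing, isotope_rows):
    """Sum isotope yields into one array per element."""
    n_models = len(next(iter(isotope_rows.values())))
    element_yields = {}
    for element, isotopes in indexing.items():
        total = np.zeros(n_models)
        for isotope in isotopes:
            total = total + isotope_rows[isotope]
        element_yields[element] = total
    return element_yields

element_yields = sum_isotopes(indexing, isotope_rows)
print(element_yields['Ca'])
```

Assigning a short-lived radioactive isotope to its daughter element, as done above for `al26` and `fe60`, only requires listing it under the daughter's key in `indexing`.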


We can now test the new yield table (using Ca as an example element):

In [2]:
# Define correct yield table
from Chempy.wrapper import SN2_feedback
basic_sn2 = SN2_feedback()
getattr(basic_sn2, 'Frischknecht16_net')()

print("Ca Yields")
for metallicity in basic_sn2.metallicities:
    print("\n Metallicity = %.2e" %(metallicity))
    for i in range(len(basic_sn2.masses)):
        print("Mass = %d, Yield  = %.6e" %(basic_sn2.masses[i],basic_sn2.table[metallicity]['Ca'][i]))


Ca Yields

 Metallicity = 1.34e-02
Mass = 15, Yield  = -4.661234e-05
Mass = 20, Yield  = -1.013723e-04
Mass = 25, Yield  = -1.498169e-04
Mass = 40, Yield  = -3.603430e-04

 Metallicity = 1.00e-03
Mass = 15, Yield  = -2.467764e-06
Mass = 20, Yield  = -4.861126e-06
Mass = 25, Yield  = -8.312735e-06
Mass = 40, Yield  = -1.565401e-05

 Metallicity = 1.00e-05
Mass = 15, Yield  = -2.234603e-08
Mass = 20, Yield  = -4.682224e-08
Mass = 25, Yield  = -6.924936e-08
Mass = 40, Yield  = -1.481142e-07


In [3]:
basic_sn2.table[1e-3]['B']

array([ -8.71712663e-10,  -1.17302861e-09,  -1.90546586e-09,
        -2.32895755e-09])
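
The bookkeeping fields `mass_in_remnants` and `unprocessed_mass_in_winds` can be inspected in the same way as the element fields. The sketch below reproduces, with invented numbers, the arithmetic used for them in `Frischknecht16_net`: the remnant mass is divided by the stellar mass, and the unprocessed fraction is whatever remains after the remnant and the summed net yields are subtracted from one. The arrays `remnant_mass` and `total_net_fraction` are placeholders, not values from the yield table.

```python
import numpy as np

masses = np.array([15., 20., 25., 40.])                   # stellar masses in Msun
remnant_mass = np.array([1.5, 2.0, 3.0, 8.0])             # invented remnant masses in Msun
total_net_fraction = np.array([1e-4, -2e-4, 0.0, 3e-4])   # invented net yield sums

# Remnant mass expressed as a fraction of the initial stellar mass
mass_in_remnants = remnant_mass / masses

# Everything that is neither locked in the remnant nor counted as net yield
unprocessed_mass_in_winds = 1.0 - (mass_in_remnants + total_net_fraction)

# The three contributions account for the whole star by construction
closure = mass_in_remnants + total_net_fraction + unprocessed_mass_in_winds
print(closure)
```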

## *(Optional: Choice of Elements)*

*In the scoring paper we use all chemical elements up to Ge, but this can be changed (e.g. for the paper analysis excluding Sc). To select the required elements we simply modify the `Chempy/parameter.py` file.*

*The `elements_to_trace` field contains a list of elements which are in the proto-solar data-file, including B, Be, Li and H which are not predicted directly by the neural network. The network predicts those elements in `initial_neural_names` (as [X/Fe] or [Fe/H] abundances).*

*Both fields must be changed to alter the element choices.*

*In addition, if extra elements are added, it should be checked that they are predicted by the yield tables and feature in the `Chempy/input/stars/proto_sun_all.npy` observational data-set.*

*To properly compare yield tables using the metrics described in Philcox et al. (2017) we should keep the element choice constant (28 elements up to Ge is the default).*

In [2]:
## Modify these lines in Chempy/parameter.py

# This field should contain all required elements  (and B,Be,Li,H) in alphabetical order
elements_to_trace = ['Al', 'Ar', 'B', 'Be', 'C', 'Ca', 'Cl', 'Co', 'Cr', 'Cu', 'F', 'Fe', 'Ga', 'Ge', 'H', 'He', 'K', 'Li', 'Mg', 'Mn', 'N', 'Na', 'Ne', 'Ni', 'O', 'P', 'S', 'Sc', 'Si', 'Ti', 'V', 'Zn']

# This field contains names of elements predicted by neural network
initial_neural_names = ['Al', 'Ar', 'C', 'Ca', 'Cl', 'Co', 'Cr', 'Cu', 'F', 'Fe', 'Ga', 'Ge', 'He', 'K', 'Mg', 'Mn', 'N', 'Na', 'Ne', 'Ni', 'O', 'P', 'S', 'Sc', 'Si', 'Ti', 'V', 'Zn']
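
Since both fields must be kept in step, a quick consistency check can catch mistakes before any expensive computation is started. The sketch below uses the two lists exactly as given above; the variable names `not_predicted`, `missing` and `is_sorted` are introduced here for illustration only.

```python
elements_to_trace = ['Al', 'Ar', 'B', 'Be', 'C', 'Ca', 'Cl', 'Co', 'Cr', 'Cu',
                     'F', 'Fe', 'Ga', 'Ge', 'H', 'He', 'K', 'Li', 'Mg', 'Mn',
                     'N', 'Na', 'Ne', 'Ni', 'O', 'P', 'S', 'Sc', 'Si', 'Ti',
                     'V', 'Zn']

initial_neural_names = ['Al', 'Ar', 'C', 'Ca', 'Cl', 'Co', 'Cr', 'Cu', 'F',
                        'Fe', 'Ga', 'Ge', 'He', 'K', 'Mg', 'Mn', 'N', 'Na',
                        'Ne', 'Ni', 'O', 'P', 'S', 'Sc', 'Si', 'Ti', 'V', 'Zn']

# Elements that are traced but not predicted by the neural network
not_predicted = sorted(set(elements_to_trace) - set(initial_neural_names))

# Elements the network predicts but which are not traced (should be empty)
missing = set(initial_neural_names) - set(elements_to_trace)

# The traced elements are expected to be in alphabetical order
is_sorted = elements_to_trace == sorted(elements_to_trace)

print(not_predicted, missing, is_sorted, len(initial_neural_names))
```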


## Step 2: Create Neural Network Dataset

Now that the yield set has been implemented, we must next create a training data-set for the neural network.

Firstly it is important to change the `parameter.py` file such that *Chempy* uses the correct yields. Here we add the new SN2 yield table name and set *Chempy* to use it by default. 

In [3]:
## Modify these lines in Chempy/parameter.py to add new yield set
yield_table_name_sn2_list = ['chieffi04','Nugrid','Nomoto2013','Portinari', 'chieffi04_net', 'Nomoto2013_net','Frischknecht16_net']
yield_table_name_sn2_index = 6
yield_table_name_sn2 = yield_table_name_sn2_list[yield_table_name_sn2_index]
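
A hard-coded index is easy to get wrong when the list is extended. As a possible alternative (a suggestion, not how `parameter.py` is written), the index can be looked up from the table name:

```python
yield_table_name_sn2_list = ['chieffi04', 'Nugrid', 'Nomoto2013', 'Portinari',
                             'chieffi04_net', 'Nomoto2013_net',
                             'Frischknecht16_net']

# Look the index up by name so it stays correct if the list changes
yield_table_name_sn2_index = yield_table_name_sn2_list.index('Frischknecht16_net')
yield_table_name_sn2 = yield_table_name_sn2_list[yield_table_name_sn2_index]

print(yield_table_name_sn2_index, yield_table_name_sn2)
```

`list.index` raises a `ValueError` if the name is absent, so a misspelled table name fails immediately instead of silently selecting another table.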


We can test this as follows:

In [1]:
# Load parameter file
from Chempy.parameter import ModelParameters
a = ModelParameters()

# Print new SN2 yield table name
print(a.yield_table_name_sn2)

Frischknecht16_net


We must also set the list of parameters to optimize over to include only the 5 free *Chempy* parameters (i.e. not $\beta$). This will be changed later in the analysis, but must be done at this point, otherwise the `training_data()` routine will fail.

For compatibility reasons, $\beta$ (when later added) is normally included in the `SSP_parameters` definition.


In [5]:
# Modify these lines in Chempy/parameter.py file

if True:
    # Priors
    SSP_parameters =  [-2.29 ,-2.75 ]
    SSP_parameters_to_optimize = ['high_mass_slope', 'log10_N_0']
else:
    SSP_parameters = []
    SSP_parameters_to_optimize = []
assert len(SSP_parameters) == len(SSP_parameters_to_optimize)
if True:
    # Priors
    ISM_parameters =  [-0.3, 0.55,0.5]
    ISM_parameters_to_optimize = ['log10_starformation_efficiency', 'log10_sfr_scale', 'outflow_feedback_fraction']
else:
    ISM_parameters = []
    ISM_parameters_to_optimize = []
assert len(ISM_parameters) == len(ISM_parameters_to_optimize)
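
To see which five parameters are free at this stage, the two blocks above can be combined as in the sketch below. It assumes the prior vector is simply the SSP list followed by the ISM list; how *Chempy* assembles its own `p0` attribute is not shown in this tutorial, so the names `p0` and `prior_means` here are illustrative only.

```python
import numpy as np

SSP_parameters = [-2.29, -2.75]
SSP_parameters_to_optimize = ['high_mass_slope', 'log10_N_0']

ISM_parameters = [-0.3, 0.55, 0.5]
ISM_parameters_to_optimize = ['log10_starformation_efficiency',
                              'log10_sfr_scale',
                              'outflow_feedback_fraction']

# Assumed ordering: SSP parameters first, then ISM parameters
to_optimize = SSP_parameters_to_optimize + ISM_parameters_to_optimize
p0 = np.array(SSP_parameters + ISM_parameters)

# Map each free parameter name to its prior value
prior_means = dict(zip(to_optimize, p0))

for name, value in prior_means.items():
    print(name, value)
```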


We must also turn OFF the neural network predictions:

In [6]:
# In Chempy/parameter.py
UseNeural=False

# To test
a.UseNeural

False

Data-sets can be created using the `Chempy.neural` module as follows; they are saved in the `Neural/` directory.

This uses multiprocessing to create a training data-set using 10 values of each of the 5 free *Chempy* parameters (this value can be changed using `training_size` in `parameter.py`). The results are written to `Neural/training_abundances.npy` (abundance output) and `Neural/training_norm_grid.npy` (normalised input).

In [1]:
from Chempy.neural import training_data
#training_data()

The above was run on a 64-core machine, taking 45 minutes. 
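
The run time follows from the size of the training set. If the 10 values per parameter are combined into a full regular grid (an assumption about how `training_data()` arranges them), the number of *Chempy* evaluations grows as the number of values raised to the power of the number of parameters. The sketch below builds such a grid with a smaller size so that it runs instantly; the axis range of -1 to 1 is hypothetical.

```python
import itertools
import numpy as np

n_parameters = 5        # number of free Chempy parameters in this tutorial
tutorial_size = 10      # values per parameter quoted in the text
demo_size = 3           # smaller value so that the demo runs instantly

# Hypothetical normalised axis; the real range and spacing are set by Chempy
axis = np.linspace(-1.0, 1.0, demo_size)

# Full regular grid: every combination of one value per parameter
grid = np.array(list(itertools.product(axis, repeat=n_parameters)))

print(grid.shape)                     # (demo_size**n_parameters, n_parameters)
print(tutorial_size ** n_parameters)  # number of models for the tutorial settings
```

Under this assumption, raising `training_size` from 10 to 20 would multiply the number of models by 32, so the value should be increased with care.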

## Step 3: Train Neural Network

Next we must train the neural network using the previously constructed data-sets. This is done with the `create_network()` function in `Chempy/neural.py`. By default it creates and trains a 30-neuron network over 1000 training epochs, using a learning rate of 0.007 (optimised via a validation data-set).

In [None]:
from Chempy.neural import create_network
# Uncomment the next line to train the network (see the timing note below)
#create_network(Plot=True)

The above was run on an 8-core machine taking 6 minutes. The trained network is saved as `Neural/neural_model.npz`. A loss plot is also produced, showing the network loss function against training epoch (if `Plot=True`).
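
To give a feel for the size of such a network, the sketch below evaluates a forward pass mapping 5 input parameters through 30 hidden neurons to 28 output abundances. The single hidden layer with a tanh activation is an assumption made for illustration, the weights are random and untrained, and the names `w0`, `b0`, `w1`, `b1` and `forward` are hypothetical and do not refer to the contents of `neural_model.npz`.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_neurons, n_outputs = 5, 30, 28

# Random (untrained) weights and biases, for shape illustration only
w0 = rng.normal(size=(n_neurons, n_inputs))
b0 = rng.normal(size=n_neurons)
w1 = rng.normal(size=(n_outputs, n_neurons))
b1 = rng.normal(size=n_outputs)

def forward(x):
    """Forward pass of a one-hidden-layer network with tanh activation."""
    hidden = np.tanh(w0 @ x + b0)
    return w1 @ hidden + b1

# Evaluate at the centre of a normalised parameter space
prediction = forward(np.zeros(n_inputs))
print(prediction.shape)
```

A network of this size has 30*5 + 30 + 28*30 + 28 = 1048 trainable numbers, which is why evaluating it is far cheaper than running a full *Chempy* model.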

Using the `neural_output()` function in `Chempy/neural.py`, we can emulate the output of *Chempy* for any set of input parameters:

In [13]:
from Chempy.parameter import ModelParameters
a = ModelParameters()

from Chempy.neural import neural_output
# Here we compute the neural network predictions for the prior values of the free parameters (a.p0) 
output = neural_output(a.p0)

print("Element \t Predicted abundance")
print("------------------------------------")
for i in range(len(output)):
    print(a.initial_neural_names[i],"\t\t",output[i])

Element 	 Predicted abundance
------------------------------------
Al 		 0.0468287557287
Ar 		 0.0430388074658
C 		 -0.283768263563
Ca 		 -0.0203430137616
Cl 		 -0.505086196204
Co 		 -0.344452923144
Cr 		 0.104266876841
Cu 		 0.0177271392126
F 		 -0.0997805051791
Fe 		 0.185911006511
Ga 		 0.347348328843
Ge 		 0.565114782287
He 		 -0.14640519163
K 		 -0.878388191207
Mg 		 -0.040879649148
Mn 		 0.185649103103
N 		 0.0513031459996
Na 		 0.28427847381
Ne 		 0.207458257613
Ni 		 0.162557021237
O 		 0.0730409237704
P 		 0.0323074674887
S 		 0.2053703491
Sc 		 -0.916246261917
Si 		 0.234703496874
Ti 		 -0.431241206649
V 		 -0.242653704898
Zn 		 0.00616649432223


## Step 4: Compute Bayes + LOO-CV Scores as a function of $\beta$

We are now ready to compute both the Bayes and LOO-CV scores for the network. Before computation, we instruct *Chempy* to use the trained neural network by altering the `Chempy/parameter.py` file:

In [2]:
## Modify these lines in Chempy/parameter.py
UseNeural = True 

# To test if this has worked:
a.UseNeural

True

## Step 5: Compute Overall Scores