# atmodeller

## Tutorial 3: Monte Carlo experiment and initial solution guess

### Monte Carlo

We can devise a simple Monte Carlo (MC) approach to sample the probable atmospheres that can arise for different planetary conditions.

In [None]:
from atmodeller import debug_logger
from atmodeller.interior_atmosphere import InteriorAtmosphereSystem, Planet
from atmodeller.constraints import SystemConstraints, ElementMassConstraint, BufferedFugacityConstraint
from atmodeller.thermodata.redox_buffers import IronWustiteBuffer
from atmodeller.core import GasSpecies, Species
from atmodeller.solubility.hydrogen_species import H2O_peridotite_sossi
from atmodeller.solubility.carbon_species import CO2_basalt_dixon
from atmodeller.solubility.other_species import N2_basalt_libourel
from atmodeller.utilities import earth_oceans_to_hydrogen_mass
from atmodeller.initial_solution import InitialSolutionRegressor, InitialSolutionSwitchRegressor, InitialSolutionDict
import numpy as np
import logging

For production runs, set the logger level to INFO or higher (i.e. WARNING, ERROR, or CRITICAL). Otherwise your MC runs will be slowed down by the cost of writing debug output to the logger.

In [None]:
logger = debug_logger()
logger.setLevel(logging.DEBUG)
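
For reference, a production run would raise the level in the cell above. The sketch below uses only the standard logging module to show the effect: once the level is INFO, DEBUG calls return without creating a log record. The logger name "tutorial3" is hypothetical; in the notebook you would call setLevel on the logger returned by debug_logger().

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger_demo = logging.getLogger("tutorial3")

# Production setting: INFO (or WARNING, ERROR, CRITICAL) suppresses DEBUG output.
logger_demo.setLevel(logging.INFO)

# DEBUG messages are now filtered out before any log record is created.
debug_enabled = logger_demo.isEnabledFor(logging.DEBUG)
info_enabled = logger_demo.isEnabledFor(logging.INFO)
print(debug_enabled, info_enabled)  # False True
```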

We now create the species that we are interested in.

In [None]:
H2O_g = GasSpecies("H2O", solubility=H2O_peridotite_sossi())
H2_g = GasSpecies("H2")
O2_g = GasSpecies("O2")
CO_g = GasSpecies("CO")
N2_g = GasSpecies("N2", solubility=N2_basalt_libourel())

species = Species([H2O_g, H2_g, O2_g, CO_g, N2_g])

Now create a planet. Recall that we can sample different planetary properties by updating the attributes of this object, although we do not do this in this tutorial.

In [None]:
planet: Planet = Planet()

Now set up the main driver of the MC approach. This establishes the ranges over which we sample the hydrogen inventory (in Earth oceans), the C/H mass ratio, and the oxygen fugacity shift relative to the iron-wüstite buffer.

In [None]:
def monte_carlo(interior_atmosphere: InteriorAtmosphereSystem, number_of_realisations: int = 100):
    """Monte Carlo driver

    Args:
        interior_atmosphere: An interior-atmosphere system
        number_of_realisations: Number of simulations to perform
    """

    # Parameters are uniformly distributed between bounds.
    number_ocean_moles = np.random.uniform(1, 10, number_of_realisations)
    ch_ratios = np.random.uniform(0.1, 1, number_of_realisations)
    fo2_shifts = np.random.uniform(-4, 4, number_of_realisations)

    # Nitrogen abundance in the mantle in ppmw (2.8 ppmw is the mantle value of N).
    N_ppmw = 2.8

    # The nitrogen mass is the same for all realisations
    mass_N = N_ppmw * 1.0e-6 * planet.mantle_mass

    for realisation in range(number_of_realisations):

        # Hydrogen mass from the number of Earth oceans; carbon mass from the C/H mass ratio
        mass_H = earth_oceans_to_hydrogen_mass(number_ocean_moles[realisation])
        mass_C = ch_ratios[realisation] * mass_H
        constraints = SystemConstraints([
            ElementMassConstraint("H", mass_H),
            ElementMassConstraint("C", mass_C),
            ElementMassConstraint("N", mass_N),
            BufferedFugacityConstraint(O2_g, IronWustiteBuffer(log10_shift=fo2_shifts[realisation]))
        ])

        # Extra quantities to write to the output.
        # It is often helpful to have the constraints expressed in a more convenient
        # form for analysis and plotting.
        extra = {
            'fO2_shift': fo2_shifts[realisation],
            'C/H ratio': ch_ratios[realisation],
            'Number of ocean moles': number_ocean_moles[realisation],
        }

        interior_atmosphere.solve(constraints, extra_output=extra)
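
The sampling step of the driver can be made reproducible by drawing from a seeded NumPy Generator instead of the global np.random state. The sketch below is an optional variant, not part of the tutorial's driver: the helper name sample_parameters and the seed value are hypothetical, while the bounds are the ones used in monte_carlo. Each array holds number_of_realisations values drawn uniformly from its interval.

```python
import numpy as np


def sample_parameters(number_of_realisations: int, seed: int = 0) -> dict[str, np.ndarray]:
    """Hypothetical helper: draw the MC parameters uniformly within the tutorial's bounds.

    Args:
        number_of_realisations: Number of samples to draw for each parameter
        seed: Seed for the random number generator, so that runs can be repeated exactly

    Returns:
        Dictionary mapping parameter name to an array of samples
    """
    rng = np.random.default_rng(seed)
    return {
        # Number of Earth oceans (passed to earth_oceans_to_hydrogen_mass), uniform in [1, 10)
        "number_ocean_moles": rng.uniform(1, 10, number_of_realisations),
        # C/H mass ratio, uniform in [0.1, 1)
        "ch_ratios": rng.uniform(0.1, 1, number_of_realisations),
        # log10 shift of fO2 relative to the iron-wustite buffer, uniform in [-4, 4)
        "fo2_shifts": rng.uniform(-4, 4, number_of_realisations),
    }


samples = sample_parameters(100, seed=42)
repeat = sample_parameters(100, seed=42)  # same seed, identical samples
```

Inside monte_carlo, the three np.random.uniform calls could then be replaced by a call to this helper, indexing e.g. samples["fo2_shifts"][realisation] in the loop.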


We can run the MC as follows. This may take a minute or two to run.

In [None]:
interior_atmosphere: InteriorAtmosphereSystem = InteriorAtmosphereSystem(species=species, planet=planet)
monte_carlo(interior_atmosphere)

The simulation data can be exported to an Excel file, a pickle file, or both by setting the appropriate keyword arguments of the output method:

In [None]:
interior_atmosphere.output(file_prefix='tutorial3_monte_carlo', to_excel=True, to_pickle=True)

If you just want to access the dataframes in a dictionary you can use:

In [None]:
output_data = interior_atmosphere.output(to_dataframes=True)
output_data

### Improving the initial guess

When performing an MC simulation, problems can arise if the chosen model parameters produce a solution that is far from the initial guess. Internally, atmodeller chooses an initial guess for the solution and uses it as the starting point for the numerical solver, but if this guess is far from the actual solution, the solver may fail. To address this, it is often convenient to first run a smaller MC simulation with narrower parameter bounds to generate some output. We can then use this output to train a regressor that provides an improved initial guess for a new MC run.
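
The idea can be illustrated without atmodeller. The sketch below uses ordinary least squares (NumPy's lstsq) as a stand-in for whatever regression model atmodeller uses internally, which is an assumption; the data are synthetic, generated from a made-up linear relation between the sampled parameters and a log10 solution value. It compares the root-mean-square error (RMSE) of a constant initial guess with that of a guess predicted from previous runs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "previous MC run": inputs are the sampled parameters (oceans, C/H, fO2 shift)
n_samples = 200
X = np.column_stack([
    rng.uniform(1, 10, n_samples),
    rng.uniform(0.1, 1, n_samples),
    rng.uniform(-4, 4, n_samples),
])
# Made-up linear relation for a log10 solution value, plus a little noise
true_coefficients = np.array([0.3, 1.5, 0.5])
y = 24.0 + X @ true_coefficients + rng.normal(0.0, 0.05, n_samples)


def fit_linear(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares fit with an intercept; returns [intercept, coefficients...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients


def predict_linear(coefficients: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Evaluate the fitted linear model."""
    return coefficients[0] + X @ coefficients[1:]


def rmse(prediction: np.ndarray, actual: np.ndarray) -> float:
    """Root-mean-square error between a guess and the actual solution."""
    return float(np.sqrt(np.mean((prediction - actual) ** 2)))


# Train on the first half, then evaluate both kinds of initial guess on the second half
coefficients = fit_linear(X[:100], y[:100])
constant_guess = np.full(100, 25.0)  # fixed log10 guess, like a constant initial solution
rmse_constant = rmse(constant_guess, y[100:])
rmse_regressor = rmse(predict_linear(coefficients, X[100:]), y[100:])
print(rmse_constant, rmse_regressor)
```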

In the following, we use the generated output from the previous run to inform the selection of the initial condition:

In [None]:
initial_solution = InitialSolutionRegressor.from_pickle(
    'tutorial3_monte_carlo.pkl',
    species=species,
    fit=True,
    fit_batch_size=100,
    partial_fit=True,
    partial_fit_batch_size=50,
)

In the above, fit = True means that the regressor trained on the output of the previous run (stored in the pickle file) is only used for the first fit_batch_size = 100 simulations. The regressor then re-trains itself on just the fit_batch_size = 100 samples generated by the current model, discarding what it learned from the previous data. Because partial_fit = True, every partial_fit_batch_size = 50 simulations thereafter it updates its training with the latest batch of newly generated samples, to better inform the selection of subsequent initial solutions. This is known as a dynamic or online learning approach.
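
The schedule described above can be sketched as a small online regressor. This is a hypothetical stand-in, not atmodeller's InitialSolutionRegressor: it accumulates the least-squares normal equations, so fit discards everything learned before (as the text describes for the model loaded from the pickle file), while partial_fit adds a new batch on top of the current training. The batch sizes match the call above.

```python
import numpy as np


class OnlineLinearRegressor:
    """Hypothetical online least-squares model illustrating fit versus partial_fit."""

    def __init__(self, n_features: int):
        self.n_features = n_features
        self._reset()

    def _reset(self) -> None:
        # Accumulated normal equations (A^T A and A^T y) for a model with an intercept
        size = self.n_features + 1
        self.ata = np.zeros((size, size))
        self.aty = np.zeros(size)

    def _design(self, X: np.ndarray) -> np.ndarray:
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # Full fit: forget all previous training, then train on this batch only
        self._reset()
        self.partial_fit(X, y)

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # Incremental update: add this batch to what has already been learned
        A = self._design(X)
        self.ata += A.T @ A
        self.aty += A.T @ y

    def predict(self, X: np.ndarray) -> np.ndarray:
        coefficients = np.linalg.lstsq(self.ata, self.aty, rcond=None)[0]
        return self._design(X) @ coefficients


def training_events(number_of_realisations: int, fit_batch_size: int = 100,
                    partial_fit_batch_size: int = 50) -> list[tuple[int, str]]:
    """Return (sample count, action) pairs for the schedule described in the text."""
    events = []
    for count in range(1, number_of_realisations + 1):
        if count == fit_batch_size:
            events.append((count, "fit"))
        elif count > fit_batch_size and (count - fit_batch_size) % partial_fit_batch_size == 0:
            events.append((count, "partial_fit"))
    return events


print(training_events(200))  # [(100, 'fit'), (150, 'partial_fit'), (200, 'partial_fit')]
```

Accumulating the normal equations means a fit followed by partial fits gives the same model as a single least-squares fit on all samples seen since the last full fit.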

It is necessary to pass the initial solution to the interior atmosphere system when it is created:

In [None]:
interior_atmosphere_ic = InteriorAtmosphereSystem(species=species, initial_solution=initial_solution, planet=planet)
monte_carlo(interior_atmosphere_ic, number_of_realisations=200)

If you compare the log output of the two MC runs, you will see that in the second run the initial solution re-trained itself after 100 samples had been generated and then partially re-trained itself every 50 samples. This keeps the root-mean-square error (RMSE) between the initial and actual solutions smaller than it would be for a constant initial guess. Setting fit = True also allows you to train an initial solution on a similar but not identical model (for example, one with different solubility laws or gas equations of state). Once enough samples have been generated, you would prefer to use only the new model to generate new estimates, since the behaviour of the new model and the previous, similar-but-not-identical model will diverge.

If you want to start with a constant initial solution and then switch to a regressor, you can use:

In [None]:
initial_solution_start = InitialSolutionDict(fill_log10_number_density=25, species=species)
initial_solution_switch = InitialSolutionSwitchRegressor(
    initial_solution_start,
    species=species,
    fit=True,
    switch_iteration=100,
    fit_batch_size=100,
    partial_fit=True,
    partial_fit_batch_size=50,
)

In the above, the constant initial solution is used for the first 100 samples, after which the initial solution trains itself on those samples. Then, every subsequent 50 samples, it partially re-trains itself. This approach can work well, but it relies on the solver finding solutions for those first 100 samples (starting from the constant guess) to train the regressor.
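
The switching behaviour can be sketched as follows. SwitchingGuess is a hypothetical class, not atmodeller's InitialSolutionSwitchRegressor: it returns a constant log10 value of 25 (as in fill_log10_number_density=25) until switch_iteration solved samples are available, trains on those samples, and then uses the fitted model. For brevity the model is a least-squares fit and the subsequent partial fits are omitted.

```python
from typing import Optional

import numpy as np


class SwitchingGuess:
    """Hypothetical constant-then-regressor initial guess for a single log10 value."""

    def __init__(self, constant: float = 25.0, switch_iteration: int = 100):
        self.constant = constant
        self.switch_iteration = switch_iteration
        self.inputs: list = []
        self.outputs: list = []
        self.coefficients: Optional[np.ndarray] = None

    def guess(self, x: np.ndarray) -> float:
        if self.coefficients is None:
            return self.constant  # still in the constant phase
        return float(self.coefficients[0] + x @ self.coefficients[1:])

    def record(self, x: np.ndarray, solution: float) -> None:
        # Store each solved sample; train once switch_iteration samples are available
        self.inputs.append(x)
        self.outputs.append(solution)
        if len(self.outputs) == self.switch_iteration:
            design = np.column_stack([np.ones(len(self.inputs)), np.array(self.inputs)])
            self.coefficients = np.linalg.lstsq(design, np.array(self.outputs), rcond=None)[0]
```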

Although it is tempting to set fit_batch_size as small as possible so that training starts early, this initial sample should capture some of the variability in the solution, because it is used to calibrate the scalings. This matters because the scalings of an initial solution regressor are fixed: unlike the trained model, they are not updated during partial fitting. Hence, in practice, it may be necessary to add complexity incrementally in terms of solubility laws and equations of state, training the initial solution for each MC run on a previous, simpler MC run.
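
To see why the first batch matters, consider a standardisation (mean and standard deviation) computed once from the initial training batch and then frozen. This standardisation is an assumed stand-in for atmodeller's internal scalings, and the numbers are synthetic: a first batch drawn from a narrow range of fO2 shifts yields scaled values far outside the training range when later samples span the full interval used by the MC driver.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical first batch covering only a narrow range of fO2 shifts
first_batch = rng.uniform(-0.5, 0.5, 100)
mean, std = first_batch.mean(), first_batch.std()  # scalings fixed from this batch


def scale(values: np.ndarray) -> np.ndarray:
    # Frozen scalings: never updated during later partial fits
    return (values - mean) / std


# Later samples span the full range used in the tutorial's MC driver
later_samples = np.array([-4.0, 0.0, 4.0])
scaled_first = scale(first_batch)
scaled_later = scale(later_samples)
print(scaled_first.min(), scaled_first.max(), scaled_later)
```

The extremes of the later samples map to roughly ten times the largest scaled value seen in training, so the model is asked to extrapolate well beyond the data that fixed its scalings.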

In [None]:
interior_atmosphere_ic_switch: InteriorAtmosphereSystem = InteriorAtmosphereSystem(species=species, initial_solution=initial_solution_switch, planet=planet)
monte_carlo(interior_atmosphere_ic_switch, number_of_realisations=200)

### Adding species complexity

Suppose we now want to use the simpler system solved above to inform the initial solution for a more complex system. Here we include CO2 as an additional species, although in principle we could add any number of extra species, or even remove species, relative to the previous species list. The species list also does not have to be in the same order as before. However, the constraints must be the same.

In [None]:
H2O_g = GasSpecies("H2O", solubility=H2O_peridotite_sossi())
H2_g = GasSpecies("H2")
O2_g = GasSpecies("O2")
CO_g = GasSpecies("CO")
N2_g = GasSpecies("N2", solubility=N2_basalt_libourel())
CO2_g = GasSpecies("CO2", solubility=CO2_basalt_dixon())

species = Species([H2O_g, H2_g, O2_g, CO_g, CO2_g, N2_g])

Compared to the original species list at the start of this notebook, the new list also includes CO2. Hence, below, we create an override solution that provides an initial value for CO2. Note that the species list passed to the override must be restricted to only the species that should be overridden.

In [None]:
initial_solution_override = InitialSolutionDict({CO2_g: 1E25}, species=Species([CO2_g]))
initial_solution = InitialSolutionRegressor.from_pickle(
    'tutorial3_monte_carlo.pkl',
    species=species,
    fit=True,
    fit_batch_size=100,
    partial_fit=True,
    partial_fit_batch_size=50,
)
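
Conceptually, an override supplies values for species that a regressor trained on the previous run cannot predict, because they were absent from that run. The sketch below shows one plausible way to combine the two with plain dictionaries; combine_initial_guess is a hypothetical helper, the species values are illustrative only, and whether atmodeller merges them exactly like this is an assumption. Because values are keyed by species name, the order of the species list does not matter.

```python
def combine_initial_guess(predicted: dict[str, float], override: dict[str, float],
                          species: list[str], fill: float) -> dict[str, float]:
    """Hypothetical merge: the override takes precedence, then the prediction, then a fill value."""
    guess = {}
    for name in species:
        if name in override:
            guess[name] = override[name]
        elif name in predicted:
            guess[name] = predicted[name]
        else:
            guess[name] = fill
    return guess


new_species = ["H2O", "H2", "O2", "CO", "CO2", "N2"]
# Illustrative values only; a regressor trained on the first run knows nothing about CO2
predicted = {"H2O": 1.0e25, "H2": 1.0e24, "O2": 1.0e10, "CO": 1.0e24, "N2": 1.0e22}
override = {"CO2": 1.0e25}
guess = combine_initial_guess(predicted, override, new_species, fill=1.0e25)
```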

As before, the initial solution must be passed to the interior atmosphere system when it is created:

In [None]:
interior_atmosphere_ic: InteriorAtmosphereSystem = InteriorAtmosphereSystem(species=species, initial_solution=initial_solution, planet=planet)
monte_carlo(interior_atmosphere_ic, number_of_realisations=200)

In [None]:
interior_atmosphere_ic.output(file_prefix="tutorial3_monte_carlo_CO2", to_excel=True, to_pickle=True)