# The Survey Pipeline

If you have a galaxy catalog (either of `Parametric` origin or from a simulation), an [`EmissionModel`](../emission_models/emission_models.rst), and a set of [instruments](../instrumentation/instrument_example.ipynb) you want observables for, you can easily write a pipeline to generate the observations you want using the Synthesizer UI. However, let's say you have a new catalog you want to run the same analysis on, or a whole different set of instruments you want to use. You could modify your old pipeline or write a whole new one, but that's a lot of work and boilerplate.

This is where the `Survey` shines. Instead of having to write a pipeline yourself, you can use the `Survey` class, a high-level interface that generates observations for a given catalog, emission model, and set of instruments. All you need to do is define a galaxy loader, set up the ``Survey`` object, and run the observable methods you want to include. Possible observables include:

- Spectra.
- Emission Lines.
- Photometry.
- Images (with or without PSF convolution/noise).
- Spectral data cubes (IFUs) [WIP].
- Instrument-specific spectroscopy [WIP].

The ``Survey`` will generate all the requested observations for all (compatible) instruments and galaxies, before writing them out to a standardised HDF5 format.

As a bonus, the abstraction into the `Survey` class allows for easy parallelization of the analysis, not only over local threads but optionally over MPI. 

In the following sections we will show how to instantiate and use a ``Survey`` object to generate observations for a given catalog, emission model, and set of instruments.

## Setting up a ``Survey`` object

Before we instantiate a survey we need to define its "dependencies": a method to load a galaxy catalog, an emission model, and a set of instruments.

### Defining a galaxy loader

The galaxy loader function can be distributed across a number of threads (and across MPI ranks, which we'll cover in more detail below). To ensure the galaxy loader works and is parallelisable, it must adhere to a set of rules:

- It must return a single ``Galaxy`` object or ``None``. The latter of these is to handle any galaxies which failed to be loaded. These will be sanitised out of the catalog before any analysis is run.
- Its first argument must be the galaxy's "index" in the catalog. This argument must be called "gal_index" since this is how we check the function is compatible under the hood. For instance, if you have an HDF5 file from which you are loading a ``Galaxy``, this index should be the index into the file for the galaxy you want to load.
- It can take any number of additional arguments and keyword arguments.
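
The naming rule can be checked with nothing but the standard library. The sketch below only illustrates the rule and is not Synthesizer's actual internal check; the helper name `first_arg_is_gal_index` and the two stub loaders are hypothetical.

```python
import inspect


def first_arg_is_gal_index(func):
    """Return True if the first parameter of func is named 'gal_index'."""
    params = list(inspect.signature(func).parameters)
    return len(params) > 0 and params[0] == "gal_index"


def good_loader(gal_index, grid=None):
    """A stub loader with the required first argument."""
    return None


def bad_loader(index, grid=None):
    """A stub loader whose first argument has the wrong name."""
    return None


print(first_arg_is_gal_index(good_loader))  # True
print(first_arg_is_gal_index(bad_loader))  # False
```

A check like this can be run on your own loader before handing it over, which makes a misnamed first argument easy to spot early.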

Below we define a fake galaxy loader for illustrative purposes. This function generates a particle based galaxy from a parametric star formation and metallicity history.

In [None]:
import numpy as np
from unyt import Msun, Mpc, Myr

from synthesizer.particle.stars import sample_sfhz
from synthesizer.parametric.stars import Stars as ParametricStars
from synthesizer.parametric import SFH, ZDist
from synthesizer import Galaxy

def galaxy_loader(gal_index, grid):
    """
    Load a fake particle Galaxy.
    
    Args:
        gal_index (int): 
            The index of the galaxy to load. (Here, this is unused 
            but must be included.)
        grid (synthesizer.grid.Grid): 
            The grid object to use for defining the SFZH.
    """
    # Initialise the parametric Stars object
    param_stars = ParametricStars(
        grid.log10age,
        grid.metallicity,
        sf_hist=SFH.Constant(max_age=100 * Myr),
        metal_dist=ZDist.DeltaConstant(metallicity=0.01),
        initial_mass=10**10 * Msun,
    )

    # Define the number of stellar particles we want
    n = int(100 * (np.random.rand() + 0.5))

    # Sample the parametric SFZH, producing a particle Stars object
    # we will also pass some keyword arguments for some example attributes
    part_stars = sample_sfhz(
        sfzh=param_stars.sfzh,
        log10ages=param_stars.log10ages,
        log10metallicities=param_stars.log10metallicities,
        nstar=n,
        current_masses=np.full(n, 10**9 / n) * Msun,
        redshift=1,
        coordinates=np.random.normal(0, 0.01, (n, 3)) * Mpc,
        centre=np.zeros(3) * Mpc,
        smoothing_lengths=np.ones(n) * 0.01 * np.random.rand(n) * Mpc,
    )

    # And create the galaxy
    gal = Galaxy(
        stars=part_stars,
        redshift=1,
    )

    return gal

This way of defining a loader leaves the definition of a ``Galaxy`` entirely in the user's hands. You are free to add whatever attributes you see fit, and can load data from any source you desire.

Notice here how we have two arguments: the required ``gal_index``, and the ``grid`` which our loader needs to define the SFZH grid axes. We'll provide this second argument later on when we want to run our survey.
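
To make the calling convention concrete, the sketch below shows one way extra arguments could be forwarded to a loader and failed loads dropped, using only the standard library. It is an illustration under stated assumptions rather than how the ``Survey`` is implemented: `load_all`, `toy_loader`, and the dictionary standing in for a ``Galaxy`` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def toy_loader(gal_index, scale=1.0):
    """Hypothetical loader: returns a dict, or None to signal a failed load."""
    if gal_index % 3 == 0:
        return None  # pretend every third galaxy is missing from the catalog
    return {"index": gal_index, "mass": scale * gal_index}


def load_all(loader, n_galaxies, nthreads=1, **kwargs):
    """Call loader(gal_index, **kwargs) for every index and drop the Nones."""
    bound = partial(loader, **kwargs)
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        # map preserves the order of the indices
        results = list(pool.map(bound, range(n_galaxies)))
    return [gal for gal in results if gal is not None]


galaxies = load_all(toy_loader, n_galaxies=7, nthreads=2, scale=10.0)
print([gal["index"] for gal in galaxies])  # [1, 2, 4, 5]
```

Here `scale` plays the role that ``grid`` plays for `galaxy_loader` above: it is supplied once, as a keyword argument, and forwarded to every call.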

### Defining an emission model

The ``EmissionModel`` defines the emissions we'll generate, including the origin and any reprocessing the emission undergoes. For more details see the ``EmissionModel`` [docs](../emission_models/emission_models.rst). 

For demonstration, we'll use a simple premade ``IntrinsicEmission`` model which defines the intrinsic stellar emission (i.e. stellar emission without any ISM dust reprocessing).

In [None]:
from synthesizer.emission_models import IntrinsicEmission
from synthesizer.grid import Grid

# Get the grid
grid_dir = "../../../tests/test_grid/"
grid_name = "test_grid"
grid = Grid(grid_name, grid_dir=grid_dir)

model = IntrinsicEmission(grid, fesc=0.1)
model.set_per_particle(True)  # we want per particle emissions

### Defining the instruments

We don't need any instruments if all we want is spectra at the resolution of the ``Grid`` or emission lines. However, to get anything more sophisticated we need ``Instruments`` that define the technical specifications of the observations we want to generate. For a full breakdown see the instrumentation [docs](../instrumentation/instrument_example.ipynb).

Here we'll define a simple set of instruments including a subset of NIRCam filters (capable of imaging with a 0.1 kpc resolution) and a set of UVJ top hat filters (only capable of photometry).
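
For intuition about what a top hat filter does to a spectrum, here is a small pure Python sketch. It uses a plain transmission-weighted mean over wavelength, which is a simplification; Synthesizer's own filter application may weight the spectrum differently. The helper names and the 3000 to 4000 angstrom band are hypothetical and do not correspond to the real UVJ definitions.

```python
def top_hat(lam, lam_min, lam_max):
    """Transmission of an ideal top hat filter: 1 inside the band, 0 outside."""
    return [1.0 if lam_min <= x <= lam_max else 0.0 for x in lam]


def trapezoid(y, x):
    """Integrate y(x) with the trapezoidal rule."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1)
    )


def band_average(lam, spectrum, transmission):
    """Transmission-weighted mean of a spectrum over wavelength."""
    weighted = [s * t for s, t in zip(spectrum, transmission)]
    return trapezoid(weighted, lam) / trapezoid(transmission, lam)


# Wavelength grid in angstrom and a flat toy spectrum (arbitrary units)
lam = [1000.0 + 10.0 * i for i in range(901)]  # 1000 to 10000 angstrom
flat = [2.0] * len(lam)

# A hypothetical band between 3000 and 4000 angstrom
trans = top_hat(lam, 3000.0, 4000.0)
print(band_average(lam, flat, trans))
```

A flat spectrum returns its own value whatever the band, which is a handy sanity check for any filter-averaging code.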

In [None]:
from unyt import angstrom, kpc
from synthesizer.instruments import FilterCollection, UVJ
from synthesizer.instruments import Instrument


# Get the filters
lam = np.linspace(10**3, 10**5, 1000) * angstrom
webb_filters = FilterCollection(
    filter_codes=[
        f"JWST/NIRCam.{f}"
        for f in ["F090W", "F150W", "F200W", "F277W", "F356W", "F444W"]
    ],
    new_lam=lam,
)
uvj_filters = UVJ(new_lam=lam)

# Instantiate the instruments
webb_inst = Instrument("JWST", filters=webb_filters, resolution=0.1 * kpc)
uvj_inst = Instrument("UVJ", filters=uvj_filters)
instruments = webb_inst + uvj_inst

print(instruments)

### Instantiating the ``Survey`` object

Now we have all the ingredients we need to instantiate a ``Survey`` object. All we need to do is pass them to the ``Survey`` alongside the total number of galaxies in the catalog and the number of threads we want to use during the analysis.

In [None]:
from synthesizer.survey.survey import Survey

survey = Survey(
    gal_loader_func=galaxy_loader,
    emission_model=model,
    n_galaxies=10,
    instruments=instruments,
    nthreads=4,
)

Notice that we got a log out of the ``Survey`` object detailing the basic setup. The ``Survey`` will automatically output logging information to the console, but this can be suppressed by passing ``verbose=0``, which limits the outputs to saying hello, goodbye, and any errors that occur.

In [None]:
survey = Survey(
    gal_loader_func=galaxy_loader,
    emission_model=model,
    n_galaxies=10,
    instruments=instruments,
    nthreads=4,
    verbose=0,
)

In [None]:
# numpy and the unyt units used below were already imported in earlier cells
from astropy.cosmology import Planck18 as cosmo

from synthesizer.kernel_functions import Kernel

In [None]:
# Get the SPH kernel
sph_kernel = Kernel()
kernel = sph_kernel.get_kernel()

In [None]:
# Set up the survey for 10 galaxies using 8 threads
survey = Survey(galaxy_loader, model, 10, instruments, nthreads=8)

# Load the galaxies, forwarding the extra grid argument to the loader
survey.load_galaxies(grid=grid)

# Generate the observables
survey.get_spectra(cosmo=cosmo)
survey.get_lines(line_ids=grid.available_lines)
survey.get_photometry_luminosities()
survey.get_photometry_fluxes()
survey.get_images_luminosity(fov=50 * kpc, kernel=kernel)
survey.get_images_flux(fov=50 * kpc, kernel=kernel)

# Write everything out to HDF5
survey.write("output.hdf5")


In [None]:
# Inspect the per particle photometry stored on the first galaxy's stars
survey.galaxies[0].stars.particle_photo_lnu