Database 1: Samples
===================

After fitting a large suite of strong lens data, we can use the aggregator to load the database's results. We can then
manipulate, interpret and visualize them using a Python script or Jupyter notebook.

This script uses the results generated by the script `/autolens_workspace/database/tutorial_0_model_fits.py`, which
fitted 3 simulated strong lenses with:

 - An `EllIsothermal` `MassProfile` for the lens galaxy's mass.
 - An `EllSersic` `LightProfile` representing a bulge for the source galaxy's light.

__Samples__

This script covers how to manipulate the `Samples` object returned from a *PyAutoLens* model-fit, which you have most
likely already encountered when analysing the results of a model-fit. Nevertheless, we'll also learn how to use the
`Aggregator`!

In [1]:
%matplotlib inline
from pyprojroot import here
workspace_path = str(here())
%cd $workspace_path
print(f"Working Directory has been set to `{workspace_path}`")

import os
from os import path
import autofit as af

/mnt/c/Users/Jammy/Code/PyAuto/autolens_workspace
Working Directory has been set to `/mnt/c/Users/Jammy/Code/PyAuto/autolens_workspace`


We now load the results in the `output` folder into a sqlite database using the `Aggregator`. We simply point to the 
path where we want the database to be created and add the directory `autolens_workspace/output/database`.

Check out the output folder; you should see a `database.sqlite` file which contains the model-fits to the 3 simulated
strong lens datasets.

In [2]:
# from autofit.database.aggregator import Aggregator
#
# database_file = path.join("output", "database", "database.sqlite")
#
# if path.isfile(database_file):
#     os.remove(database_file)
#
# agg = Aggregator.from_database(path.join(database_file))
# agg.add_directory(path.join("output", "database"))

agg = af.Aggregator(directory=path.join("output", "database"))

Aggregator loading search_outputs... could take some time.

 A total of 6 search_outputs and results were found.


Before using the aggregator to inspect results, let me quickly cover Python generators. A generator is an object that 
produces its entries one at a time as it is iterated over, instead of building them all in advance. The aggregator 
creates all of the objects that it loads from the database as generators (as opposed to a list, or dictionary, or other 
Python type).

Why? Because lists and dictionaries store every entry in memory simultaneously. If you fit many datasets, this will use 
a lot of memory and crash your laptop! On the other hand, a generator only stores the object in memory when it is used; 
Python is then free to overwrite it afterwards. Thus, your laptop won't crash!

There are two things to bear in mind with generators:

 1) A generator has no length and to determine how many entries it contains you first must turn it into a list.

 2) Once we use a generator, we cannot use it again and need to remake it. For this reason, we typically avoid 
 storing the generator as a variable and instead use the aggregator to create them on use.
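
Both points can be seen with a plain Python generator. The sketch below does not use the aggregator at all; `make_gen` 
and its entries are hypothetical stand-ins for whatever the aggregator would load.

```python
# A toy generator standing in for the aggregator's output (hypothetical data).
def make_gen():
    return (name for name in ["fit_0", "fit_1", "fit_2"])


gen = make_gen()

# 1) A generator has no length: calling len() on it raises a TypeError.
try:
    len(gen)
    has_length = True
except TypeError:
    has_length = False

# Converting it to a list consumes the generator...
first_pass = list(gen)

# 2) ...so a second pass over the same generator yields nothing.
second_pass = list(gen)

# Remaking the generator gives the entries back.
third_pass = list(make_gen())

print(has_length, len(first_pass), len(second_pass), len(third_pass))
```

This is why the cells in this tutorial call `agg.values("samples")` afresh each time they loop over the results.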

We can now create a `samples` generator of every fit. As we saw in the `result.py` example scripts, an instance of 
the `Samples` class acts as an interface to the results of the non-linear search.

In [3]:
samples_gen = agg.values("samples")

When we print the length of this generator converted to a list, we see 6 `Samples` objects. These correspond to each 
fit of each search to each of our 3 images.

In [4]:
print("NestedSampler Samples: \n")
print(samples_gen)
print()
print("Total Samples Objects = ", len(list(samples_gen)), "\n")

NestedSampler Samples: 

<map object at 0x7fc7305cabb0>

Total Samples Objects =  6 



The `Samples` class contains all the parameter samples, which are stored as a list of lists where:
 
 - The outer list is the size of the total number of samples.
 - The inner list is the size of the number of free parameters in the fit.
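
To make the layout concrete, here is a small sketch of the same list-of-lists structure using made-up numbers (4 
samples of a model with 3 free parameters). The values are hypothetical and are not taken from any fit.

```python
# Hypothetical samples: 4 samples of a model with 3 free parameters.
parameters = [
    [0.10, 1.60, 0.05],
    [0.12, 1.58, 0.04],
    [0.09, 1.61, 0.06],
    [0.11, 1.59, 0.05],
]

total_samples = len(parameters)        # size of the outer list
total_parameters = len(parameters[0])  # size of each inner list

# All values drawn for the second parameter, one per sample.
second_parameter = [sample[1] for sample in parameters]

print(total_samples, total_parameters)
print(second_parameter)
```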

In [5]:
for samples in agg.values("samples"):

    print("All parameters of the very first sample")
    print(samples.parameters[0])
    print("The third parameter of the tenth sample")
    print(samples.parameters[9][2])

print("Samples: \n")
print(agg.values("samples"))
print()
print("Total Samples Objects = ", len(list(agg.values("samples"))), "\n")

All parameters of the very first sample
[-0.00018404201165731502, -0.00020283023113736408, 0.05212600640460746, -0.004667410484537495, 1.607816716372974, -0.001285068283737329, -0.0013132901986751123, 0.0524377232013, -0.000128356460447991, 0.9990330756458674, 0.8003503835610669, 3.999129016275198]
The third parameter of the tenth sample
0.05225756837166581
All parameters of the very first sample
[-0.15570866280033258, 0.18645606180638363, 0.1759574961128006, -0.01745526511305181, 2.330495057219531, 0.2051529486809672, 0.2895611558810568, -0.2337052438691021, 0.09014458861724935, 252269.01391967977, 17.651204720876144, 2.509999057457366]
The third parameter of the tenth sample
-0.4178913714730734
All parameters of the very first sample
[1.2380328786650727e-05, 0.0006405351601923813, 0.05221874493430732, 0.00034954619202387516, 1.6052459736872122, -0.0007194830672506537, -0.00031832054904504036, 0.05261093858195233, 1.6440849502267364e-06, 0.9998082574229821, 0.7997019381021037, 3.99993

The `Samples` class contains the log likelihood, log prior, log posterior and weights of every sample, where:

 - The log likelihood is the value evaluated from the likelihood function (e.g. -0.5 * chi_squared + the noise  
 normalization).
    
 - The log prior encodes information on how the priors on the parameters map the log likelihood value to the log
 posterior value.
      
 - The log posterior is log_likelihood + log_prior.
    
 - The weight gives information on how samples should be combined to estimate the posterior. The weight values 
 depend on the sampler used; for example, for MCMC they will all be 1.
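
As a rough illustration of how these four quantities relate, the sketch below uses hypothetical numbers for three 
samples of a single parameter. It only shows the arithmetic; it is not how PyAutoFit computes these values internally.

```python
# Hypothetical values for three samples (not taken from a real fit).
log_likelihoods = [-12.0, -10.5, -11.0]
log_priors = [1.0, 0.5, 0.8]
weights = [0.2, 0.5, 0.3]
values = [1.58, 1.60, 1.62]  # one parameter, e.g. an Einstein radius

# The log posterior of each sample is log_likelihood + log_prior.
log_posteriors = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]

# The weights say how to combine samples, here as a weighted mean.
weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)

print(log_posteriors)
print(weighted_mean)
```

Samples with a weight of 0.0, like the tenth sample printed below, contribute nothing to such an estimate.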

In [6]:
for samples in agg.values("samples"):
    print("log(likelihood), log(prior), log(posterior) and weight of the tenth sample.")
    print(samples.log_likelihoods[9])
    print(samples.log_priors[9])
    print(samples.log_posteriors[9])
    print(samples.weights[9])

log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-1e+99
8.056974987112707
-1e+99
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-2.0743222487187868e+16
6.293091985143382
-2.074322248718786e+16
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-1e+99
3.358583230601245
-1e+99
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-3.059214675171362e+16
3.245887331817932
-3.0592146751713616e+16
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-1e+99
4.232557729883772
-1e+99
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-2.6330629472910172e+16
5.5986403397993305
-2.633062947291017e+16
0.0


We can use the outputs to create a list of the maximum log likelihood model of each fit to our three images.
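
Conceptually, the maximum log likelihood model is just the parameter vector of the sample with the highest log 
likelihood. The sketch below shows that lookup on hypothetical lists; whether `max_log_likelihood_vector` is implemented 
exactly this way is an assumption.

```python
# Hypothetical samples: three parameter vectors and their log likelihoods.
parameters = [[0.0, 1.55], [0.0, 1.60], [0.1, 1.70]]
log_likelihoods = [-40.0, -3.5, -25.0]

# Index of the sample with the highest log likelihood.
best_index = max(range(len(log_likelihoods)), key=log_likelihoods.__getitem__)

# The corresponding parameter vector is the maximum log likelihood model.
max_log_likelihood_vector = parameters[best_index]

print(best_index, max_log_likelihood_vector)
```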

In [7]:
ml_vector = [samps.max_log_likelihood_vector for samps in agg.values("samples")]

print("Max Log Likelihood Model Parameter Lists: \n")
print(ml_vector, "\n\n")

Max Log Likelihood Model Parameter Lists: 

[[-0.0002672141518261873, -0.001066292869419615, 0.05217771187591518, 0.0018556408554747912, 1.599500971505542, -0.0005134542187038124, -0.0010578160686560887, 0.05253379311028625, 4.5428583175606966e-05, 1.0000312903292794, 0.7998193517260072, 4.000780765934079], [0.0006769528282948431, -0.0023651751083101453, -0.001323235535297087, 0.24916122140309466, 0.800040924229582, 0.1021443943039724, 0.09997293261366325, 0.001123740723307094, 0.24637459200874393, 0.3137202711434677, 0.9671301418967997, 1.9675560456243992], [0.0017370057736058229, 0.00045792679402902487, 0.052588118995531015, 0.001999027392053909, 1.5955725519185895, 1.5731242958029745e-05, 0.0006979262579355166, 0.05246107932104077, 8.540201604954474e-06, 0.9991309241218923, 0.7996079707844024, 3.999718768492638], [-0.004310775839461669, -0.0034890308733216547, 0.24936616998413486, 0.0016739761261757382, 0.9980551801403019, 0.1994827255529306, 0.20210740762888718, -0.0022027339999807

This provides us with lists of all model parameters. However, these are not much use on their own: which values 
correspond to which parameters?

The list of parameter names is available as a property of the `Model` included with the `Samples`, as are labels 
which can be used for labeling figures.
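
Pairing names with values is then a matter of zipping the two lists together. The sketch below uses the first seven 
names and maximum log likelihood values printed in this tutorial. Note that names such as `centre_0` repeat across the 
lens and source galaxies, so a plain dictionary keyed on the name would lose entries.

```python
# First seven names and values, copied from the outputs in this tutorial.
parameter_names = [
    "centre_0", "centre_1", "elliptical_comps_0", "elliptical_comps_1",
    "einstein_radius", "centre_0", "centre_1",
]
ml_vector = [
    -0.0002672141518261873, -0.001066292869419615, 0.05217771187591518,
    0.0018556408554747912, 1.599500971505542, -0.0005134542187038124,
    -0.0010578160686560887,
]

# A list of (name, value) pairs keeps every entry, including repeated names.
named_values = list(zip(parameter_names, ml_vector))

# A dict silently keeps only the last value for each repeated name.
as_dict = dict(named_values)

for name, value in named_values:
    print(f"{name:<20}{value:.4f}")

print(len(named_values), len(as_dict))
```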

In [8]:
for samples in agg.values("samples"):
    model = samples.model
    print(model)
    print(model.parameter_names)
    print(model.parameter_labels)

Galaxy (centre_0, GaussianPrior, mean = 0.0, sigma = 0.001), (centre_1, GaussianPrior, mean = 0.0, sigma = 0.001), (elliptical_comps_0, UniformPrior, lower_limit = 0.052, upper_limit = 0.053), (elliptical_comps_1, UniformPrior, lower_limit = -0.005, upper_limit = 0.005), (einstein_radius, UniformPrior, lower_limit = 1.59, upper_limit = 1.61), Galaxy (centre_0, GaussianPrior, mean = 0.0, sigma = 0.001), (centre_1, GaussianPrior, mean = 0.0, sigma = 0.001), (elliptical_comps_0, GaussianPrior, mean = 0.0526, sigma = 0.0001), (elliptical_comps_1, GaussianPrior, mean = 0.0, sigma = 0.0001), (intensity, LogUniformPrior, lower_limit = 0.999, upper_limit = 1.0001), (effective_radius, LogUniformPrior, lower_limit = 0.799, upper_limit = 0.801), (sersic_index, UniformPrior, lower_limit = 3.999, upper_limit = 4.001)
['centre_0', 'centre_1', 'elliptical_comps_0', 'elliptical_comps_1', 'einstein_radius', 'centre_0', 'centre_1', 'elliptical_comps_0', 'elliptical_comps_1', 'intensity', 'effective_radi

These lists will be used later for visualization; however, it is often more useful to create the model instance of 
every fit.

In [9]:
ml_instances = [samps.max_log_likelihood_instance for samps in agg.values("samples")]
print("Maximum Log Likelihood Model Instances: \n")
print(ml_instances, "\n")

Maximum Log Likelihood Model Instances: 

[<autofit.mapper.model.ModelInstance object at 0x7fc7121c7b80>, <autofit.mapper.model.ModelInstance object at 0x7fc6efcda1c0>, <autofit.mapper.model.ModelInstance object at 0x7fc7121bd0d0>, <autofit.mapper.model.ModelInstance object at 0x7fc7316444c0>, <autofit.mapper.model.ModelInstance object at 0x7fc6ef739e80>, <autofit.mapper.model.ModelInstance object at 0x7fc7305e55b0>] 



A model instance contains all the model components of our fit, most importantly the list of galaxies we specified in 
the pipeline.

In [10]:
print(ml_instances[0].galaxies)
print(ml_instances[1].galaxies)
print(ml_instances[2].galaxies)

<autofit.mapper.model.ModelInstance object at 0x7fc7121bd250>
<autofit.mapper.model.ModelInstance object at 0x7fc6ef6e6730>
<autofit.mapper.model.ModelInstance object at 0x7fc700ba5220>


These galaxies will be named according to the search (in this case, `lens` and `source`).

In [11]:
print(ml_instances[0].galaxies.lens)
print()
print(ml_instances[1].galaxies.source)

Redshift: 0.5
Mass Profiles:
EllIsothermal
centre: (-0.0002672141518261873, -0.001066292869419615)
elliptical_comps: (0.05217771187591518, 0.0018556408554747912)
axis_ratio: 0.9007599933964339
angle: 43.98159965990632
einstein_radius: 1.599500971505542
slope: 2.0
core_radius: 0.0
id: 1
_is_frozen: False
_frozen_cache: {}
_assertions: []
cls: <class 'autogalaxy.profiles.mass_profiles.total_mass_profiles.EllIsothermal'>

Redshift: 1.0
Light Profiles:
EllSersic
centre: (0.1021443943039724, 0.09997293261366325)
elliptical_comps: (0.001123740723307094, 0.24637459200874393)
axis_ratio: 0.6046507209992402
angle: 0.13066516662096977
intensity: 0.3137202711434677
effective_radius: 0.9671301418967997
sersic_index: 1.9675560456243992
id: 8
_is_frozen: False
_frozen_cache: {}
_assertions: []
cls: <class 'autogalaxy.profiles.light_profiles.EllSersic'>


Their `LightProfile` and `MassProfile` objects are also named according to the search.

In [12]:
print(ml_instances[0].galaxies.lens.mass)
print(ml_instances[1].galaxies.source.bulge)

EllIsothermal
centre: (-0.0002672141518261873, -0.001066292869419615)
elliptical_comps: (0.05217771187591518, 0.0018556408554747912)
axis_ratio: 0.9007599933964339
angle: 43.98159965990632
einstein_radius: 1.599500971505542
slope: 2.0
core_radius: 0.0
id: 1
_is_frozen: False
_frozen_cache: {}
_assertions: []
cls: <class 'autogalaxy.profiles.mass_profiles.total_mass_profiles.EllIsothermal'>
EllSersic
centre: (0.1021443943039724, 0.09997293261366325)
elliptical_comps: (0.001123740723307094, 0.24637459200874393)
axis_ratio: 0.6046507209992402
angle: 0.13066516662096977
intensity: 0.3137202711434677
effective_radius: 0.9671301418967997
sersic_index: 1.9675560456243992
id: 8
_is_frozen: False
_frozen_cache: {}
_assertions: []
cls: <class 'autogalaxy.profiles.light_profiles.EllSersic'>


We can access the `median pdf` model, which is the model computed by marginalizing over the samples of every parameter 
in 1D and taking the median of this PDF.
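
For weighted samples, the 1D median can be found by sorting the values of one parameter and walking along the 
cumulative weight until half of the total is reached. The function below is a minimal sketch of that idea with 
hypothetical data; `weighted_median` is an illustrative helper, not a PyAutoFit function, and the library may 
interpolate or bin differently.

```python
def weighted_median(values, weights):
    """Return the value at which the cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must be non-empty")


# Hypothetical 1D samples of one parameter and their weights.
values = [1.62, 1.58, 1.60, 1.61, 1.59]
weights = [0.05, 0.05, 0.5, 0.3, 0.1]

median = weighted_median(values, weights)
print(median)
```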

In [13]:
mp_vector = [samps.median_pdf_vector for samps in agg.values("samples")]
mp_instances = [samps.median_pdf_instance for samps in agg.values("samples")]

print("Median PDF Model Parameter Lists: \n")
print(mp_vector, "\n")
print("Median PDF Model Instances: \n")
print(mp_instances, "\n")
print(mp_instances[0].galaxies.lens.mass)
print()

Median PDF Model Parameter Lists: 

[[-0.0002672141518261873, -0.001066292869419615, 0.05217771187591518, 0.0018556408554747912, 1.599500971505542, -0.0005134542187038124, -0.0010578160686560887, 0.05253379311028625, 4.5428583175606966e-05, 1.0000312903292794, 0.7998193517260072, 4.000780765934079], [0.0012313248040523356, -0.0022241388654452745, -0.0009335755369794215, 0.24725105756988935, 0.7993260440880611, 0.1017779780666379, 0.0995073121213994, -0.0006346053192086077, 0.2446369732748091, 0.3106886709447523, 0.9725495789315471, 1.9742552350341542], [0.0017370057736058229, 0.00045792679402902487, 0.052588118995531015, 0.001999027392053909, 1.5955725519185895, 1.5731242958029745e-05, 0.0006979262579355166, 0.05246107932104077, 8.540201604954474e-06, 0.9991309241218923, 0.7996079707844024, 3.999718768492638], [-0.003499699773147686, -0.003637834892893185, 0.24906613800456112, 0.0018124653439860294, 0.9980181808439226, 0.19940354745949948, 0.20185380021048552, -0.0029523107180123246, 0

We can compute the model parameters at a given sigma value (e.g. at 3.0 sigma limits).

These parameter values do not account for covariance between the model parameters. For example, if two parameters are 
degenerate, this will find their values from the degeneracy in the `same direction` (e.g. both will be positive). We'll 
cover how to handle covariance in a later tutorial.

Here, I use "uv3" to signify this is an upper value at 3 sigma confidence, and "lv3" for the lower value.
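
A common convention is to read a sigma value as the fraction of a 1D Gaussian enclosed within +/- sigma, then take the 
matching lower and upper quantiles of the weighted 1D samples of each parameter. The sketch below follows that 
convention on hypothetical data. Both helper functions are illustrative, and that PyAutoFit uses this exact definition 
is an assumption.

```python
import math


def sigma_to_quantiles(sigma):
    """Lower and upper cumulative probabilities enclosing +/- sigma of a Gaussian."""
    enclosed = math.erf(sigma / math.sqrt(2.0))
    lower = 0.5 * (1.0 - enclosed)
    return lower, 1.0 - lower


def weighted_quantile(values, weights, quantile):
    """Smallest value whose cumulative normalized weight reaches `quantile`."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= quantile:
            return value
    return pairs[-1][0]


lower_q, upper_q = sigma_to_quantiles(3.0)

# Hypothetical, equally weighted 1D samples of a single parameter.
values = [i / 100 for i in range(101)]
weights = [1.0] * len(values)

lower_value = weighted_quantile(values, weights, lower_q)
upper_value = weighted_quantile(values, weights, upper_q)
print(lower_q, upper_q, lower_value, upper_value)
```

Because each parameter is treated on its own, this has the same limitation noted above: it ignores covariance between 
parameters.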

In [14]:
uv3_vectors = [
    samps.vector_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

uv3_instances = [
    samps.instance_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

lv3_vectors = [
    samps.vector_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

lv3_instances = [
    samps.instance_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

print("Errors Lists: \n")
print(uv3_vectors, "\n")
print(lv3_vectors, "\n")
print("Errors Instances: \n")
print(uv3_instances, "\n")
print(lv3_instances, "\n")

Errors Lists: 

[[0.0021263235033692255, 0.002649422016142297, 0.05298000340232624, 0.004907630798053206, 1.6095121634263094, 0.0024740043160680772, 0.0021004605319928167, 0.05287491482666605, 0.00020736845326054773, 1.0000891305846895, 0.8009732390275229, 4.0009322267927985], [0.004484305103491024, 0.001359804513999883, 0.0031521142739407014, 0.25091494608967035, 0.8006741074108346, 0.10387480605187943, 0.10230175772916295, 0.0031608045808866345, 0.24967826691370162, 0.3210725955489303, 0.9895266927492999, 2.00294232227352], [0.0017370057736058229, 0.002271902487608705, 0.052999971144637635, 0.0049739858700768, 1.6098566557676595, 0.0030885405077213097, 0.00235070829932263, 0.052819162334241936, 0.00022728026119491965, 1.0000906683230624, 0.8009751893863667, 4.000902296004236], [-0.0008271519727919498, -0.00045599449335245447, 0.2507471575133184, 0.004364808977349788, 0.9995615302056249, 0.20072590456037886, 0.20364382656044772, 0.00021830771609990674, 0.1543744998192272, 0.3273662752

We can compute the upper and lower errors on each parameter at a given sigma limit.

Here, "ue3" signifies the upper error at 3 sigma. 
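
The numbers printed in this tutorial are consistent with the upper error being the upper value at 3 sigma minus the 
median PDF value. The sketch below checks this for the first two parameters of the first fit, using values copied from 
the outputs above; the results can be compared by eye with the first entries printed by the next cell.

```python
# First two parameters of the first fit, copied from the outputs above.
median_pdf = [-0.0002672141518261873, -0.001066292869419615]
upper_3_sigma = [0.0021263235033692255, 0.002649422016142297]

# Upper error: distance from the median PDF value up to the 3 sigma upper value.
upper_errors = [upper - median for upper, median in zip(upper_3_sigma, median_pdf)]

print(upper_errors)
```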

In [15]:
ue3_vectors = [
    samps.error_vector_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

ue3_instances = [
    samps.error_instance_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

le3_vectors = [
    samps.error_vector_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]
le3_instances = [
    samps.error_instance_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

print("Errors Lists: \n")
print(ue3_vectors, "\n")
print(le3_vectors, "\n")
print("Errors Instances: \n")
print(ue3_instances, "\n")
print(le3_instances, "\n")

Errors Lists: 

[[0.002393537655195413, 0.003715714885561912, 0.0008022915264110572, 0.0030519899425784144, 0.010011191920767493, 0.0029874585347718897, 0.0031582766006489054, 0.0003411217163798025, 0.00016193987008494078, 5.784025541011317e-05, 0.001153887301515777, 0.00015146085871986514], [0.003252980299438688, 0.0035839433794451576, 0.004085689810920123, 0.003663888519781, 0.001348063322773485, 0.0020968279852415367, 0.0027944456077635405, 0.0037954099000952423, 0.0050412936388925245, 0.01038392460417803, 0.016977113817752798, 0.02868708723936564], [0.0, 0.00181397569357968, 0.00041185214910662016, 0.002974958478022891, 0.014284103849069973, 0.00307280926476328, 0.0016527820413871136, 0.0003580830132011645, 0.00021874005958996516, 0.0009597442011700652, 0.001367218601964315, 0.0011835275115981148], [0.002672547800355736, 0.003181840399540731, 0.0016810195087572644, 0.002552343633363759, 0.0015433493617023064, 0.0013223571008793844, 0.0017900263499621982, 0.0031706184341122315, 0.00

The maximum log likelihood of each model fit and its Bayesian log evidence (estimated via the nested sampling 
algorithm) are also available.

Because each fit is to a different image, comparing these values across fits is not very useful. However, in a later 
tutorial we'll look at using the aggregator for images that we fit with many different models and many different 
pipelines, in which case comparing the evidences allows us to perform Bayesian model comparison!
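
As a preview of what that comparison looks like, the sketch below compares two hypothetical log evidences for two 
models fitted to the same image. The numbers are made up for illustration and do not come from the fits in this 
tutorial.

```python
import math

# Hypothetical log evidences of two models fitted to the same image.
log_evidence_a = 5232.5
log_evidence_b = 5226.0

# Difference in log evidence; positive values favour model A.
delta_log_evidence = log_evidence_a - log_evidence_b

# The Bayes factor is the ratio of the two evidences. Working from the
# difference avoids overflow from exponentiating each log evidence separately.
bayes_factor = math.exp(delta_log_evidence)

print(delta_log_evidence, bayes_factor)
```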

In [16]:
print("Maximum Log Likelihoods and Log Evidences: \n")
print([max(samps.log_likelihoods) for samps in agg.values("samples")])
print([samps.log_evidence for samps in agg.values("samples")])

Maximum Log Likelihoods and Log Evidences: 

[-112148301.56441748, 5393.260056799782, -46299766.60634844, 4490.518202632665, -25387144.571267933, 3716.6430327584458]
[-112148306.22899555, 5232.516244037385, -46299771.270926505, 4326.044858852035, -25387149.235846, 3613.035611484433]


We can also print the "model_results" of all searches, which is a string that summarizes every fit's lens model, 
providing quick inspection of all results.

In [17]:
results = agg.model_results
print("Model Results Summary: \n")
print(results, "\n")

Model Results Summary: 



Bayesian Evidence                                                                         -112148306.22899555
Maximum Likelihood                                                                        -112148301.56441748

Maximum Log Likelihood Model:

galaxies
    lens
        mass
            centre
                centre_0                                                                  -0.000
                centre_1                                                                  -0.001
            elliptical_comps
                elliptical_comps_0                                                        0.052
                elliptical_comps_1                                                        0.002
            einstein_radius                                                               1.600
    source
        bulge
            centre
                centre_0                                                                  -0.001
                cen

The Probability Density Functions (PDFs) of the results can be plotted using the library:

 corner.py: https://corner.readthedocs.io/en/latest/

(Built-in visualization for PDFs and non-linear searches is a future feature of PyAutoFit, but for now you'll have to 
use the libraries yourself!).

(Uncomment the code below to make a corner.py plot.)

In [18]:
# import corner
#
# for samples in agg.values("samples"):
#
#     corner.corner(
#         xs=samples.parameters,
#         weights=samples.weights,
#         labels=samples.model.parameter_labels,
#     )

Finished.