Database 1: Samples
===================

After fitting a large suite of strong lens data, we can use the aggregator to load the results and manipulate,
interpret and visualize them using a Python script or Jupyter notebook.

This script uses the results generated by the script `/autolens_workspace/aggregator/phase_runner.py`, which fitted 3
simulated strong lenses with:

 - An `EllipticalIsothermal` `MassProfile` for the lens galaxy's mass.
 - An `EllipticalSersic` `LightProfile` for the source galaxy's light.

This fit was performed using one `PhaseImaging` object, and the first four tutorials (a1-a4) cover how to use the
aggregator on the results of `Phase`'s (as opposed to `Pipeline`'s). However, the aggregator API is extremely similar
across both, so what you learn using the aggregator with phases applies directly to the results of pipelines.

__Samples__

If you are familiar with the `Samples` object returned from a *PyAutoLens* model-fit (e.g. via a `Phase` or `Pipeline`),
you will be familiar with most of the content in this script. Nevertheless, the script also describes how to use
the `Aggregator`, so it will be useful for you too!

__File Output__

The results of this fit are in the `autolens_workspace/output/aggregator` folder. First, take a look in this folder.
Provided you haven't rerun the runner, you'll notice that all the results (e.g. samples, samples_backup,
model.results, images, etc.) are in .zip files as opposed to folders that can be instantly accessed.

This is because when the pipeline was run, the `remove_files` option in the `config/general.ini` was set to True.
This means all results (other than the .zip file) were removed. This feature is implemented because super-computers
often have a limit on the number of files allowed per user.

Bear in mind that all results are in .zip files; we'll come back to this point in a second.

In [1]:
%matplotlib inline
from pyprojroot import here
workspace_path = str(here())
%cd $workspace_path
print(f"Working Directory has been set to `{workspace_path}`")

from os import path
import autofit as af

/mnt/c/Users/Jammy/Code/PyAuto/autolens_workspace
Working Directory has been set to `/mnt/c/Users/Jammy/Code/PyAuto/autolens_workspace`


To set up the aggregator we simply pass it the folder of the results we want to load.

In [2]:
agg = af.Aggregator(directory=path.join("output", "database", "phase_runner"))

Aggregator loading phases... could take some time.

 A total of 3 phases and results were found.


Before we continue, take another look at the output folder. The .zip files containing results have now all been 
unzipped, such that the results are accessible on your laptop for navigation. This means you can run fits to many 
lenses on a super computer and easily unzip all the results on your computer afterwards via the aggregator.

To begin, let me quickly explain what a generator is in Python, for those unaware. A generator is an object that
produces its values one at a time, computing each one only when it is requested during iteration. The aggregator
creates all objects as generators, rather than as lists or dictionaries.

Why? Because lists store every entry in memory simultaneously. If you fit many lenses, you'll have lots of results and
therefore use a lot of memory. This will crash your laptop! On the other hand, a generator only holds the current
object in memory while it is being used; that memory is free to be overwritten afterwards. Thus, your laptop won't crash!

There are two things to bear in mind with generators:

    1) A generator has no length, thus to determine how many entries of data it corresponds to you first must convert 
       it to a list.
    
    2) Once we use a generator, we cannot use it again and we'll need to remake it.
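
Both points can be seen in a minimal, self-contained sketch. The `load_results` function and the lens names below are
hypothetical stand-ins used only for illustration; they are not part of the aggregator API.

```python
def load_results(names):
    """Yield one (hypothetical) result at a time instead of building a full list."""
    for name in names:
        # In a real workflow, an expensive load from disk would happen here.
        yield {"name": name}


results_gen = load_results(["lens_0", "lens_1", "lens_2"])

# 1) A generator has no length: calling len() on it raises a TypeError.
try:
    len(results_gen)
    has_length = True
except TypeError:
    has_length = False

# Converting to a list gives a length, but it also consumes the generator.
first_pass = list(results_gen)

# 2) The generator is now exhausted, so a second pass yields nothing.
second_pass = list(results_gen)

# To iterate again, the generator has to be remade.
third_pass = list(load_results(["lens_0", "lens_1", "lens_2"]))

print(has_length, len(first_pass), len(second_pass), len(third_pass))
```

This is why the cells in this tutorial call `agg.values("samples")` afresh each time they loop over the results.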

We can now create a `samples` generator of every fit, which creates `Samples` objects of our results. Each object
contains information on the result of the non-linear search.

In [3]:
samples_gen = agg.values("samples")

Printing the generator itself only shows a `map` object. Converting it to a list and printing its length shows that it
contains 3 different `NestSamples` instances, one for the fit of each phase to each of our 3 images.

In [4]:
print("NestedSampler Samples: \n")
print(samples_gen)
print()
print("Total Samples Objects = ", len(list(samples_gen)), "\n")

NestedSampler Samples: 

<map object at 0x7f7ab1672310>

Total Samples Objects =  3 



The `Samples` class contains all the parameter samples, which is a list of lists where:
 
 - The outer list is the size of the total number of samples.
 - The inner list is the size of the number of free parameters in the fit.

In [5]:
for samples in agg.values("samples"):

    print("All parameters of the very first sample")
    print(samples.parameters[0])
    print("The third parameter of the tenth sample")
    print(samples.parameters[9][2])

print("Samples: \n")
print(agg.values("samples"))
print()
print("Total Samples Objects = ", len(list(agg.values("samples"))), "\n")

All parameters of the very first sample
[0.022031680799566408, -0.0487721521816746, -0.07136903969116735, -0.25920024795345714, 2.53219054795621, -0.4720967159344951, -0.022198590640582862, -0.35478587433585573, 0.44876170136778276, 357434.7002037769, 0.22762051429746258, 1.857839928375425]
The third parameter of the tenth sample
0.22325616030157797
All parameters of the very first sample
[0.046670358408425434, -0.1140484729795512, 0.49811612127678223, -0.26138820712365296, 2.0074304788571764, -0.26399919034418545, 0.02326713583916184, 0.02611530837243495, 0.4622295042014137, 208075.33344557657, 6.640267810422836, 2.301354963797475]
The third parameter of the tenth sample
0.11913523859020043
All parameters of the very first sample
[-0.0823136987994281, -0.027281242252268315, -0.5196330736814809, 0.19705374772523238, 2.24634379739564, 0.14111810418266907, -0.1618805015037641, -0.0315027335158673, 0.3648089782925841, 241064.1793635043, 0.16143516880389752, 0.6597422546536564]
The third p

The `Samples` class contains the log likelihood, log prior, log posterior and weights of every sample, where:

   - The log likelihood is the value evaluated from the likelihood function (e.g. -0.5 * chi_squared + the noise 
     normalization).
    
   - The log prior encodes how the priors on the parameters map the log likelihood value to the log
     posterior value.
      
   - The log posterior is log_likelihood + log_prior.
    
   - The weight gives information on how samples should be combined to estimate the posterior. The weight values 
     depend on the sampler used; for example, for MCMC they will all be 1's.

In [6]:
for samples in agg.values("samples"):
    print("log(likelihood), log(prior), log(posterior) and weight of the tenth sample.")
    print(samples.log_likelihoods[9])
    print(samples.log_priors[9])
    print(samples.log_posteriors[9])
    print(samples.weights[9])

log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-307742744738.9641
2.768042556993659
-307742744736.19604
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-211018159113795.16
10.207397384731259
-211018159113784.94
0.0
log(likelihood), log(prior), log(posterior) and weight of the tenth sample.
-2235879633.1016784
332.16540244913153
-2235879300.936276
0.0
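
The relation log_posterior = log_likelihood + log_prior can be checked against the numbers printed above. The sketch
below copies those three sets of values from the output. Its second half uses made-up toy values and weights (an
assumption for illustration, not real output) to show one way weights can combine samples, here as a weighted mean.

```python
import math

# (log_likelihood, log_prior, log_posterior) of the tenth sample of each fit,
# copied from the output above.
tenth_samples = [
    (-307742744738.9641, 2.768042556993659, -307742744736.19604),
    (-211018159113795.16, 10.207397384731259, -211018159113784.94),
    (-2235879633.1016784, 332.16540244913153, -2235879300.936276),
]

# Check that log_likelihood + log_prior reproduces each printed log_posterior.
posterior_checks = [
    math.isclose(log_likelihood + log_prior, log_posterior, rel_tol=1e-12)
    for log_likelihood, log_prior, log_posterior in tenth_samples
]
print(posterior_checks)

# Hypothetical toy samples of a single parameter and their weights.
values = [0.79, 0.80, 0.81]
weights = [0.0, 0.25, 0.75]

# A sample with weight 0.0 (like the tenth samples above) contributes nothing.
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(weighted_mean)
```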


We can use the outputs to create a list of the maximum log likelihood model of each fit to our three images.

In [7]:
ml_vector = [samps.max_log_likelihood_vector for samps in agg.values("samples")]

print("Max Log Likelihood Model Parameter Lists: \n")
print(ml_vector, "\n\n")

Max Log Likelihood Model Parameter Lists: 

[[0.0007675768961561325, 5.42173220638556e-05, -0.0005072265748036187, 0.2496444565271391, 0.798740693949946, 0.10229898874608473, 0.10161058183099471, 0.0022279938361985017, 0.24810529383317712, 0.3083561943766081, 0.9763884197015772, 1.9885203904921538], [-0.002065524468680003, -7.601316107790045e-05, 0.25171815559124744, 0.0009702424242188306, 0.999554191034665, 0.1998346039471717, 0.2014336700603386, -0.0022583328711807986, 0.151976670434614, 0.3052596670353206, 1.481277475497304, 2.4918961186251547], [0.0006523952571976681, -0.0007984663806759884, 0.2489036284102848, -0.00024190232029099864, 1.2003217235075145, 0.300448666517787, 0.3002555061681218, 0.0005948069762867818, 0.223615140978937, 0.30059106767447363, 1.9886478197053985, 3.0053086511827676]] 




This provides us with lists of all model parameters. However, this isn't much use on its own: which values correspond
to which parameters?

The list of parameter names is available as a property of the `Model` included with the `Samples`, as are labels 
which can be used for labeling figures.

In [8]:
for samples in agg.values("samples"):
    model = samples.model
    print(model)
    print(model.parameter_names)
    print(model.parameter_labels)

Galaxy (centre_0, GaussianPrior, mean = 0.0, sigma = 0.001), (centre_1, GaussianPrior, mean = 0.0, sigma = 0.001), (elliptical_comps_0, UniformPrior, lower_limit = 0.052, upper_limit = 0.053), (elliptical_comps_1, UniformPrior, lower_limit = -0.005, upper_limit = 0.005), (einstein_radius, UniformPrior, lower_limit = 1.59, upper_limit = 1.61), Galaxy (centre_0, GaussianPrior, mean = 0.0, sigma = 0.001), (centre_1, GaussianPrior, mean = 0.0, sigma = 0.001), (elliptical_comps_0, GaussianPrior, mean = 0.0526, sigma = 0.0001), (elliptical_comps_1, GaussianPrior, mean = 0.0, sigma = 0.0001), (intensity, LogUniformPrior, lower_limit = 0.999, upper_limit = 1.0001), (effective_radius, LogUniformPrior, lower_limit = 0.799, upper_limit = 0.801), (sersic_index, UniformPrior, lower_limit = 3.999, upper_limit = 4.001), None, None
['centre_0', 'centre_1', 'elliptical_comps_0', 'elliptical_comps_1', 'einstein_radius', 'centre_0', 'centre_1', 'elliptical_comps_0', 'elliptical_comps_1', 'intensity', 'ef
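
As a small sketch of how the names answer the question of which value is which, the names and values can be paired by
position. The five names and values below are copied from the outputs above (the first five entries of the first
fit); the variable names are illustrative only.

```python
# First five parameter names and maximum log likelihood values of the first fit,
# copied from the outputs above (the remaining entries are omitted here).
parameter_names = [
    "centre_0",
    "centre_1",
    "elliptical_comps_0",
    "elliptical_comps_1",
    "einstein_radius",
]
ml_values = [
    0.0007675768961561325,
    5.42173220638556e-05,
    -0.0005072265748036187,
    0.2496444565271391,
    0.798740693949946,
]

# Pair by position. A list of tuples is used rather than a dict because the full
# name list repeats names (e.g. `centre_0` appears for both galaxies), and a dict
# would silently overwrite the earlier entries.
named_values = list(zip(parameter_names, ml_values))

for name, value in named_values:
    print(f"{name:<20} {value: .6f}")
```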

These lists will be used later for visualization; however, it is often more useful to create the model instance of every fit.

In [9]:
ml_instances = [samps.max_log_likelihood_instance for samps in agg.values("samples")]
print("Maximum Log Likelihood Model Instances: \n")
print(ml_instances, "\n")

Maximum Log Likelihood Model Instances: 

[<autofit.mapper.model.ModelInstance object at 0x7f7a731369d0>, <autofit.mapper.model.ModelInstance object at 0x7f7a959dab20>, <autofit.mapper.model.ModelInstance object at 0x7f7a73136d90>] 



A model instance contains all the model components of our fit, most importantly the list of galaxies we specified in 
the pipeline.

In [10]:
print(ml_instances[0].galaxies)
print(ml_instances[1].galaxies)
print(ml_instances[2].galaxies)

<autofit.mapper.model.ModelInstance object at 0x7f7a73136fd0>
<autofit.mapper.model.ModelInstance object at 0x7f7a9581f1f0>
<autofit.mapper.model.ModelInstance object at 0x7f7a73136f70>


These galaxies will be named according to the phase (in this case, `lens` and `source`).

In [11]:
print(ml_instances[0].galaxies.lens)
print()
print(ml_instances[1].galaxies.source)

Redshift: 0.5
Mass Profiles:
EllipticalIsothermal
centre: (0.0007675768961561325, 5.42173220638556e-05)
elliptical_comps: (-0.0005072265748036187, 0.2496444565271391)
axis_ratio: 0.6004545651809539
phi: -0.05820658388813051
einstein_radius: 0.798740693949946
slope: 2.0
core_radius: 0.0
id: 27
_assertions: []
cls: <class 'autogalaxy.profiles.mass_profiles.total_mass_profiles.EllipticalIsothermal'>

Redshift: 1.0
Light Profiles:
EllipticalSersic
centre: (0.1998346039471717, 0.2014336700603386)
elliptical_comps: (-0.0022583328711807986, 0.151976670434614)
axis_ratio: 0.7361209843631151
phi: -0.4256686875728078
intensity: 0.3052596670353206
effective_radius: 1.481277475497304
sersic_index: 2.4918961186251547
id: 60
_assertions: []
cls: <class 'autogalaxy.profiles.light_profiles.EllipticalSersic'>


Their `LightProfile`'s and `MassProfile`'s are also named according to the phase.

In [12]:
print(ml_instances[0].galaxies.lens.mass)
print(ml_instances[1].galaxies.source.bulge)

EllipticalIsothermal
centre: (0.0007675768961561325, 5.42173220638556e-05)
elliptical_comps: (-0.0005072265748036187, 0.2496444565271391)
axis_ratio: 0.6004545651809539
phi: -0.05820658388813051
einstein_radius: 0.798740693949946
slope: 2.0
core_radius: 0.0
id: 27
_assertions: []
cls: <class 'autogalaxy.profiles.mass_profiles.total_mass_profiles.EllipticalIsothermal'>
EllipticalSersic
centre: (0.1998346039471717, 0.2014336700603386)
elliptical_comps: (-0.0022583328711807986, 0.151976670434614)
axis_ratio: 0.7361209843631151
phi: -0.4256686875728078
intensity: 0.3052596670353206
effective_radius: 1.481277475497304
sersic_index: 2.4918961186251547
id: 60
_assertions: []
cls: <class 'autogalaxy.profiles.light_profiles.EllipticalSersic'>


We can also access the `median pdf` model, which is the model computed by marginalizing over the samples of every 
parameter in 1D and taking the median of this PDF.

In [13]:
mp_vector = [samps.median_pdf_vector for samps in agg.values("samples")]
mp_instances = [samps.median_pdf_instance for samps in agg.values("samples")]

print("Median PDF Model Parameter Lists: \n")
print(mp_vector, "\n")
print("Most probable Model Instances: \n")
print(mp_instances, "\n")
print(mp_instances[0].galaxies.lens.mass)
print()

Median PDF Model Parameter Lists: 

[[0.00048824731540837723, -0.00032434836688285994, -0.0006117548558581092, 0.25122874169252024, 0.7992610432752469, 0.1021496221191687, 0.10151722025675064, 0.0029923247412953482, 0.24873419140369935, 0.30949103498829666, 0.9745378178567415, 1.9861294744144833], [-0.0024972405575819493, -0.0008537665826197714, 0.2518371112789252, 0.0009378403154186794, 0.9996527483122342, 0.19971864567013115, 0.20142003474703757, -0.001486026395862174, 0.1519731987186642, 0.3068973590677394, 1.4756443693792525, 2.4872562979265043], [7.62089755288875e-05, -0.0010701158575068747, 0.24864406707078723, -0.0008156222532175651, 1.2003492337447548, 0.3001880252998633, 0.30021028620623086, 0.0005117569471182748, 0.22293565502823065, 0.29966478846801187, 1.991542555198806, 3.0112112688056483]] 

Most probable Model Instances: 

[<autofit.mapper.model.ModelInstance object at 0x7f7a9786b8b0>, <autofit.mapper.model.ModelInstance object at 0x7f7ab1667040>, <autofit.mapper.model.M
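
To make "the median of the 1D PDF" concrete, here is a minimal sketch of a weighted median for one parameter. It is an
illustration under stated assumptions, not necessarily how PyAutoFit computes the value: `weighted_quantile` is a
hypothetical helper, and the sample values and weights are made up.

```python
def weighted_quantile(values, weights, quantile):
    """Return the smallest value whose cumulative weight reaches `quantile` of the total."""
    pairs = sorted(zip(values, weights))
    total_weight = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= quantile * total_weight:
            return value
    return pairs[-1][0]


# Hypothetical samples of a single parameter and their weights.
einstein_radius_samples = [0.81, 0.79, 0.80, 0.82, 0.78]
sample_weights = [0.60, 0.05, 0.20, 0.10, 0.05]

# The weighted median follows the weights, not just the ordering of the values.
median_value = weighted_quantile(einstein_radius_samples, sample_weights, 0.5)

# With equal weights the same function gives the ordinary median.
unweighted_median = weighted_quantile(einstein_radius_samples, [1.0] * 5, 0.5)

print(median_value, unweighted_median)
```

Repeating this for every parameter in turn gives one value per parameter, which is the kind of vector returned above.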

We can compute the model parameters at a given sigma value (e.g. at 3.0 sigma limits).

These parameter values do not account for covariance between the model parameters. For example, if two parameters are
degenerate, this will find their values from the degeneracy in the `same direction` (e.g. both will be positive). We'll
cover how to handle covariance in a later tutorial.

Here, I use "uv3" to signify this is an upper value at 3 sigma confidence, and "lv3" for the lower value.

In [14]:
uv3_vectors = [
    samps.vector_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

uv3_instances = [
    samps.instance_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

lv3_vectors = [
    samps.vector_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

lv3_instances = [
    samps.instance_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

print("Errors Lists: \n")
print(uv3_vectors, "\n")
print(lv3_vectors, "\n")
print("Errors Instances: \n")
print(uv3_instances, "\n")
print(lv3_instances, "\n")

Errors Lists: 

[[0.0035369801299268374, 0.003545776837072363, 0.004032128940248398, 0.2575587360929576, 0.8009156798321182, 0.10439053389198051, 0.10422248777780259, 0.008113196126221757, 0.2540167372900139, 0.3197830702323754, 0.9932534847928747, 2.0211631974223727], [0.00012878649045025846, 0.002106594700166435, 0.2547582795758186, 0.003626952263063246, 1.0015984043587087, 0.20135653613468074, 0.20356690439433842, 0.002139848328073963, 0.15519818811319988, 0.32127593856898257, 1.509718349803412, 2.523062673742351], [0.0028482474971894866, 0.0014327865800394357, 0.2508669679983052, 0.0014647581110305385, 1.202212699593703, 0.3019178127022056, 0.3016277488196638, 0.003204321261870212, 0.22589711892967013, 0.3112043006855008, 2.0299322481139828, 3.0406813085260804]] 

[[-0.00206834378220221, -0.00463810720065155, -0.005509177167563675, 0.24466192311594176, 0.7976119691177757, 0.10021263544878065, 0.09877754543960297, -0.0015225621705862572, 0.24360080904120018, 0.2995606056466964, 0.95
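
A sigma value is commonly turned into limits on a 1D PDF by converting it to the probability enclosed by a Gaussian,
then reading off the matching lower and upper quantiles. The sketch below shows that conversion; `sigma_to_quantiles`
is a hypothetical helper, and the assumption that the library follows this convention is not confirmed by this
tutorial.

```python
import math


def sigma_to_quantiles(sigma):
    """Return the (lower, upper) quantile levels enclosing a 1D Gaussian's +/- sigma interval."""
    # Probability contained within +/- sigma of the mean of a 1D Gaussian.
    enclosed = math.erf(sigma / math.sqrt(2.0))
    lower = 0.5 * (1.0 - enclosed)
    return lower, 1.0 - lower


lower_q, upper_q = sigma_to_quantiles(3.0)
print(lower_q, upper_q)
```

Under this convention, the lower and upper values at 3 sigma are each parameter's 1D PDF evaluated at these two
quantile levels.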

We can compute the upper and lower errors on each parameter at a given sigma limit.

Here, "ue3" signifies the upper error at 3 sigma, and "le3" the lower error.

In [15]:
ue3_vectors = [
    samps.error_vector_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

ue3_instances = [
    samps.error_instance_at_upper_sigma(sigma=3.0) for samps in agg.values("samples")
]

le3_vectors = [
    samps.error_vector_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]
le3_instances = [
    samps.error_instance_at_lower_sigma(sigma=3.0) for samps in agg.values("samples")
]

print("Errors Lists: \n")
print(ue3_vectors, "\n")
print(le3_vectors, "\n")
print("Errors Instances: \n")
print(ue3_instances, "\n")
print(le3_instances, "\n")

Errors Lists: 

[[0.0030487328145184602, 0.003870125203955223, 0.004643883796106507, 0.006329994400437355, 0.0016546365568712584, 0.0022409117728118128, 0.002705267521051949, 0.005120871384926408, 0.005282545886314549, 0.010292035244078734, 0.018715666936133135, 0.03503372300788943], [0.002626027048032208, 0.0029603612827862067, 0.0029211682968933728, 0.0026891119476445662, 0.0019456560464745642, 0.0016378904645495962, 0.0021468696473008475, 0.003625874723936137, 0.0032249893945356933, 0.01437857950124316, 0.034073980424159434, 0.03580637581584689], [0.002772038521660599, 0.0025029024375463104, 0.002222900927517979, 0.0022803803642481037, 0.0018634658489482536, 0.0017297874023422533, 0.0014174626134329515, 0.0026925643147519373, 0.002961463901439476, 0.011539512217488945, 0.0383896929151768, 0.029470039720432162]] 

[[0.002556591097610587, 0.00431375883376869, 0.004897422311705566, 0.0065668185765784814, 0.0016490741574712864, 0.0019369866703880523, 0.0027396748171476665, 0.00451488691
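
The printed outputs suggest that these errors are the distances between the median PDF value and the values at the
sigma limits. This can be checked for the first parameter of the first fit, using numbers copied from the outputs
above (the variable names are illustrative only):

```python
import math

# First parameter of the first fit, copied from the outputs above.
median_value = 0.00048824731540837723  # median PDF vector
upper_value = 0.0035369801299268374  # value at the upper 3 sigma limit
lower_value = -0.00206834378220221  # value at the lower 3 sigma limit

# Errors as distances from the median to each limit.
upper_error = upper_value - median_value
lower_error = median_value - lower_value

# Compare against the printed upper and lower errors for the same parameter.
matches_upper = math.isclose(upper_error, 0.0030487328145184602, rel_tol=1e-9)
matches_lower = math.isclose(lower_error, 0.002556591097610587, rel_tol=1e-9)

print(upper_error, lower_error, matches_upper, matches_lower)
```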

The maximum log likelihood of each model fit and its Bayesian log evidence (estimated via the nested sampling 
algorithm) are also available.

Given each fit is to a different image, these are not very useful. However, in a later tutorial we'll look at using 
the aggregator for images that we fit with many different models and many different pipelines, in which case comparing 
the evidences allows us to perform Bayesian model comparison!

In [16]:
print("Maximum Log Likelihoods and Log Evidences: \n")
print([max(samps.log_likelihoods) for samps in agg.values("samples")])
print([samps.log_evidence for samps in agg.values("samples")])

Maximum Log Likelihoods and Log Evidences: 

[5395.531181179845, 4513.429449741024, 3773.9966422819834]
[5345.968763974438, 4460.490606147629, 3720.081124174727]
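
As a preview of what comparing evidences looks like, here is a minimal sketch of a Bayes factor computed from two log
evidences. The two values are hypothetical and stand for two different models fitted to the same image; they are not
taken from the fits above, which are to different images and so cannot be compared this way.

```python
import math

# Hypothetical log evidences of two different models fitted to the SAME image.
log_evidence_model_a = 1000.0
log_evidence_model_b = 995.0

# The Bayes factor is the ratio of evidences, i.e. the exponential of the
# difference of log evidences.
delta_log_evidence = log_evidence_model_a - log_evidence_model_b
bayes_factor = math.exp(delta_log_evidence)

# Assuming both models are equally likely a priori, the posterior probability
# of model A follows from the Bayes factor.
probability_model_a = 1.0 / (1.0 + math.exp(-delta_log_evidence))

print(delta_log_evidence, bayes_factor, probability_model_a)
```

Working with the difference of log evidences avoids exponentiating the individual values, which would overflow for
log evidences in the thousands.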


We can also print the "model_results" of all phases, which is a string that summarizes every fit's lens model, providing
quick inspection of all results.

In [17]:
results = agg.model_results
print("Model Results Summary: \n")
print(results, "\n")

Model Results Summary: 



Bayesian Evidence                                                                         5345.96876397
Maximum Likelihood                                                                        5395.53118118

Maximum Log Likelihood Model:

galaxies
    lens
        mass
            centre
                centre_0                                                                  0.001
                centre_1                                                                  0.000
            elliptical_comps
                elliptical_comps_0                                                        -0.001
                elliptical_comps_1                                                        0.250
            einstein_radius                                                               0.799
    source
        bulge
            centre
                centre_0                                                                  0.102
                centre_1         

The Probability Density Functions (PDF's) of the results can be plotted using the library:

 corner.py: https://corner.readthedocs.io/en/latest/

(Built-in visualization for PDF's and non-linear searches is a future feature of PyAutoFit, but for now you'll have to 
use the libraries yourself!).

(Uncomment the code below to make a corner.py plot.)

In [18]:
# import corner
#
# for samples in agg.values("samples"):
#
#     corner.corner(
#         xs=samples.parameters,
#         weights=samples.weights,
#         labels=samples.model.parameter_labels,
#     )

Finished.