## Running Group Finder

This notebook calls the functions that preprocess the input data, run the group finder, and postprocess the results to build a GroupCatalog object, which mostly wraps a pandas DataFrame containing the resulting group catalog data.

After running this for a given GroupCatalog definition, a pickled copy of the GroupCatalog object exists on disk and can be deserialized elsewhere for analysis; see post_plots.ipynb for an example.
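
As a rough illustration of that round trip, the sketch below pickles a small wrapper object to disk and loads it back. `TinyCatalog`, its fields, the helper functions, and the file name are hypothetical stand-ins: the real GroupCatalog wraps a pandas DataFrame rather than a list of dicts, and the project's own `serialize`/`deserialize` helpers may work differently.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field


@dataclass
class TinyCatalog:
    """Hypothetical stand-in for GroupCatalog; not the project's class."""
    name: str
    rows: list = field(default_factory=list)  # stand-in for the wrapped DataFrame


def dump_catalog(catalog, path):
    # Write the whole object, including its data, to a pickle file.
    with open(path, "wb") as f:
        pickle.dump(catalog, f)


def load_catalog(path):
    # Only unpickle files you trust: pickle can execute code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


original = TinyCatalog("demo", [{"IGRP": 0, "IS_SAT": False}, {"IGRP": 0, "IS_SAT": True}])
with tempfile.TemporaryDirectory() as tmpdir:
    pickle_path = os.path.join(tmpdir, "demo.pickle")
    dump_catalog(original, pickle_path)
    restored = load_catalog(pickle_path)

# The restored object is an equal but independent copy of the original.
print(restored == original, restored is original)
```

Because the pickle stores the full object state, the analysis notebook can pick up where this one left off without rerunning the group finder.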

In [None]:
import sys
import matplotlib.pyplot as plt
from astropy.table import Table, join
import astropy.io.fits as fits
import astropy.units as u  # used for the column units when building the output table

if './SelfCalGroupFinder/py/' not in sys.path:
    sys.path.append('./SelfCalGroupFinder/py/')
from groupcatalog import *
import catalog_definitions as cat
from pyutils import *
from dataloc import *
import plotting as pp
%load_ext autoreload
%autoreload 2

In [None]:
datasets_to_run: list[GroupCatalog] = []
#datasets_to_run.extend(cat.sdss_list)
#datasets_to_run.extend(cat.uchuu_list)
#datasets_to_run.extend(cat.mxxl_list)
#datasets_to_run.extend(cat.bgs_sv3_list)  
#datasets_to_run.extend(cat.bgs_y1_list)  
#datasets_to_run.extend(cat.bgs_y3_list)  
#datasets_to_run.extend(cat.bgs_aux_list)

# TODO LOA columns not same as KIBO I guess...

gc = cat.bgs_sv3_pz_2_4_10p_c1

datasets_to_run.extend([
    #gc,
    cat.bgs_y1_pzp_2_4_c1,
    #cat.bgs_y3_pzp_2_4_c1,
    #cat.bgs_y1_hybrid8_mcmc,
])

# run_group_finder() took 189.6 seconds for bgs_y1_pzp_2_4_c1
# nanoflann: run_group_finder() took 65.4 seconds. groupfind() took 40.35 sec. All iterations took 13.72s.
# run_group_finder() took 67.4 seconds. groupfind() took 40.62 sec. All iterations took 11.95s.
# No output: run_group_finder() took 57.7 seconds.



# run_group_finder() took 13.4 seconds for bgs_y1mini_pzp_2_4_c1
# nanoflann: run_group_finder() took 12.3s. groupfind() took 2.05 sec. All iterations took 0.65s.
# run_group_finder() took 11.9 seconds. groupfind() took 1.89 sec. All iterations took 0.56s.
# No output: run_group_finder() took 11.4 seconds.


for d in datasets_to_run:
    #d = deserialize(d)
    d.preprocess()
    success = d.run_group_finder(popmock=True, profile=True, silent=False)
    if not success:
        print(f"Group finder failed for {d.name}")
        continue
    d.calc_wp_for_mock()
    d.postprocess()
    d.dump()
    d.chisqr()

    #d = deserialize(d)
    #d.calculate_projected_clustering(with_extra_randoms=True) # 15m
    #d.calculate_projected_clustering_in_magbins(with_extra_randoms=True) # 45m
    #serialize(d)


Pre-processing...
Reading data from  /mount/sirocco1/imw2293/GROUP_CAT/DATA/BGS_IRON/ian_BGS_merged.fits

Mode PHOTOZ PLUS
10,805,993 objects in file
17,302 galaxies (0.16%) have a SGA collision, are not SGA centrals, and are blue enough to remove.
212,528 galaxies (1.97%) have fracflux in two bands too high to keep.
5,656,813 galaxies in the neighbor catalog.
2,008,391 galaxies left for main catalog after filters.
429539 (21.4)% need redshifts
Matching 429,539 lost galaxies to 543,532 SDSS galaxies
21,922 of 429,539 lost galaxies matched to SDSS catalog (would have matched 22,372 with 3")
18,333 are reasonable matches given the photo-z.
(18333, 50) (18333, 50)
Quiescent Fraction for Dn4000: 61.39% (N=18333)
Quiescent Fraction for missing: nan% (N=0)
Overall Quiescent Fraction after very blue cut: 61.13%
18,333 of 429,539 redshifts taken from SDSS.
411,206 remaining galaxies need redshifts.
Initializing PhotometricRedshiftGuesser for Mode.PHOTOZ_PLUS_v2


  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)


Assigning missing redshifts... 
Using given parameters: N=8, p=([1.2938, 1.5467, 3.0134], [1.2229, 0.8628, 2.5882], [0.8706, 0.6126, 2.4447], [1.1163, 1.2938, 3.165])
Assigning missing redshifts complete.
Neighbor usage %: 66.69
(411206, 50) (411206, 50)
Quiescent Fraction for Dn4000: 47.65% (N=411206)
Quiescent Fraction for missing: nan% (N=0)
Overall Quiescent Fraction after very blue cut: 47.33%
Catalog contains 964,084 quiescent and 1,044,307 star-forming galaxies
3,639 galaxies have redshifts outside the range of the catalog and will be removed.
1,734 unobserved galaxies have implied log(L_gal) > 11.60 and will be removed.
0 galaxies have nan log_L_gal and will be removed.
Final Catalog Size: 2,004,739.
Galprops pickling took 1.2981 seconds
Output file will be /mount/sirocco1/imw2293/GROUP_CAT/OUTPUT/BGS_Y1_PZP_V2.4_C1/BGS Y1 PZP v2.4 C1.dat
Time for file writing: 7.36
Pre-processing complete in 6.4e+01 seconds.
Skipping pre-processing
Running Group Finder for BGS Y1 PZP v2.4 C1
[

input> FLUXLIM: 1, COLOR: 1, STELLAR_MASS: 0 
input> z: 0.001026-0.499989, frac_area: 0.064968
input> wcen ON: 19.737869 6.394322 24.657219 11.571343 33.116539 9.403598
input> Bsat ON: -3.755194 16.879988 9.941906 0.958446
input> SECOND_PARAMETER= 0
Allocating space for [2004739] galaxies
min vmax= 8.753051e+00 max vmax= 1.479995e+11
Done reading in from [/mount/sirocco1/imw2293/GROUP_CAT/OUTPUT/BGS_Y1_PZP_V2.4_C1/BGS Y1 PZP v2.4 C1.dat]
Sorting galaxies...
Done sorting galaxies.
Starting inverse-sham...
Done inverse-sham.
Building KD-tree...
Done building KD-tree. 2004739
iter 1 ngroups=1606499 fsat=0.198640 (kdtime=0.61 3.66)
iter 2 ngroups=1556831 fsat=0.223608 (kdtime=0.52 3.42)
iter 3 ngroups=1546286 fsat=0.228811 (kdtime=0.53 3.41)
iter 4 ngroups=1543401 fsat=0.230199 (kdtime=0.52 3.41)
iter 5 ngroups=1542166 fsat=0.230777 (kdtime=0.53 3.44)
Group finding complete. All iterations took 17.34s.
groupfind() took 44.10 sec
lsat_model> Applying Lsat model...
Writing LSAT to pipe
Readi

Group Finder completed successfully.
run_group_finder() took 53.7 seconds.
Running Corrfunc on mock populated with HOD from this sample.
Done with wp on mock populated with HOD from this sample (time = 1.8s).
Post-processing...
Post-processing done.
Red Clustering χ^2:  [ 77. 146.  83.  12.   9.   4.]
Blue Clustering χ^2:  [ 9.  4. 12.  3. 11.  2.]
No sep Clustering χ^2:  [0 0 0 0 0 0]
LSat χ^2:  [0. 2. 3. 1. 0. 1. 0. 0. 1. 1. 0. 0. 1. 7. 0. 5. 2. 1. 5. 0.]
χ^2: 402.9. χ^2/DOF: 2.014 (dof=200)
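
The totals in the last line of the output can be cross-checked by hand. The sketch below assumes the printed arrays are per-bin χ² contributions that simply add up, which the output does not state explicitly. The values are copied from the rounded arrays above (the "No sep" array is all zeros and is omitted), so their sum comes out at 402, slightly short of the reported 402.9.

```python
# Rounded per-bin values copied from the output above.
red = [77, 146, 83, 12, 9, 4]
blue = [9, 4, 12, 3, 11, 2]
lsat = [0, 2, 3, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 7, 0, 5, 2, 1, 5, 0]
dof = 200  # as reported in the output

# Assumption: the total is the plain sum of all contributions.
chisqr_total = sum(red) + sum(blue) + sum(lsat)
chisqr_per_dof = chisqr_total / dof

print(f"chi^2 = {chisqr_total}, chi^2/dof = {chisqr_per_dof:.3f}")
# prints "chi^2 = 402, chi^2/dof = 2.010"
```

The red clustering bins dominate the total (331 of 402 in the rounded values), so that is where the fit is weakest for this run.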


In [None]:
#χ^2: 663.7. χ^2/DOF: 3.319 (dof=200)

In [None]:
pp.proj_clustering_plot(gc)
pp.lsat_data_compare_plot(gc)
pp.hod_plot(gc)
pp.single_plots(gc)

In [None]:
bgs_sv3_pz_2_4_10p = deserialize(cat.bgs_sv3_pz_2_4_10p)
bgs_sv3_pz_2_4_10p.add_jackknife_err_to_proj_clustering(with_extra_randoms=True, for_mag_bins=False)
serialize(bgs_sv3_pz_2_4_10p)

## Test of writing the sharable output file


In [None]:
catalog = deserialize(cat.bgs_sv3_pz_2_4_10p)
catalog.all_data['Z_ASSIGNED_FLAG'] = catalog.all_data['Z_ASSIGNED_FLAG'].astype('int32')
columns_to_write = [
    'TARGETID',
    'RA',
    'DEC',
    'Z',
    'L_GAL',
    'VMAX',
    'P_SAT',
    'M_HALO',
    'N_SAT',
    'L_TOT',
    'IGRP',
    'WEIGHT',
    'APP_MAG_R',
    'Z_ASSIGNED_FLAG',
    'G_R',
    'IS_SAT',
    'QUIESCENT',
    'MSTAR',
]

table = Table.from_pandas(
    catalog.all_data.loc[:, columns_to_write],
    units={ 
        'RA': u.degree,
        'DEC': u.degree,
        'L_GAL': u.solLum,
        'VMAX': u.Mpc**3,
        'M_HALO': u.solMass,
        'L_TOT': u.solLum,
        'MSTAR': u.solMass
    } # Others are dimensionless
    )
table.info.name = "GALAXIES"
table.info

In [None]:
frompath = catalog.write_sharable_output_file()

read = Table.read(frompath)
read

In [None]:

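# Rename the first extension in place. mode='update' writes the change back when
# the file is closed, so the file is never overwritten while it is still open.
with fits.open(frompath, mode='update') as hdul:
    hdul.info()
    hdul[1].name = "GALAXIES"
    hdul.info()

# Reopen to confirm the new extension name was saved.
with fits.open(frompath) as hdul:
    hdul.info()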