In [387]:
run main_project_file.py

"main_project_file" will run the following:
Section 1: "master_data*.py"
Section 3: "data_mass_completeness*.py"

Beginning "master_data*.py"
"master_data*.py" Section 1: import data beginning...

Summary Table 1 - Catalogue by SUB-type:
     Property      Total M0416 M1149 M0717 A370 A1063 A2744
------------------ ----- ----- ----- ----- ---- ----- -----
FULL PHOT (Parent) 35753  6224  5825  5168 5697  5896  6943
             Total 44465  7431  6868  6370 6795  7611  9390
       spec & phot  1916   378   334   260  216   227   501
         only phot 33837  5846  5491  4908 5481  5669  6442
         spec only   104    10    10    32    5     8    39
           no data  8192  1152   994  1016 1033  1636  2361
             stars   416    45    39   154   60    71    47
               SUM 44465  7431  6868  6370 6795  7611  9390

Other skipped objects: 0 
NOTE: "use_phot==1": 35964 ;  "use_phot==0": 8501
NOTE: phot only + (spec+phot) samples are the FULL PHOT (Parent) sample w/: 35753 
NO


Membership definition [spec,phot]: [0.01, 0.04] 
Limiting magnitudes per Shipley et al. 2018: [27.5 27.4 26.9 27.1 27.4 27.2]
Limiting masses: [7.14 7.05 7.75 7.26 7.34 7.07]
Magnitudes at limiting mass: [27.53692  27.387985 26.90256  27.070026 27.385109 27.242561]

"data_mass_completeness*.py" Full CATALOGUE breakdown
             Property              Total M0416 M1149 M0717 A370 A1063 A2744
---------------------------------- ----- ----- ----- ----- ---- ----- -----
Full CATALOGUE ("master_data*.py") 44465  7431  6868  6370 6795  7611  9390
                      Phot Members  3351   336   561   681  540   510   723
                      Spec Members   622   107   111   126   67    82   129
                       Non-members 39057  6743  6026  5460 5790  6829  8209
                    Bad flag (!=0)   769    93   139    82  336    43    76
                Bad flux (F814W<0)   521   127    24    17   47   123   183
                        Mass = NaN   145    25     7     4   15    24 

In [388]:
#Created on Tue Jul 07 20:29:15 2020
#
#
################## master_smfz_9.py ##################
## This program will plot the Stellar Mass Function (SMF) for the master_data 
## file of all six clusters (most current version: 'master_data_6_final.py'); 
## two plots (SF & Q) segregated between spectroscopic and photometric subsamples
#
## v2 includes the parallel fields data; 
## v3 commented out to produce conference 
##    plot. to re-insert, remove all #s and change plot panels to include || 
##    field in centre column (positions 1 & 4)
## v4 creates SMFs by cluster, and calculates completeness correction by cluster for false pos/neg
##    and attempts to compute the completeness correction on a cluster-by-cluster basis.
##    this approach was abandoned in favour of a single set of correction factors 
##    for the entire sample, justified by the fact that mass completeness limits are nearly the 
##    same for all clusters. 
## v5 removes individual cluster code for mass correction in section (v), plots
##    correction factors for entire sample treated as single population
## v6 moves all totals and SF/Q fractions until AFTER the correction factors 
##    have been made, as this changes the counts in each hist & associated poissonian error
## 
## v8 COMPLETE OVERHAUL: lists are first sorted by cluster, completeness corrections computed 
##    and applied, then each cluster population (both SF & Q) are normalized by the total 
##    number of galaxies in the cluster (i.e. sum(SF)+sum(Q)). 
##
## v9 FINAL version, done after the kinks in variational analysis have been ironed out. This file
##    now just executes a given redshift cut and bin edges, as defined in "main_project_file.py"
##
#
### Section summary:
#
### PROGRAM START
#
### PLOTS THE STELLAR MASS FUNCTION:
### (1)    collect masses into SORTED arrays for SF & Q;
### (1.1)   add DIAG_FLAG_1: summarize sorted arrays;
### (1.2)   import FIELD data;
### (2)    bin into HISTOGRAMS;
### (2.1)    compute bin mid-points; add DIAG_FLAG_2: sub-samples check;
### (3)    CORRECTIONS to raw counts;
### (3.1)   calculate limiting MASS COMPLETENESS correction for low-mass bins; add DIAG_FLAG_3: 
###         display mass correction factors;
### (3.2)   calculate false pos/neg (i.e. spectroscopic) completeness correction for all bins;
###         includes FIGURE for SPECTROSCOPIC COMPLETENESS; add DIAG_FLAG_4: display 
###         corrections for different bin #s; add DIAG_FLAG_5: # of false pos/neg check 
###         & display spec correction factors;
### (3.3)  NORMALIZE; add DIAG_FLAG_6: check normalization result
### (4)    add ERROR BARS for scatter plot;
### (5)    EMCEE simulation; see emcee_chi2_final.py;
### (6)    build SCHECHTER best-fit models;
### (7)    PLOT the SMF (CLUSTER v FIELD);
### (7.1)   PLOT the SMF by POPULATION;
#
### PROGRAM END
#
## NOTE: there are flags for diagnostics and plotting throughout the script. search "MAY NEED TO EDIT" to identify where these flags are
#
## NOTE: search "MAY NEED TO EDIT" to find where user-input is required
#
#
###################     PROGRAM START
#
## TIME_FLAG: START
## superior time_flag which supersedes all others and times the entire program
time_flag = 0     # track & print time to execute current section
#
if time_flag == 1:
    import time                # needed for time.time(); not imported elsewhere in this file
    start_time = time.time()
#  
# Import modules
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import astropy
from astropy.table import Table
from astropy.table import Column
#from scipy.optimize import curve_fit
#
#
#
# define a function to compute the incremental difference of applying a correction to the raw count lists
def correction_difference(raw_smf,completeness_correction):
    corrected_smf = raw_smf*completeness_correction
    diff = corrected_smf- raw_smf
    return diff
#
#
#
## define a function to compute the mid-points of the bins from a histogram
def midbins(bins):
    size = len(bins)-1
    x_midbins = np.empty([size,1],dtype='float64')
    for x in range(size):
        x_midbins[x] = (bins[x] + bins[(x+1)])/2
    return x_midbins
#
#
#
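## NOTE (illustrative sketch, not part of the original pipeline): the loop in midbins()
## can also be written with array slicing. The name "midbins_vectorized" and the example
## bin edges below are assumptions made for illustration only. The sketch keeps the same
## (n_bins, 1) column-vector shape that midbins() returns, since later code transposes it.

```python
import numpy as np

def midbins_vectorized(bins):
    """Return bin mid-points as a column vector of shape (n_bins, 1)."""
    edges = np.asarray(bins, dtype='float64')
    mids = 0.5 * (edges[:-1] + edges[1:])   # average of each pair of adjacent edges
    return mids.reshape(-1, 1)              # column vector, matching midbins()

# hypothetical bin edges, 0.2 dex apart
example_edges = np.array([7.3, 7.5, 7.7, 7.9])
example_mids = midbins_vectorized(example_edges)
print(example_mids.ravel())
```

## Either version should give the same mid-points; the slicing form just avoids the explicit loop.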
# this line is specific to jupyter notebook, and allows for Figure editing in a GUI, instead of inline
#%matplotlib qt  
#
#
## MASTER DIAGNOSTIC FLAG: allows you to "turn off" all diagnostics at once (if equals 0), "turn on" all flags (if equal to 1), or set diagnostic flags individually (equal to 2 - search "MAY NEED TO EDIT" to find flags)
diag_flag_master = 2       # 0= all flags turned off;     1= all flags turned on;     2= may turn on flags individually
#
## diagnostic flags
diag_flag_1 = 1            # counting array through initial loop, tracking all object categories
diag_flag_2 = 1            # add spec & phot subsamples together for each cluster, and ensure they equal the total raw count in each mass bin
diag_flag_3 = 1            # limiting mass completeness correction factors
diag_flag_4 = 0            # DEPRECATED: old variational analysis flag
diag_flag_5 = 1            # spec completeness correction factors
#
summary_flag_1 = 1         # initial Summary Table: ensure lists agree w/ "master_data*.py"
summary_flag_2 = 1
#
plot_flag_1 = 1            # spec completeness correction factors
#
## SECTION (1): collect objects above limiting mass by cluster into a single array in order to plot 
## SF_*/Q_*, and track sublists of objects which have spec vs those which only have phot, separately for SF/Q; creates list of samples to be binned & plotted as histogram/scatterplot
#
## Cluster sample
#
SF_list = [[],[],[],[],[],[]]    # create an empty list filled with 6 empty lists - 1 for each cluster
Q_list = [[],[],[],[],[],[]]
SF_phot_list = [[],[],[],[],[],[]]
SF_spec_list = [[],[],[],[],[],[]]
Q_phot_list = [[],[],[],[],[],[]]
Q_spec_list = [[],[],[],[],[],[]]
SF_pos_list = [[],[],[],[],[],[]]
SF_neg_list = [[],[],[],[],[],[]]
Q_pos_list = [[],[],[],[],[],[]]
Q_neg_list = [[],[],[],[],[],[]]
SF_lost = [[],[],[],[],[],[]]
Q_lost = [[],[],[],[],[],[]]
#
#
if 'limiting_mass' in locals():
    pass
else:
    limiting_mass = [7.62,7.63,8.2,7.5,6.75,6.64] # clusters 1,2,3,4,5,6, see IDs below; 
#
SF_pos_lost = np.array([0]*6)        # to track SF/Q false pos/neg objects lost due to their being below the mass limit, by cluster
SF_neg_lost = np.array([0]*6)
Q_pos_lost = np.array([0]*6)
Q_neg_lost = np.array([0]*6)
other_lost = np.array([0]*6)    #objects below limiting mass other than false pos/neg
#
counting_array = np.array([0]*13)
#
# The following loop searches the master catalogue 'master_cat' and separates all objects by 
# cluster. Then, it looks for all objects above the limiting mass for that cluster. It then 
# creates two lists: one for SF and one for Q (e.g. SF_*/Q_*). It further splits these lists into those objects with spectroscopy, and those without (e.g. SF*_spec/phot)
#
for cluster in range(len(limiting_mass)):          # loop through clusters one at a time; "cluster" takes on values [0,1,2,3,4,5]
    for counter in range(len(master_cat)):
        if master_cat['cluster'][counter] == (cluster+1):    # cluster #
            counting_array[0]+=1                                    # all objects in all clusters
            if master_cat['lmass'][counter] > limiting_mass[cluster]:    # above the limiting mass of this cluster
                counting_array[1]+=1                                # all objects above limiting mass
                if master_cat['member'][counter] == 0:    # cluster member = 0
                    counting_array[2]+=1                            # all cluster members
                    if master_cat['type'][counter] == 1:   # SF type = 1
                        counting_array[3]+=1                        # SF cluster members
                        SF_list[cluster].append(master_cat['lmass'][counter])               # SF cluster members
                        if master_cat['sub'][counter]==2:     # sub=2 is objects w/ photometry only
                            counting_array[4]+=1                    # SF PHOT cluster members
                            SF_phot_list[cluster].append(master_cat['lmass'][counter])
                        elif master_cat['sub'][counter]==1:  # sub=1 is objects w/ both spec & phot
                            SF_spec_list[cluster].append(master_cat['lmass'][counter])
                            counting_array[5]+=1                    # SF SPEC cluster members
                    elif master_cat['type'][counter] == 2: # Q type = 2
                        Q_list[cluster].append(master_cat['lmass'][counter])               # Q cluster members
                        counting_array[6]+=1                    # Q cluster members
                        if master_cat['sub'][counter]==2:     # sub=2 is objects w/ photometry only
                            Q_phot_list[cluster].append(master_cat['lmass'][counter])
                            counting_array[7]+=1                    # Q PHOT cluster members
                        elif master_cat['sub'][counter]==1:  # sub=1 is objects w/ both spec & phot
                            Q_spec_list[cluster].append(master_cat[counter]['lmass'])
                            counting_array[8]+=1                    # Q SPEC cluster members
                elif master_cat['member'][counter] == 2:  
                    if master_cat['type'][counter] == 1:
                        SF_pos_list[cluster].append(master_cat[counter]['lmass'])
                        counting_array[9]+=1                    # SF pos 
                    elif master_cat['type'][counter] ==2:
                        Q_pos_list[cluster].append(master_cat[counter]['lmass'])
                        counting_array[10]+=1                    # Q pos 
                elif master_cat['member'][counter] == 3:
                    if master_cat['type'][counter] == 1:
                        SF_neg_list[cluster].append(master_cat[counter]['lmass'])
                        counting_array[11]+=1                    # SF neg 
                    elif master_cat['type'][counter] ==2:
                        Q_neg_list[cluster].append(master_cat[counter]['lmass'])
                        counting_array[12]+=1                    # Q neg 
            elif master_cat['lmass'][counter] < limiting_mass[cluster]:
                if master_cat['member'][counter] == 0:    # member = 0
                    if master_cat['type'][counter] == 1:   # SF type = 1
                        SF_lost[cluster].append(master_cat['lmass'][counter])               # SF <lim. mass
                    elif master_cat['type'][counter] == 2: # Q type = 2
                        Q_lost[cluster].append(master_cat['lmass'][counter])               # Q <lim. mass
                elif master_cat['member'][counter] == 2:    # member false pos = 2
                    if master_cat['type'][counter] == 1:   # SF type = 1
                        SF_pos_lost[cluster]+=1
                    elif master_cat['type'][counter] == 2: # Q type = 2
                        Q_pos_lost[cluster]+=1
                elif master_cat['member'][counter] ==3:   # member false neg = 3
                    if master_cat['type'][counter] == 1:   # SF type = 1
                        SF_neg_lost[cluster]+=1
                    elif master_cat['type'][counter] == 2: # Q type = 2
                        Q_neg_lost[cluster]+=1
                else: other_lost[cluster]+=1                                               # catchall for everything else
#
#
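## NOTE (illustrative sketch, not part of the original pipeline): the nested loop above
## can be expressed with boolean masks. The toy catalogue, its values, and the helper
## "select_masses" are all hypothetical; only the column codes follow the loop above
## (member: 0 = member, 2 = false pos; type: 1 = SF, 2 = Q; sub: 1 = spec+phot, 2 = phot only).

```python
import numpy as np

# hypothetical toy catalogue (values invented for illustration only)
toy_cat = {
    'cluster': np.array([1, 1, 1, 2, 2, 2]),
    'lmass':   np.array([9.1, 7.0, 10.2, 8.5, 9.9, 6.9]),
    'member':  np.array([0, 0, 0, 0, 2, 0]),
    'type':    np.array([1, 1, 2, 2, 1, 2]),
    'sub':     np.array([2, 2, 1, 2, 1, 2]),
}
toy_limiting_mass = [7.5, 7.5]     # hypothetical limiting masses for 2 toy clusters

def select_masses(cat, cluster_id, lim_mass, member, gal_type, sub=None):
    """Return masses above lim_mass for one cluster, membership class and galaxy type."""
    mask = ((cat['cluster'] == cluster_id) & (cat['lmass'] > lim_mass)
            & (cat['member'] == member) & (cat['type'] == gal_type))
    if sub is not None:
        mask &= (cat['sub'] == sub)
    return cat['lmass'][mask]

# one entry per cluster, as in SF_list / Q_list
toy_SF_list = [select_masses(toy_cat, c + 1, toy_limiting_mass[c], member=0, gal_type=1)
               for c in range(len(toy_limiting_mass))]
toy_Q_list = [select_masses(toy_cat, c + 1, toy_limiting_mass[c], member=0, gal_type=2)
              for c in range(len(toy_limiting_mass))]
# SF false positives in toy cluster 2
toy_SF_pos = select_masses(toy_cat, 2, toy_limiting_mass[1], member=2, gal_type=1)
print(toy_SF_list, toy_Q_list, toy_SF_pos)
```

## The masks touch each column once per selection instead of once per object, which may be
## faster for a catalogue of this size; the explicit loop has the advantage of filling counting_array as it goes.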
if (diag_flag_1 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
    print('\n[all,above_lim_mass,all_cluster_members,SF_members,SF_phot,SF_spec,Q_members,Q_phot,Q_spec,SF_pos,Q_pos,SF_neg,Q_neg]\n%s'%counting_array)
#
## SECTION (1.1) - Summary Table 1: Did we pick up all the false pos/neg as reported in "master_data*.py"?
#
if summary_flag_1 == 1 or adams_flag == 1:
    ## Summarize initial data stats in table
    num_SF_pos = np.sum([len(sublist) for sublist in SF_pos_list])
    num_SF_neg = np.sum([len(sublist) for sublist in SF_neg_list])
    num_Q_pos = np.sum([len(sublist) for sublist in Q_pos_list])
    num_Q_neg = np.sum([len(sublist) for sublist in Q_neg_list])
    #
    pos_neg_names = Column(['TOTAL False Pos.','TOTAL False Neg.','SF - False Pos.','SF - False Neg.','SF <lim. mass','Q - False Pos.','Q - False Neg.','Q <lim. mass','SUM'],name='Property')
    col_names = cluster_names
    # SF table
    pos_neg0 = Column([np.sum(pos_spec),np.sum(neg_spec),num_SF_pos,num_SF_neg,np.sum([SF_pos_lost,SF_neg_lost]),num_Q_pos,num_Q_neg,np.sum([Q_pos_lost,Q_neg_lost]),np.sum([num_SF_pos,num_SF_neg,np.sum(SF_pos_lost),np.sum(SF_neg_lost),num_Q_pos,num_Q_neg,np.sum(Q_pos_lost),np.sum(Q_neg_lost)])],name='Total')  # total column
    pos_neg_stats = Table([pos_neg_names,pos_neg0])
    for ii in range(len(mem_spec[0])):
        col = Column([np.sum([pos_spec[0][ii],pos_spec[1][ii]]),np.sum([neg_spec[0][ii],neg_spec[1][ii]]),len(SF_pos_list[ii]),len(SF_neg_list[ii]),np.sum([SF_pos_lost[ii],SF_neg_lost[ii]]),len(Q_pos_list[ii]),len(Q_neg_list[ii]),np.sum([Q_pos_lost[ii],Q_neg_lost[ii]]),np.sum([len(SF_pos_list[ii]),len(SF_neg_list[ii]),SF_pos_lost[ii],SF_neg_lost[ii],len(Q_pos_list[ii]),len(Q_neg_list[ii]),Q_pos_lost[ii],Q_neg_lost[ii]])],name=col_names[ii])
        pos_neg_stats.add_column(col)  # add columns to table one cluster at a time
    #
    #
    ## Now prepare a summary table for the cluster MEMBERS
    #
    num_SF = np.sum([len(sublist) for sublist in SF_list])
    num_SF_phot = np.sum([len(sublist) for sublist in SF_phot_list])
    num_SF_spec = np.sum([len(sublist) for sublist in SF_spec_list])
    num_SF_lost = np.sum([len(sublist) for sublist in SF_lost])
    num_Q = np.sum([len(sublist) for sublist in Q_list])
    num_Q_phot = np.sum([len(sublist) for sublist in Q_phot_list])
    num_Q_spec = np.sum([len(sublist) for sublist in Q_spec_list])
    num_Q_lost = np.sum([len(sublist) for sublist in Q_lost])
    #
    member_smf_names = Column(['TOTAL Members (master_data*.py)','Total SF >lim. mass','SF - Phot.','SF - Spec.','SF <lim. mass','Total Q >lim. mass','Q - Phot.','Q - Spec.','Q <lim. mass','SUM'],name='Property')
    col_names = cluster_names
    # SF table
    member_smf0 = Column([np.sum([mem_phot,mem_spec]),num_SF,num_SF_phot,num_SF_spec,num_SF_lost,num_Q,num_Q_phot,num_Q_spec,num_Q_lost,np.sum([num_SF_phot,num_SF_spec,num_Q_phot,num_Q_spec,num_SF_lost,num_Q_lost])],name='Total')  # total column
    member_smf_stats = Table([member_smf_names,member_smf0])
    for ii in range(len(mem_spec[0])):
        col = Column([np.sum([mem_phot[0][ii],mem_phot[1][ii],mem_spec[0][ii],mem_spec[1][ii]]),len(SF_list[ii]),len(SF_phot_list[ii]),len(SF_spec_list[ii]),len(SF_lost[ii]),len(Q_list[ii]),len(Q_phot_list[ii]),len(Q_spec_list[ii]),len(Q_lost[ii]),np.sum([len(SF_phot_list[ii]),len(SF_spec_list[ii]),len(SF_lost[ii]),len(Q_phot_list[ii]),len(Q_spec_list[ii]),len(Q_lost[ii])])],name=col_names[ii])
        member_smf_stats.add_column(col)  # add columns to table one cluster at a time
    #
    print('\nSummary Table 1A: False Pos./Neg.\n%s'%pos_neg_stats)
    print('NOTE: TOTALs reported in first two rows are from Summary Table 4 in "master_data*.py".\n')
    print('\nSummary Table 1B: MEMBERS\n%s'%member_smf_stats)
    #
    #
    ## VISUALIZE the distribution of galaxies lost below limiting mass
    #
    #########
    bins_lost = np.arange(6,8.2,0.2)
    #bins_phot = [-0.5,-0.3,-0.25,-0.2,-0.15,-0.1,-0.05,0.0,0.05,0.10,0.15,0.20,0.25,0.3,0.5,1.0]

    #
    ## collapse for plotting purposes
    lost_plot = []
    for ii in range(len(SF_lost)):
        for jj in range(len(SF_lost[ii])):
            lost_plot.append(SF_lost[ii][jj])
        for kk in range(len(Q_lost[ii])):    
            lost_plot.append(Q_lost[ii][kk])
        #
    ####
    #
    ## Visualize the mass distribution of galaxies lost below the limiting mass
    #
    ## SF + Q combined
    plt.figure()
    n, bins, patches = plt.hist(x=lost_plot,bins=bins_lost,color='deepskyblue',edgecolor='steelblue',alpha=0.7,rwidth=0.95,log=False)
    plt.grid(axis='y', alpha=0.75)
    plt.xlabel(r'$log(M/M_{\odot})$',fontsize=12)   # raw string, so that "\o" is not read as an escape sequence
    plt.ylabel('# count',fontsize=12)
    plt.title("Distribution of galaxies below limiting mass",fontsize=15)
    plt.show()
#
#
#
#
#
## SECTION (1.2): Field sample
## 
#
#
##### IMPORT DATA FROM AM's EMAIL 06/19/20
#
#
#
#
#
## SECTION (2): sort objects into HISTOGRAMS bins for both SF & Q populations, then sum 
## for 'total' population. then normalize each cluster SMF by the total cluster mass. compute midbins. use for total pop plot, and for relative fractions
#
## cluster populations arrays: (SF/Q/total)_smf & (SF/Q/total)_field_smf
#
## MAY NEED TO EDIT  - NOTE: bin_width now set in "main_project_file.py"
## you may change the # of bin points by adjusting the bin width, however IF YOU DO IT HERE, DO THE SAME IN "spec_completeness_binning.py". search "MAY NEED TO EDIT"
#range2 = [7.3,12.3]     #sets range for all histograms to be computed: cluster,field,false pos/neg; NOW SET IN "data_mass_completeness_5.py"
#bin_width = 0.2  # in dex
num_points = int((round((range2[1]-range2[0])/bin_width))+1)       # compute # of data points
num_bins = np.linspace(range2[0],range2[1],num_points)
#
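## NOTE (illustrative sketch, not part of the original pipeline): how the bin edges follow
## from the range and bin width. The values [7.3,12.3] and 0.2 dex are the ones shown in the
## commented-out lines above; the "example_*" names and the toy masses are assumptions.
## As far as I know, np.histogram does not use "range" when explicit bin edges are
## passed; the check at the end is consistent with that.

```python
import numpy as np

example_range = [7.3, 12.3]      # log(M/Msun) limits, as in the commented-out range2
example_bin_width = 0.2          # in dex

# number of edges = number of bins + 1
example_num_points = int(round((example_range[1] - example_range[0]) / example_bin_width)) + 1
example_edges = np.linspace(example_range[0], example_range[1], example_num_points)
print(example_num_points, len(example_edges) - 1)

# hypothetical masses; the last one lies outside the edges and is not counted
toy_masses = np.array([7.35, 7.45, 8.0, 12.25, 13.0])
counts, _ = np.histogram(toy_masses, bins=example_edges)
counts_with_range, _ = np.histogram(toy_masses, bins=example_edges, range=example_range)
print(counts.sum(), np.array_equal(counts, counts_with_range))
```

## With these inputs there are 26 edges and therefore 25 mass bins, which matches the 25
## entries of the hard-coded correction array that appears (commented out) in Section 3.1.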
## smf histograms for individual clusters
#
SF_raw_smf = [[],[],[],[],[],[]]       # initialize list of lists to store histograms of SMFs
Q_raw_smf = [[],[],[],[],[],[]]
#
for ii in range(len(SF_list)):
    SF_raw, mass_bins = np.histogram(SF_list[ii], bins=num_bins,range=range2)
    Q_raw, mass_bins = np.histogram(Q_list[ii], bins=num_bins,range=range2)
    SF_raw_smf[ii].append(SF_raw)
    Q_raw_smf[ii].append(Q_raw)
#
## convert lists to arrays so we can do math operations on them
SF_raw_smf = np.array(SF_raw_smf)
Q_raw_smf = np.array(Q_raw_smf)
#
total_raw_smf = SF_raw_smf + Q_raw_smf
#
# Display some data for total, SF, Q: 
print('\nSection 2: RAW totals')
print('SF: ',str(np.sum(SF_raw_smf)))
print('Q: ',str(np.sum(Q_raw_smf)))
print('Total: ',str(np.sum(total_raw_smf)),'\n')
#
#
## section (2.1): compute MIDBINS
#
## find midpoint of hist. bins. all populations have been binned identically, so the one 'midbin' will serve for all data arrays to be plotted. for visual clarity when plotting, offset the Q_midpoints by delta_x = 0.05
#
## SORT the spec/phot subsamples into histograms as well, and confirm that spec + phot = total in each mass bin for each type of galaxy
#
# sort spec/phot subsamples into histograms for each cluster
#
SF_phot_smf = [[],[],[],[],[],[]]       # initialize list of lists to store histograms of SMFs
SF_spec_smf = [[],[],[],[],[],[]]
Q_phot_smf = [[],[],[],[],[],[]]       
Q_spec_smf = [[],[],[],[],[],[]]
#
for ii in range(len(SF_spec_list)):
    SF_spec, mass_bins = np.histogram(SF_spec_list[ii], bins=num_bins,range=range2)
    SF_phot, mass_bins = np.histogram(SF_phot_list[ii], bins=num_bins,range=range2)
    Q_spec, mass_bins = np.histogram(Q_spec_list[ii], bins=num_bins,range=range2)
    Q_phot, mass_bins = np.histogram(Q_phot_list[ii], bins=num_bins,range=range2)
    SF_spec_smf[ii].append(SF_spec)
    SF_phot_smf[ii].append(SF_phot)
    Q_spec_smf[ii].append(Q_spec)
    Q_phot_smf[ii].append(Q_phot)
#
## define mass_bins midpoints
SF_midbins = midbins(mass_bins)
Q_midbins = SF_midbins + 0.05
#
## convert lists to arrays so we can do math operations on them
SF_phot_smf = np.array(SF_phot_smf)
SF_spec_smf = np.array(SF_spec_smf)
Q_phot_smf = np.array(Q_phot_smf)
Q_spec_smf = np.array(Q_spec_smf)
#
#
### MAY NEED TO EDIT: diag_flag_2
## DIAGNOSTIC: add spec & phot subsamples together for each cluster, and ensure they equal the total raw count in each mass bin
#
if (diag_flag_2 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
    # compute differences, e.g.: SF_smf1 = SF1_spec_smf + SF1_phot_smf for each mass bin. they should be the same
    SF_diff = np.array([[0]*len(SF_midbins)]*6)     # initialize array to store difference between sample & sub-samples, by cluster
    Q_diff = np.array([[0]*len(Q_midbins)]*6)
    #
    print('Section 2.1: Differences between raw cluster count and (spec + phot) subsamples, by cluster')
    for ii in range(len(SF_raw_smf)):
        SF_diff[ii] = SF_raw_smf[ii] - (SF_phot_smf[ii] + SF_spec_smf[ii])
        Q_diff[ii] = Q_raw_smf[ii] - (Q_phot_smf[ii] + Q_spec_smf[ii])
        print('SF',str(ii+1),' difference: ',str(np.sum(SF_diff[ii])))
        print('Q',str(ii+1),' difference: ',str(np.sum(Q_diff[ii])))
    print('Total difference: %s'%np.sum([SF_diff,Q_diff]),'\n')     # printed once, after all clusters have been checked
#    
#
#
#
#
## SECTION (3): calculate corrections to raw counts. There are two separate corrections - one for limiting mass completeness (i.e. correct for the fact that not all clusters are complete down to our lowest mass bin), and one for spectroscopic completeness (i.e. correct for the fact that there are false pos/neg objects in the sample of galaxies which have both spec & phot, and make correction to photometric sample to account for the ratio of false pos/neg)
#
#
## SECTION (3.1): calculate MASS COMPLETENESS corrections. compute correction to low-mass bin points due to varying mass completenesses of each cluster. The correction factor is: (total # of clusters) / (# of clusters complete at that mass bin), and will be multiplied by the raw number count of galaxies in each mass bin. 
#
## an examination of the limiting mass for each cluster (see list "limiting_mass", above) shows that the following bin midpoints have the following corresponding number of clusters complete at that mass: [7.3,7.5,7.7,7.9,8.1] ---> [1,4,4,5,6]. So all 6 clusters are complete at a mass  of 8.1, but only 1 cluster is complete down to 7.3. the corresponding corrections are as follows:
#
#mass_completeness_correction = np.array([6,1.5,1.5,1.2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1])
#
## if we assume all clusters have the same general composition of SF to Q galaxies, i.e. the same relative fraction in each cluster, then if only one cluster is complete, we "scale it up" by a factor of 6 (# of clusters total / # of clusters complete at that mass). if 4 clusters are complete, we scale it up by (6/4) = 1.5. In this way, it's as if each cluster were complete down to the limiting mass of 7.3
#
## the following loop automates the above explanation, making the code adaptable should i decide to change the number of bins in future
#
mass_completeness_correction = np.zeros_like(SF_midbins)
for ii in range(len(mass_completeness_correction)):
    for jj in range(len(limiting_mass)):
        if limiting_mass[jj] <= mass_bins[ii]:    # count # of clusters complete at each mass bin
            mass_completeness_correction[ii]+=1
mass_completeness_correction = np.transpose(6/mass_completeness_correction)  # the correction factor is: (total # of clusters) / (# of clusters complete at that mass bin); return as a row vector
#
#
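## NOTE (illustrative sketch, not part of the original pipeline): the counting loop above,
## written with broadcasting. The limiting masses below are hypothetical, chosen only so
## that the number of complete clusters per bin reproduces the [1,4,4,5,6] pattern quoted
## in the comments above; the raw counts are likewise invented.

```python
import numpy as np

toy_limiting_mass = np.array([7.25, 7.45, 7.45, 7.45, 7.85, 8.05])   # hypothetical values
toy_edges = np.array([7.3, 7.5, 7.7, 7.9, 8.1, 8.3])                  # bin edges, 0.2 dex wide
lower_edges = toy_edges[:-1]

# a cluster counts as complete in a bin if its limiting mass is <= the bin's lower edge
n_complete = (toy_limiting_mass[:, None] <= lower_edges[None, :]).sum(axis=0)
toy_correction = len(toy_limiting_mass) / n_complete   # (total # of clusters) / (# complete)
print(n_complete)        # [1 4 4 5 6]
print(toy_correction)

# galaxies "added" to each bin by the correction (same quantity as correction_difference)
toy_raw_counts = np.array([10, 20, 20, 30, 40])        # hypothetical raw counts
toy_added = toy_raw_counts * toy_correction - toy_raw_counts
print(toy_added)
```

## Note that a bin in which no cluster is complete would give a division by zero in both
## this sketch and the loop above, so the lowest bin edge should not lie below every limiting mass.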
# compute how many objects are added to each mass bin as a result of applying the mass_completeness_correction to the *_raw_smf lists. confirm that (# added to SF) + (# added to Q) = (# added to total)
SF_mass_completeness_diff = correction_difference(SF_raw_smf,mass_completeness_correction)
Q_mass_completeness_diff = correction_difference(Q_raw_smf,mass_completeness_correction)
total_mass_completeness_diff = correction_difference(total_raw_smf,mass_completeness_correction)
#
### MAY NEED TO EDIT: diag_flag_3
#
if (diag_flag_3 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
# Display correction factors
    print('\nSection 3.1: Mass completeness correction factors by bin: ',str(mass_completeness_correction),'\n')
    #
    # Display some data for total, SF, Q: 
    print('Section 3.1: Galaxies added due to MASS COMPLETENESS correction')
    print('SF: ',str(np.sum(SF_mass_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(SF_mass_completeness_diff),', or ',str((np.sum(SF_mass_completeness_diff)/np.sum(SF_raw_smf))*100),'%\n')
    print('Q: ',str(np.sum(Q_mass_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(Q_mass_completeness_diff),', or ',str((np.sum(Q_mass_completeness_diff)/np.sum(Q_raw_smf))*100),'%\n')
    print('Total: ',str(np.sum(total_mass_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(total_mass_completeness_diff),', or ',str((np.sum(total_mass_completeness_diff)/np.sum(total_raw_smf))*100),'%\n')
else:
    print('Section 3.1: Galaxies added due to MASS COMPLETENESS correction\nSF: %s'%np.sum(SF_mass_completeness_diff),'\nQ: %s'%np.sum(Q_mass_completeness_diff),'\nTotal: %s'%np.sum(total_mass_completeness_diff))
#
#
#
#
## SECTION (3.2): calculate SPECTROSCOPIC COMPLETENESS correction. basically, look at all the false positives/false negatives, and sort them by type (i.e. SF/Q). then bin them (i.e. make histograms of false pos/neg for each of SF/Q). take their ratio of false pos to false neg, and plot that ratio. it is the correction factor to be applied to the photometric subsample
#
#
SF_pos = []     # track ALL false pos for plotting
SF_neg = []     # track ALL false neg for plotting
Q_pos = []
Q_neg = []
pos_by_cluster = np.array([[0]*6]*2)    #for tracking false pos/neg by cluster; row_1=SF, row_2=Q
neg_by_cluster = np.array([[0]*6]*2)
objects_below_lim_mass = np.array([0]*6)    # for tracking objects below the limiting mass of each cluster
#
## set up lists for plotting false pos/neg ratios
for ii in range(len(SF_pos_list)):
    ## plotting lists
    for jj in range(len(SF_pos_list[ii])):
        SF_pos.append(SF_pos_list[ii][jj])
    for jj in range(len(SF_neg_list[ii])):
        SF_neg.append(SF_neg_list[ii][jj])
    for jj in range(len(Q_pos_list[ii])):
        Q_pos.append(Q_pos_list[ii][jj])
    for jj in range(len(Q_neg_list[ii])):
        Q_neg.append(Q_neg_list[ii][jj])
    ## counting lists
    pos_by_cluster[0][ii] = len(SF_pos_list[ii])
    pos_by_cluster[1][ii] = len(Q_pos_list[ii])
    neg_by_cluster[0][ii] = len(SF_neg_list[ii])
    neg_by_cluster[1][ii] = len(Q_neg_list[ii])
    objects_below_lim_mass[ii] = np.sum([SF_pos_lost[ii],SF_neg_lost[ii],Q_pos_lost[ii],Q_neg_lost[ii]])
#
#
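## NOTE (illustrative sketch, not part of the original pipeline): the per-cluster lists
## can be collapsed into one flat list with itertools.chain, and counted per cluster with
## a list comprehension. The toy list below is hypothetical.

```python
from itertools import chain

# hypothetical false-positive masses for 6 clusters (some clusters have none)
toy_pos_list = [[9.2, 8.7], [], [10.1], [], [9.9, 8.8, 9.0], []]

toy_pos_flat = list(chain.from_iterable(toy_pos_list))      # all clusters, for binning
toy_pos_by_cluster = [len(sub) for sub in toy_pos_list]     # counts, for the summary table
print(toy_pos_flat)
print(toy_pos_by_cluster)
```

## The flat list plays the role of SF_pos / Q_pos above, and the counts that of one row of pos_by_cluster.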
## Set up false pos/neg histograms
#
if (diag_flag_4 == 1 and diag_flag_master == 2) or diag_flag_master == 1:   # SYMMETRIC BINNING
    #
    ## this section - the variational analysis testing different binning methods for a varying number of bins - has been broken out into its own program, called "spec_completeness_binning.py". it is called by the "master_data_*" program when the appropriate diagnostic flag (variational_anaylsis_master_flag or project_master_variational_flag) is turned on. the result of that analysis are presented below the 'else' statement (i.e. the bin numbers chosen based on the variational analysis).
    pass
#
else:
###### The following few lines are for if you want to choose the # of bins independent of the criteria described above; once a final decision is made on the bin edges (i.e. bin widths) to use for making the final histogram of false pos/neg, enter them in arrays 'num_bins*' below, and set diag_flag_f = 0; lines to uncomment marked by 5 hashtags #####
###### recall: range2 = [7.3,12.3]
    #num_bins_SF_pos_neg = [7.3,8.89,9.6,10.11,12.3]      # these are now set in "main_project_file.py"
    #num_bins_Q_pos_neg = [7.3,8.89,9.6,10.11,12.3]
    #
    #
    SF_pos_hist, bins_SF = np.histogram(SF_pos, bins=num_bins_SF_pos_neg, range=range2)
    SF_neg_hist, bins_SF = np.histogram(SF_neg, bins=num_bins_SF_pos_neg, range=range2)
    Q_pos_hist, bins_Q = np.histogram(Q_pos, bins=num_bins_Q_pos_neg, range=range2)
    Q_neg_hist, bins_Q = np.histogram(Q_neg, bins=num_bins_Q_pos_neg, range=range2)
    #
    #####print('SF: %s'%bins_SF)
    #####print('Q: %s'%bins_Q)
#
### MAY NEED TO EDIT: diag_flag_5; continues below
# display diagnostics
#
if (diag_flag_5 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
    # sum into total list, to compare with totals reported in "spec_stats1" table from "master_data_*.py"
#    total_pos_hist = SF_pos_hist + Q_pos_hist     
#    total_neg_hist = SF_neg_hist + Q_neg_hist
    # print
    print('Section 3.2: spec.completeness correction:\n\nThe data preparation file reports:')
    print('# of SF false pos: ',str(np.sum(pos_spec[0])))
    print('# of SF false neg: ',str(np.sum(neg_spec[0])))
    print('# of Q false pos: ',str(np.sum(pos_spec[1])))
    print('# of Q false neg: ',str(np.sum(neg_spec[1])))
    print('This SMF preparation file finds:')
    print('# of SF false pos: ',str(np.sum(SF_pos_hist)))
    print('# of SF false neg: ',str(np.sum(SF_neg_hist)))
    print('# of Q false pos: ',str(np.sum(Q_pos_hist)))
    print('# of Q false neg: ',str(np.sum(Q_neg_hist)))
    print('\nFalse pos/neg lost below limiting mass: ')
    print('SF false pos.: ',str(np.sum(SF_pos_lost)))
    print('SF false neg.: ',str(np.sum(SF_neg_lost)))
    print('Q false pos.: ',str(np.sum(Q_pos_lost)))
    print('Q false neg.: ',str(np.sum(Q_neg_lost)),'\n')
    print('# objects lost below limiting mass (by cluster): ',objects_below_lim_mass)
#
## compute false pos/ false neg ratio; there is a diagnostic built in for error handling - since we require that all mass bins be populated by at least one false pos. and one false neg. (so that we may compute their ratio), the program BREAKS when an empty mass bin is encountered, and you are prompted to try a new number of bins 
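# initialize to NaN (rather than np.empty_like) so that bins left unfilled when a loop below breaks early are not read as arbitrary memory
SF_frac = np.full_like(SF_pos_hist, np.nan, dtype='float32')
Q_frac = np.full_like(Q_pos_hist, np.nan, dtype='float32')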
# compute fractions for SF, exiting loop if a bin value of zero is encountered; somewhat deprecated given the loop added above, which tests different # of bins for SF/Q. 
for ii in range(len(SF_pos_hist)):               
    if SF_pos_hist[ii] == 0 and SF_neg_hist[ii] == 0:
#        print('Zero false pos. AND zero false neg in bin',str(ii+1))
#        print('Adjust the value of "num_binsSF"\n')
#        break
        SF_frac[ii] = 1
        pass
    elif SF_pos_hist[ii] == 0:
        # len(bins_SF) counts bin EDGES; the number of bins is one fewer
        print('Zero false pos. in bin',str(ii+1),' for %s bins'%(len(bins_SF)-1),'\n')
        print('Adjust the value of "num_binsSF"\n')
        break
    elif SF_neg_hist[ii] == 0:
        print('Zero false neg. in bin',str(ii+1),' for %s bins'%(len(bins_SF)-1),'\n')
        print('Adjust the value of "num_binsSF"\n')
        break
    else:
        SF_frac[ii] = SF_pos_hist[ii] / SF_neg_hist[ii]
# compute fractions for Q, exiting loop if a bin value of zero is encountered 
for ii in range(len(Q_pos_hist)):               
    if Q_pos_hist[ii] == 0 and Q_neg_hist[ii] == 0:
#        print('Zero false pos. AND zero false neg in bin',str(ii+1))
#        print('Adjust the value of "num_binsQ"\n')
#        break
        Q_frac[ii] = 1
        pass
    elif Q_pos_hist[ii] == 0:
        # len(bins_Q) counts bin EDGES; the number of bins is one fewer
        print('Zero false pos. in bin',str(ii+1),' for %s bins'%(len(bins_Q)-1),'\n')
        print('Adjust the value of "num_binsQ"\n')
        break
    elif Q_neg_hist[ii] == 0:
        print('Zero false neg. in bin',str(ii+1),' for %s bins'%(len(bins_Q)-1),'\n')
        print('Adjust the value of "num_binsQ"\n')
        break
    else:
        Q_frac[ii] = Q_pos_hist[ii] / Q_neg_hist[ii]        
#
if (diag_flag_5 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
    print('False pos/neg ratios\nSF: %s'%SF_frac,'\nQ: %s'%Q_frac)
# compute midbins for spec. mass completeness plot (i.e. plot of false pos/false neg ratios)
SF_frac_midbins = midbins(bins_SF)
Q_frac_midbins = midbins(bins_Q)
#
#
## now compute the errors for the spec. completeness plot. The counts in each bin are Poissonian, so the error on a count N is sqrt(N).
## compute the relative error, sqrt(N)/N, of the false pos & false neg histograms for each of SF/Q, then sum in quadrature to get the relative error of the ratio (pos/neg).
## NOTE: bins with zero counts evaluate to 0/0 = NaN here.
#              
SF_relerr_pos = (np.sqrt(SF_pos_hist))/SF_pos_hist
SF_relerr_neg = (np.sqrt(SF_neg_hist))/SF_neg_hist
Q_relerr_pos = (np.sqrt(Q_pos_hist))/Q_pos_hist
Q_relerr_neg = (np.sqrt(Q_neg_hist))/Q_neg_hist
#              
SF_frac_err = np.sqrt((SF_relerr_pos**2) + (SF_relerr_neg**2))*SF_frac              
Q_frac_err = np.sqrt((Q_relerr_pos**2) + (Q_relerr_neg**2))*Q_frac              
#              
#    
#
## FIGURE ##
#
if plot_flag_1 == 1:
    # plot Spectroscopic completion correction factors 
    plt.close()
    MC = plt.figure(num=2)
    #MC.suptitle('Spectroscopic Completeness Correction Factors')
    plt.errorbar(SF_frac_midbins,SF_frac,yerr=SF_frac_err, fmt='ob',lolims=False, uplims=False, linewidth=0.0, elinewidth=0.8, mfc='none')
    plt.errorbar(Q_frac_midbins,Q_frac,yerr=Q_frac_err, fmt='or',lolims=False, uplims=False, linewidth=0.0, elinewidth=0.8, mfc='none')
    plt.plot(SF_frac_midbins,SF_frac,'-b', linewidth=1.0, label='Star-forming')
    plt.plot(Q_frac_midbins,Q_frac,'-r', linewidth=1.0, label='Quiescent')
    plt.plot([0,13],[1,1],'--k',linewidth = 0.5)
    plt.legend(loc='upper right', frameon=False)
    plt.xlim(7.1,12.5)        # call the function; assigning to plt.xlim would overwrite it with a tuple
    plt.xlabel(r'$log(M/M_{\odot})$')
    plt.ylim(-0.5,4.1)
    plt.ylabel('Correction factor\n(false pos / false neg)')
    plt.tick_params(axis='both', which='both',direction='in',color='k',top=True,right=True,labelright=True, labelleft=True)
    plt.minorticks_on()
    plt.grid(True, which='major', axis='both', color = 'k', linestyle = ':')
    MC.gca().set_xlim((range2[0]-0.1),(range2[1]+0.1))    # setting an "xlim" attribute on the Figure has no effect; set the limits on the axes
    #
## Now interpolate/extrapolate between these data points
#
# initialize arrays to store slopes/intercepts for extrapolation/interpolation of spec mass completeness correction factors
m_SF = np.zeros((len(SF_frac_midbins)-1))     
b_SF = np.zeros((len(SF_frac_midbins)-1))
m_Q = np.zeros((len(Q_frac_midbins)-1))     
b_Q = np.zeros((len(Q_frac_midbins)-1))
SF_spec_completeness_correction = np.zeros_like(SF_midbins,dtype='float32')
Q_spec_completeness_correction = np.zeros_like(Q_midbins,dtype='float32')
#
## SF
for ii in range(len(SF_frac_midbins)-1):
    m_SF[ii] = (SF_frac[ii+1] - SF_frac[ii]) / (SF_frac_midbins[ii+1] - SF_frac_midbins[ii]) # calc slope
    b_SF[ii] = SF_frac[ii] - (SF_frac_midbins[ii]*m_SF[ii])   # calc intercept
#
for ii in range(len(SF_midbins)):
    if SF_spec_completeness_correction[ii] == 0:     # don't overwrite cell once correction factor is computed
        if SF_midbins[ii] < SF_frac_midbins[0]:      # extrapolate below lowest mass bin
            SF_spec_completeness_correction[ii] = m_SF[0]*SF_midbins[ii] + b_SF[0]    
        elif SF_midbins[ii] > SF_frac_midbins[-1]:    # extrapolate above highest mass bin
            SF_spec_completeness_correction[ii] = m_SF[-1]*SF_midbins[ii] + b_SF[-1]    
        elif SF_midbins[ii] >= SF_frac_midbins[0] and SF_midbins[ii] <= SF_frac_midbins[-1]:    # interpolate in between all other points (end points included, so a midbin equal to a data point is not skipped)
            for jj in range(len(SF_frac_midbins)-1):
                if SF_midbins[ii] >= SF_frac_midbins[jj] and SF_midbins[ii] <= SF_frac_midbins[jj+1]:
                    SF_spec_completeness_correction[ii] = m_SF[jj]*SF_midbins[ii] + b_SF[jj]
        else:
            print('Error in SF spec completeness correction computation. ABORT')
            break   
#

## Q
for ii in range(len(Q_frac_midbins)-1):
    m_Q[ii] = (Q_frac[ii+1] - Q_frac[ii]) / (Q_frac_midbins[ii+1] - Q_frac_midbins[ii]) # calc slope
    b_Q[ii] = Q_frac[ii] - (Q_frac_midbins[ii]*m_Q[ii])   # calc intercept
#
for ii in range(len(Q_midbins)):
    if Q_spec_completeness_correction[ii] == 0:     # don't overwrite cell once correction factor is computed
        if Q_midbins[ii] < Q_frac_midbins[0]:      # extrapolate below lowest mass bin
            Q_spec_completeness_correction[ii] = m_Q[0]*Q_midbins[ii] + b_Q[0]    
        elif Q_midbins[ii] > Q_frac_midbins[-1]:    # extrapolate above highest mass bin
            Q_spec_completeness_correction[ii] = m_Q[-1]*Q_midbins[ii] + b_Q[-1]    
        elif Q_midbins[ii] >= Q_frac_midbins[0] and Q_midbins[ii] <= Q_frac_midbins[-1]:    # interpolate in between all other points (end points included, so a midbin equal to a data point is not skipped)
            for jj in range(len(Q_frac_midbins)-1):
                if Q_midbins[ii] >= Q_frac_midbins[jj] and Q_midbins[ii] <= Q_frac_midbins[jj+1]:
                    Q_spec_completeness_correction[ii] = m_Q[jj]*Q_midbins[ii] + b_Q[jj]
        else:
            print('Error in Q spec completeness correction computation. ABORT')
            break   
#    
if plot_flag_1 == 1:                       # plot interpolated/extrapolated points on top of computed correction fractions
    plt.scatter(SF_midbins,SF_spec_completeness_correction,c='b', marker='+', linewidths = 0)
    plt.scatter(Q_midbins,Q_spec_completeness_correction,c='r', marker='x', linewidths = 0)
    MC.gca().set_xlim(7.25,12.5)    # setting an "xlim" attribute on the Figure has no effect; set the limits on the axes
#
# apply correction; NOTE: need to divide raw_SMF by the spec_completeness_correction, not multiply, hence taking the inverse
SF_spec_completeness_correction = (1/SF_spec_completeness_correction)   
Q_spec_completeness_correction = (1/Q_spec_completeness_correction)
# compute how many objects are added to each mass bin as a result of applying the spec_completeness_correction to the *_raw_smf lists. confirm that (# added to SF) + (# added to Q) = (# added to total)
SF_spec_completeness_diff = correction_difference(SF_phot_smf,np.transpose(SF_spec_completeness_correction))  
Q_spec_completeness_diff = correction_difference(Q_phot_smf,np.transpose(Q_spec_completeness_correction))
total_spec_completeness_diff = SF_spec_completeness_diff + Q_spec_completeness_diff
#
#
if (diag_flag_5 == 1 and diag_flag_master == 2) or diag_flag_master == 1:
    # Display correction factors
    print('\nSection 3.2: Spectroscopic completeness correction factors by bin (multiplicative): ')
    print('SF: ',str(np.transpose(SF_spec_completeness_correction)))
    print('Q: ',str(np.transpose(Q_spec_completeness_correction)),'\n')
    # Display some data for total, SF, Q: 
    print('Galaxies added due to SPECTROSCOPIC COMPLETENESS correction')
    print('SF: ',str(np.sum(SF_spec_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(SF_spec_completeness_diff),'   or ',str((np.sum(SF_spec_completeness_diff)/np.sum(SF_raw_smf))*100),'%.\n')
    print('Q: ',str(np.sum(Q_spec_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(Q_spec_completeness_diff),'   or ',str((np.sum(Q_spec_completeness_diff)/np.sum(Q_raw_smf))*100),'%.\n')
    print('Total: ',str(np.sum(total_spec_completeness_diff,axis=0)),'\nTotal: %s'%np.sum(total_spec_completeness_diff),'   or ',str((np.sum(total_spec_completeness_diff)/np.sum(total_raw_smf))*100),'%.\n')
else:
    print('\nSection 3.2: Galaxies added due to SPEC COMPLETENESS correction\nSF: %s'%np.sum(SF_spec_completeness_diff),'\nQ: %s'%np.sum(Q_spec_completeness_diff))
    #
#
#
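The false pos / false neg ratio and its Poisson error, as computed in Section 3.2 above, can be sketched in vectorized form as follows. This is a minimal illustration, not the script's own code: the function name `ratio_with_poisson_error` and the sample counts are hypothetical. It assumes the same convention as the script (ratio of 1 when both counts are zero) and leaves the ratio and its error as NaN wherever they are undefined.

```python
import numpy as np

def ratio_with_poisson_error(pos_counts, neg_counts):
    """Per-bin ratio pos/neg and its error, assuming sqrt(N) errors on the counts.

    Hypothetical helper for illustration only.
    """
    pos = np.asarray(pos_counts, dtype=float)
    neg = np.asarray(neg_counts, dtype=float)
    ratio = np.full(pos.shape, np.nan)
    err = np.full(pos.shape, np.nan)
    # the ratio and its error are only defined where both counts are non-zero
    ok = (pos > 0) & (neg > 0)
    ratio[ok] = pos[ok] / neg[ok]
    # relative error of a Poisson count N is sqrt(N)/N = 1/sqrt(N);
    # the relative errors of a quotient add in quadrature
    err[ok] = ratio[ok] * np.sqrt(1.0 / pos[ok] + 1.0 / neg[ok])
    # convention taken from the script: an empty bin gets a ratio of 1 (no correction);
    # its error is left as NaN
    both_empty = (pos == 0) & (neg == 0)
    ratio[both_empty] = 1.0
    return ratio, err

# made-up counts: two populated bins, one empty bin, one bin with zero false pos.
ratio, err = ratio_with_poisson_error([4, 9, 0, 0], [16, 9, 0, 5])
print(ratio)
print(err)
```

Masking the populated bins first avoids the 0/0 division entirely, so no NumPy warnings are raised and the undefined bins are easy to pick out with `np.isnan`.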
#
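The interpolation/extrapolation of the correction factors above (one slope and intercept per pair of adjacent points, with the end segments extended outward) can also be written without explicit loops. The sketch below is one possible formulation, not the script's own code: `interp_extrap_linear` and the sample values are hypothetical, and it assumes the x-positions of the data points are strictly increasing.

```python
import numpy as np

def interp_extrap_linear(x, xp, fp):
    """Piecewise-linear interpolation through (xp, fp), extending the first and
    last segments linearly outside [xp[0], xp[-1]].

    Hypothetical helper for illustration; assumes xp is strictly increasing.
    """
    x = np.asarray(x, dtype=float)
    xp = np.asarray(xp, dtype=float)
    fp = np.asarray(fp, dtype=float)
    slopes = np.diff(fp) / np.diff(xp)          # one slope per segment
    # index of the segment containing each x; clipping sends points below the
    # first knot to segment 0 and points above the last knot to the last segment
    idx = np.clip(np.searchsorted(xp, x, side='right') - 1, 0, len(slopes) - 1)
    return fp[idx] + slopes[idx] * (x - xp[idx])

# made-up correction factors at three mass midbins
xp = [8.0, 9.0, 10.0]
fp = [2.0, 1.0, 1.0]
x = [7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5]
result = interp_extrap_linear(x, xp, fp)
print(result)
```

Evaluation points that coincide exactly with a data point go through the same formula as every other point, so no entry is left at its initial value. Note that `np.interp` alone would not reproduce this behaviour, because it holds the end values constant outside the data range rather than extrapolating.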


[all,above_lim_mass,all_cluster_members,SF_members,SF_phot,SF_spec,Q_members,Q_phot,Q_spec,SF_pos,Q_pos,SF_neg,Q_neg]
[44465 36365  3449  1142  1075    67  2307  1739   568    42    73    18
   147]

Summary Table 1A: False Pos./Neg.
    Property     Total M0416 M1149 M0717 A370 A1063 A2744
---------------- ----- ----- ----- ----- ---- ----- -----
TOTAL False Pos.   115    19    19    39   10     8    20
TOTAL False Neg.   169    32    24    11   17    31    54
 SF - False Pos.    42    10     9     9    6     2     6
 SF - False Neg.    18     4     2     2    3     3     4
   SF <lim. mass     4     0     0     0    0     1     3
  Q - False Pos.    73     9    10    30    4     6    14
  Q - False Neg.   147    28    22     9   14    27    47
    Q <lim. mass     0     0     0     0    0     0     0
             SUM   284    51    43    50   27    39    74
NOTE: TOTALs reported in first two rows are from Summary Table 4 in "master_data*.py".


Summary Table 1B: MEMBERS
            

In [381]:
print(mass_completeness_correction)

[[inf 3.  1.5 1.2 1.2 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1. ]]


In [382]:
print(limiting_mass)

[7.14 7.58 7.82 7.26 7.34 7.07]


In [343]:
d

Unnamed: 0,z_spec_cutoff,z_phot_cutoff,type,[limiting_mass],M.o.M.,TOTAL_M.o.M.,cluster_members,false pos.,false neg.,binning_method,#_of_bins,[bin_edges]
0,0.05,0.09,Q,[7.56 8.02 7.89 7.43 7.58 7.26],0.54761,,3842,11,45,2,5,[ 7.26 8.48 8.76 10.15 10.75 12.3 ]


In [314]:
limiting_mass_flag

2

In [189]:
df_result.columns

Index(['z_spec_cutoff', 'z_phot_cutoff', 'type', '[limiting_mass]', 'M.o.M.',
       'cluster_members', 'false pos.', 'false neg.', 'binning_method',
       '#_of_bins', '[bin_edges]'],
      dtype='object')