# Explore the COSMOS2020 PZ catalog

- author : Sylvie Dagoret-Campagne
- affiliation : IJCLab
- creation date : 2022-04-20


See https://cosmos.astro.caltech.edu/

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%matplotlib inline
import os
import numpy as np
from matplotlib.ticker import FormatStrFormatter
import matplotlib.pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline, interp1d
from scipy.special import erfc
import pandas as pd
import seaborn as sns 
import itertools
sns.set_style("white")
sns.set_context("notebook", font_scale=1.0, rc={"lines.linewidth": 2.5})
sns.set_palette(sns.color_palette(["#9b59b6", "#95a5a6", 
                                   "#e74c3c", "#3498db", 
                                   "#34495e", "#2ecc71"]))

In [3]:
import sys
print(sys.executable)
print(sys.version)
#print(sys.version_info)

/sps/lsst/groups/auxtel/softs/utils/anaconda3/bin/python
3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0]


In [4]:
from astropy.io import fits
import pandas

In [5]:
bad_path = ['/opt/conda/lib/python3.8/site-packages']

In [6]:
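# Remove the conflicting site-packages entry only if it is present,
# so that re-running this cell does not raise a ValueError.
if bad_path[0] in sys.path:
    sys.path.remove(bad_path[0])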

In [7]:
from astropy.io import fits
from astropy.table import Table

In [8]:
# Set up some plotting defaults:

params = {'axes.labelsize': 28,
          'font.size': 24,
          'legend.fontsize': 14,
          'xtick.major.width': 3,
          'xtick.minor.width': 2,
          'xtick.major.size': 12,
          'xtick.minor.size': 6,
          'xtick.direction': 'in',
          'xtick.top': True,
          'lines.linewidth': 3,
          'axes.linewidth': 3,
          'axes.labelweight': 3,
          'axes.titleweight': 3,
          'ytick.major.width': 3,
          'ytick.minor.width': 2,
          'ytick.major.size': 12,
          'ytick.minor.size': 6,
          'ytick.direction': 'in',
          'ytick.left': True,
          'figure.figsize': [18, 10],
          'figure.facecolor': 'White'
          }

plt.rcParams.update(params)

# Input

In [9]:
path_cosmos = "../data/cosmos2020/COSMOS2020_R1"
path_cosmos_pz = "../data/cosmos2020/COSMOS2020_R1/PZ"

In [10]:
ls ../data/cosmos2020/COSMOS2020_R1/PZ

COSMOS2020_CLASSIC_R1_v2.0_EAZY_CZ.fits
COSMOS2020_CLASSIC_R1_v2.0_LEPHARE_PZ.fits
COSMOS2020_FARMER_R1_v2.0_EAZY_CZ.fits
COSMOS2020_FARMER_R1_v2.0_LEPHARE_PZ.fits
eazy_zcdf_pdf.txt
PZ_README


In [11]:
! cat ../data/cosmos2020/COSMOS2020_R1/README

FILES
------
COSMOS2020_submitted.pdf -- latest version of the paper (includes some bolded changes!)

COSMOS2020_CLASSIC_R1_v2.0.fits -- release version for Classic
COSMOS2020_CLASSIC_R1_v2.0.header -- corresponding header for release version

COSMOS2020_FARMER_R1_v2.0.fits -- release version for Farmer
COSMOS2020_FARMER_R1_v2.0.header -- corresponding header for release version

COSMOS2020_prepare_apertures.txt -- Galactic extinction + aper->total corrections (python)

MASKS/
 - MASK_*.reg -- DS9 region files for HSC, SUPCAM, UVISTA, UDEEP
 - flags_in_catalog.png -- easy to read summary of regions
 - MASKS_README.txt -- readme file for regions

PZ/
 - COSMOS2020_CLASSIC_R1_v2.0_EAZY_CZ.fits -- EAZY CDF(z) for Classic
 - COSMOS2020_CLASSIC_R1_v2.0_LEPHARE_PZ.fits -- Le Phare P(z) for Classic

 - COSMOS2020_FARMER_R1_v2.0_EAZY_CZ.fits -- EAZY CDF(z) for Farmer
 - COSMOS2020_FARMER_R1_v2.0_LEPHARE_PZ.fits -- Le Phare P(z) for Farmer

 - eazy_zcdf_pdf.txt -- script for converting EAZY CDF

In [12]:
! cat ../data/cosmos2020/COSMOS2020_R1/PZ/PZ_README

For each source, for both catalogues we provide the probability redshift distributions or p(z). These are stored as fits files. 

The p(z) results from lephare are recorded as the likelihood at a given z spanning a baseline from z=0 to 12 sampling in 500 points of equal-z.  

Slightly differently, the p(z) results from eazy are stored instead as 50 samplings of the cumulative redshift probability distribution or cdf(z), equally spaced according to multiples of the standard deviation of a Gaussian distribution. As such, the p(z) can be easily reconstructed from the relatively more compact cdf(z) data without a significant loss of precision. Alternatively, users may find it advantageous to simply assess the probability of a source being in a certain redshift range by taking the difference in the cdf(z) at two z points, equivalent to integrating the p(z) but with much less computational effort. A script is provided to help users access and use this format.

Details of the array formatting
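
The README above says that, for the EAZY files, the probability of a source lying in a redshift range can be obtained from the difference of cdf(z) at two redshifts. The sketch below illustrates that idea on a synthetic source. The sigma grid (50 points from -5 to 5), the helper name `prob_in_range` and the Gaussian test source are assumptions made for illustration only; the provided `eazy_zcdf_pdf.txt` script remains the reference for the real file layout.

```python
import math
import numpy as np

# ASSUMPTION: illustrative sigma grid, not read from the COSMOS2020 files.
sigmas = np.linspace(-5.0, 5.0, 50)
# Gaussian CDF at each sigma multiple: the cdf levels that are sampled.
cdf_levels = np.array([0.5 * math.erfc(-s / math.sqrt(2.0)) for s in sigmas])

# Synthetic source: Gaussian p(z) centred on z = 1.0 with width 0.1, so the
# redshift at which cdf(z) reaches each level is mu + sigma * width.
z_at_level = 1.0 + 0.1 * sigmas


def prob_in_range(z_lo, z_hi, z_at_level, cdf_levels):
    """Hypothetical helper: P(z_lo < z < z_hi) as a difference of cdf(z).

    cdf(z) is linearly interpolated between the stored samples.
    """
    cdf_lo, cdf_hi = np.interp([z_lo, z_hi], z_at_level, cdf_levels)
    return cdf_hi - cdf_lo


# Probability inside +/- 1 sigma of the synthetic source (close to 0.68).
p_1sigma = prob_in_range(0.9, 1.1, z_at_level, cdf_levels)
print(p_1sigma)
```

No integration over p(z) is needed here, which is the saving in computational effort that the README mentions.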

In [13]:
inputfile_pz = os.path.join(path_cosmos_pz,"COSMOS2020_CLASSIC_R1_v2.0_LEPHARE_PZ.fits") 

In [18]:
hdu = fits.open(inputfile_pz)

In [19]:
hdu.info()

Filename: ../data/cosmos2020/COSMOS2020_R1/PZ/COSMOS2020_CLASSIC_R1_v2.0_LEPHARE_PZ.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       8   (1002, 1720701)   float32   


In [22]:
hdu[0].header

SIMPLE  =                    T / conforms to FITS standard                      
BITPIX  =                  -32 / array data type                                
NAXIS   =                    2 / number of array dimensions                     
NAXIS1  =                 1002                                                  
NAXIS2  =              1720701                                                  
EXTEND  =                    T                                                  
COMMENT First column contains object ID                                         
COMMENT First row contains corresponding redshift for p(z)                      

In [21]:
hdu[0].data.shape

(1720701, 1002)
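
The header COMMENT cards state that the first column holds the object ID and the first row holds the redshift grid. A minimal sketch of how such an array could be split into its three parts follows. It uses a small synthetic array in place of `hdu[0].data`, and the names `raw`, `z_grid`, `obj_id` and `pz` are illustrative choices, not part of the catalog.

```python
import numpy as np

# Small synthetic stand-in for hdu[0].data with the same layout:
# row 0 = redshift grid, column 0 = object IDs, the rest = p(z) values.
n_obj, n_z = 3, 5
raw = np.zeros((n_obj + 1, n_z + 1), dtype=np.float32)
raw[0, 0] = np.nan                          # unused corner cell
raw[0, 1:] = np.linspace(0.0, 0.04, n_z)    # redshift grid
raw[1:, 0] = np.arange(1, n_obj + 1)        # object IDs stored as floats
raw[1:, 1:] = np.random.default_rng(0).random((n_obj, n_z))

z_grid = raw[0, 1:]                   # shape (n_z,)
obj_id = raw[1:, 0].astype(np.int64)  # shape (n_obj,)
pz = raw[1:, 1:]                      # shape (n_obj, n_z)

print(z_grid.shape, obj_id, pz.shape)
```

Integer IDs up to 2**24 are represented exactly in float32, so casting the roughly 1.7 million IDs of this file back to integers should be lossless.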

In [14]:
#t = Table.read(inputfile_pz , hdu=1)  
#t.columns

In [16]:
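# Load the whole primary-HDU array (1720701 x 1002 float32, about 6.9 GB) into a DataFrame.
# Row 0 holds the redshift grid and column 0 the object IDs (see the header COMMENT cards).
with fits.open(inputfile_pz) as data:
    df = pandas.DataFrame(data[0].data)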

In [17]:
df.head()

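|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 992 | 993 | 994 | 995 | 996 | 997 | 998 | 999 | 1000 | 1001 |
|---|---|---|---|---|---|---|---|---|---|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|
| 0 | NaN | 0.0 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | ... | 9.91 | 9.92 | 9.93 | 9.94 | 9.95 | 9.96 | 9.97 | 9.98 | 9.99 | 10.0 |
| 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7e-06 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 3.0 | 31.419884 | 13.482229 | 5.323507 | 2.120052 | 0.834505 | 0.325313 | 0.109204 | 0.037099 | 0.015821 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 4.0 | 0.0 | 0.000187 | 0.000177 | 0.000169 | 0.000159 | 0.000137 | 9.6e-05 | 4.8e-05 | 3.8e-05 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |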
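
In the `df.head()` output above, row 0 is the redshift grid (0.0 to 10.0 in steps of 0.01) and each following row is one object's p(z). A sketch of how per-object point estimates could be derived from such rows is given below. It runs on synthetic Gaussian p(z) curves, not on the catalog, and the helper `trapezoid` and the names `z_mode` and `z_mean` are assumptions chosen for illustration. Rows that are zero everywhere are given NaN estimates instead of being divided by zero.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y over x along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


# Redshift grid with the same sampling as row 0 of the table.
z_grid = np.linspace(0.0, 10.0, 1001)

# Synthetic p(z): two Gaussians (centres 0.5 and 2.0) plus one all-zero row.
centres = np.array([0.5, 2.0])
pz = np.exp(-0.5 * ((z_grid[None, :] - centres[:, None]) / 0.05) ** 2)
pz = np.vstack([pz, np.zeros_like(z_grid)])

# Normalise each row to unit integral where the integral is non-zero.
norm = trapezoid(pz, z_grid)
valid = norm > 0
pz_norm = np.zeros_like(pz)
pz_norm[valid] = pz[valid] / norm[valid, None]

# Point estimates: redshift of the peak and the p(z)-weighted mean.
z_mode = np.where(valid, z_grid[np.argmax(pz, axis=1)], np.nan)
z_mean = np.where(valid, trapezoid(pz_norm * z_grid, z_grid), np.nan)

print(z_mode, z_mean)
```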


In [None]:
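# Drop columns that are entirely NaN.
# Column 0 has a NaN only in row 0 (the redshift-grid row), so it is kept.
df.dropna(axis=1, how='all', inplace=True)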

In [None]:
df.head()

In [None]:
len(df)

In [None]:
#for col in df.columns:
#    print(col)

In [None]:
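# "ez_z_phot" is not a column of this P(z) array, whose columns are the integers 0..1001;
# it presumably belongs to the main COSMOS2020 catalog. Guard to avoid a KeyError.
if "ez_z_phot" in df.columns:
    plt.hist(df["ez_z_phot"].values, bins=100);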