# Reading catalog - LSST DP0 from Rubin Science Plataform

Uses the Table Access Protocol. 
<a href="http://www.ivoa.net/documents/TAP">Doc for TAP

In [44]:
import time
import numpy as np
import matplotlib.pyplot as plt
import pandas

from lsst.rsp import get_tap_service

In [20]:
pandas.set_option('display.max_rows', 20)

In [21]:
service = get_tap_service()

assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

## Diving in the schemas to see what we have

In [22]:
query = "SELECT * FROM tap_schema.schemas"
results = service.search(query).to_table()

results

description,schema_index,schema_name,utype
str512,int32,str64,str512
Data Preview 0.1 includes five tables based on the DESC's Data Challenge 2 simulation of 300 square degrees of the wide-fast-deep LSST survey region after 5 years. All tables contain objects detected in coadded images.,2,dp01_dc2_catalogs,
"Data Preview 0.2 contains the image and catalog products of the Rubin Science Pipelines v23 processing of the DESC Data Challenge 2 simulation, which covered 300 square degrees of the wide-fast-deep LSST survey region over 5 years.",0,dp02_dc2_catalogs,
ObsCore v1.1 attributes in ObsTAP realization,1,ivoa,
A TAP-standard-mandated schema to describe tablesets in a TAP 1.1 service,100000,tap_schema,
UWS Metadata,120000,uws,


In [5]:
query = "SELECT * FROM tap_schema.tables WHERE tap_schema.tables.schema_name = 'dp02_dc2_catalogs' order by table_index ASC"

results = service.search(query).to_table()
results

description,schema_name,table_index,table_name,table_type,utype
str512,str512,int32,str64,str8,str512
Properties of the astronomical objects detected and measured on the deep coadded images.,dp02_dc2_catalogs,1,dp02_dc2_catalogs.Object,table,
"Properties of detections on the single-epoch visit images, performed independently of the Object detections on coadded images.",dp02_dc2_catalogs,2,dp02_dc2_catalogs.Source,table,
"Forced-photometry measurements on individual single-epoch visit images and difference images, based on and linked to the entries in the Object table. Point-source PSF photometry is performed, based on coordinates from a reference band chosen for each Object and reported in the Object.refBand column.",dp02_dc2_catalogs,3,dp02_dc2_catalogs.ForcedSource,table,
Properties of time-varying astronomical objects based on association of data from one or more spatially-related DiaSource detections on individual single-epoch difference images.,dp02_dc2_catalogs,4,dp02_dc2_catalogs.DiaObject,table,
Properties of transient-object detections on the single-epoch difference images.,dp02_dc2_catalogs,5,dp02_dc2_catalogs.DiaSource,table,
"Metadata about the pointings of the DC2 simulated survey, largely associated with the boresight of the entire focal plane.",dp02_dc2_catalogs,7,dp02_dc2_catalogs.Visit,table,
Metadata about the 189 individual CCD images for each Visit in the DC2 simulated survey.,dp02_dc2_catalogs,8,dp02_dc2_catalogs.CcdVisit,table,
Static information about the subset of tracts and patches from the standard LSST skymap that apply to coadds in these catalogs,dp02_dc2_catalogs,9,dp02_dc2_catalogs.CoaddPatches,table,


## Get columns for DP0 Objects

In [6]:
results = service.search("SELECT column_name, datatype, description, unit from TAP_SCHEMA.columns WHERE table_name = 'dp02_dc2_catalogs.Object'")
results.to_table().to_pandas()

Unnamed: 0,column_name,datatype,description,unit
0,coord_dec,double,Fiducial ICRS Declination of centroid used for...,deg
1,coord_ra,double,Fiducial ICRS Right Ascension of centroid used...,deg
2,deblend_nChild,int,Number of children this object has (defaults t...,
3,deblend_skipped,boolean,Deblender skipped this source,
4,detect_fromBlend,boolean,This source is deblended from a parent with mo...,
...,...,...,...,...
986,z_psfFlux_flag_apCorr,boolean,Set if unable to aperture correct base_PsfFlux...,
987,z_psfFlux_flag_edge,boolean,Object was too close to the edge of the image ...,
988,z_psfFlux_flag_noGoodPixels,boolean,Not enough non-rejected pixels in data to atte...,
989,z_psfFluxErr,double,Flux uncertainty derived from linear least-squ...,nJy


## Get bands for objects - circle and limiting by r band

In [7]:
max_rec = 10
use_center_coords = "62, -37"
use_radius = "1.0"

In [30]:
bands = ['g', 'i', 'r', 'u', 'y', 'z']

mags = ""
for band in bands:
    mags+= f"scisql_nanojanskyToAbMag({band}_cModelFlux) AS mag_{band}_cModel, {band}_cModelFluxErr, "

columns_query = f"objectId, coord_ra, coord_dec, {mags}detect_isPrimary, r_extendedness "

r_extendedness = 1 for extended sources, 0 for point sources. Measured on r-band. So for galaxies set 1 and stars 0

In [31]:
query = "SELECT " + columns_query + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 "
print(query)

SELECT objectId, coord_ra, coord_dec, scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, g_cModelFluxErr, scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel, i_cModelFluxErr, scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, r_cModelFluxErr, scisql_nanojanskyToAbMag(u_cModelFlux) AS mag_u_cModel, u_cModelFluxErr, scisql_nanojanskyToAbMag(y_cModelFlux) AS mag_y_cModel, y_cModelFluxErr, scisql_nanojanskyToAbMag(z_cModelFlux) AS mag_z_cModel, z_cModelFluxErr, detect_isPrimary, r_extendedness FROM dp02_dc2_catalogs.Object WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', 62, -37, 1.0)) = 1 AND detect_isPrimary = 1 AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 


### Sync

memory_usage = deep get memory usage iuncluding object datatype

In [32]:
%%time
results = service.search(query, maxrec=max_rec).to_table().to_pandas()
results.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   objectId          10 non-null     int64  
 1   coord_ra          10 non-null     float64
 2   coord_dec         10 non-null     float64
 3   mag_g_cModel      10 non-null     float64
 4   g_cModelFluxErr   10 non-null     float64
 5   mag_i_cModel      10 non-null     float64
 6   i_cModelFluxErr   10 non-null     float64
 7   mag_r_cModel      10 non-null     float64
 8   r_cModelFluxErr   10 non-null     float64
 9   mag_u_cModel      10 non-null     float64
 10  u_cModelFluxErr   10 non-null     float64
 11  mag_y_cModel      10 non-null     float64
 12  y_cModelFluxErr   10 non-null     float64
 13  mag_z_cModel      10 non-null     float64
 14  z_cModelFluxErr   10 non-null     float64
 15  detect_isPrimary  10 non-null     bool   
 16  r_extendedness    10 non-null     float64
dtype

In [33]:
results

Unnamed: 0,objectId,coord_ra,coord_dec,mag_g_cModel,g_cModelFluxErr,mag_i_cModel,i_cModelFluxErr,mag_r_cModel,r_cModelFluxErr,mag_u_cModel,u_cModelFluxErr,mag_y_cModel,y_cModelFluxErr,mag_z_cModel,z_cModelFluxErr,detect_isPrimary,r_extendedness
0,1567798028092359809,61.290253,-37.823108,21.925642,28.080038,21.595578,59.048393,21.900913,30.657338,21.846707,89.020718,21.006309,306.253558,21.203053,164.601396,True,1.0
1,1567929969487672063,61.113473,-37.588913,21.646285,16.188883,21.353752,27.827194,21.416171,15.749252,22.337348,47.119185,21.40412,159.674967,21.346525,78.693758,True,1.0
2,1567929969487672056,61.067928,-37.588971,23.709952,21.024943,22.374693,47.538013,22.767477,24.407099,25.043426,64.425174,21.931886,235.726572,22.158206,127.810631,True,1.0
3,1567929969487672098,61.124363,-37.589575,22.429075,17.268169,21.79028,35.558223,22.332013,18.185516,22.315114,58.418107,21.368171,188.847028,21.495673,102.538001,True,1.0
4,1567929969487672099,61.123258,-37.589019,23.759419,25.970056,21.459094,59.634345,22.567717,29.900368,24.229074,79.491818,20.71586,299.483143,20.925544,164.407268,True,1.0
5,1567929969487670941,61.003225,-37.599231,22.341257,26.989035,20.172821,66.013976,20.771771,35.306884,24.382849,85.396563,19.595966,317.409938,19.844229,172.94971,True,1.0
6,1567929969487670965,61.09358,-37.595068,24.10044,16.377647,22.010373,37.310907,22.776702,18.344718,25.906351,51.197285,21.648942,198.127016,21.804808,102.945623,True,1.0
7,1567929969487670966,61.094031,-37.594012,23.840993,16.393433,22.128398,36.527867,22.596023,18.217375,24.99767,50.512568,21.88407,200.252311,21.96472,103.782157,True,1.0
8,1567929969487671127,61.082493,-37.59792,22.318689,20.053189,21.346547,42.961256,21.846055,21.860738,22.372845,66.252413,20.972292,217.435919,21.11757,114.480863,True,1.0
9,1567929969487671128,61.081724,-37.597762,23.915242,20.950325,21.281788,49.518369,22.369007,24.485426,26.831157,69.940322,20.610459,251.145152,20.869247,132.322595,True,1.0


### Async

In [26]:
job = service.submit_job(query)

In [27]:
job.run()

<pyvo.dal.tap.AsyncTAPJob at 0x7f9ed77ccc70>

In [28]:
%%time
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)

Job phase is ERROR
CPU times: user 11.6 ms, sys: 775 µs, total: 12.4 ms
Wall time: 44.8 ms


In [15]:
%%time
results = job.fetch_result().to_table().to_pandas()
results.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67373 entries, 0 to 67372
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   objectId          67373 non-null  int64  
 1   coord_ra          67373 non-null  float64
 2   coord_dec         67373 non-null  float64
 3   mag_g_cModel      67317 non-null  float64
 4   g_cModelFluxErr   67357 non-null  float64
 5   mag_i_cModel      67343 non-null  float64
 6   i_cModelFluxErr   67364 non-null  float64
 7   mag_r_cModel      67373 non-null  float64
 8   r_cModelFluxErr   67373 non-null  float64
 9   mag_u_cModel      66780 non-null  float64
 10  u_cModelFluxErr   67371 non-null  float64
 11  mag_y_cModel      67296 non-null  float64
 12  y_cModelFluxErr   67371 non-null  float64
 13  mag_z_cModel      67296 non-null  float64
 14  z_cModelFluxErr   67356 non-null  float64
 15  detect_isPrimary  67373 non-null  bool   
dtypes: bool(1), float64(14), int64(1)
memory

In [45]:
results.query("mag_g_cModel == np.nan")

UndefinedVariableError: name 'np' is not defined

In [23]:
help(results)

Help on Table in module astropy.table.table object:

class Table(builtins.object)
 |  Table(data=None, masked=False, names=None, dtype=None, meta=None, copy=True, rows=None, copy_indices=True, units=None, descriptions=None, **kwargs)
 |  
 |  A class to represent tables of heterogeneous data.
 |  
 |  `~astropy.table.Table` provides a class for heterogeneous tabular data.
 |  A key enhancement provided by the `~astropy.table.Table` class over
 |  e.g. a `numpy` structured array is the ability to easily modify the
 |  structure of the table by adding or removing columns, or adding new
 |  rows of data.  In addition table and column metadata are fully supported.
 |  
 |  `~astropy.table.Table` differs from `~astropy.nddata.NDData` by the
 |  assumption that the input data consists of columns of homogeneous data,
 |  where each column has a unique identifier and may contain additional
 |  metadata such as the data unit, format, and description.
 |  
 |  See also: https://docs.astropy.org/

In [17]:
del results

## Questions/Reflections

- O que é isso scisql_nanojanskyToAbMag? todos os notebooks usam eu usei tmb, pelo visto ele calcula o um valor em notação cientifica para que a magnitude na banda.
    unidade de grandeza de fluxo -> mag 
    
- Como que eu sei tamanhos de tabela por exemplo para rodar tudo q for galáxia?