<img align="left" src = https://linea.org.br/wp-content/themes/LIneA/imagens/logo-header.jpg width=100 style="padding: 20px"> 

<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=160 style="padding: 20px">  

#  Reading catalogs - LSST DP0 from the Rubin Science Platform

**Contact author**: Heloisa da Silva Mengisztki ([heloisasmengisztki@gmail.com](mailto:heloisasmengisztki@gmail.com)) 

**Last verified run**: 2022-12-01 (YYYY-MM-DD) <br><br>


### TAP - Table Access Protocol

TAP is a protocol for accessing general table data. 
It uses HTTP and XML to configure and access the data, which can be tabular (keyword values stored in tables, one column per keyword) or non-tabular, such as images and other n-dimensional data. 
Configurable attributes are passed as request parameters, for example the query language and the query itself:

LANG=ADQL<br>
QUERY=< ADQL query string >
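
As a rough illustration of how these two parameters travel in a request, the sketch below encodes them into the URL of a synchronous query. The base URL is hypothetical (modelled on the `example.net` capability shown in this section), and the helper name `build_sync_url` is introduced only for this example. No request is actually sent.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical base URL, modelled on the example.net capability; not a real service.
BASE_URL = "https://example.net/myTAP"


def build_sync_url(base_url, adql):
    """Return the URL of a synchronous TAP request carrying an ADQL query."""
    params = {
        "LANG": "ADQL",  # query language
        "QUERY": adql,   # the ADQL query string
    }
    # Synchronous queries go to the /sync resource under the base URL.
    # Older TAP 1.0 services may additionally expect REQUEST=doQuery.
    return f"{base_url}/sync?{urlencode(params)}"


url = build_sync_url(BASE_URL, "SELECT TOP 5 * FROM tap_schema.schemas")
print(url)

# Decoding the query string recovers the original parameters.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

In this notebook the encoding is handled by the TAP client, so the URL never has to be built by hand.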

```xml
    <capability standardID="ivo://ivoa.net/std/TAP"> 
        <!-- BasicAA authentication bundle -->
        <interface xsi:type="urx:Async" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-async</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
        <interface xsi:type="urx:Sync" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-sync</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
     </capability>
```
By default a query returns a `TAPResults` object, which wraps the result table together with metadata about its schema. The contents can be accessed through methods such as `getcolumn()` and `getrecord()`, or converted to an Astropy Table with `to_table()`.

It is important to remember that TAP is a protocol for accessing the database where the data is stored, not the database itself.

TAP Results [documentation](https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.TAPResults.html) <br>
Official [documentation](https://www.ivoa.net/documents/TAP/)<br>
Some Videos: 
[video 1](https://www.youtube.com/watch?v=hFmhypXg7JA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ), 
[video 2](https://www.youtube.com/watch?v=BX10AI0WgMA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=2),
[video 3](https://www.youtube.com/watch?v=szDdL7sqD68&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=3)

In [None]:
import time
import numpy as np
import matplotlib.pyplot as plt
import pandas

from lsst.rsp import get_tap_service

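It is important to set max rows here: these catalogs contain a very large number of objects, and displaying all of them in the notebook is rarely useful. Note that this option only limits how many rows are displayed, not how many are retrieved.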

In [None]:
pandas.set_option('display.max_rows', 20)

Here we access the TAP service. Since we are inside the Rubin Science Platform, we can call `get_tap_service()` without providing a login, password, or connection string.

In [None]:
service = get_tap_service()

assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

### Listing the schemas to see what we have

Here we look at all the schemas we can access using TAP. The ones of interest are the schemas for DP0 data, dp01_dc2_catalogs and dp02_dc2_catalogs.

In [None]:
query = "SELECT * FROM tap_schema.schemas"
results = service.search(query).to_table()

results

Looking inside the dp0.2 catalog, we can see the tables it contains and their names. The one of interest here is the Object table, which contains the objects detected and measured on the coadded images.

In [None]:
query = "SELECT * FROM tap_schema.tables WHERE tap_schema.tables.schema_name = 'dp02_dc2_catalogs' order by table_index ASC"

results = service.search(query).to_table()
results

### Getting the columns for DP0.2 Objects

In [None]:
results = service.search("SELECT column_name, datatype, description, unit from TAP_SCHEMA.columns WHERE table_name = 'dp02_dc2_catalogs.Object'")
results.to_table().to_pandas()

### Preparing the query

In [None]:
max_rec = 10
use_center_coords = "62, -37"
use_radius = "1.0"

In [None]:
bands = ['g', 'i', 'r', 'u', 'y', 'z']

mags = ""
for band in bands:
    mags+= f"scisql_nanojanskyToAbMag({band}_cModelFlux) AS mag_{band}_cModel, {band}_cModelFluxErr, "

columns_query = f"objectId, coord_ra, coord_dec, {mags}detect_isPrimary, r_extendedness "

This query uses *detect_isPrimary*, which is true when the source has no children, meaning it is already the final deblended object rather than a blended parent, and *r_extendedness*, which separates stars from galaxies: 1 for extended objects such as galaxies and 0 for point sources such as stars.
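
The column list also wraps each flux in `scisql_nanojanskyToAbMag`. As a sketch of what that conversion amounts to, the function below applies the standard AB magnitude definition (zero point of 3631 Jy) to a flux in nanojansky. It assumes the database function follows that definition. The name `nanojansky_to_ab_mag` and the choice to return NaN for non-positive fluxes are illustrative assumptions, not the database's documented behaviour.

```python
import math

AB_ZERO_POINT_JY = 3631.0  # flux density of a magnitude-0 source in the AB system


def nanojansky_to_ab_mag(flux_njy):
    """Convert a flux in nanojansky to an AB magnitude.

    Returns NaN for non-positive fluxes (an assumption made for this sketch).
    """
    if flux_njy <= 0:
        return float("nan")
    flux_jy = flux_njy * 1e-9  # nanojansky -> jansky
    return -2.5 * math.log10(flux_jy / AB_ZERO_POINT_JY)


# Print a few sample conversions, from a very faint to a very bright source.
for flux in (1.0, 1.0e3, 3.631e12):
    print(f"{flux:>12.4g} nJy -> {nanojansky_to_ab_mag(flux):6.2f} mag")
```

Because the magnitude scale is logarithmic and inverted, brighter objects have smaller magnitudes, which is why a cut such as `> 17.0` removes the brightest sources.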

In [None]:
query = "SELECT " + columns_query + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 "
print(query)
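
If the same cone search needs to be repeated with different centers or magnitude limits, the string concatenation above could be wrapped in a small function. This is only a sketch: `build_cone_query` and its parameter names are hypothetical, and the table and column names are the ones already used in this notebook.

```python
def build_cone_query(columns, ra_deg, dec_deg, radius_deg, r_mag_min, r_mag_max):
    """Assemble an ADQL cone search on dp02_dc2_catalogs.Object."""
    return (
        f"SELECT {', '.join(columns)} "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1 "
        # Keep only primary (fully deblended) objects inside the r-band magnitude window.
        "AND detect_isPrimary = 1 "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) > {r_mag_min} "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) < {r_mag_max}"
    )


example_query = build_cone_query(
    ["objectId", "coord_ra", "coord_dec"], 62, -37, 1.0, 17.0, 23.0
)
print(example_query)
```

Keeping the numeric limits as function arguments makes it harder to lose a space or a quote when the values change.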

### Running it Sync

Passing `memory_usage="deep"` to `info()` reports the memory usage including the contents of object-typed columns, along with some other information.

In [None]:
%%time
results = service.search(query, maxrec=max_rec).to_table().to_pandas()
results.info(memory_usage="deep")

In [None]:
results

### Running it Async

To run the query asynchronously, it is necessary to create a job. Once it is running, we can wait for the results with the method `wait()` and check the status through the job's phase.

In [None]:
job = service.submit_job(query)

In [None]:
job.run()

In [None]:
%%time
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)

In [None]:
%%time
results = job.fetch_result().to_table().to_pandas()
results.info(memory_usage="deep")

In [None]:
results.head()
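
The submit, run, wait, and fetch steps above could be collected into one helper that also raises an error if the job fails and always deletes the job afterwards. This is a minimal sketch, not the notebook's method: `run_async_query` is a hypothetical name, and it assumes `service` behaves like the TAP service used here, with `submit_job()` accepting `maxrec` and the job offering `raise_if_error()`.

```python
def run_async_query(service, adql, maxrec=None):
    """Submit an ADQL query as an async job, wait for it, and return the result."""
    job = service.submit_job(adql, maxrec=maxrec)
    job.run()
    try:
        # Block until the job reaches a terminal phase.
        job.wait(phases=["COMPLETED", "ERROR"])
        # Raise an exception instead of silently continuing after a failed job.
        job.raise_if_error()
        return job.fetch_result()
    finally:
        # Remove the job from the server whether or not it succeeded.
        job.delete()
```

With the names defined earlier in this notebook, a call might look like `results = run_async_query(service, query, maxrec=max_rec).to_table().to_pandas()`.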

### Cleaning up: deleting jobs and results

In [None]:
job.delete()
del results