## Crossmatch TAP query demo 
This notebook was written to show how to access the crossmatch results for a known list of source IDs. A sample list of IDs was initially obtained by querying the table itself. Andy Wilson then provided a file containing the actual IDs of interest, so the notebook was modified to read in that file.

Import and set-up

In [1]:
# Import the Rubin TAP service utilities
from lsst.rsp import get_tap_service, retrieve_query

service = get_tap_service("tap")
assert service is not None
assert service.baseurl == "https://rsp.lsst.ac.uk/api/tap"

# List the schemas (databases) published by this TAP service
query = "SELECT * FROM tap_schema.schemas"
resultsSchema = service.search(query).to_table()
resultsSchema

description,schema_index,schema_name,utype
str512,int32,str64,str512
"Data Preview 0.2 contains the image and catalog products of the Rubin Science Pipelines v23 processing of the DESC Data Challenge 2 simulation, which covered 300 square degrees of the wide-fast-deep LSST survey region over 5 years.",0,dp02_dc2_catalogs,
allsky gaia_source catwise_2020 matches from Edinburgh. Run late 2023,1,gaiaxcatwise2312,
A TAP-standard-mandated schema to describe tablesets in a TAP 1.1 service,100000,tap_schema,
UWS Metadata,120000,uws,
VIDEO/HSC database from WP3.5,3,video,
VIKING/HSC database from WP3.5,2,viking,


In [2]:
import pandas  # to_pandas() below requires pandas to be installed

# List the columns of the crossmatch table from the TAP_SCHEMA metadata
query = ("SELECT * FROM TAP_SCHEMA.columns "
         "WHERE table_name = 'gaiaxcatwise2312.matches_source'")
res = service.search(query)
print(res.fieldnames)
results_table = res.to_table().to_pandas()
print(results_table[['column_name','datatype']])

('"size"', 'arraysize', 'column_index', 'column_name', 'datatype', 'description', 'indexed', 'principal', 'std', 'table_name', 'ucd', 'unit', 'utype', 'xtype')
             column_name datatype
0               ab_flags     char
1             ag_gspphot    float
2       ag_gspphot_lower    float
3       ag_gspphot_upper    float
4    astrometric_chi2_al    float
..                   ...      ...
363         wise_fit_sig   double
364              wise_ra   double
365                   wx    float
366                   wy    float
367                   xi   double

[368 rows x 2 columns]
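With 368 columns the printed listing is truncated. A minimal sketch for narrowing it down on the server side: build the same `TAP_SCHEMA.columns` query with an optional ADQL `LIKE` filter on `column_name`. The helper name `columns_query` is hypothetical; the string it returns would be passed to `service.search()` as in the cell above.

```python
def columns_query(table_name, name_pattern=None):
    """Build an ADQL query listing the columns of `table_name` from TAP_SCHEMA.

    `name_pattern` is an optional ADQL LIKE pattern: '%' matches any run of
    characters and '_' matches exactly one character.
    """
    def quote(value):
        # ADQL string literals use single quotes; embedded quotes are doubled
        return "'" + value.replace("'", "''") + "'"

    query = ("SELECT column_name, datatype, unit, description "
             "FROM TAP_SCHEMA.columns "
             "WHERE table_name = " + quote(table_name))
    if name_pattern is not None:
        query += " AND column_name LIKE " + quote(name_pattern)
    return query


# Hypothetical usage: list only the photometry columns
print(columns_query("gaiaxcatwise2312.matches_source", "phot%"))
```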


A simple row-count query; it takes several minutes to complete.

In [3]:
# Simple row-count query; the answer is 753681386

query = "select count(*) as c from gaiaxcatwise2312.matches_source"
results = service.search(query)

print(results)

<DALResultsTable length=1>
    c    
  int64  
---------
753681386
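Because this count takes minutes, a synchronous request may run into a timeout. A sketch of the same count submitted as an asynchronous TAP job, using pyvo's job methods (`submit_job`, `run`, `wait`, `raise_if_error`, `fetch_result`, `delete`). It assumes `service` is a pyvo `TAPService`; the function name `count_rows_async` and the one-hour wait are assumptions, and the notebook does not show whether this service's synchronous endpoint actually times out.

```python
def count_rows_async(service, table_name, timeout=3600.0):
    """Count the rows of `table_name` using an asynchronous TAP job.

    `service` is assumed to be a pyvo TAPService, such as the object
    returned by lsst.rsp.get_tap_service("tap").
    """
    query = f"SELECT COUNT(*) AS c FROM {table_name}"
    job = service.submit_job(query)  # create the job on the server
    try:
        job.run()  # start executing the query
        # Wait (up to `timeout` seconds) for the job to reach a final phase
        job.wait(phases=["COMPLETED", "ERROR", "ABORTED"], timeout=timeout)
        job.raise_if_error()  # raise if the job ended in ERROR or ABORTED
        table = job.fetch_result().to_table()
    finally:
        job.delete()  # clean up the job on the server, even after a failure
    return int(table["c"][0])


# Usage sketch (not run here):
# n = count_rows_async(service, "gaiaxcatwise2312.matches_source")
```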


In [4]:
print(type(results))
table=results.to_table()
print(type(table))

<class 'pyvo.dal.tap.TAPResults'>
<class 'astropy.table.table.Table'>


In [5]:
# Fetch a sample of source IDs from the table itself (used before the real ID list was available)
query = "select source_id from gaiaxcatwise2312.matches_source limit 1000000"
results = service.search(query)
table = results.to_table()
print(table)

     source_id     
-------------------
6357604827140689280
6357604831436779008
6357604865796654080
6357604900156255360
6357651693823824384
6357651693823825920
6357651693823827456
6357651698119913728
6357651728183570560
6357651762543311616
                ...
4638486579995342592
4638486579995346688
4638486579995348480
4638486579995348608
4638486579995389952
4638486579995394176
4638486579995394432
4638486579995930112
4638486579995952512
4638486580000282112
4638486580000297344
Length = 1000000 rows


In [6]:
sourceIds = table['source_id'].data
# use the numpy methods rather than Python's max()/min(), which loop element by element
print(sourceIds[0], sourceIds.max(), sourceIds.min())

6357604827140689280 6429985544456040576 288307273246956800


Load the provided IDs

In [7]:
# Load Andy's source IDs from the provided FITS file
import numpy
from astropy.io import fits

with fits.open('/home/mikeeread/GaiaEDR3_SourceIds_NGPv4.fits') as hdul:
    data = hdul[1].data  # assumes the first extension (HDU 1) is a binary table
    # copy the column into memory so it stays valid after the file is closed
    andysSourceIds = numpy.array(data['source_id'])
print(andysSourceIds, len(andysSourceIds))

[4282338859504210048 4282338889562080256 4282338893863946624 ...
 2018615353311150592 2018615357602731008 2018615357625887488] 20039800
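Before splitting the list into query chunks it may be worth checking it. This sketch (the function name `prepare_source_ids` is hypothetical) converts the column to native-endian int64 and drops duplicate IDs with `numpy.unique`, so no ID is queried twice. The notebook does not show whether the file actually contains duplicates; note also that `numpy.unique` returns the IDs sorted, which changes which IDs land in each chunk.

```python
import numpy


def prepare_source_ids(ids):
    """Return the unique source IDs as a native-endian int64 array.

    FITS binary tables are big-endian, so a column read with astropy may have
    dtype '>i8'; astype(numpy.int64) converts it to the machine's native order.
    """
    arr = numpy.asarray(ids).astype(numpy.int64)
    unique_ids = numpy.unique(arr)  # sorted, with duplicates removed
    n_dupes = arr.size - unique_ids.size
    if n_dupes:
        print(f"dropped {n_dupes} duplicate source IDs")
    return unique_ids


# Usage sketch: andysSourceIds = prepare_source_ids(andysSourceIds)
```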


There are some 20 million IDs (20039800), so split them into 2001 chunks of about 10,000 IDs each. As this is just an example, the loop stops after two chunks.
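The chunk sizes follow from how `numpy.array_split` divides n elements into k parts: the first n mod k parts get ⌊n/k⌋ + 1 elements and the rest get ⌊n/k⌋. A small sketch of that arithmetic (the helper name `array_split_sizes` is hypothetical) for the 20039800 IDs and 2001 chunks used here: it gives 1786 chunks of 10015 IDs and 215 of 10014, consistent with the chunk length of 10015 that the loop prints.

```python
def array_split_sizes(n, k):
    """Sizes of the k pieces numpy.array_split makes from n elements."""
    q, r = divmod(n, k)
    # The first r pieces get one extra element
    return [q + 1] * r + [q] * (k - r)


sizes = array_split_sizes(20039800, 2001)
print(sizes[0], sizes[-1], sizes.count(10015), sizes.count(10014))
# → 10015 10014 1786 215
```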


In [8]:
import datetime

import numpy
from astropy.table import vstack

# Split the source IDs into N chunks (the argument is the number of chunks,
# not the size of each chunk): 20039800 IDs / 2001 chunks ~ 10015 IDs per chunk
chunks = numpy.array_split(numpy.array(andysSourceIds), 2001)

# Submit one query per chunk, bringing back a few columns. Each chunk's table
# is collected in a list and stacked once at the end, because calling vstack
# inside the loop re-copies the growing table on every iteration.
chunkResults = []
for i, chunk in enumerate(chunks):
    # set to a low number for tests
    if i == 2:
        break
    print(i, len(chunk), datetime.datetime.now())

    # comma-separated list of IDs for the ADQL IN clause
    inClause = ','.join(str(sourceId) for sourceId in chunk)
    query = ("select ra,dec,phot_bp_mean_mag,phot_g_mean_mag,phot_rp_mean_mag "
             "from gaiaxcatwise2312.matches_source "
             "where source_id in (" + inClause + ")")

    results = service.search(query)
    chunkResults.append(results.to_table())

fullResults = vstack(chunkResults)
print(len(fullResults))

0 10015 2024-03-25 11:50:29.469824
1 10015 2024-03-25 11:50:52.604686
13953
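An alternative to fixing the number of chunks is to fix their maximum size, which keeps each IN clause (and so each request) bounded however long the ID list is. This sketch only builds the query strings; the helper name `chunked_queries` and the 10000-ID limit are assumptions, not limits documented for this service. It also adds `source_id` to the SELECT list, which the notebook's query omits, so that each returned row can be matched back to the requested ID (only 13953 of the 20030 IDs in the first two chunks came back).

```python
def chunked_queries(source_ids, columns, table, max_ids=10000):
    """Yield ADQL queries, each selecting `columns` for at most `max_ids` IDs."""
    ids = [int(s) for s in source_ids]  # plain ints give clean decimal strings
    select_list = ",".join(["source_id"] + list(columns))
    for start in range(0, len(ids), max_ids):
        in_clause = ",".join(str(s) for s in ids[start:start + max_ids])
        yield (f"select {select_list} from {table} "
               f"where source_id in ({in_clause})")


# Usage sketch, one request per query (not run here):
# for query in chunked_queries(andysSourceIds,
#                              ["ra", "dec", "phot_g_mean_mag"],
#                              "gaiaxcatwise2312.matches_source"):
#     table = service.search(query).to_table()
```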


In [9]:
print(fullResults)

        ra                dec         ... phot_g_mean_mag phot_rp_mean_mag
------------------ ------------------ ... --------------- ----------------
  282.138360128723  4.443532834875455 ...         17.2079          16.1848
 282.1453523044547 4.4498532842778245 ...         18.3223          17.2477
 282.1599689311139  4.451171826526903 ...         17.6882          16.4412
 282.1445341478628  4.452055493243423 ...         16.0189          14.9313
 282.1160230777142   4.43722356702056 ...         18.9184          17.7303
 282.0870939766269  4.442877361154861 ...         17.9538          16.8751
282.09706619686176   4.43927813379696 ...         15.4824          14.4274
282.10620742763234  4.454777372628117 ...         17.9685          16.8901
 282.1057034003062  4.460084808475075 ...         17.6387          16.4353
 282.1373414992661  4.453260767628231 ...         17.9423          16.8716
               ...                ... ...             ...              ...
 284.2633592808782  6.640

In [18]:
# Alternative (not run): query a TAP service directly with pyvo
#import pyvo
#tap = pyvo.dal.TAPService('https://rsp.lsst.ac.uk/api/tap')
#tap.run_sync('select count(*) from SXDS.director')