# A DP0.3 query option for Peter Veres

Set up.

In [1]:
import numpy as np
# import matplotlib.pyplot as plt
import pandas as pd
from lsst.rsp import get_tap_service

In [2]:
service = get_tap_service("ssotap")

## triple join

Do a triple-join version of the query, and check if duplicates from the `DiaSource` table are retrieved.

The duplicate that Peter found was for `diaSourceId` = -1000014274337402402, and MJD 60528.09827.

Use a smaller time range for a faster query.

In [3]:
query = "SELECT mpc.mpcDesignation, mpc.mpcNumber, mpc.ssObjectId, mpc.fullDesignation, "\
        "ds.midPointMjdTai, ds.ra, ds.dec, ds.mag, ds.band, "\
        "ss.eclipticBeta, ss.eclipticLambda, ss.phaseAngle, ss.diaSourceId "\
        "FROM dp03_catalogs_1yr.DiaSource AS ds "\
        "JOIN dp03_catalogs_1yr.SSSource AS ss ON ds.diaSourceId = ss.diaSourceId "\
        "JOIN dp03_catalogs_1yr.MPCORB AS mpc ON ds.ssObjectId = mpc.ssObjectId "\
        "WHERE ds.midPointMjdTai BETWEEN 60527.1 AND 60529.1 "
results = service.search(query).to_table()
results

mpcDesignation,mpcNumber,ssObjectId,fullDesignation,midPointMjdTai,ra,dec,mag,band,eclipticBeta,eclipticLambda,phaseAngle,diaSourceId
,,,,d,deg,deg,,,deg,deg,deg,
str8,int32,int64,str26,float64,float64,float64,float32,str1,float64,float64,float32,int64
S100doiQ,0,14169511631100,2011 S100doiQa,60528.0592,229.3505499,-16.0275999,23.164,r,2.1060307598130077,231.20537881859224,21.629921,4382560468718205905
S100doiQ,0,14169511631100,2011 S100doiQa,60529.06054,229.5382768,-16.0738546,22.92,i,2.108042446276485,231.39173795036453,21.66337,8736066679182986999
S100doiQ,0,14169511631100,2011 S100doiQa,60528.03517,229.3462406,-16.0264993,22.946,i,2.10601970086341,231.20109039548535,21.62895,-7505819655383129492
S100doiQ,0,14169511631100,2011 S100doiQa,60529.03671,229.5338767,-16.0728115,23.174,r,2.1079579425901778,231.38738096037488,21.662514,-5177897605916232160
2003 UC1,0,21630997438946,2011 2003 UC185,60527.20714,298.6265962,-23.396531,19.103,g,-2.514933413064907,296.1122459830124,9.253874,8843940056714935808
S100mO5w,0,426219105868011,2011 S100mO5wa,60528.08927,245.0750437,-29.9528559,23.921,r,-8.368482838442976,248.3415560040466,17.39154,-8852318714543151457
S1006ZLi,0,636383519099723,2011 S1006ZLia,60528.10052,255.3395264,4.7602977,21.603,i,27.36378396591178,253.50122252266763,15.228839,3683915390494606209
S1006ZLi,0,636383519099723,2011 S1006ZLia,60528.07591,255.3410405,4.7643857,21.912,r,27.368016654285764,253.5023887973632,15.225852,-220250982522433581
S100vzmr,0,638321569403210,2011 S100vzmra,60527.23187,301.6913488,-18.4934902,23.113,r,1.7170610854691415,299.8967197427676,6.229333,5439958994279187748
...,...,...,...,...,...,...,...,...,...,...,...,...


Count the rows in the results table for the `diaSourceId` of interest, which was duplicated in the non-join query.

In [4]:
tx = np.where(results['diaSourceId'] == -1000014274337402402)[0]

In [5]:
print(tx)

[77281]


Only one row for this `DiaSource`, good.

In [6]:
print(results[tx[0]])

mpcDesignation mpcNumber      ssObjectId     fullDesignation midPointMjdTai      ra       dec     mag  band    eclipticBeta      eclipticLambda   phaseAngle     diaSourceId     
                                                                   d            deg       deg                      deg                deg            deg                         
-------------- --------- ------------------- --------------- -------------- ----------- -------- ----- ---- ------------------ ------------------ ---------- --------------------
      2015 UM3         0 2909710257521553353  2011 2015 UM36    60528.09827 246.3980009 -4.25119 22.98    i 17.186773165106448 245.29527821401246  14.219118 -1000014274337402402


How about all the other `diaSourceId` values, are they all unique?

In [7]:
uvals = np.unique(np.asarray(results['diaSourceId'][:]))

In [8]:
print('If there are no more duplicates, ', len(uvals), ' equals ', len(results))

If there are no more duplicates,  489613  equals  489613


Great.
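
If the two counts had differed, the duplicated IDs could be listed with the `return_counts` option of `np.unique`. The sketch below uses a small made-up array (`ids`, an assumption, not real DP0.3 values) in place of `results['diaSourceId']`.

```python
import numpy as np

# Made-up stand-in for np.asarray(results['diaSourceId']); not real DP0.3 IDs.
ids = np.array([11, -42, 11, 7, -42, 11], dtype=np.int64)

# Unique values plus the number of times each one appears.
values, counts = np.unique(ids, return_counts=True)

# Keep only the IDs that occur more than once, with their multiplicities.
duplicated = values[counts > 1]
multiplicity = counts[counts > 1]
print(duplicated, multiplicity)
```

An empty `duplicated` array would mean every `diaSourceId` appears exactly once, which is the same conclusion as the length comparison above.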

The above shows that the triple join avoids the duplicate row that the triple-table query without a join returned.

But triple joins are still a bit inefficient.

In the results table above, the values from the `MPCORB` table are repeated for every observation of the same object, so they take up a lot of space redundantly.

Clean up.

In [9]:
del query, results, tx, uvals

<br>

## double-join, two-step process

Compared to the query above, remove the columns from `MPCORB`, and add the column `ds.ssObjectId`.

In [10]:
query1 = "SELECT ds.midPointMjdTai, ds.ra, ds.dec, ds.mag, ds.band, ds.ssObjectId, "\
         "ss.eclipticBeta, ss.eclipticLambda, ss.phaseAngle, ss.diaSourceId "\
         "FROM dp03_catalogs_1yr.DiaSource AS ds "\
         "JOIN dp03_catalogs_1yr.SSSource AS ss ON ds.diaSourceId = ss.diaSourceId "\
         "WHERE ds.midPointMjdTai BETWEEN 60527.1 AND 60529.1 "
results1 = service.search(query1).to_table()
results1

midPointMjdTai,ra,dec,mag,band,ssObjectId,eclipticBeta,eclipticLambda,phaseAngle,diaSourceId
d,deg,deg,,,,deg,deg,deg,
float64,float64,float64,float32,str1,int64,float64,float64,float32,int64
60528.0592,229.3505499,-16.0275999,23.164,r,14169511631100,2.1060307598130077,231.20537881859224,21.629921,4382560468718205905
60529.06054,229.5382768,-16.0738546,22.92,i,14169511631100,2.108042446276485,231.39173795036453,21.66337,8736066679182986999
60528.03517,229.3462406,-16.0264993,22.946,i,14169511631100,2.10601970086341,231.20109039548535,21.62895,-7505819655383129492
60529.03671,229.5338767,-16.0728115,23.174,r,14169511631100,2.1079579425901778,231.38738096037488,21.662514,-5177897605916232160
60527.20714,298.6265962,-23.396531,19.103,g,21630997438946,-2.514933413064907,296.1122459830124,9.253874,8843940056714935808
60528.08927,245.0750437,-29.9528559,23.921,r,426219105868011,-8.368482838442976,248.3415560040466,17.39154,-8852318714543151457
60528.10052,255.3395264,4.7602977,21.603,i,636383519099723,27.36378396591178,253.50122252266763,15.228839,3683915390494606209
60528.07591,255.3410405,4.7643857,21.912,r,636383519099723,27.368016654285764,253.5023887973632,15.225852,-220250982522433581
60527.23187,301.6913488,-18.4934902,23.113,r,638321569403210,1.7170610854691415,299.8967197427676,6.229333,5439958994279187748
...,...,...,...,...,...,...,...,...,...


Create a string list of the unique `ssObjectId`.

This is the advice currently provided in the documentation:<br>
https://dp0-2.lsst.io/data-access-analysis-tools/adql-recipes.html#individual-objects

However, at the moment, there seems to be a limit on how long a string can be passed, and I'm trying to figure out what it is:<br>
https://community.lsst.org/t/what-is-the-limit-for-lists-passed-to-adql/8114

I'm also not sure if this is a temporary limit for DP0 or not.

For now, just take a sub-set while I figure out how this part should go.

In [11]:
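# Full list, disabled while the apparent string-length limit is unresolved:
# temp = np.unique(np.asarray(results1['ssObjectId']))
# Unique ssObjectId values from the first 50000 rows only.
temp = np.unique(np.asarray(results1['ssObjectId'][0:50000]))
print(len(temp))

# Build an ADQL list "(id1,id2,...)"; '%22i' pads each ID with leading spaces to width 22.
tempstring = "(" + ','.join(['%22i' % num for num in temp]) + ")"

del temp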

26345


Displaying `tempstring` prints every ID, but it takes a moment.

In [12]:
# tempstring
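
One possible alternative to truncating at 50000 rows would be to split the full list of unique IDs into batches and run one `MPCORB` query per batch. The sketch below is only an illustration: `id_list_chunks` is a hypothetical helper, `example_ids` is made up, and the `chunk_size` in the commented usage is a guess, since the actual limit is still unknown.

```python
import numpy as np


def id_list_chunks(ids, chunk_size):
    """Yield '(id1,id2,...)' strings holding at most chunk_size unique IDs each."""
    unique_ids = np.unique(np.asarray(ids))
    for start in range(0, len(unique_ids), chunk_size):
        chunk = unique_ids[start:start + chunk_size]
        yield "(" + ",".join(str(int(num)) for num in chunk) + ")"


# Made-up IDs standing in for results1['ssObjectId'].
example_ids = [5, 3, 5, 9, 1, 3, 7]
chunks = list(id_list_chunks(example_ids, chunk_size=2))
print(chunks)

# Hypothetical use with the TAP service (untested; chunk_size is a guess):
# tables = []
# for id_string in id_list_chunks(results1['ssObjectId'], chunk_size=10000):
#     q = ("SELECT mpcDesignation, mpcNumber, ssObjectId, fullDesignation "
#          "FROM dp03_catalogs_1yr.MPCORB WHERE ssObjectId IN " + id_string)
#     tables.append(service.search(q).to_table())
```

The per-batch tables could then be stacked into one, for example with `astropy.table.vstack`. Whether this is faster or more reliable than a single long list has not been checked here.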

Now query for the data from the `MPCORB` table.

In [13]:
query2 = "SELECT mpcDesignation, mpcNumber, ssObjectId, fullDesignation "\
         "FROM dp03_catalogs_1yr.MPCORB "\
         "WHERE ssObjectId IN " + tempstring
results2 = service.search(query2).to_table()
results2

mpcDesignation,mpcNumber,ssObjectId,fullDesignation
str8,int32,int64,str26
S100doiQ,0,14169511631100,2011 S100doiQa
2003 UC1,0,21630997438946,2011 2003 UC185
S100hM3K,0,91337575092450,2011 S100hM3Ka
2010 VU1,0,239345804377156,2011 2010 VU172
S100mO5w,0,426219105868011,2011 S100mO5wa
S1006ZLi,0,636383519099723,2011 S1006ZLia
S100vzmr,0,638321569403210,2011 S100vzmra
2005 XJ8,0,728813012469554,2011 2005 XJ84
S1023qfc,0,854866448999463,2011 S1023qfca
S10067A2,0,1017033074662418,2011 S10067A2a
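
The two-step process leaves the observations in `results1` and the designations in `results2`. A possible final step, not part of the original notebook, is to match them on `ssObjectId`. The sketch below uses `pandas` with small made-up DataFrames (`obs` and `orbits` are assumptions); with the real tables, `results1.to_pandas()` and `results2.to_pandas()` would take their place.

```python
import pandas as pd

# Made-up stand-ins for results1.to_pandas() and results2.to_pandas().
obs = pd.DataFrame({
    "ssObjectId": [101, 101, 202, 303],
    "midPointMjdTai": [60528.06, 60529.06, 60527.21, 60528.09],
    "mag": [23.2, 22.9, 19.1, 23.9],
})
orbits = pd.DataFrame({
    "ssObjectId": [101, 202],
    "mpcDesignation": ["objA", "objB"],
})

# Left merge keeps every observation; objects absent from the second
# query (here 303) get a missing mpcDesignation.
merged = obs.merge(orbits, on="ssObjectId", how="left")
print(merged)
```

A left merge is used so that observations of objects outside the sub-set passed to the `MPCORB` query are kept and are easy to spot by their missing designation.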
