In [1]:
import psycopg2
import psycopg2.extras

The galactic pipeline works as follows.  On designated "reference" images (right now I just have one per filter), it finds all sources using Eddie Schlafly's crowdsource algorithm and saves them to the galsources table in the database.

Then, every other image (call them "search" images) runs through the same reduction procedure.  The pipeline finds sources but does _not_ save them; it only uses them for astrometric and (sort of) photometric calibration.  After that, it reads all the sources from the reference image (of the relevant filter) from the database, and uses the "force fit" version of crowdsource to force sources onto the search image at all the same positions they had on the ref image.  Because I noticed ~10% variations across the chip in the comparison between search and ref, I apply a correction to the source magnitudes on the search image so that in bulk they match the ref magnitudes; that correction is a 4th-order Chebyshev polynomial in magnitude difference as a function of x and y across the image.  These corrected magnitudes, for all of the sources that had >0 flux on the search image, are saved to the database.

What this means is that the zeropoint saved in the database for a search image is only an approximation.  We're more interested in relative photometry than absolute photometry, so I've calibrated the search image to the reference image.  The real photometric system is defined by the reference image, and the zeropoint of the reference image should give you absolute calibration to within 10% or 20%.
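The position-dependent correction described above can be sketched roughly as follows.  This is not the pipeline's code, just a minimal illustration with numpy.  The function names, the chip dimensions, the synthetic data, and the reading of "4th-order" as degree 4 in each of x and y are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def to_unit_interval(pix, npix):
    # Chebyshev polynomials live on [-1, 1], so rescale pixel coordinates
    return 2.0 * np.asarray(pix, dtype=float) / (npix - 1) - 1.0

def fit_correction(x, y, dmag, nx, ny, deg=4):
    """Least-squares fit of dmag(x, y) with a 2D Chebyshev polynomial.

    deg=4 in each axis is an assumption about what "4th-order" means.
    """
    u, v = to_unit_interval(x, nx), to_unit_interval(y, ny)
    design = cheb.chebvander2d(u, v, [deg, deg])  # one column per T_i(u)*T_j(v)
    coeffs, *_ = np.linalg.lstsq(design, dmag, rcond=None)
    return coeffs.reshape(deg + 1, deg + 1)

def eval_correction(coeffs, x, y, nx, ny):
    """Evaluate the fitted magnitude offset at pixel positions (x, y)."""
    return cheb.chebval2d(to_unit_interval(x, nx), to_unit_interval(y, ny), coeffs)

# Synthetic demonstration (made-up numbers, not pipeline data)
rng = np.random.default_rng(42)
nx, ny = 2048, 4096                      # illustrative chip size in pixels
x = rng.uniform(0, nx - 1, 5000)
y = rng.uniform(0, ny - 1, 5000)
true_offset = (0.05 * to_unit_interval(x, nx)
               - 0.03 * to_unit_interval(y, ny) ** 2)
dmag = true_offset + rng.normal(0.0, 0.01, x.size)   # search minus ref

coeffs = fit_correction(x, y, dmag, nx, ny)
fitted = eval_correction(coeffs, x, y, nx, ny)
corrected_dmag = dmag - fitted           # what remains is (ideally) just noise
```

Subtracting `fitted` from the search-image magnitudes puts them, in bulk, on the reference image's system, which is the sense in which the search image is calibrated to the ref rather than absolutely.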

Finally, the pipeline looks at all of the sources in the search image for magnitude differences from the ref image.  Right now it finds a __lot__.  I do a few things.  First, when saving the catalog of the search image, I enforce an "uncertainty floor" of 0.01 magnitudes.  I don't believe that crowdsource can be trusted to better than this, even though it often quotes sub-0.01 magnitude uncertainties.  (It's probably not good to better than 2%, in fact.)  There's also the issue of how damn difficult it is to adequately subtract the sky in crowded fields.  I stopped using crowdsource's sky subtraction, and instead went with the Bijaoui algorithm (done in 20x10 tiles to which a smooth background is fit), as I was getting better results with that.  Second, when I do the correction fit, I look at the χ²/ν value, and expand the uncertainties in the search image by the factor that makes χ²/ν go to 1.  (So far, typically, this has been a factor of a few.)  Finally, I do a 15σ cut on the magnitude difference, which sounds high, but the number of variables was overwhelming without that.  We can still tweak this pipeline if we don't like this.
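As a rough illustration of the three steps above (uncertainty floor, χ²/ν rescaling, 15σ cut), here is a minimal numpy sketch.  It is not the pipeline's implementation: the function name and arguments are hypothetical, and it assumes that χ² is computed from the search-image uncertainties alone and that the magnitude differences have already had the position-dependent correction applied.

```python
import numpy as np

def flag_variables(mag_search, err_search, mag_ref,
                   err_floor=0.01, nsigma=15.0, n_fit_params=0):
    """Return (flagged, scaled_err, scale) for one search image.

    n_fit_params is the number of coefficients in the correction fit,
    used for the degrees of freedom (hypothetical argument).
    """
    # 1. Uncertainty floor: never trust errors below err_floor magnitudes
    err = np.maximum(np.asarray(err_search, dtype=float), err_floor)

    # 2. Rescale the errors so that chi^2 / nu becomes 1
    dmag = np.asarray(mag_search, dtype=float) - np.asarray(mag_ref, dtype=float)
    nu = dmag.size - n_fit_params
    chi2_nu = np.sum((dmag / err) ** 2) / nu
    scale = np.sqrt(chi2_nu)      # assumption: applied even if it is below 1
    scaled_err = err * scale

    # 3. Keep only sources that differ from the ref by more than nsigma
    flagged = np.abs(dmag) > nsigma * scaled_err
    return flagged, scaled_err, scale

# Synthetic demonstration (made-up numbers, not pipeline data)
rng = np.random.default_rng(0)
n = 1000
mag_ref = rng.uniform(16.0, 21.0, n)
mag_search = mag_ref + rng.normal(0.0, 0.03, n)   # real scatter of 0.03 mag
mag_search[0] += 1.0                              # one genuinely variable source
err_search = np.full(n, 0.005)                    # over-optimistic quoted errors

flagged, scaled_err, scale = flag_variables(mag_search, err_search, mag_ref)
print(f"scale factor = {scale:.2f}, flagged = {flagged.sum()}")
```

Note that in this sketch a real variable inflates χ²/ν and therefore the scale factor, which makes the cut somewhat more conservative than 15σ in the original uncertainties.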

We still get a lot of variables.  However, many of them will be "one-offs"; that is, the result of an image artifact or something like that.  We'll want to focus more on the objects that are detected multiple times as different from the ref image.  One flaw in the pipeline right now is that an artifact in the _ref_ image will cause something to be detected multiple times.  I want to think about how to deal with that.  (Possibilities include: using multiple ref images, all photometrically calibrated to one of them; or a stacked ref image with outlier rejection.)

All of the tables for the galactic pipeline start with "gal"; the relevant ones are:
* galexposures — each full 62-chip exposure downloaded from the NOIRLab data archive
* galimages — one record for each chip of each exposure; galexposure_id points to the galexposures table
* galsources — this is the __GIANT__ table (as in 10⁷ rows per exposure).  Sources detected on images.  galimage_id points to the galimages table.
* galvarobjs — variable objects, detected where the galsources entry for an image was significantly different from the galsources entry for that same object on the relevant reference image.  (The galimages field "reference" tells you which image was used as a reference for a given search image.)

I'm hoping that the q3c extension will make searching the ginormous galsources table by ra/dec relatively quick.  This does mean I have to regularly run a very slow "cluster" process to make the q3c index effective, which seems to lock up the galsources table while it's running.  We may decide that saving all the sources from every image is too much for the database, but it would be really nice if we could keep that, as it means that when a variable object is detected, all of the photometry that we've successfully grabbed for that object is already right there in the database.
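For reference, a cone search against galsources using q3c might look something like the sketch below.  `q3c_radial_query` and `q3c_ang2ipix` are real q3c functions, and the radius is given in degrees.  The column names `ra` and `dec`, the index name, and the helper function itself are assumptions for illustration and may not match the actual schema.

```python
# Index maintenance as recommended by q3c; the index name is hypothetical
# and the ra/dec column names are assumed.
Q3C_MAINTENANCE_SQL = (
    "CREATE INDEX galsources_q3c_idx ON galsources (q3c_ang2ipix(ra, dec))",
    "CLUSTER galsources USING galsources_q3c_idx",   # the slow, locking step
    "ANALYZE galsources",
)

def cone_search_query(ra_deg, dec_deg, radius_arcsec):
    """Build (sql, params) for a q3c cone search on galsources.

    The pair is meant to be passed to a psycopg2 cursor.execute call.
    """
    sql = ("SELECT * FROM galsources "
           "WHERE q3c_radial_query(ra, dec, %(ra)s, %(dec)s, %(radius)s)")
    params = {"ra": float(ra_deg),
              "dec": float(dec_deg),
              "radius": radius_arcsec / 3600.0}      # q3c wants degrees
    return sql, params

sql, params = cone_search_query(270.0, -30.0, 1.0)
# With an open connection, this would be run as:
#     cursor.execute(sql, params)
```

Passing the coordinates as query parameters, rather than formatting them into the string, lets psycopg2 handle the quoting.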

Here's a query that will get counts of how many times a given object was detected as variable (i.e. different from the ref).  The "refsource_id" will be the same for the same object.  (This is _within a filter_.  The same star in g and r will have different refsource ids.  Cross-filter identification will require searches by RA/Dec, and then there will be blending issues if the photometry on the g and r refs didn't deblend the objects the same way!  The galaxy has too many stars.)

In [9]:
db = psycopg2.connect("dbname='decat_dev' user='mgraham' password='PutYourPasswordHere' host='decatdb.lbl.gov'")

In [10]:
cursor = db.cursor( cursor_factory = psycopg2.extras.DictCursor )
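# Inner query: number of variable detections for each reference source.
# Outer query: how many reference sources share each detection count.
query = ( "SELECT num,COUNT(num) as count "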
          "FROM ( SELECT COUNT(v.id) as num "
                 "FROM galvarobjs v "
                 "GROUP BY refsource_id ) subq "
          "GROUP BY num "
          "ORDER BY num DESC" )
cursor.execute( query )
print( "Repeated Detections  Number of Objects" )
for row in cursor.fetchall():
    print( f'    {row["num"]:8d}               {row["count"]:8d}' )

Repeated Detections  Number of Objects
           9                     63
           8                    401
           7                    496
           6                   1850
           5                   2520
           4                   1407
           3                   1930
           2                   3437
           1                  27786
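
The nested aggregation in the query above is a "histogram of counts".  For anyone who finds the SQL opaque, here is the same logic as a pure-Python sketch on a made-up list of refsource ids (the function name and the toy data are illustrative only):

```python
from collections import Counter

def detection_histogram(refsource_ids):
    """Return [(n_detections, n_objects), ...], sorted by n_detections descending.

    refsource_ids holds one entry per galvarobjs row.
    """
    per_object = Counter(refsource_ids)        # inner GROUP BY refsource_id
    histogram = Counter(per_object.values())   # outer GROUP BY num
    return sorted(histogram.items(), reverse=True)

# Toy data: object 101 flagged 3 times, 202 and 404 twice each, 303 once
toy_ids = [101, 101, 101, 202, 202, 303, 404, 404]
for num, count in detection_histogram(toy_ids):
    print(f'    {num:8d}               {count:8d}')
```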
