## ZTF - Data Processing

In this notebook, we query the local UW/DIRAC database for ZTF alerts and process them into a format that can be used by THOR. 

The resulting processed data files can be downloaded [here](https://dirac.astro.washington.edu/~moeyensj/projects/thor/paper1/data/ztf).

In [1]:
import os
import glob
import numpy as np
import pandas as pd
import sqlite3 as sql

import mysql.connector as mariadb
from astropy.time import Time

from thor import __version__
print("THOR Version: {}".format(__version__))

THOR Version: 1.1.dev199+g1c54766.d20210401


In [2]:
os.nice(1)

1

## Data Processing

Here we connect to the alert database and query it for roughly two weeks of observations: night IDs 610 through 624 inclusive (15 nights).

A description of the format of the alerts can be found here: https://zwickytransientfacility.github.io/ztf-avro-alert/schema.html
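
The night IDs used in the query can be related to calendar dates. The sketch below assumes that night ID 0 corresponds to 2017-01-01 (MJD 57754); this offset is an assumption inferred from the data in this notebook (alerts from night 610 have `mjd_utc` near 58364.13), and the helper names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: night ID 0 falls on 2017-01-01 (MJD 57754).
# Inferred from this notebook's data, not taken from the ZTF documentation.
NID_EPOCH_DATE = date(2017, 1, 1)
NID_EPOCH_MJD = 57754


def nid_to_mjd(nid):
    """Integer MJD at the start of the UTC day for a night ID."""
    return NID_EPOCH_MJD + nid


def nid_to_date(nid):
    """UTC calendar date for a night ID."""
    return NID_EPOCH_DATE + timedelta(days=nid)


first_night, last_night = 610, 624
for nid in (first_night, last_night):
    print(nid, nid_to_mjd(nid), nid_to_date(nid).isoformat())

# Both ends of the range are included in the query
n_nights = last_night - first_night + 1
print("nights:", n_nights)
```

Under this assumption the selected range runs from 2018-09-03 to 2018-09-17.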

In [3]:
# Connect to database
con = mariadb.connect(user='ztf', database='ztf')

In [4]:
# Dates of the fixes applied to solar system object alerts (only alerts from after the photometry fix are used)
sso_alert_fix_date1 = Time('2018-05-16T23:30:00', format='isot', scale='utc') # first attribution fix
sso_alert_fix_date2 = Time('2018-06-08T23:30:00', format='isot', scale='utc') # second attribution fix
sso_alert_phot_fix_date = Time('2018-06-18T23:30:00', format='isot', scale='utc') # photometry fix date
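
As a cross-check on the Julian date used as the cutoff, the conversion from a UTC timestamp to JD can be sketched with the standard library alone. This is a minimal sketch that ignores leap seconds and is not a replacement for `astropy.time.Time`; the function name is hypothetical.

```python
from datetime import datetime, timezone

# JD of the Unix epoch, 1970-01-01T00:00:00 UTC
UNIX_EPOCH_JD = 2440587.5
SECONDS_PER_DAY = 86400.0


def isot_utc_to_jd(isot):
    """Convert an ISO 8601 UTC timestamp (no fractional seconds) to a Julian date."""
    dt = datetime.strptime(isot, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.timestamp() / SECONDS_PER_DAY + UNIX_EPOCH_JD


# Photometry fix date from the cell above
jd_phot_fix = isot_utc_to_jd("2018-06-18T23:30:00")
print("{:.6f}".format(jd_phot_fix))
```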

In [5]:
# Only consider alerts post photometry fix
jd_good = sso_alert_phot_fix_date.jd
#ssdistnr >= 0 
df = pd.read_sql_query('select distinct nid from alerts where jd > {}'.format(jd_good), con)
print(len(df))

497


In [6]:
# Set the night range (the nights were picked by looking for an average two week period 
# in terms of the alert volume)
night_range = [610, 624]
df = pd.read_sql_query('select * from alerts where nid >= {} and nid <= {}'.format(*night_range), con)
print(len(df))

4966353
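
The night-range selection can also be written with bound parameters instead of string formatting. The sketch below uses an in-memory `sqlite3` database with a toy `alerts` table (the rows are made up for illustration) so that it runs without access to the alert database. Note that `sqlite3` uses `?` placeholders, whereas `mysql.connector` uses `%s`.

```python
import sqlite3

# Toy stand-in for the alert database; the values are illustrative only
toy_con = sqlite3.connect(":memory:")
toy_con.execute("create table alerts (candid integer, nid integer, jd real)")
toy_con.executemany(
    "insert into alerts values (?, ?, ?)",
    [
        (1, 609, 2458363.7),
        (2, 610, 2458364.7),
        (3, 624, 2458378.7),
        (4, 625, 2458379.7),
    ],
)

# Both ends of the night range are inclusive, as in the query above
night_range = (610, 624)
rows = toy_con.execute(
    "select candid, nid from alerts where nid >= ? and nid <= ? order by candid",
    night_range,
).fetchall()
toy_con.close()

print(rows)
```

With pandas, the same parameters could be passed through the `params` argument of `pd.read_sql_query`.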


In [7]:
df.sort_values(by=["jd"], inplace=True)
df.reset_index(inplace=True)

Only keep observations with a real-bogus score (`rb`) of at least 0.5 that have been detected at most 4 times at the same position (`ndethist`). The second cut removes static sources.

In [8]:
df = df[(df["rb"] >= 0.5) & (df["ndethist"] <= 4)]
len(df)

827546
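
To make the boundary behavior of the two cuts explicit, here is a small sketch on a toy table. The values are invented for illustration and only the column names `rb` and `ndethist` come from the alerts.

```python
import pandas as pd

# Toy alerts; the values are made up to exercise both boundaries
toy_alerts = pd.DataFrame({
    "candid": [1, 2, 3, 4],
    "rb": [0.90, 0.50, 0.49, 0.80],
    "ndethist": [2, 4, 1, 5],
})

# Same cuts as above: both comparisons are inclusive
keep = (toy_alerts["rb"] >= 0.5) & (toy_alerts["ndethist"] <= 4)
toy_filtered = toy_alerts[keep]

print(toy_filtered["candid"].tolist())
```

Alert 2 sits exactly on both thresholds and is kept. Alert 3 fails the real-bogus cut and alert 4 fails the detection-history cut.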

In [9]:
df.to_csv("ztf_observations_610_624.csv", index=False, sep=" ")

## Preprocess Observations

The ZTF alerts have now been saved to the file listed above. We read the same file back in and process it further into a form that THOR can use:
- Fix the ZTF alert ssnamenr column so that the object designations can later be crossmatched against unpacked MPC designations
- Separate the now fixed known object labels from the observations and store them separately (as preprocessed_associations)
- Store only the columns from the observations file that THOR needs to start running (as preprocessed_observations)

In [2]:
DATA_DIR = "/mnt/data/projects/thor/thor_data/ztf"
observations = pd.read_csv(
    os.path.join(DATA_DIR, "ztf_observations_610_624.csv"), 
    sep=" ", 
    index_col=False, 
    low_memory=False
)
observations.sort_values(by="jd", inplace=True)

# All observations come from ZTF (MPC observatory code I41)
observations["observatory_code"] = "I41"
observations["mjd_utc"] = Time(
    observations["jd"], 
    scale="utc", 
    format="jd"
).utc.mjd

In [3]:
len(observations)

827546

In [4]:
observations.head()

Unnamed: 0,index,objectId,jd,fid,pid,diffmaglim,programid,candid,isdiffpos,tblid,...,clrrms,neargaia,neargaiabright,maggaia,maggaiabright,exptime,drb,drbversion,observatory_code,mjd_utc
0,88437,ZTF18abdsqbl,2458365.0,2,610130484415,19.2443,1,610130484415010015,f,15,...,0.197519,0.261128,0.261128,12.4554,12.4554,,,,I41,58364.130486
1403,88702,ZTF18abslutw,2458365.0,2,610130486315,19.1495,1,610130486315015002,t,2,...,0.153182,0.383472,0.383472,12.2757,12.2757,,,,I41,58364.130486
1402,88700,ZTF18abduicp,2458365.0,2,610130485715,18.9922,1,610130485715015012,t,12,...,0.150455,0.221725,0.221725,12.5379,12.5379,,,,I41,58364.130486
1401,88698,ZTF18abslutg,2458365.0,2,610130484015,19.2974,1,610130484015015041,t,41,...,0.211511,5.367,34.1899,19.5617,12.5654,,,,I41,58364.130486
1400,88697,ZTF18abslutb,2458365.0,2,610130484015,19.2974,1,610130484015015010,t,10,...,0.211511,4.33071,80.2871,19.1578,12.8306,,,,I41,58364.130486


In [5]:
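def fixZTFDesignations(ssnamenr):
    # Convert a ZTF ssnamenr value into an unpacked MPC-style designation
    try:
        # Numbered objects, eg. 401811 -> 401811
        designation = str(int(ssnamenr))
    except (ValueError, TypeError):
        if len(ssnamenr) <= 4:
            # Short designations are left unchanged, eg. 173P -> 173P
            designation = ssnamenr
        elif ssnamenr[1] == "/":
            # Comets, eg. C/2012A2 -> C/2012 A2
            designation = "{} {}".format(ssnamenr[:6], ssnamenr[6:])
        else:
            # Provisional designations: insert a space after the year and
            # strip zero-padding from the trailing number,
            # eg. 2008SO196 -> 2008 SO196, 2007UJ07 -> 2007 UJ7
            if int(ssnamenr[6:]) == 0:
                n = ""
            else:
                n = str(int(ssnamenr[6:]))
            designation = "{} {}{}".format(ssnamenr[:4], ssnamenr[4:6], n)
    return designation

# Only fix designations for alerts that carry a known object label
has_ssnamenr = ~observations["ssnamenr"].isna()
observations.loc[has_ssnamenr, "ssnamenr_fixed"] = observations.loc[has_ssnamenr, "ssnamenr"].apply(fixZTFDesignations)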

Let's take a look at some of the unique designations that were fixed:

In [6]:
observations[(observations["ssnamenr_fixed"] != observations["ssnamenr"]) & (~observations["ssnamenr"].isna())][["ssnamenr", "ssnamenr_fixed"]].drop_duplicates()

Unnamed: 0,ssnamenr,ssnamenr_fixed
1697,2009SJ52,2009 SJ52
1614,2014HQ45,2014 HQ45
328,2010PJ64,2010 PJ64
2539,2012OD01,2012 OD1
2551,2009HE82,2009 HE82
...,...,...
821779,2005WT49,2005 WT49
821764,2015BE65,2015 BE65
822671,2015BB120,2015 BB120
824159,2011DY21,2011 DY21
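
The conversion rules can be checked against the examples given in the code comments and in the table above. The sketch below is a simplified, regex-based restatement of those rules, not the notebook's function: the names `fix_designation` and `_PROVISIONAL` are hypothetical, and unlike `fixZTFDesignations` it returns unrecognized strings unchanged.

```python
import re

# Provisional designation without a space: year, two letters, optional number
_PROVISIONAL = re.compile(r"^(\d{4})([A-Z]{2})(\d*)$")


def fix_designation(ssnamenr):
    """Simplified sketch of the ssnamenr clean-up rules."""
    text = str(ssnamenr)
    if text.isdigit():
        # Numbered object: 401811 -> 401811
        return str(int(text))
    if len(text) <= 4:
        # Short designation: 173P -> 173P
        return text
    if text[1] == "/":
        # Comet: C/2012A2 -> C/2012 A2
        return "{} {}".format(text[:6], text[6:])
    match = _PROVISIONAL.match(text)
    if match is None:
        # Assumption of this sketch: leave anything unrecognized untouched
        return text
    year, letters, number = match.groups()
    # Drop zero-padding: 07 -> 7, 00 -> ""
    return "{} {}{}".format(year, letters, number.lstrip("0"))


examples = [401811, "173P", "C/2012A2", "2008SO196", "2007UJ07", "2012OD01"]
fixed = [fix_designation(e) for e in examples]
print(fixed)
```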


In [7]:
from thor import preprocessObservations

column_mapping = {
    "obs_id" : "candid",
    "mjd" : "mjd_utc",
    "RA_deg" : "ra",
    "Dec_deg" : "decl",
    "RA_sigma_deg" : None,
    "Dec_sigma_deg" : None,
    "observatory_code" : "observatory_code",
    "obj_id" : "ssnamenr_fixed",
}   
mjd_scale = "utc"
astrometric_errors = {
    "I41" : [
        0.1/3600, 
        0.1/3600
    ]
}

preprocessed_observations, preprocessed_associations = preprocessObservations(
    observations,
    column_mapping,
    mjd_scale=mjd_scale,
    astrometric_errors=astrometric_errors
)
# Each unlabeled/unattributed observation should have a unique label (this is for the linkage analysis tool)
unlabeled = preprocessed_associations["obj_id"] == "None"
preprocessed_associations.loc[unlabeled, "obj_id"] = ["u{:08d}".format(i) for i in range(int(unlabeled.sum()))]

analysis_observations = preprocessed_observations.merge(preprocessed_associations, on="obs_id")

preprocessed_observations.to_csv(
    os.path.join(DATA_DIR, "preprocessed_observations.csv"),
    index=False,
)
preprocessed_associations.to_csv(
    os.path.join(DATA_DIR, "preprocessed_associations.csv"),
    index=False,
)

Using 'astrometric_errors' parameter to assign errors...



In [8]:
preprocessed_observations.head(10)

Unnamed: 0,obs_id,mjd_utc,RA_deg,Dec_deg,RA_sigma_deg,Dec_sigma_deg,observatory_code
0,610130484415010015,58364.130486,255.347544,-23.059466,2.8e-05,2.8e-05,I41
1,610130481215010007,58364.130486,255.077637,-26.542553,2.8e-05,2.8e-05,I41
2,610130481215015021,58364.130486,254.708502,-26.721381,2.8e-05,2.8e-05,I41
3,610130483515015056,58364.130486,261.287334,-24.006969,2.8e-05,2.8e-05,I41
4,610130483515015069,58364.130486,261.062061,-23.963993,2.8e-05,2.8e-05,I41
5,610130481715015012,58364.130486,260.863805,-24.507448,2.8e-05,2.8e-05,I41
6,610130481715010004,58364.130486,260.59806,-24.38379,2.8e-05,2.8e-05,I41
7,610130481215015013,58364.130486,254.799074,-26.510711,2.8e-05,2.8e-05,I41
8,610130483515015052,58364.130486,261.668127,-23.836318,2.8e-05,2.8e-05,I41
9,610130483915015049,58364.130486,258.969586,-23.671591,2.8e-05,2.8e-05,I41
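
The value 2.8e-05 in the `RA_sigma_deg` and `Dec_sigma_deg` columns is the assumed astrometric uncertainty of 0.1 arcseconds expressed in degrees. A short arithmetic check (the variable names are illustrative):

```python
ARCSEC_PER_DEG = 3600.0

# Uncertainty assigned to I41 observations via astrometric_errors
sigma_arcsec = 0.1
sigma_deg = sigma_arcsec / ARCSEC_PER_DEG

# One decimal place in scientific notation, as in the rounded table display
print("{:.1e}".format(sigma_deg))
```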


In [9]:
preprocessed_associations.head(10)

Unnamed: 0,obs_id,obj_id
0,610130484415010015,u00000000
1,610130481215010007,u00000001
2,610130481215015021,u00000002
3,610130483515015056,u00000003
4,610130483515015069,u00000004
5,610130481715015012,u00000005
6,610130481715010004,u00000006
7,610130481215015013,u00000007
8,610130483515015052,u00000008
9,610130483915015049,u00000009
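
The labeling of unattributed observations can be illustrated on a plain list. This is a minimal sketch with made-up input. It follows the convention seen above, where unattributed observations carry the string `"None"` and receive sequential labels of the form `u` followed by eight zero-padded digits.

```python
# Toy obj_id column: "None" marks an observation without a known object label
obj_ids = ["None", "2009 SJ52", "None", "401811"]

labeled = []
n_unlabeled = 0
for obj_id in obj_ids:
    if obj_id == "None":
        # Assign the next unique placeholder label
        labeled.append("u{:08d}".format(n_unlabeled))
        n_unlabeled += 1
    else:
        # Known object labels are left as they are
        labeled.append(obj_id)

print(labeled)
```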
