# 1. Cleanse ZTF catalogue

**Source:** Export the ZTF catalogue as CSV from the [ZTF BTS explorer](https://sites.astro.caltech.edu/ztf/bts/explorer.php) and save as `ztf.csv` in the project root.

**Columns used:** `type` (filter to SN Ia only), `peakt` (time of peak), `IAUID` (must start with `SN` for Lasair compatibility).

**Time conversion:** The BTS gives peak time as JD − 2458000. Since MJD = JD − 2400000.5, the conversion is  
`MJD = peakt + 2458000 − 2400000.5 = peakt + 57999.5`.
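
As a quick check of that offset, the sketch below spells out the two steps (BTS offset to JD, then JD to MJD) in plain Python. The function name `bts_peakt_to_mjd` and the constant names are illustrative and not part of the notebook; the sample value 1870.99 is the `peakt` of SN2022yei in the preview printed further down.

```python
# BTS reports peak time as JD - 2458000; MJD is defined as JD - 2400000.5.
BTS_JD_OFFSET = 2458000.0
MJD_JD_OFFSET = 2400000.5


def bts_peakt_to_mjd(peakt: float) -> float:
    """Convert a BTS peak time (JD - 2458000) to MJD."""
    jd = peakt + BTS_JD_OFFSET   # undo the BTS offset to recover the full JD
    return jd - MJD_JD_OFFSET    # apply the standard JD -> MJD offset


mjd = bts_peakt_to_mjd(1870.99)
print(round(mjd, 2))  # 59870.49
```

The two offsets collapse to the single constant 57999.5 used in the cleansing cell.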

**Output:** `ztf_cleansed.csv` in the project root.

In [1]:
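# Load the raw ZTF BTS catalogue (CSV exported from https://sites.astro.caltech.edu/ztf/bts/explorer.php)
from pathlib import Path
import pandas as pd

# Assumes the notebook runs from a directory one level below the project root
project_root = Path.cwd().parent
ztf_data = project_root / "ztf.csv"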

ztf_df = pd.read_csv(ztf_data)
print(ztf_df.head())

          ZTFID      IAUID           RA          Dec    peakt peakfilt  \
0  ZTF17aabtvsy  SN2022yei  10:35:32.09  +37:38:59.0  1870.99        r   
1  ZTF17aabvong  SN2024xxq  02:05:07.68  +11:14:55.1  2606.75        g   
2  ZTF17aacldgo  SN2022zxv  03:09:24.36  -04:53:39.2  1897.75        g   
3  ZTF17aadlxmv  SN2020adv  08:29:47.59  +33:54:22.8   879.69        g   
4  ZTF18aaaonon  SN2022jjs  10:19:05.51  +14:24:16.6  1703.77        g   

   peakmag peakabs duration    rise      fade   type redshift          b  \
0  18.0303  -19.41  >34.229   >6.01    28.219  SN Ia  0.06922  59.641962   
1  16.8039  -19.57   23.222   8.464    14.758  SN Ia    0.034 -47.664064   
2  18.7979  -18.91    >1077   >3.85  >1073.15  SN Ia    0.072 -50.332472   
3  17.9475  -19.34   25.146  10.951    14.195  SN Ia    0.062  34.174702   
4  18.5663  -19.08  >18.631   3.691    >14.94  SN Ia  0.07141  52.363911   

     A_V  
0  0.053  
1  0.446  
2  0.183  
3  0.106  
4  0.150  


In [None]:
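# Column glossary, using the display names from the BTS explorer page. The CSV headers differ slightly (e.g. time -> peakt, mag -> peakmag, M_abs -> peakabs).
# ZTF ID:	ZTF 9-letter identifier.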
# TNS ID:	TNS AT/SN code. (A few transients are wrongly matched on TNS.)
# Disc ID:	Discovery name posted by the first team to register the transient on TNS.
# RA:	Right ascension.
# Dec:	Declination.
# time:	Time of peak, expressed as JD-2458000.
# mag:	Peak P48 measured magnitude. (If the true peak was missed, this is an upper limit.)
# M_abs:	Absolute magnitude at measured peak, given the observed redshift and extinction. Applies a basic k-correction.
# Rise:	Rise time from half-peak to peak. (This is calculated crudely by linearly interpolating the light curve, including upper limits before the first detection.) If no detections or limits deeper than 0.75 mag below peak exist in the pre-peak alert history this is given as a limit.
# Fade:	Fade time from peak to half-peak. (Calculated the same way as rise time.) If no detections deeper than 0.75 mag below peak exist in the post-peak alert history this is given as a limit.
# Duration:	The sum of rise and fade times, i.e. the time above half-maximum.
# Type:	Classification from TNS.
# Redshift:	Redshift. Often approximate. (For M31/M33 this is D_L*H_0/c.)
# Host M_i:	Host absolute magnitude. Includes a basic color-dependent k-correction. Not reliable for PS1 or nearby galaxies (see above); omitted at z<0.01.
# Host g-i:	Host rest-frame g-i color. Not reliable for PS1 or nearby galaxies (see above).
# Cuts:	Bit codes indicating which cut criteria the transient passes.
# b:	Galactic latitude in degrees.
# A_V:	Galactic extinction in V-band in mag.

In [6]:
# Keep only rows classified as SN Ia. Taking a filtered copy leaves the raw
# catalogue in ztf_df untouched and avoids pandas' SettingWithCopyWarning below.
cleansed_df = ztf_df[ztf_df['type'] == 'SN Ia'].copy()
# Convert peak time from JD - 2458000 to MJD: MJD = peakt + 57999.5
cleansed_df['peakt'] = cleansed_df['peakt'] + 57999.5
# Keep only rows whose IAUID starts with "SN"; we only get SNe from Lasair.
# na=False drops rows with a missing IAUID instead of failing on a NaN mask.
cleansed_df = cleansed_df[cleansed_df['IAUID'].str.startswith('SN', na=False)]
# Save the cleansed catalogue
cleansed_df.to_csv(project_root / "ztf_cleansed.csv", index=False)
print(len(cleansed_df))
print("Successfully cleaned")

5010
Successfully cleaned
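
Before the cleansed file feeds later steps, a cheap sanity check can catch a missed filter or a time offset applied zero or two times. The sketch below is not part of the notebook: the helper `find_problems` and the plausibility window for `peakt` (which assumes raw BTS values are non-negative) are illustrative assumptions. It uses only the standard library, and the third sample row is deliberately made up to show what a failure looks like.

```python
import csv
import io

MJD_OFFSET = 57999.5  # offset applied in the cleansing cell


def find_problems(rows):
    """Return (line_number, message) pairs for rows that break the cleansing rules."""
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        if row["type"] != "SN Ia":
            problems.append((line_no, f"unexpected type {row['type']!r}"))
        if not row["IAUID"].startswith("SN"):
            problems.append((line_no, f"IAUID {row['IAUID']!r} lacks SN prefix"))
        peakt = float(row["peakt"])
        # Assumed window: the offset was applied exactly once.
        if not (MJD_OFFSET <= peakt < 2 * MJD_OFFSET):
            problems.append((line_no, f"peakt {peakt} does not look like MJD"))
    return problems


# Two rows from the preview (peakt already converted) plus one synthetic,
# intentionally broken row that violates all three rules.
sample = io.StringIO(
    "ZTFID,IAUID,peakt,type\n"
    "ZTF17aabtvsy,SN2022yei,59870.49,SN Ia\n"
    "ZTF17aabvong,SN2024xxq,60606.25,SN Ia\n"
    "ZTF00example,AT0000xx,1870.99,SN II\n"
)
problems = find_problems(csv.DictReader(sample))
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

To check the real output, open `ztf_cleansed.csv` and pass a `csv.DictReader` over it in place of the sample; an empty list means every row passed.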
