# LoTSS DR1
## Clean catalogues

***

* [Original catalogues](#Loading-the-original-catalogues) 
* [Cleaned catalogues](#Output-the-cleaned-catalogues)


#### Libraries 

In [1]:
import os
import numpy as np
from astropy.table import Table, join, Column
from astropy.io import fits

***
## Catalogues
***

### Loading the original catalogues

Set the path where the data can be found

In [2]:
data_path = 'data'

In [3]:
# Helper to read the FITS catalogues
def read_fits(file):
    """Read the first extension of a FITS file in data_path into an astropy Table."""
    data_file = os.path.join(data_path, file)
    with fits.open(data_file) as cat:
        table = Table(cat[1].data)
        return table

Reading the catalogues

In [4]:
# Raw PyBDSF catalogues
pybdsf = read_fits('LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits')
gaussians = read_fits('LOFAR_HBA_T1_DR1_catalog_v0.99.gaus.fits')
# V1.2 LoTSS DR1 catalogues
optical = read_fits('LOFAR_HBA_T1_DR1_merge_ID_optical_f_v1.2.fits')
components = read_fits('LOFAR_HBA_T1_DR1_merge_ID_v1.2.comp.fits')
# Artifacts catalogue
artifacts = read_fits('LOFAR_HBA_T1_DR1_merge_ID_v1.1.art.fits')

### Cleaning the original catalogues

####  Renaming Sources

One source was detected in 2 different mosaics under the same `Source_Name`, and was classified as an artifact in one mosaic and as a source in the other. The artifact entry is renamed by **replacing the last character of the original name with 'B'**. The renaming is applied to the **Artifacts catalogue**, the **PyBDSF raw catalogue** (where the source appears twice) and the Gaussian catalogue.

In [5]:
source_names, source_times = np.unique(pybdsf['Source_Name'], return_counts=True)

In [6]:
# Taking the name of the source (duplicated entry) from the pybdsf catalogue
source_names[source_times != 1]

0
ILTJ132633.10+484745.7
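
The duplicate search above can be illustrated on a toy array. This is only a minimal sketch: the names below are made up and stand in for the `Source_Name` column.

```python
import numpy as np

# Hypothetical source names; the second one appears twice
names = np.array(["ILTJ000001.00+000001.0",
                  "ILTJ000002.00+000002.0",
                  "ILTJ000002.00+000002.0",
                  "ILTJ000003.00+000003.0"])

# np.unique returns the sorted unique values and how often each one occurs
unique_names, counts = np.unique(names, return_counts=True)

# Any name with a count different from 1 is a duplicated entry
duplicated = unique_names[counts != 1]
print(duplicated)
```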


##### Renaming on the Artifact catalogue 

Create a copy of the catalogue

In [7]:
# Creating a copy for the new artifact catalogue
artifacts_cleaned = artifacts.copy()

We take the duplicated source name and replace its last character with 'B'

In [8]:
source_name = source_names[source_times != 1][0]
new_source_name = source_name[0:-1]+'B'
#new_source_name = source_name+'B'
new_source_name

'ILTJ132633.10+484745.B'
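
One practical property of replacing the last character, rather than appending 'B' as in the commented-out alternative, is that the name keeps its length. The sketch below shows why this can matter for plain NumPy fixed-width string arrays, which silently truncate longer values on assignment. It uses a made-up name and does not claim to reproduce how astropy columns behave.

```python
import numpy as np

# Hypothetical name stored in a fixed-width unicode array (22 characters wide)
names = np.array(["ILTJ000002.00+000002.0"])
source_name = str(names[0])

replaced = source_name[:-1] + "B"   # same length as the original
appended = source_name + "B"        # one character longer

names_replaced = names.copy()
names_replaced[0] = replaced        # fits in the existing width

names_appended = names.copy()
names_appended[0] = appended        # the extra character is dropped by NumPy

print(names_replaced[0], names_appended[0])
```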

In [9]:
# Replacing the original name of the artifact on the artifacts catalogue
artifacts_cleaned['Source_Name'][artifacts_cleaned['Source_Name'] == source_name] = new_source_name

##### Renaming on the PyBDSF catalogue

Create a copy of the catalogue

In [10]:
# Creating a copy for the new pybdsf catalogue
pybdsf_cleaned = pybdsf.copy()

In [11]:
# Taking the mosaic where the 'true source' is
true_source_mosaic = components[components['Source_Name'] == source_name]['Mosaic_ID'][0]

In [12]:
# Replacing the original name of the artifact in the PyBDSF catalogue.
# The components Mosaic_ID is cut to the length of the PyBDSF Mosaic_ID values
# (the first 8 characters) so that the two can be compared.
mosaic_id_length = len(pybdsf['Mosaic_ID'][0])
is_artifact_entry = ((pybdsf_cleaned['Source_Name'] == source_name) &
                     (pybdsf_cleaned['Mosaic_ID'] != true_source_mosaic[0:mosaic_id_length]))
pybdsf_cleaned['Source_Name'][is_artifact_entry] = new_source_name
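
The selection used for the renaming combines two conditions: the name matches the duplicated source, and the mosaic is not the one holding the true source. A minimal sketch with invented names and mosaic identifiers (all values below are hypothetical), where one catalogue stores only the first 8 characters of the mosaic name:

```python
import numpy as np

# Hypothetical values: the full mosaic name of the true source, and a
# catalogue whose Mosaic_ID column keeps only the first 8 characters.
true_source_mosaic = "P12Hetdex11"
names = np.array(["SRC_A", "SRC_A", "SRC_B"])
mosaics = np.array(["P12Hetde", "P15Hetde", "P12Hetde"])

# Cut the full name to the width used in the catalogue before comparing
width = len(mosaics[0])
mask = (names == "SRC_A") & (mosaics != true_source_mosaic[:width])

# Only the entry in the other mosaic is renamed
renamed = names.copy()
renamed[mask] = "SRC_Z"
print(renamed)
```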

##### Renaming on the Gaussian catalogue

In [13]:
gaussians_cleaned = gaussians.copy()

In [14]:
gaussians_cleaned['Source_Name'][(gaussians_cleaned['Source_Name'] == source_name) &
               (gaussians_cleaned['Mosaic_ID'] != true_source_mosaic)] = new_source_name

#### Eliminating duplicated entries

The LoTSS DR1 **Components catalogue** contains duplicated entries: 9 components share the same `Source_Name`. These are entries where `Source_Name = Component_Name` (which is also the PyBDSF name).

In [15]:
len(components)

324623

In [16]:
component_names, component_count = np.unique(components['Component_Name'], return_counts = True)
component_names[component_count > 1]

0
ILTJ115037.81+465929.4


In [17]:
component_names, component_index = np.unique(components['Component_Name'], return_index = True)

In [18]:
# Eliminating these from the components catalogue (and just keeping the first entry)
clean_comp = components[component_index]

In [19]:
len(clean_comp)

324615
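
As a small illustration of the de-duplication step, `np.unique` with `return_index=True` gives the index of the first occurrence of each unique value. Note that the indices come back ordered by sorted name, so the selected rows are reordered alphabetically. The sketch below uses toy data, and the `np.sort` variant is an optional extra (not part of the notebook) for keeping the original row order.

```python
import numpy as np

# Toy component names with "A" repeated three times, plus a companion column
component_names = np.array(["C", "A", "B", "A", "A"])
flux = np.array([3.0, 1.0, 2.0, 10.0, 100.0])

# Index of the first occurrence of each unique name (ordered by sorted name)
unique_names, first_index = np.unique(component_names, return_index=True)

deduplicated_names = component_names[first_index]
deduplicated_flux = flux[first_index]      # only the first "A" row survives

# Optional: sort the indices to keep the original row order instead
ordered_names = component_names[np.sort(first_index)]
print(deduplicated_names, deduplicated_flux, ordered_names)
```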

#### Eliminating a component from the components catalogue that is an artifact

In [20]:
len(clean_comp)

324615

In [21]:
# This is an artifact from the artifact catalogue that was not removed from the components catalogue
component_artifact = artifacts_cleaned[np.isin(artifacts_cleaned['Source_Name'], clean_comp['Component_Name'])]
component_artifact['Source_Name']

0
ILTJ115320.73+552641.4


In [22]:
components_cleaned = clean_comp[~np.isin(clean_comp['Component_Name'], artifacts_cleaned['Source_Name'])]
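
The filtering pattern used here, `np.isin` plus a negated boolean mask, can be sketched on toy arrays (the names are placeholders):

```python
import numpy as np

# Toy component and artifact names
component_names = np.array(["A", "B", "C", "D"])
artifact_names = np.array(["C", "X"])

# True where a component name also appears in the artifact list
is_artifact = np.isin(component_names, artifact_names)

# Negating the mask keeps only the components that are not artifacts
kept = component_names[~is_artifact]
print(kept)
```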

### Output the cleaned catalogues

The cleaned tables are written to disk (here, the current working directory) for later use.

In [23]:
artifacts_cleaned.write("artifacts.fits", overwrite = True)
pybdsf_cleaned.write("pybdsf.fits", overwrite = True)
components_cleaned.write("components.fits", overwrite = True)
gaussians_cleaned.write("gaussians.fits", overwrite = True)
optical.write("optical.fits", overwrite = True)