# LoTSS DR1
## PyBDSF - Optical associations over the HETDEX field

***

This notebook outputs a table with the association between the raw PyBDSF catalogue and the final optical catalogue. 
It includes:
* The [Original catalogues](#Loading-the-original-catalogues) 
* The [Cleaned catalogues](#Cleaning-the-original-catalogues)
* The [Output catalogue](#Output-table)

The output catalogue is a table which consists of:
* `pybdsf_name` : PyBDSF raw catalogue `Source_Name`
* `association_name` : optical v1.2 catalogue `Source_Name`
* `flag`:  
    * [Multi-component PyBDSF sources](#Multi-component-PyBDSF-sources) (`flag=8`)
    * [Deblended sources](#Deblended-sources) (`flag=4`)
    * [Single sources](#Single-sources) (`flag=1`)
    * [Artifacts](#Artifacts) (`flag=16` and `flag=32`)
    * or a combination of categories: 
       * [Deblended + multi-component PyBDSF sources](#Sources-that-were-both-deblended-and-have-multi-PyBDSF-components) (`flag=12`)
       * [Sources that were not deblended](#Sources-that-were-not-deblended) (`flag=2`) 
           * [and have multiple PyBDSF components](#MULTIPLE-PyBDSF-COMPONENTS) (`flag=10`)
           * [and are singles](#SINGLES) (`flag=3`)
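
The flag values listed above behave like a bit mask: each combined value is the sum of two base categories (3 = 1 + 2, 10 = 2 + 8, 12 = 4 + 8). A minimal sketch of decoding a flag into its base categories follows. The `FLAG_BITS` mapping and the `decode_flag` helper are illustrative names and are not part of this notebook.

```python
# Illustrative helper (not part of the notebook): split a flag into base categories.
FLAG_BITS = {
    1: 'single',
    2: 'not deblended',
    4: 'deblended',
    8: 'multi-component',
    16: 'artifact (artifact catalogue)',
    32: 'artifact (missing from optical catalogue)',
}


def decode_flag(flag):
    """Return the base categories whose bits are set in `flag`."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if flag & bit]


for value in (3, 10, 12):
    print(value, decode_flag(value))
```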

**Libraries**

In [1]:
from __future__ import print_function  # must be the first statement in the cell
import unittest

import numpy as np
import pandas as pd
from astropy.table import Table, join

***
## Catalogues
***

### Loading the original catalogues

In [2]:
# Creating a function to read the fits catalogues
def read_fits(path):
    """Read a FITS table and return it as a pandas DataFrame."""
    cat = Table.read(path)
    return cat.to_pandas()

In [77]:
# Raw PyBDSF catalogue
pybdsf = read_fits('../data/LOFAR_HBA_T1_DR1_catalog_v0.9.srl.fixed.fits')
# V1.2 LoTSS DR1 catalogues
optical = read_fits('../data/v1.2/LOFAR_HBA_T1_DR1_merge_ID_optical_f_v1.2.fits')
components = read_fits('../data/v1.2/LOFAR_HBA_T1_DR1_merge_ID_v1.2.comp.fits')
# Artifacts catalogue
artifacts = read_fits('../data/LOFAR_HBA_T1_DR1_merge_ID_v1.1.art.fits')
# Gaussian catalogue
gauss = read_fits('../data/LOFAR_HBA_T1_DR1_catalog_v0.99.gaus.fits')

### Cleaning the original catalogues

####  Renaming Sources

One source was detected on two different mosaics, so it appears twice in the PyBDSF raw catalogue under the same `Source_Name`; it was classified as an artifact in one mosaic and as a source in the other. To tell the two entries apart, one of them is renamed to the original name followed by 'B'. The renaming is applied to both the **Artifacts catalogue** and the **PyBDSF raw catalogue**.

In [78]:
# Taking the name of the source (duplicated entry) from the pybdsf catalogue
pybdsf_duplicated = np.array(pybdsf[pybdsf.duplicated('Source_Name')]['Source_Name'])[0]
pybdsf_duplicated

'ILTJ132633.10+484745.7'

##### Renaming on the Artifact catalogue 

In [79]:
# Creating a copy for the new artifact catalogue
artifacts_cleaned = artifacts.copy()
# Replacing the original name by 'nameB'
clean_art = artifacts_cleaned['Source_Name'].replace({pybdsf_duplicated:pybdsf_duplicated+'B'})
# Updating the new catalogue
artifacts_cleaned.update(clean_art)

##### Renaming on the Pybdsf catalogue

In [80]:
# Creating a copy for the new pybdsf catalogue
pybdsf_cleaned = pybdsf.copy()
# Replacing the original name of the artifact on pybdsf catalogue
clean_pybdsf = pybdsf[pybdsf.duplicated('Source_Name', keep = 'last')].\
        replace({pybdsf_duplicated:pybdsf_duplicated+'B'})
# Updating the new catalogue
pybdsf_cleaned.update(clean_pybdsf)
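
The selection with `duplicated(..., keep='last')` used above can be easier to follow on a toy table. The sketch below uses made-up source names and a hypothetical `Mosaic` column. It shows that `keep='last'` flags every occurrence except the last one, so only the first of the two entries gets the 'B' suffix.

```python
import pandas as pd

# Toy catalogue (made-up names): 'ILTJ0001' was detected on two mosaics.
toy = pd.DataFrame({'Source_Name': ['ILTJ0001', 'ILTJ0002', 'ILTJ0001'],
                    'Mosaic': ['m1', 'm1', 'm2']})

# keep='last' marks every occurrence except the last one as a duplicate,
# so the mask selects the first of the two entries.
first_of_pair = toy.duplicated('Source_Name', keep='last')

renamed = toy.copy()
renamed.loc[first_of_pair, 'Source_Name'] = renamed.loc[first_of_pair, 'Source_Name'] + 'B'
print(renamed)
```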

##### Renaming on the Gaussian catalogue (even though this catalogue is not used here)

In [82]:
# Creating a copy for the new gaussian catalogue
gauss_cleaned = gauss.copy()
# Replacing the original name of the artifact on the gaussian catalogue
clean_gauss = gauss[gauss.duplicated('Source_Name', keep = 'last')].\
        replace({pybdsf_duplicated:pybdsf_duplicated+'B'})
# Updating the new catalogue
gauss_cleaned.update(clean_gauss)

#### Eliminating duplicated entries

`Source_Name` entries are duplicated in the LoTSS DR1 **Components catalogue** (9 components with an identical `Source_Name`). These are entries where `Source_Name = Component_Name` (which is equal to the PyBDSF name).

In [7]:
# Eliminating these from the components catalogue (and just keeping the first entry)
clean_comp = components.drop_duplicates('Component_Name', keep = 'first')
len(clean_comp)

324615
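
Before dropping rows it can help to look at every copy of a repeated name. The sketch below, on a made-up components table, contrasts `duplicated(..., keep=False)`, which flags all copies, with `drop_duplicates(..., keep='first')`, which retains one row per `Component_Name` as in the cell above.

```python
import pandas as pd

# Made-up components table: 'c2' appears twice.
toy_components = pd.DataFrame({
    'Component_Name': ['c1', 'c2', 'c2', 'c3'],
    'Source_Name': ['c1', 'c2', 'c2', 's9'],
})

# keep=False flags every copy of a repeated Component_Name
all_copies = toy_components[toy_components.duplicated('Component_Name', keep=False)]

# keep='first' retains one row per Component_Name
deduplicated = toy_components.drop_duplicates('Component_Name', keep='first')
print(len(all_copies), len(deduplicated))
```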

#### Eliminating a component on the components catalogue that is an artifact

In [8]:
# This is an artifact from the artifact catalogue that was not removed from the components catalogue
component_artifact = artifacts_cleaned[artifacts_cleaned['Source_Name'].
                                       isin(clean_comp['Component_Name'])]
component_artifact['Source_Name']

728    ILTJ115320.73+552641.4
Name: Source_Name, dtype: object

In [9]:
components_cleaned = clean_comp[~clean_comp['Component_Name'].
                                isin(component_artifact['Source_Name'])]
len(components_cleaned)

324614

***
## Diagnosis output table outline
***

### Creating a diagnosis output table outline

In [10]:
col_names =  ['pybdsf_name', 'association_name', 'flag']
output_df  = pd.DataFrame(columns = col_names)
output_df

Unnamed: 0,pybdsf_name,association_name,flag


### Creating empty lists to store the different PyBDSF association groups 

In [11]:
pybdsf_multi_df = []
pybdsf_deblended_df = []
pybdsf_singles_df = []
pybdsf_artifacts_df = []

### Creating a unit test

This test is used, throughout the notebook, to check if the different association groups are being correctly appended to the output table.

In [12]:
#Creating a function to be tested
def len_output_df(artifacts, singles, deblends, multiples):
    return len(artifacts)+ len(singles)+len(deblends)+len(multiples)

In [13]:
# Create a test case
class Test_output(unittest.TestCase):
    # Create the unit test
    def test_len_output_df(self):
        # Test if length of output table is the expected one
        self.assertEqual(len(output_df), len_output_df(pybdsf_artifacts_df,
                                                       pybdsf_singles_df,
                                                       pybdsf_deblended_df,
                                                       pybdsf_multi_df))
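
The test above reads the tables from the notebook's global namespace. A variant that takes the tables as explicit arguments is sketched below. `expected_length` and `check_output` are hypothetical helper names, and plain lists stand in for the tables.

```python
def expected_length(*groups):
    """Total number of rows across the association groups."""
    return sum(len(group) for group in groups)


def check_output(output, *groups):
    """Raise AssertionError if `output` does not hold every row of `groups`."""
    expected = expected_length(*groups)
    assert len(output) == expected, (
        'output has %d rows, expected %d' % (len(output), expected))
    return True


# Toy stand-ins for the tables: plain lists, like the empty lists created above
toy_multi = ['m1', 'm2']
toy_singles = ['s1']
toy_output = toy_multi + toy_singles
print(check_output(toy_output, toy_multi, toy_singles, [], []))
```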

In [14]:
# Run the unit test
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.004s

OK


<unittest.main.TestProgram at 0x1150a5550>

***
## Multi-component PyBDSF sources
***

**Sources where different PyBDSF sources were associated (but have not been through deblending)**

In [15]:
# Creating a condition to select the sources that are made up of more than one component
# It returns the names of the sources 
cond_multi = pd.DataFrame([components_cleaned['Source_Name'].value_counts() > 1][0])

In [16]:
# Taking the source names that meet the condition (names of the sources correspond to the indexes)
multi_names = pd.DataFrame(cond_multi[cond_multi['Source_Name']].index.values)
len(multi_names)

3774

In [17]:
# Taking the components that make up these source names
multi_components = components_cleaned[components_cleaned['Source_Name'].isin(multi_names[0])]
len(multi_components)

9868
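
The three cells above pick the `Source_Name` values that occur more than once and then the components that belong to them. The same selection can be sketched in one step with a per-group count, which avoids relying on how `value_counts` names its result in a given pandas version. The table below is made up.

```python
import pandas as pd

# Made-up components table: 's1' has three components, 's2' has one.
toy_components = pd.DataFrame({
    'Component_Name': ['c1', 'c2', 'c3', 'c4'],
    'Source_Name': ['s1', 's1', 's2', 's1'],
})

# Number of components sharing each row's Source_Name
n_components = toy_components.groupby('Source_Name')['Component_Name'].transform('count')

# Components that belong to sources with more than one component
toy_multi = toy_components[n_components > 1]
print(toy_multi['Source_Name'].nunique(), len(toy_multi))
```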

#### Selecting the sources

In [18]:
# Selecting the components with an empty Deblended_from field
multi_not_deblended = multi_components[multi_components['Deblended_from'] == '']

In [19]:
# Number of components
len(multi_not_deblended)

9007

In [20]:
# The number of components equals the number of PyBDSF names (1-1 relation)
len(multi_not_deblended.groupby('Component_Name'))

9007

In [21]:
# This corresponds to the number of sources
len(multi_not_deblended.groupby('Source_Name'))

3565

#### Creating a multi-component sources table for the NON-Deblends

In [22]:
# Multi-component sources table just for the ones that were not deblended
pybdsf_multi_df = pd.DataFrame({
    'pybdsf_name':multi_not_deblended['Component_Name'],
    'flag': 8,
    'association_name': multi_not_deblended['Source_Name']})

In [23]:
pybdsf_multi_df['association_name'].describe()

count                       9007
unique                      3565
top       ILTJ140313.38+542018.9
freq                          35
Name: association_name, dtype: object

#### Updating the output table

In [24]:
output_df = output_df.append(pybdsf_multi_df,ignore_index=True)
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x115c313d0>

***
## Deblended sources
***


**Deblended sources are distinct sources that were originally associated by PyBDSF as only one source.**

Deblended sources are selected from the **components catalogue** (which links directly to the PyBDSF catalogue through `Deblended_from` column).

#### Having a look at the number of sources

In [25]:
# Total number of PyBDSF sources that were deblended
len(components_cleaned[components_cleaned['Deblended_from'] != ''].groupby('Deblended_from'))

935

In [26]:
# Total number of components that make these PyBDSF
components_deblended = components_cleaned[components_cleaned['Deblended_from'] != '']
len(components_deblended)

2446

In [27]:
# Total number of sources after deblending
len(components_deblended.groupby('Source_Name'))

1794

#### Taking the PyBDSF Source Names

We have to search each PyBDSF in `components_deblended` and take the correct names:

In [28]:
deblended_pybdsf_names = components_deblended.drop_duplicates('Deblended_from')['Deblended_from']
len(deblended_pybdsf_names)

935

It has to be done this way because one PyBDSF source can be deblended directly into 2 (or more) sources, or be deblended and then associated with another source. For that reason, dropping duplicated association names (`Source_Name` below) loses some of the original PyBDSF names:

In [29]:
len((components_deblended.drop_duplicates('Source_Name')).groupby('Deblended_from'))

908
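
A toy case shows why names go missing. In the made-up table below, `P1` was split into `S1` and `S2`, while `P2` contributes only to `S2`, which already appeared under `P1`. Dropping duplicated `Source_Name` rows therefore removes the only row that mentions `P2`.

```python
import pandas as pd

# Made-up names: P1 was split into S1 and S2; P2 contributes only to S2.
toy_deblended = pd.DataFrame({
    'Deblended_from': ['P1', 'P1', 'P2'],
    'Source_Name': ['S1', 'S2', 'S2'],
})

# Counting PyBDSF names directly
by_pybdsf = toy_deblended['Deblended_from'].nunique()

# Counting them after dropping duplicated association names
after_dropping = toy_deblended.drop_duplicates('Source_Name')['Deblended_from'].nunique()
print(by_pybdsf, after_dropping)
```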

#### Creating the deblended table

In [30]:
pybdsf_deblended_df = pd.DataFrame(columns = col_names)
for i in range(0,len(deblended_pybdsf_names)):
    corresp = components_deblended[components_deblended['Deblended_from'] == deblended_pybdsf_names.
                                   iloc[i]][['Deblended_from','Source_Name']]
    pybdsf_deblended = pd.DataFrame(
        {'pybdsf_name':deblended_pybdsf_names.iloc[i],
        'flag':4,
        'association_name': corresp.drop_duplicates()['Source_Name']})
    pybdsf_deblended_df = pybdsf_deblended_df.append(pybdsf_deblended)
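
The loop above collects the unique (`Deblended_from`, `Source_Name`) pairs one PyBDSF source at a time. A sketch of an equivalent single-step selection on a made-up table is given below. It keeps the index of the component rows, which matters for later steps because `DataFrame.update` aligns on the index. The row order may differ from the loop's.

```python
import pandas as pd

# Made-up deblended components: P1 -> S1 (twice), S2; P2 -> S3, S4
toy_deblended = pd.DataFrame({
    'Deblended_from': ['P1', 'P1', 'P1', 'P2', 'P2'],
    'Source_Name': ['S1', 'S1', 'S2', 'S3', 'S4'],
})

# One row per unique (PyBDSF name, association name) pair; the index is preserved
pairs = toy_deblended[['Deblended_from', 'Source_Name']].drop_duplicates()

toy_table = pd.DataFrame({
    'pybdsf_name': pairs['Deblended_from'],
    'association_name': pairs['Source_Name'],
    'flag': 4,
})
print(toy_table)
```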

This table is no longer a 1-1 relationship:

In [31]:
pybdsf_deblended_df.describe()

Unnamed: 0,association_name,flag,pybdsf_name
count,1832,1832,1832
unique,1794,1,935
top,ILTJ141420.38+462620.3,4,ILTJ112051.72+474517.1
freq,4,1832,3


### Sources that were both deblended and have multi PyBDSF components
***

Some of the deblended sources were also associated with other PyBDSF sources.

In [32]:
# Taking the deblended sources that appear more than once on components deblended
cond_deblended_multi = pd.DataFrame([components_deblended['Source_Name'].value_counts() > 1][0])

In [33]:
# Taking the names for these sources (i.e. optical source names in the components catalogue)
deblended_multi_names = pd.DataFrame(cond_deblended_multi[cond_deblended_multi['Source_Name']].index.values)

In [34]:
# Selecting the components that make up these sources
multi_and_deblended = multi_components[multi_components['Deblended_from'] != '']

#### Creating a multi-component sources table for the deblended sources

In [35]:
pybdsf_multi_deblended_df = pd.DataFrame(columns = col_names)
for i in range(0,len(deblended_multi_names)):
    sources_md = multi_and_deblended[multi_and_deblended['Source_Name'] 
                                     == deblended_multi_names[0][i]]
    pybdfs_md = sources_md['Deblended_from'].unique()
    if len(pybdfs_md) > 1:
        sources_md_names = pybdsf_deblended_df[pybdsf_deblended_df['association_name'].
                                               isin(sources_md['Source_Name'])]['association_name']
        for j in range(0,len(pybdfs_md)):
            pybdsf_multi_deblended = pd.DataFrame(
                {'pybdsf_name':pybdfs_md[j],
                 'flag':12,
                 'association_name': sources_md_names}, index = [sources_md_names.index.values[j]])
            pybdsf_multi_deblended_df = pybdsf_multi_deblended_df.append(pybdsf_multi_deblended)

In [36]:
len(pybdsf_multi_deblended_df)

69

In [37]:
# Updating the deblended table
pybdsf_deblended_df.update(pybdsf_multi_deblended_df)
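
`DataFrame.update` modifies the table in place and matches rows by index label, which is why the tables built above carry the index of the original component rows. A minimal sketch with made-up values is shown below. Depending on the pandas version, the updated column may come back as floats.

```python
import pandas as pd

# Made-up table with a non-default index
table = pd.DataFrame({'pybdsf_name': ['P1', 'P2', 'P3'], 'flag': [4, 4, 4]},
                     index=[10, 11, 12])

# Only the row labelled 11 is present in the patch
patch = pd.DataFrame({'flag': [12]}, index=[11])

# In-place update: rows are matched by index label, other rows are left untouched
table.update(patch)
print(table['flag'].tolist())
```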

### Sources that were not deblended
***

PyBDSF sources that have a `Deblended_from` entry but correspond to a single optical source.

**Selecting these sources and giving them flag 2**

Note that these include singles (flag=1) and PyBDSF sources that were grouped with other PyBDSF sources (flag=8).

In [38]:
deblended_names = components_cleaned[components_cleaned['Deblended_from'] != '']['Deblended_from'].unique()
pybdsf_fcd_df = pd.DataFrame(columns = col_names)
for i in range(0,len(deblended_names)):
    components_group = components_cleaned[components_cleaned['Deblended_from'] == deblended_names[i]]
    #print (components_group['Source_Name'],  len(components_group.groupby('Source_Name')) < 2 )
    if len(components_group.groupby('Source_Name')) < 2:
        pybdsf_fcd = pd.DataFrame(
                {'pybdsf_name':components_group['Deblended_from'],
                 'flag':2,
                 'association_name': components_group['Source_Name']})
        pybdsf_fcd_df = pybdsf_fcd_df.append(pybdsf_fcd)

In [39]:
len(pybdsf_fcd_df) 

74
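
The loop above keeps the PyBDSF names whose components all point to a single `Source_Name`. A sketch of the same criterion with a grouped count of distinct names, on a made-up table, could look like this:

```python
import pandas as pd

# Made-up components: P1 maps to two sources, P2 maps to one; the last row has no parent
toy_components = pd.DataFrame({
    'Deblended_from': ['P1', 'P1', 'P2', 'P2', ''],
    'Source_Name': ['S1', 'S2', 'S3', 'S3', 'S4'],
})

with_parent = toy_components[toy_components['Deblended_from'] != '']

# Number of distinct optical names per PyBDSF name
n_sources = with_parent.groupby('Deblended_from')['Source_Name'].nunique()

# PyBDSF names that end up in only one optical source
not_deblended = n_sources[n_sources < 2].index.tolist()
print(not_deblended)
```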

### MULTIPLE PyBDSF COMPONENTS

In [40]:
deblended_multi = pybdsf_deblended_df[
                (pybdsf_deblended_df['pybdsf_name'].isin(pybdsf_fcd_df['pybdsf_name'])) &
                (pybdsf_deblended_df['flag'] == 12)]
len(deblended_multi)

39

In [41]:
# Creating a table for these multiples with flag = 10 (flag = 2 + flag = 8)
deblended_multi_df = deblended_multi.copy()
deblended_multi_df['flag'] = 10

In [42]:
# Updating the deblended table
pybdsf_deblended_df.update(deblended_multi_df)
pybdsf_deblended_df['flag'].value_counts()

4.0     1763
10.0      39
12.0      30
Name: flag, dtype: int64

### SINGLES

In [43]:
deblended_singles = pybdsf_deblended_df[
                (pybdsf_deblended_df['pybdsf_name'].isin(pybdsf_fcd_df['pybdsf_name'])) &
                (pybdsf_deblended_df['flag'] == 4)]
len(deblended_singles)

13

In [44]:
# Creating a table for these singles with flag = 3 (flag = 2 + flag = 1)
deblended_singles_df = deblended_singles.copy()
deblended_singles_df['flag'] = 3

In [45]:
# Updating the deblended table
pybdsf_deblended_df.update(deblended_singles_df)
pybdsf_deblended_df['flag'].value_counts()

4.0     1750
10.0      39
12.0      30
3.0       13
Name: flag, dtype: int64

***

### Appending to the output table

In [46]:
output_df = output_df.append(pybdsf_deblended_df,ignore_index=True) 
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x114bf4f90>

***
## Single sources
***

Single sources are PyBDSF sources that have not been deblended or grouped with other PyBDSF sources, and are not artifacts. They have a unique correspondence between the PyBDSF raw catalogue and the final optical catalogue. 

### Selecting all the sources that have a unique PyBDSF - Optical source name correspondence

These sources can be selected using the condition defined for multiple sources, using the number of times sources (`Source_Name`) appear on the **components catalogue**.

In [47]:
# Condition to select the sources that are NOT made up of multi components
cond_single = ~cond_multi

In [48]:
# taking the names of these sources
single_names = pd.DataFrame(cond_single[cond_single['Source_Name']].index.values)
len(single_names)

314746

### Selecting the ones that did not go through deblending

Note that some of these went through a deblending process (i.e. after deblending, one of the sources kept the original PyBDSF name and the other got a different name).

In [49]:
# Taking the component names for the sources that did not go through deblending
single_components = components_cleaned[(components_cleaned['Source_Name'].isin(single_names[0])) &\
                           (components_cleaned['Deblended_from'] == '')]
len(single_components)

313161

In [50]:
# Confirming
len(single_components.groupby('Component_Name')), len(single_components.groupby('Source_Name'))

(313161, 313161)
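
The confirmation above compares the number of groups on each side. A more explicit one-to-one check is sketched below. `is_one_to_one` is a hypothetical helper and the two toy tables are made up.

```python
import pandas as pd


def is_one_to_one(df, left, right):
    """True if each `left` value pairs with exactly one `right` value and vice versa."""
    pairs = df[[left, right]].drop_duplicates()
    return bool(pairs[left].is_unique and pairs[right].is_unique)


toy_singles = pd.DataFrame({'Component_Name': ['c1', 'c2'], 'Source_Name': ['c1', 'c2']})
toy_multi = pd.DataFrame({'Component_Name': ['c1', 'c2'], 'Source_Name': ['s1', 's1']})

print(is_one_to_one(toy_singles, 'Component_Name', 'Source_Name'))
print(is_one_to_one(toy_multi, 'Component_Name', 'Source_Name'))
```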

### Creating a single sources table

In [51]:
pybdsf_singles_df = pd.DataFrame({
    'pybdsf_name':single_components['Component_Name'],
    'flag':1,
    'association_name': single_components['Source_Name']})

### Updating the output table

In [52]:
output_df = output_df.append(pybdsf_singles_df,ignore_index=True) 
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x11dff9fd0>

***
## Artifacts
***

### Artifacts from the (cleaned) artifact catalogue

In [53]:
pybdsf_artifacts = pybdsf_cleaned[pybdsf_cleaned['Source_Name'].
                    isin(artifacts_cleaned['Source_Name'])]
len(pybdsf_artifacts)

2543

Creating a diagnosis artifact table

In [54]:
pybdsf_artifacts_df = pd.DataFrame({
    'pybdsf_name':pybdsf_artifacts['Source_Name'],
    'flag':16,
    'association_name': 'NULL'})

**Appending to the output table**

In [55]:
output_df = output_df.append(pybdsf_artifacts_df, ignore_index=True)
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x11dff9f90>

### Sources missing from the optical catalogue are now being classified as artifacts

Sources from the PyBDSF raw catalogue that are not in the output table built so far (i.e. they appear in none of the catalogues used above) are now classified as artifacts.

In [56]:
artifacts_not_anywhere = pybdsf_cleaned[~pybdsf_cleaned['Source_Name'].isin(output_df['pybdsf_name'])]
print (len(artifacts_not_anywhere))

48
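
The selection above is an anti-join: it keeps the raw-catalogue names that are absent from the output table. The sketch below repeats it on made-up tables and adds the missing names with flag 32. It uses `pd.concat` rather than `DataFrame.append`, because `append` was removed in pandas 2.0.

```python
import pandas as pd

# Made-up raw catalogue and partially filled output table
toy_raw = pd.DataFrame({'Source_Name': ['P1', 'P2', 'P3', 'P4']})
toy_output = pd.DataFrame({'pybdsf_name': ['P1', 'P3'],
                           'association_name': ['S1', 'NULL'],
                           'flag': [1, 16]})

# Raw-catalogue names that are not yet in the output table
missing = toy_raw[~toy_raw['Source_Name'].isin(toy_output['pybdsf_name'])]

toy_new_artifacts = pd.DataFrame({
    'pybdsf_name': missing['Source_Name'],
    'association_name': 'NULL',
    'flag': 32,
})

toy_output = pd.concat([toy_output, toy_new_artifacts], ignore_index=True, sort=False)
print(toy_output)
```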


In [57]:
artifacts_not_anywhere_df = pd.DataFrame({
    'pybdsf_name': artifacts_not_anywhere['Source_Name'],
    'flag': 32,
    'association_name': 'NULL'})

Appending the new artifacts to the PyBDSF artifact table

In [58]:
pybdsf_artifacts_df = pybdsf_artifacts_df.append(artifacts_not_anywhere_df,ignore_index=True)
len(pybdsf_artifacts_df)

2591

### Updating the output table

In [59]:
output_df = output_df.append(artifacts_not_anywhere_df,ignore_index=True)
unittest.main(argv=['ignored', '-v'], exit=False)

test_len_output_df (__main__.Test_output) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.002s

OK


<unittest.main.TestProgram at 0x11dff9510>

***
## Output table
***

In [60]:
len(output_df), len(output_df.groupby('pybdsf_name')), len(output_df.groupby('association_name'))

(326591, 325694, 318521)

In [61]:
print (output_df['flag'].value_counts())

1.0     313161
8.0       9007
16.0      2543
4.0       1750
32.0        48
10.0        39
12.0        30
3.0         13
Name: flag, dtype: int64
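
As a quick consistency check, the per-flag counts printed above add up to the table length reported by `len(output_df)`. The numbers below are copied from the two outputs.

```python
# Counts copied from the value_counts output above
flag_counts = {1: 313161, 8: 9007, 16: 2543, 4: 1750, 32: 48, 10: 39, 12: 30, 3: 13}

total = sum(flag_counts.values())
print(total)  # 326591, the number of rows in output_df
```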


**Creating an astropy table**

In [62]:
output_df['flag'] = pd.to_numeric(output_df['flag'])

In [63]:
output_table = Table.from_pandas(output_df)

**Writing to a FITS file**

In [64]:
output_table.write('output_table.fits', overwrite = True)

In [65]:
# The table can also be saved as a csv file directly from the pandas table
# output_df.to_csv('output_table.csv', index=False)

***
### Output table summary
***

In [66]:
flag = 1,2,3,4,8,10, 12,16,32
for i in flag:
    print ('flag:', i)
    print ('nr of pybdsf:', len(output_df[output_df['flag'] == i].groupby('pybdsf_name')))
    print ('nr of optical:', len(output_df[output_df['flag'] == i].groupby('association_name')))
    print ('nr of entries:', len(output_df[output_df['flag'] == i]))
    print ('--------')

flag: 1
nr of pybdsf: 313161
nr of optical: 313161
nr of entries: 313161
--------
flag: 2
nr of pybdsf: 0
nr of optical: 0
nr of entries: 0
--------
flag: 3
nr of pybdsf: 13
nr of optical: 13
nr of entries: 13
--------
flag: 4
nr of pybdsf: 880
nr of optical: 1750
nr of entries: 1750
--------
flag: 8
nr of pybdsf: 9007
nr of optical: 3565
nr of entries: 9007
--------
flag: 10
nr of pybdsf: 39
nr of optical: 31
nr of entries: 39
--------
flag: 12
nr of pybdsf: 26
nr of optical: 30
nr of entries: 30
--------
flag: 16
nr of pybdsf: 2543
nr of optical: 1
nr of entries: 2543
--------
flag: 32
nr of pybdsf: 48
nr of optical: 1
nr of entries: 48
--------
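
The same three numbers per flag can be gathered into one table with a grouped aggregation. The sketch below uses a made-up output table. Unlike the loop above, it lists only flags that are present, so a flag with zero entries (such as flag 2) would not appear. As in the loop, the 'NULL' placeholder counts as one association name.

```python
import pandas as pd

# Made-up output table
toy_output = pd.DataFrame({
    'pybdsf_name': ['P1', 'P2', 'P3', 'P3', 'P4'],
    'association_name': ['S1', 'S2', 'S3', 'S4', 'NULL'],
    'flag': [1, 8, 4, 4, 16],
})

grouped = toy_output.groupby('flag')
summary = pd.DataFrame({
    'nr_pybdsf': grouped['pybdsf_name'].nunique(),
    'nr_optical': grouped['association_name'].nunique(),
    'nr_entries': grouped.size(),
})
print(summary)
```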


### Notes about multi component PyBDSF sources - Flag 8 

There are 9007 PyBDSF sources that make up 3565 optical sources.

In [67]:
multi_big = (output_df[output_df['flag'] == 8].groupby('association_name').count() > 2)
print ('Nr of optical sources:') 
print('- made up 2 PyBDSF sources:',len(multi_big[~multi_big['flag']]),
      '; ( nr components:', 2*len(multi_big[~multi_big['flag']]),')')
print('- made up of more than 2 PyBDSF sources:',len(multi_big[multi_big['flag']]),
      '; ( nr components:', output_df[output_df['flag'] == 8]['association_name'].value_counts()
      [0:len(multi_big[multi_big ['flag']])].sum() ,')')

Nr of optical sources:
- made up 2 PyBDSF sources: 2489 ; ( nr components: 4978 )
- made up of more than 2 PyBDSF sources: 1076 ; ( nr components: 4029 )


In [68]:
# 2489 optical sources are made up of 2 PyBDSF sources
# but 1076 optical sources are made up of more than 2 PyBDSF sources,
# and this can go up to 35 PyBDSFs
output_df[output_df['flag'] == 8]['association_name'].value_counts().head()

ILTJ140313.38+542018.9    35
ILTJ121903.03+471642.8    33
ILTJ140923.52+545712.7    12
ILTJ120304.85+513958.6    11
ILTJ122155.05+550313.5     9
Name: association_name, dtype: int64

### Notes about deblended sources - Flag 4 and 12

Most of the PyBDSF were deblended into 2 optical sources. However, some were deblended into 3:

In [69]:
for i in (4,12):
    print ('flag:',i)
    debl = (output_df[output_df['flag'] == i].groupby('pybdsf_name').count() > 2)
    print (len(debl[debl['flag']]), 'PyBDSF(s) was/were deblended into 3 optical sources')
    print (output_df[output_df['flag'] == i]['pybdsf_name'].value_counts().head(3))
    print ('------')

flag: 4
12 PyBDSF(s) was/were deblended into 3 optical sources
ILTJ124647.79+483743.9    3
ILTJ122323.68+463621.3    3
ILTJ135431.27+472516.8    3
Name: pybdsf_name, dtype: int64
------
flag: 12
1 PyBDSF(s) was/were deblended into 3 optical sources
ILTJ111517.03+531906.0    3
ILTJ133014.01+490807.7    2
ILTJ130120.61+521622.3    2
Name: pybdsf_name, dtype: int64
------


However, some PyBDSF sources carry both flag = 4 and flag = 12, since each row of the output table is flagged separately. This happens for 23 sources:

In [70]:
flag_4_12 = output_df[output_df['flag'] == 4]['pybdsf_name'].\
              isin(output_df[output_df['flag'] == 12]['pybdsf_name'])
flag_4_12_index = output_df.iloc[flag_4_12[flag_4_12].index]
print (len(flag_4_12_index.groupby('pybdsf_name')))

23


The total number of PyBDSFs that were deblended:

In [71]:
print (len(output_df[(output_df['flag'] == 4) | (output_df['flag'] == 12)].groupby('pybdsf_name')))
# Which corresponds to the number of individual group flags minus the common PyBDSFs
len(output_df[output_df['flag'] == 4].groupby('pybdsf_name')) + \
len(output_df[output_df['flag'] == 12].groupby('pybdsf_name')) - (len(flag_4_12_index.groupby('pybdsf_name')))

883


883
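
The subtraction above is the inclusion-exclusion rule for two sets: names carrying both flags would otherwise be counted twice. The sketch below checks it with the counts reported in this notebook and then on made-up name sets.

```python
# Counts reported in the summary and in the cells above
n_flag4 = 880    # PyBDSF names with flag 4
n_flag12 = 26    # PyBDSF names with flag 12
n_both = 23      # PyBDSF names carrying both flags

n_union = n_flag4 + n_flag12 - n_both
print(n_union)  # 883

# Same identity on made-up name sets
flag4_names = {'P1', 'P2', 'P3'}
flag12_names = {'P3', 'P4'}
n_toy_union = len(flag4_names) + len(flag12_names) - len(flag4_names & flag12_names)
print(n_toy_union == len(flag4_names | flag12_names))
```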

These correspond to a total number of optical sources:

In [72]:
print(len(output_df[(output_df['flag'] == 4) | (output_df['flag'] == 12)].groupby('association_name')))
# Note that there are no repeated optical names...
len(output_df[output_df['flag'] == 4].groupby('association_name')) + \
len(output_df[output_df['flag'] == 12].groupby('association_name'))

1780


1780

### Notes about sources that were not deblended - flag 3 and 10

MULTIPLES

In [73]:
print(len(output_df[(output_df['flag'] == 8) | (output_df['flag'] == 10)].groupby('association_name')))
# Note that there are no repeated optical names...
len(output_df[output_df['flag'] == 8].groupby('association_name')) + \
len(output_df[output_df['flag'] == 10].groupby('association_name'))

3596


3596

Some sources that were flagged with 3 have more than one component:

In [74]:
components_cleaned[components_cleaned['Deblended_from'].
                   isin(output_df[output_df['flag'] == 3]['pybdsf_name'])]['Source_Name'].value_counts()

ILTJ130420.89+502345.4    7
ILTJ111744.86+484556.8    3
ILTJ114328.51+474146.2    2
ILTJ114330.16+474155.0    1
ILTJ131900.29+535256.2    1
ILTJ141528.59+490824.6    1
ILTJ124615.48+503126.5    1
ILTJ141529.30+490836.6    1
ILTJ124617.13+503126.5    1
ILTJ111744.40+484605.9    1
ILTJ130419.85+502412.0    1
ILTJ110953.70+482721.1    1
ILTJ131902.56+535300.1    1
Name: Source_Name, dtype: int64

## EXPORT CLEANED CATALOGUES

In [76]:
pybdsf_cleaned_cat = Table.from_pandas(pybdsf_cleaned)
pybdsf_cleaned_cat.write('pybdsf_cleaned.fits', overwrite = True)

In [83]:
gauss_cleaned_cat = Table.from_pandas(gauss_cleaned)
gauss_cleaned_cat.write('gauss_cleaned.fits', overwrite = True)