# Week 2: A naive cross-matcher

When investigating astronomical objects, like active galactic nuclei (AGN), astronomers compare data about those objects from different telescopes at different wavelengths.

This requires positional cross-matching to find the closest counterpart within a given radius on the sky.

In this activity you'll cross-match two catalogues: one from a radio survey, the AT20G Bright Source Sample (BSS) catalogue and one from an optical survey, the SuperCOSMOS all-sky galaxy catalogue.

The BSS catalogue lists the brightest sources from the AT20G radio survey while the SuperCOSMOS catalogue lists galaxies observed by visible light surveys. If we can find an optical match for our radio source, we are one step closer to working out what kind of object it is, e.g. a galaxy in the local Universe or a distant quasar.

We've chosen one small catalogue (BSS has only 320 objects) and one large one (SuperCOSMOS has about 240 million) to demonstrate the issues you can encounter when implementing cross-matching algorithms.
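
To get a feel for the scale, the arithmetic below counts the pairwise comparisons a brute-force matcher would make between the two catalogues. The catalogue sizes come from the text above; the cost per comparison (`seconds_per_comparison`) is purely an illustrative assumption, not a measurement.

```python
# Catalogue sizes quoted in the text above.
n_bss = 320
n_supercosmos = 240_000_000

# A brute-force matcher compares every BSS source with every SuperCOSMOS object.
n_comparisons = n_bss * n_supercosmos

# Assumed cost of one distance calculation (illustrative only, not measured).
seconds_per_comparison = 1e-6

estimated_hours = n_comparisons * seconds_per_comparison / 3600
print(f"{n_comparisons:,} comparisons")
print(f"about {estimated_hours:.1f} hours at the assumed rate")
```

Swapping the two catalogues would not change the number of comparisons, only which loop is the outer one.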

The positions of stars, galaxies and other astronomical objects are usually recorded in either equatorial or Galactic coordinates.

Equatorial coordinates are fixed relative to the celestial sphere, so the positions are independent of when or where the observations took place. They are defined relative to the celestial equator (which is in the same plane as the Earth's equator) and the ecliptic (the path the Sun traces across the sky throughout the year).

A point on the celestial sphere is given by two coordinates:

- Right ascension: the angle from the vernal equinox to the point, going east along the celestial equator;
- Declination: the angle from the celestial equator to the point, going north (negative values indicate going south).

The vernal equinox is the intersection of the celestial equator and the ecliptic where the ecliptic rises above the celestial equator going further east.
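
One way to picture these two angles is to convert them to a point on a unit sphere. The sketch below assumes a common convention (x towards the vernal equinox, z towards the north celestial pole); `radec_to_unit_vector` is a hypothetical helper and not part of the exercise.

```python
import numpy as np

def radec_to_unit_vector(ra_deg, dec_deg):
    """Return the (x, y, z) unit vector for a position given in degrees."""
    ra, dec = np.radians([ra_deg, dec_deg])
    return np.array([
        np.cos(dec) * np.cos(ra),  # x points towards the vernal equinox
        np.cos(dec) * np.sin(ra),  # y points towards RA = 90 degrees on the equator
        np.sin(dec),               # z points towards the north celestial pole
    ])

print(radec_to_unit_vector(0, 0))    # the vernal equinox
print(radec_to_unit_vector(90, 0))   # on the equator, 90 degrees further east
print(radec_to_unit_vector(0, 90))   # the north celestial pole
```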

The coordinates of stars in the sky will change slightly over the years due to the slow wobble of Earth's axis. Therefore, it is important to specify the epoch or time period which we are using as a reference for the celestial coordinate system.

In [10]:
import numpy as np

## Conversion functions (HMS and DMS to decimal degrees)

In [8]:
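# Write your hms2dec and dms2dec functions here
import math

def dms2dec(degrees, arcminutes, arcseconds):
    # Take the sign from the degrees field. copysign also handles 0 and -0.0,
    # where x/abs(x) would divide by zero.
    sign = math.copysign(1, degrees)
    return sign*(abs(degrees) + arcminutes/60 + arcseconds/(60*60))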

def hms2dec(hours, minutes, seconds):
    return 15*(hours+minutes/60+seconds/(60*60))

# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  # The first example from the question
  print(hms2dec(23, 12, 6))

  # The second example from the question
  print(dms2dec(22, 57, 18))

  # The third example from the question
  print(dms2dec(-66, 5, 5.1))

348.025
22.955
-66.08475


In [9]:
print(hms2dec(23, 12, 6))

348.025
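
A declination between 0° and -1° has a degrees field of negative zero, so its sign cannot be recovered by dividing by the degrees value or by testing `degrees < 0`. The sketch below shows one way to keep the sign using `math.copysign`; `dms2dec_signed` is a hypothetical name used so that the block runs on its own.

```python
import math

def dms2dec_signed(degrees, arcminutes, arcseconds):
    """Convert DMS to decimal degrees, taking the sign from the degrees field."""
    sign = math.copysign(1.0, degrees)  # -1.0 for -0.0 as well as for -66
    return sign * (abs(degrees) + arcminutes / 60 + arcseconds / 3600)

# float('-00') parses to -0.0, which is how a catalogue row such as
# "-00 30 00" would arrive after being read as floating-point numbers.
print(dms2dec_signed(float('-00'), 30, 0))
print(dms2dec_signed(0, 30, 0))
print(dms2dec_signed(-66, 5, 5.1))
```

This only works if the degrees field is read as a float: an integer cannot represent negative zero, so the sign would already be lost.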


## Angular distance

In [29]:
# Write your angular_dist function here.
def angular_dist(a_deg1, d_deg1, a_deg2, d_deg2):
    
    a_rad1, d_rad1, a_rad2, d_rad2 = np.radians([a_deg1, d_deg1, a_deg2, d_deg2])

    a = np.sin(abs(d_rad1-d_rad2)/2)**2
    b = np.cos(d_rad1)*np.cos(d_rad2)*np.sin(np.abs(a_rad1 - a_rad2)/2)**2
    d = 2*np.arcsin(np.sqrt(a + b))
    
    return np.degrees(d)

# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  # Run your function with the first example in the question.
  print(angular_dist(21.07, 0.1, 21.15, 8.2))

  # Run your function with the second example in the question
  print(angular_dist(10.3, -3, 24.3, -29))


8.100392318146504
29.208498180546595
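
The function above evaluates the haversine formula, and a few positions with known separations make a quick sanity check. The sketch below repeats the same formula under the hypothetical name `haversine_deg` so that it runs on its own.

```python
import numpy as np

def haversine_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    return np.degrees(2 * np.arcsin(np.sqrt(a + b)))

# Identical positions are zero degrees apart.
print(haversine_deg(10.0, -30.0, 10.0, -30.0))
# Any point on the celestial equator is 90 degrees from the north celestial pole.
print(haversine_deg(0.0, 0.0, 123.0, 90.0))
# Away from the equator, one degree of right ascension spans less than one degree on the sky.
print(haversine_deg(0.0, 60.0, 1.0, 60.0))
```

The last case is why right ascension differences cannot be compared directly with declination differences: lines of constant right ascension converge towards the poles.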


## Import datasets

In [67]:
# Write your import_bss function here.

def import_bss():
    cat = np.loadtxt('bss.dat', usecols=range(1, 7))
    tuple_list = []
    for i, star in enumerate(cat):
        tuple_list.append(
            (i+1, hms2dec(star[0], star[1], star[2]), dms2dec(star[3], star[4], star[5]))
        )
    return tuple_list

def import_super():
    cat = np.loadtxt('super.csv', delimiter=',', skiprows=1, usecols=[0, 1])
    tuple_list = []
    for i, star in enumerate(cat):
        tuple_list.append(
            (i+1, star[0], star[1])
        )
    return tuple_list

# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  # Output of the import_bss and import_super functions
  bss_cat = import_bss()
  super_cat = import_super()
  print(bss_cat)
  print(super_cat)

[(1, 1.1485416666666666, -47.60530555555556), (2, 2.6496666666666666, -30.463416666666667), (3, 2.7552916666666665, -26.209194444444442)]
[(1, 1.0583407, -52.9162402), (2, 2.6084425, -41.5005753), (3, 2.7302499, -27.706955)]


## The find_closest function

Write a find_closest function that takes a catalogue and the position of a target source (a right ascension and declination) and finds the closest match for the target source in the catalogue.

Your function should return the ID of the closest object and the distance to that object.

The right ascension and declination are in degrees. The catalogue is the list returned by import_bss from the previous question. The full 320-object BSS catalogue is contained in bss.dat for you to test your code on.

In [41]:
# Write your find_closest function here

def import_bss():
    cat = np.loadtxt('bss_full.dat', usecols=range(1, 7))
    tuple_list = []
    for i, star in enumerate(cat):
        tuple_list.append(
            (i+1, hms2dec(star[0], star[1], star[2]), dms2dec(star[3], star[4], star[5]))
        )
    return tuple_list

def find_closest(cat, ra_source, dec_source):
    # Start with no candidate and an infinite distance so that any object beats it.
    closest_id, min_dist = None, np.inf
    for obj_id, ra, dec in cat:
        found_dist = angular_dist(ra_source, dec_source, ra, dec)
        if found_dist <= min_dist:
            # Record the catalogue's own ID rather than the loop position.
            closest_id, min_dist = obj_id, found_dist
    return (closest_id, min_dist)
        
# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  cat = import_bss()

  # First example from the question
  print(find_closest(cat, 175.3, -32.5))

  # Second example in the question
  print(find_closest(cat, 32.2, 40.7))


(156, 3.7670580226469053)
(26, 57.729135775621295)
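
The loop above calls the distance function once per catalogue entry. As a rough sketch of an alternative, the same search can be done with NumPy arrays so that all distances are computed in one pass. `find_closest_vectorised` is a hypothetical name, and the toy catalogue reuses the three positions printed by import_bss earlier.

```python
import numpy as np

def find_closest_vectorised(cat, ra_source, dec_source):
    """Return (id, distance in degrees) of the catalogue entry nearest the target.

    `cat` is a list of (id, ra_deg, dec_deg) tuples.
    """
    ids, ra, dec = np.asarray(cat, dtype=float).T
    ra, dec = np.radians(ra), np.radians(dec)
    ra0, dec0 = np.radians([ra_source, dec_source])

    # Haversine formula evaluated for every catalogue entry at once.
    a = np.sin((dec - dec0) / 2) ** 2
    b = np.cos(dec) * np.cos(dec0) * np.sin((ra - ra0) / 2) ** 2
    dist = np.degrees(2 * np.arcsin(np.sqrt(a + b)))

    nearest = np.argmin(dist)
    return int(ids[nearest]), float(dist[nearest])

toy_cat = [
    (1, 1.1485416666666666, -47.60530555555556),
    (2, 2.6496666666666666, -30.463416666666667),
    (3, 2.7552916666666665, -26.209194444444442),
]
print(find_closest_vectorised(toy_cat, 2.7, -26.0))
```

Note that `np.argmin` returns the first entry when two distances tie exactly, so results could differ from a loop that keeps the last tied entry.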


## A full cross-matching program

You now have all the tools necessary to cross-match the BSS and SuperCOSMOS catalogues. In the next problem you'll put it all together to see how many of the bright radio sources in the BSS catalogue have a counterpart in the SuperCOSMOS catalogue. The process you should follow is:

1. Select an object from the BSS catalogue;
2. Go through all the objects in SuperCOSMOS and find the closest one to the BSS object;
3. If the objects are close enough, record the match;
4. Repeat 1-3 for all the other objects in the BSS catalogue.

In step 3, if the closest object isn't within a given distance then it's unlikely that the two objects are actually counterparts, and more likely that they just happen to be nearby.
The distance you choose depends on the uncertainty of the measured object positions in each catalogue.
Although we are cross-matching based solely on celestial coordinates in the following exercise, there are other properties we could consider while conducting research, such as the brightness and colour of an object.

Write a crossmatch function that cross-matches two catalogues within a maximum distance. It should return a list of matches and a list of non-matches for the first catalogue against the second.

The list of matches contains tuples of the first and second catalogue object IDs and their distance. The list of non-matches contains the unmatched object IDs from the first catalogue only. Both lists should be ordered by the first catalogue's IDs.

The BSS and SuperCOSMOS catalogues will be given as input arguments, each in the format you’ve seen previously. The maximum distance is given in decimal degrees.
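
The test cell below writes its radii as 40/3600 and 5/3600, that is, 40 and 5 arcseconds converted to degrees. A small helper can make that conversion explicit; `arcsec_to_deg` is a hypothetical name, not something the exercise requires.

```python
def arcsec_to_deg(arcseconds):
    """Convert an angle in arcseconds to decimal degrees (3600 arcseconds per degree)."""
    return arcseconds / 3600

for radius in (40, 5):
    print(f"{radius} arcsec = {arcsec_to_deg(radius):.6f} degrees")
```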

In [62]:
def import_bss():
    cat = np.loadtxt('bss_medium.dat', usecols=range(1, 7))
    tuple_list = []
    for i, star in enumerate(cat):
        tuple_list.append(
            (i+1, hms2dec(star[0], star[1], star[2]), dms2dec(star[3], star[4], star[5]))
        )
    return tuple_list

def import_super():
    cat = np.loadtxt('super_full.csv', delimiter=',', skiprows=1, usecols=[0, 1])
    tuple_list = []
    for i, star in enumerate(cat):
        tuple_list.append(
            (i+1, star[0], star[1])
        )
    return tuple_list


In [66]:
# Write your crossmatch function here.

def crossmatch(bss_cat, super_cat, max_dist):
    matches, no_matches = [], []

    # Use each catalogue entry's own ID rather than its position in the list.
    for bss_id, ra_source, dec_source in bss_cat:
        # Closest SuperCOSMOS object to this BSS source
        super_id, dist = find_closest(super_cat, ra_source, dec_source)
        if dist <= max_dist:
            matches.append((bss_id, super_id, dist))
        else:
            no_matches.append(bss_id)
    return matches, no_matches

# You can use this to test your function.
# Any code inside this `if` statement will be ignored by the automarker.
if __name__ == '__main__':
  bss_cat = import_bss()
  super_cat = import_super()

  # First example in the question (40 arcseconds, in degrees)
  max_dist = 40/3600
  matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
  print(matches[:3])
  print(no_matches[:3])
  print(len(no_matches))

  # Second example in the question (5 arcseconds, in degrees)
  max_dist = 5/3600
  matches, no_matches = crossmatch(bss_cat, super_cat, max_dist)
  print(matches[:3])
  print(no_matches[:3])
  print(len(no_matches))


[(1, 2, 0.00010988610938710059), (2, 4, 0.0007649845967242494), (3, 5, 0.00020863352870707666)]
[5, 6, 11]
9
[(1, 2, 0.00010988610938710059), (2, 4, 0.0007649845967242494), (3, 5, 0.00020863352870707666)]
[5, 6, 11]
40
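
For small catalogues, the same matching can be sketched with a single all-pairs distance matrix built by NumPy broadcasting. This is only an illustration under stated assumptions: `crossmatch_matrix` is a hypothetical name, the toy catalogues reuse the three-row samples printed earlier, and the 2-degree radius is chosen purely so that the toy data produces both a match and non-matches.

```python
import numpy as np

def crossmatch_matrix(cat1, cat2, max_dist):
    """Cross-match two lists of (id, ra_deg, dec_deg) tuples via an all-pairs matrix."""
    id1, ra1, dec1 = np.asarray(cat1, dtype=float).T
    id2, ra2, dec2 = np.asarray(cat2, dtype=float).T
    ra1, dec1 = np.radians(ra1)[:, None], np.radians(dec1)[:, None]  # column vectors
    ra2, dec2 = np.radians(ra2)[None, :], np.radians(dec2)[None, :]  # row vectors

    # Broadcasting yields a (len(cat1), len(cat2)) matrix of haversine distances.
    a = np.sin((dec1 - dec2) / 2) ** 2
    b = np.cos(dec1) * np.cos(dec2) * np.sin((ra1 - ra2) / 2) ** 2
    dist = np.degrees(2 * np.arcsin(np.sqrt(a + b)))

    nearest = dist.argmin(axis=1)  # closest cat2 column for each cat1 row
    matches, no_matches = [], []
    for row, col in enumerate(nearest):
        if dist[row, col] <= max_dist:
            matches.append((int(id1[row]), int(id2[col]), float(dist[row, col])))
        else:
            no_matches.append(int(id1[row]))
    return matches, no_matches

toy_bss = [(1, 1.1485416666666666, -47.60530555555556),
           (2, 2.6496666666666666, -30.463416666666667),
           (3, 2.7552916666666665, -26.209194444444442)]
toy_super = [(1, 1.0583407, -52.9162402),
             (2, 2.6084425, -41.5005753),
             (3, 2.7302499, -27.706955)]

matches, no_matches = crossmatch_matrix(toy_bss, toy_super, 2.0)
print(matches)
print(no_matches)
```

Because the whole matrix is held in memory, this approach only suits cases where the product of the two catalogue sizes is modest; a very large second catalogue would need to be processed in chunks.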
