# Unsupervised Machine Learning for the Classification of Astrophysical X-ray Sources
###### *Víctor Samuel Pérez Díaz<sup>1</sup>, Rafael Martinez-Galarza<sup>2</sup>, Alexander Caicedo-Dorado<sup>1</sup>, Raffaele D'Abrusco<sup>2</sup>*

*1. Universidad del Rosario, 2. Center for Astrophysics | Harvard & Smithsonian*

---
#### Crossmatching

In this notebook we crossmatch the data extracted from the Chandra Source Catalog with several optical and infrared catalogs, in order to gather additional information that could be useful for the unsupervised learning models.

In [1]:
import pandas as pd
import numpy as np

from astropy.io.votable import parse

In [2]:
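def votable_to_pandas(votable_file):
    """Read the first table of a VOTable file into a pandas DataFrame."""
    votable = parse(votable_file)
    # Prefer column names over IDs so the DataFrame columns are readable.
    table = votable.get_first_table().to_table(use_names_over_ids=True)
    return table.to_pandas()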

##### Original
First, we load the data originally extracted from CSC2:

In [3]:
data = votable_to_pandas("../tools/data/corpus.vot")

In [4]:
data

Unnamed: 0,name,obsid,region_id,theta,ra,dec,significance,likelihood,src_area_b,flux_aper_b,...,ks_prob_h,ks_prob_m,ks_prob_s,kp_prob_b,kp_prob_h,kp_prob_m,kp_prob_s,gti_start,gti_stop,gti_elapse
0,2CXO J000002.9-350332,15712,4,5.606038,0.012318,-35.059068,16.266113,1506.051665,6.228827,3.589425e-13,...,0.724755,0.317516,0.423297,0.808309,0.669796,0.197852,0.691868,4.989492e+08,4.989593e+08,10068.80008
1,2CXO J000010.0-501526,11997,5,7.662707,0.041803,-50.257400,16.952246,1018.200194,8.978497,3.593006e-14,...,0.249246,0.181138,0.805236,0.614278,0.101385,0.028956,0.918830,3.991763e+08,3.992403e+08,64051.80924
2,2CXO J000019.8-245030,13394,69,14.525021,0.082814,-24.841752,10.720911,351.465473,884.616067,1.002743e-13,...,0.847319,0.495847,0.506963,0.430789,0.817322,0.353208,0.778730,4.294139e+08,4.294639e+08,50055.70038
3,2CXO J000025.4-245419,13394,41,11.145001,0.106246,-24.905300,16.716272,819.212066,39.749528,1.160466e-13,...,0.067084,0.939628,0.368044,0.924866,0.051374,0.950423,0.288221,4.294139e+08,4.294639e+08,50055.70038
4,2CXO J000027.4-500421,11997,43,5.944960,0.114303,-50.072669,26.377284,3525.375052,3.925662,1.093804e-13,...,0.969048,0.227237,0.030124,0.766089,0.932166,0.556750,0.047099,3.991763e+08,3.992403e+08,64051.80924
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37873,2CXO J235932.4+181247,14894,9,0.785980,359.885089,18.213181,14.004148,1389.658960,0.014842,6.634546e-14,...,0.583490,0.964677,0.262496,0.974790,0.494843,0.989638,0.387899,4.669117e+08,4.669368e+08,25072.80019
37874,2CXO J235937.1+623151,2810,48,5.461281,359.904697,62.530855,10.870086,630.387993,0.156711,4.395501e-14,...,0.052800,0.338171,0.093817,0.447750,0.109237,0.512777,0.100806,1.483829e+08,1.484329e+08,49963.57902
37875,2CXO J235945.8-574927,9335,10,6.673537,359.940857,-57.824258,10.922837,543.759025,6.741334,7.763868e-14,...,0.691945,0.910468,0.948922,0.882940,0.727829,0.955295,0.743982,3.536294e+08,3.536600e+08,30549.14217
37876,2CXO J235953.1-605617,11507,5,6.194053,359.971451,-60.938193,15.430025,1104.219928,,1.206115e-13,...,0.519522,0.711643,0.034164,0.683820,0.677194,0.679284,0.162685,3.841246e+08,3.841447e+08,20046.85030


##### Crossmatch with A. Rots (2020) [optixray]

We can crossmatch our extracted data with CSC2 Crossmatch by A. Rots (2020):

In [5]:
optixray = pd.read_csv("./data/optixray.csv")

In [6]:
optixray

Unnamed: 0,name,obsid,region_id,theta,ra,dec,significance,likelihood,src_area_b,flux_aper_b,...,SDSSDR15_RawPa_1-sigma,SDSSDR15_MatchMaj_1-sigma,SDSSDR15_MatchMin_1-sigma,SDSSDR15_MatchPA_1-sigma,CSC2-SDSSDR15_separation,Match_probability,Match_type,Match_grade,GroupID,GroupSize
0,2CXO J000050.0+231646,14898,117,17.207478,0.208635,23.279533,7.850166,99.575075,702.592390,1.484902e-13,...,79.543999,0.021484,1.965,90.0,4.318635,0.667585,E,D,,
1,2CXO J000134.9+233540,14898,35,6.415757,0.395704,23.594535,17.441905,1431.435046,1.410725,1.057434e-13,...,83.111000,0.042233,0.330,90.0,0.207188,0.997877,E,D,,
2,2CXO J000144.7+131150,8491,35,7.010605,0.436493,13.197253,19.424647,1827.318839,3.579951,1.183237e-13,...,79.135002,0.007416,0.310,90.0,0.308003,0.997719,E,D,1.0,2.0
3,2CXO J000144.7+131150,6978,35,7.027482,0.436493,13.197253,19.424647,1827.318839,6.827448,1.191485e-13,...,79.135002,0.007416,0.310,90.0,0.308003,0.997719,E,D,1.0,2.0
4,2CXO J000209.6+232304,14898,1,9.137592,0.540390,23.384525,33.642905,5047.514175,21.609260,4.333749e-13,...,-87.306999,0.006415,0.313,90.0,0.371891,0.996837,E,D,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4080,2CXO J235707.1+004253,4827,6,2.937520,359.279885,0.714914,12.480624,1019.782557,,1.186384e-13,...,-70.246002,0.025739,0.293,90.0,0.216290,0.998366,E,D,,
4081,2CXO J235720.1-005829,6128,24,7.189862,359.333838,-0.974877,16.170926,1115.480563,9.004014,1.294421e-13,...,132.679001,0.017784,0.309,0.0,0.338108,0.996510,E,D,,
4082,2CXO J235910.6+175846,14894,39,14.156012,359.794289,17.979611,10.657013,192.787028,40.553841,4.509162e-14,...,49.084999,0.028662,1.223,90.0,0.957006,0.969970,E,D,,
4083,2CXO J235922.6+181130,14894,6,2.163287,359.844351,18.191711,23.123118,3842.240092,0.138703,2.291188e-13,...,64.602997,0.008661,0.291,0.0,0.402991,0.996821,E,D,,


##### Extracting SDSSDR15 info with CasJobs [optixray_plus]

We run a CasJobs query to extract the optical data we need. The imaging data appears to match for all sources but one, while some spectroscopic data is missing.

In [7]:
optixray_plus = pd.read_csv("./data/optixray_finally_samuelperezdi.csv")

In [8]:
optixray_plus

Unnamed: 0,name,obsid,region_id,theta,ra,dec,significance,likelihood,src_area_b,flux_aper_b,...,rerun,camcol,field,type,specobjid,class,redshift,plate,mjd,fiberid
0,2CXO J132106.3+003714,4824,5,2.119055,200.2764,0.620611,17.28654,2010.7030,,1.297852e-13,...,301,5,379,6,,,,,,
1,2CXO J144214.1+005741,3960,16,14.473190,220.5588,0.961449,10.45455,330.4539,110.202600,2.236891e-13,...,301,6,514,6,345820898969806848,QSO,0.603458,307.0,51663.0,617.0
2,2CXO J144309.2+010213,3960,58,12.804500,220.7885,1.037038,14.38368,513.7379,1072.689000,3.342333e-13,...,301,6,516,3,603485132803827712,GALAXY,0.528579,536.0,52024.0,10.0
3,2CXO J100247.0+002103,13976,50,16.001190,150.6959,0.350901,11.12087,299.4449,59.000760,5.985373e-14,...,301,4,233,6,302973209857255424,QSO,2.167938,269.0,51910.0,386.0
4,2CXO J100255.7+001840,13976,61,13.000440,150.7321,0.311191,13.56834,431.4743,33.085830,3.672572e-14,...,301,4,233,3,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4080,2CXO J233831.6+270035,4816,8,1.901208,354.6320,27.009780,15.46088,1421.1270,,4.751777e-14,...,301,3,198,6,,,,,,
4081,2CXO J224510.3+263342,3287,8,2.287855,341.2933,26.561810,11.92267,1029.2860,0.058270,1.908682e-13,...,301,6,119,3,7088696429267361792,GALAXY,0.542854,6296.0,56219.0,111.0
4082,2CXO J222653.0+255134,12249,14,1.968493,336.7212,25.859460,16.44176,1718.4050,0.000483,1.442847e-13,...,301,1,91,3,,,,,,
4083,2CXO J222631.4+255213,12249,16,4.702007,336.6310,25.870380,11.84068,752.7114,1.523609,7.837906e-14,...,301,1,107,3,,,,,,
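
To put a number on the missing spectroscopic data, one option is to count the rows whose `specobjid` is empty. The following is a minimal sketch on a toy table, not the actual `optixray_plus` file: the `toy` DataFrame and its values are made up for illustration, and only the column names `type`, `specobjid` and `class` are taken from the output above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for optixray_plus; the values are illustrative only.
toy = pd.DataFrame({
    "name": ["src_a", "src_b", "src_c", "src_d"],
    "type": [6, 6, 3, 3],                          # imaging column, always filled
    "specobjid": [np.nan, 1.0e17, 2.0e17, np.nan],  # empty when no spectrum exists
    "class": [None, "QSO", "GALAXY", None],
})

# Rows with a spectroscopic counterpart have a non-empty specobjid.
has_spec = toy["specobjid"].notna()
n_with_spec = int(has_spec.sum())
n_without_spec = int((~has_spec).sum())

# Fraction of missing values per column, to compare imaging and
# spectroscopic coverage side by side.
missing_fraction = toy.isna().mean()

print(n_with_spec, n_without_spec)
print(missing_fraction)
```

Applied to the real table, the same two lines (`notna()` and `isna().mean()`) would show how many sources lack a spectrum and which columns are affected.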


##### Crossmatch with SDSSDR12 and 2MASS

Using the TOPCAT X-Match service, we performed a "Best" crossmatch between our data and SDSSDR12/2MASS. Best mode produces a new table with one row for each local row that matches a remote row, keeping only the closest match. Unmatched local rows are not included.
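
The "Best" selection rule can be illustrated with pandas alone. The sketch below assumes a hypothetical table, `candidates`, that lists every candidate pair within the match radius, with the separation stored in an `angDist` column (assumed here to be in arcseconds). The `remote` column and all values are invented for the example; this is not how TOPCAT implements the match internally.

```python
import pandas as pd

# Hypothetical table of all candidate pairs within the match radius.
candidates = pd.DataFrame({
    "name":    ["src_a", "src_a", "src_b", "src_b", "src_b"],
    "obsid":   [1, 1, 2, 2, 2],
    "remote":  ["r1", "r2", "r3", "r4", "r5"],
    "angDist": [0.9, 0.3, 1.4, 0.2, 0.7],
})

# For each local row (identified by name and obsid), find the index label
# of the candidate with the smallest separation.
closest = candidates.groupby(["name", "obsid"])["angDist"].idxmin()

# Keep only those rows: one row per matched local row.
best = candidates.loc[closest].reset_index(drop=True)
print(best)
```

Local rows without any candidate never enter `candidates`, so they are absent from `best` as well, which mirrors the behavior of Best mode for unmatched rows.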

In [18]:
csc_2mass = pd.read_csv("./data/csc_2mass.csv")
csc_sdssdr12 = pd.read_csv("./data/csc_sdssdr12.csv")

In [19]:
csc_2mass

Unnamed: 0,name,obsid,region_id,theta,ra,dec,significance,likelihood,src_area_b,flux_aper_b,...,Hmag,Kmag,e_Jmag,e_Hmag,e_Kmag,Qfl,Rfl,X,MeasureJD,angDist
0,2CXO J000002.9-350332,15712,4,5.606038,0.012318,-35.059068,16.266113,1506.051665,6.228827,3.589425e-13,...,15.357,14.502,0.111,0.127,0.092,BBA,222,0,2.451134e+06,0.551896
1,2CXO J000031.1-500914,11742,169,5.412581,0.129721,-50.153970,22.044711,909.983216,47.354806,5.579316e-14,...,13.477,13.232,0.091,0.088,0.091,EEA,222,1,2.451819e+06,0.992372
2,2CXO J000031.1-500914,11997,169,5.413978,0.129721,-50.153970,22.044711,909.983216,86.329005,4.815573e-14,...,13.477,13.232,0.091,0.088,0.091,EEA,222,1,2.451819e+06,0.992372
3,2CXO J000134.9+233540,14898,35,6.415757,0.395704,23.594535,17.441905,1431.435046,1.410725,1.057434e-13,...,15.862,15.359,0.175,0.189,0.161,CCC,222,0,2.451137e+06,0.434818
4,2CXO J000144.7+131150,8491,35,7.010605,0.436493,13.197253,19.424647,1827.318839,3.579951,1.183237e-13,...,15.619,14.986,0.142,0.147,0.128,BBB,222,0,2.451812e+06,0.638343
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14754,2CXO J235924.9-605252,11507,13,3.546899,359.853988,-60.881319,13.464421,1238.093094,,5.329261e-14,...,10.976,10.871,0.023,0.022,0.019,AAA,222,0,2.451527e+06,0.728920
14755,2CXO J235925.6-500754,11997,30,5.197079,359.857081,-50.131772,47.555634,15694.977779,,3.636116e-13,...,15.701,15.308,0.163,0.169,0.181,CCC,222,0,2.451471e+06,0.318938
14756,2CXO J235925.6-500754,9334,19,2.745264,359.857081,-50.131772,47.555634,15694.977779,0.044242,3.668330e-13,...,15.701,15.308,0.163,0.169,0.181,CCC,222,0,2.451471e+06,0.318938
14757,2CXO J235925.6-500754,11864,30,5.539386,359.857081,-50.131772,47.555634,15694.977779,0.041751,2.484504e-13,...,15.701,15.308,0.163,0.169,0.181,CCC,222,0,2.451471e+06,0.318938


In [20]:
csc_sdssdr12

Unnamed: 0,name,obsid,region_id,theta,ra,dec,significance,likelihood,src_area_b,flux_aper_b,...,avg_zph,pmRA,e_pmRA,pmDE,e_pmDE,SpObjID,spType,spCl,subClass,angDist
0,2CXO J000134.9+233540,14898,35,6.415757,0.395704,23.594535,17.441905,1431.435046,1.410725,1.057434e-13,...,0.23775,-0.2,2.8,-3.0,2.8,0,,,,0.206815
1,2CXO J000136.1+130639,6978,16,2.479215,0.400440,13.110960,25.760146,4742.017301,0.346816,1.846236e-13,...,,,,,,0,,,,0.319184
2,2CXO J000136.1+130639,8491,16,2.448114,0.400440,13.110960,25.760146,4742.017301,0.165345,8.301141e-14,...,,,,,,0,,,,0.319184
3,2CXO J000144.7+131150,8491,35,7.010605,0.436493,13.197253,19.424647,1827.318839,3.579951,1.183237e-13,...,0.21254,-12.8,2.6,0.1,2.6,0,,,,0.307729
4,2CXO J000144.7+131150,6978,35,7.027482,0.436493,13.197253,19.424647,1827.318839,6.827448,1.191485e-13,...,0.21254,-12.8,2.6,0.1,2.6,0,,,,0.307729
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6447,2CXO J235707.1+004253,4827,6,2.937520,359.279885,0.714914,12.480624,1019.782557,,1.186384e-13,...,,,,,,0,,,,0.009099
6448,2CXO J235720.1-005829,6128,24,7.189862,359.333838,-0.974877,16.170926,1115.480563,9.004014,1.294421e-13,...,,-1.7,3.3,-4.4,3.3,0,,,,0.306504
6449,2CXO J235910.6+175846,14894,39,14.156012,359.794289,17.979611,10.657013,192.787028,40.553841,4.509162e-14,...,,,,,,6950318218305794048,QSO,QSO,BROADLINE,0.956471
6450,2CXO J235922.6+181130,14894,6,2.163287,359.844351,18.191711,23.123118,3842.240092,0.138703,2.291188e-13,...,,,,,,0,,,,0.403949


Crossmatching with Astropy is harder when not all the catalogs are available locally. Using the catalog server services is preferable.