# Classification of stars, galaxies, and quasars

Using a query, you should try to obtain three data files, each containing 10000 observations of stars, galaxies, and quasars, respectively, which are distinctly different types of objects.

As this data results from a (raw) query, it has several missing/9999 values and other flaws to consider before applying any further data analysis. You need to fix this!

Thus, this exercise first consists of **Inspecting and Cleaning the data**.
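A minimal sketch of what such cleaning could look like, using a small invented table rather than the real query output. It assumes that 9999 (and, as a guess, also -9999) marks a missing measurement; the names `SENTINELS`, `cleaned`, `complete_rows`, and `imputed` are purely illustrative.

```python
import numpy as np
import pandas as pd

# Toy table standing in for a few columns of the real query output (values invented).
toy = pd.DataFrame({
    "gmag": [22.8, 9999.0, 18.3, 17.3],
    "W1":   [15.1, 14.2, -9999.0, 13.9],
    "Jerr": [0.05, np.nan, 0.07, 0.04],
})

# Assumption: +/-9999 are placeholders for "no measurement", not real magnitudes.
SENTINELS = [9999.0, -9999.0]
cleaned = toy.replace(SENTINELS, np.nan)

# Count missing values per column to decide whether to cut rows or impute.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)

# Option A: drop every row with at least one missing value.
complete_rows = cleaned.dropna()

# Option B: impute with the column median (simple, but it distorts the distributions).
imputed = cleaned.fillna(cleaned.median())
```

Cutting rows keeps only genuine measurements but can throw away a large part of the sample if one column is mostly empty, while imputing keeps all rows at the price of artificial values. Which is better depends on how many entries are missing per column.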

### Features:
The data features / input variables (X) are:
* ID:   unique object ID in the database
* ra:   right ascension (coordinate)
* dec:  declination (coordinate)
* istar:        log-likelihood that the object is point-like, given by the pipeline run on the images
* gmag:  magnitude in g-band
* rmag: magnitude in r-band
* imag: magnitude in i-band
* zmag: magnitude in z-band
* W1:   magnitude in W1-band (from AllWISE)
* W2:   magnitude in W2-band (from AllWISE)
* psfgmag:      PSF magnitude in g-band (i.e. the best-fit magnitude of a point-like object fit to the pixel data)
* psfrmag:      PSF magnitude in r-band (i.e. the best-fit magnitude of a point-like object fit to the pixel data)
* psfimag:      PSF magnitude in i-band (i.e. the best-fit magnitude of a point-like object fit to the pixel data)
* W3:   magnitude in W3-band (from AllWISE)
* W3err:        uncertainty on magnitude in W3-band (from AllWISE)
* J:    magnitude in J-band (from 2MASS, in AllWISE)
* Jerr: uncertainty on J
* H:    magnitude in H-band (from 2MASS, in AllWISE)
* Herr: uncertainty on H
* K:    magnitude in K-band (from 2MASS, in AllWISE)
* Kerr: uncertainty on K
* umag: magnitude in u-band
* zs: "true" redshift

Make sure that you briefly think about (and discuss) which of these features should be included if you want to identify which type of object each observation is.
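One possible starting point for that discussion, as a sketch only. It assumes that `ID`, `ra`, and `dec` merely say which object it is and where it sits on the sky, and that `zs` is left out because the redshift of stars in our own galaxy is close to zero, so it would largely give the answer away. The names `excluded` and `feature_columns` are illustrative, and whether to keep the uncertainty columns is a judgement call not settled here.

```python
# Column names as listed above.
all_columns = ['ID', 'ra', 'dec', 'istar', 'gmag', 'rmag', 'imag', 'zmag',
               'W1', 'W2', 'psfgmag', 'psfrmag', 'psfimag', 'W3', 'W3err',
               'J', 'Jerr', 'H', 'Herr', 'K', 'Kerr', 'umag', 'zs']

# Assumption: identifiers and sky position carry no class information,
# and the "true" redshift would leak the class.
excluded = ['ID', 'ra', 'dec', 'zs']

# Everything else is kept as a candidate input variable.
feature_columns = [c for c in all_columns if c not in excluded]
print(len(feature_columns), feature_columns)
```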

Also, this time there is no target value (Y) given in the data. However, since the queries select (by other means) stars, galaxies, and quasars, you can take the file of origin as the target. You therefore need to put these three files together and add a column with the target (i.e. file origin) value.
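The idea of using the file of origin as the target can be sketched as follows. The three tables here are tiny invented stand-ins built in memory, so the snippet runs without the data files; the integer encoding in `class_code` and the column name `target` are hypothetical choices, not something prescribed by the exercise.

```python
import pandas as pd

# Tiny in-memory stand-ins for the three query results (values invented).
tables = {
    "Data_Stars.txt":    pd.DataFrame({"gmag": [17.3, 18.7], "rmag": [16.9, 18.1]}),
    "Data_Galaxies.txt": pd.DataFrame({"gmag": [22.8, 22.6], "rmag": [21.0, 21.4]}),
    "Data_Quasars.txt":  pd.DataFrame({"gmag": [19.5], "rmag": [19.3]}),
}

# Hypothetical integer encoding of the target; any consistent choice works.
class_code = {"Data_Stars.txt": 0, "Data_Galaxies.txt": 1, "Data_Quasars.txt": 2}

labelled = []
for filename, table in tables.items():
    table = table.copy()                     # leave the original table untouched
    table["target"] = class_code[filename]   # same label for every row of one file
    labelled.append(table)

# Stack the tables row-wise and renumber the rows from 0.
combined = pd.concat(labelled, ignore_index=True)
print(combined)
print(combined["target"].value_counts().sort_index())
```

An integer target is convenient for colouring plots and for supervised methods later on, while a string label (or the file name itself) is easier to read when inspecting the table.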


### Task:
Thus, the task before you is to:<br>
1) Make three queries, which produce three files of data containing stars, galaxies, and quasars.<br>
2) Combine the three data files into one, which has a target value corresponding to the file type.<br>
3) Read and inspect this data, and make sure that you understand what it (roughly) looks like.<br>
4) Clean/cut (or impute) the data, such that different (unsupervised) analysis techniques will work.<br>
5) Run a (k)PCA (and later other techniques) on it, and see what the resulting distributions look like.<br>

Do you in the end manage to get e.g. three well separated classes out?<br>
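To illustrate step 5, here is a sketch of plain linear PCA (not kernel PCA) written with NumPy's SVD on synthetic data. The three Gaussian blobs are invented and only mimic the "three classes" situation; the real cleaned feature table would take the place of `X`. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in: three blobs in a 5-dimensional feature space (not real survey data).
centres = np.array([[0, 0, 0, 0, 0],
                    [4, 4, 0, 0, 0],
                    [0, 4, 4, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 5)) for c in centres])
y = np.repeat([0, 1, 2], 200)   # blob membership, only used for colouring a plot

# Standardise each feature so that no single one dominates the variance.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Linear PCA via SVD: the rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first two principal components.
X_pca = Xs @ Vt[:2].T

print(explained_variance_ratio)
```

A scatter plot of `X_pca[:, 0]` against `X_pca[:, 1]`, coloured by `y`, then shows whether the classes separate. PCA requires a table without missing values, which is why the cleaning step has to come first.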

***

* Author: Troels C. Petersen (NBI)
* Email:  petersen@nbi.dk
* Date:   3rd of May 2021

In [2]:
from __future__ import print_function, division   # Ensures Python3 printing & division standard
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

In [32]:
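allfiles = ['Data_Galaxies.txt','Data_Quasars.txt','Data_Stars.txt']

colnames =  ['ID','ra','dec','istar','gmag','rmag','imag','zmag','W1','W2','psfgmag','psfrmag','psfimag','W3','W3err','J','Jerr','H','Herr','K','Kerr','umag','zs']

li = []

for file in allfiles:
    df = pd.read_csv(file)
    df.columns = colnames    # Replace the header from the query with the short names above
    df['Source'] = file      # Target column: the file (i.e. object type) each row came from
    print(df)
    li.append(df)

# Stack the three tables row-wise. pd.merge joins on shared columns, which is not
# what is wanted here; pd.concat appends the rows. ignore_index=True renumbers the
# rows so that the combined index is unique.
df_final = pd.concat(li, ignore_index=True)

print(df_final)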


                       ID          ra        dec         istar      gmag  \
0     1237649918426415808    9.254510  13.818784     -28.58386  22.80513   
1     1237652942639202897    1.786889  14.392791     -28.20103  22.61204   
2     1237652942639267859    1.908622  14.497953   -6275.14600  18.34751   
3     1237652942639333782    2.068091  14.467087     -19.42173  22.34108   
4     1237652944249028825  359.916063  15.607598   -2156.05200  18.68034   
...                   ...         ...        ...           ...       ...   
9995  1237652947458392293   15.030862 -10.041520   -2580.91400  17.32711   
9996  1237652947458523189   15.331162  -9.845271 -105325.90000  14.47522   
9997  1237652947458523325   15.319195  -9.951411   -2433.99900  18.78175   
9998  1237652947458588827   15.494276  -9.964049   -3581.63600  18.67374   
9999  1237652947989495941    1.719409  -9.755432  -13278.68000  17.32711   

          rmag      imag      zmag      W1      W2  ...     W3err         J  \
0     20

TypeError: merge() missing 1 required positional argument: 'right'