# Initialization

We import the libraries we need to work with the catalogs and define the path to our workspace. I also include some useful commands that you should add to your toolbelt.

In [1]:
from astropy.table import Table, vstack
import numpy as np

## Print Working Directory
It is very useful to know which folder you are working in and what its parent folders are.

In [2]:
%pwd

'/Users/kmohamad/Documents/GitHub/SIP2019/Kadri'

The percent sign in front of <code>pwd</code> tells Jupyter Notebook that I am calling a magic function. Magic functions are special commands that allow more interactivity.
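
Magics only work inside Jupyter/IPython. If you ever need the same information in a plain `.py` script, the standard library's `pathlib` can provide it. This is just a minimal sketch, and the variable names are only for illustration:

```python
from pathlib import Path

# Equivalent of %pwd: the folder this process is currently running in
current_dir = Path.cwd()
print(current_dir)

# Walk up through the parent folders, nearest first
for parent in current_dir.parents:
    print(parent)
```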

## List Directory
I use this all the time to see what's in my directory.

In [3]:
%ls

Untitled.ipynb


You can even go further and have it list the contents of other directories in a human-readable format. You can learn some basic examples of how to use <code>ls</code> <a href='https://www.tecmint.com/15-basic-ls-command-examples-in-linux/'>here</a>. You can then easily copy the names of the catalogs for use later.

In [4]:
%ls -alh ./../catalogs

total 544
drwxr-xr-x@ 12 kmohamad  staff   384B Jun 24 13:21 ./
drwxr-xr-x  12 kmohamad  staff   384B Jun 24 17:51 ../
-rw-r--r--@  1 kmohamad  staff   6.0K Jun 24 13:30 .DS_Store
drwxr-xr-x@  2 kmohamad  staff    64B Oct  6  2018 .ipynb_checkpoints/
-rw-r--r--@  1 kmohamad  staff    20K Sep 28  2018 SelectGCsTrue2.fits
-rw-r--r--@  1 kmohamad  staff   8.4K Sep 28  2018 SelectGCsTrue_kinematic_prob.fits
-rw-r--r--@  1 kmohamad  staff    11K Sep 28  2018 VDGC_kinematic_prob.fits
-rw-r--r--@  1 kmohamad  staff   3.0K Sep 26  2018 VDGC_pPXF_2017.README
-rw-r--r--@  1 kmohamad  staff   149K Sep 26  2018 VDGC_pPXF_2017_v2.fits
-rw-r--r--@  1 kmohamad  staff   2.7K Sep 26  2018 VUGC_pPXF_2017.README
-rw-r--r--@  1 kmohamad  staff    34K Sep 26  2018 VUGC_pPXF_2017_v2.fits
-rw-r--r--@  1 kmohamad  staff    23K Sep 28  2018 orphanGCs.fits
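
If you would rather collect the catalog names in Python instead of copying them by hand, something like the following sketch could work. The helper name `list_fits_files` is made up for this example, and the relative path simply mirrors the `%ls` call above, so adjust it for your own machine:

```python
from pathlib import Path

def list_fits_files(folder):
    """Return the names of all .fits files in `folder`, sorted alphabetically."""
    return sorted(p.name for p in Path(folder).glob('*.fits'))

# Assumed location of the catalogs, relative to the working directory
catalog_dir = Path('..') / 'catalogs'
if catalog_dir.is_dir():
    print(list_fits_files(catalog_dir))
```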


In [5]:
# Note that my working directory is specific to my computer. You will have to change the path
# for your own computer
working_dir = '/Users/kmohamad/Documents/GitHub/SIP2019/'

# The Catalogs

## Cleaning up the catalog
We now import the catalogs that we want to use, and prepare a dictionary that will allow us to access them easily.

In [6]:
tables = [
    Table.read(working_dir + 'catalogs/VDGC_pPXF_2017_v2.fits'),
    Table.read(working_dir + 'catalogs/VUGC_pPXF_2017_v2.fits')
]

I will be using a dictionary to make accessing the different catalogs neater. A dictionary maps <b>keys</b> that you define to <b>values</b>. If you're familiar with Computer Science terminology, it is similar to a hash table.

In [7]:
catalog = {'VDGC':0, 'VUGC':1}
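
If dictionaries are new to you, here is a small standalone sketch of how the lookup works. The short lists are hypothetical placeholders standing in for the real tables, and the `_demo` names are only for illustration:

```python
# Placeholder "tables": plain lists standing in for the astropy tables
tables_demo = [['VDGC row 1', 'VDGC row 2'], ['VUGC row 1']]

# Each key (a catalog name) maps to a value (its position in the list)
catalog_demo = {'VDGC': 0, 'VUGC': 1}

# Look up the index by name, then use it to pick the table
vdgc_index = catalog_demo['VDGC']
vdgc_table = tables_demo[vdgc_index]
print(vdgc_index, vdgc_table)

# Asking for a key that was never defined raises a KeyError with [];
# .get() returns None (or a default you choose) instead
print(catalog_demo.get('XYZ'))
```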

We access the catalogs like so:

In [8]:
# Show only the first five rows of the VDGC catalog
tables[catalog['VDGC']][:5]

VCC,TARGTYPE,GCSAT,HOST,RA,DEC,ZHEL,ZERR,ZERR_pe,ZERR_ne,ZCONF,ZOBS,ZOBS_pe,ZOBS_ne,ABANDCOR,ABANDCOR_pe,ABANDCOR_ne,HELCOR,SN,KECKID,MASKNAME,SLITNUM,ZSPECNUM,ZQUAL,YLOW,YHIGH,SPEC1DNAME
bytes10,bytes7,bytes2,bytes10,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,float32,bytes20,bytes20,int16,int16,int16,int16,int16,bytes40
---,ALG,N,---,187.0097,10.359333,0.00018189283,8.389982e-06,4.2536944e-06,4.2911956e-06,1.0,0.00039203788,7.004846e-07,7.004846e-07,0.00016147838,4.6698975e-07,7.3384103e-07,4.866667e-05,196.03,alg42,vdgc1,0,0,-1,15,21,spec1d.vdgc1.000.alg42.fits.gz
---,ALG,N,---,187.07608,10.3505,-0.0007024523,1.0064823e-05,6.74872e-06,7.2782736e-06,1.0,-0.0005678595,5.0034614e-06,5.5705204e-06,8.592611e-05,1.7678897e-06,2.1348103e-06,4.866667e-05,58.89,alg115,vdgc1,1,1,-1,15,21,spec1d.vdgc1.001.alg115.fits.gz
---,ALG,N,---,186.9995,10.459528,4.2896674e-05,9.472941e-06,5.8739492e-06,6.4012156e-06,1.0,0.0002562106,3.6024921e-06,4.0361256e-06,0.00016464724,2.0347409e-06,2.701869e-06,4.866667e-05,44.0,alg140,vdgc1,2,2,-1,15,21,spec1d.vdgc1.002.alg140.fits.gz
---,ALG,N,---,187.04004,10.45025,6.654637e-05,8.618114e-06,4.658566e-06,4.754663e-06,1.0,0.00010960916,1.5343949e-06,1.9013154e-06,-5.6038766e-06,1.4009692e-06,1.2675436e-06,4.866667e-05,80.71,alg164,vdgc1,3,3,-1,16,21,spec1d.vdgc1.003.alg164.fits.gz
---,ALG,N,---,187.07025,10.325111,-9.1059665e-06,8.707397e-06,4.8853194e-06,4.84415e-06,1.0,0.00011594688,2.1681667e-06,2.1348103e-06,7.6386175e-05,1.3342564e-06,1.2341872e-06,4.866667e-05,92.15,alg177,vdgc1,4,4,-1,15,20,spec1d.vdgc1.004.alg177.fits.gz


We now check what columns each table contains, and what <code>dtype</code> each column has.

In [9]:
print(
    tables[catalog['VDGC']].info, 
    tables[catalog['VUGC']].info
)

<Table length=790>
    name     dtype 
----------- -------
        VCC bytes10
   TARGTYPE  bytes7
      GCSAT  bytes2
       HOST bytes10
         RA float32
        DEC float32
       ZHEL float32
       ZERR float32
    ZERR_pe float32
    ZERR_ne float32
      ZCONF float32
       ZOBS float32
    ZOBS_pe float32
    ZOBS_ne float32
   ABANDCOR float32
ABANDCOR_pe float32
ABANDCOR_ne float32
     HELCOR float32
         SN float32
     KECKID bytes20
   MASKNAME bytes20
    SLITNUM   int16
   ZSPECNUM   int16
      ZQUAL   int16
       YLOW   int16
      YHIGH   int16
 SPEC1DNAME bytes40
 <Table length=162>
    name     dtype 
----------- -------
        VCC bytes10
   TARGTYPE  bytes7
      GCSAT  bytes2
       HOST bytes10
         RA float32
        DEC float32
       VHEL float32
       VERR float32
    VERR_pe float32
    VERR_ne float32
      ZCONF float32
       VOBS float32
    VOBS_pe float32
    VOBS_ne float32
   ABANDCOR float32
ABANDCOR_pe float32
ABANDCOR_ne float32
 

Since we are going to merge the two catalogs together, let's check whether they have the same number of columns. If they don't, let's see which columns are missing.

In [10]:
print(
    'VDGC:', str(len(tables[catalog['VDGC']].colnames)), 
    'VUGC:', str(len(tables[catalog['VUGC']].colnames))
)

VDGC: 27 VUGC: 24


To find the different columns, we use sets and set operations - yes, the math one - to help us. We take the set difference of <code>VDGC - VUGC</code>:

In [11]:
set(tables[catalog['VDGC']].colnames).difference(set(tables[catalog['VUGC']].colnames))

{'KECKID',
 'ZERR',
 'ZERR_ne',
 'ZERR_pe',
 'ZHEL',
 'ZOBS',
 'ZOBS_ne',
 'ZOBS_pe',
 'ZQUAL',
 'ZSPECNUM'}

Wait. Up above we saw that the column counts differ by only 3, but this result lists 10 columns. What's the deal? If we run it in reverse, that is <code>VUGC - VDGC</code>, we get:

In [12]:
set(tables[catalog['VUGC']].colnames).difference(set(tables[catalog['VDGC']].colnames))

{'VERR', 'VERR_ne', 'VERR_pe', 'VHEL', 'VOBS', 'VOBS_ne', 'VOBS_pe'}
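
The numbers are consistent once you notice that the gap in column counts is the gap between the two set differences: 10 columns found only in VDGC minus 7 found only in VUGC gives the 3 we saw earlier. A small sketch with made-up column lists (not the real catalogs) shows the same effect:

```python
# Hypothetical column names, chosen to mimic the Z*/V* naming mismatch
cols_a = ['RA', 'DEC', 'ZHEL', 'ZERR', 'KECKID']
cols_b = ['RA', 'DEC', 'VHEL', 'VERR']

only_in_a = set(cols_a) - set(cols_b)   # same as .difference()
only_in_b = set(cols_b) - set(cols_a)
shared = set(cols_a) & set(cols_b)      # intersection

print(sorted(only_in_a))
print(sorted(only_in_b))
print(sorted(shared))

# The gap in column counts equals the gap between the two differences
assert len(cols_a) - len(cols_b) == len(only_in_a) - len(only_in_b)
```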

It looks like <code>VERR</code> and <code>ZERR</code> are related, and so are a few other columns. It turns out that the columns starting with "Z" are redshifts, so we have to multiply them by the speed of light (roughly 300,000 km/s) to get the velocity in km/s. So the columns that are really missing from the VUGC catalog are:
<ul>
    <li> <code>KECKID</code>
    <li> <code>ZQUAL</code>
    <li> <code>ZSPECNUM</code>
</ul>
So we first rescale and then rename the columns in the VDGC catalog. We can then add the missing columns to the VUGC catalog and initialize them with placeholder values that won't be mistaken for real data (a string for <code>KECKID</code>, integers for <code>ZQUAL</code> and <code>ZSPECNUM</code>).

In [13]:
columns_to_change = ['ZERR', 'ZERR_ne', 'ZERR_pe', 'ZHEL', 'ZOBS', 'ZOBS_ne', 'ZOBS_pe']

for col in columns_to_change:
    tables[catalog['VDGC']][col] = tables[catalog['VDGC']][col] * 3e+5
    
    # col[1:] selects the second letter through the end, so that ZERR_ne becomes ERR_ne. We then put V in front
    tables[catalog['VDGC']].rename_column(col, 'V' + col[1:])   
    
tables[catalog['VUGC']]['KECKID'] = '---'
tables[catalog['VUGC']]['ZQUAL'] = -101010
tables[catalog['VUGC']]['ZSPECNUM'] = -101010
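
For reference, the factor of 300,000 is a rounded value of the speed of light, and `v = c * z` is the low-redshift approximation. The sketch below compares the rounded factor with the exact value of c (299,792.458 km/s) for the first `ZHEL` value printed earlier. The function name is just for illustration:

```python
C_KM_S = 299_792.458  # speed of light in km/s (exact by definition)

def redshift_to_velocity(z, c=C_KM_S):
    """Low-redshift approximation: v = c * z, in km/s."""
    return c * z

z = 0.00018189283  # ZHEL of the first VDGC row shown above

v_exact = redshift_to_velocity(z)
v_rounded = redshift_to_velocity(z, c=3e5)

print(v_exact, v_rounded)
# Relative difference between the two factors, about 0.07%
print((v_rounded - v_exact) / v_exact)
```

For these catalogs the rounding is far smaller than the quoted velocity errors are likely to matter for, but it is good to know where the number comes from.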

I also like to sort the columns of each table in alphabetical order. You can do this like so:

In [14]:
for ind, cat in enumerate(tables):
    tables[ind] = cat[sorted(cat.colnames)]

Now that we have this prepared, we can start doing stuff with the catalogs.

## Extracting things
We now will filter the tables to get what we want (as per the email I sent you). First up, we need all entries in both catalogs to have <code>ZCONF == 1</code>.

In [15]:
for ind, cat in enumerate(tables):
    tables[ind] = cat[cat['ZCONF'] == 1]
    
# Check that ZCONF only contains 1. I'm using list comprehensions here.
# If you don't understand what's going on, don't worry! I'll explain tomorrow.
print([set(cat['ZCONF']) for cat in tables])

[{1.0}, {1.0}]
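
In case the list comprehension looks mysterious, here is a small sketch of the same pattern on made-up data, next to the ordinary loop it replaces. The lists below are hypothetical stand-ins for the `ZCONF` columns:

```python
# Hypothetical stand-ins for the ZCONF columns of two catalogs
zconf_columns = [[1.0, 1.0, 1.0], [1.0, 1.0]]

# List comprehension: build a list in a single expression
unique_values = [set(col) for col in zconf_columns]

# The same thing written as an ordinary loop
unique_values_loop = []
for col in zconf_columns:
    unique_values_loop.append(set(col))

print(unique_values)
print(unique_values == unique_values_loop)
```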


In [24]:
DE_GC = tables[catalog['VDGC']][tables[catalog['VDGC']]['GCSAT'] == 'Y']
UDG_GC = tables[catalog['VUGC']][tables[catalog['VUGC']]['GCSAT'] == 'Y']

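# Inspect the distinct HOST values. Note: in the middle of a cell this result is not
# displayed; wrap it in print() or run it in its own cell to see it.
set(tables[catalog['VUGC']]['HOST'])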

# Create a boolean mask for boolean indexing later
non_udg = (UDG_GC['HOST'] != 'VLSB-B') & (UDG_GC['HOST'] != 'VLSB-D') & (UDG_GC['HOST'] != 'VCC0615')

DE_GC = vstack([DE_GC, UDG_GC[non_udg]])
UDG_GC.remove_rows(np.nonzero(non_udg)[0])

Now we have two new tables that hold the DE and UDG GC satellites, as the prompt asked.
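
If boolean masks are unfamiliar, the sketch below reproduces the idea on a small made-up array of host names. Only the three UDG host names are taken from the cell above; `'VCC1234'` and `'VCC5678'` are invented for illustration. It uses `np.isin` as a more compact alternative to chaining `!=` comparisons, and `~` to invert a mask:

```python
import numpy as np

# Made-up host names for illustration
hosts = np.array(['VLSB-B', 'VCC1234', 'VCC0615', 'VCC5678', 'VLSB-D'])
udg_hosts = ['VLSB-B', 'VLSB-D', 'VCC0615']

# True where the host is one of the UDGs
is_udg = np.isin(hosts, udg_hosts)
# ~ flips every True/False, giving the same mask as the chained != tests
non_udg = ~is_udg

print(hosts[is_udg])           # entries that stay with the UDGs
print(hosts[non_udg])          # entries that move to the other group
print(np.nonzero(non_udg)[0])  # integer positions of the True entries
```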

So this is a little introduction to how I would have approached the problem. I wrote this in hopes of giving you ideas on how you could approach it, or as a reference if you are feeling stuck and need a little help. I recommend that you continue doing it the way you are and try to avoid imitating my code too much. This is a learning process and you should be proud of your code! Also, it is always fun to share code and see the various approaches.

Lastly, remember that you can always reach out to me for anything!

Happy coding! :)