## 30 July 2018
-- Laurin Gray

This notebook determines which points fall to the right of the 3-sigma line (flagged in the 30July2018_3SigFlagging notebook) in more than one CMD, and in how many CMDs each point is flagged.

The data come from the Spitzer source catalog of Khan et al. (2015), matched in CasJobs with sources from Whitelock et al. (2013).

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import gaussian_kde
import csv
import pathlib

In [2]:
# Read in the table of points to the right of the 3-sigma line

flagged_data = pd.read_csv('/Users/lgray/Documents/Phot_data/flagged_vals_30July2018_lauringray.csv')

In [3]:
# shorten variable names

jMINUSthreesix = flagged_data.threesixVS_jMINUSthreesix.values                  # [3.6] vs. J - [3.6]
threesixMINUSeightzero = flagged_data.eightzeroVS_threesixMINUSeightzero.values # [8.0] vs. [3.6]-[8.0]
fourfiveMINUSeightzero = flagged_data.eightzeroVS_fourfiveMINUSeightzero.values # [8.0] vs. [4.5]-[8.0]
jMINUSh = flagged_data.hVS_jMINUSh.values                                       # H vs. J-H
hMINUSthreesix = flagged_data.hVS_hMINUSthreesix.values                         # H vs. H-[3.6]
hMINUSfourfive = flagged_data.hVS_hMINUSfourfive.values                         # H vs. H-[4.5]
hMINUSk = flagged_data.kVS_hMINUSk.values                                       # K vs. H-K
jMINUSk = flagged_data.kVS_jMINUSk.values                                       # K vs. J-K

The eight CMDs in which points were flagged are:
    - [3.6] vs. J-[3.6]
    - [8.0] vs. [3.6]-[8.0]
    - [8.0] vs. [4.5]-[8.0]
    - H vs. J-H
    - H vs. H-[3.6]
    - H vs. H-[4.5]
    - K vs. H-K
    - K vs. J-K

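The approach is to iterate through the longest column (jMINUSthreesix), because it has the most IDs to look up. I create a list of the other columns and, for each ID in the iterated column, check each column in the list (converted into a set). The counter starts at 1 for the iterated column itself, and each other column that contains the ID adds 1 to it. After all the other columns have been checked, the ID is appended to the list matching the final value of the counter. The function is then run with each of the other columns in turn as the iterated column, so that IDs missing from jMINUSthreesix are counted as well.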
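The same bookkeeping can also be done in a single pass over all eight columns. This is a different technique from the one above, sketched here under the assumption that each column holds numeric IDs padded with NaN: a `collections.Counter` records how many columns each ID appears in, which gives every ID's CMD count directly, without choosing an iteration column or hard-coding column lengths. `count_cmds` is a hypothetical helper, not part of this notebook.

```python
from collections import Counter

import numpy as np
import pandas as pd


def count_cmds(columns):
    """Map n to the sorted IDs that appear in exactly n of the given columns.

    Each column is an array-like of numeric IDs that may be padded with NaN
    (assumption: this is how the shorter columns look after read_csv).
    """
    counter = Counter()
    for col in columns:
        ids = pd.Series(col).dropna()
        # Use a set so each ID is counted at most once per column
        counter.update({int(v) for v in ids})
    by_count = {}
    for obj_id, n in counter.items():
        by_count.setdefault(n, []).append(obj_id)
    return {n: sorted(ids) for n, ids in sorted(by_count.items())}


# Toy example with three NaN-padded columns (hypothetical IDs)
cols = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([2.0, 3.0, np.nan, np.nan]),
    np.array([3.0, 5.0, np.nan, np.nan]),
]
print(count_cmds(cols))  # {1: [1, 4, 5], 2: [2], 3: [3]}
```

On this notebook's data it would be called as `count_cmds(full_col_list)`. Because it uses every non-NaN entry, its counts could differ slightly from the ones printed below if the hard-coded lengths in `CMD_count` skip any entries.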

In [4]:
def CMD_count(col, column_list):
    """
    Sort the IDs in one CMD column into lists according to how many CMDs they appear in.

    Before running the function, the user defines the empty lists of CMD counts:
        in_one = []
        in_two = []
        in_three = []
        in_four = []
        in_five = []
        in_six = []
        in_seven = []
        in_eight = []

    They are defined outside the function so that it can be run on multiple
    columns without erasing the previous lists.

    The user chooses the CMD column to iterate through (col) and passes a list
    of all the other columns (column_list).

    For each element of col, the function checks every column in column_list and
    adds 1 to a counter (which starts at 1 for col itself) each time the element
    is found. The element is then appended to the list matching the final count,
    unless it is already in that list (so that the function can be run on
    multiple columns).

    Every 100 rows the function prints which row of col it is on, which is
    useful for estimating progress.

    col_len is hard-coded for each column below and must be updated by hand if
    the input table changes. Because the row counter k starts at 1 and the loop
    requires k < col_len, only the first col_len - 1 entries of col are checked.

    Call example:
        col_list = [fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk,
                    hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]
        CMD_count(jMINUSthreesix, col_list)
    """

    # Hard-coded column lengths: these keep the loop from going past the end of
    # a column's real values if it is not the longest one (shorter columns are
    # padded with NaN when the table is read in).
    if col is jMINUSthreesix:
        col_len = 3330
    elif col is threesixMINUSeightzero:
        col_len = 2432
    elif col is fourfiveMINUSeightzero:
        col_len = 2546
    elif col is jMINUSh:
        col_len = 2073
    elif col is hMINUSthreesix:
        col_len = 2209
    elif col is hMINUSfourfive:
        col_len = 1684
    elif col is hMINUSk:
        col_len = 678
    elif col is jMINUSk:
        col_len = 1922
    else:
        raise ValueError("col must be one of the eight CMD columns")

    # Build each membership set once rather than once per ID
    column_sets = [set(c) for c in column_list]

    # count_lists[a - 1] holds the IDs found in a CMDs
    count_lists = [in_one, in_two, in_three, in_four,
                   in_five, in_six, in_seven, in_eight]

    k = 1  # row counter
    for i in col:
        if k < col_len:  # stop before the NaN padding of shorter columns
            if k % 100 == 0:
                print("On row", k)

            # CMD counter: 1 for col itself, plus 1 for each other column containing the ID
            a = 1 + sum(i in s for s in column_sets)

            # Add the ID unless it is already in the list for this count
            if a <= len(count_lists) and int(i) not in count_lists[a - 1]:
                count_lists[a - 1].append(int(i))

            k = k + 1
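The docstring asks the user to update `col_len` by hand whenever the table changes. One alternative, sketched here under the assumption that the shorter columns are NaN-padded arrays, is to count each column's non-NaN entries with NumPy, or to iterate over the non-NaN entries directly so that no row counter is needed. `valid_ids` and `column_lengths` are hypothetical helpers.

```python
import numpy as np


def valid_ids(col):
    """Return the non-NaN entries of a NaN-padded column as Python ints."""
    col = np.asarray(col, dtype=float)
    return [int(v) for v in col[~np.isnan(col)]]


def column_lengths(named_columns):
    """Map each column name to its number of non-NaN entries."""
    return {name: int(np.count_nonzero(~np.isnan(np.asarray(col, dtype=float))))
            for name, col in named_columns.items()}


# Hypothetical padded columns
demo = {
    "jMINUSthreesix": np.array([10.0, 11.0, 12.0]),
    "hMINUSk": np.array([11.0, np.nan, np.nan]),
}
print(column_lengths(demo))        # {'jMINUSthreesix': 3, 'hMINUSk': 1}
print(valid_ids(demo["hMINUSk"]))  # [11]
```

Note that with the `k < col_len` test above, passing one of these counts as `col_len` would check one entry fewer than the column holds; looping over `valid_ids(col)` avoids that bookkeeping.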

In [5]:
in_one = []
in_two = []
in_three = []
in_four = []
in_five = []
in_six = []
in_seven = []
in_eight = []

In [6]:
full_col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

In [7]:
col_list = [fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]
CMD_count(jMINUSthreesix, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200
On row 2300
On row 2400
On row 2500
On row 2600
On row 2700
On row 2800
On row 2900
On row 3000
On row 3100
On row 3200
On row 3300


In [8]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 517
Number in 2 CMDs: 441
Number in 3 CMDs: 848
Number in 4 CMDs: 428
Number in 5 CMDs: 352
Number in 6 CMDs: 230
Number in 7 CMDs: 141
Number in 8 CMDs: 362
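The same eight `print` calls are repeated after every run. A small hypothetical helper, `format_counts`, builds the same lines in a loop; it returns the lines so they can also be checked or saved.

```python
def format_counts(count_lists):
    """Return one 'Number in n CMD(s): count' line per count list, starting at n = 1."""
    lines = []
    for n, ids in enumerate(count_lists, start=1):
        label = "CMD" if n == 1 else "CMDs"
        lines.append(f"Number in {n} {label}: {len(ids)}")
    return lines


# Hypothetical example with three short lists
for line in format_counts([[101, 102], [103], []]):
    print(line)
```

In this notebook the call would be `format_counts([in_one, in_two, in_three, in_four, in_five, in_six, in_seven, in_eight])`.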


In [9]:
col_list = [jMINUSthreesix, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(fourfiveMINUSeightzero, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200
On row 2300
On row 2400
On row 2500


In [10]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 616
Number in 2 CMDs: 2095
Number in 3 CMDs: 868
Number in 4 CMDs: 442
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [11]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(threesixMINUSeightzero, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200
On row 2300
On row 2400


In [12]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 687
Number in 2 CMDs: 2098
Number in 3 CMDs: 869
Number in 4 CMDs: 443
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [13]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(jMINUSk, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900


In [14]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 706
Number in 2 CMDs: 2178
Number in 3 CMDs: 881
Number in 4 CMDs: 451
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [15]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(hMINUSthreesix, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200


In [16]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 732
Number in 2 CMDs: 2217
Number in 3 CMDs: 883
Number in 4 CMDs: 451
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [17]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, hMINUSfourfive, hMINUSk]

CMD_count(jMINUSh, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000


In [18]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 791
Number in 2 CMDs: 2222
Number in 3 CMDs: 883
Number in 4 CMDs: 451
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [19]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSk]

CMD_count(hMINUSfourfive, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600


In [20]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 882
Number in 2 CMDs: 2222
Number in 3 CMDs: 883
Number in 4 CMDs: 451
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362


In [21]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive]

CMD_count(hMINUSk, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600


In [22]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 882
Number in 2 CMDs: 2222
Number in 3 CMDs: 883
Number in 4 CMDs: 451
Number in 5 CMDs: 355
Number in 6 CMDs: 231
Number in 7 CMDs: 141
Number in 8 CMDs: 362
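Because each run checks an ID against all the other columns, an ID's count should not depend on which column it was reached from, so the eight lists should share no IDs and, together, should contain every flagged ID. A hypothetical check, `check_partition`, reports any overlaps and any IDs that were never sorted into a list (for example, an entry skipped because of the hard-coded column lengths that appears in no other flagged CMD).

```python
from itertools import combinations


def check_partition(count_lists, all_ids):
    """Check that the count lists are disjoint and cover every ID.

    count_lists[0] holds IDs found in 1 CMD, count_lists[1] in 2 CMDs, and so on.
    Returns (overlaps, missing):
      overlaps: dict mapping (n, m) to the sorted IDs shared by lists n and m
      missing: set of IDs in all_ids that appear in none of the lists
    """
    sets = [set(ids) for ids in count_lists]
    overlaps = {}
    for (n, a), (m, b) in combinations(enumerate(sets, start=1), 2):
        shared = a & b
        if shared:
            overlaps[(n, m)] = sorted(shared)
    missing = set(all_ids) - set().union(*sets)
    return overlaps, missing


# Hypothetical example: ID 2 is in two lists, ID 5 is in none
overlaps, missing = check_partition([[1, 4], [2], [3, 2]], [1, 2, 3, 4, 5])
print(overlaps)  # {(2, 3): [2]}
print(missing)   # {5}
```

For this notebook, `all_ids` could be built from the non-NaN entries of the columns, e.g. `{int(v) for c in full_col_list for v in c[~np.isnan(c)]}`.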


In [23]:
# save data

filename = '/Users/lgray/Documents/Phot_data/CMD_counting_30July2018_lauringray.csv'

# Write in_one as a single column with a header
# (newline='' is what the csv module expects for files it writes)
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f)
    points_w_header = ['in_one'] + in_one
    for val in points_w_header:
        writer.writerow([val])

# Add the other counts as extra columns, one at a time
counts = [in_two, in_three, in_four, in_five, in_six, in_seven, in_eight]
headers = ['in_two', 'in_three', 'in_four', 'in_five', 'in_six', 'in_seven', 'in_eight']

for header, ids in zip(headers, counts):
    data = pd.read_csv(filename)
    new_col = pd.DataFrame({header: ids})
    data = pd.concat([data, new_col], axis=1)
    data.to_csv(filename, index=False)
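The save cell rereads and rewrites the file once per column. A hedged alternative sketch builds one DataFrame from all eight lists (pandas aligns the Series on their index and pads the shorter ones) and writes it once. `save_count_lists` is a hypothetical helper; the nullable `"Int64"` dtype is an assumption chosen so that padded columns keep integer IDs rather than being converted to floats.

```python
import pandas as pd


def save_count_lists(count_lists, headers, filename):
    """Write lists of unequal length as side-by-side CSV columns in one pass.

    Shorter columns are padded with missing values, written as empty cells.
    """
    frame = pd.DataFrame({h: pd.Series(ids, dtype="Int64")
                          for h, ids in zip(headers, count_lists)})
    frame.to_csv(filename, index=False)
    return frame
```

In this notebook it would be called as `save_count_lists([in_one, in_two, in_three, in_four, in_five, in_six, in_seven, in_eight], ['in_one', 'in_two', 'in_three', 'in_four', 'in_five', 'in_six', 'in_seven', 'in_eight'], filename)`.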