## 23 July 2018
-- Laurin Gray

This notebook determines which points fall to the right of the 3-sigma line (see the 23July2018_3SigFlagging notebook) in multiple color-magnitude diagrams (CMDs), and in how many CMDs each point does so.

The data comes from the catalog of Spitzer sources of Khan et al. (2015), matched with sources from Whitelock et al. (2013) in CasJobs.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import gaussian_kde
import csv
import pathlib

In [2]:
# Read in the table of points to the right of the 3-sigma line

flagged_data = pd.read_csv('/Users/lgray/Documents/Phot_data/flagged_vals_23July2018_lauringray.csv')

In [3]:
# shorten variable names

jMINUSthreesix = flagged_data.threesixVS_jMINUSthreesix.values                  # [3.6] vs. J - [3.6]
threesixMINUSeightzero = flagged_data.eightzeroVS_threesixMINUSeightzero.values # [8.0] vs. [3.6]-[8.0]
fourfiveMINUSeightzero = flagged_data.eightzeroVS_fourfiveMINUSeightzero.values # [8.0] vs. [4.5]-[8.0]
jMINUSh = flagged_data.hVS_jMINUSh.values                                       # H vs. J-H
hMINUSthreesix = flagged_data.hVS_hMINUSthreesix.values                         # H vs. H-[3.6]
hMINUSfourfive = flagged_data.hVS_hMINUSfourfive.values                         # H vs. H-[4.5]
hMINUSk = flagged_data.kVS_hMINUSk.values                                       # K vs. H-K
jMINUSk = flagged_data.kVS_jMINUSk.values                                       # K vs. J-K
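The flagged columns sit side by side in one table, so the shorter ones are presumably padded with NaN. A minimal sketch of how the number of real IDs per column could be checked, using a toy table (the `toy` frame, its column names, and its values are made up for illustration and are not the real data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for flagged_data: hypothetical column names and IDs
toy = pd.DataFrame({
    "cmd_a": [1, 2, 3, 4],
    "cmd_b": [2, 3, np.nan, np.nan],
    "cmd_c": [4, np.nan, np.nan, np.nan],
})

# DataFrame.count() ignores NaN, so this is the number of real IDs per column
n_ids = toy.count().sort_values(ascending=False)
print(n_ids)

# Name of the column with the most IDs
longest = n_ids.index[0]
```

Run on `flagged_data` instead of `toy`, the same two lines would show which column is longest and how many padded rows the others carry.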

The eight CMDs that were flagged are:

- [3.6] vs. J-[3.6]
- [8.0] vs. [3.6]-[8.0]
- [8.0] vs. [4.5]-[8.0]
- H vs. J-H
- H vs. H-[3.6]
- H vs. H-[4.5]
- K vs. H-K
- K vs. J-K

The current approach is to iterate through the longest column (jMINUSthreesix) first, because it contains the most IDs to match. I build a list of the other columns and, for each ID in the chosen column, check each of those columns (converted into a set). Each time the ID is found, 1 is added to a counter, which starts at 1 to account for the chosen column itself. After all the other columns have been checked, the ID is appended to a list according to the final value of the counter.
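As an illustration of the counting idea, here is a minimal sketch on toy data. It is an alternative formulation rather than the function used in this notebook: it tallies every ID across all columns in a single pass with `collections.Counter`, and the helper names `count_memberships` and `group_by_count` are hypothetical.

```python
from collections import Counter

import numpy as np


def count_memberships(columns):
    """Return a Counter mapping each ID to the number of columns containing it."""
    counts = Counter()
    for col in columns:
        arr = np.asarray(col, dtype=float)
        # Drop NaN padding, then deduplicate so each column counts at most once
        ids = {int(v) for v in arr[~np.isnan(arr)]}
        counts.update(ids)
    return counts


def group_by_count(counts):
    """Return {number of columns: sorted list of IDs found in that many columns}."""
    groups = {}
    for id_, n in counts.items():
        groups.setdefault(n, []).append(id_)
    return {n: sorted(ids) for n, ids in groups.items()}


# Toy columns with NaN padding (made-up IDs)
toy_columns = [
    [1, 2, 3, 4],
    [2, 3, np.nan, np.nan],
    [3, 4, np.nan, np.nan],
]
groups = group_by_count(count_memberships(toy_columns))
print(groups)
```

In this toy case ID 3 appears in all three columns, IDs 2 and 4 in two, and ID 1 in one.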

In [27]:
def CMD_count(col, column_list):
    """
    Before running the function, user defines empty lists of CMD counts as:
        in_one = []
        in_two = []
        in_three = []
        in_four = []
        in_five = []
        in_six = []
        in_seven = []
        in_eight = []
    
    This is so that the function can be run on multiple columns without erasing the previous lists.
    
    User chooses a CMD column that they want to use to iterate through the other CMDs (col), and defines a list
    including all other columns (column_list).
    
    For each element in the chosen column, the function goes through the list of columns 
    and checks if the element is in each one.  Each time it is, 1 is added to a counter.  The element is then 
    sorted into a list based on the final value of the counter, after checking to make sure that 
    the element is not already in that list (so that it can be run on multiple columns).
    
    The function also prints which row of the chosen column it is on every 100 rows, 
    which is useful for estimating progress.
    
    Note: int() raises a ValueError on NaN, so the function stops at the first NaN in col.
    """
    
    # lists from the notebook namespace; buckets[n-1] holds IDs found in n CMDs
    buckets = [in_one, in_two, in_three, in_four, in_five, in_six, in_seven, in_eight]
    
    # build each lookup set once, rather than once per row
    column_sets = [set(c) for c in column_list]
    
    for k, i in enumerate(col, start=1): # k is the row counter
        if k%100 == 0:
            print("On row", k)
        
        # CMD counter: starts at 1 because the ID is already in col itself
        a = 1 + sum(i in s for s in column_sets)
        
        if a <= len(buckets):
            bucket = buckets[a-1]
            id_val = int(i) # ValueError here if i is NaN
            if id_val not in bucket: # skip IDs that have already been included
                bucket.append(id_val)

In [8]:
in_one = []
in_two = []
in_three = []
in_four = []
in_five = []
in_six = []
in_seven = []
in_eight = []

In [6]:
full_col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

In [9]:
col_list = [fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]
CMD_count(jMINUSthreesix, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200
On row 2300
On row 2400
On row 2500


In [10]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 583
Number in 2 CMDs: 293
Number in 3 CMDs: 591
Number in 4 CMDs: 239
Number in 5 CMDs: 243
Number in 6 CMDs: 213
Number in 7 CMDs: 137
Number in 8 CMDs: 230
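The same block of eight print statements is repeated after every run. A small helper along these lines could replace it; this is only a sketch, and the name `summarize` and the toy lists are hypothetical:

```python
def summarize(buckets):
    """Print and return the size of each bucket; buckets[n-1] holds IDs found in n CMDs."""
    sizes = [len(b) for b in buckets]
    for n, size in enumerate(sizes, start=1):
        label = "CMD" if n == 1 else "CMDs"
        print(f"Number in {n} {label}: {size}")
    return sizes


# Toy buckets standing in for [in_one, in_two, in_three]
toy_buckets = [[1, 2, 3], [4], []]
sizes = summarize(toy_buckets)

# Total number of distinct IDs sorted so far
total = sum(sizes)
```

Returning the sizes also makes it easy to compare the total against the number of IDs processed.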


In [12]:
col_list = [jMINUSthreesix, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(fourfiveMINUSeightzero, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200


ValueError: cannot convert float NaN to integer
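This error comes from calling `int()` on a NaN. A likely explanation, stated here as an assumption, is that the shorter columns are padded with NaN at the end, so the loop reaches a NaN only after all of the real IDs. A minimal sketch on a made-up column showing the error and one way to drop the padding before iterating:

```python
import numpy as np

# Hypothetical column: two real IDs followed by NaN padding
col = np.array([101.0, 205.0, np.nan, np.nan])

# int() on NaN raises the ValueError seen above
try:
    int(col[2])
    message = None
except ValueError as err:
    message = str(err)
print(message)

# Keep only the real IDs and convert them to integers
valid_ids = col[~np.isnan(col)].astype(int)
print(valid_ids)
```

Passing a filtered column like `valid_ids` to the counting function would avoid the error without changing which IDs are counted.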

In [16]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 657
Number in 2 CMDs: 1880
Number in 3 CMDs: 601
Number in 4 CMDs: 247
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230
[85, 106, 170, 262, 286, 299, 313, 342, 348, 350, 385, 387, 389, 402, 406, 411, 437, 439, 442, 481, 498, 516, 540, 546, 585, 616, 618, 627, 632, 636, 643, 653, 654, 658, 664, 665, 685, 703, 722, 724, 733, 734, 751, 782, 783, 785, 789, 796, 800, 813, 821, 822, 835, 849, 869, 874, 883, 884, 889, 911, 917, 930, 935, 961, 963, 965, 975, 979, 991, 992, 1001, 1010, 1018, 1046, 1052, 1055, 1076, 1091, 1093, 1094, 1098, 1100, 1103, 1122, 1124, 1135, 1154, 1157, 1160, 1163, 1175, 1184, 1202, 1203, 1211, 1222, 1232, 1235, 1241, 1253, 1258, 1261, 1273, 1277, 1278, 1283, 1285, 1291, 1294, 1298, 1300, 1320, 1323, 1324, 1349, 1357, 1361, 1369, 1373, 1386, 1389, 1398, 1402, 1417, 1418, 1424, 1429, 1430, 1431, 1439, 1440, 1447, 1448, 1452, 1459, 1492, 1500, 1529, 1534, 1563, 1574, 1579, 1612, 1621, 1630, 1640, 1658, 1662, 1666, 

In [17]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(threesixMINUSeightzero, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400
On row 1500
On row 1600
On row 1700
On row 1800
On row 1900
On row 2000
On row 2100
On row 2200


ValueError: cannot convert float NaN to integer

In [18]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 736
Number in 2 CMDs: 1883
Number in 3 CMDs: 603
Number in 4 CMDs: 247
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [19]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(jMINUSk, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400


ValueError: cannot convert float NaN to integer

In [20]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 755
Number in 2 CMDs: 1981
Number in 3 CMDs: 612
Number in 4 CMDs: 250
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [21]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            jMINUSh, hMINUSfourfive, hMINUSk]

CMD_count(hMINUSthreesix, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300
On row 1400


ValueError: cannot convert float NaN to integer

In [22]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 771
Number in 2 CMDs: 2017
Number in 3 CMDs: 612
Number in 4 CMDs: 250
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [23]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, hMINUSfourfive, hMINUSk]

CMD_count(jMINUSh, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200
On row 1300


ValueError: cannot convert float NaN to integer

In [24]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 819
Number in 2 CMDs: 2018
Number in 3 CMDs: 612
Number in 4 CMDs: 250
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [25]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSk]

CMD_count(hMINUSfourfive, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500
On row 600
On row 700
On row 800
On row 900
On row 1000
On row 1100
On row 1200


ValueError: cannot convert float NaN to integer

In [26]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 908
Number in 2 CMDs: 2018
Number in 3 CMDs: 612
Number in 4 CMDs: 250
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [28]:
col_list = [jMINUSthreesix, fourfiveMINUSeightzero, threesixMINUSeightzero, jMINUSk, 
            hMINUSthreesix, jMINUSh, hMINUSfourfive]

CMD_count(hMINUSk, col_list)

On row 100
On row 200
On row 300
On row 400
On row 500


ValueError: cannot convert float NaN to integer

In [29]:
print("Number in 1 CMD:", len(in_one))
print("Number in 2 CMDs:", len(in_two))
print("Number in 3 CMDs:", len(in_three))
print("Number in 4 CMDs:", len(in_four))
print("Number in 5 CMDs:", len(in_five))
print("Number in 6 CMDs:", len(in_six))
print("Number in 7 CMDs:", len(in_seven))
print("Number in 8 CMDs:", len(in_eight))

Number in 1 CMD: 908
Number in 2 CMDs: 2018
Number in 3 CMDs: 612
Number in 4 CMDs: 250
Number in 5 CMDs: 243
Number in 6 CMDs: 215
Number in 7 CMDs: 137
Number in 8 CMDs: 230


In [30]:
# save data

filename = '/Users/lgray/Documents/Phot_data/CMD_counting_23July2018_lauringray.csv'

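# the with block closes the file even if a write fails;
# newline='' is what the csv module expects for files it writes to
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f)
    #add heading
    points_w_header = ['in_one'] + in_one

    for val in points_w_header:
        writer.writerow([val])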


# list of other counts
counts = [in_two, in_three, in_four, in_five, in_six, in_seven, in_eight]
headers = ['in_two', 'in_three', 'in_four', 'in_five', 'in_six', 'in_seven', 'in_eight']

# add one column per pass; concat pads the shorter columns with NaN
for header, values in zip(headers, counts):
    data = pd.read_csv(filename)
    new_col = pd.DataFrame({header: values})

    data = pd.concat([data, new_col], axis=1)
    data.to_csv(filename, index=False)
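The file could also be written in a single pass rather than being re-read for each column. A minimal sketch, assuming the goal is one column per list with blanks where a list is shorter; the toy lists and the temporary output path are made up for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical short lists standing in for in_one ... in_eight
toy_counts = {
    "in_one": [11, 12, 13],
    "in_two": [21],
    "in_three": [31, 32],
}

# Wrapping each list in a Series lets pandas align lists of different lengths;
# the nullable Int64 dtype keeps IDs as integers and leaves missing cells empty
table = pd.DataFrame(
    {name: pd.Series(ids, dtype="Int64") for name, ids in toy_counts.items()}
)

# Write once to a temporary location (not the notebook's real output path)
out_path = os.path.join(tempfile.mkdtemp(), "cmd_counting_toy.csv")
table.to_csv(out_path, index=False)

# Read back to confirm the layout
restored = pd.read_csv(out_path)
print(restored)
```

With the real lists, the dictionary would map the eight header names to `in_one` through `in_eight`.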