# Work In Progress (WIP)

* Reduce frames:
  * Remove all frames that have an exposure length of 0 ("exptime_numeric" == 0.0)
  * Remove all frames that don't have an area ('area'.isna()) - This removes all CATALOG frames
  * Reduce frames for all unique datetimes to a single frame, taking the lowest RLEVEL possible
  * Check that it's okay to remove all blocks without a science frame - Assume they are just failed science blocks?
  * Check for Extract Target that all Science Frames for a given block have the same target - therefore I can just take the first one of these?
  * Check observations in the Calibration PropID (and other calibration ones), create a separate file of those patterns / individuals? Add to schedule.
  * Check that just having blocks with science frames in is okay?
<br>

* *__TODO:__* Make a series of observations through the LCO API (space them apart if possible). Download the data from the API, to see if we can link the observation frames through the API fields alone.

* Find the login information for St Andrews and get all the information about our Proposals through the API.

* Look at the timings for incomplete patterns (the ones showing up without any science frames, so I can't automatically work out the targets), to see if they match up to others in the same Request Number?

# Notes

The RLEVEL column is a value identifying how much data reduction has occurred on the frame.
* 0 is raw unprocessed data
* 91 is most recent processed data
* 90 is processed data, with the previous ORAC pipeline (pre 2016)

* CATALOG frames all end in 'e90_cat.fits', and are generated through the SExtractor from the ORAC-DR Pipeline

TODO: Check that every processed frame has an unprocessed 'related_frame'. If this is the case, we should be able to purge all of the processed data.

# Code

## Import Libraries

In [1]:
import json, os, sys, re
import pandas as pd
import datetime as dt
from os.path import join as pathjoin
from numpy import mean, std, floor, log10, finfo
from download_datasets_lco import create_data_name

## Select Working Folder

In [2]:
dirpaths = []
for (root, directories, files) in os.walk("archive_data_2019"):
    if len(directories) == 0:
        dirpaths.append(root)

current_dirpath = 0

## Collate Raw Data Files

In [3]:
# Collate all the raw data files in the target directory
def merge_datasets(dir_path):
    dir_path = pathjoin(dir_path)
    datafile_list = filter(lambda x: x.startswith('data'),os.listdir(dir_path))

    df_list = []

    for datafile in datafile_list:
        filepath = pathjoin(dir_path, datafile)
        data = json.load(open(filepath,"r"))
        df_list.append(pd.DataFrame(data).dropna(axis=1, how="all"))
        # Dropna is a fix for some files having empty (all-NaN) columns. Pandas doesn't like this.        
    df = pd.concat(df_list)
    df.reset_index(drop=True, inplace=True)
    return df

# raw = merge_datasets(dirpaths[current_dirpath])

## Merge all files

In [16]:
df = None

df_list = []
for i in range(len(dirpaths)):
    
    # Skip 2m data, as we want to focus on one network of telescopes that are all (vaguely) identical.
    if "1m0a" not in dirpaths[i]:
        continue
        
    # print(f"\rFolder {i+1} / {len(dirpaths)}", end="")
    print(dirpaths[i], " - ", end="")
    raw = merge_datasets(dirpaths[i])
    df_list.append(raw)
    print(len(raw))
print("\nConcatenating dataframes...")

# Drop some duplicate entries due to getting the days separately
df = pd.concat(df_list).drop_duplicates(subset="basename").set_index("id")
del df_list

# Drop duplicate or unnecessary columns
drop_columns = ["url", "filename", "version_set", "DATE_OBS", "DAY_OBS", "PROPID", "INSTRUME", "OBJECT",
                "RLEVEL", "SITEID", "TELID", "EXPTIME", "FILTER", "L1PUBDAT", "OBSTYPE",
                "BLKUID", "REQNUM"]
df.drop(columns=drop_columns, inplace=True)

# Drop any frames that don't have a request_id
df = df[df["request_id"].notna()]

# Drop frames with a 0s exposure time
df = df[df["exposure_time"]!=0]

df

archive_data_2019\2019-03\coj_1m0a_2019-03-01_2019-04-01  - 21991
archive_data_2019\2019-03\cpt_1m0a_2019-03-01_2019-04-01  - 58346
archive_data_2019\2019-03\elp_1m0a_2019-03-01_2019-04-01  - 17981
archive_data_2019\2019-03\lsc_1m0a_2019-03-01_2019-04-01  - 49984
archive_data_2019\2019-03\tfn_1m0a_2019-03-01_2019-04-01  - 0
archive_data_2019\2019-04\coj_1m0a_2019-04-01_2019-05-01  - 28598
archive_data_2019\2019-04\cpt_1m0a_2019-04-01_2019-05-01  - 62128
archive_data_2019\2019-04\elp_1m0a_2019-04-01_2019-05-01  - 14176
archive_data_2019\2019-04\lsc_1m0a_2019-04-01_2019-05-01  - 85157
archive_data_2019\2019-04\tfn_1m0a_2019-04-01_2019-05-01  - 0
archive_data_2019\2019-05\coj_1m0a_2019-05-01_2019-06-01  - 27708
archive_data_2019\2019-05\cpt_1m0a_2019-05-01_2019-06-01  - 71072
archive_data_2019\2019-05\elp_1m0a_2019-05-01_2019-06-01  - 18025
archive_data_2019\2019-05\lsc_1m0a_2019-05-01_2019-06-01  - 65668
archive_data_2019\2019-05\tfn_1m0a_2019-05-01_2019-06-01  - 0

Concatenating datafra

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,primary_optical_element,public_date,configuration_type,observation_id,request_id,area,related_frames
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
11304644,coj1m011-fa12-20190331-0257-e00,2019-03-31T15:15:29.502000Z,2019-03-31,NOAO2019A-003,fa12,obj_217550,0,coj,1m0a,60.0,rp,2020-03-30T15:15:29.502000Z,EXPOSE,494193370.0,1737712.0,"{'type': 'Polygon', 'coordinates': [[[162.7770...",[11304646]
11304646,coj1m011-fa12-20190331-0257-e91,2019-03-31T15:15:29.502000Z,2019-03-31,NOAO2019A-003,fa12,obj_217550,91,coj,1m0a,60.0,rp,2020-03-30T15:15:29.502000Z,EXPOSE,494193370.0,1737712.0,"{'type': 'Polygon', 'coordinates': [[[161.8828...","[11304644, 11302075, 11301884, 11269343, 10504..."
11304645,coj1m003-fa11-20190331-0211-e00,2019-03-31T15:15:24.222000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,0,coj,1m0a,90.0,ip,2020-03-30T15:15:24.222000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[176.3420...",[11304647]
11304647,coj1m003-fa11-20190331-0211-e91,2019-03-31T15:15:24.222000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,91,coj,1m0a,90.0,ip,2020-03-30T15:15:24.222000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[175.8705...","[11304645, 11302397, 11302074, 11301886, 10496..."
11304641,coj1m003-fa11-20190331-0210-e00,2019-03-31T15:13:12.927000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,0,coj,1m0a,90.0,rp,2020-03-30T15:13:12.927000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[176.3420...",[11304659]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17165778,lsc1m009-ak01-20190504-0757-g00,2019-05-05T02:46:48.680000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:48.680000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[]
17165773,lsc1m009-ak01-20190504-0756-g00,2019-05-05T02:46:33.615000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:33.615000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[]
17165772,lsc1m009-ak01-20190504-0755-g00,2019-05-05T02:46:18.662000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:18.662000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[]
17165771,lsc1m009-ak01-20190504-0754-g00,2019-05-05T02:46:03.712000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:03.712000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[]


In [17]:
obs_frame_count = {}
for o, g in df.groupby("observation_id"):
    l = len(g)
    if l not in obs_frame_count:
        obs_frame_count[l] = 0
    obs_frame_count[l] += 1
obs_frame_count

{389: 1,
 2: 5613,
 330: 4,
 8: 766,
 11: 2054,
 4: 1502,
 12: 776,
 160: 6,
 26: 13,
 40: 67,
 6: 1787,
 22: 176,
 24: 374,
 10: 604,
 629: 5,
 118: 20,
 70: 7,
 20: 433,
 82: 14,
 202: 8,
 141: 15,
 52: 13,
 13: 9,
 34: 76,
 261: 4,
 56: 4,
 222: 2,
 756: 1,
 44: 6,
 28: 26,
 267: 2,
 1: 33,
 581: 2,
 16: 189,
 742: 1,
 148: 1,
 46: 15,
 142: 16,
 58: 16,
 256: 5,
 112: 16,
 3: 12,
 64: 14,
 134: 8,
 96: 8,
 906: 1,
 80: 15,
 214: 2,
 117: 16,
 192: 3,
 30: 119,
 505: 1,
 18: 76,
 210: 3,
 138: 8,
 161: 7,
 163: 3,
 500: 2,
 108: 13,
 94: 7,
 14: 22,
 132: 6,
 86: 11,
 110: 12,
 32: 18,
 212: 2,
 78: 66,
 92: 18,
 168: 2,
 342: 3,
 48: 7,
 76: 15,
 184: 4,
 72: 7,
 25: 5,
 201: 7,
 102: 11,
 68: 8,
 5: 8,
 60: 10,
 66: 33,
 42: 14,
 162: 7,
 122: 3,
 57: 7,
 260: 10,
 503: 1,
 124: 3,
 140: 11,
 114: 10,
 270: 2,
 286: 2,
 264: 2,
 322: 2,
 152: 1,
 38: 8,
 229: 2,
 50: 3,
 113: 6,
 154: 3,
 407: 1,
 74: 12,
 630: 2,
 62: 10,
 631: 4,
 39: 5,
 115: 5,
 182: 8,
 164: 3,
 90: 10,
 33: 

In [18]:
sorted(obs_frame_count.items(), key=lambda x: x[0])

[(1, 33),
 (2, 5613),
 (3, 12),
 (4, 1502),
 (5, 8),
 (6, 1787),
 (7, 1),
 (8, 766),
 (9, 18),
 (10, 604),
 (11, 2054),
 (12, 776),
 (13, 9),
 (14, 22),
 (15, 5),
 (16, 189),
 (17, 5),
 (18, 76),
 (19, 10),
 (20, 433),
 (21, 14),
 (22, 176),
 (23, 11),
 (24, 374),
 (25, 5),
 (26, 13),
 (27, 9),
 (28, 26),
 (29, 4),
 (30, 119),
 (31, 4),
 (32, 18),
 (33, 16),
 (34, 76),
 (35, 2),
 (36, 4),
 (37, 3),
 (38, 8),
 (39, 5),
 (40, 67),
 (41, 3),
 (42, 14),
 (43, 4),
 (44, 6),
 (45, 5),
 (46, 15),
 (47, 1),
 (48, 7),
 (49, 1),
 (50, 3),
 (51, 10),
 (52, 13),
 (53, 6),
 (54, 11),
 (55, 5),
 (56, 4),
 (57, 7),
 (58, 16),
 (59, 1),
 (60, 10),
 (61, 3),
 (62, 10),
 (63, 7),
 (64, 14),
 (65, 48),
 (66, 33),
 (67, 10),
 (68, 8),
 (69, 5),
 (70, 7),
 (71, 13),
 (72, 7),
 (73, 4),
 (74, 12),
 (75, 7),
 (76, 15),
 (77, 7),
 (78, 66),
 (79, 11),
 (80, 15),
 (81, 14),
 (82, 14),
 (83, 2),
 (84, 7),
 (85, 7),
 (86, 11),
 (87, 6),
 (88, 12),
 (89, 1),
 (90, 10),
 (91, 7),
 (92, 18),
 (93, 9),
 (94, 7),
 (9

In [26]:
obs_frame_count = {}
for o, g in df.groupby("observation_id"):
    l = len(g)
    if l == 6:
        print(o)

466002457.0
466877197.0
467546113.0
467764246.0
468585880.0
468902314.0
469120924.0
469710454.0
471319696.0
471327412.0
471335155.0
471342889.0
471357526.0
471374224.0
471376366.0
471461515.0
471985546.0
472150312.0
472158358.0
472174705.0
472197340.0
472212619.0
472486729.0
472800478.0
472860829.0
472876177.0
472891759.0
472938424.0
472941358.0
472958584.0
472963396.0
472968682.0
472970221.0
472982065.0
472983214.0
472984888.0
472988947.0
472990096.0
473136829.0
473273557.0
473294395.0
473759131.0
473771125.0
473780917.0
474064792.0
474289723.0
474295732.0
474363580.0
474403423.0
474405469.0
474441790.0
474483631.0
474505471.0
474514396.0
474518092.0
474524512.0
474752320.0
474800632.0
474807721.0
474807727.0
474886546.0
474919393.0
475000450.0
475022077.0
475036591.0
475133020.0
475250761.0
475264627.0
475267273.0
475278406.0
475356037.0
475570165.0
476106238.0
476121142.0
476201797.0
476395807.0
476408716.0
476421244.0
476427544.0
476640355.0
476668288.0
476855938.0
476926036.0
4769

In [28]:
for obsid, g in df.groupby("observation_id"):
    l = len(g["target_name"].unique())
    if l > 1:
        print(l)

2


In [27]:
df[df["observation_id"]==471374224.0]

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,primary_optical_element,public_date,configuration_type,observation_id,request_id,area,related_frames
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
11140961,lsc1m004-fa03-20190306-0201-e00,2019-03-07T09:29:09.562000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,0,lsc,1m0a,150.0,ip,2020-03-06T09:29:09.562000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...",[11140992]
11140992,lsc1m004-fa03-20190306-0201-e91,2019-03-07T09:29:09.562000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,91,lsc,1m0a,150.0,ip,2020-03-06T09:29:09.562000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...","[11140961, 11090305, 11090216, 11082127, 9741441]"
11140943,lsc1m004-fa03-20190306-0200-e00,2019-03-07T09:26:13.091000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,0,lsc,1m0a,150.0,ip,2020-03-06T09:26:13.091000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...",[11140985]
11140985,lsc1m004-fa03-20190306-0200-e91,2019-03-07T09:26:13.091000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,91,lsc,1m0a,150.0,ip,2020-03-06T09:26:13.091000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...","[11140943, 11090305, 11090216, 11082127, 9741441]"
11140973,lsc1m004-fa03-20190306-0199-e91,2019-03-07T09:23:16.313000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,91,lsc,1m0a,150.0,ip,2020-03-06T09:23:16.313000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...","[11140936, 11090305, 11090216, 11082127, 9741441]"
11140936,lsc1m004-fa03-20190306-0199-e00,2019-03-07T09:23:16.313000Z,2019-03-06,NAOC2019A-004,fa03,OB190157,0,lsc,1m0a,150.0,ip,2020-03-06T09:23:16.313000Z,EXPOSE,471374224.0,1747807.0,"{'type': 'Polygon', 'coordinates': [[[-88.0368...",[11140973]


In [24]:
df[df["observation_id"]==468455590.0]

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,primary_optical_element,public_date,configuration_type,observation_id,request_id,area,related_frames
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
11118941,lsc1m004-fa03-20190303-0544-e00,2019-03-04T05:04:03.554000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,0,lsc,1m0a,3.5,w,2020-03-03T05:04:03.554000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[119.2567...",[11118944]
11118944,lsc1m004-fa03-20190303-0544-e91,2019-03-04T05:04:03.554000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,91,lsc,1m0a,3.5,w,2020-03-03T05:04:03.554000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[118.3613...","[11118941, 11090703, 11090305, 11090216, 9741441]"
11118937,lsc1m004-fa03-20190303-0543-e00,2019-03-04T05:03:34.073000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,0,lsc,1m0a,3.5,w,2020-03-03T05:03:34.073000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[119.2644...",[11118940]
11118940,lsc1m004-fa03-20190303-0543-e91,2019-03-04T05:03:34.073000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,91,lsc,1m0a,3.5,w,2020-03-03T05:03:34.073000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[118.3693...","[11118937, 11090703, 11090305, 11090216, 9741441]"
11118935,lsc1m004-fa03-20190303-0542-e00,2019-03-04T05:03:04.661000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,0,lsc,1m0a,3.5,w,2020-03-03T05:03:04.661000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[119.2723...",[11118936]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11116707,lsc1m004-fa03-20190303-0094-e91,2019-03-04T01:22:30.678000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,91,lsc,1m0a,3.5,w,2020-03-03T01:22:30.678000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[121.8791...","[11116706, 11090703, 11090305, 11090216, 9741441]"
11116701,lsc1m004-fa03-20190303-0093-e00,2019-03-04T01:22:01.218000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,0,lsc,1m0a,3.5,w,2020-03-03T01:22:01.218000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[122.7231...",[11116703]
11116703,lsc1m004-fa03-20190303-0093-e91,2019-03-04T01:22:01.218000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,91,lsc,1m0a,3.5,w,2020-03-03T01:22:01.218000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[121.8869...","[11116701, 11090703, 11090305, 11090216, 9741441]"
11116696,lsc1m004-fa03-20190303-0092-e00,2019-03-04T01:21:31.705000Z,2019-03-03,LCO2019A-006,fa03,2019 CT4,0,lsc,1m0a,3.5,w,2020-03-03T01:21:31.705000Z,EXPOSE,468455590.0,1745804.0,"{'type': 'Polygon', 'coordinates': [[[122.7305...",[11116699]


### TESTING - Related Frames

From testing, it looks like every raw file (RLEVEL=0) has exactly 1 related frame, and that frame is a reduced file. All reduced files have multiple related frames, which include the raw file that it relates to, as well as any calibration frames that were used in its reduction.
Any calibration frames might also have reduced versions, which will be included along with their raw files as well.

In [212]:
sdf = df.head(500).copy(deep=True)
sdf.loc[:, ["num_related_frames"]] = sdf["related_frames"].apply(len)

In [213]:
sdf[["reduction_level", "num_related_frames"]].value_counts()

reduction_level  num_related_frames
0                1                     250
91               5                     206
                 2                      34
                 3                      10
Name: count, dtype: int64

In [235]:
# Get a frame that has a reduction level of 0
r0frame = sdf[(sdf["reduction_level"]==0) & (sdf["configuration_type"]=="EXPOSE")].iloc[0]
print("Examining raw frame:")
print(f"ID: {r0frame.name}")
print(f"Reduction Level: {r0frame['reduction_level']}")
print(f"Datetime: {r0frame['observation_date']}")
print(f"Configuration Type: {r0frame['configuration_type']}")
r0_related_frames = r0frame["related_frames"]
print(f"Related Frames: {r0_related_frames}")

Examining raw frame:
ID: 11304644
Reduction Level: 0
Datetime: 2019-03-31T15:15:29.502000Z
Configuration Type: EXPOSE
Related Frames: [11304646]


In [239]:
print("Examining reduced frame:")
r1frame = sdf.loc[r0_related_frames[0]]
print(f"ID: {r1frame.name}")
print(f"Reduction Level: {r1frame['reduction_level']}")
print(f"Datetime: {r1frame['observation_date']}")
print(f"Configuration Type: {r1frame['configuration_type']}")
r1_related_frames = r1frame["related_frames"]
print(f"Related Frames: {r1_related_frames}")

Examining reduced frame:
ID: 11304646
Reduction Level: 91
Datetime: 2019-03-31T15:15:29.502000Z
Configuration Type: EXPOSE
Related Frames: [11304644, 11302075, 11301884, 11269343, 10504311]


In [240]:
print("Examining calibration frame:")
other_frames = [x for x in r1_related_frames if x != r0frame.name]
r2frame = df.loc[other_frames[0]]
print(f"ID: {r2frame.name}")
print(f"Reduction Level: {r2frame['reduction_level']}")
print(f"Datetime: {r2frame['observation_date']}")
print(f"Configuration Type: {r2frame['configuration_type']}")
r2_related_frames = r2frame["related_frames"]
print(f"Related Frames: {r2_related_frames}")

Examining calibration frame:
ID: 11302075
Reduction Level: 91
Datetime: 2019-03-31T02:02:25.969100Z
Configuration Type: DARK
Related Frames: [11309170, 11309166, 11309250, 11309161, 11309152, 11309147, 11309136, 11309131, 11309254, 11309125, 11309116, 11309111, 11309101, 11309097, 11309248, 11309084, 11309071, 11309063, 11308968, 11308957, 11309255, 11308955, 11308953, 11308949, 11308944, 11308938, 11309252, 11308933, 11308929, 11308924, 11304646, 11304632, 11304621, 11304604, 11304594, 11304591, 11304584, 11304570, 11304559, 11304517, 11304458, 11304430, 11304406, 11304384, 11304369, 11304366, 11304363, 11304358, 11304351, 11304347, 11304342, 11304337, 11304333, 11304327, 11304323, 11304321, 11304317, 11304313, 11304303, 11304297, 11304294, 11304283, 11304280, 11304275, 11304271, 11304267, 11304263, 11304257, 11304252, 11304245, 11304235, 11304223, 11304215, 11304202, 11304196, 11304194, 11304191, 11304184, 11304180, 11304177, 11304174, 11304170, 11304167, 11304162, 11304154, 11304148

## Extract Telescope Number

Unlikely, but extract the ID for each individual telescope just in case there are observations scheduled on separate telescopes of the same class at the same location at the exact same time.

In [41]:
df[["telescope_number", "day_frame_number"]] = df["basename"].str.extract("^.+[12]m(\d{3})-[\w\d]{4}-\d{8}-(\d{4})")
df

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,primary_optical_element,public_date,configuration_type,observation_id,request_id,area,related_frames,telescope_number,day_frame_number
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1
11304644,coj1m011-fa12-20190331-0257-e00,2019-03-31T15:15:29.502000Z,2019-03-31,NOAO2019A-003,fa12,obj_217550,0,coj,1m0a,60.0,rp,2020-03-30T15:15:29.502000Z,EXPOSE,494193370.0,1737712.0,"{'type': 'Polygon', 'coordinates': [[[162.7770...",[11304646],011,0257
11304646,coj1m011-fa12-20190331-0257-e91,2019-03-31T15:15:29.502000Z,2019-03-31,NOAO2019A-003,fa12,obj_217550,91,coj,1m0a,60.0,rp,2020-03-30T15:15:29.502000Z,EXPOSE,494193370.0,1737712.0,"{'type': 'Polygon', 'coordinates': [[[161.8828...","[11304644, 11302075, 11301884, 11269343, 10504...",011,0257
11304645,coj1m003-fa11-20190331-0211-e00,2019-03-31T15:15:24.222000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,0,coj,1m0a,90.0,ip,2020-03-30T15:15:24.222000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[176.3420...",[11304647],003,0211
11304647,coj1m003-fa11-20190331-0211-e91,2019-03-31T15:15:24.222000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,91,coj,1m0a,90.0,ip,2020-03-30T15:15:24.222000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[175.8705...","[11304645, 11302397, 11302074, 11301886, 10496...",003,0211
11304641,coj1m003-fa11-20190331-0210-e00,2019-03-31T15:13:12.927000Z,2019-03-31,KEY2017AB-001,fa11,sn2019bka,0,coj,1m0a,90.0,rp,2020-03-30T15:13:12.927000Z,EXPOSE,494181751.0,1760536.0,"{'type': 'Polygon', 'coordinates': [[[176.3420...",[11304659],003,0210
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17165778,lsc1m009-ak01-20190504-0757-g00,2019-05-05T02:46:48.680000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:48.680000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[],009,0757
17165773,lsc1m009-ak01-20190504-0756-g00,2019-05-05T02:46:33.615000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:33.615000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[],009,0756
17165772,lsc1m009-ak01-20190504-0755-g00,2019-05-05T02:46:18.662000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:18.662000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[],009,0755
17165771,lsc1m009-ak01-20190504-0754-g00,2019-05-05T02:46:03.712000Z,2019-05-04,KEY2017AB-002c,ak01,BD+01 2447,0,lsc,1m0a,10.0,air,2020-05-04T02:46:03.712000Z,GUIDE,522515884.0,1787353.0,"{'type': 'Polygon', 'coordinates': [[[157.1454...",[],009,0754


In [47]:
l = {}
for n, g in df.groupby(["site_id", "telescope_number", "observation_day"]):
    # ll = len(g)
    # if ll == 2:
    #     print(g)
    #     break
    if ll not in l:
        l[ll] = 0
    l[ll] += 1
l

{2: 740}

## Reduce Frames and columns
We're only really interested in the timestamps and details of the observation frames, so all of the extra reduced frames are fairly irrelevant to us. As such, we're going to bin them to make the dataset smaller to work with.

We're going to group by observation_date and telescope [INSERT COLUMNS HERE] to get all unique observation start times, and then only keep one frame per observation.

We also need to keep in mind that some of the observations are from guiding telescopes and standards, and we need to filter those out too as they aren't really relevant.

In [9]:
df[df["request_id"].isna()]["proposal_id"].unique()

array(['calibrate', 'auto_focus', 'standard', 'LCOEngineering',
       'SAAO2019A-006', '', 'LCOEngineering-001', 'LCO2019A-006',
       'FTPEPO2017AB-002'], dtype=object)

In [169]:
sdf[(sdf["REQNUM"].isna()) & (sdf["proposal_id"]=="auto_focus")]

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,...,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames,datetime
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11302342,coj1m003-fa11-20190331-0052-e00,2019-03-31T08:51:53.821000Z,2019-03-31,auto_focus,fa11,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2020-03-30T08:51:53.821000Z,EXPOSE,493726708.0,,"{'type': 'Polygon', 'coordinates': [[[112.3992...",[11302346],2019-03-31 08:51:53.821
11302346,coj1m003-fa11-20190331-0052-e91,2019-03-31T08:51:53.821000Z,2019-03-31,auto_focus,fa11,auto_focus,91,coj,1m0a,10.0,...,1m0a,10.0,rp,2020-03-30T08:51:53.821000Z,EXPOSE,493726708.0,,"{'type': 'Polygon', 'coordinates': [[[111.9014...","[11302342, 11302074, 11301886, 11296255, 10496...",2019-03-31 08:51:53.821
11302340,coj1m011-fa12-20190331-0057-e00,2019-03-31T08:51:43.864000Z,2019-03-31,auto_focus,fa12,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2020-03-30T08:51:43.864000Z,EXPOSE,493726339.0,,"{'type': 'Polygon', 'coordinates': [[[118.3600...",[11302348],2019-03-31 08:51:43.864
11302348,coj1m011-fa12-20190331-0057-e91,2019-03-31T08:51:43.864000Z,2019-03-31,auto_focus,fa12,auto_focus,91,coj,1m0a,10.0,...,1m0a,10.0,rp,2020-03-30T08:51:43.864000Z,EXPOSE,493726339.0,,"{'type': 'Polygon', 'coordinates': [[[117.8319...","[11302340, 11302075, 11301884, 11269343, 10504...",2019-03-31 08:51:43.864
19819141,coj1m003-fa11-20190331-0051-x00,2019-03-31T08:50:51.457000Z,2019-03-31,auto_focus,fa11,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:50:51.457000Z,EXPERIMENTAL,493726708.0,,"{'type': 'Polygon', 'coordinates': [[[110.8326...",[],2019-03-31 08:50:51.457
19833209,coj1m011-fa12-20190331-0056-x00,2019-03-31T08:50:36.480000Z,2019-03-31,auto_focus,fa12,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:50:36.480000Z,EXPERIMENTAL,493726339.0,,"{'type': 'Polygon', 'coordinates': [[[116.7911...",[],2019-03-31 08:50:36.480
19819142,coj1m003-fa11-20190331-0050-x00,2019-03-31T08:50:14.374000Z,2019-03-31,auto_focus,fa11,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:50:14.374000Z,EXPERIMENTAL,493726708.0,,"{'type': 'Polygon', 'coordinates': [[[110.8326...",[],2019-03-31 08:50:14.374
19833208,coj1m011-fa12-20190331-0055-x00,2019-03-31T08:49:54.060000Z,2019-03-31,auto_focus,fa12,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:49:54.060000Z,EXPERIMENTAL,493726339.0,,"{'type': 'Polygon', 'coordinates': [[[116.7911...",[],2019-03-31 08:49:54.060
19819140,coj1m003-fa11-20190331-0049-x00,2019-03-31T08:49:37.370000Z,2019-03-31,auto_focus,fa11,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:49:37.370000Z,EXPERIMENTAL,493726708.0,,"{'type': 'Polygon', 'coordinates': [[[110.8326...",[],2019-03-31 08:49:37.370
19833207,coj1m011-fa12-20190331-0054-x00,2019-03-31T08:49:16.748000Z,2019-03-31,auto_focus,fa12,auto_focus,0,coj,1m0a,10.0,...,1m0a,10.0,rp,2019-03-31T08:49:16.748000Z,EXPERIMENTAL,493726339.0,,"{'type': 'Polygon', 'coordinates': [[[116.7911...",[],2019-03-31 08:49:16.748


In [149]:
df["basename"].str.extract("^.{5}(\d{3})-").isna()[0]

id
11306273    False
11306276    False
11306274    False
11306275    False
11306232    False
            ...  
11539479    False
11539459    False
11539461    False
11539454    False
11539456    False
Name: 0, Length: 669561, dtype: bool

In [6]:
bad_basenames = df["basename"].str.extract("^.{5}(\d{3})-").isna()[0]

In [7]:
df[bad_basenames]

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,primary_optical_element,public_date,configuration_type,observation_id,request_id,area,related_frames
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
11306716,lsc1m00X-fa09-20190331-0002-e00,2019-03-31T23:41:32.698000Z,2019-03-31,,fa09,,0,lsc,1m0a,300.000,air,2020-03-30T23:41:32.698000Z,TARGET,,,"{'type': 'Polygon', 'coordinates': [[[0.220051...",[]
11306704,lsc1m00X-fa09-20190331-0001-e00,2019-03-31T23:35:27.006000Z,2019-03-31,,fa09,,0,lsc,1m0a,300.000,air,2020-03-30T23:35:27.006000Z,TARGET,,,"{'type': 'Polygon', 'coordinates': [[[0.220051...",[]
11305732,TAU2019A-006_0001760499_ftn_20190331_58574,2019-03-31T14:23:35.483000Z,2019-03-30,TAU2019A-006,en06,at2019bxq,90,ogg,2m0a,3600.000,air,2020-03-30T14:23:35.483000Z,SPECTRUM,494165041.0,1760499.0,"{'type': 'Polygon', 'coordinates': [[[-105.702...",[11304654]
11305735,KEY2017AB-001_0001760485_ftn_20190331_58574,2019-03-31T13:00:39.379000Z,2019-03-30,KEY2017AB-001,en06,SN2019bwa,90,ogg,2m0a,3600.000,air,2020-03-30T13:00:39.379000Z,SPECTRUM,494120059.0,1760485.0,"{'type': 'Polygon', 'coordinates': [[[-147.123...",[11304355]
11305734,OGG_calib_0001760408_ftn_20190331_58574,2019-03-31T10:43:22.783000Z,2019-03-30,OGG_calib,en06,HZ 44,90,ogg,2m0a,300.000,air,2019-03-31T10:43:22.783000Z,SPECTRUM,494042938.0,1760408.0,"{'type': 'Polygon', 'coordinates': [[[-159.163...",[11303380]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11555926,KEY2017AB-001_0001784392_ftn_20190502_58606,2019-05-02T06:15:02.480000Z,2019-05-01,KEY2017AB-001,en06,at2019edx,90,ogg,2m0a,1800.027,air,2020-05-01T06:15:02.480000Z,SPECTRUM,520032886.0,1784392.0,"{'type': 'Polygon', 'coordinates': [[[-146.681...",[11551143]
11546382,KEY2017AB-001_0001784066_ftn_20190501_58605,2019-05-01T08:28:29.957000Z,2019-04-30,KEY2017AB-001,en06,sn2019bao,90,ogg,2m0a,2700.064,air,2020-04-30T08:28:29.957000Z,SPECTRUM,519247615.0,1784066.0,"{'type': 'Polygon', 'coordinates': [[[157.3793...",[11542606]
11546384,KEY2017AB-001_0001783733_ftn_20190501_58605,2019-05-01T07:19:30.838000Z,2019-04-30,KEY2017AB-001,en06,SN2019dks,90,ogg,2m0a,3600.008,air,2020-04-30T07:19:30.838000Z,SPECTRUM,519206464.0,1783733.0,"{'type': 'Polygon', 'coordinates': [[[176.0539...",[11542254]
11546380,KEY2017AB-001_0001783368_ftn_20190501_58605,2019-05-01T06:33:43.593000Z,2019-04-30,KEY2017AB-001,en06,SN2019dfa,90,ogg,2m0a,1800.054,air,2020-04-30T06:33:43.593000Z,SPECTRUM,519172180.0,1783368.0,"{'type': 'Polygon', 'coordinates': [[[149.8029...",[11541769]


In [152]:
df["OBSTYPE"].value_counts()

OBSTYPE
GUIDE           262798
EXPOSE          193143
BIAS             67059
SKYFLAT          64879
EXPERIMENTAL     41550
DARK             20295
STANDARD          9177
LAMPFLAT          3982
TARGET            2838
DOUBLE            1552
SPECTRUM          1425
ARC                862
BPM                  1
Name: count, dtype: int64

In [153]:
df[df["OBSTYPE"]=="BPM"]

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,...,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames,datetime
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11327761,bpm-ef05-2x2-2019-04-03T215059,2019-04-03T21:50:59.311000Z,2019-04-03,calibrate,ef05,,59,lsc,1m0a,0.001,...,1m0a,0.001,opaque,2019-04-03T16:28:40.680000Z,BPM,,,"{'type': 'Polygon', 'coordinates': [[[7.571628...","[52672629, 52672571, 52672524, 52672500, 52672...",2019-04-03 21:50:59.311


In [127]:
for c in sdf.columns:
    print(c, " - ", sdf.iloc[0][c])

basename  -  coj1m011-fa12-20190331-0279-d00
observation_date  -  2019-03-31T21:59:57.843000Z
observation_day  -  2019-03-31
proposal_id  -  calibrate
instrument_id  -  fa12
target_name  -  
reduction_level  -  0
site_id  -  coj
telescope_id  -  1m0a
exposure_time  -  900.0
primary_optical_element  -  rp
public_date  -  2019-03-31T21:59:57.843000Z
configuration_type  -  DARK
observation_id  -  493726327.0
request_id  -  nan
version_set  -  [{'id': 11719651, 'key': 'Brmodcq.vGCVgsTB..I4As5aTbh3ufVL', 'md5': '38fb409af0a7c2f020234f544ae43c10', 'extension': '.fits.fz', 'url': 'https://archive-lco-global.s3.us-west-2.amazonaws.com/coj/fa12/20190331/raw/coj1m011-fa12-20190331-0279-d00.fits.fz?versionId=Brmodcq.vGCVgsTB..I4As5aTbh3ufVL&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA6FT4CXR464A32PW2%2F20240328%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20240328T174511Z&X-Amz-Expires=172800&X-Amz-SignedHeaders=host&X-Amz-Signature=60966ca2d508d6793cd45a9caa524db1036b039efbdd64d5216fa196

In [122]:
sdf = df.head(2000).copy()

In [124]:
for i, g in sdf.groupby(["site_id", "telescope_id", "observation_date"]):
    print(i, len(g))

('coj', '1m0a', '2019-03-30T08:52:28.973000Z') 1
('coj', '1m0a', '2019-03-30T08:52:45.265000Z') 2
('coj', '1m0a', '2019-03-30T08:56:04.616000Z') 2
('coj', '1m0a', '2019-03-30T08:56:33.408000Z') 2
('coj', '1m0a', '2019-03-30T08:57:16.492000Z') 2
('coj', '1m0a', '2019-03-30T08:57:55.556000Z') 2
('coj', '1m0a', '2019-03-30T08:58:27.819000Z') 2
('coj', '1m0a', '2019-03-30T08:59:17.716000Z') 2
('coj', '1m0a', '2019-03-30T08:59:39.755000Z') 2
('coj', '1m0a', '2019-03-30T09:00:40.612000Z') 2
('coj', '1m0a', '2019-03-30T09:00:51.231000Z') 2
('coj', '1m0a', '2019-03-30T09:02:02.587000Z') 2
('coj', '1m0a', '2019-03-30T09:02:02.874000Z') 2
('coj', '1m0a', '2019-03-30T09:03:14.498000Z') 2
('coj', '1m0a', '2019-03-30T09:03:24.475000Z') 2
('coj', '1m0a', '2019-03-30T09:04:26.217000Z') 2
('coj', '1m0a', '2019-03-30T09:04:46.891000Z') 2
('coj', '1m0a', '2019-03-30T09:05:38.217000Z') 2
('coj', '1m0a', '2019-03-30T09:06:09.091000Z') 2
('coj', '1m0a', '2019-03-30T09:06:49.625000Z') 2
('coj', '1m0a', '201

In [121]:
sdf[["site_id", "telescope_id", "instrument_id", "observation_id"]]

Unnamed: 0_level_0,site_id,telescope_id,instrument_id,observation_id,FILTER
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
11306273,coj,1m0a,fa12,493726327.0,rp
11306276,coj,1m0a,fa12,493726327.0,rp
11306274,coj,1m0a,fa11,493726579.0,ip
11306275,coj,1m0a,fa11,493726579.0,ip
11306232,coj,1m0a,fa12,493726327.0,rp
...,...,...,...,...,...
11303666,coj,1m0a,fa12,494015875.0,zs
11303661,coj,1m0a,fa11,493981741.0,ip
11303679,coj,1m0a,fa11,493981741.0,ip
11303606,coj,1m0a,fa12,494015875.0,zs


In [114]:
def reduce_frames(df):
    obs_groups = df.groupby("observation_date")

    sole_frames = []
    weird = []
    
    for obsdate, group in obs_groups:
        minimum_rlevel = group["RLEVEL"].min()
        raw_frames = group[group["RLEVEL"]==minimum_rlevel]
        if len(raw_frames) == 1:
            # Only 1 raw frame
            sole_frames.append(raw_frames.iloc[0].name)

        else:
            weird.append(raw_frames)
            print(len(raw_frames), raw_frames["RLEVEL"].unique())
    
    print(len(obs_groups))
    print(len(df))
    print(sole_frames)

reduce_frames(df)

6 [0]
4 [0]
2 [0]
3 [0]
4 [0]
2 [0]
4 [0]
3 [0]
2 [0]
2 [0]
2 [91]
7 [0]
2 [0]
4 [0]
4 [0]
2 [91]
8 [0]
2 [0]
7 [0]
2 [0]
2 [0]
4 [0]
2 [0]
2 [0]
2 [0]
2 [0]
2 [0]
4 [0]
2 [0]
2 [0]
2 [0]
2 [91]



KeyboardInterrupt



In [None]:
def reduce_frames(df):
    obs_groups = df.groupby('datetime')
    expected_frames = len(obs_groups)

    if expected_frames == len(df):
        print("THIS DOES NOTHING - NO FRAMES REDUCED")

    sole_frames = []
    for d, g in obs_groups:
        minimum_rlevel = g["RLEVEL"].min() # For the odd case where there is no 0-RLEVEL frame.
        raw_frames = g[g["RLEVEL"]==minimum_rlevel]
        
        if len(raw_frames) > 1:
            # There is the odd occurence where there are 2 0-RLEVEL frames (or 90-RLEVEL frames if no 0),
            # but they should have the same attributes for everything except the URL, which we've already dropped.]
            # This just checks that that is definitely the case.
            for c in raw_frames.columns:
                try:
                    if len(raw_frames[c].unique()) > 1:
                        print("ERROR - Duplicate entries with different attributes")
                        print("Column: ", c)
                        print(raw_frames[c])
                        raise Error

                except TypeError:
                    print("Warning - Still trying to compare unhashable data types: ", c)

            sole_frames.append(raw_frames.head(1))

        else:
            sole_frames.append(raw_frames)

    new_df = pd.concat(sole_frames, axis=0)


    resultant_frames = len(new_df)
    if resultant_frames != expected_frames:
        print("Unexpected number of frames returned: {} expected, {} received.".format(\
            expected_frames, resultant_frames))
    return new_df

temp4 = reduce_frames(temp3).reset_index(drop=True)
print(temp3.shape, "-->", temp4.shape)

## Convert timestamp to datetime object
Straightforward

In [105]:
# Convert the timestamp to a datetime object
def str_to_datetime(date_str):
    try:
        return dt.datetime.strptime(date_str,"%Y-%m-%dT%H:%M:%S.%fZ")
    except:
        return dt.datetime.strptime(date_str,"%Y-%m-%dT%H:%M:%SZ")

df['datetime'] = df['DATE_OBS'].apply(str_to_datetime)
df

Unnamed: 0_level_0,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,exposure_time,...,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames,datetime
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
11306273,coj1m011-fa12-20190331-0279-d00,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,900.0,...,1m0a,900.0,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...",[11306276],2019-03-31 21:59:57.843
11306276,coj1m011-fa12-20190331-0279-d91,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,91,coj,1m0a,900.0,...,1m0a,900.0,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...","[11306273, 11301884, 10504311]",2019-03-31 21:59:57.843
11306274,coj1m003-fa11-20190331-0233-d00,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,0,coj,1m0a,900.0,...,1m0a,900.0,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...",[11306275],2019-03-31 21:59:55.455
11306275,coj1m003-fa11-20190331-0233-d91,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,91,coj,1m0a,900.0,...,1m0a,900.0,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...","[11306274, 11301886, 10496386]",2019-03-31 21:59:55.455
11306232,coj1m011-fa12-20190331-0278-d00,2019-03-31T21:44:33.270000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,900.0,...,1m0a,900.0,rp,2019-03-31T21:44:33.270000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-100.302...",[11306235],2019-03-31 21:44:33.270
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11539479,ogg2m001-fs02-20190430-0003-b91,2019-05-01T02:50:26.666000Z,2019-04-30,calibrate,fs02,,91,ogg,2m0a,0.0,...,2m0a,0.0,air,2019-05-01T02:50:26.666000Z,BIAS,518829271.0,,"{'type': 'Polygon', 'coordinates': [[[104.7111...","[11539478, 3519536]",2019-05-01 02:50:26.666
11539459,ogg2m001-fs02-20190430-0002-b00,2019-05-01T02:46:33.312000Z,2019-04-30,calibrate,fs02,,0,ogg,2m0a,0.0,...,2m0a,0.0,air,2019-05-01T02:46:33.312000Z,BIAS,518829271.0,,"{'type': 'Polygon', 'coordinates': [[[103.7415...",[11539461],2019-05-01 02:46:33.312
11539461,ogg2m001-fs02-20190430-0002-b91,2019-05-01T02:46:33.312000Z,2019-04-30,calibrate,fs02,,91,ogg,2m0a,0.0,...,2m0a,0.0,air,2019-05-01T02:46:33.312000Z,BIAS,518829271.0,,"{'type': 'Polygon', 'coordinates': [[[103.7369...","[11539459, 3536694]",2019-05-01 02:46:33.312
11539454,ogg2m001-fs02-20190430-0001-b00,2019-05-01T02:45:45.031000Z,2019-04-30,calibrate,fs02,,0,ogg,2m0a,0.0,...,2m0a,0.0,air,2019-05-01T02:45:45.031000Z,BIAS,518829271.0,,"{'type': 'Polygon', 'coordinates': [[[103.5389...",[11539456],2019-05-01 02:45:45.031


## Get coordinates of centroid of frame
Take the mean RA and mean DEC of all polygon vertices.
<br>
_NOTE_: Should I be taking halfway between the Min and Max of both coordinates?

In [5]:
# Get the coordinates of the centroid of the frame
def get_centroid(area):
    try:
        coordinates = area.get('coordinates')[0]
    except:
        return pd.Series((None,None))
    if coordinates == None:
        return None
    ra = []
    dec = []
    for corner in coordinates:
        ra.append(corner[0])
        dec.append(corner[1])
    mean_ra = mean(ra)
    if mean_ra < 0:
        mean_ra += 360
    return pd.Series((mean_ra, mean(dec)))

raw[['RA','DEC']] = raw['area'].apply(get_centroid)
raw

Unnamed: 0,id,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,...,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames,datetime,RA,DEC
0,11306273,coj1m011-fa12-20190331-0279-d00,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...",[11306276],2019-03-31 21:59:57.843000,263.559529,-60.587577
1,11306276,coj1m011-fa12-20190331-0279-d91,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,91,coj,1m0a,...,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...","[11306273, 11301884, 10504311]",2019-03-31 21:59:57.843000,263.198885,-60.410472
2,11306274,coj1m003-fa11-20190331-0233-d00,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,0,coj,1m0a,...,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...",[11306275],2019-03-31 21:59:55.455000,277.390793,19.374725
3,11306275,coj1m003-fa11-20190331-0233-d91,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,91,coj,1m0a,...,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...","[11306274, 11301886, 10496386]",2019-03-31 21:59:55.455000,277.203052,19.551833
4,11306232,coj1m011-fa12-20190331-0278-d00,2019-03-31T21:44:33.270000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,rp,2019-03-31T21:44:33.270000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-100.302...",[11306235],2019-03-31 21:44:33.270000,259.697993,-60.580450
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21986,11144143,coj1m011-fa12-20190301-skyflat-bin1x1-Y,2019-03-01T01:28:19.485889Z,2019-03-01,calibrate,fa12,Flat,91,coj,1m0a,...,Y,2019-03-01T01:28:19.485889Z,SKYFLAT,465085876.0,,"{'type': 'Polygon', 'coordinates': [[[93.58544...","[11093840, 11083495, 11081159, 10504311]",2019-03-01 01:28:19.485889,93.384689,-27.937379
21987,11096269,coj1m003-fa11-20190301-0148-e00,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,0,coj,1m0a,...,rp,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...",[11096271],2019-03-01 00:00:00.000000,139.937263,-36.227410
21988,11096271,coj1m003-fa11-20190301-0148-e91,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,91,coj,1m0a,...,rp,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...","[11096269, 10515225, 10496386, 10496488, 10496...",2019-03-01 00:00:00.000000,139.717709,-36.050302
21989,11094194,coj1m003-fa11-20190301-0063-s91,2019-03-01T00:00:00Z,2019-03-01,standard,fa11,L94,91,coj,1m0a,...,R,2020-02-29T00:00:00Z,STANDARD,465829675.0,,"{'type': 'Polygon', 'coordinates': [[[44.81003...","[11094188, 10496386, 10496548, 10496488, 10496...",2019-03-01 00:00:00.000000,44.632945,1.140637


## Convert EXPTIME to numeric

In [6]:
raw["exptime_numeric"] = pd.to_numeric(raw["EXPTIME"])
raw

Unnamed: 0,id,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,...,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames,datetime,RA,DEC,exptime_numeric
0,11306273,coj1m011-fa12-20190331-0279-d00,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...",[11306276],2019-03-31 21:59:57.843000,263.559529,-60.587577,900.000
1,11306276,coj1m011-fa12-20190331-0279-d91,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,91,coj,1m0a,...,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...","[11306273, 11301884, 10504311]",2019-03-31 21:59:57.843000,263.198885,-60.410472,900.000
2,11306274,coj1m003-fa11-20190331-0233-d00,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,0,coj,1m0a,...,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...",[11306275],2019-03-31 21:59:55.455000,277.390793,19.374725,900.000
3,11306275,coj1m003-fa11-20190331-0233-d91,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,91,coj,1m0a,...,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...","[11306274, 11301886, 10496386]",2019-03-31 21:59:55.455000,277.203052,19.551833,900.000
4,11306232,coj1m011-fa12-20190331-0278-d00,2019-03-31T21:44:33.270000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,2019-03-31T21:44:33.270000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-100.302...",[11306235],2019-03-31 21:44:33.270000,259.697993,-60.580450,900.000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21986,11144143,coj1m011-fa12-20190301-skyflat-bin1x1-Y,2019-03-01T01:28:19.485889Z,2019-03-01,calibrate,fa12,Flat,91,coj,1m0a,...,2019-03-01T01:28:19.485889Z,SKYFLAT,465085876.0,,"{'type': 'Polygon', 'coordinates': [[[93.58544...","[11093840, 11083495, 11081159, 10504311]",2019-03-01 01:28:19.485889,93.384689,-27.937379,2.653
21987,11096269,coj1m003-fa11-20190301-0148-e00,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,0,coj,1m0a,...,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...",[11096271],2019-03-01 00:00:00.000000,139.937263,-36.227410,85.000
21988,11096271,coj1m003-fa11-20190301-0148-e91,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,91,coj,1m0a,...,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...","[11096269, 10515225, 10496386, 10496488, 10496...",2019-03-01 00:00:00.000000,139.717709,-36.050302,85.000
21989,11094194,coj1m003-fa11-20190301-0063-s91,2019-03-01T00:00:00Z,2019-03-01,standard,fa11,L94,91,coj,1m0a,...,2020-02-29T00:00:00Z,STANDARD,465829675.0,,"{'type': 'Polygon', 'coordinates': [[[44.81003...","[11094188, 10496386, 10496548, 10496488, 10496...",2019-03-01 00:00:00.000000,44.632945,1.140637,60.000


### Explore

In [7]:
for b, g in raw.groupby("BLKUID"):
    print(g[["RA", "OBSTYPE"]])
    break

              RA  OBSTYPE
21802  93.901556  SKYFLAT
21804  93.887298  SKYFLAT
21806  93.873040  SKYFLAT
21808  93.873069  SKYFLAT
21810  93.887328  SKYFLAT
21811  93.684855  SKYFLAT
21814  93.870884  SKYFLAT
21815  93.668717  SKYFLAT
21818  93.856645  SKYFLAT
21821  93.842407  SKYFLAT
21822  93.640241  SKYFLAT
21825  93.842436  SKYFLAT
21828  93.856676  SKYFLAT
21831  93.753017  SKYFLAT
21832  93.551432  SKYFLAT
21835  93.738819  SKYFLAT
21836  93.537234  SKYFLAT
21837  93.724621  SKYFLAT
21838  93.523036  SKYFLAT
21841  93.724650  SKYFLAT
21842  93.523040  SKYFLAT
21845  93.738849  SKYFLAT
21846  93.537240  SKYFLAT
21859  93.658563  SKYFLAT
21860  93.457435  SKYFLAT
21861  93.644397  SKYFLAT
21862  93.443269  SKYFLAT
21863  93.630231  SKYFLAT
21864  93.429103  SKYFLAT
21865  93.429107  SKYFLAT
21866  93.630259  SKYFLAT
21867  93.644426  SKYFLAT
21868  93.443275  SKYFLAT
21869  93.599634  SKYFLAT
21872  93.585491  SKYFLAT
21873  93.384689  SKYFLAT
21876  93.571348  SKYFLAT
21877  93.37

## Remove Invalid Frames
Remove frames where the exposure is 0s (not useful), or there is no defined area (CATALOG frames)

In [13]:
def remove_invalid_frames(df):
    df1 = df[df["exptime_numeric"] > 0.0]
    df2 = df1[df1["area"].notna()]
    print("Removed frames with 0s duration:", df.shape, "-->", df1.shape)
    print("Removed frames with no specified area:", df1.shape, "-->", df2.shape)
    return df2
temp1 = remove_invalid_frames(raw)

Removed frames with 0s duration: (21991, 39) --> (17784, 39)
Removed frames with no specified area: (17784, 39) --> (17784, 39)


## Filter out undesired columns
Take only the columns that we're interested in.

In [14]:
# Filter to only the desired columns? What are we throwing away here?
desired_columns = ['datetime','BLKUID','EXPTIME','FILTER','INSTRUME','OBJECT',
    'OBSTYPE','PROPID','REQNUM','RLEVEL','RA','DEC', 'SITEID', 'TELID', "exptime_numeric"]

temp2 = temp1[ desired_columns ]

### Unwanted columns

In [15]:
print("Dropped columns: {}".format([x for x in temp1.columns if x not in desired_columns]))

Dropped columns: ['id', 'basename', 'observation_date', 'observation_day', 'proposal_id', 'instrument_id', 'target_name', 'reduction_level', 'site_id', 'telescope_id', 'exposure_time', 'primary_optical_element', 'public_date', 'configuration_type', 'observation_id', 'request_id', 'version_set', 'url', 'filename', 'DATE_OBS', 'DAY_OBS', 'L1PUBDAT', 'area', 'related_frames']


In [16]:
print(temp1.iloc[0])

id                                                                  11306273
basename                                     coj1m011-fa12-20190331-0279-d00
observation_date                                 2019-03-31T21:59:57.843000Z
observation_day                                                   2019-03-31
proposal_id                                                        calibrate
instrument_id                                                           fa12
target_name                                                                 
reduction_level                                                            0
site_id                                                                  coj
telescope_id                                                            1m0a
exposure_time                                                          900.0
primary_optical_element                                                   rp
public_date                                      2019-03-31T21:59:57.843000Z

## Remove Non-Science Proposal IDs

In [72]:
def is_science_propid(s):
    is_science = bool(re.match('\w+\d{4}[AB]+-\d+', x))
    is_calib = bool(s == "calibrate")
    return is_science or is_calib

def remove_non_science_propids(df, show=False):
    l1 = set(df["PROPID"].unique())
    is_science_propid = lambda x: bool(re.match('\w+\d{4}[AB]+-\d+', x))
    output = df[ df["PROPID"].apply(is_science_propid) ]
    l2 = set(output["PROPID"].unique())

    if show:
        print("Science PropIDs: {}".format(l2), end="\n\n")
        print("Non-Science PropIDs: {}".format(l1 - l2), end="\n\n")
        print("Exposures with Request IDs:")
        for p, g in df[df["PROPID"].isin(l1-l2)].groupby("PROPID"):
            print(f"'{p}'", g["REQNUM"].notna().sum())

    return output

temp3 = remove_non_science_propids(temp2, show=True)

Science PropIDs: {'UTX2019A-003', 'FTPEPO2014A-004', 'CON2019A-005', 'KEY2017AB-003a-TC', 'LCO2018B-007', 'LCO2019A-001b', 'UTX2019A-004', 'KEY2017AB-004', 'SUPA2019A-005', 'CLN2019A-007', 'LCO2019A-008', 'LCO2019A-005', 'CLN2019A-005', 'NOAO2019A-023', 'NOAO2019A-003', 'NOAO2019A-014', 'LCO2019A-006', 'LCO2019A-004', 'KEY2017AB-002b-TC', 'NAOC2019A-004', 'KEY2017AB-001', 'HAW2019A-002', 'KEY2017AB-003d', 'TAU2019A-005', 'FTPEPO2017AB-001', 'TAU2019A-004', 'CLN2019A-001', 'CON2019A-002', 'NAOC2019A-002', 'SAAO2019A-006', 'NOAO2019A-005', 'SUPA2019A-001'}

Non-Science PropIDs: {'standard', 'auto_focus', 'calibrate', 'LCOEngineering'}

Exposures with Request IDs:
'LCOEngineering' 0
'auto_focus' 1893
'calibrate' 0
'standard' 120


### Exploration

In [73]:
temp2a = temp2[ temp2["PROPID"].isin(['no_proposal', 'standard', 'calibrate', 'LCOEngineering', 'auto_focus', 'COJ_calib']) ]
temp2a["REQNUM"].notna().sum()

2013

In [74]:
temp2["REQNUM"].isna().sum()

8453

In [75]:
for p, g in temp2a.groupby("PROPID"):
    print(p)
    print(g["OBSTYPE"].value_counts())
    print()

LCOEngineering
OBSTYPE
EXPERIMENTAL    628
EXPOSE          124
Name: count, dtype: int64

auto_focus
OBSTYPE
EXPERIMENTAL    2025
EXPOSE           443
Name: count, dtype: int64

calibrate
OBSTYPE
SKYFLAT         3475
DARK            1291
BIAS              62
EXPERIMENTAL      10
Name: count, dtype: int64

standard
OBSTYPE
STANDARD    462
EXPOSE      120
Name: count, dtype: int64



## Sort frames by Observation Datetime
Can do because it's only one telescope

NOTE: Do we actually need to do this, as we group everything by BLKUID anyway?

In [76]:
# temp5 = temp4.sort_values('datetime').reset_index(drop=True)

## Reduce Frames (ERROR)

ERROR - Reduced to way too few frames. Need to have a further look here.

For any non-calibration PropID, there is usually one 0-RLEVEL frame (There are a couple that are missing a 0-RLEVEL). Because of this, we will take the lowest RLEVEL frame, but we are putting this AFTER the stage where we remove all frames with no EXPTIME or AREA, because the e90_cat files are registering an RLEVEL of 0 even though their name implies they should be 90.


For any non-calibration PropID, there is always one 0-RLEVEL frame. LCO probably just discards the raw data from their calibration frames as they are only useful once they've been processed?

Group the frames by the exact datetime (this should result in one group for each actual exposure taken). Then just take the least processed data frame.
<br>
_NOTE_: Should it be the most? Does it make a difference?

In [77]:
def reduce_frames(df):
    obs_groups = df.groupby('datetime')
    expected_frames = len(obs_groups)

    if expected_frames == len(df):
        print("THIS DOES NOTHING - NO FRAMES REDUCED")

    sole_frames = []
    for d, g in obs_groups:
        minimum_rlevel = g["RLEVEL"].min() # For the odd case where there is no 0-RLEVEL frame.
        raw_frames = g[g["RLEVEL"]==minimum_rlevel]
        
        if len(raw_frames) > 1:
            # There is the odd occurence where there are 2 0-RLEVEL frames (or 90-RLEVEL frames if no 0),
            # but they should have the same attributes for everything except the URL, which we've already dropped.]
            # This just checks that that is definitely the case.
            for c in raw_frames.columns:
                try:
                    if len(raw_frames[c].unique()) > 1:
                        print("ERROR - Duplicate entries with different attributes")
                        print("Column: ", c)
                        print(raw_frames[c])
                        raise Error

                except TypeError:
                    print("Warning - Still trying to compare unhashable data types: ", c)

            sole_frames.append(raw_frames.head(1))

        else:
            sole_frames.append(raw_frames)

    new_df = pd.concat(sole_frames, axis=0)


    resultant_frames = len(new_df)
    if resultant_frames != expected_frames:
        print("Unexpected number of frames returned: {} expected, {} received.".format(\
            expected_frames, resultant_frames))
    return new_df

temp4 = reduce_frames(temp3).reset_index(drop=True)
print(temp3.shape, "-->", temp4.shape)

(9144, 15) --> (4830, 15)


In [78]:
temp4.reset_index(drop=True)

Unnamed: 0,datetime,BLKUID,EXPTIME,FILTER,INSTRUME,OBJECT,OBSTYPE,PROPID,REQNUM,RLEVEL,RA,DEC,SITEID,TELID,exptime_numeric
0,2019-03-01 00:00:00.000,466350445.0,85.0,rp,fa11,g3G052,EXPOSE,CLN2019A-007,1743939.0,0,139.937263,-36.227410,coj,1m0a,85.0
1,2019-03-01 09:33:03.073,466119910.0,10.0,gp,fa12,CVSO 1320,EXPOSE,NOAO2019A-005,1741698.0,0,83.178468,-6.226554,coj,1m0a,10.0
2,2019-03-01 11:33:00.500,466185706.0,10.0,gp,fa12,CVSO 191,EXPOSE,NOAO2019A-005,1723127.0,0,86.122576,-0.263000,coj,1m0a,10.0
3,2019-03-01 11:49:13.761,466194166.0,300.0,U,fa12,AT2018hyz,EXPOSE,TAU2019A-004,1743845.0,0,151.933344,1.471442,coj,1m0a,300.0
4,2019-03-01 11:54:40.893,466194166.0,300.0,U,fa12,AT2018hyz,EXPOSE,TAU2019A-004,1743845.0,0,151.933344,1.471442,coj,1m0a,300.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4825,2019-03-31 15:08:45.760,494193367.0,60.0,rp,fa12,obj_135836,EXPOSE,NOAO2019A-003,1737711.0,0,162.169648,-59.837254,coj,1m0a,60.0
4826,2019-03-31 15:11:16.752,494181751.0,90.0,rp,fa11,sn2019bka,EXPOSE,KEY2017AB-001,1760536.0,0,176.342098,19.491844,coj,1m0a,90.0
4827,2019-03-31 15:13:12.927,494181751.0,90.0,rp,fa11,sn2019bka,EXPOSE,KEY2017AB-001,1760536.0,0,176.342098,19.491844,coj,1m0a,90.0
4828,2019-03-31 15:15:24.222,494181751.0,90.0,ip,fa11,sn2019bka,EXPOSE,KEY2017AB-001,1760536.0,0,176.342098,19.491844,coj,1m0a,90.0


## Extract Science Blocks

### Is Valid Block?

In [79]:
def is_valid_block(block):
    return block["OBSTYPE"].isin(["EXPOSE", "SPECTRUM"]).any()

### Get Largest Intrablock Gap

In [80]:
def get_largest_intrablock_gap(df):
    # Sort rows based on start time
    # Iterate over rows, get gaps
    # Retain largest gap
    sdf = df.sort_values('datetime')
    largest_gap = 0
    for i in range(len(sdf)-1):
        current_ending = sdf.datetime.iloc[i] + dt.timedelta(\
            seconds=float(sdf.EXPTIME.iloc[i]))
        gap = (sdf.datetime.iloc[i+1] - current_ending).total_seconds()
        if gap > largest_gap:
            largest_gap = gap
    return largest_gap

### Extract Target, RA, and DEC (WIP)
* Can we just take the last Target of the block, or only the target of Science Frames (EXPOSE, SPECTRUM)?
* Does it even matter, if we just extract the RA and DEC instead?

In [81]:
def extract_target(block):
    num_targets = len(block.OBJECT.unique())
    
    if num_targets == 1:
        return block.OBJECT.unique()[0]
        
    elif num_targets == 0:
        return None
        
    elif num_targets == 2:
        for index, row in block.iterrows():
            # Take target from first Science frame (Expose or Spectrum)
            # as calibration frames might still be tracking previous targets.
            if row["OBSTYPE"] in ("EXPOSE", "SPECTRUM"):
                return row["OBJECT"]

        # Assuming that we have a block made only of Calibration frames, because the main science frame failed?
        # Need to verify that the pattern matches others in the same Request Number (if possible)
        # Might need manual verification on this one.
        # Look at the timings to pattern match.
        
        
        # Add in exception case for if no science frames are found
        print("ERROR - No valid science target found.")
        print(block)
        return None
                
    else:
        print("ERROR - More than 2 targets found in single block.")
        print(block)
        return None
        
        # if all(x in ('SPECTRUM','ARC','LAMPFLAT') for x in block.OBSTYPE):
        #     if len(block.iloc[2:].OBJECT.unique()) == 1:
        #         # Due to telescope not moving before initial calibration
        #         return block.OBJECT.iloc[-1]
        #     else:
        #         print("FAILED SUBSET CHECK:")
        #         print(block.iloc[2:])
        #         print("")
        #         print(block)
        #         print("\n")
        # else:
        #     print("FAILED OBSTYPE CHECK:")
        #     # print(block[["OBSTYPE", "OBJECT"]].value_counts())
        #     print(block[["OBSTYPE", "OBJECT", "EXPTIME"]])
        #     # print(block["OBSTYPE"].unique())
        #     # print(block["OBJECT"].unique())
        #     # print(block["OBJECT"].value_counts())
        #     # print(block)
        #     print("")

    # If all else fails, return tuple of all targets
    return tuple(sorted(block.OBJECT.unique()))

In [82]:
def extract_target(block):
    science_exposures = ["EXPOSE", "SPECTRUM"]
    science_frames = block[block["OBSTYPE"].isin(science_exposures)]
    num_targets = len(science_frames["OBJECT"].unique())

    if num_targets != 1:
        print("ERROR - Unexpected number of targets: {}".format(num_targets))
        print(block)
        return None

    return block[["OBJECT", "OBSTYPE", "RA", "DEC", "EXPTIME", "datetime"]]

In [83]:
extract_target(block)

Unnamed: 0,OBJECT,OBSTYPE,RA,DEC,EXPTIME,datetime
3754,obj_217550,EXPOSE,162.777177,-60.491834,60.0,2019-03-31 15:15:29.502


#### Testing - Compare RA and DEC
The RA and DEC is consistent with the Target (OBJECT) of the frame. As we are assuming that calibration frames may be done on previous targets without moving the telescope, those RAs and DECs should not be included when calculating the mean RA and DEC for the target of the observation.

In [84]:
for blkuid, block in temp4.groupby("BLKUID"):
    num_targets = len(block.OBJECT.unique())
    if num_targets > 1:
        # print(block)
        print(block[["OBJECT", "RA", "DEC"]].iloc[0].to_list())
        break

#### Testing - Check CATALOG against EXPOSE
Do CATALOG observations always have the same targets as the rest of the block?

RESULT: CATALOG observations never result in multiple targets. They do not appear to be calibration frames.

In [85]:
for blkuid, block in temp4.groupby("BLKUID"):
    exptypes = block["OBSTYPE"].unique()
    num_targets = len(block["OBJECT"].unique())
    if "CATALOG" in exptypes:
        if num_targets > 1:
            print(block)
            print()

#### Testing - Check CATALOG RA and DEC
CATALOG frames do not have valid RA and DEC (from not having a valid 'area' value), so they should all be filtered out.

In [86]:
for i, row in temp4.iterrows():
    if row["OBSTYPE"] == "CATALOG":
        if pd.notna(row["RA"]):
            print(row)
print("Done")

Done


#### Testing - Check CATALOG times relative to other observations

In [87]:
for blkuid, block in temp4.groupby("BLKUID"):
    exptypes = block["OBSTYPE"].unique()
    if ("CATALOG" in exptypes) and ("EXPOSE" not in exptypes):
        print(block[["OBSTYPE", "EXPTIME", "datetime", "OBJECT", "area", "PROPID", "REQNUM"]])
        print()

In [88]:
for blkuid, block in temp4.groupby("REQNUM"):
    exptypes = block["OBSTYPE"].unique()
    if ("CATALOG" in exptypes) and ("EXPOSE" not in exptypes):
        print(block[["OBSTYPE", "EXPTIME", "datetime", "OBJECT", "area", "PROPID", "REQNUM"]])
        print()

### WORK FROM HERE

In [89]:
for blkuid, block in temp4.groupby("BLKUID"):
    extract_target(block)
    # science_exposures = ("EXPOSE", "SPECTRUM", "CATALOG")
    # num_sci_exposures = len(block[block["OBSTYPE"].isin(science_exposures)])
    # if num_sci_exposures < 1:
    #     continue

    # num_sci_targets = len(block[block["OBSTYPE"].isin(science_exposures)]["OBJECT"].unique())
    # num_targets = len(block["OBJECT"].unique())
    
    # if num_targets > 1:
    #     if num_sci_targets != 1:
    #         print(num_targets, num_sci_targets)
    #         print(block["OBSTYPE"].unique())
    
    # # if num_sci_targets > 1:
    #     # print(block)

In [90]:
for blkuid, block in temp4.groupby("BLKUID"):
    if "NGC1818" in block["OBJECT"].unique():
        print(block)
        break

In [91]:
for i, row in temp4.iterrows():
    pass

### Get Pattern (WIP)
* Spectrums often overlap with a lot of their guiding observations
* Catalogs sometimes overlap with some of their exposure observations? Are they different?

In [92]:
def get_pattern(block):
    l1 = len(block)
    sblock = block.sort_values('datetime')
    pattern = []
    for row in sblock[['EXPTIME','INSTRUME','FILTER','OBSTYPE']].itertuples(\
        index=False):
        pattern.append( (float(row[0]),row[1],row[2],row[3]) )
    if len(pattern) < l1:
        print("PROBLEM - mismatched lengths")
    return pattern

### Condense Pattern

In [93]:
def condense_pattern(pattern_tuple):
    condensed_list = []
    current_style = pattern_tuple[0]
    current_count = 0
    for frame in pattern_tuple:
        if current_style == frame:
            current_count += 1
        else:
            condensed_list.append( (current_style,current_count) )
            current_style = frame
            current_count = 1
    condensed_list.append( (current_style, current_count) )
    return tuple(condensed_list)

### Get Pattern Length

In [94]:
def get_pattern_length(pattern_tuple):
    total = 0
    for e in pattern_tuple:
        for _ in range(e[1]):
            total += float(e[0][0])
    return total

### Get Block Times

In [95]:
def get_block_times(block):
    block_times = (block["datetime"] - block["datetime"].iloc[0]).to_list()
    rounded_block_times = [round(i.seconds/30) for i in block_times]
    return rounded_block_times

### Extract all Science Blocks

In [96]:
# Extract contiguous observing blocks
def extract_science_blocks(df):
    # Extract Blocks
    blkuid_groups = df.groupby('BLKUID')
    block_list = []
    
    for blkuid, block in blkuid_groups:

        # Check if the block is a valid science block
        # (It should have at least one EXPOSE or SPECTRUM observation).
        # (If not, assumed it's a failed block).
        if not is_valid_block(block):
            continue
        
        # Get subset of science frames
        block_sci = block[ block['OBSTYPE'].isin(['EXPOSE','SPECTRUM']) ]

        # PropID - Why do we do this and not ReqNum?
        propid_list = [ x for x in block.PROPID.unique() ]
        if len(propid_list) > 1:
            print("ERROR: Block with multiple science propids")
            print(propid_list)
            return None
        propid = propid_list[0]

        # Get the first and last rows of the block
        first_row = block.nsmallest(1,'datetime')
        last_row = block.nlargest(1,'datetime')

        # start_date
        start_date = first_row.datetime.iloc[0]
        
        # duration
        end_date = last_row.datetime.iloc[0] + dt.timedelta(
            seconds=float(last_row.EXPTIME.iloc[0]))
        duration = (end_date - start_date).total_seconds()
        
        # exposure_sum
        exposure_sum = block.EXPTIME.astype(float).sum() + finfo(float).eps
            # NOTE: Exclude blocks of Zero duration
        if exposure_sum == 0.0:
            continue
        
        # science_exposure_sum
        science_exposure_sum = block_sci.EXPTIME.astype(float).sum()
        
        # time_efficiency
        time_efficiency = round(exposure_sum / duration,5)
        
        # exposure_science_efficiency
        exposure_science_efficiency = round(science_exposure_sum / exposure_sum,5)
        
        # total_science_efficiency
        total_science_efficiency = round(science_exposure_sum / duration,5)
        
        # largest_gap
        largest_gap = get_largest_intrablock_gap(block)
        
        # targets
        target = extract_target(block)
        
        # mean_ra and mean_dec - CHANGE THIS
        # TO BE INCLUDED IN EXTRACT TARGET
        mean_ra = block.RA.mean() / 15. # In Hours
        mean_dec = block.DEC.mean()
        
        # moving - Difference of > 4 arcseconds over block
        if (abs(first_row.RA.iloc[0] - last_row.RA.iloc[0]) > 0.001) or \
            (abs(first_row.DEC.iloc[0] - last_row.DEC.iloc[0]) > 0.001):
            moving = True
        else:
            moving = False
       
        # pattern
        pattern = condense_pattern(get_pattern(block))

        # pattern_length
        pattern_length = get_pattern_length(pattern)
        
        # num_exposures
        num_exposures = len(block)
        
        # orphan
        if len(block_sci) == 0:
            orphan = True
        else:
            orphan = False
        
        # reqnum
        reqnum_list = block.REQNUM.unique()
        if len(reqnum_list) > 1:
            reqnum = tuple(sorted(reqnum_list))
        else:
            reqnum = reqnum_list[0]
        
        # instrument
        instrument_list = block.INSTRUME.unique()
        if len(instrument_list) > 1:
            instrument = tuple(sorted(instrument_list))
        else:
            instrument = instrument_list[0]
        
        # science_exposure_times
        science_exposure_times = tuple(block_sci.EXPTIME)

        # SiteIDs
        siteid_list = tuple(block["SITEID"].unique())

        # TelescopeIDs
        telid_list = tuple(block["TELID"].unique())

        # Block Times
        block_times = get_block_times(block)

        block_list.append({
            'blkuid': blkuid,
            'propid': propid,
            'start_date': start_date,
            'duration': duration,
            'exposure_sum': exposure_sum,
            'science_exposure_sum': science_exposure_sum,
            'time_efficiency': time_efficiency,
            'exposure_science_efficiency': exposure_science_efficiency,
            'total_science_efficiency': total_science_efficiency,
            'largest_gap': largest_gap,
            'target': target,
            'mean_ra': mean_ra,
            'mean_dec': mean_dec,
            'moving': moving,
            'pattern': pattern,
            'orphan': orphan,
            'reqnum': reqnum,
            'instrument': instrument,
            'num_exposures': num_exposures,
            'science_exposure_times': science_exposure_times,
            "siteids": siteid_list,
            "telids": telid_list,
            "block_times": block_times,
            "pattern_length": pattern_length,
            "rounded_ra": round(mean_ra, 1),
            "rounded_dec": round(mean_dec, 1)
        })

    return pd.DataFrame(block_list)

block_list = extract_science_blocks(temp4)
block_list

Unnamed: 0,blkuid,propid,start_date,duration,exposure_sum,science_exposure_sum,time_efficiency,exposure_science_efficiency,total_science_efficiency,largest_gap,...,reqnum,instrument,num_exposures,science_exposure_times,siteids,telids,block_times,pattern_length,rounded_ra,rounded_dec
0,466119910.0,NOAO2019A-005,2019-03-01 09:33:03.073,10.000,10.0,10.0,1.00000,1.0,1.00000,0.000,...,1741698.0,fa12,1,"(10.0,)","(coj,)","(1m0a,)",[0],10.0,5.5,-6.2
1,466185706.0,NOAO2019A-005,2019-03-01 11:33:00.500,10.000,10.0,10.0,1.00000,1.0,1.00000,0.000,...,1723127.0,fa12,1,"(10.0,)","(coj,)","(1m0a,)",[0],10.0,5.7,-0.3
2,466194166.0,TAU2019A-004,2019-03-01 11:49:13.761,2482.245,2120.0,2120.0,0.85407,1.0,0.85407,40.776,...,1743845.0,fa12,12,"(300.0, 300.0, 200.0, 200.0, 120.0, 120.0, 200...","(coj,)","(1m0a,)","[0, 11, 22, 30, 38, 43, 48, 56, 64, 68, 74, 79]",2120.0,10.1,1.5
3,466350445.0,CLN2019A-007,2019-03-01 00:00:00.000,60107.424,170.0,170.0,0.00283,1.0,0.00283,59937.424,...,1743939.0,fa11,2,"(85.0, 85.0)","(coj,)","(1m0a,)","[0, 2001]",170.0,9.3,-36.2
4,466351201.0,CLN2019A-007,2019-03-01 16:40:34.324,90.000,90.0,90.0,1.00000,1.0,1.00000,0.000,...,1743940.0,fa12,1,"(90.0,)","(coj,)","(1m0a,)",[0],90.0,9.3,-35.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
932,494182150.0,FTPEPO2017AB-001,2019-03-31 14:56:31.446,101.044,60.0,60.0,0.59380,1.0,0.59380,41.044,...,1758459.0,fa12,2,"(30.0, 30.0)","(coj,)","(1m0a,)","[0, 2]",60.0,13.2,-59.0
933,494187766.0,NOAO2019A-003,2019-03-31 15:00:36.569,60.000,60.0,60.0,1.00000,1.0,1.00000,0.000,...,1737709.0,fa12,1,"(60.0,)","(coj,)","(1m0a,)",[0],60.0,10.7,-60.4
934,494187769.0,NOAO2019A-003,2019-03-31 15:05:29.832,60.000,60.0,60.0,1.00000,1.0,1.00000,0.000,...,1737710.0,fa12,1,"(60.0,)","(coj,)","(1m0a,)",[0],60.0,10.8,-59.7
935,494193367.0,NOAO2019A-003,2019-03-31 15:08:45.760,60.000,60.0,60.0,1.00000,1.0,1.00000,0.000,...,1737711.0,fa12,1,"(60.0,)","(coj,)","(1m0a,)",[0],60.0,10.8,-59.8


In [97]:
block_list[["pattern", "pattern_length", "total_science_efficiency"]].sort_values("total_science_efficiency")

Unnamed: 0,pattern,pattern_length,total_science_efficiency
604,"(((60.0, fa12, rp, EXPOSE), 2),)",120.0,0.00274
3,"(((85.0, fa11, rp, EXPOSE), 2),)",170.0,0.00283
748,"(((5.0, fa11, U, EXPOSE), 1), ((1.0, fa11, B, ...",9.0,0.05089
692,"(((1.0, fa12, V, EXPOSE), 1), ((1.5, fa12, V, ...",2.5,0.05590
111,"(((120.0, fa12, ip, EXPOSE), 23),)",2760.0,0.07082
...,...,...,...
397,"(((60.0, fa12, rp, EXPOSE), 1),)",60.0,1.00000
398,"(((60.0, fa11, rp, EXPOSE), 1),)",60.0,1.00000
399,"(((60.0, fa12, rp, EXPOSE), 1),)",60.0,1.00000
389,"(((60.0, fa12, rp, EXPOSE), 1),)",60.0,1.00000


### Testing

#### Exploring Science Blocks

In [98]:
for r, g in block_list.groupby("reqnum"):
    # print(g["siteids"])
    if len(g["siteids"]) > 1:
        # print(g["siteids"])
        print(g["siteids"].unique())
        # break

[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]


In [99]:
g

Unnamed: 0,blkuid,propid,start_date,duration,exposure_sum,science_exposure_sum,time_efficiency,exposure_science_efficiency,total_science_efficiency,largest_gap,...,reqnum,instrument,num_exposures,science_exposure_times,siteids,telids,block_times,pattern_length,rounded_ra,rounded_dec
915,493981741.0,KEY2017AB-002b-TC,2019-03-31 08:54:46.672,15692.396,12400.0,12400.0,0.79019,1.0,0.79019,27.727,...,1760738.0,fa11,124,"(100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100...","(coj,)","(1m0a,)","[0, 4, 8, 13, 17, 21, 25, 30, 34, 38, 42, 46, ...",12400.0,8.1,-20.3


In [100]:
print(temp4.columns)
# print(bl1.columns)

Index(['datetime', 'BLKUID', 'EXPTIME', 'FILTER', 'INSTRUME', 'OBJECT',
       'OBSTYPE', 'PROPID', 'REQNUM', 'RLEVEL', 'RA', 'DEC', 'SITEID', 'TELID',
       'exptime_numeric'],
      dtype='object')


#### Ensuring each block only belongs to one request

In [101]:
for b, g in temp4.groupby("BLKUID"):
    if len(g["REQNUM"].unique()) > 1:
        print(b)

#### Checking calibration frames with Expose

In [102]:
for b, g in temp4.groupby("BLKUID"):
    obstypes = g["OBSTYPE"].unique()
    if "EXPOSE" in obstypes:
        print(obstypes)

['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']
['EXPOSE']

#### Checking for multiple science frames in a block

In [103]:
for b, g in temp4.groupby("BLKUID"):
    num_science_frames = g["OBSTYPE"].isin(["EXPOSE", "SPECTRUM"]).sum()
    if num_science_frames > 1:
        print(g[["OBSTYPE", "FILTER", "EXPTIME", "datetime"]])
        print()
        # print(num_science_frames)

   OBSTYPE FILTER  EXPTIME                datetime
3   EXPOSE      U    300.0 2019-03-01 11:49:13.761
4   EXPOSE      U    300.0 2019-03-01 11:54:40.893
5   EXPOSE      B    200.0 2019-03-01 12:00:21.517
6   EXPOSE      B    200.0 2019-03-01 12:04:08.445
7   EXPOSE      V    120.0 2019-03-01 12:08:07.945
8   EXPOSE      V    120.0 2019-03-01 12:10:34.973
9   EXPOSE     gp    200.0 2019-03-01 12:13:14.305
10  EXPOSE     gp    200.0 2019-03-01 12:17:01.426
11  EXPOSE     rp    120.0 2019-03-01 12:21:02.202
12  EXPOSE     rp    120.0 2019-03-01 12:23:28.954
13  EXPOSE     ip    120.0 2019-03-01 12:26:09.254
14  EXPOSE     ip    120.0 2019-03-01 12:28:36.006

   OBSTYPE FILTER  EXPTIME                datetime
0   EXPOSE     rp     85.0 2019-03-01 00:00:00.000
15  EXPOSE     rp     85.0 2019-03-01 16:40:22.424

   OBSTYPE FILTER  EXPTIME                datetime
19  EXPOSE     gp    100.0 2019-03-02 09:45:47.692
20  EXPOSE     gp    100.0 2019-03-02 09:47:54.864
21  EXPOSE     rp    100.0 20

## Incomplete Pattern Matching (WIP)

In [104]:
for r, g in block_list.groupby("reqnum"):
    # print(g["siteids"])
    if len(g["siteids"]) > 1:
        # print(g["siteids"])
        print(g["siteids"].unique())
        # break

[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]
[('coj',)]


## Full Data Cleaning

In [105]:
def clean_data(data_path):
    data_name = os.path.basename(os.path.normpath(data_path))
    
    if not os.path.isdir(data_path):
        print("Could not find relative directory '{}'".format(data_path))
        return
    if not os.path.isfile(pathjoin(data_path,'_complete')):
        print("Data at relative directory '{}' does not have '_complete' file".format(
            data_path))

    # Data exists
    print("Extracting data for files in '{}'...".format(data_name))

    print("Loading Dataframe...")
    raw = merge_datasets(data_path)

    print("- Original Frames: {}".format(len(raw)))
    
    print("Converting dates to datetime objects...")
    raw['datetime'] = raw['DATE_OBS'].apply(str_to_datetime)

    print("Extracting RA and Dec...")
    raw[['RA','DEC']] = raw['area'].apply(get_centroid)

    print("Converting EXPTIME to a numeric value...")
    raw["exptime_numeric"] = pd.to_numeric(raw["EXPTIME"])

    print("Removing frames with no EXPTIME or AREA...")
    df = remove_invalid_frames(raw)
    
    print("Dropping excess columns...")
    df = df[ desired_columns ]

    # print("Filling empty proposal IDs...")
    # df['PROPID'] = df['PROPID'].apply(fill_empty_proposal)

    print("Removing non-science proposals...")
    df = remove_non_science_propids(df)

    print("- Only-Science PropID Frames: {}".format(len(df)))
    
    print("Sorting frames by date...")
    df = df.sort_values('datetime').reset_index(drop=True)

    print("Reducing frames...")
    df = reduce_frames(df)
    print("- Reduced Frames: {}".format(len(df)))
    
    print("Extracting science blocks...")
    block_list = extract_science_blocks(df)

    print("- Science Blocks: {}".format(len(block_list)))

    return (df, block_list, raw)

---

# Extract Data

In [106]:
dirpaths = []
for (root, directories, files) in os.walk("archive_data_2019"):
    if len(directories) == 0:
        dirpaths.append(root)
dirpaths

['archive_data_2019\\2019-03\\coj_1m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\coj_2m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\cpt_1m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\elp_1m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\lsc_1m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\ogg_2m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-03\\tfn_1m0a_2019-03-01_2019-04-01',
 'archive_data_2019\\2019-04\\coj_1m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\coj_2m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\cpt_1m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\elp_1m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\lsc_1m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\ogg_2m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-04\\tfn_1m0a_2019-04-01_2019-05-01',
 'archive_data_2019\\2019-05\\coj_1m0a_2019-05-01_2019-06-01',
 'archive_data_2019\\2019-05\\coj_2m0a_2019-05-01_2019-

## Extract Data for COJ

In [109]:
for dpath in dirpaths:
    df = merge_datasets(dirpath)
    break
df

Unnamed: 0,id,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,...,SITEID,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames
0,11306273,coj1m011-fa12-20190331-0279-d00,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,coj,1m0a,900.000,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...",[11306276]
1,11306276,coj1m011-fa12-20190331-0279-d91,2019-03-31T21:59:57.843000Z,2019-03-31,calibrate,fa12,,91,coj,1m0a,...,coj,1m0a,900.000,rp,2019-03-31T21:59:57.843000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-96.4405...","[11306273, 11301884, 10504311]"
2,11306274,coj1m003-fa11-20190331-0233-d00,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,0,coj,1m0a,...,coj,1m0a,900.000,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...",[11306275]
3,11306275,coj1m003-fa11-20190331-0233-d91,2019-03-31T21:59:55.455000Z,2019-03-31,calibrate,fa11,,91,coj,1m0a,...,coj,1m0a,900.000,ip,2019-03-31T21:59:55.455000Z,DARK,493726579.0,,"{'type': 'Polygon', 'coordinates': [[[-82.6092...","[11306274, 11301886, 10496386]"
4,11306232,coj1m011-fa12-20190331-0278-d00,2019-03-31T21:44:33.270000Z,2019-03-31,calibrate,fa12,,0,coj,1m0a,...,coj,1m0a,900.000,rp,2019-03-31T21:44:33.270000Z,DARK,493726327.0,,"{'type': 'Polygon', 'coordinates': [[[-100.302...",[11306235]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21986,11144143,coj1m011-fa12-20190301-skyflat-bin1x1-Y,2019-03-01T01:28:19.485889Z,2019-03-01,calibrate,fa12,Flat,91,coj,1m0a,...,coj,1m0a,2.653,Y,2019-03-01T01:28:19.485889Z,SKYFLAT,465085876.0,,"{'type': 'Polygon', 'coordinates': [[[93.58544...","[11093840, 11083495, 11081159, 10504311]"
21987,11096269,coj1m003-fa11-20190301-0148-e00,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,0,coj,1m0a,...,coj,1m0a,85.000,rp,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...",[11096271]
21988,11096271,coj1m003-fa11-20190301-0148-e91,2019-03-01T00:00:00Z,2019-03-01,CLN2019A-007,fa11,g3G052,91,coj,1m0a,...,coj,1m0a,85.000,rp,2020-02-29T00:00:00Z,EXPOSE,466350445.0,1743939.0,"{'type': 'Polygon', 'coordinates': [[[139.9372...","[11096269, 10515225, 10496386, 10496488, 10496..."
21989,11094194,coj1m003-fa11-20190301-0063-s91,2019-03-01T00:00:00Z,2019-03-01,standard,fa11,L94,91,coj,1m0a,...,coj,1m0a,60.000,R,2020-02-29T00:00:00Z,STANDARD,465829675.0,,"{'type': 'Polygon', 'coordinates': [[[44.81003...","[11094188, 10496386, 10496548, 10496488, 10496..."


In [107]:
dfs = []
bls = []

for dpath in dirpaths:
    df = merge_datasets(dirpath)
    print(dpath)
    df1, bl1, raw1 = clean_data(dpath)
    dfs.append(df1)
    bls.append(bl1)

archive_data_2019\2019-03\coj_1m0a_2019-03-01_2019-04-01
Extracting data for files in 'coj_1m0a_2019-03-01_2019-04-01'...
Loading Dataframe...
- Original Frames: 21991
Converting dates to datetime objects...
Extracting RA and Dec...
Converting EXPTIME to a numeric value...
Removing frames with no EXPTIME or AREA...
Removed frames with 0s duration: (21991, 39) --> (17784, 39)
Removed frames with no specified area: (17784, 39) --> (17784, 39)
Dropping excess columns...
Removing non-science proposals...
- Only-Science PropID Frames: 9144
Sorting frames by date...
Reducing frames...
- Reduced Frames: 4830
Extracting science blocks...
- Science Blocks: 937
archive_data_2019\2019-03\coj_2m0a_2019-03-01_2019-04-01
Extracting data for files in 'coj_2m0a_2019-03-01_2019-04-01'...
Loading Dataframe...
- Original Frames: 1917
Converting dates to datetime objects...
Extracting RA and Dec...
Converting EXPTIME to a numeric value...
Removing frames with no EXPTIME or AREA...
Removed frames with 0s d

NameError: name 'Error' is not defined

In [None]:
df1, bl1, raw1 = clean_data(dirpaths[0])

## Extract Data for OGG

In [61]:
df2, bl2, raw2 = clean_data(dirpaths[1])

Extracting data for files in 'lco/ogg_2m0a_2016-02-01_2016-08-01'...
Loading Dataframe...
- Original Frames: 158391
Converting dates to datetime objects...
Extracting RA and Dec...
Converting EXPTIME to a numeric value...
Removing frames with no EXPTIME or AREA...
Removed frames with 0s duration: (158391, 25) --> (148863, 25)
Removed frames with no specified area: (148863, 25) --> (141851, 25)
Dropping excess columns...
Removing non-science proposals...
- Only-Science PropID Frames: 122119
Sorting frames by date...
Reducing frames...
- Reduced Frames: 110486
Extracting science blocks...
- Science Blocks: 1962


In [62]:
print(bl1.shape, bl2.shape, bl1.shape[0]+bl2.shape[0])

(695, 26) (1962, 26) 2657


## Join Dataframes

In [63]:
jbl = pd.concat([bl1, bl2], axis=0)

In [64]:
jbl

Unnamed: 0,blkuid,propid,start_date,duration,exposure_sum,science_exposure_sum,time_efficiency,exposure_science_efficiency,total_science_efficiency,largest_gap,...,reqnum,instrument,num_exposures,science_exposure_times,siteids,telids,block_times,pattern_length,rounded_ra,rounded_dec
0,76701806.0,KEY2014A-002,2016-02-01 10:14:29.486,1800.000,1800.0,1800.0,1.00000,1.00000,1.00000,0.000,...,487801.0,en05,1,"(1800.000000,)","(coj,)","(2m0a,)",[0],1800.0,1.9,-10.0
1,76705538.0,FTPEPO2014A-004,2016-02-01 11:14:18.716,30.000,30.0,30.0,1.00000,1.00000,1.00000,0.000,...,490067.0,fs01,1,"(30.000000,)","(coj,)","(2m0a,)",[0],30.0,3.0,-46.5
2,76706489.0,ARI2015B-001,2016-02-01 11:19:01.488,92.822,60.0,60.0,0.64640,1.00000,0.64640,32.822,...,486725.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-66.5
3,76707521.0,ARI2015B-001,2016-02-01 11:25:02.636,92.628,60.0,60.0,0.64775,1.00000,0.64775,32.628,...,486743.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-64.9
4,76708615.0,FTPEPO2014A-004,2016-02-01 11:32:53.661,419.341,360.0,360.0,0.85849,1.00000,0.85849,29.995,...,489777.0,fs01,3,"(120.000000, 120.000000, 120.000000)","(coj,)","(2m0a,)","[0, 5, 10]",360.0,2.2,-63.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1957,88134994.0,KEY2014A-002,2016-07-31 07:26:44.492,1541.005,2270.0,1200.0,1.47306,0.52863,0.77871,96.322,...,671626.0,"(en06, kb42)",103,"(600.000000, 600.000000)","(ogg,)","(2m0a,)","[0, 0, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, ...",2270.0,16.2,65.7
1958,88140030.0,KEY2014A-002,2016-07-31 08:16:58.691,2671.759,2520.0,2520.0,0.94320,1.00000,0.94320,30.756,...,672957.0,fs02,8,"(300.000000, 300.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 10, 22, 32, 43, 56, 69, 79]",2520.0,16.2,65.7
1959,88142502.0,FTP2016A-001,2016-07-31 09:09:58.421,200.000,200.0,200.0,1.00000,1.00000,1.00000,0.000,...,536501.0,fs02,1,"(200.000000,)","(ogg,)","(2m0a,)",[0],200.0,17.5,-29.4
1960,88142517.0,KEY2014A-003,2016-07-31 09:20:54.700,3606.364,3400.0,3400.0,0.94278,1.00000,0.94278,35.490,...,672012.0,fs02,10,"(400.000000, 400.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 14, 28, 39, 50, 64, 78, 89, 100, 110]",3400.0,17.3,28.4


In [67]:
for c in jbl.columns:
    print(c)
    print(jbl[c].iloc[0])
    print(jbl[c].dtype)
    print()

blkuid
76701806.0
float64

propid
KEY2014A-002
object

start_date
2016-02-01 10:14:29.486000
datetime64[ns]

duration
1800.0
float64

exposure_sum
1800.0
float64

science_exposure_sum
1800.0
float64

time_efficiency
1.0
float64

exposure_science_efficiency
1.0
float64

total_science_efficiency
1.0
float64

largest_gap
0.0
float64

target
        OBJECT   OBSTYPE         RA        DEC      EXPTIME  \
2  SDSS0153m10  SPECTRUM  28.323107 -10.003155  1800.000000   

                 datetime  
2 2016-02-01 10:14:29.486  
object

mean_ra
1.8882071263581635
float64

mean_dec
-10.00315497413299
float64

moving
False
bool

pattern
(((1800.0, 'en05', 'air', 'SPECTRUM'), 1),)
object

orphan
False
bool

reqnum
487801.0
float64

instrument
en05
object

num_exposures
1
int64

science_exposure_times
('1800.000000',)
object

siteids
('coj',)
object

telids
('2m0a',)
object

block_times
[0]
object

pattern_length
1800.0
float64

rounded_ra
1.9
float64

rounded_dec
-10.0
float64



# Data Exploration

In [None]:
bl2.sort_values("exposure_science_efficiency", ascending=True).head(10)

In [60]:
bl2["pattern"].value_counts(ascending=False)

pattern
(((200.000000, fs02, ip, EXPOSE), 1),)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          

### Check Joint Block List for Request Numbers that span both sites

In [105]:
for r, g in jbl.groupby("reqnum"):
    if len(g["siteids"].unique()) > 1:
        print(g[["pattern", "siteids", "target"]])

                                               pattern siteids   target
51   (((60.000000, en05, air, ARC), 1), ((20.000000...  (coj,)  1324+03
139  (((10.000000, kb42, LL, GUIDE), 24), ((1800.00...  (ogg,)  1324+03
                                               pattern siteids   target
270  (((70.000000, fs01, B, CATALOG), 1), ((70.0000...  (coj,)  NGC2264
225  (((70.000000, fs02, V, EXPOSE), 1), ((70.00000...  (ogg,)  NGC2264
                                               pattern siteids   target
143  (((60.000000, en05, air, ARC), 1), ((20.000000...  (coj,)         
233               (((10.000000, kb42, LL, GUIDE), 7),)  (ogg,)  NGC3227
234  (((10.000000, kb42, LL, GUIDE), 5), ((1200.000...  (ogg,)  NGC3227
                                               pattern siteids   target
173  (((60.000000, en05, air, ARC), 1), ((20.000000...  (coj,)  1324+03
271  (((10.000000, kb42, LL, GUIDE), 4), ((1800.000...  (ogg,)  1324+03
                                               pattern siteids  

In [110]:
for r, g in jbl.groupby("reqnum"):
    s_target = set(g["target"].unique())
    s_target.discard("")
    s_target.discard(None)
    if len(s_target) > 1:
        print(g["target"].unique())
        print(g)
        print()

['L745-46A' 'Feige66']
         blkuid        propid              start_date  duration  exposure_sum  \
322  78490575.0  ANU2015B-003 2016-02-24 15:07:37.112    60.000          60.0   
323  78491163.0  ANU2015B-003 2016-02-24 15:09:43.500   589.237         300.0   

     science_exposure_sum  time_efficiency  exposure_science_efficiency  \
322                   0.0          1.00000                          0.0   
323                  60.0          0.50913                          0.2   

     total_science_efficiency  largest_gap  ...   mean_dec  moving  \
322                   0.00000        0.000  ... -17.411628   False   
323                   0.10183      161.312  ...   8.093719    True   

                                               pattern  orphan    reqnum  \
322           (((60.000000, en05, air, LAMPFLAT), 1),)    True  492924.0   
323  (((60.000000, en05, air, LAMPFLAT), 1), ((60.0...   False  492924.0   

     instrument  num_exposures science_exposure_times  siteids   te

## Checking Requests with Multiple Targets?

In [140]:
test_df = df1

for r, g in test_df.groupby("REQNUM"):
    target_list = set()
    for b, g2 in g.groupby("BLKUID"):
        target_list.add(g2["OBJECT"].unique()[-1])
    target_list.discard("")
    if len(target_list) > 1:
        print(target_list)
        print(g[["datetime", "BLKUID", "OBJECT", "OBSTYPE", "EXPTIME"]])
        print(g[["OBSTYPE", "OBJECT"]].value_counts())

{'Feige66', 'L745-46A'}
                    datetime      BLKUID    OBJECT   OBSTYPE    EXPTIME
1689 2016-02-24 15:07:37.112  78490575.0  L745-46A  LAMPFLAT  60.000000
1690 2016-02-24 15:09:43.500  78491163.0  L745-46A  LAMPFLAT  60.000000
1691 2016-02-24 15:11:45.776  78491163.0  L745-46A       ARC  60.000000
1692 2016-02-24 15:15:27.088  78491163.0   Feige66  SPECTRUM  60.000000
1693 2016-02-24 15:16:56.409  78491163.0   Feige66       ARC  60.000000
1694 2016-02-24 15:18:32.737  78491163.0   Feige66  LAMPFLAT  60.000000
OBSTYPE   OBJECT  
LAMPFLAT  L745-46A    2
ARC       Feige66     1
          L745-46A    1
LAMPFLAT  Feige66     1
SPECTRUM  Feige66     1
Name: count, dtype: int64
{'L745-46A', 'TYC 8907-679-1'}
                    datetime      BLKUID          OBJECT   OBSTYPE     EXPTIME
1655 2016-02-24 11:51:17.202  78479694.0        L745-46A  LAMPFLAT   60.000000
1656 2016-02-24 11:53:18.440  78479694.0        L745-46A       ARC   60.000000
1657 2016-02-24 12:20:11.110  78480665.

In [152]:
valid = []
not_valid = []
for r, g in test_df.groupby("REQNUM"):
    has_valid_block = False
    block_count = 0
    for b, g2 in g.groupby("BLKUID"):
        if g2["OBSTYPE"].isin(["EXPOSE", "SPECTRUM"]).any():
            has_valid_block = True
        block_count += 1

    if block_count > 10:
        print(g)
    
    if has_valid_block:
        valid.append(block_count)
    else:
        not_valid.append(block_count)
        
# print(valid, not_valid)
for i in set(valid):
    print(i, valid.count(i))
print()

for i in set(not_valid):
    print(i, not_valid.count(i))

                    datetime      BLKUID    EXPTIME FILTER INSTRUME   OBJECT  \
3318 2016-05-22 10:13:35.003  83300610.0  60.000000    air     en05  1435+02   
3319 2016-05-22 10:15:20.837  83300610.0  20.000000    air     en05  1435+02   
3320 2016-05-22 11:08:44.097  83303184.0  60.000000    air     en05  1435+02   
3321 2016-05-22 11:10:25.979  83303184.0  20.000000    air     en05  1435+02   
3322 2016-05-22 11:24:22.463  83304895.0  60.000000    air     en05  1435+02   
3323 2016-05-22 11:26:04.272  83304895.0  20.000000    air     en05  1435+02   
3324 2016-05-22 11:35:11.455  83305751.0  60.000000    air     en05  1435+02   
3325 2016-05-22 11:36:53.003  83305751.0  20.000000    air     en05  1435+02   
3326 2016-05-22 11:54:44.848  83306606.0  60.000000    air     en05  1435+02   
3327 2016-05-22 11:56:33.012  83306606.0  20.000000    air     en05  1435+02   
3328 2016-05-22 11:59:53.611  83307037.0  60.000000    air     en05  1435+02   
3329 2016-05-22 12:01:40.079  83307037.0

### Ensuring Blocks only occur at one telescope
At least in this paradigm, where there is little overlap between the telescopes?
It makes no sense for blocks to be spread across multiple telescopes, as the blocks include calibration frames like Arcs and Lampflats, and these are meaningless if not done on the same telescope as the scientific observation.

# Data Products

## RA and Dec Distribution

In [53]:
jbl.columns

Index(['blkuid', 'propid', 'start_date', 'duration', 'exposure_sum',
       'science_exposure_sum', 'time_efficiency',
       'exposure_science_efficiency', 'total_science_efficiency',
       'largest_gap', 'target', 'mean_ra', 'mean_dec', 'moving', 'pattern',
       'orphan', 'reqnum', 'instrument', 'num_exposures',
       'science_exposure_times', 'siteids', 'telids', 'block_times',
       'pattern_length', 'rounded_ra', 'rounded_dec'],
      dtype='object')

In [58]:
jbl["target"]

0               OBJECT   OBSTYPE         RA        DEC...
1          OBJECT OBSTYPE         RA        DEC    EXP...
2            OBJECT OBSTYPE         RA        DEC    E...
3            OBJECT OBSTYPE         RA        DEC    E...
4            OBJECT OBSTYPE         RA        DEC     ...
                              ...                        
1957            OBJECT   OBSTYPE          RA        DE...
1958            OBJECT OBSTYPE          RA        DEC ...
1959                   OBJECT OBSTYPE          RA     ...
1960              OBJECT OBSTYPE          RA        DE...
1961              OBJECT OBSTYPE          RA       DEC...
Name: target, Length: 2657, dtype: object

In [57]:
target_dict = {}
for i, row in jbl.iterrows():
    target = row["target"]
    ra = row["mean_ra"]
    dec = row["mean_dec"]
    print(target)
    print(row)
    break

        OBJECT   OBSTYPE         RA        DEC      EXPTIME  \
2  SDSS0153m10  SPECTRUM  28.323107 -10.003155  1800.000000   

                 datetime  
2 2016-02-01 10:14:29.486  
blkuid                                                                76701806.0
propid                                                              KEY2014A-002
start_date                                            2016-02-01 10:14:29.486000
duration                                                                  1800.0
exposure_sum                                                              1800.0
science_exposure_sum                                                      1800.0
time_efficiency                                                              1.0
exposure_science_efficiency                                                  1.0
total_science_efficiency                                                     1.0
largest_gap                                                                  0.0
target 

In [None]:
pattern_dict = {}
for p, c in jbl["pattern", "target"].value_counts().to_dict().items():
    for x in p:
        t = x[0][0]
        y = x[0][3]
        r = x[1]
        for _ in range(r):
            print(t, y)
    print()

    translated_pattern = []
    for i in p:
        frame_length = i[0][0]
        instances = i[1]
        # print(frame_length, instances)
        for _ in range(instances):
            translated_pattern.append(frame_length)

    pattern_tuple = tuple(translated_pattern)

    if pattern_tuple not in pattern_dict:
        pattern_dict[pattern_tuple] = {
            "count": 0,
            "targets": []
        }

    pattern_dict[pattern_tuple]["count"] += c
    
sorted(pattern_dict.items())

In [51]:
total = 0

for p, c in pattern_dict.items():
    total += c

patterns = {}
counter = 0
running_total = 0

for p, c in pattern_dict.items():
    print(c)
    running_total += c / total
    patterns[counter] = {
        "pattern": p,
        "count": c,
        "cumulative_probability": running_total
    }
    counter += 1

json.dump(patterns, open("observation_patterns_v2.json", "w"), indent=4)

Unnamed: 0,blkuid,propid,start_date,duration,exposure_sum,science_exposure_sum,time_efficiency,exposure_science_efficiency,total_science_efficiency,largest_gap,...,reqnum,instrument,num_exposures,science_exposure_times,siteids,telids,block_times,pattern_length,rounded_ra,rounded_dec
0,76701806.0,KEY2014A-002,2016-02-01 10:14:29.486,1800.000,1800.0,1800.0,1.00000,1.00000,1.00000,0.000,...,487801.0,en05,1,"(1800.000000,)","(coj,)","(2m0a,)",[0],1800.0,1.9,-10.0
1,76705538.0,FTPEPO2014A-004,2016-02-01 11:14:18.716,30.000,30.0,30.0,1.00000,1.00000,1.00000,0.000,...,490067.0,fs01,1,"(30.000000,)","(coj,)","(2m0a,)",[0],30.0,3.0,-46.5
2,76706489.0,ARI2015B-001,2016-02-01 11:19:01.488,92.822,60.0,60.0,0.64640,1.00000,0.64640,32.822,...,486725.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-66.5
3,76707521.0,ARI2015B-001,2016-02-01 11:25:02.636,92.628,60.0,60.0,0.64775,1.00000,0.64775,32.628,...,486743.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-64.9
4,76708615.0,FTPEPO2014A-004,2016-02-01 11:32:53.661,419.341,360.0,360.0,0.85849,1.00000,0.85849,29.995,...,489777.0,fs01,3,"(120.000000, 120.000000, 120.000000)","(coj,)","(2m0a,)","[0, 5, 10]",360.0,2.2,-63.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1957,88134994.0,KEY2014A-002,2016-07-31 07:26:44.492,1541.005,2270.0,1200.0,1.47306,0.52863,0.77871,96.322,...,671626.0,"(en06, kb42)",103,"(600.000000, 600.000000)","(ogg,)","(2m0a,)","[0, 0, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, ...",2270.0,16.2,65.7
1958,88140030.0,KEY2014A-002,2016-07-31 08:16:58.691,2671.759,2520.0,2520.0,0.94320,1.00000,0.94320,30.756,...,672957.0,fs02,8,"(300.000000, 300.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 10, 22, 32, 43, 56, 69, 79]",2520.0,16.2,65.7
1959,88142502.0,FTP2016A-001,2016-07-31 09:09:58.421,200.000,200.0,200.0,1.00000,1.00000,1.00000,0.000,...,536501.0,fs02,1,"(200.000000,)","(ogg,)","(2m0a,)",[0],200.0,17.5,-29.4
1960,88142517.0,KEY2014A-003,2016-07-31 09:20:54.700,3606.364,3400.0,3400.0,0.94278,1.00000,0.94278,35.490,...,672012.0,fs02,10,"(400.000000, 400.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 14, 28, 39, 50, 64, 78, 89, 100, 110]",3400.0,17.3,28.4


## Observation Distribution (without Patterns)

In [127]:
jdf = pd.concat([df1, df2], axis=0).reset_index(drop=True)
jdf

Unnamed: 0,datetime,BLKUID,EXPTIME,FILTER,INSTRUME,OBJECT,OBSTYPE,PROPID,REQNUM,RLEVEL,RA,DEC,SITEID,TELID,exptime_numeric
0,2016-02-01 10:14:29.486,76701806.0,1800.000000,air,en05,SDSS0153m10,SPECTRUM,KEY2014A-002,487801.0,0,28.323107,-10.003155,coj,2m0a,1800.0
1,2016-02-01 11:14:18.716,76705538.0,30.000000,V,fs01,g16aar,CATALOG,FTPEPO2014A-004,490067.0,0,,,coj,2m0a,30.0
2,2016-02-01 11:19:01.488,76706489.0,30.000000,V,fs01,NGC1818,EXPOSE,ARI2015B-001,486725.0,0,76.095609,-66.452102,coj,2m0a,30.0
3,2016-02-01 11:20:04.310,76706489.0,30.000000,I,fs01,NGC1818,CATALOG,ARI2015B-001,486725.0,0,,,coj,2m0a,30.0
4,2016-02-01 11:25:02.636,76707521.0,30.000000,V,fs01,NGC1831,EXPOSE,ARI2015B-001,486743.0,0,76.610291,-64.937112,coj,2m0a,30.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114652,2016-07-31 12:07:21.065,88146486.0,133.000000,V,fs02,1999 JU3,EXPOSE,LCO2016A-003,671156.0,0,292.205581,-5.173229,ogg,2m0a,133.0
114653,2016-07-31 12:10:04.338,88146486.0,133.000000,R,fs02,1999 JU3,EXPOSE,LCO2016A-003,671156.0,0,292.204543,-5.173259,ogg,2m0a,133.0
114654,2016-07-31 12:12:32.367,88146486.0,133.000000,R,fs02,1999 JU3,EXPOSE,LCO2016A-003,671156.0,0,292.203606,-5.173286,ogg,2m0a,133.0
114655,2016-07-31 12:15:26.020,88146486.0,133.000000,B,fs02,1999 JU3,EXPOSE,LCO2016A-003,671156.0,0,292.202508,-5.173318,ogg,2m0a,133.0


In [130]:
jdf["exptime_numeric"].value_counts().to_dict()

{10.0: 100808,
 60.0: 2651,
 20.0: 2143,
 300.0: 1217,
 200.0: 844,
 6.0: 823,
 120.0: 812,
 80.0: 761,
 30.0: 698,
 600.0: 450,
 133.0: 367,
 180.0: 326,
 100.0: 247,
 400.0: 230,
 1200.0: 217,
 1800.0: 191,
 3600.0: 163,
 15.0: 161,
 35.0: 140,
 55.0: 124,
 40.0: 122,
 50.0: 110,
 150.0: 92,
 2700.0: 91,
 240.0: 91,
 360.0: 84,
 4.0: 84,
 90.0: 66,
 0.0: 52,
 195.0: 48,
 25.0: 42,
 0.8: 40,
 70.0: 35,
 3000.0: 29,
 45.0: 24,
 8.0: 23,
 1500.0: 21,
 5.0: 18,
 125.0: 18,
 250.0: 14,
 233.0: 12,
 140.0: 11,
 29.0: 11,
 1.0: 10,
 900.0: 8,
 143.0: 8,
 78.0: 8,
 121.0: 8,
 46.0: 8,
 210.0: 6,
 82.0: 6,
 28.0: 5,
 32.0: 4,
 41.0: 4,
 232.5: 4,
 3.0: 4,
 130.0: 4,
 95.0: 4,
 700.0: 4,
 260.0: 4,
 220.0: 3,
 105.0: 3,
 12.0: 3,
 500.0: 3,
 2.0: 3,
 480.0: 3,
 170.0: 3,
 160.0: 3,
 321.0: 3,
 4000.0: 3,
 34.0: 2,
 4500.0: 2,
 320.0: 2,
 1100.0: 2,
 9.0: 2,
 1250.0: 1,
 2400.0: 1,
 800.0: 1,
 482.0: 1,
 75.0: 1,
 341.0: 1,
 33.0: 1}

## Pattern Distribution for Request Generation

In [49]:
jbl

Unnamed: 0,blkuid,propid,start_date,duration,exposure_sum,science_exposure_sum,time_efficiency,exposure_science_efficiency,total_science_efficiency,largest_gap,...,reqnum,instrument,num_exposures,science_exposure_times,siteids,telids,block_times,pattern_length,rounded_ra,rounded_dec
0,76701806.0,KEY2014A-002,2016-02-01 10:14:29.486,1800.000,1800.0,1800.0,1.00000,1.00000,1.00000,0.000,...,487801.0,en05,1,"(1800.000000,)","(coj,)","(2m0a,)",[0],1800.0,1.9,-10.0
1,76705538.0,FTPEPO2014A-004,2016-02-01 11:14:18.716,30.000,30.0,30.0,1.00000,1.00000,1.00000,0.000,...,490067.0,fs01,1,"(30.000000,)","(coj,)","(2m0a,)",[0],30.0,3.0,-46.5
2,76706489.0,ARI2015B-001,2016-02-01 11:19:01.488,92.822,60.0,60.0,0.64640,1.00000,0.64640,32.822,...,486725.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-66.5
3,76707521.0,ARI2015B-001,2016-02-01 11:25:02.636,92.628,60.0,60.0,0.64775,1.00000,0.64775,32.628,...,486743.0,fs01,2,"(30.000000, 30.000000)","(coj,)","(2m0a,)","[0, 2]",60.0,5.1,-64.9
4,76708615.0,FTPEPO2014A-004,2016-02-01 11:32:53.661,419.341,360.0,360.0,0.85849,1.00000,0.85849,29.995,...,489777.0,fs01,3,"(120.000000, 120.000000, 120.000000)","(coj,)","(2m0a,)","[0, 5, 10]",360.0,2.2,-63.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1957,88134994.0,KEY2014A-002,2016-07-31 07:26:44.492,1541.005,2270.0,1200.0,1.47306,0.52863,0.77871,96.322,...,671626.0,"(en06, kb42)",103,"(600.000000, 600.000000)","(ogg,)","(2m0a,)","[0, 0, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, ...",2270.0,16.2,65.7
1958,88140030.0,KEY2014A-002,2016-07-31 08:16:58.691,2671.759,2520.0,2520.0,0.94320,1.00000,0.94320,30.756,...,672957.0,fs02,8,"(300.000000, 300.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 10, 22, 32, 43, 56, 69, 79]",2520.0,16.2,65.7
1959,88142502.0,FTP2016A-001,2016-07-31 09:09:58.421,200.000,200.0,200.0,1.00000,1.00000,1.00000,0.000,...,536501.0,fs02,1,"(200.000000,)","(ogg,)","(2m0a,)",[0],200.0,17.5,-29.4
1960,88142517.0,KEY2014A-003,2016-07-31 09:20:54.700,3606.364,3400.0,3400.0,0.94278,1.00000,0.94278,35.490,...,672012.0,fs02,10,"(400.000000, 400.000000, 300.000000, 300.00000...","(ogg,)","(2m0a,)","[0, 14, 28, 39, 50, 64, 78, 89, 100, 110]",3400.0,17.3,28.4


In [50]:
jbl["pattern"].value_counts()

pattern
(((200.0, fs02, ip, EXPOSE), 1),)                                                                                                                                                                                                    141
(((10.0, kb42, LL, GUIDE), 5), ((1200.0, en06, air, SPECTRUM), 1), ((10.0, kb42, LL, GUIDE), 92), ((60.0, en06, air, ARC), 1), ((20.0, en06, air, LAMPFLAT), 1))                                                                      65
(((10.0, kb42, LL, GUIDE), 5), ((600.0, en06, air, SPECTRUM), 1), ((10.0, kb42, LL, GUIDE), 47), ((600.0, en06, air, SPECTRUM), 1), ((10.0, kb42, LL, GUIDE), 47), ((60.0, en06, air, ARC), 1), ((20.0, en06, air, LAMPFLAT), 1))     57
(((10.0, kb42, LL, GUIDE), 5), ((600.0, en06, air, SPECTRUM), 1), ((10.0, kb42, LL, GUIDE), 47), ((600.0, en06, air, SPECTRUM), 1), ((10.0, kb42, LL, GUIDE), 46), ((60.0, en06, air, ARC), 1), ((20.0, en06, air, LAMPFLAT), 1))     55
(((300.0, fs02, up, EXPOSE), 2), ((60.0, fs02, gp, EXPOSE), 

In [51]:
for i, row in jbl.iterrows():
    if row["time_efficiency"] > 1:
        exp_types = set()
        for e in row["pattern"]:
            exp_types.add(e[0][3])

        if "CATALOG" in exp_types:
            print(exp_types)
        
            print(row["time_efficiency"])
            print(row["pattern"])
            print(row["block_times"])

            print()

In [52]:
for pattern, group in jbl.groupby("pattern"):
    # print(group.columns)
    # print(pattern)
    
    print(group["time_efficiency"])
    # break

1835    0.04840
1837    0.04798
Name: time_efficiency, dtype: float64
270    0.04717
273    0.04727
276    0.04738
Name: time_efficiency, dtype: float64
73    0.11536
Name: time_efficiency, dtype: float64
1297    0.12946
Name: time_efficiency, dtype: float64
654    0.23081
664    0.23087
Name: time_efficiency, dtype: float64
1933    0.26173
Name: time_efficiency, dtype: float64
1900    0.22789
1935    0.22797
1947    0.22608
1954    0.22806
Name: time_efficiency, dtype: float64
1871    0.21668
Name: time_efficiency, dtype: float64
420     0.20073
568     0.19935
595     0.19710
736     0.19942
1549    0.19590
Name: time_efficiency, dtype: float64
275    0.28877
Name: time_efficiency, dtype: float64
987    0.35618
Name: time_efficiency, dtype: float64
655    0.41613
Name: time_efficiency, dtype: float64
683    0.38686
Name: time_efficiency, dtype: float64
734    1.0
Name: time_efficiency, dtype: float64
737     0.32231
865     0.33118
1088    0.33279
1670    0.33053
Name: time_efficienc

In [53]:
pattern_dict = {}
for p, c in jbl["pattern"].value_counts().to_dict().items():
    for x in p:
        t = x[0][0]
        y = x[0][3]
        r = x[1]
        for _ in range(r):
            print(t, y)
    print()

    translated_pattern = []
    for i in p:
        frame_length = i[0][0]
        instances = i[1]
        # print(frame_length, instances)
        for _ in range(instances):
            translated_pattern.append(frame_length)

    pattern_tuple = tuple(translated_pattern)
    if pattern_tuple not in pattern_dict:
        pattern_dict[pattern_tuple] = 0
    pattern_dict[pattern_tuple] += c
    
sorted(pattern_dict.items())

200.0 EXPOSE

10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
1200.0 SPECTRUM
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10.0 GUIDE
10

[((0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8,
   0.8),
  2),
 ((1.0, 1.0, 1.0), 3),
 ((2.0, 2.0, 5.0, 2.0, 5.0, 5.0), 1),
 ((3.0, 3.0, 3.0), 1),
 ((4.0, 4.0, 4.0, 4.0), 1),
 ((4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0), 6),
 ((4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0,
   4.0),
  1),
 ((5.0, 5.0, 5.0), 5),
 ((6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
   6.0,
 

In [54]:
sorted(pattern_dict.items(), reverse=True, key=lambda x: x[1])

[((200.0,), 274),
 ((200.0, 200.0, 200.0), 114),
 ((10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   1200.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   10.0,
   60.0,
   20.0),
  65),
 ((120.0, 120.0, 120.0), 58),
 ((10.0,
  

In [55]:
total = 0
for p, c in pattern_dict.items():
    total += c
print(total)

2657


In [56]:
import random
rnum = random.randint(0, total)

running_total = 0

for p, c in sorted(pattern_dict.items(), reverse=True, key=lambda x: x[1]):
    running_total += c
    if running_total >= rnum:
        print(p)
        break

(20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.

In [57]:
total = 0

for p, c in pattern_dict.items():
    total += c

patterns = {}
counter = 0
running_total = 0

for p, c in pattern_dict.items():
    print(c)
    running_total += c / total
    patterns[counter] = {
        "pattern": p,
        "count": c,
        "cumulative_probability": running_total
    }
    counter += 1

json.dump(patterns, open("observation_patterns_v2.json", "w"), indent=4)

274
65
57
55
53
114
55
49
48
45
57
37
36
34
34
52
33
33
47
36
28
27
23
22
26
45
40
20
18
27
18
18
17
58
19
15
15
16
19
34
24
12
18
11
19
10
10
19
10
9
9
18
8
8
8
8
8
7
7
10
7
7
6
6
10
5
11
7
5
5
5
5
5
4
7
5
4
4
5
11
5
11
6
4
4
5
4
6
4
4
4
5
5
4
4
5
10
4
3
3
3
4
5
7
3
3
5
3
3
3
3
3
7
3
3
3
3
3
3
3
3
8
2
2
2
2
2
4
2
2
2
2
2
2
2
2
2
6
2
2
2
2
2
2
2
2
2
2
3
2
2
4
4
2
2
2
2
3
2
2
2
2
2
2
2
2
2
2
2
3
2
4
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
3
1
1
1
1
1
1
1
2
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
1
1
1
1
1
1
1
3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1


In [58]:
t = 0
for i, p in patterns.items():
    print(p["cumulative_probability"])

0.10312382386149793
0.12758750470455402
0.14904027098231087
0.16974030861874295
0.18968761761385022
0.23259315016936397
0.253293187805796
0.2717350395182537
0.28980052691004893
0.30673692133985697
0.32818968761761386
0.3421151674821227
0.35566428302596914
0.3684606699284908
0.38125705683101246
0.40082800150545733
0.4132480240873166
0.42566804666917585
0.4433571697403087
0.45690628528415517
0.4674444862627024
0.47760632292058725
0.48626270229582247
0.4945427173503953
0.5043281896876177
0.5212645841174258
0.5363191569439218
0.5438464433571699
0.5506210011290932
0.560782837786978
0.5675573955589013
0.5743319533308245
0.5807301467820853
0.6025592773805046
0.6097101994730902
0.6153556642830262
0.6210011290929622
0.6270229582235607
0.6341738803161463
0.6469702672186679
0.6560030109145656
0.6605193827625144
0.6672939405344377
0.6714339480617241
0.6785848701543097
0.6823485133609337
0.6861121565675578
0.6932630786601434
0.6970267218667674
0.700414000752729
0.7038012796386905
0.7105758374106138

In [74]:
jbl[["pattern", "mean_ra", "mean_dec", "target"]]

Unnamed: 0,pattern,mean_ra,mean_dec,target
0,"(((1800.0, en05, air, SPECTRUM), 1),)",1.888207,-10.003155,SDSS0153m10
1,"(((30.0, fs01, V, EXPOSE), 1), ((30.0, fs01, I...",5.073041,-66.452102,NGC1818
2,"(((30.0, fs01, V, EXPOSE), 1), ((30.0, fs01, I...",5.107353,-64.937112,NGC1831
3,"(((120.0, fs01, V, CATALOG), 1), ((120.0, fs01...",2.150202,-63.253123,G16 aar
4,"(((1800.0, en05, air, SPECTRUM), 1), ((80.0, e...",9.709387,-16.953073,AT 2016zb
...,...,...,...,...
1845,"(((10.0, kb42, LL, GUIDE), 5), ((600.0, en06, ...",16.230727,65.710628,Mrk876
1846,"(((300.0, fs02, gp, EXPOSE), 2), ((300.0, fs02...",16.235147,65.702387,Mrk876
1847,"(((200.0, fs02, I, EXPOSE), 1),)",17.478693,-29.379054,XTE J1728-295
1848,"(((400.0, fs02, B, EXPOSE), 2), ((300.0, fs02,...",17.278916,28.353039,ptf16bad


---

# Data Exploration

In [272]:
raw

Unnamed: 0,id,basename,area,related_frames,version_set,filename,url,RLEVEL,DAY_OBS,DATE_OBS,...,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,datetime,RA,DEC,exptime_numeric
0,4560853,coj2m002-fs01-20160731-0042-d00,"{'type': 'Polygon', 'coordinates': [[[14.62160...",[19720335],"[{'id': 20838003, 'created': '2020-03-16T13:37...",coj2m002-fs01-20160731-0042-d00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-07-31,2016-07-31T06:26:36.154000Z,...,900.000000,air,2016-07-31T06:26:36.154000Z,DARK,88091935.0,,2016-07-31 06:26:36.154,14.557767,-58.876407,900.0
1,19720335,coj2m002-fs01-20160731-0042-d91,"{'type': 'Polygon', 'coordinates': [[[14.62134...","[4560853, 3536542]","[{'id': 20843168, 'created': '2020-03-16T15:23...",coj2m002-fs01-20160731-0042-d91.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,91,2016-07-31,2016-07-31T06:26:36.154000Z,...,900.000000,air,2016-07-31T06:26:36.154000Z,DARK,88091935.0,,2016-07-31 06:26:36.154,14.558595,-58.876186,900.0
2,4560692,coj2m002-fs01-20160731-0041-d00,"{'type': 'Polygon', 'coordinates': [[[10.66148...",[19720342],"[{'id': 20838012, 'created': '2020-03-16T13:37...",coj2m002-fs01-20160731-0041-d00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-07-31,2016-07-31T06:10:50.967000Z,...,900.000000,air,2016-07-31T06:10:50.967000Z,DARK,88091935.0,,2016-07-31 06:10:50.967,10.597648,-58.877839,900.0
3,19720342,coj2m002-fs01-20160731-0041-d91,"{'type': 'Polygon', 'coordinates': [[[10.66122...","[4560692, 3536542]","[{'id': 20843175, 'created': '2020-03-16T15:23...",coj2m002-fs01-20160731-0041-d91.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,91,2016-07-31,2016-07-31T06:10:50.967000Z,...,900.000000,air,2016-07-31T06:10:50.967000Z,DARK,88091935.0,,2016-07-31 06:10:50.967,10.598476,-58.877618,900.0
4,4560594,coj2m002-fs01-20160731-0040-d00,"{'type': 'Polygon', 'coordinates': [[[6.701590...",[19720357],"[{'id': 20838028, 'created': '2020-03-16T13:38...",coj2m002-fs01-20160731-0040-d00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-07-31,2016-07-31T05:55:05.764000Z,...,900.000000,air,2016-07-31T05:55:05.764000Z,DARK,88091935.0,,2016-07-31 05:55:05.764,6.637748,-58.878823,900.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
118,146227,coj2m002-fs01-20160201-0003-b00,"{'type': 'Polygon', 'coordinates': [[[179.9397...","[146396, 19714860]","[{'id': 20827059, 'created': '2020-03-16T09:17...",coj2m002-fs01-20160201-0003-b00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-02-01,2016-02-01T05:21:50.351000Z,...,0.000000,air,2016-02-01T05:21:50.351000Z,BIAS,76661666.0,,2016-02-01 05:21:50.351,107.876266,-58.696566,0.0
119,19714895,coj2m002-fs01-20160201-0002-b91,"{'type': 'Polygon', 'coordinates': [[[178.7851...","[146242, 3536542]","[{'id': 20829619, 'created': '2020-03-16T10:38...",coj2m002-fs01-20160201-0002-b91.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,91,2016-02-01,2016-02-01T05:17:59.153000Z,...,0.000000,air,2016-02-01T05:17:59.153000Z,BIAS,76661666.0,,2016-02-01 05:17:59.153,178.722735,-58.696325,0.0
120,146242,coj2m002-fs01-20160201-0002-b00,"{'type': 'Polygon', 'coordinates': [[[178.7854...",[19714895],"[{'id': 20827073, 'created': '2020-03-16T09:17...",coj2m002-fs01-20160201-0002-b00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-02-01,2016-02-01T05:17:59.153000Z,...,0.000000,air,2016-02-01T05:17:59.153000Z,BIAS,76661666.0,,2016-02-01 05:17:59.153,178.721912,-58.696545,0.0
121,146233,coj2m002-fs01-20160201-0001-b00,"{'type': 'Polygon', 'coordinates': [[[178.7854...","[146411, 19714998]","[{'id': 20827126, 'created': '2020-03-16T09:17...",coj2m002-fs01-20160201-0001-b00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-02-01,2016-02-01T05:17:13.377000Z,...,0.000000,air,2016-02-01T05:17:13.377000Z,BIAS,76661666.0,,2016-02-01 05:17:13.377,178.721912,-58.696545,0.0


In [53]:
raw.columns

Index(['id', 'basename', 'area', 'related_frames', 'version_set', 'filename',
       'url', 'RLEVEL', 'DAY_OBS', 'DATE_OBS', 'PROPID', 'INSTRUME', 'OBJECT',
       'SITEID', 'TELID', 'EXPTIME', 'FILTER', 'L1PUBDAT', 'OBSTYPE', 'BLKUID',
       'REQNUM'],
      dtype='object')

In [54]:
raw["related_frames"]

0              [19720335]
1      [4560853, 3536542]
2              [19720342]
3      [4560692, 3536542]
4              [19720357]
              ...        
118    [146396, 19714860]
119     [146242, 3536542]
120            [19714895]
121    [146411, 19714998]
122     [146233, 3536542]
Name: related_frames, Length: 20123, dtype: object

In [55]:
raw.iloc[0]

id                                                          4560853
basename                            coj2m002-fs01-20160731-0042-d00
area              {'type': 'Polygon', 'coordinates': [[[14.62160...
related_frames                                           [19720335]
version_set       [{'id': 20838003, 'created': '2020-03-16T13:37...
filename                    coj2m002-fs01-20160731-0042-d00.fits.fz
url               https://archive-lco-global.s3.amazonaws.com/co...
RLEVEL                                                            0
DAY_OBS                                                  2016-07-31
DATE_OBS                                2016-07-31T06:26:36.154000Z
PROPID                                                    calibrate
INSTRUME                                                       fs01
OBJECT                                                             
SITEID                                                          coj
TELID                                           

In [57]:
raw[raw["id"].isin([19720335, 4560853, 3536542])]

Unnamed: 0,id,basename,area,related_frames,version_set,filename,url,RLEVEL,DAY_OBS,DATE_OBS,...,INSTRUME,OBJECT,SITEID,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM
0,4560853,coj2m002-fs01-20160731-0042-d00,"{'type': 'Polygon', 'coordinates': [[[14.62160...",[19720335],"[{'id': 20838003, 'created': '2020-03-16T13:37...",coj2m002-fs01-20160731-0042-d00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-07-31,2016-07-31T06:26:36.154000Z,...,fs01,,coj,2m0a,900.0,air,2016-07-31T06:26:36.154000Z,DARK,88091935.0,
1,19720335,coj2m002-fs01-20160731-0042-d91,"{'type': 'Polygon', 'coordinates': [[[14.62134...","[4560853, 3536542]","[{'id': 20843168, 'created': '2020-03-16T15:23...",coj2m002-fs01-20160731-0042-d91.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,91,2016-07-31,2016-07-31T06:26:36.154000Z,...,fs01,,coj,2m0a,900.0,air,2016-07-31T06:26:36.154000Z,DARK,88091935.0,


In [66]:
for i, r in raw.iterrows():
    print("Prime Frame: {} / {} / {}".format(r["id"], r["OBSTYPE"], r["RLEVEL"]))
    # print(r["id"], r["RLEVEL"])
    
    group = raw[raw["id"].isin(r["related_frames"])]

    for i2, r2 in group.iterrows():
        print("Rel. Frame: {} / {} / {}".format(r2["id"], r2["OBSTYPE"], r2["RLEVEL"]))
    print(r["related_frames"])
    
    # print(group["id"])
    
    # print(i)
    # print(r)
    # print(r["id"])

    
    
    # break
    print()

Prime Frame: 4560853 / DARK / 0
Rel. Frame: 19720335 / DARK / 91
[19720335]

Prime Frame: 19720335 / DARK / 91
Rel. Frame: 4560853 / DARK / 0
[4560853, 3536542]

Prime Frame: 4560692 / DARK / 0
Rel. Frame: 19720342 / DARK / 91
[19720342]

Prime Frame: 19720342 / DARK / 91
Rel. Frame: 4560692 / DARK / 0
[4560692, 3536542]

Prime Frame: 4560594 / DARK / 0
Rel. Frame: 19720357 / DARK / 91
[19720357]

Prime Frame: 19720357 / DARK / 91
Rel. Frame: 4560594 / DARK / 0
[4560594, 3536542]

Prime Frame: 4565944 / DARK / 91
Rel. Frame: 4560357 / DARK / 0
Rel. Frame: 4565943 / BIAS / 91
[4560357, 4565943, 3536542]

Prime Frame: 4560401 / DARK / 0
Rel. Frame: 19720344 / DARK / 91
[19720344]

Prime Frame: 19720344 / DARK / 91
Rel. Frame: 4560401 / DARK / 0
[4560401, 3536542]

Prime Frame: 4560357 / DARK / 0
Rel. Frame: 4565944 / DARK / 91
Rel. Frame: 19720349 / DARK / 91
[4565944, 19720349]

Prime Frame: 19720349 / DARK / 91
Rel. Frame: 4560357 / DARK / 0
[4560357, 3536542]

Prime Frame: 4560285 / D

KeyboardInterrupt: 

In [77]:
raw[raw["id"]==4557664]

Unnamed: 0,id,basename,area,related_frames,version_set,filename,url,RLEVEL,DAY_OBS,DATE_OBS,...,INSTRUME,OBJECT,SITEID,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM
229,4557664,coj2m002-fs01-20160730-0139-e00,"{'type': 'Polygon', 'coordinates': [[[35.43906...",[4558700],"[{'id': 20837874, 'created': '2020-03-16T13:37...",coj2m002-fs01-20160730-0139-e00.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,0,2016-07-30,2016-07-30T19:38:19.819000Z,...,fs01,ASASSN-16gy,coj,2m0a,40.0,I,2016-07-30T19:38:19.819000Z,EXPOSE,88098658.0,655782.0


In [78]:
raw[raw["id"]==4558700]

Unnamed: 0,id,basename,area,related_frames,version_set,filename,url,RLEVEL,DAY_OBS,DATE_OBS,...,INSTRUME,OBJECT,SITEID,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM
228,4558700,coj2m002-fs01-20160730-0139-e91,"{'type': 'Polygon', 'coordinates': [[[35.43389...","[4557664, 4558630, 4558631, 4532523, 3536541]","[{'id': 4769041, 'created': '2016-07-31T00:05:...",coj2m002-fs01-20160730-0139-e91.fits.fz,https://archive-lco-global.s3.amazonaws.com/co...,91,2016-07-30,2016-07-30T19:38:19.819000Z,...,fs01,ASASSN-16gy,coj,2m0a,40.0,I,2016-07-30T19:38:19.819000Z,EXPOSE,88098658.0,655782.0


In [None]:
def reduce_frames(df):
    obs_groups = df.groupby('datetime')
    expected_frames = len(obs_groups)

    new_df = pd.concat([
        obs_frames[ obs_frames['RLEVEL'] == obs_frames['RLEVEL'].max() ].head(1)
        for d, obs_frames in obs_groups ]
    )

    resultant_frames = len(new_df)
    if resultant_frames != expected_frames:
        print("Unexpected number of frames returned: {} expected, {} received.".format(\
            expected_frames, resultant_frames))
    return new_df

df = reduce_frames(df)

In [92]:
obs_groups = raw.groupby('datetime')
expected_frames = len(obs_groups)
print(expected_frames)
new_df = pd.concat([
    obs_frames[ obs_frames['RLEVEL'] == obs_frames['RLEVEL'].min() ].head(1)
    for d, obs_frames in obs_groups ]
)

10627


In [93]:
new_df["RLEVEL"].unique()

array([ 0, 91], dtype=int64)

# Comparing with current Archive API

In [1]:
import requests
import json
import urllib
import pandas as pd
import numpy as np

def test_archive_api():
    base_url = "https://archive-api.lco.global/frames/?"
    param_dict = {
        "start": "2016-02-01 00:00",
        "end": "2016-08-01 00:00",
        "public": "true",
        "SITEID": "coj",
        "TELID": "1m0a",
        "limit": 100,
        "offset": 0
    }
    request_url = base_url + urllib.parse.urlencode(param_dict)
    r = requests.get(url=request_url)
    df = pd.DataFrame(r.json()["results"])
    return df

new_df = test_archive_api()

In [2]:
new_df

Unnamed: 0,id,basename,observation_date,observation_day,proposal_id,instrument_id,target_name,reduction_level,site_id,telescope_id,...,SITEID,TELID,EXPTIME,FILTER,L1PUBDAT,OBSTYPE,BLKUID,REQNUM,area,related_frames
0,4565855,coj1m003-kb77-20160731-0300-d00,2016-07-31T22:23:39.951000Z,2016-07-31,calibrate,kb77,,0,coj,1m0a,...,coj,1m0a,299.236,R,2016-07-31T22:23:39.951000Z,DARK,88091929,,"{'type': 'Polygon', 'coordinates': [[[58.67261...",[]
1,4565846,coj1m003-kb77-20160731-0299-d00,2016-07-31T22:18:24.589000Z,2016-07-31,calibrate,kb77,,0,coj,1m0a,...,coj,1m0a,299.240,R,2016-07-31T22:18:24.589000Z,DARK,88091929,,"{'type': 'Polygon', 'coordinates': [[[57.35386...",[]
2,4565776,coj1m003-kb77-20160731-0298-d00,2016-07-31T22:13:09.198000Z,2016-07-31,calibrate,kb77,,0,coj,1m0a,...,coj,1m0a,299.236,R,2016-07-31T22:13:09.198000Z,DARK,88091929,,"{'type': 'Polygon', 'coordinates': [[[56.03526...",[]
3,4565777,coj1m003-kb77-20160731-0297-d00,2016-07-31T22:07:53.758000Z,2016-07-31,calibrate,kb77,,0,coj,1m0a,...,coj,1m0a,299.237,R,2016-07-31T22:07:53.758000Z,DARK,88091929,,"{'type': 'Polygon', 'coordinates': [[[54.71614...",[]
4,4565697,coj1m003-kb77-20160731-0296-d00,2016-07-31T22:02:38.295000Z,2016-07-31,calibrate,kb77,,0,coj,1m0a,...,coj,1m0a,299.236,R,2016-07-31T22:02:38.295000Z,DARK,88091929,,"{'type': 'Polygon', 'coordinates': [[[53.39417...",[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,4564796,coj1m003-kb77-20160731-0237-e00,2016-07-31T18:42:17.761000Z,2016-07-31,CON2016A-005,kb77,SDSS234540.44-093610.1,0,coj,1m0a,...,coj,1m0a,99.695,rp,2017-07-31T18:42:17.761000Z,EXPOSE,88174203,614350.0,"{'type': 'Polygon', 'coordinates': [[[-3.43029...",[4567330]
96,4567330,coj1m003-kb77-20160731-0237-e91,2016-07-31T18:42:17.761000Z,2016-07-31,CON2016A-005,kb77,SDSS234540.44-093610.1,91,coj,1m0a,...,coj,1m0a,99.695,rp,2017-07-31T18:42:17.761000Z,EXPOSE,88174203,614350.0,"{'type': 'Polygon', 'coordinates': [[[-3.42959...","[4564796, 4566946, 4566945, 4532857, 4380129]"
97,4564790,coj1m003-kb77-20160731-0236-e00,2016-07-31T18:40:29.161000Z,2016-07-31,CON2016A-005,kb77,SDSS234540.44-093610.1,0,coj,1m0a,...,coj,1m0a,99.693,rp,2017-07-31T18:40:29.161000Z,EXPOSE,88174203,614350.0,"{'type': 'Polygon', 'coordinates': [[[-3.43029...",[4567329]
98,4567329,coj1m003-kb77-20160731-0236-e91,2016-07-31T18:40:29.161000Z,2016-07-31,CON2016A-005,kb77,SDSS234540.44-093610.1,91,coj,1m0a,...,coj,1m0a,99.693,rp,2017-07-31T18:40:29.161000Z,EXPOSE,88174203,614350.0,"{'type': 'Polygon', 'coordinates': [[[-3.42958...","[4564790, 4566946, 4566945, 4532857, 4380129]"


In [9]:
new_df.columns

Index(['id', 'basename', 'observation_date', 'observation_day', 'proposal_id',
       'instrument_id', 'target_name', 'reduction_level', 'site_id',
       'telescope_id', 'exposure_time', 'primary_optical_element',
       'public_date', 'configuration_type', 'observation_id', 'request_id',
       'version_set', 'url', 'filename', 'DATE_OBS', 'DAY_OBS', 'PROPID',
       'INSTRUME', 'OBJECT', 'RLEVEL', 'SITEID', 'TELID', 'EXPTIME', 'FILTER',
       'L1PUBDAT', 'OBSTYPE', 'BLKUID', 'REQNUM', 'area', 'related_frames'],
      dtype='object')

In [10]:
raw.columns

Index(['id', 'basename', 'area', 'related_frames', 'version_set', 'filename',
       'url', 'RLEVEL', 'DAY_OBS', 'DATE_OBS', 'PROPID', 'INSTRUME', 'OBJECT',
       'SITEID', 'TELID', 'EXPTIME', 'FILTER', 'L1PUBDAT', 'OBSTYPE', 'BLKUID',
       'REQNUM', 'datetime', 'RA', 'DEC', 'exptime_numeric'],
      dtype='object')

In [11]:
print([x for x in new_df.columns if x not in raw.columns])
print()
print([x for x in raw.columns if x not in new_df.columns])

['observation_date', 'observation_day', 'proposal_id', 'instrument_id', 'target_name', 'reduction_level', 'site_id', 'telescope_id', 'exposure_time', 'primary_optical_element', 'public_date', 'configuration_type', 'observation_id', 'request_id']

['datetime', 'RA', 'DEC', 'exptime_numeric']


In [13]:
new_df.iloc[0]

id                                                                   4565855
basename                                     coj1m003-kb77-20160731-0300-d00
observation_date                                 2016-07-31T22:23:39.951000Z
observation_day                                                   2016-07-31
proposal_id                                                        calibrate
instrument_id                                                           kb77
target_name                                                                 
reduction_level                                                            0
site_id                                                                  coj
telescope_id                                                            1m0a
exposure_time                                                        299.236
primary_optical_element                                                    R
public_date                                      2016-07-31T22:23:39.951000Z