<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/k2_pix_small.png">
*This notebook contains instructional material from the [K2 Guest Observer Office](https://keplerscience.arc.nasa.gov/); the content is available [on GitHub](https://github.com/gully/k2-metadata).*


<!--NAVIGATION-->
< [Munge metadata into tidy dataframes](01.00-Munge-metadata-into-tidy-dataframes.ipynb) | [Contents](Index.ipynb) | [EPIC Catalog Column Descriptions](01.02-EPIC_catalog_column_descriptions.ipynb) >

# K2 Guest Observer Proposal Information

Most targets observed in the K2 mission were selected through the *Guest Observer* proposal program, in which community members propose targets through a competitive application process. Each successful proposal is assigned a Guest Observer "Investigation ID", which records which team proposed which targets.

The Kepler/K2 Guest Observer office provides [machine-readable tables](https://keplerscience.arc.nasa.gov/k2-approved-programs.html) of targets with the accompanying list of Investigation IDs on our website (select the [Target list (csv)](https://keplerscience.arc.nasa.gov/data/campaigns/c15/K2Campaign15targets.csv) link). Some popular targets are proposed by multiple groups; these carry more than one Investigation ID, separated by pipes (`|`). There are typically a few tens of thousands of targets per campaign.
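Because a multiply-proposed target packs several IDs into one pipe-delimited field, a per-program view needs that field split apart. A minimal sketch, assuming the column names used later in this notebook (`EPIC ID`, `Investigation IDs`) and using three rows copied from the Campaign 5 target list:

```python
import pandas as pd

# Three rows copied from the Campaign 5 target list
targets = pd.DataFrame({
    'EPIC ID': [228682516, 228682517, 228683400],
    'Investigation IDs': ['GO5069_LC', 'GO5073_SC|GO5073_LC', 'GO5097_LC'],
})

# Split the pipe-delimited field into a list, then emit one row per (target, ID) pair
per_program = targets.assign(
    **{'Investigation IDs': targets['Investigation IDs'].str.split('|')}
).explode('Investigation IDs').reset_index(drop=True)

# Drop any stray padding around each ID (the raw files put a space after commas)
per_program['Investigation IDs'] = per_program['Investigation IDs'].str.strip()
print(per_program)

# Number of targets requested by each investigation
print(per_program['Investigation IDs'].value_counts())
```

`DataFrame.explode` keeps the other columns, so the EPIC ID is repeated once per Investigation ID.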

In this notebook we programmatically read in the campaign-level tables and concatenate them into a single large csv file.

The Guest Observer target lists also include RA, Dec, and Kepler magnitude; this information is usually redundant with the EPIC catalog and is provided for convenience. Some targets were observed in more than one campaign, so the EPIC ID alone does not identify a row; we record the campaign alongside it.
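To see why the campaign has to travel with the EPIC ID, here is a small check on hypothetical toy rows (the EPIC IDs and campaign assignments below are made up): the EPIC ID column repeats, while the (EPIC ID, campaign) pair does not.

```python
import pandas as pd

# Hypothetical toy rows: the same EPIC ID proposed in two different campaigns
rows = pd.DataFrame({
    'EPIC ID': [211111111, 211111111, 222222222],
    'campaign': ['5', '18', '18'],
})

# EPIC ID alone repeats, so it cannot serve as a row key...
print(rows['EPIC ID'].is_unique)                                  # False
# ...but the (EPIC ID, campaign) pair identifies each row
print(not rows.duplicated(subset=['EPIC ID', 'campaign']).any())  # True
```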

In [1]:
#! wget --quiet --directory-prefix=../metadata/raw/proposed_targets/ https://keplerscience.arc.nasa.gov/data/campaigns/c18/K2Campaign18targets.csv
#! wget --quiet --directory-prefix=../metadata/raw/proposed_targets/ https://keplerscience.arc.nasa.gov/data/campaigns/c19/K2Campaign19targets.csv

In [2]:
import pandas as pd
import glob

We also use natsort to sort strings "naturally", so that digit runs embedded in them are ordered as numbers (Campaign 2 before Campaign 10).

In [3]:
#! pip install natsort
import natsort

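The Guest Observer target lists are plain CSV files, although each comma is followed by a space (header included) and missing coordinates and magnitudes are left blank.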

In [4]:
! head -n 2 ../metadata/raw/proposed_targets/K2Campaign5targets.csv

EPIC ID, RA (J2000) [deg], Dec (J2000) [deg], magnitude, Investigation IDs
200008644, , , , LC_M67_TILE|GO5031_LC
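The loop below strips the header names, but by default `read_csv` keeps the field values as-is, so the blanks survive as `' '` strings rather than NaN and `isnull()` does not count them as missing (the first rows of the tidy file at the end of this notebook still show them). One alternative, sketched here on lines copied from the Campaign 5 file, is `read_csv(..., skipinitialspace=True)`; adopting it in the loop would change some of the outputs shown later.

```python
import io
import pandas as pd

# Header and two data lines copied from the Campaign 5 target list
sample = (
    "EPIC ID, RA (J2000) [deg], Dec (J2000) [deg], magnitude, Investigation IDs\n"
    "200008644, , , , LC_M67_TILE|GO5031_LC\n"
    "228682517, 129.8887024, 23.5692005, 15.28, GO5073_SC|GO5073_LC\n"
)

# Default parse: column names keep their leading space
raw = pd.read_csv(io.StringIO(sample))
print(list(raw.columns))

# skipinitialspace=True drops the space after each comma, so the names come out
# clean and the blank fields become empty, which read_csv treats as NaN
clean = pd.read_csv(io.StringIO(sample), skipinitialspace=True)
print(list(clean.columns))
print(clean)
```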


In [5]:
! tail -n 5 ../metadata/raw/proposed_targets/K2Campaign5targets.csv

228682514, 123.4091492, 19.2470398, 20.43, GO5069_LC
228682515, 126.2561111, 21.5693302, 20.46, GO5069_LC
228682516, 123.1126404, 19.2214298, 20.81, GO5069_LC
228682517, 129.8887024, 23.5692005, 15.28, GO5073_SC|GO5073_LC
228683400, 127.3239975, 11.5666857, 16.74, GO5097_LC


Make a list of all of the csv files in the `proposed_targets` directory.

In [6]:
fns = glob.glob('../metadata/raw/proposed_targets/*.csv')

Plain `sorted()` would order the files lexically (`K2Campaign10` before `K2Campaign2`); natsort orders them by campaign number instead.

In [7]:
fns = natsort.natsorted(fns)
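If natsort is not available, a rough stand-in is a sort key that splits each string into digit and non-digit runs and compares the digit runs as integers. The helper `natural_key` below is hypothetical, a sketch rather than natsort's actual algorithm:

```python
import re

def natural_key(text):
    """Hypothetical helper: split text into non-digit and digit runs so that
    digit runs compare as integers (a rough stand-in for natsort.natsorted)."""
    # re.split with a capture group alternates str (even idx) and digits (odd idx),
    # so keys always compare like-typed elements position by position
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r'(\d+)', text)]

names = ['K2Campaign10targets.csv', 'K2Campaign9btargets.csv',
         'K2Campaign2targets.csv', 'K2Campaign9atargets.csv']

print(sorted(names))                   # lexical: '10' sorts before '2'
print(sorted(names, key=natural_key))  # digit runs compared as numbers
```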

In [8]:
fns

['../metadata/raw/proposed_targets/K2Campaign0targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign1targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign2targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign3targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign4targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign5targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign6targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign7targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign8targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign9atargets.csv',
 '../metadata/raw/proposed_targets/K2Campaign9btargets.csv',
 '../metadata/raw/proposed_targets/K2Campaign10targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign11targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign12targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign13targets.csv',
 '../metadata/raw/proposed_targets/K2Campaign14targets.csv',
 '../metadata/raw/proposed_target

In [9]:
df_all = pd.DataFrame()

The Campaign 18 and 19 files use non-standard column names, so we map them onto the names used by the earlier campaigns.

In [10]:
# Campaign 18/19 column names (keys) mapped onto the standard names (vals)
keys = ['EPIC', 'RA', 'Dec', 'Kp', 'InvestigationIDs']
vals = ['EPIC ID', 'RA (J2000) [deg]', 'Dec (J2000) [deg]', 'magnitude',
        'Investigation IDs']
conv_col_dict = dict(zip(keys, vals))

# Alternate spellings of the coordinate columns map to the same standard names
conv_col_dict['rafloat'] = 'RA (J2000) [deg]'
conv_col_dict['decfloat'] = 'Dec (J2000) [deg]'
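Because both `RA` and `rafloat` map to the same standard name, a file carrying both spellings would end up with duplicate columns after renaming. The notebook does not show whether any file does; the sketch below applies the same mapping to a hypothetical one-row frame and adds a cheap guard.

```python
import pandas as pd

# Same mapping as above, rebuilt so this cell runs on its own
keys = ['EPIC', 'RA', 'Dec', 'Kp', 'InvestigationIDs']
vals = ['EPIC ID', 'RA (J2000) [deg]', 'Dec (J2000) [deg]', 'magnitude',
        'Investigation IDs']
conv_col_dict = dict(zip(keys, vals))
conv_col_dict['rafloat'] = 'RA (J2000) [deg]'
conv_col_dict['decfloat'] = 'Dec (J2000) [deg]'

# Hypothetical one-row frame with Campaign 18/19-style headers (values made up)
c18_like = pd.DataFrame({'EPIC': [211111111], 'rafloat': [130.0], 'decfloat': [18.0],
                         'Kp': [12.0], 'InvestigationIDs': ['GO0000_LC']})

# Keys absent from the frame ('RA', 'Dec') are simply ignored by rename
renamed = c18_like.rename(columns=conv_col_dict)
print(list(renamed.columns))

# Guard against two source columns collapsing onto one standard name
assert not renamed.columns.duplicated().any()
```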

In [11]:
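frames = []
for fn in fns:
    # The campaign label sits between 'K2Campaign' and 'targets.csv', e.g. '9a'
    id_min = fn.rfind('K2Campaign') + len('K2Campaign')
    id_max = fn.find('targets.csv')
    campaign = fn[id_min:id_max]
    df = pd.read_csv(fn)
    # Strip stray whitespace from the column names (e.g. Campaign 1, and the
    # Campaign 5 header shown above)
    df.rename(columns={col: col.strip(' ') for col in df.columns}, inplace=True)
    # Campaigns 18 and 19 use different column names; map them to the standard ones
    if campaign in ['18', '19']:
        df.rename(columns=conv_col_dict, inplace=True)
    df['campaign'] = campaign
    frames.append(df)
    print(campaign, end=' ')

# DataFrame.append was removed in pandas 2.0; concatenate all frames once instead
df_all = pd.concat(frames, ignore_index=True)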

0 1 2 3 4 5 6 7 8 9a 9b 10 11 12 13 14 15 16 17 18 19 
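The slicing above silently produces a wrong label if a filename lacks `targets.csv` (`str.find` returns -1). A stricter alternative, sketched with a hypothetical helper `campaign_from_path`, matches the whole filename with a regular expression and fails loudly otherwise:

```python
import re
from pathlib import Path

# Campaign label = whatever sits between 'K2Campaign' and 'targets.csv' (e.g. '9a')
CAMPAIGN_RE = re.compile(r'^K2Campaign(?P<campaign>\w+?)targets\.csv$')

def campaign_from_path(fn):
    """Hypothetical helper: return the campaign label from a target-list
    filename, or raise ValueError if the name does not fit the pattern."""
    match = CAMPAIGN_RE.match(Path(fn).name)
    if match is None:
        raise ValueError(f'unexpected filename: {fn}')
    return match.group('campaign')

print(campaign_from_path('../metadata/raw/proposed_targets/K2Campaign9atargets.csv'))  # 9a
print(campaign_from_path('../metadata/raw/proposed_targets/K2Campaign19targets.csv'))  # 19
```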

In [12]:
df_all = df_all[vals + ['campaign']]

In [13]:
df_all.iloc[70000:70005]

|       | EPIC ID   | RA (J2000) [deg] | Dec (J2000) [deg] | magnitude | Investigation IDs               | campaign |
|-------|-----------|------------------|-------------------|-----------|---------------------------------|----------|
| 70000 | 210647774 | 54.196178        | 17.63787          | 11.286    | GO4060_LC\|GO4033_LC\|GO4007_LC | 4        |
| 70001 | 210647804 | 51.69544         | 17.638214         | 13.587    | GO4020_LC\|GO4060_LC\|GO4011_LC | 4        |
| 70002 | 210647813 | 52.355005        | 17.638277         | 12.824    | GO4029_LC\|GO4033_LC\|GO4007_LC | 4        |
| 70003 | 210647818 | 56.497824        | 17.638409         | 17.511    | GO4011_LC                       | 4        |
| 70004 | 210648137 | 58.730242        | 17.642622         | 13.075    | GO4020_LC\|GO4060_LC\|GO4007_LC | 4        |


Most entries are populated; only the RA, Dec, and magnitude columns contain null values.

In [14]:
df_all.isnull().describe()

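|        | EPIC ID | RA (J2000) [deg] | Dec (J2000) [deg] | magnitude | Investigation IDs | campaign |
|--------|---------|------------------|-------------------|-----------|-------------------|----------|
| count  | 588991  | 588991           | 588991            | 588991    | 588991            | 588991   |
| unique | 1       | 2                | 2                 | 2         | 1                 | 1        |
| top    | False   | False            | False             | False     | False             | False    |
| freq   | 588991  | 562219           | 562219            | 562218    | 588991            | 588991   |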
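Reading `describe()` on a Boolean frame takes an extra step: `freq` counts the most common value (`False`, i.e. not null), so the number of missing entries is count - freq. For RA that is 588,991 - 562,219 = 26,772 rows, about 4.5% of the table, and 26,773 for magnitude. `isnull().sum()` reports those counts directly; a sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one missing coordinate pair, two missing magnitudes
toy = pd.DataFrame({
    'RA (J2000) [deg]': [123.4, np.nan, 126.3],
    'Dec (J2000) [deg]': [19.2, np.nan, 21.6],
    'magnitude': [20.4, np.nan, np.nan],
})

print(toy.isnull().sum())             # missing entries per column
print(toy.isnull().mean().round(3))   # fraction missing per column
```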


From Campaign 10 onward, campaigns have roughly 26,000 to 46,000 proposed targets each, about 38,000 on average.

In [15]:
vec = df_all.campaign.value_counts()
vec[natsort.natsorted(vec.index)]

0      7902
1     21647
2     16665
3     16833
4     17202
5     25774
6     47550
7     15085
8     29939
9a     3417
9b     3550
10    41531
11    32810
12    45951
13    26170
14    39026
15    35078
16    35571
17    46229
18    36909
19    44152
Name: campaign, dtype: int64

In [16]:
df_all.shape

(588991, 6)

Naively, the K2 mission has targeted nearly 600,000 sources! However, this coarse count includes every target proposed in more than one campaign once per campaign, as well as "tiles" and custom aperture masks designed to mosaic a single complex scene composed of multiple, extended, or moving objects.
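A slightly less naive count collapses repeat observations with `nunique()` on the EPIC ID and, as a rough heuristic that is an assumption rather than an official flag, sets aside entries whose Investigation IDs mention `TILE` (as the `LC_M35_TILE` rows in the tidy file do). Custom aperture masks without such a label would not be caught. Sketched on toy rows: the first two are copied from the tidy file, the rest are made up.

```python
import pandas as pd

# Toy rows in the tidy layout; only the two LC_M35_TILE rows come from the real file
df_all = pd.DataFrame({
    'EPIC ID': [200000811, 200000812, 211111111, 211111111, 222222222],
    'Investigation IDs': [' LC_M35_TILE', ' LC_M35_TILE', 'GO5001_LC', 'GO18001_LC', 'GO18002_LC'],
    'campaign': ['0', '0', '5', '18', '18'],
})

n_rows = len(df_all)                    # every (target, campaign) entry
n_unique = df_all['EPIC ID'].nunique()  # distinct EPIC IDs across all campaigns

# Heuristic: treat entries whose Investigation IDs mention 'TILE' as mosaic pieces
is_tile = df_all['Investigation IDs'].str.contains('TILE', regex=False)
n_non_tile = df_all.loc[~is_tile, 'EPIC ID'].nunique()

print(n_rows, n_unique, n_non_tile)   # 5 4 2
```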

In [17]:
df_all.to_csv('../metadata/tidy/GO_proposal_metadata.csv', index=False)

In [18]:
df_all.columns

Index(['EPIC ID', 'RA (J2000) [deg]', 'Dec (J2000) [deg]', 'magnitude',
       'Investigation IDs', 'campaign'],
      dtype='object')

In [19]:
! du -hs ../metadata/tidy/GO_proposal_metadata.csv

 33M	../metadata/tidy/GO_proposal_metadata.csv


In [20]:
! wc -l ../metadata/tidy/GO_proposal_metadata.csv

  588992 ../metadata/tidy/GO_proposal_metadata.csv


In [21]:
! head ../metadata/tidy/GO_proposal_metadata.csv

EPIC ID,RA (J2000) [deg],Dec (J2000) [deg],magnitude,Investigation IDs,campaign
200000811, , , , LC_M35_TILE,0
200000812, , , , LC_M35_TILE,0
200000813, , , , LC_M35_TILE,0
200000814, , , , LC_M35_TILE,0
200000815, , , , LC_M35_TILE,0
200000816, , , , LC_M35_TILE,0
200000817, , , , LC_M35_TILE,0
200000818, , , , LC_M35_TILE,0
200000819, , , , LC_M35_TILE,0


In [22]:
! open ../metadata/tidy/  # macOS only: opens the folder in Finder

All done: the combined file is 33 MB and holds 588,991 rows (588,992 lines including the header).
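When reading the tidy file back, it is probably worth pinning the `campaign` column to strings. Labels such as `9a` and `9b` cannot be integers, while an all-digit subset (or chunk) of the file would otherwise be parsed as integers, which can leave a mix of `int` and `str` values and a `DtypeWarning`. A sketch on two rows copied from the outputs above:

```python
import io
import pandas as pd

# A two-row stand-in for the tidy CSV (rows copied from the Campaign 4 and 5 outputs)
tidy_csv = io.StringIO(
    "EPIC ID,RA (J2000) [deg],Dec (J2000) [deg],magnitude,Investigation IDs,campaign\n"
    "210647818,56.497824,17.638409,17.511,GO4011_LC,4\n"
    "228683400,127.3239975,11.5666857,16.74,GO5097_LC,5\n"
)

# Without a dtype hint, an all-digit campaign column is parsed as integers...
guessed = pd.read_csv(tidy_csv)
tidy_csv.seek(0)
# ...so pin it to str to keep '4' comparable with labels such as '9a'
typed = pd.read_csv(tidy_csv, dtype={'campaign': str})

print(guessed['campaign'].tolist())  # [4, 5]
print(typed['campaign'].tolist())    # ['4', '5']
```

For the real file this would be `pd.read_csv('../metadata/tidy/GO_proposal_metadata.csv', dtype={'campaign': str})`.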
