This program takes the information from the Observations, Objects, and iraf_dr5.1 tables and combines it into a single table, with a row per object. 

It also pulls information for each target from APASS, 2MASS, and UCAC4 based on RA/Dec matching. 

In [1]:
import pandas as pd
import os
import numpy as np
from astropy.io import fits

Read in the table of observations produced by Janez.

In [2]:
obs_data=pd.read_csv("/Users/kschles/Documents/GALAH/iraf_v5.1/observations.csv", low_memory=False)

Isolate the science observations. Here I remove the calibration images (biases/flats/arcs). I also keep only the CCD1 rows, because I don't need the per-CCD information for every observation.

In [3]:
obs_data['ccd']=((obs_data['runccd_id'].astype(str)).str[10:11]).astype(int)
obs1_data=obs_data.loc[(obs_data['ccd']==1) & obs_data['ndfclass_updated'].isin(['MFOBJECT', 'MFFLX'])]
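
As a small illustration of the selection above, here is a sketch that applies the same two steps to a toy table. The runccd_id values and the BIAS label are made up for the example; the only thing taken from the real data is that the CCD number sits at string position 10 of runccd_id.

```python
import pandas as pd

# Hypothetical rows; the 11-character runccd_id layout is an assumption
# inferred from the str[10:11] slice used above.
toy_obs = pd.DataFrame({
    'runccd_id': [13112300011, 13112300012, 13112300021, 13112300031],
    'ndfclass_updated': ['MFOBJECT', 'MFOBJECT', 'MFFLX', 'BIAS'],
})

# The CCD number is the eleventh character of runccd_id.
toy_obs['ccd'] = toy_obs['runccd_id'].astype(str).str[10:11].astype(int)

# Keep CCD1 rows whose class is MFOBJECT or MFFLX.
is_science = toy_obs['ndfclass_updated'].isin(['MFOBJECT', 'MFFLX'])
toy_sel = toy_obs[(toy_obs['ccd'] == 1) & is_science]
print(toy_sel)
```

Only the first and third toy rows survive: the second is a CCD2 frame and the fourth is a calibration frame.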

Group the observations by cob_id. 

In [4]:
temp=obs1_data.groupby(['cob_id'], as_index=False)
cobid_obsdata=temp.agg({'runccd_id': 'first', 'ccd' : 'first', 'plate' : 'first', 'cfg_file': 'first', 'cenra': 'first', 'cendec': 'first', 'obsid': 'first', 'exposed': np.sum, 'std_name': 'first', 'qflag': 'first', 'oclass': 'first','ndfclass_updated': 'first'})
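
To make the aggregation concrete, here is a minimal sketch on a toy table (the values are invented). It shows how 'first' keeps one value per cob_id while the exposure times are summed.

```python
import pandas as pd

# Hypothetical exposures: two for cob_id 1, one for cob_id 2.
toy = pd.DataFrame({
    'cob_id': [1, 1, 2],
    'plate': ['Plate 0', 'Plate 0', 'Plate 1'],
    'exposed': [1200.0, 1200.0, 600.0],
})

# 'first' keeps the value from the first row of each group,
# 'sum' adds up the exposure times within the group.
per_cob = toy.groupby('cob_id', as_index=False).agg({'plate': 'first', 'exposed': 'sum'})
print(per_cob)
```

The result has one row per cob_id, with total exposure times of 2400 and 600 for the two toy groups.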

Read in the IRAF output table 

In [5]:
iraf_data=pd.read_csv("/Users/kschles/Documents/GALAH/iraf_v5.1/iraf_dr51_03072016.csv")
iraf_data.rename(columns=lambda x: x.strip(), inplace=True)
iraf_data['cob_id']=((iraf_data['name'].astype(str)).str[0:10]).astype(int)
iraf_data['cobpivot']=((iraf_data['name'].astype(str)).str[0:15])
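
A quick sketch of what the two derived keys look like. The names below are hypothetical 16-digit values, and the idea that the final digit is the CCD number is my assumption for the example; only the slice positions come from the cell above.

```python
import pandas as pd

# Hypothetical spectrum names that differ only in their last digit.
toy_iraf = pd.DataFrame({'name': [1311230001000011, 1311230001000012]})

name_str = toy_iraf['name'].astype(str)
toy_iraf['cob_id'] = name_str.str[0:10].astype(int)   # first 10 characters, as an integer
toy_iraf['cobpivot'] = name_str.str[0:15]             # first 15 characters, kept as a string
print(toy_iraf)
```

Both toy rows end up with the same cobpivot, which is what lets it serve as a per-object key later on.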

The IRAF output table is organised by object AND CCD rather than by object alone, so each observed object has one row per CCD. To get the GUESS information for each CCD, I split the table into one sub-table per CCD.

In [6]:
## Per-CCD data: one sub-table per CCD, with the columns renamed to carry the CCD number
ccd1_data=iraf_data.loc[np.where(iraf_data['ccd']==1)[0], ['cobpivot', 'v', 'snr', 'snr2', 'snr_guess']]
ccd1_data.rename(columns={'v': 'v_ccd1', 'snr' : 'snr_1', 'snr2' : 'snr2_1', 'snr_guess' : 'snr_guess_1'}, inplace=True)
ccd2_data=iraf_data.loc[np.where(iraf_data['ccd']==2)[0], ['cobpivot', 'v',  'snr', 'snr2', 'snr_guess']]
ccd2_data.rename(columns={'v': 'v_ccd2', 'snr' : 'snr_2', 'snr2' : 'snr2_2', 'snr_guess' : 'snr_guess_2'}, inplace=True)
ccd3_data=iraf_data.loc[np.where(iraf_data['ccd']==3)[0], ['cobpivot', 'v',  'snr', 'snr2', 'snr_guess']]
ccd3_data.rename(columns={'v': 'v_ccd3', 'snr' : 'snr_3', 'snr2' : 'snr2_3', 'snr_guess' : 'snr_guess_3'}, inplace=True)
ccd4_data=iraf_data.loc[np.where(iraf_data['ccd']==4)[0], ['cobpivot',  'v', 'snr', 'snr2', 'snr_guess']]
ccd4_data.rename(columns={'v': 'v_ccd4', 'snr' : 'snr_4', 'snr2' : 'snr2_4', 'snr_guess' : 'snr_guess_4'}, inplace=True)


Trim down to one entry per object, rather than one entry per object and CCD

In [7]:
iraf_agg=iraf_data[['name', 'cobpivot','cob_id', 'pivot', 'dirname', 'mag', 'radeg', 'dedeg', 'glon', 'glat', 'ebv', 'teff', 'logg', 'feh', 'combine_method', 'galah_id', 'v_comb', 'wavelength_flag']].groupby('cobpivot', as_index=False).first()

In [8]:
## Now combine the information from the individual CCDs into the main table. This will give you one row 
## per combined spectrum, with the information from each CCD in its own column. 
temp1=pd.merge(iraf_agg, ccd1_data, how='left', on='cobpivot')
temp2=pd.merge(temp1, ccd2_data, how='left', on='cobpivot')
temp3=pd.merge(temp2, ccd3_data, how='left', on='cobpivot')
iraf_output=pd.merge(temp3, ccd4_data, how='left', on='cobpivot')
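
The split-rename-merge pattern used in the last few cells can also be written as a loop. This is only a sketch on an invented two-CCD table, not a replacement for the cells above; toy_long and toy_wide are hypothetical names.

```python
import pandas as pd

# Hypothetical long-format table: one row per object (cobpivot) and CCD.
toy_long = pd.DataFrame({
    'cobpivot': ['A', 'A', 'B', 'B'],
    'ccd': [1, 2, 1, 2],
    'v': [10.0, 12.0, -5.0, -4.0],
    'snr': [30.0, 40.0, 15.0, 20.0],
})

# Start from one row per object, then attach each CCD's columns in turn.
toy_wide = toy_long[['cobpivot']].drop_duplicates().reset_index(drop=True)
for ccd in (1, 2):
    per_ccd = toy_long.loc[toy_long['ccd'] == ccd, ['cobpivot', 'v', 'snr']]
    per_ccd = per_ccd.rename(columns={'v': 'v_ccd%d' % ccd, 'snr': 'snr_%d' % ccd})
    toy_wide = pd.merge(toy_wide, per_ccd, how='left', on='cobpivot')
print(toy_wide)
```

Because each merge is a left merge on cobpivot, an object that is missing from one CCD keeps its row and simply gets NaN in that CCD's columns.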


In [9]:
## The GUESS values are often set to 999. or 9999. rather than null. So I need to adjust this. 
## Note also that v_ccd4 is invalid; it is merely a copy of v_comb. 
iraf_output.loc[np.where(iraf_output['v_ccd1'].astype(float)==999.)[0],'v_ccd1']=np.nan
iraf_output.loc[np.where(iraf_output['v_ccd2'].astype(float)==999.)[0],'v_ccd2']=np.nan
iraf_output.loc[np.where(iraf_output['v_ccd3'].astype(float)==999.)[0],'v_ccd3']=np.nan
iraf_output.loc[np.where(iraf_output['v_ccd4'].astype(float)==999.)[0],'v_ccd4']=np.nan
iraf_output.loc[np.where(iraf_output['teff'].astype(float)==9999.)[0],'teff']=np.nan
iraf_output.loc[np.where(iraf_output['logg'].astype(float)==9999.)[0],'logg']=np.nan
iraf_output.loc[np.where(iraf_output['feh'].astype(float)==9999.)[0],'feh']=np.nan

## Recalculate the mean and std radial velocity now that the values are not 999. Janez's v_comb
## combines v_ccd1, v_ccd2, and v_ccd3 as a weighted average (where the weights come from S/N). 
## However, this was done with 999. values rather than NaN which makes many of the values invalid. 
## Here I recalculate the mean RV and standard deviation without any weighting or 999. values. 
iraf_output['vmean']=iraf_output.loc[:, ['v_ccd1', 'v_ccd2', 'v_ccd3']].astype(float).mean(axis=1)
iraf_output['vstd']=iraf_output.loc[:, ['v_ccd1', 'v_ccd2', 'v_ccd3']].astype(float).std(axis=1)
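
To show why the 999. values have to go before averaging, here is a minimal sketch with invented velocities. It uses DataFrame.replace rather than the np.where assignments above, which is just a more compact way to do the same substitution.

```python
import numpy as np
import pandas as pd

# Hypothetical velocities; 999. marks a missing measurement, as in the table above.
toy_rv = pd.DataFrame({
    'v_ccd1': [10.0, 999.0],
    'v_ccd2': [12.0, 20.0],
    'v_ccd3': [14.0, 22.0],
})

# Replace the sentinel with NaN so it no longer contributes to the statistics.
toy_rv = toy_rv.replace(999.0, np.nan)

# mean/std skip NaN by default; std uses ddof=1 (sample standard deviation).
rv_cols = ['v_ccd1', 'v_ccd2', 'v_ccd3']
toy_rv['vmean'] = toy_rv[rv_cols].mean(axis=1)
toy_rv['vstd'] = toy_rv[rv_cols].std(axis=1)
print(toy_rv)
```

For the second toy row the mean is 21 from the two valid CCDs; had the 999. been left in, the plain average would have been 347.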


Now we want to match the IRAF output table with the observations table based on cob_id. Each object will now have information about the field observation in its row. 

In [10]:
combo1=pd.merge(iraf_output,cobid_obsdata[['cenra', 'cendec', 'qflag', 'std_name', 'cfg_file', 'obsid', 'ndfclass_updated', 'cob_id', 'runccd_id']], how="left", on="cob_id")

I now want to match the information in combo1 (observations + IRAF output) with the objects table, because I want fields such as objects.name and objects.comment. The objects table is huge, so I've split it up by night using Jeffrey's bash script. The combo1 table is matched with the objects table night by night, using runccd_id (the CCD1 frame recorded for each cob_id) and pivot as the merge keys. 

In [11]:
## This groups the objects in combo1 table by date and creates a list of each date used. 
date_grouping=combo1.groupby('dirname')
date_list=combo1[['dirname']].groupby('dirname', as_index=False).first()

In [12]:
for i in range(len(date_list)):
    ## Use the aggregate table to pull out each individual date
    date_name=np.array(date_list.loc[i])[0].astype(int)
    print(date_name)
    ## Pulls out all of the target observations for that night, organised by COB_ID and Pivot
    extract=date_grouping.get_group(date_name)
    
    ## Read in the objects table for that night
    filename='/Users/kschles/Documents/GALAH/iraf_v5.1/objects_by_date/'+date_name.astype(str)+'.txt'
    temp_objects=pd.read_csv(filename, names=['runccd_id','run_id','pivot','fibre','type','ra','dec','x','y','xerr','yerr','theta','object_name','comment','mag','pmra','pmdec','pid','retractor','wlen','galah_id','out_name','airmass','barycentric','heliocentric'],index_col=False, low_memory=False)
    
    y=pd.merge(extract, temp_objects[['runccd_id', 'pivot', 'object_name', 'comment', 'mag', 'galah_id']], how="left", on=['runccd_id', 'pivot'])
    
    ## The result dataframe holds the combination of combo1 with the objects table; the combination from each night is appended to it. 
    if (i==0) : 
        result=y
    else :
        result=pd.concat([result, y])


131123
131216
131218
131222
140117
140207
140208
140209
140210
140212
140303
140304
140305
140307
140308
140309
140310
140311
140312
140313
140314
140315
140316
140409
140410
140412
140413
140414
140415
140607
140608
140609
140610
140611
140707
140708
140709
140710
140711
140712
140713
140805
140806
140807
140813
140814
141101
141102
141103
141104
141202
141231
150101
150102
150103
150105
150106
150107
150108
150109
150112
150204
150205
150206
150207
150208
150209
150210
150211
150330
150401
150402
150403
150404
150405
150406
150407
150408
150409
150410
150411
150412
150413
150426
150427
150428
150429
150430
150504
150531
150601
150602
150603
150604
150605
150607
150703
150704
150705
150706
150718
150719
150824
150826
150827
150828
150829
150830
150831
150901
150902
150903
151008
151009
151109
151110
151111
151219
151220
151224
151225
151226
151227
151228
151230
151231
160106
160107
160108
160109
160110
160111
160112
160123
160124
160125
160126
160129
160130
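
An alternative to growing the table inside the loop is to collect the nightly pieces in a list and concatenate once at the end. The sketch below uses two invented nightly tables; nightly, pieces, and stacked are hypothetical names.

```python
import pandas as pd

# Hypothetical per-night tables standing in for the merged frames built in the loop.
nightly = {
    131123: pd.DataFrame({'pivot': [1, 2], 'object_name': ['a', 'b']}),
    131216: pd.DataFrame({'pivot': [1], 'object_name': ['c']}),
}

pieces = []
for date_name in sorted(nightly):
    # Tag each piece with its night before collecting it.
    pieces.append(nightly[date_name].assign(dirname=date_name))

# One concatenation at the end; ignore_index gives a clean 0..N-1 index.
stacked = pd.concat(pieces, ignore_index=True)
print(stacked)
```

With ignore_index=True the separate reset_index step is no longer needed, and copying the accumulated table on every iteration is avoided.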


The result table has all objects, even those observed as MFFLX stars. Thus, it needs to be cleaned up. 

In [13]:
result.reset_index(drop=True, inplace=True)

In [14]:
## Clean up the table columns.
result2=result.loc[result['ndfclass_updated'].str.startswith('MFOBJECT', na=False),\
       ['cobpivot', 'name', 'cob_id', 'pivot', 'dirname', 'mag_x', 'radeg',\
       'dedeg', 'glon', 'glat', 'ebv', 'teff', 'logg', 'feh',\
       'combine_method', 'galah_id_x', 'vmean', 'vstd', 'v_ccd1', 'snr_1',\
       'snr2_1', 'snr_guess_1', 'v_ccd2', 'snr_2', 'snr2_2', 'snr_guess_2',\
       'v_ccd3', 'snr_3', 'snr2_3', 'snr_guess_3', 'v_ccd4', 'snr_4',\
       'snr2_4', 'snr_guess_4', 'cenra', 'cendec', 'qflag', 'std_name',\
       'cfg_file', 'obsid', 'ndfclass_updated', 'runccd_id', 'object_name',\
       'comment', 'wavelength_flag']]
result2.reset_index(drop=True, inplace=True)
result2.rename(columns={'mag_x' : 'mag', 'galah_id_x' : 'galah_id'}, inplace=True)


Use STILTS to cross-match the table with 2MASS, APASS, and UCAC4, searching for a match for each target in the result2 table within 1 arcsecond. 

The 2MASS catalog is 'II/246/out' 

The UCAC4 catalog is 'I/322A/out' 

The APASS catalog is 'II/336/apass9' 

PPMXL is 'I/317' (not queried below)

SPM is 'I/320' (not queried below)

The USNO-B catalog is 'I/284/out' (not queried below)

If the OS command returns 0, the matching ran without error. 

In [15]:
## First output the file to a temporary csv. 
result2[['cob_id', 'pivot', 'radeg', 'dedeg']].to_csv('result_temp.csv', index=False)

In [16]:
## Then query 2MASS with these targets. stilts.jar must be in your working directory. 
os.system('java -jar stilts.jar cdsskymatch cdstable=II/246/out find=each in=result_temp.csv ifmt=csv ra=radeg dec=dedeg radius=1 out=result_temp_2mass.csv')

0

In [None]:
## Query UCAC4 with STILTS and target list 
os.system('java -jar stilts.jar cdsskymatch cdstable=I/322A/out find=each in=result_temp.csv ifmt=csv ra=radeg dec=dedeg radius=1 out=result_temp_ucac4.csv')

In [70]:
## Query APASS with STILTS and target list 
os.system('java -jar stilts.jar cdsskymatch cdstable=II/336/apass9 find=each in=result_temp.csv ifmt=csv ra=radeg dec=dedeg radius=1 out=result_temp_apass.csv')

0
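
Since the three queries differ only in the catalog name and the output file, the command can be built by a small helper and run with an explicit check on the exit status. This is a sketch, not what I actually ran: stilts_match_command and run_stilts are hypothetical helpers, and they use only the STILTS parameters already shown in the cells above.

```python
import subprocess

def stilts_match_command(cdstable, in_csv, out_csv, radius_arcsec=1):
    """Build the cdsskymatch call used above as an argument list (hypothetical helper)."""
    return [
        'java', '-jar', 'stilts.jar', 'cdsskymatch',
        'cdstable=%s' % cdstable, 'find=each',
        'in=%s' % in_csv, 'ifmt=csv',
        'ra=radeg', 'dec=dedeg',
        'radius=%s' % radius_arcsec,
        'out=%s' % out_csv,
    ]

def run_stilts(cmd):
    """Run the command and raise if STILTS exits with a non-zero status."""
    status = subprocess.call(cmd)
    if status != 0:
        raise RuntimeError('STILTS failed with exit status %d' % status)
    return status

cmd = stilts_match_command('II/246/out', 'result_temp.csv', 'result_temp_2mass.csv')
print(' '.join(cmd))
# run_stilts(cmd)  # uncomment where java and stilts.jar are available
```

Raising on a non-zero status means a failed query stops the notebook instead of leaving a stale or missing CSV for the next cell to read.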

Once you have the relevant data tables, match them with the result2 table based on cob_id and pivot. 

In [201]:
## Read in the data you just pulled. 
twomass_data=pd.read_csv('result_temp_2mass.csv')
ucac4_data=pd.read_csv('result_temp_ucac4.csv')
#usnob_data=pd.read_csv('result_temp_usnob.csv')
apass_data=pd.read_csv('result_temp_apass.csv')

In [163]:
temp1=pd.merge(result2, twomass_data[['cob_id', 'pivot', '2MASS', 'Jmag', 'e_Jmag', \
                                     'Hmag', 'e_Hmag', 'Kmag', 'e_Kmag', 'Qfl', 'Rfl', 'X']], how='left', on=['cob_id', 'pivot'])
temp2=pd.merge(temp1, ucac4_data[['cob_id', 'pivot', 'UCAC4', 'pmRA', 'e_pmRA', 'pmDE', 'e_pmDE']], how='left', on=['cob_id', 'pivot'])
combined_table=pd.merge(temp2, apass_data[['cob_id', 'pivot', 'Vmag', 'e_Vmag', 'Bmag', 'e_Bmag', 
                                 'gpmag', 'e_gpmag', 'rpmag', 'e_rpmag', 'ipmag', 'e_ipmag']], how='left', on=['cob_id', 'pivot'])
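
As a sanity check on this kind of two-key left merge, the sketch below uses invented targets and an invented catalog table. It shows that every target row is kept and that targets without a catalog entry come back with NaN.

```python
import pandas as pd

# Hypothetical targets and a catalogue match table keyed the same way.
targets = pd.DataFrame({'cob_id': [1, 1, 2], 'pivot': [1, 2, 1]})
matches = pd.DataFrame({'cob_id': [1, 2], 'pivot': [1, 1], 'Jmag': [10.5, 11.2]})

# Left merge: every target is kept; targets without a match get NaN.
merged = pd.merge(targets, matches, how='left', on=['cob_id', 'pivot'])
n_unmatched = int(merged['Jmag'].isnull().sum())
print(merged)
print(n_unmatched)
```

Comparing len(merged) with len(targets) after each merge is a cheap way to spot duplicated keys in a catalog table, since duplicates would add rows.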

In [164]:
## Adjust column names
combined_table.rename(columns={'2MASS' : '2MASS_ID', 'UCAC4' : 'UCAC4_ID', 'teff' : 'TEFF_GUESS', \
                              'logg' : 'LOGG_GUESS', 'feh' : 'FEH_GUESS', 'qflag' : 'RED_QFLAG', \
                              'galah_id' : 'GALAH_ID'}, inplace=True)
## Shift all to upper case
combined_table.columns = [x.upper() for x in combined_table.columns]

In [165]:
combined_table.columns.values

array(['COBPIVOT', 'NAME', 'COB_ID', 'PIVOT', 'DIRNAME', 'MAG', 'RADEG',
       'DEDEG', 'GLON', 'GLAT', 'EBV', 'TEFF_GUESS', 'LOGG_GUESS',
       'FEH_GUESS', 'COMBINE_METHOD', 'GALAH_ID', 'VMEAN', 'VSTD',
       'V_CCD1', 'SNR_1', 'SNR2_1', 'SNR_GUESS_1', 'V_CCD2', 'SNR_2',
       'SNR2_2', 'SNR_GUESS_2', 'V_CCD3', 'SNR_3', 'SNR2_3', 'SNR_GUESS_3',
       'V_CCD4', 'SNR_4', 'SNR2_4', 'SNR_GUESS_4', 'CENRA', 'CENDEC',
       'RED_QFLAG', 'STD_NAME', 'CFG_FILE', 'OBSID', 'NDFCLASS_UPDATED',
       'RUNCCD_ID', 'OBJECT_NAME', 'COMMENT', '2MASS_ID', 'JMAG', 'E_JMAG',
       'HMAG', 'E_HMAG', 'KMAG', 'E_KMAG', 'QFL', 'RFL', 'X', 'UCAC4_ID',
       'PMRA', 'E_PMRA', 'PMDE', 'E_PMDE', 'VMAG', 'E_VMAG', 'BMAG',
       'E_BMAG', 'GPMAG', 'E_GPMAG', 'RPMAG', 'E_RPMAG', 'IPMAG', 'E_IPMAG'], dtype=object)

Now I output the resulting combined table as a CSV to use for other programs. 

In [30]:
## Output to CSV
combined_table[['COB_ID', 'PIVOT', 'DIRNAME', 'MAG', 'RADEG', \
       'DEDEG', 'GLON', 'GLAT', 'EBV', 'TEFF_GUESS', 'LOGG_GUESS', \
       'FEH_GUESS', 'COMBINE_METHOD', 'GALAH_ID', 'VMEAN', 'VSTD', 'V_CCD1', \
       'SNR_1', 'SNR2_1', 'SNR_GUESS_1', 'V_CCD2', 'SNR_2', 'SNR2_2', \
       'SNR_GUESS_2', 'V_CCD3', 'SNR_3', 'SNR2_3', 'SNR_GUESS_3', 'V_CCD4', \
       'SNR_4', 'SNR2_4', 'SNR_GUESS_4', 'CENRA', 'CENDEC', 'RED_QFLAG', \
       'STD_NAME', 'CFG_FILE', 'OBSID', 'NDFCLASS_UPDATED', 'RUNCCD_ID', \
       'OBJECT_NAME', 'COMMENT', '2MASS_ID', 'JMAG', 'E_JMAG', 'HMAG', \
       'E_HMAG', 'KMAG', 'E_KMAG', 'QFL', 'RFL', 'X','VMAG', 'E_VMAG', \
       'BMAG', 'E_BMAG', 'GPMAG', 'E_GPMAG', 'RPMAG', 'E_RPMAG', \
       'IPMAG', 'E_IPMAG', 'UCAC4_ID', 'PMRA', 'E_PMRA', 'PMDE', 'E_PMDE'\
        ]].to_csv('combined_table_04172016.csv', index=False)

In [196]:
## Integer FITS columns cannot hold NaN, so flag missing GALAH_IDs with -1 before writing the table. 
combined_table.loc[combined_table['GALAH_ID'].isnull(), 'GALAH_ID']=-1
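
The same sentinel step can be written with fillna followed by an integer cast. This is a sketch on invented IDs; the explicit cast is my addition, on the assumption that the column should end up as a 64-bit integer to suit the integer FITS column written below.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs; the NaN forces the column to float until it is filled.
toy_ids = pd.DataFrame({'GALAH_ID': [1234567.0, np.nan, 7654321.0]})

# Fill missing IDs with the -1 sentinel, then cast back to 64-bit integers.
toy_ids['GALAH_ID'] = toy_ids['GALAH_ID'].fillna(-1).astype('int64')
print(toy_ids)
```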

In [194]:
## Output to FITS file 

## Target Information
col01=fits.Column(name='COB_ID', format='I10', array=combined_table['COB_ID'])
col02=fits.Column(name='PIVOT', format='I3', array=combined_table['PIVOT'])
col03=fits.Column(name='DIRNAME',format='I6', array=combined_table['DIRNAME'])
col04=fits.Column(name='GALAH_ID',format='K15', array=combined_table['GALAH_ID'])
col05=fits.Column(name='COMBINE_METHOD',format='I', array=combined_table['COMBINE_METHOD'])
col06=fits.Column(name='RADEG',format='F',unit='Degrees', array=combined_table['RADEG'])
col07=fits.Column(name='DEDEG',format='F',unit='Degrees', array=combined_table['DEDEG'])
col08=fits.Column(name='GLON',format='F',unit='Degrees', array=combined_table['GLON'])
col09=fits.Column(name='GLAT',format='F',unit='Degrees', array=combined_table['GLAT'])
col10=fits.Column(name='CFG_FILE',format='A60', array=combined_table['CFG_FILE'])
col11=fits.Column(name='CENRA',format='F', unit='Degrees', array=combined_table['CENRA'])
col12=fits.Column(name='CENDEC',format='F', unit='Degrees', array=combined_table['CENDEC'])
col13=fits.Column(name='OBJECT_NAME',format='A20', array=combined_table['OBJECT_NAME'])
col14=fits.Column(name='COMMENT',format='A20', array=combined_table['COMMENT'])
col15=fits.Column(name='EBV',format='F',unit='mag', array=combined_table['EBV'])
## GUESS Information
col16=fits.Column(name='TEFF_GUESS',format='F', unit='K', array=combined_table['TEFF_GUESS'])
col17=fits.Column(name='LOGG_GUESS',format='F', unit='dex', array=combined_table['LOGG_GUESS'])
col18=fits.Column(name='FEH_GUESS',format='F', unit='dex', array=combined_table['FEH_GUESS'])
col19=fits.Column(name='RV_CCD1',format='F', unit='km/s', array=combined_table['V_CCD1'])
col20=fits.Column(name='RV_CCD2',format='F', unit='km/s',array=combined_table['V_CCD2'])
col21=fits.Column(name='RV_CCD3',format='F', unit='km/s',array=combined_table['V_CCD3'])
col22=fits.Column(name='RV_MEAN',format='F', unit='km/s',array=combined_table['VMEAN'])
col23=fits.Column(name='RV_STDDEV',format='F', unit='km/s',array=combined_table['VSTD'])
col24=fits.Column(name='FLAG_GUESS',format='L', unit='',array=combined_table['WAVELENGTH_FLAG'])
## WHAT ABOUT OTHER GUESS FLAGS?
## S/N Information
col25=fits.Column(name='SNR_ERROR_CCD1',format='E', unit='',array=combined_table['SNR2_1'])
col26=fits.Column(name='SNR_ERROR_CCD2',format='E', unit='',array=combined_table['SNR2_2'])
col27=fits.Column(name='SNR_ERROR_CCD3',format='E', unit='',array=combined_table['SNR2_3'])
col28=fits.Column(name='SNR_ERROR_CCD4',format='E', unit='',array=combined_table['SNR2_4'])
col29=fits.Column(name='SNR_GUESS_CCD1',format='E', unit='',array=combined_table['SNR_GUESS_1'])
col30=fits.Column(name='SNR_GUESS_CCD2',format='E', unit='',array=combined_table['SNR_GUESS_2'])
col31=fits.Column(name='SNR_GUESS_CCD3',format='E', unit='',array=combined_table['SNR_GUESS_3'])
col32=fits.Column(name='SNR_GUESS_CCD4',format='E', unit='',array=combined_table['SNR_GUESS_4'])
## 2MASS Information 
col33=fits.Column(name='2MASS_ID',format='A20', unit='',array=combined_table['2MASS_ID'])
col34=fits.Column(name='JMAG',format='E', unit='mag',array=combined_table['JMAG'])
col35=fits.Column(name='E_JMAG',format='E', unit='mag',array=combined_table['E_JMAG'])
col36=fits.Column(name='HMAG',format='E', unit='mag',array=combined_table['HMAG'])
col37=fits.Column(name='E_HMAG',format='E', unit='mag',array=combined_table['E_HMAG'])
col38=fits.Column(name='KMAG',format='E', unit='mag',array=combined_table['KMAG'])
col39=fits.Column(name='E_KMAG',format='E', unit='mag',array=combined_table['E_KMAG'])
col40=fits.Column(name='2MASS_QFL',format='A5', unit='',array=combined_table['QFL'])
col41=fits.Column(name='2MASS_RFL',format='I3', unit='',array=combined_table['RFL'])
col42=fits.Column(name='2MASS_XFL',format='I1', unit='',array=combined_table['X'])
## APASS Information 
col43=fits.Column(name='BMAG',format='E', unit='mag',array=combined_table['BMAG'])
col44=fits.Column(name='E_BMAG',format='E', unit='mag',array=combined_table['E_BMAG'])
col45=fits.Column(name='VMAG',format='E', unit='mag',array=combined_table['VMAG'])
col46=fits.Column(name='E_VMAG',format='E', unit='mag',array=combined_table['E_VMAG'])
col47=fits.Column(name='GPMAG',format='E', unit='mag',array=combined_table['GPMAG'])
col48=fits.Column(name='E_GPMAG',format='E', unit='mag',array=combined_table['E_GPMAG'])
col49=fits.Column(name='RPMAG',format='E', unit='mag',array=combined_table['RPMAG'])
col50=fits.Column(name='E_RPMAG',format='E', unit='mag',array=combined_table['E_RPMAG'])
col51=fits.Column(name='IPMAG',format='E', unit='mag',array=combined_table['IPMAG'])
col52=fits.Column(name='E_IPMAG',format='E', unit='mag',array=combined_table['E_IPMAG'])
## UCAC4 Information 
col53=fits.Column(name='UCAC4_ID',format='A10', unit='',array=combined_table['UCAC4_ID'])
col54=fits.Column(name='PMRA',format='E', unit='mas/yr',array=combined_table['PMRA'])
col55=fits.Column(name='E_PMRA',format='E', unit='mas/yr',array=combined_table['E_PMRA'])
col56=fits.Column(name='PMDE',format='E', unit='mas/yr',array=combined_table['PMDE'])
col57=fits.Column(name='E_PMDE',format='E', unit='mas/yr',array=combined_table['E_PMDE'])



In [195]:
#get_ipython().system(u'rm iraf_output_combination.fits')
cols=fits.ColDefs([col01,col02,col03,col04,col05,col06,col07,col08,col09,col10,\
                   col11,col12,col13,col14,col15,col16,col17,col18,col19,col20,\
                   col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,\
                   col31,col32,col33,col34,col35,col36,col37,col38,col39,col40,\
                   col41,col42,col43,col44,col45,col46,col47,col48,col49,col50,\
                   col51,col52,col53,col54,col55,col56,col57])
tbhdu=fits.BinTableHDU.from_columns(cols)
tbhdu.writeto('iraf_output_combination.fits')


TODO: also output a CSV with comparable information?

In the future, this table can be combined with additional information, such as values from Tomaz and Jane.

Now match up with Tomaz's distances.

In [30]:
objects=pd.read_csv("/Users/kschles/Documents/GALAH/iraf_v5.0/objects_table_manipulate_byday.csv")
distances=pd.read_csv('/Users/kschles/Documents/GALAH/distances/zwitter/iraf_final_ed.csv')

In [31]:
distances['ccd1_filename']=((distances['name'].astype(str)).str[0:15]).astype(int)
distances['filename']=distances['name'].astype(str)
distances['cob_id']=(distances['filename'].str[0:10]).astype(int)

In [47]:
dist_combo=pd.merge(objects, distances[['cob_id', 'pivot', 'dmod', 'edmod']], how='left', on=['cob_id', 'pivot'])