## 30 July 2018
-- Laurin Gray

This notebook runs the DESK SED-fitting routines created by Dr. Steven Goldman (https://github.com/s-goldman/Dusty-Evolved-Star-Kit) on our tiered catalogs.  Each source has been separated into its own csv file, with wavelength in the first column and flux density (in Jy) in the second.

I want to produce a single csv file of fitting results for each tier, but a separate plot for each source.  To do this, I build a list of the files in a tier and iterate through it.  On each pass, the fitting results & plotting outputs are appended to master files, and the SED plot is moved to another folder.

The data come from the catalog of Spitzer sources of Khan et al. (2015), matched in CasJobs with sources from Whitelock et al. (2013).

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import gaussian_kde
import csv
import pathlib
from pathlib import Path
import os
import re

In [2]:
def natural_key(string_):
    """
    Function for natural sorting of file list.
    From: 
    https://stackoverflow.com/questions/2545532/python-analog-of-natsort-function-sort-a-list-using-a-natural-order-algorithm
    See http://www.codinghorror.com/blog/archives/001018.html
    """
    
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]
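
As a quick check of why natural sorting matters here, the sketch below compares a plain sort with a sort keyed on natural_key. The file names are hypothetical examples in the same '<ID>.csv' style used in this notebook; the function body is repeated only so the snippet runs on its own.

```python
import re

def natural_key(string_):
    """Split a string into text and integer chunks so numbers sort numerically."""
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_)]

# Hypothetical file names, named '<ID>.csv' like the source files in a tier
files = ['106.csv', '70.csv', '85.csv']

plain = sorted(files)                     # compares character by character
natural = sorted(files, key=natural_key)  # compares the numbers as integers

print(plain)    # ['106.csv', '70.csv', '85.csv']
print(natural)  # ['70.csv', '85.csv', '106.csv']
```

With a plain sort, '106.csv' lands before '70.csv' because '1' sorts before '7'. The natural key keeps the files in ID order.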

In [3]:
def run_desk():
    """
    Function for easily running DESK SED fitting routines produced by Dr. Steven Goldman 
    (https://github.com/s-goldman/Dusty-Evolved-Star-Kit).
    
    Function runs the DESK scripts on one file at a time, where each file is a csv of the wavelengths 
    and flux densities of an individual source.  Source files for a red candidate tier should be within the 
    same folder, accessed by dir_src.  The function creates master lists of the fitting results & plotting outputs
    for all the sources within a tier, while keeping the plots for each source separate.
    
    Within the code, dir_dest should be changed by the user to point to the "put_target_data_here" folder of DESK 
    on their own computer.  dir_plot is where the user wants to store the individual plots 
    (recommended: create a folder for each tier in this location before running the function).  
    The last part of the path for dir_src and dir_plot needs to be the same (since they're both sorted by tier).
    dir_outputs needs to be changed by the user to point to the DESK folder (where the raw output csvs are stored).
    
    Before calling function, the user defines:
        # variables to stay the same every time the code runs
        dir_src = '/Users/lgray/Documents/Phot_data/SED_Fit_Sources/30July2018/'
        dir_dest = '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/put_target_data_here/' 
        dir_plot = '/Users/lgray/Documents/Phot_data/SED_Plots/'
        dir_outputs = '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/'
        single_fitting_results = dir_outputs + 'fitting_results.csv'
        single_plot_output = dir_outputs + 'fitting_plotting_outputs.csv'
        
        # variables that will be input into the function to choose which data to run SED fits for
        folder = 'in_eight/'
        catalog = pd.read_csv('/Users/lgray/Documents/Phot_data/Red_Cand_Catalogs/30July2018/30July2018_LG_RedCand_8.csv')
        master_result = '/Users/lgray/Documents/Phot_data/SED_Fit_Results/fitting_results_8.csv'
        master_plot_output = '/Users/lgray/Documents/Phot_data/SED_Plot_Output/fitting_plotting_outputs_8.csv'
    
    The user must also define natural_key (see function notebook) and change into the scripts directory 
    before running this function:
        %cd '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/python_scripts'
    
    Other notes:
        - To avoid printing the output for every single source, comment out those lines in sed_fitting.py
        - Make sure to adjust the options in sed_fitting.py as desired.
        - All folders that data is being created in or moved to must exist before running the function.
        
    Call example:
        run_desk()
    """
    
    list_of_sources = catalog.ID.values # list of just the sources in the current tier
    list_of_files = os.listdir(dir_src+folder) # list of the filenames in the tier, i.e. '70.csv'
    list_of_files = sorted(list_of_files, key=natural_key) # sorted with natural sorting
    
    list_of_files = ['70.csv', '85.csv', '106.csv'] # TEST ONLY: overrides the full list with 3 files; comment out to run the whole tier
    
    c = 0
    for src in list_of_files:
        if c == 0:
            os.system('rm ' + dir_dest+'*.csv') # empty put_target_data_here
            os.system('cp ' + dir_src+folder+src+ ' ' + dir_dest) # cp dir_src+src dir_dest
            %run sed_fitting.py
            %run plotting_seds.py
        
        
            # read fitting_results.csv into a new master file (master_result)
            with open(single_fitting_results, 'r', newline='') as sing_fit:
                for row in csv.reader(sing_fit): # reader on single fit file
                    data = row # keep only the last row (the fit for this source)

            with open(master_result, 'w', newline='') as master_fit:
                writer = csv.writer(master_fit) # writer on master fit file
                writer.writerow(['source', 'L', 'vexp_predicted', 'teff', 'tinner', 
                                 'odep', 'mdot']) # write headers (only 1st)
                writer.writerow(data) # write row of data to master file
        
        
            # read fitting_plotting_outputs.csv into a new master file (master_plot_output)
            with open(single_plot_output, 'r', newline='') as sing_plot_output:
                for row in csv.reader(sing_plot_output):
                    data = row # keep only the last row (the outputs for this source)

            with open(master_plot_output, 'w', newline='') as master_plot:
                writer = csv.writer(master_plot)
                writer.writerow(['target_name', 'data_file', 'norm', 'index', 'grid_name', 'teff', 'tinner', 'number', 
                                 'odep', 'mdot', 'vexp'])
                writer.writerow(data)
        
        
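            # move plot & rename
            os.system('mv ' + dir_outputs+'output_seds.png' + ' ' + dir_plot+folder) # move plot to my plot directory
            # rename plot with source ID, taken from the file name (assumes files are named '<ID>.csv');
            # catalog.ID.values[c] only matches if the catalog rows are in the same order as list_of_files
            os.rename(dir_plot+folder+'output_seds.png', dir_plot+folder+os.path.splitext(src)[0]+'.png') 
            c = c+1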
        
        else:
            #print progress updates every 10 sources (helpful if printing is commented out in sed_fitting.py)
            if c%10 == 0:
                print('On source ', c, '/', len(list_of_files))
            
            k = c-1
            os.system('rm ' + dir_dest+list_of_files[k]) # remove previous file
            os.system('cp ' + dir_src+folder+src + ' ' + dir_dest)  
            %run sed_fitting.py
            %run plotting_seds.py
        
        
            # append fitting_results.csv to the existing master file (master_result)
            master_fit = pd.read_csv(master_result)
            sing_fit = pd.read_csv(single_fitting_results)

            master_fit = pd.concat([master_fit, sing_fit], axis=0)
            master_fit.to_csv(master_result, index=False)
        
        
            # append fitting_plotting_outputs.csv to the existing master file (master_plot_output)
            master_plot = pd.read_csv(master_plot_output)
            sing_plot_output = pd.read_csv(single_plot_output)

            master_plot = pd.concat([master_plot, sing_plot_output], axis=0)
            master_plot.to_csv(master_plot_output, index=False)
        
        
            # move plot & rename
            os.system('mv ' + dir_outputs+'output_seds.png' + ' ' + dir_plot+folder)
            # use the file name for the source ID (assumes files are named '<ID>.csv')
            os.rename(dir_plot+folder+'output_seds.png', dir_plot+folder+os.path.splitext(src)[0]+'.png')
            c = c+1
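
The two branches above differ only in whether the master file already exists, so the bookkeeping could probably be folded into one helper. The sketch below is one way to do that with the csv module. append_rows is a hypothetical name, not part of DESK, and it assumes the single-run csv has one header row followed by data rows. It also reuses the header from the single-run file instead of the hard-coded headers written above.

```python
import csv
import os

def append_rows(single_csv, master_csv):
    """Append the data rows of single_csv to master_csv.

    Hypothetical helper (not part of DESK). Assumes single_csv has one
    header row followed by data rows. The header is written to master_csv
    only when that file does not exist yet or is empty.
    Returns the number of data rows appended.
    """
    with open(single_csv, 'r', newline='') as f:
        rows = list(csv.reader(f))
    if not rows:
        return 0  # nothing to copy
    header, data = rows[0], rows[1:]

    # Decide whether the master file still needs its header line
    new_file = not os.path.exists(master_csv) or os.path.getsize(master_csv) == 0
    with open(master_csv, 'a', newline='') as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(header)
        writer.writerows(data)
    return len(data)
```

Inside the loop this would be called once per source, for example append_rows(single_fitting_results, master_result). Unlike the first branch above, which opens the master file with 'w', this helper never truncates, so an old master file would need to be deleted before starting a fresh run of a tier.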

In [4]:
# variables to stay the same every time the code runs (modified for user's computer)
dir_src = '/Users/lgray/Documents/Phot_data/SED_Fit_Sources/30July2018/'
dir_dest = '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/put_target_data_here/' 
dir_plot = '/Users/lgray/Documents/Phot_data/SED_Plots/'
dir_outputs = '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/'
single_fitting_results = dir_outputs + 'fitting_results.csv'
single_plot_output = dir_outputs + 'fitting_plotting_outputs.csv'

# variables that will be input into the function to choose which data to run SED fits for
folder = 'in_eight/'
catalog = pd.read_csv('/Users/lgray/Documents/Phot_data/Red_Cand_Catalogs/30July2018/30July2018_LG_RedCand_8.csv')
master_result = '/Users/lgray/Documents/Phot_data/SED_Fit_Results/fitting_results_8.csv'
master_plot_output = '/Users/lgray/Documents/Phot_data/SED_Plot_Output/fitting_plotting_outputs_8.csv'
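
Since every folder that data is created in or moved to must exist before run_desk() is called, a quick check up front can save a failed run. This is a minimal sketch; missing_dirs is a hypothetical helper, and the paths in the example are placeholders rather than the ones defined above.

```python
import os

def missing_dirs(paths):
    """Return the entries of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

# Placeholder paths for the demo: the current directory exists,
# the second entry is assumed not to.
to_check = [os.getcwd(), os.path.join(os.getcwd(), 'no_such_folder_for_demo')]
print(missing_dirs(to_check))
```

In this notebook the list to check would be dir_src+folder, dir_dest, dir_plot+folder, dir_outputs, and the parent folders of master_result and master_plot_output (via os.path.dirname).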

In [5]:
# move to script directory
%cd '/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/python_scripts'

/Users/lgray/anaconda3/lib/python3.6/site-packages/desk/python_scripts


In [6]:
run_desk()


Time: 0.33 minutes
  y_model = np.log10(y_model)

Time: 0.33 minutes
  y_model = np.log10(y_model)

Time: 0.32 minutes
  y_model = np.log10(y_model)


I've also modified line 74 of plotting_seds.py to read:

    ax1.set_ylim(min(y_data)-1, max(y_data)+1)

This makes the y-axis limits adjust automatically to each source's data, with a margin of 1 below the minimum and above the maximum.
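
The same limits can be computed in a small function, which makes the padding easy to change. This is only a sketch: padded_limits is a hypothetical helper, the demo values stand in for y_data, and skipping non-finite values is my own addition rather than something plotting_seds.py does.

```python
import math

def padded_limits(values, pad=1.0):
    """Return (min - pad, max + pad) over the finite entries of values."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        raise ValueError('no finite values to set limits from')
    return min(finite) - pad, max(finite) + pad

# Hypothetical y values standing in for y_data; the nan is ignored
y_demo = [2.0, 5.0, float('nan'), 3.5]
low, high = padded_limits(y_demo)
print(low, high)  # 1.0 6.0

# In plotting_seds.py the modified line would then read something like:
#     ax1.set_ylim(*padded_limits(y_data))
```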

Also, although this function saves a complete list of all the plotting output data, that master file cannot be used directly to plot SEDs with plotting_seds.py: the second column (data_file) won't point to the correct file, assuming the master file is stored outside the DESK output folder.  I assume this could be fixed by moving the master plotting output file to the outputs location & renaming it "fitting_plotting_outputs.csv" before running the plotting script.
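
A sketch of that suggested fix is below. It is untested against plotting_seds.py, so whether the data_file column then resolves correctly is still an assumption. stage_master_outputs is a hypothetical helper, not part of DESK. It copies the master file rather than moving it, so the original stays where it is, and it backs up any file already sitting under the expected name.

```python
import os
import shutil

def stage_master_outputs(master_csv, outputs_dir,
                         expected_name='fitting_plotting_outputs.csv'):
    """Copy master_csv into outputs_dir under the name the plotting script reads.

    Hypothetical helper (not part of DESK). Any existing file with that
    name is first copied to '<expected_name>.bak' so it is not lost.
    Returns the path of the staged copy.
    """
    target = os.path.join(outputs_dir, expected_name)
    if os.path.exists(target):
        shutil.copy2(target, target + '.bak')  # keep the last single-run output
    shutil.copy2(master_csv, target)
    return target
```

With the variables defined in this notebook, the call would be stage_master_outputs(master_plot_output, dir_outputs), followed by %run plotting_seds.py from the scripts directory.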