# GMACS ETC DATASET PACKAGER  
**Follow this guide to update the GMACS ETC dataset**  
*Last updated: 19 July, 2018*  

## About this packager
This packager takes column-oriented data and pre-packages it for quick access during real-time interaction. This reduces server workload and improves overall latency.  
Most variables are named according to the following convention:
- the first word (or prefix) denotes either:
    - the data group that the information is related to,
        - e.g. `skyfiles_path` and `skyfiles_files`
        - e.g. `star_types_path` and `star_types_files`
    - or, the type of data contained in the variable *as it pertains to the build*
        - e.g. `string_example` would be an example variable intended to be used as a string.
        - e.g. `coalesced_galaxy_types` will be the entire group of columns for every 'galaxy type' file.
- the last word (or suffix) *may* denote a color specification (if one exists).

## Prepare your data
Figure out where your data goes:
- core/
    - kinney/ (galaxy types)
    - pickle/ (star types)
    - filters/ (filters)
    - skybackground/ (sky files)
    - efficiencies/ (ccd and grating efficiencies)
    - dichroic.txt (dichroic glass)
    - atmo_extinction.dat (atmospheric extinction)
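Before running the packager, it can help to confirm that every location above actually exists. A minimal sketch, using a hypothetical helper `find_missing` and assuming the notebook runs from the directory that contains `core/`:

```python
import os

def find_missing(paths, root='.'):
    """Return the entries of `paths` that do not exist under `root` (hypothetical helper)."""
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]

# The data locations listed above, relative to the working directory.
expected_locations = [
    'core/kinney/',
    'core/pickle/',
    'core/filters/',
    'core/skybackground/',
    'core/efficiencies/',
    'core/dichroic.txt',
    'core/atmo_extinction.dat',
]

missing = find_missing(expected_locations)
print('Missing:', ', '.join(missing) if missing else 'nothing')
```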

Data should adhere to the standards listed below. If you need to deviate from them, that's fine, but please document it.
- comma-delimited columns
- no spaces
- the first row should hold column names, not data points
- the column names should be in alphabetical order (a,b,c...)
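The rules above are simple enough to check automatically before a file goes into `core/`. A minimal sketch; `check_standard` is a hypothetical helper, and it assumes at most 26 columns so each can be labeled with a single lowercase letter:

```python
import csv
import string

def check_standard(path):
    """Return a list of ways the file at `path` breaks the data rules (empty if none)."""
    with open(path, newline='') as f:
        lines = f.read().splitlines()
    if not lines:
        return ['file is empty']
    problems = []
    if any(' ' in line for line in lines):
        problems.append('contains spaces')
    # The header row must be comma-delimited labels 'a', 'b', 'c', ... in order.
    header = next(csv.reader([lines[0]]))
    expected = list(string.ascii_lowercase[:len(header)])
    if header != expected:
        problems.append('header {0} should be {1}'.format(header, expected))
    return problems
```

A tab- or space-delimited file fails too, since its header parses as a single label that is not `['a']`.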

## Run this script thing

### Import some tools
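tqdm draws progress bars so the try/except loading loops display cleanly while the data is packaged. You'll see.
SciPy is now used only for interpolation. Its interpolators are named by number of dimensions (e.g. `interpolate.interp1d()` or `interpolate.interp2d()`), and PCHIP has its own class: `interpolate.PchipInterpolator()`. Note that `interp2d` was deprecated and then removed in SciPy 1.14, so newer environments need a replacement such as `interpolate.RegularGridInterpolator`.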

```python
import os
from tqdm import tqdm
from scipy import interpolate
import pandas as pd
import numpy as np
```

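To make the difference between the two interpolators concrete, here is a minimal sketch on made-up data (`x`, `y`, and `demo_step` are illustrative only; `demo_step` plays the role that `step_size` plays in the next cell). It resamples a coarse curve onto an even grid both ways:

```python
import numpy as np
from scipy import interpolate

demo_step = 0.1  # same role as `step_size` in the next cell

# Toy data standing in for one (wavelength, value) column pair.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 1.0, 2.0])

# Evenly spaced grid with spacing demo_step. linspace keeps both endpoints
# exactly inside the data range; np.arange can overshoot the last point by a
# rounding error, which interp1d rejects as out of range.
n_points = int(round((x.max() - x.min()) / demo_step)) + 1
grid = np.linspace(x.min(), x.max(), n_points)

linear = interpolate.interp1d(x, y)(grid)          # piecewise linear (the default kind)
pchip = interpolate.PchipInterpolator(x, y)(grid)  # shape-preserving piecewise cubic
```

Both curves pass through the data points; PCHIP is smooth but, unlike an ordinary cubic spline, does not overshoot between them (on the flat segment from 1 to 2 it stays exactly at 1).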
## Set up the variable name things
You could try to simplify this process by dynamically populating the file lists, e.g. `for file in os.listdir(star_types_path):` or similar, but then you'll have to devise a system for determining which file occupies which index of a coalesced group. For this reason, I have made the lists of data files explicit.

```python
step_size = 0.1  # scales interpolation steps
galaxy_types_path = 'core/kinney/'
galaxy_types_files = ['SB1','SB2','SB3','SB4','SB5','SB6','S0','Sa',
                      'Sb','Sc','bulge','ellipticals','lbg_all_flam']
star_types_path = 'core/pickle/'
star_types_files = ['o5v.dat','b0v.dat','b57v.dat',
                    'a0v.dat','a5v.dat','f0v.dat',
                    'g0v.dat','g5v.dat','k0v.dat',
                    'k5v.dat','m0v.dat','m5v.dat']
filter_path = 'core/filters/'
filter_files = ['photonUX','photonB','photonV','photonR','photonI','u','g','r','i','z']
skyfiles_path = 'core/skybackground/'
skyfiles_files = ['00d','03d','07d','10d','14d']
efficiency_path = 'core/efficiencies/'
efficiency_grating_files = ['grating_red_low_res','grating_blue_low_res']
efficiency_ccd_files = ['e2v_red','e2v_blue']
dichroic_file = 'core/dichroic.txt'
atmospheric_extinction = 'core/atmo_extinction.dat'
lol_units = [' units', ' units']  # unit labels for the tqdm progress bars... just ignore this
error_flag = 0  # for counting errors... hopefully never.
```

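To see why the lists are explicit, consider what a directory scan gives you. `os.listdir()` returns names in arbitrary order, and sorting them only makes the order alphabetical, which is not the spectral sequence (O, B, A, F, ...) used in `star_types_files`: `coalesced_star_types[0]` would silently become `a0v` instead of `o5v`. A minimal sketch with a hypothetical `list_data_files` helper:

```python
import os

def list_data_files(path, suffix):
    """Return the names (without `suffix`) of every `suffix` file in `path`.

    os.listdir() gives no guaranteed order, so the result is sorted to make it
    reproducible. Sorted is alphabetical, though, not the order the data needs.
    """
    return sorted(f[:-len(suffix)] for f in os.listdir(path) if f.endswith(suffix))
```

Any scan-based approach would still need a separate, explicit ordering (or name-based keys instead of integer indices), which is the bookkeeping the explicit lists avoid.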
## Import your data
This loads each file in the order it appears in its group's file list (e.g. `star_types_files`). In this way, `coalesced_star_types` contains all the data from all the `star_types_files` files. To access one of these files, use the index corresponding to the position of its name in the `star_types_files` list (the `coalesced_*` variables are dicts keyed by these integer indices). For example, star type file `o5v.dat` is accessible as `coalesced_star_types[0]`.  
Pandas, the tool used to import the data, uses the first row of each file as that file's column labels. For this reason, and in accordance with the data preparation guidelines, the first cell of every column should be systematically named. At the time of this writing, the columns were labeled alphabetically ('a', 'b', 'c', etc.). In this way, a specific column within a `star_types_files` file is accessed like so:  
- `coalesced_star_types[3]['a']`, or
- `coalesced_star_types[3].a`,  
which would access column 'a' (the first column) of the fourth file in `star_types_files` (indexing starts at 0, so index 3 is the 4th file).  
If a data type has only one file (such as atmospheric extinction), the first index is omitted, and the only index is the column label. For example:  
- `coalesced_atmospheric_extinction['a']`, or  
- `coalesced_atmospheric_extinction.a`  
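The indexing scheme can be tried without the real data. A minimal sketch that reads two made-up, rule-following files from memory with `io.StringIO`; the names `demo_texts`, `demo_files`, and `demo_coalesced` are illustrative only, chosen so they don't overwrite the real variables:

```python
import io
import pandas as pd

# Two tiny stand-in files that follow the data-preparation rules.
demo_texts = {
    'o5v.dat': 'a,b\n3000,1.0\n3010,1.1\n',
    'b0v.dat': 'a,b\n3000,2.0\n3010,2.1\n',
}
demo_files = ['o5v.dat', 'b0v.dat']

# Same pattern as the loading cell: list position -> DataFrame.
demo_coalesced = {}
for i, name in enumerate(demo_files):
    demo_coalesced[i] = pd.read_csv(io.StringIO(demo_texts[name]), delimiter=',')

print(demo_coalesced[0]['a'].tolist())  # [3000, 3010]  (column 'a' of o5v.dat)
print(demo_coalesced[1].b.tolist())     # [2.0, 2.1]    (attribute access works too)
```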

The first group (galaxy types) features a commented walkthrough. Check it out.

```python
# galaxy types
coalesced_galaxy_types = {}  # dict: file index -> DataFrame holding that file's columns
try:
    for i, galaxy_types_file in tqdm(enumerate(galaxy_types_files), total=len(galaxy_types_files),
                                     desc='Loading object data', ncols=0, unit=lol_units[0]):
        # `i, galaxy_types_file` means that, for each entry in galaxy_types_files,
        # `i` is the index of the entry and `galaxy_types_file` is its string name
        # as it appears in the `galaxy_types_files` list.
        # (enumerate() has no length, so `total=` tells tqdm how many files to expect.)

        # This sets up the path to the file. Change the suffix ('.txt', '.dat', etc.) as needed.
        _path = os.path.join(galaxy_types_path, '{0}{1}'.format(galaxy_types_file, '.txt'))

        # Pandas imports the data. Literal pandas. The delimiter is here too; change as needed.
        _value = pd.read_csv(_path, delimiter=',')

        # This is where the file `galaxy_types_file` becomes entry `i` of coalesced_galaxy_types.
        coalesced_galaxy_types[i] = _value
except Exception as err:
    print('[Error :: object data]', err)
    error_flag += 1

# star types (file names already include '.dat')
coalesced_star_types = {}
try:
    for i, star_types_file in tqdm(enumerate(star_types_files), total=len(star_types_files),
                                   desc='Loading star data', ncols=0, unit=lol_units[0]):
        _path = os.path.join(star_types_path, star_types_file)
        _value = pd.read_csv(_path, delimiter=',')
        coalesced_star_types[i] = _value
except Exception as err:
    print('[Error :: star data]', err)
    error_flag += 1

# filters
coalesced_filters = {}
try:
    for i, filter_file in tqdm(enumerate(filter_files), total=len(filter_files),
                               desc='Loading filter data', ncols=0, unit=lol_units[0]):
        _path = os.path.join(filter_path, '{0}{1}'.format(filter_file, '.txt'))
        _value = pd.read_csv(_path, delimiter=',')
        coalesced_filters[i] = _value
except Exception as err:
    print('[Error :: filter data]', err)
    error_flag += 1

# sky files
coalesced_sky_files = {}
try:
    for i, skyfiles_file in tqdm(enumerate(skyfiles_files), total=len(skyfiles_files),
                                 desc='Loading atmospheric data', ncols=0, unit=lol_units[0]):
        _path = os.path.join(skyfiles_path, '{0}{1}'.format(skyfiles_file, '.txt'))
        _value = pd.read_csv(_path, delimiter=',')
        coalesced_sky_files[i] = _value
except Exception as err:
    print('[Error :: atmospheric data]', err)
    error_flag += 1

# grating efficiency files
coalesced_grating_files = {}
try:
    for i, efficiency_grating_file in tqdm(enumerate(efficiency_grating_files),
                                           total=len(efficiency_grating_files),
                                           desc='Loading grating efficiencies', ncols=0,
                                           unit=lol_units[0]):
        _path = os.path.join(efficiency_path, '{0}{1}'.format(efficiency_grating_file, '.txt'))
        _value = pd.read_csv(_path, delimiter=',')
        coalesced_grating_files[i] = _value
    # Column 'a' had to be multiplied by 10 somewhere, so it might as well be here.
    grating_red_1 = np.dot(coalesced_grating_files[0]['a'], 10)
    grating_red_2 = coalesced_grating_files[0]['b']
    grating_blue_1 = np.dot(coalesced_grating_files[1]['a'], 10)
    grating_blue_2 = coalesced_grating_files[1]['b']
except Exception as err:
    print('[Error :: grating efficiencies]', err)
    error_flag += 1

# CCD efficiency files
coalesced_ccd_files = {}
try:
    for i, efficiency_ccd_file in tqdm(enumerate(efficiency_ccd_files),
                                       total=len(efficiency_ccd_files),
                                       desc='Loading CCD efficiencies', ncols=0,
                                       unit=lol_units[0]):
        _path = os.path.join(efficiency_path, '{0}{1}'.format(efficiency_ccd_file, '.txt'))
        _value = pd.read_csv(_path, delimiter=',')
        coalesced_ccd_files[i] = _value
    # parse parse parse
    ccd_efficiency_red_1 = np.dot(coalesced_ccd_files[0]['a'], 10)
    ccd_efficiency_red_2 = coalesced_ccd_files[0]['b']
    ccd_efficiency_blue_1 = np.dot(coalesced_ccd_files[1]['a'], 10)
    ccd_efficiency_blue_2 = coalesced_ccd_files[1]['d']  # data came in like this
except Exception as err:
    print('[Error :: ccd efficiencies]', err)
    error_flag += 1

# dichroic efficiency and atmospheric extinction
try:
    # dichroic
    coalesced_dichroic = pd.read_csv(dichroic_file, delimiter=',')
    dichro_x = np.dot(coalesced_dichroic['a'], 10)  # wavelength in Å
    dichro_y1 = coalesced_dichroic['b']  # reflectivity, blue channel
    dichro_y2 = coalesced_dichroic['c']  # transmission, red channel
    # atmospheric extinction
    coalesced_atmospheric_extinction = pd.read_csv(atmospheric_extinction, delimiter=',')
    atmo_ext_x = coalesced_atmospheric_extinction['a']
    atmo_ext_y = coalesced_atmospheric_extinction['b']
except Exception as err:
    print('[Error :: dichroic / atmo. extinction]', err)
    error_flag += 1
```
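The multi-file loading loops above differ only in path, file list, suffix, and progress label, so they could be folded into one helper. A minimal sketch; `load_group` is a hypothetical name, and the `try`/`except` and `error_flag` bookkeeping would stay with the caller:

```python
import os
import pandas as pd
from tqdm import tqdm

def load_group(path, names, suffix='', desc='Loading data'):
    """Load each file in `names` (in list order) into a dict keyed by list index.

    `suffix` is appended to every name, e.g. '.txt' for the galaxy files and ''
    for the star files, whose names already include '.dat'.
    """
    group = {}
    # Wrapping the list (not the enumerate object) lets tqdm read its length.
    for i, name in enumerate(tqdm(names, desc=desc, ncols=0, unit=' file')):
        group[i] = pd.read_csv(os.path.join(path, name + suffix), delimiter=',')
    return group

# Hypothetical usage, mirroring the explicit loops above:
# coalesced_galaxy_types = load_group(galaxy_types_path, galaxy_types_files, '.txt',
#                                     desc='Loading object data')
# coalesced_star_types = load_group(star_types_path, star_types_files,
#                                   desc='Loading star data')
```

Keeping the explicit file lists as the `names` argument preserves the index order the rest of the packager relies on.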