# Reading/Writing a 🌈



In [None]:
from chromatic import *

## Reading Files

One key goal of `chromatic` is to make it easy to load spectroscopic light curves from a variety of file formats, so that the outputs from different pipelines can be standardized into objects that can be directly compared to one another. We hope to provide a straightforward way to check one analysis against another as quickly as possible.

### Download Example Inputs

If you want to try any of these readers, you'll need data files in each format. You can download some example datasets from [this link](https://www.dropbox.com/s/es5drnp6ufkz8wv/example-datasets.zip?dl=0). Simply extract that `.zip` file into the directory from which you'll be running this notebook.

### `chromatic` rainbow files (`*.rainbow.npy`)

The `chromatic` toolkit saves files in its own default format, which can then be shared and loaded back in. These files directly encode the core dictionaries in binary files, so they load and save quickly. They have the extension `.rainbow.npy`. These files can be written (see below) from any `Rainbow` object. 

In [None]:
rainbow_chromatic = Rainbow('example-datasets/chromatic/simulated.rainbow.npy')

The `Rainbow` reader will try to guess the format of the file from the filepath. If that guess fails, you can feed in the keyword `format='rainbownpy'` to require the `from_rainbownpy` reader needed for these files.

### generic text files (`*.txt`, `*.csv`)

Text files are slower to read or write, but everyone can make them! This reader will try to load one giant text file in which light curves for all wavelengths are stacked on top of each other or spectra for all times are stacked on top of each other. The text file should at least have columns that look like:
- `wavelength` for wavelength in microns
- `time` for time in days (preferably BJD$_{\rm TDB}$)
- `flux` for flux in any units
- `uncertainty` for flux uncertainties in the same units as `flux`

Additional columns will also be read, and they will be stored in the `.fluxlike` core dictionary.
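
As a minimal sketch of the layout described above, the following standard-library code builds a small stacked table with the four expected columns. The toy values, the helper names `build_rows` and `rows_to_csv`, and the choice of comma-delimited output with a header row are assumptions made for illustration; they are not part of `chromatic`.

```python
import csv
import io

# Hypothetical toy grid: 2 wavelengths (microns) x 3 times (days).
wavelengths = [1.0, 2.0]
times = [0.0, 0.01, 0.02]

COLUMNS = ["wavelength", "time", "flux", "uncertainty"]


def build_rows(wavelengths, times, flux=1.0, uncertainty=0.01):
    """Stack one light curve per wavelength on top of each other."""
    rows = []
    for w in wavelengths:
        for t in times:
            rows.append(
                {"wavelength": w, "time": t, "flux": flux, "uncertainty": uncertainty}
            )
    return rows


def rows_to_csv(rows):
    """Serialize the rows as comma-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = build_rows(wavelengths, times)
text = rows_to_csv(rows)
print(text)
```

Each row holds one flux measurement at one wavelength and one time, so a grid of 2 wavelengths and 3 times yields 6 data rows below the header.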

In [None]:
rainbow_chromatic = Rainbow('example-datasets/chromatic/simulated.rainbow.txt')

If the file-format guess fails, you can feed in the keyword `format='text'` to tell the reader to expect one of these files.

### STScI `jwst` pipeline outputs (`x1dints.fits`)

The `jwst` pipeline developed at the Space Telescope Science Institute produces extracted 1D stellar spectra for time-series observations with the James Webb Space Telescope. Details about the pipeline itself are available [here](https://jwst-pipeline.readthedocs.io/en/latest/).

These files typically end with the `_x1dints.fits` suffix. Each file contains a number of individual "integrations" (= time points). Because the datasets can get large, sometimes a particular observation might be split into multiple segments, each with its own file. As such, the reader for these files is designed to handle either a single file or a path with a `*` in it that points to a group of files from an observation that's been split into segments.
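
To make the wildcard behavior concrete, here is a small sketch of how a path containing `*` expands into an ordered list of segment files. It uses only Python's standard `glob` module on empty placeholder files; the file names and the helper `find_segments` are hypothetical, and this is not necessarily how the `chromatic` reader resolves the pattern internally.

```python
import glob
import os
import tempfile


def find_segments(pattern):
    """Return the files matching a wildcard pattern, in sorted order."""
    return sorted(glob.glob(pattern))


# Create empty placeholder files with hypothetical names, then match them.
with tempfile.TemporaryDirectory() as directory:
    for name in ["obs-seg002_x1dints.fits", "obs-seg001_x1dints.fits", "notes.txt"]:
        open(os.path.join(directory, name), "w").close()

    matches = find_segments(os.path.join(directory, "*_x1dints.fits"))
    segment_names = [os.path.basename(m) for m in matches]

print(segment_names)
```

Sorting the matches keeps the segments in name order, which matters if the segment numbers in the file names reflect the order of the integrations in time.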

This reader has been tested on all of the `x1dints` files produced as Stage 2 Data Products in the simulated datasets available [here](https://app.box.com/folder/154382715453?s=tj1jnivn9ekiyhecl5up7mkg8xrd1htl).

In [None]:
rainbow_stsci = Rainbow('example-datasets/stsci/*_x1dints.fits')

If the file-format guess fails, you can feed in the keyword `format='x1dints'` to tell the reader to expect one of these files.

### `eureka` pipeline outputs (`S3_*_Table_Save.txt`)

The `eureka` pipeline is one of many community tools being designed to extract spectra from JWST data. Details about the pipeline itself are available [here](https://github.com/kevin218/Eureka). 

These files typically have names that look something like `S3_*_Table_Save.txt`, and they contain fluxes as a function of wavelength and time, stored as an astropy `ecsv` table.

In [None]:
rainbow_eureka = Rainbow('example-datasets/eureka/S3_wasp43b_Table_Save.txt')

If the file-format guess fails, you can feed in the keyword `format='eureka'` to tell the reader to expect one of these files.
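
Each reader above is selected either by the filename guess or by an explicit `format=` keyword. As a rough illustration of how a filename-based guess could work, the sketch below maps the filename patterns listed in this section to the `format` values mentioned alongside them. The lookup table and the function `guess_format` are hypothetical and written only for this example; `chromatic`'s actual guessing logic may differ.

```python
import os
from fnmatch import fnmatch

# Hypothetical lookup table built from the patterns named in this section.
# Order matters: 'S3_*_Table_Save.txt' also ends in '.txt', so the more
# specific pattern must be checked before the generic text patterns.
PATTERNS = [
    ("*.rainbow.npy", "rainbownpy"),
    ("*x1dints.fits", "x1dints"),
    ("S3_*_Table_Save.txt", "eureka"),
    ("*.txt", "text"),
    ("*.csv", "text"),
]


def guess_format(filepath):
    """Return a format name for a filepath, or None if nothing matches."""
    name = os.path.basename(filepath)
    for pattern, format_name in PATTERNS:
        if fnmatch(name, pattern):
            return format_name
    return None


for path in [
    "example-datasets/chromatic/simulated.rainbow.npy",
    "example-datasets/chromatic/simulated.rainbow.txt",
    "example-datasets/stsci/*_x1dints.fits",
    "example-datasets/eureka/S3_wasp43b_Table_Save.txt",
]:
    print(path, "->", guess_format(path))
```

A guess like this can fail when a file has been renamed, which is the situation the explicit `format=` keyword is meant to cover.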

## Writing Files

### `chromatic` rainbow files (`*.rainbow.npy`)

The default file format for saving files encodes the core dictionaries in binary files, using the extension `.rainbow.npy`. This is a file that can be read directly back into `chromatic`. (Indeed, the commands below created the file that we read above.)

In [None]:
simulated = SimulatedRainbow().inject_transit()
simulated.save('example-datasets/chromatic/simulated.rainbow.npy')

### generic text files (`*.txt`, `*.csv`)

Text files provide a more generally readable file format, even though they may be slower to read or write. This writer will create one giant text file that stacks the light curves for all wavelengths on top of each other (if the `group_by='wavelength'` keyword is set) or the spectra for all times on top of each other (if the `group_by='time'` keyword is set). The resulting text file will have at least the following columns:
- `wavelength` for wavelength in microns
- `time` for time in days (preferably BJD$_{\rm TDB}$)
- `flux` for flux in any units
- `uncertainty` for flux uncertainties in the same units as `flux`
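
The difference between the two `group_by` choices is only the order in which rows are stacked. The sketch below shows that ordering for a tiny grid using plain Python; the function `stack_rows` and the toy values are hypothetical and are not the writer's actual implementation.

```python
def stack_rows(wavelengths, times, group_by="wavelength"):
    """Return (wavelength, time) pairs in the order the rows would be stacked."""
    if group_by == "wavelength":
        # One full light curve per wavelength, stacked on top of each other.
        return [(w, t) for w in wavelengths for t in times]
    if group_by == "time":
        # One full spectrum per time, stacked on top of each other.
        return [(w, t) for t in times for w in wavelengths]
    raise ValueError("group_by must be 'wavelength' or 'time'")


# Hypothetical toy grid: 2 wavelengths (microns) x 2 times (days).
by_wavelength = stack_rows([1.0, 2.0], [0.0, 0.1], group_by="wavelength")
by_time = stack_rows([1.0, 2.0], [0.0, 0.1], group_by="time")

print(by_wavelength)
print(by_time)
```

Both orderings contain the same set of rows, so either layout carries the same information.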

In [None]:
simulated.save('example-datasets/chromatic/simulated.rainbow.txt')

## Create your Own!

Naturally, you might want to add new readers to make use of the outputs from other pipelines or new writers to feed into various light curve analyses. To facilitate this, templates are available with human-friendly instructions for how to add a new reader or writer. If you want to try to incorporate a new reader or writer, please:
1. Install in development mode (see the [installation instructions](../installation))
2. Navigate to `chromatic/rainbows/[readers|writers]/template.py`.
3. Follow the instructions.

Good luck! If you add something, please consider submitting a Pull Request to share it with the world!