# Data format and preparation

## ESO pipelines

Unlike with MAROON-X, we do not have an efficient team of people giving us the data in a usable form. We have to prepare our own data, using ESO pipelines.

ESO pipelines are a bit messy to install, but the ESO support team is very responsive and will help you in the process. I managed to install the pipelines on a Mac relatively effortlessly.

The individual steps of an ESO pipeline are called "recipes", and can be run in different environments. Esorex is a command-line tool. However, there is a convenient graphical workflow environment called Esoreflex, which wraps Esorex in a more immediate form. This is what I use for the ESPRESSO pipelines.

To run the pipeline, you will need tens of GBs (up to 100 or 200 GBs) of free space for each dataset, so make sure you have enough space. Create a folder tree:

- `ESPRESSO_observations/<Run_name>/data_with_raw_calibs`
- `ESPRESSO_observations/<Run_name>/reflex_end_products`

We will have two runs: WASP-76b_transmission_run_1 and WASP-76b_transmission_run_2.
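As a minimal sketch, the folder tree for both runs could be created with a few lines of Python, which also reports the free disk space. The function name `make_tree`, the `root` location and the `min_free_gb` threshold are assumptions made for illustration; only the folder and run names come from this tutorial.

```python
import shutil
from pathlib import Path

# Run and folder names taken from this tutorial.
RUNS = ["WASP-76b_transmission_run_1", "WASP-76b_transmission_run_2"]
SUBFOLDERS = ["data_with_raw_calibs", "reflex_end_products"]


def make_tree(root, runs=RUNS, min_free_gb=200):
    """Create the folder tree under `root` and return the free space in GB.

    `min_free_gb` is a hypothetical per-run threshold based on the
    "up to 100 or 200 GBs" estimate above; adjust it to your datasets.
    """
    root = Path(root)
    for run in runs:
        for sub in SUBFOLDERS:
            folder = root / "ESPRESSO_observations" / run / sub
            folder.mkdir(parents=True, exist_ok=True)

    free_gb = shutil.disk_usage(root).free / 1e9
    needed_gb = min_free_gb * len(runs)
    if free_gb < needed_gb:
        print(f"Warning: {free_gb:.0f} GB free, about {needed_gb} GB may be needed")
    return free_gb
```

Calling `make_tree(".")` from the place where you want to keep the data creates both runs at once. Existing folders are left untouched.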

## Data download

For this tutorial, we will use the data published in Ehrenreich et al. (2021). Step 0 is to get the data from the ESO archive. 

There is a user-friendly version of the archive (science portal) that contains pre-reduced spectra. These are good for a quick look, but not for science. So, we instead use the raw data portal (http://archive.eso.org/eso/eso_archive_main.html).

Let's search for WASP-76b, and select the ESPRESSO instrument. To make sure you do not lose any data, click on the Program ID. Ehrenreich et al. (2021) use two runs, both public: 1102.C-0744(C) and 1102.C-0744(D). In the window that opens, click on "FileList". That gives you a list of all science exposures in that run. We "Mark all", and "Request marked datasets (new service)".

To run the pipeline we need not just the science exposures, but also the calibrations. So let's "run association" of the **raw** calibrations (not processed). After the association is done (it takes a while), we can download everything. I recommend the shell script.

Place the script for each run in the corresponding "data_with_raw_calibs" folder, change permissions if needed (chmod u+x <script_name>) and run it. You are downloading about 10 GB of data, so it may take a while. After everything is done, unzip the archives.
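If you prefer to drive this step from Python, the following sketch does the same as `chmod u+x` followed by running the script from inside its own folder. The helper name `run_download_script` is hypothetical, and the sketch assumes the archive script is an ordinary shell script that writes into the current working directory.

```python
import stat
import subprocess
from pathlib import Path


def run_download_script(script_path):
    """Make the download script executable by the user and run it in its folder."""
    script = Path(script_path).resolve()
    # Equivalent of `chmod u+x <script_name>`
    script.chmod(script.stat().st_mode | stat.S_IXUSR)
    # Run from the script's own folder (assumed to be data_with_raw_calibs),
    # so that the downloaded files end up next to it.
    subprocess.run([str(script)], cwd=script.parent, check=True)
```

With `check=True` a failed download raises an exception instead of passing silently, which makes it easier to notice that the process has to be repeated.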

You should now have lots of FITS files. Most of them are calibrations, and may carry a date that does not coincide with the night of the observations. Each observation has an associated raw2raw.xml file, which links the calibrations to the exposure. If some files are missing, repeat the process.
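A quick way to check what actually arrived is to count the files by type. The sketch below is only an illustration: the function name `inventory` is hypothetical, and the file endings it looks for (`.fits`, `.fits.Z`, `.fits.gz`, `.xml`) are assumptions about how the downloaded files are named.

```python
from pathlib import Path


def inventory(data_dir):
    """Count the files in `data_dir` by type, based on their file name ending."""
    counts = {"fits": 0, "compressed_fits": 0, "xml": 0, "other": 0}
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        name = path.name.lower()
        if name.endswith(".fits"):
            counts["fits"] += 1
        elif name.endswith((".fits.z", ".fits.gz")):
            # Still compressed: these need to be unzipped before the reduction
            counts["compressed_fits"] += 1
        elif name.endswith(".xml"):
            # Association files linking calibrations to the exposures
            counts["xml"] += 1
        else:
            counts["other"] += 1
    return counts
```

If `compressed_fits` is not zero after unzipping, or the number of XML files does not match the number of science exposures you requested, some files are probably missing.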

## Using esoreflex

You will need a configuration file of type ".kar". If you installed your ESO pipelines, you'll have an example to follow. In this drive (https://drive.google.com/drive/u/2/folders/1Xfu2wWyn8WO1DgOQsYmxj8gBMTMWUJ-W) I placed the one that I used for the tutorial.

1) Change the RAW_DATA_DIR and END_PRODUCTS_DIR according to your tree;
2) Tools -> Animate at Runtime -> 10

Launch!

Esoreflex requires some interaction. First, it will read the files and present you with boxes to tick to select the files you'd like to reduce. If they are greyed out, it means that it could not find the calibrations. Go back to the previous steps, and/or check your directory tree.

In the first pop-up window, have a look at the extracted spectra using the mask. If this is satisfactory, click the option "Use the parameters above as initial values in the subsequent executions of this recipe". In this way you'll have no more pop-ups, and esoreflex will run on its own.

You can also interrupt the execution of the pipeline and resume it at a later time. Esoreflex knows which files you have already reduced, and by default it should not overwrite them. If you wish to overwrite, find the "lazy mode" and disable it (so Esoreflex will actually do everything from the beginning).


## Output of esoreflex 

In the output folder, esoreflex creates a folder named after the date and time at which the workflow was called. So if you do parts of the analysis at different times, you will have the results under different folders. Within this folder you will find one subfolder for each exposure, which contains the final products. Only some of them are useful for us.
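Since the products can be spread over several date-and-time folders, it helps to get an overview before going further. This is a small sketch under the layout described above (one folder per workflow call, one subfolder per exposure); the function name `list_reflex_products` is hypothetical.

```python
from pathlib import Path


def list_reflex_products(end_products_dir):
    """Map each workflow-call folder to the names of its exposure subfolders."""
    products = {}
    for call_dir in sorted(Path(end_products_dir).iterdir()):
        if call_dir.is_dir():
            products[call_dir.name] = sorted(
                p.name for p in call_dir.iterdir() if p.is_dir()
            )
    return products
```

Printing the returned dictionary shows at a glance which exposures were reduced in which session, so you can tell whether any exposure is missing or was reduced twice.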

ESPRESSO has two fibers. Fiber A contains the spectra of the planet, and fiber B can be placed on different sources depending on the observer's choice. Usually, it is placed on sky, to monitor telluric emission.

**<OB_name>_S2D_BLAZE_A_<Exposure_UT_time>**: gives the blaze function in fiber A.

**<OB_name>_S2D_SKYSUB_A_<Exposure_UT_time>**: gives the 2-dimensional spectra from fiber A, after subtracting fiber B. This is necessary to remove the telluric emission lines, but in some cases it could introduce extra noise. This is what I am currently using and what I suggest, but you may explore the impact of skipping the sky subtraction.
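To collect the two useful products for every exposure, one can search the output tree for the file name patterns above. The sketch below assumes that the files end in `.fits` and that the two files of an exposure sit in the same subfolder and differ only in the `S2D_SKYSUB_A` / `S2D_BLAZE_A` part of the name; the function name `pair_s2d_files` is hypothetical.

```python
from pathlib import Path


def pair_s2d_files(end_products_dir):
    """Return a list of (skysub_path, blaze_path) tuples, one per exposure."""
    pairs = []
    # rglob searches all date/time folders and exposure subfolders recursively
    for skysub in sorted(Path(end_products_dir).rglob("*_S2D_SKYSUB_A_*.fits")):
        blaze_name = skysub.name.replace("_S2D_SKYSUB_A_", "_S2D_BLAZE_A_")
        blaze = skysub.with_name(blaze_name)
        if blaze.exists():
            pairs.append((skysub, blaze))
        else:
            print(f"No blaze file found for {skysub.name}")
    return pairs
```

The returned pairs are sorted by path, and exposures without a matching blaze file are reported rather than silently dropped.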

In addition, there are many other files, for both fiber A and B. For example, there are flux-calibrated spectra (as of Feb 2022, the ESPRESSO flux calibration is not completely reliable, and not accurate enough to recover the flux at the precision needed for exoplanet atmosphere applications). Finally, there are CCFs calculated by the pipeline with an algorithm which is very similar to the one I give you (but much slower!).

From here on, we are on our own and we can start the data reduction.

# Data reduction