# Tutorial: Loading data

This notebook teaches you how to load data in the *EL-PASO* framework. You will load data from a CSV file on disk, and you will download a CDF file from an online repository and load data from it.

## Loading from a CSV file

As a first step, we will load data from example_orbit.csv. Its first rows look like this:

In [None]:
import pandas as pd

pd.read_csv("example_orbit.csv").head()

The data holds four columns describing the orbit of a satellite: time, altitude, latitude, and longitude. While we can certainly read the data with pandas as above, we want EL-PASO to extract the necessary data and store it in variables. For this, we need to tell EL-PASO how to read the file by creating a list of ExtractionInfos. Each ExtractionInfo creates one variable from one column of the file and assigns it the given unit. The result_key parameter identifies the variable after loading.

In [None]:
from datetime import datetime, timezone

from astropy import units as u

import el_paso as ep

start_time = datetime(2019, 7, 30, 17, tzinfo=timezone.utc)
end_time = datetime(2019, 8, 3, 5, tzinfo=timezone.utc)

extraction_infos = [
    ep.ExtractionInfo(
        result_key="Epoch",
        name_or_column="DATETIME",
        unit=u.dimensionless_unscaled,
    ),
    ep.ExtractionInfo(
        result_key="alt",
        name_or_column="alt(km)",
        unit=u.km,
    ),
    ep.ExtractionInfo(
        result_key="lon",
        name_or_column="lon(deg)",
        unit=u.deg,  # longitude is given in degrees, not km
    ),
    ep.ExtractionInfo(
        result_key="lat",
        name_or_column="lat(deg)",
        unit=u.deg,  # latitude is given in degrees, not km
    ),
]

Now we are ready to extract the data and put it into variables. The extract_variables_from_files function returns a dictionary of Variables, keyed by the result_key of each ExtractionInfo. Because all data sits in one file, we pass "single_file" as the file cadence.

In [None]:
variables = ep.extract_variables_from_files(
    start_time,
    end_time,
    "single_file",
    data_path=".",
    file_name_stem="example_orbit.csv",
    extraction_infos=extraction_infos,
)

print(variables.keys())
print(variables["Epoch"].metadata)
print(variables["Epoch"].get_data())
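Conceptually, each ExtractionInfo maps one named column of the file to one output variable tagged with a unit, and the result is a dictionary keyed by result_key. The plain-Python sketch below illustrates only that mapping and is not how EL-PASO is implemented. `ColumnSpec` and `extract_columns` are hypothetical names, units are plain strings instead of astropy units, and the two CSV rows are made-up sample values that only reuse the column names of example_orbit.csv.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical stand-in for an ExtractionInfo: one file column -> one variable."""

    result_key: str
    name_or_column: str
    unit: str  # plain string here; EL-PASO itself takes astropy units
    numeric: bool = True  # False keeps the raw strings (e.g. timestamps)


def extract_columns(csv_text, specs):
    """Return {result_key: (values, unit)} for every spec (illustration only)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = {}
    for spec in specs:
        # Pull the named column out of every row
        values = [row[spec.name_or_column] for row in rows]
        if spec.numeric:
            values = [float(v) for v in values]
        result[spec.result_key] = (values, spec.unit)
    return result


# Made-up sample rows that only reuse the column names of example_orbit.csv
SAMPLE_CSV = """DATETIME,alt(km),lon(deg),lat(deg)
2019-07-30T17:00:00,500.0,10.0,-5.0
2019-07-30T17:01:00,501.0,14.0,-1.5
"""

specs = [
    ColumnSpec(result_key="Epoch", name_or_column="DATETIME", unit="", numeric=False),
    ColumnSpec(result_key="alt", name_or_column="alt(km)", unit="km"),
    ColumnSpec(result_key="lon", name_or_column="lon(deg)", unit="deg"),
    ColumnSpec(result_key="lat", name_or_column="lat(deg)", unit="deg"),
]

sketch_variables = extract_columns(SAMPLE_CSV, specs)
print(sorted(sketch_variables))  # ['Epoch', 'alt', 'lat', 'lon']
print(sketch_variables["lon"])   # ([10.0, 14.0], 'deg')
```

The sketch stores its result in `sketch_variables` so that it does not overwrite the `variables` dictionary returned by EL-PASO.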

## Downloading and loading a CDF file

This example shows you how to inspect a CDF file and how to extract variables from it with EL-PASO.

The first thing to do is find out what the CDF file contains. EL-PASO provides a script for this, which prints a table with all relevant information. The file must already be on disk, so run the download step below first if you do not have it yet:

In [None]:
import sys

# Make the scripts/ package, located one directory above this notebook, importable
sys.path.append("../")

from scripts.inspect_cdf_file import inspect_cdf_file

inspect_cdf_file("rbspa_rel04_ect-hope-pa-l3_20170708_v7.4.0.cdf")

Next, decide which variables you need for the processing and translate them into EL-PASO ExtractionInfos.

In [None]:
from astropy import units as u

import el_paso as ep

extraction_infos = [
    ep.ExtractionInfo(
        result_key="Epoch",
        name_or_column="Epoch_Ele",
        unit=u.tt2000,
    ),
    ep.ExtractionInfo(
        result_key="Energy_FEDU",
        name_or_column="HOPE_ENERGY_Ele",
        unit=u.eV,
    ),
    ep.ExtractionInfo(
        result_key="FEDU",
        name_or_column="FEDU",
        unit=(u.cm**2 * u.s * u.sr * u.keV) ** (-1),
    ),
]

In this example, we download the data from the server, as is done for most data sets. The file_name_stem contains the placeholder YYYYMMDD for the date of the file; while loading, it is replaced by the correct date of each file. The download URL uses a similar placeholder (YYYY). The file_name_stem also contains the regular expression '.{6}', which matches any six characters and is used to find files with different versions (for example v7.4.0). The most up-to-date version is always downloaded.
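The placeholder and version handling described above can be sketched in plain Python. This is an illustration under assumptions, not EL-PASO's code, and its actual matching and version logic may differ: `daily_dates`, `stem_to_regex`, `fill_url`, `version_key` and `pick_latest` are hypothetical helpers, the version tag is assumed to look like `v7.4.0`, and "most up-to-date" is assumed to mean the highest version number. The directory listing is made up; only the 20170708 v7.4.0 file name appears elsewhere in this notebook.

```python
import re
from datetime import datetime, timedelta, timezone


def daily_dates(start, end):
    """Yield every calendar day from start to end (assumed: one file per day)."""
    day = start.date()
    while day <= end.date():
        yield day
        day += timedelta(days=1)


def stem_to_regex(stem, day):
    """Fill in the YYYYMMDD placeholder and compile the rest as a regular expression.

    The '.' before 'cdf' is also a regex wildcard, so any character is accepted
    there; the rest of the name must still match exactly with fullmatch.
    """
    return re.compile(stem.replace("YYYYMMDD", day.strftime("%Y%m%d")))


def fill_url(url_template, day):
    """Fill in the YYYY placeholder of the download URL."""
    return url_template.replace("YYYY", day.strftime("%Y"))


def version_key(name):
    """Turn a tag such as 'v7.4.0' before '.cdf' into (7, 4, 0); () if absent."""
    found = re.search(r"v(\d+)\.(\d+)\.(\d+)\.cdf$", name)
    return tuple(int(part) for part in found.groups()) if found else ()


def pick_latest(candidates, pattern):
    """Return the matching file name with the highest version, or None."""
    matches = [name for name in candidates if pattern.fullmatch(name)]
    return max(matches, key=version_key) if matches else None


file_name_stem = "rbspa_rel04_ect-hope-pa-l3_YYYYMMDD_.{6}.cdf"
url_template = "https://spdf.gsfc.nasa.gov/pub/data/rbsp/rbspa/l3/ect/hope/pitchangle/rel04/YYYY/"
start_time = datetime(2017, 7, 8, tzinfo=timezone.utc)
end_time = datetime(2017, 7, 9, 23, 59, 59, tzinfo=timezone.utc)

# Hypothetical directory listing; only the 20170708 v7.4.0 name appears in this notebook
listing = [
    "rbspa_rel04_ect-hope-pa-l3_20170708_v7.3.0.cdf",
    "rbspa_rel04_ect-hope-pa-l3_20170708_v7.4.0.cdf",
    "rbspa_rel04_ect-hope-pa-l3_20170709_v7.4.0.cdf",
]

for day in daily_dates(start_time, end_time):
    print(fill_url(url_template, day), pick_latest(listing, stem_to_regex(file_name_stem, day)))
```

With this listing, the loop selects the v7.4.0 file for both 2017-07-08 and 2017-07-09. Note that '.{6}' matches exactly six characters, so a longer tag such as v7.10.0 would not be found by this pattern.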

In [None]:
from datetime import datetime, timezone

start_time = datetime(2017, 7, 8, tzinfo=timezone.utc)
end_time = datetime(2017, 7, 9, 23, 59, 59, tzinfo=timezone.utc)

file_name_stem = "rbspa_rel04_ect-hope-pa-l3_YYYYMMDD_.{6}.cdf"

ep.download(
    start_time,
    end_time,
    save_path=".",
    download_url="https://spdf.gsfc.nasa.gov/pub/data/rbsp/rbspa/l3/ect/hope/pitchangle/rel04/YYYY/",
    file_name_stem=file_name_stem,
    file_cadence="daily",
    method="request",
    skip_existing=True,
)

variables = ep.extract_variables_from_files(
    start_time,
    end_time,
    "daily",
    data_path=".",
    file_name_stem=file_name_stem,
    extraction_infos=extraction_infos,
)
variables