# Add your own solar irradiance spectrum dataset
This tutorial illustrates how to add your own solar irradiance spectrum dataset from a data file that includes wavelength and solar spectral irradiance values, and use it within Eradiate.
<div class="alert alert-info">

This tutorial is for advanced users.

</div>

## Create the dataset object

Say your custom solar irradiance spectrum data is saved in a [comma-separated values file](https://en.wikipedia.org/wiki/Comma-separated_values) called `my_data.csv`, with wavelength values in the first column and solar spectral irradiance values in the second column.
To use it within Eradiate, you need to convert this CSV file into a NetCDF file in the format that Eradiate expects.
Here is how that can be achieved.
First, we read our data into a `DataFrame` object from the [pandas](https://pandas.pydata.org/) library:

In [1]:
import pandas as pd
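# header=1 takes the second (non-blank) line of the file as the header row,
# replaces it with `names` and skips everything before it.
# Use header=0 if your file has a single header line.
df = pd.read_csv("my_data.csv", header=1, names=["w", "ssi"])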
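
Before building the dataset, it can be worth checking that the table is well-formed. The sketch below uses an in-memory CSV with made-up values in place of `my_data.csv`; the helper `check_spectrum_table` is hypothetical and the checks it performs (no missing values, strictly increasing wavelengths, non-negative irradiance) are suggestions, not Eradiate requirements.

```python
import io

import pandas as pd

# Made-up sample standing in for my_data.csv (one header line, two columns).
sample = io.StringIO(
    "wavelength,irradiance\n"
    "400.0,1.65\n"
    "402.0,1.72\n"
    "404.0,1.69\n"
)
# header=0 because this sample has exactly one header line.
df_check = pd.read_csv(sample, header=0, names=["w", "ssi"])


def check_spectrum_table(df):
    """Return a list of problems found in a (w, ssi) table; empty if none."""
    problems = []
    if df.isna().any().any():
        problems.append("missing values")
    if not df["w"].is_monotonic_increasing or df["w"].duplicated().any():
        problems.append("wavelengths are not strictly increasing")
    if (df["ssi"] < 0).any():
        problems.append("negative irradiance values")
    return problems


print(check_spectrum_table(df_check))
```

An empty list means that none of the three checks found a problem; any other result is worth investigating before the data is written to a dataset.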

Next, we create a `Dataset` object with the values of wavelength and solar spectral irradiance that we have just read.
We create the data variable `ssi` (for **solar spectral irradiance**) with the dimension `w` (for **wavelength**) and the required metadata (including units).
The dataset must have two coordinates: `w` and `t` (for **time**) with corresponding metadata.
Our data has no time dimension, so we simply set the time coordinate to an empty array (length 0).
Finally, set the attributes (`attrs`) of our dataset, including a nice title!
Refer to the
[CF-1.8 convention document](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#description-of-file-contents)
for the meaning of these attributes.

In [2]:
import datetime
import numpy as np
import xarray as xr

ds = xr.Dataset(
    data_vars={
        "ssi": ("w", df.ssi.values, {
            "standard_name": "solar_irradiance_per_unit_wavelength",
            "long_name": "solar spectral irradiance",
            "units": "W/m^2/nm"})
    },
    coords={
        "w": ("w", df.w.values, {
            "standard_name": "wavelength",
            "long_name": "wavelength",
            "units": "nm"}),
        "t": ("t", np.empty(0), {
            "standard_name": "time",
            "long_name": "time"})
    },
    attrs={
        "title": "My awesome dataset",
        "convention": "CF-1.8",
        "history": f"{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')} - data set creation - path/to/my_script.py",
        "source": "My custom observation data",
        "references": "My article, doi:10.1000/xyz123"
    }
)
display(ds)
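
The dataset above declares its wavelengths in `nm` and its irradiance in `W/m^2/nm`. If your source file uses other units, the values need converting before they are passed to `xr.Dataset`. As a minimal sketch, assuming made-up data given in micrometres and `W/m^2/um` (the variable names and the `trapezoid` helper are illustrative only):

```python
import numpy as np

# Made-up values: wavelength in micrometres, irradiance in W/m^2/um.
w_um = np.array([0.400, 0.402, 0.404])
ssi_per_um = np.array([1650.0, 1720.0, 1690.0])

# 1 um = 1000 nm, so wavelengths are multiplied by 1000 ...
w_nm = w_um * 1e3
# ... and a density per micrometre is divided by 1000 to become per nanometre.
ssi_per_nm = ssi_per_um / 1e3


def trapezoid(y, x):
    """Integrate y over x with the trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# The band-integrated irradiance (W/m^2) must be the same in both unit systems.
print(trapezoid(ssi_per_um, w_um), trapezoid(ssi_per_nm, w_nm))
```

Comparing the two integrals is a quick way to catch a conversion applied to the wavelengths but forgotten on the spectral density, or the reverse.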

## Validate the dataset's metadata

Before going any further, we must validate the metadata of the dataset we've created.

<div class="alert alert-info">

The metadata of Eradiate's solar irradiance spectrum datasets must follow strict specifications defined by the `ssi_dataset_spec` variable in the [xarray](../../../rst/api_reference/generated/eradiate.util.xarray.rst) module.
    
</div>

In [3]:
from eradiate.util.xarray import ssi_dataset_spec

We can validate our dataset's metadata by running:

In [4]:
ds.ert.validate_metadata(ssi_dataset_spec)

### Normalisation

A lazier way to define the dataset is to omit the `standard_name` and `long_name` metadata fields and **normalise** the dataset's metadata, which will add the missing fields:

In [5]:
ds = xr.Dataset(
    data_vars={
        "ssi": ("w", df.ssi.values, {"units": "W/m^2/nm"})},
    coords={
        "w": ("w", df.w.values, {"units": "nm"}),
        "t": ("t", np.empty(0))
    },
    attrs={
        "title": "My awesome dataset",
        "convention": "CF-1.8",
        "history": f"{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')} - data set creation - this_toolchain version 0.1",
        "source": "Observation data from this instrument",
        "references": "My article, doi:10.1000/xyz123"
    }
)
ds.ert.normalize_metadata(ssi_dataset_spec)
display(ds)

Although this is not recommended, we could even have omitted the `units` fields, because our `ssi` and `w` data use the default units defined in `ssi_dataset_spec`:

In [6]:
display(ssi_dataset_spec.var_specs["ssi"].schema["units"])
display(ssi_dataset_spec.coord_specs["w"].schema["units"])

{'allowed': ['W/m^2/nm'], 'default': 'W/m^2/nm', 'required': True}

{'allowed': ['nm'], 'default': 'nm', 'required': True}

### Optional attributes

If your data comes from observation, you may want to indicate the observation start date and end date in the dataset attributes.
This information indicates the range of dates over which the dataset can be considered an accurate representation of the actual solar irradiance spectrum.
Use the `obs_start` and `obs_end` attributes to indicate those dates.
If applicable, use the `url` attribute to indicate the URL where the raw data can be downloaded.
Use the `comment` attribute to add miscellaneous information, e.g. some processing that you performed on the raw data.
Finally, you can create any other attribute that you wish, provided its name does not conflict with an existing attribute.

In [7]:
ds.attrs["obs_start"] = str(datetime.date(1992, 3, 24))
ds.attrs["obs_end"] = str(datetime.date(1992, 4, 2))
ds.attrs["url"] = f"https://this.is.where.the.data.can.be.downloaded (last accessed on {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')})"
ds.attrs["comment"] = "the original data was re-binned in larger 2nm-wide wavelength bins."
ds.attrs["_my_attribute"] = "Other info"
display(ds)

## Save the dataset to a NetCDF file

We are not done yet!
To register your dataset in Eradiate's list of available datasets, you must first save it to a NetCDF file.
It is recommended to save the dataset in `$ERADIATE_DIR/resources/data/spectra/solar_irradiance`:

In [8]:
import os
%cd {os.environ["ERADIATE_DIR"]}/resources/data/spectra/solar_irradiance/
ds.to_netcdf("my_awesome_dataset.nc")

/Users/yvan/Documents/src/eradiate/eradiate/resources/data/spectra/solar_irradiance
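
The `%cd` magic only works in IPython or Jupyter. In a plain Python script, the same target path can be built with `pathlib`, as in the sketch below. The helper `solar_irradiance_dir` and the root directory `/path/to/eradiate` are made up for illustration; the folder layout is the one recommended above.

```python
import os
from pathlib import Path


def solar_irradiance_dir(eradiate_dir=None):
    """Return the recommended folder, built from ERADIATE_DIR by default."""
    root = eradiate_dir if eradiate_dir is not None else os.environ["ERADIATE_DIR"]
    return Path(root) / "resources" / "data" / "spectra" / "solar_irradiance"


# Example with an explicit, made-up root directory.
target = solar_irradiance_dir("/path/to/eradiate") / "my_awesome_dataset.nc"
print(target)

# With the dataset built earlier in this tutorial, saving would then read:
# ds.to_netcdf(target)
```

Writing to an explicit path also avoids changing the working directory of the session as a side effect.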


If you list the files in that folder, you should see your newly added NetCDF file next to Eradiate's predefined solar irradiance spectrum datasets:

In [9]:
%ls

blackbody_sun.nc           thuillier_2003.nc
meftah_2017.nc             whi_2008_time_period_1.nc
my_awesome_dataset.nc      whi_2008_time_period_2.nc
solid_2017_mean.nc         whi_2008_time_period_3.nc


## Use your own solar irradiance spectrum dataset

To use your own solar irradiance spectrum dataset, you must "hack" Eradiate's solar irradiance data getter and add the path to your dataset to the list of registered paths:

In [10]:
from eradiate.data.solar_irradiance_spectra import _SolarIrradianceGetter
_SolarIrradianceGetter._PATHS["my_awesome_dataset"] = "spectra/solar_irradiance/my_awesome_dataset.nc"

You can now use your own solar irradiance spectrum within Eradiate.
The following code illustrates how to define a directional illumination scene element based on the custom solar irradiance spectrum:

In [11]:
from eradiate.scenes.illumination import DirectionalIllumination
from eradiate.scenes.spectra import SolarIrradianceSpectrum
DirectionalIllumination(
    irradiance=SolarIrradianceSpectrum(dataset="my_awesome_dataset")
)

DirectionalIllumination(id='illumination', zenith=<Quantity(0.0, 'degree')>, azimuth=<Quantity(0.0, 'degree')>, irradiance=SolarIrradianceSpectrum(id=None, dataset='my_awesome_dataset', scale=1.0))