[IPython Notebook](datasets.ipynb) |  [Python Script](datasets.py)

Datasets
============================

Datasets tell PHOEBE how and at what times to compute the model.  In some cases these will include the actual observational data, and in other cases may only include the times at which you want to compute a synthetic model.

Adding a dataset - even if it doesn't contain any observational data - is required in order to compute a synthetic model (which will be described in the following [Compute Tutorial](compute)).

Setup
-----------------------------

As always, let's do imports and initialize a logger and a new Bundle.  See [Building a System](building_a_system.html) for more details.

In [1]:
import phoebe2
from phoebe2 import u # units
import numpy as np
import matplotlib.pyplot as plt

logger = phoebe2.logger(clevel='INFO')

b = phoebe2.Bundle.default_binary()


Adding a Dataset from Arrays
--------------------------------------

To add a dataset, you need to provide the function in
phoebe2.parameters.dataset for the particular type of data you're dealing with, as well
as any of your "observed" arrays.

The available methods include:

* ORB (orbit/positional data)
* MESH (discretized mesh of stars)
* LC (light curve)
* RV (radial velocity)
* ETV (eclipse timing variations)
* more coming soon

### Without Observations

The simplest case of adding a dataset is when you do not have observational "data" and only want to compute a synthetic model.  Here all you need to provide is an array of times and information about the type of data and how to compute it.

This situation will almost always be the case for orbits and meshes - it's unlikely that you have observed positions and velocities for each of your components, but you may still like to store that information for plotting or diagnostic purposes.

Here we'll do just that - we'll add an orbit dataset which will track the positions and velocities of both our 'primary' and 'secondary' stars (by their component tags) at each of the provided times.

In [2]:
b.add_dataset(phoebe2.dataset.orb, time=np.linspace(0,10,20), dataset='orb01', component=['primary', 'secondary'])

<ParameterSet: 3 parameters | components: _default, primary, secondary>

As you could probably predict by now, add_dataset can either take a function or the name of a function in phoebe2.parameters.dataset.  The following line would do the same thing (except we'll give it a new dataset tag to avoid throwing an error).

In [3]:
b.add_dataset('ORB', time=np.linspace(0,10,20), dataset='orb02', component=['primary', 'secondary'])

<ParameterSet: 3 parameters | components: _default, primary, secondary>

Note that dataset methods are referred to in uppercase.  Providing 'orb' here instead of 'ORB' will work, but in all cases we'll use the official uppercase designation.

You may notice that add_dataset does take some time to complete.  In the background, the passband is being loaded (when applicable) and many parameters are created and attached to the Bundle.

If you do not provide a list of component(s), they will be assumed for you based on the dataset method.  Light curves and meshes can only attach at the system level (component=None), for instance, whereas RVs and ETVs can attach for each star.

In [4]:
b.add_dataset('RV', time=np.linspace(0,10,20), dataset='rv01')

<ParameterSet: 24 parameters | methods: RV, RV_dep>

In [5]:
print b.filter(dataset='rv01').components

['_default', 'primary', 'secondary']


Here we added an RV dataset without listing any components, and we can see that it was automatically created for both stars in our system, as well as for '\_default'.  The default parameters hold the values that will be replicated if a new component is added to the system in the future.

Since we did not explicitly state that we only wanted the primary and secondary components, the time array on '\_default' is filled as well.  If we were then to add a tertiary component, its RVs would automatically be computed because of this replicated time array.

In [6]:
print b['time@_default@rv01']

Qualifier: time
Description: Observed times
Value: [  0.           0.52631579   1.05263158 ...,   8.94736842
   9.47368421  10.        ] d
Constrained by: 
Constrains: None
Related to: None


For the 'ORB' datasets defined earlier, on the other hand, we explicitly provided components.  In that case the '\_default' times will be empty - adding a new component will result in this empty array being replicated and the orbit will NOT be computed for the tertiary component.  Of course, you could always manually copy the time array later if you wanted the orbit to be computed.

In [7]:
print b['time@_default@orb01']

Qualifier: time
Description: Observed times
Value: [] d
Constrained by: 
Constrains: None
Related to: None


For more information on self-replicating parameters, see the "copy for" section in the [General Concepts Tutorial](general_concepts).
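
The replication behaviour described above can be mimicked with plain dictionaries. This is only a conceptual sketch, not PHOEBE's implementation: `add_component` and the dictionary layout are hypothetical, and the time arrays are shortened stand-ins for the ones used in this tutorial.

```python
import copy

def add_component(times_by_component, new_component):
    """Hypothetical helper: a new component inherits the '_default' entry."""
    updated = copy.deepcopy(times_by_component)
    updated[new_component] = copy.deepcopy(times_by_component['_default'])
    return updated

# Like 'rv01': no components were listed, so '_default' holds the times too.
rv01_times = {'_default': [0.0, 5.0, 10.0],
              'primary': [0.0, 5.0, 10.0],
              'secondary': [0.0, 5.0, 10.0]}

# Like 'orb01': components were listed explicitly, so '_default' stays empty.
orb01_times = {'_default': [],
               'primary': [0.0, 5.0, 10.0],
               'secondary': [0.0, 5.0, 10.0]}

rv01_with_tertiary = add_component(rv01_times, 'tertiary')
orb01_with_tertiary = add_component(orb01_times, 'tertiary')

print(rv01_with_tertiary['tertiary'])   # the times are replicated
print(orb01_with_tertiary['tertiary'])  # empty, so nothing would be computed
```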

### With Observations

Loading datasets with observations is (nearly) as simple.  

An array passed to any of the dataset columns is applied to the same components as the time array (see the 'Without Observations' section above for more details).  This makes perfect sense for fluxes in light curves, where the time and flux arrays are both at the system level:

In [8]:
b.add_dataset('LC', time=[0,1], flux=[1,0.5], dataset='lc01')

<ParameterSet: 18 parameters | methods: LC, LC_dep>

In [9]:
print b['flux@lc01@dataset']

Qualifier: flux
Description: Observed flux
Value: [ 1.   0.5] W / m3
Constrained by: 
Constrains: None
Related to: None


For datasets which attach to individual components, however, this isn't always the desired behavior.

For a single-lined RV where we only attach to one component, everything is as expected.

In [10]:
b.add_dataset('RV', time=[0,1], rv=[-3,3], dataset='rv02', component='primary')

<ParameterSet: 24 parameters | methods: RV, RV_dep>

In [11]:
print b['rv@rv02']

rv@_default@rv02@dataset: [] km / s
rv@primary@rv02@dataset: [-3.  3.] km / s
rv@secondary@rv02@dataset: [] km / s


However, for a double-lined RV we probably **don't** want to do the following:

In [12]:
b.add_dataset('RV', time=[0,1], rv=[-3,3], dataset='rv03')

<ParameterSet: 24 parameters | methods: RV, RV_dep>

In [13]:
print b['rv@rv03']

rv@_default@rv03@dataset: [-3.  3.] km / s
rv@primary@rv03@dataset: [-3.  3.] km / s
rv@secondary@rv03@dataset: [-3.  3.] km / s


Instead we want to pass different arrays to the 'rv@primary' and 'rv@secondary'.  This can be done by explicitly stating the components in a dictionary sent to that argument:

In [14]:
b.add_dataset('RV', time=[0,1], rv={'primary': [-3,3], 'secondary': [4,-4]}, dataset='rv04')

<ParameterSet: 24 parameters | methods: RV, RV_dep>

In [15]:
print b['rv@rv04']

rv@_default@rv04@dataset: [] km / s
rv@primary@rv04@dataset: [-3.  3.] km / s
rv@secondary@rv04@dataset: [ 4. -4.] km / s


Alternatively, you could of course not pass the values while calling add_dataset and instead call the set_value method after and explicitly state the components at that time.
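
The rules shown in the RV examples above (a bare array goes to every attached component, a dictionary goes only to the components it names) can be summarised in a few lines of plain Python. This is a conceptual sketch only; `distribute` is a hypothetical helper and not part of PHOEBE.

```python
def distribute(value, attached, all_components):
    """Hypothetical helper: spread one argument over per-component slots."""
    if isinstance(value, dict):
        # A dictionary only fills the components it names.
        return {c: list(value.get(c, [])) for c in all_components}
    # A bare array is copied to every component the dataset attaches to.
    return {c: list(value) if c in attached else [] for c in all_components}

components = ['_default', 'primary', 'secondary']

# Like 'rv02': one array, attached to the primary only.
single_lined = distribute([-3, 3], ['primary'], components)

# Like 'rv03': one array, no components given, so every slot gets a copy.
duplicated = distribute([-3, 3], components, components)

# Like 'rv04': a dictionary keeps the two stars separate.
double_lined = distribute({'primary': [-3, 3], 'secondary': [4, -4]},
                          components, components)

for name in components:
    print(name, single_lined[name], duplicated[name], double_lined[name])
```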

### With Passband Options

Passband options follow the exact same rules as dataset columns.

Sending a single value to the argument will apply it to *each* component to which the time array is attached (either based on the list of components sent or the defaults from the dataset method).

Note that for light curves, in particular, this rule gets slightly bent.  The dataset arrays for light curves are attached at the system level, *always*.  The passband-dependent options, however, exist for each star in the system.  So, that value will get passed to each star if the component is not explicitly provided.

In [16]:
b.add_dataset('LC', time=[0,1], ld_coeffs=[0,0], dataset='lc02')

<ParameterSet: 18 parameters | methods: LC, LC_dep>

In [17]:
print b['time@lc02']

Qualifier: time
Description: Observed times
Value: [ 0.  1.] d
Constrained by: 
Constrains: None
Related to: None


In [18]:
print b['ld_coeffs@lc02']

ld_coeffs@_default@lc02@dataset: [ 0.  0.]
ld_coeffs@primary@lc02@dataset: [ 0.  0.]
ld_coeffs@secondary@lc02@dataset: [ 0.  0.]


As you might expect, if you want to pass different values to different components, simply provide them in a dictionary.

In [19]:
b.add_dataset('LC', time=[0,1], ld_coeffs={'primary': [0,0], 'secondary': [0.25, 0.25]}, dataset='lc03')

<ParameterSet: 18 parameters | methods: LC, LC_dep>

In [20]:
print b['ld_coeffs@lc03']

ld_coeffs@_default@lc03@dataset: [ 0.5  0.5]
ld_coeffs@primary@lc03@dataset: [ 0.  0.]
ld_coeffs@secondary@lc03@dataset: [ 0.25  0.25]


Note here that we didn't explicitly provide a value for '\_default', so it kept the PHOEBE-wide default.  If you wanted to set a value for the ld_coeffs of any star added in the future, you would have to provide a value for '\_default' in the dictionary as well.

This syntax may seem a bit bulky - but alternatively you can add the dataset without providing values and then change the values individually using dictionary access or set_value.

Adding a Dataset from a File
-------------------------------------

### Manually from Arrays

For now, the only way to load data from a file is to do the parsing externally and pass the arrays on (as in the previous section).

Here we'll load times, fluxes, and errors of a light curve from an external file and then pass them on to a newly created dataset.  Since this is a light curve, it will automatically know that you want the summed light from all components in the hierarchy.

In [21]:
times, fluxes, errors = np.loadtxt('test.lc.in', unpack=True)
b.add_dataset(phoebe2.dataset.lc, time=times, flux=fluxes, sigma=errors, dataset='lc04')

<ParameterSet: 18 parameters | methods: LC, LC_dep>
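
The call to `np.loadtxt` above assumes a plain-text file with one row per observation and three whitespace-separated columns. The contents of 'test.lc.in' are not shown in this tutorial, so the sketch below uses made-up values in an in-memory file, purely to illustrate the expected layout and what `unpack=True` does.

```python
import io
import numpy as np

# Made-up stand-in for a three-column light curve file: time, flux, sigma.
fake_file = io.StringIO(
    "# time  flux  sigma\n"
    "0.0     1.00  0.01\n"
    "0.5     0.80  0.01\n"
    "1.0     0.99  0.02\n"
)

# unpack=True transposes the table so that each column becomes its own array.
# Lines starting with '#' are treated as comments and skipped by default.
times, fluxes, errors = np.loadtxt(fake_file, unpack=True)

print(times)
print(fluxes)
print(errors)
```

If your file has a different column order, extra columns, or a non-whitespace delimiter, `np.loadtxt` accepts `usecols` and `delimiter` arguments to handle that before the arrays are passed to add_dataset.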

### Directly Parsing File

COMING SOON - maybe

Enabling and Disabling Datasets
---------------------------------------

COMING SOON - probably just point to the next tutorial (compute)?

Dealing with Phases
-------------------------------

Datasets will no longer accept phases.  It is the user's responsibility to convert
phased data into times given an ephemeris.  But it's still useful to be able to
convert times to phases (and vice versa) and be able to plot in phase.

The following functions handle those conversions:

In [22]:
print b.get_ephemeris()

{'dpdt': 0.0, 'phshift': 0.0, 'period': 3.0, 't0': 0.0}


In [23]:
print b.to_phase(0.0)

0.0


In [24]:
print b.to_time(-0.25)

-0.75


All of these by default use the period in the top-level of the current hierarchy,
but accept a component keyword argument if you'd like the ephemeris of an
inner-orbit or the rotational ephemeris of a star in the system.
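
For the simple ephemeris printed above (dpdt and phshift both zero), the conversions reduce to a linear relation between time and phase. The sketch below is an illustration under that assumption, not PHOEBE's implementation: it ignores dpdt and phshift entirely, and `time_to_phase`, `phase_to_time` and the `cycle` argument are hypothetical names.

```python
def time_to_phase(time, period, t0=0.0):
    """Fold a time onto the interval [0, 1) for a constant period."""
    return ((time - t0) / period) % 1.0

def phase_to_time(phase, period, t0=0.0, cycle=0):
    """Map a phase back to a time within the chosen orbital cycle."""
    return t0 + (phase + cycle) * period

period, t0 = 3.0, 0.0  # values taken from the ephemeris printed above

print(phase_to_time(-0.25, period, t0))         # -0.75
print(time_to_phase(10.0, period, t0))          # roughly one third
print(phase_to_time(0.5, period, t0, cycle=2))  # 7.5
```

Because a phase only fixes the position within one cycle, you have to decide which cycle each phased point belongs to before passing the resulting times to add_dataset.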

We'll see how plotting works later, but if you manually wanted to plot the dataset
with phases, all you'd need to do is:

In [25]:
print b.to_phase(b['time@primary@orb01'])

[ 0.          0.1754386   0.35087719  0.52631579  0.70175439  0.87719298
  0.05263158  0.22807018  0.40350877  0.57894737  0.75438596  0.92982456
  0.10526316  0.28070175  0.45614035  0.63157895  0.80701754  0.98245614
  0.15789474  0.33333333]


or

In [26]:
print b.to_phase('time@primary@orb01')

[ 0.          0.1754386   0.35087719  0.52631579  0.70175439  0.87719298
  0.05263158  0.22807018  0.40350877  0.57894737  0.75438596  0.92982456
  0.10526316  0.28070175  0.45614035  0.63157895  0.80701754  0.98245614
  0.15789474  0.33333333]


* TODO: show how to load data that was phased (not yet supported)
* TODO: point to how to plot phased data

Removing Datasets
----------------------

Removing a dataset will remove matching parameters in the dataset, model, and constraint contexts.  This action is permanent and cannot be undone via [Undo/Redo](undo_redo).

In [27]:
b.datasets

['rv04',
 'rv01',
 'rv02',
 'rv03',
 'orb02',
 'orb01',
 'lc03',
 'lc02',
 'lc01',
 'lc04']

The simplest way to remove a dataset is by its dataset tag:

In [28]:
b.remove_dataset('lc01')

In [29]:
b.datasets

['rv04', 'rv01', 'rv02', 'rv03', 'orb02', 'orb01', 'lc03', 'lc02', 'lc04']

But remove_dataset also takes any other tag(s) that could be sent to filter.

**WARNING**: since remove_dataset removes from contexts other than just 'dataset', this will remove **all** Parameters that match the filter provided.  There are some precautions in place that will raise Errors if you leave the filter blank or try to pass a qualifier, but it isn't a bad idea to test the filter first to ensure you aren't removing Parameters that you don't want removed.

In [30]:
b.remove_dataset(method='RV')

In [31]:
b.datasets

['orb02', 'lc02', 'orb01', 'lc03', 'lc04']
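
The advice in the warning above, to check what a filter matches before removing anything, can be illustrated with a toy list of tagged parameters. Everything here is hypothetical (the `matches` helper and the tag dictionaries are not PHOEBE objects); the point is only that a tag filter cuts across contexts.

```python
def matches(param, **tags):
    """Hypothetical helper: True if the parameter carries every given tag."""
    return all(param.get(key) == value for key, value in tags.items())

# Toy stand-ins for parameters living in several contexts.
params = [
    {'qualifier': 'time', 'dataset': 'rv01', 'method': 'RV', 'context': 'dataset'},
    {'qualifier': 'rv', 'dataset': 'rv01', 'method': 'RV', 'context': 'model'},
    {'qualifier': 'time', 'dataset': 'lc02', 'method': 'LC', 'context': 'dataset'},
]

# Preview first: which parameters would a method='RV' filter remove?
to_remove = [p for p in params if matches(p, method='RV')]
print(sorted(set(p['context'] for p in to_remove)))  # ['dataset', 'model']

# Only then apply the removal.
params = [p for p in params if not matches(p, method='RV')]
print([p['dataset'] for p in params])  # ['lc02']
```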

Dataset Types
------------------------

For a full explanation of all related options and Parameters, see the respective dataset tutorials:

* [ORB dataset](ORB)
* [MESH dataset](MESH)
* [LC dataset](LC)
* [RV dataset](RV)
* [ETV dataset](ETV)