1. Usage Overview

From the bin directory, run ./RealTrace [-options] with the following options:

-h, --help                         this help message
-i, --infile                       (required) input data file
-b, --parameter_bounds             (required) file(s) setting the type, step, bounds of the parameters
-c, --csv_config                   file that sets the columns that will be used from the input file
-o, --outdir                       output directory, overriding the default
-t, --tolerance_maximization       absolute tolerance of maximization between optimization steps, default: 1e-10
-r, --rel_tolerance_joints         relative tolerance of joint calculation, default: 1e-10
-n, --n_samples                    number of samples, default: 10
-space, --search_space             search parameter space in {'log'|'linear'} space, default: 'log'
-noise, --noise_model              measurement noise of fp content {'scaled'|'const'}, default: 'scaled'
-div, --cell_division_model        cell division model {'binomial'|'gauss'}, default: 'binomial'
-m, --maximize                     run maximization
-p, --predict                      run prediction
-s, --samples                      sample trajectories
-j, --joints                       run calculation of joint probabilities

Example: ./RealTrace -c csv_config.txt -b parameters.txt -i data/example.csv -o out/ -l 1 -t 1e-10 -m -p

Required arguments

infile sets the input file that contains the data, e.g., as provided by MOMA.
parameter_bounds sets the file that defines the parameter space.

Running modes

Likelihood maximization: -m, --maximize
Prediction: -p, --predict
Sample trajectories: -s, --samples
Run joints calculation: -j, --joints
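
When RealTrace is driven from a script, it can help to assemble the command line in one place. The following is a minimal sketch that only uses flags from the option list above; the helper name `build_command`, the `MODE_FLAGS` table, and the file names are hypothetical and not part of RealTrace.

```python
import subprocess  # only needed if the command is actually launched

# Mode flags copied from the option list above; the mapping itself is hypothetical.
MODE_FLAGS = {"maximize": "-m", "predict": "-p", "samples": "-s", "joints": "-j"}


def build_command(infile, parameter_bounds, modes, csv_config=None, outdir=None,
                  binary="./RealTrace"):
    """Assemble an argument list for RealTrace (illustrative helper)."""
    unknown = [m for m in modes if m not in MODE_FLAGS]
    if unknown:
        raise ValueError(f"unknown running mode(s): {unknown}")
    # -i and -b are the two required arguments.
    cmd = [binary, "-i", infile, "-b", parameter_bounds]
    if csv_config is not None:
        cmd += ["-c", csv_config]
    if outdir is not None:
        cmd += ["-o", outdir]
    cmd += [MODE_FLAGS[m] for m in modes]
    return cmd


cmd = build_command("data/example.csv", "parameters.txt", ["maximize", "predict"],
                    csv_config="csv_config.txt", outdir="out/")
print(" ".join(cmd))
# To launch it from the bin directory: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.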

Input file

The input file is assumed to fulfill the following:

The data points of a cell appear as consecutive rows and are in the correct order with respect to time.
The data set has to include all columns that are set via the csv_config file, i.e., time_col, length_col, fp_col.
The cells can be uniquely identified via the tags provided via parent_tags and cell_tags and each mother cell has at most 2 daughter cells. If that is not the case, the parent_tags and cell_tags are not sufficient and a warning will be printed.
To estimate the initial covariance matrix, the data set needs to contain at least (!) 2 cells.
An optional column may be added for the usage of segments, see below for more information.
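
The requirements above can be checked before running RealTrace. The sketch below is not part of the tool: the function `check_input` and the sample data are hypothetical, and the column names are the csv_config defaults (time, cell_id, parent_id). It assumes that "correct order" means time never decreases within a cell, and it only counts daughters of mother cells that appear in the data set themselves.

```python
import csv
import io


def check_input(rows, time_col="time", cell_tags=("cell_id",), parent_tags=("parent_id",)):
    """Return a list of problems found in the rows (an empty list means none found)."""
    problems = []
    last_time = {}   # cell id -> last time seen for that cell
    parent_of = {}   # cell id -> parent id
    previous_cell = None
    for row in rows:
        cell = tuple(row[c] for c in cell_tags)
        parent = tuple(row[c] for c in parent_tags)
        t = float(row[time_col])
        if cell in last_time:
            if cell != previous_cell:
                problems.append(f"rows of cell {cell} are not consecutive")
            if t < last_time[cell]:
                problems.append(f"time goes backwards in cell {cell}")
        last_time[cell] = t
        parent_of[cell] = parent
        previous_cell = cell
    # Group cells by their mother and flag mothers with more than 2 daughters.
    daughters = {}
    for cell, parent in parent_of.items():
        daughters.setdefault(parent, set()).add(cell)
    for parent, cells in daughters.items():
        if parent in parent_of and len(cells) > 2:
            problems.append(f"cell {parent} has more than 2 daughter cells")
    if len(last_time) < 2:
        problems.append("fewer than 2 cells in the data set")
    return problems


SAMPLE = """cell_id,parent_id,time,length,gfp
1,,0,2.0,100
1,,3,2.2,110
2,1,6,1.1,56
2,1,9,1.2,60
3,1,6,1.1,55
3,1,9,1.3,61
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(check_input(rows))
```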

Parameter file

How the different parameters are treated during the likelihood maximization is defined by the following syntax:

free_parameter = init, step
bound_parameter = init, step, lower, upper
fixed_parameter = init

An example file can look like this:

mean_lambda = 0.01, 1e-3
gamma_lambda = 0.01, 1e-3, 1e-4, 0.05
var_lambda = 1e-07

mean_q = 10, 1e-1
gamma_q = 0.01, 1e-3, 1e-4, 0.05
var_q = 1, 1e-2

beta = 5e-2

var_x = 1e-3, 1e-5
var_g = 1, 1e-3

var_dx = 1e-4, 1e-5
var_dg = 1, 1e-2
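
To illustrate the syntax, the sketch below classifies each line by the number of values it carries: one value means fixed, two mean free, four mean bound. The function `parse_parameter_line` is a hypothetical helper for illustration and does not reflect how RealTrace reads the file internally.

```python
def parse_parameter_line(line):
    """Parse 'name = v1, v2, ...' into (name, kind, settings)."""
    name, _, values = line.partition("=")
    numbers = [float(v) for v in values.split(",")]
    kinds = {1: "fixed", 2: "free", 4: "bound"}
    if len(numbers) not in kinds:
        raise ValueError(f"expected 1, 2 or 4 values, got {len(numbers)}: {line!r}")
    keys = ("init", "step", "lower", "upper")
    # zip stops at the shorter sequence, so only the given values get a key.
    return name.strip(), kinds[len(numbers)], dict(zip(keys, numbers))


example = """
mean_lambda = 0.01, 1e-3
gamma_lambda = 0.01, 1e-3, 1e-4, 0.05
var_lambda = 1e-07
"""
for line in example.splitlines():
    if line.strip():
        print(parse_parameter_line(line))
```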

By default, all parameters are restricted to positive values, which avoids unphysical or meaningless parameter ranges. This can be overridden by setting bounds explicitly.

During the maximization, step is used as the initial step size. From the NLopt documentation: "For derivative-free local-optimization algorithms, the optimizer must somehow decide on some initial step size to perturb x by when it begins the optimization. This step size should be big enough that the value of the objective changes significantly, but not too big if you want to find the local optimum nearest to x."

Model parameters: Each of the two Ornstein-Uhlenbeck (OU) processes is described by a mean value (the mean growth or production rate), a gamma parameter that determines how fast the process is driven back towards its mean after a deviation, and a characteristic kick size that scales the noise term (parameterized by the squared kick sizes var_lambda and var_q).
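
To build intuition for the three parameters of one such process, here is a minimal simulation sketch. It assumes the common form dx = -gamma (x - mean) dt + sqrt(var) dW and integrates it with the Euler-Maruyama scheme. Both the exact form of the equation and the function `simulate_ou` are assumptions made for illustration and are not taken from the RealTrace source.

```python
import math
import random


def simulate_ou(mean, gamma, var, x0, dt, n_steps, seed=0):
    """Euler-Maruyama path of an OU process (assumed parameterization, see text)."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Random kick whose variance grows linearly with the time step.
        kick = math.sqrt(var * dt) * rng.gauss(0.0, 1.0)
        # Deterministic pull towards the mean with rate gamma, plus the kick.
        x += -gamma * (x - mean) * dt + kick
        path.append(x)
    return path


# Values borrowed from the example parameter file (mean_lambda, gamma_lambda, var_lambda).
path = simulate_ou(mean=0.01, gamma=0.01, var=1e-7, x0=0.01, dt=1.0, n_steps=500)
print(len(path), min(path), max(path))
```

Under this assumed form, a larger gamma pulls the process back to its mean faster, and the stationary variance is var / (2 gamma).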

Growth rate fluctuations parameters:
- mean_lambda
- gamma_lambda
- var_lambda
Fluorescence production fluctuation parameters:
- mean_q
- gamma_q
- var_q
Bleaching rate:
- beta
Cell division noise parameters:
- var_dx
- var_dg
Measurement noise parameters:
- var_x
- var_g

Optional arguments

(Defaults are in brackets.)

csv_config sets the file that contains information on which columns will be used from the input file (see Csv_config file below)
tolerance_maximization (1e-10) sets the stopping criterion of the maximization: the optimization stops when a step changes the function value by less than the tolerance. With very low tolerances, rounding issues may occur; in that case, the last valid step is taken, and a warning is printed to stderr.
rel_tolerance_joints (1e-10) sets the stopping criterion for the joint calculation. The calculation is stopped when the cross-covariances between the two time points are smaller than the product of the corresponding means times the set tolerance: $\frac{\text{Cov}(z_{n+m}, z_n)_{i,j}}{\langle z_{n+m}\rangle_i \langle z_n\rangle_j} < \text{tolerance}$
outdir overrides the default output directory. For the infile dir/example.csv, the default is dir/example_out/
search_space (log) sets the search space of the parameters to be either in log space or linear space. The parameter file does not need to be changed as everything is done internally.
noise_model (scaled) defines how the measurement noise depends on the content of fluorescence proteins. const means that the measurement noise is constant with variance var_g. scaled means the variance of the measurement scales linearly with the fluorescence protein content. In this case, var_g is the prefactor of the scaling.
cell_division_model (binomial) defines the model for cell division. binomial splits the FP content according to the cell sizes of the daughter cells and binomial sampling. In this case, the parameter var_dg is the conversion factor between the FP input and the physical number of independent molecules that can be distributed across cells. gauss refers to a model where the FP contents of the daughter cells are drawn from a Gaussian with variance var_dg centered around half of the mother cell FP content.
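
Two of the options above reduce to short formulas, sketched below. Both functions are hypothetical illustrations and not RealTrace code. `measurement_variance` follows the description of noise_model directly. `joints_converged` additionally assumes that absolute values are compared and that every entry (i, j) of the cross-covariance must satisfy the criterion.

```python
def measurement_variance(fp_content, var_g, noise_model="scaled"):
    """Variance of the fluorescence measurement for the two noise models."""
    if noise_model == "const":
        return var_g                # same variance for every measurement
    if noise_model == "scaled":
        return var_g * fp_content   # variance grows linearly with FP content
    raise ValueError(f"unknown noise model: {noise_model!r}")


def joints_converged(cross_cov, mean_later, mean_earlier, tolerance=1e-10):
    """True if |Cov_ij| < tolerance * |<z_{n+m}>_i <z_n>_j| for all i, j."""
    return all(
        abs(cross_cov[i][j]) < tolerance * abs(mean_later[i] * mean_earlier[j])
        for i in range(len(mean_later))
        for j in range(len(mean_earlier))
    )


print(measurement_variance(100.0, 2.0, "scaled"))
print(joints_converged([[1e-12, 0.0], [0.0, 1e-12]], [1.0, 1.0], [1.0, 1.0]))
```

Writing the criterion as a product rather than a ratio avoids a division by zero when one of the means vanishes.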

Csv_config file

Example:

time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id

The following settings define how the input file will be interpreted. (Defaults are in brackets.)

time_col (time): column from which the time is read
rescale_time (1): the factor by which time will be divided at the start, thus changing the time unit (e.g. rescale_time=60 may change the time unit from sec to min)
length_col (length): column from which the length of the cell is read
length_islog (false): indicates if the cell length in the data file is in logscale (true) or not (false)
fp_col (gfp): column from which the fluorescence protein content is read
delm (,): delimiter between columns, typically ',' or ';'
segment_col (): column from which the segment index is read. Not setting segment_col in the file indicates that segment indices will not be used
filter_col (): column from which the filter will be read. To include a data point, set the entry in this column to True, true, TRUE or 1; to exclude a data point, set the entry to False, false, FALSE or 0. Not setting filter_col in the file indicates that the input file will not be filtered
cell_tags (cell_id): columns that will make up the unique cell id, separated by ','
parent_tags (parent_id): columns that will make up the unique cell id of the parent cell, separated by ','
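
The settings and defaults listed above can be mirrored in a few lines, which is handy for preprocessing or sanity checks outside of RealTrace. This is a sketch under assumptions: `parse_csv_config` and `keep_row` are hypothetical helpers, and treating any filter entry other than the listed true values as "exclude" is a simplification, not documented behaviour.

```python
# Defaults as listed above, stored as strings before conversion.
DEFAULTS = {
    "time_col": "time", "rescale_time": "1", "length_col": "length",
    "length_islog": "false", "fp_col": "gfp", "delm": ",",
    "segment_col": "", "filter_col": "",
    "cell_tags": "cell_id", "parent_tags": "parent_id",
}
TRUE_VALUES = {"True", "true", "TRUE", "1"}


def parse_csv_config(text):
    """Read 'key = value' lines and fill in the defaults for missing keys."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    config["rescale_time"] = float(config["rescale_time"])
    for key in ("cell_tags", "parent_tags"):
        config[key] = [tag.strip() for tag in config[key].split(",")]
    return config


def keep_row(row, config):
    """Apply filter_col: keep everything if unset, else keep only true entries."""
    col = config["filter_col"]
    return not col or row[col] in TRUE_VALUES


EXAMPLE = """time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id
"""
config = parse_csv_config(EXAMPLE)
# Time is divided by rescale_time, so 120 in the input becomes 2.0.
print(120 / config["rescale_time"], config["cell_tags"])
```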
